Employment

Senior Site Reliability Engineer, Loblaw Companies Limited

Toronto, Canada

2021 to present

Architected and implemented an observability platform in Go, defining SLIs, SLOs, and error budgets in line with SRE principles, and improved issue identification and automated alerting through Grafana dashboard templating.

Manage and maintain Kubernetes clusters across multiple environments, ensuring 99.99% uptime for 100+ deployed applications.

Enhance Kubernetes cluster performance through seamless version upgrades, real-time metrics monitoring (VM cluster), auto-scaling, and in-house Operators.

Replaced the single-instance Prometheus time-series store with VictoriaMetrics for better scalability, faster data ingestion, and faster querying.

Automated infrastructure provisioning using Terraform and Ansible, reducing deployment times by 40%.

Played a key role in migrating the standalone application running on a VM to GKE (Google Kubernetes Engine) using Helm, GitLab pipelines, and Vault.

Improved service performance with Akamai CDN and built IaC with GitLab pipelines for version rollouts.

Collaborate with the team on MR reviews and feedback, system design, and coding in Go, Python, and Bash.

Improved application observability through OpenTelemetry instrumentation.

Managed Linux servers (RHEL 6, 7, 8, CentOS, Ubuntu), ensuring optimal performance and security and troubleshooting related issues.

Site Reliability Engineer, PhonePe (A Walmart - Flipkart Company)

Bangalore, India

2020 to 2021

Developed and implemented cgroup monitoring agents in Go for Mesos slave machines. Introduced alerting and metric visualization with Riemann, InfluxDB, and Grafana, which helped identify Docker containers with high resource consumption.

Modified the Traefik log-parsing agent, written in Python, to support multiple logging formats. This added more detail to the logs and simplified troubleshooting.

Traced a TCP retransmission issue between two payloads to internal firewalls; removing them reduced application latency.

Introduced load balancing across DNS resolvers to reduce the risk of outages caused by the failure of a single resolver.

DevOps Engineer, Olacabs

Bangalore, India

2018 to 2020

Architected and designed an in-house cache platform to replace AWS ElastiCache, using managed Kubernetes services (EKS and AKS), Helm, Redis, HAProxy, and GitLab pipelines. The platform saved roughly $100k per month.

Implemented a centralized logging platform for Kubernetes workloads using Filebeat, Kafka, and Graylog.

Played a key role in setting up cloud-native architectures (Mesos) for Ola and Foodpanda on AWS and Azure.

Structured and implemented Terraform modules for resource provisioning. The modules are reusable and use-case specific, which makes them easier to extend and makes provisioning simpler and more flexible.

Made traffic routing more robust with multilayer load balancing using tools such as Kong, HAProxy, and Nginx.

Contributed to server bootstrapping and configuration management using Chef and Ansible.

Introduced automation tools written in Python and Go.

Implemented in-house release management by replacing GitHub and Travis with GitLab and Jenkins, and structured a GitOps model for pipelines.

System Engineer, Endurance International Group

Bangalore, India

2017 to 2018

Designed and implemented a queue-based data migration tool to sync data between locations, removing the overhead of manual syncing. The tool is written in Python with a PHP frontend.

Provided day-to-day configuration, monitoring, and support for specific aspects of systems, in line with applicable standards.

Troubleshot operating-system and hardware issues (boot freezes, memory crashes, high load) and handled performance tuning and security on live production Linux servers to ensure 99.9% uptime.

Jr. System/DC Engineer, Hostdime Data Center Service

Kerala, India

2016 to 2017

Configured, troubleshot, and maintained network services (DNS, HTTP, FTP, NFS, SMB, SMTP, SSH, NTP, etc.) by analyzing log files and tuning settings.

Ensured network stability across switches, routers, hubs, and ISP links in the data center.

Set up and maintained bare-metal servers in the data center according to requirements.

Collaborated with the OpenStack implementation team in the data center.

Managed the Linux / Apache / MySQL / PHP web application stack.

Education

B.Tech in Computer Science and Engineering

MG University

India

2011 to 2015

B.Tech CSE is a comprehensive course in computer applications and systems. It covers the design and development of computer software and hardware processes.

Skills

AWS, Azure, GCP

Kubernetes, Mesos-Marathon

Chef, Ansible

Bash, Python, Go

GitHub, GitLab

Docker

Terraform

Travis, Jenkins

Nginx, HAProxy, Kong

Prometheus, Grafana, Filebeat, Sensu, VictoriaMetrics

Certifications

Red Hat Certified Engineer

Red Hat, Bangalore, India

2015

Certification: A Red Hat Certified Engineer (RHCE) is a Red Hat Certified System Administrator (RHCSA) who is ready to automate Red Hat® Enterprise Linux tasks, integrate Red Hat emerging technologies, and apply automation for efficiency and innovation.

Computer Networking

IIT Delhi ACM in association with Network Bulls, India

2013

Certification: ACM-IIT Delhi in association with Network Bulls introduces an Industrial Training Program for 15 to 30 days. Under this program, the training will be provided on live Cisco Devices i.e. Cisco Routers and Cisco Switches.

Hobbies

Science, Finance, Travelling