Engineering Manager - SRE (Site Reliability)
- Category: IT Engineer & Developer Jobs
- Location: Chennai, Tamil Nadu
- Job Type: Full Time / Part Time
- Salary: Estimated $15K to $27K
- Published on: 2025/09/16
athenahealth is a progressive, innovation-driven software product company dedicated to transforming healthcare through cutting-edge cloud solutions. We partner with healthcare organizations to improve clinical and financial outcomes by building modern technology on an open, connected ecosystem that drives meaningful insights for our customers and their patients. We take pride in our values-driven culture, offering a flexible work-life balance and fostering an environment of innovation. As a testament to our industry leadership and rapid growth, we were acquired by Bain Capital for $17B in 2021, and we continue to launch new strategic product initiatives to push the boundaries of healthcare technology.
We are headquartered in Boston, US, and our India offices are in Chennai, Bangalore and Pune.
Position Summary: We are looking for a Site Reliability Engineering (SRE) Manager to lead our Cloud Infrastructure Engineering team in Chennai R&D. This team ensures the continuous availability of the technologies and systems that power athenahealth's services. We manage thousands of servers and petabytes of storage and process thousands of web requests per second, all while supporting rapid growth. Our goal is to create a seamless operating system for the medical office, abstracting away administrative complexity so doctors can focus on patient care.
About the Team: We are a team of passionate Site Reliability Engineers focused on automation, reliability, and scalability. We operate within an agile framework, prioritizing impactful projects that support business needs.
We manage a hybrid cloud platform, making data-driven decisions on the best infrastructure solutions. Automation is at the heart of everything we do—eliminating repetitive tasks so we can focus on projects that drive real innovation.
Key Responsibilities:
Team Leadership & Development
• Lead, mentor, and develop a team of SREs, fostering a culture of collaboration, accountability, and continuous learning.
• Build a high-performing team focused on operational excellence, reliability, and scalability.
• Partner with Engineering, Product, and Project Management teams to align priorities and drive cross-functional collaboration.
Service Reliability & Performance
• Define and track Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs) for critical systems.
• Monitor and enhance the reliability, availability, and performance of all production services and infrastructure.
• Drive improvements in incident management, root cause analysis, and postmortem processes.
• Implement proactive monitoring, alerting, and incident response strategies.
System Automation & Scalability
• Lead automation efforts to eliminate manual tasks, improve system reliability, and streamline operations.
• Implement best practices for system design, capacity planning, and cost optimization.
• Work closely with engineering teams to build scalable, resilient, and efficient systems.
Collaboration & Cross-functional Engagement
• Advocate for reliability best practices across engineering and product teams.
• Ensure reliability is embedded in the development lifecycle by reviewing code, design, and deployment strategies.
• Align with other engineering managers on long-term goals, technical debt, and infrastructure investments.
Process & Efficiency Improvement
• Continuously improve incident management, deployment pipelines, and system observability.
• Champion automation, monitoring, alerting, and reporting tools.
• Use data-driven insights to measure and optimize operational performance.
Preferred Qualifications:
• Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
• 10+ years of experience in building, scaling, and supporting highly available systems and services.
• 2-3 years of experience managing and mentoring technical teams.
• Expertise in containerization (Docker, Kubernetes), both on-prem and in the cloud.
• Strong background in Platform Engineering, TechOps, FinOps, and DevSecOps in a hybrid cloud environment.
• Expertise in Infrastructure-as-Code (Terraform, Crossplane, Puppet, Ansible) and API integration.
• Proficiency in at least one scripting or programming language (Python, Go, Ruby, etc.).
• Hands-on experience with Linux systems, VMware, cloud platforms (AWS), and observability tools (Prometheus, Grafana, ELK, CloudWatch, Splunk).
• Strong understanding of site reliability principles, telemetry, and monitoring best practices.
• Experience with large-scale distributed systems and cloud-native architectures.
• Familiarity with configuration management tools (Ansible, Chef, Puppet).
• Solid grasp of security best practices and compliance standards.