We are looking for a motivated Site Reliability Engineer (SRE) to join our team. As a reliability engineer, you will keep live applications and services healthy and dependable in production.
- Maintain applications and services once they are live by measuring and monitoring availability, latency, and overall system health.
- Run containerized applications in production (Kubernetes, Docker, etc.)
- Drive root cause analysis exercises for issues
- Minimum 6 years of work experience, with the most recent 2 years spent in hands-on management of live production servers
- 5+ years of experience coding in Python
- 3+ years of experience with Ansible
- Expertise in cloud platforms such as AWS and GCP
- Solid experience with Linux system administration
- Able to develop system integration test strategies
- Comfortable doing day-to-day development in a Linux environment
- Solid experience with CI/CD platforms
- Familiar with traditional Git workflows for working in a collaborative environment
- Familiar with secrets management
- Experience running containerized applications in production (Kubernetes, Docker, etc.)
- Ability to identify system bottlenecks and recommend solutions to availability issues
- Proven expertise in system-level debugging
- Working experience building massively scalable, high-performance applications and web services
- Experience with infrastructure automation, infrastructure as code, and automated application deployment
- Strong Linux systems knowledge
- Strong understanding of distributed systems
- Ability to drive root cause analysis exercises for issues
To apply, send your profile to email@example.com with the subject line "SRE Site Reliability Engineer".