Senior Reliability Engineer (m/f/d)
Our client, a rapidly growing company, is seeking a seasoned Platform Reliability Lead to join their dynamic engineering team. This individual will play a pivotal role in ensuring the stability, security, and scalability of their core platform, empowering teams to deliver exceptional customer experiences.
The Role:
The Platform Reliability Lead will be the driving force behind the evolution and optimization of the company's AWS-based infrastructure. The Lead's expertise will be crucial in maturing monitoring, security, and developer tooling, and in fostering a culture of reliability and efficiency. This leader will enable development teams to self-serve and iterate rapidly while maintaining the highest standards of platform performance and security.
Key Responsibilities:
- Core Platform Leadership: The Lead will oversee the EKS and Aurora MySQL infrastructure, driving automation for releases, scaling, and remediation. They will implement Infrastructure as Code and champion designs aligned with reliability objectives.
- Monitoring & Alerting Excellence: The Lead will enhance monitoring solutions, improve dashboarding, and identify instrumentation gaps. They will lead the transition towards customer experience monitoring and SLO management.
- Security & Auditability: The Lead will maintain and evolve security practices, integrate security tooling into the development lifecycle, and collaborate with security professionals to enhance the security posture.
- Developer Tooling Enhancement: This individual will optimize release and incident management, and expand focus to broader developer enablement, including load testing, scalability design, DORA metrics, and developer environments.
- Coaching & Enablement: They will foster a culture of self-support and simplification, empowering teams to own and advance the reliability agenda.
What They're Looking For:
- Extensive experience leading the design and operation of Kubernetes solutions on AWS EKS.
- Proven ability to manage and optimize AWS infrastructure, including defining and implementing best practices.
- Demonstrated success in maturing monitoring and alerting capabilities.
- Proactive and collaborative approach to simplifying execution through monitoring and platform improvements.
- Strong ability to investigate and understand unfamiliar tooling.
- Solid understanding of software development principles and practices.
Bonus Points:
- Experience with development, service understanding, and tooling optimization.
- Proficiency in delivering infrastructure as code.
- Experience enhancing release pipelines and deployment patterns.
- Significant experience in a 24x7 environment, including incident management.
- Ability to identify and utilize data for effective issue diagnosis.
- Experience in early-stage scale-up companies.
- Passion for automation and monitoring.
- Strong mentorship and knowledge-sharing skills.
- Ability to balance risk reduction and development velocity.