Senior Reliability Engineer (m/f/d)

Permanent
Dubai, United Arab Emirates
10.03.2025

Our client, a rapidly growing company, is seeking a seasoned Platform Reliability Lead to join their dynamic engineering team. This individual will play a pivotal role in ensuring the stability, security, and scalability of their core platform, empowering teams to deliver exceptional customer experiences.

The Role:

The Platform Reliability Lead will be the driving force behind the evolution and optimization of the client's AWS-based infrastructure. The Lead's expertise will be crucial in maturing monitoring, security, and developer tooling, and in fostering a culture of reliability and efficiency. This leader will enable development teams to self-serve and iterate rapidly while maintaining the highest standards of platform performance and security.

Key Responsibilities:

  • Core Platform Leadership: The Lead will oversee the EKS and Aurora MySQL infrastructure, driving automation for releases, scaling, and remediation. They will implement Infrastructure as Code and champion designs aligned with reliability objectives.
  • Monitoring & Alerting Excellence: This role will enhance monitoring solutions, improve dashboarding, and identify instrumentation gaps. The Lead will drive the transition towards customer experience monitoring and SLO management.
  • Security & Auditability: The Lead will maintain and evolve security practices, integrate security tooling into the development lifecycle, and collaborate with security professionals to enhance the security posture.
  • Developer Tooling Enhancement: This individual will optimize release and incident management, and expand focus to broader developer enablement, including load testing, scalability design, DORA metrics, and developer environments.
  • Coaching & Enablement: They will foster a culture of self-support and simplification, empowering teams to own and advance the reliability agenda.

What They're Looking For:

  • Extensive experience leading the design and operation of Kubernetes solutions on AWS EKS.
  • Proven ability to manage and optimize AWS infrastructure, including defining and implementing best practices.
  • Demonstrated success in maturing monitoring and alerting capabilities.
  • Proactive and collaborative approach to simplifying execution through monitoring and platform improvements.
  • Strong ability to investigate and understand unfamiliar tooling.
  • Solid understanding of software development principles and practices.

Bonus Points:

  • Experience with development, service understanding, and tooling optimization.
  • Proficiency in delivering infrastructure as code.
  • Experience enhancing release pipelines and deployment patterns.
  • Significant experience in a 24x7 environment, including incident management.
  • Ability to identify and utilize data for effective issue diagnosis.
  • Experience in early-stage scale-up companies.
  • Passion for automation and monitoring.
  • Strong mentorship and knowledge-sharing skills.
  • Ability to balance risk reduction and development velocity.
