IT Automation & Monitoring Engineer

Contract, 11.10.2024

Our Client

Halian is currently looking, on behalf of an international financial institution, for an IT Automation & Monitoring Engineer with the following profile:

Role Overview:
The IT Automation & Monitoring Engineer will be responsible for integrating advanced automation practices with system monitoring to improve operational efficiency.
This role involves managing and automating tasks across cloud platforms, leveraging monitoring tools like Splunk to track performance, and providing critical incident and problem management.
You will use automation tools to reduce manual interventions and ensure that the IT infrastructure operates smoothly 24/7.
The ideal candidate will have strong experience in cloud environments, automation tools, and monitoring solutions, combined with the ability to respond to incidents effectively and work under pressure.
Familiarity with release management, incident escalation, and the ability to execute predefined processes in collaboration with different teams is crucial.

Key Responsibilities:

Automation and Monitoring:
- Continuously monitor IT infrastructure for equipment failures, performance degradation, or system alerts using tools such as Splunk, Dynatrace, SCOM, and SquaredUp.
- Develop and maintain Splunk dashboards to visualize system health, monitor key performance indicators (KPIs), and analyze infrastructure events.
- Leverage Kubernetes, Docker, and other container technologies, together with Terraform, for infrastructure orchestration and management.
- Automate routine tasks, system checks, and service requests using Azure DevOps, Jenkins, Automic (scheduler), and PowerShell/Python scripting.
- Integrate workflows with cloud platforms such as Azure and AWS to enhance infrastructure scalability and performance.

Incident, Problem, and Release Management:
- Handle incident management, ensuring that incidents are properly logged, escalated, and resolved within Service Level Agreements (SLAs).
- Engage in problem management to identify root causes and apply long-term fixes for recurring issues.
- Coordinate and execute release management tasks and business-critical operations, following predefined processes.
- Make sound decisions in high-pressure situations, ensuring that actions align with business objectives.

Process Optimization and Continuous Improvement:
- Identify inefficiencies in operational processes and implement automation to enhance performance and reduce manual interventions.
- Continuously update and refine operational and technical documentation, contributing to knowledge-sharing within the team.
- Collaborate with cross-functional teams to design and implement new solutions that improve the reliability and speed of service delivery.

Job Skills and Requirements:

Automation Expertise:
- Strong hands-on experience with automation tools such as Azure DevOps, Jenkins, Automic, and script-based automation using PowerShell and Python.
- Proven track record of managing cloud-based infrastructures with Azure, Kubernetes, and Docker for orchestration and container management.

Monitoring Expertise:
- Proficient in using Splunk for monitoring infrastructure, creating custom dashboards, and analyzing real-time system performance.
- Experience with IT monitoring tools such as Splunk, Dynatrace, SCOM, Grafana, SquaredUp, and Aria Operations for tracking and visualizing infrastructure health.

Incident, Release, and Problem Management:
- Knowledge of ITIL best practices for incident and change management, including the ability to manage incidents in a 24/7 operations environment.
- Strong understanding of ITSM tools for workflow management, incident logging, and problem resolution.

Technical and Analytical Skills:
- Proficiency in managing complex IT environments with a focus on automation and process efficiency.
- Ability to diagnose system issues, interpret performance data, and provide practical, timely solutions.
- Experience in handling large-scale data systems, cloud technologies, and automation at scale.

Communication and Collaboration:
- Excellent verbal and written communication skills, with the ability to explain technical concepts to both technical and non-technical stakeholders.
- Comfortable working in a high-pressure environment, interacting with senior management and external clients when necessary.
- Strong organizational skills with the ability to manage multiple tasks simultaneously in a structured manner.

Preferred Qualifications:
- ITIL certification or willingness to obtain one.
- Experience in cloud infrastructure and high-pressure environments with mission-critical responsibilities.
- Demonstrated ability to lead automation initiatives, driving efficiency and reliability in IT operations.
- Stress resistance
- Team player

Halian Group

With over 25 years of experience, we have come to understand that innovation is the only way to provide agile, practical solutions that transform businesses and careers.

Our resourcing and smart services help you to realize tomorrow’s potential. Discover the amazing things possible when you bring the right people and the right technologies together.
