Live Jobs

Discover and Apply for Jobs

Data Engineer - Gen AI (m/f/d)

Contract
Amman, Jordan
24.01.2025
We are seeking an inventive, forward-thinking Data Engineer to join our team. Rather than simply following the traditional paths of data engineering, you will bring a fresh, creative perspective to every project. Your self-motivation and ability to think differently will be key as you design and implement smart data solutions that go beyond the ordinary.



Our Tech Stack:
  • Languages: SQL & Python
  • Pipeline orchestration: Dagster (legacy: Airflow)
  • Data stores: Snowflake, ClickHouse
  • Platforms & Services: Docker, Kubernetes
  • PaaS: AWS (ECS/EKS, DMS, Kinesis, Glue, Bedrock, Athena, S3, and others)
  • ETL: Fivetran, with dbt for transformation
  • IaC: Terraform (with Terragrunt)

Key Responsibilities:
  • Design and Implement Innovative Data Solutions: Develop and maintain advanced ETL pipelines using SQL, Python, and Generative AI, transforming traditional data processes into highly efficient and automated solutions.
  • Orchestrate Complex Data Workflows: Use tools such as Dagster and Airflow to orchestrate sophisticated pipelines, ensuring seamless integration and automation of data processes.
  • Leverage Generative AI for Data Solutions: Build smart data solutions using Generative AI techniques such as Retrieval-Augmented Generation (RAG). RAG retrieves relevant data from external sources and supplies it to large language models (LLMs), so they can return accurate, context-rich responses.
  • Employ Prompt Engineering: Design and refine prompts for LLMs to improve the accuracy and relevance of generated responses across applications.
  • Utilise Embeddings and Vector Databases: Use embedding models to convert data into numerical vector representations and store them in vector databases. Then run similarity searches over these embeddings to match user queries with the most relevant data.
  • Incorporate Semantic Search Techniques: Implement semantic search, which matches queries to content by meaning rather than by exact keywords. This improves the accuracy and relevance of search results and makes data retrieval context-aware.
  • Collaborate Across Teams: Work closely with cross-functional teams, including data science and business analytics, to understand and deliver on unique and evolving data requirements.
  • Ensure High-Quality Data Flow: Leverage stream, batch, and Change Data Capture (CDC) processes to ensure a consistent and reliable flow of high-quality data across all systems.
  • Enable Business User Empowerment: Use data transformation tools such as dbt to prepare and curate datasets, empowering business users to perform self-service analytics.
  • Maintain Data Quality and Consistency: Implement rigorous standards to ensure data quality and consistency across all data stores, continuously innovating to improve data reliability.
  • Monitor and Enhance Pipeline Performance: Regularly monitor data pipelines to identify and resolve performance and reliability issues, using innovative approaches to keep systems running optimally.
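The retrieval side of the RAG, embedding, and semantic-search responsibilities above can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the team's implementation. `embed` is a hypothetical bag-of-words stand-in for a real embedding model. Because it only captures word overlap, genuine semantic matching would need a learned embedding model. `InMemoryVectorStore` is a hypothetical stand-in for a vector database. `build_rag_prompt` shows one possible way to place retrieved context into an LLM prompt.

```python
import math
from collections import Counter


# Hypothetical stand-in for an embedding model: a bag-of-words term-count vector.
# A real pipeline would call a learned embedding model instead.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical toy vector store: keeps (document, embedding) pairs in a list."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, doc: str) -> None:
        # Embed once at write time, as a vector database would.
        self._items.append((doc, embed(doc)))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Relevancy search: rank stored documents by similarity to the query.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_rag_prompt(question: str, store: InMemoryVectorStore, k: int = 2) -> str:
    # RAG step: retrieve the top-k documents and place them in the prompt as context.
    context = "\n".join(f"- {doc}" for doc in store.search(question, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add("snowflake stores the curated sales tables")
    store.add("dagster schedules the nightly ingestion jobs")
    store.add("kinesis streams clickstream events")
    print(build_rag_prompt("which tool schedules jobs", store, k=1))
```

Embedding at write time and comparing vectors at query time mirrors how a vector database separates indexing from search. Swapping `embed` for a real model call and the list for a proper vector store would keep the same overall shape.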

Essential Experience:
  • 3+ years of experience as a data engineer.
  • Proficiency in SQL and Python.
  • Experience with modern cloud data warehousing and data lake solutions such as Snowflake, BigQuery, Redshift, and Azure Synapse.
  • Expertise in ETL/ELT processes and experience building and managing batch and streaming data processing pipelines.
  • Strong ability to investigate and troubleshoot data issues, providing both short-term fixes and long-term solutions.
  • Experience with Generative AI, including Retrieval-Augmented Generation (RAG), prompt engineering, and embedding techniques for creating and managing vector databases.
  • Knowledge of AWS services, including DMS, Glue, Bedrock, SageMaker, and Athena.
  • Familiarity with dbt or other data transformation tools.

Other Desired Experience:
  • Familiarity with AWS Bedrock Agents, and experience fine-tuning models for specific use cases to improve the performance of AI-driven applications.
  • Proficiency in implementing semantic search to enhance the accuracy and relevance of data retrieval.