Data Engineer - Gen AI (m/f/d)
Contract
Amman, Egypt
24.01.2025
We are seeking an inventive and forward-thinking Data Engineer to join our innovative team. In this role, you will not just follow the traditional paths of data engineering; instead, you'll break new ground by bringing a fresh, creative perspective to every project. Your self-motivation and ability to think differently will be key as you design and implement smart data solutions that go beyond the ordinary.
Our Tech Stack:
- Languages: SQL & Python
- Pipeline orchestration tool: Dagster (Legacy: Airflow)
- Data stores: Snowflake, ClickHouse
- Platforms & Services: Docker, Kubernetes
- PaaS: AWS (ECS/EKS, DMS, Kinesis, Glue, Bedrock, Athena, S3 and others.)
- ETL: Fivetran & dbt for transformation
- IaC: Terraform (with Terragrunt)
Key Responsibilities:
- Design and Implement Innovative Data Solutions: Develop and maintain advanced ETL pipelines using SQL, Python, and Generative AI, transforming traditional data processes into highly efficient and automated solutions.
- Orchestrate Complex Data Workflows: Utilize tools such as Dagster and Airflow for sophisticated pipeline orchestration, ensuring seamless integration and automation of data processes.
- Leverage Generative AI for Data Solutions: Create and implement smart data solutions using Generative AI techniques like Retrieval-Augmented Generation (RAG). This includes building solutions that retrieve and integrate external data sources with LLMs to provide accurate and contextually enriched responses.
- Employ Prompt Engineering: Develop and refine prompt engineering techniques to effectively communicate with large language models (LLMs), enhancing the accuracy and relevance of generated responses in various applications.
- Utilise Embeddings and Vector Databases: Apply embedding language models to convert data into numerical representations, storing them in vector databases. Perform relevancy searches using these embeddings to match user queries with the most relevant data.
- Incorporate Semantic Search Techniques: Implement semantic search to enhance the accuracy and relevance of search results, ensuring that data retrieval processes are highly optimised and contextually aware.
- Collaborate Across Teams: Work closely with cross-functional teams, including data science and business analytics, to understand and deliver on unique and evolving data requirements.
- Ensure High-Quality Data Flow: Leverage stream, batch, and Change Data Capture (CDC) processes to ensure a consistent and reliable flow of high-quality data across all systems.
- Enable Business User Empowerment: Use data transformation tools like dbt to prepare and curate datasets, empowering business users to perform self-service analytics.
- Maintain Data Quality and Consistency: Implement rigorous standards to ensure data quality and consistency across all data stores, continuously innovating to improve data reliability.
- Monitor and Enhance Pipeline Performance: Regularly monitor data pipelines to identify and resolve performance and reliability issues, using innovative approaches to keep systems running optimally.
Requirements:
- 3+ years of experience as a data engineer.
- Proficiency in SQL and Python.
- Experience with modern cloud data warehousing and data lake solutions such as Snowflake, BigQuery, Redshift, and Azure Synapse.
- Expertise in ETL/ELT processes, and experience building and managing batch and streaming data processing pipelines.
- Strong ability to investigate and troubleshoot data issues, providing both short-term fixes and long-term solutions.
- Experience with Generative AI, including Retrieval-Augmented Generation (RAG), prompt engineering, and embedding techniques for creating and managing vector databases.
- Knowledge of AWS services, including DMS, Glue, Bedrock, SageMaker, and Athena.
- Familiarity with dbt or other data transformation tools.
- Familiarity with AWS Bedrock Agents and experience in fine-tuning models for specific use cases, enhancing the performance of AI-driven applications.
- Proficiency in implementing semantic search to enhance the accuracy and relevance of data retrieval.