Duration:
4 months to start
Job Description:
This position is focused on building efficient and robust data ingestion solutions within a cloud-based environment, specifically leveraging Azure Databricks. The role entails:
- Designing and implementing data ingestion pipelines:
You will be responsible for architecting and developing data pipelines that efficiently bring in data from various sources (such as databases, flat files, APIs, or streaming data) into the organization’s Azure Databricks platform. These pipelines are expected to be highly performant, scalable, and reliable, meeting the demands of large-scale data processing.
- Developing scalable and reusable frameworks:
The position requires you to create frameworks and tools that enable the easy ingestion of different types of data (structured, semi-structured, unstructured) and formats (CSV, JSON, Parquet, etc.). These frameworks should promote code reuse, maintainability, and adaptability to new data sources, reducing repetitive development effort and ensuring consistency across data ingestion processes.
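For illustration only (not part of the original posting): a minimal PySpark sketch of the kind of reusable ingestion helper this role describes, handling multiple file formats and landing data in a Delta table on Azure Databricks. All paths, table names, and options shown are hypothetical assumptions; the actual frameworks and storage layout would be defined within the team's environment.

```python
# Minimal sketch of a reusable ingestion helper on Azure Databricks.
# All paths, table names, and options below are hypothetical examples.
from typing import Optional
from pyspark.sql import SparkSession, DataFrame

# Databricks notebooks provide a SparkSession automatically; this also works locally.
spark = SparkSession.builder.getOrCreate()

def ingest(source_path: str, fmt: str, target_table: str,
           options: Optional[dict] = None) -> DataFrame:
    """Read a CSV/JSON/Parquet source and append it to a Delta table."""
    df = spark.read.format(fmt).options(**(options or {})).load(source_path)
    df.write.format("delta").mode("append").saveAsTable(target_table)
    return df

# Example usage with a hypothetical ADLS path:
# ingest("abfss://raw@examplestorage.dfs.core.windows.net/sales/",
#        "csv", "bronze.sales", {"header": "true", "inferSchema": "true"})
```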
Skills Required:
Necessary:
- Strong expertise in Azure Databricks (8+ years of experience):
You should have in-depth, hands-on experience with Azure Databricks, including building and orchestrating Spark-based data pipelines, managing clusters, and working with notebooks and workflows.
- Apache PySpark:
Proficiency in PySpark is essential for developing distributed data processing solutions within Databricks, enabling large-scale data transformation and ingestion tasks.
- Azure Data Factory:
Familiarity with Azure Data Factory is beneficial, as it is often used for orchestrating and scheduling data workflows across Azure services.
- ADLS (Azure Data Lake Storage) / Key Vaults:
Experience with Azure Data Lake Storage is valuable for handling large volumes of raw and processed data. Knowledge of Azure Key Vaults is advantageous for securely managing secrets and credentials required by data pipelines.
#LI-Remote