Design, develop, and maintain scalable ETL pipelines in Python, covering both batch and streaming data architectures.
Work with big data technologies such as Spark/PySpark, Hadoop, Kafka, and Airflow to ingest and process large datasets efficiently (a minimal batch-pipeline sketch follows this list).
Model, implement, and optimize relational (SQL) and NoSQL databases, ensuring data quality, integrity, and performance (see the upsert-and-check sketch after the list).
Build robust APIs and modular data-processing components that integrate cleanly with data warehousing ecosystems (see the API sketch after the list).
Monitor and troubleshoot ETL workflows to ensure high availability, low latency, and reliability (see the retry-and-logging sketch after the list).
Implement automation, observability, and CI/CD pipelines for smooth, Git-driven deployments and version control.
Collaborate closely with data analysts, data scientists, and product teams to understand data needs and deliver end-to-end solutions.
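A minimal sketch of the kind of batch pipeline described above, assuming a recent Airflow 2.x and PySpark; the DAG id, schedule, and landing/curated paths are placeholders, not real locations:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from pyspark.sql import SparkSession, functions as F


def run_batch_transform():
    # Read raw events from a landing zone, aggregate per day, write to a curated layer.
    spark = SparkSession.builder.appName("daily_events_etl").getOrCreate()
    raw = spark.read.json("/data/landing/events/")            # extract
    daily = (
        raw.withColumn("event_date", F.to_date("event_ts"))   # transform
           .groupBy("event_date", "event_type")
           .count()
    )
    daily.write.mode("overwrite").parquet("/data/curated/daily_event_counts/")  # load
    spark.stop()


with DAG(
    dag_id="daily_events_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    transform = PythonOperator(
        task_id="run_batch_transform",
        python_callable=run_batch_transform,
    )
```

In practice the Spark step would usually be submitted to a cluster rather than run inside the scheduler process; the sketch only shows how extract, transform, and load map onto an orchestrated task.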
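For the database work, a sketch of an idempotent load with a simple quality check, assuming PostgreSQL via psycopg2; the dim_customer table, its unique customer_id key, and the DSN are hypothetical:

```python
import psycopg2

# Placeholder DSN; real connection details would come from config or a secrets store.
DSN = "dbname=analytics user=etl host=localhost"

UPSERT_SQL = """
INSERT INTO dim_customer (customer_id, email, updated_at)
VALUES (%s, %s, now())
ON CONFLICT (customer_id)
DO UPDATE SET email = EXCLUDED.email, updated_at = now();
"""

QUALITY_CHECK_SQL = "SELECT count(*) FROM dim_customer WHERE email IS NULL;"


def load_customers(rows):
    """Idempotently upsert customer rows, then run a simple null-rate check."""
    with psycopg2.connect(DSN) as conn:
        with conn.cursor() as cur:
            cur.executemany(UPSERT_SQL, rows)   # rows: iterable of (customer_id, email)
            cur.execute(QUALITY_CHECK_SQL)
            null_emails = cur.fetchone()[0]
    if null_emails:
        raise ValueError(f"data quality check failed: {null_emails} rows with NULL email")
```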
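For the API side, a sketch of a small read endpoint over curated data, assuming FastAPI and pydantic; fetch_counts is a hypothetical stub standing in for a real warehouse query:

```python
from datetime import date

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel


class DailyEventCount(BaseModel):
    event_date: date
    event_type: str
    count: int


app = FastAPI(title="Curated data API")


def fetch_counts(event_date: date) -> list[DailyEventCount]:
    # Placeholder for a real warehouse query (e.g. via SQLAlchemy against the
    # curated table produced by the batch DAG); returns an empty list here so
    # the sketch stays self-contained.
    return []


@app.get("/daily-event-counts/{event_date}", response_model=list[DailyEventCount])
def get_daily_event_counts(event_date: date):
    rows = fetch_counts(event_date)
    if not rows:
        raise HTTPException(status_code=404, detail="no data for that date")
    return rows
```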
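For monitoring and reliability, a sketch of a retry-and-logging wrapper around an ETL step, using only the standard library; the attempt count and backoff delay are illustrative:

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("etl.monitor")


def run_with_retries(step, *, attempts=3, base_delay=5.0):
    """Run one ETL step callable, retrying with exponential backoff and logging
    each outcome so failures surface in the pipeline's observability stack."""
    for attempt in range(1, attempts + 1):
        try:
            started = time.monotonic()
            result = step()
            log.info("step=%s status=ok duration=%.1fs",
                     step.__name__, time.monotonic() - started)
            return result
        except Exception:
            log.exception("step=%s status=failed attempt=%d/%d",
                          step.__name__, attempt, attempts)
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Because the wrapper is plain Python, it is straightforward to unit-test and ship through a Git-driven CI/CD pipeline alongside the DAG and API code.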