Modernizing legacy SAS-based data pipelines into scalable, cloud-ready data engineering solutions using PySpark, Snowflake, and Apache Airflow.
Legacy System (SAS) → Data Ingestion → Transformation Layer (PySpark / Snowflake) → Orchestration (Airflow) → Curated Data Layer → Analytics / Reporting
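The stage ordering above is a linear dependency chain; a minimal, stdlib-only sketch of that ordering (stage names here are illustrative shorthand, not actual task IDs — in Airflow each edge would become a task dependency like `ingest >> transform`):

```python
from graphlib import TopologicalSorter

# Upstream dependencies per stage, mirroring the flow above.
# Illustrative names only; real DAG tasks would differ.
stages = {
    "ingest": set(),
    "transform": {"ingest"},     # PySpark / Snowflake layer
    "curate": {"transform"},     # curated data layer
    "report": {"curate"},        # analytics / reporting
}

# Resolve the execution order from the dependency graph.
order = list(TopologicalSorter(stages).static_order())
print(order)  # ['ingest', 'transform', 'curate', 'report']
```

The same dependency-graph idea is what Airflow's scheduler evaluates at runtime.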
- PySpark
- Snowflake
- Apache Airflow
- SQL
- Recreated SAS ETL logic in PySpark
- Designed a scalable data pipeline on Snowflake
- Implemented Airflow DAGs for pipeline scheduling and monitoring
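One way to picture the SAS-to-PySpark recreation: a SAS DATA step with conditional logic (e.g. `if revenue > 1000 then tier = 'high'; else tier = 'low';`) maps to a `withColumn` with `when/otherwise` in PySpark. A plain-Python stand-in for that row-level transformation, with the equivalent PySpark expression shown in a comment (column name and threshold are hypothetical, not from the actual pipeline):

```python
# Equivalent PySpark expression (for reference; hypothetical columns):
#   from pyspark.sql import functions as F
#   df = df.withColumn(
#       "tier",
#       F.when(F.col("revenue") > 1000, "high").otherwise("low"),
#   )

def assign_tier(row: dict, threshold: float = 1000.0) -> dict:
    """Plain-Python stand-in for the SAS DATA step / PySpark withColumn above."""
    return {**row, "tier": "high" if row["revenue"] > threshold else "low"}

rows = [{"id": 1, "revenue": 2500.0}, {"id": 2, "revenue": 300.0}]
tiered = [assign_tier(r) for r in rows]
print(tiered[0]["tier"], tiered[1]["tier"])  # high low
```

The point of the mapping is that SAS's implicit row loop becomes an explicit column expression evaluated across the distributed DataFrame.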
- Legacy ETL modernization
- Scalable pipeline design
- Performance optimization techniques
- Data validation strategies
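The validation strategies above can be sketched with two common checks, source-to-target row-count reconciliation and a null-rate measurement; a minimal stdlib sketch (field names and data are made up for illustration):

```python
def reconcile_counts(source_count: int, target_count: int) -> bool:
    """Source-vs-target row-count reconciliation after a load."""
    return source_count == target_count

def null_rate(rows: list, field: str) -> float:
    """Fraction of rows where `field` is missing or None."""
    if not rows:
        return 0.0
    nulls = sum(1 for r in rows if r.get(field) is None)
    return nulls / len(rows)

# Toy data standing in for a loaded batch.
rows = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": None}]
print(reconcile_counts(2, len(rows)))  # True
print(null_rate(rows, "email"))        # 0.5
```

In the real pipeline the same checks would run as post-load tasks in the Airflow DAG, comparing Snowflake table counts against the source extract.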
- Add streaming pipelines
- Implement data quality framework
- Integrate Delta Lake