varun1507/data-platform-modernization

Data Platform Modernization

📌 Objective

This repository modernizes legacy SAS-based data pipelines into scalable, cloud-ready data engineering solutions built on PySpark, Snowflake, and Airflow.


🧠 Architecture Overview

Legacy System (SAS)
↓
Data Ingestion
↓
Transformation Layer (PySpark / Snowflake)
↓
Orchestration (Airflow)
↓
Curated Data Layer
↓
Analytics / Reporting


🛠️ Tech Stack

  • PySpark
  • Snowflake
  • Apache Airflow
  • SQL

📁 Projects Included

🔹 SAS to PySpark Transformation

Recreated SAS ETL logic using PySpark

🔹 Snowflake ELT Pipeline

Designed a scalable ELT pipeline using Snowflake
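
In an ELT design, raw data is loaded first and transformed inside the warehouse with SQL. One common building block is a `MERGE` that upserts staged rows into a curated table. The sketch below only renders such a statement; the helper `build_merge_sql` and the table and column names are hypothetical, and running the SQL would need a Snowflake connection that is not shown here.

```python
from typing import Sequence


def build_merge_sql(target: str, staging: str,
                    keys: Sequence[str], columns: Sequence[str]) -> str:
    """Render a MERGE statement that upserts `staging` rows into `target`.

    `keys` identify a row; `columns` are the non-key columns to refresh.
    Identifiers are interpolated as-is, so pass only trusted names.
    """
    if not keys or not columns:
        raise ValueError("keys and columns must both be non-empty")

    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in columns)
    all_cols = list(keys) + list(columns)
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"s.{c}" for c in all_cols)

    return (
        f"MERGE INTO {target} t\n"
        f"USING {staging} s\n"
        f"  ON {on_clause}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


if __name__ == "__main__":
    print(build_merge_sql(
        target="curated.customers",      # hypothetical curated table
        staging="staging.customers",     # hypothetical staging table
        keys=["customer_id"],
        columns=["name", "email"],
    ))
```

Generating the statement from a column list keeps the update and insert branches in sync when the schema changes.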

🔹 Airflow Orchestration

Implemented DAGs for pipeline scheduling and monitoring


🎯 Key Highlights

  • Legacy ETL modernization
  • Scalable pipeline design
  • Performance optimization techniques
  • Data validation strategies
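
The README does not say which validation checks the project uses. One strategy that is common in migrations of this kind is reconciling the legacy output against the new pipeline's output, key by key. The sketch below is an assumption along those lines; the function `reconcile` and the sample field names are hypothetical, and it assumes each key appears at most once per extract.

```python
from math import isclose
from typing import Any, Iterable, Mapping


def reconcile(legacy: Iterable[Mapping[str, Any]],
              modern: Iterable[Mapping[str, Any]],
              key: str, measure: str, rel_tol: float = 1e-9) -> dict:
    """Compare two extracts by key and report where they disagree."""
    legacy_by_key = {row[key]: row[measure] for row in legacy}
    modern_by_key = {row[key]: row[measure] for row in modern}
    shared = legacy_by_key.keys() & modern_by_key.keys()

    return {
        # Keys the legacy job produced but the new pipeline did not.
        "missing_in_modern": sorted(legacy_by_key.keys() - modern_by_key.keys()),
        # Keys that only the new pipeline produced.
        "unexpected_in_modern": sorted(modern_by_key.keys() - legacy_by_key.keys()),
        # Shared keys whose measure differs beyond the relative tolerance.
        "value_mismatches": sorted(
            k for k in shared
            if not isclose(legacy_by_key[k], modern_by_key[k], rel_tol=rel_tol)
        ),
    }


if __name__ == "__main__":
    legacy_rows = [{"region": "EAST", "total": 140.0}, {"region": "WEST", "total": 80.0}]
    modern_rows = [{"region": "EAST", "total": 140.5}, {"region": "NORTH", "total": 10.0}]
    print(reconcile(legacy_rows, modern_rows, key="region", measure="total"))
```

A report with three empty lists indicates that the two extracts agree on both the set of keys and the measured values.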

🚀 Future Enhancements

  • Add streaming pipelines
  • Implement data quality framework
  • Integrate Delta Lake
