2 changes: 1 addition & 1 deletion Data/dezoomcamp/2025/data.csv
@@ -200,7 +200,7 @@ https://github.com/FeloXbit/Ethereum-Block-Analytics.git,Ethereum Network Perfor
https://github.com/brukeg/lewis-hamilton-brilliance,Formula 1 Career Data Warehouse,Batch,"The repository uses Kestra as a workflow orchestrator with scheduled ETL jobs, dbt for transformations, and Terraform for infrastructure provisioning. The code shows a batch-oriented pipeline with ingestion scripts, scheduled transformations, and Looker dashboards for visualization. While Kestra can handle streaming, the configuration and structure indicate batch processing with scheduled data pulls and transformations.",GCP
https://github.com/dmytrovoytko/stock-market-data-engineering,Unknown,Unknown,No files fetched,Unknown
https://github.com/Juwon-Ogunseye/bitcoin-etl-pipeline,WBTC Blockchain Analytics Pipeline,Batch,"The repository uses Apache Airflow with DAGs (etl_dags.py, test_dag.py) to orchestrate scheduled ETL jobs. The pipeline runs daily with tasks for data extraction, loading to ClickHouse, and running dbt models. This is a classic batch processing architecture using workflow orchestrators.",AWS
-https://github.com/SapientSapiens/capstoneproject-2025-dez,NYC Taxi Analytics Pipeline,Batch,"The project uses Kestra workflow orchestrator with scheduled flows (hourly_air_quality, daily_air_quality) to run periodic ETL jobs. The code shows scheduled data fetching from APIs, loading to GCS/BigQuery, and dbt transformations - all characteristic of batch processing rather than continuous streaming.",GCP
+https://github.com/SapientSapiens/capstoneproject-2025-dez,Air Quality Analysis Data Pipeline,Batch,"The project uses Kestra workflow orchestrator with scheduled flows (hourly_air_quality, daily_air_quality) to run periodic ETL jobs. The code shows scheduled data fetching from APIs, loading to GCS/BigQuery, and dbt transformations - all characteristic of batch processing rather than continuous streaming.",GCP
https://github.com/dmitrievdeveloper/de_project/tree/main/air_pollution,Unknown,Unknown,No files fetched,Unknown
https://github.com/3d150n-marc3l0/de-zoomcamp-2025-capstone-baywheels,Bay Wheels Data Pipeline,Batch,"The project uses Kestra workflow orchestrator with scheduled flows (docker-compose.yml, flows/*.yaml) that run ETL jobs periodically (e.g., monthly scheduled data loading). No streaming components like Kafka, Kinesis, or Flink are present.",GCP
https://github.com/hbg108/tfl-data-visualization/tree/main,TfL Footfall Data Transformation Pipeline,Batch,"The project uses Kestra workflow orchestrator with scheduled flows (e.g., 04_station_footfall_scheduled.yaml) to run periodic ETL jobs. Data is pulled from TfL sources, transformed, and loaded to BigQuery. The architecture includes dbt transformations and Looker Studio visualization, all characteristic of batch processing rather than continuous streaming.",GCP