leela56/f1-databricks

Formula 1 Data Engineering Pipeline

End-to-end data engineering project that ingests, transforms, and analyzes Formula 1 race data using Databricks and PySpark.

What It Covers

  • Ingests raw F1 race data (circuits, drivers, lap times, results, pit stops) from the Ergast API
  • Applies Bronze → Silver → Gold transformations using the Databricks Medallion Architecture
  • Stores processed data in Delta Lake tables for reliable, versioned access
  • Produces race analytics: driver standings, constructor rankings, lap performance
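The Bronze → Silver → Gold flow described above can be sketched as follows. The notebooks presumably express this with PySpark DataFrames and Delta tables; this sketch mirrors only the logic in plain Python so it runs without a cluster. It assumes the raw payload follows the Ergast JSON layout (`MRData` → `RaceTable` → `Races` → `Results`, with numbers delivered as strings). The sample payload uses dummy IDs and values, and the names `BRONZE`, `to_silver`, and `to_gold_driver_standings` are hypothetical, not taken from the repository.

```python
from collections import defaultdict
from typing import Any

# Bronze: the raw API response, stored unmodified.
# Hypothetical sample in the Ergast JSON layout; dummy IDs and values.
BRONZE: dict[str, Any] = {"MRData": {"RaceTable": {"Races": [
    {"season": "2021", "round": "1", "Results": [
        {"position": "1", "points": "25",
         "Driver": {"driverId": "driver_a"}, "Constructor": {"constructorId": "team_x"}},
        {"position": "2", "points": "18",
         "Driver": {"driverId": "driver_b"}, "Constructor": {"constructorId": "team_y"}},
    ]},
    {"season": "2021", "round": "2", "Results": [
        {"position": "1", "points": "25",
         "Driver": {"driverId": "driver_b"}, "Constructor": {"constructorId": "team_y"}},
        {"position": "2", "points": "18",
         "Driver": {"driverId": "driver_c"}, "Constructor": {"constructorId": "team_z"}},
        {"position": "3", "points": "15",
         "Driver": {"driverId": "driver_a"}, "Constructor": {"constructorId": "team_x"}},
        # The same row ingested twice; Silver must not double-count it.
        {"position": "2", "points": "18",
         "Driver": {"driverId": "driver_c"}, "Constructor": {"constructorId": "team_z"}},
    ]},
]}}}


def to_silver(bronze: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten nested results into typed, snake_case rows and deduplicate them."""
    rows: dict[tuple[int, int, str], dict[str, Any]] = {}
    for race in bronze["MRData"]["RaceTable"]["Races"]:
        for res in race["Results"]:
            row = {
                "season": int(race["season"]),      # Ergast sends numbers as strings
                "round": int(race["round"]),
                "driver_id": res["Driver"]["driverId"],
                "constructor_id": res["Constructor"]["constructorId"],
                "position": int(res["position"]),
                "points": float(res["points"]),
            }
            # A later row with the same key overwrites an earlier one,
            # similar in spirit to an upsert into the Silver table.
            rows[(row["season"], row["round"], row["driver_id"])] = row
    return list(rows.values())


def to_gold_driver_standings(silver: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Aggregate points per driver per season and rank them (ties get distinct ranks)."""
    totals: dict[tuple[int, str], float] = defaultdict(float)
    for r in silver:
        totals[(r["season"], r["driver_id"])] += r["points"]
    # Sort by season, then points descending, then driver_id for a stable order.
    ordered = sorted(totals.items(), key=lambda kv: (kv[0][0], -kv[1], kv[0][1]))
    rank_in_season: dict[int, int] = defaultdict(int)
    standings = []
    for (season, driver_id), points in ordered:
        rank_in_season[season] += 1
        standings.append({"season": season, "rank": rank_in_season[season],
                          "driver_id": driver_id, "points": points})
    return standings


if __name__ == "__main__":
    for row in to_gold_driver_standings(to_silver(BRONZE)):
        print(row)
```

Keeping Bronze untouched means Silver and Gold can be rebuilt from it at any time. Deduplication and type casting happen once, in Silver, so every Gold table (driver standings, constructor rankings, lap performance) can start from clean rows.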

Tech Stack

  • Platform: Databricks
  • Processing: PySpark, Spark SQL
  • Storage: Delta Lake
  • Language: Python
  • Source: Ergast F1 Developer API

Setup

  1. Import the notebooks into a Databricks workspace
  2. Attach them to a cluster running Databricks Runtime 10.4 or later
  3. Run the ingestion notebooks first, then the transformation notebooks, then the analysis notebooks
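The run order in step 3 can be automated from a driver notebook. This is a minimal sketch, assuming hypothetical notebook paths (`./ingestion`, `./transformation`, `./analysis`) that may not match the repository's layout. Inside Databricks you would pass `dbutils.notebook.run` as the runner; it takes a path, a timeout in seconds, and a dict of string arguments, and returns the value the child notebook passes to `dbutils.notebook.exit`. The runner is injected so the ordering logic can also be exercised outside Databricks.

```python
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical notebook paths, listed in the order the Setup steps require.
PIPELINE: Sequence[str] = ("./ingestion", "./transformation", "./analysis")

# Same call shape as dbutils.notebook.run(path, timeout_seconds, arguments).
Runner = Callable[[str, int, Mapping[str, str]], str]


def run_pipeline(run_notebook: Runner,
                 notebooks: Sequence[str] = PIPELINE,
                 timeout_seconds: int = 3600,
                 params: Optional[Mapping[str, str]] = None) -> list[str]:
    """Run each notebook in order and collect their exit values.

    Any exception raised by the runner (dbutils.notebook.run raises when a
    child notebook fails or times out) propagates immediately, so later
    stages never run on incomplete upstream data.
    """
    args = dict(params or {})
    results = []
    for path in notebooks:
        results.append(run_notebook(path, timeout_seconds, args))
    return results


# Inside a Databricks notebook (the "season" argument is hypothetical):
#   run_pipeline(dbutils.notebook.run, params={"season": "2021"})
```

Stopping at the first failure is deliberate: Silver and Gold tables built on a partially ingested Bronze layer would produce misleading standings.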

About

End-to-end Formula 1 data engineering pipeline built on Databricks and PySpark with Delta Lake