
Airflow Simple ETL Pipeline

A simple yet complete ETL (Extract, Transform, Load) data pipeline built with Apache Airflow that extracts weather data from an external API, transforms it, and loads it into a PostgreSQL database.

📋 Project Overview

This project demonstrates a real-world ETL pipeline implementation using Apache Airflow and the Astronomer CLI. The pipeline fetches current weather data from the Open-Meteo API for a specific location (London, UK by default) and stores it in a PostgreSQL database for analysis.

🚀 Features

  • Automated Data Collection: Daily scheduled extraction of weather data
  • RESTful API Integration: Uses Airflow HTTP provider to fetch data from Open-Meteo API
  • Data Transformation: Processes and structures raw API responses
  • PostgreSQL Storage: Persists transformed data with timestamps
  • TaskFlow API: Modern Airflow implementation using Python decorators
  • Docker-based Deployment: Containerized environment for easy setup and deployment

🏗️ Architecture

The pipeline consists of three main tasks:

  1. Extract: Fetches current weather data from Open-Meteo API using HTTP Hook
  2. Transform: Processes and structures the weather data (temperature, wind speed, direction, weather code)
  3. Load: Inserts the transformed data into PostgreSQL database

📁 Project Structure

airflow-simple-pipeline/
├── dags/
│   ├── etlweather.py          # Main weather ETL DAG
│   └── exampledag.py           # Example DAG for reference
├── tests/
│   └── dags/
│       └── test_dag_example.py # Unit tests for DAGs
├── include/                    # Additional project files
├── plugins/                    # Custom Airflow plugins
├── Dockerfile                  # Docker image configuration
├── docker-compose.yml          # Docker compose setup
├── requirements.txt            # Python dependencies
├── packages.txt                # OS-level dependencies
├── airflow_settings.yaml       # Airflow connections and variables
└── README.md                   # This file

🔧 Prerequisites

  • Docker Desktop installed and running
  • Astronomer CLI (astro) installed
  • Basic understanding of Apache Airflow concepts

🛠️ Installation & Setup

  1. Clone the repository

    git clone https://github.com/codezelaca/airflow-simple-pipeline.git
    cd airflow-simple-pipeline
  2. Start the Airflow environment

    astro dev start

    This command spins up five Docker containers:

    • Postgres: Airflow's metadata database
    • Scheduler: Monitors and triggers tasks
    • DAG Processor: Parses DAG files
    • API Server: Serves the Airflow UI and API
    • Triggerer: Handles deferred tasks
  3. Access the Airflow UI

    Once the containers are up, open http://localhost:8080 in your browser (the Astronomer CLI's default port). Unless configured otherwise, the default credentials are admin / admin.

  4. Configure Airflow Connections

    In the Airflow UI, add the following connections:

    • HTTP Connection (for Open-Meteo API)

      • Conn ID: open_meteo_api
      • Conn Type: HTTP
      • Host: https://api.open-meteo.com
    • PostgreSQL Connection (database connection)

      • Conn ID: postgres_default
      • Conn Type: Postgres
      • Host: postgres
      • Schema: postgres
      • Login: postgres
      • Password: postgres
      • Port: 5432
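As an alternative to clicking through the UI, Airflow can also resolve connections from environment variables named `AIRFLOW_CONN_<CONN_ID>` (uppercased) that hold a connection URI. A sketch building that URI for the PostgreSQL connection above, using only the values just listed:

```python
# Airflow resolves a connection from an environment variable named
# AIRFLOW_CONN_<CONN_ID> containing a connection URI. Build the URI for
# postgres_default from the same settings configured in the UI above.
import os

conn_id = 'postgres_default'
uri = 'postgres://{login}:{password}@{host}:{port}/{schema}'.format(
    login='postgres', password='postgres', host='postgres',
    port=5432, schema='postgres')

env_var = f'AIRFLOW_CONN_{conn_id.upper()}'
os.environ[env_var] = uri
print(env_var, '=', uri)
```

For a containerized setup like this one, such variables can live in a `.env` file, which the Astronomer CLI loads into the containers on `astro dev start`.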

📊 DAG Details

Weather ETL Pipeline (weather_etl_pipeline)

  • Schedule: Daily (@daily)
  • Start Date: January 24, 2026
  • Catchup: Disabled
  • Location: London, UK (Latitude: 51.5074, Longitude: -0.1278)

Data Collected:

  • Temperature (°C)
  • Wind speed (km/h)
  • Wind direction (degrees)
  • Weather code
  • Timestamp

🧪 Testing

Run DAG tests using pytest:

astro dev pytest

📝 Configuration

Customize the pipeline by modifying variables in dags/etlweather.py:

LATITUDE = '51.5074'   # Change to your desired latitude
LONGITUDE = '-0.1278'  # Change to your desired longitude
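These two values end up as query parameters on the Open-Meteo request. A sketch of the URL the Extract task effectively calls (the `/v1/forecast` path and `current_weather=true` flag are from Open-Meteo's public forecast API):

```python
# Build the Open-Meteo request URL from the configurable coordinates.
# /v1/forecast with current_weather=true returns exactly the fields the
# pipeline stores: temperature, windspeed, winddirection, weathercode.
LATITUDE = '51.5074'
LONGITUDE = '-0.1278'
API_HOST = 'https://api.open-meteo.com'

endpoint = (f'/v1/forecast?latitude={LATITUDE}'
            f'&longitude={LONGITUDE}&current_weather=true')
url = API_HOST + endpoint
print(url)
```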

🗄️ Database Schema

The pipeline creates a weather_data table with the following schema:

CREATE TABLE weather_data (
    latitude FLOAT,
    longitude FLOAT,
    temperature FLOAT,
    windspeed FLOAT,
    winddirection FLOAT,
    weathercode INT,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
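The Load step creates this table if it does not exist and inserts one row per run. A self-contained sketch of that logic, using Python's stdlib `sqlite3` as a stand-in for PostgreSQL so it runs anywhere (in the real DAG this goes through Airflow's Postgres provider instead):

```python
# Sketch of the Load step with sqlite3 standing in for PostgreSQL;
# the DDL mirrors the weather_data schema above.
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS weather_data (
        latitude FLOAT,
        longitude FLOAT,
        temperature FLOAT,
        windspeed FLOAT,
        winddirection FLOAT,
        weathercode INT,
        timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")
# One transformed row, shaped like the Transform step's output.
row = (51.5074, -0.1278, 11.2, 14.8, 230.0, 3)
cur.execute("""
    INSERT INTO weather_data
        (latitude, longitude, temperature, windspeed, winddirection, weathercode)
    VALUES (?, ?, ?, ?, ?, ?)
""", row)
conn.commit()
count = cur.execute('SELECT COUNT(*) FROM weather_data').fetchone()[0]
print(count)  # 1
```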

🔍 Monitoring

Monitor your DAGs in the Airflow UI:

  • View task execution history
  • Check logs for debugging
  • Monitor success/failure rates
  • Set up alerts for task failures

🛑 Stopping the Environment

astro dev stop

To stop and remove all containers:

astro dev kill

📚 Additional Resources

🤝 Contributing

Contributions are welcome! Feel free to open issues or submit pull requests.

📄 License

This project is open source and available under the MIT License.

👤 Author

codezelaca


Happy Data Engineering! 🚀
