A simple yet complete ETL (Extract, Transform, Load) data pipeline built with Apache Airflow that extracts weather data from an external API, transforms it, and loads it into a PostgreSQL database.
This project demonstrates a real-world ETL pipeline implementation using Apache Airflow and the Astronomer CLI. The pipeline fetches current weather data from the Open-Meteo API for a specific location (London, UK by default) and stores it in a PostgreSQL database for analysis.
- Automated Data Collection: Daily scheduled extraction of weather data
- RESTful API Integration: Uses Airflow HTTP provider to fetch data from Open-Meteo API
- Data Transformation: Processes and structures raw API responses
- PostgreSQL Storage: Persists transformed data with timestamps
- TaskFlow API: Modern Airflow implementation using Python decorators
- Docker-based Deployment: Containerized environment for easy setup and deployment
The pipeline consists of three main tasks:
- Extract: Fetches current weather data from Open-Meteo API using HTTP Hook
- Transform: Processes and structures the weather data (temperature, wind speed, direction, weather code)
- Load: Inserts the transformed data into PostgreSQL database
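Stripped of the Airflow `@task` decorators so it runs standalone, the three-stage flow can be sketched roughly like this (function and field names are illustrative, not the project's actual code in `dags/etlweather.py`):

```python
# Minimal standalone sketch of the extract -> transform -> load flow.
# In the real DAG each function would be @task-decorated; here the
# extract step returns a canned Open-Meteo-style payload so the sketch
# runs without network access.

def extract() -> dict:
    """Stand-in for the HTTP call to the Open-Meteo API."""
    return {
        "latitude": 51.5074,
        "longitude": -0.1278,
        "current_weather": {
            "temperature": 12.3,     # °C
            "windspeed": 18.5,       # km/h
            "winddirection": 240.0,  # degrees
            "weathercode": 3,        # WMO weather code
        },
    }

def transform(payload: dict) -> dict:
    """Flatten the raw API response into one row for weather_data."""
    cw = payload["current_weather"]
    return {
        "latitude": payload["latitude"],
        "longitude": payload["longitude"],
        "temperature": cw["temperature"],
        "windspeed": cw["windspeed"],
        "winddirection": cw["winddirection"],
        "weathercode": cw["weathercode"],
    }

def load(record: dict) -> str:
    """Build the parameterized INSERT the load task would execute."""
    cols = ", ".join(record)
    placeholders = ", ".join(["%s"] * len(record))
    return f"INSERT INTO weather_data ({cols}) VALUES ({placeholders})"

if __name__ == "__main__":
    row = transform(extract())
    print(load(row))
```

In the actual DAG, Airflow passes each task's return value to the next task via XCom, so the wiring is the same function composition shown here.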
```text
airflow-simple-pipeline/
├── dags/
│   ├── etlweather.py            # Main weather ETL DAG
│   └── exampledag.py            # Example DAG for reference
├── tests/
│   └── dags/
│       └── test_dag_example.py  # Unit tests for DAGs
├── include/                     # Additional project files
├── plugins/                     # Custom Airflow plugins
├── Dockerfile                   # Docker image configuration
├── docker-compose.yml           # Docker Compose setup
├── requirements.txt             # Python dependencies
├── packages.txt                 # OS-level dependencies
├── airflow_settings.yaml        # Airflow connections and variables
└── README.md                    # This file
```
- Docker Desktop installed and running
- Astronomer CLI (`astro`) installed
- Basic understanding of Apache Airflow concepts

1. **Clone the repository**

   ```bash
   git clone https://github.com/codezelaca/airflow-simple-pipeline.git
   cd airflow-simple-pipeline
   ```

2. **Start the Airflow environment**

   ```bash
   astro dev start
   ```
This command spins up five Docker containers:
- Postgres: Airflow's metadata database
- Scheduler: Monitors and triggers tasks
- DAG Processor: Parses DAG files
- API Server: Serves the Airflow UI and API
- Triggerer: Handles deferred tasks
3. **Access the Airflow UI**

   - Navigate to: http://localhost:8080
   - Default credentials: `admin` / `admin`

4. **Configure Airflow Connections**

   In the Airflow UI, add the following connections:
**HTTP Connection (for the Open-Meteo API)**

- Conn ID: `open_meteo_api`
- Conn Type: `HTTP`
- Host: `https://api.open-meteo.com`

**PostgreSQL Connection (database connection)**

- Conn ID: `postgres_default`
- Conn Type: `Postgres`
- Host: `postgres`
- Schema: `postgres`
- Login: `postgres`
- Password: `postgres`
- Port: `5432`
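At runtime, the host from the `open_meteo_api` connection is combined with an endpoint path and query parameters into the full request URL. The endpoint and parameter names below follow Open-Meteo's public current-weather API; the DAG's exact request string lives in `dags/etlweather.py`:

```python
# Sketch: composing the Open-Meteo request URL from the Airflow
# connection host plus an endpoint and query parameters. Endpoint and
# parameter names are Open-Meteo's public API conventions, not copied
# from the DAG.
from urllib.parse import urlencode

HOST = "https://api.open-meteo.com"  # host stored in conn `open_meteo_api`
ENDPOINT = "/v1/forecast"
params = {
    "latitude": "51.5074",
    "longitude": "-0.1278",
    "current_weather": "true",
}

url = f"{HOST}{ENDPOINT}?{urlencode(params)}"
print(url)
# https://api.open-meteo.com/v1/forecast?latitude=51.5074&longitude=-0.1278&current_weather=true
```

Keeping only the host in the connection (and the endpoint in the DAG) means the same connection can serve other Open-Meteo endpoints without changes in the Airflow UI.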
- Schedule: Daily (`@daily`)
- Start Date: January 24, 2026
- Catchup: Disabled
- Location: London, UK (Latitude: 51.5074, Longitude: -0.1278)
Data Collected:
- Temperature (°C)
- Wind speed (km/h)
- Wind direction (degrees)
- Weather code
- Timestamp
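The weather code is a WMO weather-interpretation code, as used by Open-Meteo. A small, non-exhaustive lookup table makes the stored values readable (see Open-Meteo's documentation for the full list):

```python
# A few common WMO weather-interpretation codes as returned in
# Open-Meteo's weathercode field (non-exhaustive, for illustration).
WMO_CODES = {
    0: "Clear sky",
    1: "Mainly clear",
    2: "Partly cloudy",
    3: "Overcast",
    45: "Fog",
    51: "Light drizzle",
    61: "Slight rain",
    63: "Moderate rain",
    65: "Heavy rain",
    71: "Slight snowfall",
    95: "Thunderstorm",
}

def describe(code: int) -> str:
    """Translate a WMO code into a human-readable label."""
    return WMO_CODES.get(code, f"Unknown code {code}")

print(describe(3))   # Overcast
```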
Run DAG tests using pytest:

```bash
astro dev pytest
```

Customize the pipeline by modifying variables in `dags/etlweather.py`:

```python
LATITUDE = '51.5074'    # Change to your desired latitude
LONGITUDE = '-0.1278'   # Change to your desired longitude
```

The pipeline creates a `weather_data` table with the following schema:
```sql
CREATE TABLE weather_data (
    latitude FLOAT,
    longitude FLOAT,
    temperature FLOAT,
    windspeed FLOAT,
    winddirection FLOAT,
    weathercode INT,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

Monitor your DAGs in the Airflow UI:
- View task execution history
- Check logs for debugging
- Monitor success/failure rates
- Set up alerts for task failures
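As a standalone illustration of the load step, the same table shape and parameterized INSERT can be exercised with the stdlib `sqlite3` module standing in for Postgres (the real DAG runs its INSERT through the `postgres_default` connection instead):

```python
# Standalone demo of the load step: same column layout as the
# weather_data schema above, with in-memory sqlite3 in place of
# Postgres so the sketch runs anywhere. Values are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE weather_data (
        latitude REAL,
        longitude REAL,
        temperature REAL,
        windspeed REAL,
        winddirection REAL,
        weathercode INTEGER,
        timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")

record = (51.5074, -0.1278, 12.3, 18.5, 240.0, 3)
conn.execute(
    "INSERT INTO weather_data "
    "(latitude, longitude, temperature, windspeed, winddirection, weathercode) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    record,
)
conn.commit()

row = conn.execute("SELECT temperature, weathercode FROM weather_data").fetchone()
print(row)   # (12.3, 3)
```

The `timestamp` column is filled by its `CURRENT_TIMESTAMP` default, so the load task only needs to supply the measured fields.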
To stop the Airflow environment:

```bash
astro dev stop
```

To stop and remove all containers:

```bash
astro dev kill
```

Contributions are welcome! Feel free to open issues or submit pull requests.
This project is open source and available under the MIT License.
codezelaca
- GitHub: @codezelaca
Happy Data Engineering! 🚀