agents

AI agent tooling for data engineering workflows. Includes an MCP server for Airflow, a CLI tool (af) for interacting with Airflow from your terminal, and skills that extend AI coding agents with specialized capabilities for working with Airflow and data warehouses. Works with Claude Code, Cursor, and other agentic coding tools.

Built by Astronomer. Apache 2.0 licensed and compatible with open-source Apache Airflow.

Installation

Quick Start

npx skills add astronomer/agents --skill '*'

This installs all Astronomer skills into your project via skills.sh. You'll be prompted to select which agents to install to. To also be prompted to choose skills individually, omit the --skill flag.

Claude Code users: We recommend using the plugin instead (see Claude Code section below) for better integration with MCP servers and hooks.

Compatibility

Skills: Works with 25+ AI coding agents including Claude Code, Cursor, VS Code (GitHub Copilot), Windsurf, Cline, and more.

MCP Server: Works with any MCP-compatible client including Claude Desktop, VS Code, and others.

Claude Code

# Add the marketplace and install the plugin
claude plugin marketplace add astronomer/agents
claude plugin install data@astronomer

The plugin includes the Airflow MCP server that runs via uvx from PyPI. Data warehouse queries are handled by the analyzing-data skill using a background Jupyter kernel.
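Because the server is launched on demand with uvx, uv needs to be on your PATH. If you don't have it yet, a minimal setup on macOS/Linux using the installer documented by the uv project looks like this (see the uv docs for other platforms):

# Install uv (which provides uvx), then confirm it's available
curl -LsSf https://astral.sh/uv/install.sh | sh
uv --version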

Cursor

Cursor supports both MCP servers and skills.

MCP Server - Click the "Add Airflow MCP to Cursor" button in the repository README to install it, or configure it manually (see below).

Skills - Install to your project:

npx skills add astronomer/agents --skill '*' -a cursor

This installs skills to .cursor/skills/ in your project.

Manual MCP configuration

Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "airflow": {
      "command": "uvx",
      "args": ["astro-airflow-mcp", "--transport", "stdio"]
    }
  }
}
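Before wiring this into Cursor, you can optionally confirm that the server launches at all. It uses the stdio transport, so it will sit waiting for an MCP client on stdin; exit with Ctrl+C:

# Smoke test: the same command Cursor will run (Ctrl+C to exit)
uvx astro-airflow-mcp --transport stdio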

Enable hooks (skill suggestions, session management)

Create .cursor/hooks.json in your project:

{
  "version": 1,
  "hooks": {
    "beforeSubmitPrompt": [
      {
        "command": "$CURSOR_PROJECT_DIR/.cursor/skills/airflow/hooks/airflow-skill-suggester.sh",
        "timeout": 5
      }
    ],
    "stop": [
      {
        "command": "uv run $CURSOR_PROJECT_DIR/.cursor/skills/analyzing-data/scripts/cli.py stop",
        "timeout": 10
      }
    ]
  }
}

What these hooks do:

  • beforeSubmitPrompt: Suggests data skills when you mention Airflow keywords
  • stop: Cleans up kernel when session ends
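After installing the skills, you can run the hook commands by hand from the project root to confirm the paths resolve. Treat this as a rough smoke test only, since Cursor normally passes hook context as JSON on stdin:

# Paths assume the default .cursor/skills install location used above
bash .cursor/skills/airflow/hooks/airflow-skill-suggester.sh
uv run .cursor/skills/analyzing-data/scripts/cli.py stop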

Other MCP Clients

For any MCP-compatible client (Claude Desktop, VS Code, etc.):

# Airflow MCP
uvx astro-airflow-mcp --transport stdio

# With remote Airflow
AIRFLOW_API_URL=https://your-airflow.example.com \
AIRFLOW_USERNAME=admin \
AIRFLOW_PASSWORD=admin \
uvx astro-airflow-mcp --transport stdio

Features

The data plugin bundles an MCP server and skills into a single installable package.

MCP Server

Server Description
Airflow Full Airflow REST API integration via astro-airflow-mcp: DAG management, triggering, task logs, system health

Skills

Data Discovery & Analysis

Skill Description
init Initialize schema discovery - generates .astro/warehouse.md for instant lookups
analyzing-data SQL-based analysis to answer business questions (uses background Jupyter kernel)
checking-freshness Check how current your data is
profiling-tables Comprehensive table profiling and quality assessment

Data Lineage

Skill Description
tracing-downstream-lineage Analyze what breaks if you change something
tracing-upstream-lineage Trace where data comes from
annotating-task-lineage Add manual lineage to tasks using inlets/outlets
creating-openlineage-extractors Build custom OpenLineage extractors for operators

DAG Development

Skill Description
airflow Main entrypoint - routes to specialized Airflow skills
setting-up-astro-project Initialize and configure new Astro/Airflow projects
managing-astro-local-env Manage local Airflow environment (start, stop, logs, troubleshoot)
authoring-dags Create and validate Airflow DAGs with best practices
testing-dags Test and debug Airflow DAGs locally
debugging-dags Deep failure diagnosis and root cause analysis

Migration

Skill Description
migrating-airflow-2-to-3 Migrate DAGs from Airflow 2.x to 3.x

User Journeys

Data Analysis Flow

flowchart LR
    init["/data:init"] --> analyzing["/data:analyzing-data"]
    analyzing --> profiling["/data:profiling-tables"]
    analyzing --> freshness["/data:checking-freshness"]
  1. Initialize (/data:init) - One-time setup to generate warehouse.md with schema metadata
  2. Analyze (/data:analyzing-data) - Answer business questions with SQL
  3. Profile (/data:profiling-tables) - Deep dive into specific tables for statistics and quality
  4. Check freshness (/data:checking-freshness) - Verify data is up to date before using
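A typical session strings these together as slash commands inside your agent; the prompts in the comments are illustrative:

/data:init                  # one-time: generate .astro/warehouse.md
/data:analyzing-data        # "Show me revenue trends by product"
/data:profiling-tables      # dig into a specific table's statistics and quality
/data:checking-freshness    # confirm the source data is current before sharing results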

DAG Development Flow

flowchart LR
    setup["/data:setting-up-astro-project"] --> authoring["/data:authoring-dags"]
    setup --> env["/data:managing-astro-local-env"]
    authoring --> testing["/data:testing-dags"]
    testing --> debugging["/data:debugging-dags"]
  1. Setup (/data:setting-up-astro-project) - Initialize project structure and dependencies
  2. Environment (/data:managing-astro-local-env) - Start/stop local Airflow for development
  3. Author (/data:authoring-dags) - Write DAG code following best practices
  4. Test (/data:testing-dags) - Run DAGs and fix issues iteratively
  5. Debug (/data:debugging-dags) - Deep investigation for complex failures
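The development loop looks similar; authoring and testing iterate until the DAG runs cleanly (prompts illustrative):

/data:setting-up-astro-project    # scaffold the project
/data:managing-astro-local-env    # start local Airflow
/data:authoring-dags              # "Create a DAG that loads data from S3 to Snowflake daily"
/data:testing-dags                # run it locally and fix issues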

Airflow CLI (af)

The af command-line tool lets you interact with Airflow directly from your terminal. Run it with uvx:

uvx --from astro-airflow-mcp af --help

For frequent use, add an alias to your shell config (~/.bashrc or ~/.zshrc):

alias af='uvx --from astro-airflow-mcp af'

Then use it for quick operations like af health, af dags list, or af runs trigger <dag_id>.
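For example, a quick smoke test against your Airflow instance using the commands listed above (the DAG id is a placeholder):

af health                      # check that Airflow is reachable and healthy
af dags list                   # see which DAGs are deployed
af runs trigger example_dag    # example_dag is a placeholder DAG id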

See the full CLI documentation for all commands and instance management.

Configuration

Warehouse Connections

Configure data warehouse connections at ~/.astro/agents/warehouse.yml:

my_warehouse:
  type: snowflake
  account: ${SNOWFLAKE_ACCOUNT}
  user: ${SNOWFLAKE_USER}
  auth_type: private_key
  private_key_path: ~/.ssh/snowflake_key.p8
  private_key_passphrase: ${SNOWFLAKE_PRIVATE_KEY_PASSPHRASE}
  warehouse: COMPUTE_WH
  role: ANALYST
  databases:
    - ANALYTICS
    - RAW

Store credentials in ~/.astro/agents/.env:

SNOWFLAKE_ACCOUNT=xyz12345
SNOWFLAKE_USER=myuser
SNOWFLAKE_PRIVATE_KEY_PASSPHRASE=your-passphrase-here  # Only required if using an encrypted private key
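If you don't already have a key pair for Snowflake key-pair auth, the standard OpenSSL steps generate one; the output path matches the private_key_path in the example above, and the passphrase you set becomes SNOWFLAKE_PRIVATE_KEY_PASSPHRASE. Verify the details (cipher, registering the public key on your Snowflake user) against Snowflake's key-pair auth docs:

# Generate an encrypted private key and matching public key
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -v2 aes256 -out ~/.ssh/snowflake_key.p8
openssl rsa -in ~/.ssh/snowflake_key.p8 -pubout -out ~/.ssh/snowflake_key.pub
chmod 600 ~/.ssh/snowflake_key.p8
# Then register the public key on your Snowflake user (ALTER USER ... SET RSA_PUBLIC_KEY=...)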

Supported databases:

Type Package Description
snowflake Built-in Snowflake Data Cloud
postgres Built-in PostgreSQL
bigquery Built-in Google BigQuery
sqlalchemy Any SQLAlchemy driver Auto-detects packages for 25+ databases (see below)
Auto-detected SQLAlchemy databases

The connector automatically installs the correct driver packages for:

Database Dialect URL
PostgreSQL postgresql:// or postgres://
MySQL mysql:// or mysql+pymysql://
MariaDB mariadb://
SQLite sqlite:///
SQL Server mssql+pyodbc://
Oracle oracle://
Redshift redshift://
Snowflake snowflake://
BigQuery bigquery://
DuckDB duckdb:///
Trino trino://
ClickHouse clickhouse://
CockroachDB cockroachdb://
Databricks databricks://
Amazon Athena awsathena://
Cloud Spanner spanner://
Teradata teradata://
Vertica vertica://
SAP HANA hana://
IBM Db2 db2://

For unlisted databases, install the driver manually and use standard SQLAlchemy URLs.

Example configurations
# PostgreSQL
my_postgres:
  type: postgres
  host: localhost
  port: 5432
  user: analyst
  password: ${POSTGRES_PASSWORD}
  database: analytics

# BigQuery
my_bigquery:
  type: bigquery
  project: my-gcp-project
  credentials_path: ~/.config/gcloud/service_account.json

# SQLAlchemy (any supported database)
my_duckdb:
  type: sqlalchemy
  url: duckdb:///path/to/analytics.duckdb
  databases: [main]

# Redshift (via SQLAlchemy)
my_redshift:
  type: sqlalchemy
  url: redshift+redshift_connector://${REDSHIFT_USER}:${REDSHIFT_PASSWORD}@${REDSHIFT_HOST}:5439/${REDSHIFT_DATABASE}
  databases: [my_database]

Airflow

The Airflow MCP server auto-discovers your project when you run Claude Code from an Airflow project directory (one containing airflow.cfg or a dags/ folder).

For remote instances, set environment variables:

Variable Description
AIRFLOW_API_URL Airflow webserver URL
AIRFLOW_USERNAME Username
AIRFLOW_PASSWORD Password
AIRFLOW_AUTH_TOKEN Bearer token (alternative to username/password)
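For example, in your shell profile or just before launching the server (the URL and token path are placeholders):

# Point the MCP server at a remote Airflow deployment
export AIRFLOW_API_URL=https://airflow.internal.example.com
export AIRFLOW_AUTH_TOKEN=$(cat ~/.config/airflow/token)   # or set AIRFLOW_USERNAME / AIRFLOW_PASSWORD
uvx astro-airflow-mcp --transport stdio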

Usage

Skills are invoked automatically based on what you ask. You can also invoke them directly with /data:<skill-name>.

Getting Started

  1. Initialize your warehouse (recommended first step):

    /data:init
    

    This generates .astro/warehouse.md with schema metadata for faster queries.

  2. Ask questions naturally:

    • "What tables contain customer data?"
    • "Show me revenue trends by product"
    • "Create a DAG that loads data from S3 to Snowflake daily"
    • "Why did my etl_pipeline DAG fail yesterday?"

Development

See CLAUDE.md for plugin development guidelines.

Local Development Setup

# Clone the repo
git clone https://github.com/astronomer/agents.git
cd agents

# Test with local plugin
claude --plugin-dir .

# Or install from local marketplace
claude plugin marketplace add .
claude plugin install data@astronomer

Adding Skills

Create a new skill in skills/<name>/SKILL.md with YAML frontmatter:

---
name: my-skill
description: When to invoke this skill
---

# Skill instructions here...

After adding skills, reinstall the plugin:

claude plugin uninstall data@astronomer && claude plugin install data@astronomer
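End to end, adding a skill might look like this; the skill name is hypothetical:

# Scaffold a new skill, write its SKILL.md, then reinstall the plugin to pick it up
mkdir -p skills/validating-data-quality
$EDITOR skills/validating-data-quality/SKILL.md   # add the frontmatter shown above
claude plugin uninstall data@astronomer && claude plugin install data@astronomer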

Troubleshooting

Common Issues

Issue Solution
Skills not appearing Reinstall plugin: claude plugin uninstall data@astronomer && claude plugin install data@astronomer
Warehouse connection errors Check credentials in ~/.astro/agents/.env and connection config in warehouse.yml
Airflow not detected Ensure you're running from a directory with airflow.cfg or a dags/ folder
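A few quick checks that cover the table above (paths are the defaults used elsewhere in this README):

ls airflow.cfg dags/                 # Airflow project detection
cat ~/.astro/agents/warehouse.yml    # warehouse connection config
ls ~/.astro/agents/.env              # credentials referenced via ${...} placeholders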

Contributing

Contributions welcome! Please read our Code of Conduct and Contributing Guide before getting started.

Roadmap

Skills we're likely to build:

DAG Operations

  • CI/CD pipelines for DAG deployment
  • Performance optimization and tuning
  • Monitoring and alerting setup
  • Data quality and validation workflows

Astronomer Open Source

  • Cosmos - Run dbt projects as Airflow DAGs
  • DAG Factory - Generate DAGs from YAML
  • Other open source projects we maintain

Conference Learnings

  • Reviewing talks from Airflow Summit, Coalesce, Data Council, and other conferences to extract reusable skills and patterns

Broader Data Practitioner Skills

  • Churn prediction, data modeling, ML training, and other workflows that span DE/DS/analytics roles

Don't see a skill you want? Open an issue or submit a PR!

License

Apache 2.0


Made with ❤️ by Astronomer