# text-to-sql-agent

An open-source AI agent that converts natural language into SQL. Ask questions about your data in plain English -- the agent generates, executes, and explains SQL queries using Claude's extended thinking capabilities.

Built with LangGraph + FastAPI + Next.js. Supports PostgreSQL, MySQL, and CSV uploads.



## What Is This?

A full-stack AI-powered SQL assistant. You provide a database connection (or upload CSVs), then ask questions like:

  • "What were the top 10 products by revenue last quarter?"
  • "Show me customers who haven't ordered in 90 days"
  • "Explain this query: SELECT ..."
  • "Optimize this slow JOIN for me"

The agent reasons through the problem, writes SQL, executes it, validates the results, and iterates if needed -- all streamed to a chat UI in real time.

### Three Modes

| Mode | What it does |
|------|--------------|
| **Generate** | Interprets your question, writes SQL, runs it, returns results with analysis |
| **Explain** | Takes a SQL query and breaks it down clause by clause in plain English |
| **Optimize** | Analyzes query performance and rewrites the query with suggested indexes |

## What You'll Learn

This project is a practical reference for building production-grade AI agents. By reading the code, you'll learn:

  • Agent architecture with LangGraph -- stateful graph with LLM and Tool nodes, conditional edges, and persistent checkpointing
  • Extended thinking -- how Claude reasons step-by-step before generating SQL, and how interleaved reasoning improves accuracy
  • Human-in-the-loop -- LangGraph interrupt() to let users review/modify SQL before execution
  • Streaming SSE from FastAPI -- real-time token-by-token streaming over Server-Sent Events with a POST-based approach
  • Tool calling -- structured tool definitions, input validation with Pydantic, and read-only SQL enforcement via sqlglot
  • Multi-database connectors -- abstract connector pattern with PostgreSQL, MySQL, and DuckDB implementations
  • Prompt engineering -- Jinja2 templates with conditional system prompts, XML-structured schema context, and mode-specific instructions
  • API key auth -- SHA-256 hashed keys with in-memory TTL cache
  • Encrypted credential storage -- Fernet symmetric encryption for stored database passwords
  • Full-stack deployment -- Docker multi-stage builds, Compose orchestration, non-root containers
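Some of these patterns are small enough to sketch inline. For instance, the API-key scheme (SHA-256 digests at rest plus an in-memory TTL cache) might look roughly like the following; `KeyStore` and its internals are invented here for illustration and are not the repo's code:

```python
import hashlib
import secrets
import time

class KeyStore:
    """Toy API-key store: only SHA-256 digests are persisted, and
    successfully verified keys are cached in memory with a TTL."""

    def __init__(self, ttl_seconds: float = 300.0):
        self._hashes: set[str] = set()      # stands in for the DB table
        self._cache: dict[str, float] = {}  # digest -> expiry timestamp
        self._ttl = ttl_seconds

    def create_key(self) -> str:
        raw = "sk-" + secrets.token_urlsafe(32)
        self._hashes.add(hashlib.sha256(raw.encode()).hexdigest())
        return raw  # shown to the user once, never stored in plaintext

    def verify(self, raw: str) -> bool:
        digest = hashlib.sha256(raw.encode()).hexdigest()
        expiry = self._cache.get(digest)
        if expiry is not None and expiry > time.monotonic():
            return True  # cache hit: skip the "DB" lookup
        if digest in self._hashes:
            self._cache[digest] = time.monotonic() + self._ttl
            return True
        return False
```

Because only digests are stored, a leaked database does not leak usable keys; the cache just avoids re-hashing and re-querying on every request.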

## Architecture

```text
┌─────────────────────────────────────────┐
│            Next.js Frontend             │
│  Chat UI  ·  CSV Upload  ·  DB Connect  │
└────────────────────┬────────────────────┘
                     │ SSE (POST)
┌────────────────────▼────────────────────┐
│             FastAPI Backend             │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │          LangGraph Agent          │  │
│  │                                   │  │
│  │  START ──► LLM Node ──► Tool Node │  │
│  │              ▲             │      │  │
│  │              └─────────────┘      │  │
│  │        (conditional loop)         │  │
│  └─────────────────┬─────────────────┘  │
│                    │                    │
│       ┌────────────▼────────────┐       │
│       │   Query Executor Tool   │       │
│       │   sqlglot validation    │       │
│       │    human-in-the-loop    │       │
│       └─────┬───────────┬───────┘       │
│             │           │               │
└─────────────┼───────────┼───────────────┘
              │           │
  ┌───────────▼──────┐  ┌─▼───────────────────┐
  │      DuckDB      │  │  PostgreSQL / MySQL │
  │  (CSV uploads)   │  │  (live connections) │
  └──────────────────┘  └─────────────────────┘
```

**Agent loop:** The LLM node calls Claude, which may request tool calls. The conditional edge checks whether the response contains tool calls; if so, it routes to the Tool node, which executes the query and returns the results. The loop continues until the LLM produces a final text response.
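Stripped of LangGraph specifics, that loop is only a few lines of plain Python. This is a deliberately simplified sketch, not the repo's graph; the `llm` and `run_tool` callables are hypothetical stand-ins for the LLM and Tool nodes:

```python
def agent_loop(question, llm, run_tool, max_steps=5):
    """Toy LLM -> Tool loop: call the model, execute any requested tool
    call, feed the result back, and stop once a final answer arrives."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = llm(messages)               # may contain a tool call
        messages.append(reply)
        if "tool_call" not in reply:        # conditional edge: no tool -> END
            return reply["content"]
        result = run_tool(reply["tool_call"])   # the Tool node
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not produce a final answer")
```

The `max_steps` guard plays the role of a recursion limit, so a model that keeps requesting tools cannot loop forever.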

**Checkpointing:** Every conversation turn is persisted to PostgreSQL via LangGraph's AsyncPostgresSaver, enabling multi-turn conversations with full history.

## Tech Stack

### Backend

| Component | Technology | Why |
|-----------|------------|-----|
| LLM | Anthropic Claude | Extended thinking for step-by-step SQL reasoning |
| Agent Framework | LangGraph | Stateful graph with checkpointing, streaming, and interrupts |
| API | FastAPI | Async Python, SSE streaming, auto-generated OpenAPI docs |
| SQL Validation | sqlglot | Parses SQL and verifies it is read-only before execution |
| CSV Engine | DuckDB | In-memory analytical SQL engine for uploaded CSVs |
| Checkpointing | PostgreSQL + psycopg3 | Persistent conversation state and app tables |
| Encryption | cryptography (Fernet) | Symmetric encryption for stored DB credentials |
| Prompts | Jinja2 | Templated system prompts with conditional modes |
| Config | pydantic-settings | Type-safe settings from environment variables |
| Observability | LangSmith (optional) | Trace agent runs, debug prompt/tool interactions |
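As a rough illustration of the read-only enforcement listed above: the repo parses queries with sqlglot, which is strictly better than keyword matching. The standard-library stand-in below only conveys the policy and is not the project's validator:

```python
import re

# Keywords that indicate a write or DDL statement.
_FORBIDDEN = {"insert", "update", "delete", "drop", "alter",
              "truncate", "create", "grant", "revoke"}

def looks_read_only(sql: str) -> bool:
    """Crude check: the statement must start with SELECT/WITH and contain
    no write/DDL keywords. A real parser (e.g. sqlglot) also catches
    tricks this misses, such as comments splitting keywords."""
    tokens = re.findall(r"[a-z_]+", sql.lower())
    if not tokens or tokens[0] not in {"select", "with"}:
        return False
    return not _FORBIDDEN.intersection(tokens)
```

Note how a multi-statement payload like `SELECT 1; DELETE FROM orders` is rejected because every token in the string is inspected, not just the first.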

### Frontend

| Component | Technology |
|-----------|------------|
| Framework | Next.js 15 (App Router) |
| Language | TypeScript |
| Styling | Tailwind CSS |
| Markdown | react-markdown + remark-gfm |
| SSE Client | Custom fetch + ReadableStream parser |

### Infrastructure

| Component | Technology |
|-----------|------------|
| Containers | Docker (multi-stage, non-root) |
| Orchestration | Docker Compose |
| Package Manager | uv (Python), npm (Node) |
| Linting | Ruff, MyPy |

## Getting Started

### Prerequisites

  • Python with the uv package manager
  • Node.js with npm
  • A running PostgreSQL instance
  • An Anthropic API key

### 1. Clone the repo

```shell
git clone https://github.com/shreyabaid007/text-to-sql-agent.git
cd text-to-sql-agent
```

### 2. Configure environment

```shell
cp .env.example .env
```

Edit `.env` and fill in your values:

```ini
DB__HOST=localhost
DB__PORT=5432
DB__DATABASE=text_to_sql
DB__USER=postgres
DB__PASSWORD=your_password
DB__SCHEMA_NAME=public

ANTHROPIC_API_KEY=sk-ant-...
```

### 3. Install dependencies

```shell
# Backend
uv sync --all-groups

# Frontend
cd frontend && npm install && cd ..
```

### 4. Initialize the database

```shell
uv run python scripts/initialize_db.py
```

This creates the PostgreSQL schema, LangGraph checkpoint tables, and all application tables (API keys, connections, query history, usage tracking).

### 5. Start the backend

```shell
make dev
```

The API runs at http://localhost:8000; interactive docs are at http://localhost:8000/docs.

### 6. Start the frontend

```shell
cd frontend
cp .env.local.example .env.local
npm run dev
```

The UI runs at http://localhost:3000.

### 7. Create an API key

```shell
curl -s -X POST http://localhost:8000/v1/api/keys \
  -H "Content-Type: application/json" \
  -d '{"name": "dev"}' | python -m json.tool
```

Copy the key value and paste it into the frontend sidebar.

### 8. Start chatting

Upload a CSV or connect a database from the sidebar, then ask questions about your data.

## Docker Deployment

```shell
# Generate a Fernet key for credential encryption
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

# Configure .env with production values, then:
docker compose -f docker-compose.prod.yml --profile init up init-db   # one-time DB init
docker compose -f docker-compose.prod.yml up -d                       # start everything
```

This starts PostgreSQL, the backend, and the frontend. See `docker-compose.prod.yml` for all configurable options.
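Stored database credentials are encrypted with the Fernet key generated above. A minimal round-trip, assuming the `cryptography` package from the backend's dependencies is installed:

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # in production this comes from FERNET_KEY
f = Fernet(key)

token = f.encrypt(b"db-password")  # ciphertext is what gets persisted
plain = f.decrypt(token)           # decrypted only when opening a connection
assert plain == b"db-password"
```

Fernet is symmetric, so anyone holding `FERNET_KEY` can decrypt the stored passwords; keep it out of version control.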

## Project Structure

```text
.
├── server.py                        # FastAPI app entry point
├── docker-compose.prod.yml          # Production deployment
├── Dockerfile                       # Backend container
├── .env.example                     # Environment template
│
├── prompts/
│   └── system.jinja2                # LLM system prompt (generate/explain/optimize modes)
│
├── queries/ddl/                     # SQL migration files
│   ├── create_schema.sql
│   ├── create_api_keys.sql
│   ├── create_connections.sql
│   ├── create_schema_catalog.sql
│   ├── create_query_history.sql
│   ├── create_usage_events.sql
│   └── create_indexes.sql
│
├── scripts/
│   └── initialize_db.py             # One-time DB bootstrap
│
├── src/app/
│   ├── api/
│   │   ├── lifespan.py              # App startup/shutdown lifecycle
│   │   ├── dependencies.py          # FastAPI dependency injection (auth, DB pool)
│   │   ├── rate_limit.py            # Token-bucket rate limiter middleware
│   │   └── routers/
│   │       ├── chat.py              # SSE streaming chat endpoint
│   │       ├── data.py              # CSV upload + DuckDB table management
│   │       ├── connections.py       # Database connection CRUD
│   │       ├── keys.py              # API key management
│   │       ├── catalog.py           # Schema introspection cache
│   │       ├── history.py           # Query history
│   │       └── usage.py             # Token usage tracking
│   │
│   ├── config/                      # pydantic-settings configuration
│   ├── connectors/                  # Database connector abstraction
│   │   ├── base.py                  # ABC: DatabaseConnector
│   │   ├── postgres.py              # PostgreSQL connector (psycopg)
│   │   ├── mysql.py                 # MySQL connector (aiomysql)
│   │   ├── duckdb_connector.py      # DuckDB connector
│   │   └── service.py               # ConnectionService (encrypt, introspect, execute)
│   │
│   ├── crypto/                      # Fernet encrypt/decrypt utilities
│   ├── db/                          # Connection pool + SQL query loader
│   │
│   ├── graphs/                      # LangGraph agent
│   │   └── chat/
│   │       ├── graph.py             # Graph construction (LLM → Tool loop)
│   │       ├── state.py             # ChatGraphState dataclass
│   │       ├── edges.py             # Conditional edge: is_tool_call
│   │       └── nodes/
│   │           ├── llm.py           # LLM node (streaming, primary/fallback models)
│   │           └── tool.py          # Tool execution node
│   │
│   ├── tools/
│   │   ├── base.py                  # Generic BaseTool[State, Input, Output]
│   │   ├── db.py                    # QueryExecutorTool (sqlglot validation, HITL interrupt)
│   │   └── handler.py               # Tool dispatch
│   │
│   └── models/                      # Pydantic request/response models
│
└── frontend/                        # Next.js 15 app
    ├── Dockerfile
    ├── src/
    │   ├── app/page.tsx             # Main page (sidebar + chat layout)
    │   ├── components/
    │   │   ├── ChatWindow.tsx       # Chat interface + SSE consumer
    │   │   ├── MessageBubble.tsx    # Message rendering (markdown, SQL, tables)
    │   │   ├── ChatInput.tsx        # Input + mode selector
    │   │   ├── CsvUpload.tsx        # Drag-and-drop CSV upload
    │   │   ├── ConnectionForm.tsx   # Database connection form
    │   │   └── SchemaViewer.tsx     # Collapsible schema display
    │   └── lib/
    │       ├── api.ts               # Typed API client
    │       ├── sse.ts               # POST-based SSE stream parser
    │       └── types.ts             # Shared TypeScript types
```
## API Endpoints

All routes are prefixed with `/v1/api`. Routes marked 🔒 require the `X-API-Key` header.

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| POST | `/keys` |  | Create a new API key |
| GET | `/keys` | 🔒 | List all API keys |
| DELETE | `/keys/{id}` | 🔒 | Revoke an API key |
| POST | `/data/upload` | 🔒 | Upload CSV files |
| GET | `/data/tables` | 🔒 | List DuckDB tables |
| DELETE | `/data/tables/{name}` | 🔒 | Drop a DuckDB table |
| POST | `/connections` | 🔒 | Register a database connection |
| GET | `/connections` | 🔒 | List connections |
| POST | `/connections/{id}/test` | 🔒 | Test a connection |
| DELETE | `/connections/{id}` | 🔒 | Remove a connection |
| GET | `/connections/{id}/schema` | 🔒 | Get cached schema |
| POST | `/connections/{id}/schema/refresh` | 🔒 | Re-introspect schema |
| POST | `/stream/{user_id}/{thread_id}` |  | Chat (SSE stream) |
| POST | `/stream/{user_id}/{thread_id}/resume` |  | Resume after interrupt |
| GET | `/connections/{id}/history` | 🔒 | Query history |
| GET | `/usage` | 🔒 | Token usage by day |
| GET | `/usage/summary` | 🔒 | Monthly usage summary |
| GET | `/health/` |  | Health check |
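The `/stream/...` endpoints emit Server-Sent Events. The frontend's `sse.ts` parses the byte stream by hand (the browser's `EventSource` cannot issue POST requests); the same parsing can be sketched in Python. This is an illustrative helper, not code from the repo:

```python
def iter_sse_events(chunks):
    """Yield the `data:` payload of each SSE event from an iterable of
    byte chunks. Events end with a blank line, and a chunk boundary may
    fall anywhere, so we buffer until a complete event has arrived."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n\n" in buffer:
            event, buffer = buffer.split(b"\n\n", 1)
            for line in event.split(b"\n"):
                if line.startswith(b"data:"):
                    yield line[5:].strip().decode()
```

Feeding it two chunks that split an event mid-token still yields whole events, which is exactly the property a streaming chat UI needs.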

## Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `DB__HOST` | Yes | -- | PostgreSQL host |
| `DB__PORT` | Yes | 5432 | PostgreSQL port |
| `DB__DATABASE` | Yes | -- | Database name |
| `DB__USER` | Yes | -- | Database user |
| `DB__PASSWORD` | Yes | -- | Database password |
| `DB__SCHEMA_NAME` | No | public | PostgreSQL schema |
| `DB__POOL_MIN_SIZE` | No | 0 | Min pool connections |
| `DB__POOL_MAX_SIZE` | No | 5 | Max pool connections |
| `FERNET_KEY` | Prod only | -- | Encryption key (auto-generated in dev) |
| `ENVIRONMENT` | No | development | Set to production to enforce FERNET_KEY and disable docs |
| `CORS_ORIGINS` | No | * | Comma-separated allowed origins |
| `ANTHROPIC_API_KEY` | Yes | -- | Anthropic API key |
| `LANGSMITH_TRACING` | No | false | Enable LangSmith tracing |
| `NEXT_PUBLIC_API_URL` | No | http://localhost:8000/v1/api | Frontend API URL |
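The double-underscore names map to nested settings: pydantic-settings with a `__` nested delimiter resolves `DB__HOST` into `settings.db.host`. A standard-library illustration of that mapping (a toy, not the project's config loader):

```python
def nested_env(environ, delimiter="__"):
    """Fold flat KEY__SUBKEY variables into nested dicts, mimicking how
    pydantic-settings resolves DB__HOST into a nested `db.host` field."""
    out: dict = {}
    for key, value in environ.items():
        parts = key.lower().split(delimiter)
        node = out
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return out
```

So all six `DB__*` variables land under one `db` section, while flat keys like `ANTHROPIC_API_KEY` stay top-level.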

## Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

  1. Fork the repo
  2. Create your branch (`git checkout -b feature/my-feature`)
  3. Make your changes
  4. Run linters: `uv run --group lint ruff check src/ server.py`
  5. Commit (`git commit -m "Add my feature"`)
  6. Push and open a pull request

## License

MIT
