Skip to content

Agent-Tracking-Mech/agentic-db-query

Repository files navigation

agentic-db-query

Natural language → SQL TypeScript agent — install dependencies, point at a database, ask questions in plain English.

TypeScript Vercel AI SDK License: MIT Node.js PostgreSQL


What is this?

agentic-db-query is a TypeScript agent that turns natural language questions into SQL, validates execution safety, runs the query, and returns rows plus a short explanation. It loads your live schema (tables, columns, types, foreign keys, optional sample values), uses an LLM via the Vercel AI SDK, and retries with error feedback when the database rejects generated SQL.

Typical uses: internal BI helpers, support dashboards, admin lookups, prototyping queries, teaching SQL by example.


Features

Feature Description
Schema-aware text-to-SQL Injects table/column definitions and relationships into the model prompt.
Safe execution Read-only mode by default; row cap via LIMIT; query timeout (PostgreSQL statement_timeout).
Self-healing On validation or execution failure, the agent retries with the error message (configurable attempts).
Session context Optional sessionId keeps a short transcript for follow-up questions.
Streaming queryStream() yields SQL fragments, then rows, then a done event.
Explain mode explain() returns SQL + explanation without executing.
Databases PostgreSQL, SQLite, MySQL (via mysql2).

Requirements

  • Node.js 22+
  • A reachable database (local or remote)
  • An API key for OpenAI or Anthropic (used by the model string, see below)

Installation

cd agentic-db-query
npm install

Run npm run build before npm start (the compiled API loads from dist/). For development, npm run dev runs TypeScript directly.


Configuration

Copy .env.example to .env and set:

Variable Description
DATABASE_URL Connection string (see below).
OPENAI_API_KEY Required if you use openai/... models.
ANTHROPIC_API_KEY Required if you use anthropic/... models.
MODEL Default: openai/gpt-4o-mini. Format: provider/model-id.
PORT HTTP port for npm run dev / npm start (default 3000; must be ≤ 65535).
READ_ONLY Set to false to disable read-only SQL checks (default true; use with care).
MAX_ROWS Hard cap appended as LIMIT when missing (default 1000).
QUERY_TIMEOUT_MS Query timeout in ms (default 30000).
MAX_RETRIES Max LLM generate/execute attempts for POST /query (default 3).
INCLUDE_SAMPLE_VALUES Set to true to add per-column sample values to the schema prompt (default off; slower on large DBs).

Database URLs

  • PostgreSQL: postgresql://user:pass@localhost:5432/mydb
  • SQLite: sqlite:./data.db or file:./data.db (path after the prefix)
  • MySQL: mysql://user:pass@localhost:3306/mydb

Quick start (library)

import 'dotenv/config';
import { DatabaseAgent } from './src/index.js';

const agent = new DatabaseAgent({
  connectionString: process.env.DATABASE_URL!,
  model: process.env.MODEL ?? 'openai/gpt-4o-mini',
  readOnly: true,
  maxRows: 100,
});

const result = await agent.query('How many users joined in the last 30 days?');

console.log(result.sql);
console.log(result.results);
console.log(result.explanation);

await agent.close();

Explain only (no execution)

Returns generated SQL and explanation without running it. In read-only mode, the SQL is still validated the same way as for query() (mutating SQL is rejected).

const { sql, explanation } = await agent.explain('Show me top products by revenue');

Session memory

await agent.query('How many users do we have?', { sessionId: 'user-123' });
await agent.query('List their email addresses', { sessionId: 'user-123' });

Domain hints

agent.addHint('When asked for revenue, use orders.total_amount');

Schema filter

const agent = new DatabaseAgent({
  connectionString: process.env.DATABASE_URL!,
  model: 'openai/gpt-4o-mini',
  schemaFilter: (table) => !table.startsWith('_internal'),
});

Streaming

queryStream() emits partial SQL while the model generates it, then rows and a final done event. Unlike query(), it does not retry on execution errors (you get a single error event).

for await (const event of agent.queryStream('Show first 10 rows from users')) {
  if (event.type === 'sql') console.log(event.sql);
  if (event.type === 'row') console.log(event.row);
  if (event.type === 'done') console.log(event.explanation, event.rowCount);
  if (event.type === 'error') console.error(event.message);
}

HTTP API

npm run dev
  • GET /health — liveness check
  • POST /query — JSON body: { "question": string, "sessionId?: string }
    Returns: { sql, results, explanation, rowCount, executionTimeMs, retries? }

Example:

curl -s -X POST http://localhost:3000/query \
  -H 'Content-Type: application/json' \
  -d '{"question":"List tables"}'

CLI

npm run cli

Interactive prompts; uses DATABASE_URL and MODEL from .env.


Docker

export OPENAI_API_KEY=sk-...
docker compose up --build

Services: PostgreSQL 16 and the API on port 3000. Ensure the database has data/schema you expect before querying.


How it works

  1. Load schema — introspect tables, columns, types, foreign keys (optional row samples).
  2. Build prompt — system rules + formatted schema + optional hints and session context.
  3. Generate SQL — structured output (sql + explanation) via generateObject.
  4. Validate — in read-only mode, allow a single SELECT / WITH ... SELECT and block DML/DDL patterns.
  5. Execute — run with timeout and automatic LIMIT when absent.
  6. Retry — on failure, feed the database error back into the next attempt.

API reference

DatabaseAgentConfig

Field Type Notes
connectionString string Required.
model string | LanguageModel e.g. openai/gpt-4o-mini or a Vercel AI SDK model instance.
readOnly boolean Default true.
maxRows number Default 1000.
queryTimeoutMs number Default 30000.
maxRetries number Default 3; max generation/execute attempts for query() (each failure feeds back to the model).
includeSampleValues boolean Default false; adds per-column samples to the prompt.
schemaFilter (table: string) => boolean Exclude tables from schema injection.
hints string[] Initial domain hints (same as addHint).

QueryResult

Field Type
sql string
results Record<string, unknown>[]
explanation string
rowCount number
executionTimeMs number
retries number (optional)

Scripts

Command Description
npm run build Compile TypeScript to dist/.
npm start Run compiled API (node dist/api/server.js; run npm run build first).
npm run dev Run API with tsx watch.
npm run cli Interactive CLI.
npm test All automated tests (tests/agent.integration.test.ts is skipped unless OPENAI_API_KEY is set).
npm run test:integration Run only tests/agent.integration.test.ts (faster when iterating on the LLM test).
npm run test:watch Vitest in watch mode.

prepack runs npm run build automatically before npm pack / npm publish, so dist/ is fresh for the tarball.


Project layout

agentic-db-query/
├── src/
│   ├── DatabaseAgent.ts
│   ├── db/
│   │   ├── dialect.ts
│   │   └── runner.ts
│   ├── schema/
│   │   ├── extractor.ts
│   │   └── formatter.ts
│   ├── sql/
│   │   ├── validator.ts
│   │   ├── executor.ts
│   │   └── result-formatter.ts
│   ├── prompts/
│   │   └── text-to-sql.ts
│   ├── llm/
│   │   └── model.ts
│   ├── api/
│   │   └── server.ts
│   ├── cli/
│   │   └── index.ts
│   └── index.ts
├── tests/
├── .gitignore
├── .env.example
├── docker-compose.yml
├── Dockerfile
├── vitest.config.ts
├── package.json
├── tsconfig.json
├── LICENSE
└── README.md

Limitations

  • Read-only validation uses heuristics (not a full SQL parser). Rare edge cases can include identifiers that resemble keywords.
  • Query timeouts are applied via PostgreSQL statement_timeout and MySQL’s query timeout. SQLite does not apply a server-side timeout; maxRows still caps result size.

Security notes

  • Keep read-only enabled for untrusted users.
  • The LLM can still produce expensive queries; use timeouts and row limits.
  • Run against least-privilege DB users in production.
  • This tool is not a substitute for parameterized app queries in application code; it is for controlled analytics/admin use.

License

This project is licensed under the MIT License — see LICENSE.

Contact Info

Telegram: @AuraTerminal

About

Natural language to SQL in TypeScript — LLM text-to-SQL agent with schema injection, validation, Vercel AI SDK, REST API & CLI.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors