🧠 Intelligent Query & Insight Engine

A FastAPI backend that converts natural language financial questions into SQL, executes them safely, and generates LLM-powered human-readable insights using Claude (Anthropic).


Architecture

User Question (NL)
        │
        ▼
┌────────────────────┐
│   POST /query      │  FastAPI endpoint
└────────────────────┘
        │
        ▼
┌────────────────────┐
│   Cache Check      │  In-memory TTL cache (Redis optional)
└────────────────────┘
        │ miss
        ▼
┌────────────────────┐
│  NL → SQL (Claude) │  Anthropic API call → JSON {sql, params}
└────────────────────┘
        │
        ▼
┌────────────────────┐
│  SQL Validation    │  Whitelist: SELECT-only, no DDL/DML
└────────────────────┘
        │
        ▼
┌────────────────────┐
│   SQLite Execute   │  Parameterized queries (? placeholders)
└────────────────────┘
        │
        ▼
┌────────────────────┐
│  Insight (Claude)  │  LLM summarises raw rows → readable insight
└────────────────────┘
        │
        ▼
┌────────────────────┐
│  History Log +     │  Stored in query_history table
│  Cache Write       │  Cached for 5 minutes
└────────────────────┘
        │
        ▼
    JSON Response
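The flow above can be sketched end to end in plain Python. This is a simplified illustration, not the repo's actual code: `nl_to_sql` and `summarize` stand in for the two Claude calls, and the cache is a bare dict.

```python
import sqlite3
import time

def handle_query(question, user_id, conn, cache, nl_to_sql, summarize, ttl=300):
    # 1. Cache check (key mirrors the documented query:{user_id}:{question} scheme)
    key = f"query:{user_id}:{question.lower()}"
    entry = cache.get(key)
    if entry and entry[0] > time.monotonic():
        return {**entry[1], "cached": True}

    # 2. NL -> SQL (a Claude call in the real app; stubbed here)
    sql, params = nl_to_sql(question)

    # 3. Whitelist validation: SELECT-only
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")

    # 4. Bind the authenticated user_id server-side (the LLM emits null placeholders)
    bound = [user_id if p is None else p for p in params]
    rows = [dict(r) for r in conn.execute(sql, bound)]

    # 5. Insight (a second Claude call in the real app) + cache write
    result = {"data": rows, "row_count": len(rows),
              "insight": summarize(rows), "cached": False}
    cache[key] = (time.monotonic() + ttl, result)
    return result
```

The real app also logs each query to `query_history`; that step is elided here.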

Features

| Feature | Implementation |
| --- | --- |
| NL → SQL | Claude (claude-sonnet-4-20250514) via the Anthropic API |
| Database | SQLite (parameterized ? placeholders — SQL-injection safe) |
| SQL Safety | Whitelist validation: only SELECT; blocks INSERT/UPDATE/DELETE/DROP/PRAGMA |
| Caching | In-process TTL dict cache; auto-upgrades to Redis if REDIS_URL is set |
| LLM Insights | A second Claude call turns raw rows into a 2–3 sentence insight |
| Query History | All queries logged to the query_history table with success/fail status and row count |
| Analytics | /analytics/{user_id} — category breakdown, monthly trends, top merchant |
| Error Handling | Invalid SQL, empty results, LLM failures, and bad inputs are all handled gracefully |
| Sample Data | 600 seeded transactions (3 users × 200 rows) across 8 categories |

Setup

1. Clone & install dependencies

git clone <repo-url>
cd query-engine
pip install -r requirements.txt

2. Configure environment

cp .env.example .env
# Edit .env and set your Anthropic API key:
# ANTHROPIC_API_KEY=sk-ant-...

Optional: set REDIS_URL=redis://localhost:6379/0 for Redis caching.

3. Run

python run.py
# Server starts at http://localhost:8000

The database is created and seeded automatically on first start.


API Reference

POST /query

Convert a natural language question into SQL, execute it, and return an AI-generated insight.

Request:

{
  "user_id": 1,
  "question": "How much did I spend on food last month?"
}

Response:

{
  "success": true,
  "question": "How much did I spend on food last month?",
  "sql": "SELECT SUM(amount) AS total_spent FROM transactions WHERE user_id = ? AND category = ? AND transaction_date >= date('now','start of month','-1 month') AND transaction_date < date('now','start of month')",
  "data": [{ "total_spent": 3240.50 }],
  "row_count": 1,
  "insight": "You spent β‚Ή3,240.50 on Food last month across multiple merchants. This is roughly β‚Ή108 per day β€” consider meal planning to bring this down.",
  "cached": false,
  "latency_ms": 1842.3
}

GET /analytics/{user_id}

Returns spending breakdown and monthly trends.

GET /analytics/1
{
  "user_id": 1,
  "total_spent": 483920.39,
  "total_transactions": 200,
  "avg_transaction": 2419.60,
  "top_category": "Education",
  "top_merchant": "Coursera",
  "by_category": [
    { "category": "Education", "total": 90946.84, "count": 36 },
    { "category": "Food", "total": 67979.18, "count": 30 }
  ],
  "by_month": [
    { "month": "2026-04", "total": 54449.22, "count": 28 }
  ]
}

GET /history/{user_id}

Returns query history for a user.

GET /history/1?limit=10

GET /health

{ "status": "ok", "cache_size": 3 }

Example Queries

These all work out of the box with the seeded data:

"How much did I spend on food last month?"
"Show me my top 5 most expensive transactions"
"What did I spend on entertainment this year?"
"How many transactions did I make on Swiggy?"
"What is my average transaction amount?"
"Show me my spending by category"
"Which month did I spend the most?"
"How much did I spend on Uber this year?"

Security

  • SQL Injection: All user-supplied values go through ? placeholders. The only strings ever interpolated are category/column names generated by the LLM under a whitelisted schema prompt.
  • SQL Whitelist: A _validate_sql() guard rejects any statement that does not start with SELECT or that contains DDL/DML keywords (INSERT, UPDATE, DELETE, DROP, ALTER, CREATE, PRAGMA, ATTACH).
  • Input validation: Pydantic models enforce user_id > 0 and 3 ≤ len(question) ≤ 500.
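A guard in this spirit can be written in a few lines. This is a hedged sketch of the approach, not the repo's exact _validate_sql():

```python
import re

# DDL/DML keywords the whitelist rejects
BLOCKED = {"INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE", "PRAGMA", "ATTACH"}

def validate_sql(sql: str) -> bool:
    """Allow a single SELECT statement; reject everything else."""
    stripped = sql.strip().rstrip(";")
    if not stripped.upper().startswith("SELECT"):
        return False
    if ";" in stripped:  # no stacked statements like "SELECT 1; DROP ..."
        return False
    # Tokenize on identifier characters so column names such as
    # "created_at" do not falsely trip the "CREATE" keyword check.
    tokens = re.findall(r"[A-Za-z_]+", stripped.upper())
    return not any(t in BLOCKED for t in tokens)
```

Tokenizing rather than substring-matching matters: a naive `"CREATE" in sql` check would reject harmless queries touching a `created_at` column.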

Testing

pytest tests/ -v

17 tests covering:

  • Database seeding and parameterized queries
  • Analytics endpoint structure
  • In-memory cache (get/set/TTL/delete/size)
  • SQL whitelist validation (SELECT allowed, DDL/DML blocked)
  • User ID injection into parameterized queries

Project Structure

query-engine/
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ main.py          # FastAPI app, routes, startup
β”‚   β”œβ”€β”€ models.py        # Pydantic request/response schemas
β”‚   β”œβ”€β”€ database.py      # SQLite layer, seeding, history, analytics
β”‚   β”œβ”€β”€ query_engine.py  # NLβ†’SQL, validation, execution, LLM insight
β”‚   └── cache.py         # InMemoryCache + RedisCache with auto-fallback
β”œβ”€β”€ tests/
β”‚   └── test_engine.py   # 17 unit tests
β”œβ”€β”€ run.py               # Entry point (uvicorn)
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ .env.example
└── README.md

Design Decisions

Why SQLite? Zero setup for the assignment; the database abstraction layer in database.py can be swapped for PostgreSQL by replacing sqlite3 with psycopg2 and changing ? placeholders to %s.
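For reference, the sqlite3 side of that abstraction looks like this (a minimal sketch with an illustrative schema; under psycopg2 only the ? placeholders would become %s):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (user_id INTEGER, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "Food", 240.0), (1, "Food", 180.5), (1, "Travel", 900.0)],
)

# Values travel as bound parameters, never via string interpolation
row = conn.execute(
    "SELECT SUM(amount) AS total FROM transactions WHERE user_id = ? AND category = ?",
    (1, "Food"),
).fetchone()
print(row[0])  # 420.5
```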

Why two LLM calls? The NL→SQL call must return structured JSON (sql + params). Mixing insight generation into that prompt degrades JSON reliability. Keeping them separate gives cleaner outputs and easier error handling.

Why separate param injection? Claude returns null as the user_id placeholder, which gets replaced with the real value server-side. This prevents prompt injection: an attacker who manipulates the question cannot override the user_id used in the query.
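That substitution is a one-liner (the name bind_user_id is illustrative, not necessarily the repo's helper):

```python
def bind_user_id(params, user_id):
    """Replace the LLM's null placeholders with the authenticated user_id.

    Because the value comes from the server side, nothing in the question
    text can redirect the query at another user's rows.
    """
    return [user_id if p is None else p for p in params]
```

For example, `bind_user_id([None, "Food"], 1)` yields `[1, "Food"]` before execution.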

Caching strategy: 5-minute TTL on successful query results. Cache key = query:{user_id}:{question.lower()}. Redis is used if REDIS_URL is set; falls back to in-process dict cache with zero config required.
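The in-process fallback can be as small as this (a sketch assuming the documented 5-minute TTL and key scheme; the repo's InMemoryCache may differ in detail):

```python
import time

class TTLCache:
    def __init__(self, ttl: float = 300.0):  # 5-minute default TTL
        self.ttl = ttl
        self._store = {}                     # key -> (expires_at, value)

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at < time.monotonic():
            del self._store[key]             # lazy eviction on read
            return None
        return value

def cache_key(user_id: int, question: str) -> str:
    return f"query:{user_id}:{question.lower()}"
```

Lazy eviction (expired entries removed only when read) keeps the implementation dependency-free; Redis handles expiry natively when REDIS_URL is set.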
