A production-ready Natural Language to SQL (NL2SQL) conversational agent built with FastAPI, Vanna 2.0, and multiple Large Language Models (Groq & Google Gemini). This system translates natural language questions into accurate SQL queries, executes them securely, and dynamically renders the results alongside interactive charts.
- Advanced NL2SQL Translation: Leverages Vanna 2.0 tied to the cutting-edge Groq (
llama-3.3-70b-versatile) model for blazing fast SQL generation. - Robust Multi-LLM Fallback Architecture: Automatically switches seamlessly to Google Gemini (
gemini-flash-latest) via the nativegoogle-genaiSDK if the Groq API rate limit (HTTP 429) is exhausted. ensuring 100% uptime and reliability. - Context-Aware Memory Seeding: Asynchronously pre-loaded with high-quality domain-specific Q&A pairs to instantly improve agent accuracy.
- Hardened SQL Validation Guard: 5-layer query validator that rigorously blocks destructive actions (
DROP,DELETE,UPDATE,INSERT), system tables, and multiline exploits. - Automated Chart Intelligence: Analyzes resultant datasets using Plotly and automatically determines the most effective visualization type (Bar, Pie, Line).
- Modern UI/UX Frontend: A sleek, fully-integrated Dark-Mode interface with interactive code highlighting, responsive design, and dynamic data tables.
- User Input: Natural language queries sent via standard UI
/chatinterface. - LLM Orchestration:
Vanna 2.0parses schema metadata and agent memory context. - Primary / Secondary Generation: High-speed query processing using Groq by default, seamlessly falling back to Gemini APIs on limits.
- Validation: Query validation strictly enforces
SELECTsemantics. - Execution & Rendering: Synthesized SQL executed against a pre-seeded SQLite clinic dataset (
1450+ records). - Response Payload: The backend serves JSON comprising the semantic answer, underlying SQL, execution time, and rendered JSON-ready Plotly charts.
- Backend: Python + FastAPI
- NL2SQL Engine: Vanna 2.0
- LLM APIs:
- Primary: Groq (Llama 3.3 70b)
- Fallback: Google Gemini (Gemini Flash via
google-genai)
- Data & Visualization: Pandas, Plotly
- Database: SQLite3
Ensure you have Python 3.10+ installed.
git clone https://github.com/Ranj8521Kumar/AI-Powered-NL2SQL-Chatbot-System.git
cd AI-Powered-NL2SQL-Chatbot-System/projectpython -m venv venv
# On Windows
venv\Scripts\activate
# On MacOS/Linux
source venv/bin/activate
pip install -r requirements.txtCopy the example environment file and insert your API keys:
cp .env.example .envInside .env, configure the following:
GROQ_API_KEY=your_groq_key # Primary LLM
GROQ_MODEL=llama-3.3-70b-versatile
GOOGLE_API_KEY=your_gemini_key # Fallback LLM
GEMINI_MODEL=gemini-flash-latest
DB_PATH=./clinic.db
HOST=0.0.0.0
PORT=8000Initialize the dummy database (clinic.db) with 1450+ records (Patients, Doctors, Invoices, Appointments, Treatments):
python setup_database.pyPre-train the Vanna Agent with context schema and questions:
python seed_memory.pyStart the FastAPI application using Uvicorn:
uvicorn main:app --host 0.0.0.0 --port 8000 --reload- Frontend Interface: http://localhost:8000
- API Documentation (Swagger): http://localhost:8000/api/docs
- Health Check: http://localhost:8000/health
Generates and executes SQL based on a user's natural language question.
Request:
{
"question": "What is the total revenue?"
}Response:
{
"message": "The total revenue is...",
"sql_query": "SELECT SUM(total_amount) AS total_revenue FROM invoices",
"columns": ["total_revenue"],
"rows": [[45800.5]],
"row_count": 1,
"execution_time_ms": 142
}Returns the operational status of the service, database connectivity, and the number of context queries pre-loaded globally into memory.
Response:
{
"status": "ok",
"database": "connected",
"agent_memory_items": 15
}An exhaustive 20-question comprehensive evaluation suite has been built into the system covering:
- Simple
SELECTqueries - Multi-table
JOINconstraints - Date/Temporal filtering
- Exception & boundary handling
- Direct SQL Injection blocking (returns HTTP exceptions)
The system passes 100% (20/20) of the test suites designed iteratively in RESULTS.md.
Ranjan Kumar Pandit
Designed and engineered as part of the rigorous NLP internship assignment requirements, focusing on production reliability, clean code architectures, fallback mechanisms, and robust security.


