Weather Q&A is a web application that translates natural-language weather questions into SQL queries and returns answers from a structured database.
The project was developed as a take-home AI/ML engineering task and focuses on practical LLM application design: prompt engineering, SQL generation, query validation, model backends, and evaluation of generated queries.
- Natural-language-to-SQL generation for weather-related questions
- FastAPI backend with a simple interactive web interface
- Multiple LLM backends: OpenAI, Hugging Face, and Ollama
- LangChain-based prompt orchestration
- SQL dialect adaptation for SQLite / SQL Server
- Dockerized setup for reproducible local deployment
- Evaluation scripts for SQL accuracy, validity, and latency
This project demonstrates how LLMs can be integrated into structured-data workflows where correctness, robustness, and query safety matter. It is a small but production-inspired example of turning natural-language interaction into reliable data access.
- FastAPI – Web backend and interactive interface.
- LangChain – Handles prompt logic and LLM orchestration.
- OpenAI / HuggingFace / Ollama – Backends for running different language models.
- Docker & Docker Compose – For easy deployment and environment isolation.
- PyODBC / SQL Server – Database engine and Python connectivity.
- Autocorrect – Improves robustness by correcting minor typos in natural language queries.
- Bidirectional NL ↔ SQL conversion for weather-related queries
- Friendly web interface with model selector, live feedback and error handling
- Multiple model backends (OpenAI, Ollama, HuggingFace)
- SQL dialect adaptation for SQLite and SQL Server (LIMIT → TOP, etc.)
- Automatic query correction and schema normalization
- Integrated LangChain pipeline with prompt templating, model orchestration, and fuzzy value correction
- Prompt engineering support and evaluation utilities
- Evaluation scripts for accuracy, validity, and latency
- Dockerized deployment support
Before running the project, make sure you have:
- Docker and Docker Compose installed (or run it locally)
- A valid OpenAI API key (optional, for GPT-based models)
- (Optional) Ollama installed — to use local language models
- Java 17+ (if you're running it locally)
This app uses the language_tool_python library, which depends on Java.
If you're running the app locally (not in Docker), you must have Java 17 or higher installed on your system.
To check your current version:
java --versionIf it's below version 17, download and install the latest Java version from Adoptium.
After installation, make sure your system uses the new version (restart terminal or adjust JAVA_HOME/PATH if needed).
git clone https://github.com/annaostrowska03/weather_qa_nl2sql.git
cd weather_qa_nl2sqlCreate a file named .env in the project root:
# SQL Server configuration
SA_PASSWORD=MyStr0ngP@ssword! # Required when using Docker for SQL Server
DB_SERVER=sqlserver # Use 'sqlserver' with docker-compose, or 'host.docker.internal' if running SQL Server separately
DB_PORT=1433
DB_NAME=WeatherDB
DB_USER=sa
DB_PASSWORD=MyStr0ngP@ssword! # Should match SA_PASSWORD
SQL_DRIVER=ODBC Driver 17 for SQL Server
# OpenAI (optional)
OPEN_API_KEY=sk-...
# Ollama (for local models)
OLLAMA_BASE_URL=http://host.docker.internal:11434Your password must be strong:
- Min. 8 characters
- Must include uppercase, lowercase, digit, and special character (e.g.
@,!,#)
This runs the app and SQL Server together, fully automated.
docker compose up --buildStop with:
docker compose downOpen your browser at: http://localhost:8000
You can also use docker run manually, but docker-compose is recommended for simplicity and proper DB initialization.
# Create virtual environment (optional)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Start the app
uvicorn app.main:app --reloadOpen your browser at: http://localhost:8000
To use local models like phi3:mini, mistral, or llama3:
- Windows: Download and run the installer from https://ollama.com/download
- Linux/macOS:
curl -fsSL https://ollama.com/install.sh | sh
ollama run mistral
# or
ollama run phi3:miniThe app will connect to Ollama at http://host.docker.internal:11434.
The recommended and most accurate model is GPT-4o mini – highest precision, fast and best overall results.
However, this model requires a valid OpenAI API key. If you don’t have one, you can still choose from several alternative models (e.g., phi3:mini, mistral, or juierror/text-to-sql-with-table-schema) that are open-source and run locally using Ollama – although their accuracy may be lower.
To use GPT-4o mini:
-
Enter it into the web interface when prompted.
Support for additional OpenAI models (gpt-3.5-turbo and gpt-4o) was added to provide more flexibility for users with different API access levels. These models were not tested due to lack of access, so their behavior and accuracy may vary.
| Model | Source | Requires API Key |
|---|---|---|
gpt-4o-mini, gpt-4o, gpt-3.5-turbo |
OpenAI | ✅ Yes |
phi3:mini, mistral, llama3 |
Ollama | ❌ No |
tscholak/1zha5ono |
HuggingFace | ❌ No |
You’re now ready! Open your browser at:
http://localhost:8000
Use the web UI to enter questions and select the model you want.
| Model | Source |
|---|---|
tscholak/1zha5ono |
HuggingFace |
juierror/text-to-sql-with-table-schema |
HuggingFace |
mistral |
Ollama |
phi3:mini |
Ollama |
llama3 |
Ollama |
gpt-4o-mini |
OpenAI API |
After evaluation on:
- Accuracy of SQL execution (compared with 'ideal' results from the database)
- Validity of SQL syntax
- Response time
The selected default models in the web interface were:
phi3:mini– for fast, accurate free usegpt-4o-mini– best overall accuracy with fast response (requires OpenAI API key)mistral– open-source fallback model (slower but reliable)juierror/text-to-sql-with-table-schema– baseline HuggingFace model (the fastest but lower accuracy)gpt-4oandgpt-3.5-turbofor users with different API access levels (not tested)
From tests/prompt_templates.py, the following prompt styles were tested:
default– descriptive, fuzzy matching enabled, detailed task instructionsfew_shot– includes 4 diverse examplesschema_only– minimal, schema-only baselineinstructional– assumes analyst personarag_style– schema in natural languageconversational– lightweight format
Each model was mapped to its suitable prompt style (it was checked in tests which prompt works the best for which model):
model_prompt_styles = {
"phi3:mini": "few_shot",
"gpt-4o-mini": "default",
"gpt-3.5-turbo": "default",
"gpt-4o": "default",
"mistral": "default",
"llama3": "default",
"tscholak/1zha5ono": "{corrected_question} | Weather : City, Temperature, Weather, Climate",
"juierror/text-to-sql-with-table-schema": "question: {corrected_question} Weather: City,Temperature,Weather,Climate"
}These prompt texts are available in the tests/prompt_templates.py file.
Hyperparameters tested:
-
SQL generation temp: 0.0, 0.1, 0.3,
-
Answer temp: 0.5, 0.7, 1.0,
-
Context length: 2048, Tokens: 64, 128.
Using evaluate.py, models were scored on:
- SQL accuracy vs. ground-truth answers
- SQL validity (via parsing)
- Query execution success
- Latency (response time)
Results were saved to:
tests/output/model_test_results.csvevaluation_results.csvevaluation_summary.csv
Query:
{
"question": "Where is it raining?",
"model": "phi3:mini",
"api_key": ""
}Response:
"sql": "SELECT City FROM Weather WHERE Weather LIKE '%rainy%'"
"result": "Bangalore"
"answer": "It is raining in Bangalore."The default database (WeatherDB) is automatically created and initialized using the init-db service defined in docker-compose.yml. This service runs a provided SQL script (init_db.sql) to seed example weather data during startup.
If you wish to use your own custom database schema, you can modify or replace the init_db.sql script. However, please note that prompt templates and model behavior are aligned to the original column names and data structure — adjustments may be required to maintain accuracy if your schema differs significantly.
The app expects a single table called Weather with the following columns:
| Column | Type | Description |
|---|---|---|
City |
TEXT | Name of the city |
Temperature |
INTEGER | Temperature (degrees) |
Weather |
TEXT | Weather description |
Climate |
TEXT | General climate classification |
You can inspect or edit this schema in the init_db.sql file.
Returns clear messages on:
- Empty or invalid questions
- Query translation failure
- No matching results
- Wrong/missing API Key value (when using gpt)
This project includes automatic, interactive documentation thanks to FastAPI.
- Swagger UI – Full interactive docs with try-it-now interface: http://localhost:8000/docs
- ReDoc – Clean, readable OpenAPI spec documentation: http://localhost:8000/redoc
weather_qa_nl2sql/
├── app/
│ ├── main.py # FastAPI
│ ├── nlp.py # NLP logic: generation, repair
│ ├── database.py # DB connection setup
│ ├── models.py # Pydantic Question schema (question, model, api_key)
├── tests/
│ ├── tests.py # Grid search on different models, prompts and hyperparamaters
│ ├── evaluate.py # Model performance evaluator
│ ├── questions.py # Natural language test questions
│ ├── prompt_templates.py # Tested prompts for models
├── app/templates/ # HTML (Jinja2-based)
├── Dockerfile
├── requirements.txt
├── .env # Config file (not committed)
├── init_db.sql # SQL init script