Transform Natural Language into Data Insights
InsightGen is an intelligent analytics dashboard that empowers users to explore and analyze structured datasets using plain English queries. Powered by Google's Gemini AI, it automatically converts natural language questions into SQL queries, executes them safely, and presents results through interactive visualizations and AI-generated insights.
Stop writing SQL. Start asking questions.
- Natural Language Queries - Ask questions in plain English instead of writing SQL
- AI-Powered SQL Generation - Gemini AI automatically generates optimal SQL queries
- Secure Execution - Strict query validation prevents harmful operations (only SELECT queries allowed)
- CSV & Excel Support - Upload custom datasets and analyze them instantly
- Auto Schema Detection - Automatically detects and understands your data structure
- Interactive Visualizations - Beautiful Plotly charts that adapt to your data
- Smart Chart Selection - AI chooses the best visualization for your data
- Query Explanations - Understand exactly what SQL was generated
- AI Insights - Get analytical interpretations of your query results
- Conversation Memory - Ask follow-up questions with full context
- Responsive Dashboard - Modern, user-friendly Streamlit interface
Screenshot: Main dashboard interface with natural language query input area, sample data loaded, and navigation sidebar
Screenshot: Automatic chart generation showing query results with interactive Plotly visualizations
Screenshot: Easy dataset upload interface supporting CSV and Excel files with auto-schema detection
┌─────────────────────────────────────────────────────────────┐
│                 USER INTERFACE (Streamlit)                  │
│         Upload Data | Ask Questions | View Results          │
└────────────────────┬────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────────┐
│                  QUERY PROCESSING PIPELINE                  │
├─────────────────────────────────────────────────────────────┤
│  • Schema Analysis           • AI Query Generation          │
│  • Query Validation          • Security Checks              │
└────────────────────┬────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────────┐
│               DATABASE & VISUALIZATION LAYER                │
├─────────────────────────────────────────────────────────────┤
│  • SQLite Database           • Plotly Visualizations        │
│  • Pandas Processing         • AI Insights Generation       │
└─────────────────────────────────────────────────────────────┘
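The flow in the diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_pipeline` and `fake_generate_sql` are hypothetical names, the Gemini call is replaced by a stub, and the `sales` table columns (`category`, `amount`) are invented for the demo. The real logic lives in `app.py` and its helper modules.

```python
import sqlite3
from typing import Callable, List, Tuple


def run_pipeline(
    question: str,
    conn: sqlite3.Connection,
    generate_sql: Callable[[str, str], str],
) -> Tuple[str, List[tuple]]:
    """Hypothetical flow: schema analysis -> SQL generation -> validation -> execution."""
    # 1. Schema analysis: collect CREATE TABLE statements as context for the model.
    schema = "\n".join(
        row[0]
        for row in conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
        )
    )
    # 2. AI query generation (stubbed here; the app itself calls Gemini).
    sql = generate_sql(question, schema).strip()
    # 3. Security check: refuse anything that is not a SELECT statement.
    if not sql.lower().startswith("select"):
        raise ValueError("Only SELECT queries are allowed")
    # 4. Execute against SQLite and hand the rows to the visualization layer.
    return sql, conn.execute(sql).fetchall()


def fake_generate_sql(question: str, schema: str) -> str:
    # Stand-in for the model call; always returns the same query.
    return "SELECT category, SUM(amount) FROM sales GROUP BY category ORDER BY category"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, amount REAL)")  # assumed columns
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Books", 10.0), ("Toys", 5.0), ("Books", 2.5)],
)
sql, rows = run_pipeline("Category wise total sales", conn, fake_generate_sql)
print(rows)  # [('Books', 12.5), ('Toys', 5.0)]
```

Passing the generator in as a callable keeps the sketch testable without an API key.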
| Component | Technology |
|---|---|
| Frontend | Streamlit 1.28.1 |
| Backend | Python 3.9+ |
| Database | SQLite3 |
| Data Processing | Pandas 2.0+ |
| Visualizations | Plotly 5.0+ |
| AI/LLM | Google Gemini API |
| Query Validation | Custom SQL Parser |
- Python 3.9 or higher
- pip (Python package manager)
- Google Gemini API Key (get it from aistudio.google.com)
# 1. Clone the repository
git clone https://github.com/Sansii18/InsightGen.git
cd InsightGen
# 2. Create virtual environment
python3 -m venv .venv
source .venv/bin/activate # macOS/Linux
# .venv\Scripts\activate # Windows
# 3. Install dependencies
pip install -r InsightGen/requirements.txt
# 4. Setup environment variables
cd InsightGen
echo "GEMINI_API_KEY=your_api_key_here" > .env
# 5. Initialize database
python setup_db.py
# 6. Run the application
streamlit run app.py
Open your browser to http://localhost:8501
- Open the Dashboard at http://localhost:8501
- Enter Your Question in natural language
- View Results with automatic visualizations
- Upload Your Data to analyze custom datasets
- Ask Follow-ups with full context awareness
• Category wise total sales
• Top 5 products by revenue
• Monthly sales trend
• Average order value by category
• Revenue distribution across products
• Number of orders per month
• Best performing category
• Sales growth year over year
InsightGen/
├── app.py               # Main application
├── setup_db.py          # Database setup
├── requirements.txt     # Dependencies
├── .env                 # Environment variables
├── sales.db             # SQLite database
├── screenshots/         # Demo screenshots
│   ├── Screenshot 2026-04-02 at 3.50.16 PM.png
│   ├── Screenshot 2026-04-02 at 3.52.38 PM.png
│   ├── Screenshot 2026-04-02 at 3.53.31 PM.png
│   ├── Screenshot 2026-04-02 at 3.55.10 PM.png
│   └── Screenshot 2026-04-02 at 3.55.36 PM.png
│
├── chart_utils.py       # Visualization utilities
├── schema_utils.py      # Schema analysis
├── memory_utils.py      # Conversation memory
├── sql_validator.py     # Query validation
└── README.md            # Documentation
InsightGen implements multiple security layers:
- Only SELECT queries are executed
- Blocks dangerous operations: INSERT, UPDATE, DELETE, DROP, ALTER, TRUNCATE
- SQL injection prevention through parameterized queries
- Safe error handling without exposing database details
- User input validation and sanitization
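A SELECT-only gate like the one described above might look roughly like this. It is a minimal sketch and not the contents of `sql_validator.py`; the function name `is_safe_select` is hypothetical, and a keyword scan is deliberately conservative (it would also reject a harmless string literal that contains the word DROP).

```python
import re

# Keywords listed in the security section above.
FORBIDDEN = ("INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE")


def is_safe_select(sql: str) -> bool:
    """Return True only for a single SELECT statement with no blocked keywords."""
    statement = sql.strip().rstrip(";").strip()
    # Reject stacked statements such as "SELECT 1; DROP TABLE sales".
    if ";" in statement:
        return False
    # The statement must begin with SELECT (case-insensitive).
    if not re.match(r"select\b", statement, flags=re.IGNORECASE):
        return False
    # Word-boundary match, so a column named "updated_at" is not flagged.
    pattern = r"\b(" + "|".join(FORBIDDEN) + r")\b"
    return re.search(pattern, statement, flags=re.IGNORECASE) is None


print(is_safe_select("SELECT category FROM sales;"))  # True
print(is_safe_select("SELECT 1; DROP TABLE sales"))   # False
```

For defense in depth, the same check can be paired with a read-only database connection, so that a statement that slips past the text filter still cannot modify data.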
- Business Intelligence - Quick ad-hoc analysis without data teams
- Data Exploration - Exploratory data analysis (EDA) and pattern discovery
- Decision Making - Quick data-driven insights for strategic decisions
- Education - Learning SQL through AI-generated examples
# Required
GEMINI_API_KEY=your_api_key_here
# Optional
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_HEADLESS=false
- Visit aistudio.google.com
- Click "Get API Key"
- Create new API key
- Copy and paste into the .env file
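To show how the `.env` values above could be read at startup, here is a standard-library sketch. It is an assumption about the mechanism, not a description of what `app.py` does (many projects use the python-dotenv package for this instead); `load_env_file` and `get_gemini_api_key` are hypothetical helper names.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_api_key(path: str = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(path).get("GEMINI_API_KEY", "")
    if not key or key == "your_api_key_here":
        raise RuntimeError("GEMINI_API_KEY is missing; add a valid key to .env")
    return key


# Demo with a throwaway file so no real key is needed.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / ".env"
    demo.write_text(
        "# Required\nGEMINI_API_KEY=abc123\n# Optional\nSTREAMLIT_SERVER_PORT=8501\n",
        encoding="utf-8",
    )
    settings = load_env_file(str(demo))
print(settings)  # {'GEMINI_API_KEY': 'abc123', 'STREAMLIT_SERVER_PORT': '8501'}
```

Rejecting the placeholder value `your_api_key_here` turns a forgotten edit into a clear startup error rather than a failed API call later.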
| Issue | Solution |
|---|---|
| API key error | Ensure .env file exists with valid key |
| "No such table: sales" | Run python setup_db.py |
| Port in use | Change port: streamlit run app.py --server.port 8502 |
| Import errors | Run pip install -r requirements.txt --upgrade |
Query: "Show me total sales by category"
Result: Bar chart with category breakdown and AI insights
Query: "What's the sales trend over months?"
Result: Line chart showing sales progression with trend analysis
Query: "Which products are top performers?"
Result: Ranked list with visualizations and performance metrics
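For a sense of what sits behind these examples, the sketch below runs the kind of SQL that could answer the trend and top-performer questions. The `sales` columns (`order_date`, `category`, `amount`) and the sample rows are assumptions made for illustration; the actual schema comes from `setup_db.py`, and the SQL Gemini produces may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema, for illustration only.
conn.execute("CREATE TABLE sales (order_date TEXT, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2024-01-05", "Books", 20.0),
        ("2024-01-20", "Toys", 15.0),
        ("2024-02-11", "Books", 30.0),
    ],
)

# "What's the sales trend over months?" -> one row per month, suited to a line chart.
monthly_sql = """
SELECT strftime('%Y-%m', order_date) AS month, SUM(amount) AS total_sales
FROM sales
GROUP BY month
ORDER BY month
"""
monthly = conn.execute(monthly_sql).fetchall()
print(monthly)  # [('2024-01', 35.0), ('2024-02', 30.0)]

# "Which category is the top performer?" -> ranked totals, highest first.
top_sql = """
SELECT category, SUM(amount) AS total_sales
FROM sales
GROUP BY category
ORDER BY total_sales DESC
LIMIT 1
"""
top = conn.execute(top_sql).fetchone()
print(top)  # ('Books', 50.0)
```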
- Remembers previous queries and results
- Understands context for follow-up questions
- Enables multi-step analysis workflows
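One simple way to get this behavior is a bounded history that is prepended to each new prompt. The sketch below is an assumption about the approach, not the contents of `memory_utils.py`; `Turn`, `ConversationMemory`, and the five-turn default are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class Turn:
    question: str  # what the user asked
    sql: str       # the SQL that was generated
    summary: str   # short description of the result


class ConversationMemory:
    """Keeps the most recent turns so follow-up questions have context."""

    def __init__(self, max_turns: int = 5) -> None:
        # deque(maxlen=...) silently drops the oldest turn when full.
        self._turns: Deque[Turn] = deque(maxlen=max_turns)

    def add(self, question: str, sql: str, summary: str) -> None:
        self._turns.append(Turn(question, sql, summary))

    def __len__(self) -> int:
        return len(self._turns)

    def as_prompt_context(self) -> str:
        """Render the history as text to include in the next model prompt."""
        lines = []
        for i, turn in enumerate(self._turns, start=1):
            lines.append(f"Q{i}: {turn.question}")
            lines.append(f"SQL{i}: {turn.sql}")
            lines.append(f"Result{i}: {turn.summary}")
        return "\n".join(lines)


memory = ConversationMemory(max_turns=2)
memory.add("Total sales by category", "SELECT ...", "3 categories")
memory.add("Only for 2024", "SELECT ... WHERE ...", "3 categories, 2024 only")
memory.add("Which one is highest?", "SELECT ... LIMIT 1", "Books")
print(len(memory))  # 2
```

Capping the history keeps the prompt size predictable, at the cost of forgetting older turns.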
- Time series → Line charts
- Categories → Bar charts
- Distribution → Histograms
- Relationships → Scatter plots
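The mapping above can be approximated with a rule-based fallback. This is a simplified sketch and not the code in `chart_utils.py`: it inspects plain Python values where the app would more likely look at pandas dtypes, the function names are hypothetical, and the `"table"` fallback is an assumption.

```python
from datetime import date, datetime
from numbers import Number
from typing import List, Sequence


def column_kind(values: Sequence[object]) -> str:
    """Classify a column as 'time', 'number', or 'category' from its values."""
    if values and all(isinstance(v, (date, datetime)) for v in values):
        return "time"
    # bool is a subclass of int, so exclude it from numeric columns.
    if values and all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "number"
    return "category"


def choose_chart(columns: List[Sequence[object]]) -> str:
    """Pick a chart type from the kinds of the result columns."""
    kinds = [column_kind(col) for col in columns]
    if kinds == ["number"]:
        return "histogram"  # a single numeric column: show its distribution
    if len(kinds) == 2 and kinds[1] == "number":
        # x-axis kind decides: time -> line, category -> bar, number -> scatter
        return {"time": "line", "category": "bar", "number": "scatter"}[kinds[0]]
    return "table"  # assumed fallback when no rule applies


print(choose_chart([[date(2024, 1, 1), date(2024, 2, 1)], [35.0, 30.0]]))  # line
print(choose_chart([["Books", "Toys"], [50.0, 15.0]]))                     # bar
```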
- Automatic data type detection
- Relationship understanding
- Column availability tracking
- Data range analysis
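For a SQLite-backed dataset, the items above can be gathered with the built-in `sqlite3` module. The following is a sketch under assumptions, not the code in `schema_utils.py`: `describe_schema` is a hypothetical name, and the demo table and its columns are invented.

```python
import sqlite3
from typing import Dict, List


def describe_schema(conn: sqlite3.Connection) -> Dict[str, List[dict]]:
    """Return column names, declared types, and value ranges for every table."""
    schema: Dict[str, List[dict]] = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (table,) in tables:
        columns = []
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        for _cid, name, col_type, *_rest in info:
            # Identifiers come from sqlite_master, not from user input.
            low, high = conn.execute(
                f'SELECT MIN("{name}"), MAX("{name}") FROM "{table}"'
            ).fetchone()
            columns.append({"name": name, "type": col_type, "min": low, "max": high})
        schema[table] = columns
    return schema


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, amount REAL)")  # assumed columns
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("Books", 10.0), ("Toys", 5.0)])
schema = describe_schema(conn)
print(schema["sales"][1])  # {'name': 'amount', 'type': 'REAL', 'min': 5.0, 'max': 10.0}
```

A summary like this can be passed to the model alongside the question so that generated SQL refers only to columns that exist.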
Contributions welcome! Areas for improvement:
- Support for more data sources (PostgreSQL, MySQL, etc.)
- Multi-language support
- Advanced data cleaning features
- Export results to PDF/Excel
- Scheduled reports
- Real-time data federation
This project is open source. See the LICENSE file for details.
Sanskar Sansii
- GitHub: @Sansii18
- Project: InsightGen Repository
If you find InsightGen useful:
- Give it a ⭐ on GitHub
- Share it with your network
- Report issues and suggest features
- Contribute improvements


