🚀 InsightGen – GenAI Analytics Dashboard

Transform Natural Language into Data Insights

InsightGen is an intelligent analytics dashboard that empowers users to explore and analyze structured datasets using plain English queries. Powered by Google's Gemini AI, it automatically converts natural language questions into SQL queries, executes them safely, and presents results through interactive visualizations and AI-generated insights.

Stop writing SQL. Start asking questions.


✨ Key Features

  • 🧠 Natural Language Queries - Ask questions in plain English instead of writing SQL
  • 🔍 AI-Powered SQL Generation - Gemini AI generates the SQL query for each question
  • 🛡️ Secure Execution - Strict query validation prevents harmful operations (only SELECT queries allowed)
  • 📂 CSV & Excel Support - Upload custom datasets and analyze them instantly
  • 📊 Auto Schema Detection - Automatically detects and understands your data structure
  • 📈 Interactive Visualizations - Beautiful Plotly charts that adapt to your data
  • 🎯 Smart Chart Selection - AI chooses the best visualization for your data
  • 📝 Query Explanations - Understand exactly what SQL was generated
  • 💡 AI Insights - Get analytical interpretations of your query results
  • 🔄 Conversation Memory - Ask follow-up questions with full context
  • 🖥️ Responsive Dashboard - Modern, user-friendly Streamlit interface

📸 Screenshots & Demo

Dashboard Overview - Main Interface

Screenshot (InsightGen Dashboard): main dashboard interface with natural language query input area, sample data loaded, and navigation sidebar

Query Results with Visualizations

Screenshots (Query Results): automatic chart generation showing query results with interactive Plotly visualizations

Data Upload Feature

Screenshot (Data Upload): dataset upload interface supporting CSV and Excel files with auto-schema detection
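
To make the upload step concrete, here is a minimal sketch of loading CSV text into a SQLite table. It is an illustration under stated assumptions, not the project's actual code: the function name `load_csv_into_sqlite` is hypothetical, and it uses only the standard library (`csv`, `sqlite3`) even though the project lists Pandas for data processing.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table, csv_text):
    """Create `table` from CSV text (header row required) and return the row count.

    Hypothetical helper for illustration; InsightGen's real upload code is not shown.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    # Quote identifiers so headers containing spaces are accepted.
    cols = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
    quoted_table = '"' + table.replace('"', '""') + '"'
    conn.execute(f"CREATE TABLE {quoted_table} ({cols})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO {quoted_table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
sample = "product,category,amount\nPen,Office,2.5\nDesk,Furniture,120\n"
row_count = load_csv_into_sqlite(conn, "sales", sample)
```

Columns are created without declared types here, so every value is stored as text; a fuller version would infer types before creating the table.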


🏗️ System Architecture

┌─────────────────────────────────────────────────────────────┐
│                    USER INTERFACE (Streamlit)               │
│              Upload Data | Ask Questions | View Results     │
└────────────────────┬────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────────┐
│              QUERY PROCESSING PIPELINE                      │
├─────────────────────────────────────────────────────────────┤
│  • Schema Analysis        • AI Query Generation             │
│  • Query Validation       • Security Checks                 │
└────────────────────┬────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────────┐
│           DATABASE & VISUALIZATION LAYER                    │
├─────────────────────────────────────────────────────────────┤
│  • SQLite Database        • Plotly Visualizations           │
│  • Pandas Processing      • AI Insights Generation          │
└─────────────────────────────────────────────────────────────┘

🧰 Technology Stack

| Component        | Technology        |
|------------------|-------------------|
| Frontend         | Streamlit 1.28.1  |
| Backend          | Python 3.9+       |
| Database         | SQLite3           |
| Data Processing  | Pandas 2.0+       |
| Visualizations   | Plotly 5.0+       |
| AI/LLM           | Google Gemini API |
| Query Validation | Custom SQL Parser |

🛠️ Installation Guide

Prerequisites

  • Python 3.9 or higher
  • pip (Python package manager)
  • Google Gemini API Key (get it from aistudio.google.com)

Quick Start

# 1. Clone the repository
git clone https://github.com/Sansii18/InsightGen.git
cd InsightGen

# 2. Create virtual environment
python3 -m venv .venv
source .venv/bin/activate    # macOS/Linux
# .venv\Scripts\activate      # Windows

# 3. Install dependencies
pip install -r InsightGen/requirements.txt

# 4. Setup environment variables
cd InsightGen
echo "GEMINI_API_KEY=your_api_key_here" > .env

# 5. Initialize database
python setup_db.py

# 6. Run the application
streamlit run app.py

Open your browser to http://localhost:8501


📖 How to Use

Basic Workflow

  1. Open the Dashboard at http://localhost:8501
  2. Enter Your Question in natural language
  3. View Results with automatic visualizations
  4. Upload Your Data to analyze custom datasets
  5. Ask Follow-ups with full context awareness

Example Queries

• Category-wise total sales
• Top 5 products by revenue
• Monthly sales trend
• Average order value by category
• Revenue distribution across products
• Number of orders per month
• Best performing category
• Sales growth year over year

📁 Project Structure

InsightGen/
├── app.py                      # Main application
├── setup_db.py                 # Database setup
├── requirements.txt            # Dependencies
├── .env                        # Environment variables
├── sales.db                    # SQLite database
├── screenshots/                # Demo screenshots
│   ├── Screenshot 2026-04-02 at 3.50.16 PM.png
│   ├── Screenshot 2026-04-02 at 3.52.38 PM.png
│   ├── Screenshot 2026-04-02 at 3.53.31 PM.png
│   ├── Screenshot 2026-04-02 at 3.55.10 PM.png
│   └── Screenshot 2026-04-02 at 3.55.36 PM.png
│
├── chart_utils.py              # Visualization utilities
├── schema_utils.py             # Schema analysis
├── memory_utils.py             # Conversation memory
├── sql_validator.py            # Query validation
└── README.md                   # Documentation

🔒 Security & Safety

InsightGen implements multiple security layers:

Query Validation

  • ✅ Only SELECT queries are executed
  • ❌ Blocks dangerous operations: INSERT, UPDATE, DELETE, DROP, ALTER, TRUNCATE
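
The rules above can be sketched as a small validation function. This is only an illustration of the SELECT-only idea; the contents of sql_validator.py are not shown in this README, so the name `is_safe_select` and the exact checks are assumptions.

```python
import re

# Keywords the README lists as blocked. The function below is a hypothetical
# sketch, not the project's sql_validator.py.
BLOCKED_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE")


def is_safe_select(sql: str) -> bool:
    """Return True only for a single statement that starts with SELECT
    and contains none of the blocked keywords as whole words."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or several statements chained together
    if not re.match(r"select\b", statement, flags=re.IGNORECASE):
        return False
    pattern = r"\b(" + "|".join(BLOCKED_KEYWORDS) + r")\b"
    return re.search(pattern, statement, flags=re.IGNORECASE) is None
```

The check is deliberately conservative: it also rejects harmless queries that contain a semicolon or a blocked word inside a string literal, and it rejects queries that begin with WITH. A keyword filter like this is best treated as one layer rather than the only safeguard.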

Additional Protections

  • SQL injection prevention through parameterized queries
  • Safe error handling without exposing database details
  • User input validation and sanitization
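
As a sketch of how such protections might look with the standard `sqlite3` module, the example below opens the database file read-only and binds a user-supplied value through a placeholder. The function names and the `sales` columns are hypothetical; the README does not show how InsightGen implements these protections.

```python
import sqlite3


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file in read-only mode via a URI filename."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def sales_for_category(conn: sqlite3.Connection, category: str) -> list:
    # The "?" placeholder lets SQLite bind the value, so it is never spliced
    # into the SQL text. Table and column names are assumed for illustration.
    cursor = conn.execute(
        "SELECT product, amount FROM sales WHERE category = ?", (category,)
    )
    return cursor.fetchall()
```

With a read-only connection, a write statement that slips past validation fails with `sqlite3.OperationalError` instead of modifying the data.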

💡 Example Use Cases

  • Business Intelligence - Quick ad-hoc analysis without data teams
  • Data Exploration - Exploratory data analysis (EDA) and pattern discovery
  • Decision Making - Quick data-driven insights for strategic decisions
  • Education - Learning SQL through AI-generated examples

⚙️ Configuration

Environment Variables

# Required
GEMINI_API_KEY=your_api_key_here

# Optional
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_HEADLESS=false

Getting Gemini API Key

  1. Visit aistudio.google.com
  2. Click "Get API Key"
  3. Create new API key
  4. Copy and paste into .env file
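
For illustration, a minimal sketch of reading the key at startup follows. How app.py actually loads .env is not shown in this README, so the helper names `parse_env_text` and `get_api_key` are assumptions, and the parsing is done with the standard library only.

```python
import os


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(env_text: str = "") -> str:
    """Prefer the process environment, then fall back to the .env contents."""
    key = os.environ.get("GEMINI_API_KEY") or parse_env_text(env_text).get(
        "GEMINI_API_KEY", ""
    )
    if not key or key == "your_api_key_here":
        raise RuntimeError("GEMINI_API_KEY is missing or still the placeholder value")
    return key
```

Failing early with a clear message makes the "API key error" case easier to diagnose than an error raised later by the API client.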

🐛 Troubleshooting

| Issue                  | Solution                                             |
|------------------------|------------------------------------------------------|
| API key error          | Ensure .env file exists with valid key               |
| "No such table: sales" | Run python setup_db.py                               |
| Port in use            | Change port: streamlit run app.py --server.port 8502 |
| Import errors          | Run pip install -r requirements.txt --upgrade        |

📊 Sample Queries & Results

Query: "Show me total sales by category"
Result: Bar chart with category breakdown and AI insights

Query: "What's the sales trend over months?"
Result: Line chart showing sales progression with trend analysis

Query: "Which products are top performers?"
Result: Ranked list with visualizations and performance metrics
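
To show what happens behind the first query, here is one plausible SQL translation run against a tiny in-memory table. The schema and rows are invented for the example; the actual columns of sales.db and the SQL that Gemini produces may differ.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Pen", "Office", 2.5), ("Stapler", "Office", 7.5), ("Desk", "Furniture", 120.0)],
)

# One plausible translation of "Show me total sales by category".
sql = """
SELECT category, SUM(amount) AS total_sales
FROM sales
GROUP BY category
ORDER BY total_sales DESC
"""
totals = conn.execute(sql).fetchall()
print(totals)  # [('Furniture', 120.0), ('Office', 10.0)]
```

A result with one categorical column and one numeric column, as here, is the shape that maps naturally to a bar chart.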

🚀 Advanced Features

Conversation Memory

  • Remembers previous queries and results
  • Understands context for follow-up questions
  • Enables multi-step analysis workflows
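
One simple way to get this behavior is a bounded buffer of recent turns that is rendered into the next prompt. The class below is a sketch under that assumption; memory_utils.py is not shown in this README, and `ConversationMemory` is a hypothetical name.

```python
from collections import deque


class ConversationMemory:
    """Keep the most recent question/SQL pairs for follow-up questions."""

    def __init__(self, max_turns: int = 5):
        # A deque with maxlen drops the oldest turn automatically.
        self.turns = deque(maxlen=max_turns)

    def add(self, question: str, sql: str) -> None:
        self.turns.append((question, sql))

    def as_context(self) -> str:
        """Format the stored turns as text to prepend to the next prompt."""
        return "\n".join(f"Q: {q}\nSQL: {s}" for q, s in self.turns)


memory = ConversationMemory(max_turns=2)
memory.add("Total sales by category", "SELECT category, SUM(amount) FROM sales GROUP BY category")
memory.add("Only the top 3", "SELECT category, SUM(amount) AS t FROM sales GROUP BY category ORDER BY t DESC LIMIT 3")
context = memory.as_context()
```

Capping the number of turns keeps the prompt size bounded while still giving the model enough context to resolve phrases such as "only the top 3".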

Auto-Chart Selection

  • Time series → Line charts
  • Categories → Bar charts
  • Distribution → Histograms
  • Relationships → Scatter plots
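
The mapping above can be sketched as a rule-based function over the result columns. This is a simplified illustration, not the logic in chart_utils.py (which is not shown here): the names `column_kind` and `pick_chart` are hypothetical, and it assumes date values have already been parsed into `date` or `datetime` objects.

```python
from datetime import date, datetime
from numbers import Number


def column_kind(values: list) -> str:
    """Classify a result column as 'time', 'number', or 'category'."""
    if not values:
        return "category"
    if all(isinstance(v, (date, datetime)) for v in values):
        return "time"
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "number"
    return "category"


def pick_chart(columns: dict) -> str:
    """Choose a chart type from the kinds of the result columns.

    `columns` maps column name -> list of values, in SELECT order.
    """
    kinds = [column_kind(values) for values in columns.values()]
    if len(kinds) == 1 and kinds[0] == "number":
        return "histogram"  # one numeric column: distribution
    if len(kinds) == 2 and kinds[1] == "number":
        if kinds[0] == "time":
            return "line"  # time series
        if kinds[0] == "category":
            return "bar"  # categories
        return "scatter"  # two numeric columns: relationship
    return "table"  # fallback when no rule matches
```

For example, `pick_chart({"category": ["A", "B"], "sales": [1.0, 2.0]})` selects a bar chart, while two numeric columns select a scatter plot.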

Schema Intelligence

  • Automatic data type detection
  • Relationship understanding
  • Column availability tracking
  • Data range analysis
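
As a sketch of the first step, automatic data type detection, SQLite itself can report each table's columns and declared types. The helpers below are hypothetical (schema_utils.py is not shown in this README) and use only `sqlite_master` and `PRAGMA table_info`, both standard SQLite features.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> dict:
    """Map each user table to a list of (column name, declared type) pairs."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    schema = {}
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        quoted = table.replace('"', '""')
        columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


def schema_as_prompt(schema: dict) -> str:
    """Render the schema as plain text that could be placed in an LLM prompt."""
    lines = []
    for table, columns in schema.items():
        cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
        lines.append(f"{table}({cols})")
    return "\n".join(lines)
```

Passing a compact schema description like `sales(order_date TEXT, category TEXT, amount REAL)` to the model gives it the column names it needs without sending any row data.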

🤝 Contributing

Contributions welcome! Areas for improvement:

  • Support for more data sources (PostgreSQL, MySQL, etc.)
  • Multi-language support
  • Advanced data cleaning features
  • Export results to PDF/Excel
  • Scheduled reports
  • Real-time data federation

📜 License

This project is open source. See the LICENSE file for details.


👤 Author

Sanskar Sansii


⭐ Support

If you find InsightGen useful:

  • Give it a ⭐ on GitHub
  • Share it with your network
  • Report issues and suggest features
  • Contribute improvements

🔗 Useful Links

