PolySensor

AI-Powered Multi-Modal Content Analyzer


PolySensor is a full-stack web application that uses AI to analyze and extract insights from documents, images, audio, and video. Built with React, Flask, LangChain, and Google's Gemini model, it turns unstructured content into structured analysis through a web interface.

🌟 What Makes PolySensor Unique?

🧠 Intelligent Content Understanding

  • Multi-Modal Analysis: Seamlessly processes documents, images, audio, and video files
  • Advanced RAG Architecture: Utilizes Retrieval-Augmented Generation with ChromaDB vector database for contextual chat responses
  • Universal Text Extraction: Capable of extracting and analyzing textual information from almost any format
  • Persistent Analysis Storage: Stores AI-generated analysis results in vector database for semantic search and retrieval

🔄 Powered by State-of-the-Art AI

  • Google Gemini Integration: Uses Google's Gemini 2.5 Pro model for analysis and chat responses
  • LangChain Orchestration: Manages prompts and model calls through LangChain, with a specialized prompt for each media type
  • Smart Context Awareness: Understands content relationships and nuances through semantic search
  • Conversational Memory: Chat interface with RAG-powered responses using past analysis results

📊 Comprehensive Content Coverage

From research papers to multimedia presentations, PolySensor delivers deep analytical insights across:

  • Academic & Technical Documents
  • Business Reports & Presentations
  • Multimedia Content & Recordings
  • Visual Data & Infographics
  • Semantic Search: Access previous analyses through vector similarity search

graph TB
    subgraph "🌐 Frontend Layer"
        U[👤 User] --> FE[⚛️ React Frontend<br/>File Upload & Chat Interface]
        FE --> API[📡 /analyze0<br/>File Upload]
        FE --> CHAT[💬 /chat<br/>Conversation + History]
    end

    subgraph "🔧 Backend Processing Layer"
        API --> B[🔄 File Type Detection<br/>Flask API]

        B --> C{File Type?}
        C -->|Documents| D[📄 Unstructured<br/>Text Extraction]
        C -->|Images| E[🖼️ Base64 Encoding<br/>+ Validation]
        C -->|Audio| F[🎵 Base64 Encoding<br/>+ Validation]
        C -->|Video| G[🎬 Base64 Encoding<br/>+ Validation]

        D --> H[📝 Document Prompt<br/>LangChain Template]
        E --> I[🖼️ Image Prompt<br/>Direct to LLM]
        F --> J[🎵 Audio Prompt<br/>Direct to LLM]
        G --> K[🎬 Video Prompt<br/>Direct to LLM]

        H --> L[🧠 Google Gemini<br/>2.5 Pro]
        I --> L
        J --> L
        K --> L
    end

    subgraph "💾 Vector Database Layer"
        L --> M[📥 Store Analysis<br/>ChromaDB + Metadata]
        CHAT --> N[🔍 Semantic Search<br/>Vector Similarity]
        N --> O[🔄 RAG Context<br/>Augment Query]
        O --> L
    end

    subgraph "📤 Response Layer"
        L --> P[📈 Analysis Results<br/>JSON Response]
        P --> FE2[⚛️ Frontend Display<br/>Markdown Rendering]
        FE2 --> U2[👤 User Views<br/>Analysis & Export]

        L --> Q[💬 Chat Response<br/>RAG-Enhanced Answer]
        Q --> FE3[⚛️ Chat Display<br/>Typing Indicators]
        FE3 --> U3[👤 User Continues<br/>Conversation]
    end

    %% Styling with better contrast
    classDef frontend fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000000
    classDef backend fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px,color:#000000
    classDef processing fill:#fff3e0,stroke:#ef6c00,stroke-width:2px,color:#000000
    classDef ai fill:#fce4ec,stroke:#c2185b,stroke-width:2px,color:#000000
    classDef vector fill:#e1f5fe,stroke:#0277bd,stroke-width:2px,color:#000000
    classDef response fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000000

    class U,FE,API,CHAT,FE2,FE3,U2,U3 frontend
    class B,C,D,E,F,G backend
    class H,I,J,K processing
    class L ai
    class M,N,O vector
    class P,Q response
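
The chat path in the diagram (semantic search, then context augmentation, then the model call) can be sketched in plain Python. This is a minimal illustration and not PolySensor's actual code: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for ChromaDB, and the names `AnalysisStore` and `build_rag_prompt` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class AnalysisStore:
    """In-memory stand-in for the vector database (hypothetical class)."""

    def __init__(self):
        self._items = []  # list of (vector, text, metadata)

    def add(self, text, metadata=None):
        self._items.append((embed(text), text, metadata or {}))

    def search(self, query, k=2):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text, _ in ranked[:k]]


def build_rag_prompt(store, question, k=2):
    """Augment the user's question with the most relevant stored analyses."""
    context = "\n---\n".join(store.search(question, k))
    return f"Context from past analyses:\n{context}\n\nQuestion: {question}"


store = AnalysisStore()
store.add("Quarterly sales report shows revenue growth in EMEA", {"file": "report.pdf"})
store.add("Lecture recording about graph algorithms and shortest paths", {"file": "lecture.mp3"})
prompt = build_rag_prompt(store, "What did the sales report say about revenue?", k=1)
print(prompt)
```

In the real application the augmented prompt would be passed to the Gemini model; here it is only printed so the retrieval step can be inspected on its own.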

🌟 Features

Frontend Features

  • Intuitive Drag-and-Drop Interface: Seamless file upload with real-time validation and preview
  • Responsive Design: Modern React-based UI that works across devices and screen sizes
  • Real-Time Processing Feedback: Loading indicators and error handling for smooth user experience
  • Rich Markdown Rendering: Beautiful display of AI analysis results using markdown-to-jsx
  • Conversational Chat Interface: RAG-powered chat with past analysis context and typing indicators
  • One-Click PDF Export: Export analysis results to PDF using backend ReportLab generation with frontend fallback using html2canvas and jsPDF

Backend Features

  • Flask API Server: Multiple RESTful endpoints (/analyze0 for file analysis, /chat for RAG conversations, /export-pdf for reports, vector DB endpoints) with CORS enabled for frontend communication
  • File Type Detection: Extension-based routing for documents, images, audio, and video files
  • Document Processing: Uses Unstructured library to extract and convert content from 40+ document formats into JSON for AI analysis
  • Media Validation: Length limits (1 minute for audio, 30 seconds for video) with premium upgrade messaging for longer content
  • AI Integration: Google Gemini 2.5 Pro model with LangChain orchestration using specialized prompts for each media type
  • Vector Database: ChromaDB with Google embeddings for persistent storage and semantic search of analysis results
  • Temporary Storage: Secure file handling with automatic cleanup of uploaded files and temp directories
  • Error Handling: JSON responses for file validation errors, processing failures, and unsupported formats
  • Content Analysis: Structured analysis output in table format for all media types with pattern detection and insights
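
As a rough sketch of the File Type Detection and Media Validation features above, the following shows extension-based routing followed by a length check. The function names, the extension subsets, and the choice to treat every unrecognized extension as a document are assumptions made for illustration; only the 1 minute and 30 second limits come from this README.

```python
from pathlib import Path

# Illustrative subsets of the extensions listed in this README
CATEGORIES = {
    "image": {".jpg", ".jpeg", ".png", ".gif", ".webp"},
    "audio": {".mp3", ".wav", ".flac", ".m4a", ".ogg"},
    "video": {".mp4", ".mkv", ".webm", ".mov", ".avi"},
}

# Length limits from the Media Validation feature, in seconds
MAX_SECONDS = {"audio": 60, "video": 30}


def detect_category(filename):
    """Map a filename to 'image', 'audio', 'video', or 'document' by extension."""
    ext = Path(filename).suffix.lower()
    for category, extensions in CATEGORIES.items():
        if ext in extensions:
            return category
    return "document"  # assumption: everything else goes to document extraction


def validate_duration(category, seconds):
    """Return (ok, message) for a media file of the given length in seconds."""
    limit = MAX_SECONDS.get(category)
    if limit is not None and seconds > limit:
        return False, f"{category} is {seconds:.0f}s long; the limit is {limit}s"
    return True, "ok"


print(detect_category("Talk.MP4"))
print(validate_duration("video", 45))
```

The duration itself would come from a media library (the project lists Mutagen and MoviePy); it is passed in as a plain number here so the check stays independent of how it is measured.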

📸 Screenshots

Demo

  • Home page of PolySensor
  • Chatting with PolySensor
  • Drag-and-drop or file upload
  • Analysis results of the chosen file
  • Exporting the analysis report as PDF
  • Exported PDF

PolySensor Analysis Results

  • Raw file content: presentation file (.ppt)
  • Analysis results display

📋 Supported File Formats (108+)

Image Files

Extension(s) Description
.jpg, .jpeg JPEG images
.png Portable Network Graphics
.gif Graphics Interchange Format
.webp WebP image
.heic High Efficiency Image Format
.tif, .tiff Tagged Image File Format
.bmp Bitmap Image File
.svg Scalable Vector Graphics
.ico Icon Image File
.avif AV1 Image File Format
.raw, .cr2, .nef, .arw, .orf, .rw2 Camera RAW Image Formats

Audio Files

Extension(s) Description
.mp3 MPEG Audio Layer III
.wav Waveform Audio File Format
.flac Free Lossless Audio Codec
.aiff, .aif Audio Interchange File Format
.ogg, .oga Ogg Vorbis Audio File
.m4a MPEG-4 Audio Format
.aac Advanced Audio Coding File
.opus Opus Audio File

Video Files

Extension(s) Description
.mp4 MPEG-4 Part 14 Video
.mkv Matroska Video File
.webm WebM Video File
.mov Apple QuickTime Movie
.avi Audio Video Interleave
.m4v MPEG-4 Video File
.wmv Windows Media Video
.flv Flash Video Format

Documents and Text Files

Extension(s) Description
.pdf Portable Document Format
.docx Microsoft Word Open XML Document
.doc Microsoft Word Document (older format)
.txt Plain Text File
.odt OpenDocument Text File
.rtf Rich Text Format
.md Markdown Documentation
.epub Electronic Publication
.hwp Hangul Word Processor File
.abw, .zabw AbiWord Document
.org Lotus Organizer Data File or Data Analysis File
.rst reStructuredText File
.tex LaTeX Document Source File
.csv Comma-Separated Values File
.json JavaScript Object Notation File
.yaml, .yml YAML Data File
.log Log Text File
.ini, .cfg, .conf Configuration File
.xps XML Paper Specification File

Spreadsheets

Extension(s) Description
.xlsx Microsoft Excel Open XML Spreadsheet
.xls Microsoft Excel Spreadsheet (older format)
.csv Comma-Separated Values File
.tsv Tab-Separated Values File
.fods OpenDocument Flat XML Spreadsheet
.dif Data Interchange Format File
.dbf dBase Database File
.et E-Text Spreadsheet
.numbers Apple Numbers Spreadsheet

Presentations

Extension(s) Description
.pptx Microsoft PowerPoint Open XML Presentation
.ppt Microsoft PowerPoint Presentation (older format)
.pptm Microsoft PowerPoint Macro-Enabled Presentation
.pot, .potx Microsoft PowerPoint Template
.odp OpenDocument Presentation
.key Apple Keynote Presentation

Email Files

Extension(s) Description
.msg Microsoft Outlook Message
.eml Electronic Mail File
.p7s PKCS #7 Signature File Format
.mbox Mailbox Storage File

Code & Development Files

Extension(s) Description
.py Python Source Code
.js JavaScript File
.ts TypeScript File
.java Java Source File
.cpp, .cc, .cxx C++ Source Code
.c C Language Source Code
.h, .hpp Header File
.html, .htm Hypertext Markup Language File
.css Cascading Style Sheets File
.php PHP Source File
.json JSON Data File
.xml XML Markup File
.yaml, .yml YAML Data File
.ipynb Jupyter Notebook
.r, .Rmd R Script or R Markdown File
.go Go Language Source File
.rb Ruby Source Code
.sh, .bash Shell Script
.bat, .cmd Batch Script
.swift Swift Source File
.kt, .kts Kotlin Script
.pl Perl Script
.lua Lua Script
.sql SQL Database Query File

Other File Types

Extension(s) Description
.xml Extensible Markup Language File
.html, .htm Hypertext Markup Language File
.md Markdown Documentation
.cwk AppleWorks Document
.mcw Microchip MPLAB Workspace
.prn Print to File
.eth Ethnograph Data File
.pbd PowerBuilder Document
.sdp Session Description Protocol File
.mw MATLAB Workspace File
.sxg Signed Exchange File
.zip, .rar, .7z, .tar, .gz Compressed Archive Files
.exe, .app Executable Files (sometimes analyzed in sandboxed mode)
.apk Android Application Package
.blend Blender 3D Project File
.fbx, .obj, .stl, .glb, .gltf 3D Model Files

🚀 Quick Start

Prerequisites

  • Python 3.11 or later (3.11 is recommended for compatibility with Unstructured)
  • Node.js 16+ and npm
  • Google Gemini API key
  • Tesseract OCR (for image/text extraction)

Installation

  1. Clone the repository
git clone https://github.com/adityasinghcoding/PolySensor.git
cd PolySensor
  2. Backend Setup
# Install Python dependencies
pip install -r requirements.txt

# Install Tesseract OCR
# Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
# Mac: brew install tesseract
# Linux: sudo apt-get install tesseract-ocr

# Set up environment variables
cp .env.example .env
# Edit .env and add your Google API key as the line:
#   GOOGLE_API_KEY=your_api_key_here
  3. Frontend Setup
# Navigate to frontend directory
cd frontend

# Install Node.js dependencies
npm install

# Return to root directory
cd ..

Usage

  1. Start the Backend Server
python main.py

The Flask server will start on http://localhost:5000

  2. Start the Frontend (in a new terminal)
cd frontend
npm run dev

The React app will be available at http://localhost:5173

  3. Access the Application: open your browser and navigate to http://localhost:5173 to use the web interface. Upload files through the drag-and-drop interface and receive AI-powered analysis results.
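
Besides the web interface, the backend can in principle be called directly over HTTP. The sketch below builds a POST request for the /chat endpoint using only the standard library. The JSON field names `message` and `history` are assumptions (this README does not document the request schema), so check main.py for the real shape before relying on it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # backend address from the Usage section


def build_chat_request(message, history=None):
    """Build (but do not send) a POST request for the /chat endpoint."""
    # Hypothetical payload: the field names are not confirmed by the README
    payload = {"message": message, "history": history or []}
    return urllib.request.Request(
        f"{BASE_URL}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("Summarize my last upload")
print(req.get_method(), req.full_url)
```

With the Flask server running, the request could be sent with `urllib.request.urlopen(req, timeout=60)` and the response body parsed with `json.loads`.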

🐳 Docker Setup (Recommended)

For easier development and deployment, PolySensor supports Docker. This ensures consistent environments and handles system dependencies automatically.

Prerequisites

  • Docker and Docker Compose installed on your system
  • Google Gemini API key

Quick Start with Docker

  1. Clone the repository
git clone https://github.com/adityasinghcoding/PolySensor.git
cd PolySensor
  2. Set up environment variables
cp .env.example .env
# Edit .env and add your Google API key as the line:
#   GOOGLE_API_KEY=your_api_key_here
  3. Run the application
docker-compose up --build
  4. Access the Application
  • Frontend: http://localhost:5173
  • Backend API: http://localhost:5000

Docker Commands

  • Start services: docker-compose up
  • Build and start: docker-compose up --build
  • Stop services: docker-compose down
  • View logs: docker-compose logs -f [service_name]
  • Rebuild specific service: docker-compose up --build [service_name]

Production Deployment

  • Backend: The render.yaml is configured for Docker deployment on Render
  • Frontend: Deploy to Vercel as usual (static hosting)

🏗️ Project Structure

PolySensor/
├── main.py                    # Flask backend entry point with RAG chat endpoints
├── data_handling.py           # Content extraction and validation utilities
├── prompts.py                 # AI prompt templates for different content types
├── vector_db.py               # ChromaDB vector database management
├── api_endpoints.py           # Additional vector database API endpoints
├── test_vector_db.py          # Vector database testing utilities
├── requirements.txt           # Python dependencies
├── .env                       # Environment variables (API keys)
├── .gitignore                 # Git ignore rules
├── README.md                  # Project README
├── docker-compose.yml         # Docker Compose configuration
├── Dockerfile                 # Backend Docker configuration
├── .dockerignore              # Docker ignore rules
├── package.json               # Root package.json
├── package-lock.json          # Root package-lock.json
├── vite.config.js             # Vite build configuration
├── chroma_db/                 # ChromaDB persistent storage
├── Docs/                      # Documentation and diagrams
├── assets/                    # Project assets and screenshots
├── md/                        # Markdown files
├── utilities/                 # Additional tools and executables
└── frontend/                  # React frontend application
    ├── public/
    │   ├── index.html         # Main HTML template
    │   ├── favicon.ico
    │   ├── manifest.json      # PWA manifest
    │   └── robots.txt
    ├── src/
    │   ├── components/           # Reusable React components
    │   │   ├── AnalysisResults/  # Displays analysis output with markdown support
    │   │   │   ├── AnalysisResults.jsx
    │   │   │   ├── AnalysisResults.css
    │   │   │   └── index.js
    │   │   ├── FileUploader/     # Drag-and-drop file upload interface
    │   │   │   ├── FileUploader.jsx
    │   │   │   ├── FileUploader.css
    │   │   │   └── index.js
    │   │   ├── Loading/          # Loading spinner component
    │   │   │   ├── Loading.jsx
    │   │   │   ├── Loading.css
    │   │   │   └── index.js
    │   │   ├── InputArea.jsx     # Chat input component
    │   │   ├── InputArea.css     # Chat input styles
    │   │   ├── TypingIndicator.jsx # Animated typing indicator
    │   │   └── TypingIndicator.css # Typing indicator styles
    │   ├── pages/
    │   │   └── AnalyzePage.jsx   # Main analysis page
    │   ├── utils/
    │   │   └── apiService.js     # API communication utilities
    │   ├── App.jsx               # Main React application component
    │   ├── App.css               # Global styles
    │   ├── main.jsx              # React application entry point
    │   ├── index.css             # Base styles
    │   └── assets/               # Frontend assets
    ├── package.json              # Node.js dependencies and scripts
    ├── Dockerfile                # Frontend Docker configuration
    ├── nginx.conf                # Nginx configuration for production
    └── vercel.json               # Vercel deployment configuration

Core Modules

Backend:

  • main.py: Flask API server with CORS support, handles file uploads, AI processing, and RAG chat
  • data_handling.py: Content extraction functions for documents, images, audio, and video
  • prompts.py: Specialized prompts for different media types optimized for Gemini AI
  • vector_db.py: ChromaDB vector database management for persistent analysis storage and semantic search
  • api_endpoints.py: Additional vector database API endpoints for search and management
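
To illustrate the kind of helper a module like data_handling.py might contain, here is a minimal sketch of the base64 step that the architecture diagram shows for images, audio, and video. The function names `encode_media` and `to_data_url` are hypothetical, and the data URL format is only one common way to pass inline media to a model; the project's real encoding may differ.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def encode_media(path):
    """Read a media file and return (mime_type, base64_string)."""
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None:
        raise ValueError(f"Cannot determine media type for {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return mime_type, encoded


def to_data_url(mime_type, encoded):
    """Format the payload as a data URL."""
    return f"data:{mime_type};base64,{encoded}"


# Demo with a throwaway file that contains only the 8-byte PNG signature
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "pixel.png"
    sample.write_bytes(b"\x89PNG\r\n\x1a\n")
    mime, payload = encode_media(sample)

print(to_data_url(mime, payload))
```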

Frontend:

  • App.jsx: Main application router and state management with analysis history
  • AnalyzePage.jsx: Core analysis interface with file upload, results display, and chat functionality
  • FileUploader.jsx: Drag-and-drop file upload component with validation
  • AnalysisResults.jsx: Markdown rendering component with PDF export functionality
  • InputArea.jsx: Chat input component with typing indicators
  • TypingIndicator.jsx: Animated typing indicator for chat responses
  • apiService.js: Axios-based API client for backend communication (analyzeFile, analyzeText, exportPDF)

🔧 Configuration

API Keys

Get your Google Gemini API key from Google AI Studio and add it to your .env file:

GOOGLE_API_KEY=your_actual_api_key_here
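
A small startup check can make a missing or placeholder key fail early with a clear message. This is a sketch under assumptions: `get_api_key` is a hypothetical helper, and it expects the .env file to have been loaded into the process environment already (for example with python-dotenv's `load_dotenv()`, which the project lists as a dependency).

```python
import os


def get_api_key(env=None):
    """Return GOOGLE_API_KEY from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    # Reject an empty value and the placeholder copied from the example above
    if not key or key == "your_actual_api_key_here":
        raise RuntimeError("GOOGLE_API_KEY is missing; add it to your .env file")
    return key


print(get_api_key({"GOOGLE_API_KEY": "example-key"}))
```

Accepting the environment as a parameter keeps the function easy to test without touching the real process environment.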

Customizing Analysis

You can modify the analysis prompts in prompts.py to tailor the output to your specific needs:

# Example custom prompt
# {content_data} is a placeholder that is filled with the extraction function's output
CUSTOM_ANALYSIS = '''
Analyze this content and focus on technical details:

Content: {content_data}

Please provide:
1. Technical specifications
2. Implementation details
3. Potential improvements
'''
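
To see how the placeholder is filled, the sketch below substitutes sample text into the template with Python's built-in `str.format`. The helper name `render_prompt` is hypothetical, and the project itself routes prompts through LangChain templates, so treat this only as an illustration of the substitution step.

```python
CUSTOM_ANALYSIS = '''
Analyze this content and focus on technical details:

Content: {content_data}

Please provide:
1. Technical specifications
2. Implementation details
3. Potential improvements
'''


def render_prompt(template, content_data):
    """Substitute extracted content into the template's {content_data} placeholder."""
    return template.format(content_data=content_data)


prompt = render_prompt(CUSTOM_ANALYSIS, "The API exposes three endpoints ...")
print(prompt)
```

Any literal curly braces in a custom template need to be doubled (`{{` and `}}`) so that they are not mistaken for placeholders.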

💡 Examples

Document Analysis

# Input: research_paper.pdf
# Output: Summary of key findings, methodology, and conclusions

Image Analysis

# Input: diagram.png  
# Output: Extracted text + analysis of visual content and structure

Video Analysis

# Input: presentation.mp4
# Output: Combined analysis of slide content and spoken presentation

🛠️ Development

Adding New File Types

  1. Add file extension detection in main.py:
# The trailing comma makes this a tuple, so more extensions can be added later
if file_path.lower().endswith(('.new_extension',)):
    new_data = new_extraction_function(file_path)
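  2. Create the extraction function in data_handling.py:
def new_extraction_function(file_path):
    # Implement extraction logic; this placeholder reads the file as UTF-8 text
    with open(file_path, encoding='utf-8') as f:
        extracted_content = f.read()
    return extracted_content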
  3. Add a prompt template in prompts.py:
NEW_TYPE = '''
Your analysis prompt for new file type...
'''
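
The three steps above can also be tied together with a small registry, so that adding a file type means registering one function and one prompt. This is an alternative design sketch and not how main.py is currently organized; `EXTRACTORS`, `register`, and `build_prompt` are hypothetical names.

```python
from pathlib import Path

EXTRACTORS = {}  # extension -> (extraction function, prompt template)


def register(extension, prompt):
    """Decorator that registers an extraction function for one extension."""
    def decorator(func):
        EXTRACTORS[extension.lower()] = (func, prompt)
        return func
    return decorator


NEW_TYPE = "Summarize the following content:\n{content_data}"


@register(".new_extension", NEW_TYPE)
def new_extraction_function(file_path):
    # Placeholder logic: read the file as UTF-8 text
    return Path(file_path).read_text(encoding="utf-8")


def build_prompt(file_path):
    """Look up the extractor by extension and fill in its prompt template."""
    ext = Path(file_path).suffix.lower()
    if ext not in EXTRACTORS:
        raise ValueError(f"Unsupported file type: {ext}")
    extract, prompt = EXTRACTORS[ext]
    return prompt.format(content_data=extract(file_path))
```

With this layout, `build_prompt("notes.new_extension")` would return the filled prompt ready to send to the model, and unsupported extensions fail with an explicit error instead of falling through silently.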

Running Tests

# Add tests to the repository and run with:
python -m pytest tests/

📊 Output Examples

The AI provides structured analysis including:

  • Key Points: Main takeaways from the content
  • Summary: Concise overview of the material
  • Actionable Insights: Practical recommendations
  • Ambiguity Detection: Identification of unclear sections

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

🐛 Troubleshooting

Common Issues

"File does not exist" error

  • Check file path spelling
  • Use absolute paths if needed

OCR not working

  • Verify Tesseract installation
  • Check image quality and resolution
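
A quick way to verify the installation from Python is to look for the `tesseract` executable on the PATH and ask it for its version. The helper name `tesseract_status` is hypothetical; the script uses only the standard library.

```python
import shutil
import subprocess


def tesseract_status():
    """Return a one-line description of the Tesseract installation."""
    path = shutil.which("tesseract")
    if path is None:
        return "tesseract not found on PATH"
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some versions print the version banner to stderr instead of stdout
    output = (result.stdout or result.stderr).strip()
    first_line = output.splitlines()[0] if output else "unknown version"
    return f"found at {path}: {first_line}"


print(tesseract_status())
```

If the executable is not found, reinstall Tesseract or add its install directory to the PATH, then restart the backend.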

API key errors

  • Ensure .env file is in the project root
  • Verify the API key has sufficient permissions

Audio/video processing slow

  • Large files may take time to process
  • Trim long recordings before uploading; the backend limits audio to 1 minute and video to 30 seconds

Getting Help

  • Check existing Issues for similar problems
  • Create a new issue with detailed error messages and file examples

👥 Credits & Acknowledgments

🧑‍💻 Project Creator

Aditya Singh

  • Artificial Intelligence Engineer
  • Project Lead & Full-Stack Developer
  • AI/ML Integration & Architecture Design
  • Multi-modal Content Processing Specialist

🛠️ Technologies & Libraries

This project stands on the shoulders of these amazing open-source technologies:

Technology Purpose Credit
React Frontend Framework Meta
Vite Build Tool & Dev Server Vite
Flask Backend Web Framework Pallets
Flask-CORS Cross-Origin Resource Sharing Flask-CORS
Google Gemini AI Language Model Google AI
LangChain LLM Orchestration LangChain AI
ChromaDB Vector Database for RAG Chroma
Sentence Transformers Embeddings Generation Hugging Face
Unstructured Document Processing Unstructured IO
MoviePy Video Processing Zulko
Mutagen Audio Metadata Handling QuodLibet
ReportLab PDF Generation ReportLab
BeautifulSoup HTML Parsing BeautifulSoup
Markdown Markdown Processing Python Markdown
Python-dotenv Environment Variables Dotenv
Axios HTTP Client Axios
html2canvas HTML to Canvas Niklas von Hertzen
jsPDF PDF Generation Parallax
markdown-to-jsx Markdown Rendering ProbablyUp
Docker Containerization Docker
Nginx Reverse Proxy & Load Balancer Nginx
Vercel Frontend Deployment Vercel

🙏 Special Thanks

  • Open Source Community for invaluable tools and libraries
  • Google Gemini Team for powerful AI capabilities
  • All Future Contributors who will test and improve PolySensor

Built with ❤️ by Aditya Singh

GitHub LinkedIn Portfolio

⚠️ Terms of Usage:

This tool is designed for content analysis and should be used in compliance with copyright laws and content usage rights. Always ensure you have permission to analyze and process the files you use with this system.