PolySensor

AI-Powered Multi-Modal Content Analyzer

A full-stack web application that uses AI to analyze and extract insights from documents, images, audio, and video. Built with React, Flask, LangChain, and Google's Gemini model, PolySensor turns unstructured data into structured analysis through a web interface.

🌟 What Makes PolySensor Unique?

🧠 Intelligent Content Understanding

Multi-Modal Analysis: Seamlessly processes documents, images, audio, and video files
Advanced RAG Architecture: Utilizes Retrieval-Augmented Generation with ChromaDB vector database for contextual chat responses
Universal Text Extraction: Capable of extracting and analyzing textual information from almost any format
Persistent Analysis Storage: Stores AI-generated analysis results in vector database for semantic search and retrieval

🔄 Powered by State-of-the-Art AI

Google Gemini Integration: Uses Google's Gemini 2.5 Pro model for analysis and chat responses
LangChain Orchestration: Manages prompt templates and model calls for each media type
Smart Context Awareness: Understands content relationships and nuances through semantic search
Conversational Memory: Chat interface with RAG-powered responses using past analysis results

📊 Comprehensive Content Coverage

From research papers to multimedia presentations, PolySensor delivers deep analytical insights across:

Academic & Technical Documents
Business Reports & Presentations
Multimedia Content & Recordings
Visual Data & Infographics
Semantic Search: Access previous analyses through vector similarity search

graph TB
    subgraph "🌐 Frontend Layer"
        U[👤 User] --> FE[⚛️ React Frontend<br/>File Upload & Chat Interface]
        FE --> API[📡 /analyze0<br/>File Upload]
        FE --> CHAT[💬 /chat<br/>Conversation + History]
    end

    subgraph "🔧 Backend Processing Layer"
        API --> B[🔄 File Type Detection<br/>Flask API]

        B --> C{File Type?}
        C -->|Documents| D[📄 Unstructured<br/>Text Extraction]
        C -->|Images| E[🖼️ Base64 Encoding<br/>+ Validation]
        C -->|Audio| F[🎵 Base64 Encoding<br/>+ Validation]
        C -->|Video| G[🎬 Base64 Encoding<br/>+ Validation]

        D --> H[📝 Document Prompt<br/>LangChain Template]
        E --> I[🖼️ Image Prompt<br/>Direct to LLM]
        F --> J[🎵 Audio Prompt<br/>Direct to LLM]
        G --> K[🎬 Video Prompt<br/>Direct to LLM]

        H --> L[🧠 Google Gemini<br/>2.5 Pro]
        I --> L
        J --> L
        K --> L
    end

    subgraph "💾 Vector Database Layer"
        L --> M[📥 Store Analysis<br/>ChromaDB + Metadata]
        CHAT --> N[🔍 Semantic Search<br/>Vector Similarity]
        N --> O[🔄 RAG Context<br/>Augment Query]
        O --> L
    end

    subgraph "📤 Response Layer"
        L --> P[📈 Analysis Results<br/>JSON Response]
        P --> FE2[⚛️ Frontend Display<br/>Markdown Rendering]
        FE2 --> U2[👤 User Views<br/>Analysis & Export]

        L --> Q[💬 Chat Response<br/>RAG-Enhanced Answer]
        Q --> FE3[⚛️ Chat Display<br/>Typing Indicators]
        FE3 --> U3[👤 User Continues<br/>Conversation]
    end

    %% Styling with better contrast
    classDef frontend fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000000
    classDef backend fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px,color:#000000
    classDef processing fill:#fff3e0,stroke:#ef6c00,stroke-width:2px,color:#000000
    classDef ai fill:#fce4ec,stroke:#c2185b,stroke-width:2px,color:#000000
    classDef vector fill:#e1f5fe,stroke:#0277bd,stroke-width:2px,color:#000000
    classDef response fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000000

    class U,FE,API,CHAT,FE2,FE3,U2,U3 frontend
    class B,C,D,E,F,G backend
    class H,I,J,K processing
    class L ai
    class M,N,O vector
    class P,Q response
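
The routing shown in the diagram can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code: the extension sets are abbreviated, `categorize` and `prepare_payload` are hypothetical names, and treating every unknown extension as a document is a simplification (the backend is described as rejecting unsupported formats).

```python
import base64
from pathlib import Path

# Abbreviated, illustrative extension sets; the supported lists are longer.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
AUDIO_EXTS = {".mp3", ".wav", ".flac", ".m4a"}
VIDEO_EXTS = {".mp4", ".mkv", ".webm", ".mov"}


def categorize(filename: str) -> str:
    """Return 'image', 'audio', 'video', or 'document' from the file extension."""
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    return "document"  # simplification: everything else goes to text extraction


def prepare_payload(filename: str, raw: bytes) -> dict:
    """Base64-encode media for the model; flag documents for text extraction."""
    kind = categorize(filename)
    if kind == "document":
        return {"kind": kind, "needs_extraction": True}
    return {"kind": kind, "data": base64.b64encode(raw).decode("ascii")}
```

Keeping the category decision in one function means each branch of the diagram (document, image, audio, video) can pick its own prompt from the returned `kind`.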

🌟 Features

Frontend Features

Intuitive Drag-and-Drop Interface: Seamless file upload with real-time validation and preview
Responsive Design: Modern React-based UI that works across devices and screen sizes
Real-Time Processing Feedback: Loading indicators and error handling for smooth user experience
Rich Markdown Rendering: Beautiful display of AI analysis results using markdown-to-jsx
Conversational Chat Interface: RAG-powered chat with past analysis context and typing indicators
One-Click PDF Export: Export analysis results to PDF using backend ReportLab generation with frontend fallback using html2canvas and jsPDF

Backend Features

Flask API Server: Multiple RESTful endpoints (/analyze0 for file analysis, /chat for RAG conversations, /export-pdf for reports, vector DB endpoints) with CORS enabled for frontend communication
File Type Detection: Extension-based routing for documents, images, audio, and video files
Document Processing: Uses Unstructured library to extract and convert content from 40+ document formats into JSON for AI analysis
Media Validation: Length limits (1 minute for audio, 30 seconds for video) with premium upgrade messaging for longer content
AI Integration: Google Gemini 2.5 Pro model with LangChain orchestration using specialized prompts for each media type
Vector Database: ChromaDB with Google embeddings for persistent storage and semantic search of analysis results
Temporary Storage: Secure file handling with automatic cleanup of uploaded files and temp directories
Error Handling: JSON responses for file validation errors, processing failures, and unsupported formats
Content Analysis: Structured analysis output in table format for all media types with pattern detection and insights
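
The media length limits above could be enforced with a check like the following. It is a sketch under assumptions: `validate_media_length` is a hypothetical name, the duration is assumed to be measured elsewhere (for example with Mutagen or MoviePy, both listed under Technologies), and a file exactly at the limit is assumed to be accepted.

```python
# Limits taken from the feature list: 60 s for audio, 30 s for video.
MAX_SECONDS = {"audio": 60.0, "video": 30.0}


def validate_media_length(kind: str, duration_seconds: float) -> dict:
    """Return a JSON-style dict: ok, or an error message for over-long media."""
    limit = MAX_SECONDS.get(kind)
    if limit is None:
        return {"ok": True}  # no length limit for documents and images
    if duration_seconds > limit:
        return {
            "ok": False,
            "error": f"{kind} is {duration_seconds:.0f}s long; the limit is {limit:.0f}s.",
        }
    return {"ok": True}
```

Returning a plain dict keeps the result easy to pass to Flask's `jsonify` as an error response.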

📸 Screenshots

Demo

Home Page of PolySensor

Chatting with PolySensor

Drag-and-drop or file upload

Analysis Results of chosen file

Exporting analysis report as PDF

Exported PDF

PolySensor Analysis Results

Presentation File (.ppt)

PolySensor analysis results.

📋 Supported File Formats (~108+)

Image Files

Extension(s)	Description
`.jpg`, `.jpeg`	JPEG images
`.png`	Portable Network Graphics
`.gif`	Graphics Interchange Format
`.webp`	WebP image
`.heic`	High Efficiency Image Format
`.tif`, `.tiff`	Tagged Image File Format
`.bmp`	Bitmap Image File
`.svg`	Scalable Vector Graphics
`.ico`	Icon Image File
`.avif`	AV1 Image File Format
`.raw`, `.cr2`, `.nef`, `.arw`, `.orf`, `.rw2`	Camera RAW Image Formats

Audio Files

Extension(s)	Description
`.mp3`	MPEG Audio Layer III
`.wav`	Waveform Audio File Format
`.flac`	Free Lossless Audio Codec
`.aiff`, `.aif`	Audio Interchange File Format
`.ogg`, `.oga`	Ogg Vorbis Audio File
`.m4a`	MPEG-4 Audio Format
`.aac`	Advanced Audio Coding File
`.opus`	Opus Audio File

Video Files

Extension(s)	Description
`.mp4`	MPEG-4 Part 14 Video
`.mkv`	Matroska Video File
`.webm`	WebM Video File
`.mov`	Apple QuickTime Movie
`.avi`	Audio Video Interleave
`.m4v`	MPEG-4 Video File
`.wmv`	Windows Media Video
`.flv`	Flash Video Format

Documents and Text Files

Extension(s)	Description
`.pdf`	Portable Document Format
`.docx`	Microsoft Word Open XML Document
`.doc`	Microsoft Word Document (older format)
`.txt`	Plain Text File
`.odt`	OpenDocument Text File
`.rtf`	Rich Text Format
`.md`	Markdown Documentation
`.epub`	Electronic Publication
`.hwp`	Hangul Word Processor File
`.abw`, `.zabw`	AbiWord Document
`.org`	Lotus Organizer Data File or Data Analysis File
`.rst`	reStructuredText File
`.tex`	LaTeX Document Source File
`.csv`	Comma-Separated Values File
`.json`	JavaScript Object Notation File
`.yaml`, `.yml`	YAML Data File
`.log`	Log Text File
`.ini`, `.cfg`, `.conf`	Configuration File
`.xps`	XML Paper Specification File

Spreadsheets

Extension(s)	Description
`.xlsx`	Microsoft Excel Open XML Spreadsheet
`.xls`	Microsoft Excel Spreadsheet (older format)
`.csv`	Comma-Separated Values File
`.tsv`	Tab-Separated Values File
`.fods`	OpenDocument Flat XML Spreadsheet
`.dif`	Data Interchange Format File
`.dbf`	dBase Database File
`.et`	E-Text Spreadsheet
`.numbers`	Apple Numbers Spreadsheet

Presentations

Extension(s)	Description
`.pptx`	Microsoft PowerPoint Open XML Presentation
`.ppt`	Microsoft PowerPoint Presentation (older format)
`.pptm`	Microsoft PowerPoint Macro-Enabled Presentation
`.pot`, `.potx`	Microsoft PowerPoint Template
`.odp`	OpenDocument Presentation
`.key`	Apple Keynote Presentation

Email Files

Extension(s)	Description
`.msg`	Microsoft Outlook Message
`.eml`	Electronic Mail File
`.p7s`	PKCS #7 Signature File Format
`.mbox`	Mailbox Storage File

Code & Development Files

Extension(s)	Description
`.py`	Python Source Code
`.js`	JavaScript File
`.ts`	TypeScript File
`.java`	Java Source File
`.cpp`, `.cc`, `.cxx`	C++ Source Code
`.c`	C Language Source Code
`.h`, `.hpp`	Header File
`.html`, `.htm`	Hypertext Markup Language File
`.css`	Cascading Style Sheets File
`.php`	PHP Source File
`.json`	JSON Data File
`.xml`	XML Markup File
`.yaml`, `.yml`	YAML Data File
`.ipynb`	Jupyter Notebook
`.r`, `.Rmd`	R Script or R Markdown File
`.go`	Go Language Source File
`.rb`	Ruby Source Code
`.sh`, `.bash`	Shell Script
`.bat`, `.cmd`	Batch Script
`.swift`	Swift Source File
`.kt`, `.kts`	Kotlin Script
`.pl`	Perl Script
`.lua`	Lua Script
`.sql`	SQL Database Query File

Other File Types

Extension(s)	Description
`.xml`	Extensible Markup Language File
`.html`, `.htm`	Hypertext Markup Language File
`.md`	Markdown Documentation
`.cwk`	AppleWorks Document
`.mcw`	Microchip MPLAB Workspace
`.prn`	Print to File
`.eth`	Ethnograph Data File
`.pbd`	PowerBuilder Document
`.sdp`	Session Description Protocol File
`.mw`	MATLAB Workspace File
`.sxg`	Signed Exchange File
`.zip`, `.rar`, `.7z`, `.tar`, `.gz`	Compressed Archive Files
`.exe`, `.app`	Executable Files (sometimes analyzed in sandboxed mode)
`.apk`	Android Application Package
`.blend`	Blender 3D Project File
`.fbx`, `.obj`, `.stl`, `.glb`, `.gltf`	3D Model Files

🚀 Quick Start

Prerequisites

Python 3.11 or later (3.11 is recommended for Unstructured)
Node.js 16+ and npm
Google Gemini API key
Tesseract OCR (for image/text extraction)

Installation

Clone the repository

git clone https://github.com/adityasinghcoding/PolySensor.git
cd PolySensor

Backend Setup

# Install Python dependencies
pip install -r requirements.txt

# Install Tesseract OCR
# Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
# Mac: brew install tesseract
# Linux: sudo apt-get install tesseract-ocr

# Set up environment variables
cp .env.example .env
# Edit .env and add your Google API key as a line of the form:
#   GOOGLE_API_KEY=your_api_key_here

Frontend Setup

# Navigate to frontend directory
cd frontend

# Install Node.js dependencies
npm install

# Return to root directory
cd ..

Usage

Start the Backend Server

python main.py

The Flask server will start on http://localhost:5000

Start the Frontend (in a new terminal)

cd frontend
npm run dev

The React app will be available at http://localhost:5173

Access the Application

Open your browser and navigate to http://localhost:5173 to use the web interface. Upload files through the drag-and-drop interface and receive AI-powered analysis results.
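
The backend can also be called directly, without the web interface. The sketch below builds a multipart upload for the /analyze0 endpoint using only the standard library. The form field name `file` is an assumption (check main.py for the name the server actually reads), and `build_upload_request` is a hypothetical helper.

```python
import uuid
import urllib.request


def build_upload_request(filename: str, content: bytes,
                         base_url: str = "http://localhost:5000") -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /analyze0."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # "file" is an assumed form field name.
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/analyze0",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


request = build_upload_request("notes.txt", b"hello")
# To send it against a running backend:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode())
```

The request is only constructed here; uncomment the last lines once the Flask server is running on port 5000.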

🐳 Docker Setup (Recommended)

For easier development and deployment, PolySensor supports Docker. This ensures consistent environments and handles system dependencies automatically.

Prerequisites

Docker and Docker Compose installed on your system
Google Gemini API key

Quick Start with Docker

Clone the repository

git clone https://github.com/adityasinghcoding/PolySensor.git
cd PolySensor

Set up environment variables

cp .env.example .env
# Edit .env and add your Google API key as a line of the form:
#   GOOGLE_API_KEY=your_api_key_here

Run the application

docker-compose up --build

Access the Application

Frontend: http://localhost:5173
Backend API: http://localhost:5000

Docker Commands

Start services: docker-compose up
Build and start: docker-compose up --build
Stop services: docker-compose down
View logs: docker-compose logs -f [service_name]
Rebuild specific service: docker-compose up --build [service_name]

Production Deployment

Backend: The render.yaml is configured for Docker deployment on Render
Frontend: Deploy to Vercel as usual (static hosting)

🏗️ Project Structure

PolySensor/
├── main.py                    # Flask backend entry point with RAG chat endpoints
├── data_handling.py           # Content extraction and validation utilities
├── prompts.py                 # AI prompt templates for different content types
├── vector_db.py               # ChromaDB vector database management
├── api_endpoints.py           # Additional vector database API endpoints
├── test_vector_db.py          # Vector database testing utilities
├── requirements.txt           # Python dependencies
├── .env                       # Environment variables (API keys)
├── .gitignore                 # Git ignore rules
├── README.md                  # Project README
├── docker-compose.yml         # Docker Compose configuration
├── Dockerfile                 # Backend Docker configuration
├── .dockerignore              # Docker ignore rules
├── package.json               # Root package.json
├── package-lock.json          # Root package-lock.json
├── vite.config.js             # Vite build configuration
├── chroma_db/                 # ChromaDB persistent storage
├── Docs/                      # Documentation and diagrams
├── assets/                    # Project assets and screenshots
├── md/                        # Markdown files
├── utilities/                 # Additional tools and executables
└── frontend/                  # React frontend application
    ├── public/
    │   ├── index.html         # Main HTML template
    │   ├── favicon.ico
    │   ├── manifest.json      # PWA manifest
    │   └── robots.txt
    ├── src/
    │   ├── components/           # Reusable React components
    │   │   ├── AnalysisResults/  # Displays analysis output with markdown support
    │   │   │   ├── AnalysisResults.jsx
    │   │   │   ├── AnalysisResults.css
    │   │   │   └── index.js
    │   │   ├── FileUploader/     # Drag-and-drop file upload interface
    │   │   │   ├── FileUploader.jsx
    │   │   │   ├── FileUploader.css
    │   │   │   └── index.js
    │   │   ├── Loading/          # Loading spinner component
    │   │   │   ├── Loading.jsx
    │   │   │   ├── Loading.css
    │   │   │   └── index.js
    │   │   ├── InputArea.jsx     # Chat input component
    │   │   ├── InputArea.css     # Chat input styles
    │   │   ├── TypingIndicator.jsx # Animated typing indicator
    │   │   └── TypingIndicator.css # Typing indicator styles
    │   ├── pages/
    │   │   └── AnalyzePage.jsx   # Main analysis page
    │   ├── utils/
    │   │   └── apiService.js     # API communication utilities
    │   ├── App.jsx               # Main React application component
    │   ├── App.css               # Global styles
    │   ├── main.jsx              # React application entry point
    │   ├── index.css             # Base styles
    │   └── assets/               # Frontend assets
    ├── package.json              # Node.js dependencies and scripts
    ├── Dockerfile                # Frontend Docker configuration
    ├── nginx.conf                # Nginx configuration for production
    └── vercel.json               # Vercel deployment configuration

Core Modules

Backend:

main.py: Flask API server with CORS support, handles file uploads, AI processing, and RAG chat
data_handling.py: Content extraction functions for documents, images, audio, and video
prompts.py: Specialized prompts for different media types optimized for Gemini AI
vector_db.py: ChromaDB vector database management for persistent analysis storage and semantic search
api_endpoints.py: Additional vector database API endpoints for search and management
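
To illustrate what semantic search and RAG context augmentation mean in this setting, here is a toy sketch in plain Python. It does not use ChromaDB or real embeddings: the two-dimensional vectors are hand-written stand-ins, and `top_matches` and `augment_query` are hypothetical names, not functions from vector_db.py.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query_vec, store, k=2):
    """store: list of (text, vector) pairs. Return the k most similar texts."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def augment_query(question: str, context: list[str]) -> str:
    """Prepend retrieved analyses to the user's question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context from past analyses:\n{joined}\n\nQuestion: {question}"


# Toy 2-D "embeddings" standing in for real embedding vectors.
store = [
    ("Report on Q3 revenue", [1.0, 0.0]),
    ("Photo of a whiteboard", [0.0, 1.0]),
    ("Budget spreadsheet summary", [0.9, 0.1]),
]
prompt = augment_query("How did revenue change?", top_matches([1.0, 0.05], store))
print(prompt)
```

In the real application the vector database performs the ranking step, and the augmented prompt is what gets sent to the model for the chat reply.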

Frontend:

App.jsx: Main application router and state management with analysis history
AnalyzePage.jsx: Core analysis interface with file upload, results display, and chat functionality
FileUploader.jsx: Drag-and-drop file upload component with validation
AnalysisResults.jsx: Markdown rendering component with PDF export functionality
InputArea.jsx: Chat input component with typing indicators
TypingIndicator.jsx: Animated typing indicator for chat responses
apiService.js: Axios-based API client for backend communication (analyzeFile, analyzeText, exportPDF)

🔧 Configuration

API Keys

Get your Google Gemini API key from Google AI Studio and add it to your .env file:

GOOGLE_API_KEY=your_actual_api_key_here
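
A startup check like the one below can catch a missing or placeholder key early. It is a sketch, not the project's code: `require_api_key` is a hypothetical helper, and it assumes the .env file has already been loaded into the process environment (for example with `load_dotenv()` from python-dotenv, which is listed under Technologies).

```python
import os


def require_api_key(env=os.environ) -> str:
    """Return GOOGLE_API_KEY, or raise if it is missing or still a placeholder."""
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key or key.startswith("your_"):
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; add it to the .env file in the project root."
        )
    return key
```

Passing the environment as a parameter makes the check easy to exercise with a plain dict in tests.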

Customizing Analysis

You can modify the analysis prompts in prompts.py to tailor the output to your specific needs:

# Example custom prompt
CUSTOM_ANALYSIS = '''
Analyze this content and focus on technical details:

Content: {content_data}

Please provide:
1. Technical specifications
2. Implementation details
3. Potential improvements
'''
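
The `{content_data}` placeholder is replaced with the extracted content before the prompt is sent to the model. The sketch below shows that substitution with plain `str.format`, which uses the same brace syntax; how prompts.py and LangChain perform the step in the actual backend is not shown here, and the sample content is invented for illustration.

```python
CUSTOM_ANALYSIS = """
Analyze this content and focus on technical details:

Content: {content_data}

Please provide:
1. Technical specifications
2. Implementation details
3. Potential improvements
"""

# Plain str.format stands in for the template step; the sample text is made up.
filled = CUSTOM_ANALYSIS.format(content_data="Sensor spec sheet: 12-bit ADC, 3.3 V supply")
print(filled)
```

Note that any explanatory comment must live outside the triple-quoted string, otherwise it becomes part of the prompt text.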

💡 Examples

Document Analysis

# Input: research_paper.pdf
# Output: Summary of key findings, methodology, and conclusions

Image Analysis

# Input: diagram.png  
# Output: Extracted text + analysis of visual content and structure

Video Analysis

# Input: presentation.mp4
# Output: Combined analysis of slide content and spoken presentation

🛠️ Development

Adding New File Types

Add file extension detection in main.py:

if file_path.lower().endswith(('.new_extension',)):  # tuple, so more extensions can be added
    new_data = new_extraction_function(file_path)

Create extraction function in data_handling.py:

def new_extraction_function(file_path):
    # Implement extraction logic
    return extracted_content

Add prompt template in prompts.py:

NEW_TYPE = '''
Your analysis prompt for new file type...
'''
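
The three steps above (extension check, extraction function, prompt template) can also be tied together through a small registry. This is one possible design, not how main.py is organized: `HANDLERS`, `register`, `build_prompt`, and the `.srt` example are all hypothetical.

```python
from pathlib import Path
from typing import Callable

# Hypothetical registry mapping an extension to (extractor, prompt template).
HANDLERS: dict[str, tuple[Callable[[str], str], str]] = {}


def register(extension: str, extractor: Callable[[str], str], prompt: str) -> None:
    """Associate a file extension with its extractor and prompt template."""
    HANDLERS[extension.lower()] = (extractor, prompt)


def build_prompt(file_path: str) -> str:
    """Look up the handler by extension, extract content, and fill the prompt."""
    ext = Path(file_path).suffix.lower()
    if ext not in HANDLERS:
        raise ValueError(f"Unsupported file type: {ext or file_path}")
    extractor, prompt = HANDLERS[ext]
    return prompt.format(content_data=extractor(file_path))


def extract_subtitles(file_path: str) -> str:
    """Example extractor: keep only the spoken lines of an .srt subtitle file."""
    lines = Path(file_path).read_text(encoding="utf-8").splitlines()
    return "\n".join(
        line for line in lines
        if line.strip() and not line.strip().isdigit() and "-->" not in line
    )


register(".srt", extract_subtitles, "Summarize this transcript:\n{content_data}")
```

With a registry, supporting another format becomes a single `register(...)` call instead of a new `if` branch.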

Running Tests

# Add tests to the repository and run with:
python -m pytest tests/

📊 Output Examples

The AI provides structured analysis including:

Key Points: Main takeaways from the content
Summary: Concise overview of the material
Actionable Insights: Practical recommendations
Ambiguity Detection: Identification of unclear sections

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

🐛 Troubleshooting

Common Issues

"File does not exist" error

Check file path spelling
Use absolute paths if needed

OCR not working

Verify Tesseract installation
Check image quality and resolution

API key errors

Ensure .env file is in the project root
Verify the API key has sufficient permissions

Audio/video processing slow

Large files may take time to process
Consider shorter intervals for video analysis
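
Two of the checks above (Tesseract installation and .env location) can be automated with the standard library. This is a small diagnostic sketch; `run_checks` is a hypothetical helper and is not part of the repository.

```python
import shutil
from pathlib import Path


def run_checks(project_root: str = ".") -> dict[str, bool]:
    """Report whether common prerequisites are in place."""
    return {
        # shutil.which returns None when the executable is not on PATH.
        "tesseract_on_path": shutil.which("tesseract") is not None,
        ".env_in_project_root": (Path(project_root) / ".env").is_file(),
    }


for name, ok in run_checks().items():
    print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

Run it from the project root so that the relative `.env` path resolves correctly.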

Getting Help

Check existing Issues for similar problems
Create a new issue with detailed error messages and file examples

👥 Credits & Acknowledgments

🧑‍💻 Project Creator

Aditya Singh

Artificial Intelligence Engineer
Project Lead & Full-Stack Developer
AI/ML Integration & Architecture Design
Multi-modal Content Processing Specialist

🛠️ Technologies & Libraries

This project stands on the shoulders of these amazing open-source technologies:

Technology	Purpose	Credit
React	Frontend Framework	Meta
Vite	Build Tool & Dev Server	Vite
Flask	Backend Web Framework	Pallets
Flask-CORS	Cross-Origin Resource Sharing	Flask-CORS
Google Gemini	AI Language Model	Google AI
LangChain	LLM Orchestration	LangChain AI
ChromaDB	Vector Database for RAG	Chroma
Sentence Transformers	Embeddings Generation	Hugging Face
Unstructured	Document Processing	Unstructured IO
MoviePy	Video Processing	Zulko
Mutagen	Audio Metadata Handling	QuodLibet
ReportLab	PDF Generation	ReportLab
BeautifulSoup	HTML Parsing	BeautifulSoup
Markdown	Markdown Processing	Python Markdown
Python-dotenv	Environment Variables	Dotenv
Axios	HTTP Client	Axios
html2canvas	HTML to Canvas	Niklas von Hertzen
jsPDF	PDF Generation	Parallax
markdown-to-jsx	Markdown Rendering	ProbablyUp
Docker	Containerization	Docker
Nginx	Reverse Proxy & Load Balancer	Nginx
Vercel	Frontend Deployment	Vercel

🙏 Special Thanks

Open Source Community for invaluable tools and libraries
Google Gemini Team for powerful AI capabilities
All Future Contributors who will test and improve PolySensor

Built with ❤️ by Aditya Singh

⚠️ Terms of Usage:

This tool is designed for content analysis and should be used in compliance with copyright laws and content usage rights. Always ensure you have permission to analyze and process the files you use with this system.
