Advanced Research Assistant powered by RAG (Retrieval-Augmented Generation)

easyResearch is an AI application that lets you ask questions about your own documents. The system uses advanced RAG technology to:
- Read and analyze documents (PDF, DOCX, TXT, code)
- Hybrid search (vector + BM25 keyword search)
- Answer questions based on document content
- Multi-language support (Vietnamese & English)
- Cross-encoder reranking for better accuracy
| Feature | Description |
|---|---|
| Workspace Management | Organize documents by project/topic |
| Multi-format Import | Support for PDF, DOCX, TXT, and Python code |
| Parent Document Retrieval | Small chunks for search, large chunks for context |
| GPU Acceleration | Optimized for NVIDIA GPUs (CUDA) |
| Multi-LLM Support | Groq (LLaMA 3.3) or Google Gemini |
| RESTful API | Easy integration via FastAPI |
| AnythingLLM Theme | Dark zinc UI inspired by AnythingLLM |
| Workspace Stats | Mini stat cards (docs, vectors, storage size) |
| Auto-Summarizer | Automatic summary generation after document upload |
| Smart Context | Contextualizes only when needed (faster responses) |
| Chat Persistence | Auto-save/load chat history per workspace |
| Context Depth | Fast / Accurate / Detailed search modes |
| File Management | Delete individual files from a workspace |
| Recent Questions | Quick access to recent queries in the sidebar |
```
easyResearch/
├── app.py              # Streamlit interface (web UI)
├── main.py             # FastAPI server (REST API)
├── core/
│   ├── loader.py       # Parent-document retrieval & smart splitter
│   ├── embedder.py     # Vectorization & ChromaDB management
│   ├── generator.py    # Advanced RAG pipeline
│   └── summarizer.py   # Auto-summarization
├── database/
│   ├── chroma_db/      # Vector database storage
│   └── chat_history/   # Persistent chat history (JSON per workspace)
└── uploads/            # Temporary file storage
```
- LLM: Groq (LLaMA 3.3 70B) or Google Gemini 2.0 Flash
- Embedding: HuggingFace `paraphrase-multilingual-MiniLM-L12-v2`
- Reranker: CrossEncoder `ms-marco-MiniLM-L-6-v2`
- Vector DB: ChromaDB
- Keyword Search: BM25 (rank-bm25)
- Framework: LangChain, Streamlit, FastAPI
```
Question → Smart Contextualization → Vector Search
                                          ↓
                                    BM25 Scoring
                                          ↓
                             Cross-Encoder Reranking
                                          ↓
                      Hybrid Score (0.7 × Rerank + 0.3 × BM25)
                                          ↓
                   Parent Document Retrieval → LLM Answer
```
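The hybrid-scoring step can be sketched in plain Python. This is a minimal illustration, assuming both score lists are index-aligned per candidate document and already normalized to [0, 1]; the 0.7/0.3 weights and the 0.1 threshold follow the formula and default described in this README.

```python
def hybrid_rank(rerank_scores, bm25_scores,
                w_rerank=0.7, w_bm25=0.3, min_score=0.1):
    """Blend cross-encoder and BM25 scores, drop weak hits, best first."""
    combined = [w_rerank * r + w_bm25 * b
                for r, b in zip(rerank_scores, bm25_scores)]
    # Keep (doc_index, score) pairs above the relevance threshold,
    # sorted by descending hybrid score.
    kept = [(i, s) for i, s in enumerate(combined) if s >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

With three candidates scored (0.9, 0.4, 0.05) by the reranker and (0.6, 0.8, 0.0) by BM25, the third falls below the threshold and is dropped, while the first two are returned in order of their combined score.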
| Component | Benefit |
|---|---|
| Hybrid Search | Combines semantic + keyword matching |
| Parent Document | Small chunks (400) for search, large (2000) for context |
| Smart Contextualization | Only calls LLM when pronouns/references detected |
| Cross-Encoder | Local reranking (no API calls) |
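The Smart Contextualization row above (only call the LLM when pronouns or back-references are detected) can be sketched as a simple heuristic. The word list here is hypothetical and purely illustrative; the actual detector in the pipeline may differ.

```python
import re

# Hypothetical back-reference detector: fires on pronouns and references
# that only make sense with chat history.
_REFERENCES = re.compile(
    r"\b(it|its|they|them|this|that|these|those|above|previous|earlier)\b",
    re.IGNORECASE,
)

def needs_contextualization(question: str) -> bool:
    """Return True only when the question seems to lean on chat history,
    so the extra LLM rewrite call is skipped for standalone questions."""
    return bool(_REFERENCES.search(question))
```

A standalone question like "What is hybrid search?" skips the rewrite call entirely, while "Explain that in more detail" triggers it.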
- Python 3.10+
- NVIDIA GPU with CUDA (recommended) or CPU
- Clone the repository

  ```bash
  git clone https://github.com/your-username/easyResearch.git
  cd easyResearch
  ```

- Create a virtual environment

  ```bash
  python -m venv venv
  # Windows
  venv\Scripts\activate
  # Linux/Mac
  source venv/bin/activate
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Configure API keys

  Create a `.env` file in the root directory:

  ```env
  GROQ_API_KEY=your_groq_api_key_here
  GOOGLE_API_KEY=your_gemini_api_key_here  # Optional
  ```

  💡 Get a Groq API key at console.groq.com
  💡 Get a Gemini API key at aistudio.google.com/apikey
Run the Streamlit web UI:

```bash
streamlit run app.py
```

Access: http://localhost:8501

Run the FastAPI server:

```bash
uvicorn main:app --reload
```

Access the Swagger UI at http://localhost:8000/docs
Example query request body:

```json
{
  "question": "Your question here",
  "collection_name": "notebook_name",
  "k_target": 10,
  "api_key": "your_api_key_here"
}
```

Response:

```json
{
  "answer": "AI generated answer",
  "sources": ["file1.pdf", "file2.docx"]
}
```

Upload a document:

```bash
curl -X POST "http://localhost:8000/upload?collection_name=my_research" \
  -F "file=@document.pdf"
```

| File Type | Parent Size | Child Size | Notes |
|---|---|---|---|
| PDF, DOCX | 2500 | 500 | Preserve long text context |
| Code (.py, .js) | 1500 | 400 | Split by function/class |
| JSON, CSV | 1000 | 300 | Don't split mid-object |
| Default Text | 2000 | 400 | Balanced |
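The per-file-type chunking above can be sketched as a small lookup plus a character-based splitter. This is a simplified illustration: the sizes mirror the table, but the names (`CHUNK_SIZES`, `split_parent_child`) are hypothetical, and the real splitter in `core/loader.py` presumably splits on separators (paragraphs, functions, objects) rather than fixed character offsets.

```python
from pathlib import Path

# Hypothetical lookup mirroring the chunk-size table above.
CHUNK_SIZES = {
    ".pdf": (2500, 500), ".docx": (2500, 500),
    ".py": (1500, 400), ".js": (1500, 400),
    ".json": (1000, 300), ".csv": (1000, 300),
}
DEFAULT_SIZES = (2000, 400)  # balanced default for plain text

def chunk_sizes_for(filename: str) -> tuple[int, int]:
    """Return (parent_size, child_size) based on the file extension."""
    return CHUNK_SIZES.get(Path(filename).suffix.lower(), DEFAULT_SIZES)

def split_parent_child(text: str, filename: str):
    """Character-based sketch of parent-document splitting: large parent
    chunks keep context, small child chunks are what gets embedded."""
    parent_size, child_size = chunk_sizes_for(filename)
    parents = [text[i:i + parent_size]
               for i in range(0, len(text), parent_size)]
    children = [(parent[j:j + child_size], idx)  # (child text, parent index)
                for idx, parent in enumerate(parents)
                for j in range(0, len(parent), child_size)]
    return parents, children
```

Search matches a small child chunk, but the LLM receives the full parent chunk it belongs to, which is the point of the parent-document pattern.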
- Hybrid Score: `0.7 × Rerank + 0.3 × BM25`
- Min Score Threshold: 0.1 (filters low-relevance results)
| Mode | k (documents) | Use Case |
|---|---|---|
| Fast | 5 | Quick answers, low latency |
| Accurate | 10 | Balanced (default) |
| Detailed | 18 | Deep research, comprehensive info |
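The mode-to-depth mapping above amounts to a small lookup. The names here are hypothetical; the actual constants live in the app code.

```python
# Hypothetical mapping of the Context Depth modes to retrieval depth k,
# matching the table above.
CONTEXT_DEPTH_K = {"fast": 5, "accurate": 10, "detailed": 18}

def k_for_mode(mode: str) -> int:
    # Unknown modes fall back to the balanced default.
    return CONTEXT_DEPTH_K.get(mode.lower(), CONTEXT_DEPTH_K["accurate"])
```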
- Create New: Select "New workspace…" from the dropdown and name it
- Switch: Select a workspace from the dropdown → a badge shows the active workspace
- Delete Workspace: Go to the Settings tab → "Delete workspace"
- Delete File: Click the delete button next to any file in the Files list
- Clear Chat: Go to the Settings tab → "Clear chat"
- Chat Persistence: Conversations auto-save per workspace and reload on switch
- Recent Questions: Quick access to the last 5 questions in the sidebar
- Auto-Summary: Generated automatically after uploading documents
| Tab / Section | Function |
|---|---|
| Workspace Selector | Select/create a workspace, with stats cards |
| Documents Tab | Upload files, view summary, file list with delete |
| Recent Questions | Quick access to recent queries (in the Documents tab) |
| Settings Tab | LLM provider, API key, clear chat, delete workspace |
| Context Depth | Fast / Accurate / Detailed radio pills (main area) |
The interface uses an AnythingLLM-inspired dark theme:
| Element | Color | Description |
|---|---|---|
| Sidebar | `#111111` | Deep dark background |
| Main area | `#1c1c1f` | Slightly lighter dark |
| Inputs / Cards | `#27272a` | Zinc-800 for form elements |
| Borders | `#3f3f46` | Subtle zinc-700 borders |
| Accent (buttons) | `#4f46e5` | Indigo primary buttons |
| Chat input | `#303034` | Unified single-color box |
| Font | Inter | Clean sans-serif via Google Fonts |
Note: The CSS targets specific selectors (not `*`) to preserve Streamlit's Material Symbols Rounded icons.
| Issue | Solution |
|---|---|
| Missing API Key | Create a `.env` file or enter the key in the sidebar |
| CUDA Error | Check the NVIDIA driver, or run on CPU |
| VRAM Overflow | Reduce the batch size in `embedder.py` |
| Slow Response | Already optimized (only 1-2 LLM calls per query) |
MIT License - See LICENSE file for more details.
Made with ❤️ by easyResearch