Merged
Binary file modified .DS_Store
Binary file not shown.
3 changes: 2 additions & 1 deletion .gitignore
@@ -15,4 +15,5 @@ trivy-report-fixed.json
coverage.xml
.ruff_cache
.pytest_cache
trivy-report-current.json
trivy-report-current.json
.vscode
1 change: 1 addition & 0 deletions Dockerfile
@@ -22,6 +22,7 @@ RUN apt-get update && apt-get install -y \
libgl1 \
git \
tesseract-ocr \
tesseract-ocr-fra \
&& rm -rf /var/lib/apt/lists/*

# Set working directory
126 changes: 65 additions & 61 deletions README.md
@@ -14,23 +14,24 @@ Multi-modal RAG service exposing a REST API and MCP server for document indexing
|
+---------------+---------------+
| |
Application Layer MCP Tools
+------------------------------+ (FastMCP)
| api/ | |
| indexing_routes.py | |
| query_routes.py | |
| file_routes.py | |
| health_routes.py | |
| use_cases/ | |
| IndexFileUseCase | |
| IndexFolderUseCase | |
| QueryUseCase | |
| ListFilesUseCase | |
| ReadFileUseCase | |
| requests/ responses/ | |
+------------------------------+ |
| | | |
v v v v
Application Layer MCP Servers (FastMCP)
+------------------------------+ |
| api/ | +---+--------+ +--+-----------+
| indexing_routes.py | | RAGAnything | | RAGAnything |
| query_routes.py | | Query | | Files |
| file_routes.py | | /rag/mcp | | /files/mcp |
| health_routes.py | +---+--------+ +--+-----------+
| use_cases/ | | |
| IndexFileUseCase | query_knowledge list_files
| IndexFolderUseCase | _base read_file
| QueryUseCase | query_knowledge
| ListFilesUseCase | _base_multimodal
| ListFoldersUseCase |
| ReadFileUseCase |
| requests/ responses/ |
+------------------------------+
| | |
v v v
Domain Layer (ports)
+------------------------------------------+
| RAGEnginePort StoragePort BM25EnginePort DocumentReaderPort |
@@ -173,7 +174,7 @@ The service automatically detects and processes the following document formats t

| Format | Extensions | Notes |
|--------|------------|-------|
| PDF | `.pdf` | Includes OCR support |
| PDF | `.pdf` | Includes OCR support (English + French via Tesseract) |
| Microsoft Word | `.docx` | |
| Microsoft PowerPoint | `.pptx` | |
| Microsoft Excel | `.xlsx` | |
@@ -213,6 +214,26 @@ Response (`200 OK`):
| `prefix` | string | `""` | MinIO prefix to filter files by |
| `recursive` | boolean | `true` | List files in subdirectories |

### List folders

Returns top-level folder prefixes in the bucket. REST-only endpoint (not exposed as an MCP tool).

```bash
curl http://localhost:8000/api/v1/files/folders
```

Response (`200 OK`):

```json
["documents/", "photos/", "reports/"]
```

Error responses:

| Status | Condition |
|--------|-----------|
| `404` | Bucket not found |
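
The `ListFoldersUseCase` implementation is not part of this diff, so the following is only a minimal sketch of what "top-level folder prefixes" could mean for a flat object store. The function name `top_level_folders` and the sample object names are hypothetical; the real service may instead ask MinIO for a non-recursive listing.

```python
def top_level_folders(object_names: list[str], delimiter: str = "/") -> list[str]:
    """Return the sorted, de-duplicated top-level prefixes of the given object names."""
    folders = set()
    for name in object_names:
        head, sep, _rest = name.partition(delimiter)
        # Objects stored at the bucket root contain no delimiter and belong to no folder.
        if sep:
            folders.add(head + sep)
    return sorted(folders)


# Hypothetical bucket contents, chosen to match the sample response above.
names = [
    "documents/report.pdf",
    "documents/2024/summary.docx",
    "photos/team.png",
    "reports/q1.xlsx",
    "README.md",
]
print(top_level_folders(names))  # ['documents/', 'photos/', 'reports/']
```

Nested prefixes such as `documents/2024/` collapse into their top-level folder, and root-level objects such as `README.md` contribute nothing.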

### Read a file

Downloads the file from MinIO, extracts its text content using Kreuzberg, and returns the result. Supports 91 file formats including PDF, Office documents, images, and HTML.
@@ -382,20 +403,24 @@ Response (`200 OK`):

The `combined_score` is the sum of `bm25_score` and `vector_score`, each computed as `1 / (k + rank)`. Results are sorted by `combined_score` descending. A chunk that appears in both result sets receives a contribution from each ranking, so it scores higher than a chunk that holds the same rank in only one of them.
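
The scoring rule above is reciprocal rank fusion. A minimal sketch follows, assuming 1-based ranks and `k = 60` (a common choice; the value used by this service is not stated here). The function name and the `chunk_id` field are illustrative only; the three score field names come from the response described above.

```python
def reciprocal_rank_fusion(
    bm25_ids: list[str], vector_ids: list[str], k: int = 60
) -> list[dict]:
    """Fuse two ranked lists of chunk ids using 1 / (k + rank) per list."""

    def rank_scores(ids: list[str]) -> dict[str, float]:
        # Ranks are assumed to start at 1 for the best hit.
        return {cid: 1.0 / (k + rank) for rank, cid in enumerate(ids, start=1)}

    bm25 = rank_scores(bm25_ids)
    vector = rank_scores(vector_ids)

    fused = []
    for cid in set(bm25) | set(vector):
        b = bm25.get(cid, 0.0)    # 0.0 when the chunk is absent from the BM25 results
        v = vector.get(cid, 0.0)  # 0.0 when the chunk is absent from the vector results
        fused.append(
            {"chunk_id": cid, "bm25_score": b, "vector_score": v, "combined_score": b + v}
        )

    # Highest combined score first; ties are broken by id for a stable order.
    fused.sort(key=lambda r: (-r["combined_score"], r["chunk_id"]))
    return fused


results = reciprocal_rank_fusion(["a", "b", "c"], ["b", "d"])
print([r["chunk_id"] for r in results])  # ['b', 'a', 'd', 'c']
```

Here `b` ranks second for BM25 and first for vector search, so its two contributions put it ahead of `a`, which ranks first for BM25 only.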

## MCP Server
## MCP Servers

The MCP server is mounted at `/mcp` and exposes the following tools:
The service exposes **two separate MCP servers**, both using streamable HTTP transport:

### Tool: `query_knowledge_base`
### RAGAnythingQuery — `/rag/mcp`

Query-focused tools for searching the indexed knowledge base.

#### Tool: `query_knowledge_base`

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `working_dir` | string | required | RAG workspace directory for this project |
| `query` | string | required | The search query |
| `mode` | string | `"naive"` | Search mode: `naive`, `local`, `global`, `hybrid`, `hybrid+`, `mix`, `bm25`, `bypass` |
| `top_k` | integer | `10` | Number of chunks to retrieve |
| `mode` | string | `"hybrid"` | Search mode: `naive`, `local`, `global`, `hybrid`, `hybrid+`, `mix`, `bm25`, `bypass` |
| `top_k` | integer | `5` | Number of chunks to retrieve |

### Tool: `query_knowledge_base_multimodal`
#### Tool: `query_knowledge_base_multimodal`

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
@@ -405,54 +430,32 @@ The MCP server is mounted at `/mcp` and exposes the following tools:
| `mode` | string | `"hybrid"` | Search mode |
| `top_k` | integer | `5` | Number of chunks to retrieve |

### Tool: `list_files`
### RAGAnythingFiles — `/files/mcp`

File browsing tools for listing and reading files from MinIO storage.

#### Tool: `list_files`

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prefix` | string | `""` | MinIO prefix to filter files by |
| `recursive` | boolean | `true` | List files in subdirectories |

### Tool: `read_file`
#### Tool: `read_file`

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `file_path` | string | required | File path in MinIO bucket (e.g. `documents/report.pdf`) |

Downloads the file from MinIO, extracts its text content using Kreuzberg, and returns the extracted text along with metadata and any detected tables.

### Transport modes

The `MCP_TRANSPORT` environment variable controls how the MCP server is exposed:

| Value | Behavior |
|-------|----------|
| `stdio` | MCP runs over stdin/stdout; FastAPI runs in a background thread |
| `sse` | MCP mounted at `/mcp` as SSE endpoint |
| `streamable` | MCP mounted at `/mcp` as streamable HTTP endpoint |
### Transport

### Claude Desktop configuration
Both MCP servers use **streamable HTTP** transport exclusively. Connect MCP clients to the mount paths:

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
"mcpServers": {
"raganything": {
"command": "uv",
"args": [
"run",
"--directory",
"/absolute/path/to/mcp-raganything",
"python",
"-m",
"src.main"
],
"env": {
"MCP_TRANSPORT": "stdio"
}
}
}
}
```
http://localhost:8000/rag/mcp # RAGAnythingQuery
http://localhost:8000/files/mcp # RAGAnythingFiles
```

## Configuration
@@ -465,7 +468,6 @@ All configuration is via environment variables, loaded through Pydantic Settings
|----------|---------|-------------|
| `HOST` | `0.0.0.0` | Server bind address |
| `PORT` | `8000` | Server port |
| `MCP_TRANSPORT` | `stdio` | MCP transport: `stdio`, `sse`, `streamable` |
| `ALLOWED_ORIGINS` | `["*"]` | CORS allowed origins |
| `OUTPUT_DIR` | system temp | Temporary directory for downloaded files |
| `UVICORN_LOG_LEVEL` | `critical` | Uvicorn log level |
@@ -577,7 +579,7 @@ The PostgreSQL server must have the `pg_textsearch` extension installed and load

```
src/
main.py -- FastAPI app, MCP mount, entry point
main.py -- FastAPI app, dual MCP mounts, entry point
config.py -- Pydantic Settings config classes
dependencies.py -- Dependency injection wiring
domain/
@@ -593,8 +595,9 @@ src/
health_routes.py -- GET /health
indexing_routes.py -- POST /file/index, /folder/index
query_routes.py -- POST /query
file_routes.py -- GET /files/list, POST /files/read
mcp_tools.py -- MCP tools: query_knowledge_base, list_files, read_file
file_routes.py -- GET /files/list, GET /files/folders, POST /files/read
mcp_query_tools.py -- MCP tools: query_knowledge_base, query_knowledge_base_multimodal
mcp_file_tools.py -- MCP tools: list_files, read_file
requests/
indexing_request.py -- IndexFileRequest, IndexFolderRequest
query_request.py -- QueryRequest, MultimodalQueryRequest
@@ -607,6 +610,7 @@ src/
index_folder_use_case.py -- Downloads from MinIO, indexes folder
query_use_case.py -- Query with bm25/hybrid+ support
list_files_use_case.py -- Lists files with metadata from MinIO
list_folders_use_case.py -- Lists folder prefixes from MinIO
read_file_use_case.py -- Reads file from MinIO, extracts content via Kreuzberg
infrastructure/
rag/
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -14,7 +14,7 @@ dependencies = [
"fastmcp>=3.2.0",
"cryptography>=46.0.5",
"httpx>=0.27.0",
"kreuzberg>=4.0.0",
"kreuzberg[all]>=4.8.2",
"lightrag-hku>=1.4.13",
"lightrag-hku[api]>=1.4.13",
"mcp>=1.24.0",
25 changes: 22 additions & 3 deletions src/application/api/file_routes.py
@@ -5,15 +5,19 @@
from application.requests.file_request import ReadFileRequest
from application.responses.file_response import FileContentResponse, FileInfoResponse
from application.use_cases.list_files_use_case import ListFilesUseCase
from application.use_cases.list_folders_use_case import ListFoldersUseCase
from application.use_cases.read_file_use_case import ReadFileUseCase
from dependencies import get_list_files_use_case, get_read_file_use_case
from dependencies import (
    get_list_files_use_case,
    get_list_folders_use_case,
    get_read_file_use_case,
)

file_router = APIRouter(tags=["Files"])


@file_router.get(
"/files/list",
response_model=list[FileInfoResponse],
status_code=status.HTTP_200_OK,
)
async def list_files(
@@ -25,9 +29,24 @@ async def list_files(
return [FileInfoResponse(**asdict(f)) for f in files]


@file_router.get(
    "/files/folders",
    status_code=status.HTTP_200_OK,
)
async def list_folders(
    use_case: ListFoldersUseCase = Depends(get_list_folders_use_case),
) -> list[str]:
    try:
        return await use_case.execute()
    except FileNotFoundError as e:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=str(e),
        ) from None


@file_router.post(
"/files/read",
response_model=FileContentResponse,
status_code=status.HTTP_200_OK,
)
async def read_file(
20 changes: 19 additions & 1 deletion src/application/api/health_routes.py
@@ -1,4 +1,8 @@
from fastapi import APIRouter
from fastapi import APIRouter, Depends
from fastapi.responses import JSONResponse

from application.use_cases.liveness_check_use_case import LivenessCheckUseCase
from dependencies import get_liveness_check_use_case

health_router = APIRouter(tags=["Health"])

@@ -12,3 +16,17 @@ def health_check():
        dict: Status message indicating the API is running.
    """
    return {"message": "RAG Anything API is running"}


@health_router.get("/health/live")
async def liveness_check(
    use_case: LivenessCheckUseCase = Depends(get_liveness_check_use_case),
):
    """Liveness probe that checks PostgreSQL and MinIO connectivity.

    Returns:
        200 if both connections are healthy, 503 if any is unreachable.
    """
    result = await use_case.execute()
    status_code = 200 if result["status"] == "healthy" else 503
    return JSONResponse(content=result, status_code=status_code)
65 changes: 65 additions & 0 deletions src/application/api/mcp_file_tools.py
@@ -0,0 +1,65 @@
"""MCP file tools for RAGAnything.

These tools are registered with FastMCP for Claude Desktop integration.
"""

import logging
from dataclasses import asdict

from fastmcp import FastMCP

from application.responses.file_response import FileContentResponse, FileInfoResponse
from dependencies import (
    get_list_files_use_case,
    get_read_file_use_case,
)

logger = logging.getLogger(__name__)

mcp_files = FastMCP("RAGAnythingFiles")


@mcp_files.tool()
async def list_files(
    prefix: str = "", recursive: bool = True
) -> list[FileInfoResponse]:
    """List files in MinIO storage under a given prefix.

    Args:
        prefix: MinIO prefix/path to filter files by (e.g. 'documents/')
        recursive: Whether to list files in subdirectories (default True)

    Returns:
        List of file objects with object_name, size, and last_modified
    """
    use_case = get_list_files_use_case()
    files = await use_case.execute(prefix=prefix, recursive=recursive)
    return [FileInfoResponse(**asdict(f)) for f in files]


@mcp_files.tool()
async def read_file(file_path: str) -> FileContentResponse:
    """Read and extract text content from a file stored in MinIO.

    Supports 91 file formats including PDF, Office documents, images, HTML, etc.
    Uses Kreuzberg for document intelligence extraction.

    Args:
        file_path: Path to the file in MinIO bucket (e.g. 'documents/report.pdf')

    Returns:
        Extracted text content with metadata and any detected tables
    """
    use_case = get_read_file_use_case()
    try:
        result = await use_case.execute(file_path=file_path)
    except FileNotFoundError:
        raise ValueError(f"File not found: {file_path}") from None
    except Exception:
        logger.exception("Unexpected error reading file: %s", file_path)
        raise RuntimeError("Failed to read file") from None
    return FileContentResponse(
        content=result.content,
        metadata=result.metadata,
        tables=result.tables,
    )