MCP (Model Context Protocol) Document Reader - A powerful MCP tool for reading documents in multiple formats, enabling AI agents to truly "read" your documents.
- Multi-format Support: Supports 4 mainstream document formats: Excel (XLSX/XLS), DOCX, PDF, and TXT
- MCP Compliant: Follows the MCP standard, so it can be used as a tool by AI assistants in clients such as Trae IDE
- Easy Integration: Simple configuration for immediate use
- Reliable Performance: Successfully tested and running in Trae IDE
- File System Support: Reads documents directly from the file system
graph TB
A[AI Assistant / User] -->|Call read_document| B[MCP Document Reader]
B -->|Detect file type| C{File Type?}
C -->|.docx| D[DOCX Reader]
C -->|.pdf| E[PDF Reader]
C -->|.xlsx/.xls| F[Excel Reader]
C -->|.txt| G[Text Reader]
D -->|Extract text| H[Return Content]
E -->|Extract text| H
F -->|Extract text| H
G -->|Extract text| H
H -->|Text content| A
style A fill:#e1f5ff
style B fill:#fff4e1
style C fill:#f0f0f0
style D fill:#e8f5e9
style E fill:#e8f5e9
style F fill:#e8f5e9
style G fill:#e8f5e9
style H fill:#fff9c4
| Format | Extensions | MIME Type | Features |
|---|---|---|---|
| Excel | .xlsx, .xls | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Sheet and cell data extraction |
| DOCX | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | Text and structure extraction |
| PDF | .pdf | application/pdf | Text extraction |
| Text | .txt | text/plain | Plain text reading |
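The "detect file type" step in the diagram can be sketched as a lookup keyed on the file extension. This is a minimal illustration built from the table above, not the package's actual implementation; `SUPPORTED_FORMATS` and `detect_format` are hypothetical names.

```python
from pathlib import Path

# Extension -> (format name, MIME type), copied from the table above.
# Note: legacy .xls files are commonly labeled application/vnd.ms-excel;
# the table's value is kept here for consistency.
SUPPORTED_FORMATS = {
    ".xlsx": ("Excel", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"),
    ".xls": ("Excel", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"),
    ".docx": ("DOCX", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
    ".pdf": ("PDF", "application/pdf"),
    ".txt": ("Text", "text/plain"),
}


def detect_format(filename: str) -> tuple[str, str]:
    """Return (format name, MIME type) for a supported file, else raise ValueError."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match on the extension
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None


print(detect_format("report.PDF"))  # ('PDF', 'application/pdf')
```

Keeping the mapping in one dictionary means a new format only needs a new entry, which mirrors the single branching point in the diagram.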
Install from PyPI:
pip install mcp-documents-reader
Or install from source:
git clone https://github.com/xt765/mcp_documents_reader.git
cd mcp_documents_reader
pip install -e .
This server provides the following tool:
read_document: Reads any supported document type through a unified interface.
Arguments:
filename (string, required): Document file path; absolute and relative paths are supported.
Add the following to your MCP configuration file:
Option 1: Using PyPI (Recommended)
{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "mcp-documents-reader"
      ]
    }
  }
}
Option 2: Using GitHub repository
{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/xt765/mcp_documents_reader",
        "mcp_documents_reader"
      ]
    }
  }
}
Option 3: Using Gitee repository (Faster access in China)
{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://gitee.com/xt765/mcp_documents_reader",
        "mcp_documents_reader"
      ]
    }
  }
}
After configuration, AI assistants can directly call the following tool:
# Read a DOCX file
read_document(filename="example.docx")
# Read a PDF file
read_document(filename="example.pdf")
# Read an Excel file
read_document(filename="example.xlsx")
# Read a text file
read_document(filename="example.txt")
The reader classes can also be used directly from Python:
from mcp_documents_reader import DocumentReaderFactory
# Using factory (recommended)
reader = DocumentReaderFactory.get_reader("document.pdf")
content = reader.read("/path/to/document.pdf")
# Check if format is supported
if DocumentReaderFactory.is_supported("file.xlsx"):
    reader = DocumentReaderFactory.get_reader("file.xlsx")
    content = reader.read("/path/to/file.xlsx")
read_document: Reads any supported document type.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| filename | string | ✅ | Document file path, supports absolute or relative paths |
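For readers curious what a call looks like on the wire: MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message for `read_document` using only the standard library. `build_tool_call` is a hypothetical helper for illustration; in practice the MCP client or SDK constructs and sends this message for you.

```python
import json


def build_tool_call(filename: str, request_id: int = 1) -> str:
    """Serialize an MCP `tools/call` request for the read_document tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "read_document",
            # The only argument listed in the parameter table above.
            "arguments": {"filename": filename},
        },
    }
    return json.dumps(message)


print(build_tool_call("example.pdf"))
```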
| Variable | Description | Default |
|---|---|---|
| DOCUMENT_DIRECTORY | Directory where documents are stored | ./documents |
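One plausible way the `DOCUMENT_DIRECTORY` setting could interact with the `filename` argument is sketched below. It assumes that relative paths are resolved against `DOCUMENT_DIRECTORY` and absolute paths are used unchanged; the source does not state this, so treat it as an assumption rather than the server's documented behavior. `resolve_document_path` is a hypothetical name.

```python
import os
from pathlib import Path


def resolve_document_path(filename: str) -> Path:
    """Resolve a document path against DOCUMENT_DIRECTORY (assumed behavior)."""
    # Fall back to the documented default when the variable is unset.
    base = Path(os.environ.get("DOCUMENT_DIRECTORY", "./documents"))
    path = Path(filename)
    # Assumption: absolute paths are used as given, relative ones are joined to the base.
    return path if path.is_absolute() else base / path


print(resolve_document_path("example.pdf"))
```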
Runtime dependencies:
- mcp >= 0.1.0 - MCP protocol implementation
- python-docx >= 0.8.11 - DOCX file reading
- PyPDF2 >= 3.0.1 - PDF file reading
- openpyxl >= 3.0.10 - Excel file reading
Development dependencies:
- pytest >= 8.0.0 - Testing framework
- pytest-asyncio >= 0.24.0 - Async testing support
- pytest-cov >= 6.0.0 - Coverage reporting
- basedpyright >= 0.28.0 - Type checking
- ruff >= 0.8.0 - Linting and formatting
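To check which of the runtime dependencies are present in the current environment, a small script along these lines may help. It is an illustrative sketch using the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper, and it only reports installed versions without comparing them against the minimums listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the runtime dependency list above.
RUNTIME_DEPENDENCIES = ["mcp", "python-docx", "PyPDF2", "openpyxl"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(RUNTIME_DEPENDENCIES).items():
    print(f"{name}: {ver or 'not installed'}")
```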
MIT License
Issues and Pull Requests are welcome!
- MCP Document Converter - MCP document converter supporting multiple format conversions
- Model Context Protocol - Official Model Context Protocol documentation