High-confidence document Q&A system using LLM fact extraction with source verification.
Quick Links: Design Doc | All Docs
```bash
# Start the application
./run.sh

# Open in browser
open http://localhost:3000
```

The web UI provides:
- Session Browser: View and manage all sessions with document counts and fact statistics
- Document Processing: Upload PDFs and extract facts with real-time progress visualization
- Facts Browser: Filter and search through extracted facts
- Query Interface: Ask natural language questions with parallel batch processing and source citations
Frfr uses a Go backend with a React frontend:
```
┌─────────────────────────────────────────┐
│  Web Interface (React)                  │
│  - Session management                   │
│  - Document upload & processing         │
│  - Facts browser with search            │
│  - Query UI with source context panel   │
└────────────────┬────────────────────────┘
                 │ REST API + SSE
                 ▼
┌─────────────────────────────────────────┐
│  Go Backend Server                      │
│  - Session & document management        │
│  - Claude API integration               │
│  - Parallel fact extraction             │
│  - Query processing with citations      │
└────────────────┬────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────┐
│  Python PDF Extractor                   │
│  - pdfplumber / PyMuPDF extraction      │
│  - Handles text & scanned PDFs          │
└─────────────────────────────────────────┘
```
- PDF text extraction (pdfplumber with PyMuPDF fallback)
- Adaptive chunking with semantic boundaries
- Parallel fact extraction (up to 20 concurrent workers)
- Real-time progress visualization with chunk grid
- Automatic session management
- Claude-powered extraction with structured output
- 8 metadata fields per fact (type, control family, entities, etc.)
- Multiple evidence quotes per fact
- Specificity scoring and quality filtering
- Natural language questions over extracted facts
- Parallel batch processing (150 facts per batch)
- Live progress streaming via SSE
- Clickable source citations
- Source context panel with quote highlighting
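The bounded parallelism described above (up to 20 concurrent extraction workers) can be sketched as a counting-semaphore worker pool. This is an illustrative pattern, not the project's actual implementation; `extractFacts` is a hypothetical stand-in for the real Claude-backed call:

```go
package main

import (
	"fmt"
	"sync"
)

// extractFacts is a placeholder for the real per-chunk extraction call.
func extractFacts(chunk string) string {
	return "facts for " + chunk
}

// processChunks fans chunks out to at most maxWorkers goroutines,
// mirroring the FRFR_MAX_WORKERS bound described above.
func processChunks(chunks []string, maxWorkers int) []string {
	results := make([]string, len(chunks))
	sem := make(chan struct{}, maxWorkers) // counting semaphore
	var wg sync.WaitGroup
	for i, c := range chunks {
		wg.Add(1)
		go func(i int, c string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a worker slot
			defer func() { <-sem }() // release it on return
			results[i] = extractFacts(c)
		}(i, c)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(processChunks([]string{"a", "b", "c"}, 20))
}
```

Writing each result to its own index keeps the output ordered without a mutex, even though chunks finish out of order.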
```
frfr/
├── backend/              # Go backend server
│   ├── cmd/server/       # Server entrypoint
│   └── internal/
│       ├── api/          # REST API handlers
│       ├── config/       # Configuration
│       ├── domain/models/ # Data models
│       └── services/     # Business logic
│           ├── claude/   # Claude API client
│           ├── extraction/ # Fact extraction
│           ├── pdf/      # PDF extraction (calls Python)
│           ├── query/    # Query processing
│           ├── session/  # Session management
│           └── validation/ # Fact validation
├── frontend/             # React + TypeScript frontend
│   └── src/
│       ├── api/          # API client
│       ├── components/   # React components
│       └── pages/        # Page components
├── python/               # Python PDF extractor module
│   └── frfr_pdf/
├── run.sh                # Start script
└── docs/                 # Documentation
```
- Go 1.21+
- Node.js 18+
- Python 3.10+
- Claude API access (via `ANTHROPIC_API_KEY` or `claude` CLI authentication)
```bash
# Clone repository
git clone <repo-url>
cd frfr

# Start everything (installs dependencies automatically)
./run.sh
```

The run.sh script will:
- Check dependencies (Go, Node, Python)
- Install the Python PDF extractor
- Install frontend dependencies
- Build and start the Go backend
- Start the frontend dev server
Environment variables:
| Variable | Default | Description |
|---|---|---|
| `FRFR_PORT` | `8080` | Backend server port |
| `FRFR_DATA_DIR` | `~/Documents/frfr/sessions` | Session storage directory |
| `FRFR_MAX_WORKERS` | `20` | Max parallel extraction workers |
| `ANTHROPIC_API_KEY` | - | Claude API key (optional if using CLI auth) |
| `FRFR_PYTHON_PATH` | auto-detect | Python interpreter path |
Each session stores:
```
sessions/{session_id}/
├── metadata.json         # Session metadata & document registry
├── text/                 # Extracted PDF text
│   └── {doc}.txt
├── chunks/               # Source text chunks (for context panel)
│   └── {doc}_chunk_{id}.txt
├── facts/                # Extracted facts per chunk
│   └── {doc}_chunk_{id}.json
└── summaries/            # Document summaries
    └── {doc}.json
```
- `GET /api/sessions` - List all sessions
- `POST /api/sessions` - Create new session
- `GET /api/sessions/{id}` - Get session details
- `DELETE /api/sessions/{id}` - Delete session
- `GET /api/sessions/{id}/documents` - List documents
- `POST /api/sessions/{id}/documents` - Add document
- `POST /api/sessions/{id}/documents/{doc}/reprocess` - Reprocess document
- `POST /api/sessions/{id}/process` - Start processing
- `GET /api/sessions/{id}/process/events` - SSE event stream
- `GET /api/sessions/{id}/facts` - List facts (paginated)
- `GET /api/sessions/{id}/facts/{index}/context` - Get fact with source context
- `POST /api/sessions/{id}/query` - Query facts
- `GET /api/sessions/{id}/query/stream` - SSE query stream with batch progress
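Calling the query endpoint from Go could look like the sketch below. The request and response field names (`question`, `answer`) are assumptions for illustration; check the handlers in `backend/internal/api` for the actual contract:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// queryFacts posts a natural-language question to the query endpoint
// and returns the answer field from the JSON response.
// Field names here are illustrative, not the verified API contract.
func queryFacts(baseURL, sessionID, question string) (string, error) {
	body, err := json.Marshal(map[string]string{"question": question})
	if err != nil {
		return "", err
	}
	url := fmt.Sprintf("%s/api/sessions/%s/query", baseURL, sessionID)
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var out struct {
		Answer string `json:"answer"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Answer, nil
}

func main() {
	answer, err := queryFacts("http://localhost:8080", "abc123", "What retention policies are described?")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(answer)
}
```

For long-running queries, the `/query/stream` SSE endpoint is the better fit, since it reports batch progress as results arrive.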
```bash
# Backend only
cd backend && go run ./cmd/server

# Frontend only (with backend running)
cd frontend && npm run dev

# Build frontend for production
cd frontend && npm run build
```

- Security Audits: "Does this pentest report identify any critical vulnerabilities?"
- Compliance: "Does this SOC2 report implement the controls in this reference spec?"
- Design Review: "Does this architecture doc address the scaling requirements?"
- Governance: "What data retention policies are described in this document?"
The system is designed for high-stakes questions where accuracy matters more than speed.
TBD