High-confidence document Q&A system using LLM fact extraction with source verification.
Quick Links: Design Doc | All Docs
```bash
# Start the application
./run.sh

# Open in browser
open http://localhost:3000
```

The web UI provides:
- Session Browser: View and manage all sessions with document counts and fact statistics
- Document Processing: Upload PDFs and extract facts with real-time progress visualization
- Facts Browser: Filter and search through extracted facts
- Query Interface: Ask natural language questions with parallel batch processing and source citations
Frfr uses a Go backend with a React frontend:
```
┌─────────────────────────────────────────┐
│  Web Interface (React)                  │
│  - Session management                   │
│  - Document upload & processing         │
│  - Facts browser with search            │
│  - Query UI with source context panel   │
└────────────────┬────────────────────────┘
                 │ REST API + SSE
                 ▼
┌─────────────────────────────────────────┐
│  Go Backend Server                      │
│  - Session & document management        │
│  - Claude API integration               │
│  - Parallel fact extraction             │
│  - Query processing with citations      │
└────────────────┬────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────┐
│  Python PDF Extractor                   │
│  - pdfplumber / PyMuPDF extraction      │
│  - Handles text & scanned PDFs          │
└─────────────────────────────────────────┘
```
- PDF text extraction (pdfplumber with PyMuPDF fallback)
- Adaptive chunking with semantic boundaries
- Parallel fact extraction (up to 20 concurrent workers)
- Real-time progress visualization with chunk grid
- Automatic session management
- Claude-powered extraction with structured output
- 8 metadata fields per fact (type, control family, entities, etc.)
- Multiple evidence quotes per fact
- Specificity scoring and quality filtering
- Natural language questions over extracted facts
- Parallel batch processing (150 facts per batch)
- Live progress streaming via SSE
- Clickable source citations
- Source context panel with quote highlighting
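The bounded parallelism described above (up to 20 concurrent extraction workers) can be sketched as a counting-semaphore worker pool. This is an illustrative pattern, not the project's actual implementation; `extractFacts` is a hypothetical stand-in for the real Claude-backed call:

```go
package main

import (
	"fmt"
	"sync"
)

// extractFacts is a placeholder for the real per-chunk extraction call.
func extractFacts(chunk string) string {
	return "facts for " + chunk
}

// processChunks fans chunks out to at most maxWorkers goroutines,
// mirroring the FRFR_MAX_WORKERS bound described above.
func processChunks(chunks []string, maxWorkers int) []string {
	results := make([]string, len(chunks))
	sem := make(chan struct{}, maxWorkers) // counting semaphore
	var wg sync.WaitGroup
	for i, c := range chunks {
		wg.Add(1)
		go func(i int, c string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a worker slot
			defer func() { <-sem }() // release it on return
			results[i] = extractFacts(c)
		}(i, c)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(processChunks([]string{"a", "b", "c"}, 20))
}
```

Writing each result to its own index keeps the output ordered without a mutex, even though chunks finish out of order.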
```
frfr/
├── backend/              # Go backend server
│   ├── cmd/server/       # Server entrypoint
│   └── internal/
│       ├── api/          # REST API handlers
│       ├── config/       # Configuration
│       ├── domain/models/ # Data models
│       └── services/     # Business logic
│           ├── claude/   # Claude API client
│           ├── extraction/ # Fact extraction
│           ├── pdf/      # PDF extraction (calls Python)
│           ├── query/    # Query processing
│           ├── session/  # Session management
│           └── validation/ # Fact validation
├── frontend/             # React + TypeScript frontend
│   └── src/
│       ├── api/          # API client
│       ├── components/   # React components
│       └── pages/        # Page components
├── python/               # Python PDF extractor module
│   └── frfr_pdf/
├── run.sh                # Start script
└── docs/                 # Documentation
```
- Go 1.21+
- Node.js 18+
- Python 3.10+
- Claude API access (via `ANTHROPIC_API_KEY` or `claude` CLI authentication)
```bash
# Clone repository
git clone <repo-url>
cd frfr

# Start everything (installs dependencies automatically)
./run.sh
```

The run.sh script will:
- Check dependencies (Go, Node, Python)
- Install the Python PDF extractor
- Install frontend dependencies
- Build and start the Go backend
- Start the frontend dev server
Environment variables:
| Variable | Default | Description |
|---|---|---|
| `FRFR_PORT` | `8080` | Backend server port |
| `FRFR_DATA_DIR` | `~/Documents/frfr/sessions` | Session storage directory |
| `FRFR_MAX_WORKERS` | `20` | Max parallel extraction workers |
| `ANTHROPIC_API_KEY` | - | Claude API key (optional if using CLI auth) |
| `FRFR_PYTHON_PATH` | auto-detect | Python interpreter path |
Each session stores:
```
sessions/{session_id}/
├── metadata.json         # Session metadata & document registry
├── text/                 # Extracted PDF text
│   └── {doc}.txt
├── chunks/               # Source text chunks (for context panel)
│   └── {doc}_chunk_{id}.txt
├── facts/                # Extracted facts per chunk
│   └── {doc}_chunk_{id}.json
└── summaries/            # Document summaries
    └── {doc}.json
```
- `GET /api/sessions` - List all sessions
- `POST /api/sessions` - Create new session
- `GET /api/sessions/{id}` - Get session details
- `DELETE /api/sessions/{id}` - Delete session
- `GET /api/sessions/{id}/documents` - List documents
- `POST /api/sessions/{id}/documents` - Add document
- `POST /api/sessions/{id}/documents/{doc}/reprocess` - Reprocess document
- `POST /api/sessions/{id}/process` - Start processing
- `GET /api/sessions/{id}/process/events` - SSE event stream
- `GET /api/sessions/{id}/facts` - List facts (paginated)
- `GET /api/sessions/{id}/facts/{index}/context` - Get fact with source context
- `POST /api/sessions/{id}/query` - Query facts
- `GET /api/sessions/{id}/query/stream` - SSE query stream with batch progress
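Calling the query endpoint from Go could look like the sketch below. The request and response field names (`question`, `answer`) are assumptions for illustration; check the handlers in `backend/internal/api` for the actual contract:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// queryFacts posts a natural-language question to the query endpoint
// and returns the answer field from the JSON response.
// Field names here are illustrative, not the verified API contract.
func queryFacts(baseURL, sessionID, question string) (string, error) {
	body, err := json.Marshal(map[string]string{"question": question})
	if err != nil {
		return "", err
	}
	url := fmt.Sprintf("%s/api/sessions/%s/query", baseURL, sessionID)
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var out struct {
		Answer string `json:"answer"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Answer, nil
}

func main() {
	answer, err := queryFacts("http://localhost:8080", "abc123", "What retention policies are described?")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(answer)
}
```

For long-running queries, the `/query/stream` SSE endpoint is the better fit, since it reports batch progress as results arrive.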
```bash
# Backend only
cd backend && go run ./cmd/server

# Frontend only (with backend running)
cd frontend && npm run dev

# Build frontend for production
cd frontend && npm run build
```

- Security Audits: "Does this pentest report identify any critical vulnerabilities?"
- Compliance: "Does this SOC2 report implement the controls in this reference spec?"
- Design Review: "Does this architecture doc address the scaling requirements?"
- Governance: "What data retention policies are described in this document?"
The system is designed for high-stakes questions where accuracy matters more than speed.
TBD