
block/frfr

Frfr

High-confidence document Q&A system using LLM fact extraction with source verification.

Quick Links: Design Doc | All Docs

Quick Start

# Start the application
./run.sh

# Open in browser (macOS; on Linux use xdg-open)
open http://localhost:3000

The web UI provides:

  • Session Browser: View and manage all sessions with document counts and fact statistics
  • Document Processing: Upload PDFs and extract facts with real-time progress visualization
  • Facts Browser: Filter and search through extracted facts
  • Query Interface: Ask natural language questions with parallel batch processing and source citations

Architecture

Frfr uses a Go backend with a React frontend:

┌─────────────────────────────────────────┐
│         Web Interface (React)           │
│  - Session management                   │
│  - Document upload & processing         │
│  - Facts browser with search            │
│  - Query UI with source context panel   │
└────────────────┬────────────────────────┘
                 │ REST API + SSE
                 ▼
┌─────────────────────────────────────────┐
│         Go Backend Server               │
│  - Session & document management        │
│  - Claude API integration               │
│  - Parallel fact extraction             │
│  - Query processing with citations      │
└────────────────┬────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────┐
│      Python PDF Extractor               │
│  - pdfplumber / PyMuPDF extraction      │
│  - Handles text & scanned PDFs          │
└─────────────────────────────────────────┘

Features

Document Processing

  • PDF text extraction (pdfplumber with PyMuPDF fallback)
  • Adaptive chunking with semantic boundaries
  • Parallel fact extraction (20 concurrent workers by default, set via FRFR_MAX_WORKERS)
  • Real-time progress visualization with chunk grid
  • Automatic session management

Fact Extraction

  • Claude-powered extraction with structured output
  • 8 metadata fields per fact (type, control family, entities, etc.)
  • Multiple evidence quotes per fact
  • Specificity scoring and quality filtering
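To make the fact shape concrete, here is a hypothetical Go model covering only the fields the README names (type, control family, entities), plus evidence quotes and a specificity score, with a simple quality filter. The JSON keys, the omitted remaining metadata fields, and the 0 to 1 specificity scale are assumptions, not the project's actual schema.

```go
package main

import "encoding/json"

// Fact is a HYPOTHETICAL shape for one extracted fact. Field names, JSON
// keys, and the specificity scale are illustrative assumptions.
type Fact struct {
	Claim          string   `json:"claim"`
	FactType       string   `json:"fact_type"`
	ControlFamily  string   `json:"control_family"`
	Entities       []string `json:"entities"`
	EvidenceQuotes []string `json:"evidence_quotes"`
	Specificity    float64  `json:"specificity"`
}

// parseFacts decodes a per-chunk facts file, assumed to hold a JSON array.
func parseFacts(data []byte) ([]Fact, error) {
	var facts []Fact
	err := json.Unmarshal(data, &facts)
	return facts, err
}

// filterFacts keeps facts that cite at least one evidence quote and meet a
// minimum specificity score; vague, unsupported claims are dropped.
func filterFacts(facts []Fact, minSpecificity float64) []Fact {
	var kept []Fact
	for _, f := range facts {
		if len(f.EvidenceQuotes) > 0 && f.Specificity >= minSpecificity {
			kept = append(kept, f)
		}
	}
	return kept
}
```

Requiring a quote alongside a score threshold is one way to tie every retained fact back to verifiable source text.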

Query System

  • Natural language questions over extracted facts
  • Parallel batch processing (150 facts per batch)
  • Live progress streaming via SSE
  • Clickable source citations
  • Source context panel with quote highlighting
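The batch step can be sketched as follows: facts are cut into batches of 150, and the batches are processed concurrently, with a counting semaphore bounding how many run at once. The per-batch callback stands in for a hypothetical model call that answers the question over one batch. The function names and the concurrency bound are assumptions.

```go
package main

import "sync"

// batchSize mirrors the 150-facts-per-batch figure in the README.
const batchSize = 150

// splitBatches cuts items into consecutive slices of at most size elements.
func splitBatches[T any](items []T, size int) [][]T {
	var batches [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[start:end])
	}
	return batches
}

// processBatches runs fn on every batch with at most maxParallel calls in
// flight and returns results in batch order.
func processBatches[T, R any](batches [][]T, maxParallel int, fn func([]T) R) []R {
	results := make([]R, len(batches))
	sem := make(chan struct{}, maxParallel) // counting semaphore
	var wg sync.WaitGroup
	for i, b := range batches {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxParallel calls are running
		go func(i int, b []T) {
			defer wg.Done()
			defer func() { <-sem }()
			results[i] = fn(b) // each goroutine writes its own index
		}(i, b)
	}
	wg.Wait()
	return results
}
```

Returning results by batch index, rather than in completion order, keeps any downstream merging of answers and citations deterministic.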

Project Structure

frfr/
├── backend/                    # Go backend server
│   ├── cmd/server/            # Server entrypoint
│   └── internal/
│       ├── api/               # REST API handlers
│       ├── config/            # Configuration
│       ├── domain/models/     # Data models
│       └── services/          # Business logic
│           ├── claude/        # Claude API client
│           ├── extraction/    # Fact extraction
│           ├── pdf/           # PDF extraction (calls Python)
│           ├── query/         # Query processing
│           ├── session/       # Session management
│           └── validation/    # Fact validation
├── frontend/                   # React + TypeScript frontend
│   └── src/
│       ├── api/               # API client
│       ├── components/        # React components
│       └── pages/             # Page components
├── python/                     # Python PDF extractor module
│   └── frfr_pdf/
├── run.sh                      # Start script
└── docs/                       # Documentation

Prerequisites

  • Go 1.21+
  • Node.js 18+
  • Python 3.10+
  • Claude API access (via ANTHROPIC_API_KEY or claude CLI authentication)

Installation

# Clone repository
git clone <repo-url>
cd frfr

# Start everything (installs dependencies automatically)
./run.sh

The run.sh script will:

  1. Check dependencies (Go, Node, Python)
  2. Install the Python PDF extractor
  3. Install frontend dependencies
  4. Build and start the Go backend
  5. Start the frontend dev server

Configuration

Environment variables:

Variable            Default                     Description
FRFR_PORT           8080                        Backend server port
FRFR_DATA_DIR       ~/Documents/frfr/sessions   Session storage directory
FRFR_MAX_WORKERS    20                          Max parallel extraction workers
ANTHROPIC_API_KEY   -                           Claude API key (optional if using CLI auth)
FRFR_PYTHON_PATH    auto-detect                 Python interpreter path

Session Structure

Each session stores:

sessions/{session_id}/
├── metadata.json      # Session metadata & document registry
├── text/              # Extracted PDF text
│   └── {doc}.txt
├── chunks/            # Source text chunks (for context panel)
│   └── {doc}_chunk_{id}.txt
├── facts/             # Extracted facts per chunk
│   └── {doc}_chunk_{id}.json
└── summaries/         # Document summaries
    └── {doc}.json

API Endpoints

Sessions

  • GET /api/sessions - List all sessions
  • POST /api/sessions - Create new session
  • GET /api/sessions/{id} - Get session details
  • DELETE /api/sessions/{id} - Delete session

Documents

  • GET /api/sessions/{id}/documents - List documents
  • POST /api/sessions/{id}/documents - Add document
  • POST /api/sessions/{id}/documents/{doc}/reprocess - Reprocess document

Processing

  • POST /api/sessions/{id}/process - Start processing
  • GET /api/sessions/{id}/process/events - SSE event stream

Facts

  • GET /api/sessions/{id}/facts - List facts (paginated)
  • GET /api/sessions/{id}/facts/{index}/context - Get fact with source context

Query

  • POST /api/sessions/{id}/query - Query facts
  • GET /api/sessions/{id}/query/stream - SSE query stream with batch progress

Development

# Backend only
cd backend && go run ./cmd/server

# Frontend only (with backend running)
cd frontend && npm run dev

# Build frontend for production
cd frontend && npm run build

Use Cases

  • Security Audits: "Does this pentest report identify any critical vulnerabilities?"
  • Compliance: "Does this SOC2 report implement the controls in this reference spec?"
  • Design Review: "Does this architecture doc address the scaling requirements?"
  • Governance: "What data retention policies are described in this document?"

The system is designed for high-stakes questions where accuracy matters more than speed.

License

TBD
