OpenContracts (Demo)

The open source platform for building knowledge bases that humans and AI agents can work with together.



Most knowledge lives in documents. Contracts, regulations, research papers, policies — the stuff that governs how organizations actually work. That knowledge is usually trapped: locked in PDFs, scattered across drives, understood fully by a handful of people who happened to read the right things at the right time.

OpenContracts started in 2019 with a simple conviction: that knowledge needed to be carefully curated, and that machine learning systems were only as good as the data underneath them. It was built as a platform for human collaborators — lawyers, researchers, analysts — to annotate documents together and produce gold-standard training data.

Those collaborators mostly never came. The platform was too early, the problem too niche, the value too invisible.

Then large language models arrived, and the world suddenly needed exactly what OpenContracts had been building all along: structured, annotated, version-controlled knowledge bases that AI could actually reason over. The collaborators the platform was designed for finally showed up — they just turned out to be AI agents.

Today, OpenContracts is a self-hosted platform where teams build knowledge bases from their documents and where AI agents work alongside humans to search, analyze, and extend that knowledge. The core conviction hasn't changed. The best AI systems still need carefully curated data. The difference is that now, the curation and the AI happen in the same place.

AI Agents: Configurable assistants that search, annotate, and reason over your knowledge base
MCP Server: Expose your corpus to Claude, Cursor, and any MCP-compatible AI tool
Multimodal Search: Vector embeddings and full-text search across documents and annotations
Collaboration: Threaded discussions, @mentions, voting, and moderation at every level
Data Extract: Structured extraction across hundreds of documents with LLM-powered queries
Format Preservation: PDF layout fidelity with precise text-to-coordinate mapping via PAWLS

What Makes This Different

Human Knowledge as the Foundation

This is not another "chat with your PDFs" tool. OpenContracts treats human annotation as the ground truth. Teams define custom label schemas, annotate documents with precise selections (including multi-page spans), and map relationships between concepts. AI builds on top of that work — it doesn't replace it.

Knowledge Bases, Not File Cabinets

Documents are organized into corpuses — version-controlled collections with folder hierarchies, fine-grained permissions, and full history. Fork a public corpus to build on someone else's annotations. Restore any previous version. Every change is tracked.

This is git for knowledge: you can branch, build, share, and never lose work.

AI Agents That Work With What You've Built

Configurable AI agents can search your documents, query your annotations, and participate in discussions — all grounded in the structured knowledge your team has created. They don't hallucinate in a vacuum; they reason over real, curated data.

@mention an agent in a discussion thread. Ask it to compare clauses across a hundred contracts. Let it surface patterns your team annotated last quarter. The agent's power comes from the quality of the knowledge base underneath it.

Collaboration Where the Knowledge Lives

Forum-style threaded discussions at every level — global, per-corpus, per-document. @mention documents, corpuses, and AI agents. Upvote the best analysis. Pin critical findings. The conversation happens next to the source material, not in a separate tool.

Shared Knowledge Compounds

Make a corpus public. Others fork it, refine the annotations, add documents, and share their improvements. Leaderboards and badges recognize contributors. Analytics show which knowledge bases are gaining traction and where the community is most active.

This is the DRY principle applied to institutional knowledge: annotate once, build on it forever.

See it in Action

PDF Annotation Flow

Text Format Support

Quick Start

Development

git clone https://github.com/JSv4/OpenContracts.git
cd OpenContracts
docker compose -f local.yml up

Production

# Apply database migrations first
docker compose -f production.yml --profile migrate up migrate

# Start services
docker compose -f production.yml up -d

Documentation

Browse the full documentation at jsv4.github.io/OpenContracts or in the repo:

Guide	Description
Quick Start	Get running with Docker in minutes
Key Concepts	Core workflows and terminology
PDF Data Format	How text maps to PDF coordinates
LLM Framework	PydanticAI integration and agents
Vector Stores	Semantic search architecture
Pipeline Overview	Parser and embedder system
Custom Extractors	Build your own data extraction tasks
v3.0.0.b3 Release Notes	Latest features and migration guide

Architecture

Data Format

OpenContracts uses a standardized format for representing text and layout on PDF pages, enabling portable annotations across tools.
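To make the idea of text-to-coordinate mapping concrete, here is a minimal sketch in Python. It assumes a PAWLS-style layout in which each page records its dimensions and a list of tokens, each with a position, a size, and its text. The class names, field names, and the `span_to_box` helper are illustrative assumptions for this sketch, not the project's actual schema or API; the PDF Data Format guide is the authoritative description.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One word-like unit on a page (hypothetical field names)."""
    x: float
    y: float
    width: float
    height: float
    text: str


@dataclass
class Page:
    """A page with its size and its tokens in reading order (hypothetical)."""
    index: int
    width: float
    height: float
    tokens: list


def page_text_with_offsets(page):
    """Join tokens with single spaces and return (text, per-token char spans)."""
    parts, spans, cursor = [], [], 0
    for tok in page.tokens:
        if parts:
            cursor += 1  # account for the joining space
        spans.append((cursor, cursor + len(tok.text)))
        parts.append(tok.text)
        cursor += len(tok.text)
    return " ".join(parts), spans


def span_to_box(page, start, end):
    """Return (left, top, right, bottom) covering tokens that overlap [start, end)."""
    _, spans = page_text_with_offsets(page)
    hits = [t for t, (s, e) in zip(page.tokens, spans) if s < end and e > start]
    if not hits:
        return None
    return (
        min(t.x for t in hits),
        min(t.y for t in hits),
        max(t.x + t.width for t in hits),
        max(t.y + t.height for t in hits),
    )


page = Page(index=0, width=612, height=792, tokens=[
    Token(72, 100, 40, 12, "Governing"),
    Token(116, 100, 20, 12, "Law"),
])
text, spans = page_text_with_offsets(page)
box = span_to_box(page, 10, 13)  # character span of "Law" in the joined text
```

Because every token carries its own coordinates, an annotation stored as a character span can be redrawn on the page, and one stored as a box can be resolved back to text, which is what makes annotations portable in this kind of format.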

Processing Pipeline

The modular pipeline supports custom parsers, embedders, and thumbnail generators.

Each component inherits from a base class with a defined interface:

Parsers — Extract text and structure from documents
Embedders — Generate vector embeddings for search
Thumbnailers — Create document previews

See the pipeline documentation for details on creating custom components.
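As a rough illustration of the base-class pattern described above, the sketch below defines an abstract embedder and one toy subclass. The names `BaseEmbedder`, `embed_text`, `vector_size`, and `CharHistogramEmbedder` are hypothetical and chosen for this example only; the real base classes and their method signatures are defined in the pipeline documentation and may differ.

```python
from abc import ABC, abstractmethod


class BaseEmbedder(ABC):
    """Hypothetical base class: subclasses must implement embed_text."""

    vector_size: int = 0

    @abstractmethod
    def embed_text(self, text: str) -> list:
        """Return a fixed-length vector for the given text."""


class CharHistogramEmbedder(BaseEmbedder):
    """Toy embedder: normalized letter frequencies (not a real model)."""

    vector_size = 26

    def embed_text(self, text: str) -> list:
        counts = [0.0] * self.vector_size
        for ch in text.lower():
            if "a" <= ch <= "z":
                counts[ord(ch) - ord("a")] += 1.0
        total = sum(counts)
        # Return all zeros for text with no letters, otherwise frequencies.
        return [c / total for c in counts] if total else counts


embedder = CharHistogramEmbedder()
vector = embedder.embed_text("abba")
```

The abstract base class makes the contract explicit: Python refuses to instantiate a subclass that has not implemented `embed_text`, so a misconfigured component fails early rather than at search time.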

Telemetry

OpenContracts collects anonymous usage data to guide development priorities: installation events, feature usage statistics, and aggregate counts. We do not collect document contents, extracted data, user identities, or query contents.

Disable backend telemetry: Set TELEMETRY_ENABLED=False in your Django settings.
Disable frontend analytics: Leave REACT_APP_POSTHOG_API_KEY unset in frontend/public/env-config.js.
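One possible way to wire the backend switch is to derive the setting from an environment variable in a Django settings module. This is a sketch under stated assumptions: the `env_bool` helper is hypothetical, and the default of `True` when the variable is missing is an assumption of this example, not confirmed project behavior. Assigning `TELEMETRY_ENABLED = False` directly in settings is the simplest form.

```python
import os


def env_bool(name, default):
    """Hypothetical helper: read a boolean flag from the environment."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# In a Django settings module (default shown here is an assumption):
TELEMETRY_ENABLED = env_bool("TELEMETRY_ENABLED", default=True)
```

With this in place, starting the backend with `TELEMETRY_ENABLED=False` in its environment would turn the setting off without editing the settings file.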

Supported Formats

PDF (full layout and annotation support)
Text-based formats (plaintext, Markdown)

Coming soon: DOCX viewing and annotation powered by Docxodus.

Acknowledgements

This project builds on work from:

AllenAI PAWLS — PDF annotation data format and concepts
NLMatics nlm-ingestor — Document parsing pipeline

License

AGPL-3.0 — See LICENSE for details.

