GovIntel

GovIntel is a local-first federal procurement intelligence system. It imports USAspending contract awards, stores them in PostgreSQL, builds a local Chroma retrieval index, and generates citation-grounded market intelligence briefs through a FastAPI service and Streamlit UI.

The default workflow is intentionally practical: public contract data, local storage, hybrid retrieval, SQL analytics, Gemini-backed report generation, and fail-closed citation validation.

Demo

Watch the walkthrough on YouTube or open the repository MP4.

The walkthrough shows the Streamlit UI as a user selects filters, enters a DHS cybersecurity market question, generates a grounded brief, and reviews the cited contract evidence.

Example Analysis Runs

Use the UI or the /api/v1/analyze endpoint for questions like:

| Question | Useful filters | What GovIntel returns |
| --- | --- | --- |
| Who are the top DHS cybersecurity contractors by total award value? | Agency `DHS`, NAICS `541512`, 3 years | Ranked contractors, spend context, cited awards, and citation validation metadata. |
| Who leads DoD artificial intelligence and data platform awards? | Agency `DoD`, NAICS `541512`, 3 years | Mission-data platform brief with leading vendors, spend signals, and cited awards. |
| Which GSA cloud marketplace awards indicate growing demand? | Agency `GSA`, NAICS `541512`, 3 years | Cloud-demand brief with marketplace leaders, award evidence, and trace metadata. |

Capabilities

Async USAspending.gov award ingestion with pagination and idempotent PostgreSQL upserts.
Typed Pydantic models for awards, analysis requests, contractor summaries, retrieved evidence, and generated briefs.
Chroma-backed vector indexing with sentence-transformer embeddings.
BM25 keyword retrieval, vector retrieval, hybrid merge, and cross-encoder reranking.
SQL analytics for top contractors, quarterly spend trends, and market concentration.
Versioned prompt templates and structured JSON generation.
Citation validation that rejects unsupported citations before returning a brief.
FastAPI /api/v1/analyze endpoint with optional X-API-Key protection.
Streamlit UI for choosing filters, generating briefs, and inspecting cited contract evidence.
Docker Compose stack for PostgreSQL, the API, and the UI.
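
The citation validation capability listed above is described as rejecting unsupported citations before a brief is returned. The following is a minimal sketch of what a fail-closed check of that kind could look like, assuming a brief carries a list of cited award IDs and the retrieved evidence is identified by `award_id`. The names `BriefDraft`, `validate_citations`, and `UnsupportedCitationError` are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass


class UnsupportedCitationError(ValueError):
    """Raised when a brief cites an award that was not in the retrieved evidence."""


@dataclass(frozen=True)
class BriefDraft:
    # Hypothetical shape; the project's real brief model is a Pydantic model.
    summary: str
    cited_award_ids: tuple[str, ...]


def validate_citations(draft: BriefDraft, evidence_award_ids: set[str]) -> BriefDraft:
    """Return the draft unchanged only if every citation is backed by evidence."""
    if not draft.cited_award_ids:
        # Assumption: a brief with no citations at all is also rejected.
        raise UnsupportedCitationError("brief contains no citations")
    unsupported = sorted(set(draft.cited_award_ids) - evidence_award_ids)
    if unsupported:
        # Fail closed: refuse the brief rather than silently dropping citations.
        raise UnsupportedCitationError(f"unsupported citations: {unsupported}")
    return draft
```

Raising instead of filtering is the design choice that makes the check fail closed: a caller never receives a brief whose citations were quietly edited.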

Optional extension hooks are included for Langfuse tracing, Pinecone mirroring, Hugging Face-hosted generation against a separately served model, and offline training/evaluation utilities. They are not required for the core workflow.

Architecture

See docs/architecture.md for the technical deep dive.

flowchart LR
    A[USAspending.gov API] --> B[Ingestion CLI]
    B --> C[PostgreSQL contracts]
    C --> D[Indexing CLI]
    D --> E[Chroma vector index]
    C --> F[BM25 corpus]
    E --> G[Hybrid retrieval]
    F --> G
    G --> H[Reranker]
    C --> I[SQL analytics]
    H --> J[Prompt context]
    I --> J
    J --> K[Generation provider]
    K --> L[Structured brief draft]
    L --> M[Citation validation]
    M --> N[FastAPI + Streamlit]
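
The diagram merges Chroma vector hits and BM25 hits into a hybrid candidate list before reranking, but it does not say how the two rankings are combined. Reciprocal rank fusion (RRF) is one common way to do that, sketched below as an assumption rather than as the project's actual merge logic. The function name, the constant `k=60`, and the sample award IDs are illustrative only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_n: int = 10
) -> list[str]:
    """Fuse several ranked lists of award IDs into one ranked list.

    Each ID scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, award_id in enumerate(ranking, start=1):
            scores[award_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    fused = sorted(scores, key=lambda award_id: (-scores[award_id], award_id))
    return fused[:top_n]


bm25_hits = ["A3", "A1", "A7"]    # hypothetical keyword ranking
vector_hits = ["A1", "A9", "A3"]  # hypothetical embedding ranking
candidates = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

RRF only needs rank positions, so it avoids having to normalize BM25 scores against embedding distances. The fused list would then be handed to the reranker.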

Tech Stack

Python 3.10+
FastAPI, Uvicorn, and Streamlit
PostgreSQL 16, SQLAlchemy asyncio, and asyncpg
ChromaDB and sentence-transformers
rank-bm25 and cross-encoder reranking
Pydantic v2 and pydantic-settings
Jinja2 and PyYAML for prompt templates
pytest, pytest-cov, Ruff, and mypy

Repository Layout

docs/          Public architecture notes
eval/          Evaluation query fixtures and gold answers
prompts/       Versioned prompt templates
scripts/       Training helper entry points
src/govintel/
  api/          FastAPI app, routes, and dependencies
  ingestion/    USAspending import, loading, embeddings, and indexing
  retrieval/    BM25, vector search, hybrid retrieval, and reranking
  analysis/     SQL analytics for contractor rankings, trends, and HHI
  generation/   Prompt loading, LLM clients, reports, and citations
  frontend/     Streamlit app and API client helpers
  evaluation/   Offline evaluation runner and metrics
  training/     Offline synthetic-data and QLoRA utilities
  models.py     Shared domain models
tests/          Unit and integration-style test coverage
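
The analysis package is described as covering HHI, the Herfindahl-Hirschman Index, which measures market concentration as the sum of squared market shares. The repository computes its analytics in SQL; the plain-Python sketch below only illustrates the arithmetic, assuming shares are derived from summed award amounts per recipient. The function name and input shape are hypothetical.

```python
from collections import defaultdict


def herfindahl_hirschman_index(awards: list[tuple[str, float]]) -> float:
    """Compute HHI on the 0 to 10,000 scale from (recipient, award_amount) pairs."""
    totals: dict[str, float] = defaultdict(float)
    for recipient, amount in awards:
        totals[recipient] += amount
    market_total = sum(totals.values())
    if market_total <= 0:
        # No positive spend in scope, so concentration is undefined; report 0.
        return 0.0
    # Each recipient's share in percent, squared, then summed.
    return sum((100.0 * total / market_total) ** 2 for total in totals.values())
```

A single recipient holding the whole market yields 10,000, and two recipients splitting it evenly yield 5,000.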

Quick Start

Prerequisites

Python 3.10 or newer
Docker and Docker Compose
Network access to the public USAspending.gov API
Gemini API key for live brief generation

Install

git clone <repository-url>
cd GovtIntel
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Configure

cp .env.example .env

For the normal local workflow, set:

EXTERNAL_PROVIDERS_ENABLED=true
GENERATION_PROVIDER=gemini
GEMINI_API_KEY=<your-gemini-api-key>

The default ingestion scope imports a bounded USAspending award slice for NAICS 541512. Set APP_API_KEY if you want /api/v1/analyze to require X-API-Key.
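
The optional `APP_API_KEY` behavior described above amounts to a small rule: leave the endpoint open when no key is configured, and otherwise require a matching `X-API-Key` header. The sketch below shows that rule in framework-agnostic form. It is an assumption about the behavior, not the repository's FastAPI dependency, and `is_request_authorized` is a hypothetical name.

```python
import hmac
from typing import Optional


def is_request_authorized(
    configured_key: Optional[str], presented_key: Optional[str]
) -> bool:
    """Allow every request when no key is configured; otherwise require a match."""
    if not configured_key:
        return True  # APP_API_KEY unset or empty: the endpoint stays open
    if presented_key is None:
        return False  # key required but no X-API-Key header was sent
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(configured_key.encode(), presented_key.encode())
```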

Run Locally

Start PostgreSQL:

make db-up

Import contract data and build the retrieval index:

make db-seed
make index

Run the API:

make run

In a second shell, run the Streamlit UI:

make ui

Open the UI at http://127.0.0.1:8501. The API health check is available at http://127.0.0.1:8000/api/v1/health.

API Example

curl -X POST http://127.0.0.1:8000/api/v1/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Who are the top DHS cybersecurity contractors?",
    "agency_filter": "DHS",
    "naics_filter": "541512",
    "date_range_years": 3,
    "generation_provider": "gemini"
  }'

When APP_API_KEY is set, include -H "X-API-Key: <key>".
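
The same request can be issued from Python using only the standard library. This sketch mirrors the curl call above, including the optional `X-API-Key` header. The helper names are hypothetical, and the shape of the JSON response is not documented here, so it is returned as a plain dictionary.

```python
import json
import urllib.request
from typing import Optional

API_URL = "http://127.0.0.1:8000/api/v1/analyze"


def build_analyze_request(
    question: str, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build the POST request with the same fields as the curl example."""
    payload = {
        "question": question,
        "agency_filter": "DHS",
        "naics_filter": "541512",
        "date_range_years": 3,
        "generation_provider": "gemini",
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key  # only needed when APP_API_KEY is set
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def analyze(question: str, api_key: Optional[str] = None) -> dict:
    """Send the request and decode the JSON body; requires the API to be running."""
    request = build_analyze_request(question, api_key)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)
```

With the API running locally, `analyze("Who are the top DHS cybersecurity contractors?")` would return the decoded brief.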

Docker Compose

Build and start PostgreSQL, the API, and the Streamlit UI:

docker compose up --build

Then seed and index contract data through the API image:

docker compose run --rm api python -m govintel.ingestion.bootstrap
docker compose run --rm api python -m govintel.ingestion.index

The UI runs at http://127.0.0.1:8501.

Evaluation And Quality Checks

Run the offline evaluation harness:

EXTERNAL_PROVIDERS_ENABLED=false \
python3 -m govintel.evaluation.run_ablation \
  --output eval/results/latest.json

Run the main quality checks:

pytest -q --cov=govintel --cov-report=term-missing --cov-fail-under=90
ruff check src/ tests/
mypy src/
docker compose config --no-interpolate

Optional Extensions

These paths are implemented but not needed for the default workflow:

Langfuse tracing with conservative prompt/context redaction.
Pinecone mirroring for managed vector search.
Hugging Face-hosted generation when HF_MODEL_ID points to a servable model or endpoint and HF_INFERENCE_ENABLED=true.
Offline synthetic training-data generation and QLoRA launcher utilities.

Generated training data, model artifacts, local eval results, and local notes are ignored by Git.

Data Model

The primary persisted table is contracts, keyed by award_id. Each row captures the recipient, awarding agency, award amount, performance dates, NAICS code, description, place of performance, and award type. Inserts use ON CONFLICT upserts so repeated ingestion runs refresh existing records without duplicating awards.
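
The idempotent upsert described above can be illustrated with a self-contained example. This sketch swaps in SQLite (3.24 or newer) for PostgreSQL so it runs without a database server; both accept the same `ON CONFLICT ... DO UPDATE` clause for this case. The three-column table is a hypothetical subset of the real `contracts` schema, and the award values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contracts ("
    "award_id TEXT PRIMARY KEY, recipient TEXT, award_amount REAL)"
)

UPSERT = """
INSERT INTO contracts (award_id, recipient, award_amount)
VALUES (?, ?, ?)
ON CONFLICT (award_id) DO UPDATE SET
    recipient = excluded.recipient,
    award_amount = excluded.award_amount
"""


def upsert_awards(rows: list[tuple[str, str, float]]) -> None:
    """Insert new awards and refresh existing ones in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)


# Running ingestion twice for the same award refreshes the row, not duplicates it.
upsert_awards([("AWD-1", "Acme Federal", 100.0)])
upsert_awards([("AWD-1", "Acme Federal", 250.0)])
row_count = conn.execute("SELECT COUNT(*) FROM contracts").fetchone()[0]
latest_amount = conn.execute(
    "SELECT award_amount FROM contracts WHERE award_id = 'AWD-1'"
).fetchone()[0]
```

After both calls the table holds one row for `AWD-1` carrying the most recent amount.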

License

MIT
