
Implement Phase 2 canonical fact validation surfaces #16

Merged
magic-alt merged 1 commit into main from feat/phase2-fact-validation on May 22, 2026


@magic-alt
Owner

Summary

This PR implements the Phase 2 canonical fact schema and fact-validation slice on top of the now-merged Phase 1 benchmark work.

It enriches canonical facts with issuer metadata, adds a persisted fact-validation artifact and API route, keeps the legacy statements and facts surfaces compatible, and updates the frontend API types to match the new evidence model.
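To make the enrichment step concrete, here is a minimal sketch, not the PR's code. It uses stdlib dataclasses in place of the project's Pydantic models, keeps only a few fields, and introduces a hypothetical enrich_fact helper. Each fact copies issuer and filing metadata from its DocumentMeta and stores the source label verbatim as raw_label. As an assumed reading of the report_type/filing_type fallback that the tests exercise, it uses report_type when filing_type is unset.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DocumentMeta:
    # Hypothetical subset of the real model's fields.
    doc_id: str
    company: Optional[str] = None
    ticker: Optional[str] = None
    cik: Optional[str] = None
    filing_type: Optional[str] = None
    report_type: Optional[str] = None  # legacy field (assumed)


@dataclass(frozen=True)
class FinancialFact:
    fact_id: str
    concept: str
    value: float
    raw_label: Optional[str] = None
    company: Optional[str] = None
    ticker: Optional[str] = None
    cik: Optional[str] = None
    filing_type: Optional[str] = None


def enrich_fact(fact: FinancialFact, meta: DocumentMeta, source_label: str) -> FinancialFact:
    """Return a copy of the fact carrying issuer/filing metadata and the verbatim source label."""
    return replace(
        fact,
        raw_label=source_label,
        company=meta.company,
        ticker=meta.ticker,
        cik=meta.cik,
        # Assumed fallback: use the legacy report_type when filing_type is unset.
        filing_type=meta.filing_type or meta.report_type,
    )


# Illustrative values only.
meta = DocumentMeta(doc_id="doc-1", company="Example Corp", ticker="EXMP", report_type="10-K")
fact = enrich_fact(FinancialFact(fact_id="f1", concept="Revenue", value=100.0), meta, "Total net revenue")
print(fact.filing_type, fact.raw_label)  # → 10-K Total net revenue
```

Returning a new frozen instance through dataclasses.replace keeps enrichment free of side effects. That is one reasonable choice when the same raw facts also feed the legacy /facts array.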

What Changed

  • Extend DocumentMeta and FinancialFact with issuer and filing metadata:
    • company
    • ticker
    • cik
    • filing_type
    • raw_label
  • Add fact-validation models and checks for:
    • missing critical facts
    • duplicate concepts
    • period consistency
    • scale and currency consistency
    • balance equation
    • cashflow reconciliation
  • Generate and persist fact_validation.json in the pipeline, including partial-failure persistence
  • Add GET /v1/documents/{doc_id}/fact-validation
  • Keep GET /v1/documents/{doc_id}/facts compatible as the raw facts array
  • Accept richer metadata inputs through API upload and CLI options
  • Update frontend API mirror types and add docsApi.factValidation(docId)
  • Add focused tests for fact enrichment, validation behavior, and the new route
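The missing-fact, balance-equation, and cashflow checks reduce to lookups and simple arithmetic with a tolerance. The sketch below runs them over a flat {concept: value} mapping for a single period. The concept keys, issue codes (other than the general shape), severities, and the 0.5% relative tolerance are all hypothetical; the PR description does not specify them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Issue:
    code: str
    severity: str  # assumed scale: "low" | "medium" | "high"
    message: str


@dataclass
class Result:
    issues: List[Issue] = field(default_factory=list)


# Hypothetical concepts that must be present for a balance sheet.
CRITICAL = ("total_assets", "total_liabilities", "total_equity")


def _close(a: float, b: float, rel_tol: float) -> bool:
    # A relative tolerance absorbs rounding in reported figures.
    return abs(a - b) <= rel_tol * max(abs(a), abs(b), 1.0)


def validate_facts(values: Dict[str, float], rel_tol: float = 0.005) -> Result:
    """Sketch of three fact-level checks over {concept: value} for one period."""
    result = Result()

    missing = [c for c in CRITICAL if c not in values]
    if missing:
        result.issues.append(
            Issue("missing_critical_facts", "high", f"Missing critical facts: {', '.join(missing)}.")
        )

    # Balance equation: assets = liabilities + equity (only when all three exist).
    if not missing:
        assets = values["total_assets"]
        rhs = values["total_liabilities"] + values["total_equity"]
        if not _close(assets, rhs, rel_tol):
            result.issues.append(
                Issue("balance_equation", "high", f"Assets {assets} != liabilities + equity {rhs}.")
            )

    # Cashflow reconciliation: beginning cash + net change = ending cash.
    keys = ("cash_begin", "net_change_in_cash", "cash_end")
    if all(k in values for k in keys):
        expected_end = values["cash_begin"] + values["net_change_in_cash"]
        if not _close(expected_end, values["cash_end"], rel_tol):
            result.issues.append(
                Issue(
                    "cashflow_reconciliation",
                    "medium",
                    f"Beginning cash + net change = {expected_end}, but ending cash is {values['cash_end']}.",
                )
            )
    return result
```

Skipping the balance check when a critical fact is missing avoids reporting the same gap twice.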

Why

Phase 2 in the roadmap shifts Jetbot’s primary output toward facts plus evidence. This PR completes the next backend and API slice of that transition without breaking the existing FinancialStatement and /facts clients.

Validation

  • python -m pytest tests/test_finance_facts.py tests/test_routes_web.py tests/test_validators_extended.py -q
  • cd web && npm run typecheck
  • python -m ruff check src tests
  • python -m mypy src --ignore-missing-imports

Copilot AI review was requested due to automatic review settings on May 22, 2026, 06:45.

Copilot AI left a comment


Pull request overview

Implements the Phase 2 “canonical facts + validation” slice by enriching canonical facts with issuer/filing metadata, producing a persisted fact-validation artifact, and exposing it through a new API route (with matching frontend API types), while keeping the legacy /facts surface intact.

Changes:

  • Extend DocumentMeta and FinancialFact with issuer/filing metadata (company/ticker/cik/filing_type) plus raw_label.
  • Add canonical fact validation (FactValidationResult) and persist/load it as extracted/fact_validation.json, including a new GET /v1/documents/{doc_id}/fact-validation.
  • Update web API mirror types + client (docsApi.factValidation) and add targeted tests for enrichment, validation behavior, and the new route.
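Persisting the artifact even when validation fails partway lets the route return a partial result instead of a bare 404. The following is a minimal sketch of that write/read pair. It assumes the result is a JSON-serializable dict and uses hypothetical helper names and a hypothetical "status"/"error" payload shape; the real pipeline's partial payload is not shown in this PR page.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Optional


def persist_fact_validation(extracted_dir: Path, run_checks: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Run validation and always write extracted/fact_validation.json.

    If the checks raise, a partial payload recording the error is written instead
    (assumed behavior), so a later GET still finds an artifact.
    """
    try:
        payload = run_checks()
        payload.setdefault("status", "complete")
    except Exception as exc:  # keep the pipeline going on partial failure
        payload = {"status": "partial", "issues": [], "error": f"{type(exc).__name__}: {exc}"}

    extracted_dir.mkdir(parents=True, exist_ok=True)
    path = extracted_dir / "fact_validation.json"
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    tmp.replace(path)  # rename over any previous file (atomic on POSIX)
    return payload


def load_fact_validation(extracted_dir: Path) -> Optional[Dict[str, Any]]:
    """Return the persisted result, or None so the route can answer 404."""
    path = extracted_dir / "fact_validation.json"
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

Writing to a temporary file and renaming keeps a reader from ever seeing a half-written JSON file.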

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Summary per file
  • web/src/api/types.ts: Adds issuer/filing fields to FinancialFact/DocumentMeta and introduces FactValidationResult TS mirrors.
  • web/src/api/docs.ts: Adds the docsApi.factValidation(docId) client method.
  • tests/test_validators_extended.py: Adds unit tests for validate_facts (missing critical facts, duplicates, balance equation).
  • tests/test_routes_web.py: Adds route tests for /fact-validation success and 404 behavior.
  • tests/test_finance_facts.py: Updates fact generation tests for enriched metadata and raw_label, including the report_type → filing_type fallback.
  • src/schemas/models.py: Extends the Pydantic models and adds FactValidationIssue/FactValidationResult.
  • src/finance/validators.py: Implements validate_facts and fact-level validation checks/metrics.
  • src/finance/facts.py: Updates fact generation to accept DocumentMeta, propagate metadata, and populate raw_label.
  • src/cli.py: Adds CLI options for ticker/cik/filing_type and persists them in DocumentMeta.
  • src/api/routes.py: Accepts richer upload form metadata, adds the /fact-validation route, and persists partial fact_validation.json.
  • src/agent/state.py: Adds fact_validation_results to agent state.
  • src/agent/nodes.py: Runs validate_facts, persists fact_validation.json, and treats high-severity fact issues as “severe”.
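The agent node treats high-severity fact issues as “severe”. Below is a minimal sketch of such a gate, assuming string severities "low"/"medium"/"high" and hypothetical helper names. How the real node combines this with the existing statement-level validation is not shown on this page, so the OR in combine_severity is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FactIssue:
    code: str
    severity: str  # assumed values: "low", "medium", "high"


_RANK = {"low": 0, "medium": 1, "high": 2}


def has_severe_fact_issues(issues: Iterable[FactIssue], threshold: str = "high") -> bool:
    """Return True if any issue's severity meets or exceeds the threshold."""
    return any(_RANK.get(issue.severity, 0) >= _RANK[threshold] for issue in issues)


def combine_severity(statement_severe: bool, fact_issues: Iterable[FactIssue]) -> bool:
    # Assumed rule: a run is severe if either the legacy statement checks
    # or the new fact-level checks report a high-severity problem.
    return statement_severe or has_severe_fact_issues(fact_issues)
```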


Comment thread src/finance/validators.py
Comment on lines +181 to +189
result.issues.append(
    FactValidationIssue(
        code="duplicate_concepts",
        severity="medium",
        message=(
            f"Duplicate canonical facts detected for {group[0].statement_type}:{_canonical_concept(group[0].concept)}."
        ),
        fact_ids=[fact.fact_id for fact in group],
        concepts=[group[0].concept],
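The hunk above shows only how the issue is built. The following self-contained sketch shows the grouping around it. It assumes facts are grouped by statement type, canonical concept, and period end; the period in the key is an assumption. It also uses a hypothetical _canonical_concept that strips a namespace prefix and lowercases, since the real helper's rules are not visible in this thread.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Fact:
    # Stripped-down stand-in for FinancialFact.
    fact_id: str
    statement_type: str
    concept: str
    period_end: str


@dataclass
class FactValidationIssue:
    code: str
    severity: str
    message: str
    fact_ids: List[str] = field(default_factory=list)
    concepts: List[str] = field(default_factory=list)


def _canonical_concept(concept: str) -> str:
    # Hypothetical normalization: drop a "prefix:" namespace, compare case-insensitively.
    return concept.split(":")[-1].strip().lower()


def find_duplicate_concepts(facts: List[Fact]) -> List[FactValidationIssue]:
    groups: Dict[Tuple[str, str, str], List[Fact]] = defaultdict(list)
    for fact in facts:
        key = (fact.statement_type, _canonical_concept(fact.concept), fact.period_end)
        groups[key].append(fact)

    issues: List[FactValidationIssue] = []
    for group in groups.values():
        if len(group) < 2:
            continue  # a single fact per key is not a duplicate
        issues.append(
            FactValidationIssue(
                code="duplicate_concepts",
                severity="medium",
                message=(
                    "Duplicate canonical facts detected for "
                    f"{group[0].statement_type}:{_canonical_concept(group[0].concept)}."
                ),
                fact_ids=[fact.fact_id for fact in group],
                concepts=[group[0].concept],
            )
        )
    return issues
```

Including the period in the key keeps multi-period filings, where the same concept legitimately appears once per period, from being flagged.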

magic-alt merged commit 080271e into main on May 22, 2026
7 checks passed


2 participants