Add financial fact foundation and roadmap by magic-alt · Pull Request #12 · magic-alt/jetbot

magic-alt · 2026-05-22T03:20:22Z

Summary

This PR starts the Filing-to-Model roadmap by adding the first slice of the financial fact foundation.

It introduces a canonical fact layer, richer evidence metadata, a report-producing evaluation runner, a facts API endpoint, and a detailed local roadmap document for the next implementation phases.

What Changed

Added FinancialFact, ExtractionTrace, and Correction schemas.
Extended SourceRef and TableCell with richer evidence fields such as row, col, bbox, engine, and artifact_path.
Added src/finance/facts.py to convert current FinancialStatement outputs into canonical facts and apply corrections.
Wired facts into AgentState, pipeline finalization, and partial-failure persistence.
Added GET /v1/documents/{doc_id}/facts.
Extended frontend API types and client wrappers for facts and richer evidence metadata.
Upgraded scripts/eval.py into a golden evaluation runner that produces JSON and Markdown reports.
Added tests for fact conversion, metrics, eval reporting, and facts API behavior.
Added a detailed roadmap document at docs/financial_fact_platform_roadmap.md.

Why

Jetbot already has a functional PDF analysis pipeline, but the next bottleneck is trust and workflow fit: exact facts, evidence traceability, reviewability, and exportability.

This PR establishes the data model and evaluation base needed for the next phases:

evidence review UI
corrections and audit log
facts export
table extraction router
SEC/XBRL/HTML ingestion

Validation

python -m ruff check src tests scripts
python -m mypy src --ignore-missing-imports
python -m pytest -q --timeout=60
python scripts/eval.py --output-dir data/eval-dev
cd web && npm run lint
cd web && npm run typecheck
cd web && npm run build

Follow-up

Follow-up PRs should focus on:

correction APIs and effective facts
evidence review UI and PDF bbox highlighting
facts export endpoints
benchmark manifest and eval thresholds
table extraction router

Copilot

Pull request overview

This PR lays the groundwork for a canonical “financial facts” layer in Jetbot, expanding evidence metadata, adding fact conversion/corrections utilities, persisting facts through the pipeline, exposing a new facts API endpoint, and upgrading evaluation tooling to produce machine-readable reports.

Changes:

Introduces new canonical schemas (FinancialFact, ExtractionTrace, Correction) and enriches evidence fields on SourceRef and TableCell.
Adds a fact conversion layer (facts_from_statements) + correction application, wires facts into agent state/finalization, and exposes GET /v1/documents/{doc_id}/facts.
Upgrades scripts/eval.py into a golden-case runner that outputs JSON + Markdown reports, plus adds/extends tests for facts, metrics, eval reporting, and the new endpoint.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.


| File | Description |
| --- | --- |
| web/src/api/types.ts | Adds `FinancialFact` type and extends evidence-related types for richer provenance metadata. |
| web/src/api/docs.ts | Adds `docsApi.facts()` and normalizes richer `SourceRef` fields in existing normalizers. |
| src/schemas/models.py | Adds canonical fact/correction/trace schemas; extends `SourceRef` and `TableCell` evidence fields. |
| src/finance/facts.py | Implements statement→facts conversion, correction application, fact ID generation, and scale inference. |
| src/agent/state.py | Extends `AgentState` to carry facts, corrections, and extraction traces. |
| src/agent/nodes.py | Persists facts/corrections/traces on finalize; auto-generates facts from statements if missing. |
| src/api/routes.py | Adds `GET /documents/{doc_id}/facts` and persists partial facts on failure. |
| src/utils/metrics.py | Adds fact-level metrics and aggregates them into golden metrics. |
| scripts/eval.py | Reworks eval script into a runner that executes golden cases and writes JSON/MD reports. |
| tests/test_finance_facts.py | Adds unit tests for fact conversion and correction application. |
| tests/test_metrics.py | Adds unit tests for fact-level metrics. |
| tests/test_eval_script.py | Adds tests ensuring eval report writer produces JSON/MD and marks failures correctly. |
| tests/test_routes_web.py | Adds API tests for facts endpoint success and missing-file 404. |
| docs/financial_fact_platform_roadmap.md | Adds detailed roadmap document for the fact platform implementation phases. |
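
For readers who want a feel for what "writes JSON/MD reports and marks failures" might look like, here is a minimal sketch. It is not the code in scripts/eval.py: `CaseResult`, `write_reports`, the `report.json` / `report.md` file names, and the report fields are all hypothetical and chosen only for illustration.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class CaseResult:  # hypothetical record for one golden case
    case_id: str
    passed: bool
    detail: str = ""


def write_reports(results: list[CaseResult], output_dir: Path) -> tuple[Path, Path]:
    """Write a machine-readable JSON report and a human-readable Markdown one."""
    output_dir.mkdir(parents=True, exist_ok=True)
    failed = [r for r in results if not r.passed]
    summary = {
        "total": len(results),
        "failed": len(failed),
        # The whole run is marked failed as soon as one case fails.
        "status": "failed" if failed else "passed",
        "cases": [asdict(r) for r in results],
    }
    json_path = output_dir / "report.json"
    json_path.write_text(json.dumps(summary, indent=2), encoding="utf-8")

    lines = [
        f"# Eval report: {summary['status']}",
        "",
        "| Case | Result | Detail |",
        "| --- | --- | --- |",
    ]
    for r in results:
        lines.append(f"| {r.case_id} | {'pass' if r.passed else 'FAIL'} | {r.detail} |")
    md_path = output_dir / "report.md"
    md_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return json_path, md_path


# Usage: write both reports into a throwaway directory and read them back.
with tempfile.TemporaryDirectory() as tmp:
    json_path, md_path = write_reports(
        [
            CaseResult("case_ok", True),
            CaseResult("case_bad", False, "revenue mismatch"),
        ],
        Path(tmp) / "eval-dev",
    )
    report = json.loads(json_path.read_text(encoding="utf-8"))
    markdown = md_path.read_text(encoding="utf-8")
```

Keeping the JSON and Markdown outputs derived from the same summary object is one way to make sure the two reports cannot disagree about which cases failed.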


+def apply_corrections(facts: Iterable[FinancialFact], corrections: Iterable[Correction]) -> list[FinancialFact]:
+    by_id = {fact.fact_id: fact for fact in facts}
+    valid_fields = set(FinancialFact.model_fields)
+
+    for correction in corrections:
+        fact = by_id.get(correction.fact_id)
+        if fact is None or correction.field_name not in valid_fields:
+            continue
+        by_id[correction.fact_id] = fact.model_copy(update={correction.field_name: correction.new_value})
+    return list(by_id.values())
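
The behavior of `apply_corrections` above can be illustrated with a small standalone sketch. To stay dependency-free it swaps the Pydantic models for frozen dataclasses and `model_copy(update=...)` for `dataclasses.replace`; `Fact`, `Fix`, and `apply_fixes` are hypothetical stand-ins, not the PR's types.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Iterable


@dataclass(frozen=True)
class Fact:  # hypothetical stand-in for FinancialFact
    fact_id: str
    concept: str
    value: float


@dataclass(frozen=True)
class Fix:  # hypothetical stand-in for Correction
    fact_id: str
    field_name: str
    new_value: Any


def apply_fixes(facts: Iterable[Fact], fixes: Iterable[Fix]) -> list[Fact]:
    by_id = {fact.fact_id: fact for fact in facts}
    valid_fields = {f.name for f in fields(Fact)}
    for fix in fixes:
        fact = by_id.get(fix.fact_id)
        # Unknown fact IDs and unknown field names are skipped silently.
        if fact is None or fix.field_name not in valid_fields:
            continue
        # replace() builds a new object, so the input facts are never mutated.
        by_id[fix.fact_id] = replace(fact, **{fix.field_name: fix.new_value})
    return list(by_id.values())


facts = [Fact("fact_a", "revenue", 100.0), Fact("fact_b", "net_income", 10.0)]
fixes = [
    Fix("fact_a", "value", 120.0),      # applied
    Fix("fact_a", "value", 125.0),      # applied on top of the previous one
    Fix("fact_zzz", "value", 1.0),      # unknown fact: skipped
    Fix("fact_b", "no_such_field", 0),  # unknown field: skipped
]
corrected = apply_fixes(facts, fixes)
```

Corrections are applied in iteration order, so a later correction to the same field wins. One difference worth keeping in mind: in Pydantic v2, `model_copy(update=...)` does not validate the updated values, so a correction carrying a wrongly typed `new_value` would be accepted as-is unless it is validated separately.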


+def _fact_id(doc_id: str, statement_type: str, concept: str, period_end: object, label: str) -> str:
+    raw = "|".join([doc_id, statement_type, concept, str(period_end or ""), label])
+    return "fact_" + hashlib.sha1(raw.encode("utf-8")).hexdigest()[:16]
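
As a quick illustration of what the hashing scheme in `_fact_id` implies, the sketch below reproduces the same construction as a standalone function and exercises it. The document ID, statement type, concept, and label values are made up for the example.

```python
import hashlib


def fact_id(doc_id: str, statement_type: str, concept: str, period_end: object, label: str) -> str:
    # Same construction as the diff above: join the identifying fields with "|",
    # hash with SHA-1, and keep the first 16 hex characters.
    raw = "|".join([doc_id, statement_type, concept, str(period_end or ""), label])
    return "fact_" + hashlib.sha1(raw.encode("utf-8")).hexdigest()[:16]


# Illustrative inputs only; these values are not taken from the PR.
a = fact_id("doc1", "income", "revenue", "2025-12-31", "Revenue")
b = fact_id("doc1", "income", "revenue", "2025-12-31", "Revenue")
c = fact_id("doc1", "income", "revenue", "2025-12-31", "Total revenue")
no_period = fact_id("doc1", "income", "revenue", None, "Revenue")
empty_period = fact_id("doc1", "income", "revenue", "", "Revenue")
```

The ID is deterministic, so re-running extraction on the same document yields the same `fact_id` for the same inputs. Because the label is part of the hashed string, a change in the extracted label produces a different ID, and a missing `period_end` is treated the same as an empty string.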


+    appear in different statements.
+    """
+    indexed: dict[str, FinancialFact] = {}
+    for fact in actual_facts:
+        indexed.setdefault(fact.concept, fact)
+        indexed[f"{fact.statement_type}:{fact.concept}"] = fact
+
+    matched: list[str] = []
+    mismatched: list[dict[str, Any]] = []
+    missing: list[str] = []
+
+    for key, expected_val in expected_values.items():
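
The indexing step in the excerpt above registers each fact under two keys. A minimal sketch of just that step, with a hypothetical `Fact` dataclass standing in for `FinancialFact` and made-up statement types, shows how lookups resolve; the matching loop that follows in the original is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:  # hypothetical stand-in for FinancialFact
    statement_type: str
    concept: str
    value: float


def index_facts(facts: list[Fact]) -> dict[str, Fact]:
    indexed: dict[str, Fact] = {}
    for fact in facts:
        # Bare concept key: setdefault keeps the first fact seen for that concept.
        indexed.setdefault(fact.concept, fact)
        # Qualified key: plain assignment, so the last fact for the pair wins.
        indexed[f"{fact.statement_type}:{fact.concept}"] = fact
    return indexed


# The same concept appearing in two statements (illustrative values).
facts = [
    Fact("income", "net_income", 10.0),
    Fact("cash_flow", "net_income", 10.5),
]
indexed = index_facts(facts)
```

With this layout, an expected-value key such as `net_income` resolves to whichever statement was listed first, while `cash_flow:net_income` pins the lookup to a specific statement.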


@@ -557,6 +558,21 @@ def finalize(state: AgentState) -> AgentState:
    store.save_json(state.doc_meta.doc_id, "extracted/pages.json", [p.model_dump() for p in state.pages])
    store.save_json(state.doc_meta.doc_id, "extracted/tables.json", [t.model_dump() for t in state.tables])
    store.save_json(state.doc_meta.doc_id, "extracted/statements.json", {k: v.model_dump() for k, v in state.statements.items()})


@@ -522,6 +530,8 @@ def _save_partial_results(doc_id: str) -> None:
            s.save_json(doc_id, "extracted/tables.json", [t.model_dump() for t in partial.tables])
        if partial.statements:
            s.save_json(doc_id, "extracted/statements.json", {k: v.model_dump() for k, v in partial.statements.items()})


add financial fact foundation and roadmap

1826c77

Copilot AI review requested due to automatic review settings May 22, 2026 03:20

Copilot started reviewing on behalf of magic-alt May 22, 2026 03:20

magic-alt merged commit b4f764c into main May 22, 2026
7 checks passed

Copilot AI reviewed May 22, 2026


magic-alt merged 1 commit into main from feat/financial-fact-foundation
