Implement Phase 2 canonical fact validation surfaces #16
Merged
Pull request overview
Implements the Phase 2 “canonical facts + validation” slice by enriching canonical facts with issuer/filing metadata, producing a persisted fact-validation artifact, and exposing it through a new API route (with matching frontend API types), while keeping the legacy /facts surface intact.
Changes:
- Extend `DocumentMeta` and `FinancialFact` with issuer/filing metadata (`company`, `ticker`, `cik`, `filing_type`) plus `raw_label`.
- Add canonical fact validation (`FactValidationResult`) and persist/load it as `extracted/fact_validation.json`, including a new `GET /v1/documents/{doc_id}/fact-validation` route.
- Update the web API mirror types and client (`docsApi.factValidation`) and add targeted tests for enrichment, validation behavior, and the new route.
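To make the metadata enrichment concrete, here is a minimal sketch using stdlib dataclasses rather than the project's Pydantic models. Only the field names `company`, `ticker`, `cik`, `filing_type`, and `raw_label` come from this PR; `doc_id`, `report_type`, the other fact fields, and the `make_fact` helper are hypothetical. The `report_type` to `filing_type` fallback mirrors what the test table describes, but the exact rule in `src/finance/facts.py` may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentMeta:
    # Hypothetical subset of the document metadata; only the issuer/filing
    # field names are taken from the PR description.
    doc_id: str
    company: Optional[str] = None
    ticker: Optional[str] = None
    cik: Optional[str] = None
    filing_type: Optional[str] = None
    report_type: Optional[str] = None  # assumed legacy field used as a fallback


@dataclass
class FinancialFact:
    fact_id: str
    statement_type: str
    concept: str
    value: float
    raw_label: str  # the label exactly as it appeared in the source statement
    company: Optional[str] = None
    ticker: Optional[str] = None
    cik: Optional[str] = None
    filing_type: Optional[str] = None


def make_fact(meta: DocumentMeta, fact_id: str, statement_type: str,
              concept: str, value: float, raw_label: str) -> FinancialFact:
    """Hypothetical helper: copy issuer/filing metadata onto a single fact."""
    return FinancialFact(
        fact_id=fact_id,
        statement_type=statement_type,
        concept=concept,
        value=value,
        raw_label=raw_label,
        company=meta.company,
        ticker=meta.ticker,
        cik=meta.cik,
        # Fall back to the older report_type when filing_type was not supplied.
        filing_type=meta.filing_type or meta.report_type,
    )
```

Propagating metadata at fact-construction time means every fact stays self-describing once it leaves its parent document, for example when facts from several filings are compared side by side.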
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| web/src/api/types.ts | Adds issuer/filing fields to FinancialFact/DocumentMeta and introduces FactValidationResult TS mirrors. |
| web/src/api/docs.ts | Adds docsApi.factValidation(docId) client method. |
| tests/test_validators_extended.py | Adds unit tests for validate_facts (missing critical facts, duplicates, balance equation). |
| tests/test_routes_web.py | Adds route tests for /fact-validation success + 404 behavior. |
| tests/test_finance_facts.py | Updates fact generation tests for enriched metadata + raw_label, including report_type→filing_type fallback. |
| src/schemas/models.py | Extends Pydantic models and adds FactValidationIssue/Result. |
| src/finance/validators.py | Implements validate_facts and fact-level validation checks/metrics. |
| src/finance/facts.py | Updates fact generation to accept DocumentMeta, propagate metadata, and populate raw_label. |
| src/cli.py | Adds CLI options for ticker/cik/filing_type and persists them in DocumentMeta. |
| src/api/routes.py | Accepts richer upload form metadata, adds /fact-validation route, and persists partial fact_validation.json. |
| src/agent/state.py | Adds fact_validation_results to agent state. |
| src/agent/nodes.py | Runs validate_facts, persists fact_validation.json, and treats high-severity fact issues as “severe”. |
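The fact-level checks listed for `validate_facts` (missing critical facts and the balance equation), along with the "high severity counts as severe" rule in the agent node, can be sketched as follows. The issue and result shapes follow the field names visible in the review excerpt (`code`, `severity`, `message`, `fact_ids`, `concepts`, `issues`). The critical-concept list, the `"balance_sheet"` statement type, the issue codes `missing_critical_facts` and `balance_equation_mismatch`, the 1% tolerance, and `has_severe_issues` are all assumptions for illustration. Duplicate detection is left out of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Sequence

Severity = Literal["low", "medium", "high"]


@dataclass
class Fact:
    # Hypothetical minimal fact shape used only by this sketch.
    fact_id: str
    statement_type: str
    concept: str
    value: float


@dataclass
class FactValidationIssue:
    code: str
    severity: Severity
    message: str
    fact_ids: List[str] = field(default_factory=list)
    concepts: List[str] = field(default_factory=list)


@dataclass
class FactValidationResult:
    issues: List[FactValidationIssue] = field(default_factory=list)


# Assumed set of concepts a balance sheet must contain.
CRITICAL_CONCEPTS = ("total_assets", "total_liabilities", "total_equity")


def validate_facts(facts: Sequence[Fact], tolerance: float = 0.01) -> FactValidationResult:
    result = FactValidationResult()
    by_concept = {f.concept: f for f in facts if f.statement_type == "balance_sheet"}

    missing = [c for c in CRITICAL_CONCEPTS if c not in by_concept]
    if missing:
        result.issues.append(FactValidationIssue(
            code="missing_critical_facts", severity="high",
            message=f"Missing critical facts: {', '.join(missing)}.",
            concepts=missing))
        return result  # the balance equation needs all three totals

    assets = by_concept["total_assets"]
    liabilities = by_concept["total_liabilities"]
    equity = by_concept["total_equity"]
    gap = assets.value - (liabilities.value + equity.value)
    # Relative tolerance so small rounding differences are not flagged.
    if abs(gap) > tolerance * max(abs(assets.value), 1.0):
        result.issues.append(FactValidationIssue(
            code="balance_equation_mismatch", severity="high",
            message=f"Assets differ from liabilities + equity by {gap:.2f}.",
            fact_ids=[assets.fact_id, liabilities.fact_id, equity.fact_id],
            concepts=list(CRITICAL_CONCEPTS)))
    return result


def has_severe_issues(result: FactValidationResult) -> bool:
    # Any high-severity fact issue marks the run as severe.
    return any(issue.severity == "high" for issue in result.issues)
```

A relative tolerance, rather than an absolute one, keeps the check usable across filings reported in thousands or millions.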
Comment on lines +181 to +189
```python
result.issues.append(
    FactValidationIssue(
        code="duplicate_concepts",
        severity="medium",
        message=(
            f"Duplicate canonical facts detected for {group[0].statement_type}:{_canonical_concept(group[0].concept)}."
        ),
        fact_ids=[fact.fact_id for fact in group],
        concepts=[group[0].concept],
```
Summary
This PR implements the Phase 2 canonical fact schema and fact-validation slice on top of the now-merged Phase 1 benchmark work.
It enriches canonical facts with issuer metadata, adds a persisted fact-validation artifact and API route, keeps the legacy statements and facts surfaces compatible, and updates the frontend API types to match the new evidence model.
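Persisting the artifact and serving it from the new route can be sketched with the standard library alone. The artifact path `extracted/fact_validation.json` comes from this PR. The payload keys `partial` and `error`, the per-document directory layout, and both helper names are hypothetical; the sketch only illustrates writing a result even when validation stopped early, and returning `None` so a route can answer 404.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional

ARTIFACT = Path("extracted") / "fact_validation.json"


def persist_fact_validation(doc_dir: Path, issues: List[Dict[str, Any]],
                            error: Optional[str] = None) -> Path:
    """Write the artifact, marking it partial when validation did not finish."""
    path = doc_dir / ARTIFACT
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"issues": issues, "partial": error is not None, "error": error}
    # Write to a sibling temp file, then rename it over the target so that
    # (on POSIX) readers see either the old file or the complete new one.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    tmp.replace(path)
    return path


def load_fact_validation(doc_dir: Path) -> Optional[Dict[str, Any]]:
    """Return the stored result, or None so the caller can respond with 404."""
    path = doc_dir / ARTIFACT
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

A route handler for `GET /v1/documents/{doc_id}/fact-validation` would then resolve the document directory, call the loader, and map `None` to a 404 response.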
What Changed
- Extended `DocumentMeta` and `FinancialFact` with issuer and filing metadata:
  - `company`
  - `ticker`
  - `cik`
  - `filing_type`
  - `raw_label`
- Persisted `fact_validation.json` in the pipeline, including on partial failure
- Added `GET /v1/documents/{doc_id}/fact-validation`
- Kept `GET /v1/documents/{doc_id}/facts` compatible as the raw facts array
- Added `docsApi.factValidation(docId)` to the web client

Why
Phase 2 in the roadmap shifts Jetbot’s primary output toward facts plus evidence. This PR completes the next backend and API slice of that transition without breaking existing `FinancialStatement` and `/facts` clients.

Validation
- `python -m pytest tests/test_finance_facts.py tests/test_routes_web.py tests/test_validators_extended.py -q`
- `cd web && npm run typecheck`
- `python -m ruff check src tests`
- `python -m mypy src --ignore-missing-imports`