[FEAT]: Validation gates input extraction json pdf by Aryama-srivastav · Pull Request #279 · fireform-core/FireForm

Aryama-srivastav · 2026-03-17T16:20:16Z

Description

Implemented three validation checkpoints (gates) in the form-filling pipeline so that each run's robustness can be measured, and failures caught, before the pipeline is used in production.

What changed

Added validation_gates.py with:

GateResult (pass/fail + reason codes + details)
ValidationReport (run summary with overall pass/fail and JSON export)
ValidationGates: input_to_extraction(...), extraction_to_json(...), json_to_pdf(...)
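
For readers without the diff open, the shape of the two result types could look roughly like the sketch below. The field names (`gate`, `passed`, `reasons`, `details`) and the `to_json` method are assumptions based on the one-line descriptions above, not the actual contents of validation_gates.py.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class GateResult:
    """Outcome of one gate: pass/fail plus machine-readable reason codes."""
    gate: str  # assumed field: name of the gate that produced this result
    passed: bool
    reasons: list[str] = field(default_factory=list)
    details: dict[str, Any] = field(default_factory=dict)


@dataclass
class ValidationReport:
    """Run summary: the run passes only if every recorded gate passed."""
    gates: list[GateResult] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return all(g.passed for g in self.gates)

    def to_json(self) -> str:
        # asdict() converts each GateResult into a plain dict for export.
        payload = {"passed": self.passed, "gates": [asdict(g) for g in self.gates]}
        return json.dumps(payload, indent=2)
```

Note that in this sketch a report with no gates counts as passing, because `all()` of an empty sequence is `True`; a real implementation may want to treat that case as an error.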

Integrated gates into filler.py:

Runs all 3 gates per fill run
Generates per-run validation report JSON
Blocks final PDF output when strict_validation=True and any gate fails
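
The integration described above can be sketched as follows. This is a minimal illustration, not the code in filler.py: `finish_run`, `ValidationFailed`, and the tuple layout of `gate_results` are hypothetical names, and only `strict_validation` comes from the PR description.

```python
import json
from pathlib import Path


class ValidationFailed(RuntimeError):
    """Hypothetical exception raised when strict mode blocks the PDF write."""


def finish_run(gate_results, report_path, strict_validation=False):
    """Write the per-run report, then block the run in strict mode on failure.

    gate_results is a list of (gate_name, passed, reasons) tuples.
    Returns the report dict; the caller writes the PDF only if no exception
    was raised.
    """
    failed = [name for name, passed, _ in gate_results if not passed]
    report = {
        "passed": not failed,
        "gates": [
            {"gate": name, "passed": passed, "reasons": list(reasons)}
            for name, passed, reasons in gate_results
        ],
    }
    # The report is written before the strict check, so a blocked run
    # still leaves a record of why it was blocked.
    Path(report_path).write_text(json.dumps(report, indent=2), encoding="utf-8")
    if failed and strict_validation:
        raise ValidationFailed("gates failed: " + ", ".join(failed))
    return report
```

Writing the report before raising is a design choice in this sketch; the PR text does not say in which order the real code does these two steps.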

Added tests: tests/test_validation_gates.py

Acceptance criteria mapping

✅ Each gate returns pass/fail + reason codes.
✅ Completeness and mandatory field checks enforced.
✅ Mismatch/misplacement surfaced before final output (PDF_FIELDS_UNMATCHED, POSITIONAL_FALLBACK_USED, etc.).
✅ Summary validation report generated per run.

How to test

Run:

python -m pytest -q tests/test_validation_gates.py

Execute normal fill flow and verify:

validation report JSON is written per run
strict mode blocks PDF write on failed gate(s)

Copilot

Pull request overview

Adds a validation “gating” layer to the PDF form-filling pipeline to detect common failure modes (bad inputs, incomplete extraction, mapping mismatches) and optionally block PDF generation in strict mode.

Changes:

Introduces validation_gates.py (gate results + run report + 3 validation stages).
Adds a new semantic mapping layer (semantic_mapper.py) and integrates gates + report output into filler.py.
Adds unit tests for validation gates and semantic mapper behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| `src/validation_gates.py` | Implements gate results/reporting and the 3-stage validation checks. |
| `src/filler.py` | Runs the gates during fill, writes a per-run validation report, and blocks output in strict mode. |
| `src/semantic_mapper.py` | Adds semantic matching (explicit/case-insensitive/alias/fuzzy) with positional fallback and reporting. |
| `tests/test_validation_gates.py` | Tests gate pass/fail and reason codes for each stage. |
| `src/test/test_semantic_mapper.py` | Tests semantic mapper matching modes, fallback behavior, and report formatting. |
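
The tiered matching described for `src/semantic_mapper.py` could work along the lines of the sketch below. It is not the `SemanticMapper` implementation: `match_field`, its parameters, and the mode labels are hypothetical, "explicit" is assumed here to mean an exact name match, and the positional fallback is left out. Fuzzy matching uses `difflib.get_close_matches` from the standard library, which may differ from what the PR uses.

```python
from difflib import get_close_matches


def match_field(json_key, pdf_fields, aliases=None, cutoff=0.8):
    """Return (pdf_field, mode) for json_key, or (None, "unmatched").

    Tries exact, case-insensitive, alias, then fuzzy matching, in that order,
    and stops at the first tier that produces a match.
    """
    aliases = aliases or {}
    # Tier 1: exact name match (assumed meaning of "explicit").
    if json_key in pdf_fields:
        return json_key, "explicit"
    # Tier 2: same name ignoring case.
    lowered = {f.lower(): f for f in pdf_fields}
    if json_key.lower() in lowered:
        return lowered[json_key.lower()], "case_insensitive"
    # Tier 3: configured alias that points at an existing PDF field.
    alias = aliases.get(json_key)
    if alias in pdf_fields:
        return alias, "alias"
    # Tier 4: closest name by similarity ratio, if it clears the cutoff.
    close = get_close_matches(json_key.lower(), list(lowered), n=1, cutoff=cutoff)
    if close:
        return lowered[close[0]], "fuzzy"
    return None, "unmatched"
```

Ordering the tiers from strictest to loosest means a fuzzy guess can never override an exact or aliased match, which keeps the riskiest matches easy to audit in a report.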

src/filler.py

src/validation_gates.py

+        unmatched_pdf_fields = [f for f in pdf_field_names if f not in matched]
+        required_pdf_fields: list[str] = template_config.get("required_pdf_fields", []) if template_config else []
+        missing_required_pdf = [f for f in required_pdf_fields if f not in matched]
+


tests/test_validation_gates.py

+from types import SimpleNamespace
+
+from src.validation_gates import ValidationGates
+


tests/test_validation_gates.py

+def test_input_to_extraction_fail_missing_pdf_and_bad_extraction():
+    gate = ValidationGates.input_to_extraction(
+        "C:/does-not-exist/form.pdf",


src/test/test_semantic_mapper.py

@@ -0,0 +1,113 @@
+from src.semantic_mapper import SemanticMapper, MappingResult


src/filler.py

+        t2j = llm.main_loop()
+        textbox_answers = t2j.get_data()  # {json_key: value}
+


src/filler.py

+        # ── 3. Semantic mapping ───────────────────────────────────────────────
+        mapper = SemanticMapper(cfg)
+        result = mapper.map(textbox_answers, pdf_field_names)
+        print(result.report())


src/filler.py

+        ts = datetime.now().strftime("%Y%m%d_%H%M%S")
+
+        output_pdf = pdf_form[:-4] + "_" + ts + "_filled.pdf"
+        validation_report_path = pdf_form[:-4] + "_" + ts + "_validation_report.json"
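
One thing to watch in the snippet above: `pdf_form[:-4]` strips the extension only when it is exactly four characters long (such as `.pdf`). A possible alternative, sketched with `pathlib`, is shown below; `run_output_paths` and its `now` parameter are hypothetical names for illustration and are not part of the PR.

```python
from datetime import datetime
from pathlib import Path


def run_output_paths(pdf_form, now=None):
    """Build the filled-PDF and report paths next to the input template."""
    src = Path(pdf_form)
    ts = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    # with_name() swaps only the final path component, so the directory is
    # kept, and stem drops the extension whatever its length.
    output_pdf = src.with_name(f"{src.stem}_{ts}_filled.pdf")
    report_path = src.with_name(f"{src.stem}_{ts}_validation_report.json")
    return output_pdf, report_path
```

Passing `now` explicitly also makes the naming deterministic in tests.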


src/validation_gates.py

+        details: dict[str, Any] = {}
+        required_fields: list[str] = template_config.get("required_fields", []) if template_config else []
+
+        missing_required = [k for k in required_fields if not extracted.get(k)]
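
Note that `not extracted.get(k)` treats every falsy value as missing, including `0` and `False`, not just absent keys and empty strings. Whether that is intended is not stated in the PR. If it is not, a stricter check might look like the sketch below, where `missing_required` as a function name is hypothetical.

```python
def missing_required(extracted, required_fields):
    """Return required keys that are absent, None, or blank strings."""
    missing = []
    for key in required_fields:
        value = extracted.get(key)
        # Only None and whitespace-only strings count as missing here;
        # legitimate falsy values such as 0 or False are kept.
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(key)
    return missing
```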


src/validation_gates.py

+        if positional_values:
+            reasons.append("POSITIONAL_FALLBACK_USED")
+
+        if unmatched_pdf_fields:
+            reasons.append("PDF_FIELDS_UNMATCHED")
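
Putting the two `src/validation_gates.py` snippets together, the reason-code collection for the final gate might be sketched as below. The function name and signature are hypothetical, `REQUIRED_PDF_FIELDS_MISSING` is an invented code name (the excerpt does not show the real one), and the sketch deliberately does not decide which codes fail the gate, since the PR excerpt does not show that policy.

```python
def json_to_pdf_reasons(pdf_field_names, matched, positional_values,
                        required_pdf_fields=()):
    """Collect reason codes for the final gate; an empty list means no findings."""
    unmatched = [f for f in pdf_field_names if f not in matched]
    missing_required = [f for f in required_pdf_fields if f not in matched]
    reasons = []
    if positional_values:
        reasons.append("POSITIONAL_FALLBACK_USED")
    if unmatched:
        reasons.append("PDF_FIELDS_UNMATCHED")
    if missing_required:
        # Hypothetical code name; the PR excerpt does not show the real one.
        reasons.append("REQUIRED_PDF_FIELDS_MISSING")
    return reasons
```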


Aryama-srivastav added 3 commits March 14, 2026 23:01

feat: add semantic key-based mapping between extracted JSON and PDF w…

5bbbcf3

…idgets

fix: resolve semantic mapper and filler typing diagnostics

623d12f

feat(validation): add InputExtractionJSONPDF gates with per-run report

b595b33

Copilot AI review requested due to automatic review settings March 17, 2026 16:20

Copilot started reviewing on behalf of Aryama-srivastav March 17, 2026 16:21

Copilot AI reviewed Mar 17, 2026

Potential fix for pull request finding

f4cf554

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Aryama-srivastav wants to merge 4 commits into fireform-core:main from Aryama-srivastav:feature/validation-gates-input-extraction-json-pdf
