[FEAT]: Improves extraction post-processing quality by introducing a dedicated normalization pipeline for noisy conversational outputs #288
Open
Aryama-srivastav wants to merge 3 commits into fireform-core:main from
Pull request overview
This PR introduces a dedicated extraction normalization pipeline to make LLM extraction outputs deterministic and consistent before downstream form-filling, and integrates that pipeline into the LLM JSON accumulation flow.
Changes:
- Added `ExtractionQualityProcessor` to normalize missing values, plural outputs, duplicates, and ambiguity signals, plus a per-run quality report.
- Integrated the quality processor into `LLM.add_response_to_json()` and added `LLM.get_quality_report()`, with report logging in `main_loop()`.
- Added focused pytest coverage for sentinel handling, plural normalization/dedup, duplicate merging, ambiguity flagging, and the LLM integration path.
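As a rough illustration of the missing-value step, the sketch below collapses `None` and common placeholder strings into one sentinel. The names `MISSING`, `_MISSING_TOKENS`, and `normalize_missing`, the sentinel value, and the token list are all hypothetical; the actual `ExtractionQualityProcessor` API and sentinel are not shown on this page.

```python
from typing import Any

# Hypothetical sentinel; the project's real sentinel value may differ.
MISSING = "<missing>"

# Assumed set of placeholder strings a model might emit for "no value".
_MISSING_TOKENS = {"", "n/a", "na", "none", "null", "unknown", "not provided"}


def normalize_missing(value: Any) -> Any:
    """Map None and common 'no value' placeholders to the single MISSING sentinel."""
    if value is None:
        return MISSING
    if isinstance(value, str) and value.strip().lower() in _MISSING_TOKENS:
        return MISSING
    # Anything else (real text, numbers) passes through unchanged.
    return value


raw = {"name": "Sarah Johnson", "fax": "N/A", "pager": None, "notes": "  none "}
normalized = {name: normalize_missing(v) for name, v in raw.items()}
print(normalized)
```

Defining the sentinel once and importing it everywhere is what lets every module agree on a single "missing" marker; applying `normalize_missing` to `MISSING` itself returns it unchanged, so the step is safe to run more than once.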
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/extraction_quality.py | Implements normalization/merge logic and generates a structured quality report. |
| src/llm.py | Routes extracted values through the quality processor and exposes/logs the quality report. |
| src/test/test_extraction_quality_controls.py | Adds targeted tests validating normalization behavior and LLM integration. |
| src/inputs/input.txt | Updates the sample input text used for local runs/examples. |
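A structured per-run quality report could be as simple as a mapping from issue type to the affected field names, logged once per run. The `QualityReport` class, its `record()`/`as_dict()` methods, and the issue labels below are hypothetical stand-ins, not the interface behind `LLM.get_quality_report()`.

```python
import logging
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    """Hypothetical per-run report: which fields triggered which normalization."""
    issues: dict = field(default_factory=lambda: defaultdict(list))

    def record(self, issue: str, field_name: str) -> None:
        # Record each field at most once per issue type.
        if field_name not in self.issues[issue]:
            self.issues[issue].append(field_name)

    def as_dict(self) -> dict:
        # Sort issue names and field lists so repeated runs produce identical output.
        return {issue: sorted(names) for issue, names in sorted(self.issues.items())}


logging.basicConfig(level=logging.INFO)
report = QualityReport()
report.record("missing_value", "fax")
report.record("duplicate_merged", "email")
report.record("missing_value", "fax")  # repeated event, recorded once
logging.info("Extraction quality report: %s", report.as_dict())
```

Sorting in `as_dict()` keeps the logged report stable regardless of the order in which fields were processed.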
Comments suppressed due to low confidence (1)
src/llm.py:113
`handle_plural_values()` appears to be unused now that `add_response_to_json()` routes normalization through `ExtractionQualityProcessor` (no other references found in the repo). Consider removing this method (and its logging) to reduce dead code, or keeping it but delegating to the quality processor to avoid divergence.
```python
def handle_plural_values(self, plural_value):
    """
    This method handles plural values.
    Takes in strings of the form 'value1; value2; value3; ...; valueN'
    returns a list with the respective values -> [value1, value2, value3, ..., valueN]
    """
```
Comment on lines +87 to +90

```python
if len(merged) == 1:
    return merged[0], had_duplicate

return merged, had_duplicate
```
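The excerpt returns a bare value when exactly one distinct value survives and a list otherwise, together with a `had_duplicate` flag. The sketch below reproduces that return shape around an order-preserving dedup; `merge_values` is a hypothetical name, and comparing values case- and whitespace-insensitively is an assumption, not something the excerpt shows.

```python
from typing import Any


def merge_values(values: list[Any]) -> tuple[Any, bool]:
    """Deduplicate values in first-seen order.

    Returns (merged, had_duplicate): a bare value when one distinct value
    remains, otherwise a list.
    """
    merged: list[Any] = []
    seen: set[str] = set()
    had_duplicate = False
    for value in values:
        # Assumed comparison key: trimmed, case-folded text; the first spelling wins.
        key = str(value).strip().casefold()
        if key in seen:
            had_duplicate = True
            continue
        seen.add(key)
        merged.append(value)

    if len(merged) == 1:
        return merged[0], had_duplicate

    return merged, had_duplicate


print(merge_values(["sjohnson@ucsc.edu", " SJohnson@ucsc.edu"]))  # ('sjohnson@ucsc.edu', True)
print(merge_values(["Microbiology", "Chemistry", "microbiology"]))
```

Keeping the first-seen spelling makes the result deterministic for a given input order; callers still need to handle both the scalar and the list shape.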
Comment on lines +3 to +8

```
Name/SID: Sarah Johnson, SID 4527891
Job Title: Research Scientist
Department: Microbiology
Phone Number: 831-555-0142
Email: sjohnson@ucsc.edu
Date: 03/15/2026
```
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Description
Raw model responses can contain repeated values, ambiguous phrases, inconsistent plural formatting, and inconsistent placeholders for missing values. This PR makes extraction handling deterministic, reviewable, and consistent before downstream form filling.
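One simple way to flag ambiguous phrases for review is a keyword and punctuation check over each extracted value. This is a sketch under assumptions: the hedge-word list, treating `?` as a signal, and the `flag_ambiguous` helper are illustrative and not taken from `extraction_quality.py`.

```python
import re

# Assumed hedge markers; a real list would be tuned on actual model output.
_HEDGES = re.compile(r"\b(maybe|possibly|probably|unclear|not sure|either)\b|\?", re.IGNORECASE)


def flag_ambiguous(fields: dict[str, str]) -> list[str]:
    """Return the sorted names of fields whose value contains a hedge marker."""
    return sorted(name for name, value in fields.items() if _HEDGES.search(value))


extracted = {
    "job_title": "Research Scientist",
    "department": "probably Microbiology",
    "date": "03/15/2026?",
}
print(flag_ambiguous(extracted))
```

Flagged fields are meant to be surfaced for human review rather than silently rewritten; word boundaries keep names such as "Mayberry" from matching "maybe".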
What changed:
- Added a quality processor in `extraction_quality.py`
- Integrated post-processing into `llm.py`
- Added focused tests in `test_extraction_quality_controls.py`
Type of change
- Feature (non-breaking enhancement)
- Bug fix (non-breaking reliability improvement)
How Has This Been Tested?
Test A (focused feature suite):

Ran:
```
python -m pytest test_extraction_quality_controls.py -q
```
Verified output: 5 passed
Acceptance criteria verification:
- Duplicate keys/values handled deterministically: ✅
- Plural outputs normalized to documented format: ✅
- Ambiguous fields marked for review: ✅
- Missing values use one consistent sentinel across modules: ✅
Test Configuration:
- Firmware version: N/A
- Hardware: Local development machine (Windows)
- SDK: N/A
- Python: 3.13
- Shell: PowerShell
Checklist: