refactor: updates to quality QA mechanics to improve utility by asteier2026 · Pull Request #171 · NVIDIA-NeMo/Anonymizer

asteier2026 · 2026-05-28T16:11:14Z

Changes include:

Add generalization_suggestion for protection method 'generalize' in sensitivity disposition and feed that down into meaning unit prompt.
Feed latent evidence down to meaning unit prompt
Loosen quality compare to allow for abstraction differences
Biography supplement for meaning units slightly updated
Enhancements to the re-answer prompt (avoid unknown)
Our earlier validation error, where combined_risk_level = low but protection_method_suggestion is not 'leave_as_is', still occasionally crops up. The LLM prompt is very clear about not allowing this, so I went with fixing it after the fact in parse_sensitivity_disposition, before validation runs. The fix presumes combined_risk_level is correct and replaces the current protection method with 'leave_as_is'. It still logs a warning if you run in debug.
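For readers who want to picture the after-the-fact fix, here is a minimal sketch. It assumes each disposition entry is a plain dict carrying the two fields named above; the helper name `correct_low_risk_entry` and the choice of `logger.warning` are illustrative assumptions, not the code in this PR.

```python
import logging

logger = logging.getLogger(__name__)


def correct_low_risk_entry(entry: dict) -> dict:
    """Return a copy of one disposition entry with the low-risk mismatch repaired.

    Hypothetical helper: the dict shape and function name are assumptions.
    combined_risk_level is trusted; only the protection method is rewritten.
    """
    corrected = dict(entry)  # shallow copy, the caller's dict is not mutated
    method = corrected.get("protection_method_suggestion")
    if corrected.get("combined_risk_level") == "low" and method != "leave_as_is":
        logger.warning(
            "combined_risk_level is 'low' but protection_method_suggestion is %r; "
            "forcing 'leave_as_is'",
            method,
        )
        corrected["protection_method_suggestion"] = "leave_as_is"
    return corrected


fixed = correct_low_risk_entry(
    {"combined_risk_level": "low", "protection_method_suggestion": "generalize"}
)
untouched = correct_low_risk_entry(
    {"combined_risk_level": "high", "protection_method_suggestion": "generalize"}
)
```

Running the correction before schema validation means the validator only ever sees consistent pairs, so an otherwise usable LLM response is not discarded for this one mismatch.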

greptile-apps · 2026-05-28T16:16:02Z

Greptile Summary

This PR refactors the QA mechanics in the rewrite pipeline to improve utility by adding generalization_suggestion support throughout the pipeline, feeding latent entity evidence into the meaning unit prompt, loosening the quality comparison rubric for abstraction tolerance, and pre-correcting LLM consistency violations before schema validation.

generalization_suggestion field added to EntityDispositionSchema with default="N/A" and min_length=1; it is forwarded to the meaning unit and rewrite prompts only when the protection method is "generalize", and the rewrite prompt includes graceful fallback language for when the suggestion is absent.
Latent evidence plumbing: _format_disposition_block now looks up evidence spans from COL_LATENT_ENTITIES and attaches them to suppress_inference entities, giving the meaning unit LLM explicit reconstruction clues to avoid.
Pre-validation correction: _correct_disposition_consistency in parsers.py auto-corrects the recurring combined_risk_level='low' + protection_method_suggestion != 'leave_as_is' LLM inconsistency before Pydantic validation, preventing spurious validation failures on otherwise valid responses.
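The conditional forwarding described in the first two bullets can be sketched roughly as follows. This is an illustration under assumptions: dispositions and latent entities are modeled as lists of dicts, the `evidence` key and the function name `format_disposition_entries` are hypothetical, and the real `_format_disposition_block` may differ in shape and output format.

```python
def format_disposition_entries(dispositions: list, latent_entities: list) -> list:
    """Build per-entity prompt entries, adding optional fields by protection method.

    Hypothetical sketch: field names other than those quoted in the summary
    (for example "evidence") are assumptions.
    """
    # Index latent evidence by (entity_label, entity_value) for direct lookup.
    evidence_by_key = {
        (item["entity_label"], item["entity_value"]): item.get("evidence", [])
        for item in latent_entities
    }

    entries = []
    for disp in dispositions:
        method = disp["protection_method_suggestion"]
        entry = {
            "entity_label": disp["entity_label"],
            "entity_value": disp["entity_value"],
            "protection_method": method,
        }
        if method == "generalize":
            # Only generalize entities carry a suggestion into the prompt.
            entry["generalization_suggestion"] = disp.get(
                "generalization_suggestion", "N/A"
            )
        elif method == "suppress_inference":
            # Attach evidence spans so the prompt can name the clues to avoid.
            evidence = evidence_by_key.get(
                (disp["entity_label"], disp["entity_value"])
            )
            if evidence:
                entry["evidence"] = evidence
        entries.append(entry)
    return entries


example = format_disposition_entries(
    [
        {"entity_label": "city", "entity_value": "Lyon",
         "protection_method_suggestion": "generalize",
         "generalization_suggestion": "a city in France"},
        {"entity_label": "employer", "entity_value": "Acme",
         "protection_method_suggestion": "suppress_inference"},
        {"entity_label": "hobby", "entity_value": "chess",
         "protection_method_suggestion": "leave_as_is"},
    ],
    [{"entity_label": "employer", "entity_value": "Acme",
      "evidence": ["the anvil factory downtown"]}],
)
```

Keeping the optional fields out of entries where they do not apply keeps the prompt short and avoids handing the model a placeholder such as "N/A" that it might echo into its output.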

Confidence Score: 5/5

Safe to merge — changes are well-scoped, all existing validation contracts are preserved, and new behavior is gated correctly behind protection method checks.

All three modified pipeline stages handle the new generalization_suggestion field consistently: the schema provides a safe default, pre-validation correction prevents spurious validation failures, and the field is conditionally forwarded only when semantically relevant. Test coverage is comprehensive.

No files require special attention.

Important Files Changed

Filename	Overview
src/anonymizer/engine/rewrite/parsers.py	Adds _correct_disposition_consistency pre-validator and integrates it before model_validate; logic is correct and well-tested.
src/anonymizer/engine/rewrite/qa_generation.py	Adds COL_LATENT_ENTITIES as a required column and conditionally attaches generalization_suggestion and evidence fields per entity; evidence lookup by (entity_label, entity_value) correctly matches the latent entities structure.
src/anonymizer/engine/schemas/rewrite.py	Adds generalization_suggestion with default N/A and min_length=1 to EntityDispositionSchema; consistent with adjacent fields.
src/anonymizer/engine/rewrite/evaluate.py	Prompt expansions for re-answer and compare phases are coherent; abstraction tolerance rules are well-structured.
src/anonymizer/engine/rewrite/rewrite_generation.py	Conditionally includes generalization_suggestion only for generalize entities; rewrite prompt updated with fallback language.
src/anonymizer/engine/rewrite/sensitivity_disposition.py	Prompt updated to instruct the LLM to populate generalization_suggestion for generalize entities and N/A otherwise.
src/anonymizer/engine/rewrite/domain_classification.py	Updated quality_supplement for BIOGRAPHY_PROFILE domain to de-emphasize distinctiveness; text-only change.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LLM: Sensitivity Disposition] --> B[parse_sensitivity_disposition]
    B --> C{_correct_disposition_consistency}
    C -->|low + non-leave_as_is| D[Force leave_as_is]
    C -->|Valid| E[model_validate]
    D --> E
    E --> F[_format_disposition_block]
    G[COL_LATENT_ENTITIES] --> F
    F -->|generalize| H[Add generalization_suggestion]
    F -->|suppress_inference| I[Add evidence]
    F -->|other| J[Base entry only]
    H --> K[Meaning Unit LLM]
    I --> K
    J --> K
    E --> L[_format_rewrite_disposition_block]
    L -->|generalize| M[Add generalization_suggestion]
    L -->|other protected| N[Base fields only]
    M --> O[Rewrite LLM]
    N --> O
    O --> P[Rewritten Text]
    P --> Q[Quality Re-answer with abstraction tolerance]
    Q --> R[Quality Score]

Reviews (4): Last reviewed commit: "fix: misc greptile suggestions"

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

feature: utility enhancements

f4da6cf

asteier2026 requested a review from a team as a code owner May 28, 2026 16:11

greptile-apps Bot reviewed May 28, 2026

Comment thread src/anonymizer/engine/rewrite/parsers.py

Comment thread src/anonymizer/engine/schemas/rewrite.py Outdated

Comment thread src/anonymizer/engine/rewrite/qa_generation.py

fix: test scripts

4356165

greptile-apps Bot reviewed May 28, 2026

Comment thread src/anonymizer/engine/schemas/rewrite.py Outdated

asteier2026 and others added 2 commits May 28, 2026 09:36

Update src/anonymizer/engine/schemas/rewrite.py

837cc96

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

fix: misc greptile suggestions

1a33023

asteier2026 requested a review from lipikaramaswamy May 28, 2026 16:58

asteier2026 wants to merge 4 commits into main from asteier2026/feature/generalization-suggestion
