refactor: updates to quality QA mechanics to improve utility#171

Open
asteier2026 wants to merge 4 commits into main from asteier2026/feature/generalization-suggestion

@asteier2026

Changes include:

  • Add generalization_suggestion for protection method 'generalize' in sensitivity disposition and feed that down into meaning unit prompt.
  • Feed latent evidence down to meaning unit prompt
  • Loosen quality compare to allow for abstraction differences
  • Biography supplement for meaning units slightly updated
  • Enhancements to the re-answer prompt (avoid unknown)
  • The earlier validation failure, where combined_risk_level = low but protection_method_suggestion is not 'leave_as_is', still crops up occasionally. The LLM prompt is very clear about not allowing this, so I went with fixing it after the fact in parse_sensitivity_disposition, before validation runs. The fix presumes combined_risk_level is correct and replaces the current protection method with 'leave_as_is'. It still logs a warning if you run in debug.
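
The after-the-fact fix described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `correct_low_risk_protection`, the flat dict shape of a raw disposition entry, the logger name, and the use of a debug-level log call are all assumptions. Only the field names and the 'leave_as_is' value come from the description above.

```python
import logging
from typing import Any

# Hypothetical logger name; the real module's logger is not shown in the PR text.
logger = logging.getLogger("disposition_sketch")


def correct_low_risk_protection(entity: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of one raw disposition entry with the inconsistency repaired.

    Assumption: `entity` is a plain dict taken from the parsed LLM output,
    before any schema validation has run.
    """
    fixed = dict(entity)  # copy so the caller's dict is not mutated
    is_low_risk = fixed.get("combined_risk_level") == "low"
    method = fixed.get("protection_method_suggestion")
    if is_low_risk and method != "leave_as_is":
        # Trust combined_risk_level and overwrite the protection method.
        # The log level is an assumption (message visible in debug runs).
        logger.debug(
            "combined_risk_level is 'low' but protection method was %r; "
            "forcing 'leave_as_is'",
            method,
        )
        fixed["protection_method_suggestion"] = "leave_as_is"
    return fixed
```

Entries that are already consistent pass through unchanged, so the correction can run unconditionally ahead of validation.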

@asteier2026 asteier2026 requested a review from a team as a code owner May 28, 2026 16:11
@greptile-apps

greptile-apps Bot commented May 28, 2026

Greptile Summary

This PR refactors the QA mechanics in the rewrite pipeline to improve utility by adding generalization_suggestion support throughout the pipeline, feeding latent entity evidence into the meaning unit prompt, loosening the quality comparison rubric for abstraction tolerance, and pre-correcting LLM consistency violations before schema validation.

  • generalization_suggestion field added to EntityDispositionSchema with default="N/A" and min_length=1; it is forwarded to the meaning unit and rewrite prompts only when the protection method is "generalize", and the rewrite prompt includes graceful fallback language for the case where the suggestion is absent.
  • Latent evidence plumbing: _format_disposition_block now looks up evidence spans from COL_LATENT_ENTITIES and attaches them to suppress_inference entities, giving the meaning unit LLM explicit reconstruction clues to avoid.
  • Pre-validation correction: _correct_disposition_consistency in parsers.py auto-corrects the recurring combined_risk_level='low' + protection_method_suggestion != 'leave_as_is' LLM inconsistency before Pydantic validation, preventing spurious validation failures on otherwise valid responses.
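
The conditional forwarding and the latent evidence lookup summarized above could look roughly like the sketch below. It is illustrative only: `build_evidence_index`, `format_entry`, the `evidence` key on latent entities, and the choice to skip the "N/A" placeholder are assumptions, not the PR's actual `_format_disposition_block`. The (entity_label, entity_value) lookup key and the method names "generalize" and "suppress_inference" come from the summary.

```python
from typing import Any

EvidenceIndex = dict[tuple[str, str], list[str]]


def build_evidence_index(latent_entities: list[dict[str, Any]]) -> EvidenceIndex:
    """Index evidence spans by (entity_label, entity_value).

    Assumption: each latent entity is a dict with an optional `evidence` list.
    """
    index: EvidenceIndex = {}
    for latent in latent_entities:
        key = (latent["entity_label"], latent["entity_value"])
        index.setdefault(key, []).extend(latent.get("evidence", []))
    return index


def format_entry(entity: dict[str, Any], evidence_index: EvidenceIndex) -> dict[str, Any]:
    """Build one prompt entry, adding extra fields only where they apply."""
    method = entity["protection_method_suggestion"]
    entry: dict[str, Any] = {
        "entity_label": entity["entity_label"],
        "entity_value": entity["entity_value"],
        "protection_method": method,
    }
    if method == "generalize":
        suggestion = entity.get("generalization_suggestion", "N/A")
        # Assumption: the "N/A" default carries no information, so it is skipped.
        if suggestion != "N/A":
            entry["generalization_suggestion"] = suggestion
    elif method == "suppress_inference":
        key = (entity["entity_label"], entity["entity_value"])
        evidence = evidence_index.get(key)
        if evidence:
            entry["evidence"] = evidence
    return entry
```

Keying the index on the label and value pair keeps evidence for identically named entities of different types apart, which matches the lookup described in the file overview.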

Confidence Score: 5/5

Safe to merge — changes are well-scoped, all existing validation contracts are preserved, and new behavior is gated correctly behind protection method checks.

All three modified pipeline stages handle the new generalization_suggestion field consistently: the schema provides a safe default, pre-validation correction prevents spurious validation failures, and the field is conditionally forwarded only when semantically relevant. Test coverage is comprehensive.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/anonymizer/engine/rewrite/parsers.py | Adds _correct_disposition_consistency pre-validator and integrates it before model_validate; logic is correct and well-tested. |
| src/anonymizer/engine/rewrite/qa_generation.py | Adds COL_LATENT_ENTITIES as a required column and conditionally attaches generalization_suggestion and evidence fields per entity; evidence lookup by (entity_label, entity_value) correctly matches the latent entities structure. |
| src/anonymizer/engine/schemas/rewrite.py | Adds generalization_suggestion with default N/A and min_length=1 to EntityDispositionSchema; consistent with adjacent fields. |
| src/anonymizer/engine/rewrite/evaluate.py | Prompt expansions for re-answer and compare phases are coherent; abstraction tolerance rules are well-structured. |
| src/anonymizer/engine/rewrite/rewrite_generation.py | Conditionally includes generalization_suggestion only for generalize entities; rewrite prompt updated with fallback language. |
| src/anonymizer/engine/rewrite/sensitivity_disposition.py | Prompt updated to instruct the LLM to populate generalization_suggestion for generalize entities and N/A otherwise. |
| src/anonymizer/engine/rewrite/domain_classification.py | Updated quality_supplement for BIOGRAPHY_PROFILE domain to de-emphasize distinctiveness; text-only change. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LLM: Sensitivity Disposition] --> B[parse_sensitivity_disposition]
    B --> C{_correct_disposition_consistency}
    C -->|low + non-leave_as_is| D[Force leave_as_is]
    C -->|Valid| E[model_validate]
    D --> E
    E --> F[_format_disposition_block]
    G[COL_LATENT_ENTITIES] --> F
    F -->|generalize| H[Add generalization_suggestion]
    F -->|suppress_inference| I[Add evidence]
    F -->|other| J[Base entry only]
    H --> K[Meaning Unit LLM]
    I --> K
    J --> K
    E --> L[_format_rewrite_disposition_block]
    L -->|generalize| M[Add generalization_suggestion]
    L -->|other protected| N[Base fields only]
    M --> O[Rewrite LLM]
    N --> O
    O --> P[Rewritten Text]
    P --> Q[Quality Re-answer with abstraction tolerance]
    Q --> R[Quality Score]

Reviews (4): Last reviewed commit: "fix: misc greptile suggestions"

Comment thread src/anonymizer/engine/rewrite/parsers.py
Comment thread src/anonymizer/engine/schemas/rewrite.py Outdated
Comment thread src/anonymizer/engine/rewrite/qa_generation.py
Comment thread src/anonymizer/engine/schemas/rewrite.py Outdated
asteier2026 and others added 2 commits May 28, 2026 09:36
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>