docs: add docs for anonymizer replace evaluation by memadi-nv · Pull Request #181 · NVIDIA-NeMo/Anonymizer

memadi-nv · 2026-06-08T19:06:05Z

Summary

New page: docs/concepts/evaluation.md — dedicated concept page covering LLM-as-judge evaluation for both anonymization modes, rather than appending it to replace.md
Updated docs/concepts/replace.md — added a short "Evaluating replace output" callout at the bottom pointing to the new evaluation page
Updated mkdocs.yml — wired evaluation.md into the Concepts nav section

Why a separate page instead of adding to replace.md?

Evaluation is conceptually independent of any single anonymization strategy. Replace and Rewrite both support evaluation, each with different mechanics (post-hoc vs. built-in), different metrics, and different model roles. Keeping evaluation on its own page:

Gives users a single place to understand evaluation across modes, without hunting across strategy pages
Keeps replace.md focused on pipeline mechanics (detection → replacement)
Makes the page future-proof: we plan to expose rewrite evaluation as a standalone step as well, at which point this page gets an update and rewrite.md does not need restructuring

What's documented in evaluation.md?

Mode comparison table — shows upfront that replace evaluation is post-hoc (Anonymizer.evaluate()) while rewrite evaluation is built into the pipeline
Rewrite evaluation — brief summary of the evaluate–repair loop and output metrics; defers to rewrite.md for full details
Replace evaluation — usage pattern for Anonymizer.evaluate(), including the save/reload workflow for evaluating across sessions
Detection validity judge — flags false positives, wrong labels, boundary errors, and contextual mismatches; runs per record regardless of replace mode
Three replace judges (Substitute mode only, run in parallel per record):
- Type fidelity — checks entity class and format preservation
- Attribute fidelity — checks gender-of-name and age-bucket preservation
- Relational consistency — checks cross-entity coherence (city ↔ state, DOB ↔ age, etc.)
Reading results — display_record() usage and tabular overview snippet
Model roles — defaults (all gpt-oss-120b), link to evaluate.yaml, and override pattern via Anonymizer(model_configs=...)
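
To make two of the listed checks concrete, here is a minimal rule-based sketch of age-bucket preservation (attribute fidelity) and DOB ↔ age agreement (relational consistency). The judges documented in this PR are LLM-based, so this is only an illustration of the properties they look for. The function names, the 10-year bucket width, and the one-year tolerance are assumptions made for this example and are not part of Anonymizer.

```python
from datetime import date


def age_bucket(age: int, width: int = 10) -> tuple[int, int]:
    """Return the inclusive (low, high) bucket containing `age`.

    The 10-year default width is an assumption for illustration only.
    """
    low = (age // width) * width
    return (low, low + width - 1)


def same_age_bucket(original_age: int, replaced_age: int) -> bool:
    """Attribute-fidelity style check: the replacement stays in the same bucket."""
    return age_bucket(original_age) == age_bucket(replaced_age)


def age_on(dob: date, as_of: date) -> int:
    """Age in whole years on `as_of`."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)


def dob_matches_age(dob: date, stated_age: int, as_of: date, tolerance: int = 1) -> bool:
    """Relational-consistency style check: a replaced DOB agrees with a replaced age."""
    return abs(age_on(dob, as_of) - stated_age) <= tolerance
```

A replacement that turns age 34 into 38 passes the bucket check, while 34 into 41 does not. A record that pairs a 1990 date of birth with a stated age of 52 would fail the DOB ↔ age check.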

This PR is related to the changes made in PR #158.

Testing

make test passes locally
make check passes locally (format + lint + typecheck + lock-check)
Added/updated tests for changes

Documentation

If docs changed: make docs-build passes locally

Related Issues

Closes #98

Signed-off-by: memadi <memadi@nvidia.com>

greptile-apps · 2026-06-08T19:08:17Z

Greptile Summary

This PR adds a dedicated docs/concepts/evaluation.md page covering LLM-as-judge evaluation for both replace and rewrite modes, wires it into the mkdocs nav, and adds a short callout at the bottom of replace.md pointing to it.

evaluation.md documents all four judges (detection validity, type fidelity, attribute fidelity, relational consistency), their output columns, the Anonymizer.evaluate() call pattern including the save/reload workflow, model-role defaults, and how to override them — the YAML structure shown matches the actual _merge_selections() logic in model_loader.py.
replace.md gains an "Evaluating replace output" section and its previously missing trailing newline is restored.
mkdocs.yml places Evaluation after "Choosing a Strategy" in the Concepts nav.

Confidence Score: 5/5

Documentation-only change with no executable code paths affected; safe to merge.

All three changed files are Markdown/YAML docs. The model-override YAML snippet in evaluation.md was verified against the actual _merge_selections() implementation and is correct. The four judge role names match evaluate.yaml exactly. Previously flagged issues (trailing newline, empty section header, grammar nit) are resolved in the current file state.

No files require special attention.

Important Files Changed

Filename	Overview
docs/concepts/evaluation.md	New concept page documenting LLM-as-judge evaluation for replace and rewrite modes; structure, content, and model-override YAML all match the actual implementation.
docs/concepts/replace.md	Added a short "Evaluating replace output" callout at the bottom with a link to evaluation.md; also fixes the missing trailing newline from the previous review.
mkdocs.yml	Wired evaluation.md into the Concepts nav section between "Choosing a Strategy" and "Self-hosting GLiNER".

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["anonymizer.run() / preview()"] --> B["AnonymizerResult"]
    B --> C["anonymizer.evaluate(result)"]
    C --> D["Detection Validity Judge\n(all replace modes)"]
    C --> E{"Substitute mode?"}
    E -- Yes --> F["Type Fidelity Judge"]
    E -- Yes --> G["Attribute Fidelity Judge"]
    E -- Yes --> H["Relational Consistency Judge"]
    E -- No --> I["(skip replace judges)"]
    D & F & G & H --> J["EvaluatedResult"]
    J --> K["evaluated.display_record(n)"]
    J --> L["evaluated.dataframe[...]"]
    J --> M["evaluated.trace_dataframe"]
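
The branching in the flowchart can be sketched in a few lines of Python. This is a hypothetical illustration of the dispatch logic only: the judge functions below are stubs, the `evaluate_record` helper and its `substitute_mode` flag are invented for this example, and using a thread pool for the parallel step is an assumption rather than a description of how Anonymizer.evaluate() is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Judge = Callable[[dict], dict]


# Stub judges: each returns a verdict dict naming itself.
# In the documented pipeline these are LLM calls; here they are placeholders.
def detection_validity(record: dict) -> dict:
    return {"judge": "detection_validity"}


def type_fidelity(record: dict) -> dict:
    return {"judge": "type_fidelity"}


def attribute_fidelity(record: dict) -> dict:
    return {"judge": "attribute_fidelity"}


def relational_consistency(record: dict) -> dict:
    return {"judge": "relational_consistency"}


REPLACE_JUDGES: list[Judge] = [type_fidelity, attribute_fidelity, relational_consistency]


def evaluate_record(record: dict, substitute_mode: bool) -> list[dict]:
    """Run detection validity always; run the three replace judges only in Substitute mode."""
    verdicts = [detection_validity(record)]
    if substitute_mode:
        # The three replace judges are independent, so they can run concurrently.
        # pool.map returns results in the order of REPLACE_JUDGES.
        with ThreadPoolExecutor(max_workers=len(REPLACE_JUDGES)) as pool:
            verdicts.extend(pool.map(lambda judge: judge(record), REPLACE_JUDGES))
    return verdicts
```

Calling `evaluate_record(record, substitute_mode=False)` yields only the detection-validity verdict, which mirrors the "(skip replace judges)" branch above.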

Reviews (4): Last reviewed commit: "Update docs/concepts/evaluation.md"

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Signed-off-by: memadi <memadi@nvidia.com>

Co-authored-by: lipikaramaswamy <31832945+lipikaramaswamy@users.noreply.github.com>

memadi-nv added 2 commits June 8, 2026 11:49

add docs for Anonymizer-replace evaluation

d3f4f07

Signed-off-by: memadi <memadi@nvidia.com>

nit

50262c8

Signed-off-by: memadi <memadi@nvidia.com>

memadi-nv requested review from alexahaushalter and lipikaramaswamy June 8, 2026 19:06

memadi-nv requested review from a team as code owners June 8, 2026 19:06

greptile-apps Bot reviewed Jun 8, 2026

Comment thread docs/concepts/evaluation.md Outdated

Comment thread docs/concepts/evaluation.md Outdated

Comment thread docs/concepts/replace.md Outdated

Comment thread docs/concepts/evaluation.md Outdated

memadi-nv and others added 2 commits June 8, 2026 12:11

Update docs/concepts/evaluation.md

5a4eca9

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Update docs/concepts/evaluation.md

7ea4c4a

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

alexahaushalter reviewed Jun 8, 2026
