docs: add docs for anonymizer replace evaluation #181

Merged
memadi-nv merged 6 commits into main from memadi/docs/add-evaluation-replace-docs
Jun 10, 2026

Conversation

memadi-nv commented Jun 8, 2026

Contributor

Summary

  • New page: docs/concepts/evaluation.md — dedicated concept page covering LLM-as-judge evaluation for both anonymization modes, rather than appending it to replace.md
  • Updated docs/concepts/replace.md — added a short "Evaluating replace output" callout at the bottom pointing to the new evaluation page
  • Updated mkdocs.yml — wired evaluation.md into the Concepts nav section

Why a separate page instead of adding to replace.md?

Evaluation is conceptually independent of any single anonymization strategy. Replace and Rewrite both have evaluation, each with different mechanics (post-hoc vs. built-in), different metrics, and different model roles. Keeping evaluation on its own page:

  • Gives users a single place to understand evaluation across modes, without hunting across strategy pages
  • Keeps replace.md focused on pipeline mechanics (detection → replacement)
  • Makes the page future-proof: we plan to expose rewrite evaluation as a standalone step as well, at which point this page gets an update and rewrite.md does not need restructuring

What's documented in evaluation.md?

  • Mode comparison table — shows upfront that replace evaluation is post-hoc (Anonymizer.evaluate()) while rewrite evaluation is built into the pipeline
  • Rewrite evaluation — brief summary of the evaluate–repair loop and output metrics; defers to rewrite.md for full details
  • Replace evaluation — usage pattern for Anonymizer.evaluate(), including the save/reload workflow for evaluating across sessions
  • Detection validity judge — flags false positives, wrong labels, boundary errors, and contextual mismatches; runs per record regardless of replace mode
  • Three replace judges (Substitute mode only, run in parallel per record):
    • Type fidelity — checks entity class and format preservation
    • Attribute fidelity — checks gender-of-name and age-bucket preservation
    • Relational consistency — checks cross-entity coherence (city ↔ state, DOB ↔ age, etc.)
  • Reading results — display_record() usage and tabular overview snippet
  • Model roles — defaults (all gpt-oss-120b), link to evaluate.yaml, and override pattern via Anonymizer(model_configs=...)
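
The judge routing summarized in the list above can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: the judge functions, the `evaluate_record` helper, and the `"substitute"` / `"redact"` mode strings are hypothetical stand-ins, and each stub returns a placeholder verdict instead of calling an LLM. For simplicity, the sketch runs every applicable judge for a record in one thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the LLM judges; each returns a placeholder verdict.
def detection_validity(record):
    return {"judge": "detection_validity", "ok": True}


def type_fidelity(record):
    return {"judge": "type_fidelity", "ok": True}


def attribute_fidelity(record):
    return {"judge": "attribute_fidelity", "ok": True}


def relational_consistency(record):
    return {"judge": "relational_consistency", "ok": True}


REPLACE_JUDGES = (type_fidelity, attribute_fidelity, relational_consistency)


def evaluate_record(record, mode):
    """Always run the detection judge; add the replace judges only in substitute mode."""
    judges = [detection_validity]
    if mode == "substitute":  # assumed mode label, for illustration only
        judges.extend(REPLACE_JUDGES)
    # The judges for one record are independent, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=len(judges)) as pool:
        verdicts = list(pool.map(lambda judge: judge(record), judges))
    return {verdict["judge"]: verdict["ok"] for verdict in verdicts}


if __name__ == "__main__":
    record = {"text": "Alice lives in Austin, TX."}
    print(sorted(evaluate_record(record, "substitute")))
    print(sorted(evaluate_record(record, "redact")))
```

With real judges, each stub would be replaced by a model call, and the per-record verdicts would become the output columns described on the evaluation page.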

This PR is related to the changes made in PR #158.

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Refactoring

Testing

  • make test passes locally
  • make check passes locally (format + lint + typecheck + lock-check)
  • Added/updated tests for changes

Documentation

  • If docs changed: make docs-build passes locally

Related Issues

Closes #98

memadi-nv added 2 commits June 8, 2026 11:49
Signed-off-by: memadi <memadi@nvidia.com>
Signed-off-by: memadi <memadi@nvidia.com>
memadi-nv requested review from a team as code owners June 8, 2026 19:06

greptile-apps Bot commented Jun 8, 2026

Contributor

Greptile Summary

This PR adds a dedicated docs/concepts/evaluation.md page covering LLM-as-judge evaluation for both replace and rewrite modes, wires it into the mkdocs nav, and adds a short callout at the bottom of replace.md pointing to it.

  • evaluation.md documents all four judges (detection validity, type fidelity, attribute fidelity, relational consistency), their output columns, the Anonymizer.evaluate() call pattern including the save/reload workflow, model-role defaults, and how to override them — the YAML structure shown matches the actual _merge_selections() logic in model_loader.py.
  • replace.md gains an "Evaluating replace output" section and its previously missing trailing newline is restored.
  • mkdocs.yml places Evaluation after "Choosing a Strategy" in the Concepts nav.

Confidence Score: 5/5

Documentation-only change with no executable code paths affected; safe to merge.

All three changed files are Markdown/YAML docs. The model-override YAML snippet in evaluation.md was verified against the actual _merge_selections() implementation and is correct. The four judge role names match evaluate.yaml exactly. Previously flagged issues (trailing newline, empty section header, grammar nit) are resolved in the current file state.
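
The thread does not show the `_merge_selections()` logic itself, so the following is only a generic sketch of layering per-role model overrides on top of defaults. The role names and the `merge_roles` helper are hypothetical (the real role names live in evaluate.yaml); the default model name `gpt-oss-120b` is the one stated in the PR description.

```python
DEFAULT_MODEL = "gpt-oss-120b"  # default named in the PR description

# Hypothetical role names, used for illustration only.
DEFAULT_ROLES = {
    "detection_validity_judge": DEFAULT_MODEL,
    "type_fidelity_judge": DEFAULT_MODEL,
    "attribute_fidelity_judge": DEFAULT_MODEL,
    "relational_consistency_judge": DEFAULT_MODEL,
}


def merge_roles(defaults, overrides):
    """Return a new mapping where each override replaces the default for its role."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Reject typos early instead of silently ignoring an unknown role.
        raise KeyError(f"unknown roles: {sorted(unknown)}")
    return {**defaults, **overrides}


if __name__ == "__main__":
    merged = merge_roles(DEFAULT_ROLES, {"type_fidelity_judge": "my-local-model"})
    for role, model in merged.items():
        print(f"{role}: {model}")
```

Returning a new dictionary leaves the defaults untouched, so one role can be overridden without affecting the other three.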

No files require special attention.

Important Files Changed

Filename Overview
docs/concepts/evaluation.md New concept page documenting LLM-as-judge evaluation for replace and rewrite modes; structure, content, and model-override YAML all match the actual implementation.
docs/concepts/replace.md Added a short "Evaluating replace output" callout at the bottom with a link to evaluation.md; also fixes the missing trailing newline from the previous review.
mkdocs.yml Wired evaluation.md into the Concepts nav section between "Choosing a Strategy" and "Self-hosting GLiNER".

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["anonymizer.run() / preview()"] --> B["AnonymizerResult"]
    B --> C["anonymizer.evaluate(result)"]
    C --> D["Detection Validity Judge\n(all replace modes)"]
    C --> E{"Substitute mode?"}
    E -- Yes --> F["Type Fidelity Judge"]
    E -- Yes --> G["Attribute Fidelity Judge"]
    E -- Yes --> H["Relational Consistency Judge"]
    E -- No --> I["(skip replace judges)"]
    D & F & G & H --> J["EvaluatedResult"]
    J --> K["evaluated.display_record(n)"]
    J --> L["evaluated.dataframe[...]"]
    J --> M["evaluated.trace_dataframe"]

Reviews (4): Last reviewed commit: "Update docs/concepts/evaluation.md" | Re-trigger Greptile

Comment thread docs/concepts/evaluation.md Outdated
Comment thread docs/concepts/evaluation.md Outdated
Comment thread docs/concepts/replace.md Outdated
Comment thread docs/concepts/evaluation.md Outdated
memadi-nv and others added 2 commits June 8, 2026 12:11
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Comment thread docs/concepts/evaluation.md Outdated
Comment thread docs/concepts/evaluation.md Outdated
Comment thread docs/concepts/evaluation.md Outdated
Comment thread docs/concepts/evaluation.md Outdated
Comment thread docs/concepts/evaluation.md Outdated
Comment thread docs/concepts/evaluation.md Outdated
Comment thread docs/concepts/evaluation.md Outdated
Signed-off-by: memadi <memadi@nvidia.com>
memadi-nv requested a review from lipikaramaswamy June 8, 2026 20:14
Comment thread docs/concepts/evaluation.md Outdated
Co-authored-by: lipikaramaswamy <31832945+lipikaramaswamy@users.noreply.github.com>
memadi-nv merged commit faa5fb1 into main Jun 10, 2026
11 checks passed
memadi-nv deleted the memadi/docs/add-evaluation-replace-docs branch June 10, 2026 00:59

Development

Successfully merging this pull request may close these issues.

feat: LLM-as-a-judge for REPLACE evaluation

3 participants