
feat: add regex-backed detection benchmark strategies #183

Draft
binaryaaron wants to merge 1 commit into binaryaaron/perf-structured-substitute from binaryaaron/perf-regex-detection



@binaryaaron binaryaaron commented Jun 8, 2026

Collaborator

Summary

Adds benchmark-only regex/rule detection strategies for structured secrets and identifiers. The goal is to measure when deterministic detection can reduce LLM work while preserving anonymization safety.

Stack

What changed

  • Add high-confidence rule detection for structured tokens such as API keys, credentials, cookies, URLs, identifiers, and shell-history secrets.
  • Add benchmark strategy variants for rules-only, guardrail, filter-guardrail, covered-label routing, and native router probes.
  • Add route/source-count analysis fields, staged probe summaries, signature-delta tooling, and regression coverage for regex-backed paths.
  • Document the strategy set as benchmark probes, not public Anonymizer defaults.
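For readers unfamiliar with rule-based detection, here is a minimal sketch of the idea behind the first bullet. It is not the code in this PR: `RuleMatch`, `RULES`, `detect`, and the three patterns are hypothetical names and rules chosen for illustration, and the real rule set is assumed to be broader and more carefully tuned.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleMatch:
    """One deterministic detection: a label plus the matched character span."""
    label: str
    start: int
    end: int
    text: str


# Hypothetical high-confidence patterns; the PR's actual rules may differ.
RULES = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Plain http(s) URLs, stopping at whitespace or quote/angle characters.
    "url": re.compile(r"https?://[^\s\"'<>]+"),
    # Shell-history secrets such as `export API_TOKEN=...`; group 1 is the value.
    "shell_export_secret": re.compile(
        r"\bexport\s+[A-Z0-9_]*(?:TOKEN|SECRET|PASSWORD|KEY)[A-Z0-9_]*=(\S+)"
    ),
}


def detect(text: str) -> list[RuleMatch]:
    """Run every rule over the text and return matches ordered by position."""
    matches = []
    for label, pattern in RULES.items():
        # Report the capture group when the rule defines one, else the whole match.
        group = 1 if pattern.groups else 0
        for m in pattern.finditer(text):
            matches.append(
                RuleMatch(label, m.start(group), m.end(group), m.group(group))
            )
    return sorted(matches, key=lambda r: (r.start, r.end))
```

Calling `detect` on a shell-history line returns labeled spans that a benchmark strategy could, in principle, redact directly or compare against LLM detections. How the strategies in this PR combine rule output with LLM output is not shown here.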

Validation

  • uv run --frozen ruff format ...
  • uv run --frozen ruff check ...
  • uv run pytest tests/engine/test_detection_rules.py tests/engine/test_structured_substitute.py tests/test_measurement.py tests/tools/test_benchmark_output_analysis.py tests/tools/test_compare_strategy_pairs.py tests/tools/test_detection_strategies.py tests/tools/test_extract_signature_deltas.py tests/tools/test_measurement_tools.py tests/tools/test_replacement_strategies.py tests/tools/test_replay_replacement_strategies.py tests/tools/test_screen_strategy_comparisons.py tests/tools/test_staged_detection_output_analysis.py tests/tools/test_staged_detection_probe.py -q
  • git diff --check

Focused suite result: 256 passed, with existing DataDesigner provider deprecation warnings.

Signed-off-by: Aaron Gonzales <aagonzales@nvidia.com>
@binaryaaron binaryaaron force-pushed the binaryaaron/perf-regex-detection branch from 2f41830 to 915dd5b June 8, 2026 23:53
@binaryaaron binaryaaron changed the base branch from binaryaaron/perf-epic to binaryaaron/perf-structured-substitute June 8, 2026 23:58
