chore: add derivative measurement analysis tools #187

Draft
binaryaaron wants to merge 21 commits into main from binaryaaron/perf-derived-tools

@binaryaaron

Collaborator

Summary

  • Restores the derivative measurement tools on top of #177 (feat: anonymizer measurement instrumentation and benchmark tooling) so the base PR stays focused on capture, export, and basic analysis.
  • Adds DataDesigner trace analysis (analyze_dd_traces.py) and staged detection output analysis (analyze_staged_detection_output.py).
  • Adds benchmark strategy comparison and screening tools (compare_strategy_pairs.py, screen_strategy_comparisons.py) plus signature-delta extraction.
  • Adds benchmark-only direct and staged detection probes for local performance experiments.
  • Adds benchmark-only detection strategy plumbing in run_benchmarks.py, including parser compatibility and native runtime configuration for direct OpenAI-compatible endpoint experiments.
  • Expands the measurement tool README with usage patterns for the derivative tools.

These tools are experiment infrastructure. They are not public Anonymizer defaults, and candidate strategies still need workload-specific safety, leakage, provenance, parser, and reliability checks before promotion.

Stack

Validation

  • uv run --frozen pytest tests/test_measurement.py tests/engine/test_ndd_adapter.py tests/tools/test_measurement_tools.py tests/tools/test_benchmark_output_analysis.py tests/tools/test_detection_artifact_analysis.py tests/tools/test_compare_strategy_pairs.py tests/tools/test_dd_parser_compat.py tests/tools/test_dd_trace_analysis.py tests/tools/test_detection_strategies.py tests/tools/test_direct_detection_probe.py tests/tools/test_extract_signature_deltas.py tests/tools/test_screen_strategy_comparisons.py tests/tools/test_staged_detection_output_analysis.py tests/tools/test_staged_detection_probe.py -q
    • Result: 188 passed, 9 existing DataDesigner model-config deprecation warnings
  • uv run --frozen ruff check tools/measurement/run_benchmarks.py tools/measurement/analyze_dd_traces.py tools/measurement/analyze_staged_detection_output.py tools/measurement/compare_strategy_pairs.py tools/measurement/dd_parser_compat.py tools/measurement/detection_strategies.py tools/measurement/direct_detection_probe.py tools/measurement/extract_signature_deltas.py tools/measurement/screen_strategy_comparisons.py tools/measurement/staged_detection_probe.py tests/tools/test_compare_strategy_pairs.py tests/tools/test_dd_parser_compat.py tests/tools/test_dd_trace_analysis.py tests/tools/test_detection_strategies.py tests/tools/test_direct_detection_probe.py tests/tools/test_extract_signature_deltas.py tests/tools/test_measurement_tools.py tests/tools/test_screen_strategy_comparisons.py tests/tools/test_staged_detection_output_analysis.py tests/tools/test_staged_detection_probe.py
  • uv run tools/codestyle/format.sh --check
  • git diff --cached --check
  • CLI smoke:
    • uv run python tools/measurement/analyze_dd_traces.py --help
    • uv run python tools/measurement/analyze_staged_detection_output.py --help
    • uv run python tools/measurement/compare_strategy_pairs.py --help
    • uv run python tools/measurement/screen_strategy_comparisons.py --help
    • uv run python tools/measurement/direct_detection_probe.py --help
    • uv run python tools/measurement/staged_detection_probe.py --help
    • uv run python tools/measurement/extract_signature_deltas.py --help
    • uv run python tools/measurement/run_benchmarks.py --help
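The CLI smoke checks above could be scripted roughly as follows. This is a minimal sketch, not part of the PR: the helper names `smoke_commands` and `run_smoke` are hypothetical, and it assumes only that each tool exits with status 0 on `--help`, as the list implies.

```python
import subprocess

# Tool file names copied from the CLI smoke list above.
TOOLS = [
    "analyze_dd_traces.py",
    "analyze_staged_detection_output.py",
    "compare_strategy_pairs.py",
    "screen_strategy_comparisons.py",
    "direct_detection_probe.py",
    "staged_detection_probe.py",
    "extract_signature_deltas.py",
    "run_benchmarks.py",
]


def smoke_commands(tool_dir="tools/measurement"):
    """Build one `uv run python <tool> --help` command per tool (hypothetical helper)."""
    return [["uv", "run", "python", f"{tool_dir}/{name}", "--help"] for name in TOOLS]


def run_smoke(runner=subprocess.run):
    """Run every smoke command and return the tool paths whose exit status was non-zero."""
    failures = []
    for cmd in smoke_commands():
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(cmd[3])  # index 3 is the script path
    return failures


if __name__ == "__main__":
    failed = run_smoke()
    print("all tools answered --help" if not failed else f"failed: {failed}")
```

The `runner` parameter is there only so the loop can be exercised without invoking `uv`; by default it calls `subprocess.run`.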

Notes

  • Probe outputs and raw trace sidecars can contain prompts, model outputs, secrets, or PII. Treat them as sensitive local debugging artifacts.
  • Runtime endpoint/model names are intentionally captured through config/env metadata for experiment reproducibility, but raw endpoint URLs should not become portable benchmark fixtures.
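One way to keep raw endpoint URLs out of anything shared as a fixture is to scrub them from metadata before export. The sketch below is illustrative only: `redact_urls`, the placeholder text, and the metadata keys in the example are assumptions, not the PR's actual schema or tooling.

```python
import re

# Matches http(s) URLs up to the next whitespace, quote, or angle bracket.
URL_RE = re.compile(r"(https?)://[^\s\"'<>]+")


def redact_urls(value):
    """Recursively replace URLs in strings, keeping only the scheme (hypothetical helper)."""
    if isinstance(value, str):
        return URL_RE.sub(lambda m: f"{m.group(1)}://<redacted-endpoint>", value)
    if isinstance(value, dict):
        return {key: redact_urls(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact_urls(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged


if __name__ == "__main__":
    # Hypothetical metadata shape, for illustration only.
    metadata = {"model": "example-model", "endpoint": "http://10.0.0.5:8000/v1"}
    print(redact_urls(metadata))
```

Model names pass through untouched here, matching the note that they are captured on purpose for reproducibility. This covers URLs only; prompts, model outputs, secrets, and PII in probe outputs still need the handling described above.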

Signed-off-by: Aaron Gonzales <aagonzales@nvidia.com>
Base automatically changed from binaryaaron/perf-epic to main June 12, 2026 20:54