Skip to content

fix(detection): pass single chunk validation flag to exports#190

Open
lipikaramaswamy wants to merge 2 commits into
mainfrom
lipikaramaswamy/bugfix/export-detection-single-chunk-flag
Open

fix(detection): pass single chunk validation flag to exports#190
lipikaramaswamy wants to merge 2 commits into
mainfrom
lipikaramaswamy/bugfix/export-detection-single-chunk-flag

Conversation

@lipikaramaswamy

@lipikaramaswamy lipikaramaswamy commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Summary

Fixes the integrated main regression where detection workflow construction referenced validation_single_chunk_full_text inside _build_detection_spec(...) without defining or passing that parameter. This broke EntityDetectionWorkflow.run(...) and Anonymizer.export_detection_config(...) after PR #177 merged on top of PR #182.

This PR threads the existing validation_single_chunk_full_text argument into _build_detection_spec(...) and adds regression coverage for both the direct detection path and the external detection config path.

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Refactoring

Testing

  • make test passes locally
  • make check passes locally (format + lint + typecheck + lock-check)
  • Added/updated tests for changes

Ran:

uv run pytest tests/engine/test_detection_workflow.py
make format-check
make test
make check

Notes:

  • make test: 914 passed, 94 warnings
  • make check: exited successfully. The typecheck substep is advisory and printed existing ty diagnostics, then make continued through uv lock --check and copyright checks.

Also reproduced the failure on merged main commit f6dd05d in a temporary worktree: tests/engine/test_detection_workflow.py failed with NameError: name 'validation_single_chunk_full_text' is not defined.

Documentation

  • If docs changed: make docs-build passes locally

No docs changes.

Related Issues

Follow-up fix for the stacked PR integration of #177 and #182.

Signed-off-by: lipikaramaswamy <lramaswamy@nvidia.com>
@lipikaramaswamy lipikaramaswamy marked this pull request as ready for review June 12, 2026 22:53
@lipikaramaswamy lipikaramaswamy requested a review from a team as a code owner June 12, 2026 22:53
@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a NameError crash introduced when PR #177 merged on top of PR #182: _build_detection_spec referenced validation_single_chunk_full_text without having it as a parameter. The fix adds the parameter to _build_detection_spec and threads it through detect_and_validate_entities, build_detection_config, build_detection_builder_for_seed, and run, with four new regression tests.

  • _build_detection_spec, build_detection_config, build_detection_builder_for_seed, and run all gain a validation_single_chunk_full_text: bool = True parameter that is forwarded into ChunkedValidationParams, matching the existing pattern for the other two validation knobs.
  • Four new tests cover the False path for run(), build_detection_config, and build_detection_builder_for_seed, plus a default-value assertion for build_detection_config.
  • AnonymizerDetectConfig does not include validation_single_chunk_full_text, so the three call sites in anonymizer.py (export_detection_config, export_detection_builder_for_seed, and _run_detection) still silently default the flag to True; users of the top-level Anonymizer API cannot opt into False without bypassing Anonymizer entirely.

Confidence Score: 4/5

The crash fix itself is correct, but the Anonymizer high-level API cannot forward the new flag because AnonymizerDetectConfig has no validation_single_chunk_full_text field.

The _build_detection_spec fix and the test coverage are solid. However, all three Anonymizer-level call sites in anonymizer.py omit the new parameter, permanently locking Anonymizer.run(), export_detection_config(), and export_detection_builder_for_seed() to True. Any user who needs False through the standard Anonymizer API is silently stuck.

src/anonymizer/interface/anonymizer.py — three forwarding sites need validation_single_chunk_full_text once AnonymizerDetectConfig gains the field.

Important Files Changed

Filename Overview
src/anonymizer/engine/detection/detection_workflow.py Threads validation_single_chunk_full_text through _build_detection_spec, build_detection_config, build_detection_builder_for_seed, and run — fixing the NameError crash; all five signatures default to hardcoded True instead of a module-level constant.
tests/engine/test_detection_workflow.py Adds four regression tests covering run(), build_detection_config, and build_detection_builder_for_seed for both the False value and the default True; good coverage of the changed paths.

Sequence Diagram

sequenceDiagram
    participant A as Anonymizer
    participant EDW as EntityDetectionWorkflow
    participant BDS as _build_detection_spec
    participant CVP as ChunkedValidationParams

    Note over A: export_detection_config / run (validation_single_chunk_full_text NOT forwarded — locked to True)
    A->>EDW: build_detection_config(...) missing: validation_single_chunk_full_text
    EDW->>BDS: "_build_detection_spec(..., validation_single_chunk_full_text=True default)"
    BDS->>CVP: "ChunkedValidationParams(single_chunk_full_text=True)"

    Note over EDW: Direct callers can now set False
    EDW->>BDS: "_build_detection_spec(..., validation_single_chunk_full_text=False PR fix)"
    BDS->>CVP: "ChunkedValidationParams(single_chunk_full_text=False)"
Loading

Reviews (2): Last reviewed commit: "fix(detection): expose single chunk vali..." | Re-trigger Greptile

Signed-off-by: lipikaramaswamy <lramaswamy@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant