Skip to content

Extract hardcoded confidence thresholds to named constants#10

Merged
JuliusScheuerer merged 1 commit into
mainfrom
worktree-extract-threshold-constants
Mar 25, 2026
Merged

Extract hardcoded confidence thresholds to named constants#10
JuliusScheuerer merged 1 commit into
mainfrom
worktree-extract-threshold-constants

Conversation

@JuliusScheuerer
Copy link
Copy Markdown
Owner

Summary

  • Create constants.py with 5 named constants replacing 20+ magic numbers across 11 files
  • DEFAULT_SCORE_THRESHOLD (0.35) — detection sensitivity, was duplicated in schemas, routes, handlers
  • TIER_HIGH_THRESHOLD (0.7) / TIER_MEDIUM_THRESHOLD (0.5) — review panel tier boundaries
  • RECOGNIZER_BASE_SCORE_HIGH (0.5) / RECOGNIZER_BASE_SCORE_LOW (0.3) — pattern recognizer base scores
  • No logic changes — only literal replacements with named imports
  • Per GOLDEN-PRINCIPLES.md §20: no magic numbers

Test plan

  • make check passes (275 tests, 95.64% coverage)
  • All pre-commit hooks pass

Create constants.py with DEFAULT_SCORE_THRESHOLD (0.35),
TIER_HIGH_THRESHOLD (0.7), TIER_MEDIUM_THRESHOLD (0.5),
RECOGNIZER_BASE_SCORE_HIGH (0.5), and RECOGNIZER_BASE_SCORE_LOW (0.3).
Replace all 20+ magic number occurrences across 11 files with these
named constants. No logic changes — only literal replacements.
@JuliusScheuerer JuliusScheuerer merged commit ece5445 into main Mar 25, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 25, 2026

Greptile Summary

This PR is a clean, targeted refactor that introduces a single constants.py module and replaces duplicated magic number literals across 11 files with 5 named constants (DEFAULT_SCORE_THRESHOLD, TIER_HIGH_THRESHOLD, TIER_MEDIUM_THRESHOLD, RECOGNIZER_BASE_SCORE_HIGH, RECOGNIZER_BASE_SCORE_LOW). No logic is changed — every replacement is a value-preserving substitution that improves maintainability and eliminates the risk of accidental divergence between call sites.

  • All 5 constants are correctly valued and match every literal they replace; the module is well-documented and exports a clean __all__.
  • Two magic numbers remain unaddressed: 0.2 (short-date pattern in german_date.py) and 0.6 (mobile-phone pattern in german_phone.py) are unique values with no matching constant. They are not regressions introduced here, but since the PR's stated goal is eliminating all magic numbers these could be tidied up with two additional constants (RECOGNIZER_BASE_SCORE_VERY_LOW and RECOGNIZER_BASE_SCORE_MOBILE, or similar).
  • RECOGNIZER_BASE_SCORE_HIGH and TIER_MEDIUM_THRESHOLD happen to share the value 0.5, which is intentional but worth noting — they represent different concepts (recognizer pattern confidence vs. UI tier boundary) and should not be consolidated.

Confidence Score: 5/5

  • This PR is safe to merge — it is a pure literal-to-constant substitution with no behavioral changes.
  • All replacements are value-preserving, tests pass at 95%+ coverage, and the two remaining magic numbers (0.2, 0.6) are pre-existing minor style gaps, not regressions introduced by this PR. There are no logic, security, or correctness concerns.
  • german_date.py and german_phone.py each retain one bare numeric literal not covered by the new constants.

Important Files Changed

Filename Overview
src/document_anonymizer/constants.py New file introducing 5 well-documented named constants; values match all literal replacements made in the PR.
src/document_anonymizer/detection/recognizers/german_date.py Replaces 0.3 with RECOGNIZER_BASE_SCORE_LOW, but leaves 0.2 as an unaddressed magic number on the short-date pattern.
src/document_anonymizer/detection/recognizers/german_phone.py Replaces 0.5 and 0.3 with named constants, but leaves the mobile-pattern score 0.6 as an unaddressed magic number.
src/document_anonymizer/api/schemas.py DEFAULT_SCORE_THRESHOLD correctly replaces both 0.35 defaults and 0.35 examples; no issues.
src/document_anonymizer/web/routes.py Replaces DEFAULT_SCORE_THRESHOLD in three form handlers and TIER_HIGH/MEDIUM_THRESHOLD in _score_to_tier; clean and correct.
src/document_anonymizer/document/pdf_handler.py Replaces three 0.35 default parameter values with DEFAULT_SCORE_THRESHOLD; no logic changes.
src/document_anonymizer/document/processor.py Straightforward substitution of DEFAULT_SCORE_THRESHOLD in two function signatures; correct.
src/document_anonymizer/document/text_handler.py DEFAULT_SCORE_THRESHOLD replaces 0.35 in two function defaults; clean change.
src/document_anonymizer/detection/recognizers/german_handelsreg.py 0.5 replaced with RECOGNIZER_BASE_SCORE_HIGH; no issues.
src/document_anonymizer/detection/recognizers/german_iban.py DEFAULT_SCORE class attribute correctly migrated to RECOGNIZER_BASE_SCORE_HIGH; no issues.
src/document_anonymizer/detection/recognizers/german_id_card.py 0.3 replaced with RECOGNIZER_BASE_SCORE_LOW; clean and correct.
src/document_anonymizer/detection/recognizers/german_tax.py Both 0.3 occurrences replaced with RECOGNIZER_BASE_SCORE_LOW, inline comments preserved correctly.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    C["constants.py"]
    C -->|DEFAULT_SCORE_THRESHOLD 0.35| S["api/schemas.py"]
    C -->|DEFAULT_SCORE_THRESHOLD 0.35| R["web/routes.py"]
    C -->|DEFAULT_SCORE_THRESHOLD 0.35| PH["document/pdf_handler.py"]
    C -->|DEFAULT_SCORE_THRESHOLD 0.35| PR["document/processor.py"]
    C -->|DEFAULT_SCORE_THRESHOLD 0.35| TH["document/text_handler.py"]
    C -->|TIER_HIGH_THRESHOLD 0.7\nTIER_MEDIUM_THRESHOLD 0.5| R
    C -->|RECOGNIZER_BASE_SCORE_HIGH 0.5| IBAN["recognizers/german_iban.py"]
    C -->|RECOGNIZER_BASE_SCORE_HIGH 0.5| HAND["recognizers/german_handelsreg.py"]
    C -->|RECOGNIZER_BASE_SCORE_HIGH 0.5\nRECOGNIZER_BASE_SCORE_LOW 0.3| PHONE["recognizers/german_phone.py"]
    C -->|RECOGNIZER_BASE_SCORE_LOW 0.3| DATE["recognizers/german_date.py"]
    C -->|RECOGNIZER_BASE_SCORE_LOW 0.3| ID["recognizers/german_id_card.py"]
    C -->|RECOGNIZER_BASE_SCORE_LOW 0.3| TAX["recognizers/german_tax.py"]
    DATE -. "0.2 still magic" .-> DATE
    PHONE -. "0.6 still magic" .-> PHONE
Loading

Reviews (1): Last reviewed commit: "Extract hardcoded confidence thresholds ..." | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant