fix(heuristics): add word boundaries to live-debug keywords by madara88645 · Pull Request #913 · madara88645/Compiler

madara88645 · 2026-07-02T09:19:28Z

Problem

_JOINED_LIVE_DEBUG in app/heuristics/__init__.py joined LIVE_DEBUG_KEYWORDS with re.compile("|".join(...)) and no word boundaries. Patterns like logs? therefore substring-matched unrelated words — login, blog, catalog, dialog.

This made detect_live_debug() return True for benign prompts (e.g. "Implement secure login sessions"), which cascaded:

detect_live_debug → detect_troubleshooting_intent / live_debug flag → ContentHandler + LiveDebugHandler add spurious troubleshooting/debug intents → PolicyHandler sets debug_request=True → escalates policy to risk_level=medium with execution_mode=human_approval_required.

The result: harmless requests were misclassified as live-debug and forced into human approval.
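The substring match described above can be reproduced in isolation. The sketch below is a minimal illustration, not the repository's code: `KEYWORDS` is a hypothetical three-entry stand-in for `LIVE_DEBUG_KEYWORDS` (the real list is not shown in this PR), and `unbounded` is a hypothetical name for the pre-fix pattern.

```python
import re

# Hypothetical stand-in for LIVE_DEBUG_KEYWORDS; the real list is not shown in this PR.
KEYWORDS = [r"logs?", r"debug", r"stack trace"]

# Pre-fix behaviour: alternatives joined with "|" and no word boundaries.
unbounded = re.compile("|".join(KEYWORDS))

benign_prompts = [
    "Implement secure login sessions",
    "write a blog post",
    "browse the catalog",
    "open the dialog box",
]

# Each benign prompt contains "log" inside a longer word, so search() finds it.
false_positives = {p: unbounded.search(p).group(0) for p in benign_prompts if unbounded.search(p)}
for prompt, hit in false_positives.items():
    print(f"{prompt!r} matched on {hit!r}")
```

In every case the matched text is the `log` inside a longer word, which is exactly the false positive that feeds the escalation chain.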

Fix

Wrap each keyword with \b…\b. This mirrors the fix already applied to _TEACHING_RE (same file, ~line 169, with a comment documenting the identical substring bug). We wrap with \b rather than reusing that helper's re.escape, because these keywords intentionally contain regex (e.g. the ? in logs?), which re.escape would neutralize.
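A minimal sketch of the wrapping, and of why `re.escape` would be the wrong tool here. `KEYWORDS`, `bounded`, and `escaped` are hypothetical names for illustration; the actual expression in `app/heuristics/__init__.py` may differ in detail.

```python
import re

# Hypothetical stand-in for LIVE_DEBUG_KEYWORDS; the real list is not shown in this PR.
KEYWORDS = [r"logs?", r"debug", r"stack trace"]

# Fix: wrap each keyword in \b...\b before joining, keeping its regex syntax intact.
# (A keyword containing a top-level "|" would also need a group around it.)
bounded = re.compile("|".join(rf"\b{kw}\b" for kw in KEYWORDS))

# Rejected alternative: re.escape turns "logs?" into a literal "logs?" string,
# so the optional "s" stops working and plain "log" / "logs" no longer match.
escaped = re.compile("|".join(rf"\b{re.escape(kw)}\b" for kw in KEYWORDS))

print(bounded.search("Implement secure login sessions"))   # no match object
print(bounded.search("attach the logs").group(0))          # matched keyword text
print(escaped.search("attach the logs"))                   # no match object
```

With boundaries, `log` only matches when it stands as its own word, while the `?` quantifier keeps covering both `log` and `logs`.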

Tests

Added regression tests in tests/test_detect_live_debug.py:

Must be False: "Implement secure login sessions", "write a blog post", "browse the catalog", "open the dialog box"
Must stay True: "check the error log", "attach the logs", "help me debug this stack trace"
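The two groups of cases could be written roughly as follows. This is a sketch only: the `detect_live_debug` defined here is a simplified, hypothetical stand-in built from an assumed keyword list so the example runs on its own, and the test function names are invented; the real tests in `tests/test_detect_live_debug.py` import the function from the app and may be structured differently.

```python
import re

# Hypothetical stand-ins so the sketch is self-contained; the real keyword list
# and detect_live_debug() live in app/heuristics/__init__.py.
LIVE_DEBUG_KEYWORDS = [r"logs?", r"debug", r"stack trace"]
_JOINED_LIVE_DEBUG = re.compile("|".join(rf"\b{kw}\b" for kw in LIVE_DEBUG_KEYWORDS))


def detect_live_debug(text: str) -> bool:
    """Simplified stand-in: True if any keyword matches as a whole word."""
    return _JOINED_LIVE_DEBUG.search(text.lower()) is not None


BENIGN = [
    "Implement secure login sessions",
    "write a blog post",
    "browse the catalog",
    "open the dialog box",
]
GENUINE = [
    "check the error log",
    "attach the logs",
    "help me debug this stack trace",
]


def test_benign_prompts_are_not_live_debug():
    for prompt in BENIGN:
        assert not detect_live_debug(prompt), prompt


def test_genuine_cues_still_match():
    for prompt in GENUINE:
        assert detect_live_debug(prompt), prompt
```

Pairing the negative cases with positive ones guards against over-correcting: a pattern that matches nothing would pass the first test but fail the second.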

Verification

Full backend suite: 1692 passed, 5 skipped. The one unrelated failure (test_cli_new_features.py::test_validate_summary_and_api_schemas) is an environmental flake — its opportunistic branch assumes port 8000 is the Compiler API, but a local Docker process on 8000 returns 404. Not caused by this change.
ruff check . → clean.

Scope

Two files, no out-of-scope changes:

app/heuristics/__init__.py (+5/-1)
tests/test_detect_live_debug.py (+17)

🤖 Generated with Claude Code

_JOINED_LIVE_DEBUG joined LIVE_DEBUG_KEYWORDS without word boundaries, so patterns like "logs?" substring-matched unrelated words ("login", "blog", "catalog", "dialog"). This made detect_live_debug return True for benign prompts, cascading through detect_troubleshooting_intent and the Content/LiveDebug handlers into PolicyHandler, which then escalated policy to risk_level=medium / human_approval_required via debug_request.

Wrap each keyword with \b (mirroring the existing _TEACHING_RE fix) instead of re.escape, since the keywords intentionally contain regex like "logs?". Add regression tests covering the false positives and the genuine log/debug cues that must still match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel · 2026-07-02T09:19:34Z

Project	Deployment	Actions	Updated (UTC)
compiler	Ready	Preview, Comment	Jul 2, 2026 9:19am

cursor

PR Risk Assessment — Low Risk ✅ Approved

Evidence-based assessment

Factor	Finding
Files changed	2 (`app/heuristics/__init__.py`, `tests/test_detect_live_debug.py`)
Diff size	+22 / −1
Production logic	Yes — `_JOINED_LIVE_DEBUG` regex compilation in shared heuristics
Infra / auth / schema	None
Blast radius	Narrow — `detect_live_debug()` only; feeds troubleshooting intent → policy escalation

What changed

Wraps each LIVE_DEBUG_KEYWORDS entry with \b…\b word boundaries when compiling the regex. This prevents substring false positives (e.g. logs? matching inside "login", "blog", "catalog", "dialog") that incorrectly escalated benign prompts to human_approval_required.

The approach mirrors the existing _TEACHING_RE fix in the same file (~line 169). Keywords intentionally retain regex metacharacters (e.g. logs?), so re.escape is correctly avoided.

Why Low (not Medium)

Clearly scoped bug fix with regression tests covering both false-positive and true-positive cases
Limited surface area; easy to reason about correctness
Reduces incorrect policy escalation rather than introducing new behavior
No cross-file behavioral changes beyond the single regex compilation line

Why not Very Low

Touches shared production heuristics that influence policy/execution mode — a small but real behavioral change in a core detection path.

Actions taken

Reviewers: Not required (Low risk); 0 reviewers currently assigned
CODEOWNERS: None configured in repo
Approval: Approved (Low risk, correctness clear, well-tested)

Automated risk assessment — conclusions derived from diff evidence only.

Sent by Cursor Automation: Assign PR reviewers

vercel Bot deployed to Preview July 2, 2026 09:19

cursor Bot approved these changes Jul 2, 2026

madara88645 merged commit 5c4f21a into main Jul 2, 2026
12 checks passed

madara88645 added a commit that referenced this pull request Jul 2, 2026

Merge main (post #913 live-debug fix, #915 readme) into feat/explorat…

c68461b

…ion-modes

madara88645 merged 1 commit into main from claude/reverent-curran-1872c3
