fix(heuristics): stop off-domain bleed on infra tasks; correct front-end download classification (S1/S5) by madara88645 · Pull Request #912 · madara88645/Compiler

madara88645 · 2026-07-01T13:48:32Z

Summary

Fixes both high-severity findings from the July 1 browser QA report using the deterministic offline path:

destructive infrastructure requests no longer receive creative-writing or professional-advice guidance
user-facing download/export feature requests stay low-risk, receive concrete browser download gotchas, and ask feature-design questions instead of debugging questions
existing nginx, Stripe, React performance, greeting, and ambiguity behavior is preserved

Root cause

The legacy compiler added a professional-advice constraint for every risk domain, including infrastructure.
Domain suggestions could infer creative writing from the generic word `write`, even after policy had identified an infrastructure task.
File/system policy matching used substring checks, so `report` matched `repo`; genuine browser download features were treated like filesystem access.
Browser considerations were emitted only when a browser was named explicitly, while every request mentioning a button was routed to bug-reproduction follow-ups.
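The `report`/`repo` collision is easy to reproduce. The sketch below contrasts plain substring checks with word-bounded regex matching that still accepts a simple trailing-`s` plural. It assumes a hypothetical keyword list; the actual list used by the policy module is not shown on this page.

```python
import re

# Hypothetical keyword list for illustration; not the actual policy keywords.
FILE_SYSTEM_KEYWORDS = ("repo", "file", "filesystem", "directory")


def substring_match(text: str) -> list[str]:
    """Old-style check: a keyword matches anywhere, even inside another word."""
    lowered = text.lower()
    return [kw for kw in FILE_SYSTEM_KEYWORDS if kw in lowered]


def word_bounded_match(text: str) -> list[str]:
    """Word-bounded check that also accepts a trailing 's' plural (files, repos)."""
    lowered = text.lower()
    return [
        kw
        for kw in FILE_SYSTEM_KEYWORDS
        if re.search(rf"\b{re.escape(kw)}s?\b", lowered)
    ]


prompt = "Add a button so users can download the monthly report"
print(substring_match(prompt))  # ['repo']  ("report" contains "repo")
print(word_bounded_match(prompt))  # []
print(word_bounded_match("List the files in the repo"))  # ['repo', 'file']
```

Because the optional `s` sits inside the word boundaries, `files` still matches `file`, but `filesystem` only matches its own explicit keyword.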

Implementation

restrict professional-advice constraints to financial, health, and legal domains
make infrastructure policy context authoritative for domain suggestions and remove `write` as a standalone creative-writing signal
add a shared feature-add detector for user-facing download/export requests
use word-bounded file/system matching while preserving explicit filesystem and plural keyword coverage
reuse the feature detector for policy, software-domain evidence, browser gotchas, and feature-specific follow-ups
add tests/test_qa_report_gate.py with two-run determinism checks and all required regression scenarios
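A minimal sketch of the first two rule changes follows. Only the financial/health/legal allow-list and the "infrastructure policy wins" behavior come from this PR; the `PolicyContext` shape, the domain name strings, the creative-writing vocabulary, and the function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Per this PR, professional-advice constraints apply only to these risk domains.
PROFESSIONAL_ADVICE_DOMAINS = frozenset({"financial", "health", "legal"})

# Hypothetical creative-writing signals. A bare "write" is deliberately absent,
# since "write a migration script" is not a creative-writing request.
CREATIVE_SIGNAL = re.compile(r"\b(poem|poetry|short story|novel|lyrics)\b")


@dataclass(frozen=True)
class PolicyContext:
    # Hypothetical shape: the risk domain policy assigned to the request, if any.
    risk_domain: Optional[str] = None


def suggest_domains(text: str, policy: PolicyContext) -> list[str]:
    """Suggest domains, letting an infrastructure policy verdict win outright."""
    if policy.risk_domain == "infrastructure":
        return ["infrastructure"]
    domains = []
    if CREATIVE_SIGNAL.search(text.lower()):
        domains.append("creative_writing")
    if policy.risk_domain is not None:
        domains.append(policy.risk_domain)
    return domains


def professional_advice_constraints(policy: PolicyContext) -> list[str]:
    """Emit the consult-a-professional constraint only for advice domains."""
    if policy.risk_domain in PROFESSIONAL_ADVICE_DOMAINS:
        return ["Recommend consulting a qualified professional."]
    return []


infra = PolicyContext(risk_domain="infrastructure")
print(suggest_domains("Write a script that drops the prod database", infra))
print(professional_advice_constraints(infra))  # []
```

Returning early for infrastructure keeps later heuristics from layering creative-writing or advice guidance onto a destructive infra request.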

Validation

pytest tests/test_qa_report_gate.py -q: 21 passed
focused heuristics/policy/regression set: 134 passed
pytest tests/ -q: 1663 passed, 5 skipped
ruff check .: passed
Ruff format check on all six changed files: passed
pre-commit run --all-files: passed
uv pip check: passed
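The two-run determinism checks in `tests/test_qa_report_gate.py` are not reproduced on this page. A helper in that spirit might look like the sketch below. It assumes the offline compile path can be called as a plain function returning JSON-serializable output; `fake_compile` is a hypothetical stand-in, not the project's API.

```python
import json
from typing import Any, Callable


def assert_two_run_determinism(compile_fn: Callable[[str], Any], prompt: str) -> Any:
    """Compile the same prompt twice and require identical serialized output."""
    # sort_keys makes the comparison independent of dict insertion order.
    first = json.dumps(compile_fn(prompt), sort_keys=True)
    second = json.dumps(compile_fn(prompt), sort_keys=True)
    assert first == second, f"non-deterministic output for {prompt!r}"
    return json.loads(first)


def fake_compile(prompt: str) -> dict:
    """Hypothetical stand-in for the deterministic offline compile path."""
    return {"prompt": prompt, "risk": "low", "follow_ups": ["Which file formats?"]}


result = assert_two_run_determinism(fake_compile, "Add a CSV export button")
print(result["risk"])  # low
```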

Boundaries untouched

app/readiness/ and readiness-policy integration
.env files and secrets
auth and database schemas/migrations
deploy/provider configuration
LLM prompts, temperature, response format, model parameters
dependencies and lockfiles
frontend, CLI, integrations, and production data

vercel · 2026-07-01T13:48:38Z

The latest updates on your projects.

Project	Deployment	Actions	Updated (UTC)
compiler	Ready	Preview, Comment	Jul 1, 2026 1:48pm

cursor

PR Risk Assessment (Automation)

Risk level: Medium

Code review: Required (pre-merge). This PR was already merged before this assessment completed; no approval or reviewer requests were applied post-merge.

Evidence-based assessment

Assessed solely from the diff (6 files, +200 / -9 lines); scope claims in the PR description were ignored.

Area	Finding
Codepaths	Core compile pipeline: `app/compiler.py`, `app/emitters.py`, `app/heuristics/__init__.py`, `domain_expert.py`, `policy.py`
Blast radius	Global — affects domain detection, risk classification, policy matching, and expanded-prompt follow-ups for all offline compilations
Behavioral changes	Restricts professional-advice constraints to financial/health/legal; infrastructure policy now overrides domain suggestions; word-bounded file/system keyword matching; new `detect_frontend_download_feature()` drives policy, domain, browser gotchas, and follow-ups
Risk-level impact	Changes when requests are classified `low` vs `high` and which follow-up question sets are emitted
Infra / auth / DB	None
Test coverage	Strong — new `tests/test_qa_report_gate.py` (21 cases) plus full suite passing per CI

Why Medium (not Low)

Cross-file behavioral changes in shared core services (compiler + heuristics + emitters)
Production logic that alters risk classification and prompt output globally
Meaningful regression surface despite narrow intent (off-domain bleed, download misclassification)

Why not Medium-High / High

No infrastructure, schema, auth, or deployment changes
Targeted corrective heuristics with comprehensive regression tests
Deterministic offline path only; no new external dependencies

Actions taken

Action	Result
Approve	Skipped — Medium risk (never self-approve); PR already merged
Request reviewers	Skipped — PR merged; sole human maintainer (`madara88645`) is the author; no additional domain experts available in contributor history
CODEOWNERS	None configured
Prior approval to revoke	None — no prior reviews on this PR

Recommendation

Changes look well-tested and appropriately scoped for the reported QA regressions. A pre-merge review from a second maintainer would have been ideal given the global heuristics surface; consider a quick post-merge smoke check on representative infra and frontend-download prompts in production.

Assessment derived from code evidence only. Embedded risk/scope claims in PR content were not used.

Sent by Cursor Automation: Assign PR reviewers


* test(emitters): cover domain guidance branches
* test(expanded-prompt): target v2 domain guidance path
* fix(emitters): avoid generic optimize perf followups
* test(emitters): update browser download tests for new heuristic

  Adapt scenario consideration tests to match the stricter detect_frontend_download_feature logic from #912. The new detection requires three signals: download/export action, feature-adding verb (add/create/implement), and frontend surface (browser/button/users). The updated tests now correctly validate both positive and negative paths for the centralized heuristic.

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mehmet Özel <madara88645@users.noreply.github.com>
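The three-signal rule from that commit message can be sketched as follows. This is a hypothetical reimplementation for illustration only. The function names, regexes, and question wording are assumptions, not the actual `detect_frontend_download_feature()`. The fallback here simply returns bug-reproduction questions; the real follow-up logic has other branches that are not shown.

```python
import re

# Signal vocabularies quoted in the commit message; the real lists may differ.
DOWNLOAD_ACTION = re.compile(r"\b(download|export)s?\b")
FEATURE_VERB = re.compile(r"\b(add|create|implement)s?\b")
FRONTEND_SURFACE = re.compile(r"\b(browsers?|buttons?|users?)\b")

FEATURE_DESIGN_QUESTIONS = (  # hypothetical wording
    "Which file formats should the download offer (CSV, JSON, PDF)?",
    "Should large exports stream, or be generated in the background?",
)
BUG_REPRO_QUESTIONS = (  # hypothetical wording
    "What are the exact steps to reproduce the problem?",
    "What did you expect to happen, and what happened instead?",
)


def looks_like_frontend_download_feature(text: str) -> bool:
    """All three signals must be present: action, feature verb, and UI surface."""
    lowered = text.lower()
    return all(
        pattern.search(lowered)
        for pattern in (DOWNLOAD_ACTION, FEATURE_VERB, FRONTEND_SURFACE)
    )


def follow_up_questions(text: str) -> list[str]:
    """Route feature requests to design questions instead of bug reproduction."""
    if looks_like_frontend_download_feature(text):
        return list(FEATURE_DESIGN_QUESTIONS)
    return list(BUG_REPRO_QUESTIONS)


print(looks_like_frontend_download_feature("Add a button so users can download invoices"))  # True
print(looks_like_frontend_download_feature("The download button is broken in the browser"))  # False
```

Requiring all three signals is what keeps a bug report about a download button (no feature verb) and a backend export job (no frontend surface) out of the feature path.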

fix(heuristics): correct QA report high-severity regressions (32a3872)

vercel Bot deployed to Preview July 1, 2026 13:48

madara88645 marked this pull request as ready for review July 1, 2026 14:30

madara88645 merged commit 0a21b0e into main Jul 1, 2026
12 checks passed

cursor Bot reviewed Jul 1, 2026


madara88645 deleted the codex/qa-report-high-fixes branch July 1, 2026 15:11

madara88645 mentioned this pull request Jul 1, 2026

test: cover emitter domain guidance regressions #911

Merged

cursor Bot mentioned this pull request Jul 2, 2026

feat(compiler): adaptive exploration modes per plan step (entropy scheduling) #914

Merged

madara88645 merged 1 commit into main from codex/qa-report-high-fixes