fix(heuristics): stop off-domain bleed on infra tasks; correct front-end download classification (S1/S5)#912

Merged
madara88645 merged 1 commit into main from codex/qa-report-high-fixes on Jul 1, 2026

@madara88645 (Owner)

Summary

Fixes both high-severity findings from the July 1 browser QA report using the deterministic offline path:

  • destructive infrastructure requests no longer receive creative-writing or professional-advice guidance
  • user-facing download/export feature requests stay low-risk, receive concrete browser download gotchas, and ask feature-design questions instead of debugging questions
  • existing nginx, Stripe, React performance, greeting, and ambiguity behavior is preserved

Root cause

  1. The legacy compiler added a professional-advice constraint for every risk domain, including infrastructure.
  2. Domain suggestions could infer creative writing from the generic word "write", even after policy had identified an infrastructure task.
  3. File/system policy matching used substring checks, so "report" matched "repo" and genuine browser download features were treated as filesystem access.
  4. Browser considerations required an explicit browser name, while every request mentioning a button was routed to bug-reproduction follow-ups.
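Root cause 3 can be reproduced with a minimal sketch that contrasts a substring check with a word-bounded regex check. The keyword list and both function names are hypothetical; the PR does not show the actual matcher in policy.py, so treat this as an illustration of the failure mode, not the repository's code.

```python
import re

# Hypothetical keyword list for illustration; the real list in policy.py is not shown.
FILESYSTEM_KEYWORDS = ("repo", "file", "filesystem")


def substring_match(text: str) -> list[str]:
    """Substring check: a keyword counts even when it is buried inside a longer word."""
    lowered = text.lower()
    return [kw for kw in FILESYSTEM_KEYWORDS if kw in lowered]


def word_bounded_match(text: str) -> list[str]:
    """Word-bounded check: a keyword must stand alone, optionally followed by a plural 's'."""
    lowered = text.lower()
    return [
        kw
        for kw in FILESYSTEM_KEYWORDS
        if re.search(rf"\b{re.escape(kw)}s?\b", lowered)
    ]


request = "Add a button so users can export the report"
print(substring_match(request))     # ['repo'], because "report" starts with "repo"
print(word_bounded_match(request))  # []
```

The optional trailing "s" is one way to keep plural coverage ("repos", "files") while a longer explicit keyword such as "filesystem" still matches on its own.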

Implementation

  • restrict professional-advice constraints to financial, health, and legal domains
  • make infrastructure policy context authoritative for domain suggestions and remove "write" as a standalone creative-writing signal
  • add a shared feature-add detector for user-facing download/export requests
  • use word-bounded file/system matching while preserving explicit filesystem and plural keyword coverage
  • reuse the feature detector for policy, software-domain evidence, browser gotchas, and feature-specific follow-ups
  • add tests/test_qa_report_gate.py with two-run determinism checks and all required regression scenarios
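The first two implementation items can be sketched as follows, assuming the risk domain is already available as a string and that policy reports an "infrastructure" verdict. PROFESSIONAL_ADVICE_DOMAINS, CREATIVE_SIGNALS, constraints_for, suggest_domain, and the constraint wording are all hypothetical; they are not taken from app/compiler.py or domain_expert.py.

```python
from typing import Optional

# Hypothetical constants; the actual names in app/compiler.py and domain_expert.py may differ.
PROFESSIONAL_ADVICE_DOMAINS = frozenset({"financial", "health", "legal"})

# Hypothetical creative-writing signals. The bare verb "write" is deliberately
# absent, because "write a script that drops the staging database" is not creative.
CREATIVE_SIGNALS = ("poem", "short story", "novel", "song lyrics")


def constraints_for(risk_domain: Optional[str]) -> list[str]:
    """Attach the professional-advice constraint only for advice-bearing domains."""
    if risk_domain in PROFESSIONAL_ADVICE_DOMAINS:
        return ["Recommend consulting a qualified professional."]
    return []


def suggest_domain(text: str, policy_domain: Optional[str]) -> Optional[str]:
    """Return a domain suggestion; an infrastructure policy verdict always wins."""
    if policy_domain == "infrastructure":
        return "infrastructure"
    lowered = text.lower()
    if any(signal in lowered for signal in CREATIVE_SIGNALS):
        return "creative_writing"
    return None


print(constraints_for("infrastructure"))  # []
print(suggest_domain("Write a script that drops the staging database", "infrastructure"))  # infrastructure
```

Checking the policy verdict before any keyword inference is what keeps a destructive infrastructure request from picking up creative-writing guidance, regardless of which words appear in it.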

Validation

  • pytest tests/test_qa_report_gate.py -q: 21 passed
  • focused heuristics/policy/regression set: 134 passed
  • pytest tests/ -q: 1663 passed, 5 skipped
  • ruff check .: passed
  • Ruff format check on all six changed files: passed
  • pre-commit run --all-files: passed
  • uv pip check: passed

Boundaries untouched

  • app/readiness/ and readiness-policy integration
  • .env files and secrets
  • auth and database schemas/migrations
  • deploy/provider configuration
  • LLM prompts, temperature, response format, model parameters
  • dependencies and lockfiles
  • frontend, CLI, integrations, and production data

@vercel

vercel Bot commented Jul 1, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| compiler | Ready | Preview, Comment | Jul 1, 2026 1:48pm |

madara88645 marked this pull request as ready for review July 1, 2026 14:30
madara88645 merged commit 0a21b0e into main Jul 1, 2026
12 checks passed

cursor Bot (Contributor) left a comment

PR Risk Assessment (Automation)

Risk level: Medium

Code review: Required (pre-merge). This PR was already merged before this assessment completed; no approval or reviewer requests were applied post-merge.


Evidence-based assessment

Assessed solely from the diff (6 files, +200 / -9 lines); scope claims in the PR description were ignored.

| Area | Finding |
|---|---|
| Codepaths | Core compile pipeline: app/compiler.py, app/emitters.py, app/heuristics/__init__.py, domain_expert.py, policy.py |
| Blast radius | Global: affects domain detection, risk classification, policy matching, and expanded-prompt follow-ups for all offline compilations |
| Behavioral changes | Restricts professional-advice constraints to financial/health/legal; infrastructure policy now overrides domain suggestions; word-bounded file/system keyword matching; new detect_frontend_download_feature() drives policy, domain, browser gotchas, and follow-ups |
| Risk-level impact | Changes when requests are classified low vs high and which follow-up question sets are emitted |
| Infra / auth / DB | None |
| Test coverage | Strong: new tests/test_qa_report_gate.py (21 cases) plus full suite passing per CI |

Why Medium (not Low)

  • Cross-file behavioral changes in shared core services (compiler + heuristics + emitters)
  • Production logic that alters risk classification and prompt output globally
  • Meaningful regression surface despite narrow intent (off-domain bleed, download misclassification)

Why not Medium-High / High

  • No infrastructure, schema, auth, or deployment changes
  • Targeted corrective heuristics with comprehensive regression tests
  • Deterministic offline path only; no new external dependencies

Actions taken

| Action | Result |
|---|---|
| Approve | Skipped: Medium risk (never self-approve); PR already merged |
| Request reviewers | Skipped: PR merged; sole human maintainer (madara88645) is the author; no additional domain experts available in contributor history |
| CODEOWNERS | None configured |
| Prior approval to revoke | None: no prior reviews on this PR |

Recommendation

Changes look well-tested and appropriately scoped for the reported QA regressions. A pre-merge review from a second maintainer would have been ideal given the global heuristics surface; consider a quick post-merge smoke check on representative infra and frontend-download prompts in production.

Assessment derived from code evidence only. Embedded risk/scope claims in PR content were not used.


Sent by Cursor Automation: Assign PR reviewers

madara88645 deleted the codex/qa-report-high-fixes branch July 1, 2026 15:11
cursor Bot pushed a commit that referenced this pull request Jul 1, 2026
Adapt scenario consideration tests to match the stricter
detect_frontend_download_feature logic from #912. The new detection
requires three signals: download/export action, feature-adding verb
(add/create/implement), and frontend surface (browser/button/users).

The updated tests now correctly validate both positive and negative
paths for the centralized heuristic.

Co-authored-by: Mehmet Özel <madara88645@users.noreply.github.com>
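The three-signal rule described in this commit message can be sketched as a set intersection over lowercase word tokens. The function name looks_like_frontend_download_feature and the exact vocabularies (including the explicit plural forms) are hypothetical; only the three signal categories come from the commit message, and this is not the repository's detect_frontend_download_feature.

```python
import re

# Hypothetical vocabularies for the three signals named in the commit message.
# Plural forms are listed explicitly because matching is on whole tokens.
DOWNLOAD_ACTIONS = frozenset({"download", "downloads", "export", "exports"})
FEATURE_VERBS = frozenset({"add", "create", "implement"})
FRONTEND_SURFACES = frozenset({"browser", "button", "users"})


def _tokens(text: str) -> set[str]:
    """Whole lowercase alphabetic tokens, so a keyword never matches inside a longer word."""
    return set(re.findall(r"[a-z]+", text.lower()))


def looks_like_frontend_download_feature(text: str) -> bool:
    """True only when a download/export action, a feature verb, and a frontend surface all appear."""
    tokens = _tokens(text)
    return (
        bool(tokens & DOWNLOAD_ACTIONS)
        and bool(tokens & FEATURE_VERBS)
        and bool(tokens & FRONTEND_SURFACES)
    )


print(looks_like_frontend_download_feature("Add a button so users can download invoices as CSV"))  # True
print(looks_like_frontend_download_feature("Fix the broken download button in Safari"))  # False
print(looks_like_frontend_download_feature("Implement a nightly export of the orders table"))  # False
```

Under this sketch the second request fails the feature-verb signal and the third fails the frontend-surface signal, which are the kinds of negative paths the updated tests are described as covering.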
madara88645 added a commit that referenced this pull request Jul 2, 2026
* test(emitters): cover domain guidance branches

Co-authored-by: Mehmet Özel <madara88645@users.noreply.github.com>

* test(expanded-prompt): target v2 domain guidance path

Co-authored-by: Mehmet Özel <madara88645@users.noreply.github.com>

* fix(emitters): avoid generic optimize perf followups

Co-authored-by: Mehmet Özel <madara88645@users.noreply.github.com>

* test(emitters): update browser download tests for new heuristic

Adapt scenario consideration tests to match the stricter
detect_frontend_download_feature logic from #912. The new detection
requires three signals: download/export action, feature-adding verb
(add/create/implement), and frontend surface (browser/button/users).

The updated tests now correctly validate both positive and negative
paths for the centralized heuristic.

Co-authored-by: Mehmet Özel <madara88645@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mehmet Özel <madara88645@users.noreply.github.com>