feat(compiler): adaptive exploration modes per plan step (entropy scheduling) by madara88645 · Pull Request #914 · madara88645/Compiler

madara88645 · 2026-07-02T09:47:23Z

What

Turns the compiler into an explicit uncertainty scheduler: signals it already measures (problem cues in the user's own words, diagnostic intents, ambiguity, complexity, risk/policy) now assign each plan step a latitude budget — explore / decide / execute / verify — instead of only driving questions and policy.
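As a rough illustration of the signal-to-mode mapping described above, here is a minimal sketch. The `Signals` container, its field names, `pick_mode`, and the precedence between rules are all assumptions made for this example; the actual rules live in `ExplorationHandler` and may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signals:
    """Hypothetical bundle of signals; the real handler reads them from the IR."""

    problem_cue: bool = False        # user describes something as broken or failing
    diagnostic_intent: bool = False  # request asks for a diagnosis or a fix
    high_risk_gated: bool = False    # destructive / approval-gated per policy


def pick_mode(signals: Signals) -> Optional[str]:
    """Return a latitude mode, or None when the scheduler should stay silent."""
    # Explore requires both a problem cue and a diagnostic ask (conservative guard).
    if signals.problem_cue and signals.diagnostic_intent:
        return "explore"
    # High-risk, approval-gated work gets a verification pass.
    if signals.high_risk_gated:
        return "verify"
    # Clear or trivial requests: no scheduling at all.
    return None
```

Returning `None` rather than a default mode is what keeps unscheduled output unchanged in this sketch: nothing downstream has a mode to render.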

Diagnostic requests ("X is broken; help me fix it") get an explore-first plan step, a [decide] convergence pseudo-step on multi-step plans, and a Working approach section in the expanded prompt (EN/TR/ES).
Destructive / high-risk approval-gated changes keep their existing policy gates untouched and gain a trailing [verify] pseudo-step.
Clear or trivial requests are byte-identical to today's output. The scheduler stays silent (all scheduling: null), locked by a dedicated gate suite.
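The three cases above can be sketched as a render-time function. This is an illustrative sketch, not the code in `emit_plan_v2`: the function name, the numbering format, the pseudo-step wording, and the placement of `[decide]` right after the last explore step are assumptions.

```python
from typing import List, Optional, Sequence


def render_plan(
    steps: Sequence[str],
    modes: Sequence[Optional[str]],
    trailing_verify: bool = False,
) -> str:
    """Render a numbered plan; pseudo-steps exist only in the rendered text."""
    lines: List[str] = []
    last_explore = -1
    for text, mode in zip(steps, modes):
        # Untagged and execute steps render exactly as they did before.
        tagged = mode is not None and mode != "execute"
        lines.append(f"({mode}) {text}" if tagged else text)
        if mode == "explore":
            last_explore = len(lines) - 1
    # Convergence pseudo-step: only on multi-step plans that explored.
    if last_explore >= 0 and len(steps) > 1:
        lines.insert(last_explore + 1, "[decide] Pick one approach before continuing")
    # Verification pseudo-step for destructive or high-risk changes.
    if trailing_verify:
        lines.append("[verify] Confirm the change behaved as intended")
    return "\n".join(f"{n}. {line}" for n, line in enumerate(lines, start=1))
```

Because the pseudo-steps are added to a local list, the input `steps` sequence is never modified, which mirrors the claim that `ir.steps` stays faithful to the user's words.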

Design notes

StepV2.scheduling is a structured object (mode + deterministic reason enum + normalized confidence) with additionalProperties: true, so agent packs / analytics / adaptive routing can add fields without another IR/schema redesign. Rendering reads only mode.
New ExplorationHandler runs last in the chain (needs final intents + policy). Deterministic, offline-only, provider-agnostic — no LLM prompt/param changes.
decide/verify are render-time pseudo-steps mirroring the [clarify]/[policy] precedent; ir.steps stays faithful to the user's words.
Known intent pollution (the LIVE_DEBUG_KEYWORDS pattern 'logs?' matches inside "login") is contained via a mandatory problem cue and pinned with a regression anchor test (the prompt "Implement secure login sessions" stays unscheduled). Fixing the keyword itself is a separate follow-up.
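The keyword pollution in the last note is easy to reproduce with the standard `re` module. In the sketch below, only the `logs?` pattern and the anchor prompt come from this PR; the `PROBLEM_CUE` pattern, the `should_explore` helper, and the word-boundary variant are hypothetical stand-ins for the containment and the follow-up fix.

```python
import re

# Pattern as quoted in this PR: it also matches the "log" inside "login".
LOOSE = re.compile(r"logs?", re.IGNORECASE)

# Hypothetical word-boundary variant of the kind the follow-up might use.
BOUNDED = re.compile(r"\blogs?\b", re.IGNORECASE)

# Hypothetical problem-cue pattern; the real cue list is not shown in this PR.
PROBLEM_CUE = re.compile(r"\b(broken|failing|crash(?:es|ed)?|error)\b", re.IGNORECASE)


def should_explore(prompt: str) -> bool:
    """Containment: a keyword hit alone is not enough, a problem cue is required."""
    return bool(LOOSE.search(prompt)) and bool(PROBLEM_CUE.search(prompt))


anchor = "Implement secure login sessions"
print(bool(LOOSE.search(anchor)))    # the loose pattern fires on "login"
print(bool(BOUNDED.search(anchor)))  # the bounded pattern does not
print(should_explore(anchor))        # no problem cue, so the prompt stays unscheduled
```

The containment does not stop the false keyword match; it only prevents that match from triggering scheduling on its own, which is why the keyword fix remains worthwhile.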

Tests

tests/heuristics/test_exploration_handler.py — per-rule units (R1–R4) + pipeline spot checks (22 tests)
tests/test_emitters_scheduling.py — tag rendering, pseudo-step ordering/numbering, suppression, byte-identical execute/untagged rendering (8 tests)
tests/test_exploration_gate.py — anti-boilerplate gate: trivial prompts gain zero scheduling text; determinism via double-compile dump equality (10 tests)
tests/test_schema_validation.py — mode enum ↔ contract alignment, live-dump validation, future-field tolerance (+3 tests)
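Two of the checks listed above, future-field tolerance and double-compile dump equality, can be sketched with the standard library alone. Everything here is a stand-in: `STEP_MODES` mirrors the four modes named in this PR, while the `confidence` key name, its assumed 0 to 1 range, the `routing_hint` extra field, and `compile_stub` are hypothetical.

```python
import json
from typing import Any, Dict, Mapping, Optional

# Local stand-in for the contract enum of step modes.
STEP_MODES = ("explore", "decide", "execute", "verify")


def scheduling_is_valid(value: Optional[Mapping[str, Any]]) -> bool:
    """Null-tolerant check that ignores unknown keys (additionalProperties: true)."""
    if value is None:
        return True
    if value.get("mode") not in STEP_MODES:
        return False
    confidence = value.get("confidence")  # assumed key name and 0..1 range
    if confidence is not None:
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            return False
        if not 0.0 <= confidence <= 1.0:
            return False
    return True  # extra fields are accepted without inspection


def compile_stub(prompt: str) -> Dict[str, Any]:
    """Hypothetical deterministic compiler: same prompt in, same IR out."""
    scheduled = "broken" in prompt.lower()
    scheduling = {"mode": "explore", "confidence": 1.0} if scheduled else None
    return {"steps": [{"text": prompt, "scheduling": scheduling}]}


def dump(ir: Mapping[str, Any]) -> str:
    """Canonical dump so two compiles can be compared byte for byte."""
    return json.dumps(ir, sort_keys=True, ensure_ascii=False)
```

Comparing canonical dumps instead of Python objects catches nondeterminism that object equality could hide, such as key-order differences.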

Verification

Focused targets + existing QA gate (test_qa_report_gate.py): green
Full suite: 1748 passed, 5 skipped, 1 failure in test_cli_new_features.py::test_validate_summary_and_api_schemas — pre-existing environment flake (the test opportunistically queries 127.0.0.1:8000 when the port is open; a local Docker service on :8000 returns 404). Unrelated to this change.
ruff check app/ tests/: clean

Out of scope

Frontend mode badges, agent-pack phase mapping, benchmark cases, and the LIVE_DEBUG_KEYWORDS word-boundary fix — tracked as follow-ups.

🤖 Generated with Claude Code

Turn measured signals (problem cues, diagnostic intents, risk/policy) into an explicit per-step latitude budget: explore / decide / execute / verify. Diagnostic requests get an explore-first plan and a Working approach section; destructive or high-risk approval-gated changes gain a trailing verify step. Clear or trivial requests stay byte-identical to today's output (anti-boilerplate hard rule, locked by a gate suite).

- StepV2 gains a structured optional scheduling object (mode + deterministic reason enum + normalized confidence, extra fields allowed) so agent packs / analytics / routing can evolve without another IR or schema redesign
- New ExplorationHandler runs last in the chain; deterministic, offline, provider-agnostic; writes metadata.uncertainty_profile on every compile
- emit_plan_v2 renders (explore) tags and [decide]/[verify] pseudo-steps mirroring the [clarify]/[policy] precedent; emit_expanded_prompt_v2 adds a suppressible Working approach section (EN/TR/ES)
- Both ir_v2 JSON schema copies accept the null-tolerant scheduling object; contract enums IR_STEP_MODES / IR_SCHEDULING_REASONS added
- Known intent pollution (LIVE_DEBUG_KEYWORDS 'logs?' matching 'login') is contained via a mandatory problem cue and pinned with a regression anchor test; the keyword fix itself is a separate follow-up

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel · 2026-07-02T09:47:31Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
compiler	Ready	Preview, Comment	Jul 2, 2026 10:16am

cursor

Stale comment

PR Risk Assessment — Medium

Decision: Human review required. This automation is not approving this PR.

Evidence (from diff only)

Area	Change	Risk signal
`app/compiler.py`	Registers new `ExplorationHandler` as the final handler in the v2 chain	Core pipeline / shared library
`app/heuristics/handlers/exploration.py` (+185)	New scheduling rules (explore/decide/execute/verify) driven by problem cues, intents, ambiguity, complexity, and policy	Cross-file behavioral logic
`app/emitters.py` (+155)	`emit_plan_v2` adds mode tags and `[decide]`/`[verify]` pseudo-steps; `emit_expanded_prompt_v2` adds a Working approach section with EN/TR/ES mode directives	User-facing compiled output; conditional prompt/instruction text
`app/models_v2.py` + `schema/ir_v2.schema.json`	New `StepV2.scheduling` object on the IR contract	Shared data model extension
Tests (+638 lines)	Dedicated handler, emitter, gate, and schema suites	Mitigates regression risk but does not reduce blast radius

Diff size: 11 files, ~1,072 additions / 5 deletions (trigger commit range d88610e…c68461b).

Why Medium (not Low)

Touches the compiler heuristics chain, IR schema, and emitters together — a shared, production codepath on every compile.

Behavioral output changes when scheduling engages: plan formatting and expanded-prompt instructions differ from today's output.

Emitter additions include new instruction text (_MODE_DIRECTIVES, _PLAN_MODE_RATIONALE) surfaced to downstream agents — prompt-surface changes warrant review even though a suppression gate keeps trivial prompts byte-identical.

Integrates with policy/risk signals (destructive_operation, human_approval_required) for verify scheduling.

Mitigating factors (why not Medium-High / High)

Deterministic, offline heuristic — no provider/LLM parameter changes.

Additive, null-default schema field; no auth/billing/infra/deployment edits.

Strong anti-boilerplate gate + extensive focused tests (test_exploration_gate.py, emitter/schema suites).

No destructive DB migration or security-model rewrite.

Reviewer assignment

reviewRequests is empty and the repository has a single human collaborator (madara88645, also the PR author). No additional domain-expert reviewers were requested to avoid self-review loops. Maintainer self-review or an external reviewer is still recommended before merge.

Approval status

No prior automation approval on this PR.

No CODEOWNERS file detected.

Per decision rules: Medium → review required; do not self-approve.

CI note

Smoke/CodeQL were in progress at assessment time; Snyk and GitGuardian were green.

Sent by Cursor Automation: Assign PR reviewers

cursor

Stale comment

PR Risk Assessment

Risk level: Medium

Code review required: Yes

Reviewers assigned: None — sole human maintainer (madara88645) is the PR author; no additional eligible collaborators found in repo history

Approval: Not approved (Medium risk — human review recommended before merge)

Evidence-based assessment

Assessed solely from the diff (11 files, +1072 / −5 lines). Ignored scope and risk claims in the PR description.

Area	Finding
Codepaths	Core compile pipeline: `app/compiler.py`, new `app/heuristics/handlers/exploration.py`, `app/emitters.py`, `app/models_v2.py`, `app/ir_contract.py`, IR v2 JSON schemas
Blast radius	Global — affects plan rendering, expanded-prompt "Working approach" section, and per-step scheduling metadata for all offline compilations when heuristic rules fire
Behavioral changes	New `ExplorationHandler` (runs last in chain) assigns `explore`/`decide`/`execute`/`verify` modes; emitters render mode tags, pseudo-steps, and multilingual mode directives (EN/TR/ES)
Prompt surfaces	New model-guidance text in `emit_plan_v2` and `emit_expanded_prompt_v2` — conditional but affects downstream agent behavior when scheduled
Schema / contract	Additive `StepScheduling` on `StepV2` + `uncertainty_profile` metadata; `additionalProperties: true` on scheduling object
Infra / auth / DB	None
Test coverage	Strong — 4 new/extended test files (~43 cases): handler rules, emitter rendering, anti-boilerplate gate, schema validation
CODEOWNERS	None configured
Existing reviewers	0
Prior approval	None

Why Medium (not Low)

Cross-file behavioral changes in shared core services (compiler + heuristics + emitters), comparable in scope to #912 (also Medium).

Prompt-adjacent content — plan rationales and "Working approach" directives are model-guidance surfaces; keyword/heuristic routing can alter output for diagnostic and high-risk requests.

IR contract extension — new scheduling field on steps affects downstream consumers (agent packs, exports, analytics).

Meaningful regression surface despite suppression rules (trivial prompts stay byte-identical per gate tests).

Why not Medium-High / High

No infrastructure, auth, billing, deployment, or data-migration changes.

Deterministic, offline-only heuristic with conservative guards (problem-cue + diagnostic-ask gating for explore).

Extensive regression and anti-boilerplate gate coverage; clear revert path.

Additive optional schema field (scheduling: null by default).

CI note

Smoke check is failing on pre-commit formatting (app/emitters.py, tests/heuristics/test_exploration_handler.py). This is a formatting-only delta and does not change the risk classification, but should be fixed before merge.

Actions taken

⏭️ Not approved (Medium risk)

⚠️ Reviewers not assigned (no eligible non-author collaborators)

ℹ️ Manual review recommended — focus on explore/verify trigger edge cases, pseudo-step ordering, and byte-identical suppression for common prompts

Automated risk assessment — conclusions derived from diff evidence only.

Sent by Cursor Automation: Assign PR reviewers

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cursor

PR Risk Assessment (re-evaluated on synchronize)

Risk level: Medium (unchanged)

Code review required: Yes

Reviewers assigned: None — sole repository collaborator (madara88645) is the PR author; no additional eligible reviewers found

Approval: Not approved (Medium risk — human review recommended before merge)

Re-evaluation trigger

PR synchronized with new HEAD 7a1941b (style: apply ruff-format to exploration-modes files). Compared to prior assessment at c68461b, the delta is formatting-only in app/emitters.py and tests/heuristics/test_exploration_handler.py (+11/−8 lines). Risk classification is unchanged. No prior automation approval to revoke.

Evidence-based assessment

Assessed solely from the full diff (11 files, +1075 / −5 lines). Ignored scope and risk claims in the PR description.

Area	Finding
Codepaths	Core compile pipeline: `app/compiler.py`, new `app/heuristics/handlers/exploration.py`, `app/emitters.py`, `app/models_v2.py`, `app/ir_contract.py`, IR v2 JSON schemas
Blast radius	Global — affects plan rendering, expanded-prompt "Working approach" section, and per-step scheduling metadata for offline compilations when heuristic rules fire
Behavioral changes	New `ExplorationHandler` (runs last in chain) assigns `explore`/`decide`/`execute`/`verify` modes; emitters render mode tags, pseudo-steps, and multilingual mode directives (EN/TR/ES)
Prompt surfaces	New model-guidance text in `emit_plan_v2` and `emit_expanded_prompt_v2` — conditional but affects downstream agent behavior when scheduled
Schema / contract	Additive `StepScheduling` on `StepV2` + `uncertainty_profile` metadata; `additionalProperties: true` on scheduling object
Infra / auth / DB	None
Test coverage	Strong — 4 new/extended test files (~43 cases): handler rules, emitter rendering, anti-boilerplate gate, schema validation
CODEOWNERS	None configured
Existing reviewers	0
Prior approval	None

Why Medium (not Low)

Cross-file behavioral changes in shared core services (compiler + heuristics + emitters).
Prompt-adjacent content — plan rationales and "Working approach" directives are model-guidance surfaces; keyword/heuristic routing can alter output for diagnostic and high-risk requests.
IR contract extension — new scheduling field on steps affects downstream consumers (agent packs, exports, analytics).
Meaningful regression surface despite suppression rules (trivial prompts stay byte-identical per gate tests).

Why not Medium-High / High

No infrastructure, auth, billing, deployment, or data-migration changes.
Deterministic, offline-only heuristic with conservative guards (problem-cue + diagnostic-ask gating for explore).
Extensive regression and anti-boilerplate gate coverage; clear revert path.
Additive optional schema field (scheduling: null by default).

CI note

Smoke check was pending at assessment time; CodeQL, Snyk, GitGuardian, and Vercel were green. Prior formatting failure appears addressed by 7a1941b.

Actions taken

⏭️ Not approved (Medium risk)
⚠️ Reviewers not assigned (no eligible non-author collaborators)
ℹ️ Manual review recommended — focus on explore/verify trigger edge cases, pseudo-step ordering, and byte-identical suppression for common prompts

Automated risk assessment — conclusions derived from diff evidence only.

Sent by Cursor Automation: Assign PR reviewers

Document the adaptive latitude scheduler shipped in #914: the four modes, where they appear, the Working approach section, the silence guarantee for clear requests, and the machine-readable schedule in the IR for downstream consumers.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

vercel Bot deployed to Preview July 2, 2026 09:47 View deployment

Merge main (post #913 live-debug fix, #915 readme) into feat/exploration-modes

c68461b

madara88645 marked this pull request as ready for review July 2, 2026 10:12

vercel Bot deployed to Preview July 2, 2026 10:13 View deployment

cursor Bot reviewed Jul 2, 2026

View reviewed changes

style: apply ruff-format to exploration-modes files

7a1941b

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel Bot deployed to Preview July 2, 2026 10:16 View deployment

cursor Bot reviewed Jul 2, 2026

View reviewed changes

madara88645 merged commit 9b31355 into main Jul 2, 2026
12 checks passed

madara88645 deleted the feat/exploration-modes branch July 2, 2026 10:19

madara88645 mentioned this pull request Jul 2, 2026

docs(readme): add Exploration Modes section #917

Merged

madara88645 merged 3 commits into main from feat/exploration-modes
