Remove DOM-validation chain (PR #586), add ContextModule prompt registry + recovery fixes by mercurialsolo · Pull Request #591 · mercurialsolo/mantis

mercurialsolo · 2026-05-22T19:15:10Z

Summary

Revert PR click handler: DOM-aware grounding — query element at click coords, reject non-card targets #586 DOM-validation chain — cdp_element_matches_selector + listing_card_css_selector + PRE-CLICK REJECT + OFF_CARD recovery. Per feedback_cua_no_dom_access.md this is the "querySelectorAll-to-derive-target" pattern the architecture bans. Banner-ad misclick recovery now flows through the existing CUA-pure path (expect_url_contains + _maybe_demote_wrong_target).
Add ContextModule registry (src/mantis_agent/context_modules.py) — modular per-step prompt sections + handler hints, composed at brain dispatch time via a context-var. Two starter modules: BLOCKING_OVERLAY (extraction-phase carousel/modal/paywall dismiss) and KEYBOARD_SCROLL (scroll-step image-misclick defense). Future patterns add a single ContextModule entry, not global prompt edits.
Recovery-plumbing fixes uncovered through 12+ hours of run debugging — failure-history bug (_record_failure_for_retry early-returned on missing target), critic-frontier reading the wrong step_index key, binary fire-once guard replaced with growth-gated re-fires.
Extraction-quality fixes — VIABLE | prefix bug in WorkflowRunner._extract_data Priority-1 (silent 0-lead reports), scroll brain budget cap 3→8 (was producing brain_loop_exhausted on 4000+ pixel pages), recovery analyzer prompt teaches "scroll exhausted mid-page" → add_hint not halt.
Cost-meter fixes — sync_gpu_seconds_from_time_meter fires unconditionally (per-step deltas absorb between-step think growth); back-allocate finalize residual to last step so per-step COST column sums to header total.

Why the DOM-validation chain was removed

cdp_element_matches_selector(x, y, css_selector) queried the DOM at a vision-picked coordinate, walked ancestors against a CSS selector, and gated whether to dispatch the click. That's using DOM to derive the action decision, not to verify a completed action. The memory note feedback_cua_no_dom_access.md explicitly bans this; PR #586 had drifted past it.

A 12-hour debugging arc shipped 8 progressive runner-side fixes plumbing information around this gate's failure mode. Live DOM inspection via Chrome MCP showed the selector matched 0 elements on the current boattrader page (BoatTrader had renamed the attribute the gate looked for). Updating the selector would have worked, but the existence of the per-site selector was itself the architectural bug. This PR removes the gate and keeps only the genuinely useful recovery-plumbing fixes from that arc.

Architectural rule for future modules

ContextModule.applies_when predicates are the analysis hook. Today's predicates are rule-based (step_type == "scroll"); the signature Callable[[dict], bool] accepts any analysis — a vision module that runs Claude against the recent screenshot is one entry in MODULES. Adding a new behavior for a newly observed failure pattern means adding ONE module entry, not editing global prompts.

Pre-existing always-on guidance (FORM FILLING / NAVIGATION) stays in the base holo3_system.txt prompt — the registry's job is augmentation, not replacement. Pinned in test_registry_does_not_modulerize_base_prompt_defaults.

Test plan

tests/test_context_modules.py (23 new tests) — context-var semantics, predicate firing matrix, prompt composition, handler hints, defensive predicate-error handling
tests/test_workflow_runner_extract_viable_prefix.py (4 new tests) — VIABLE prefix added in Priority-1, no double-prefix, holds across URL backfill, Priority-2 synthetic path unchanged
tests/test_execution_critic.py — growth-gated re-fire test + step_result.step_index keying test
tests/test_agentic_retry_feedback.py — bare-record-on-missing-target + stamp-guard
tests/test_cost_meter.py — sync even when steps_used == 0
tests/test_observability_augur_518.py — back-allocation happy path + noise floor + zero-emitted-steps
tests/test_step_recovery_policy.py — OFF_CARD cases removed (feature deleted)
tests/test_dom_aware_click_validation_586.py — file deleted
387 tests pass across all affected suites

What this PR does NOT include

The plan file plans/boattrader_scrape_urlnav was rewritten locally (phone-optional viability + carousel hint in step 4 prose) but plans/ is not under version control — plan changes should land separately when the team chooses how to version plans.
scripts/run_boattrader_urlnav_with_proxy.py has pre-session edits per the standing rule against adhoc submit scripts in the repo; not extended in this PR.
A follow-up should teach the PlanDecomposer to preserve "DO NOT X" prohibitions from plan prose into MicroIntent.intent or hints.prohibitions — currently they're dropped during decomposition.

🤖 Generated with Claude Code

…y, fix recovery + VIABLE-prefix gaps This pulls in two distinct lines of work that landed together through a long debugging session against the boattrader by-owner scrape: 1) Remove the PR #586 DOM-validation chain (CUA violation per feedback_cua_no_dom_access.md) - Delete env.cdp_element_matches_selector (xdotool_env.py): DOM-derived decision about whether to dispatch a vision-picked click is exactly the "querySelectorAll-to-derive-target" pattern the architecture bans. window.scrollY-style post-action verification stays; element-at-point pre-action gating goes. - Delete SiteConfig.listing_card_css_selector field + the boattrader default. Per-site CSS selectors are themselves the architectural anti-pattern. - Delete the PRE-CLICK REJECT block in step_handlers/click.py and the scan-phase pre-filter / wipeout-driven OFF_CARD halt I had layered on top to work around it. - Delete the OFF_CARD branch in step_recovery.py and the _off_card_reject_count reset in run_executor._handle_success. - Delete tests/test_dom_aware_click_validation_586.py and the matching OFF_CARD cases in tests/test_step_recovery_policy.py. Failure mode this leaves behind (banner-ad / wrong-target misclick) is handled by the existing CUA-pure path: expect_url_contains / expect_url_excludes hints + _maybe_demote_wrong_target. 2) Keep the genuine recovery + extraction fixes uncovered along the way - Failure-history plumbing (run_executor.py): * _record_failure_for_retry no longer early-returns when _last_submit_target is absent — records bare-coord entries so the critic-frontier's failure_count actually advances. The prior guard made wrong_target loops invisible to escalation and produced 30-minute time_cap halts with zero leads. * Centralized fallback call in _handle_failure for any failure_class in REWRITE_TRIGGERING_CLASSES, with a step_result stamp guard so demote sites that already recorded don't double-count. - Critic-frontier (critic.py): * Reads _step_failure_history under step_result.step_index, not state.step_index — the recovery policy can advance state.step_index past the failed step before _consult_critic runs, so the prior key-by-current-state lookup missed forever. * Growth-gated re-fires replace the binary "fire-once" guard. After firing at failure_count=N, the next fire requires failure_count >= N + threshold. Bounded by the existing per-step / per-run recovery budget caps so total Claude spend stays the same. * Instrumentation: budget consumption logged at WARNING when each site bumps _recovery_attempts_per_step. - Cost-meter (cost_meter.py): * sync_gpu_seconds_from_time_meter fires unconditionally inside record_step (not gated on steps_used > 0) so steps with no brain inference still absorb between-step think growth into their per-step delta. - Cost back-allocation (run_executor.py): * _back_allocate_residual_to_last_step patches the final step's total_usd so the per-step COST column sums to the run total. Catches GPU/think growth captured by the terminal sync after the last per-step delta was computed. - VIABLE-prefix fix (workflow_runner.py): * Priority-1 branch in _extract_data now prepends "VIABLE | " when missing. ListingDedup.successful_lead_data filters on the prefix — without this fix, every lead the brain extracted in the happy-path "Year: 2024 | Make: X | ..." shape got silently dropped from the count, producing 0-leads reports despite real extraction. - Scroll budget cap (step_handlers/holo3.py): * DEFAULT_BRAIN_BUDGET_CAPS["scroll"] bumped 3 -> 8. 3 actions cover ~1800px; modern detail pages are 4000-6000px, so the cap was producing brain_loop_exhausted on plans the decomposer thought it had budget for. 3) ContextModule registry + recovery-analysis prompt sections - New module src/mantis_agent/context_modules.py: * ContextModule dataclass (applies_when predicate + prompt_section + step_hint + handler_hints) + push_step_context context-var so handlers publish dispatch context without signature plumbing through GymRunner / brain layers. * MODULES registry with two starter modules — BLOCKING_OVERLAY (scoped to extraction-phase steps; carousel / cookie / paywall / newsletter dismiss protocol) and KEYBOARD_SCROLL (scoped to scroll steps; large-amount scroll() preferred, no image clicks, no within-page-nav link clicks). * compose_system_prompt + applicable_step_hints + applicable_handler_hints — three composition helpers covering all three contribution channels. - brain_holo3._build_messages composes the system prompt per dispatch via compose_system_prompt(SYSTEM_PROMPT). - gym/step_handlers/holo3.py wraps gym_runner.run in push_step_context({step_type, step_section, step_intent}). - prompts/files/recovery_analysis.txt adds two highest-priority diagnoses ahead of the existing decode table: * SCROLL STEP EXHAUSTED MID-PAGE — when scroll_y advanced but the page is still partially traversed, return add_hint("continue scrolling with larger amount=15") rather than halt. Earlier the analyzer's halt-bias killed scroll steps that just needed another 2-3 iterations. * BLOCKING OVERLAY DETECTED — fullscreen carousel / cookie banner / paywall / signup modal -> add_hint with explicit dismiss-then-retry text. Site-agnostic; works through the existing recovery_hints channel. Test coverage - tests/test_context_modules.py (NEW, 23 tests) — context-var semantics (push/pop/nesting/copy), predicate firing matrix per starter module, prompt composition (sections include/exclude per context), handler hints surface, defensive predicate-error handling, registry sanity. - tests/test_workflow_runner_extract_viable_prefix.py (NEW, 4 tests) — pins the VIABLE prefix is added in Priority-1, left intact when already present, holds across URL backfill, and that Priority-2's existing prefix injection still works. - tests/test_execution_critic.py — added growth-gated re-fire test and step_result.step_index keying test; updated existing fire-once test to the new growth-gated semantic. - tests/test_agentic_retry_feedback.py — updated to reflect that _record_failure_for_retry now always appends (bare record when no target), plus stamp-guard test. - tests/test_cost_meter.py — added "sync even when steps_used == 0" test. - tests/test_observability_augur_518.py — added back-allocation happy path + noise-floor + zero-emitted-steps cases. - tests/test_dom_aware_click_validation_586.py — deleted entirely (the feature it tested is gone). 387 tests pass across the affected suites. Memory note added at ~/.claude/projects/-Users-barada-Sandbox-Mason-mantis/memory/project_pr586_cua_violation_removed_2026_05_22.md so the DOM-gate pattern doesn't get reintroduced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mercurialsolo merged commit f6bc6dc into main May 22, 2026
4 checks passed

This was referenced May 22, 2026

Extraction-quality tunings: scroll cap 12, scroll-to-top, BLOCKING_OVERLAY scope, Sonnet extractor #595

Merged

find_all_listings: pass already-extracted titles into scan prompt (pre-filter at source, not post-hoc) #597

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove DOM-validation chain (PR #586), add ContextModule prompt registry + recovery fixes#591

Remove DOM-validation chain (PR #586), add ContextModule prompt registry + recovery fixes#591
mercurialsolo merged 1 commit into
mainfrom
fix/cua-pure-recovery-and-context-modules

mercurialsolo commented May 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mercurialsolo commented May 22, 2026

Summary

Why the DOM-validation chain was removed

Architectural rule for future modules

Test plan

What this PR does NOT include

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant