Extraction-quality tunings: scroll cap 12, scroll-to-top, BLOCKING_OVERLAY scope, Sonnet extractor by mercurialsolo · Pull Request #595 · mercurialsolo/mantis

mercurialsolo · 2026-05-22T21:19:04Z

Summary

Five focused follow-up tunings to PR #591, surfaced by running the boattrader plan against the live site post-merge and observing where extraction was still failing.

Scroll budget cap 8 → 12 (gym/step_handlers/holo3.py) — Holo3 picks ~330 px/action despite the prompt asking for amount=10+, so cap=8 covered only ~2600px of the 4150px detail pages. cap=12 covers ~3960px.
Scroll-to-top before extract (gym/step_recovery.py) — when scroll plateaus at the page bottom (3 cycles of scrollBy with no movement), recovery now runs window.scrollTo(0, 0) before advancing, so the screenshot for the next step (typically extract_data) shows the listing content rather than the footer.
BLOCKING_OVERLAY scoped to extract primitives only (context_modules.py) — was matching step_section == "extraction", which included scroll-in-extraction. Result: scroll steps got BOTH KEYBOARD_SCROLL AND BLOCKING_OVERLAY prompt sections, doubling deliberation overhead and exhausting cap=8. Tightened to step_type in (extract_data, extract_url) only — KEYBOARD_SCROLL already prohibits image clicks (the carousel trigger).
KEYBOARD_SCROLL rewrite — earlier "keyboard-only" mandate forced Page_Down × 6 to cover tall pages and exhausted the budget. New section prefers scroll(amount=10), keeps Page_Down/End as fallback, adds an explicit "KNOW WHEN TO STOP" block teaching the brain to emit done(success=true) when it sees the footer or scrollY plateaus. Stripped plan-specific references (boattrader's "Description / More Details" / "Next Boat") in favor of generic site-agnostic guidance.
Extractor model: Opus 4.7 → Sonnet 4.6 (extraction/extractor.py) — run 20260522_204618_706eca3d returned 0 leads despite year/make/model/price all visible in the detail-page screenshot per Chrome MCP inspection. Opus extractor was returning empty; Sonnet is the right level for structured-field tool_use extraction. Verifier escalation path (haiku → opus on disagreement) unchanged.
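
The coverage figures in the first item follow from simple arithmetic. A minimal sketch, assuming the ~330 px/action and 4150px page height quoted above; the constant and function names are illustrative and are not taken from gym/step_handlers/holo3.py:

```python
import math

# Figures quoted in this PR description; assumed here, not read from the codebase.
PX_PER_SCROLL_ACTION = 330  # observed average distance per Holo3 scroll action
PAGE_HEIGHT_PX = 4150       # height of a typical boattrader detail page


def coverage_px(cap: int, px_per_action: int = PX_PER_SCROLL_ACTION) -> int:
    """Pixels covered if every action in the budget scrolls px_per_action."""
    return cap * px_per_action


def actions_needed(page_height: int, px_per_action: int = PX_PER_SCROLL_ACTION) -> int:
    """Smallest number of scroll actions whose coverage reaches page_height."""
    return math.ceil(page_height / px_per_action)


print(coverage_px(8))                  # 2640
print(coverage_px(12))                 # 3960
print(actions_needed(PAGE_HEIGHT_PX))  # 13
```

This ignores the viewport height, which the PR does not state. The last viewport of content is already visible without scrolling, so the real scroll distance needed is somewhat less than the full page height.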

Debugging arc that surfaced these

| Run | What changed | Outcome | Lesson |
|---|---|---|---|
| post-PR-591 baseline | (PR #591 + plan rewrite, default caps) | 9 steps / 0 leads / time_cap | scroll cap=8 too tight |
| `…_201848` | cap=8 + analyzer halt-bias fix | scroll plateaus at scrollY=2640 | analyzer now returns `add_hint` not `halt`, but brain still exhausts at bottom |
| `…_204618` | cap=12 + "know when to stop" + scroll-to-top | extract runs from top, still fails | rules out budget/viewport-position causes; the extractor itself is returning empty |
| `…_211720` (this PR) | + Sonnet extractor | TBD | direct test of whether Opus tool_use was the bottleneck |
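
The plateau seen in the `…_201848` run is the condition that triggers the recovery path: per the summary, three scrollBy cycles with no movement. A hedged sketch of that check, assuming scrollY is sampled before the first attempt and after each attempt. The function name, parameters, and sampling scheme are hypothetical and are not the code in gym/step_recovery.py:

```python
from typing import Sequence


def scroll_has_plateaued(scroll_y_samples: Sequence[int], cycles: int = 3) -> bool:
    """Return True when the last `cycles` scroll attempts left scrollY unchanged.

    scroll_y_samples holds scrollY before the first attempt and after each
    attempt, so `cycles` attempts need `cycles + 1` samples.
    """
    if len(scroll_y_samples) < cycles + 1:
        return False  # not enough attempts yet to call it a plateau
    tail = scroll_y_samples[-(cycles + 1):]
    return all(y == tail[0] for y in tail)


moving = [0, 330, 660, 990]
stuck = [1980, 2310, 2640, 2640, 2640, 2640]
print(scroll_has_plateaued(moving))  # False
print(scroll_has_plateaued(stuck))   # True
```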

Outstanding (not in this PR)

Even with all of the above, the boattrader live-site test produces 0 leads. Chrome MCP inspection of an abandoned detail page (2015-pioneer-197-sportfish-10079002) shows ALL extraction targets ARE on the page: year=2015, make=Pioneer, model=197 Sportfish, price=$25,000, seller=Private Seller. The Opus extractor was being called (17 calls in the cost breakdown) but returning empty. This PR tests Sonnet as the fix; if the Sonnet run still returns 0 leads, the next investigation is the extractor's prompt construction without a schema (the boattrader plan doesn't pass extraction_schema or _objective, so it uses the legacy hardcoded prompt — that prompt may be the actual bottleneck).
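
If the legacy hardcoded prompt turns out to be the bottleneck, one option is for the plan to pass an explicit schema. The sketch below only illustrates what a tool definition for the five fields named above could look like, using the name / description / input_schema shape that Anthropic tool definitions take. The tool name `record_lead`, the helper names, and the field types are assumptions; the actual `extraction_schema` format consumed by extraction/extractor.py is not shown in this PR.

```python
from typing import Any, Dict, List

# Hypothetical field list, based on the targets seen on the abandoned detail page.
LEAD_FIELDS: Dict[str, Dict[str, str]] = {
    "year": {"type": "integer", "description": "Model year of the listing"},
    "make": {"type": "string", "description": "Manufacturer"},
    "model": {"type": "string", "description": "Model name"},
    "price": {"type": "string", "description": "Asking price as displayed"},
    "seller": {"type": "string", "description": "Seller name or seller type"},
}


def build_extraction_tool(fields: Dict[str, Dict[str, str]]) -> Dict[str, Any]:
    """Build a tool definition whose input_schema requires every field."""
    return {
        "name": "record_lead",  # assumed name, for illustration only
        "description": "Record the listing fields visible in the screenshot.",
        "input_schema": {
            "type": "object",
            "properties": dict(fields),
            "required": list(fields),
        },
    }


def missing_fields(lead: Dict[str, Any], tool: Dict[str, Any]) -> List[str]:
    """List required fields that are absent or empty in an extracted lead."""
    required = tool["input_schema"]["required"]
    return [name for name in required if lead.get(name) in (None, "")]


tool = build_extraction_tool(LEAD_FIELDS)
lead = {"year": 2015, "make": "Pioneer", "model": "197 Sportfish",
        "price": "$25,000", "seller": "Private Seller"}
print(missing_fields(lead, tool))  # []
```

A check like `missing_fields` would also make an "empty" extractor response visible per field, rather than only as a 0-lead run.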

Test plan

tests/test_context_modules.py — predicate tests updated for the new BLOCKING_OVERLAY scope; they pin that scroll-in-extraction gets KEYBOARD_SCROLL only and that extract_data DOES get the overlay-dismiss handler hint.
tests/test_extractor_tool_use_migration.py — 58 tests pass with the Sonnet default; no test was pinned to the Opus model string.
tests/test_brain_budget_clamp.py — uses literal cap values in test inputs (independent of the constant), so bumping scroll 8→12 doesn't affect them.
tests/test_step_recovery.py / test_step_recovery_policy.py — scroll-to-top change is best-effort (try/except), recovery contract unchanged.
105 tests pass across the affected suites.
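
For reference, the behavior the predicate tests pin can be sketched as follows. This is a simplified illustration, not the code in context_modules.py: the function names are hypothetical, and the `"scroll"` step_type value is an assumption.

```python
EXTRACT_STEP_TYPES = frozenset({"extract_data", "extract_url"})


def blocking_overlay_applies_old(step_type: str, step_section: str) -> bool:
    """Previous scope: any step in the extraction section, or an extract primitive."""
    return step_section == "extraction" or step_type in EXTRACT_STEP_TYPES


def blocking_overlay_applies(step_type: str, step_section: str) -> bool:
    """Tightened scope: extract primitives only; step_section is ignored."""
    return step_type in EXTRACT_STEP_TYPES


# A scroll step inside the extraction section no longer gets BLOCKING_OVERLAY.
print(blocking_overlay_applies_old("scroll", "extraction"))    # True
print(blocking_overlay_applies("scroll", "extraction"))        # False
# Extract primitives still get it.
print(blocking_overlay_applies("extract_data", "extraction"))  # True
```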

🤖 Generated with Claude Code

…ING_OVERLAY scope, Sonnet extractor

Follow-up tunings discovered during real-world boattrader debugging against the PR #591 build:

1) Scroll budget cap 8 → 12 (gym/step_handlers/holo3.py)
Observed on 4150-pixel boattrader detail pages: cap=8 ≈ ~2600px coverage (~330px/action — Holo3 picks small ``amount`` values despite the prompt asking for amount=10+). cap=12 ≈ ~3960px which covers most detail pages in the budget. Run history captured in the constant's comment for future tuning.

2) Scroll-to-top before extract on plateau-advance (gym/step_recovery.py)
When scroll plateaus at the page bottom (3 cycles of CDP scrollBy with no movement → scroll_no_movement_advance), the recovery now runs ``window.scrollTo(0, 0)`` before advancing to the next step. The next step is typically extract_data; without the reset its screenshot was the page footer rather than the listing content. Best-effort (try/except) so a CDP failure can't break the runner.

3) BLOCKING_OVERLAY scoped to extract primitives only (context_modules.py)
Predicate was ``step_section == "extraction" OR step_type in (extract_data, extract_url)``. The first clause matched scroll-in-extraction, which then got BOTH KEYBOARD_SCROLL AND BLOCKING_OVERLAY prompt sections — double overhead paralyzed the brain's scroll-step deliberation, exhausting cap=8 before page traversal. Tightened to ``step_type in (extract_data, extract_url)`` only — KEYBOARD_SCROLL already prohibits image clicks (the carousel trigger), which is the actual concern on scroll steps.

4) KEYBOARD_SCROLL prompt rewrite (context_modules.py)
Earlier version mandated keyboard-only scrolling, which forced Page_Down × 6 to cover tall pages — exhausted the per-step budget. New prompt: prefer ``scroll(direction="down", amount=10)`` for bulk advancement, Page_Down/End as fallback for stuck pages, and an explicit "KNOW WHEN TO STOP" block teaching the brain to emit ``done(success=true)`` when it sees the footer or scrollY plateaus. Stripped plan-specific references ("Description and More Details", "Next Boat") in favor of generic site-agnostic guidance.

5) Extractor model: Opus 4.7 → Sonnet 4.6 (extraction/extractor.py)
Run 20260522_204618_706eca3d returned 0 leads despite year/make/model/price all visible in the detail-page screenshot. The extractor IS being called (17 extract calls in the cost summary) but returns no usable data. Switched to Sonnet 4.6 — right level for structured-field tool_use extraction (fast, cheap, proven on similar shapes). Verifier escalation path (haiku → opus on disagreement) is unchanged.

Test coverage
- tests/test_context_modules.py — predicate tests updated for the tightened BLOCKING_OVERLAY scope; new tests pin that scroll-in-extraction gets KEYBOARD_SCROLL only (no overlay-dismiss section), and that extract_data DOES get the overlay-dismiss handler hint.
- tests/test_extractor_tool_use_migration.py — 58 tests pass with the Sonnet default; no test was pinned to the Opus model string.
- tests/test_brain_budget_clamp.py — uses literal cap values in test inputs (independent of the DEFAULT_BRAIN_BUDGET_CAPS constant), so bumping scroll 8→12 doesn't affect them.

105 tests pass across the affected suites.

Boattrader still produces 0 leads on the live site — the page content IS extractable (year=2015, make=Pioneer, model=197 Sportfish, price=$25,000 all visible per Chrome MCP inspection of the run's abandoned detail page) but the Opus extractor was returning empty. Sonnet verification is the next test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
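
The best-effort scroll reset described in the commit message can be sketched as below. This is a minimal illustration under stated assumptions: `evaluate_js` stands in for whatever CDP evaluate call the runner exposes, and the function and logger names are hypothetical, not the ones in gym/step_recovery.py.

```python
import logging
from typing import Callable

logger = logging.getLogger("step_recovery_sketch")


def scroll_to_top_best_effort(evaluate_js: Callable[[str], object]) -> bool:
    """Reset the page to the top; never raise, so recovery can still advance.

    evaluate_js is an assumed callable that runs a JavaScript expression in
    the page (for example via CDP). Returns True if the reset was issued.
    """
    try:
        evaluate_js("window.scrollTo(0, 0)")
        return True
    except Exception as exc:  # deliberate catch-all: the reset is optional
        logger.warning("scroll-to-top failed, advancing anyway: %s", exc)
        return False
```

Returning a flag rather than raising keeps the recovery contract unchanged: the caller advances to the next step either way and can log whether the reset happened.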

mercurialsolo merged commit d9f487c into main May 22, 2026
4 checks passed

mercurialsolo merged 1 commit into main from fix/post-pr591-extraction-tunings
