Extraction-quality tunings: scroll cap 12, scroll-to-top, BLOCKING_OVERLAY scope, Sonnet extractor #595
Merged
…ING_OVERLAY scope, Sonnet extractor

Follow-up tunings discovered during real-world boattrader debugging against the PR #591 build:

1) Scroll budget cap 8 → 12 (gym/step_handlers/holo3.py)
Observed on 4150-pixel boattrader detail pages: cap=8 ≈ 2600px coverage (~330px/action; Holo3 picks small ``amount`` values despite the prompt asking for amount=10+). cap=12 ≈ 3960px, which covers most detail pages in the budget. Run history is captured in the constant's comment for future tuning.

2) Scroll-to-top before extract on plateau-advance (gym/step_recovery.py)
When scroll plateaus at the page bottom (3 cycles of CDP scrollBy with no movement → scroll_no_movement_advance), the recovery now runs ``window.scrollTo(0, 0)`` before advancing to the next step. The next step is typically extract_data; without the reset its screenshot was the page footer rather than the listing content. Best-effort (try/except) so a CDP failure can't break the runner.

3) BLOCKING_OVERLAY scoped to extract primitives only (context_modules.py)
Predicate was ``step_section == "extraction" OR step_type in (extract_data, extract_url)``. The first clause matched scroll-in-extraction, which then got BOTH KEYBOARD_SCROLL AND BLOCKING_OVERLAY prompt sections; the double overhead paralyzed the brain's scroll-step deliberation, exhausting cap=8 before page traversal. Tightened to ``step_type in (extract_data, extract_url)`` only; KEYBOARD_SCROLL already prohibits image clicks (the carousel trigger), which is the actual concern on scroll steps.

4) KEYBOARD_SCROLL prompt rewrite (context_modules.py)
Earlier version mandated keyboard-only scrolling, which forced Page_Down × 6 to cover tall pages and exhausted the per-step budget. New prompt: prefer ``scroll(direction="down", amount=10)`` for bulk advancement, Page_Down/End as fallback for stuck pages, and an explicit "KNOW WHEN TO STOP" block teaching the brain to emit ``done(success=true)`` when it sees the footer or scrollY plateaus. Stripped plan-specific references ("Description and More Details", "Next Boat") in favor of generic site-agnostic guidance.

5) Extractor model: Opus 4.7 → Sonnet 4.6 (extraction/extractor.py)
Run 20260522_204618_706eca3d returned 0 leads despite year/make/model/price all visible in the detail-page screenshot. The extractor IS being called (17 extract calls in the cost summary) but returns no usable data. Switched to Sonnet 4.6: right level for structured-field tool_use extraction (fast, cheap, proven on similar shapes). Verifier escalation path (haiku → opus on disagreement) is unchanged.

Test coverage
- tests/test_context_modules.py: predicate tests updated for the tightened BLOCKING_OVERLAY scope; new tests pin that scroll-in-extraction gets KEYBOARD_SCROLL only (no overlay-dismiss section), and that extract_data DOES get the overlay-dismiss handler hint.
- tests/test_extractor_tool_use_migration.py: 58 tests pass with the Sonnet default; no test was pinned to the Opus model string.
- tests/test_brain_budget_clamp.py: uses literal cap values in test inputs (independent of the DEFAULT_BRAIN_BUDGET_CAPS constant), so bumping scroll 8→12 doesn't affect them.

105 tests pass across the affected suites.

Boattrader still produces 0 leads on the live site: the page content IS extractable (year=2015, make=Pioneer, model=197 Sportfish, price=$25,000 all visible per Chrome MCP inspection of the run's abandoned detail page) but the Opus extractor was returning empty. Sonnet verification is the next test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
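The BLOCKING_OVERLAY scoping change in item 3 can be sketched as a before/after pair of predicates. This is a minimal illustration and not the code in context_modules.py: the function names `overlay_applies_old` and `overlay_applies_new`, the `"scroll"` step-type string, and the tuple constant are assumptions made for the example; only the two boolean conditions come from the description above.

```python
# Hypothetical sketch of the BLOCKING_OVERLAY predicate before and after
# the tightening. Names and string literals are illustrative assumptions;
# only the boolean conditions are taken from the commit message.

EXTRACT_STEP_TYPES = ("extract_data", "extract_url")


def overlay_applies_old(step_section: str, step_type: str) -> bool:
    """Old scope: any step in the extraction section, or an extract primitive."""
    return step_section == "extraction" or step_type in EXTRACT_STEP_TYPES


def overlay_applies_new(step_section: str, step_type: str) -> bool:
    """New scope: extract primitives only; step_section no longer matters."""
    return step_type in EXTRACT_STEP_TYPES


# A scroll step inside the extraction section matched the old predicate
# (so it received both prompt sections) and does not match the new one.
scroll_in_extraction = ("extraction", "scroll")
old_result = overlay_applies_old(*scroll_in_extraction)
new_result = overlay_applies_new(*scroll_in_extraction)
```

Under these assumptions, extract_data and extract_url steps still match the new predicate, which is the behavior the updated predicate tests are described as pinning.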
Summary
Five focused follow-up tunings to PR #591, surfaced by running the boattrader plan against the live site post-merge and observing where extraction was still failing.
1. Scroll budget cap 8 → 12 (`gym/step_handlers/holo3.py`): Holo3 picks ~330 px/action despite the prompt asking for amount=10+; cap=8 only covered ~2600px of 4150-pixel detail pages. cap=12 ≈ 3960px coverage.
2. Scroll-to-top before extract on plateau-advance (`gym/step_recovery.py`): when scroll plateaus at the page bottom (3 cycles of `scrollBy` with no movement), recovery now runs `window.scrollTo(0, 0)` before advancing, so the next step's screenshot (typically for `extract_data`) is the listing content, not the footer.
3. `BLOCKING_OVERLAY` scoped to extract primitives only (`context_modules.py`): the predicate was matching `step_section == "extraction"`, which included scroll-in-extraction. Result: scroll steps got BOTH `KEYBOARD_SCROLL` AND `BLOCKING_OVERLAY` prompt sections, doubling deliberation overhead and exhausting cap=8. Tightened to `step_type in (extract_data, extract_url)` only; `KEYBOARD_SCROLL` already prohibits image clicks (the carousel trigger).
4. `KEYBOARD_SCROLL` rewrite: the earlier "keyboard-only" mandate forced Page_Down × 6 to cover tall pages and exhausted the budget. The new section prefers `scroll(amount=10)`, keeps Page_Down/End as fallback, and adds an explicit "KNOW WHEN TO STOP" block teaching the brain to emit `done(success=true)` when it sees the footer or scrollY plateaus. Stripped plan-specific references (boattrader's "Description / More Details" / "Next Boat") in favor of generic site-agnostic guidance.
5. Extractor model: Opus 4.7 → Sonnet 4.6 (`extraction/extractor.py`): run `20260522_204618_706eca3d` returned 0 leads despite year/make/model/price all visible in the detail-page screenshot per Chrome MCP inspection. The Opus extractor was returning empty; Sonnet is the right level for structured-field tool_use extraction. Verifier escalation path (haiku → opus on disagreement) unchanged.

Debugging arc that surfaced these
- …_201848: `add_hint` not `halt`, but brain still exhausts at bottom
- …_204618
- …_211720 (this PR)

Outstanding (not in this PR)
Even with all of the above, the boattrader live-site test produces 0 leads. Chrome MCP inspection of an abandoned detail page (`2015-pioneer-197-sportfish-10079002`) shows ALL extraction targets ARE on the page: year=2015, make=Pioneer, model=197 Sportfish, price=$25,000, seller=Private Seller. The Opus extractor was being called (17 calls in the cost breakdown) but returning empty. This PR tests Sonnet as the fix; if the Sonnet run still returns 0 leads, the next investigation is the extractor's prompt construction without a schema (the boattrader plan doesn't pass `extraction_schema` or `_objective`, so it uses the legacy hardcoded prompt; that prompt may be the actual bottleneck).

Test plan
- `tests/test_context_modules.py`: predicate tests updated for the new BLOCKING_OVERLAY scope; pins that scroll-in-extraction gets KEYBOARD_SCROLL only and that extract_data DOES get the overlay-dismiss handler hint.
- `tests/test_extractor_tool_use_migration.py`: 58 tests pass with the Sonnet default; no test was pinned to the Opus model string.
- `tests/test_brain_budget_clamp.py`: uses literal cap values in test inputs (independent of the constant), so bumping scroll 8→12 doesn't affect them.
- `tests/test_step_recovery.py` / `test_step_recovery_policy.py`: scroll-to-top change is best-effort (try/except), recovery contract unchanged.

🤖 Generated with Claude Code
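For readers unfamiliar with the plateau-advance recovery, the best-effort scroll-to-top could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in gym/step_recovery.py: `scroll_plateaued`, `reset_scroll_best_effort`, and the `evaluate_js` callable (a stand-in for whatever CDP evaluate call the runner uses) are hypothetical names. Only the three-cycle plateau rule, the `window.scrollTo(0, 0)` call, and the try/except wrapping come from the description.

```python
from __future__ import annotations

from typing import Callable, Sequence

PLATEAU_CYCLES = 3  # from the PR text: 3 scrollBy cycles with no movement


def scroll_plateaued(scroll_y_history: Sequence[float],
                     cycles: int = PLATEAU_CYCLES) -> bool:
    """Return True when the last `cycles` scroll attempts left scrollY unchanged.

    `scroll_y_history` holds the starting scrollY followed by the reading
    taken after each scroll attempt (an assumed bookkeeping format).
    """
    if len(scroll_y_history) < cycles + 1:
        return False
    recent = scroll_y_history[-(cycles + 1):]
    return all(y == recent[0] for y in recent)


def reset_scroll_best_effort(evaluate_js: Callable[[str], object]) -> bool:
    """Run window.scrollTo(0, 0), swallowing any failure.

    `evaluate_js` is a hypothetical callable that executes JavaScript in
    the page. Returns True if the call completed, False if it raised, so
    the caller can advance to the next step either way.
    """
    try:
        evaluate_js("window.scrollTo(0, 0)")
        return True
    except Exception:
        # Best-effort: a CDP failure must not break the runner.
        return False
```

Returning a boolean instead of raising keeps the recovery contract the same whether or not the reset succeeds, which matches the "recovery contract unchanged" note in the test plan.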