
Extraction-quality tunings: scroll cap 12, scroll-to-top, BLOCKING_OVERLAY scope, Sonnet extractor #595

Merged
mercurialsolo merged 1 commit into main from fix/post-pr591-extraction-tunings on May 22, 2026

@mercurialsolo (Owner)

Summary

Five focused follow-up tunings to PR #591, surfaced by running the boattrader plan against the live site post-merge and observing where extraction was still failing.

  • Scroll budget cap 8 → 12 (gym/step_handlers/holo3.py): Holo3 picks ~330 px per action despite the prompt asking for amount=10+, so cap=8 covered only ~2600 px of the 4150 px detail pages; cap=12 gives ~3960 px of coverage.
  • Scroll-to-top before extract (gym/step_recovery.py): when scrolling plateaus at the page bottom (3 cycles of scrollBy with no movement), recovery now runs window.scrollTo(0, 0) before advancing, so the screenshot for the next step (typically extract_data) shows the listing content rather than the footer.
  • BLOCKING_OVERLAY scoped to extract primitives only (context_modules.py): the predicate matched step_section == "extraction", which also caught scroll steps inside the extraction section. Those steps received BOTH the KEYBOARD_SCROLL AND BLOCKING_OVERLAY prompt sections, doubling deliberation overhead and exhausting cap=8. The predicate is now tightened to step_type in (extract_data, extract_url) only; KEYBOARD_SCROLL already prohibits image clicks (the carousel trigger).
  • KEYBOARD_SCROLL rewrite (context_modules.py): the earlier "keyboard-only" mandate forced Page_Down × 6 to cover tall pages, which exhausted the budget. The new section prefers scroll(amount=10), keeps Page_Down/End as a fallback, and adds an explicit "KNOW WHEN TO STOP" block teaching the brain to emit done(success=true) when it sees the footer or when scrollY plateaus. Plan-specific references (boattrader's "Description / More Details" / "Next Boat") were stripped in favor of generic, site-agnostic guidance.
  • Extractor model: Opus 4.7 → Sonnet 4.6 (extraction/extractor.py): run 20260522_204618_706eca3d returned 0 leads even though year/make/model/price were all visible in the detail-page screenshot per Chrome MCP inspection. The Opus extractor was returning empty results; Sonnet is the right tier for structured-field tool_use extraction. The verifier escalation path (haiku → opus on disagreement) is unchanged.

Debugging arc that surfaced these

| Run | What changed | Outcome | Lesson |
|---|---|---|---|
| post-PR-591 baseline | PR #591 + plan rewrite, default caps | 9 steps / 0 leads / time_cap | scroll cap=8 too tight |
| …_201848 | cap=8 + analyzer halt-bias fix | scroll plateaus at scrollY=2640 | analyzer now returns add_hint not halt, but brain still exhausts at bottom |
| …_204618 | cap=12 + "know when to stop" + scroll-to-top | extract runs from top, still fails | rules out budget/viewport-position causes; the extractor itself is returning empty |
| …_211720 (this PR) | + Sonnet extractor | TBD | direct test of whether Opus tool_use was the bottleneck |

Outstanding (not in this PR)

Even with all of the above, the boattrader live-site test produces 0 leads. Chrome MCP inspection of an abandoned detail page (2015-pioneer-197-sportfish-10079002) shows that ALL extraction targets ARE on the page: year=2015, make=Pioneer, model=197 Sportfish, price=$25,000, seller=Private Seller. The Opus extractor was being called (17 calls in the cost breakdown) but returned empty results. This PR tests Sonnet as the fix. If the Sonnet run still returns 0 leads, the next thing to investigate is how the extractor builds its prompt without a schema: the boattrader plan doesn't pass extraction_schema or _objective, so it falls back to the legacy hardcoded prompt, which may be the actual bottleneck.

Test plan

  • tests/test_context_modules.py: predicate tests updated for the new BLOCKING_OVERLAY scope; they pin that scroll-in-extraction gets KEYBOARD_SCROLL only and that extract_data DOES get the overlay-dismiss handler hint.
  • tests/test_extractor_tool_use_migration.py: 58 tests pass with the Sonnet default; no test was pinned to the Opus model string.
  • tests/test_brain_budget_clamp.py: uses literal cap values in test inputs (independent of the DEFAULT_BRAIN_BUDGET_CAPS constant), so bumping scroll 8→12 doesn't affect them.
  • tests/test_step_recovery.py / test_step_recovery_policy.py: the scroll-to-top change is best-effort (try/except), so the recovery contract is unchanged.
  • 105 tests pass across the affected suites.

🤖 Generated with Claude Code

…ING_OVERLAY scope, Sonnet extractor

Follow-up tunings discovered during real-world boattrader debugging against
the PR #591 build:

1) Scroll budget cap 8 → 12 (gym/step_handlers/holo3.py)

   Observed on 4150-pixel boattrader detail pages: cap=8 gives ~2600px of
   coverage (~330px/action, because Holo3 picks small ``amount`` values
   despite the prompt asking for amount=10+). cap=12 gives ~3960px, which
   covers most detail pages within the budget. Run history is captured in
   the constant's comment for future tuning.
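
The coverage figures above are plain cap × px-per-action arithmetic. A minimal sketch, assuming the observed ~330 px/action average holds; the function names and the `viewport_px` parameter are hypothetical illustrations, not code from holo3.py:

```python
import math

# Observed Holo3 average quoted above; a measurement, not a guarantee.
OBSERVED_PX_PER_ACTION = 330


def scroll_coverage_px(cap: int, px_per_action: int = OBSERVED_PX_PER_ACTION) -> int:
    """Vertical distance the brain can scroll before hitting the step cap."""
    return cap * px_per_action


def min_cap_to_reach_bottom(page_height_px: int,
                            px_per_action: int = OBSERVED_PX_PER_ACTION,
                            viewport_px: int = 0) -> int:
    """Smallest cap whose scroll distance brings the page bottom into view.

    viewport_px is a hypothetical allowance: the bottom is visible once
    scrollY >= page_height - viewport height. Leaving it at 0 gives a
    conservative estimate that scrolls the full page height.
    """
    distance = max(0, page_height_px - viewport_px)
    return math.ceil(distance / px_per_action)


print(scroll_coverage_px(8))            # 2640
print(scroll_coverage_px(12))           # 3960
print(min_cap_to_reach_bottom(4150))    # 13
```

With no viewport allowance a 4150 px page needs 13 actions, so cap=12 relies on the visible viewport covering the final ~190 px (4150 − 3960).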

2) Scroll-to-top before extract on plateau-advance (gym/step_recovery.py)

   When scroll plateaus at the page bottom (3 cycles of CDP scrollBy with
   no movement → scroll_no_movement_advance), the recovery now runs
   ``window.scrollTo(0, 0)`` before advancing to the next step. The next
   step is typically extract_data; without the reset its screenshot was
   the page footer rather than the listing content. Best-effort
   (try/except) so a CDP failure can't break the runner.
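
A sketch of the plateau check plus the best-effort reset described above. It assumes scrollY is sampled once before the first attempt and once after each attempt, so 3 no-movement cycles means 4 equal trailing readings. `evaluate_js` is a hypothetical stand-in for whatever CDP evaluate wrapper step_recovery.py actually uses, and both function names are illustrative:

```python
from typing import Callable, Sequence

PLATEAU_CYCLES = 3  # "3 cycles of CDP scrollBy with no movement"


def is_scroll_plateau(scroll_ys: Sequence[int], cycles: int = PLATEAU_CYCLES) -> bool:
    """True when the last `cycles` scroll attempts left scrollY unchanged.

    scroll_ys holds scrollY before the first attempt and after each attempt,
    so N no-movement cycles require N + 1 equal trailing readings.
    """
    if len(scroll_ys) < cycles + 1:
        return False
    tail = scroll_ys[-(cycles + 1):]
    return all(y == tail[0] for y in tail)


def advance_after_plateau(evaluate_js: Callable[[str], object]) -> bool:
    """Reset to the top before advancing; best-effort, never raises.

    Returns True if the reset script ran, False if it failed.
    """
    try:
        evaluate_js("window.scrollTo(0, 0)")
        return True
    except Exception:
        # A CDP failure must not break the runner; the caller advances anyway.
        return False


calls = []
history = [0, 330, 660, 2640, 2640, 2640, 2640]
if is_scroll_plateau(history):
    advance_after_plateau(calls.append)
print(calls)  # ['window.scrollTo(0, 0)']
```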

3) BLOCKING_OVERLAY scoped to extract primitives only (context_modules.py)

   Predicate was ``step_section == "extraction" OR step_type in
   (extract_data, extract_url)``. The first clause matched
   scroll-in-extraction, which then got BOTH KEYBOARD_SCROLL AND
   BLOCKING_OVERLAY prompt sections — double overhead paralyzed the
   brain's scroll-step deliberation, exhausting cap=8 before page traversal.
   Tightened to ``step_type in (extract_data, extract_url)`` only —
   KEYBOARD_SCROLL already prohibits image clicks (the carousel trigger),
   which is the actual concern on scroll steps.
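
A minimal sketch of the predicate change, with hypothetical function names (the real code in context_modules.py may be structured differently). It also assumes scroll steps carry step_type == "scroll", which the PR does not state:

```python
from typing import Callable, List

EXTRACT_PRIMITIVES = frozenset({"extract_data", "extract_url"})


def blocking_overlay_old(step_section: str, step_type: str) -> bool:
    # Previous predicate: any step in the extraction section matched.
    return step_section == "extraction" or step_type in EXTRACT_PRIMITIVES


def blocking_overlay_new(step_section: str, step_type: str) -> bool:
    # Tightened predicate: only the extract primitives themselves.
    return step_type in EXTRACT_PRIMITIVES


def prompt_sections(step_section: str, step_type: str,
                    overlay_predicate: Callable[[str, str], bool] = blocking_overlay_new) -> List[str]:
    """Which of the two prompt sections a step would receive."""
    sections: List[str] = []
    if step_type == "scroll":  # assumption: the scroll step type is named "scroll"
        sections.append("KEYBOARD_SCROLL")
    if overlay_predicate(step_section, step_type):
        sections.append("BLOCKING_OVERLAY")
    return sections


print(prompt_sections("extraction", "scroll", blocking_overlay_old))  # ['KEYBOARD_SCROLL', 'BLOCKING_OVERLAY']
print(prompt_sections("extraction", "scroll"))                        # ['KEYBOARD_SCROLL']
```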

4) KEYBOARD_SCROLL prompt rewrite (context_modules.py)

   Earlier version mandated keyboard-only scrolling, which forced
   Page_Down × 6 to cover tall pages — exhausted the per-step budget.
   New prompt: prefer ``scroll(direction="down", amount=10)`` for bulk
   advancement, Page_Down/End as fallback for stuck pages, and an
   explicit "KNOW WHEN TO STOP" block teaching the brain to emit
   ``done(success=true)`` when it sees the footer or scrollY plateaus.
   Stripped plan-specific references ("Description and More Details",
   "Next Boat") in favor of generic site-agnostic guidance.

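The "KNOW WHEN TO STOP" guidance lives in a prompt, but its preference order can be written as a small decision function, for example to state expected behavior in a test. This is a hypothetical sketch: the action tuples are illustrative rather than the brain's real tool-call schema, and the End-key fallback is omitted for brevity:

```python
from typing import Dict, Optional, Sequence, Tuple

Action = Tuple[str, Dict[str, object]]

SCROLL_BULK: Action = ("scroll", {"direction": "down", "amount": 10})
PAGE_DOWN_FALLBACK: Action = ("key", {"key": "Page_Down"})
DONE: Action = ("done", {"success": True})


def next_scroll_action(footer_visible: bool,
                       scroll_ys: Sequence[int],
                       last_action: Optional[Action]) -> Action:
    """Prefer bulk scroll, fall back to Page_Down when stuck, stop at the end."""
    if footer_visible:
        return DONE  # the footer means the listing content has been traversed
    moved = len(scroll_ys) < 2 or scroll_ys[-1] != scroll_ys[-2]
    if moved or last_action is None:
        return SCROLL_BULK
    if last_action == SCROLL_BULK:
        return PAGE_DOWN_FALLBACK  # bulk scroll is stuck: try the keyboard
    return DONE  # the keyboard fallback is stuck too: scrollY has plateaued
```
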
5) Extractor model: Opus 4.7 → Sonnet 4.6 (extraction/extractor.py)

   Run 20260522_204618_706eca3d returned 0 leads despite year/make/model/
   price all visible in the detail-page screenshot. The extractor IS
   being called (17 extract calls in the cost summary) but returns no
   usable data. Switched to Sonnet 4.6 — right level for structured-
   field tool_use extraction (fast, cheap, proven on similar shapes).
   Verifier escalation path (haiku → opus on disagreement) is unchanged.
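
The extraction-plus-verification flow can be sketched as follows, under stated assumptions. The model labels are placeholders, not real API model IDs. The required fields mirror the year/make/model/price targets named above. The sketch also assumes "opus on disagreement" means Opus re-checks the lead when the Haiku verifier disagrees; the PR does not spell out that mechanism. All function names are hypothetical:

```python
from typing import Callable, Dict, List, Optional, Tuple

Lead = Dict[str, str]

# Placeholder labels only; the real identifiers live in extraction/extractor.py.
EXTRACTOR_MODEL = "sonnet-4.6"
VERIFIER_MODEL = "haiku"
ESCALATION_MODEL = "opus"

REQUIRED_FIELDS = ("year", "make", "model", "price")


def is_usable(lead: Optional[Lead]) -> bool:
    """A lead counts only if every required field is non-empty."""
    return bool(lead) and all(lead.get(f) for f in REQUIRED_FIELDS)


def extract_lead(extract: Callable[[str], Optional[Lead]],
                 verify: Callable[[str, Lead], bool]) -> Tuple[Optional[Lead], List[str]]:
    """Return the accepted lead (or None) plus the models consulted, in order."""
    used = [EXTRACTOR_MODEL]
    lead = extract(EXTRACTOR_MODEL)
    if not is_usable(lead):
        return None, used  # empty output: the "0 leads" failure mode
    used.append(VERIFIER_MODEL)
    if verify(VERIFIER_MODEL, lead):
        return lead, used
    used.append(ESCALATION_MODEL)  # the cheap verifier disagreed: escalate
    return (lead if verify(ESCALATION_MODEL, lead) else None), used
```

Keeping extraction and verification as injected callables makes it easy to test the escalation order without calling any model.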

Test coverage

- tests/test_context_modules.py — predicate tests updated for the
  tightened BLOCKING_OVERLAY scope; new tests pin that scroll-in-
  extraction gets KEYBOARD_SCROLL only (no overlay-dismiss section),
  and that extract_data DOES get the overlay-dismiss handler hint.
- tests/test_extractor_tool_use_migration.py — 58 tests pass with the
  Sonnet default; no test was pinned to the Opus model string.
- tests/test_brain_budget_clamp.py — uses literal cap values in test
  inputs (independent of the DEFAULT_BRAIN_BUDGET_CAPS constant), so
  bumping scroll 8→12 doesn't affect them.
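
A clamp that accepts an explicit cap table never reads the module constant, which is why literal-cap tests are insulated from the 8→12 bump. A hypothetical sketch: only the scroll entry (now 12) is grounded in this PR, the other entries of the real DEFAULT_BRAIN_BUDGET_CAPS are not shown, and `clamp_budget` is an illustrative name:

```python
from typing import Mapping, Optional

# Hypothetical shape; only the scroll cap is described in this PR.
DEFAULT_BRAIN_BUDGET_CAPS = {"scroll": 12}


def clamp_budget(step_type: str, requested: int,
                 caps: Optional[Mapping[str, int]] = None) -> int:
    """Clamp a requested per-step action budget to the cap for its step type."""
    table = DEFAULT_BRAIN_BUDGET_CAPS if caps is None else caps
    cap = table.get(step_type)
    return requested if cap is None else min(requested, cap)


print(clamp_budget("scroll", 20, caps={"scroll": 8}))  # 8, unaffected by the constant
print(clamp_budget("scroll", 20))                      # 12
```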

105 tests pass across the affected suites.

Boattrader still produces 0 leads on the live site — the page content
IS extractable (year=2015, make=Pioneer, model=197 Sportfish, price=
$25,000 all visible per Chrome MCP inspection of the run's abandoned
detail page) but the Opus extractor was returning empty. Sonnet
verification is the next test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mercurialsolo mercurialsolo merged commit d9f487c into main May 22, 2026
4 checks passed