fix(ixp-skill): drop document image viewing from configure-model step #1828
cezara98t wants to merge 1 commit into
The guide told the agent to download 2-3 sample document images and Read them before configuring the model. Each image injects ~300k base64 tokens into context; with 3 invoices the context overflows and the agent emits an end_turn text response mid-lifecycle, scoring 0 on everything past the confirm step. The default recommendation (gemini_2_5_flash + table_mini) is already correct for invoices, so no visual inspection is needed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
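As a rough illustration of why inlined images crowd the context, the sketch below estimates how many base64 characters a set of images produces and whether they fit a given token budget. Only the 4/3 base64 expansion is a fixed fact; the function names, the `chars_per_token` ratio, and the example sizes are hypothetical, and this is not how the figures quoted in this PR were measured.

```python
import base64
import math


def base64_length(num_bytes: int) -> int:
    """Character count of the padded base64 encoding of num_bytes raw bytes."""
    return 4 * math.ceil(num_bytes / 3)


def estimated_tokens(num_chars: int, chars_per_token: float) -> int:
    """Rough token estimate. chars_per_token is an assumed, tokenizer-dependent ratio."""
    return math.ceil(num_chars / chars_per_token)


def fits_in_context(image_sizes_bytes, context_limit_tokens,
                    already_used_tokens, chars_per_token):
    """True if the base64-encoded images still fit in the remaining token budget."""
    total_chars = sum(base64_length(n) for n in image_sizes_bytes)
    needed = already_used_tokens + estimated_tokens(total_chars, chars_per_token)
    return needed <= context_limit_tokens


if __name__ == "__main__":
    # Sanity check against the standard library encoder (hypothetical 120 kB payload).
    payload = bytes(120_000)
    assert len(base64.b64encode(payload)) == base64_length(len(payload))

    # Hypothetical image sizes and budget; replace with measured values.
    sizes = [90_000, 120_000, 60_000]
    print(fits_in_context(sizes, context_limit_tokens=1_000_000,
                          already_used_tokens=400_000, chars_per_token=1.0))
```

A check like this is only useful with a measured characters-per-token ratio for the model in question, since base64 text tends to tokenize much less efficiently than prose.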
Claude finished @cezara98t's task in 1m 28s
PR Review: fix(ixp-skill): drop document image viewing from configure-model step
Summary
This PR removes the instruction for agents to download and Read 2–3 sample document images before running configure-model.
Change-by-Change Review
| Area | Status | Notes |
|---|---|---|
| Frontmatter | OK | Not touched |
| E2E Tests | OK | Existing e2e test unaffected; PR description shows 8/8 passing |
| Skill Body | OK | Not touched |
| References & Assets | OK | Clean simplification, all links valid |
| Repo Hygiene | OK | Scoped change, no secrets, no cross-skill deps |
Issues for Manual Review
None found.
Conclusion
Clean, well-scoped fix with strong empirical evidence (8/8 passing vs. 6/8 before). The change improves both reliability and LLM usability by giving the agent a direct default command instead of requiring expensive image inspection. Approve.
Problem
The e2e full-lifecycle test was flaky (~2–3 of every 8 runs failed). Root cause: Step 2 of the project-setup guide told the agent to download 2–3 sample document images and Read them before `configure-model`. Each image is 50K–170K chars of base64, driving the context to 700–760K tokens before the lifecycle even got going. The model then emitted a text-only turn, the SDK reported `end_turn`, and the run ended mid-lifecycle, typically right at or after `configure-model` / `get-predictions`, well before `confirm`, `update-prompts`, and `publish`.
This fix was originally made on `fix/ixp-e2e-drop-f1-direction-gate` (commit `110f1ff07`) but was never shipped: PR #1814 merged without it, so `main` still carried the image-dumping Step 2.

Fix
Step 2 now applies the model configuration directly and explicitly instructs the agent not to download or Read document images to decide. `gemini_2_5_flash` + `table_mini` is the default (correct for invoices and most structured docs); a compact table covers the override cases. Image reading remains where it's genuinely needed (per-document labelling and the improve-prompts phase), both of which the passing runs already tolerate.

Evidence
🤖 Generated with Claude Code