fix(ixp-skill): drop document image viewing from configure-model step#1828

Open
cezara98t wants to merge 1 commit into main from fix/ixp-guide-drop-image-download

@cezara98t

Contributor

Problem

The e2e full-lifecycle test was flaky (~2–3 of every 8 runs failed). Root cause: Step 2 of the project-setup guide told the agent to download 2–3 sample document images and Read them before configure-model. Each image is 50K–170K chars of base64, driving the context to 700–760K tokens before the lifecycle even got going. The model then emitted a text-only turn, the SDK reported end_turn, and the run ended mid-lifecycle — typically right at or after configure-model/get-predictions, well before confirm, update-prompts, and publish.
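The size problem described above can be illustrated with a minimal sketch. The base64 length formula is standard; the `should_inline` guard and its 20,000-character budget are hypothetical names and values chosen for illustration, not part of the skill or the SDK.

```python
import base64
import math


def base64_chars(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of `raw_bytes` bytes."""
    return 4 * math.ceil(raw_bytes / 3)


def should_inline(payload_b64: str, budget_chars: int = 20_000) -> bool:
    """Hypothetical guard: inline a payload only if it fits a character budget."""
    return len(payload_b64) <= budget_chars


# Sanity check of the formula against the standard library.
sample = bytes(range(256)) * 100  # 25,600 raw bytes
encoded = base64.b64encode(sample).decode("ascii")
assert len(encoded) == base64_chars(len(sample))

# Per-image sizes quoted in the description: 50K-170K base64 characters.
for chars in (50_000, 170_000):
    approx_raw = chars * 3 // 4  # base64 expands raw bytes by about 4/3
    verdict = should_inline("x" * chars)
    print(f"{chars} base64 chars ~ {approx_raw} raw bytes, inline={verdict}")
```

Under this assumed budget, both ends of the quoted range would be rejected, which matches the direction of the fix: keep image payloads out of the step that does not need them.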

This fix was originally made on fix/ixp-e2e-drop-f1-direction-gate (commit 110f1ff07) but was never shipped — PR #1814 merged without it, so main still carried the image-dumping Step 2.

Fix

Step 2 now applies the model configuration directly and explicitly instructs the agent not to download or Read document images to decide. gemini_2_5_flash + table_mini is the default (correct for invoices and most structured docs); a compact table covers the override cases. Image reading remains where it's genuinely needed — per-document labelling and the improve-prompts phase — both of which the passing runs already tolerate.

Evidence

| Guide state | e2e result |
| --- | --- |
| Before (image dump in Step 2) | 6/8 passing; 2 runs died at ~700–760K-token context |
| After (this change) | 8/8 passing, every run 13/13, score 1.000 |
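As a rough way to read these numbers, the sketch below computes how likely an 8/8 result would be if the per-run pass rate were still the "before" rate of 6/8. It assumes runs are independent and that 6/8 is a fair estimate of the old rate; both are assumptions, not claims from the test harness.

```python
def all_pass_probability(per_run_pass_rate: float, runs: int) -> float:
    """Probability that every one of `runs` independent runs passes."""
    return per_run_pass_rate ** runs


before_rate = 6 / 8  # pass rate observed before the change
chance_of_clean_sweep = all_pass_probability(before_rate, 8)

# Roughly 0.1: a clean 8/8 would be uncommon at the old pass rate.
print(round(chance_of_clean_sweep, 3))
```

A larger batch of runs would tighten this estimate further.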

🤖 Generated with Claude Code

The guide told the agent to download 2-3 sample document images and Read
them before configuring the model. Each image injects ~300k base64 tokens
into context; with 3 invoices the context overflows and the agent emits
an end_turn text response mid-lifecycle, scoring 0 on everything past the
confirm step. The default recommendation (gemini_2_5_flash + table_mini)
is already correct for invoices — no visual inspection needed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jul 2, 2026

Contributor

Claude finished @cezara98t's task in 1m 28s


PR Review: fix(ixp-skill): drop document image viewing from configure-model step

  • Read review criteria and project conventions
  • Analyze the full diff
  • Read the full changed file for context
  • Check for cross-skill references, broken links, secrets
  • Post review findings

Summary

This PR removes the instruction for agents to download and Read 2–3 sample document images before running configure-model, which was bloating the context window (700–760K tokens) and causing flaky e2e failures. Instead, the step now applies gemini_2_5_flash + table_mini directly as the default and provides a compact override table.

Change-by-Change Review

1. skills/uipath-ixp/references/project-setup-guide.md

Severity: OK

The change is clean and well-motivated:

  • Lines 51–52: The new instruction is prescriptive ("Apply the configuration directly — do NOT download or Read document images to decide") — good LLM usability; the negative instruction prevents the agent from reverting to the old behavior.
  • Lines 53–58: The configure-model command is now shown first with concrete default values (gemini_2_5_flash, table_mini) instead of a placeholder <none|table_mini|table>. This is an improvement — the agent gets a copy-paste-ready command for the happy path.
  • Lines 60–67: The override table keeps its 4 rows but with compressed descriptions ("simple or multiple tables", "nested/merged/multi-page tables"). The table still covers all cases.
  • The removed documents list / documents download commands are still documented in the CLI reference and used elsewhere (labelling, improve-prompts), so they're not lost — just removed from this step where they caused harm.
  • All relative links (cli-reference.md, label-documents-guide.md) resolve correctly.
  • No cross-skill references, no secrets, no hardcoded paths.

What's Missing

Nothing identified. The change is tightly scoped to the root cause (image downloads in Step 2) and the e2e test (tests/tasks/uipath-ixp/e2e/full_lifecycle.yaml) does not assert on the removed commands, so no test updates are needed.

Area Ratings

| Area | Status | Notes |
| --- | --- | --- |
| Frontmatter | OK | Not touched |
| E2E Tests | OK | Existing e2e test unaffected; PR description shows 8/8 passing |
| Skill Body | OK | Not touched |
| References & Assets | OK | Clean simplification, all links valid |
| Repo Hygiene | OK | Scoped change, no secrets, no cross-skill deps |

Issues for Manual Review

None found.

Conclusion

Clean, well-scoped fix with strong empirical evidence (8/8 passing vs. 6/8 before). The change improves both reliability and LLM usability by giving the agent a direct default command instead of requiring expensive image inspection. Approve.
