refactor(troubleshoot): tiered signature-index investigator (v2 architecture) by dmorosanu · Pull Request #1820 · UiPath/skills

dmorosanu · 2026-07-02T13:12:27Z

Why

Eval data across 184 replay scenarios showed that the previous 7-sub-agent orchestration (triage → scope-checker → generator → sequential testers → depth-verifier → presenter) costs 25–55 min per investigation, while its marginal value sits in the playbook knowledge, not the choreography: 159/184 scenarios pass without the skill, 12 more flake-pass on retry, and the 13 stable-fails are all cases where a playbook fact or decision-tree discriminator is required. This PR keeps the knowledge and removes the choreography.

What changed

SKILL.md rewritten as a single-context tiered protocol (~145 lines): anchor → extract signals (mandatory AggregateException/inner-exception unwrap) → route via a greppable signature index → walk the matched playbook's decision tree → mandatory format-forced verification checklist (cause named verbatim, evidence pinned vs sibling causes, runtime-evidence gate, resolution-branch alignment, causal precedence) → present. The previous Critical Rules section is restructured into ## 1. Invariants — every load-bearing rule is preserved or strengthened (no-CLI-discovery, retry caps, empty≠absent, live≠historical, correlation, raw-file redirect, symptom≠cause, fix-approval gate); the removed rules were orchestration mechanics (never run uip yourself, sequential testers, verbatim presenter hand-off) that no longer have a boundary to protect.
agents/ (7 files) and schemas/ (4 files) deleted. Their content is redistributed: tester gates → SKILL.md walk rules; depth-verifier → inline checklist + escalation verifier prompt; presenter.md → references/presenting.md (near-verbatim, including the interactive Healing-Agent apply-flow and the approval gate: user source files are never modified without explicit approval, decline/non-answer = no edit).
Tier 0: signature index. All 215 playbooks now declare `signatures:` frontmatter (631 signatures; 22 justified `silent: true`). scripts/build-signature-index.py generates references/signature-index.md (grep-only routing table + no-signature symptom routing + signal-extraction cheatsheet) and lints three rules: every playbook is routable or silent, duplicate (kind,value) claims require discriminating notes, and exclusion targets must exist. Playbook bodies are untouched.
Tier 2: escalation (references/escalation.md, loaded only on 6 defined triggers). It runs 2–4 parallel read-only hypothesis probes, then adjudication and a conditional fresh-eyes verifier, all within a bounded spawn budget.
Investigation state simplified to .local/investigations/raw/*.json + notes.md (state.json/hypotheses.json/needs_input.json removed; generate_scenario.py needs no code change — it globs by basename).
Test-side docs updated (tests CLAUDE.md input table + forbidden-criteria rationale). Three excel fixture project.json descriptions neutralized — they spelled out the scenario's root cause in agent-visible text.
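
The three lint rules named for scripts/build-signature-index.py can be sketched as follows. This is a minimal illustration under stated assumptions, not the script in this branch: the `Signature` and `Playbook` records, their field names (`note`, `silent`, `excludes`), and the `lint` function are hypothetical stand-ins for the real frontmatter schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Signature:
    kind: str        # hypothetical kinds, e.g. "exception" or "message"
    value: str
    note: str = ""   # discriminating note, needed when a claim is shared


@dataclass
class Playbook:
    name: str
    signatures: list[Signature] = field(default_factory=list)
    silent: bool = False                                 # assumed mirror of `silent: true`
    excludes: list[str] = field(default_factory=list)    # names of other playbooks


def lint(playbooks: list[Playbook]) -> list[str]:
    """Return human-readable lint errors; an empty list means clean."""
    errors = []
    names = {p.name for p in playbooks}
    claims = defaultdict(list)  # (kind, value) -> [(playbook name, signature)]

    for p in playbooks:
        # Rule 1: every playbook is routable or explicitly silent.
        if not p.signatures and not p.silent:
            errors.append(f"{p.name}: no signatures and not marked silent")
        # Rule 3: exclusion targets must exist.
        for target in p.excludes:
            if target not in names:
                errors.append(f"{p.name}: exclusion target '{target}' does not exist")
        for sig in p.signatures:
            claims[(sig.kind, sig.value)].append((p.name, sig))

    # Rule 2: duplicate (kind, value) claims require discriminating notes.
    for (kind, value), owners in claims.items():
        if len(owners) > 1:
            for name, sig in owners:
                if not sig.note.strip():
                    errors.append(
                        f"{name}: duplicate claim ({kind}, {value}) needs a discriminating note"
                    )
    return errors
```

Returning a list of messages rather than raising on the first problem lets a PR gate report every violation in one run.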

Validation

Signature lint + index freshness clean; description-length hook and skill-status check pass.
Local sentinel — 15/15 pass (13 stable-fails + 2 known flaky, judge-graded, skill loaded from this branch): 13 at 1.000, all ≥0.8. Per-scenario wall time 5.7–17.7 min, avg ~9.4 min vs 25–55 min under v1. Three first-pass misses (no-healing-agent, replace-text-silent, excel-rr-sheet-bytes) were transcript-diagnosed, fixed (6422dcea8), and re-run to 3× 1.000.
CI full suite (185 tasks, ubuntu, parallelism 4) — first pass 173/185 = 93.5% in a single 2h15m job (matches v1's ~93% nightly rate at ~1/10 the per-task cost). After the fixes above, every one of the 185 tasks has ≥1 passing CI run; the only test to fail both CI tracks (replace-text-silent) is fixed and CI-verified.

Speed: v1 vs v2

| Measure | v1 (old) | v2 (this branch) |
|---|---|---|
| Per scenario, local (same proxy/model, parallelism 3) | 25–55 min | 5.7–17.7 min, avg ~9.4 |
| Per task, CI (185-task suite, j=4) | 25–55 min | avg 3.5 · median 3.2 · p90 5.1 · max 16 min |
| Full suite, CI | overnight-cron only (~20–40 h at v1 pace) | one 2h15m job |

≈ 3–5× faster locally, ~7–10× per task on CI, quality up not down (15/15 on the former hard core).

Follow-ups

.github/workflows/validate-signature-index.yml (index lint PR gate) is authored but not in this branch — the push credential lacks workflow scope. Will be added once granted.
Suite is green; lower per-task run_limits (5400s task_timeout is now ~10× headroom) in a separate PR.

dmorosanu · 2026-07-02T13:49:00Z

Sentinel validation complete (local, judge-graded, skill loaded from this branch):

15/15 pass — the 13 previously stable-failing scenarios plus getasset-activity-silent-failure and healing-agent-no-license. 12 passed first-pass; 3 misses were diagnosed from transcripts and fixed in 6422dce:

no-healing-agent (judge 0.2 → 1.0): source-acquisition failure — the playbook forbade checking the working directory for the project while the workflow source sat in cwd. Restored cwd-first discovery (SKILL.md §5.4 precedence + playbook step 5), matching the old tester's auto-discovery order.
replace-text-silent (0.5 → 1.0): agent bundled a speculative "second bug" fix for a property the failing run never evaluated. New checklist rule §6.6: fixes must trace to the confirmed cause; unexercised code paths are unverified observations only.
excel-rr-sheet-bytes (0.2 → 1.0): ground-truth precision — the raw Get Workbook Sheets payload preserves the NBSP bytes, so byte-verified branch identification is correct client behavior; RESOLUTION.md now accepts it alongside the byte-compare recommendation.
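
The excel-rr-sheet-bytes case turns on a no-break space (U+00A0) that prints like an ordinary space but differs at the byte level. A minimal sketch of such a byte-level comparison, assuming UTF-8 text; the function name and the sample sheet names are hypothetical and not taken from the fixture:

```python
def diff_code_points(expected: str, actual: str) -> list[tuple[int, str, str]]:
    """List (index, expected, actual) for each position where the strings differ.

    Characters are reported as U+XXXX labels so that visually identical
    characters (space vs. no-break space) are distinguishable.
    """
    diffs = []
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            diffs.append((i, f"U+{ord(e):04X}", f"U+{ord(a):04X}"))
    if len(expected) != len(actual):
        # zip() stops at the shorter string, so record the length mismatch too.
        diffs.append((min(len(expected), len(actual)), "length",
                      f"{len(expected)} vs {len(actual)}"))
    return diffs


expected_name = "Q1 Report"        # ordinary space (hypothetical sheet name)
actual_name = "Q1\u00a0Report"     # no-break space, looks the same when printed

print(actual_name.encode("utf-8"))                  # b'Q1\xc2\xa0Report'
print(diff_code_points(expected_name, actual_name))  # [(2, 'U+0020', 'U+00A0')]
```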

Wall time per scenario: 6–18 min (vs 25–55 min under the previous architecture), all at parallelism 3 through the local proxy. Full-suite CI run on this branch: https://github.com/UiPath/skills/actions/runs/28592769248

dmorosanu · 2026-07-03T05:49:38Z

Full-suite validation complete — dual CI coverage (every task ran on two independent CI tracks) plus a fix round:

| Track | Result | Per-task wall time |
|---|---|---|
| Full-suite run (185 tasks, one job) | 173/185 | avg 3.5 min, median 3.2, p90 5.1, max 16 (total 2h15m) |
| Sequential 6-task batches (31 runs, same 185 tasks) | 178/185 | ~10 min per 6-task batch incl. setup |
| Re-run of all 8 remaining failures after fixes (`64a9a33`) | 8/8, scores 0.925–1.000 | 2–5 min |

Every one of the 185 tasks now has at least one passing CI run on this branch; the only scenario that failed both tracks (replace-text-silent) is fixed and re-verified. Fix commits since the sentinel round: c34bafc (XAML expression-binding false-positive guard, post-apply resolution restatement, fix-scope hardening) and 64a9a33 (solution-root resource glob — the IS overview documented the wrong layout, argument-null source-read mandate, contradiction-terminal rule, getasset manifest order-tolerant mock rules + allowed_tools cleanup). No coder-eval or judge changes.

Known residual (pre-existing, not this PR): skill activation is a coin flip on some "why did my job fail" phrasings — 4 single-track failures were correct diagnoses (judge 0.8–1.0) that activated uipath-platform instead of uipath-troubleshoot and lost only skill_triggered. Recommended follow-up outside this PR's scope: add the faulted-job→uipath-troubleshoot redirect to uipath-platform's when_to_use.

dmorosanu · 2026-07-03T06:16:02Z

Routing-fix validation (e7e1c9b — front-loaded the faulted-job→uipath-troubleshoot redirect in uipath-platform's when_to_use; the previous redirects sat past the 1536-char listing truncation and never reached the model):

3 CI rounds × the 4 routing-affected tasks = 12 samples:

| Task | R1 | R2 | R3 |
|---|---|---|---|
| gsuite-connection-invalid | pass | pass | pass |
| connector-general-disabled | pass | pass | pass |
| connector-general-no-access | pass | misroute | pass |
| uia-alter-if-disabled | pass | pass | pass |

Activation success: 11/12 (92%) vs ~50% observed pre-fix on these prompts (each of the four misrouted on one of its two pre-fix track runs). The residual miss is the most connection-token-heavy prompt in the suite — the coin flip is attenuated, not eliminated. All 11 correctly-routed runs scored 1.000.
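
The routing fix depends on the redirect sitting inside the first 1536 characters of when_to_use. A minimal sketch of a guard for that, assuming the truncation is a plain character-count cut; the function name and sample strings are hypothetical, and only the 1536 limit comes from this thread:

```python
LISTING_LIMIT = 1536  # characters; the listing truncation cited in this thread


def survives_truncation(when_to_use: str, phrase: str,
                        limit: int = LISTING_LIMIT) -> bool:
    """True only if the whole phrase falls within the first `limit` characters."""
    return phrase in when_to_use[:limit]


redirect = "faulted job -> use uipath-troubleshoot"   # hypothetical wording
front_loaded = redirect + " " + "x" * 2000
buried = "x" * 2000 + " " + redirect

print(survives_truncation(front_loaded, redirect))  # True
print(survives_truncation(buried, redirect))        # False
```

A phrase that straddles the cut is also reported as lost, since a partial redirect is unlikely to be useful to the model.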

dmorosanu mentioned this pull request Jul 3, 2026

feat(troubleshoot): signature frontmatter for pending playbook PRs (merge LAST) #1837

Draft

dmorosanu added 5 commits July 3, 2026 12:19

refactor(troubleshoot): replace multi-agent orchestration with tiered signature-index investigator

9204400

fix(troubleshoot): cwd-first source discovery, fix-scope checklist rule, byte-verified excel ground truth

33ab0b4

fix(troubleshoot): guard XAML expression-binding false positive and require post-apply resolution restatement

ef50bbd

fix(troubleshoot): solution-root resource glob, argnull source mandate, contradiction terminal, getasset mock tolerance

ceac056

fix(platform): front-load faulted-job root-cause redirect to uipath-troubleshoot in when_to_use

1ac0ae4

dmorosanu force-pushed the feat/troubleshoot-v2 branch from e7e1c9b to 1ac0ae4 Compare July 3, 2026 09:20

feat(troubleshoot): route on language-invariant signals first; translate localized messages before index grep

157fc48

dmorosanu mentioned this pull request Jul 3, 2026

feat(troubleshoot): language-invariant routing signatures + localization lint warning #1842

Draft

dmorosanu wants to merge 6 commits into main from feat/troubleshoot-v2
