refactor(troubleshoot): tiered signature-index investigator (v2 architecture) #1820
dmorosanu wants to merge 6 commits
Sentinel validation complete (local, judge-graded, skill loaded from this branch): 15/15 pass — the 13 previously stable-failing scenarios plus getasset-activity-silent-failure and healing-agent-no-license. 12 passed first-pass; 3 misses were diagnosed from transcripts and fixed in 6422dce:
Wall time per scenario: 6–18 min (vs 25–55 min under the previous architecture), all at parallelism 3 through the local proxy. Full-suite CI run on this branch: https://github.com/UiPath/skills/actions/runs/28592769248
Full-suite validation complete — dual CI coverage (every task ran on two independent CI tracks) plus a fix round:
Every one of the 185 tasks now has at least one passing CI run on this branch; the only scenario that failed both tracks (replace-text-silent) is fixed and re-verified. Fix commits since the sentinel round: c34bafc (XAML expression-binding false-positive guard, post-apply resolution restatement, fix-scope hardening) and 64a9a33 (solution-root resource glob, where the IS overview documented the wrong layout; argument-null source-read mandate; contradiction-terminal rule; getasset manifest order-tolerant mock rules; allowed_tools cleanup). No coder-eval or judge changes. Known residual (pre-existing, not this PR): skill activation is a coin flip on some "why did my job fail" phrasings. 4 single-track failures were correct diagnoses (judge 0.8–1.0) that activated uipath-platform instead of uipath-troubleshoot and lost only skill_triggered. Recommended follow-up outside this PR's scope: add the faulted-job→uipath-troubleshoot redirect to uipath-platform's when_to_use.
Routing-fix validation (e7e1c9b — front-loaded the faulted-job→uipath-troubleshoot redirect in uipath-platform's when_to_use; the previous redirects sat past the 1536-char listing truncation and never reached the model): 3 CI rounds × the 4 routing-affected tasks = 12 samples:
Activation success: 11/12 (92%) vs ~50% observed pre-fix on these prompts (each of the four misrouted on one of its two pre-fix track runs). The residual miss is the most connection-token-heavy prompt in the suite; the coin flip is attenuated, not eliminated. All 11 correctly-routed runs scored 1.000.
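The truncation problem behind this fix can be guarded against mechanically. A minimal sketch, assuming the listing exposes only the first 1536 characters of `when_to_use` (the figure quoted above); the function name, the sample strings, and the idea of running this as a check are illustrative assumptions, not part of this PR:

```python
LISTING_LIMIT = 1536  # truncation length quoted in the comment above


def visible_in_listing(when_to_use: str, phrase: str, limit: int = LISTING_LIMIT) -> bool:
    """Return True if `phrase` lies entirely within the first `limit` characters."""
    start = when_to_use.find(phrase)
    return start != -1 and start + len(phrase) <= limit


# Hypothetical texts: the same redirect placed after, then before, a long body.
redirect = "Faulted job? Use uipath-troubleshoot."
body = "x" * 2000
buried = body + " " + redirect
front_loaded = redirect + " " + body

print(visible_in_listing(buried, redirect))        # False
print(visible_in_listing(front_loaded, redirect))  # True
```

A check of this shape only confirms that the redirect survives truncation; it says nothing about whether the model then routes on it, which is what the 12 CI samples measure.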
… signature-index investigator
…le, byte-verified excel ground truth
…equire post-apply resolution restatement
…e, contradiction terminal, getasset mock tolerance
…roubleshoot in when_to_use
e7e1c9b to 1ac0ae4
…ate localized messages before index grep
Why
Eval data across 184 replay scenarios showed the previous 7-sub-agent orchestration (triage → scope-checker → generator → sequential testers → depth-verifier → presenter) cost 25–55 min per investigation while its marginal value sits in the playbook knowledge, not the choreography: 159/184 scenarios pass without the skill, 12 more flake-pass on retry, and the 13 stable-fails are all cases where a playbook fact or decision-tree discriminator is required. This PR keeps the knowledge and removes the choreography.
What changed
- `## 1. Invariants`: every load-bearing rule is preserved or strengthened (no-CLI-discovery, retry caps, empty≠absent, live≠historical, correlation, raw-file redirect, symptom≠cause, fix-approval gate); the removed rules were orchestration mechanics (never run uip yourself, sequential testers, verbatim presenter hand-off) that no longer have a boundary to protect.
- `agents/` (7 files) and `schemas/` (4 files) deleted. Their content is redistributed: tester gates → SKILL.md walk rules; depth-verifier → inline checklist + escalation verifier prompt; presenter.md → `references/presenting.md` (near-verbatim, including the interactive Healing-Agent apply-flow and the approval gate: user source files are never modified without explicit approval, decline/non-answer = no edit).
- `signatures:` frontmatter (631 signatures; 22 justified `silent: true`); `scripts/build-signature-index.py` generates `references/signature-index.md` (grep-only routing table + no-signature symptom routing + signal-extraction cheatsheet) and lints: every playbook routable or silent, duplicate (kind, value) claims require discriminating notes, exclusion targets must exist. Playbook bodies untouched.
- `references/escalation.md` (loaded only on 6 defined triggers): 2–4 parallel read-only hypothesis probes + adjudication + a conditional fresh-eyes verifier; bounded spawn budget.
- `.local/investigations/raw/*.json` + `notes.md` (`state.json`/`hypotheses.json`/`needs_input.json` removed; `generate_scenario.py` needs no code change, since it globs by basename).
- `project.json` descriptions neutralized: they spelled out the scenario's root cause in agent-visible text.

Validation
6422dcea8), and re-run to 3× 1.000.
replace-text-silent) is fixed and CI-verified.

Speed: v1 vs v2
≈ 3–5× faster locally, ~7–10× per task on CI, quality up not down (15/15 on the former hard core).
Follow-ups
- `.github/workflows/validate-signature-index.yml` (index lint PR gate) is authored but not in this branch: the push credential lacks workflow scope. Will be added once granted.
- `run_limits` (5400s task_timeout is now ~10× headroom) in a separate PR.
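
For reviewers who want a concrete picture of what the index lint gate enforces, the three rules listed under What changed (every playbook routable or silent, duplicate (kind, value) claims require discriminating notes, exclusion targets must exist) can be sketched roughly as below. This is an illustration only, not the contents of `scripts/build-signature-index.py`: the `Playbook` and `Signature` shapes, the `note` and `excludes` field names, and the sample data are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Signature:
    kind: str       # hypothetical kind name, e.g. "error_code"
    value: str
    note: str = ""  # discriminating note, needed when a claim is shared


@dataclass
class Playbook:
    name: str
    signatures: list[Signature] = field(default_factory=list)
    silent: bool = False                               # mirrors `silent: true`
    excludes: list[str] = field(default_factory=list)  # hypothetical exclusion targets


def lint(playbooks: list[Playbook]) -> list[str]:
    """Return lint errors; an empty list means the index is clean."""
    errors = []
    names = {p.name for p in playbooks}
    claims = defaultdict(list)  # (kind, value) -> [(playbook name, note)]

    for p in playbooks:
        # Rule 1: every playbook is routable (has signatures) or explicitly silent.
        if not p.signatures and not p.silent:
            errors.append(f"{p.name}: no signatures and not marked silent")
        # Rule 3: exclusion targets must exist.
        for target in p.excludes:
            if target not in names:
                errors.append(f"{p.name}: exclusion target {target!r} does not exist")
        for s in p.signatures:
            claims[(s.kind, s.value)].append((p.name, s.note))

    # Rule 2: duplicate (kind, value) claims require discriminating notes.
    for (kind, value), owners in claims.items():
        if len(owners) > 1:
            for name, note in owners:
                if not note.strip():
                    errors.append(
                        f"{name}: duplicate claim ({kind}, {value}) lacks a discriminating note"
                    )
    return errors


# Hypothetical sample: "a" shares a claim without a note, "c" is unroutable
# and points at a playbook that does not exist.
playbooks = [
    Playbook("a", [Signature("error_code", "E1")]),
    Playbook("b", [Signature("error_code", "E1", note="only when the queue is empty")]),
    Playbook("c", excludes=["missing"]),
]
for error in lint(playbooks):
    print(error)
```

Returning a list of messages rather than raising on the first problem lets a PR gate report every violation in one run.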