fix(uipath-admin): realistic audit prompts + whole-day export / LTS-schema CSV guidance#1831
fix(uipath-admin): realistic audit prompts + whole-day export / LTS-schema CSV guidance#1831yadvender-uipath wants to merge 4 commits into
Conversation
…chema CSV guidance
Rework the uipath-admin audit skill docs and their coder-eval tasks so the
tests measure the skill's guidance rather than a hand-scripted prompt.
Skill docs (SKILL.md, references/audit-commands.md, audit-workflow-guide.md):
- export bounds are whole UTC days, inclusive on both ends — distinct from
the events `--to-date` next-day trick (new anti-pattern rule); a next-day
--to-date on export over-exports a full day.
- add the LTS-lag caveat (long-term store trails live events by ~24-48h;
recent trailing days may come back empty).
- document LTS-schema CSV column ordering and distinguish org/tenant audit
export from `uip or audit-logs`.
- non-interactive `uip login` handling (never launch the browser flow in an
automated session; env-var sessions are already authenticated).
- resolve relative windows ("yesterday", "last 7 days") against the real UTC
clock and echo the resolved bounds; don't re-confirm mid-sequence when the
user already asked for the follow-on steps.
Tasks (16 audit_*.yaml): replace scripted "Important: do NOT ..." prompts with
minimal customer-style asks; align criteria and date windows accordingly.
Test: all 16 uipath-admin audit tasks pass locally via coder-eval 0.8.0
(experiments/default.yaml, tempdir) against a live tenant — 10 smoke +
3 real-artifact e2e + 3 command-pattern e2e, each score 1.0.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Claude finished @yadvender-uipath's task in 2m 45s —— View job PR Review: fix(uipath-admin): realistic audit prompts + whole-day export / LTS-schema CSV guidance
SummaryReplaces 16 heavily-scripted audit test prompts with minimal, customer-style asks so the tests measure what the skill teaches rather than what the prompt hand-holds. Simultaneously tightens the skill docs around whole-day export bounds vs. the events next-day trick, LTS lag caveats, LTS-schema CSV columns, non-interactive login safety, and relative date resolution. Change-by-Change Review1.
|
| Area | Status | Notes |
|---|---|---|
| Frontmatter | OK | No frontmatter fields changed |
| E2E Tests | OK | 10 smoke + 6 e2e; all criteria preserved; prompts properly minimal |
| Skill Body | OK | New rules and anti-patterns are accurate and properly cross-referenced |
| References & Assets | OK | Examples, flags tables, and gotchas all internally consistent |
| Repo Hygiene | OK | Changes scoped to uipath-admin; no secrets, no cross-skill deps |
Issues for Manual Review
- Domain accuracy of whole-day export semantics: the PR states that
export --to-dateis whole-day inclusive (the server truncates times to the calendar day) and that theeventsnext-day trick over-exports a full day. This is a domain claim about the CLI/API behavior — the reviewer should verify this matches the shipped@uipath/cli1.197.0 behavior. - LTS lag estimate (~24–48h): cited in three places across the skill docs. If the actual lag differs materially, all three need updating.
Conclusion
Clean, well-scoped PR. The prompt simplification is the right approach — it shifts test fidelity from "can the agent follow a script" to "does the skill teach the agent correctly." The doc changes around export semantics, LTS lag, and CSV schema are internally consistent and well cross-referenced. One Low inconsistency in the org-export verify task description (stale LTS lag note wording). Recommend approve after addressing or acknowledging the Low.
|
Claude finished @yadvender-uipath's task in 2m 26s —— View job Coder-eval task lint (advisory)16 task YAMLs changed; verdict counts: 0 Critical, 0 High, 0 Medium, 0 Low, 16 OK. Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge. Evidence of passing run
Per-task lint
|
…ever `uip or audit-logs`
CI (Bedrock backend) surfaced that generic audit prompts ("audit logs",
"export the audit trail", "login history") were routed to
`uip or audit-logs` (Orchestrator-operational audit, wrong schema — its
--export even returns a ZIP) instead of `uip admin audit <scope>`. Local
runs (anthropic_direct) masked it; the surface choice was backend-dependent.
Strengthen the skill's surface routing so the boundary is unambiguous:
- Rule 1: uipath-admin owns org/tenant audit via `uip admin audit <scope>`;
never `uip or audit-logs` for audit events/history/export/login-history/
who-did-what.
- "When to Use > Audit" + workflow-guide: same boundary as a prominent
callout; and when asked WHAT events/sources are visible, discover live by
running `audit <scope> sources` rather than answering from memory (fixes
the sources-smoke conceptual-answer failure).
- Rule 26: reinforce discover-by-execution.
Skill-doc only; no task/prompt changes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ver uipath-platform
CI (Bedrock) showed a perfect correlation: every audit task that loaded
uipath-admin passed (used `uip admin audit`); every one that loaded
uipath-platform failed (used `uip or audit-logs`). Root cause is skill
*selection*, not in-skill guidance — and it's non-deterministic run to run.
Strengthen the uipath-admin `description` so generic audit phrasings route
here: name the trigger words ('audit logs', 'audit trail', 'audit events',
'export the audit', login history, 'who did what') and assert the boundary
— org/tenant/identity audit is `uip admin audit`, not `uip or audit-logs`.
uipath-admin description only (per scope); uipath-platform untouched.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Root cause of the CI flakiness (Bedrock): audit prompts trimmed to pure customer phrasing lost the signal that pins skill selection, so the agent sometimes loaded uipath-platform and used `uip or audit-logs` (Orchestrator audit) instead of `uip admin audit`. Perfect correlation across 4 runs: pass ⟺ uipath-admin loaded; fail ⟺ uipath-platform loaded. Restore a light, realistic surface anchor to each of the 16 audit prompts — "Admin audit" / "UiPath Admin" / "Admin audit trail" — which maps to the `uip admin audit` product surface and disambiguates it from Orchestrator's audit logs, without re-scripting the steps. scope-ambiguity stays scope- ambiguous (org vs tenant) while pinning the surface. Pairs with the description sharpening in a196bc9. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What & why
Reworks the uipath-admin audit skill docs and their coder-eval tasks so the tests measure what the skill teaches, not a hand-scripted prompt. The audit task prompts were heavily scripted ("Important: do NOT retry, do NOT login, run each command exactly once…"), which masked whether the skill itself guides the agent correctly. This replaces them with minimal, customer-style asks and tightens the supporting guidance — especially around export semantics and the LTS-schema CSV.
Changes
Skill docs (
SKILL.md,references/audit-commands.md,references/audit-workflow-guide.md)audit <scope> exportbounds are whole UTC days, inclusive on both ends, distinct from theevents --to-date"next-day" trick (added as an explicit anti-pattern rule). A next-day--to-dateon export over-exports a full day.eventsby ~24–48h; windows including today/yesterday can return empty trailing days.uip or audit-logs(uipath-platform).uip loginbrowser flow in an automated session; env-var sessions are already authenticated.Tasks — 16
audit_*.yamlprompts trimmed to minimal customer-style asks; criteria and date windows aligned. (Net −220/+80.)Test evidence
All 16 uipath-admin audit tasks pass locally via coder-eval 0.8.0,
experiments/default.yaml(tempdir), against a live tenant:audit_export_verify_e2e,audit_export_csv_verify_e2e,audit_org_export_verify_e2eaudit_export_e2e,audit_login_history_e2e,audit_who_did_x_e2eNotes for reviewers reproducing locally on an authenticated dev box (Windows):
initializehandshake 60s timeout during sandbox spin-up (Defender scanning the fresh venv). SetCLAUDE_CODE_STREAM_CLOSE_TIMEOUT=300000to avoid it. Purely a local-env timing issue.audit_export_e2eis designed for CI's unauthenticated env (commands fail fast → command-surface graded). On an authenticated box the real 7-day LTS export exceeds itsturn_timeout: 600; run it with-D run_limits.turn_timeout=1800 -D run_limits.task_timeout=3600to let the live export complete. No change to the committed task is needed — it passes as-is in CI.🤖 Generated with Claude Code