fix(uipath-admin): realistic audit prompts + whole-day export / LTS-schema CSV guidance #1831

Open: yadvender-uipath wants to merge 4 commits into main from fix/audit-csv-verify-e2e-prompt

@yadvender-uipath

What & why

Reworks the uipath-admin audit skill docs and their coder-eval tasks so the tests measure what the skill teaches, not a hand-scripted prompt. The audit task prompts were heavily scripted ("Important: do NOT retry, do NOT login, run each command exactly once…"), which masked whether the skill itself guides the agent correctly. This replaces them with minimal, customer-style asks and tightens the supporting guidance — especially around export semantics and the LTS-schema CSV.

Changes

Skill docs (SKILL.md, references/audit-commands.md, references/audit-workflow-guide.md)

  • Whole-day export bounds: `audit <scope> export` bounds are whole UTC days, inclusive on both ends, unlike the events `--to-date` "next-day" trick (carrying that trick over to export is now an explicit anti-pattern rule). A next-day `--to-date` on export over-exports a full day.
  • LTS-lag caveat: the long-term store (LTS) trails live events by ~24–48h, so windows that include today or yesterday can return empty trailing days.
  • LTS-schema CSV: document the CSV column ordering and disambiguate org/tenant audit export from `uip or audit-logs` (uipath-platform).
  • Non-interactive login: never launch the interactive `uip login` browser flow in an automated session; env-var sessions are already authenticated.
  • Relative dates: resolve "yesterday" / "last 7 days" against the real UTC clock and echo the resolved window; don't re-confirm mid-sequence when the user already asked for the follow-on steps.
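To make the two date rules concrete, here is a minimal Python sketch of resolving a relative phrase against the UTC clock and turning a window into bounds for each command. The function names and the January 2026 start date are illustrative assumptions, not the skill's implementation. Only the whole-day-inclusive export rule and the next-day `--to-date` trick for events come from the description above.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional, Tuple


def utc_today(now: Optional[datetime] = None) -> date:
    """Current calendar day in UTC; `now` must be timezone-aware if given."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).date()


def yesterday_utc(now: Optional[datetime] = None) -> Tuple[date, date]:
    """Resolve "yesterday" to a single whole UTC day (first == last)."""
    day = utc_today(now) - timedelta(days=1)
    return day, day


def export_bounds(first_day: date, last_day: date) -> Tuple[str, str]:
    """Export: whole UTC days, inclusive on both ends, so pass the days as-is."""
    return first_day.isoformat(), last_day.isoformat()


def events_bounds(first_day: date, last_day: date) -> Tuple[str, str]:
    """Events: the next-day trick, so --to-date is the day after last_day."""
    return first_day.isoformat(), (last_day + timedelta(days=1)).isoformat()


january = (date(2026, 1, 1), date(2026, 1, 31))
print(export_bounds(*january))  # ('2026-01-01', '2026-01-31')
print(events_bounds(*january))  # ('2026-01-01', '2026-02-01')
```

Passing the events-style `2026-02-01` to export would pull in all of 1 February as well, which is the over-export the anti-pattern rule guards against.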

Tasks — 16 audit_*.yaml prompts trimmed to minimal customer-style asks; criteria and date windows aligned. (Net −220/+80.)

Note: the stale UiPath/cli#2585 blocker referenced in the export tasks has shipped — @uipath/cli 1.197.0 creates the uniquely-named audit_<from>_<to>_<generatedAt> output as expected.

Test evidence

All 16 uipath-admin audit tasks pass locally via coder-eval 0.8.0, experiments/default.yaml (tempdir), against a live tenant:

| Tier | Tasks | Result |
| --- | --- | --- |
| smoke | 10 | ✅ 10/10 (score 1.0) |
| e2e real-artifact | audit_export_verify_e2e, audit_export_csv_verify_e2e, audit_org_export_verify_e2e | ✅ 3/3 (produce + verify real export files; CSV header carries LTS-schema columns) |
| e2e command-pattern | audit_export_e2e, audit_login_history_e2e, audit_who_did_x_e2e | ✅ 3/3 |

Notes for reviewers reproducing locally on an authenticated dev box (Windows):

  • A few runs hit the claude-agent-sdk initialize handshake 60s timeout during sandbox spin-up (Defender scanning the fresh venv). Set CLAUDE_CODE_STREAM_CLOSE_TIMEOUT=300000 to avoid it. Purely a local-env timing issue.
  • audit_export_e2e is designed for CI's unauthenticated env (commands fail fast → command-surface graded). On an authenticated box the real 7-day LTS export exceeds its turn_timeout: 600; run it with -D run_limits.turn_timeout=1800 -D run_limits.task_timeout=3600 to let the live export complete. No change to the committed task is needed — it passes as-is in CI.
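For reviewers who script their local runs, the two workarounds above could be assembled roughly as follows. This is a sketch only: the coder-eval invocation itself is not shown in this thread, so `base_cmd` is a placeholder the caller must supply, and both helper names are hypothetical.

```python
import os
from typing import Dict, List, Optional


def reviewer_env(base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and raise the stream-close timeout to 300000."""
    env = dict(os.environ if base_env is None else base_env)
    env["CLAUDE_CODE_STREAM_CLOSE_TIMEOUT"] = "300000"
    return env


def with_overrides(base_cmd: List[str], overrides: Dict[str, int]) -> List[str]:
    """Append one `-D key=value` pair per override to a copy of base_cmd."""
    cmd = list(base_cmd)
    for key, value in overrides.items():
        cmd += ["-D", f"{key}={value}"]
    return cmd


# Placeholder command: substitute the real coder-eval invocation here.
cmd = with_overrides(
    ["<coder-eval invocation>"],
    {"run_limits.turn_timeout": 1800, "run_limits.task_timeout": 3600},
)
print(cmd)
```

The resulting list and environment can then be handed to `subprocess.run(cmd, env=reviewer_env())` once the placeholder is replaced.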

🤖 Generated with Claude Code

…chema CSV guidance

Rework the uipath-admin audit skill docs and their coder-eval tasks so the
tests measure the skill's guidance rather than a hand-scripted prompt.

Skill docs (SKILL.md, references/audit-commands.md, audit-workflow-guide.md):
- export bounds are whole UTC days, inclusive on both ends — distinct from
  the events `--to-date` next-day trick (new anti-pattern rule); a next-day
  --to-date on export over-exports a full day.
- add the LTS-lag caveat (long-term store trails live events by ~24-48h;
  recent trailing days may come back empty).
- document LTS-schema CSV column ordering and distinguish org/tenant audit
  export from `uip or audit-logs`.
- non-interactive `uip login` handling (never launch the browser flow in an
  automated session; env-var sessions are already authenticated).
- resolve relative windows ("yesterday", "last 7 days") against the real UTC
  clock and echo the resolved bounds; don't re-confirm mid-sequence when the
  user already asked for the follow-on steps.

Tasks (16 audit_*.yaml): replace scripted "Important: do NOT ..." prompts with
minimal customer-style asks; align criteria and date windows accordingly.

Test: all 16 uipath-admin audit tasks pass locally via coder-eval 0.8.0
(experiments/default.yaml, tempdir) against a live tenant — 10 smoke +
3 real-artifact e2e + 3 command-pattern e2e, each score 1.0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jul 2, 2026

Claude finished @yadvender-uipath's task in 2m 45s


PR Review: fix(uipath-admin): realistic audit prompts + whole-day export / LTS-schema CSV guidance

  • Read review criteria and project conventions
  • Analyze full diff
  • Read changed skill files in full
  • Read changed test files
  • Check for cross-skill references, broken links, secrets
  • Post review findings

Summary

Replaces 16 heavily-scripted audit test prompts with minimal, customer-style asks so the tests measure what the skill teaches rather than what the prompt hand-holds. Simultaneously tightens the skill docs around whole-day export bounds vs. the events next-day trick, LTS lag caveats, LTS-schema CSV columns, non-interactive login safety, and relative date resolution.

Change-by-Change Review

1. skills/uipath-admin/SKILL.md — Critical Rules 2, 27, 30, Anti-pattern 14

Severity: OK

  • Rule 2: non-interactive login guidance is well-placed. Correctly tells the agent to stop and ask rather than launching uip login itself.
  • Rule 27: extended with the events-only next-day caveat and relative-date resolution (date -u). Clean addition without bloating the rule.
  • Rule 30: rewritten to clearly state whole-day inclusive semantics and distinguish from events. The confirmation-path logic ("if the user named a folder, pass it verbatim; confirm only when you picked a default") is a good UX refinement.
  • Anti-pattern 14: correctly cross-references Rule 30 and prevents the most likely mistake (carrying the events trick to export).

2. skills/uipath-admin/references/audit-commands.md — export section

Severity: OK

  • Examples updated from --to-date 2026-02-01 to --to-date 2026-01-31 consistently across all three code blocks and both output examples.
  • New "single day" example with date -u -d 'yesterday' is useful and includes the macOS/BSD variant.
  • LTS-schema disambiguation callout (blockquote at line 143) correctly delineates this surface from `uip or audit-logs list --export`; this is routing guidance, not a cross-skill dependency.
  • Flags table now explicitly documents whole-day semantics for --from-date and --to-date.
  • LTS lag caveat at line 217 is well-placed as the last implementation note.

3. skills/uipath-admin/references/audit-workflow-guide.md — export workflow + gotchas

Severity: OK

  • Relative-date resolution paragraph (lines 183–184) added before Step 2 — good placement.
  • Confirmation-path logic for --output-path (line 185) matches the SKILL.md Rule 30 wording.
  • Export examples updated consistently (2026-01-31).
  • Folder layout updated (line 227).
  • Three new gotcha bullets: export whole-day semantics, LTS lag, and relative date resolution — all accurate and consistent with the reference doc.
  • Output Etiquette item 6 (line 324): the "finish the full requested sequence" clause is a good refinement — read-only queries shouldn't pause mid-chain to re-confirm.

4. Test task YAMLs — 16 audit prompts simplified

Severity: OK

All 16 test files follow the same pattern: the initial_prompt is stripped to a natural, minimal customer ask. The description, tags, success_criteria, and run_limits are preserved unchanged. Spot-checks:

  • Prompt → criteria alignment: every prompt still provides the folder name that the command_pattern asserts (e.g., "audit-yesterday", "audit-window", "audit-last-7d", "audit-org-yesterday", "audit-org-window"). ✓
  • Scope signals: org-scope tasks say "organization-level" in the prompt; tenant-scope tasks say "tenant" or imply it. ✓
  • Format signals: CSV tasks say "CSV" or "Excel"; JSON tasks say "JSON files" or omit format (default). ✓
  • audit_scope_ambiguity_smoke.yaml: the removed hint said "if unsure about scope, ask to clarify — that is the correct behavior." Now the test relies entirely on the skill teaching Rule 23. This is exactly the PR's stated goal. ✓
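The prompt → criteria alignment spot-check can be pictured as a small lint. The sketch below assumes each task has been reduced to a (prompt, expected folder) pair; the prompts shown are invented for illustration and are not the actual task text, and `missing_folder_mentions` is a hypothetical helper, not part of coder-eval.

```python
from typing import Dict, List, Tuple


def missing_folder_mentions(tasks: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the names of tasks whose prompt never mentions the folder
    that the task's command pattern is expected to assert."""
    return [
        name
        for name, (prompt, folder) in tasks.items()
        if folder not in prompt
    ]


# Invented prompts; only the folder names come from the review above.
tasks = {
    "aligned": ("Export yesterday's audit events into audit-yesterday.", "audit-yesterday"),
    "misaligned": ("Export last week's audit events.", "audit-last-7d"),
}
print(missing_folder_mentions(tasks))  # ['misaligned']
```

An empty result corresponds to the ✓ recorded for the first spot-check.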

5. tests/tasks/uipath-admin/audit_org_export_verify_e2e.yaml — LTS lag note inconsistency

Severity: Low

The description block (line 22) still says "We pin the window to dates at least 48h in the past so LTS has caught up" while the sibling tasks (audit_export_verify_e2e.yaml, audit_export_csv_verify_e2e.yaml) were updated to "The customer-style 'past week' prompt therefore includes trailing days that may come back empty…". The new prompt asks for "2 days ago — just that one day", so the old wording is technically accurate (it IS 48h+ back), but it's inconsistent with the updated sibling descriptions and no longer reflects the style shift to customer-style prompts.

What's Missing

Nothing identified. All relative links resolve. No secrets or personal paths. CODEOWNERS is not affected (no new skill paths). No cross-skill file dependencies introduced.

Area Ratings

| Area | Status | Notes |
| --- | --- | --- |
| Frontmatter | OK | No frontmatter fields changed |
| E2E Tests | OK | 10 smoke + 6 e2e; all criteria preserved; prompts properly minimal |
| Skill Body | OK | New rules and anti-patterns are accurate and properly cross-referenced |
| References & Assets | OK | Examples, flags tables, and gotchas all internally consistent |
| Repo Hygiene | OK | Changes scoped to uipath-admin; no secrets, no cross-skill deps |

Issues for Manual Review

  • Domain accuracy of whole-day export semantics: the PR states that export --to-date is whole-day inclusive (the server truncates times to the calendar day) and that the events next-day trick over-exports a full day. This is a domain claim about the CLI/API behavior — the reviewer should verify this matches the shipped @uipath/cli 1.197.0 behavior.
  • LTS lag estimate (~24–48h): cited in three places across the skill docs. If the actual lag differs materially, all three need updating.
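Because the lag figure is an estimate, it may help to treat it as a parameter when reasoning about which days in a window could come back empty. The following sketch flags every whole UTC day whose end is more recent than the assumed lag. The function name and the rule itself are assumptions made for illustration, not documented LTS behavior.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import List


def days_possibly_lagging(
    first_day: date, last_day: date, now: datetime, lag_hours: int
) -> List[date]:
    """Days in [first_day, last_day] that ended less than lag_hours before
    `now` (or have not ended yet). `now` must be timezone-aware."""
    cutoff = now.astimezone(timezone.utc) - timedelta(hours=lag_hours)
    flagged = []
    day = first_day
    while day <= last_day:
        # A whole UTC day ends at 00:00 UTC of the following day.
        end_of_day = datetime.combine(
            day + timedelta(days=1), time.min, tzinfo=timezone.utc
        )
        if end_of_day > cutoff:
            flagged.append(day)
        day += timedelta(days=1)
    return flagged


now = datetime(2026, 7, 2, 14, 30, tzinfo=timezone.utc)
window = (date(2026, 6, 26), date(2026, 7, 2))
print(days_possibly_lagging(*window, now=now, lag_hours=24))
# [datetime.date(2026, 7, 1), datetime.date(2026, 7, 2)]
```

With the lower estimate of 24 hours this flags today and yesterday, matching the caveat in the skill docs; a larger `lag_hours` flags more trailing days.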

Conclusion

Clean, well-scoped PR. The prompt simplification is the right approach — it shifts test fidelity from "can the agent follow a script" to "does the skill teach the agent correctly." The doc changes around export semantics, LTS lag, and CSV schema are internally consistent and well cross-referenced. One Low inconsistency in the org-export verify task description (stale LTS lag note wording). Recommend approve after addressing or acknowledging the Low.


@github-actions

github-actions Bot commented Jul 2, 2026

Claude finished @yadvender-uipath's task in 2m 26s


Coder-eval task lint (advisory)

16 task YAMLs changed; verdict counts: 0 Critical, 0 High, 0 Medium, 0 Low, 16 OK.

Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.

Evidence of passing run

  • ✅ Author confirms passing run: "All 16 uipath-admin audit tasks pass locally via coder-eval 0.8.0, experiments/default.yaml (tempdir), against a live tenant" — with a tier-by-tier table (10 smoke ✅, 3 real-artifact e2e ✅, 3 command-pattern e2e ✅).

Per-task lint

tests/tasks/uipath-admin/audit_events_basic_smoke.yaml — verdict: OK

tests/tasks/uipath-admin/audit_events_pagination_smoke.yaml — verdict: OK

tests/tasks/uipath-admin/audit_export_basic_smoke.yaml — verdict: OK

tests/tasks/uipath-admin/audit_export_csv_smoke.yaml — verdict: OK

tests/tasks/uipath-admin/audit_export_csv_verify_e2e.yaml — verdict: OK

tests/tasks/uipath-admin/audit_export_e2e.yaml — verdict: OK

tests/tasks/uipath-admin/audit_export_verify_e2e.yaml — verdict: OK

tests/tasks/uipath-admin/audit_login_history_e2e.yaml — verdict: OK

tests/tasks/uipath-admin/audit_org_events_smoke.yaml — verdict: OK

tests/tasks/uipath-admin/audit_org_export_csv_smoke.yaml — verdict: OK

tests/tasks/uipath-admin/audit_org_export_smoke.yaml — verdict: OK

tests/tasks/uipath-admin/audit_org_export_verify_e2e.yaml — verdict: OK

tests/tasks/uipath-admin/audit_scope_ambiguity_smoke.yaml — verdict: OK

tests/tasks/uipath-admin/audit_sources_smoke.yaml — verdict: OK

tests/tasks/uipath-admin/audit_status_filter_smoke.yaml — verdict: OK

tests/tasks/uipath-admin/audit_who_did_x_e2e.yaml — verdict: OK

Within-PR duplicates

No duplicate clusters detected. Task pairs that share scaffold structure (e.g. tenant-export-json vs org-export-json, tenant-export-csv vs org-export-csv) exercise materially distinct scope code paths (/tenantaudit_ vs /orgaudit_ basePath, different event sets) — scaffold reuse, not duplication.

Conclusion

✅ All 16 changed tasks pass the rubric. Evidence of passing run confirmed. The prompt trimming consistently replaces the heavy-handed Important: do NOT retry… boilerplate with minimal, customer-style asks — exactly the change described in the PR body.

yadvender-uipath and others added 3 commits July 2, 2026 14:30
…ever `uip or audit-logs`

CI (Bedrock backend) surfaced that generic audit prompts ("audit logs",
"export the audit trail", "login history") were routed to
`uip or audit-logs` (Orchestrator-operational audit, wrong schema — its
--export even returns a ZIP) instead of `uip admin audit <scope>`. Local
runs (anthropic_direct) masked it; the surface choice was backend-dependent.

Strengthen the skill's surface routing so the boundary is unambiguous:
- Rule 1: uipath-admin owns org/tenant audit via `uip admin audit <scope>`;
  never `uip or audit-logs` for audit events/history/export/login-history/
  who-did-what.
- "When to Use > Audit" + workflow-guide: same boundary as a prominent
  callout; and when asked WHAT events/sources are visible, discover live by
  running `audit <scope> sources` rather than answering from memory (fixes
  the sources-smoke conceptual-answer failure).
- Rule 26: reinforce discover-by-execution.

Skill-doc only; no task/prompt changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ver uipath-platform

CI (Bedrock) showed a perfect correlation: every audit task that loaded
uipath-admin passed (used `uip admin audit`); every one that loaded
uipath-platform failed (used `uip or audit-logs`). Root cause is skill
*selection*, not in-skill guidance — and it's non-deterministic run to run.

Strengthen the uipath-admin `description` so generic audit phrasings route
here: name the trigger words ('audit logs', 'audit trail', 'audit events',
'export the audit', login history, 'who did what') and assert the boundary
— org/tenant/identity audit is `uip admin audit`, not `uip or audit-logs`.

uipath-admin description only (per scope); uipath-platform untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Root cause of the CI flakiness (Bedrock): audit prompts trimmed to pure
customer phrasing lost the signal that pins skill selection, so the agent
sometimes loaded uipath-platform and used `uip or audit-logs` (Orchestrator
audit) instead of `uip admin audit`. Perfect correlation across 4 runs:
pass ⟺ uipath-admin loaded; fail ⟺ uipath-platform loaded.

Restore a light, realistic surface anchor to each of the 16 audit prompts —
"Admin audit" / "UiPath Admin" / "Admin audit trail" — which maps to the
`uip admin audit` product surface and disambiguates it from Orchestrator's
audit logs, without re-scripting the steps. scope-ambiguity stays scope-
ambiguous (org vs tenant) while pinning the surface.

Pairs with the description sharpening in a196bc9.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
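The routing failure described in these commits comes down to which command surface the agent ended up invoking. A minimal sketch of classifying a transcript's commands by surface is shown below; the helper names and the pass rule are simplifications assumed for illustration, not the graders' actual criteria. The command strings are the ones quoted in this thread, with `<scope>` left as a literal placeholder.

```python
import re
from typing import List, Optional

# The two audit surfaces named in the commit messages.
ADMIN_AUDIT = re.compile(r"\buip\s+admin\s+audit\b")
ORCH_AUDIT = re.compile(r"\buip\s+or\s+audit-logs\b")


def audit_surface(command: str) -> Optional[str]:
    """Classify a shell command as 'admin', 'orchestrator', or None."""
    if ADMIN_AUDIT.search(command):
        return "admin"
    if ORCH_AUDIT.search(command):
        return "orchestrator"
    return None


def routing_ok(commands: List[str]) -> bool:
    """Assumed rule: at least one admin-audit call and no Orchestrator one."""
    surfaces = [audit_surface(c) for c in commands]
    return "admin" in surfaces and "orchestrator" not in surfaces


print(audit_surface("uip admin audit <scope> sources"))   # admin
print(audit_surface("uip or audit-logs list --export"))   # orchestrator
print(routing_ok(["date -u", "uip admin audit <scope> export"]))  # True
```

Running a check like this over each CI transcript would surface the same pass/fail split the commit message attributes to skill selection.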