
Audit Agent Experience Skill #79

Open
jay-sahnan wants to merge 4 commits into main from audit-agent-experience

Conversation

@jay-sahnan (Contributor) commented Apr 25, 2026

Spawns parallel Claude subagents against a target docs/SDK/SKILL.md from a one-sentence prompt, captures structured traces, and renders a graded HTML report scoring Setup Friction, Speed, Efficiency, Error Recovery, and Doc Quality. Includes narrative cross-agent review to surface convergent hallucinations and silent workarounds the JSON self-report misses.
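For orientation, a minimal sketch of the per-agent JSON trace this workflow implies. Only the onboarding_status vocabulary is taken from the schema discussed in the review comments below; every other field name here is an illustrative assumption, not the skill's actual schema:

```ts
// Hypothetical per-agent trace shape. Only the onboarding_status values
// come from the skill's schema (see the Bugbot comments below); all other
// field names are assumptions for illustration.
interface AgentTrace {
  agentId: string;
  persona: string; // e.g. "Skeptical", one of the prompt variants
  onboardingStatus:
    | "completed"
    | "partial"
    | "stuck"
    | "blocked-on-credentials"
    | "errored"; // errored covers trace parse failures
  toolCalls: number;          // feeds the Efficiency dimension
  completedSubtasks: number;  // numerator of the Efficiency ratio
  wallTimeSeconds: number;    // feeds the Speed dimension
  confusions: string[];       // doc passages the agent flagged
}
```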


Note

Medium Risk
Adds a new skill that orchestrates parallel subagents, optional shell execution, and credential auto-discovery guidance; mis-specification could lead to unsafe tool usage or accidental secret exposure if implemented incorrectly.

Overview
Adds a new audit-agent-experience skill definition (SKILL.md) that specifies an end-to-end workflow for benchmarking agent onboarding against a target docs/SDK/SKILL.md using multiple parallel subagents, minimal task prompts, and structured trace capture/scoring across DX dimensions.

Includes supporting reference docs (references/*.md) for prompt variants, subagent brief + JSON trace schema, and an evaluation rubric with score caps and cross-agent narrative review guidance, plus an assets/report-template.html for rendering the final graded HTML report and an MIT LICENSE.
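As a rough illustration of how the report template's placeholders might be filled (the {{COMPLETED_COUNT}}-style tokens appear in the template excerpt quoted later in this thread; the renderer itself is an assumption, not code from this PR):

```ts
// Hypothetical renderer: substitute {{PLACEHOLDER}} tokens with computed
// values. Placeholder names match the template excerpt below; the helper
// is illustrative only.
const templateSource =
  '<div class="stat"><div class="label">Completed</div><div class="value ok">{{COMPLETED_COUNT}}</div></div>';

function renderReport(template: string, values: Record<string, string | number>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    key in values ? String(values[key]) : match, // leave unknown tokens visible
  );
}

console.log(renderReport(templateSource, { COMPLETED_COUNT: 3 }));
// <div class="stat"><div class="label">Completed</div><div class="value ok">3</div></div>
```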

Reviewed by Cursor Bugbot for commit 485b946. Bugbot is set up for automated code reviews on this repo.

jay-sahnan and others added 2 commits April 25, 2026 09:07
Spawns parallel Claude subagents against a target docs/SDK/SKILL.md from a
one-sentence prompt, captures structured traces, and renders a graded HTML
report scoring Setup Friction, Speed, Efficiency, Error Recovery, and Doc
Quality. Includes narrative cross-agent review to surface convergent
hallucinations and silent workarounds the JSON self-report misses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inline comment threads:
- skills/audit-agent-experience/assets/report-template.html (2 threads)
- skills/audit-agent-experience/references/prompt-variants.md
- skills/audit-agent-experience/SKILL.md (3 threads, outdated)

@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 4 potential issues.


<div class="stat"><div class="label">Completed</div><div class="value ok">{{COMPLETED_COUNT}}</div></div>
<div class="stat"><div class="label">Stuck</div><div class="value warn">{{STUCK_COUNT}}</div></div>
<div class="stat"><div class="label">Errored</div><div class="value bad">{{ERRORED_COUNT}}</div></div>
</div>

Stat grid omits partial and blocked-on-credentials statuses

Medium Severity

The stat grid only defines counters for Completed, Stuck, and Errored, but the `onboarding_status` schema supports five values: `completed`, `partial`, `stuck`, `blocked-on-credentials`, plus `errored` for parse failures. Agents ending in `partial` or `blocked-on-credentials` status won't be reflected in any status-specific counter, so the three sub-counts won't sum to `{{AGENT_COUNT}}`, producing a confusing report.
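A hedged sketch of the fix: derive one counter per schema status so the sub-counts always sum to the agent total. The five status names are from the schema; the trace shape and helper are assumptions:

```ts
// Count traces per onboarding_status. Enumerating all five schema values
// guarantees the per-status counters sum to {{AGENT_COUNT}}.
const STATUSES = ["completed", "partial", "stuck", "blocked-on-credentials", "errored"] as const;
type Status = (typeof STATUSES)[number];

function countByStatus(traces: { onboardingStatus: Status }[]): Record<Status, number> {
  const counts = Object.fromEntries(STATUSES.map((s) => [s, 0])) as Record<Status, number>;
  for (const t of traces) counts[t.onboardingStatus] += 1;
  return counts;
}
```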


```css
.agent-card .body .confusion .section-tag { font-family: 'Geist Mono', 'SF Mono', monospace; font-size: 0.75rem; color: var(--grade-f); font-weight: 500; }
.agent-card .body .confusion .issue { margin-top: 0.25rem; color: var(--text); }
.agent-card .body .positive { padding: 0.5rem 0.75rem; background: rgba(34,197,94,0.05); border: 1px solid rgba(34,197,94,0.2); border-radius: 3px; color: var(--text); }
.agent-card .body .suggestion { padding: 0.5rem 0.75rem; background: rgba(77,169,228,0.06); border: 1px solid rgba(77,169,228,0.2); border-radius: 3px; color: #2a7ab5; }
```

Unused agent-card CSS with mismatched status vocabulary

Low Severity

The `.agent-card` CSS block (~30 rules) is never referenced by any template placeholder or in SKILL.md — only `.trace-card` and `.agent-results-table` are used for per-agent rendering. Worse, it defines a `wrong-result` status class that doesn't exist in the `onboarding_status` schema, while missing `partial` and `blocked-on-credentials` classes that the trace-card and status-pill CSS correctly include. This dead CSS with a stale status vocabulary could mislead the LLM into using incorrect class names.
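One way to keep the stylesheet's class names in lockstep with the schema, sketched under the assumption that classes are derived per status (only the status vocabulary is from the schema; the mapping is illustrative):

```ts
// Derive class names from the schema's status vocabulary so the CSS can
// never drift to undefined classes like "wrong-result". Illustrative only.
type Status = "completed" | "partial" | "stuck" | "blocked-on-credentials" | "errored";

const statusClass: Record<Status, string> = {
  completed: "status-completed",
  partial: "status-partial",
  stuck: "status-stuck",
  "blocked-on-credentials": "status-blocked-on-credentials",
  errored: "status-errored",
};
```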


```md
Behavioural hint: reads end-to-end before coding. Surfaces ambiguity. Catches docs that don't survive a close read.

### Skeptical
> Follow — note anything in the docs that seems wrong or unclear as you go while following
```

Skeptical prefix doesn't compose with prompt template

Medium Severity

The Skeptical persona prefix ends with "while following", so applying the template `{persona_prefix} {product}'s getting-started guide…` produces "Follow — note anything… while following Acme's getting-started guide…" — an awkward double-verb sentence. The worked example on line 72 restructures the clause order entirely, placing "note anything…" after the product name instead of before it. The template and example are incompatible, creating ambiguity about which prompt format to generate.
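To make the composition failure concrete, a small sketch: the prefix and template are quoted from this comment, while the helper and the product name "Acme" are illustrative:

```ts
// Compose the prompt exactly as the template specifies; the output shows
// the double-verb sentence the review describes. "Acme" is a placeholder.
const skepticalPrefix =
  "Follow — note anything in the docs that seems wrong or unclear as you go while following";

function buildPrompt(personaPrefix: string, product: string): string {
  return `${personaPrefix} ${product}'s getting-started guide…`;
}

console.log(buildPrompt(skepticalPrefix, "Acme"));
// "Follow — note anything in the docs that seems wrong or unclear as you
//  go while following Acme's getting-started guide…"
```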


```md
- **Speed (20%)** — total wall time, time-to-first-working-code.
- **Efficiency (20%)** — tool calls per passed goal item, wasted calls.
- **Error Recovery (15%)** — did errors block goal items, or did agents route around?
- **Doc Quality (20%)** — did docs supply what was needed to pass the checklist?
```

Scoring summaries reference nonexistent "checklist" and "goal items"

Medium Severity

The Step 7 dimension summaries reference "goal items" and "checklist" — concepts the entire skill explicitly forbids. Line 278 says "tool calls per passed goal item" and line 280 says "pass the checklist," but the actual evaluation-rubric.md uses the `completed_subtasks` / total `tool_calls` ratio for Efficiency and "Did the docs provide what agents needed?" for Doc Quality — no checklist anywhere. This stale language could cause the executing LLM to construct a scoring checklist, directly contradicting the core principle on lines 37 and 179.
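For reference, the rubric's Efficiency measure as this comment characterizes it, sketched with assumed field names (the ratio of completed_subtasks to total tool_calls is from the comment above; nothing here implies a checklist):

```ts
// Efficiency per evaluation-rubric.md as described in this review:
// completed subtasks over total tool calls. No goal items, no checklist.
function efficiencyRatio(completedSubtasks: number, toolCalls: number): number {
  return toolCalls === 0 ? 0 : completedSubtasks / toolCalls;
}
```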

