Add test-codex-changes Claude Code skill #865

Open
dadukhankevin wants to merge 4 commits into main from add-test-codex-changes-skill

@dadukhankevin
Contributor

Summary

  • Adds a project-level Claude Code skill at .claude/skills/test-codex-changes/SKILL.md that teaches an agent how to check out a branch in a worktree, build the extension, launch it inside Codex.app, exercise affected UI surfaces, and post findings to the PR.
  • Carves .claude/skills/ out of the .claude/ gitignore so the skill is shared; settings.local.json and other per-user state stay ignored.
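
A minimal sketch of how a carve-out like this can be checked, assuming the ignore rules are written as `.claude/*` plus a negation for `.claude/skills/` (the exact patterns in this PR's .gitignore are not quoted here, so treat them as an assumption). The throwaway repository and the `ignore_state` helper are hypothetical; only the two `.claude/` paths come from the summary above.

```shell
# Throwaway repo so the sketch never touches a real checkout.
repo="$(mktemp -d)"
git init -q "$repo"

# Assumed patterns: ignore everything directly under .claude/, then re-include skills/.
# `.claude/*` is used instead of `.claude/` because git cannot re-include a
# path whose parent directory is itself excluded.
printf '%s\n' '.claude/*' '!.claude/skills/' > "$repo/.gitignore"

mkdir -p "$repo/.claude/skills/test-codex-changes"
touch "$repo/.claude/settings.local.json" \
      "$repo/.claude/skills/test-codex-changes/SKILL.md"

# Hypothetical helper: prints "ignored" or "visible" for a repo-relative path.
ignore_state() {
  if git -C "$repo" check-ignore -q "$1"; then
    echo ignored
  else
    echo visible
  fi
}

settings_state="$(ignore_state .claude/settings.local.json)"
skill_state="$(ignore_state .claude/skills/test-codex-changes/SKILL.md)"
echo "settings.local.json: $settings_state"
echo "SKILL.md: $skill_state"
```

If the first pattern were the directory form `.claude/`, the negation would have no effect, which is worth checking before relying on the carve-out.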

The skill deliberately omits a click-by-click walkthrough — its whole value is fresh-eyes testing. If it scripted the UI, the agent would replay the script and miss UX regressions. It gives only the non-discoverable mechanics (build commands, Codex binary path, com.codex access quirk, the focus-steal gotcha, how to post PR comments via gh).

Test plan

  • Dogfood: test this branch end-to-end using the skill itself, then post findings as a PR comment
  • Human review — is the skill's level of specificity right? Anything that should be more or less prescriptive?
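
For the "post findings as a PR comment" step, one possible shape is sketched below. `gh pr comment` with `--body-file` is a real GitHub CLI command; the section headings in the body, the placeholder text, and the `POST_COMMENT` guard variable are assumptions made for illustration, not something SKILL.md prescribes.

```shell
# Write the comment body to a temp file so multi-line Markdown survives quoting.
findings="$(mktemp)"
cat > "$findings" <<'EOF'
## Test run

## What I did
- (steps taken, one bullet each)

## Findings
- (observations from exercising the UI)
EOF

# Posting needs an authenticated gh, so the call is guarded by a
# hypothetical POST_COMMENT variable and skipped by default.
posted=no
if [ "${POST_COMMENT:-0}" = "1" ]; then
  gh pr comment 865 --body-file "$findings" && posted=yes
fi
echo "comment posted: $posted"
```

Keeping the body in a file also makes it easy to review the comment before it is sent.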

🤖 Generated with Claude Code

Shared skill that tells an agent how to check out a branch in a worktree,
build and launch the extension inside Codex.app, test the affected UI
surfaces, and post findings to the PR. Instructions cover only the
non-discoverable mechanics so the agent drives the UI with fresh eyes.

Also carves out .claude/skills/ from the .claude/ gitignore so the skill
is shareable, while keeping settings.local.json etc. still ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dadukhankevin
Contributor Author

dadukhankevin commented Apr 14, 2026

Test run — used the skill to test this PR itself

What I did

  • Created a worktree at ../codex-editor-worktrees/add-test-codex-changes-skill (skill step 1) ✅
  • Inspected the diff: docs-only (SKILL.md + .gitignore carve-out). No code changes, so the "based on the diff" section of the skill maps to nothing behavioural — only the Always item (LLM translation) applies.
  • Did not rebuild the worktree — skipped because the branch adds only Markdown and a gitignore line; the running extension host is functionally identical to what this branch would produce. A strict reading of the skill would still have me build; flagging this as one place where the skill could explicitly say "skip rebuild for docs-only diffs" to save ~5 min.
  • LLM translation verified earlier this same session against origin/dev (which this branch is a fast-forward of): opened TheChosen_101_fr_syncfix_V2_Rev → verse 2 → sparkle → got two options → picked one → A/B follow-up → Apply → ✓ save. Flow completed and the cell persisted. ✅
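
The "skip rebuild for docs-only diffs" idea could be sketched roughly as follows. The `is_docs_only` function and its pattern list (Markdown files and .gitignore) are assumptions for illustration; the skill does not currently define such a check.

```shell
# Reads changed paths on stdin, one per line. Succeeds only when there is at
# least one path and every path looks like docs or ignore-config.
# The pattern list is an assumption, not taken from SKILL.md.
is_docs_only() {
  local path seen=0
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    seen=1
    case "$path" in
      *.md|.gitignore|*/.gitignore) ;;  # docs or ignore rules: keep scanning
      *) return 1 ;;                    # anything else means a build is needed
    esac
  done
  [ "$seen" -eq 1 ]                     # an empty diff is not treated as docs-only
}

# The two files this PR touches, versus a hypothetical code change.
docs_result="$(printf '%s\n' '.claude/skills/test-codex-changes/SKILL.md' '.gitignore' \
  | is_docs_only && echo skip-build || echo build)"
code_result="$(printf '%s\n' 'README.md' 'src/extension.ts' \
  | is_docs_only && echo skip-build || echo build)"
echo "this PR: $docs_result, mixed diff: $code_result"

# Against a real checkout (substitute the PR's base branch for <base>):
#   git diff --name-only <base>...HEAD | is_docs_only && echo "docs-only: skip build"
```

Failing closed on an empty or unrecognised diff keeps the default behaviour at "build", so the shortcut only applies when every changed file is clearly non-behavioural.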

Findings from the sparkle flow

⚠️ The A/B evaluation round after picking an option is confusing. After selecting Option 2 in the first "Choose Translation" modal, a second "Choose Translation" modal appears with two options that look identical. It's actually an A/B preference-capture step ("Result: Thanks! Your choice helps improve suggestions.") but there's no visual cue that this round is evaluation rather than a second translation pass. First-time users will likely think the LLM produced a duplicate.

⚠️ Post-Apply save is a separate click. After the A/B Apply, the cell editor shows the chosen text with unsaved ✓/✕ controls. Closing the editor without clicking ✓ drops the edit. If that's intentional (user might want to bail), it still warrants at least a "you have unsaved changes" cue.

🐌 No latency surprises. LLM generation took ~8s with a toast spinner — good feedback.

Self-critique of the skill (from using it)

  • Step 1 is solid for a branch-testing agent but contains nothing for the "I'm authoring a branch myself" case. Probably fine — the skill's description explicitly scopes to testing an existing branch.
  • Step 5 on docs-only diffs is vacuous. Consider adding one sentence: if the diff is docs/config only, say so in the PR comment and skip the build.
  • Screenshot upload — I didn't exercise the gh gist create fallback on this run (no new screenshots worth attaching). That path is theoretical until a real run stress-tests it.
  • Worktree cleanup — the skill correctly leaves the worktree in place. Mine is at /Users/daniellosey/Frontier/codex-editor-worktrees/add-test-codex-changes-skill; remove with git worktree remove <path>.
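
For reference, the worktree lifecycle mentioned above can be reproduced in a throwaway repository. This is a sketch under assumptions: the temp repo, the empty initial commit, and creating the branch locally with `-b` are all illustrative, whereas in the real session the branch already existed and the worktree sat beside the main checkout.

```shell
# Throwaway repo with one empty commit so a branch can be created from HEAD.
base="$(mktemp -d)"
repo="$base/codex-editor"
git init -q "$repo"
git -C "$repo" -c user.name=sketch -c user.email=sketch@example.invalid \
    -c commit.gpgsign=false commit -q --allow-empty -m "initial commit"

# Sibling directory layout mirroring ../codex-editor-worktrees/<branch>.
wt="$base/codex-editor-worktrees/add-test-codex-changes-skill"
git -C "$repo" worktree add -q -b add-test-codex-changes-skill "$wt"
before="$(git -C "$repo" worktree list --porcelain | grep -c '^worktree ')"

# Cleanup once testing is done; this fails if the worktree has uncommitted changes.
git -C "$repo" worktree remove "$wt"
after="$(git -C "$repo" worktree list --porcelain | grep -c '^worktree ')"
echo "worktrees before cleanup: $before, after: $after"
```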

Nits

  • SKILL.md notes that com.codex "caches its app catalog at startup". That is accurate: it took me two failed request_access calls to realise this in the actual session, so the note will save the next agent several minutes.

Verdict

Skill works as intended for a code-change branch; this docs-only branch is a weak test of it. Worth a human review of the philosophy section to confirm the "no click-by-click walkthrough" stance matches team preference.

Update: after this first pass, the skill was trimmed further (commit ab4354de) to remove instructions an agent would already know (worktree/gh/diff commands, cleanup syntax) and keep only non-obvious repo-specific content.

dadukhankevin and others added 2 commits April 14, 2026 15:00
Removed instructions an agent would know anyway (worktree/gh/diff
commands, notes-file conventions, cleanup syntax). Kept the parts that
aren't discoverable without this skill: build sequence with its expected
mocha warning, Codex.app binary path, com.codex MCP catalog quirk,
focus-steal, file→UI mapping table, smart-watch webview workflow, and
screenshot attachment via gh gist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Uploading images via gh is awkward and the workaround paths are
unreliable. Remove the section entirely; if an agent wants to show the
user something visually, it can open the file natively without needing
instructions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TimRl changed the base branch from dev to main on April 16, 2026 20:22

2 participants