Add test-codex-changes Claude Code skill by dadukhankevin · Pull Request #865 · genesis-ai-dev/codex-editor

dadukhankevin · 2026-04-14T19:50:21Z

Summary

Adds a project-level Claude Code skill at .claude/skills/test-codex-changes/SKILL.md that teaches an agent how to check out a branch in a worktree, build the extension, launch it inside Codex.app, exercise affected UI surfaces, and post findings to the PR.
Carves .claude/skills/ out of the .claude/ gitignore so the skill is shared; settings.local.json and other per-user state stay ignored.
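
The carve-out can be illustrated with a small, self-contained sketch. The two ignore patterns below are an assumption about how such a carve-out is typically written, not a copy of this PR's actual .gitignore lines. The sketch builds a throwaway repository and asks git which paths end up ignored.

```shell
set -eu
# Throwaway repository; nothing here touches the real codex-editor checkout.
repo="$(mktemp -d)"
cd "$repo"
git init -q .

# Assumed patterns: ignore the contents of .claude/ rather than the directory
# itself, then re-include skills/. Ignoring ".claude/" outright would stop git
# from descending into it, so the "!" negation would have no effect.
cat > .gitignore <<'EOF'
.claude/*
!.claude/skills/
EOF

mkdir -p .claude/skills/test-codex-changes
touch .claude/settings.local.json .claude/skills/test-codex-changes/SKILL.md

# git check-ignore exits 0 when a path is ignored and 1 when it is not.
if git check-ignore -q .claude/settings.local.json; then settings_ignored=yes; else settings_ignored=no; fi
if git check-ignore -q .claude/skills/test-codex-changes/SKILL.md; then skill_ignored=yes; else skill_ignored=no; fi

echo "settings.local.json ignored: $settings_ignored"
echo "SKILL.md ignored: $skill_ignored"
```

Under these assumed patterns, per-user files directly inside .claude/ stay ignored while everything under .claude/skills/ becomes trackable.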

The skill deliberately omits a click-by-click walkthrough — its whole value is fresh-eyes testing. If it scripted the UI, the agent would replay the script and miss UX regressions. It gives only the non-discoverable mechanics (build commands, Codex binary path, com.codex access quirk, the focus-steal gotcha, how to post PR comments via gh).

Test plan

Dogfood: test this branch end-to-end using the skill itself, then post findings as a PR comment
Human review — is the skill's level of specificity right? Anything that should be more or less prescriptive?

🤖 Generated with Claude Code

Shared skill that tells an agent how to check out a branch in a worktree, build and launch the extension inside Codex.app, test the affected UI surfaces, and post findings to the PR. Instructions cover only the non-discoverable mechanics so the agent drives the UI with fresh eyes. Also carves out .claude/skills/ from the .claude/ gitignore so the skill is shareable, while settings.local.json and other per-user state stay ignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dadukhankevin · 2026-04-14T19:51:42Z

Test run — used the skill to test this PR itself

What I did

Created a worktree at ../codex-editor-worktrees/add-test-codex-changes-skill (skill step 1) ✅
Inspected the diff: docs-only (SKILL.md + .gitignore carve-out). No code changes, so the "based on the diff" section of the skill maps to nothing behavioural — only the Always item (LLM translation) applies.
Did not rebuild the worktree — skipped because the branch adds only Markdown and a gitignore line; the running extension host is functionally identical to what this branch would produce. A strict reading of the skill would still have me build; flagging this as one place where the skill could explicitly say "skip rebuild for docs-only diffs" to save ~5 min.
LLM translation was verified earlier in this same session against origin/dev (which this branch is a fast-forward of): opened TheChosen_101_fr_syncfix_V2_Rev → verse 2 → sparkle → got two options → picked one → A/B follow-up → Apply → ✓ save. Flow completed and the cell persisted. ✅
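
The "skip rebuild for docs-only diffs" idea raised above could be checked mechanically. The following is a minimal sketch, not something SKILL.md contains: `is_docs_only` is a hypothetical helper, and the choice of what counts as "docs" (Markdown files and .gitignore) is an assumption based on this PR's diff.

```shell
# Hypothetical helper: reads one changed path per line on stdin and succeeds
# only if there is at least one path and every path is Markdown or a .gitignore.
is_docs_only() {
  saw_any=no
  while IFS= read -r changed_path; do
    [ -n "$changed_path" ] || continue
    saw_any=yes
    case "$changed_path" in
      *.md|.gitignore|*/.gitignore) ;;  # docs or ignore rules: no rebuild needed
      *) return 1 ;;                    # anything else may change behaviour
    esac
  done
  [ "$saw_any" = yes ]
}

# Intended use inside the worktree (the base ref is an assumption):
#   git diff --name-only origin/dev...HEAD | is_docs_only && echo "docs-only: skip rebuild"
```

An empty diff is deliberately treated as "not docs-only", so the helper never justifies skipping the build when the path list could not be computed.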

Findings from the sparkle flow

⚠️ The A/B evaluation round after picking an option is confusing. After selecting Option 2 in the first "Choose Translation" modal, a second "Choose Translation" modal appears with two options that look identical. It's actually an A/B preference-capture step ("Result: Thanks! Your choice helps improve suggestions.") but there's no visual cue that this round is evaluation rather than a second translation pass. First-time users will likely think the LLM produced a duplicate.

⚠️ Post-Apply save is a separate click. After the A/B Apply, the cell editor shows the chosen text with unsaved ✓/✕ controls. Closing the editor without clicking ✓ drops the edit. If that's intentional (user might want to bail), it still warrants at least a "you have unsaved changes" cue.

🐌 No latency surprises. LLM generation took ~8s with a toast spinner — good feedback.

Self-critique of the skill (from using it)

Step 1 is solid for a branch-testing agent but contains nothing for the "I'm authoring a branch myself" case. Probably fine — the skill's description explicitly scopes to testing an existing branch.
Step 5 on docs-only diffs is vacuous. Consider adding one sentence: if the diff is docs/config only, say so in the PR comment and skip the build.
Screenshot upload — I didn't exercise the gh gist create fallback on this run (no new screenshots worth attaching). That path is theoretical until a real run stress-tests it.
Worktree cleanup — the skill correctly leaves the worktree in place. Mine is at /Users/daniellosey/Frontier/codex-editor-worktrees/add-test-codex-changes-skill; remove with git worktree remove <path>.
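
For reference, the worktree lifecycle mentioned in this list can be sketched end to end. The sketch runs against a throwaway repository so that it is safe to execute anywhere; the directory layout mirrors the path reported above, and the scratch repository and its empty initial commit are purely illustrative.

```shell
set -eu
# Throwaway stand-in for the codex-editor checkout.
scratch="$(mktemp -d)"
git init -q "$scratch/codex-editor"
cd "$scratch/codex-editor"
git -c user.name=sketch -c user.email=sketch@example.invalid \
  commit -q --allow-empty -m "initial commit"
git branch add-test-codex-changes-skill

# Check the branch out beside the main checkout.
mkdir -p ../codex-editor-worktrees
worktree="../codex-editor-worktrees/add-test-codex-changes-skill"
git worktree add "$worktree" add-test-codex-changes-skill
git worktree list

# The worktree stays in place until it is removed explicitly.
git worktree remove "$worktree"
```

`git worktree remove` refuses to delete a worktree with uncommitted changes unless `--force` is passed, which is one reason to leave cleanup as a deliberate manual step.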

Nits

SKILL.md says of com.codex access that it "caches its app catalog at startup". This is accurate: it took me two failed request_access calls to realise it in the actual session. That note will save the next agent several minutes.

Verdict

Skill works as intended for a code-change branch; this docs-only branch is a weak test of it. Worth a human review of the philosophy section to confirm the "no click-by-click walkthrough" stance matches team preference.

Update: after this first pass, the skill was trimmed further (commit ab4354de) to remove instructions an agent would already know (worktree/gh/diff commands, cleanup syntax) and keep only non-obvious repo-specific content.

Removed instructions an agent would know anyway (worktree/gh/diff commands, notes-file conventions, cleanup syntax). Kept the parts that aren't discoverable without this skill: build sequence with its expected mocha warning, Codex.app binary path, com.codex MCP catalog quirk, focus-steal, file→UI mapping table, smart-watch webview workflow, and screenshot attachment via gh gist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Uploading images via gh is awkward and the workaround paths are unreliable. Remove the section entirely; if an agent wants to show the user something visually, it can open the file natively without needing instructions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dadukhankevin and others added 2 commits April 14, 2026 15:00

TimRl changed the base branch from dev to main April 16, 2026 20:22

Merge branch 'main' into add-test-codex-changes-skill

752140b

dadukhankevin wants to merge 4 commits into main from add-test-codex-changes-skill

2 participants
