From 5502b0a1ad665a802fcaf914cab63633907326fd Mon Sep 17 00:00:00 2001 From: dadukhankevin Date: Tue, 14 Apr 2026 14:48:10 -0500 Subject: [PATCH 1/3] Add test-codex-changes Claude Code skill Shared skill that tells an agent how to check out a branch in a worktree, build and launch the extension inside Codex.app, test the affected UI surfaces, and post findings to the PR. Instructions cover only the non-discoverable mechanics so the agent drives the UI with fresh eyes. Also carves out .claude/skills/ from the .claude/ gitignore so the skill is shareable, while keeping settings.local.json etc. still ignored. Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/test-codex-changes/SKILL.md | 166 +++++++++++++++++++++ .gitignore | 6 +- 2 files changed, 171 insertions(+), 1 deletion(-) create mode 100644 .claude/skills/test-codex-changes/SKILL.md diff --git a/.claude/skills/test-codex-changes/SKILL.md b/.claude/skills/test-codex-changes/SKILL.md new file mode 100644 index 000000000..b4d7ecdd1 --- /dev/null +++ b/.claude/skills/test-codex-changes/SKILL.md @@ -0,0 +1,166 @@ +--- +name: test-codex-changes +description: Build, launch, and manually test a branch of the Codex Editor extension inside the Codex desktop app via computer-use, then post findings as a PR comment. Use when asked to "test this branch", "test PR #N", "try out my changes", "review this PR in the app", or similar. The skill gives you the mechanical steps (which are not discoverable); you drive the UI yourself and report anything confusing, slow, broken, or counterintuitive. +--- + +# Test Codex Changes + +Build a branch in its own worktree, launch the Codex Editor extension, drive the UI, and post findings to the PR. + +## Philosophy + +The value of this skill is **fresh-eyes testing**. If this file told you exactly which buttons to click, you'd just replay a script and miss UX problems — confusing labels, surprising modals, slow responses, missing feedback, etc. 
So the instructions below cover only the mechanical steps that aren't discoverable (build commands, launch flag, access grants, reporting). Once the app is running, you explore the UI yourself and report what you observe — including anything that felt awkward. + +## 1. Check out the branch in a worktree + +Never switch the user's main working tree. Create a worktree so their existing work stays untouched: + +```bash +BRANCH= +WT="../codex-editor-worktrees/${BRANCH//\//_}" +git fetch origin "$BRANCH" +git worktree add "$WT" "origin/$BRANCH" +cd "$WT" +``` + +If a PR number was given instead of a branch, resolve it first: `gh pr view --json headRefName -q .headRefName`. + +If the user already has a tool for managing Codex debug worktrees (e.g. a `debug-branch.sh` in the repo), prefer that — it may also set up isolated user-data-dirs and window titles so sessions don't collide. Check for it before falling back to raw `git worktree`. + +## 2. Build + +Run from the worktree root. Slow — chain and show progress. + +```bash +pnpm i +cd webviews/codex-webviews && pnpm i && cd ../.. +npm run build:webviews +npx webpack --config webpack.config.js +``` + +The webpack `test:` sub-bundle logs a mocha "Critical dependency" warning — expected, not a failure. A clean build ends with `compiled successfully` for the `test-runner` config. + +## 3. Launch + +The `code` CLI is NOT on PATH. Use the bundled binary inside Codex.app, pointed at the worktree: + +```bash +/Applications/Codex.app/Contents/Resources/app/bin/codex --extensionDevelopmentPath=$(pwd) > /tmp/codex-dev-${BRANCH//\//_}.log 2>&1 & +``` + +Title bar shows `[Extension Development Host]`. Process name in `ps` is `Electron`; bundle id is `com.codex`. + +## 4. Grant computer-use access + +``` +request_access(apps=["com.codex"], reason="...") +``` + +If it returns `not_installed`: ask the user to restart the computer-use MCP server, then retry. 
`lsregister -f /Applications/Codex.app` alone is not enough — the MCP caches its app catalog at startup. + +Codex loses frontmost focus every time you run a Bash call or switch tools. Before each click sequence: + +``` +open_application(app="com.codex") +``` + +Black screenshots mean nothing granted is frontmost — re-open Codex. + +## 5. What to test + +### Always +- **Automatic (LLM) translation** end-to-end on at least one verse. It's the core feature and the most common regression. + +### Based on the diff +```bash +git log --oneline dev..HEAD +git diff --stat dev..HEAD +git diff dev..HEAD -- src/ webviews/ sharedUtils/ +``` + +Look at the changed files and work out which UI surface they drive. Rough map: + +- `src/smartEdits/`, `llmCompletion.ts` → automatic translation +- `src/exportHandler/` → export flow +- `src/providers/NewSourceUploader/` → source file import +- `src/providers/codexCellEditorProvider/` → the main `.codex` editor (open, edit, save cells) +- `src/projectManager/syncManager.ts` → git sync +- `webviews/codex-webviews/src//` → the corresponding webview panel + +If you can't tell what UI a change drives, read the diff more carefully — function names, added commands, and new message types usually give it away. If still ambiguous, ask the user. + +### While you're in there +Poke around. If something feels off — a label you can't guess, a modal that asks the same question twice, an op with no loading feedback, a dialog that opens behind something — flag it. The value is your confusion, not silent success. + +## 6. 
Take notes as you go + +Keep a running note file in the worktree while testing — every observation in the moment, not reconstructed at the end: + +``` +/tmp/codex-test-notes-.md +``` + +For each flow record: +- What you tried +- What happened (including timing — "took 8s with no spinner") +- Whether it matched what the diff suggested should happen +- Anything easy, hard, or surprising +- Screenshot path for noteworthy moments + +Save screenshots with `screenshot(save_to_disk=true)` so you can reference their paths later. + +## 7. Post to the PR + +Resolve the PR for the branch — prefer commenting on a PR over pushing to a branch: + +```bash +gh pr list --head "$BRANCH" --json number,url,state +``` + +If no PR exists, tell the user and ask whether to open one or hold off. + +Write the comment as markdown. Structure: + +- **Summary** — one sentence: did the change work? +- **What I tested** — bullet list of flows, with ✅ / ⚠️ / ❌ +- **Findings** — anything counterintuitive, slow, or broken; include reproduction steps +- **Nits** — small things you noticed that aren't blockers +- **Screenshots** — see below + +Post with: + +```bash +gh pr comment --body-file /tmp/codex-test-comment.md +``` + +### Attaching screenshots + +`gh` doesn't have a first-class image upload. Options, in order of preference: + +1. **`gh gist create`** for the image files, then reference the raw URLs in the markdown (`![caption](raw-url)`). Works reliably, one gist per test run keeps things tidy. +2. If only a few images, inline them by uploading to an existing gist or using the repo's own `docs/` or an artifacts branch (only if the repo has that convention — check first). +3. If none of the above work, list local paths in the comment and tell the user the screenshots are on their machine at ``. + +Don't invent URLs. If the upload fails, say so in the comment. + +## 8. Clean up + +Leave the worktree in place by default — the user may want to poke at it. 
Tell them the path and the cleanup command: + +```bash +git worktree remove +``` + +Don't remove it yourself unless they ask. + +## Rebuilding after edits mid-session + +- `src/` changes: rerun webpack, then Cmd+R in the dev host window. +- Webview changes: `cd webviews/codex-webviews && pnpm run build:` then Cmd+R. `pnpm run smart-watch` in that dir rebuilds whichever webview you're iterating on. + +## Gotchas + +- **Do not commit, amend, or push** unless asked. Testing is read-only on the code. +- `/tmp/codex-dev-.log` captures stderr if the app crashes on launch. +- The project loaded in the dev host is whatever the user last had open. Don't assume content; explore what's actually there. +- Respect `CLAUDE.md` conventions — no `any`, target <500 lines per file. If the branch violates these, note it in the PR comment. diff --git a/.gitignore b/.gitignore index f77a45c33..cd50e6c6f 100644 --- a/.gitignore +++ b/.gitignore @@ -17,4 +17,8 @@ webviews/codex-webviews/vite.config.ts.timestamp-*.mjs # .project/ # AI coding agents -.claude/ +.claude/* +!.claude/skills/ + +# Glance memory +.glance/ From ab4354de36f52a00e940d9d3655539de78711260 Mon Sep 17 00:00:00 2001 From: dadukhankevin Date: Tue, 14 Apr 2026 15:00:05 -0500 Subject: [PATCH 2/3] Trim skill to repo-specific, non-obvious content MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Removed instructions an agent would know anyway (worktree/gh/diff commands, notes-file conventions, cleanup syntax). Kept the parts that aren't discoverable without this skill: build sequence with its expected mocha warning, Codex.app binary path, com.codex MCP catalog quirk, focus-steal, file→UI mapping table, smart-watch webview workflow, and screenshot attachment via gh gist. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/test-codex-changes/SKILL.md | 160 ++++++--------------- 1 file changed, 44 insertions(+), 116 deletions(-) diff --git a/.claude/skills/test-codex-changes/SKILL.md b/.claude/skills/test-codex-changes/SKILL.md index b4d7ecdd1..fe1a7faa2 100644 --- a/.claude/skills/test-codex-changes/SKILL.md +++ b/.claude/skills/test-codex-changes/SKILL.md @@ -1,35 +1,19 @@ --- name: test-codex-changes -description: Build, launch, and manually test a branch of the Codex Editor extension inside the Codex desktop app via computer-use, then post findings as a PR comment. Use when asked to "test this branch", "test PR #N", "try out my changes", "review this PR in the app", or similar. The skill gives you the mechanical steps (which are not discoverable); you drive the UI yourself and report anything confusing, slow, broken, or counterintuitive. +description: Build, launch, and manually test a branch of the Codex Editor extension inside the Codex desktop app via computer-use, then post findings as a PR comment. Use when asked to "test this branch", "test PR #N", "try out my changes", "review this PR in the app", or similar. Covers only the non-obvious, repo-specific mechanics; you drive the UI yourself and report anything confusing, slow, broken, or counterintuitive. --- # Test Codex Changes -Build a branch in its own worktree, launch the Codex Editor extension, drive the UI, and post findings to the PR. - ## Philosophy -The value of this skill is **fresh-eyes testing**. If this file told you exactly which buttons to click, you'd just replay a script and miss UX problems — confusing labels, surprising modals, slow responses, missing feedback, etc. So the instructions below cover only the mechanical steps that aren't discoverable (build commands, launch flag, access grants, reporting). Once the app is running, you explore the UI yourself and report what you observe — including anything that felt awkward. - -## 1. 
Check out the branch in a worktree - -Never switch the user's main working tree. Create a worktree so their existing work stays untouched: - -```bash -BRANCH= -WT="../codex-editor-worktrees/${BRANCH//\//_}" -git fetch origin "$BRANCH" -git worktree add "$WT" "origin/$BRANCH" -cd "$WT" -``` - -If a PR number was given instead of a branch, resolve it first: `gh pr view --json headRefName -q .headRefName`. +The value of this skill is **fresh-eyes testing**. If it told you exactly which buttons to click, you'd replay the script and miss UX problems — confusing labels, surprising modals, slow responses, missing feedback. Explore the UI yourself and report what you observe, including anything that felt awkward. -If the user already has a tool for managing Codex debug worktrees (e.g. a `debug-branch.sh` in the repo), prefer that — it may also set up isolated user-data-dirs and window titles so sessions don't collide. Check for it before falling back to raw `git worktree`. +Isolate the branch under test in a worktree so the user's main tree stays untouched. If the repo has a purpose-built worktree tool (e.g. a `debug-branch.sh`), prefer it — such tools often also set up isolated user-data-dirs and window titles so parallel sessions don't collide. -## 2. Build +## Build -Run from the worktree root. Slow — chain and show progress. +From the worktree root, in order: ```bash pnpm i @@ -38,129 +22,73 @@ npm run build:webviews npx webpack --config webpack.config.js ``` -The webpack `test:` sub-bundle logs a mocha "Critical dependency" warning — expected, not a failure. A clean build ends with `compiled successfully` for the `test-runner` config. +Full first build is ~5 min. Webpack's `test:` sub-bundle logs a mocha *"Critical dependency"* warning — expected, not a failure. A clean build ends with `compiled successfully` for the `test-runner` config. -## 3. Launch +## Launch -The `code` CLI is NOT on PATH. 
Use the bundled binary inside Codex.app, pointed at the worktree: +The `code` CLI is not on PATH. Use the binary inside the installed app, pointed at your worktree: ```bash -/Applications/Codex.app/Contents/Resources/app/bin/codex --extensionDevelopmentPath=$(pwd) > /tmp/codex-dev-${BRANCH//\//_}.log 2>&1 & -``` - -Title bar shows `[Extension Development Host]`. Process name in `ps` is `Electron`; bundle id is `com.codex`. - -## 4. Grant computer-use access - -``` -request_access(apps=["com.codex"], reason="...") -``` - -If it returns `not_installed`: ask the user to restart the computer-use MCP server, then retry. `lsregister -f /Applications/Codex.app` alone is not enough — the MCP caches its app catalog at startup. - -Codex loses frontmost focus every time you run a Bash call or switch tools. Before each click sequence: - -``` -open_application(app="com.codex") +/Applications/Codex.app/Contents/Resources/app/bin/codex --extensionDevelopmentPath=$(pwd) > /tmp/codex-dev-.log 2>&1 & ``` -Black screenshots mean nothing granted is frontmost — re-open Codex. - -## 5. What to test - -### Always -- **Automatic (LLM) translation** end-to-end on at least one verse. It's the core feature and the most common regression. - -### Based on the diff -```bash -git log --oneline dev..HEAD -git diff --stat dev..HEAD -git diff dev..HEAD -- src/ webviews/ sharedUtils/ -``` +Title bar will show `[Extension Development Host]`. Process name in `ps` is `Electron`; bundle id is `com.codex`. -Look at the changed files and work out which UI surface they drive. 
Rough map: +## Computer-use access -- `src/smartEdits/`, `llmCompletion.ts` → automatic translation -- `src/exportHandler/` → export flow -- `src/providers/NewSourceUploader/` → source file import -- `src/providers/codexCellEditorProvider/` → the main `.codex` editor (open, edit, save cells) -- `src/projectManager/syncManager.ts` → git sync -- `webviews/codex-webviews/src//` → the corresponding webview panel +Request access by bundle id `com.codex`. If `request_access` returns `not_installed` the MCP's app catalog is stale — ask the user to restart the computer-use MCP server. `lsregister -f` alone won't help. -If you can't tell what UI a change drives, read the diff more carefully — function names, added commands, and new message types usually give it away. If still ambiguous, ask the user. +Codex loses frontmost focus every time you run a Bash call or switch tools. Before each click sequence, re-open it (`open_application` with `com.codex`). Black screenshots mean nothing granted is frontmost. -### While you're in there -Poke around. If something feels off — a label you can't guess, a modal that asks the same question twice, an op with no loading feedback, a dialog that opens behind something — flag it. The value is your confusion, not silent success. +## What to test -## 6. Take notes as you go +**Always:** automatic (LLM) translation end-to-end on at least one verse — it's the core feature and the most common regression. -Keep a running note file in the worktree while testing — every observation in the moment, not reconstructed at the end: +**Based on the diff:** look at changed files and map them to UI surfaces. 
Rough map: -``` -/tmp/codex-test-notes-.md -``` +| Path | UI | +|---|---| +| `src/smartEdits/`, `llmCompletion.ts` | automatic translation | +| `src/exportHandler/` | export flow | +| `src/providers/NewSourceUploader/` | source file import | +| `src/providers/codexCellEditorProvider/` | the main `.codex` editor (open, edit, save cells) | +| `src/projectManager/syncManager.ts` | git sync | +| `webviews/codex-webviews/src//` | the webview panel of the same name | -For each flow record: -- What you tried -- What happened (including timing — "took 8s with no spinner") -- Whether it matched what the diff suggested should happen -- Anything easy, hard, or surprising -- Screenshot path for noteworthy moments +If a change's UI impact isn't obvious from filenames, read the diff for new commands, message types, or exported functions. Still ambiguous? Ask the user. For docs/config-only diffs, say so and skip the UI pass. -Save screenshots with `screenshot(save_to_disk=true)` so you can reference their paths later. +**While you're in there:** poke around. A label you can't guess, a modal that asks the same question twice, an op with no loading feedback, a dialog that opens behind something — flag it. The value is your confusion, not silent success. -## 7. Post to the PR +## Rebuilding mid-session -Resolve the PR for the branch — prefer commenting on a PR over pushing to a branch: +- `src/` changes: rerun webpack, then Cmd+R in the dev host window. +- Single-webview changes: `pnpm run build:` from `webviews/codex-webviews/` (e.g. `build:CodexCellEditor`), then Cmd+R. `pnpm run smart-watch` in that dir rebuilds whichever view you're iterating on. -```bash -gh pr list --head "$BRANCH" --json number,url,state -``` +## Reporting -If no PR exists, tell the user and ask whether to open one or hold off. +Take notes **as you go**, not reconstructed at the end — include timing ("took 8s with no spinner"), whether behaviour matched what the diff suggested, and what felt easy or hard. 
Save screenshots of anything noteworthy or unexpected via `screenshot(save_to_disk=true)`. -Write the comment as markdown. Structure: +Post findings to the PR (prefer a PR comment over pushing to the branch). Suggested structure: - **Summary** — one sentence: did the change work? -- **What I tested** — bullet list of flows, with ✅ / ⚠️ / ❌ -- **Findings** — anything counterintuitive, slow, or broken; include reproduction steps -- **Nits** — small things you noticed that aren't blockers -- **Screenshots** — see below - -Post with: - -```bash -gh pr comment --body-file /tmp/codex-test-comment.md -``` +- **What I tested** — bullet list with ✅ / ⚠️ / ❌ +- **Findings** — counterintuitive, slow, or broken; include repro steps +- **Nits** — small things that aren't blockers ### Attaching screenshots -`gh` doesn't have a first-class image upload. Options, in order of preference: - -1. **`gh gist create`** for the image files, then reference the raw URLs in the markdown (`![caption](raw-url)`). Works reliably, one gist per test run keeps things tidy. -2. If only a few images, inline them by uploading to an existing gist or using the repo's own `docs/` or an artifacts branch (only if the repo has that convention — check first). -3. If none of the above work, list local paths in the comment and tell the user the screenshots are on their machine at ``. - -Don't invent URLs. If the upload fails, say so in the comment. - -## 8. Clean up - -Leave the worktree in place by default — the user may want to poke at it. Tell them the path and the cleanup command: +`gh` has no first-class image upload. In order of preference: -```bash -git worktree remove -``` - -Don't remove it yourself unless they ask. +1. `gh gist create` the PNGs, reference raw URLs in the comment markdown. +2. If the repo has a convention for UI screenshots (check `docs/` or past PR comments), follow it. +3. Otherwise list local paths in the comment and tell the user the screenshots are on their machine. 
-## Rebuilding after edits mid-session - -- `src/` changes: rerun webpack, then Cmd+R in the dev host window. -- Webview changes: `cd webviews/codex-webviews && pnpm run build:` then Cmd+R. `pnpm run smart-watch` in that dir rebuilds whichever webview you're iterating on. +Don't invent URLs. If upload fails, say so. ## Gotchas -- **Do not commit, amend, or push** unless asked. Testing is read-only on the code. +- **Do not commit, amend, or push** unless asked — testing is read-only on the code. - `/tmp/codex-dev-.log` captures stderr if the app crashes on launch. -- The project loaded in the dev host is whatever the user last had open. Don't assume content; explore what's actually there. -- Respect `CLAUDE.md` conventions — no `any`, target <500 lines per file. If the branch violates these, note it in the PR comment. +- The project loaded in the dev host is whatever the user last had open. Explore what's actually there; don't assume content. +- Respect `CLAUDE.md` conventions (no `any`, <500 lines/file). If the branch violates them, note it in the PR comment. +- Leave the worktree in place when done unless asked to clean up — the user may want to poke at it. From b56cd5ec998a4e80cd9908a1c543c2d164077d4d Mon Sep 17 00:00:00 2001 From: dadukhankevin Date: Tue, 14 Apr 2026 15:09:05 -0500 Subject: [PATCH 3/3] Drop screenshot handling from skill Uploading images via gh is awkward and the workaround paths are unreliable. Remove the section entirely; if an agent wants to show the user something visually, it can open the file natively without needing instructions. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/test-codex-changes/SKILL.md | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/.claude/skills/test-codex-changes/SKILL.md b/.claude/skills/test-codex-changes/SKILL.md index fe1a7faa2..b1ec5a60d 100644 --- a/.claude/skills/test-codex-changes/SKILL.md +++ b/.claude/skills/test-codex-changes/SKILL.md @@ -66,7 +66,7 @@ If a change's UI impact isn't obvious from filenames, read the diff for new comm ## Reporting -Take notes **as you go**, not reconstructed at the end — include timing ("took 8s with no spinner"), whether behaviour matched what the diff suggested, and what felt easy or hard. Save screenshots of anything noteworthy or unexpected via `screenshot(save_to_disk=true)`. +Take notes **as you go**, not reconstructed at the end — include timing ("took 8s with no spinner"), whether behaviour matched what the diff suggested, and what felt easy or hard. Post findings to the PR (prefer a PR comment over pushing to the branch). Suggested structure: @@ -75,16 +75,6 @@ Post findings to the PR (prefer a PR comment over pushing to the branch). Sugges - **Findings** — counterintuitive, slow, or broken; include repro steps - **Nits** — small things that aren't blockers -### Attaching screenshots - -`gh` has no first-class image upload. In order of preference: - -1. `gh gist create` the PNGs, reference raw URLs in the comment markdown. -2. If the repo has a convention for UI screenshots (check `docs/` or past PR comments), follow it. -3. Otherwise list local paths in the comment and tell the user the screenshots are on their machine. - -Don't invent URLs. If upload fails, say so. - ## Gotchas - **Do not commit, amend, or push** unless asked — testing is read-only on the code.
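
Review note on the launch log path: patch 1 writes the dev-host log to `/tmp/codex-dev-${BRANCH//\//_}.log`, and the Gotchas entry kept through patches 2 and 3 points at the same file. A minimal sketch of that naming rule follows, assuming a POSIX shell. The `log_path_for` helper and the `feature/foo` branch name are hypothetical and not part of this series. `tr` stands in for the bash-only `${BRANCH//\//_}` expansion.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): derive the per-branch log path
# used by the launch step by replacing every "/" in the branch name with "_".
log_path_for() {
    # tr gives the same result as bash's ${BRANCH//\//_} and also works in plain sh.
    safe_name=$(printf '%s' "$1" | tr '/' '_')
    printf '/tmp/codex-dev-%s.log\n' "$safe_name"
}

# "feature/foo" is an illustrative branch name, not one taken from the repo.
log_path_for "feature/foo"
```

Flattening the slash keeps the log as a single file directly under `/tmp`, so a branch such as `feature/foo` does not require a `/tmp/codex-dev-feature/` directory to exist before the redirect runs.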