Skip to content

feat: add automated QA pipeline with E2E test-driven bug reproduction#9430

Draft
snomiao wants to merge 1 commit intomainfrom
sno-skills
Draft

feat: add automated QA pipeline with E2E test-driven bug reproduction#9430
snomiao wants to merge 1 commit intomainfrom
sno-skills

Conversation

@snomiao
Copy link
Copy Markdown
Member

@snomiao snomiao commented Mar 5, 2026

Summary

Automated QA pipeline that reproduces reported bugs using Playwright E2E tests. Claude writes tests, runs them, records video evidence. Deploys results to Cloudflare Pages with badges.

Changes

  • What: Three-phase pipeline triggered by labels (qa-changes, qa-full, qa-issue)
  • Dependencies: @google/generative-ai, @anthropic-ai/claude-agent-sdk (installed inline in CI)

Pipeline

  1. Research — Claude reads issue, inspects UI via a11y tree, writes Playwright test asserting broken behavior
  2. Reproduce — Runs passing test with video recording
  3. Report — Deploys badge + video + test code to Cloudflare Pages, comments on issue/PR

Files (18 files, ~7.5k lines)

File Purpose
.github/workflows/pr-qa.yaml CI workflow
scripts/qa-agent.ts Research: Claude writes E2E tests via MCP tools
scripts/qa-record.ts Video recording with Gemini-generated steps
scripts/qa-reproduce.ts Deterministic replay with a11y assertions
scripts/qa-analyze-pr.ts PR/issue analysis for targeted QA guides
scripts/qa-video-review.ts Gemini video review
scripts/qa-deploy-pages.sh Cloudflare Pages deploy + badge
scripts/qa-generate-test.ts Regression test generation
knip.config.ts Ignore CI-only deps

Results (48 runs, 0 false positives)

Reproduced ✓

Issue Badge What
#10766 badge Workflow reload overwrites modified tab
#10708 badge Collapsed subgraph text widget disappears
#10688 badge Preview images cleared on tab switch
#10307 badge Clone z-index renders behind original
#10394 badge Node search priority wrong
#10409 badge Sidebar missing file extensions
#10288 badge Accessibility regressions
#7942 badge DOM widget gap after resize

Not Reproducible ✗

Issue Badge What
#10504 badge Group paste alignment
#10424 badge Zoom icon rendering
#3990 badge Scroll leak

Inconclusive / PRs

Target Badge
#10612 badge
PR #10745 badge
PR #10682 badge

Triggers

  • qa-issue label on issue → reproduce bug
  • qa-changes label on PR → before/after comparison
  • qa-full label on PR → 3-OS matrix
  • ./scripts/qa-batch.sh 10394 10238 → batch

Review Focus

  • Label-triggered only, no automatic runs
  • Playwright assertions are source of truth (not AI vision)
  • Trivial assertions banned — tests must assert bug-specific behavior
  • QA deps installed inline in CI (no package.json/lockfile changes)
  • Requires ANTHROPIC_API_KEY_QA, GEMINI_API_KEY, CLOUDFLARE_API_TOKEN secrets

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a QA skill pointer, a new ComfyUI frontend QA skill document, and a GitHub Actions workflow that runs Playwright via an MCP-driven Claude action to produce dated markdown QA reports and optionally create draft PRs.

Changes

Cohort / File(s) Summary
Skill config pointer
\.claude/skills
Adds a single-line pointer referencing ../skills. No code, exports, or logic changes.
ComfyUI QA skill doc
skills/comfy-qa/SKILL.md
Adds a comprehensive Markdown skill defining prerequisites, CI vs local detection, Playwright MCP automation steps, an extensive QA test plan, dated markdown report template, and draft PR creation workflow (includes sample gh CLI script and cross-platform notes).
CI workflow
.github/workflows/qa-claude.yaml
Adds a GitHub Actions workflow triggered on PRs/dispatch/label that checks out the repo, boots ComfyUI (port 8188), installs Playwright and Claude tooling, invokes Claude with an MCP config to run Playwright tests, writes dated reports to docs/qa/, posts PR comments, and can remove the qa-run label.

Sequence Diagram(s)

sequenceDiagram
    participant Dev as Developer/Agent
    participant Repo as GitHub Repo
    participant CI as CI Runner
    participant Claude as Claude CLI/Action
    participant Browser as Playwright Browser
    participant Server as ComfyUI Server
    Dev->>Repo: open PR / trigger workflow_dispatch / add qa-run label
    Repo->>CI: start qa job (checkout & setup)
    CI->>Server: start ComfyUI on port 8188
    CI->>Claude: invoke claude-code-action with MCP config + QA prompt
    Claude->>Browser: drive Playwright MCP tests against Server
    Browser-->>Claude: return test results & artifacts
    Claude-->>CI: generate dated markdown QA report
    CI->>Repo: commit/upload report to docs/qa/ and post PR comment / create draft PR
    Repo-->>Dev: report available / PR comment created
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hop through nodes and click with care,
I steer the browser, watch the canvas flare,
A dated report I tuck and save,
A draft PR born, bold and brave,
Hooray — QA blooms everywhere!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Title check ✅ Passed The title accurately describes the main change: adding an automated QA pipeline with E2E test-driven bug reproduction capabilities.
Description check ✅ Passed The description includes all required template sections (Summary, Changes, Review Focus) with comprehensive details about the QA pipeline, implementation, triggers, and testing results.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sno-skills

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 5, 2026

🎨 Storybook: ✅ Built — View Storybook

Details

⏰ Completed at: 04/04/2026, 04:04:08 AM UTC

Links

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 5, 2026

🎭 Playwright: ✅ 933 passed, 0 failed · 6 flaky

📊 Browser Reports
  • chromium: View Report (✅ 919 / ❌ 0 / ⚠️ 6 / ⏭️ 1)
  • chromium-2x: View Report (✅ 2 / ❌ 0 / ⚠️ 0 / ⏭️ 0)
  • chromium-0.5x: View Report (✅ 1 / ❌ 0 / ⚠️ 0 / ⏭️ 0)
  • mobile-chrome: View Report (✅ 11 / ❌ 0 / ⚠️ 0 / ⏭️ 0)

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/comfy-qa/SKILL.md`:
- Around line 208-210: Add a language identifier to the fenced code block that
contains "docs/qa/YYYY-MM-DD-NNN-report.md" so it complies with MD040; update
the block starting with ``` to ```text (i.e., change the opening fence for that
snippet to ```text) and keep the closing fence unchanged.
- Around line 217-219: The current command `ls docs/qa/ | grep "$(date
+%Y-%m-%d)" | wc -l` uses a file count and can reuse an existing NNN; replace it
with logic that lists files in docs/qa/ matching the current date pattern,
extracts the numeric prefix (the NNN part), computes max(NNN) and then uses
max+1 as the next index (with the same zero-padding used elsewhere) — locate the
code that runs or documents this command (search for the exact `ls docs/qa/ |
grep "$(date +%Y-%m-%d)" | wc -l` line) and change it to parse numeric prefixes,
take the maximum, and increment to produce a collision-safe NNN.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 443b0713-ef5e-4192-be9d-55819a1d0f94

📥 Commits

Reviewing files that changed from the base of the PR and between c2fc0c0 and aaecc66.

📒 Files selected for processing (4)
  • .claude/skills
  • docs/qa/.gitkeep
  • skills/comfy-qa/SKILL.md
  • skills/writing-playwright-tests/SKILL.md

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
skills/comfy-qa/SKILL.md (2)

222-224: ⚠️ Potential issue | 🟡 Minor

Make NNN calculation collision-safe.

Using file count can reuse an existing index if reports were deleted or numbering is sparse. Use max(NNN)+1 instead.

Suggested fix
-```bash
-ls docs/qa/ | grep "$(date +%Y-%m-%d)" | wc -l
-```
+```bash
+DATE="$(date +%Y-%m-%d)"
+NEXT_INDEX="$(
+  ls docs/qa/ 2>/dev/null \
+  | sed -n "s/^${DATE}-\([0-9][0-9][0-9]\)-report\.md$/\1/p" \
+  | sort -n \
+  | tail -1 \
+  | awk '{printf "%03d", ($1 == "" ? 1 : $1 + 1)}'
+)"
+echo "$NEXT_INDEX"
+```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/comfy-qa/SKILL.md` around lines 222 - 224, Replace the fragile
file-count approach with a collision-safe max-index calculation: compute DATE
via DATE="$(date +%Y-%m-%d)", then derive NEXT_INDEX by listing docs/qa,
extracting the three-digit NNN from filenames matching the pattern
"${DATE}-NNN-report.md", sorting numerically, taking the highest value, and
outputting printf "%03d" of (max+1) (handling empty result as 001). Update the
snippet around the previous ls|grep|wc pipeline to use the DATE variable and the
NEXT_INDEX pipeline so new reports always pick max(NNN)+1 rather than file
count.

211-213: ⚠️ Potential issue | 🟡 Minor

Add a language identifier to the fenced code block.

Line 211 opens a fenced block without a language, which fails markdownlint MD040.

Suggested fix
-```
+```text
 docs/qa/YYYY-MM-DD-NNN-report.md
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @skills/comfy-qa/SKILL.md around lines 211 - 213, The fenced code block in
SKILL.md that contains "docs/qa/YYYY-MM-DD-NNN-report.md" is missing a language
identifier (causing markdownlint MD040); update the fence opening from to include a language tag such astext (or md) so the block becomes text
followed by the file path and closing ``` to satisfy the linter (locate the
fenced block around the "docs/qa/YYYY-MM-DD-NNN-report.md" line).


</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In @skills/comfy-qa/SKILL.md:

  • Around line 222-224: Replace the fragile file-count approach with a
    collision-safe max-index calculation: compute DATE via DATE="$(date +%Y-%m-%d)",
    then derive NEXT_INDEX by listing docs/qa, extracting the three-digit NNN from
    filenames matching the pattern "${DATE}-NNN-report.md", sorting numerically,
    taking the highest value, and outputting printf "%03d" of (max+1) (handling
    empty result as 001). Update the snippet around the previous ls|grep|wc pipeline
    to use the DATE variable and the NEXT_INDEX pipeline so new reports always pick
    max(NNN)+1 rather than file count.
  • Around line 211-213: The fenced code block in SKILL.md that contains
    "docs/qa/YYYY-MM-DD-NNN-report.md" is missing a language identifier (causing
    markdownlint MD040); update the fence opening from to include a language tag such astext (or md) so the block becomes text followed by the file
    path and closing ``` to satisfy the linter (locate the fenced block around the
    "docs/qa/YYYY-MM-DD-NNN-report.md" line).

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `40fa76c2-d818-48a8-87b1-0891b19a18a1`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between aaecc6613a8d2552eb0ec45080018e572b0efd4c and 5d7458f30e9fd04387d979c26fc9763eebe17693.

</details>

<details>
<summary>📒 Files selected for processing (1)</summary>

* `skills/comfy-qa/SKILL.md`

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
.github/workflows/qa-claude.yaml (1)

24-29: Consider reducing permissions to least-privilege.

Several permissions appear unused in this workflow:

  • id-token: write — typically for OIDC authentication with cloud providers, not used here
  • issues: write — no issue operations are performed
  • packages: read — no package registry reads

Removing unused permissions reduces the blast radius if the workflow or its dependencies are compromised.

♻️ Suggested permission reduction
     permissions:
       contents: write
       pull-requests: write
-      issues: write
-      id-token: write
-      packages: read
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/qa-claude.yaml around lines 24 - 29, The workflow
currently grants overly broad permissions; remove the unused permission entries
(id-token: write, issues: write, packages: read) and restrict the permissions
block to only what's needed (e.g., keep contents: write and pull-requests: write
or set unused items to none) so the permissions section only lists required
keys; update the permissions block in the file by deleting the id-token, issues,
and packages lines (or setting them to 'none') to enforce least-privilege.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/qa-claude.yaml:
- Around line 81-82: The step that runs `gh pr comment ${{
github.event.pull_request.number }} --body "..."` can fail when the workflow is
started via workflow_dispatch because `github.event.pull_request.number` is
empty; update the workflow to guard that step by checking for a PR context
(e.g., use a conditional like `if: github.event.pull_request` or `if:
github.event_name == 'pull_request'`) so the `gh pr comment` command only runs
when `github.event.pull_request.number` exists, and reference the `gh pr
comment` invocation and `github.event.pull_request.number` in your change.
- Line 59: Replace the floating tag "uses: anthropics/claude-code-action@v1"
with the repository-pinned SHA used elsewhere
(ff34ce0ff04a470bd3fa56c1ef391c8f1c19f8e9, annotated as v1.0.38) so the
third-party action is SHA-pinned; update the "uses:
anthropics/claude-code-action@v1" line accordingly in the workflow to reference
that exact commit SHA.
- Line 89: Replace the mutable dist-tag in the Playwright MCP command argument
so CI is deterministic: change the string that contains "@playwright/mcp@latest"
(appearing inside the "--mcp-config" args) to an explicit version like
"@playwright/mcp@0.0.68" (or whatever tested version you want to pin) so the
Playwright MCP dependency is fixed for reproducible builds.

---

Nitpick comments:
In @.github/workflows/qa-claude.yaml:
- Around line 24-29: The workflow currently grants overly broad permissions;
remove the unused permission entries (id-token: write, issues: write, packages:
read) and restrict the permissions block to only what's needed (e.g., keep
contents: write and pull-requests: write or set unused items to none) so the
permissions section only lists required keys; update the permissions block in
the file by deleting the id-token, issues, and packages lines (or setting them
to 'none') to enforce least-privilege.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9fb7fedc-e41f-4db8-904f-0230bfac1051

📥 Commits

Reviewing files that changed from the base of the PR and between 5d7458f and 88a675e.

📒 Files selected for processing (1)
  • .github/workflows/qa-claude.yaml

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
.github/workflows/qa-claude.yaml (2)

66-66: ⚠️ Potential issue | 🟠 Major

Pin Playwright MCP to a tested version (not @latest).

Using a mutable dist-tag in CI makes QA runs non-reproducible and can break unexpectedly.

Suggested fix
-                "args": ["@playwright/mcp@latest", "--headless"]
+                "args": ["@playwright/mcp@<tested-version>", "--headless"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/qa-claude.yaml at line 66, Replace the mutable dist-tag in
the Playwright MCP args by pinning the package to a specific, tested semver
(e.g., replace the string "@playwright/mcp@latest" in the workflow args array
with a fixed version like "@playwright/mcp@X.Y.Z"); update the version to the
exact release you validated in CI and commit the change so QA runs are
reproducible and won't float to unexpected releases.

80-80: ⚠️ Potential issue | 🟡 Minor

Guard PR-comment instructions for workflow_dispatch runs without PR context.

When manually dispatched, github.event.pull_request.number is empty, so gh pr comment ${PR_NUM} becomes invalid. Add an explicit “skip if no PR number” rule in the prompt.

Suggested fix
-          7. Post a summary comment on this PR using:
-             gh pr comment ${PR_NUM} --body '<your summary>'
+          7. If PR number is present, post a summary comment:
+             gh pr comment ${PR_NUM} --body '<your summary>'
+             If PR number is empty (workflow_dispatch without PR), skip this step.

Also applies to: 105-107

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/qa-claude.yaml at line 80, The workflow sets PR_NUM="${{
github.event.pull_request.number }}" and later runs gh pr comment ${PR_NUM}
which fails for workflow_dispatch (no pull_request context); update the workflow
to guard/comment when PR_NUM is empty—either set PR_NUM conditionally (e.g.,
only assign if github.event.pull_request exists) or add an explicit check before
invoking gh pr comment to skip commenting when PR_NUM is empty; apply the same
guard around the other gh pr comment usages referenced (the blocks around lines
with the other PR comment invocations, e.g., the second usage at the 105-107
area) so no comment is attempted without a valid PR number.
🧹 Nitpick comments (1)
.github/workflows/qa-claude.yaml (1)

57-57: Pin @anthropic-ai/claude-code to an explicit version.

Global install without a version introduces CI drift. Use a fixed tested version for deterministic workflow behavior.

Suggested fix
-      - name: Install Claude Code
-        run: npm install -g `@anthropic-ai/claude-code`
+      - name: Install Claude Code
+        run: npm install -g `@anthropic-ai/claude-code`@<tested-version>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/qa-claude.yaml at line 57, The workflow step currently
installs `@anthropic-ai/claude-code` without a version which allows CI drift;
update the step that runs "npm install -g `@anthropic-ai/claude-code`" to pin to a
fixed, tested version (for example replace with
`@anthropic-ai/claude-code`@<version>) or reference a workflow variable (e.g.,
CLAUDE_CLI_VERSION) and use that variable in the install command so the CI
installs a deterministic release; ensure you update any workflow environment or
defaults to define the chosen version.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/qa-claude.yaml:
- Around line 73-82: The workflow is interpolating untrusted expressions
(github.head_ref/github.ref_name) directly into the run shell causing
command-injection risk; move BRANCH, PR_NUM and SHA into the step env: map
BRANCH to "${{ github.head_ref || github.ref_name }}", PR_NUM to "${{
github.event.pull_request.number }}", and SHA to "${{ github.sha }}" and then
reference them inside the run block only via the safe environment variables
BRANCH, PR_NUM, and SHA (do not use expression interpolation inside the run
string).

---

Duplicate comments:
In @.github/workflows/qa-claude.yaml:
- Line 66: Replace the mutable dist-tag in the Playwright MCP args by pinning
the package to a specific, tested semver (e.g., replace the string
"@playwright/mcp@latest" in the workflow args array with a fixed version like
"@playwright/mcp@X.Y.Z"); update the version to the exact release you validated
in CI and commit the change so QA runs are reproducible and won't float to
unexpected releases.
- Line 80: The workflow sets PR_NUM="${{ github.event.pull_request.number }}"
and later runs gh pr comment ${PR_NUM} which fails for workflow_dispatch (no
pull_request context); update the workflow to guard/comment when PR_NUM is
empty—either set PR_NUM conditionally (e.g., only assign if
github.event.pull_request exists) or add an explicit check before invoking gh pr
comment to skip commenting when PR_NUM is empty; apply the same guard around the
other gh pr comment usages referenced (the blocks around lines with the other PR
comment invocations, e.g., the second usage at the 105-107 area) so no comment
is attempted without a valid PR number.

---

Nitpick comments:
In @.github/workflows/qa-claude.yaml:
- Line 57: The workflow step currently installs `@anthropic-ai/claude-code`
without a version which allows CI drift; update the step that runs "npm install
-g `@anthropic-ai/claude-code`" to pin to a fixed, tested version (for example
replace with `@anthropic-ai/claude-code`@<version>) or reference a workflow
variable (e.g., CLAUDE_CLI_VERSION) and use that variable in the install command
so the CI installs a deterministic release; ensure you update any workflow
environment or defaults to define the chosen version.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6f48ad6c-c830-415a-a740-e750c95c1531

📥 Commits

Reviewing files that changed from the base of the PR and between 88a675e and aee988a.

📒 Files selected for processing (1)
  • .github/workflows/qa-claude.yaml

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
.github/workflows/qa-claude.yaml (1)

72-85: ⚠️ Potential issue | 🔴 Critical

Move untrusted GitHub context values out of inline run interpolation (and guard non-PR runs).

Line 82 and Line 83 interpolate PR-controlled context directly inside a shell run block. That is a script-injection risk, and it also leaves Line 94 invalid when workflow_dispatch has no PR number.

🔧 Proposed fix
       - name: Write QA prompt
+        env:
+          BRANCH: ${{ github.head_ref || github.ref_name }}
+          PR_NUM: ${{ github.event.pull_request.number || 'N/A' }}
+          SHA: ${{ github.sha }}
         run: |
           cat > /tmp/qa-prompt.txt <<PROMPT
           You are running an automated QA pass on the ComfyUI frontend.
@@
           Environment details:
           - CI=true
           - Server URL: http://127.0.0.1:8188
-          - Branch: ${{ github.head_ref || github.ref_name }}
-          - PR: #${{ github.event.pull_request.number }}
-          - Commit: ${{ github.sha }}
+          - Branch: ${BRANCH}
+          - PR: #${PR_NUM}
+          - Commit: ${SHA}
@@
-          7. Post a summary comment on this PR using:
-             gh pr comment ${{ github.event.pull_request.number }} --body '<your summary>'
+          7. If PR_NUM is not N/A, post a summary comment using:
+             gh pr comment ${PR_NUM} --body '<your summary>'
+             Skip this step for workflow_dispatch runs without PR context.

Also applies to: 93-95

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/qa-claude.yaml around lines 72 - 85, The inline shell step
named "Write QA prompt" currently interpolates untrusted GitHub context values
(github.head_ref, github.ref_name, github.event.pull_request.number, github.sha)
directly into the run block; move those context values into step-level env
entries (or set them via a previous step output) and reference the env variables
inside the run to avoid shell injection, and add a guard that handles non-PR
runs (e.g., check if the PR number env is present or use an if: condition like
github.event_name == 'pull_request' or provide a safe default) so the prompt
generation never fails when workflow_dispatch has no PR number. Ensure you
update the step that writes /tmp/qa-prompt.txt (the "Write QA prompt" step) to
use the new env vars (PR_NUMBER, BRANCH_NAME, COMMIT_SHA, SERVER_URL) and handle
missing PR_NUMBER safely.
🧹 Nitpick comments (1)
.github/workflows/qa-claude.yaml (1)

56-57: Pin Claude Code CLI version for deterministic CI behavior.

npm install -g @anthropic-ai/claude-code`` pulls whichever version is current at runtime. Pinning avoids drift between runs.

🔧 Proposed fix
-      - name: Install Claude Code
-        run: npm install -g `@anthropic-ai/claude-code`
+      - name: Install Claude Code
+        run: npm install -g `@anthropic-ai/claude-code`@<tested-version>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/qa-claude.yaml around lines 56 - 57, Pin the Claude Code
CLI in the GitHub Actions step to avoid drifting versions by replacing the
unpinned install command in the "Install Claude Code" step (the line containing
npm install -g `@anthropic-ai/claude-code`) with a pinned version, e.g. npm
install -g `@anthropic-ai/claude-code`@<version> (or a caret/tilde range like
@...@^1.2.3 if you prefer controlled updates); ensure you pick and document the
desired version string and update the workflow step accordingly so CI installs
that exact CLI release.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In @.github/workflows/qa-claude.yaml:
- Around line 72-85: The inline shell step named "Write QA prompt" currently
interpolates untrusted GitHub context values (github.head_ref, github.ref_name,
github.event.pull_request.number, github.sha) directly into the run block; move
those context values into step-level env entries (or set them via a previous
step output) and reference the env variables inside the run to avoid shell
injection, and add a guard that handles non-PR runs (e.g., check if the PR
number env is present or use an if: condition like github.event_name ==
'pull_request' or provide a safe default) so the prompt generation never fails
when workflow_dispatch has no PR number. Ensure you update the step that writes
/tmp/qa-prompt.txt (the "Write QA prompt" step) to use the new env vars
(PR_NUMBER, BRANCH_NAME, COMMIT_SHA, SERVER_URL) and handle missing PR_NUMBER
safely.

---

Nitpick comments:
In @.github/workflows/qa-claude.yaml:
- Around line 56-57: Pin the Claude Code CLI in the GitHub Actions step to avoid
drifting versions by replacing the unpinned install command in the "Install
Claude Code" step (the line containing npm install -g `@anthropic-ai/claude-code`)
with a pinned version, e.g. npm install -g `@anthropic-ai/claude-code`@<version>
(or a caret/tilde range like @...@^1.2.3 if you prefer controlled updates);
ensure you pick and document the desired version string and update the workflow
step accordingly so CI installs that exact CLI release.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4bf3ea08-1679-4438-871f-0c3b269250f3

📥 Commits

Reviewing files that changed from the base of the PR and between aee988a and c088765.

📒 Files selected for processing (1)
  • .github/workflows/qa-claude.yaml

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 5, 2026

QA Report — 2026-03-05

Automated frontend QA pass on branch sno-skills (commit dbd8462).

Environment: CI · ComfyUI 0.15.1 · Frontend v1.41.12 · CPU-only · Linux

Results

Category Pass Fail Skip Total
Routes & Load 3 0 1 4
Canvas 6 0 8 14
Node Operations 2 0 8 10
Sidebar 6 0 1 7
Topbar 5 1 2 8
Settings 6 1 0 7
Bottom Panel 3 0 1 4
Execution 2 0 3 5
File Operations 2 0 3 5
Advanced 3 0 2 5
Error Handling 2 0 2 4
Responsive 2 0 1 3
Total 42 2 32 76

Issues Found

1. Missing i18n translation — menuLabels.Share (Minor)
The "Share" menu item in the Workflow Actions dropdown and tab right-click context menu displays the raw i18n key menuLabels.Share instead of translated text. The item is currently disabled so it is cosmetic only.
Screenshot: docs/qa/screenshots/07-workflow-actions-menu-i18n-issue.png

2. Settings search does not filter (Minor)
Typing in the Settings search box does not visibly reduce the list of settings shown. The left nav panel was initially hidden; opening it did not resolve the issue. May be a scroll/display bug or require investigation.

Notable Observations

  • State persistence confirmed: page reload restored both workflow tabs correctly
  • Server logs show KeyError: 'Unknown user: default' (backend issue, handled gracefully by frontend redirect)
  • FPS counter showed 30-60 FPS throughout, good rendering performance in CI
  • 13 workflow templates visible in template chooser

Full report: docs/qa/2026-03-05-001-report.md

Copy link
Copy Markdown
Contributor

@christian-byrne christian-byrne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried it out? How does it work? Can you share an amp thread URL so I can see?

@snomiao
Copy link
Copy Markdown
Member Author

snomiao commented Mar 6, 2026

Have you tried it out? How does it work? Can you share an amp thread URL so I can see?

still drafting by claude code, let me try amp soon so i can share a thread URL with you : D

Currently my roadmap is:

  • All OS support [Github Action CI, linux/mac/windows] -- tested only linux now
  • Works on all agents that supports "skills+MCP" combination (by npx skills add)
  • Use real browser, [by playwright, user's own chrome, or any browser]
  • Triggerable by @comfy-pr-bot on specific problem/PR/issues, instead of full-qa test (which is demonstrated by this PR)
  • Able to Capture test Videos
  • Optional upload generated reports to cache store (likely store it online only for 7 days -- 14days)
    • e.g. for GH Actions CI, it will be github-artifacts (insteadof currently)
    • and if running in GCP (the @comfy-pr-bot 's runnner) it is designed to

@christian-byrne

@snomiao snomiao requested a review from christian-byrne March 6, 2026 08:38
@snomiao
Copy link
Copy Markdown
Member Author

snomiao commented Mar 7, 2026

QA Artifacts — Video Recording Test

Run: https://github.com/Comfy-Org/ComfyUI_frontend/actions/runs/22789598253

Artifact Details
Video 6.4MB screen recording of full QA session (mp4)
Screenshots 20+ screenshots of each test area
Report Full QA report markdown

Download all artifacts (available for 14 days)

@snomiao snomiao added the qa-run Trigger QA workflow label Mar 7, 2026
@snomiao
Copy link
Copy Markdown
Member Author

snomiao commented Mar 7, 2026

@christian-byrne — I haven't run it end-to-end yet from a public thread, but the QA workflow has been tested in CI (see the runs on this branch). The comfy-qa skill drives Claude CLI + Playwright MCP to navigate all routes, test canvas/node ops/sidebar/settings, then commits a markdown report to docs/qa/. The CI workflow runs it on Linux/macOS/Windows and uploads video artifacts. I can trigger a full local run and share output if helpful.

@github-actions github-actions bot force-pushed the sno-skills branch 2 times, most recently from 78973e8 to 0bb7d43 Compare March 7, 2026 04:36
@socket-security
Copy link
Copy Markdown

socket-security bot commented Mar 8, 2026

All alerts resolved. Learn more about Socket for GitHub.

This PR previously contained dependency changes with security issues that have been resolved, removed, or ignored.

View full report

github-actions bot pushed a commit that referenced this pull request Mar 8, 2026
Automated QA report for PR #9430 on branch sno-skills covering all
frontend routes and interactive features.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 8, 2026

QA Artifacts — Multi-OS

Run: https://github.com/Comfy-Org/ComfyUI_frontend/actions/runs/22816497018

OS Artifact Recording
Linux qa-report-Linux-22816497018 Xvfb + x11grab
macOS qa-report-macOS-22816497018 avfoundation
Windows qa-report-Windows-22816497018 gdigrab

Download artifacts (14 days)

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 8, 2026

QA Artifacts — Multi-OS

Run: https://github.com/Comfy-Org/ComfyUI_frontend/actions/runs/22817040092

OS Artifact Recording
Linux qa-report-Linux-22817040092 Xvfb + x11grab
macOS qa-report-macOS-22817040092 avfoundation
Windows qa-report-Windows-22817040092 gdigrab

Download artifacts (14 days)

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 8, 2026

QA Artifacts — Multi-OS

Run: https://github.com/Comfy-Org/ComfyUI_frontend/actions/runs/22818315023

OS Artifact Recording
Linux qa-report-Linux-22818315023 Xvfb + x11grab
macOS qa-report-macOS-22818315023 avfoundation
Windows qa-report-Windows-22818315023 gdigrab

Download artifacts (14 days)

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 8, 2026

QA 🔍 Focused

QA Badge

Linux QA

Run: https://github.com/Comfy-Org/ComfyUI_frontend/actions/runs/23905050701 · Download artifacts · All videos
Commits: main d946694 · PR 04bf4cb

Video Review

linux QA Video Report

AI Review

Summary

This PR introduces a robust automated QA pipeline and Claude Code skills for issue reproduction and regression testing. The PR branch includes scripts for test generation, video recording, and CI integration.

The provided videos demonstrate the automated QA system (using the newly added qa-record.ts infrastructure) executing a smoke test on both the BEFORE (main) and AFTER (PR) versions of the application.

  • The test covers: User login, opening/browsing the "Templates" modal, and navigating the main "File" menu.
  • Comparison: The behavior and UI are identical in both videos. This confirms that the new QA infrastructure is compatible with the existing frontend and that no regressions were introduced to these core functional areas.
  • Bug Fixes: While the PR description mentions fixing several specific issues (e.g., [Bug]: Sampler preview disappears when switching tabs or applications #10688 regarding preview images on tab switch), the provided videos do not demonstrate those specific scenarios (e.g., no tab switching occurs). Therefore, the individual bug fixes cannot be verified from these videos alone.

Behavior Changes

Behavior Before (main) After (PR) Verdict
Login Flow Successfully logs in with user "qa-ci". Successfully logs in with user "qa-ci". No Change
Template Browser Opens modal; preview images and videos for "Getting Started" templates load correctly. Identical behavior; templates load and display correctly. No Change
File Menu Displays options: New, Open, Save, Save As, Export, Export (API). Identical menu structure and item availability. No Change
Nodes 2.0 BETA Badge Visible in the main menu with blue styling. Visible with identical styling. No Change
Automated Recording Automated cursor movement and sequence execution functional. Automated cursor movement and sequence execution functional. No Change

Timeline Comparison

Time Type Severity Before (main) After (PR)
0:00-0:03 State None Login screen presented and "Next" clicked. Identical login sequence.
0:05-0:08 Visual None Templates modal opens; images and video thumbnails for "1.2 Starter" load/play. Identical loading and playback behavior.
0:12-0:14 Menu None Main "File" menu navigated via automation. Identical menu navigation and layout.

Confirmed Issues

No issues confirmed. The PR branch behaves identically to the main branch in the tested paths.


Possible Issues (Needs Human Verification)

Overall Risk

Low. The PR primarily adds developer tooling (scripts, CI workflows, and Claude skills). The small changes to frontend components (LazyImage.vue, dialogStore.ts) do not appear to impact core stability or the appearance of the template browser/menus. The consistency between the BEFORE and AFTER videos suggests high stability for the new infrastructure.

Verdict

{"verdict": "INCONCLUSIVE", "risk": "low", "confidence": "high"}

@snomiao snomiao changed the title feat: add skills/ folder with comfy-qa skill feat: add Claude Code skills and automated QA CI pipeline Mar 14, 2026
@snomiao snomiao marked this pull request as ready for review March 16, 2026 02:30
@snomiao snomiao requested a review from a team as a code owner March 16, 2026 02:30
Copilot AI review requested due to automatic review settings March 16, 2026 02:30
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Mar 16, 2026
github-actions bot added a commit that referenced this pull request Mar 23, 2026
coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 24, 2026
coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 24, 2026
coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 24, 2026
coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 24, 2026
coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 24, 2026
dante01yoon
dante01yoon previously approved these changes Mar 27, 2026
@snomiao
Copy link
Copy Markdown
Member Author

snomiao commented Mar 27, 2026

@dante01yoon Thank you for review! This branch still iterating fast and may need more time to be mature : D

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 26 changed files in this pull request and generated 5 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment on lines +734 to +757
HAS_FILES=false

# Check for before files (flat or in subdirectory)
if [ -d "qa-artifacts/before" ] && find qa-artifacts/before -name '*.webm' -o -name '*.png' 2>/dev/null | grep -q .; then
HAS_FILES=true
fi
# Check for after files
if [ -d "qa-artifacts/after" ] && find qa-artifacts/after -name '*.webm' -o -name '*.png' 2>/dev/null | grep -q .; then
HAS_FILES=true
fi

if [ "$HAS_FILES" = true ]; then
mkdir -p "$REPORT_DIR"
# Copy all before files (handles both flat and nested layouts)
find qa-artifacts/before -type f 2>/dev/null | while read f; do
cp "$f" "$REPORT_DIR/" 2>/dev/null || true
done
# Copy all after files (overwrites duplicates with after versions)
find qa-artifacts/after -type f 2>/dev/null | while read f; do
cp "$f" "$REPORT_DIR/" 2>/dev/null || true
done
echo "Merged $os artifacts into $REPORT_DIR"
ls -la "$REPORT_DIR/" | head -20
break # Only create one report dir (multi-OS not yet supported in parallel mode)
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The per-OS merge loop isn’t actually OS-scoped: the find qa-artifacts/{before,after} checks consider all downloaded artifacts, so the first OS in the loop will always be treated as having files. This prevents producing separate qa-report-{OS}-{run_id} directories and breaks qa-full (3-OS) output expectations.

Suggested change
HAS_FILES=false
# Check for before files (flat or in subdirectory)
if [ -d "qa-artifacts/before" ] && find qa-artifacts/before -name '*.webm' -o -name '*.png' 2>/dev/null | grep -q .; then
HAS_FILES=true
fi
# Check for after files
if [ -d "qa-artifacts/after" ] && find qa-artifacts/after -name '*.webm' -o -name '*.png' 2>/dev/null | grep -q .; then
HAS_FILES=true
fi
if [ "$HAS_FILES" = true ]; then
mkdir -p "$REPORT_DIR"
# Copy all before files (handles both flat and nested layouts)
find qa-artifacts/before -type f 2>/dev/null | while read f; do
cp "$f" "$REPORT_DIR/" 2>/dev/null || true
done
# Copy all after files (overwrites duplicates with after versions)
find qa-artifacts/after -type f 2>/dev/null | while read f; do
cp "$f" "$REPORT_DIR/" 2>/dev/null || true
done
echo "Merged $os artifacts into $REPORT_DIR"
ls -la "$REPORT_DIR/" | head -20
break # Only create one report dir (multi-OS not yet supported in parallel mode)
BEFORE_DIR="qa-artifacts/before/${os}"
AFTER_DIR="qa-artifacts/after/${os}"
HAS_FILES=false
# Check for before files for this OS (flat or in subdirectory)
if [ -d "$BEFORE_DIR" ] && find "$BEFORE_DIR" -type f \( -name '*.webm' -o -name '*.png' \) 2>/dev/null | grep -q .; then
HAS_FILES=true
fi
# Check for after files for this OS
if [ -d "$AFTER_DIR" ] && find "$AFTER_DIR" -type f \( -name '*.webm' -o -name '*.png' \) 2>/dev/null | grep -q .; then
HAS_FILES=true
fi
if [ "$HAS_FILES" = true ]; then
mkdir -p "$REPORT_DIR"
# Copy all before files for this OS (handles both flat and nested layouts)
if [ -d "$BEFORE_DIR" ]; then
find "$BEFORE_DIR" -type f 2>/dev/null | while read f; do
cp "$f" "$REPORT_DIR/" 2>/dev/null || true
done
fi
# Copy all after files for this OS (overwrites duplicates with after versions)
if [ -d "$AFTER_DIR" ]; then
find "$AFTER_DIR" -type f 2>/dev/null | while read f; do
cp "$f" "$REPORT_DIR/" 2>/dev/null || true
done
fi
echo "Merged $os artifacts into $REPORT_DIR"
ls -la "$REPORT_DIR/" | head -20
else
echo "No artifacts found for $os, skipping"

Copilot uses AI. Check for mistakes.
Comment on lines +734 to +757
HAS_FILES=false

# Check for before files (flat or in subdirectory)
if [ -d "qa-artifacts/before" ] && find qa-artifacts/before -name '*.webm' -o -name '*.png' 2>/dev/null | grep -q .; then
HAS_FILES=true
fi
# Check for after files
if [ -d "qa-artifacts/after" ] && find qa-artifacts/after -name '*.webm' -o -name '*.png' 2>/dev/null | grep -q .; then
HAS_FILES=true
fi

if [ "$HAS_FILES" = true ]; then
mkdir -p "$REPORT_DIR"
# Copy all before files (handles both flat and nested layouts)
find qa-artifacts/before -type f 2>/dev/null | while read f; do
cp "$f" "$REPORT_DIR/" 2>/dev/null || true
done
# Copy all after files (overwrites duplicates with after versions)
find qa-artifacts/after -type f 2>/dev/null | while read f; do
cp "$f" "$REPORT_DIR/" 2>/dev/null || true
done
echo "Merged $os artifacts into $REPORT_DIR"
ls -la "$REPORT_DIR/" | head -20
break # Only create one report dir (multi-OS not yet supported in parallel mode)
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This merge step copies every file from all OS artifacts into a single report directory (and then breaks). In qa-full, files like qa-session.webm from different runners will overwrite each other, so you’ll lose 2/3 of the videos. Instead, map each downloaded artifact subdir (e.g. qa-after-${os}-${run_id} / qa-before-${os}-${run_id}) into its matching qa-report-${os}-${run_id} directory without flattening.

Suggested change
HAS_FILES=false
# Check for before files (flat or in subdirectory)
if [ -d "qa-artifacts/before" ] && find qa-artifacts/before -name '*.webm' -o -name '*.png' 2>/dev/null | grep -q .; then
HAS_FILES=true
fi
# Check for after files
if [ -d "qa-artifacts/after" ] && find qa-artifacts/after -name '*.webm' -o -name '*.png' 2>/dev/null | grep -q .; then
HAS_FILES=true
fi
if [ "$HAS_FILES" = true ]; then
mkdir -p "$REPORT_DIR"
# Copy all before files (handles both flat and nested layouts)
find qa-artifacts/before -type f 2>/dev/null | while read f; do
cp "$f" "$REPORT_DIR/" 2>/dev/null || true
done
# Copy all after files (overwrites duplicates with after versions)
find qa-artifacts/after -type f 2>/dev/null | while read f; do
cp "$f" "$REPORT_DIR/" 2>/dev/null || true
done
echo "Merged $os artifacts into $REPORT_DIR"
ls -la "$REPORT_DIR/" | head -20
break # Only create one report dir (multi-OS not yet supported in parallel mode)
BEFORE_DIR="qa-artifacts/before/qa-before-${os}-${{ github.run_id }}"
AFTER_DIR="qa-artifacts/after/qa-after-${os}-${{ github.run_id }}"
HAS_FILES=false
# Check for before files for this OS
if [ -d "$BEFORE_DIR" ] && find "$BEFORE_DIR" -type f \( -name '*.webm' -o -name '*.png' \) 2>/dev/null | grep -q .; then
HAS_FILES=true
fi
# Check for after files for this OS
if [ -d "$AFTER_DIR" ] && find "$AFTER_DIR" -type f \( -name '*.webm' -o -name '*.png' \) 2>/dev/null | grep -q .; then
HAS_FILES=true
fi
if [ "$HAS_FILES" = true ]; then
mkdir -p "$REPORT_DIR"
# Map the per-OS BEFORE artifact subdir into the matching report directory
if [ -d "$BEFORE_DIR" ]; then
echo "Copying BEFORE artifacts for $os from $BEFORE_DIR to $REPORT_DIR"
cp -R "$BEFORE_DIR" "$REPORT_DIR/" 2>/dev/null || true
fi
# Map the per-OS AFTER artifact subdir into the matching report directory
if [ -d "$AFTER_DIR" ]; then
echo "Copying AFTER artifacts for $os from $AFTER_DIR to $REPORT_DIR"
cp -R "$AFTER_DIR" "$REPORT_DIR/" 2>/dev/null || true
fi
echo "Merged $os artifacts into $REPORT_DIR"
ls -la "$REPORT_DIR/" | head -20
else
echo "No artifacts found for $os in $BEFORE_DIR or $AFTER_DIR"

Copilot uses AI. Check for mistakes.
Comment on lines +935 to +936
# Push to {branch}-add-qa-test
TEST_BRANCH="${PR_BRANCH}-add-qa-test"
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TEST_BRANCH is derived directly from github.head_ref || github.ref_name and then used in git checkout/git push. Branch names can include characters that are legal in Git but awkward/unsafe in scripts (spaces, ~, ^, : etc.), and this also risks collisions. Prefer generating a sanitized deterministic branch name like qa-tests/pr-${PR_NUM} (or sanitize PR_BRANCH to an allowlist) before using it in git commands.

Suggested change
# Push to {branch}-add-qa-test
TEST_BRANCH="${PR_BRANCH}-add-qa-test"
# Push to deterministic QA test branch
TEST_BRANCH="qa-tests/pr-${PR_NUM}"

Copilot uses AI. Check for mistakes.
Comment on lines +79 to +80
if [ "$HAS_BEFORE" = "1" ]; then
CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison><div class=comp-panel><div class=comp-label>Before <span class=comp-tag>main</span></div><div class=video-wrap><video controls muted preload=auto><source src=qa-before-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-before-${os}.mp4 download>${DL_ICON}Before</a></div></div><div class=comp-panel><div class=comp-label>After <span class=comp-tag>PR</span></div><div class=video-wrap><video controls muted preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}.mp4 download>${DL_ICON}After</a></div></div></div>${REPORT_HTML}</div>"
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When HAS_BEFORE is true, the card always references qa-${os}.mp4 for the After video, but HAS_AFTER is also set when only qa-${os}-pass1.mp4 exists. In multi-pass runs that include a before video, this produces broken video/src and download links. Consider selecting the actual after video path (qa-${os}.mp4 if present, else qa-${os}-pass1.mp4) or copying/symlinking pass1 to qa-${os}.mp4 during staging.

Suggested change
if [ "$HAS_BEFORE" = "1" ]; then
CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison><div class=comp-panel><div class=comp-label>Before <span class=comp-tag>main</span></div><div class=video-wrap><video controls muted preload=auto><source src=qa-before-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-before-${os}.mp4 download>${DL_ICON}Before</a></div></div><div class=comp-panel><div class=comp-label>After <span class=comp-tag>PR</span></div><div class=video-wrap><video controls muted preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}.mp4 download>${DL_ICON}After</a></div></div></div>${REPORT_HTML}</div>"
AFTER_VIDEO_BASENAME="qa-${os}.mp4"
if [ ! -f "$DEPLOY_DIR/${AFTER_VIDEO_BASENAME}" ] && [ -f "$DEPLOY_DIR/qa-${os}-pass1.mp4" ]; then
AFTER_VIDEO_BASENAME="qa-${os}-pass1.mp4"
fi
if [ "$HAS_BEFORE" = "1" ]; then
CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison><div class=comp-panel><div class=comp-label>Before <span class=comp-tag>main</span></div><div class=video-wrap><video controls muted preload=auto><source src=qa-before-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-before-${os}.mp4 download>${DL_ICON}Before</a></div></div><div class=comp-panel><div class=comp-label>After <span class=comp-tag>PR</span></div><div class=video-wrap><video controls muted preload=auto><source src=${AFTER_VIDEO_BASENAME} type=video/mp4></video></div><div class=comp-dl><a class=dl href=${AFTER_VIDEO_BASENAME} download>${DL_ICON}After</a></div></div></div>${REPORT_HTML}</div>"

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +21
on:
# TODO: remove push trigger before merge
push:
branches: [sno-skills, sno-qa-*]
pull_request:
types: [labeled]
branches: [main]
issues:
types: [labeled]
workflow_dispatch:
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow is configured to run on label events but has no guard for forked PRs. Since these jobs rely on repository secrets (Gemini/Cloudflare/etc.), forked PR runs will fail noisily (and still consume runner time). Add an early skip in resolve-matrix (or job-level if:) when github.event.pull_request.head.repo.fork == true (and similarly for issue events if desired).

Copilot uses AI. Check for mistakes.
@snomiao snomiao added the claude-review Add to trigger a PR code review from Claude Code label Mar 27, 2026
@snomiao snomiao marked this pull request as draft March 28, 2026 11:07
@snomiao snomiao force-pushed the sno-skills branch 2 times, most recently from 4f2969d to a3dd897 Compare March 31, 2026 09:48
@snomiao snomiao requested a review from Copilot March 31, 2026 11:04
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 27 changed files in this pull request and generated 4 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment on lines +36 to +48
export type DialogComponentProps = CustomDialogComponentProps &
Record<string, unknown>

export interface DialogInstance<
H extends Component = Component,
B extends Component = Component,
F extends Component = Component
> {
export interface DialogInstance {
key: string
visible: boolean
title?: string
headerComponent?: H
headerProps?: ComponentAttrs<H>
component: B
contentProps: ComponentAttrs<B>
footerComponent?: F
footerProps?: ComponentAttrs<F>
headerComponent?: Component
headerProps?: Record<string, unknown>
component: Component
contentProps: Record<string, unknown>
footerComponent?: Component
footerProps?: Record<string, unknown>
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In DialogComponentProps/DialogInstance, the change from ComponentAttrs<typeof GlobalDialog> + generic component props to Record<string, unknown> removes most compile-time validation for dialog props (e.g., onMaximize, onAfterHide, and component-specific props). This makes it easy to pass incorrect props without TypeScript catching it. If the goal is to avoid importing the SFC type here (e.g., to break a circular dependency), consider exporting the dialog prop types from a non-SFC module (or re-exporting GlobalDialog prop types) and keeping DialogInstance generic so consumers still get typed contentProps/headerProps/footerProps.

Copilot uses AI. Check for mistakes.
if: >-
always() &&
(needs.qa-after.result == 'success' || needs.qa-before.result == 'success') &&
(github.event.pull_request.number || github.event_name == 'push' || github.event_name == 'workflow_dispatch')
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

report is currently gated to only run for PR, push, or workflow_dispatch events. When the workflow is triggered via issues: labeled (e.g. qa-issue), github.event_name is issues and github.event.pull_request.number is empty, so the report/deploy/comment phase will never run even if qa-before succeeds. Update the if: to also allow github.event_name == 'issues' (and ensure the comment-posting logic targets issues correctly).

Suggested change
(github.event.pull_request.number || github.event_name == 'push' || github.event_name == 'workflow_dispatch')
(github.event.pull_request.number || github.event_name == 'push' || github.event_name == 'workflow_dispatch' || github.event_name == 'issues')

Copilot uses AI. Check for mistakes.
Comment on lines +52 to +59
env:
EXTRA_SERVER_PARAMS: ${{ inputs.extra_server_params }}
run: |
python main.py --cpu --multi-user --front-end-root ../dist ${{ inputs.extra_server_params }} &
wait-for-it --service 127.0.0.1:8188 -t 600
python main.py --cpu --multi-user --front-end-root ../dist $EXTRA_SERVER_PARAMS &
for i in $(seq 1 300); do
curl -sf http://127.0.0.1:8188/api/system_stats >/dev/null 2>&1 && echo "Server ready" && exit 0
sleep 2
done
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXTRA_SERVER_PARAMS is expanded unquoted in the python main.py ... $EXTRA_SERVER_PARAMS command. If the input contains spaces or shell metacharacters, this can break argument parsing or lead to unexpected shell behavior. Prefer parsing it into an array and passing it as "${EXTRA_ARGS[@]}" (or otherwise ensure robust quoting).

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,135 @@
<!DOCTYPE html><html lang=en><head><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><title>QA Session Recordings</title>
<link rel=preconnect href=https://fonts.googleapis.com><link rel=preconnect href=https://fonts.gstatic.com crossorigin><link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel=stylesheet>
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The report template pulls marked from a CDN without version pinning or SRI, and then renders markdown via marked.parse() directly into innerHTML. Even though qa-deploy-pages.sh escapes </>, markdown links like javascript: URLs can still produce XSS in the deployed report site. Consider either (1) self-hosting a pinned marked build (or pinning the CDN URL to an exact version + adding integrity), and (2) sanitizing the generated HTML (e.g., DOMPurify) or disabling/validating link protocols in the marked renderer.

Suggested change
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/marked@12.0.2/marked.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/dompurify@3.1.6/dist/purify.min.js"></script>
<script>
(function () {
if (typeof marked === 'undefined' || typeof DOMPurify === 'undefined') {
return;
}
var originalParse = marked.parse;
marked.parse = function (markdown, options) {
var dirty = originalParse.call(marked, markdown, options);
return DOMPurify.sanitize(dirty);
};
})();
</script>

Copilot uses AI. Check for mistakes.
Three-phase pipeline triggered by labels (qa-changes, qa-full, qa-issue):
1. Research: Claude writes Playwright E2E tests to reproduce reported bugs
2. Reproduce: Deterministic replay with video recording
3. Report: Deploy results to Cloudflare Pages with badges

Key design decisions:
- Playwright assertions are source of truth (not AI vision)
- Agent has readFixture/readTest tools to discover project patterns
- Bug-specific assertions required (trivial assertions banned)
- Main branch dist cached by SHA to speed up before/after comparisons
- QA deps installed inline in CI (no package.json changes needed)

Verified across 48 runs (22 PRs + 26 issues) with 0 false positives.

Amp-Thread-ID: https://ampcode.com/threads/T-019d519b-004f-71ce-b970-96edd971fbe0
Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019d519b-004f-71ce-b970-96edd971fbe0
Co-authored-by: Amp <amp@ampcode.com>
@snomiao snomiao changed the title feat: add Claude Code skills and automated QA CI pipeline feat: add automated QA pipeline with E2E test-driven bug reproduction Apr 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:CI/CD claude-review Add to trigger a PR code review from Claude Code size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants