feat: add automated QA pipeline with E2E test-driven bug reproduction#9430
feat: add automated QA pipeline with E2E test-driven bug reproduction#9430
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a QA skill pointer, a new ComfyUI frontend QA skill document, and a GitHub Actions workflow that runs Playwright via an MCP-driven Claude action to produce dated markdown QA reports and optionally create draft PRs. Changes
Sequence Diagram(s)sequenceDiagram
participant Dev as Developer/Agent
participant Repo as GitHub Repo
participant CI as CI Runner
participant Claude as Claude CLI/Action
participant Browser as Playwright Browser
participant Server as ComfyUI Server
Dev->>Repo: open PR / trigger workflow_dispatch / add qa-run label
Repo->>CI: start qa job (checkout & setup)
CI->>Server: start ComfyUI on port 8188
CI->>Claude: invoke claude-code-action with MCP config + QA prompt
Claude->>Browser: drive Playwright MCP tests against Server
Browser-->>Claude: return test results & artifacts
Claude-->>CI: generate dated markdown QA report
CI->>Repo: commit/upload report to docs/qa/ and post PR comment / create draft PR
Repo-->>Dev: report available / PR comment created
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🎨 Storybook: ✅ Built — View Storybook |
🎭 Playwright: ✅ 933 passed, 0 failed · 6 flaky📊 Browser Reports
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/comfy-qa/SKILL.md`:
- Around line 208-210: Add a language identifier to the fenced code block that
contains "docs/qa/YYYY-MM-DD-NNN-report.md" so it complies with MD040; update
the block starting with ``` to ```text (i.e., change the opening fence for that
snippet to ```text) and keep the closing fence unchanged.
- Around line 217-219: The current command `ls docs/qa/ | grep "$(date
+%Y-%m-%d)" | wc -l` uses a file count and can reuse an existing NNN; replace it
with logic that lists files in docs/qa/ matching the current date pattern,
extracts the numeric prefix (the NNN part), computes max(NNN) and then uses
max+1 as the next index (with the same zero-padding used elsewhere) — locate the
code that runs or documents this command (search for the exact `ls docs/qa/ |
grep "$(date +%Y-%m-%d)" | wc -l` line) and change it to parse numeric prefixes,
take the maximum, and increment to produce a collision-safe NNN.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 443b0713-ef5e-4192-be9d-55819a1d0f94
📒 Files selected for processing (4)
.claude/skillsdocs/qa/.gitkeepskills/comfy-qa/SKILL.mdskills/writing-playwright-tests/SKILL.md
There was a problem hiding this comment.
♻️ Duplicate comments (2)
skills/comfy-qa/SKILL.md (2)
222-224:⚠️ Potential issue | 🟡 MinorMake
NNNcalculation collision-safe.Using file count can reuse an existing index if reports were deleted or numbering is sparse. Use
max(NNN)+1instead.Suggested fix
-```bash -ls docs/qa/ | grep "$(date +%Y-%m-%d)" | wc -l -``` +```bash +DATE="$(date +%Y-%m-%d)" +NEXT_INDEX="$( + ls docs/qa/ 2>/dev/null \ + | sed -n "s/^${DATE}-\([0-9][0-9][0-9]\)-report\.md$/\1/p" \ + | sort -n \ + | tail -1 \ + | awk '{printf "%03d", ($1 == "" ? 1 : $1 + 1)}' +)" +echo "$NEXT_INDEX" +```🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/comfy-qa/SKILL.md` around lines 222 - 224, Replace the fragile file-count approach with a collision-safe max-index calculation: compute DATE via DATE="$(date +%Y-%m-%d)", then derive NEXT_INDEX by listing docs/qa, extracting the three-digit NNN from filenames matching the pattern "${DATE}-NNN-report.md", sorting numerically, taking the highest value, and outputting printf "%03d" of (max+1) (handling empty result as 001). Update the snippet around the previous ls|grep|wc pipeline to use the DATE variable and the NEXT_INDEX pipeline so new reports always pick max(NNN)+1 rather than file count.
211-213:⚠️ Potential issue | 🟡 MinorAdd a language identifier to the fenced code block.
Line 211 opens a fenced block without a language, which fails markdownlint MD040.
Suggested fix
-``` +```text docs/qa/YYYY-MM-DD-NNN-report.md</details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against the current code and only fix it if needed.
In
@skills/comfy-qa/SKILL.mdaround lines 211 - 213, The fenced code block in
SKILL.md that contains "docs/qa/YYYY-MM-DD-NNN-report.md" is missing a language
identifier (causing markdownlint MD040); update the fence opening fromto include a language tag such astext (ormd) so the block becomestext
followed by the file path and closing ``` to satisfy the linter (locate the
fenced block around the "docs/qa/YYYY-MM-DD-NNN-report.md" line).</details> </blockquote></details> </blockquote></details> <details> <summary>🤖 Prompt for all review comments with AI agents</summary>Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In@skills/comfy-qa/SKILL.md:
- Around line 222-224: Replace the fragile file-count approach with a
collision-safe max-index calculation: compute DATE via DATE="$(date +%Y-%m-%d)",
then derive NEXT_INDEX by listing docs/qa, extracting the three-digit NNN from
filenames matching the pattern "${DATE}-NNN-report.md", sorting numerically,
taking the highest value, and outputting printf "%03d" of (max+1) (handling
empty result as 001). Update the snippet around the previous ls|grep|wc pipeline
to use the DATE variable and the NEXT_INDEX pipeline so new reports always pick
max(NNN)+1 rather than file count.- Around line 211-213: The fenced code block in SKILL.md that contains
"docs/qa/YYYY-MM-DD-NNN-report.md" is missing a language identifier (causing
markdownlint MD040); update the fence opening fromto include a language tag such astext (ormd) so the block becomestext followed by the file
path and closing ``` to satisfy the linter (locate the fenced block around the
"docs/qa/YYYY-MM-DD-NNN-report.md" line).</details> --- <details> <summary>ℹ️ Review info</summary> <details> <summary>⚙️ Run configuration</summary> **Configuration used**: Path: .coderabbit.yaml **Review profile**: CHILL **Plan**: Pro **Run ID**: `40fa76c2-d818-48a8-87b1-0891b19a18a1` </details> <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between aaecc6613a8d2552eb0ec45080018e572b0efd4c and 5d7458f30e9fd04387d979c26fc9763eebe17693. </details> <details> <summary>📒 Files selected for processing (1)</summary> * `skills/comfy-qa/SKILL.md` </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
.github/workflows/qa-claude.yaml (1)
24-29: Consider reducing permissions to least-privilege.Several permissions appear unused in this workflow:
id-token: write— typically for OIDC authentication with cloud providers, not used hereissues: write— no issue operations are performedpackages: read— no package registry readsRemoving unused permissions reduces the blast radius if the workflow or its dependencies are compromised.
♻️ Suggested permission reduction
permissions: contents: write pull-requests: write - issues: write - id-token: write - packages: read🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/qa-claude.yaml around lines 24 - 29, The workflow currently grants overly broad permissions; remove the unused permission entries (id-token: write, issues: write, packages: read) and restrict the permissions block to only what's needed (e.g., keep contents: write and pull-requests: write or set unused items to none) so the permissions section only lists required keys; update the permissions block in the file by deleting the id-token, issues, and packages lines (or setting them to 'none') to enforce least-privilege.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/qa-claude.yaml:
- Around line 81-82: The step that runs `gh pr comment ${{
github.event.pull_request.number }} --body "..."` can fail when the workflow is
started via workflow_dispatch because `github.event.pull_request.number` is
empty; update the workflow to guard that step by checking for a PR context
(e.g., use a conditional like `if: github.event.pull_request` or `if:
github.event_name == 'pull_request'`) so the `gh pr comment` command only runs
when `github.event.pull_request.number` exists, and reference the `gh pr
comment` invocation and `github.event.pull_request.number` in your change.
- Line 59: Replace the floating tag "uses: anthropics/claude-code-action@v1"
with the repository-pinned SHA used elsewhere
(ff34ce0ff04a470bd3fa56c1ef391c8f1c19f8e9, annotated as v1.0.38) so the
third-party action is SHA-pinned; update the "uses:
anthropics/claude-code-action@v1" line accordingly in the workflow to reference
that exact commit SHA.
- Line 89: Replace the mutable dist-tag in the Playwright MCP command argument
so CI is deterministic: change the string that contains "@playwright/mcp@latest"
(appearing inside the "--mcp-config" args) to an explicit version like
"@playwright/mcp@0.0.68" (or whatever tested version you want to pin) so the
Playwright MCP dependency is fixed for reproducible builds.
---
Nitpick comments:
In @.github/workflows/qa-claude.yaml:
- Around line 24-29: The workflow currently grants overly broad permissions;
remove the unused permission entries (id-token: write, issues: write, packages:
read) and restrict the permissions block to only what's needed (e.g., keep
contents: write and pull-requests: write or set unused items to none) so the
permissions section only lists required keys; update the permissions block in
the file by deleting the id-token, issues, and packages lines (or setting them
to 'none') to enforce least-privilege.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9fb7fedc-e41f-4db8-904f-0230bfac1051
📒 Files selected for processing (1)
.github/workflows/qa-claude.yaml
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
.github/workflows/qa-claude.yaml (2)
66-66:⚠️ Potential issue | 🟠 MajorPin Playwright MCP to a tested version (not
@latest).Using a mutable dist-tag in CI makes QA runs non-reproducible and can break unexpectedly.
Suggested fix
- "args": ["@playwright/mcp@latest", "--headless"] + "args": ["@playwright/mcp@<tested-version>", "--headless"]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/qa-claude.yaml at line 66, Replace the mutable dist-tag in the Playwright MCP args by pinning the package to a specific, tested semver (e.g., replace the string "@playwright/mcp@latest" in the workflow args array with a fixed version like "@playwright/mcp@X.Y.Z"); update the version to the exact release you validated in CI and commit the change so QA runs are reproducible and won't float to unexpected releases.
80-80:⚠️ Potential issue | 🟡 MinorGuard PR-comment instructions for
workflow_dispatchruns without PR context.When manually dispatched,
github.event.pull_request.numberis empty, sogh pr comment ${PR_NUM}becomes invalid. Add an explicit “skip if no PR number” rule in the prompt.Suggested fix
- 7. Post a summary comment on this PR using: - gh pr comment ${PR_NUM} --body '<your summary>' + 7. If PR number is present, post a summary comment: + gh pr comment ${PR_NUM} --body '<your summary>' + If PR number is empty (workflow_dispatch without PR), skip this step.Also applies to: 105-107
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/qa-claude.yaml at line 80, The workflow sets PR_NUM="${{ github.event.pull_request.number }}" and later runs gh pr comment ${PR_NUM} which fails for workflow_dispatch (no pull_request context); update the workflow to guard/comment when PR_NUM is empty—either set PR_NUM conditionally (e.g., only assign if github.event.pull_request exists) or add an explicit check before invoking gh pr comment to skip commenting when PR_NUM is empty; apply the same guard around the other gh pr comment usages referenced (the blocks around lines with the other PR comment invocations, e.g., the second usage at the 105-107 area) so no comment is attempted without a valid PR number.
🧹 Nitpick comments (1)
.github/workflows/qa-claude.yaml (1)
57-57: Pin@anthropic-ai/claude-codeto an explicit version.Global install without a version introduces CI drift. Use a fixed tested version for deterministic workflow behavior.
Suggested fix
- - name: Install Claude Code - run: npm install -g `@anthropic-ai/claude-code` + - name: Install Claude Code + run: npm install -g `@anthropic-ai/claude-code`@<tested-version>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/qa-claude.yaml at line 57, The workflow step currently installs `@anthropic-ai/claude-code` without a version which allows CI drift; update the step that runs "npm install -g `@anthropic-ai/claude-code`" to pin to a fixed, tested version (for example replace with `@anthropic-ai/claude-code`@<version>) or reference a workflow variable (e.g., CLAUDE_CLI_VERSION) and use that variable in the install command so the CI installs a deterministic release; ensure you update any workflow environment or defaults to define the chosen version.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/qa-claude.yaml:
- Around line 73-82: The workflow is interpolating untrusted expressions
(github.head_ref/github.ref_name) directly into the run shell causing
command-injection risk; move BRANCH, PR_NUM and SHA into the step env: map
BRANCH to "${{ github.head_ref || github.ref_name }}", PR_NUM to "${{
github.event.pull_request.number }}", and SHA to "${{ github.sha }}" and then
reference them inside the run block only via the safe environment variables
BRANCH, PR_NUM, and SHA (do not use expression interpolation inside the run
string).
---
Duplicate comments:
In @.github/workflows/qa-claude.yaml:
- Line 66: Replace the mutable dist-tag in the Playwright MCP args by pinning
the package to a specific, tested semver (e.g., replace the string
"@playwright/mcp@latest" in the workflow args array with a fixed version like
"@playwright/mcp@X.Y.Z"); update the version to the exact release you validated
in CI and commit the change so QA runs are reproducible and won't float to
unexpected releases.
- Line 80: The workflow sets PR_NUM="${{ github.event.pull_request.number }}"
and later runs gh pr comment ${PR_NUM} which fails for workflow_dispatch (no
pull_request context); update the workflow to guard/comment when PR_NUM is
empty—either set PR_NUM conditionally (e.g., only assign if
github.event.pull_request exists) or add an explicit check before invoking gh pr
comment to skip commenting when PR_NUM is empty; apply the same guard around the
other gh pr comment usages referenced (the blocks around lines with the other PR
comment invocations, e.g., the second usage at the 105-107 area) so no comment
is attempted without a valid PR number.
---
Nitpick comments:
In @.github/workflows/qa-claude.yaml:
- Line 57: The workflow step currently installs `@anthropic-ai/claude-code`
without a version which allows CI drift; update the step that runs "npm install
-g `@anthropic-ai/claude-code`" to pin to a fixed, tested version (for example
replace with `@anthropic-ai/claude-code`@<version>) or reference a workflow
variable (e.g., CLAUDE_CLI_VERSION) and use that variable in the install command
so the CI installs a deterministic release; ensure you update any workflow
environment or defaults to define the chosen version.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6f48ad6c-c830-415a-a740-e750c95c1531
📒 Files selected for processing (1)
.github/workflows/qa-claude.yaml
There was a problem hiding this comment.
♻️ Duplicate comments (1)
.github/workflows/qa-claude.yaml (1)
72-85:⚠️ Potential issue | 🔴 CriticalMove untrusted GitHub context values out of inline
runinterpolation (and guard non-PR runs).Line 82 and Line 83 interpolate PR-controlled context directly inside a shell
runblock. That is a script-injection risk, and it also leaves Line 94 invalid whenworkflow_dispatchhas no PR number.🔧 Proposed fix
- name: Write QA prompt + env: + BRANCH: ${{ github.head_ref || github.ref_name }} + PR_NUM: ${{ github.event.pull_request.number || 'N/A' }} + SHA: ${{ github.sha }} run: | cat > /tmp/qa-prompt.txt <<PROMPT You are running an automated QA pass on the ComfyUI frontend. @@ Environment details: - CI=true - Server URL: http://127.0.0.1:8188 - - Branch: ${{ github.head_ref || github.ref_name }} - - PR: #${{ github.event.pull_request.number }} - - Commit: ${{ github.sha }} + - Branch: ${BRANCH} + - PR: #${PR_NUM} + - Commit: ${SHA} @@ - 7. Post a summary comment on this PR using: - gh pr comment ${{ github.event.pull_request.number }} --body '<your summary>' + 7. If PR_NUM is not N/A, post a summary comment using: + gh pr comment ${PR_NUM} --body '<your summary>' + Skip this step for workflow_dispatch runs without PR context.Also applies to: 93-95
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/qa-claude.yaml around lines 72 - 85, The inline shell step named "Write QA prompt" currently interpolates untrusted GitHub context values (github.head_ref, github.ref_name, github.event.pull_request.number, github.sha) directly into the run block; move those context values into step-level env entries (or set them via a previous step output) and reference the env variables inside the run to avoid shell injection, and add a guard that handles non-PR runs (e.g., check if the PR number env is present or use an if: condition like github.event_name == 'pull_request' or provide a safe default) so the prompt generation never fails when workflow_dispatch has no PR number. Ensure you update the step that writes /tmp/qa-prompt.txt (the "Write QA prompt" step) to use the new env vars (PR_NUMBER, BRANCH_NAME, COMMIT_SHA, SERVER_URL) and handle missing PR_NUMBER safely.
🧹 Nitpick comments (1)
.github/workflows/qa-claude.yaml (1)
56-57: Pin Claude Code CLI version for deterministic CI behavior.
npm install -g@anthropic-ai/claude-code`` pulls whichever version is current at runtime. Pinning avoids drift between runs.🔧 Proposed fix
- - name: Install Claude Code - run: npm install -g `@anthropic-ai/claude-code` + - name: Install Claude Code + run: npm install -g `@anthropic-ai/claude-code`@<tested-version>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/qa-claude.yaml around lines 56 - 57, Pin the Claude Code CLI in the GitHub Actions step to avoid drifting versions by replacing the unpinned install command in the "Install Claude Code" step (the line containing npm install -g `@anthropic-ai/claude-code`) with a pinned version, e.g. npm install -g `@anthropic-ai/claude-code`@<version> (or a caret/tilde range like @...@^1.2.3 if you prefer controlled updates); ensure you pick and document the desired version string and update the workflow step accordingly so CI installs that exact CLI release.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In @.github/workflows/qa-claude.yaml:
- Around line 72-85: The inline shell step named "Write QA prompt" currently
interpolates untrusted GitHub context values (github.head_ref, github.ref_name,
github.event.pull_request.number, github.sha) directly into the run block; move
those context values into step-level env entries (or set them via a previous
step output) and reference the env variables inside the run to avoid shell
injection, and add a guard that handles non-PR runs (e.g., check if the PR
number env is present or use an if: condition like github.event_name ==
'pull_request' or provide a safe default) so the prompt generation never fails
when workflow_dispatch has no PR number. Ensure you update the step that writes
/tmp/qa-prompt.txt (the "Write QA prompt" step) to use the new env vars
(PR_NUMBER, BRANCH_NAME, COMMIT_SHA, SERVER_URL) and handle missing PR_NUMBER
safely.
---
Nitpick comments:
In @.github/workflows/qa-claude.yaml:
- Around line 56-57: Pin the Claude Code CLI in the GitHub Actions step to avoid
drifting versions by replacing the unpinned install command in the "Install
Claude Code" step (the line containing npm install -g `@anthropic-ai/claude-code`)
with a pinned version, e.g. npm install -g `@anthropic-ai/claude-code`@<version>
(or a caret/tilde range like @...@^1.2.3 if you prefer controlled updates);
ensure you pick and document the desired version string and update the workflow
step accordingly so CI installs that exact CLI release.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4bf3ea08-1679-4438-871f-0c3b269250f3
📒 Files selected for processing (1)
.github/workflows/qa-claude.yaml
QA Report — 2026-03-05Automated frontend QA pass on branch Environment: CI · ComfyUI 0.15.1 · Frontend v1.41.12 · CPU-only · Linux Results
Issues Found1. Missing i18n translation — 2. Settings search does not filter (Minor) Notable Observations
Full report: |
christian-byrne
left a comment
There was a problem hiding this comment.
Have you tried it out? How does it work? Can you share an amp thread URL so I can see?
still drafting by claude code, let me try amp soon so i can share a thread URL with you : D Currently my roadmap is:
|
QA Artifacts — Video Recording TestRun: https://github.com/Comfy-Org/ComfyUI_frontend/actions/runs/22789598253
Download all artifacts (available for 14 days) |
|
@christian-byrne — I haven't run it end-to-end yet from a public thread, but the QA workflow has been tested in CI (see the runs on this branch). The |
78973e8 to
0bb7d43
Compare
|
All alerts resolved. Learn more about Socket for GitHub. This PR previously contained dependency changes with security issues that have been resolved, removed, or ignored. |
Automated QA report for PR #9430 on branch sno-skills covering all frontend routes and interactive features. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
72c7832 to
212a6ca
Compare
QA Artifacts — Multi-OSRun: https://github.com/Comfy-Org/ComfyUI_frontend/actions/runs/22816497018
Download artifacts (14 days) |
QA Artifacts — Multi-OSRun: https://github.com/Comfy-Org/ComfyUI_frontend/actions/runs/22817040092
Download artifacts (14 days) |
QA Artifacts — Multi-OSRun: https://github.com/Comfy-Org/ComfyUI_frontend/actions/runs/22818315023
Download artifacts (14 days) |
QA 🔍 FocusedRun: https://github.com/Comfy-Org/ComfyUI_frontend/actions/runs/23905050701 · Download artifacts · All videos Video Reviewlinux QA Video Report
AI ReviewSummaryThis PR introduces a robust automated QA pipeline and Claude Code skills for issue reproduction and regression testing. The PR branch includes scripts for test generation, video recording, and CI integration. The provided videos demonstrate the automated QA system (using the newly added
Behavior Changes
Timeline Comparison
Confirmed IssuesNo issues confirmed. The PR branch behaves identically to the main branch in the tested paths. Possible Issues (Needs Human Verification)
Overall RiskLow. The PR primarily adds developer tooling (scripts, CI workflows, and Claude skills). The small changes to frontend components ( Verdict{"verdict": "INCONCLUSIVE", "risk": "low", "confidence": "high"} |
|
@dante01yoon Thank you for review! This branch still iterating fast and may need more time to be mature : D |
| HAS_FILES=false | ||
|
|
||
| # Check for before files (flat or in subdirectory) | ||
| if [ -d "qa-artifacts/before" ] && find qa-artifacts/before -name '*.webm' -o -name '*.png' 2>/dev/null | grep -q .; then | ||
| HAS_FILES=true | ||
| fi | ||
| # Check for after files | ||
| if [ -d "qa-artifacts/after" ] && find qa-artifacts/after -name '*.webm' -o -name '*.png' 2>/dev/null | grep -q .; then | ||
| HAS_FILES=true | ||
| fi | ||
|
|
||
| if [ "$HAS_FILES" = true ]; then | ||
| mkdir -p "$REPORT_DIR" | ||
| # Copy all before files (handles both flat and nested layouts) | ||
| find qa-artifacts/before -type f 2>/dev/null | while read f; do | ||
| cp "$f" "$REPORT_DIR/" 2>/dev/null || true | ||
| done | ||
| # Copy all after files (overwrites duplicates with after versions) | ||
| find qa-artifacts/after -type f 2>/dev/null | while read f; do | ||
| cp "$f" "$REPORT_DIR/" 2>/dev/null || true | ||
| done | ||
| echo "Merged $os artifacts into $REPORT_DIR" | ||
| ls -la "$REPORT_DIR/" | head -20 | ||
| break # Only create one report dir (multi-OS not yet supported in parallel mode) |
There was a problem hiding this comment.
The per-OS merge loop isn’t actually OS-scoped: the find qa-artifacts/{before,after} checks consider all downloaded artifacts, so the first OS in the loop will always be treated as having files. This prevents producing separate qa-report-{OS}-{run_id} directories and breaks qa-full (3-OS) output expectations.
| HAS_FILES=false | |
| # Check for before files (flat or in subdirectory) | |
| if [ -d "qa-artifacts/before" ] && find qa-artifacts/before -name '*.webm' -o -name '*.png' 2>/dev/null | grep -q .; then | |
| HAS_FILES=true | |
| fi | |
| # Check for after files | |
| if [ -d "qa-artifacts/after" ] && find qa-artifacts/after -name '*.webm' -o -name '*.png' 2>/dev/null | grep -q .; then | |
| HAS_FILES=true | |
| fi | |
| if [ "$HAS_FILES" = true ]; then | |
| mkdir -p "$REPORT_DIR" | |
| # Copy all before files (handles both flat and nested layouts) | |
| find qa-artifacts/before -type f 2>/dev/null | while read f; do | |
| cp "$f" "$REPORT_DIR/" 2>/dev/null || true | |
| done | |
| # Copy all after files (overwrites duplicates with after versions) | |
| find qa-artifacts/after -type f 2>/dev/null | while read f; do | |
| cp "$f" "$REPORT_DIR/" 2>/dev/null || true | |
| done | |
| echo "Merged $os artifacts into $REPORT_DIR" | |
| ls -la "$REPORT_DIR/" | head -20 | |
| break # Only create one report dir (multi-OS not yet supported in parallel mode) | |
| BEFORE_DIR="qa-artifacts/before/${os}" | |
| AFTER_DIR="qa-artifacts/after/${os}" | |
| HAS_FILES=false | |
| # Check for before files for this OS (flat or in subdirectory) | |
| if [ -d "$BEFORE_DIR" ] && find "$BEFORE_DIR" -type f \( -name '*.webm' -o -name '*.png' \) 2>/dev/null | grep -q .; then | |
| HAS_FILES=true | |
| fi | |
| # Check for after files for this OS | |
| if [ -d "$AFTER_DIR" ] && find "$AFTER_DIR" -type f \( -name '*.webm' -o -name '*.png' \) 2>/dev/null | grep -q .; then | |
| HAS_FILES=true | |
| fi | |
| if [ "$HAS_FILES" = true ]; then | |
| mkdir -p "$REPORT_DIR" | |
| # Copy all before files for this OS (handles both flat and nested layouts) | |
| if [ -d "$BEFORE_DIR" ]; then | |
| find "$BEFORE_DIR" -type f 2>/dev/null | while read f; do | |
| cp "$f" "$REPORT_DIR/" 2>/dev/null || true | |
| done | |
| fi | |
| # Copy all after files for this OS (overwrites duplicates with after versions) | |
| if [ -d "$AFTER_DIR" ]; then | |
| find "$AFTER_DIR" -type f 2>/dev/null | while read f; do | |
| cp "$f" "$REPORT_DIR/" 2>/dev/null || true | |
| done | |
| fi | |
| echo "Merged $os artifacts into $REPORT_DIR" | |
| ls -la "$REPORT_DIR/" | head -20 | |
| else | |
| echo "No artifacts found for $os, skipping" |
| HAS_FILES=false | ||
|
|
||
| # Check for before files (flat or in subdirectory) | ||
| if [ -d "qa-artifacts/before" ] && find qa-artifacts/before -name '*.webm' -o -name '*.png' 2>/dev/null | grep -q .; then | ||
| HAS_FILES=true | ||
| fi | ||
| # Check for after files | ||
| if [ -d "qa-artifacts/after" ] && find qa-artifacts/after -name '*.webm' -o -name '*.png' 2>/dev/null | grep -q .; then | ||
| HAS_FILES=true | ||
| fi | ||
|
|
||
| if [ "$HAS_FILES" = true ]; then | ||
| mkdir -p "$REPORT_DIR" | ||
| # Copy all before files (handles both flat and nested layouts) | ||
| find qa-artifacts/before -type f 2>/dev/null | while read f; do | ||
| cp "$f" "$REPORT_DIR/" 2>/dev/null || true | ||
| done | ||
| # Copy all after files (overwrites duplicates with after versions) | ||
| find qa-artifacts/after -type f 2>/dev/null | while read f; do | ||
| cp "$f" "$REPORT_DIR/" 2>/dev/null || true | ||
| done | ||
| echo "Merged $os artifacts into $REPORT_DIR" | ||
| ls -la "$REPORT_DIR/" | head -20 | ||
| break # Only create one report dir (multi-OS not yet supported in parallel mode) |
There was a problem hiding this comment.
This merge step copies every file from all OS artifacts into a single report directory (and then breaks). In qa-full, files like qa-session.webm from different runners will overwrite each other, so you’ll lose 2/3 of the videos. Instead, map each downloaded artifact subdir (e.g. qa-after-${os}-${run_id} / qa-before-${os}-${run_id}) into its matching qa-report-${os}-${run_id} directory without flattening.
| HAS_FILES=false | |
| # Check for before files (flat or in subdirectory) | |
| if [ -d "qa-artifacts/before" ] && find qa-artifacts/before -name '*.webm' -o -name '*.png' 2>/dev/null | grep -q .; then | |
| HAS_FILES=true | |
| fi | |
| # Check for after files | |
| if [ -d "qa-artifacts/after" ] && find qa-artifacts/after -name '*.webm' -o -name '*.png' 2>/dev/null | grep -q .; then | |
| HAS_FILES=true | |
| fi | |
| if [ "$HAS_FILES" = true ]; then | |
| mkdir -p "$REPORT_DIR" | |
| # Copy all before files (handles both flat and nested layouts) | |
| find qa-artifacts/before -type f 2>/dev/null | while read f; do | |
| cp "$f" "$REPORT_DIR/" 2>/dev/null || true | |
| done | |
| # Copy all after files (overwrites duplicates with after versions) | |
| find qa-artifacts/after -type f 2>/dev/null | while read f; do | |
| cp "$f" "$REPORT_DIR/" 2>/dev/null || true | |
| done | |
| echo "Merged $os artifacts into $REPORT_DIR" | |
| ls -la "$REPORT_DIR/" | head -20 | |
| break # Only create one report dir (multi-OS not yet supported in parallel mode) | |
| BEFORE_DIR="qa-artifacts/before/qa-before-${os}-${{ github.run_id }}" | |
| AFTER_DIR="qa-artifacts/after/qa-after-${os}-${{ github.run_id }}" | |
| HAS_FILES=false | |
| # Check for before files for this OS | |
| if [ -d "$BEFORE_DIR" ] && find "$BEFORE_DIR" -type f \( -name '*.webm' -o -name '*.png' \) 2>/dev/null | grep -q .; then | |
| HAS_FILES=true | |
| fi | |
| # Check for after files for this OS | |
| if [ -d "$AFTER_DIR" ] && find "$AFTER_DIR" -type f \( -name '*.webm' -o -name '*.png' \) 2>/dev/null | grep -q .; then | |
| HAS_FILES=true | |
| fi | |
| if [ "$HAS_FILES" = true ]; then | |
| mkdir -p "$REPORT_DIR" | |
| # Map the per-OS BEFORE artifact subdir into the matching report directory | |
| if [ -d "$BEFORE_DIR" ]; then | |
| echo "Copying BEFORE artifacts for $os from $BEFORE_DIR to $REPORT_DIR" | |
| cp -R "$BEFORE_DIR" "$REPORT_DIR/" 2>/dev/null || true | |
| fi | |
| # Map the per-OS AFTER artifact subdir into the matching report directory | |
| if [ -d "$AFTER_DIR" ]; then | |
| echo "Copying AFTER artifacts for $os from $AFTER_DIR to $REPORT_DIR" | |
| cp -R "$AFTER_DIR" "$REPORT_DIR/" 2>/dev/null || true | |
| fi | |
| echo "Merged $os artifacts into $REPORT_DIR" | |
| ls -la "$REPORT_DIR/" | head -20 | |
| else | |
| echo "No artifacts found for $os in $BEFORE_DIR or $AFTER_DIR" |
| # Push to {branch}-add-qa-test | ||
| TEST_BRANCH="${PR_BRANCH}-add-qa-test" |
There was a problem hiding this comment.
TEST_BRANCH is derived directly from github.head_ref || github.ref_name and then used in git checkout/git push. Branch names can include characters that are legal in Git but awkward/unsafe in scripts (spaces, ~, ^, : etc.), and this also risks collisions. Prefer generating a sanitized deterministic branch name like qa-tests/pr-${PR_NUM} (or sanitize PR_BRANCH to an allowlist) before using it in git commands.
| # Push to {branch}-add-qa-test | |
| TEST_BRANCH="${PR_BRANCH}-add-qa-test" | |
| # Push to deterministic QA test branch | |
| TEST_BRANCH="qa-tests/pr-${PR_NUM}" |
| if [ "$HAS_BEFORE" = "1" ]; then | ||
| CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison><div class=comp-panel><div class=comp-label>Before <span class=comp-tag>main</span></div><div class=video-wrap><video controls muted preload=auto><source src=qa-before-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-before-${os}.mp4 download>${DL_ICON}Before</a></div></div><div class=comp-panel><div class=comp-label>After <span class=comp-tag>PR</span></div><div class=video-wrap><video controls muted preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}.mp4 download>${DL_ICON}After</a></div></div></div>${REPORT_HTML}</div>" |
There was a problem hiding this comment.
When HAS_BEFORE is true, the card always references qa-${os}.mp4 for the After video, but HAS_AFTER is also set when only qa-${os}-pass1.mp4 exists. In multi-pass runs that include a before video, this produces broken video/src and download links. Consider selecting the actual after video path (qa-${os}.mp4 if present, else qa-${os}-pass1.mp4) or copying/symlinking pass1 to qa-${os}.mp4 during staging.
| if [ "$HAS_BEFORE" = "1" ]; then | |
| CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison><div class=comp-panel><div class=comp-label>Before <span class=comp-tag>main</span></div><div class=video-wrap><video controls muted preload=auto><source src=qa-before-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-before-${os}.mp4 download>${DL_ICON}Before</a></div></div><div class=comp-panel><div class=comp-label>After <span class=comp-tag>PR</span></div><div class=video-wrap><video controls muted preload=auto><source src=qa-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-${os}.mp4 download>${DL_ICON}After</a></div></div></div>${REPORT_HTML}</div>" | |
| AFTER_VIDEO_BASENAME="qa-${os}.mp4" | |
| if [ ! -f "$DEPLOY_DIR/${AFTER_VIDEO_BASENAME}" ] && [ -f "$DEPLOY_DIR/qa-${os}-pass1.mp4" ]; then | |
| AFTER_VIDEO_BASENAME="qa-${os}-pass1.mp4" | |
| fi | |
| if [ "$HAS_BEFORE" = "1" ]; then | |
| CARDS="${CARDS}<div class='card reveal' style='--i:${CARD_COUNT}'><div class=card-header><span class=platform><span class=icon>${ICON}</span>${os}</span><span class=links>${REPORT_LINK}</span></div><div class=comparison><div class=comp-panel><div class=comp-label>Before <span class=comp-tag>main</span></div><div class=video-wrap><video controls muted preload=auto><source src=qa-before-${os}.mp4 type=video/mp4></video></div><div class=comp-dl><a class=dl href=qa-before-${os}.mp4 download>${DL_ICON}Before</a></div></div><div class=comp-panel><div class=comp-label>After <span class=comp-tag>PR</span></div><div class=video-wrap><video controls muted preload=auto><source src=${AFTER_VIDEO_BASENAME} type=video/mp4></video></div><div class=comp-dl><a class=dl href=${AFTER_VIDEO_BASENAME} download>${DL_ICON}After</a></div></div></div>${REPORT_HTML}</div>" |
| on: | ||
| # TODO: remove push trigger before merge | ||
| push: | ||
| branches: [sno-skills, sno-qa-*] | ||
| pull_request: | ||
| types: [labeled] | ||
| branches: [main] | ||
| issues: | ||
| types: [labeled] | ||
| workflow_dispatch: |
There was a problem hiding this comment.
The workflow is configured to run on label events but has no guard for forked PRs. Since these jobs rely on repository secrets (Gemini/Cloudflare/etc.), forked PR runs will fail noisily (and still consume runner time). Add an early skip in resolve-matrix (or job-level if:) when github.event.pull_request.head.repo.fork == true (and similarly for issue events if desired).
4f2969d to
a3dd897
Compare
src/stores/dialogStore.ts
Outdated
| export type DialogComponentProps = CustomDialogComponentProps & | ||
| Record<string, unknown> | ||
|
|
||
| export interface DialogInstance< | ||
| H extends Component = Component, | ||
| B extends Component = Component, | ||
| F extends Component = Component | ||
| > { | ||
| export interface DialogInstance { | ||
| key: string | ||
| visible: boolean | ||
| title?: string | ||
| headerComponent?: H | ||
| headerProps?: ComponentAttrs<H> | ||
| component: B | ||
| contentProps: ComponentAttrs<B> | ||
| footerComponent?: F | ||
| footerProps?: ComponentAttrs<F> | ||
| headerComponent?: Component | ||
| headerProps?: Record<string, unknown> | ||
| component: Component | ||
| contentProps: Record<string, unknown> | ||
| footerComponent?: Component | ||
| footerProps?: Record<string, unknown> |
There was a problem hiding this comment.
In DialogComponentProps/DialogInstance, the change from ComponentAttrs<typeof GlobalDialog> + generic component props to Record<string, unknown> removes most compile-time validation for dialog props (e.g., onMaximize, onAfterHide, and component-specific props). This makes it easy to pass incorrect props without TypeScript catching it. If the goal is to avoid importing the SFC type here (e.g., to break a circular dependency), consider exporting the dialog prop types from a non-SFC module (or re-exporting GlobalDialog prop types) and keeping DialogInstance generic so consumers still get typed contentProps/headerProps/footerProps.
.github/workflows/pr-qa.yaml
Outdated
| if: >- | ||
| always() && | ||
| (needs.qa-after.result == 'success' || needs.qa-before.result == 'success') && | ||
| (github.event.pull_request.number || github.event_name == 'push' || github.event_name == 'workflow_dispatch') |
There was a problem hiding this comment.
report is currently gated to only run for PR, push, or workflow_dispatch events. When the workflow is triggered via issues: labeled (e.g. qa-issue), github.event_name is issues and github.event.pull_request.number is empty, so the report/deploy/comment phase will never run even if qa-before succeeds. Update the if: to also allow github.event_name == 'issues' (and ensure the comment-posting logic targets issues correctly).
| (github.event.pull_request.number || github.event_name == 'push' || github.event_name == 'workflow_dispatch') | |
| (github.event.pull_request.number || github.event_name == 'push' || github.event_name == 'workflow_dispatch' || github.event_name == 'issues') |
| env: | ||
| EXTRA_SERVER_PARAMS: ${{ inputs.extra_server_params }} | ||
| run: | | ||
| python main.py --cpu --multi-user --front-end-root ../dist ${{ inputs.extra_server_params }} & | ||
| wait-for-it --service 127.0.0.1:8188 -t 600 | ||
| python main.py --cpu --multi-user --front-end-root ../dist $EXTRA_SERVER_PARAMS & | ||
| for i in $(seq 1 300); do | ||
| curl -sf http://127.0.0.1:8188/api/system_stats >/dev/null 2>&1 && echo "Server ready" && exit 0 | ||
| sleep 2 | ||
| done |
There was a problem hiding this comment.
EXTRA_SERVER_PARAMS is expanded unquoted in the python main.py ... $EXTRA_SERVER_PARAMS command. If the input contains spaces or shell metacharacters, this can break argument parsing or lead to unexpected shell behavior. Prefer parsing it into an array and passing it as "${EXTRA_ARGS[@]}" (or otherwise ensure robust quoting).
| @@ -0,0 +1,135 @@ | |||
| <!DOCTYPE html><html lang=en><head><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><title>QA Session Recordings</title> | |||
| <link rel=preconnect href=https://fonts.googleapis.com><link rel=preconnect href=https://fonts.gstatic.com crossorigin><link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel=stylesheet> | |||
| <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script> | |||
There was a problem hiding this comment.
The report template pulls marked from a CDN without version pinning or SRI, and then renders markdown via marked.parse() directly into innerHTML. Even though qa-deploy-pages.sh escapes </>, markdown links like javascript: URLs can still produce XSS in the deployed report site. Consider either (1) self-hosting a pinned marked build (or pinning the CDN URL to an exact version + adding integrity), and (2) sanitizing the generated HTML (e.g., DOMPurify) or disabling/validating link protocols in the marked renderer.
| <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script> | |
| <script src="https://cdn.jsdelivr.net/npm/marked@12.0.2/marked.min.js"></script> | |
| <script src="https://cdn.jsdelivr.net/npm/dompurify@3.1.6/dist/purify.min.js"></script> | |
| <script> | |
| (function () { | |
| if (typeof marked === 'undefined' || typeof DOMPurify === 'undefined') { | |
| return; | |
| } | |
| var originalParse = marked.parse; | |
| marked.parse = function (markdown, options) { | |
| var dirty = originalParse.call(marked, markdown, options); | |
| return DOMPurify.sanitize(dirty); | |
| }; | |
| })(); | |
| </script> |
Three-phase pipeline triggered by labels (qa-changes, qa-full, qa-issue): 1. Research: Claude writes Playwright E2E tests to reproduce reported bugs 2. Reproduce: Deterministic replay with video recording 3. Report: Deploy results to Cloudflare Pages with badges Key design decisions: - Playwright assertions are source of truth (not AI vision) - Agent has readFixture/readTest tools to discover project patterns - Bug-specific assertions required (trivial assertions banned) - Main branch dist cached by SHA to speed up before/after comparisons - QA deps installed inline in CI (no package.json changes needed) Verified across 48 runs (22 PRs + 26 issues) with 0 false positives. Amp-Thread-ID: https://ampcode.com/threads/T-019d519b-004f-71ce-b970-96edd971fbe0 Co-authored-by: Amp <amp@ampcode.com> Amp-Thread-ID: https://ampcode.com/threads/T-019d519b-004f-71ce-b970-96edd971fbe0 Co-authored-by: Amp <amp@ampcode.com>
Summary
Automated QA pipeline that reproduces reported bugs using Playwright E2E tests. Claude writes tests, runs them, records video evidence. Deploys results to Cloudflare Pages with badges.
Changes
qa-changes,qa-full,qa-issue)@google/generative-ai,@anthropic-ai/claude-agent-sdk(installed inline in CI)Pipeline
Files (18 files, ~7.5k lines)
.github/workflows/pr-qa.yamlscripts/qa-agent.tsscripts/qa-record.tsscripts/qa-reproduce.tsscripts/qa-analyze-pr.tsscripts/qa-video-review.tsscripts/qa-deploy-pages.shscripts/qa-generate-test.tsknip.config.tsResults (48 runs, 0 false positives)
Reproduced ✓
Not Reproducible ✗
Inconclusive / PRs
Triggers
qa-issuelabel on issue → reproduce bugqa-changeslabel on PR → before/after comparisonqa-fulllabel on PR → 3-OS matrix./scripts/qa-batch.sh 10394 10238→ batchReview Focus
ANTHROPIC_API_KEY_QA,GEMINI_API_KEY,CLOUDFLARE_API_TOKENsecrets