claude-build-fix-trigger #5

Workflow file for this run

.github/workflows/claude-build-fix.yml at bff8058

	name: Claude build-failure fix

	on:
	issues:
	types: [labeled]
	repository_dispatch:
	types: [claude-build-fix-trigger]

	# One in-flight fix per issue. Newer triggers queue behind the current run so
	# the counter can't be misread, and so two events don't both spawn Claude.
	concurrency:
	group: claude-build-fix-${{ github.event.issue.number \|\| github.event.client_payload.issue_number }}
	cancel-in-progress: false

	jobs:
	fix:
	# Path 1: human (or non-GITHUB_TOKEN actor) added `build-failure` to an issue.
	# Path 2/3: cron workflow dispatched `claude-build-fix-trigger`.
	if: \|
	(github.event_name == 'issues' && github.event.label.name == 'build-failure') \|\|
	github.event_name == 'repository_dispatch'
	runs-on: ubuntu-latest
	# 45 min budget: Claude clones a fresh vscode/ (1-3 min shallow), dry-applies
	# ~30 patches sequentially, analyzes failure logs, edits patch files, and
	# pushes — with --max-turns 50 this can creep past 30 min on multi-patch
	# breaks. Hitting the timeout still marks the attempt as consumed, so a
	# tight cap accelerates needs-human on transient slowness.
	timeout-minutes: 45
	permissions:
	contents: write
	pull-requests: write
	issues: write
	id-token: write
	actions: read
	steps:
	- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3

	# Resolve which issue number we're operating on and what failing run kicked
	# this off. The fields live in different places on the two event payloads.
	#
	# SECURITY: repository_dispatch payloads are attacker-controllable by anyone
	# with a token holding `repo` scope on this repo. BOTH dispatched fields
	# flow into Claude's prompt template (lines further down) and so are a
	# prompt-injection vector — newlines + "ignore prior instructions" text
	# would be passed verbatim into Claude's context. We fail loudly on either:
	# - issue_number not purely numeric
	# - run_url not a recognizable GitHub Actions run URL
	# The regex on run_url is intentionally tight: only the host (alphanumerics,
	# dots, hyphens) and the GH Actions run path (alphanumerics, dots, slashes,
	# hyphens, underscores ending in `/actions/runs/<digits>`). A newline or
	# synthetic instruction text cannot match.
	- name: Resolve issue number + run URL
	id: ctx
	env:
	EVENT_NAME: ${{ github.event_name }}
	ISSUE_FROM_LABEL: ${{ github.event.issue.number }}
	ISSUE_FROM_DISPATCH: ${{ github.event.client_payload.issue_number }}
	RUN_URL_FROM_DISPATCH: ${{ github.event.client_payload.run_url }}
	run: \|
	set -eo pipefail
	if [ "$EVENT_NAME" = "issues" ]; then
	# github.event.issue.number is GitHub-generated, always numeric.
	echo "issue_number=$ISSUE_FROM_LABEL" >> "$GITHUB_OUTPUT"
	# No run URL on a human label-add; leave blank — Claude is told to
	# fall back to `gh run list --workflow cron-build-and-release.yml`.
	echo "run_url=" >> "$GITHUB_OUTPUT"
	else
	if ! printf '%s' "$ISSUE_FROM_DISPATCH" \| grep -Eq '^[0-9]+$'; then
	echo "::error::client_payload.issue_number is not a positive integer: '$ISSUE_FROM_DISPATCH'"
	exit 1
	fi
	if ! printf '%s' "$RUN_URL_FROM_DISPATCH" \| grep -Eq '^https://[A-Za-z0-9.-]+/[A-Za-z0-9._/-]+/actions/runs/[0-9]+$'; then
	echo "::error::client_payload.run_url is not a GitHub Actions run URL: '$RUN_URL_FROM_DISPATCH'"
	exit 1
	fi
	echo "issue_number=$ISSUE_FROM_DISPATCH" >> "$GITHUB_OUTPUT"
	echo "run_url=$RUN_URL_FROM_DISPATCH" >> "$GITHUB_OUTPUT"
	fi

	# Count prior attempt markers on the issue. The marker is an HTML comment
	# so it never renders to the human reader but is greppable via the API.
	# NOTE: the marker string `<!-- claude-build-fix-attempt -->` is also
	# written below in the "Mark this attempt" step. Keep them in sync.
	#
	# Two filters combine to close the channel:
	# 1. user.login == "github-actions[bot]" — excludes human-pasted
	# markers AND markers in claude[bot]-authored PR comments.
	# 2. body \| startswith("<!-- claude-build-fix-attempt -->") — the
	# marker must be the FIRST line of the comment. Claude has
	# `gh issue comment` access (Bash is in the settings allowlist
	# with GH_TOKEN in the env), so if a Claude iteration quoted or
	# paraphrased a prior marker, the marker text would land on a
	# later line (Markdown blockquote prefix, intro sentence, etc.) —
	# `contains` would inflate the counter, `startswith` won't.
	# The Mark this attempt step below preserves the "marker is first line"
	# invariant. If you ever reorder its heredoc body, also revisit here.
	#
	# Why raw REST instead of `gh issue view --json comments`: `gh` normalizes
	# bot logins by stripping the `[bot]` suffix (`github-actions[bot]` becomes
	# `github-actions`). The author-scoping filter was silently never matching
	# — counter always returned 0, cap was effectively disabled. The raw API
	# preserves the canonical `[bot]`-suffixed login on `.user.login`, which
	# also matches the GitHub UI's attribution and what every other GitHub
	# actor-filter in this org keys on. --paginate handles any issue thread
	# with >30 comments (the per-page default) so the counter doesn't undercount.
	- name: Count prior fix attempts
	id: count
	env:
	GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
	ISSUE: ${{ steps.ctx.outputs.issue_number }}
	run: \|
	# pipefail so a `gh` API failure doesn't silently leave $attempts blank
	# and let the workflow no-op forever on the next dispatch.
	set -eo pipefail
	attempts=$(gh api "/repos/$GITHUB_REPOSITORY/issues/$ISSUE/comments" --paginate \
	--jq '[.[]
	\| select(.user.login == "github-actions[bot]")
	\| select(.body \| startswith("<!-- claude-build-fix-attempt -->"))]
	\| length')
	attempt_num=$((attempts + 1))
	echo "attempts=$attempts" >> "$GITHUB_OUTPUT"
	echo "attempt_num=$attempt_num" >> "$GITHUB_OUTPUT"
	echo "::notice::Prior attempts on issue #$ISSUE: $attempts (cap is 3)"

	# Cap-hit branch: post the summary comment FIRST (audit record), then
	# swap labels, then surface the terminal state on the Actions UI.
	# Order matters: if the label PATCH fails after a successful comment, the
	# next dispatch will hit the cap again and re-run this step idempotently.
	# If we swapped labels first and the comment failed, the label would say
	# "human owns this" with no explanatory record on the issue thread.
	# The workflow itself still exits 0 — we don't want it filing a
	# build-failure issue against itself.
	- name: Give up at cap
	if: fromJSON(steps.count.outputs.attempts) >= 3
	env:
	GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
	ISSUE: ${{ steps.ctx.outputs.issue_number }}
	RUN_URL: ${{ steps.ctx.outputs.run_url }}
	run: \|
	set -eo pipefail
	# If we've already announced the cap on this issue, short-circuit:
	# otherwise a human re-pushing to the fix PR (which re-fires
	# report-pr-failure) would re-trigger this step on every iteration
	# and append duplicate "Auto-fix loop stopped" comments forever.
	# The `needs-human` label is the canonical signal that we already
	# gave up; `gh issue view --json labels --jq` returns the list,
	# `grep -qx` matches exact lines (so 'needs-human-review' or
	# similar wouldn't false-positive).
	if gh issue view "$ISSUE" --json labels --jq '.labels[].name' \| grep -qx needs-human; then
	echo "::notice::Cap already announced on issue #$ISSUE; skipping duplicate summary"
	exit 0
	fi
	gh issue comment "$ISSUE" --body "$(cat <<EOF
	Auto-fix loop stopped after 3 attempts.

	Latest failing run: ${RUN_URL:-(see workflow history)}

	Swapping label \`build-failure\` -> \`needs-human\`. A person needs to look at the fix PR (or open one manually). The three prior attempt markers are above; \`gh run list --workflow cron-build-and-release.yml --status failure\` shows what each iteration tried.
	EOF
	)"
	# Defensively create `needs-human` so first-use on a repo without
	# the label pre-defined doesn't fail the workflow. `\|\| true` covers
	# the already-exists case across gh CLI versions.
	gh label create needs-human --color FBCA04 --description "Auto-fix loop gave up; human needs to take it from here" 2>/dev/null \|\| true
	# Tolerate `build-failure` being already absent: on a re-trigger
	# after the cap already swapped labels, the human may have removed
	# the label or only the dispatch was re-fired. Either way, removing
	# an absent label should not red-X the workflow.
	gh issue edit "$ISSUE" --remove-label build-failure 2>/dev/null \|\| true
	gh issue edit "$ISSUE" --add-label needs-human
	# Surface on the Actions run page itself — without this the workflow
	# finishes green and three failed iterations + one give-up look like a
	# successful run from the UI.
	echo "::warning::Auto-fix cap reached on issue #$ISSUE; label swapped to needs-human"
	cat >> "$GITHUB_STEP_SUMMARY" <<EOF
	## Cap reached on issue #$ISSUE

	Auto-fix loop stopped after 3 attempts. Label swapped from \`build-failure\` to \`needs-human\`.

	- Latest failing run: ${RUN_URL:-(see workflow history)}
	- Issue: ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/issues/$ISSUE
	EOF

	# Below-cap branch: mark this attempt BEFORE invoking Claude, so a crash
	# mid-action still counts. Avoids infinite-retry-on-transient-error.
	# NOTE: the marker string `<!-- claude-build-fix-attempt -->` is also
	# read above in the "Count prior fix attempts" step. Keep them in sync.
	#
	# INVARIANT: the marker MUST be the first line of the comment body.
	# The counter step uses `startswith` to filter out cases where the
	# marker text leaks into a Claude-paraphrased or human-quoted comment
	# (where it would naturally land on a later line). Don't reorder the
	# heredoc body to put a header line ahead of the marker.
	- name: Mark this attempt
	if: fromJSON(steps.count.outputs.attempts) < 3
	env:
	GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
	ATTEMPT_NUM: ${{ steps.count.outputs.attempt_num }}
	ISSUE: ${{ steps.ctx.outputs.issue_number }}
	RUN_URL: ${{ steps.ctx.outputs.run_url }}
	run: \|
	set -eo pipefail
	gh issue comment "$ISSUE" --body "$(cat <<EOF
	<!-- claude-build-fix-attempt -->
	Starting fix attempt ${ATTEMPT_NUM}/3 in run ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}.
	Triggering failure: ${RUN_URL:-(none, manual label add)}
	EOF
	)"

	# Pre-stage upstream vscode/ at the currently-pinned commit so Claude
	# doesn't burn 3–5 min of its 45-min budget on the clone every iteration.
	# The split between cache/restore and cache/save (instead of the combined
	# actions/cache action) is deliberate: actions/cache auto-saves at job
	# end, which would persist Claude's patch-modified vscode/ as the cache
	# entry. Dry-apply on the next iteration would then start from a dirty
	# state. Save explicitly here, right after the clone, so the cache always
	# holds a clean pre-patch tree.
	#
	# Key is hashFiles('upstream/stable.json'): when Spencer (or Claude)
	# bumps the pinned commit, the cache key changes and the next run does a
	# fresh clone at the new commit. GitHub LRU-evicts old entries; the
	# ~500MB-1GB vscode/ entries are well within the 10GB per-repo budget.
	- name: Restore vscode/ cache
	id: vscode_cache
	if: fromJSON(steps.count.outputs.attempts) < 3
	uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
	with:
	path: vscode
	key: vscode-${{ hashFiles('upstream/stable.json') }}

	# Cache miss → clone fresh. Partial clone + shallow fetch keeps the
	# network cost to ~30–60s. If the failing build was actually against an
	# upstream tag NEWER than what's pinned on main (cron fired on an
	# advance before the auto-release PR merged), Claude can `git fetch
	# --depth 1 origin <commit> && git checkout FETCH_HEAD` in place from
	# this base — still much faster than a full re-clone.
	- name: Pre-stage vscode/ at the pinned commit
	if: \|
	fromJSON(steps.count.outputs.attempts) < 3 &&
	steps.vscode_cache.outputs.cache-hit != 'true'
	run: \|
	set -eo pipefail
	# Defensive: if actions/cache/restore partially populated vscode/
	# without reporting cache-hit (cache-service blip; rare but observed
	# in the wild), the clone below would fail with "destination path
	# 'vscode' already exists" and burn an attempt on infra noise.
	rm -rf vscode
	tag=$(jq -r '.tag' upstream/stable.json)
	commit=$(jq -r '.commit' upstream/stable.json)
	# Defensive shape check on values flowing into shell commands.
	# upstream/stable.json is trusted (only `main` can write it), but
	# validate anyway — cheap defense against a future supply-chain
	# mishap that lands a crafted .tag/.commit value on main. The same
	# values would otherwise be interpolated into `git fetch <commit>`
	# and a notice log without escaping.
	if ! printf '%s' "$tag" \| grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
	echo "::error::upstream/stable.json .tag is not semver-shaped: '$tag'"
	exit 1
	fi
	if ! printf '%s' "$commit" \| grep -Eq '^[0-9a-f]{40}$'; then
	echo "::error::upstream/stable.json .commit is not a 40-char hex SHA: '$commit'"
	exit 1
	fi
	git clone --filter=blob:none --no-checkout \
	https://github.com/microsoft/vscode.git vscode
	cd vscode
	git fetch --depth 1 origin "$commit"
	git checkout FETCH_HEAD
	echo "::notice::vscode/ ready at tag $tag commit $commit ($(du -sh . \| cut -f1))"

	# Save the clean tree NOW, before Claude touches it. Skipped on cache
	# hit (entry already exists; save would no-op anyway, but the explicit
	# guard saves a 1–2s upload check).
	- name: Save clean vscode/ to cache
	if: \|
	fromJSON(steps.count.outputs.attempts) < 3 &&
	steps.vscode_cache.outputs.cache-hit != 'true'
	uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
	with:
	path: vscode
	key: vscode-${{ hashFiles('upstream/stable.json') }}

	- name: Run Claude
	if: fromJSON(steps.count.outputs.attempts) < 3
	uses: anthropics/claude-code-action@593d7a5c4e0073569f74772c2b7b64c30ec14707 # v1
	with:
	allowed_bots: "*"
	claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
	settings: .github/claude-code-settings.json
	# The superpowers plugin is published via the
	# `claude-plugins-official` marketplace at
	# github.com/anthropics/claude-plugins-official. That marketplace's
	# manifest pins a specific commit of github.com/obra/superpowers
	# (which is the actual source-of-truth repo for the skill). Using
	# the marketplace ref here means we pick up whatever SHA Anthropic
	# has curated; don't point at obra/superpowers directly unless you
	# specifically need to bypass that curation.
	# NOT anthropics/claude-code — that's the CLI source repo, not a
	# plugin marketplace.
	plugin_marketplaces: 'https://github.com/anthropics/claude-plugins-official.git'
	plugins: 'superpowers@claude-plugins-official'
	additional_permissions: \|
	actions: read
	claude_args: \|
	--model sonnet
	--max-turns 50
	--disallowedTools "Bash(gh pr merge:),Bash(git push --force:),Bash(git push --force-with-lease:),Bash(git push -f:)"
	# Branch protection on `main` is the only meaningful guard against
	# Claude pushing to / merging into main. The disallowedTools list
	# above only blocks the most obvious force-push and gh-merge paths;
	# it can't catch every refspec variant pointing at main (a previous
	# attempt at `Bash(git push:main)` was syntactically dead — `:*` is
	# only recognized at the end of a pattern, and no glob can cover
	# every `git push origin <sha>:refs/heads/main` shape Claude could
	# construct). Repo settings MUST keep branch protection on main:
	# require PR review + status checks before merge. If protection is
	# ever disabled, this loop loses its safety net.
	prompt: \|
	/superpowers:systematic-debugging

	A BradfordCode build has failed and we need to fix it.

	Context:
	- Issue: #${{ steps.ctx.outputs.issue_number }}
	- Failing run: ${{ steps.ctx.outputs.run_url }}
	- This is attempt ${{ steps.count.outputs.attempt_num }} of 3.

	If the failing run URL above is empty, fall back to:
	gh run list --workflow cron-build-and-release.yml --status failure --limit 1 --json databaseId,url

	Follow systematic-debugging Phase 1 (root cause via `gh run view --log-failed`)
	BEFORE proposing any fix. Common failure classes from prior incidents:
	- patch apply rejections after upstream rebase (00-brand-remove-branding,
	51-ext-copilot-remove-it, etc.)
	- compile-src TypeScript errors from dangling references after our patches
	- `vscode-min-prepack` ASCII hygiene check on non-ASCII regex literals
	- native module ABI mismatch (NODE_MODULE_VERSION) after Node version bumps

	The workflow has already staged `vscode/` for you at the commit
	pinned in `upstream/stable.json` (partial clone, shallow fetch).
	Use it directly for the dry-apply pattern in CLAUDE.md — apply
	patches in CI order against the staged tree to confirm a fix
	before pushing. If the failing build was against a NEWER upstream
	tag than what's pinned, `cd vscode && git fetch --depth 1 origin
	<commit> && git checkout FETCH_HEAD` to roll the staged tree
	forward; it's faster than re-cloning.

	Open a PR (or push to the existing fix PR for this issue) that addresses
	the root cause.

	Hard rules:
	- Never push to main.
	- Never force-push.
	- Never merge.
	- If opening a new PR, the PR body MUST contain the line:

	Refs #${{ steps.ctx.outputs.issue_number }}

	on its own line. The cron's `report-pr-failure` job parses this line
	to dispatch the next iteration; without it, the loop breaks.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

claude-build-fix-trigger #5

Workflow file

claude-build-fix-trigger #5

Uh oh!

Workflow file for this run