ci: run privileged workflows from main; pin actions; harden CI

behinddwalls · behinddwalls · commit b2ddb11907e6 · 2026-06-04T19:52:03.000-07:00
> ⚠️ **P0 — required before merge.** `STACK_REBASE_TOKEN` is currently a plain > repository secret, so any same-repo PR can exfiltrate it today by referencing > `${{ secrets.STACK_REBASE_TOKEN }}` in a modified/added workflow — it runs on > the PR, *before any review*, and log-masking is trivially bypassed. The > `pull_request_target` change here does **not** close that (an attacker uses a > different workflow to read the ambient secret). The actual fix is moving the > PAT into a `main`-only Environment so PR-triggered jobs cannot obtain it — see > item **#1** in "Required manual follow-up". Treat that as blocking. ## Summary ### Why? A review of `.github/workflows/` found one privilege-escalation hole and several supply-chain / least-privilege gaps. GitHub runs a `pull_request` workflow from the PR's HEAD ref, so a PR can modify the workflow that runs on it. `rebase-stack.yml` — which holds a repo-write PAT (STACK_REBASE_TOKEN) and triggers on `pull_request: closed` — was therefore exploitable: a contributor could rewrite the file in their own PR (dropping the `merged` guard), close it, and run arbitrary code with the PAT. Same-repo branch PRs (the stacked-PR model) DO receive secrets, so this was reachable by any contributor with push access or a compromised account. Privileged automation must run its definition from `main`, never from PR-controlled content. CI test jobs must still execute PR code — that is the purpose of a test pipeline — so the goal is to lock down *privileged* execution and *supply-chain* references, not to stop PR testing. The defense rests on an invariant: the untrusted-code path (`ci.yml` + its composite actions) carries no secrets and a read-only token, so there is nothing for a malicious PR to steal. ### What? - rebase-stack.yml: trigger `pull_request` -> `pull_request_target`, so GitHub always runs main's version of the file regardless of PR content. Safe here because the job never builds/runs PR code — it only replays commits via `git rebase` and validates the diff is byte-identical before force-pushing. Adds `environment: stack-rebase` to scope the PAT to a protected, main-only environment. - ci.yml: `persist-credentials: false` on every checkout (jobs execute untrusted PR code); PR-only `concurrency` cancellation (never cancels push/merge_group); new `workflow-security` job (actionlint + zizmor) wired into the required-checks gate; the gate's debug echo now passes `toJSON(needs)` via env to avoid in-script template expansion. - workflow-security: added a guard step that fails CI if `ci.yml` or its composite actions reference a repository secret (GITHUB_TOKEN allowlisted). This keeps the untrusted-code path secret-free, catching accidental reintroduction of a reachable secret. (It catches honest mistakes; the malicious case is covered by environment-scoped secrets + CODEOWNERS review.) - All third-party actions pinned to immutable commit SHAs (checkout, cache, actionlint, zizmor) so a moved tag cannot inject unapproved code. - .github/zizmor.yml: documents the intentional, reviewed exceptions (dangerous-triggers on the deliberate pull_request_target; artipacked on the two workflows that must persist credentials to push). - CODEOWNERS: require admin review for any change under `.github/`. ## Test Plan - ✅ `actionlint .github/workflows/*.yml` — no issues. - ✅ `zizmor --persona=regular .github/workflows/` (offline + online) — "No findings to report" (3 intentional ignores, documented in zizmor.yml). - ✅ Secrets guard: passes on the current secret-free path, and fires correctly when a `${{ secrets.* }}` reference is introduced. - ✅ All workflow/action YAML parses; no unpinned action refs remain. - Post-merge verification: open a PR that edits rebase-stack.yml (add an `echo PWNED` step, drop the `merged` guard) and CLOSE it without merging — confirm GitHub runs main's version (no PWNED, job skipped). Then merge a 2-PR stack and confirm children are still rebased/retargeted as before. ## Required manual follow-up (repo / org settings) These cannot be expressed in files and must be applied by a maintainer. - [ ] **🔴 P0 (blocking): Create the `stack-rebase` Environment**, restrict its deployment branches to `main`, and move `STACK_REBASE_TOKEN` into it so the PAT is unreachable from any PR-triggered job. Until this is done the PAT is exfiltratable today (see the callout above), and `rebase-stack.yml` (which now declares `environment: stack-rebase`) will not run. - [ ] **Settings → Actions → General → Fork pull request workflows from outside collaborators**: "Require approval for all external contributors" so PR code never auto-executes for new contributors. - [ ] **Settings → Actions → General → Workflow permissions**: default `GITHUB_TOKEN` to read-only; uncheck "Allow GitHub Actions to create and approve pull requests" unless required. - [ ] **Branch protection / ruleset on `main`**: require PR review, require review from Code Owners (so `/.github/` changes need admin approval), require the `Required Checks` status, block force-pushes. - [ ] **(Optional)** Restrict allowed actions to selected/verified creators with pinned SHAs.
diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
@@ -1 +1,6 @@
-* @uber/submitqueue-admins @uber/submitqueue-maintainers @behinddwalls @sbalabanov 
+* @uber/submitqueue-admins @uber/submitqueue-maintainers @behinddwalls @sbalabanov
+
+# CI workflows, composite actions, and other automation are privileged
+# (secrets, write tokens). Require admin review for any change under .github/.
+# CODEOWNERS is last-match-wins, so this overrides the `*` rule for these paths.
+/.github/ @uber/submitqueue-admins
diff --git a/.github/actions/setup/action.yml b/.github/actions/setup/action.yml
@@ -17,7 +17,7 @@ runs:
   using: composite
   steps:
     - name: Setup Bazel cache
-      uses: actions/cache@v4
+      uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4
       with:
         path: |
           ~/.cache/bazel-disk-cache
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -20,6 +20,13 @@ on:
 permissions:
   contents: read
 
+# Cancel superseded runs for the same PR to save runner minutes. Never cancel
+# in-progress runs for `push` (main) or `merge_group` — those must complete so
+# the default branch and merge queue always have a definitive result.
+concurrency:
+  group: ci-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
+
 jobs:
   # ---------------------------------------------------------------------------
   # LINT (gofmt via Go SDK, yamlfmt via go run)
@@ -29,7 +36,11 @@ jobs:
     if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }}
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+        with:
+          # This job executes untrusted PR code (make build/test/lint). Don't
+          # leave the GITHUB_TOKEN in the workspace git config while it runs.
+          persist-credentials: false
       - uses: ./.github/actions/setup
 
       - name: Run linters
@@ -43,7 +54,11 @@ jobs:
     if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }}
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+        with:
+          # This job executes untrusted PR code (make build/test/lint). Don't
+          # leave the GITHUB_TOKEN in the workspace git config while it runs.
+          persist-credentials: false
       - uses: ./.github/actions/setup
 
       - name: Check module files are tidy
@@ -60,7 +75,11 @@ jobs:
     if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }}
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+        with:
+          # This job executes untrusted PR code (make build/test/lint). Don't
+          # leave the GITHUB_TOKEN in the workspace git config while it runs.
+          persist-credentials: false
       - uses: ./.github/actions/setup
 
       - name: Build project
@@ -77,7 +96,11 @@ jobs:
     if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }}
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+        with:
+          # This job executes untrusted PR code (make build/test/lint). Don't
+          # leave the GITHUB_TOKEN in the workspace git config while it runs.
+          persist-credentials: false
       - uses: ./.github/actions/setup
 
       - name: Run E2E tests
@@ -88,7 +111,11 @@ jobs:
     if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }}
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+        with:
+          # This job executes untrusted PR code (make build/test/lint). Don't
+          # leave the GITHUB_TOKEN in the workspace git config while it runs.
+          persist-credentials: false
       - uses: ./.github/actions/setup
 
       - name: Run Gateway integration tests
@@ -99,7 +126,11 @@ jobs:
     if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }}
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+        with:
+          # This job executes untrusted PR code (make build/test/lint). Don't
+          # leave the GITHUB_TOKEN in the workspace git config while it runs.
+          persist-credentials: false
       - uses: ./.github/actions/setup
 
       - name: Run Orchestrator integration tests
@@ -113,7 +144,11 @@ jobs:
     if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }}
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+        with:
+          # This job executes untrusted PR code (make build/test/lint). Don't
+          # leave the GITHUB_TOKEN in the workspace git config while it runs.
+          persist-credentials: false
       - uses: ./.github/actions/setup
       - uses: ./.github/actions/run-bazel-test
         with:
@@ -124,7 +159,11 @@ jobs:
     if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }}
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+        with:
+          # This job executes untrusted PR code (make build/test/lint). Don't
+          # leave the GITHUB_TOKEN in the workspace git config while it runs.
+          persist-credentials: false
       - uses: ./.github/actions/setup
       - uses: ./.github/actions/run-bazel-test
         with:
@@ -135,7 +174,11 @@ jobs:
     if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }}
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+        with:
+          # This job executes untrusted PR code (make build/test/lint). Don't
+          # leave the GITHUB_TOKEN in the workspace git config while it runs.
+          persist-credentials: false
       - uses: ./.github/actions/setup
       - uses: ./.github/actions/run-bazel-test
         with:
@@ -149,12 +192,71 @@ jobs:
     if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }}
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+        with:
+          # This job executes untrusted PR code (make build/test/lint). Don't
+          # leave the GITHUB_TOKEN in the workspace git config while it runs.
+          persist-credentials: false
       - uses: ./.github/actions/setup
       - uses: ./.github/actions/run-bazel-test
         with:
           target: //test/integration/submitqueue/core/consumer/...
 
+  # ---------------------------------------------------------------------------
+  # WORKFLOW SECURITY LINT
+  #
+  # Guards against regressions in the workflows themselves: actionlint checks
+  # general validity; zizmor audits for GitHub Actions security smells
+  # (dangerous triggers, unpinned `uses:`, credential persistence, template
+  # injection). Keeps the pull_request_target / SHA-pinning hardening from
+  # silently eroding in future edits.
+  # ---------------------------------------------------------------------------
+  workflow-security:
+    name: Workflow Security Lint
+    if: ${{ github.event_name != 'pull_request' || github.event.pull_request.draft == false }}
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+        with:
+          persist-credentials: false
+      - name: actionlint
+        uses: raven-actions/actionlint@205b530c5d9fa8f44ae9ed59f341a0db994aa6f8 # v2.1.2
+      - name: zizmor
+        uses: zizmorcore/zizmor-action@5f14fd08f7cf1cb1609c1e344975f152c7ee938d # v0.5.6
+        with:
+          # Pin the zizmor TOOL version (distinct from the action tag above) for
+          # reproducible audits matching local validation.
+          version: "1.25.2"
+          # Fail the job on findings without requiring GitHub Advanced Security
+          # / SARIF upload (which needs security-events: write and degrades on
+          # fork PRs). Keeps the gate self-contained.
+          advanced-security: false
+
+      # ci.yml and the composite actions it calls run untrusted PR code under
+      # the `pull_request` trigger. A repository secret referenced anywhere on
+      # that path is reachable by a malicious PR and can be exfiltrated on the
+      # PR run (before any review), so this path must stay secret-free — route
+      # any secret-bearing step through a SEPARATE trusted workflow
+      # (workflow_run / pull_request_target) that does not execute PR code.
+      #
+      # This guard catches ACCIDENTAL reintroduction by honest contributors; a
+      # malicious actor controlling ci.yml could delete the guard itself, so the
+      # real defenses remain (a) secrets scoped to a main-only Environment so a
+      # PR-triggered job cannot obtain them, and (b) CODEOWNERS review on
+      # .github/. GITHUB_TOKEN (least-privilege, read-only here) is allowlisted.
+      - name: Guard — no repository secrets on the untrusted-code path
+        run: |
+          hits="$(grep -rnE '\$\{\{[^}]*secrets\.' \
+            .github/workflows/ci.yml .github/actions \
+            | grep -vE 'secrets\.GITHUB_TOKEN' || true)"
+          if [ -n "$hits" ]; then
+            echo "::error::Repository secret referenced on the untrusted-code CI path (ci.yml / composite actions):" >&2
+            echo "$hits" >&2
+            echo "Move secret-bearing steps to a separate trusted workflow that does not run PR code." >&2
+            exit 1
+          fi
+          echo "OK: no repository secrets referenced in ci.yml or its composite actions."
+
   # ---------------------------------------------------------------------------
   # REQUIRED CHECKS GATE
   #
@@ -179,12 +281,17 @@ jobs:
       - queue-integration-test
       - storage-integration-test
       - consumer-integration-test
+      - workflow-security
     steps:
       - name: Fail if any required check did not succeed
         if: ${{ contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled') || contains(needs.*.result, 'skipped') }}
+        # Pass the job-results context via env (not inline ${{ }} in the script)
+        # so there is no template expansion inside the run block.
+        env:
+          NEEDS_JSON: ${{ toJSON(needs) }}
         run: |
           echo "One or more required checks did not succeed:" >&2
-          echo '${{ toJSON(needs) }}' >&2
+          echo "$NEEDS_JSON" >&2
           exit 1
 
       - name: All required checks passed
diff --git a/.github/workflows/rebase-stack.yml b/.github/workflows/rebase-stack.yml
@@ -37,8 +37,26 @@
 
 name: Rebase Stacked PRs
 
+# SECURITY: this workflow is triggered by `pull_request_target`, NOT
+# `pull_request`. The distinction is critical because this job holds a
+# privileged, repo-write PAT (STACK_REBASE_TOKEN).
+#
+#   - `pull_request`        runs the workflow file from the PR's HEAD ref.
+#                           A contributor could edit THIS file in their own PR
+#                           (e.g. drop the `merged` guard), close the PR, and
+#                           run arbitrary code with the PAT. Same-repo branch
+#                           PRs (the stacked-PR model) DO receive secrets, so
+#                           this is exploitable — a secret-exfiltration hole.
+#   - `pull_request_target` always runs the workflow file from the DEFAULT
+#                           branch (main). PR edits to this file are ignored,
+#                           so the privileged logic is fixed at main's version.
+#
+# `pull_request_target` is safe HERE specifically because the job never builds
+# or executes PR code: it only replays commits with `git rebase` (which runs
+# nothing from the tree) and validates the resulting diff is byte-identical
+# before force-pushing. It does not run `make`, tests, or any PR script.
 on:
-  pull_request:
+  pull_request_target:
     types:
       - closed
 
@@ -52,8 +70,11 @@ jobs:
     name: Rebase Stack
     if: github.event.pull_request.merged == true
     runs-on: ubuntu-latest
+    # Scope the privileged PAT to a protected Environment restricted to `main`
+    # so the secret cannot be used from any other ref/context.
+    environment: stack-rebase
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
         with:
           # Fetch full history so rebase --onto works correctly.
           fetch-depth: 0
diff --git a/.github/workflows/tag-go-pseudoversion.yml b/.github/workflows/tag-go-pseudoversion.yml
@@ -24,7 +24,7 @@ jobs:
     name: Create and push pseudo-version tag
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
         with:
           fetch-depth: 0
 
diff --git a/.github/zizmor.yml b/.github/zizmor.yml
@@ -0,0 +1,25 @@
+# zizmor (GitHub Actions security auditor) configuration.
+#
+# Every ignore below is an INTENTIONAL, reviewed exception with a concrete
+# justification — not a blanket mute. New findings outside these are expected
+# to fail the `workflow-security` CI job.
+rules:
+  # rebase-stack.yml deliberately uses `pull_request_target` so the privileged
+  # workflow definition (which holds the STACK_REBASE_TOKEN PAT) is always taken
+  # from `main`, never from PR-controlled content. It never builds or runs PR
+  # code — only `git rebase` replay + byte-identical diff validation before
+  # force-push — so the usual pull_request_target code-execution risk does not
+  # apply. See the SECURITY comment block at the top of that workflow.
+  dangerous-triggers:
+    ignore:
+      - rebase-stack.yml
+  # These workflows MUST persist the checkout credentials because they push:
+  #   - rebase-stack.yml force-pushes rebased branches using STACK_REBASE_TOKEN
+  #   - tag-go-pseudoversion.yml pushes pseudo-version tags using the default token
+  # Neither uploads build artifacts, so persisted credentials cannot leak via
+  # the artipacked vector. CI jobs that DO run untrusted PR code set
+  # persist-credentials: false in ci.yml.
+  artipacked:
+    ignore:
+      - rebase-stack.yml
+      - tag-go-pseudoversion.yml