
ci(llm-gate): matrix jobs failing with 'Exit prior to config file resolving' on ~50% of recent runs #272

@stevenobiajulu

Description


Symptom

The LLM-Based Quality Gate (.github/workflows/llm-based-quality-gate.yml, added in #253) has failed at the workflow level on 5 of the last 6 runs. The failures are not the LLM finding issues in the code. The Gemini CLI fails to start, so every matrix job exits with no result and the aggregate step posts a NO RESULTS comment.

Evidence

Last 6 LLM-gate runs on main and recent PRs (collected 2026-05-27):

Run         | PR                                            | Conclusion | Aggregate verdict
----------- | --------------------------------------------- | ---------- | ----------------------------------------
26490355255 | #271                                          | success    | ✅ PASS (14/14 with real justifications)
26487987643 | (PR closed)                                   | failure    | NO RESULTS
26481977858 | (PR closed)                                   | failure    | NO RESULTS
26481976773 | (PR closed)                                   | failure    | not checked
26479483400 | (fix/allure-labels-validator-false-positives) | failure    | not checked
26447905249 | (llm-gate-phase1)                             | failure    | not checked
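A minimal sketch for re-collecting this evidence on demand, assuming the gh CLI is authenticated with access to UseJunior/safe-docx. The gh run list flags shown in the comment are real gh options; the summarize_conclusions helper is hypothetical and simply counts "failure" conclusions in run-id<TAB>conclusion input.

```shell
#!/bin/sh
# Hypothetical helper: print "<failed>/<total>" for LLM-gate runs.
# Expects one run per line on stdin as "<run-id><TAB><conclusion>", for example
# the output of:
#   gh run list --repo UseJunior/safe-docx \
#     --workflow llm-based-quality-gate.yml --limit 6 \
#     --json databaseId,conclusion \
#     --jq '.[] | "\(.databaseId)\t\(.conclusion)"'
summarize_conclusions() {
  awk -F '\t' '
    # Count every well-formed row; count it as failed if conclusion is "failure".
    NF >= 2 { total++; if ($2 == "failure") failed++ }
    # "+ 0" forces numeric output even when no rows matched.
    END { printf "%d/%d\n", failed + 0, total + 0 }
  '
}
```

Fed the six run conclusions from the table above, it prints 5/6.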

NO RESULTS comments were observed on PRs #270 and #269 (safe-docx#270 LLM-gate comment, safe-docx#269 LLM-gate comment).

Root cause snippet

From a representative failed matrix job (Check 02 — Live DOM namespace-safe OOXML writes on run 26487987643, job 77999406077):

    [...checkout + setup steps succeed...]
    Exit prior to config file resolving
    ##[error]Process completed with exit code 1.

The error is emitted by the Gemini CLI (@google/gemini-cli@0.39.1) before any model invocation, and before the CLI even reads the .gemini/settings.json file that the composite action writes.

The Gemini CLI's "Exit prior to config file resolving" message is internal: it surfaces when the CLI's own config-resolution layer cannot bootstrap. Common causes documented in google-gemini/gemini-cli issues are project-level API quota exhaustion, a malformed CLI config, a missing GEMINI_API_KEY environment variable, and a corrupted npm install.
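Because this failure leaves a fixed string in the job log, it can be detected mechanically. A minimal sketch, assuming job logs have already been saved to files (for example with gh run view <run-id> --log-failed > run.log); both function names are hypothetical.

```shell
#!/bin/sh
# The bootstrap-failure line quoted from the failed matrix job's log.
BOOTSTRAP_MARKER='Exit prior to config file resolving'

# Hypothetical helper: exit 0 if the log file in $1 contains the marker.
has_bootstrap_failure() {
  # -q: report via exit status only; -F: match the marker as a fixed string.
  grep -qF -- "$BOOTSTRAP_MARKER" "$1"
}

# Hypothetical helper: print how many of the given log files contain the marker.
count_bootstrap_failures() {
  # Without this guard, grep would read stdin when no files are passed.
  [ "$#" -gt 0 ] || { echo 0; return 0; }
  # -l lists each matching file once; wc counts them; tr strips BSD wc padding.
  grep -lF -- "$BOOTSTRAP_MARKER" "$@" | wc -l | tr -d ' '
}
```

Run over every matrix job's log for a batch of runs, count_bootstrap_failures gives a direct number to track against the acceptance criteria.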

Why this blocks LLM-gate promotion

The user asked whether to promote safe-docx from advisory (LLM_GATE_BLOCKING=0) to enforcing (LLM_GATE_BLOCKING=1 + required status check). Promotion now would create a worst-of-both-worlds state:

  • The Aggregate and post review job still exits 0 on NO RESULTS, because the blocking check fires on warns > 0, not on total == 0.
  • As a required status check, Aggregate and post review would therefore pass on NO RESULTS runs, and PRs would merge without any real LLM review having occurred.
  • The setup would look enforcing without actually enforcing anything.

The gate must consistently PRODUCE results (PASS or WARN, not NO RESULTS) before promotion is safe.
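One way to close the gap described above is to have the aggregate step treat missing results as a failure whenever blocking is on. A minimal sketch, assuming the aggregate already knows how many matrix items ran and how many returned PASS or WARN; aggregate_verdict and its arguments are hypothetical, and only LLM_GATE_BLOCKING mirrors a variable named in this issue.

```shell
#!/bin/sh
# Hypothetical aggregate rule.
# Usage: aggregate_verdict <pass_count> <warn_count> <matrix_items>
# Prints PASS, WARN, or NO RESULTS. Returns 1 when LLM_GATE_BLOCKING=1 and the
# run should fail: any WARN (the existing rule), no results at all, or fewer
# results than matrix items (the two cases the current rule lets through).
aggregate_verdict() {
  agg_pass=$1 agg_warn=$2 agg_items=$3
  agg_total=$((agg_pass + agg_warn))

  if [ "$agg_total" -eq 0 ]; then
    agg_verdict="NO RESULTS"
  elif [ "$agg_warn" -gt 0 ]; then
    agg_verdict="WARN"
  else
    agg_verdict="PASS"
  fi
  echo "$agg_verdict"

  # Advisory mode (LLM_GATE_BLOCKING unset or 0) never fails the job.
  [ "${LLM_GATE_BLOCKING:-0}" = "1" ] || return 0

  # Enforcing mode: only a complete, all-PASS result set passes.
  if [ "$agg_verdict" = "PASS" ] && [ "$agg_total" -eq "$agg_items" ]; then
    return 0
  fi
  return 1
}
```

With LLM_GATE_BLOCKING=1, aggregate_verdict 0 0 14 prints NO RESULTS and returns 1, which is the case that currently exits 0.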

Investigation directions (in order of likelihood)

  1. API quota exhaustion on the Google project. As of 2026-05-27, a second repo (UseJunior/tests-renderer) consumes the same Google project's free-tier quota through a separate GEMINI_API_KEY. The free tier for gemini-3.5-flash is rate-limited per project. Check Google AI Studio → Project quotas to see whether quota was already constrained when these runs happened; if it was, either upgrade the project tier or move the gate to a dedicated Google project per repo.

  2. A regression in @google/gemini-cli@0.39.1. Test whether bumping LLM_GATE_CLI_VERSION to a newer Gemini CLI release fixes the bootstrap; the pin candidate is the latest stable version listed by npm view @google/gemini-cli versions. If a newer version resolves it, update the pin via gh variable set LLM_GATE_CLI_VERSION --body <new> --repo UseJunior/safe-docx.

  3. npm install corruption under the $RUNNER_TEMP empty-config pattern. The composite action installs the Gemini CLI from $RUNNER_TEMP with NPM_CONFIG_USERCONFIG and NPM_CONFIG_GLOBALCONFIG forced to empty files (security hardening from feat(ci): add LLM-Based Quality Gate (Phase 1, advisory) #253). Verify that the install actually completes: add a gemini --version check after the install step and surface its output in the Actions log, so we can see whether the binary runs at all.

  4. GEMINI_API_KEY missing on some runner spawns. The composite action passes GEMINI_API_KEY: ${{ inputs.gemini-api-key }}. Verify that the secret is defined at the org or repo level (not user-level) so that every matrix job receives it.
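Directions 2 to 4 can share a single preflight step placed right after the install. A minimal sketch, assuming it runs in the same job as the composite action's install step with GEMINI_API_KEY and LLM_GATE_CLI_VERSION exported; preflight_gemini is hypothetical, ::error:: and ::warning:: are standard GitHub Actions workflow commands, and the version check assumes gemini --version prints the version number somewhere in its output.

```shell
#!/bin/sh
# Hypothetical preflight for the LLM gate; run after the Gemini CLI install.
preflight_gemini() {
  # Direction 4: fail fast if the key did not reach this job. Never echo its value.
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "::error::GEMINI_API_KEY is empty in this job"
    return 1
  fi

  # Direction 3: prove the installed binary starts, and log what it printed.
  if ! pf_version=$(gemini --version 2>&1); then
    echo "::error::gemini --version failed: $pf_version"
    return 1
  fi
  echo "gemini --version: $pf_version"

  # Direction 2: warn (without failing) if the CLI does not match the pin.
  if [ -n "${LLM_GATE_CLI_VERSION:-}" ]; then
    case "$pf_version" in
      *"$LLM_GATE_CLI_VERSION"*) ;;
      *) echo "::warning::expected $LLM_GATE_CLI_VERSION, got $pf_version" ;;
    esac
  fi
  return 0
}
```

If this step fails on the key check, the problem is secret propagation (direction 4) rather than the CLI; if it fails on gemini --version, the install itself is broken (direction 3).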

Acceptance criteria

  • 10 consecutive LLM-gate runs on real PRs end with the aggregate posting a verdict other than NO RESULTS (i.e., every matrix item produces a PASS or WARN).

  • The "Exit prior to config file resolving" line does not appear in any matrix job's log across those 10 runs.

  • Once that holds for ~2 weeks, the gate can be promoted via:

    gh variable set LLM_GATE_BLOCKING --body 1 --repo UseJunior/safe-docx
    # plus add 'Aggregate and post review' to branch protection required_status_checks
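The first criterion can be checked mechanically once the aggregate verdicts are collected, newest first, one per line (for example, from the aggregate's PR comments; how they are extracted is left open here). A minimal sketch; meets_acceptance is hypothetical and takes the required run count as an optional argument, defaulting to the 10 runs stated above.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the newest N verdicts (default 10) on
# stdin are all PASS or WARN. Fewer than N verdicts also counts as failing.
meets_acceptance() {
  awk -v need="${1:-10}" '
    # Only the first "need" lines (the newest runs) are considered.
    NR <= need + 0 && ($0 == "PASS" || $0 == "WARN") { ok++ }
    # Exit 0 only if every one of those lines was PASS or WARN.
    END { exit (ok + 0 >= need + 0 ? 0 : 1) }
  '
}
```

A single NO RESULTS inside the window makes the check fail, which matches the "consecutive" wording of the criterion.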

Related

  • UseJunior/tests-renderer now runs the same gate workflow pattern (a verbatim copy; see tests-renderer#13). Its smoke-test PR #14 succeeded end to end with a real GEMINI_API_KEY, so the workflow itself is sound; the failure is specific to safe-docx's combination of API key, Google project, and CLI version.
  • The same Google project (projects/522706245871) now backs two API keys: safe-docx's existing key and the newly created Tests Renderer CI Gemini API Key. Cross-repo quota interaction may need monitoring once both repos are active.

Metadata


Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
