[DO-NOT-MERGE] Demonstrate CI gating-check bug: build fail -> Check job status green #2209

Closed
leofang wants to merge 1 commit into NVIDIA:main from leofang:demo/gating-check-bug

leofang (Member) commented Jun 13, 2026

Do not merge. This is a demonstration of the bug described in #2208.

Injects exit 7 early in ci/tools/env-vars so every Build * matrix entry fails at the env-vars step (before cibuildwheel runs). Tests downstream will be skipped due to needs-failure propagation.

Expected outcome:

  • Build linux-64, Build linux-aarch64, Build win-64 → failure
  • Test linux-64, Test linux-aarch64, Test win-64, Test sdist * → skipped
  • Check job status → success ← the bug being demonstrated

This PR will be closed once CI confirms the reproduction.

Refs #2208

DO NOT MERGE. ci/tools/env-vars exits 7 in build mode to make every
Build matrix entry fail. Demonstrates that Check job status reports
success despite the failure.

Refs: NVIDIA#2208

copy-pr-bot Bot (Contributor) commented Jun 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions Bot added the CI/CD (CI/CD infrastructure) label Jun 13, 2026

leofang (Member, Author) commented Jun 13, 2026

/ok to test deddae4

leofang (Member, Author) commented Jun 13, 2026

Reproduced. Final state of this CI run (https://github.com/NVIDIA/cuda-python/actions/runs/27451928558):

  • 24/24 Build * matrix entries → fail at the injected env-vars build step.
  • Test linux-* / Test win-64 → skipped (needs-failure propagation from build-*).
  • Test sdist linux-64 / Test sdist win-64 → pass (these don't depend on build-*).
  • Check job status → ✅ success ← the bug.

The single required status check on main (Check job status) reports green despite 24 Build failures. Closing this demo PR; tracking continues in #2208.
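
The green result can be reproduced in a small model. The sketch below is illustrative Python, not the workflow's actual code. It assumes, per the description of #2208, that the old aggregator treated only `'cancelled'` and `'failure'` as failing results, and that it depended on the test jobs rather than directly on the build jobs. The job ids are hypothetical stand-ins for the matrix entries listed above.

```python
from typing import Mapping

# Results the (assumed) pre-fix aggregator counted as a failure.
FAILING_RESULTS = {"cancelled", "failure"}


def old_aggregator_passes(needs_results: Mapping[str, str]) -> bool:
    """Return True when no dependency reports 'cancelled' or 'failure'.

    'skipped' is not in FAILING_RESULTS, so it passes silently.
    """
    return not any(result in FAILING_RESULTS for result in needs_results.values())


# Hypothetical job ids; results mirror the final state reported in this run.
observed = {
    "test-linux-64": "skipped",       # skipped because its build job failed
    "test-linux-aarch64": "skipped",
    "test-win-64": "skipped",
    "test-sdist-linux-64": "success",  # does not depend on build-*
    "test-sdist-win-64": "success",
}

print(old_aggregator_passes(observed))  # True
```

Under these assumptions the failed builds never reach the aggregator as `'failure'`; they only show up indirectly as `'skipped'` test jobs, which the check does not reject.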

leofang closed this Jun 13, 2026
leofang deleted the demo/gating-check-bug branch June 13, 2026 01:23
leofang added a commit to leofang/cuda-python that referenced this pull request Jun 16, 2026
Mirrors the NVIDIA#2209 reproducer on this branch's new aggregator. Expected
outcome:
- All Build * matrix entries fail at the env-vars build step.
- Downstream Test * jobs are skipped (cascaded needs-failure).
- Check job status now reports failure -- the symmetric, post-fix
  counterpart to NVIDIA#2209 (where Check job status reported success despite
  the same set of failures).

DO NOT MERGE while this commit is present. Remove before merge.

Refs NVIDIA#2208, NVIDIA#2209.
rwgk pushed a commit that referenced this pull request Jun 16, 2026
* ci: replace 'cancelled||failure' aggregator with CCCL-pattern check_result

Fixes #2208. The previous `Check job status` body only treated a dep's
`result == 'cancelled' || == 'failure'` as failure, letting `'skipped'`
slip through silently. When a `build-*` job fails, the dependent
`test-*` job is set to `'skipped'` by default needs-failure
propagation, and the aggregator passes -- exactly the case demonstrated
by #2209.

Adopt CCCL's `check_result` pattern: explicit `expected="success"` per
dependency, with `expected="skipped"` for legitimate `[doc-only]` skips,
and an early short-circuit for `[no-ci]`. Now any deviation from the
expected status (including `'skipped'` from a failed upstream) fails
the aggregator.

Reference: NVIDIA/cccl ci-workflow-pull-request.yml L463-L526.

* [demo] Re-inject env-vars exit 7 to validate gating-check fix

Mirrors the #2209 reproducer on this branch's new aggregator. Expected
outcome:
- All Build * matrix entries fail at the env-vars build step.
- Downstream Test * jobs are skipped (cascaded needs-failure).
- Check job status now reports failure -- the symmetric, post-fix
  counterpart to #2209 (where Check job status reported success despite
  the same set of failures).

DO NOT MERGE while this commit is present. Remove before merge.

Refs #2208, #2209.

* Revert "[demo] Re-inject env-vars exit 7 to validate gating-check fix"

This reverts commit 811388c.
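
The `check_result` idea from the first commit message above can be modeled as follows. This is a minimal Python sketch of the described behavior, not the workflow's actual shell code: the function signatures, the `doc_only_skips` parameter, and the job ids are assumptions made for illustration, since the commit message does not say which jobs are legitimately skipped for `[doc-only]`.

```python
from typing import Collection, Mapping


def check_result(name: str, expected: str, actual: str) -> bool:
    """Compare one dependency's result against its expected status."""
    if actual != expected:
        print(f"{name}: expected {expected!r}, got {actual!r}")
        return False
    return True


def aggregator_passes(
    results: Mapping[str, str],
    doc_only_skips: Collection[str] = (),  # hypothetical: jobs skipped for [doc-only]
    doc_only: bool = False,
    no_ci: bool = False,
) -> bool:
    """Return True only if every dependency has exactly its expected result."""
    if no_ci:
        # Early short-circuit: nothing was meant to run.
        return True
    ok = True
    for name, actual in results.items():
        expected = "skipped" if doc_only and name in doc_only_skips else "success"
        # Evaluate check_result first so every mismatch is reported.
        ok = check_result(name, expected, actual) and ok
    return ok


# A test job skipped because its build failed now fails the aggregator.
print(aggregator_passes({"test-linux-64": "skipped"}))  # False
```

The design difference from the old check is that the expected status is stated per dependency, so any deviation fails, rather than listing a few known-bad statuses and letting everything else through.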