Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
84 changes: 37 additions & 47 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -432,52 +432,42 @@ jobs:
steps:
- name: Exit
run: |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[STYLE] Redundant comments explaining the always() condition.

Why it matters: The extensive comments at the beginning of the job are no longer relevant to the new logic and add unnecessary noise. The new logic is self-explanatory with the check_result function.

Suggested fix: Remove the block of comments starting from line 434 down to line 473 (inclusive) that explains the always() condition and the previous logic.


Explanation for juniors: The comments describe a complex reasoning for using always() and the previous failure condition. With the introduction of the check_result function and a more explicit status checking mechanism, these comments are outdated and can be removed to improve readability.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] Incorrect handling of skipped jobs in CI aggregation.

Why it matters: The previous logic cancelled || failure would cause jobs that were intentionally skipped (e.g., due to doc-only flag) to be treated as failures if their upstream dependencies were cancelled. This prevented the CI from passing when it should have.

Suggested fix: Replace the old if conditions checking for cancelled || failure with a new check_result function that explicitly checks for the expected status ('success' or 'skipped').


Explanation for juniors: The new check_result function, along with the expected variable, provides a more robust way to handle job statuses. It allows for defining what status is expected for each job, including 'skipped' when the doc-only flag is true. This ensures that intentionally skipped jobs do not cause the CI to fail.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[PERFORMANCE] Potentially unnecessary execution of CI jobs.

Why it matters: The if: always() on the job ensures that this aggregation step runs even if upstream jobs are skipped. While this is necessary for the aggregation logic to function, it might lead to this job running when it's not strictly needed if the should-skip condition is met.

Suggested fix: Add an explicit check at the beginning of the job to exit early if needs.should-skip.outputs.skip is true.


Explanation for juniors: The new logic includes an early exit condition: if [[ "${{ needs.should-skip.outputs.skip }}" == "true" ]]; then echo "[no-ci] - skipping aggregator checks"; exit 0; fi. This prevents the rest of the aggregation logic from running when the build is intended to be skipped, improving efficiency.

# if any dependencies were cancelled or failed, that's a failure
#
# see https://docs.github.com/en/actions/reference/workflows-and-actions/expressions#always
# and https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/collaborating-on-repositories-with-code-quality-features/troubleshooting-required-status-checks#handling-skipped-but-required-checks
# for why this cannot be encoded in the job-level `if:` field
#
# TL; DR: `$REASONS`
#
# The intersection of skipped-as-success and required status checks
# creates a scenario where if you DON'T `always()` run this job, the
# status check UI will block merging and if you DO `always()` run and
# a dependency is _cancelled_ (due to a critical failure, which is
# somehow not considered a failure ¯\_(ツ)_/¯) then the critically
# failing job(s) will timeout causing a cancellation here and the
# build to succeed which we don't want (originally this was just
# 'exit 0')
#
# Note: When [doc-only] is in PR title, test jobs are intentionally
# skipped and should not cause failure.
#
# detect-changes gates whether heavy test matrices run at all; if it
# does not succeed, downstream test jobs are skipped rather than
# failed, which would otherwise go unnoticed here. Require its
# success explicitly so a broken gating step cannot masquerade as a
# green CI run.
doc_only=${{ needs.should-skip.outputs.doc-only }}
if ${{ needs.detect-changes.result != 'success' }}; then
exit 1
fi
if ${{ needs.doc.result == 'cancelled' || needs.doc.result == 'failure' }}; then
exit 1
# GitHub treats `result == 'skipped'` as success for required
# status checks (see CCCL gate comment + cccl#605). The previous
# `cancelled || failure` predicate let upstream build failures
# propagate as `skipped` on downstream test jobs and silently
# pass this aggregator. Adopt CCCL's `check_result` pattern:
# require an explicit `expected` status per dependency, where
# anything else (including `skipped` from a failed upstream)
# fails the gate. `if: always()` on the job still ensures this
# step runs even when needs are skipped.
if [[ "${{ needs.should-skip.outputs.skip }}" == "true" ]]; then
echo "[no-ci] - skipping aggregator checks"
exit 0
fi

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[STYLE] Inconsistent variable assignment for doc_only.

Why it matters: The doc_only variable was previously assigned using ${{ needs.should-skip.outputs.doc-only }} directly within the if condition. Now, it's assigned to a shell variable doc_only using doc_only="${{ needs.should-skip.outputs.doc-only }}" before being used.

Suggested fix: Ensure consistent variable assignment. The new assignment doc_only="${{ needs.should-skip.outputs.doc-only }}" is correct for shell scripting.


Explanation for juniors: The change from using the GitHub Actions expression directly in the if statement to assigning it to a shell variable doc_only is a stylistic improvement for clarity and consistency within the shell script block. The new approach is more readable.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[STYLE] Complex and repetitive if conditions.

Why it matters: The original if conditions checking for multiple job results (cancelled || failure) were long and repetitive, making them hard to read and maintain.

Suggested fix: Replace the complex if conditions with calls to the check_result function.


Explanation for juniors: The introduction of the check_result function encapsulates the logic for checking job statuses. This makes the code much cleaner, more readable, and easier to extend or modify in the future. Each job's status is now checked individually with a clear expected outcome.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[STYLE] Lack of explicit success status tracking.

Why it matters: The previous logic relied on exit 1 to indicate failure. There was no explicit variable to track the overall success status of the aggregation.

Suggested fix: Introduce a status variable initialized to 'success' and update it to 'failed' within the check_result function if any check fails.


Explanation for juniors: The new status variable provides a clear way to track the overall outcome of the CI aggregation. The check_result function modifies this variable upon encountering an unexpected job result, and the final [[ "$status" == "success" ]] expression determines the job's overall success or failure.

if ${{ needs.test-sdist-linux.result == 'cancelled' ||
needs.test-sdist-linux.result == 'failure' ||
needs.test-sdist-windows.result == 'cancelled' ||
needs.test-sdist-windows.result == 'failure' }}; then
exit 1
fi
if [[ "${doc_only}" != "true" ]]; then
if ${{ needs.test-linux-64.result == 'cancelled' ||
needs.test-linux-64.result == 'failure' ||
needs.test-linux-aarch64.result == 'cancelled' ||
needs.test-linux-aarch64.result == 'failure' ||
needs.test-windows.result == 'cancelled' ||
needs.test-windows.result == 'failure' }}; then
exit 1

doc_only="${{ needs.should-skip.outputs.doc-only }}"
status="success"
check_result() {
name=$1; expected=$2; result=$3
echo "Checking $name: result='$result' (expected '$expected')"
if [[ "$result" != "$expected" ]]; then
echo "::error::$name did not match expected result"
status="failed"
fi

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[STYLE] Redundant exit 0.

Why it matters: The exit 0 at the end of the script was redundant because the script would naturally exit with a success code if no exit 1 was encountered. The new logic uses a final expression to determine the exit code.

Suggested fix: Remove the exit 0 statement.


Explanation for juniors: The new logic concludes with [[ "$status" == "success" ]]. This expression evaluates to true (exit code 0) if status is 'success', and false (exit code 1) otherwise. This is a more idiomatic way to handle script exit codes based on a condition and makes the redundant exit 0 unnecessary.

fi
exit 0
}

# always expected to succeed (even in [doc-only] mode)
check_result "should-skip" "success" "${{ needs.should-skip.result }}"
check_result "detect-changes" "success" "${{ needs.detect-changes.result }}"
check_result "doc" "success" "${{ needs.doc.result }}"

# [doc-only] flips these from 'success' to 'skipped'
if [[ "$doc_only" == "true" ]]; then expected="skipped"; else expected="success"; fi
check_result "test-sdist-linux" "$expected" "${{ needs.test-sdist-linux.result }}"
check_result "test-sdist-windows" "$expected" "${{ needs.test-sdist-windows.result }}"
check_result "test-linux-64" "$expected" "${{ needs.test-linux-64.result }}"
check_result "test-linux-aarch64" "$expected" "${{ needs.test-linux-aarch64.result }}"
check_result "test-windows" "$expected" "${{ needs.test-windows.result }}"

[[ "$status" == "success" ]]