Add free-threaded Python 3.14 (3.14t) to CI matrix#3360

Merged
brendancol merged 6 commits into
mainfrom
issue-3359
Jun 16, 2026

@brendancol

@brendancol brendancol commented Jun 16, 2026

Closes #3359

Adds the free-threaded (no-GIL) build of Python 3.14 to the CI matrix.

  • Add 3.14t to the python axis on both the PR fast lane and the full push/nightly lane. actions/setup-python@v5 resolves it to the free-threaded build.
  • Mark the free-threaded job continue-on-error, keyed on endsWith(matrix.python, 't'), so a broken install or a thread-safety failure reports without blocking merges. numba and parts of the stack are still finishing their no-GIL support, so this job will likely go red before it stays green. Promote it to a required job once it stabilizes.

This is a CI-config-only change, so no library code, tests, docs, or README rows are touched.

Test plan:

  • YAML parses (yaml.safe_load)
  • CI shows 3.14t jobs on this PR for all three OSes, reporting as allowed-failure

Also fixes the free-threaded race the new lane surfaced: Closes #3361

Add 3.14t to both the PR fast lane and the full push/nightly lane so the
no-GIL build gets exercised. Mark the free-threaded job continue-on-error
(keyed on the 't' suffix) so it reports signal while numba and the rest of
the stack finish their free-threaded support, without blocking merges.

@brendancol brendancol left a comment

PR Review: Add free-threaded Python 3.14 (3.14t) to CI matrix

CI-config-only change, one file (.github/workflows/test.yml). The Actions expressions are valid and the rollout approach (allowed-failure experimental job) is the right call. No blockers.

Blockers (must fix before merge)

  • None.

Suggestions (should fix, not blocking)

  • No timeout-minutes on the run job (.github/workflows/test.yml:24). The free-threading guide calls this out because deadlocks are more likely without the GIL, and this repo already sets timeouts in docs.yml and test-cog-validator.yml. With continue-on-error, a hung 3.14t job has nothing to stop it before the 360-minute default, so it burns runner minutes without ever gating a merge. Cap the experimental entry, e.g. timeout-minutes: ${{ endsWith(matrix.python, 't') && 30 || 360 }}.
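
For readers unfamiliar with the `cond && a || b` idiom in the suggested expression, here is a minimal Python sketch of the same short-circuit logic. The function name `timeout_minutes` is hypothetical, and Python's `and`/`or` only stand in for the Actions operators; this is not how the runner evaluates expressions.

```python
def timeout_minutes(python_version: str) -> int:
    """Mimic `endsWith(matrix.python, 't') && 30 || 360` with Python's and/or."""
    # `and` yields 30 when the suffix test is true; otherwise `or` falls
    # through to the 360-minute default.
    return python_version.endswith("t") and 30 or 360


for version in ["3.12", "3.13", "3.14", "3.14t"]:
    print(version, timeout_minutes(version))
```

One caveat of the idiom: if the middle value were falsy (for example 0), the expression would fall through to the fallback, so it only works as a ternary when the "true" branch is itself truthy.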

Nits (optional improvements)

  • The PR fast lane goes from 3 jobs to 6 (2 Python versions x 3 OSes). Intentional and cheap, noting it for visibility.
  • Possible follow-up: set PYTHON_GIL=0 for the 3.14t job so it actually exercises the no-GIL path. By default CPython re-enables the GIL for C extensions that declare they need it, which can hide incompatibilities behind a green run. The tradeoff is that forcing it off can abort at import for extensions that have not declared support yet, so it is reasonable to leave for a later iteration once the job installs cleanly.
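
A quick way to see whether a run really exercised the no-GIL path is to ask the interpreter directly. The sketch below assumes CPython 3.13 or newer for `sys._is_gil_enabled()` and falls back to "GIL enabled" on older versions; the helper name `gil_status` is hypothetical and not part of this PR.

```python
import sys
import sysconfig


def gil_status() -> dict:
    """Report whether this is a free-threaded build and whether the GIL is on."""
    # Py_GIL_DISABLED is truthy only for free-threaded builds.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on CPython 3.13+; older versions always have a GIL.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = True if is_gil_enabled is None else bool(is_gil_enabled())
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


status = gil_status()
print(status)
```

On a free-threaded build, `gil_enabled` being True after importing the test dependencies would indicate that the GIL was re-enabled at runtime, which is the situation this nit is concerned about.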

What looks good

  • continue-on-error: ${{ endsWith(matrix.python, 't') }} keys on the t suffix, so it covers any future free-threaded version without another edit, and reads clearly.
  • Both lanes (PR and push/nightly) updated consistently.
  • Matrix values stay quoted inside the JSON array, so 3.14/3.14t are not coerced to YAML floats.
  • actions/setup-python@v5 is already used here and resolves 3.14t to the free-threaded build.
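
To illustrate why the quoting matters, here is a small sketch using the standard-library `json` module as a stand-in for the workflow's parser (an assumption; the workflow itself is YAML). The value 3.10 is a hypothetical example chosen because the damage is visible there; an unquoted 3.14 would still print as 3.14.

```python
import json

# Unquoted entries parse as floats, so a version like 3.10 collapses to 3.1.
unquoted = json.loads("[3.10, 3.14]")

# Quoted entries stay strings, and a suffix such as "t" is only expressible quoted.
quoted = json.loads('["3.10", "3.14", "3.14t"]')

print(unquoted)
print(quoted)
```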

Checklist

  • Actions expressions valid (endsWith, job-level continue-on-error)
  • Both matrix lanes updated
  • Allowed-failure semantics correct (won't block merges)
  • Timeout guard on the experimental job (suggested)
  • No library code, tests, docs, or README affected

Address review suggestion: a no-GIL deadlock on the 3.14t job would
otherwise run to the 360-minute default. Cap the free-threaded entry at
30 minutes (keyed on the 't' suffix); other versions keep the default.

@brendancol brendancol left a comment

Follow-up review (after timeout commit)

Re-reviewed after 93571b9. Disposition of the earlier findings:

  • Suggestion (timeout guard): fixed. Added timeout-minutes: ${{ endsWith(matrix.python, 't') && 30 || 360 }}, keyed the same way as continue-on-error. The 3.14t job is capped at 30 minutes; 3.12/3.13/3.14 keep the 360-minute default, so their behavior is unchanged.
  • Nit (3→6 PR jobs): dismissed. Intended; exercising 3.14t on PRs is the point of the change.
  • Nit (PYTHON_GIL=0): deferred. Whether to force the no-GIL path is a design choice. The first iteration is deliberately conservative so we can confirm the job installs and runs before tightening it. Worth revisiting once 3.14t is green.

No new findings. YAML parses; the timeout expression resolves to a number per matrix entry.

@brendancol

CI note: the 3.14t jobs on ubuntu/windows are red, but the pytest workflow run still concluded success: continue-on-error kept the allowed-failure free-threaded job from gating the merge. All GIL builds (3.12/3.13/3.14) pass on every OS, and macOS 3.14t passes too.

The one failure is a pre-existing thread-safety bug surfaced by no-GIL, not introduced here: a zlib decompression race in the chunked COG sidecar read (test_fsspec_chunked_open_resolves_sidecar_overview). Tracked in #3361. This PR only adds the matrix entry, so it's good to merge independently of that fix.

fsspec's MemoryFileSystem (and any backend whose open() returns a shared
file object) handed every _CloudSource.read_range/read_all call the same
process-global handle. Concurrent windowed reads -- dask chunk tasks plus
read_ranges' own worker pool -- raced on that single cursor, corrupting
tile bytes (zlib 'incorrect header check') or reading a just-closed handle
('I/O operation on closed file'). The GIL serialized the reads enough to
mask it; the free-threaded 3.14t lane exposed it.

Route both reads through the stateless cat_file ranged API, which returns
a fresh byte slice per call with no shared cursor. cat_file matches
seek+read semantics including the EOF clamp the COG header prefetch needs.

Add a deterministic regression test that forces the shared-cursor
interleaving with a seek barrier, so it fails on the old implementation
even under the GIL, plus a live memory:// round-trip check.

Surfaced by the 3.14t CI lane added in this PR (#3359).

@brendancol

Fixed the free-threaded race (#3361) on this branch

Root cause: _CloudSource.read_range/read_all (xrspatial/geotiff/_sources.py) read via fs.open(path, 'rb') + seek + read. For fsspec's MemoryFileSystem, open('rb') returns the same shared process-global handle every call (verified: fs.open(p) is fs.open(p) -> True). Concurrent windowed reads — dask chunk tasks plus read_ranges' own worker pool — raced on that single cursor: one thread's seek landed under another's read, corrupting tile bytes (zlib.error: incorrect header check) or hitting a just-closed handle (I/O operation on closed file). The GIL serialized the reads enough to hide it; the new 3.14t lane exposed it.

Fix: route both reads through the stateless cat_file(path, start=, end=) ranged API — a fresh byte slice per call, no shared cursor. cat_file matches seek+read semantics exactly, including the EOF clamp the COG header prefetch relies on (verified across (0,5), (0,100), (3,100), (2,0), (5,10)).
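
The equivalence check can be sketched without fsspec. Below, a plain byte slice stands in for the stateless ranged read and a private `io.BytesIO` stands in for seek+read; both helper names and the 5-byte payload are assumptions for illustration, and this is not fsspec's implementation of `cat_file`.

```python
import io

PAYLOAD = b"hello"  # assumed 5-byte payload; the actual test data is not shown here


def ranged_read(data: bytes, start: int, end: int) -> bytes:
    """Stateless stand-in for a ranged read: a slice, with no cursor to share."""
    return data[start:end]


def seek_then_read(data: bytes, start: int, end: int) -> bytes:
    """Cursor-based read on a private handle, clamped at EOF by read()."""
    handle = io.BytesIO(data)
    handle.seek(start)
    # A negative length would mean "read to EOF", so clamp it to zero.
    return handle.read(max(end - start, 0))


CASES = [(0, 5), (0, 100), (3, 100), (2, 0), (5, 10)]
results = {case: ranged_read(PAYLOAD, *case) for case in CASES}
for case in CASES:
    # Both paths agree, including past-EOF and start-after-end ranges.
    assert results[case] == seek_then_read(PAYLOAD, *case)
print(results)
```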

Regression test (tests/read/test_cloud_source_concurrency_3361.py): a plain thread stress test shows 0 mismatches under the GIL — the GIL fully masks this — so the test instead forces the shared-cursor interleaving with a threading.Barrier in seek. It fails deterministically on the old implementation (even under the GIL) and passes on the fix. Confirmed by reverting the fix: the test fails; restored: it passes.
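
The barrier technique can be shown in isolation. This is a minimal sketch, not the test in the repository: `BarrierFile`, `run_pair`, and the two request offsets are hypothetical, and a shared `io.BytesIO` stands in for the shared fsspec handle.

```python
import io
import threading

DATA = bytes(range(256))
REQUESTS = [(0, 10), (100, 10)]  # (start, length) pairs, one per thread


class BarrierFile(io.BytesIO):
    """BytesIO whose seek() blocks until both threads have moved the cursor."""

    def __init__(self, data, barrier):
        super().__init__(data)
        self._barrier = barrier

    def seek(self, pos, whence=0):
        result = super().seek(pos, whence)
        self._barrier.wait()  # both seeks land before either read starts
        return result


def run_pair(read_fn) -> int:
    """Run read_fn(start, length) in two threads and count wrong results."""
    got = {}

    def worker(start, length):
        got[start] = read_fn(start, length)

    threads = [threading.Thread(target=worker, args=req) for req in REQUESTS]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(got[start] != DATA[start:start + n] for start, n in REQUESTS)


def shared_cursor_mismatches() -> int:
    shared = BarrierFile(DATA, threading.Barrier(2, timeout=5))

    def read_fn(start, length):
        shared.seek(start)  # the other thread's seek overwrites this position
        return shared.read(length)

    return run_pair(read_fn)


def stateless_mismatches() -> int:
    # Slicing has no cursor, so there is nothing for the threads to race on.
    return run_pair(lambda start, length: DATA[start:start + length])


print(shared_cursor_mismatches(), stateless_mismatches())
```

Because the barrier releases only after both seeks, the cursor holds a single position when the reads begin, so at least one thread reads the wrong bytes regardless of scheduling. That is what makes the failure reproducible even with the GIL.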

Verification:

  • test_fsspec_chunked_open_resolves_sidecar_overview (the original 3.14t failure): passes.
  • geotiff read + integration + unit suites: 1624 + 1476 passed, no regressions.
  • flake8/isort clean.

The PR now Closes #3361 as well as #3359.

Set PYTHON_GIL=0 only on the 3.14t matrix entry so the tests exercise the
real no-GIL path instead of letting CPython silently re-enable the GIL for
a C extension that hasn't declared free-threaded support (which would hide
thread-safety bugs behind a green run). Keyed on the 't' suffix with an
empty fallback for the GIL builds, where PYTHON_GIL=0 is a fatal error
('Disabling the GIL is not supported by this build').

Setting PYTHON_GIL=0 at the job level crashed the Setup Python step on
macOS: actions/setup-python invokes a standard (GIL) bootstrap interpreter
while installing 3.14t, and PYTHON_GIL=0 is a fatal error there
('config_read_gil: Disabling the GIL is not supported by this build').

Move PYTHON_GIL=0 onto the two pytest run steps so it only applies to the
already-installed free-threaded interpreter under test, never to
setup-python or pip. Still keyed on the 't' suffix with an empty fallback
for the GIL builds.
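
The scoping idea, applying the variable only to the process under test, can be sketched in Python. The helper `pytest_step_env` is hypothetical and not part of the workflow; note that it omits the variable for GIL builds, whereas the workflow expression uses an empty fallback.

```python
import os


def pytest_step_env(python_version: str, base_env: dict) -> dict:
    """Build the environment for the test step only, never for setup or pip."""
    env = dict(base_env)  # copy so the caller's environment is left untouched
    if python_version.endswith("t"):
        # Only the free-threaded interpreter under test accepts PYTHON_GIL=0.
        env["PYTHON_GIL"] = "0"
    return env


base = {"PATH": os.environ.get("PATH", "")}
print(pytest_step_env("3.14t", base).get("PYTHON_GIL"))
print(pytest_step_env("3.14", base).get("PYTHON_GIL"))
```

The returned mapping could be passed as `env=` to `subprocess.run` when launching the test run, so bootstrap tooling started earlier never sees the variable.
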
@brendancol brendancol merged commit 13e4ce0 into main Jun 16, 2026
10 checks passed
@brendancol brendancol deleted the issue-3359 branch June 25, 2026 13:01
Development

Successfully merging this pull request may close these issues.

  • Free-threaded (3.14t) data race: zlib decompress error in chunked COG sidecar read
  • Add free-threaded Python 3.14 (3.14t) to the CI test matrix