Skip to content

Make test_chunks_is_lazy counter thread-safe for the free-threaded CI lane#3364

Merged
brendancol merged 1 commit into
mainfrom
issue-3363
Jun 16, 2026
Merged

Make test_chunks_is_lazy counter thread-safe for the free-threaded CI lane#3364
brendancol merged 1 commit into
mainfrom
issue-3363

Conversation

@brendancol

@brendancol brendancol commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Closes #3363

The 3.14t full-suite job on main fails on one test:

test_chunks_is_lazy_does_not_call_internal_reader
  AssertionError: expected 16 per-chunk decodes after compute, got 15

result.compute() runs the 16 per-chunk reads through dask's threaded scheduler. Under PYTHON_GIL=0 (the 3.14t pytest step) those run truly in parallel, and the counting wrapper's counter['calls'] += 1 is a non-atomic read-modify-write that loses an update, so 16 real decodes count as 15.

  • Guard the increment with a threading.Lock so the count is exact under both the GIL and the free-threaded interpreter.
  • Test-instrumentation fix only: the chunked decode already runs 16 times and returns correct data. No product code changes.
  • Only this test is affected; the other counter-based tests assert a single non-concurrent call (== 1) and can't lose an update.

Surfaced by the 3.14t lane (#3360). That job is allowed-failure, so main still concludes success, but this clears the red.

Test plan:

  • pytest xrspatial/geotiff/tests/vrt/test_window.py passes under the GIL (no regression).
  • 3.14t full lane goes green.

)

The 3.14t full-suite job fails on test_chunks_is_lazy_does_not_call_internal_reader
with 'expected 16 per-chunk decodes after compute, got 15'. compute() runs
the 16 per-chunk reads through dask's threaded scheduler; under PYTHON_GIL=0
the bare counter['calls'] += 1 in the counting wrapper races and loses an
update, so 16 real decodes count as 15. Guard the increment with a
threading.Lock. Test-instrumentation fix only; the chunked decode already
runs 16 times correctly.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: thread-safe counter for test_chunks_is_lazy

Test-only change, one file. The diagnosis and fix are right: compute() fans the 16 per-chunk reads across dask's threaded scheduler, and under PYTHON_GIL=0 the bare counter['calls'] += 1 is a racing read-modify-write that drops an update. A threading.Lock around the increment makes the count exact. No blockers.

Blockers

  • None.

Suggestions

  • None. The lock scope is minimal (wraps only the increment, not real_read), so it adds no contention on the actual decode and doesn't perturb timing.

Nits

  • None.

What looks good

  • The lazy assertion (counter['calls'] == 0 before compute()) is unaffected — counting_read isn't called until compute, so the pre-compute count stays 0.
  • Correctly scoped: confirmed the other counter-based tests (test_metadata.py, test_overview.py, test_visibility.py) assert a single non-concurrent call (== 1), which can't lose an update, so they don't need the same guard.
  • Import added in stdlib alphabetical order; flake8 clean.
  • Product code untouched — the chunked decode already runs 16 times and returns correct data; this only fixes the instrumentation.

Checklist

  • Root cause correct (non-atomic counter under free-threading)
  • Fix is minimal and thread-safe
  • Pre-compute lazy assertion preserved
  • No product code touched
  • Scope verified against sibling counter tests

@brendancol brendancol merged commit 104e85f into main Jun 16, 2026
11 of 12 checks passed
@brendancol brendancol deleted the issue-3363 branch June 25, 2026 13:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Free-threaded CI: test_chunks_is_lazy counter race miscounts decodes (15 != 16)

1 participant