@adilraza99

This PR fixes the CI failures triggered by recent NumPy 2.x changes that were causing test initialization to crash with a ValueError.

While investigating #847, I found that several test fixtures were creating Zarr datasets using dtype=str. With NumPy ≥ 2.0 this path goes through the VLENUTF8 codec in numcodecs, which performs boolean checks that are no longer allowed and result in the following error:

ValueError: The truth value of an array with more than one element is ambiguous
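
For readers unfamiliar with this message, the following is a minimal sketch of the general class of error, not the actual numcodecs code path (which is not shown in this PR). It assumes only that NumPy is installed; the variable names are illustrative.

```python
import numpy as np

flags = np.array([True, False])

# Using a multi-element array where a single bool is expected is ambiguous:
# NumPy cannot tell whether "any" or "all" is meant, so it raises ValueError.
try:
    if flags:
        pass
except ValueError as exc:
    message = str(exc)
    print(message)

# Stating the intent explicitly avoids the error.
any_set = bool(flags.any())  # at least one element is True
all_set = bool(flags.all())  # every element is True
```

Any code that implicitly converts such an array to a single boolean fails in the same way.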

What changed

In the affected test fixtures, dtype=str has been replaced with dtype="U" (NumPy fixed-length Unicode dtype).

This keeps the stored data as strings while avoiding the VLENUTF8 code path entirely.
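
As a rough illustration of the difference on the NumPy side only, the sketch below contrasts a fixed-length Unicode array with an object array of Python strings. The assumption, taken from the description above, is that dtype=str leads Zarr to the variable-length (VLENUTF8) path while dtype="U" does not; the sample values are hypothetical and no Zarr call is made here.

```python
import numpy as np

labels = ["AA", "BBBB", "C"]

# dtype="U" lets NumPy choose a fixed width from the longest element.
fixed = np.array(labels, dtype="U")

# dtype=object stores references to Python str objects (variable length).
variable = np.array(labels, dtype=object)

print(fixed.dtype.kind, fixed.dtype.itemsize)  # kind "U", 4 chars * 4 bytes
print(variable.dtype.kind)                     # kind "O"
```

One trade-off of the fixed-length form is that every element occupies the width of the longest string, which is usually acceptable for small test fixtures.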

Why this approach

  • Does not require pinning NumPy versions
  • Keeps behavior unchanged for NumPy 1.x
  • Fully compatible with NumPy 2.x
  • Limits the change strictly to test infrastructure
  • Avoids touching production code

Impact

  • Production code: unchanged
  • Dependencies: unchanged
  • Scope: test fixtures only
  • CI: unblocked across the full test matrix

Verification

  • Reproduced the failure locally on NumPy 2.x
  • Isolated the exact failing fixture initialization
  • Confirmed no other dtype=str usage exists outside tests
  • Validated that the fix works on both NumPy 1.26.x and NumPy 2.x
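
The PR does not say how the dtype=str search was done. One possible way to repeat that check is sketched below with the standard library only; the function name `find_dtype_str` and the regular expression are hypothetical, not part of the repository.

```python
import re
from pathlib import Path

# Matches dtype=str and dtype="str" / dtype='str', but not dtype=string_kind.
DTYPE_STR = re.compile(r"""dtype\s*=\s*(?:str\b|["']str["'])""")


def find_dtype_str(root):
    """Return (path, line number, stripped line) for each match under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if DTYPE_STR.search(line):
                hits.append((path, lineno, line.strip()))
    return hits
```

Calling `find_dtype_str(".")` from the repository root and filtering out paths under the tests directory would list any remaining uses in production code.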

This should unblock the current CI failures and allow dependent PRs to proceed without introducing version constraints or behavioral changes.

Fixes #847

@adilraza99
Author

Hi @jonbrenas @ahernank,

I've opened a PR to address the CI failure in #847 by avoiding the VLENUTF8 string path in test fixtures.

The fix is test-only, NumPy-version agnostic (works for 1.26.x and 2.x), and keeps production code untouched.

Happy to make any adjustments if needed. Thanks!

@jonbrenas
Collaborator

Thank you, @adilraza99.

It looks like the tests still fail in CI, but the error seems to be different, which I count as progress.

@adilraza99
Author

Hi @jonbrenas,

Thanks for reviewing this!

I looked into the remaining CI failures in more detail. The current failures fall into two separate categories:

  1. Coverage job failures:
    These appear to be caused by a compatibility issue between NumPy 2.x and Pandas (where StringArray.astype(order=...) raises a TypeError). It stems from upstream behavior changes and is unrelated to the VLENUTF8 test fixture path addressed in this PR.

  2. Original VLENUTF8-related CI failure (ValueError errors during CI checks #847):
    This PR specifically targets this root cause: the boolean ambiguity triggered by the VLENUTF8 string dtype path during Zarr dataset initialization on NumPy 2.x.

To keep the change safe and reviewable, this PR is intentionally scoped to:

  • Be test-only (no production code changes)
  • Avoid the problematic VLENUTF8 code path in fixtures
  • Remain compatible with both NumPy 1.26.x and 2.x
  • Preserve existing test behavior and dataset semantics

If it helps with validation, it might be worth temporarily allowing or bypassing the coverage job to confirm that the core test suite passes cleanly with this fix in place.

I avoided bundling the Pandas/StringArray issue into this PR to keep the original regression fix minimal and isolated.

Happy to open a follow-up PR focused specifically on the coverage/Pandas compatibility issue if you'd like to handle that separately.

Appreciate your feedback.

jonbrenas previously approved these changes Feb 1, 2026
Collaborator

@jonbrenas jonbrenas left a comment

LGTM.

@jonbrenas jonbrenas self-requested a review February 1, 2026 22:15
@jonbrenas jonbrenas dismissed their stale review February 1, 2026 22:16

New errors that we need to investigate.

@jonbrenas
Collaborator

Thank you, @adilraza99.

My bad, I forgot to check the Pandas error and I didn't recognize that it was the one showing up. I tried to rerun the tests with numpy==1.26.4, though, and it looks like a different error (due to a (0, n, m) slice, apparently) is causing tests to fail.

Development

Successfully merging this pull request may close these issues.

ValueError errors during CI checks

2 participants