
fix: handle pandas 3.0 default StringDtype#1777

Merged
SFJohnson24 merged 2 commits into
cdisc-org:mainfrom
filippsatverily:filipps/pandas3-handle-stringdtype
Jun 23, 2026

Conversation

@filippsatverily

@filippsatverily filippsatverily commented Jun 22, 2026

Contributor

Pandas 3.0 changes the default string dtype from object to StringDtype, which requires these changes:

  1. Regex operators: .map() now returns nullable BooleanDtype, where pd.NA & True raises instead of returning False. Adds a _map_regex() helper that normalizes to numpy bool via .fillna(False).astype(bool), used by all prefix/suffix/matches regex operators.

  2. Case-insensitive comparisons: calling .lower() on a non-string value (e.g. pd.NA) raises AttributeError. The fix guards with isinstance(target_val, str) before calling .lower().

  3. Empty-column detection in record_count: the existing check, dtype == "object", is meant to identify string columns but misses StringDtype. The fix uses pd.api.types.is_string_dtype() instead.

  4. Date validation: simplifies the is_valid_date guard to not isinstance(date_string, str), which already handles None, pd.NA, and any other non-string type.
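
For readers who have not seen the diff, the four fixes above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `map_regex`, `equals_ignore_case`, `string_columns`, and `is_valid_date` are hypothetical stand-ins for the real helpers, the extra `.astype("boolean")` step and passing `.dtype` to `is_string_dtype()` are choices made for the sketch, and the ISO parse in `is_valid_date` is only a placeholder for whatever date check the project performs.

```python
import re
from datetime import datetime

import pandas as pd


def map_regex(series: pd.Series, pattern: str) -> pd.Series:
    """Hypothetical stand-in for _map_regex(): always returns numpy bool."""
    compiled = re.compile(pattern)
    matched = series.map(
        # Non-string entries (None, NaN, pd.NA) become pd.NA, not a match.
        lambda value: bool(compiled.search(value))
        if isinstance(value, str)
        else pd.NA
    )
    # Go through nullable boolean first (an assumption of this sketch),
    # then fill missing values with False and convert to numpy bool.
    return matched.astype("boolean").fillna(False).astype(bool)


def equals_ignore_case(target_val, comparator: str) -> bool:
    """Guard with isinstance() so .lower() is only called on strings."""
    if not isinstance(target_val, str):
        return False
    return target_val.lower() == comparator.lower()


def string_columns(df: pd.DataFrame) -> list:
    """List columns whose dtype is string-like (object or StringDtype)."""
    return [
        name
        for name in df.columns
        # Passing the dtype rather than the Series is a choice of this sketch.
        if pd.api.types.is_string_dtype(df[name].dtype)
    ]


def is_valid_date(date_string) -> bool:
    """A single isinstance() guard covers None, pd.NA, and other non-strings."""
    if not isinstance(date_string, str):
        return False
    try:
        # Placeholder check only; the real validator may use different rules.
        datetime.fromisoformat(date_string)
    except ValueError:
        return False
    return True
```

The common thread is that each helper decides what to do with a missing value before any string method or boolean operation touches it, so the result does not depend on whether a column arrives as object or StringDtype.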

Tested scenarios:

  • Full pytest suite: 1746 passed, 11 skipped, 0 failed (pandas 2.3.3, dask 2025.12.0)
  • Ran validation on CDISC_Pilot_Study_v4_FIXED.json: 201 SUCCESS, 6 SKIPPED, 0 errors

@filippsatverily filippsatverily marked this pull request as ready for review June 22, 2026 21:40
@filippsatverily

Contributor Author

@SFJohnson24 another commit from #1745

@SFJohnson24

Collaborator

@filippsatverily it looks like this is failing our linter for dataframe_operators, can you reformat?

@filippsatverily filippsatverily force-pushed the filipps/pandas3-handle-stringdtype branch from b1eaee8 to 7b9d25a Compare June 23, 2026 18:29
@filippsatverily

Contributor Author

> @filippsatverily it looks like this is failing our linter for dataframe_operators, can you reformat?

@SFJohnson24 done

@SFJohnson24 SFJohnson24 merged commit 29eef3d into cdisc-org:main Jun 23, 2026
2 of 4 checks passed
