fix: replace Dask GroupBy path in Distinct (.apply(set) fails in pandas 3.0) #1781

Open
filippsatverily wants to merge 2 commits into cdisc-org:main from filippsatverily:filipps/pandas3-fix-distinct-groupby


@filippsatverily commented Jun 23, 2026

Contributor

Pandas 3.0 changes the behavior of .apply(set) / .apply(list) on a GroupBy, returning a DataFrame instead of a Series. This breaks the Dask else branch in Distinct._execute_operation.

Instead of adding a Dask-specific code path, this PR materializes Dask DataFrames to pandas at the top of the method, so the existing pandas groupby/agg logic handles every case. This eliminates the Dask-specific else branch entirely and also resolves 11 Dask GroupBy FutureWarning instances (145 → 134 warnings).
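For reviewers, the approach can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `to_pandas` and `distinct_per_group` are hypothetical helpers, the column names are made up, and detecting a Dask frame by its `.compute()` method is an assumption (the project may use its own dataset wrapper).

```python
import pandas as pd


def to_pandas(frame):
    """Return a pandas DataFrame, computing lazy frames first.

    Assumption: a Dask DataFrame is recognized by having `.compute()`;
    the real project may detect it differently.
    """
    return frame.compute() if hasattr(frame, "compute") else frame


def distinct_per_group(frame, target, group_by):
    """Collect the set of distinct `target` values within each group.

    Hypothetical helper, not the project's Distinct._execute_operation.
    """
    data = to_pandas(frame)  # materialize once, at the top
    # Selecting one column gives a SeriesGroupBy, so .agg(set) returns
    # a Series holding one set per group.
    return data.groupby(group_by)[target].agg(set)


df = pd.DataFrame(
    {
        "USUBJID": ["S1", "S1", "S2", "S2"],
        "AETERM": ["HEADACHE", "HEADACHE", "NAUSEA", "RASH"],
    }
)
result = distinct_per_group(df, target="AETERM", group_by="USUBJID")
```

The trade-off is that the whole frame is loaded into memory before grouping, which gives up Dask's out-of-core evaluation for this operation in exchange for a single code path.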

Tested scenarios:

  • Full pytest suite: 1746 passed, 11 skipped, 0 failed (pandas 2.3.3, dask 2025.12.0)
  • Ran validation on CDISC_Pilot_Study_v4_FIXED.json: 201 SUCCESS, 6 SKIPPED, 0 errors

@filippsatverily filippsatverily force-pushed the filipps/pandas3-fix-distinct-groupby branch from e7ffcc5 to 444d277 Compare June 23, 2026 18:55
@filippsatverily filippsatverily marked this pull request as ready for review June 23, 2026 18:56
@filippsatverily

Contributor Author

@SFJohnson24 I'm a bit out of my depth on this one TBH, please LMK if this PR has problems I'm not seeing.
