Scheduled weekly dependency update for week 14 #681

coderabbitai · 2026-04-06T14:59:28Z

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify datasets 4.8.4 version and check usage patterns
echo "=== Checking PyPI for datasets 4.8.4 ==="
curl -s https://pypi.org/pypi/datasets/4.8.4/json | jq -r '.info.version // "VERSION NOT FOUND"'
echo -e "\n=== Searching for datasets library usage in codebase ==="
rg -n --type=py -C3 'from datasets import|import datasets' -g '!requirements*.txt'
echo -e "\n=== Checking for common datasets API patterns that may have changed ==="
rg -n --type=py -C2 'datasets\.(load_dataset|Dataset|DatasetDict|load_from_disk)' -g '!requirements*.txt'
echo -e "\n=== Checking for security advisories ==="
gh api graphql -f query='
{
  securityVulnerabilities(first: 10, ecosystem: PIP, package: "datasets") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

Repository: DagsHub/client

Length of output: 1454

🏁 Script executed:

#!/bin/bash
# Check the actual usage of datasets import and hf_ds variable
echo "=== Full context of datasets import ==="
sed -n '45,70p' dagshub/data_engine/model/query_result.py
echo -e "\n=== Search for usage of hf_ds variable ==="
rg -n 'hf_ds\.' dagshub/data_engine/model/query_result.py -A2 -B2
echo -e "\n=== Check if datasets is used in actual function implementations ==="
rg -n 'def.*datasets|def.*hf_ds' dagshub/data_engine/model/query_result.py -A10
echo -e "\n=== Check type annotations using hf_ds ==="
rg -n 'hf_ds\[|-> .*hf_ds|: .*hf_ds' dagshub/data_engine/model/query_result.py
echo -e "\n=== Search for datasets usage in test files ==="
find . -name '*test*.py' -type f | head -20 | xargs rg -l 'datasets' 2>/dev/null | head -5
echo -e "\n=== Check requirements.txt for datasets version ==="
cat requirements.txt 2>/dev/null || echo "No requirements.txt found"

Repository: DagsHub/client

Length of output: 1323

🏁 Script executed:

#!/bin/bash
# Check context around line 322 to find method definition and tests
echo "=== Method containing from_pandas usage ==="
sed -n '300,330p' dagshub/data_engine/model/query_result.py
echo -e "\n=== Function signature ==="
rg -n 'def.*from_pandas|def.*\(.*\).*from_pandas' dagshub/data_engine/model/query_result.py -B20 | head -40
echo -e "\n=== Search for test files related to query_result ==="
find . -name '*test*query*result*.py' -o -name '*query*result*test*.py' | head -10
echo -e "\n=== Check if from_pandas is tested ==="
rg -n 'from_pandas' --type=py -l

Repository: DagsHub/client

Length of output: 1412

🏁 Script executed:

#!/bin/bash
# Find the method containing from_pandas by searching backwards
echo "=== Find method definition containing from_pandas (lines 280-325) ==="
sed -n '280,325p' dagshub/data_engine/model/query_result.py
echo -e "\n=== Check test_queryresult.py ==="
wc -l ./tests/data_engine/test_queryresult.py
echo -e "\n=== Content of test file ==="
head -100 ./tests/data_engine/test_queryresult.py

Repository: DagsHub/client

Length of output: 2909

🏁 Script executed:

#!/bin/bash
# Get the full method signature
echo "=== Find method signature ==="
sed -n '270,285p' dagshub/data_engine/model/query_result.py
echo -e "\n=== Search for method def containing as_huggingface or similar ==="
rg -n 'def.*as.*huggingface|def.*hf_dataset' dagshub/data_engine/model/query_result.py -A2
echo -e "\n=== Check if there's a method near line 280 ==="
sed -n '275,282p' dagshub/data_engine/model/query_result.py

Repository: DagsHub/client

Length of output: 1336

🌐 Web query:

datasets library 3.6.0 to 4.8.4 breaking changes from_pandas API

💡 Result:

Between datasets==3.6.0 and datasets==4.8.4, Dataset.from_pandas(...) itself did not meaningfully “break” at the signature/parameter level (it still takes df, features=None, info=None, split=None, preserve_index=None, and the preserve_index=None behavior around RangeIndex is the same). (raw.githubusercontent.com)

The practical breaking changes you’re most likely to hit in code paths involving from_pandas are:

Column access return type changed in v4.0.0
In datasets>=4.0.0, ds["col"] returns a datasets.Column object (lazy column) rather than the plain Python list many users relied on in 3.x. If your code does list operations directly (e.g., ds["col"].append(...), set(ds["col"]), JSON-serializing it, etc.), you may need to wrap it: list(ds["col"]) (or otherwise materialize it) to get the old behavior. (github.com)

Sequence feature type was replaced (v4.0.0)
If you pass features= to from_pandas and you used datasets.Sequence(...), datasets>=4.0.0 replaced the Sequence feature type with the new List type (and Sequence becomes a utility that produces List/dict depending on subfeatures). This can break schemas you construct for from_pandas(..., features=...) (especially for nested/list columns) and anything that checks feature types explicitly. (github.com)

IterableDataset variant gained an extra parameter (not breaking, but can surprise)
IterableDataset.from_pandas(...) in current docs includes num_shards (default 1). This is additive (so not a breaking change), but if you have wrappers that mirror/validate exact call signatures, it can matter. (huggingface.co)

Citations:

1: https://raw.githubusercontent.com/huggingface/datasets/3.6.0/src/datasets/arrow_dataset.py

2: https://github.com/huggingface/datasets/releases/tag/4.0.0

3: https://github.com/huggingface/datasets/releases/tag/4.0.0

4: https://huggingface.co/docs/datasets/package_reference/main_classes

Add test coverage for as_hf_dataset() method before upgrading datasets.

The from_pandas() signature is compatible between 3.6.0 and 4.8.4. However, datasets 4.0.0 introduced two breaking changes: column access (ds["col"]) now returns a datasets.Column object instead of a list, and the Sequence feature type was replaced by the new List type. The current as_hf_dataset() implementation does not appear to use either feature directly, but it has no test coverage. Add a test that verifies the method works with 4.8.4 before this dependency update is deployed.
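As a starting point for that test, here is a minimal sketch of a version-tolerant round-trip check. It exercises Dataset.from_pandas() directly because the signature of as_hf_dataset() is not shown in this thread; the helper names (column_as_list, check_from_pandas_roundtrip) and the sample dataframe are hypothetical, and a real test would call as_hf_dataset() on a QueryResult fixture instead.

```python
def column_as_list(dataset, name):
    """Return a column as a plain list.

    datasets 3.x returns a list for ds[name]; per the 4.0.0 release notes,
    4.x returns a lazy Column object. list() yields a plain list either way.
    """
    return list(dataset[name])


def check_from_pandas_roundtrip():
    """Return True/False for the round-trip check, or None if deps are missing."""
    try:
        import pandas as pd
        from datasets import Dataset
    except ImportError:
        return None  # pandas or datasets is not installed in this environment

    # Hypothetical stand-in for the dataframe that as_hf_dataset() converts.
    df = pd.DataFrame({"path": ["a.png", "b.png"], "size": [10, 20]})
    ds = Dataset.from_pandas(df)

    return (
        ds.num_rows == 2
        and column_as_list(ds, "path") == ["a.png", "b.png"]
        and column_as_list(ds, "size") == [10, 20]
    )


if __name__ == "__main__":
    print("round-trip check:", check_from_pandas_roundtrip())
```

Wrapping column reads in list() keeps the assertions independent of whether ds["col"] returns a list or a Column, so the same check can run against both the old and the new pin.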

@@ -4,4 +4,4 @@ sphinx-autobuild==2025.8.25
     sphinx-copybutton==0.5.2
     sphinx-sitemap==2.9.0
     sphinx-click==6.2.0
-    sphinx-autodoc-typehints==3.9.9
+    sphinx-autodoc-typehints==3.9.11

@@ -1,9 +1,9 @@
-    setuptools==80.9.0
-    pytest==8.3.5
+    setuptools==82.0.1
+    pytest==9.0.2
     respx==0.22.0
     pytest-git==1.8.0
-    pytest-env==1.1.5
-    pytest-mock==3.14.0
-    fiftyone==1.5.2
-    datasets==3.6.0
-    ultralytics==8.3.237
+    pytest-env==1.6.0
+    pytest-mock==3.15.1
+    fiftyone==1.14.0
+    datasets==4.8.4
+    ultralytics==8.4.33
