Skip to content

chore(deps-dev): update datasets requirement from >=3.0 to >=5.0.0#43

Open
dependabot[bot] wants to merge 1 commit into
mainfrom
dependabot/pip/datasets-gte-5.0.0
Open

chore(deps-dev): update datasets requirement from >=3.0 to >=5.0.0#43
dependabot[bot] wants to merge 1 commit into
mainfrom
dependabot/pip/datasets-gte-5.0.0

Conversation

@dependabot

@dependabot dependabot Bot commented on behalf of github Jun 8, 2026

Copy link
Copy Markdown
Contributor

Updates the requirements on datasets to permit the latest version.

Release notes

Sourced from datasets's releases.

5.0.0

Datasets Features

Agent traces

  • Parse Agent traces messages for SFT using teich by @​lhoestq in huggingface/datasets#8232

    • Agent traces from claude_code/pi/codex and others can now be loaded with load_dataset
    • Using the teich library (new optional dependency), traces are parsed to messages to enable training on traces using e.g. trl
    • Load the data:
    >>> from datasets import load_dataset
    >>> ds = load_dataset("lhoestq/agent-traces-example", split="train")
    >>> ds[0]["messages"]
    [{'role': 'user', 'content': 'Download a random dataset from Hugging Face, use DuckDB to inspect it, and come back with a short report about it. Be concise and include: dataset name, what files/format you found, row count or rough size if you can determine it,...'
     ...]
    • Train on agent traces:
    trl sft --dataset-name lhoestq/agent-traces-example ...

Next-level shuffling in streaming mode

  • Use multiple input shards for shuffle buffer by @​lhoestq in huggingface/datasets#8194

    ds = load_dataset(..., streaming=True)
    ds = ds.shuffle(seed=42)
    # or configure local buffer shuffling manually, default is:
    ds = ds.shuffle(seed=42, buffer_size=1000, max_buffer_input_shards=10)

    before👎:

    after✨:

    toy example comparison

    from datasets import IterableDataset
    ds = IterableDataset.from_dict({"i": range(123_456_789)}, num_shards=1024)
    ds = ds.shuffle(seed=42)
    print("Cold start ids:")

... (truncated)

Commits
  • 68ac1a9 Release: 5.0.0 (#8239)
  • cfe4492 Support composed splits in streaming datasets (#8220)
  • fd67320 Keep None as a real null in Json() columns instead of the string "null" (#8231)
  • 10cdc81 Fix iterable skip over full Arrow blocks (#8236)
  • b7c064d Parse agent traces messages for SFT using teich (#8232)
  • 31e92f1 fix: embed_external_files=True for mesh support (#8224)
  • d168d5f feat: add TsFile (Apache IoTDB) packaged builder with per-device wide format ...
  • 992f3cf fix(map): fix progress bar exceeding total when load_from_cache_file=False (#...
  • 8474a91 Fix single lance file form pylance 7.0 (#8225)
  • d4284e9 feat: add 3D mesh support and MeshFolder builder (#8055)
  • Additional commits viewable in compare view

@dependabot @github

dependabot Bot commented on behalf of github Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Labels

The following labels could not be found: dependencies. Please create it before Dependabot can add it to a pull request.

Please fix the above issues or remove invalid values from dependabot.yml.

Updates the requirements on [datasets](https://github.com/huggingface/datasets) to permit the latest version.
- [Release notes](https://github.com/huggingface/datasets/releases)
- [Commits](huggingface/datasets@3.0.0...5.0.0)

---
updated-dependencies:
- dependency-name: datasets
  dependency-version: 5.0.0
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot Bot changed the title build(deps-dev): update datasets requirement from >=3.0 to >=5.0.0 chore(deps-dev): update datasets requirement from >=3.0 to >=5.0.0 Jun 15, 2026
@dependabot dependabot Bot force-pushed the dependabot/pip/datasets-gte-5.0.0 branch from eaa2009 to 0266c5a Compare June 15, 2026 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants