feat: add DispatchDataLoader for custom iterable dataloaders #4045

Open: KrishVenky wants to merge 2 commits into huggingface:main from KrishVenky:feat/dispatch-dataloader-iterable

@KrishVenky

Closes #2975

Adds DispatchDataLoader, a lightweight wrapper that makes any Python iterable usable with Accelerator.prepare() without requiring it to be a torch.utils.data.DataLoader. Also adds custom_classes to DataLoaderConfiguration so the Accelerator can auto-detect and wrap user-defined iterable types.

What does this PR do?

Implements the feature requested in #2975 ("barebones dataloader for any iterable"):

  • DispatchDataLoader (data_loader.py): wraps any iterable, calls __iter__ on each pass, and applies automatic device placement via send_to_device. Exported from the top-level accelerate package.
  • DataLoaderConfiguration.custom_classes (utils/dataclasses.py): a tuple of class types. When set, Accelerator.prepare() detects instances of those classes and wraps them in DispatchDataLoader automatically.
  • _prepare_one (accelerator.py): recognises DispatchDataLoader directly, and wraps custom_classes instances.
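
The wrapper behaviour in the first bullet can be sketched in plain Python. This is a minimal illustration, not the PR's implementation: `IterableWrapperSketch`, `Source`, and `tag_device` are hypothetical names, and the `place` callable only stands in for the device placement that the PR performs with send_to_device.

```python
from typing import Any, Callable, Iterable, Iterator


class IterableWrapperSketch:
    """Hypothetical stand-in for the wrapper described above (not the PR's code)."""

    def __init__(self, iterable: Iterable[Any], place: Callable[[Any], Any]):
        self.iterable = iterable
        # `place` stands in for device placement; the PR uses send_to_device here.
        self.place = place

    def __iter__(self) -> Iterator[Any]:
        # Ask the wrapped object for a fresh iterator on every pass,
        # so each epoch restarts the underlying source.
        for batch in iter(self.iterable):
            yield self.place(batch)


class Source:
    """Toy data source that is iterable but not a DataLoader."""

    def __iter__(self):
        for i in range(3):
            yield {"input_ids": [i]}


def tag_device(batch):
    # Toy placement: record a target device instead of moving real tensors.
    return {key: (value, "cuda:0") for key, value in batch.items()}


loader = IterableWrapperSketch(Source(), tag_device)
first_pass = list(loader)
second_pass = list(loader)  # a second epoch yields the same batches again
```

Because `__iter__` is re-invoked on the wrapped object each time, the loader can be iterated for multiple epochs even when the source is a generator-based class.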

Usage:

import torch

from accelerate import Accelerator, DispatchDataLoader
from accelerate.utils import DataLoaderConfiguration

class MyDataSource:
    def __iter__(self):
        for i in range(100):
            yield {"input_ids": torch.tensor([i])}

config = DataLoaderConfiguration(custom_classes=(MyDataSource,))
accelerator = Accelerator(dataloader_config=config)
loader = accelerator.prepare(MyDataSource())  # returns DispatchDataLoader
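
The auto-detection step behind this usage can be pictured roughly as follows. This is a sketch under stated assumptions: `ConfigSketch`, `WrappedSketch`, and `prepare_one_sketch` are hypothetical names that only mirror the idea of `custom_classes` and `_prepare_one`, and how the real `_prepare_one` treats an already-wrapped loader is assumed here, not taken from the diff.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class ConfigSketch:
    # Mirrors the idea of DataLoaderConfiguration.custom_classes (hypothetical).
    custom_classes: Tuple[type, ...] = ()


class WrappedSketch:
    """Hypothetical stand-in for DispatchDataLoader."""

    def __init__(self, iterable: Any):
        self.iterable = iterable

    def __iter__(self):
        return iter(self.iterable)


def prepare_one_sketch(obj: Any, config: ConfigSketch) -> Any:
    # Assumption: an object that is already wrapped is passed through unchanged.
    if isinstance(obj, WrappedSketch):
        return obj
    # Instances of user-registered classes get wrapped automatically.
    if isinstance(obj, config.custom_classes):
        return WrappedSketch(obj)
    # Anything else is left for the usual prepare logic (not modelled here).
    return obj


class MySource:
    def __iter__(self):
        return iter([1, 2])


config = ConfigSketch(custom_classes=(MySource,))
wrapped = prepare_one_sketch(MySource(), config)
untouched = prepare_one_sketch([1, 2], config)
already = prepare_one_sketch(wrapped, config)
```

Using a tuple of types keeps the check to a single `isinstance` call, and an empty tuple naturally matches nothing.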

Closes huggingface#2975. Adds DispatchDataLoader, a lightweight wrapper that makes
any Python iterable usable with Accelerator.prepare() without requiring
it to be a torch.utils.data.DataLoader.

- data_loader.py: new DispatchDataLoader class with device placement
- utils/dataclasses.py: custom_classes field on DataLoaderConfiguration
- accelerator.py: _prepare_one recognises DispatchDataLoader and custom_classes
- __init__.py: exports DispatchDataLoader
- tests/test_data_loader.py: 8 new tests, 39/39 passing
@KrishVenky force-pushed the feat/dispatch-dataloader-iterable branch from 3c7f0ae to dd43451 on May 21, 2026 14:12
Forwards unknown attribute access to the wrapped iterable, matching
the pattern used by DataLoaderAdapter. Raises AttributeError cleanly
when the attribute does not exist on the wrapped object either.
Adds two tests covering delegation and missing-attribute behaviour.
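
The delegation described in this commit message could look roughly like the sketch below. `DelegatingWrapperSketch` and `SourceWithAttr` are hypothetical names; the code only illustrates the general `__getattr__` forwarding pattern and is not copied from the PR or from DataLoaderAdapter.

```python
class DelegatingWrapperSketch:
    """Hypothetical wrapper that forwards unknown attributes to the wrapped iterable."""

    def __init__(self, iterable):
        self.iterable = iterable

    def __iter__(self):
        return iter(self.iterable)

    def __getattr__(self, name):
        # __getattr__ runs only after normal attribute lookup has failed.
        if name == "iterable":
            # Guard against infinite recursion if `iterable` is not set yet
            # (for example while the object is being unpickled).
            raise AttributeError(name)
        try:
            return getattr(self.iterable, name)
        except AttributeError:
            # Report the miss against the wrapper, without a chained traceback.
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute {name!r}"
            ) from None


class SourceWithAttr:
    batch_size = 4

    def __iter__(self):
        return iter(range(self.batch_size))


wrapper = DelegatingWrapperSketch(SourceWithAttr())
forwarded = wrapper.batch_size  # resolved on the wrapped object

try:
    wrapper.does_not_exist
    missing_raised = False
except AttributeError:
    missing_raised = True
```

Forwarding through `__getattr__` rather than `__getattribute__` means the wrapper's own attributes and methods always take precedence over those of the wrapped object.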

Development

Successfully merging this pull request may close these issues.

Barebones dataloader to allow for any type of iterable dataloader-like object to be used. Should just handle device placement