
Support very large bucket directories #276

@simonlsk

Description


Right now, DagsHubFilesystem offers a listdir method that returns a list. If I'm trying to access a very large bucket directory, I can't expect that list to be arbitrarily large, since the whole listing has to be built before anything is returned.
Example snippet that will time out:

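from dagshub.streaming import DagsHubFilesystem

# "." is the local project root; repo_url points at the DagsHub repo
fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/radiant-mlhub-dataset")
# Lists the entire bucket directory in a single call, which times out here
fs.listdir("s3://radiant-mlhub/bigearthnet")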

I propose that the client implement an fs.Walk that returns a generator, so entries can be consumed lazily from a directory of potentially unbounded size.
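A minimal sketch of the generator-based idea, assuming the backend can list a directory in pages using a continuation token. The names `fetch_page`, `iter_dir`, `walk`, and `fake_fetch` are hypothetical and used only for illustration; they are not part of the dagshub client, and `fake_fetch` simulates a bucket in memory instead of calling any real API.

```python
import itertools
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: (path, continuation_token) -> (entries, next_token).
# Each entry is (name, is_dir). next_token is None when the listing is exhausted.
PageFetcher = Callable[[str, Optional[str]], Tuple[List[Tuple[str, bool]], Optional[str]]]


def iter_dir(path: str, fetch_page: PageFetcher) -> Iterator[Tuple[str, bool]]:
    """Yield (name, is_dir) entries of one directory, fetching one page at a time."""
    token: Optional[str] = None
    while True:
        entries, token = fetch_page(path, token)
        yield from entries
        if token is None:
            break


def walk(top: str, fetch_page: PageFetcher) -> Iterator[Tuple[str, str, bool]]:
    """Depth-first walk yielding (dirpath, name, is_dir) lazily, one entry at a time."""
    stack = [top]
    while stack:
        dirpath = stack.pop()
        for name, is_dir in iter_dir(dirpath, fetch_page):
            yield dirpath, name, is_dir
            if is_dir:
                # Defer descending into subdirectories until this listing is consumed
                stack.append(f"{dirpath.rstrip('/')}/{name}")


def fake_fetch(path: str, token: Optional[str], page_size: int = 3):
    """Hypothetical in-memory backend: pages through a tiny simulated bucket."""
    tree = {
        "s3://bucket": [(f"file{i}.txt", False) for i in range(7)] + [("sub", True)],
        "s3://bucket/sub": [("a.txt", False), ("b.txt", False)],
    }
    items = tree[path]
    start = int(token) if token else 0
    end = start + page_size
    next_token = str(end) if end < len(items) else None
    return items[start:end], next_token


if __name__ == "__main__":
    # Only the pages needed for the first 5 entries are fetched
    for dirpath, name, is_dir in itertools.islice(walk("s3://bucket", fake_fetch), 5):
        print(dirpath, name, is_dir)
```

Yielding one entry at a time, rather than the `(dirpath, dirnames, filenames)` triples that `os.walk` produces, avoids buffering a full directory listing in memory, which is exactly what makes a list-returning listdir impractical for very large directories.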

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
