Add openarm-dataset-upload #53

Merged
kou merged 5 commits into enactic:main from k1000dai:upload
Jun 18, 2026

Conversation

@k1000dai k1000dai commented Jun 17, 2026

Contributor

Fix GH-51

Summary

Adds an openarm-dataset-upload CLI that publishes an OpenArm dataset directory to the Hugging Face Hub, creating the dataset repo if needed, generating a dataset card, and tagging the upload with the dataset version.

sample: https://huggingface.co/datasets/k1000dai/fixture

  • Apache-2.0 is the default licence (same as lerobot).
  • JPEG files are packed into tar archives.
  • Metadata is included in the dataset card.
  • The dataset version is used to create a git tag.
  • The OpenArm tag is added to the repo for convenience.
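
How the JPEG files end up in a tar archive is not shown in this description. A minimal sketch of one way to do it with Python's standard tarfile module follows. The helper name pack_jpegs and the choice of an uncompressed archive are assumptions for illustration; the actual upload.py may lay things out differently.

```python
import tarfile
from pathlib import Path


def pack_jpegs(image_dir: Path, archive_path: Path) -> list[str]:
    """Pack every .jpg/.jpeg file under image_dir into one uncompressed tar.

    Hypothetical helper for illustration; not the PR's actual code.
    Returns the archive member names in the order they were added.
    """
    # Collect and sort first so the archive layout is deterministic.
    jpegs = sorted(
        p for p in image_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg"}
    )
    names = []
    # JPEG data is already compressed, so plain "w" (no gzip) is used here.
    with tarfile.open(archive_path, "w") as tar:
        for path in jpegs:
            # Store paths relative to image_dir, with forward slashes.
            name = path.relative_to(image_dir).as_posix()
            tar.add(path, arcname=name)
            names.append(name)
    return names
```

Bundling many small image files into one archive keeps the number of files in the dataset repo low, which is presumably the motivation for this step.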

Comment thread src/openarm_dataset/upload.py Outdated
Comment on lines +198 to +203
parser.add_argument(
    "--licence",
    default="apache-2.0",
    help="The licence to associate with the dataset on the Hugging Face Hub. "
    "Defaults to Apache-2.0.",
)

Contributor

Ah, let's add license: in metadata.yaml and Metadata.license.

Can we rename this to --default-license, to be used when the metadata lacks license information?

We can work on this as a follow-up task.
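
The proposed behaviour (take the license from metadata.yaml and fall back to --default-license only when it is missing) could be sketched as below. The function name and the dict-shaped metadata are assumptions for illustration; the real Metadata class may expose the field differently.

```python
from typing import Mapping


def resolve_license(metadata: Mapping[str, object], default_license: str) -> str:
    """Return the license from metadata, or default_license when none is set.

    Hypothetical sketch: `metadata` stands in for the parsed metadata.yaml.
    """
    value = metadata.get("license")
    # A missing key, None, and blank strings all count as "not specified".
    if isinstance(value, str) and value.strip():
        return value.strip()
    return default_license
```

With this shape, an explicit license in the dataset always wins, and the command-line option only fills the gap.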

Comment thread src/openarm_dataset/upload.py
Comment thread src/openarm_dataset/dataset.py Outdated
Comment on lines +13 to +16
{{ dataset_description | default("", true) }}

- **Homepage:** {{ url | default("[More Information Needed]", true)}}
- **Paper:** {{ paper | default("[More Information Needed]", true)}}

Contributor

Let's add this information to metadata.yaml as a follow-up task.
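
In the template lines quoted above, the second argument true to Jinja's default filter makes the fallback apply to any falsy value (such as an empty string or None), not only to undefined variables. A plain-Python sketch of that rule follows; the function name and sentinel are hypothetical, and the code does not depend on Jinja itself.

```python
# Sentinel standing in for a variable that was never passed to the template.
_UNDEFINED = object()


def jinja_default(value=_UNDEFINED, fallback="", boolean=False):
    """Mimic the semantics of Jinja's `default(fallback, boolean)` filter."""
    if value is _UNDEFINED:
        # Undefined variables always get the fallback.
        return fallback
    if boolean and not value:
        # With boolean=True, falsy values ("" or None) get it as well.
        return fallback
    return value


# An empty url in the metadata renders as the placeholder text.
homepage = jinja_default("", "[More Information Needed]", True)
```

This is why a dataset without a url or paper entry still produces a readable card rather than an empty field.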

@kou kou changed the title from "upload script" to "Add openarm-dataset-upload" Jun 17, 2026
k1000dai and others added 2 commits June 17, 2026 18:35
Comment thread src/openarm_dataset/upload.py Outdated

@kou kou left a comment

Contributor

+1

Comment thread src/openarm_dataset/upload.py Outdated
)
parser.add_argument(
    "--licence",
    default="apache-2.0",

Contributor

In general, we should use an SPDX ID (https://spdx.org/licenses/) as the license ID:

Suggested change
- default="apache-2.0",
+ default="Apache-2.0",

Contributor Author

The Hugging Face API only accepts lowercase license identifiers: https://huggingface.co/docs/hub/repositories-licenses

Contributor

Oh... Hugging Face doesn't use SPDX IDs...
Then we should change the "Defaults to Apache-2.0." help text ("Apache-2.0" -> "apache-2.0"). Anyway, let's work on license-related things as a follow-up task.
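
Since the Hub expects its own lowercase identifiers rather than SPDX IDs, a follow-up could accept an SPDX-style ID and map it before upload. A minimal sketch follows, assuming a hypothetical helper and a deliberately tiny allowlist. The full list is on the Hugging Face licenses page linked above, and lowercasing alone does not yield a valid Hub identifier for every SPDX ID.

```python
# A small subset of Hub license identifiers, for illustration only.
KNOWN_HUB_LICENSES = {"apache-2.0", "mit", "bsd-3-clause", "cc-by-4.0", "cc0-1.0"}


def to_hub_license_id(license_id: str) -> str:
    """Map an SPDX-style ID such as "Apache-2.0" to a Hub-style identifier.

    Hypothetical helper: it lowercases the input and rejects anything that
    is not in the allowlist, instead of guessing at a mapping.
    """
    candidate = license_id.strip().lower()
    if candidate not in KNOWN_HUB_LICENSES:
        raise ValueError(f"Unknown or unsupported license: {license_id!r}")
    return candidate
```

Rejecting unknown values early gives the user a clear error before any upload starts, rather than a failure from the Hub afterwards.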

Comment thread src/openarm_dataset/upload.py
Comment on lines +187 to +191
parser.add_argument(
    "--repo-id",
    required=True,
    help="Target Hugging Face dataset repository id, e.g. username/dataset-name",
)

Contributor

It's better that we use our recommended naming convention (we should describe it...) as the default. Let's work on it as a follow-up task.

Comment thread src/openarm_dataset/card_template.md Outdated
@kou kou merged commit fd7858d into enactic:main Jun 18, 2026
9 of 10 checks passed

Development

Successfully merging this pull request may close these issues.

Add support for uploading dataset to Hugging Face

3 participants