
How to generate our own pretrain dataset? #18

@RobinHan24

As mentioned in the README, I ran the script preprocess_pretrain_10k.py to generate the data in data-bin/pretrain_10k. But how can I generate my own version of the data in data-src/pretrain_10k? Thanks a lot.