This repository was archived by the owner on Jul 16, 2025. It is now read-only.

Incorrect pretraining data format for Factual Adapter #2

@theblackcat102

I followed the code here and generated all three TSV files, DisExtract/data/books/ALL18_2019jan02_[valid, train, test].tsv. However, their format does not match the JSON file required to run pretraining for the Factual Adapter.

After running python producer.py, each generated TSV file has the following format:

[Sentence 1]\t[Sentence 2]\t[Marker]
...
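For anyone scripting against these files in the meantime, a minimal Python sketch of reading the three-column TSV described above might look like this. It assumes UTF-8 text, no header row, and exactly three tab-separated fields per line. The names MarkerExample and read_marker_tsv are hypothetical and not part of the repository.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class MarkerExample(NamedTuple):
    """One TSV row: two sentences and a marker.

    The field names are hypothetical; the file itself has no header row.
    """
    sentence_1: str
    sentence_2: str
    marker: str


def read_marker_tsv(handle: TextIO) -> Iterator[MarkerExample]:
    """Yield one MarkerExample per well-formed line of producer.py output."""
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 3:
            # Assumption: lines without exactly three tab-separated fields
            # are skipped rather than treated as errors.
            continue
        yield MarkerExample(*fields)


# Example input with made-up sentences; a real file would be opened with
# open(path, encoding="utf-8") and passed in the same way.
sample = io.StringIO(
    "He was tired .\tHe kept working .\tbut\n"
    "It rained .\tThe game was cancelled .\tso\n"
    "this line has no tabs\n"
)
rows = list(read_marker_tsv(sample))
print(len(rows), rows[0].marker)  # prints "2 but"
```

Splitting on tabs by hand, rather than using the csv module with default settings, avoids treating quote characters inside sentences as CSV quoting.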

The Factual Adapter pretraining, however, expects a JSON file in this format:

{ "sent" : "Sentence 1", "tokens": "sentence 2", "pairs" : [ ... ] }
...
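To check whether a converted file matches the format above before launching pretraining, a small validator could help. This sketch assumes one JSON object per line (JSON Lines) and checks only the three keys shown in the example, plus that "pairs" is a list. The repository's actual loader may expect different or additional fields, and the helper names record_problems and validate_jsonl are hypothetical.

```python
from __future__ import annotations

import io
import json
from typing import TextIO

# Assumption: these keys come from the example in this issue, not from the
# repository's loader, which may require more fields.
REQUIRED_KEYS = {"sent", "tokens", "pairs"}


def record_problems(record: dict) -> list[str]:
    """Return human-readable problems with one record (empty list if OK)."""
    problems = [f"missing key {key!r}" for key in sorted(REQUIRED_KEYS - set(record))]
    if "pairs" in record and not isinstance(record["pairs"], list):
        problems.append("'pairs' is not a list")
    return problems


def validate_jsonl(handle: TextIO) -> dict[int, list[str]]:
    """Map 1-based line numbers to their problems; valid lines are omitted."""
    report: dict[int, list[str]] = {}
    for lineno, line in enumerate(handle, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            report[lineno] = [f"invalid JSON: {err.msg}"]
            continue
        if not isinstance(record, dict):
            report[lineno] = ["record is not a JSON object"]
            continue
        problems = record_problems(record)
        if problems:
            report[lineno] = problems
    return report


sample = io.StringIO(
    '{"sent": "Sentence 1", "tokens": "sentence 2", "pairs": []}\n'
    '{"sent": "Sentence 1", "pairs": "not a list"}\n'
)
report = validate_jsonl(sample)
print(report)  # {2: ["missing key 'tokens'", "'pairs' is not a list"]}
```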

Is there a script that converts the generated TSV files into this JSON format?
