top1000.dev contains only the queries in queries.dev.small.tsv #23

@haiahaiah

Description

Hi, I am replicating the project and found that the dataset you provide may be missing some data.

Specifically, I found that the set of query IDs in top1000.dev.tar.gz is equal to the set of query IDs in qrels.dev.small.tsv. (By the way, the sets of query IDs in qrels.dev.small.tsv and queries.dev.small.tsv are also the same, which is as expected.)
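For anyone who wants to reproduce the comparison, here is a minimal sketch. It assumes the archive has been extracted to a tab-separated file and that the query ID is the first column in both files; the helper names `query_ids` and `compare_query_ids` are hypothetical and not part of the project.

```python
def query_ids(path, column=0):
    """Collect the distinct values found in one tab-separated column.

    Assumption: the query ID sits in column 0 of the file.
    """
    ids = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            fields = line.split("\t")
            if len(fields) > column:
                ids.add(fields[column])
    return ids


def compare_query_ids(path_a, path_b):
    """Return the query IDs unique to each file and the ones they share."""
    a, b = query_ids(path_a), query_ids(path_b)
    return {"only_in_a": a - b, "only_in_b": b - a, "shared": a & b}
```

Calling `compare_query_ids("top1000.dev", "qrels.dev.small.tsv")` on the extracted files and finding both `only_in_a` and `only_in_b` empty would correspond to the observation above.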

So I am wondering whether top1000.dev.tar.gz should be renamed to top1000.dev.small.tar.gz, and whether there should be a 'true' top1000.dev.tar.gz that is the candidate file for the queries in qrels.dev.tsv?

By the way, I found that the Num Records column in the table is not correct for some files. First, it says that top1000.dev.tar.gz has 6,669,195 records, whereas the top1000.dev.tar.gz I downloaded has 6,668,967. Second, the table lists 39,782,779 records for triples.train.small.tar.gz, whereas the triples.train.small.tar.gz I downloaded has 39,780,811. I guess this may not be a problem, because the differences (228 and 1,968 records) are small compared to the totals?
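The record counts can be checked without extracting the archives. The sketch below assumes that one record corresponds to one newline-terminated line of the file inside the archive; the function names are hypothetical. Lines are read in binary mode, so only `\n` is treated as a record separator.

```python
import tarfile


def count_records_in_tar(archive_path):
    """Count lines in every regular file inside a .tar.gz archive.

    Assumption: one record corresponds to one line.
    """
    counts = {}
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and links
            stream = tar.extractfile(member)
            counts[member.name] = sum(1 for _ in stream)
    return counts


def record_gap(reported, found):
    """Return the absolute gap and the gap as a fraction of the reported count."""
    gap = reported - found
    return gap, gap / reported
```

For example, `record_gap(6_669_195, 6_668_967)` gives the absolute gap together with its share of the reported total, which makes it easy to judge how small the mismatch is.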

Thanks~
