Hi, I am replicating the project and found that the dataset you provide may be missing some data.
Specifically, I found that the set of query ids in top1000.dev.tar.gz is equal to the set of query ids in qrels.dev.small.tsv (incidentally, the sets of query ids in qrels.dev.small.tsv and queries.dev.small.tsv are the same, as expected).
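In case it helps to reproduce this, the comparison above can be sketched roughly as follows. This is only a minimal sketch under assumptions: the archives are already extracted, the files are tab-separated, and the query id is in the first column. The helper names `read_query_ids` and `compare_query_ids` are hypothetical, made up for this illustration.

```python
def read_query_ids(path, column=0):
    """Collect the distinct values in one column of a tab-separated file.

    Assumption: the query id sits in column 0; pass another index if not.
    """
    ids = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\r\n").split("\t")
            # Skip blank or short lines instead of failing on them.
            if len(fields) > column and fields[column]:
                ids.add(fields[column])
    return ids


def compare_query_ids(path_a, path_b, column_a=0, column_b=0):
    """Report which query ids appear in only one of the two files."""
    a = read_query_ids(path_a, column_a)
    b = read_query_ids(path_b, column_b)
    return {"equal": a == b, "only_in_a": a - b, "only_in_b": b - a}


# Example call (the extracted file name "top1000.dev" is an assumption):
# result = compare_query_ids("top1000.dev", "qrels.dev.small.tsv")
# print(result["equal"], len(result["only_in_a"]), len(result["only_in_b"]))
```

If `equal` is true and both difference sets are empty, the two files cover exactly the same queries.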
So I am wondering whether top1000.dev.tar.gz should be renamed to top1000.dev.small.tar.gz, and whether there should be a 'true' top1000.dev.tar.gz that contains the candidates for the queries in qrels.dev.tsv?
By the way, some values in the Num Records column of the table appear to be incorrect. First, the table lists 6,669,195 records for top1000.dev.tar.gz, whereas the file I downloaded has 6,668,967. Second, the table lists 39,782,779 records for triples.train.small.tar.gz, whereas the file I downloaded has 39,780,811. I guess this may not be a problem, because the differences (228 and 1,968 records) are small compared to the totals?
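For reference, the record counts above can be checked with something like the sketch below. It assumes that one record corresponds to one line in the file(s) inside the archive, which may not match how the table's numbers were produced. `count_records` is a hypothetical helper name; it uses only Python's standard `tarfile` module.

```python
import tarfile


def count_records(archive_path):
    """Count lines across all regular files inside a .tar.gz archive.

    Assumption: one line equals one record.
    """
    total = 0
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, etc.
            extracted = tar.extractfile(member)
            # Iterate line by line so the file is never fully loaded in memory.
            for _ in extracted:
                total += 1
    return total


# Example call:
# print(count_records("top1000.dev.tar.gz"))
```

Streaming the archive this way avoids extracting it to disk first, which matters for the larger triples file.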
Thanks~