I've just investigated an issue that confused me until I looked into the code. When testing the bulk Rivet import (#958) on the QA instance, one submission (`ins1777678.tar.gz`) had a large number (1472) of data tables. However, only 908 data tables were shown in an empty search (using `sort_by=latest`), whereas 1000 data tables were returned for a search matching only one record, such as `inspire_id:1777678`. After uploading to my local instance, 911 data tables were shown in an empty search. This behaviour is governed by the following lines of code:

    data_search_size = size * OPENSEARCH_MAX_RESULT_WINDOW // LIMIT_MAX_RESULTS_PER_PAGE
    data_search = data_search[0:data_search_size]

where `OPENSEARCH_MAX_RESULT_WINDOW = 10000`, `LIMIT_MAX_RESULTS_PER_PAGE = 100` and the default `size = 10`, so `data_search_size = 1000`. This limit is imposed to avoid exceeding the OpenSearch limit on the number of results (10000). However, `data_search_size` applies to the tables from all records returned in a single page of search results. This means that the number of tables displayed for a record depends on the number of tables belonging to other records on the same page and on the value of the `size` parameter. I think it would be less confusing if the maximum number of tables per record were set to a fixed value that did not depend on other records in the same search.
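
To make the effect concrete, here is a small toy model in plain Python (it does not touch OpenSearch). The constants are the values quoted above. Everything else is an assumption for illustration: the helper names, the ordering of tables within a page, the nine small records with 92 tables in total, and the per-record cap of 100. I don't know the real composition of the QA page, so the 92 is only chosen to show that a shared budget would be consistent with seeing 908 tables.

```python
from collections import Counter

# Values quoted in this issue.
OPENSEARCH_MAX_RESULT_WINDOW = 10000
LIMIT_MAX_RESULTS_PER_PAGE = 100


def shared_budget(size=10):
    """Table budget shared by all records on one page of `size` results."""
    return size * OPENSEARCH_MAX_RESULT_WINDOW // LIMIT_MAX_RESULTS_PER_PAGE


def shown_shared(tables, size=10):
    """Count tables per record when one slice is applied to the whole page.

    `tables` is a flat list with one record ID per table, in the
    (assumed) order in which the table search returns them.
    """
    return Counter(tables[:shared_budget(size)])


def shown_per_record(tables, cap):
    """Count tables per record when each record has its own fixed cap."""
    shown = Counter()
    for record_id in tables:
        if shown[record_id] < cap:
            shown[record_id] += 1
    return shown


# Hypothetical page: nine small records (92 tables in total, listed first)
# followed by the 1472-table record from this issue.
others = [rec for rec in range(1, 10) for _ in range(10)] + [9, 9]
big = [1777678] * 1472
page = others + big   # empty search: ten records share the budget
alone = big           # search that matches only the big record

print(shared_budget())                        # 1000
print(shown_shared(page)[1777678])            # 908
print(shown_shared(alone)[1777678])           # 1000
print(shown_per_record(page, 100)[1777678])   # 100
print(shown_per_record(alone, 100)[1777678])  # 100
```

With a per-record cap, the number of tables shown for a record is the same in both searches. A cap of `OPENSEARCH_MAX_RESULT_WINDOW // LIMIT_MAX_RESULTS_PER_PAGE = 100` would also keep the total at or below 10000 even for the largest page size, although a different fixed value may be preferable.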
The code lines quoted in this issue are lines 173 to 174 of `hepdata/hepdata/ext/opensearch/api.py` at commit 77dc140.