fix and improve: billion scale image search #1861
abhishekkrthakur wants to merge 1 commit into master
billion image
Great job on fixing this! Running it as we speak.
Most excellent! We can test this once a week or so. As an alternative to hosting the data ourselves, could we download the file and extract just the subset in the Python code?
The file is 14.8 GB, so unfortunately we won't be able to run it in the default GitHub Actions runner at all.
It seems we can stream the Parquet file directly with pyarrow instead of downloading it first, though.
It might be slower. We are dealing with millions of rows, so storing the file on disk should be preferred. We are already streaming rows in the code, though, in batches of 50k. Nevertheless, if you want me to make that change, please let me know.
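For readers following along, the batching idea mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `iter_batches` is hypothetical, and only the 50k default comes from the comment above.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(rows: Iterable[T], batch_size: int = 50_000) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` rows from any iterable.

    Hypothetical helper: only one batch is held in memory at a time,
    so the full row set never has to be materialized.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        # islice pulls up to batch_size items without consuming more.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded batch can then be embedded or fed to the index before the next one is read, which keeps memory use bounded regardless of the total row count.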
For CI we can also use a sample, by the way.
Yes. Using a sample would solve the CI problem and also make it more user-friendly, IMHO.
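One way such a sample could be drawn without loading every row is single-pass reservoir sampling. The sketch below is an assumption about how it might be done, not what the PR implements; `sample_rows`, `k`, and `seed` are hypothetical names.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def sample_rows(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return up to `k` rows chosen uniformly at random in one pass.

    Hypothetical helper using reservoir sampling (Algorithm R); a fixed
    seed keeps the sample reproducible across CI runs.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = row
    return reservoir
```

A fixed seed means CI would see the same subset on every run, while users could pick a different `k` to trade run time for coverage.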
The CI failure doesn't seem to be coming from this PR.
Right, I fixed the build in a separate PR.
billion image
I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.