improve --check performance when indexing to postgres #111
Cool! Do you or @klassenjs use the
@clairecporter I just started using … I also realized that my current approach in this PR is flawed: since I'm using the same GDAL path to build the temp table, it's now just checking how consistent GDAL is rather than whether GDAL wrote to Postgres correctly. I think the way to go is to do something like
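One hypothetical way to verify the load independently of GDAL is to insert the expected record IDs into a temp table with plain driver calls and let the database find the missing ones with an anti-join (`LEFT JOIN ... WHERE ... IS NULL`), so only the missing IDs travel back to the client. This is a minimal sketch, not the PR's code: `find_missing_records`, `index_table`, and `record_id` are illustrative names, and `sqlite3` stands in for Postgres only so the example runs without a server. With psycopg2 the placeholder would be `%s` instead of `?`.

```python
import sqlite3


def find_missing_records(conn, table, id_column, expected_ids):
    """Return the expected IDs that are absent from `table`, sorted.

    Hypothetical sketch: expected IDs are written with plain driver calls
    (not through GDAL), and the comparison runs inside the database as an
    anti-join. `table` and `id_column` are assumed to be trusted identifiers.
    """
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE expected_ids (record_id TEXT PRIMARY KEY)")
    # Deduplicate so the PRIMARY KEY constraint never rejects a repeat.
    cur.executemany(
        "INSERT INTO expected_ids (record_id) VALUES (?)",
        [(rid,) for rid in sorted(set(expected_ids))],
    )
    # Anti-join: keep expected IDs that have no matching row in the target table.
    cur.execute(
        f"SELECT e.record_id FROM expected_ids e "
        f"LEFT JOIN {table} t ON t.{id_column} = e.record_id "
        f"WHERE t.{id_column} IS NULL "
        f"ORDER BY e.record_id"
    )
    missing = [row[0] for row in cur.fetchall()]
    cur.execute("DROP TABLE expected_ids")
    return missing


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE index_table (record_id TEXT)")
    conn.executemany("INSERT INTO index_table VALUES (?)", [("a",), ("b",), ("c",)])
    print(find_missing_records(conn, "index_table", "record_id", ["a", "c", "d", "e"]))
```

Running it prints `['d', 'e']`: the two expected IDs that never made it into the table.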
|
Feel free to strip out the --check logic if you can be reasonably sure the errors and warnings are properly caught.
I tried running `index_setsm.py` with `--check`, but it was very slow (GDAL 3.9.3 and Python 3.12.11 on our compute cluster): it took nearly 7 hours to check 16 records against a ~32 million record Postgres table. Most of the time appears to have been spent with the client fetching every row of the database table into memory.

This PR improves execution time by performing the check within Postgres instead. Against the same dataset, the check now completes in about 12 minutes. I'm not sure this is the best way to perform the check on the database side, though.
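The difference between the two approaches can be sketched as follows, assuming a hypothetical `index_table` keyed by a `record_id` column (names are illustrative, not taken from `index_setsm.py`). The client-side version pulls every ID into Python before filtering; the server-side version pushes the filter into a `WHERE` clause so only matching rows come back, which is typically much faster when the column is indexed. `sqlite3` stands in for Postgres so the sketch runs anywhere; with psycopg2 the equivalent filter would usually be written `WHERE record_id = ANY(%s)` with a Python list as the parameter.

```python
import sqlite3


def check_client_side(conn, table, id_column, ids_to_check):
    """Slow pattern: fetch every ID into client memory, then test membership."""
    all_ids = {row[0] for row in conn.execute(f"SELECT {id_column} FROM {table}")}
    return {rid: rid in all_ids for rid in ids_to_check}


def check_server_side(conn, table, id_column, ids_to_check):
    """Faster pattern: let the database filter, so only matches are returned.

    `table` and `id_column` are assumed to be trusted identifiers; the IDs
    themselves are passed as bound parameters.
    """
    ids = list(ids_to_check)
    if not ids:
        return {}
    placeholders = ", ".join("?" for _ in ids)
    found = {
        row[0]
        for row in conn.execute(
            f"SELECT {id_column} FROM {table} WHERE {id_column} IN ({placeholders})",
            ids,
        )
    }
    return {rid: rid in found for rid in ids}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE index_table (record_id TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO index_table VALUES (?)", [(f"r{i}",) for i in range(1000)]
    )
    print(check_server_side(conn, "index_table", "record_id", ["r5", "missing"]))
```

Both functions return the same mapping (here `{'r5': True, 'missing': False}`); only the amount of data moved to the client differs. For a handful of IDs the parameter list stays well under SQLite's bound-variable limit.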