To reduce the memory required for writing large dataframes, a new mode, `sync_filediffs`, is being implemented in the `mysql.Connection` class.
The approach is to do as much of the work as possible on disk rather than in memory.
When a dataframe is received, it is first written to disk.
The database table to be updated is also downloaded to disk in chunks.
Then the `filediffs` package is used to find the differences between the two files and save them to disk.
After that, only the rows to update and the rows to delete are read back into memory, and the database is updated.
A first version is already implemented on the sync_filediffs branch.
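The steps above can be sketched roughly as follows. This is a minimal illustration and not the code on the branch. It makes several assumptions: `sqlite3` stands in for MySQL so the example runs anywhere, a plain list of tuples stands in for the dataframe, the table has hypothetical columns `id` and `name`, and `diff_files` is a hypothetical stand-in for the diff step that does not use the `filediffs` API (for brevity it compares lines with in-memory sets).

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def dump_rows(rows, path):
    """Write the incoming rows (stand-in for the dataframe) to a CSV file."""
    with open(path, "w", newline="") as fh:
        csv.writer(fh, lineterminator="\n").writerows(rows)


def dump_table(conn, table, path, chunksize=2):
    """Download the table to disk chunk by chunk instead of all at once."""
    cur = conn.execute(f"SELECT id, name FROM {table} ORDER BY id")
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        while chunk := cur.fetchmany(chunksize):
            writer.writerows(chunk)


def diff_files(new_path, old_path, only_new_path, only_old_path):
    """Hypothetical stand-in for the diff step: save lines unique to each file."""
    with open(new_path, newline="") as fh:
        new_lines = set(fh)
    with open(old_path, newline="") as fh:
        old_lines = set(fh)
    with open(only_new_path, "w", newline="") as fh:
        fh.writelines(sorted(new_lines - old_lines))
    with open(only_old_path, "w", newline="") as fh:
        fh.writelines(sorted(old_lines - new_lines))


def sync(conn, table, new_rows):
    """Sync `table` to `new_rows`, returning (rows upserted, rows deleted)."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        new_csv, old_csv = tmp / "new.csv", tmp / "old.csv"
        upsert_csv, stale_csv = tmp / "upsert.csv", tmp / "stale.csv"
        dump_rows(new_rows, new_csv)
        dump_table(conn, table, old_csv)
        diff_files(new_csv, old_csv, upsert_csv, stale_csv)
        # Only the differences are read back into memory.
        with open(upsert_csv, newline="") as fh:
            upserts = [(int(i), name) for i, name in csv.reader(fh)]
        with open(stale_csv, newline="") as fh:
            stale_ids = {int(i) for i, _ in csv.reader(fh)}
    # A stale row whose id reappears in the new data is an update, not a delete.
    deletes = stale_ids - {i for i, _ in upserts}
    conn.executemany(
        f"INSERT OR REPLACE INTO {table} (id, name) VALUES (?, ?)", upserts
    )
    conn.executemany(
        f"DELETE FROM {table} WHERE id = ?", [(i,) for i in sorted(deletes)]
    )
    conn.commit()
    return len(upserts), len(deletes)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
    )
    counts = sync(conn, "items", [(1, "a"), (2, "B"), (4, "d")])
    rows = conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()
    return counts, rows
```

In this sketch the temporary directory is removed as soon as the differences have been loaded, which is one possible way to keep the intermediate files from piling up. On MySQL, the `INSERT OR REPLACE` statement would have to be swapped for an equivalent upsert.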
The following issues are still open:
- The verbose logging has to be improved so it integrates better into the codebase.
- The temporary file management has to be improved.
- The `query` method's output format. Changing it seems to be a breaking change.