Prediction Deep Copy / Replace #225
Open
A small "Zen of Python" enhancement for using the idiomatic copy.deepcopy() and copy.replace() functions with correct and performant behavior.
I previously implemented Prediction.copy() because copy.deepcopy(prediction) copied immutable objects, which is unnecessary and particularly expensive when OCR is assigned. (Copying the whole associated table, its cells, and all their tokens for each prediction copy is slow, to no one's surprise.)
I've since learned that the behaviour of copy.deepcopy() can be customized by implementing __deepcopy__() on classes. This PR drops Prediction.copy() in favor of the idiomatic copy.deepcopy(), customizing the latter's behavior to not copy immutable objects, thus getting the same speed and memory improvements as the custom Prediction.copy() implementation.
This PR also adds support for the Python 3.13+ copy.replace() function via __replace__(). It has better ergonomics than dataclasses.replace(): it performs a deep copy as above and additionally supports assigning computed class properties.
Where before:
Now:
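As a rough illustration of the approach described above, here is a minimal sketch, not this PR's code. OCR, the Prediction fields, and the span property are hypothetical stand-ins, and OCR data is assumed to be immutable so it can safely be shared between copies. __deepcopy__ seeds the memo so the OCR object is reused instead of recursively copied, and __replace__ deep-copies and then assigns via setattr, which also runs property setters.

```python
import copy
from typing import Any, Dict, Optional, Tuple


class OCR:
    """Hypothetical stand-in for large OCR data, assumed immutable."""

    def __init__(self, tokens: Tuple[str, ...]) -> None:
        self.tokens = tokens


class Prediction:
    """Hypothetical prediction: mutable state plus a shared OCR reference."""

    def __init__(self, label: str, start: int, end: int,
                 ocr: Optional[OCR] = None) -> None:
        self.label = label
        self.start = start
        self.end = end
        self.extras: Dict[str, Any] = {}
        self.ocr = ocr

    @property
    def span(self) -> Tuple[int, int]:
        # Computed property backed by `start` and `end` (hypothetical).
        return (self.start, self.end)

    @span.setter
    def span(self, value: Tuple[int, int]) -> None:
        self.start, self.end = value

    def __deepcopy__(self, memo: Dict[int, Any]) -> "Prediction":
        # Seeding the memo makes deepcopy return the OCR object itself
        # instead of recursively copying its tokens.
        if self.ocr is not None:
            memo[id(self.ocr)] = self.ocr
        cls = type(self)
        new = cls.__new__(cls)
        memo[id(self)] = new  # register early so reference cycles resolve
        new.__dict__.update(copy.deepcopy(self.__dict__, memo))
        return new

    def __replace__(self, **changes: Any) -> "Prediction":
        # copy.replace(obj, **changes) calls this on Python 3.13+.
        new = copy.deepcopy(self)
        for name, value in changes.items():
            if not hasattr(new, name):
                raise TypeError(f"{type(new).__name__} has no attribute {name!r}")
            setattr(new, name, value)  # also routes through property setters
        return new
```

On Python 3.13+, copy.replace(p, span=(3, 5)) dispatches to Prediction.__replace__. Seeding the memo, rather than special-casing the attribute, also means any other reference to the same OCR object inside the copied state is shared rather than duplicated.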
Note
Introduces idiomatic, performant copying for predictions and results.
- Add __deepcopy__ to Prediction and subclasses (DocumentExtraction, Summarization, Unbundling) to avoid copying immutable/large OCR objects; implement Result.__deepcopy__ (deep-copies predictions only).
- Replace custom copy() implementations with standard copy.deepcopy(); remove dataclasses.replace-based copies.
- Support copy.replace() via Prediction.__replace__, and unshadow it in subclasses (del <Subclass>.__replace__).
Written by Cursor Bugbot for commit af42845.
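The Result behaviour summarized above (deep-copying predictions only) can be sketched roughly as follows. This assumes a hypothetical Result with a shared document attribute and a list of predictions; the names are illustrative, not the PR's actual fields. The copy starts as a shallow copy, so everything is shared, and then only the predictions list and its items are deep-copied.

```python
import copy
from typing import Any, Dict, List


class Prediction:
    """Minimal hypothetical prediction with mutable state."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.tags: List[str] = []


class Result:
    """Hypothetical result: `document` is shared, `predictions` are owned."""

    def __init__(self, document: Any, predictions: List[Prediction]) -> None:
        self.document = document
        self.predictions = predictions

    def __deepcopy__(self, memo: Dict[int, Any]) -> "Result":
        # A shallow copy shares every attribute (including `document`)...
        new = copy.copy(self)
        memo[id(self)] = new
        # ...then only the predictions list and its items are deep-copied.
        new.predictions = copy.deepcopy(self.predictions, memo)
        return new
```

Passing memo through means a prediction reachable from several places in the same deepcopy call is copied only once, and any custom Prediction.__deepcopy__ still applies to each item.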