@mawelborn mawelborn commented Jan 23, 2026

A small "Zen of Python" enhancement for using idiomatic copy.deepcopy() and copy.replace() functions with correct and performant behavior.

I previously implemented Prediction.copy() because copy.deepcopy(prediction) copied immutable objects, which is unnecessary and particularly expensive when OCR is assigned. (Copying the whole associated table, its cells, and all their tokens for each prediction copy is slow, to no one's surprise.)

I've since learned that the behavior of copy.deepcopy() can be customized by implementing __deepcopy__() on a class. This PR drops Prediction.copy() in favor of the idiomatic copy.deepcopy() and customizes the latter to skip immutable objects, which yields the same speed and memory improvements as the custom Prediction.copy() implementation.
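
For readers unfamiliar with the hook, here is a minimal sketch of the idea rather than the code in this PR. The `Token` and `Prediction` classes and their fields are hypothetical stand-ins: `Token` plays the role of an immutable OCR object, and `extras` plays the role of mutable state that must not be shared.

```python
import copy
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Token:
    """Hypothetical immutable OCR object (stands in for tables, cells, tokens)."""

    text: str


@dataclass
class Prediction:
    """Hypothetical, simplified stand-in for the library's Prediction class."""

    label: str
    token: Token  # Immutable, so it is safe to share between copies.
    extras: dict = field(default_factory=dict)  # Mutable, so it must be copied.

    def __deepcopy__(self, memo):
        # Share the immutable token by reference; deep-copy only mutable state.
        return type(self)(
            label=self.label,
            token=self.token,
            extras=copy.deepcopy(self.extras, memo),
        )


original = Prediction("Invoice Number", Token("12345"), {"source": "ocr"})
duplicate = copy.deepcopy(original)
```

Without the hook, `copy.deepcopy()` would rebuild the frozen `Token` as well. With it, `duplicate.token` is the same object as `original.token`, while `duplicate.extras` is an independent dictionary. A subclass that adds fields would need its own `__deepcopy__()` in this simplified form.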

This PR also adds support for the Python 3.13+ copy.replace() function via __replace__(). It has better ergonomics than dataclasses.replace(): it performs a deep copy as above and also supports assigning computed class properties.

Where before:

new_prediction = dataclasses.replace(
    prediction.copy(),  # Must remember to copy to avoid shared mutable state.
    label="EXCEPTION_FLAG",
    confidence=1.0,  # Computed property not supported by `replace()`.
    span=etl_output.tokens[0].span,  # Same here and for other properties.
)

Now:

new_prediction = copy.replace(
    prediction,  # Mutable state handled for you.
    label="EXCEPTION_FLAG",
    confidence=1.0,  # Computed properties can be updated like attributes.
    span=etl_output.tokens[0].span,
)
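
One way such a `__replace__()` could work is sketched below. This is an illustration under assumptions, not the PR's implementation: the `confidences` field and the `confidence` property with a setter are hypothetical. The fallback lambda exists only so the sketch also runs on Python versions older than 3.13, where `copy.replace()` does not exist.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Prediction:
    """Hypothetical, simplified stand-in for the library's Prediction class."""

    label: str
    confidences: dict = field(default_factory=dict)

    @property
    def confidence(self) -> float:
        # Computed from the per-label confidences rather than stored directly.
        return self.confidences[self.label]

    @confidence.setter
    def confidence(self, value: float) -> None:
        self.confidences[self.label] = value

    def __replace__(self, **changes):
        # Start from a deep copy so no mutable state is shared with the original.
        duplicate = copy.deepcopy(self)
        # setattr() handles dataclass fields and property setters alike.
        for name, value in changes.items():
            setattr(duplicate, name, value)
        return duplicate


# copy.replace() dispatches to __replace__() on Python 3.13+.
# On older versions, fall back to calling the method directly.
replace = getattr(copy, "replace", lambda obj, **changes: obj.__replace__(**changes))

original = Prediction("Invoice Number", {"Invoice Number": 0.9})
updated = replace(original, label="EXCEPTION_FLAG", confidence=1.0)
```

Because the changes are applied in keyword order, `label` is updated before `confidence`, so the new confidence is stored under the new label. The original prediction and its `confidences` dictionary are left untouched.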

Note

Introduces idiomatic, performant copying for predictions and results.

  • Add optimized __deepcopy__ to Prediction and subclasses (DocumentExtraction, Summarization, Unbundling) to avoid copying immutable/large OCR objects; implement Result.__deepcopy__ (deep-copies predictions only)
  • Replace custom copy() implementations with standard copy.deepcopy(); remove dataclasses.replace-based copies
  • Implement Python 3.13+ copy.replace() via Prediction.__replace__, and unshadow it in subclasses (del <Subclass>.__replace__)
  • Minor import cleanups aligned with the new copy strategy

Written by Cursor Bugbot for commit af42845. This will update automatically on new commits.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

# Unshadow `Prediction.__replace__`.
del Classification.__replace__
AttributeError on Python < 3.13 from del __replace__

High Severity

The del ClassName.__replace__ statements (in classification.py, extraction.py, documentextraction.py, formextraction.py, summarization.py, and unbundling.py) will raise AttributeError on Python 3.10–3.12 because @dataclass only generates __replace__ starting in Python 3.13. Since the project specifies requires-python = ">=3.10" in pyproject.toml, these import-time deletions will crash on all supported Python versions except 3.13+.
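
One possible guard is sketched below. It is not necessarily the fix adopted in this PR, and the class bodies are hypothetical placeholders. The deletion runs only when the subclass actually carries its own generated `__replace__`, so the module imports cleanly on Python 3.10 through 3.12 as well.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str

    def __replace__(self, **changes):
        # Hypothetical placeholder for the customized replace logic.
        fields = {**vars(self), **changes}
        return type(self)(**fields)


@dataclass
class Classification(Prediction):
    pass


# @dataclass generates __replace__ only on Python 3.13+, so delete it only if
# the subclass defines its own. On Python 3.10-3.12 this check is False.
if "__replace__" in vars(Classification):
    del Classification.__replace__
```

After the guard, `Classification.__replace__` resolves to `Prediction.__replace__` on every supported version. Checking `sys.version_info >= (3, 13)` would work as well; testing the class namespace directly avoids hard-coding the version.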

@mawelborn mawelborn changed the title from "Mawelborn/prediction deepcopy" to "Prediction Deep Copy / Replace" Jan 23, 2026