Release temp .mlpackage eagerly in `MLModel.__del__` #2671

john-rocky wants to merge 1 commit into apple:main
Conversation
What happens if someone tries to use a model after calling `close()`?
```python
class TestMLModelTempCleanup:
    def test_close_removes_temp_package(self, tmp_path, monkeypatch):
        """`MLModel.close()` deletes the temp .mlpackage and is idempotent."""
        monkeypatch.setenv("TMPDIR", str(tmp_path))
```
Is there another way we could do this? If we ever run tests in parallel, I think this will cause issues.
Good catch. Rewrote the tests to never touch `$TMPDIR` or `tempfile.tempdir`; instead each test captures the specific `model.package_path` it just created and asserts cleanup on that path only. Counting `tmp*.mlpackage` in the shared tmpdir was both racy under parallel tests and weaker than necessary — the new `test_no_leak_across_many_mlmodel_constructions` collects every model's own path into a list and asserts each one is gone after `del`, which is a strictly stronger claim.
Verified under pytest -n 4 (pytest-xdist) locally: 4 passed, no races.
test_user_provided_package_path_is_not_deleted still uses tmp_path, but only as the destination for MLModel.save() — it's the per-test unique directory pytest hands out, not a $TMPDIR override, so still parallel-safe.
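The parallel-safe pattern described above can be sketched roughly as follows. `FakeModel` here is a hypothetical stand-in for coremltools' `MLModel` (so the sketch runs without coremltools installed); the point is that each test asserts only on the `package_path` of the models it created, never on the contents of a shared tmpdir:

```python
# Sketch of the parallel-safe test pattern: record each model's own
# package_path and assert cleanup on those exact paths, instead of
# counting tmp*.mlpackage entries in a shared $TMPDIR.
# FakeModel is a stand-in for MLModel, not coremltools code.
import gc
import shutil
import tempfile
from pathlib import Path


class FakeModel:
    """Allocates a temp .mlpackage dir; frees it in close()/__del__."""

    def __init__(self):
        self.package_path = Path(tempfile.mkdtemp(suffix=".mlpackage"))
        self._closed = False

    def close(self):
        if not self._closed:
            shutil.rmtree(self.package_path, ignore_errors=True)
            self._closed = True

    def __del__(self):
        try:
            self.close()
        except Exception:
            pass


def test_no_leak_across_many_constructions():
    # Collect each model's own path; never inspect the shared tmpdir,
    # so concurrent tests creating their own packages cannot interfere.
    paths = []
    for _ in range(5):
        m = FakeModel()
        paths.append(m.package_path)
        del m
    gc.collect()
    assert all(not p.exists() for p in paths)
```

Because no test reads global tmpdir state, the assertions hold regardless of what other xdist workers are doing in the same `$TMPDIR`.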
`MLModel` constructed from a `Model_pb2` spec allocates a temporary
`.mlpackage` under `$TMPDIR` via `_create_mlpackage`. Historically
the only cleanup path was `atexit.register(cleanup, self.package_path)`,
so the directory survived until the Python process exited.
In workflows that construct many short-lived MLModel objects — most
visibly `coremltools.optimize.coreml.experimental.linear_quantize_activations`,
which builds one temporary model per op-group batch in
`_ModelDebugger.predict_intermediate_outputs` — this caused tens of
gigabytes of leaked temp directories.
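The accumulation mechanism can be illustrated with a small self-contained sketch (a simplified stand-in, not coremltools code): when the only cleanup path is an `atexit` registration, every short-lived object leaves its temp directory alive until the interpreter exits, so N constructions retain N directories for the whole process:

```python
# Why atexit-only cleanup leaks: each construction registers its temp
# directory for removal at interpreter exit, so directories accumulate
# for the lifetime of the process even after the objects are gone.
# Simplified stand-in for _create_mlpackage's behavior, not real code.
import atexit
import shutil
import tempfile
from pathlib import Path


def make_temp_package():
    path = Path(tempfile.mkdtemp(suffix=".mlpackage"))
    # Historically the ONLY cleanup path:
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path


# Six short-lived "models", as in the toy reproducer below:
paths = [make_temp_package() for _ in range(6)]

# All six directories still exist, even though nothing references them.
assert all(p.exists() for p in paths)
```

Multiply each directory by a few hundred megabytes of compiled model and the tens-of-gigabytes figure below follows directly.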
Measured on macOS 15 / Apple Silicon / coremltools 9.0 / Python 3.12:
- 8-layer 2048-dim toy model, 4 calibration samples:
leaks +6 `.mlpackage` dirs / +402.9 MB per call.
- Gemma 4 E2B chunk2 (~300 MB per package):
~38 GB retained before the process finishes; fills a 49 GB disk.
With this patch both numbers drop to 0.
This change adds `MLModel.close()` and `MLModel.__del__` that release
the temp package as soon as the Python object goes out of scope.
The C++ proxy is dropped first so the compiled `.mlmodelc` mmap is
released before `rmtree`. User-owned `.mlpackage` paths loaded from
disk are never flagged `is_temp_package` and are never touched.
The existing `atexit` hook is kept as a safety net for objects that
are still alive at interpreter shutdown.
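A hedged sketch of the `close()`/`__del__` shape this describes — the names `__proxy__`, `is_temp_package`, and `package_path` come from the PR text, while the surrounding scaffolding is illustrative rather than coremltools' actual implementation:

```python
# Sketch of the eager-release pattern: drop the proxy before rmtree,
# gate deletion on is_temp_package, make close() idempotent, and keep
# __del__ exception-free. Illustrative stand-in, not coremltools code.
import shutil
import tempfile


class TempPackageModel:
    def __init__(self):
        self.__proxy__ = object()  # stand-in for the C++ proxy object
        self.package_path = tempfile.mkdtemp(suffix=".mlpackage")
        self.is_temp_package = True

    def close(self):
        # Drop the proxy first, so the compiled .mlmodelc mmap would be
        # released before the backing directory is removed.
        self.__proxy__ = None
        if self.is_temp_package and self.package_path is not None:
            shutil.rmtree(self.package_path, ignore_errors=True)
            self.package_path = None  # second close() becomes a no-op

    def __del__(self):
        try:
            self.close()
        except Exception:
            pass  # never raise during interpreter shutdown
```

A model loaded from a user's on-disk `.mlpackage` would simply have `is_temp_package = False`, so `close()` drops the proxy but never touches the path.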
Includes regression tests under
`coremltools/test/modelpackage/test_mlmodel_temp_cleanup.py` that
cover `close()` idempotency, `__del__` cleanup, no-leak across many
constructions, and that user-provided paths are preserved.
Thanks for the review. Agreed — renamed it as suggested and updated the docstring to say so explicitly. Force-pushed rather than adding a fixup commit since the PR is still a single logical change — let me know if you'd prefer fixup commits kept around for easier re-review. Also rebased on top of latest main.
Release temp `.mlpackage` eagerly in `MLModel`

Fixes #2670
What this changes
Adds `MLModel.close()` and `MLModel.__del__` so the temporary `.mlpackage` allocated by `_create_mlpackage` (when an `MLModel` is constructed from a `Model_pb2` spec) is released when the Python object goes out of scope, rather than only at interpreter exit.

The existing `atexit.register(cleanup, self.package_path)` is kept as a safety net for long-lived objects that are still alive at shutdown.
Why
`coremltools.optimize.coreml.experimental.linear_quantize_activations` constructs one temp `MLModel` per op-group batch during calibration (see `_ModelDebugger.predict_intermediate_outputs`). Because cleanup was tied to `atexit`, a single calibration run on a medium-sized transformer leaks tens of gigabytes of `$TMPDIR` before the process finishes — often enough to fill the disk and crash the calibration halfway through.
Reproduced on `coremltools==9.0`, macOS 15 (Darwin 25.0.0), Apple Silicon, Python 3.12. The leak measures +6 temp `.mlpackage` dirs / +402.9 MB on a toy 8-layer 2048-dim reproducer with 4 calibration samples, and ~38 GB in realistic Gemma 4 E2B chunk2 runs. With the patch applied, the same reproducer leaks 0 directories / 0 MB.
Approach
- Drop `self.__proxy__` first so the CoreML runtime can close its mmap of the compiled `.mlmodelc` before the Python side removes the backing directory.
- `rmtree(self.package_path, ignore_errors=True)` runs only when `is_temp_package` is true — user-provided paths are never touched.
- `close()` is idempotent and safe to call more than once.
- `__del__` swallows all exceptions (interpreter-shutdown-safe).

Test plan
New file `coremltools/test/modelpackage/test_mlmodel_temp_cleanup.py` (included in this PR) covers:

- `close()` removes the temp package and is idempotent.
- `__del__` removes the temp package when the Python object goes out of scope.
- Constructing many models and `del`-ing them leaves the tmp-dir count unchanged (the regression test for the activation-calibration bug).
- User-provided `.mlpackage` paths loaded from disk are never flagged `is_temp_package` and must survive `del`.
Plus the existing suite: `pytest coremltools/test/modelpackage -q`.

Backwards compatibility
`close()` and `__del__` are additive. The only observable change is that temp packages disappear sooner than they used to. Any user code that was relying on the temp path being readable after the `MLModel` was destroyed (there should be none — the directory was already documented as internal) should switch to `MLModel.save(path)`.