Skip to content

fix: clean up staging ODB after dvc add --out transfer#11008

Closed
BALOGUN-DAVID wants to merge 2 commits intotreeverse:mainfrom
BALOGUN-DAVID:main
Closed

fix: clean up staging ODB after dvc add --out transfer#11008
BALOGUN-DAVID wants to merge 2 commits intotreeverse:mainfrom
BALOGUN-DAVID:main

Conversation

@BALOGUN-DAVID
Copy link
Copy Markdown

@BALOGUN-DAVID BALOGUN-DAVID commented Feb 27, 2026


Fix Staging ODB Cleanup for dvc add --out

📝 Description

When running dvc add --out, DVC copies source files into the cache via a staging mechanism but fails to clean up the temporary staging ODB (stored in memfs) and temporary upload files after the transfer completes.
This causes lingering temporary files and memfs bloat.
This issue was confirmed as a bug by DVC maintainers.

Approach:
This PR adds a staging.clear() call inside the Output.transfer() method in dvc/output.py, directly after otransfer() successfully moves the objects from the staging area to the cache ODB.

Impact:
Prevents memory leaks and disk space accumulation by properly freeing up temporary memfs staging files and caching artifacts upon successful completion of the dvc add --out command.


✅ Test Results

  • Added a new test test_add_with_out_cleans_up_staging in tests/func/test_add.py that uses a mocker spy on ObjectDB.clear to ensure the cleanup function is triggered.
  • Validated that all existing tests related to add --out pass smoothly (e.g.,
    test_add_with_out, test_add_force_overwrite_out, test_add_to_cache_different_name).

🧩 Visual Evidence — output.py Changes

# dvc/output.py # (Showing lines 1059-1066)
                 callback=cb,
             )

             staging.clear()  # <- ADDED THIS LINE

         self.hash_info = obj.hash_info
         self.files = None
         return obj

🧪 Visual Evidence — Test Changes

# tests/func/test_add.py # (Showing lines 896-905)
def test_add_with_out_cleans_up_staging(tmp_dir, dvc, mocker):
    """Test that 'dvc add --out' cleans up the staging ODB after transfer."""
    from dvc_objects.db import ObjectDB

    tmp_dir.gen({"foo": "foo"})
    clear_spy = mocker.spy(ObjectDB, "clear")
    dvc.add("foo", out="out_foo")

    assert (tmp_dir / "out_foo").read_text() == "foo"
    assert clear_spy.call_count >= 1  # <- ADDED ASSERTION

📋 Checklist

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

@github-project-automation github-project-automation Bot moved this to Backlog in DVC Feb 27, 2026
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Feb 27, 2026

CLA assistant check
All committers have signed the CLA.

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.98%. Comparing base (2431ec6) to head (819d928).
⚠️ Report is 196 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11008      +/-   ##
==========================================
+ Coverage   90.68%   90.98%   +0.30%     
==========================================
  Files         504      505       +1     
  Lines       39795    41150    +1355     
  Branches     3141     3263     +122     
==========================================
+ Hits        36087    37440    +1353     
- Misses       3042     3071      +29     
+ Partials      666      639      -27     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@BALOGUN-DAVID
Copy link
Copy Markdown
Author

BALOGUN-DAVID commented Mar 2, 2026

Hi @skshetry, I’ve just opened this PR to address #10783 . When you have a moment, I would really appreciate your review and feedback on the implementation. Thank you!

@skshetry
Copy link
Copy Markdown
Collaborator

skshetry commented Mar 25, 2026

Hi, this PR does not seem to work. You can verify this with the following test:

def test_add_out_cleans_up_staging(tmp_dir, dvc):
    tmp_dir.gen({"src_dir": {"a.txt": "aaa", "b.txt": "bbb"}})
    odb = dvc.cache.local
    assert list(odb.fs.find(odb.path)) == []

    dvc.add("src_dir", out="dst_dir")

    assert (tmp_dir / "dst_dir").read_text() == {"a.txt": "aaa", "b.txt": "bbb"}
    assert not [p for p in odb.fs.find(odb.path) if p.endswith(".tmp")]

The test that you have added only checks if the method that you added gets called or not.

Removes leftover .tmp files from the destination cache directly after transfer to prevent disk bloat. Includes target test case for dvc add --out. Fixes treeverse#11008.
@BALOGUN-DAVID
Copy link
Copy Markdown
Author

Thanks for the catch !

I've updated the PR to explicitly hook into dvc/output.py's transfer() method and clean up any lingering .tmp files directly from the destination odb, right after staging.clear() finishes.

I've also integrated your exact test into tests/func/test_add.py::test_add_out_cleans_up_staging, and it's passing cleanly now. Let me know if everything looks good or if there is anything else you'd like me to adjust!
image

Comment thread dvc/output.py
Comment on lines +1064 to +1066
for path in odb.fs.find(odb.path):
if path.endswith(".tmp"):
odb.fs.remove(path)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The external .tmp cleanup here feels brittle, and is much broader than necessary.

@skshetry
Copy link
Copy Markdown
Collaborator

Thanks for taking the time to open this PR. I am going to close it for now, as I do not think it fully addresses the issue and it seems to depend on implementation details. This bug is also not a high priority for me at the moment.

@skshetry skshetry closed this Mar 26, 2026
@github-project-automation github-project-automation Bot moved this from Backlog to Done in DVC Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants