repository: fix Repository.delete edge cases (#9815)#9820
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #9820 +/- ##
==========================================
+ Coverage 84.94% 85.15% +0.20%
==========================================
Files 92 92
Lines 15291 15294 +3
Branches 2296 2298 +2
==========================================
+ Hits 12989 13023 +34
+ Misses 1611 1580 -31
Partials 691 691 ☔ View full report in Codecov by Harness. |
Tolerate a stale earlier object in the pack via a compact_pack complete flag, skip the per-delete index write in check --repair via a delete persist flag, and make verify_data report the files whose chunks it removes. Add a regression test for the stale-earlier-object case.
ada4e8b to
4fe7e0e
Compare
| data = corrupt(repository.get(chunk.id), 123) | ||
| # re-putting the same id would not repoint the index: close() persists new index | ||
| # entries incrementally, not updates to existing ones, so the next process would still | ||
| # read the old, good copy. delete first, then put the corrupted bytes as a fresh entry. | ||
| repository.delete(chunk.id) | ||
| repository.put(chunk.id, data) |
There was a problem hiding this comment.
yeah, that should work better.
even better than that would be to simulate reality of bit corruption more closely: use some low-level code that directly corrupts the file on disk without going through delete/put and without modifying the index.
| A pack holds several objects, so we rewrite it via compact_pack(complete=False) rather than | ||
| dropping the whole pack. With persist=True the full chunk index is written back so the next |
There was a problem hiding this comment.
Can you please read these AI-made comments and make sure they don't talk about the bad hacks of the past?
Dropping a whole pack instead of a defect chunk is nothing we need to discuss or even mention.
| return self._pack_writer.add(id, data) | ||
|
|
||
| def delete(self, id): | ||
| def delete(self, id, *, persist=True): |
There was a problem hiding this comment.
persist is a bad name, maybe update_index or so?
| keep_ids and drop_ids come from the caller iterating the chunk index for this pack_id; they | ||
| must not overlap. With complete=True (compact) they are the pack's whole object set, so their | ||
| ranges must tile it from offset 0 with no gap or overlap. With complete=False (delete) the index | ||
| view may be partial (a re-put can repoint an earlier object to another pack while its bytes stay | ||
| here), so we copy only the known survivors at their real offsets and allow gaps. Kept objects are | ||
| copied into a new pack via store.defrag and repointed; dropped objects' index entries are removed. |
There was a problem hiding this comment.
rewrite this to be more clear.
also remove crap like "come from the caller iterating chunk index for this pack id". a description of parameters needs to tell what they are and what we do here with them, not where a call comes from or what the caller is doing.
Closes #9815.
Follow-up to #9813, which made
Repository.deletea real single-object delete. That surfaced three issues:1. compact_pack contiguity assert.
delete()works off the chunk index, which can still list a stale earlier copy of a re-put object whose bytes remain in the pack. The retainedoffset == coveredassert then failed because the target no longer starts at offset 0.compact_packnow takes acompleteflag:compactkeeps the strict check,deletepassescomplete=Falseand copies the known survivors at their real offsets, tolerating the gap.2. Quadratic index writes in
check --repair.delete()rewrote the full chunk index on every call, andverify_data's repair loop calls it once per defect chunk.delete()now takes apersistflag;check --repairpassespersist=Falseand the index is rebuilt once infinish(), so there is no per-chunk full index write.3. Misleading repair output.
verify_datalogged that defect chunks "can not be removed yet" while actually removing them, and it dropped chunks from the repository index but not from the checker's own index, so the affected files were not reported in the same run. The messages are corrected and the chunk is dropped from the checker index too.Adds a regression test for the stale-earlier-object case.