file: add range-level sparse file caching with hole punching #26
achille-roussel wants to merge 2 commits into
spenczar
left a comment
What is the motivation of this change, and how will we verify it meets those goals?
I'm making this change because this cache has been really helpful everywhere I've added it, and I want to convert more uses of the in-memory cache to on-disk. The file cache doesn't support GetObject with ranges, though, so I know it will break when it starts reading Parquet files. On merge this will be mostly a no-op, but I want to test it with our query engines. Enabling it will be a separate pull request on the other repositories.
@achille-roussel I still don't really understand, sorry... so this is in order to support range queries? Can you help me connect the dots on how hole punching (which is a new term to me) is related to range queries? Sorry to be pushy on this, but I think this is the right new way to think about code review ;)
Let's just talk this through in our next 1:1, no rush :)
Implement fine-grained cache eviction using sparse files and hole punching. Instead of evicting entire files, individual ranges can be evicted by punching holes in the sparse cache file, reclaiming disk space while preserving other cached ranges.

Key changes:
- Add platform-specific sparse file support (darwin/linux) with SEEK_HOLE, SEEK_DATA, and hole punching via F_PUNCHHOLE/fallocate
- Track cached ranges individually in the LRU with a secondary index for efficient file-level operations (e.g., ETag invalidation)
- Add comprehensive tests for sparse file operations and range-level eviction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
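To make the "LRU with a secondary index" idea in this commit message concrete, here is a minimal sketch of one way such a structure could look. It is not the code from this pull request: the names `rangeLRU`, `rangeKey`, `touch`, `evictOldest`, and `invalidateFile` are hypothetical, and the real cache would also have to punch the hole on disk for each evicted range.

```go
package main

import (
	"container/list"
	"fmt"
)

// rangeKey identifies one cached byte range of one file (hypothetical type).
type rangeKey struct {
	file   string
	offset int64
}

// rangeEntry is what the LRU list stores for each cached range.
type rangeEntry struct {
	key    rangeKey
	length int64
}

// rangeLRU tracks ranges individually; files is the secondary index.
type rangeLRU struct {
	order  *list.List                            // front = most recently used
	ranges map[rangeKey]*list.Element            // primary index: range -> list node
	files  map[string]map[rangeKey]*list.Element // secondary index: file -> its ranges
}

func newRangeLRU() *rangeLRU {
	return &rangeLRU{
		order:  list.New(),
		ranges: make(map[rangeKey]*list.Element),
		files:  make(map[string]map[rangeKey]*list.Element),
	}
}

// touch records a use of the range, inserting it if it is not cached yet.
func (c *rangeLRU) touch(file string, offset, length int64) {
	k := rangeKey{file, offset}
	if e, ok := c.ranges[k]; ok {
		c.order.MoveToFront(e)
		return
	}
	e := c.order.PushFront(rangeEntry{k, length})
	c.ranges[k] = e
	if c.files[file] == nil {
		c.files[file] = make(map[rangeKey]*list.Element)
	}
	c.files[file][k] = e
}

// remove unlinks one range from the list and from both indexes.
func (c *rangeLRU) remove(e *list.Element) rangeEntry {
	ent := c.order.Remove(e).(rangeEntry)
	delete(c.ranges, ent.key)
	delete(c.files[ent.key.file], ent.key)
	if len(c.files[ent.key.file]) == 0 {
		delete(c.files, ent.key.file)
	}
	return ent
}

// evictOldest removes the least recently used range and returns it, so the
// caller can punch a hole over [offset, offset+length) in the cache file.
func (c *rangeLRU) evictOldest() (rangeEntry, bool) {
	e := c.order.Back()
	if e == nil {
		return rangeEntry{}, false
	}
	return c.remove(e), true
}

// invalidateFile drops every range of one file (e.g. after an ETag change)
// without scanning the whole LRU; it returns the number of ranges dropped.
func (c *rangeLRU) invalidateFile(file string) int {
	n := 0
	for _, e := range c.files[file] {
		c.remove(e)
		n++
	}
	return n
}

func main() {
	c := newRangeLRU()
	c.touch("a", 0, 4096)
	c.touch("a", 4096, 4096)
	c.touch("b", 0, 4096)
	c.touch("a", 0, 4096) // refresh a@0, so a@4096 becomes the oldest range

	ent, _ := c.evictOldest()
	fmt.Println(ent.key.file, ent.key.offset)
	fmt.Println(c.invalidateFile("a"), len(c.ranges))
}
```

The secondary index is what keeps file-level operations cheap: invalidating a file touches only that file's ranges instead of walking the whole list.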
Add test suite improvements for range-level sparse file caching:

Tests added to cache_test.go:
- TestSecondaryIndexConsistency: verify c.files index consistency
- TestEmptyFileCleanupAfterEviction: verify file cleanup on full eviction
- TestConcurrentRangeRequests: parallel goroutines requesting ranges
- TestOverlappingRangeRequests: overlapping range handling
- TestZeroByteObjectCaching: zero-byte object edge case
- TestRangeBeyondFileSize: range extending past object size
- TestCacheRehydrationWithSparseFiles: cache restart behavior
- TestBlockAlignmentDuringFetch: block alignment optimization
- TestMultipleFilesLRUEviction: LRU eviction across files
- TestCorruptedCacheMetadataRecovery: cache file deletion recovery
- TestCacheRevalidationWithBackendError: fail-open revalidation behavior

Tests added to sparse_test.go:
- TestDiskUsageAccuracy: verify diskUsage() accuracy
- TestHolePunchingReducesDiskUsage: verify hole punching reclaims space
- TestSparseFileWithMultipleHoles: alternating data/hole patterns

Add t.Parallel() to all tests for ~35% speedup (2.0s -> 1.3s).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
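Several of the tests listed above (block alignment, a range extending past the object size, zero-byte objects) come down to the arithmetic of widening a requested range to block boundaries. The following is a rough sketch of that arithmetic only. The function `alignRange`, its signature, and the 4096-byte block size are assumptions for illustration, not the pull request's actual code, and it assumes a non-negative offset and a positive block size.

```go
package main

import "fmt"

// alignRange widens the requested range [offset, offset+length) outward to
// multiples of blockSize, then clamps it to the object size. It returns the
// half-open range [start, end) that would be fetched and cached.
// Hypothetical helper: assumes offset >= 0, length >= 0, blockSize > 0.
func alignRange(offset, length, blockSize, objectSize int64) (start, end int64) {
	// Round the start down to the previous block boundary.
	start = offset - offset%blockSize

	// Round the end up to the next block boundary.
	end = offset + length
	if rem := end % blockSize; rem != 0 {
		end += blockSize - rem
	}

	// Never read past the end of the object.
	if end > objectSize {
		end = objectSize
	}
	// A request entirely beyond the object collapses to an empty range.
	if start > end {
		start = end
	}
	return start, end
}

func main() {
	const blockSize = 4096 // assumed block size, for illustration only

	// A small read in the middle of a block fetches the whole block.
	fmt.Println(alignRange(5000, 100, blockSize, 10000))

	// A read running past the object is clamped to the object size.
	fmt.Println(alignRange(9000, 5000, blockSize, 10000))

	// A zero-byte object always yields an empty range.
	fmt.Println(alignRange(0, 10, blockSize, 0))
}
```

Aligning fetches to block boundaries also lines cached ranges up with the granularity at which many filesystems can deallocate space, which is presumably why alignment and hole punching appear together in this change.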
17e70ea to a4adced
Summary
- SEEK_HOLE, SEEK_DATA, and hole punching via F_PUNCHHOLE/fallocate

Test plan
- go test ./file/... passes
- TestPunchHole verifies hole punching reduces disk usage
- TestRangeLevelEviction verifies range eviction punches holes instead of deleting files
- punchHole at 100%

🤖 Generated with Claude Code