feat: introduce container data structure for batch small file writing #659
thuongle2210 wants to merge 96 commits into `CurvineIO:main`
Thank you for your excellent work. We have been on holiday for Chinese New Year for the past few days; we will start reviewing this PR today. Have a good day!
Can this PR reduce the metadata memory usage in scenarios involving a large number of small files?
This PR can reduce the number of RPC calls and block allocations during writes. However, the new underlying data structure may not suit random reads of small files well: if the randomly read files are scattered across different blocks, each read must first locate the file's offset within its block. When the read cache hit rate is low, this can amplify I/O.
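As a rough illustration of the I/O-amplification concern, the sketch below models a read cache that, on a miss, fetches a fixed `fetch_unit` of bytes from storage. The cost model, the 20% hit rate, and the 64 MiB block size are assumptions for illustration, not measurements from this PR or Curvine's actual cache behavior.

```rust
/// Assumed cost model (not Curvine's real cache): a hit costs no backend I/O,
/// a miss fetches `fetch_unit` bytes from storage.
fn expected_backend_bytes(hit_rate: f64, fetch_unit: u64) -> f64 {
    (1.0 - hit_rate) * fetch_unit as f64
}

/// Amplification = expected bytes fetched from storage / bytes actually requested.
fn amplification(hit_rate: f64, fetch_unit: u64, file_len: u64) -> f64 {
    expected_backend_bytes(hit_rate, fetch_unit) / file_len as f64
}

fn main() {
    let file_len = 4 * 1024; // a 4 KiB small file
    let block = 64 * 1024 * 1024; // assumed 64 MiB container block
    let hit_rate = 0.2; // assumed low cache hit rate

    // Ranged read: locate the offset, then fetch only `file_len` bytes on a miss.
    let ranged = amplification(hit_rate, file_len, file_len);
    // Whole-block fetch: a miss pulls the entire shared block.
    let whole = amplification(hit_rate, block, file_len);
    println!("ranged: {ranged:.2}x, whole-block: {whole:.0}x");
}
```

With ranged reads the expected backend traffic stays below the requested size, while a cache that pulls the whole container block on a miss fetches thousands of times more data than the caller asked for.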
Perhaps `ContainerEntry` (which is similar to a file entry) could reduce metadata memory consumption through lazy loading? This PR is already quite large, so I plan to add that in a subsequent PR.
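A lazily loaded container entry could look roughly like the following sketch, where the per-file index is only materialized on the first lookup. `LazyContainerEntry` and the `load` callback (standing in for a read of the index from the metadata store) are hypothetical names, not existing Curvine APIs.

```rust
use std::cell::OnceCell;
use std::collections::HashMap;

/// Position of a small file inside the container block (hypothetical fields).
#[derive(Debug, Clone, Copy, PartialEq)]
struct SmallFileMeta {
    offset: u64,
    len: u64,
}

/// Hypothetical lazily loaded entry: the per-file index uses no memory
/// until a lookup first needs it.
struct LazyContainerEntry {
    id: u64,
    files: OnceCell<HashMap<String, SmallFileMeta>>,
}

impl LazyContainerEntry {
    fn new(id: u64) -> Self {
        Self { id, files: OnceCell::new() }
    }

    fn is_loaded(&self) -> bool {
        self.files.get().is_some()
    }

    /// Loads the index at most once via `load`, then looks up `name`.
    fn lookup<F>(&self, name: &str, load: F) -> Option<SmallFileMeta>
    where
        F: FnOnce(u64) -> HashMap<String, SmallFileMeta>,
    {
        self.files.get_or_init(|| load(self.id)).get(name).copied()
    }
}

fn main() {
    let entry = LazyContainerEntry::new(7);
    println!("loaded before lookup: {}", entry.is_loaded());
    let meta = entry.lookup("a.txt", |_id| {
        HashMap::from([("a.txt".to_string(), SmallFileMeta { offset: 0, len: 10 })])
    });
    println!("{:?}, loaded after lookup: {}", meta, entry.is_loaded());
}
```

`std::cell::OnceCell` keeps the example single-threaded; a cache shared across master threads would need `std::sync::OnceLock` or an explicit lock instead.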
A container has exactly one block, and `SmallFileMeta` stores each file's offset within it.
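The container layout described here can be sketched as a single block plus a name-to-offset index. `ContainerSketch`, its `capacity` check, and the `SmallFileMeta` fields (`offset`, `len`) are simplified assumptions; the real `InodeContainer` may track more state.

```rust
use std::collections::HashMap;

/// Where one small file lives inside the container's block (hypothetical fields).
#[derive(Debug, Clone, Copy, PartialEq)]
struct SmallFileMeta {
    offset: u64,
    len: u64,
}

/// Simplified stand-in for `InodeContainer`: exactly one block, plus a
/// name -> SmallFileMeta index for the files packed into it.
struct ContainerSketch {
    block_id: u64,
    capacity: u64,
    used: u64,
    files: HashMap<String, SmallFileMeta>,
}

impl ContainerSketch {
    fn new(block_id: u64, capacity: u64) -> Self {
        Self { block_id, capacity, used: 0, files: HashMap::new() }
    }

    /// Appends a file at the current end of the block and records its offset.
    /// Returns None if the name already exists or the block has no room left.
    fn add_file(&mut self, name: &str, len: u64) -> Option<SmallFileMeta> {
        if self.files.contains_key(name) || self.used + len > self.capacity {
            return None;
        }
        let meta = SmallFileMeta { offset: self.used, len };
        self.used += len;
        self.files.insert(name.to_string(), meta);
        Some(meta)
    }

    /// Resolves a file to (block_id, offset, len): a ranged read on the shared block.
    fn locate(&self, name: &str) -> Option<(u64, u64, u64)> {
        self.files.get(name).map(|m| (self.block_id, m.offset, m.len))
    }
}

fn main() {
    let mut c = ContainerSketch::new(42, 1024);
    c.add_file("a.txt", 100);
    c.add_file("b.txt", 50);
    println!("{:?}", c.locate("b.txt"));
}
```

Reading a packed file then becomes a ranged read of `len` bytes at `offset` within the shared block, which is the extra locate step raised in the review above.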
Pull request overview
This PR introduces a Container inode type to pack multiple small files into a single shared block, reducing metadata overhead from O(N) inodes+blocks to O(1) per batch. This addresses part of issue #433 regarding small file performance.
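The O(N) to O(1) claim is easy to check with a quick count. The sketch assumes a fixed, hypothetical `files_per_container` batch size and counts only inodes and blocks; the per-file `SmallFileMeta` entries inside each container still grow linearly with N.

```rust
/// Inodes + blocks tracked when every small file is a regular file.
fn per_file_objects(n: u64) -> u64 {
    2 * n // N inodes + N blocks
}

/// Inodes + blocks when files are packed into containers of up to
/// `files_per_container` files (one inode + one block per container).
/// `files_per_container` must be greater than zero.
fn per_container_objects(n: u64, files_per_container: u64) -> u64 {
    2 * n.div_ceil(files_per_container)
}

fn main() {
    let n = 10_000;
    println!(
        "per-file: {}, per-container: {}",
        per_file_objects(n),
        per_container_objects(n, 1_000)
    );
}
```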
Changes:
- Adds a new `InodeContainer` type and container lifecycle operations (create, add_block, complete)
- Generalizes journal entries from file-specific to inode-generic to support both File and Container types
- Renames proto messages and RPC codes from `*Batch*` to `*Container*` for clarity
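Generalizing the journal from file-specific to inode-generic entries means replay dispatches on the inode kind instead of assuming a file. The enum and field names below are hypothetical simplifications of `CreateInodeEntry`/`CompleteInodeEntry` and `InodeView`, meant only to show the shape of Container-aware replay.

```rust
use std::collections::HashMap;

/// Which kind of inode a create entry describes (hypothetical simplification).
#[derive(Debug, Clone, Copy, PartialEq)]
enum InodeKind {
    File,
    Container,
}

/// Inode-generic journal entries, loosely modeled on CreateInodeEntry / CompleteInodeEntry.
#[derive(Debug, Clone)]
enum JournalEntry {
    CreateInode { id: u64, kind: InodeKind },
    CompleteInode { id: u64 },
}

/// Replayed in-memory view; both variants carry the same completion flag.
#[derive(Debug, Clone, PartialEq)]
enum InodeView {
    File { complete: bool },
    Container { complete: bool },
}

/// Rebuilds the inode map by applying journal entries in order.
fn replay(entries: &[JournalEntry]) -> HashMap<u64, InodeView> {
    let mut inodes = HashMap::new();
    for entry in entries {
        match entry {
            JournalEntry::CreateInode { id, kind } => {
                let view = match kind {
                    InodeKind::File => InodeView::File { complete: false },
                    InodeKind::Container => InodeView::Container { complete: false },
                };
                inodes.insert(*id, view);
            }
            // A single code path completes either kind of inode.
            JournalEntry::CompleteInode { id } => {
                if let Some(InodeView::File { complete } | InodeView::Container { complete }) =
                    inodes.get_mut(id)
                {
                    *complete = true;
                }
            }
        }
    }
    inodes
}

fn main() {
    let inodes = replay(&[
        JournalEntry::CreateInode { id: 1, kind: InodeKind::File },
        JournalEntry::CreateInode { id: 2, kind: InodeKind::Container },
        JournalEntry::CompleteInode { id: 2 },
    ]);
    println!("{:?} {:?}", inodes[&1], inodes[&2]);
}
```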
Reviewed changes
Copilot reviewed 42 out of 45 changed files in this pull request and generated 17 comments.
| File | Description |
|---|---|
| curvine-common/proto/common.proto | Adds Container file type and container_name field to FileStatusProto (breaking change) |
| curvine-common/proto/master.proto | Renames batch messages to container messages and adds container-specific responses |
| curvine-common/proto/worker.proto | Renames batch write messages to container write messages |
| curvine-server/src/master/meta/inode/inode_container.rs | New InodeContainer type with SmallFileMeta tracking |
| curvine-server/src/master/meta/inode/inode_view.rs | Extends InodeView enum with Container variant |
| curvine-server/src/master/meta/inode/inode_dir.rs | Adds container_index for mapping files to containers |
| curvine-server/src/master/meta/fs_dir.rs | Implements create_container and complete_container operations |
| curvine-server/src/master/meta/store/inode_store.rs | Updates RocksDB persistence for containers |
| curvine-server/src/master/journal/* | Generalizes journal entries for File and Container inodes |
| curvine-server/src/master/master_handler.rs | Implements create_container and complete_container RPC handlers |
| curvine-server/src/worker/handler/batch_write_handler.rs | Implements container block writing on worker side |
| curvine-client/src/file/fs_client.rs | Adds create_container and complete_container client APIs |
| curvine-client/src/block/* | Implements ContainerBlockWriter for batch small file writing |
| curvine-tests/tests/fs_test.rs | Adds test verifying 2 distinct block IDs for batch writes |
Hi @lzjqsdd, I have just resolved all threads. Thanks!
Description
Introduces Container, a new inode type that packs multiple small files into a single shared block. N small files now require 1 inode + 1 block instead of N inodes + N blocks.
Resolves part of: Small files performance enhancement
Key Changes
- `InodeContainer` type with `FileType::Container` and an `InodeView::Container` variant. Tracks each file's position via `HashMap<String, SmallFileMeta>`.
- `create_container` → `add_container_block` → write → `complete_container`, on both master and worker.
- Block operations (`validate_add_block`, `get_block_locs`, `acquire_new_block`) moved from `InodeFile` to `InodeView` to support both File and Container.
- Journal entries generalized from file-specific (`CreateFileEntry`, `CompleteFileEntry`) to inode-generic (`CreateInodeEntry`, `CompleteInodeEntry`), with Container-aware replay.
- `apply_new_block`, `apply_complete_inode_entry`, `get_file_locations`, and tree reconstruction all handle Container.
- Proto messages and RPC codes renamed from `*Batch*` to `*Container*`.
of file count.
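The container lifecycle (`create_container` → `add_container_block` → write → `complete_container`) can be viewed as a small state machine. This is a hypothetical model, not the PR's implementation; it only encodes the call ordering and the one-block-per-container rule.

```rust
/// States a container passes through in this simplified, hypothetical model.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ContainerState {
    Created,
    BlockAdded,
    Written,
    Completed,
}

/// Error returned when an operation is called in the wrong state.
#[derive(Debug, PartialEq)]
struct OutOfOrder {
    op: &'static str,
    state: ContainerState,
}

struct ContainerLifecycle {
    state: ContainerState,
}

impl ContainerLifecycle {
    /// create_container: the container inode now exists but has no block yet.
    fn create_container() -> Self {
        Self { state: ContainerState::Created }
    }

    /// Moves to `next` only if the current state is one of `allowed`.
    fn advance(
        &mut self,
        op: &'static str,
        allowed: &[ContainerState],
        next: ContainerState,
    ) -> Result<(), OutOfOrder> {
        if !allowed.contains(&self.state) {
            return Err(OutOfOrder { op, state: self.state });
        }
        self.state = next;
        Ok(())
    }

    /// add_container_block: valid only once, since a container owns exactly one block.
    fn add_container_block(&mut self) -> Result<(), OutOfOrder> {
        self.advance("add_container_block", &[ContainerState::Created], ContainerState::BlockAdded)
    }

    /// write: appends small files to the block; may repeat for many files.
    fn write(&mut self) -> Result<(), OutOfOrder> {
        self.advance(
            "write",
            &[ContainerState::BlockAdded, ContainerState::Written],
            ContainerState::Written,
        )
    }

    /// complete_container: seals the container after at least one write.
    fn complete_container(&mut self) -> Result<(), OutOfOrder> {
        self.advance("complete_container", &[ContainerState::Written], ContainerState::Completed)
    }
}

fn main() {
    let mut c = ContainerLifecycle::create_container();
    c.add_container_block().unwrap();
    c.write().unwrap();
    c.write().unwrap();
    c.complete_container().unwrap();
    println!("{:?}", c.state);
    // Writing after completion is rejected.
    println!("{:?}", c.write());
}
```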
Performance Impact
Future Work
- `FsBatchWriter` client interface.