feat: introduce container data structure for batch small file writing #659

Open
thuongle2210 wants to merge 96 commits into CurvineIO:main from thuongle2210:feat/introduce-container-for-batch-writing
Conversation

@thuongle2210
Contributor

@thuongle2210 thuongle2210 commented Feb 12, 2026

Description

Introduces Container — a new inode type that packs multiple small
files into a single shared block. N small files now require
1 inode + 1 block instead of N inodes + N blocks.

Resolves part of: Small files performance enhancement

Key Changes

  1. New InodeContainer type with FileType::Container and
    InodeView::Container variant. Tracks per-file position via
    HashMap<String, SmallFileMeta>.
  2. New end-to-end write flow: create_container → add_container_block →
    write → complete_container, on both master and worker.
  3. Extend block-related methods (validate_add_block, get_block_locs,
    acquire_new_block) from InodeFile to InodeView to support both
    File and Container.
  4. Generalize journal entries from File-specific (CreateFileEntry,
    CompleteFileEntry) to inode-generic (CreateInodeEntry,
    CompleteInodeEntry) with Container-aware replay.
  5. Update RocksDB persistence: apply_new_block,
    apply_complete_inode_entry, get_file_locations, and tree
    reconstruction all handle Container.
  6. Rename proto messages and RPC codes from *Batch* to *Container*.
  7. Unit tests demonstrate batch writing consumes 1 block ID regardless
    of file count.
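
The packing idea in items 1 and 7 can be illustrated with a minimal sketch. This is not the PR's code: the field names (`offset`, `len`, `block_id`), the `BlockIdAllocator` type, and the `add_file` method are assumptions made for illustration. Only the names `InodeContainer` and `SmallFileMeta` and the `HashMap<String, SmallFileMeta>` index come from the description above.

```rust
use std::collections::HashMap;

/// Hypothetical per-file record: where one small file sits inside the shared block.
#[derive(Debug, Clone, PartialEq)]
pub struct SmallFileMeta {
    pub offset: u64,
    pub len: u64,
}

/// Hypothetical stand-in for the master's block ID allocator.
#[derive(Default)]
pub struct BlockIdAllocator {
    next: u64,
}

impl BlockIdAllocator {
    pub fn allocate(&mut self) -> u64 {
        let id = self.next;
        self.next += 1;
        id
    }

    /// Number of block IDs handed out so far.
    pub fn allocated(&self) -> u64 {
        self.next
    }
}

/// Simplified container inode: one block shared by every packed file.
pub struct InodeContainer {
    pub block_id: u64,
    pub files: HashMap<String, SmallFileMeta>,
    pub len: u64,
}

impl InodeContainer {
    /// Creating the container is the only step that allocates a block ID.
    pub fn new(alloc: &mut BlockIdAllocator) -> Self {
        Self {
            block_id: alloc.allocate(),
            files: HashMap::new(),
            len: 0,
        }
    }

    /// Appends a file at the current end of the block and records its position.
    /// Returns the assigned offset, or None if the name is already present.
    pub fn add_file(&mut self, name: &str, len: u64) -> Option<u64> {
        if self.files.contains_key(name) {
            return None;
        }
        let offset = self.len;
        self.files.insert(name.to_string(), SmallFileMeta { offset, len });
        self.len += len;
        Some(offset)
    }
}
```

In this sketch, adding any number of files never touches the allocator again, which mirrors the "1 block ID regardless of file count" property the unit tests are said to check.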

Performance Impact

  • Metadata reduction: O(1) inodes and blocks per batch instead of O(N).
  • Provides the foundation for batch writing.

Future Work

  1. Complete Container lifecycle (such as delete operation).
  2. FsBatchWriter client interface.
  3. Add advanced features: manual small file compaction or background compaction job integrated with streaming.
  4. Move Container to middle/top folder levels (instead of leaf) if user demand is high.

thuong and others added 30 commits December 29, 2025 08:15
@thuongle2210
Contributor Author

Dear @szbr9486 and @bigbigxu, could you please review this MR? Container packs large numbers of small files and dramatically reduces the number of inodes stored in metadata.

@thuongle2210 thuongle2210 changed the title feat: add container data structure for batch small file writing feat: introduce container data structure for batch small file writing Feb 12, 2026
@szbr9486
Contributor

Dear @szbr9486 and @bigbigxu, could you please review this MR? Container packs large numbers of small files and dramatically reduces the number of inodes stored in metadata.

Thank you for your excellent work. We were on holiday for Chinese New Year over the past few days; we will start reviewing this PR today. Have a good day!

@szbr9486 szbr9486 requested a review from bigbigxu February 24, 2026 01:27
@szbr9486
Contributor

Can this PR reduce the metadata memory usage in scenarios involving a large number of small files?

@lzjqsdd
Member

lzjqsdd commented Feb 27, 2026

This PR can reduce the number of RPC calls and block allocations during writes. However, after the underlying data structure changes, it may not be very friendly for scenarios involving random reads of small files. If randomly read files are scattered across different blocks, we need to locate the offset first before reading. When the read cache hit rate is low, this could amplify I/O.

@thuongle2210
Contributor Author

thuongle2210 commented Feb 27, 2026

Can this PR reduce the metadata memory usage in scenarios involving a large number of small files?

Perhaps a ContainerEntry (similar to a file entry) could reduce metadata memory consumption via lazy loading. This PR is already quite large, so I plan to add that in a follow-up PR.

@thuongle2210
Contributor Author

thuongle2210 commented Feb 27, 2026

This PR can reduce the number of RPC calls and block allocations during writes. However, after the underlying data structure changes, it may not be very friendly for scenarios involving random reads of small files. If randomly read files are scattered across different blocks, we need to locate the offset first before reading. When the read cache hit rate is low, this could amplify I/O.

A container has exactly one block, and SmallFileMeta stores each file's offset within it.
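
As a rough illustration of that lookup (a sketch only, with assumed field names `offset` and `len` and a hypothetical `read_small_file` helper, not the PR's reader), reading one small file reduces to a map lookup followed by a bounds-checked slice of the single block:

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

/// Hypothetical per-file record; the field names are assumptions.
pub struct SmallFileMeta {
    pub offset: u64,
    pub len: u64,
}

/// Returns the bytes of `name` from the container's single block.
/// Yields None if the name is unknown or the recorded range falls outside the block.
pub fn read_small_file<'a>(
    block: &'a [u8],
    index: &HashMap<String, SmallFileMeta>,
    name: &str,
) -> Option<&'a [u8]> {
    let meta = index.get(name)?;
    let start = usize::try_from(meta.offset).ok()?;
    let len = usize::try_from(meta.len).ok()?;
    let end = start.checked_add(len)?;
    // `get` performs the bounds check instead of panicking on a bad range.
    block.get(start..end)
}
```

Here the block is an in-memory byte slice for simplicity; a real reader would issue a ranged read against the worker instead.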


Copilot AI left a comment

Pull request overview

This PR introduces a Container inode type to pack multiple small files into a single shared block, reducing metadata overhead from O(N) inodes+blocks to O(1) per batch. This addresses part of issue #433 regarding small file performance.

Changes:

  • Adds new InodeContainer type and container lifecycle operations (create, add_block, complete)
  • Generalizes journal entries from file-specific to inode-generic to support both File and Container types
  • Renames proto messages and RPC codes from *Batch* to *Container* for clarity
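
The journal generalization in the second bullet can be pictured with a small sketch. Only the entry names `CreateInodeEntry` and `CompleteInodeEntry` come from the PR description; the fields, the `InodeKind` and `JournalEntry` enums, the `InodeState` struct, and the `replay` function are hypothetical simplifications, not the project's actual journal format.

```rust
use std::collections::HashMap;

/// Hypothetical tag distinguishing the inode types a generic entry can carry.
#[derive(Debug, Clone, PartialEq)]
pub enum InodeKind {
    File,
    Container,
}

/// Inode-generic create entry (fields are assumptions).
pub struct CreateInodeEntry {
    pub path: String,
    pub kind: InodeKind,
}

/// Inode-generic complete entry (fields are assumptions).
pub struct CompleteInodeEntry {
    pub path: String,
    pub len: u64,
}

pub enum JournalEntry {
    Create(CreateInodeEntry),
    Complete(CompleteInodeEntry),
}

/// Simplified in-memory state rebuilt by replay.
#[derive(Debug, PartialEq)]
pub struct InodeState {
    pub kind: InodeKind,
    pub len: u64,
    pub complete: bool,
}

/// Replays entries in order; the same code path serves File and Container inodes.
pub fn replay(entries: &[JournalEntry]) -> Result<HashMap<String, InodeState>, String> {
    let mut tree = HashMap::new();
    for entry in entries {
        match entry {
            JournalEntry::Create(e) => {
                let state = InodeState { kind: e.kind.clone(), len: 0, complete: false };
                tree.insert(e.path.clone(), state);
            }
            JournalEntry::Complete(e) => {
                let inode = tree
                    .get_mut(&e.path)
                    .ok_or_else(|| format!("complete for unknown inode: {}", e.path))?;
                inode.len = e.len;
                inode.complete = true;
            }
        }
    }
    Ok(tree)
}
```

The design point this is meant to show is that one pair of entry types keyed by inode kind avoids duplicating create and complete entries for each new inode type.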

Reviewed changes

Copilot reviewed 42 out of 45 changed files in this pull request and generated 17 comments.

| File | Description |
| --- | --- |
| curvine-common/proto/common.proto | Adds Container file type and container_name field to FileStatusProto (breaking change) |
| curvine-common/proto/master.proto | Renames batch messages to container messages and adds container-specific responses |
| curvine-common/proto/worker.proto | Renames batch write messages to container write messages |
| curvine-server/src/master/meta/inode/inode_container.rs | New InodeContainer type with SmallFileMeta tracking |
| curvine-server/src/master/meta/inode/inode_view.rs | Extends InodeView enum with Container variant |
| curvine-server/src/master/meta/inode/inode_dir.rs | Adds container_index for mapping files to containers |
| curvine-server/src/master/meta/fs_dir.rs | Implements create_container and complete_container operations |
| curvine-server/src/master/meta/store/inode_store.rs | Updates RocksDB persistence for containers |
| curvine-server/src/master/journal/* | Generalizes journal entries for File and Container inodes |
| curvine-server/src/master/master_handler.rs | Implements create_container and complete_container RPC handlers |
| curvine-server/src/worker/handler/batch_write_handler.rs | Implements container block writing on worker side |
| curvine-client/src/file/fs_client.rs | Adds create_container and complete_container client APIs |
| curvine-client/src/block/* | Implements ContainerBlockWriter for batch small file writing |
| curvine-tests/tests/fs_test.rs | Adds test verifying 2 distinct block IDs for batch writes |

@thuongle2210
Contributor Author

Hi @lzjqsdd, I have just resolved all threads. Thanks!

4 participants