perf(raft): improve Raft log processing speed and related optimizations#711

Merged
szbr9486 merged 1 commit into main from xuen1
Mar 16, 2026

@bigbigxu
Contributor

Summary

This PR improves Raft log processing throughput and applies related config, storage, and logging changes. The main change is the Raft run loop: use tokio::select! over a ticker and receiver.recv(), and when a message arrives, batch-drain up to raft_batch_size pending messages in one pass. Additional edits touch config, RocksDB/Raft storage, journal apply order, and log/assert cleanup.

Files changed:

| Path | Description |
| --- | --- |
| curvine-common/src/conf/journal_conf.rs | Config: remove raft_poll_interval_ms, add raft_batch_size; enable batch_append; writer_flush_batch_ms 100→10 |
| curvine-common/src/raft/raft_node.rs | Run loop (select! + batch drain), on_ready order, snapshot trigger by op_id |
| curvine-common/src/raft/storage/rocks_storage_core.rs | Compact check and delete_entry (range delete) |
| curvine-common/src/rocksdb/db_engine.rs | flush order; write_batch with write_opt |
| curvine-common/src/rocksdb/write_batch.rs | Add delete_range_cf |
| curvine-server/src/master/journal/journal_loader.rs | Apply op_id/rpc_id at start of batch step |
| curvine-server/src/master/journal/journal_writer.rs | Log tweaks (drop send_entry debug, snapshot info + inode_id) |
| curvine-server/src/master/meta/fs_dir.rs | Remove redundant assert!(!inode.is_file_entry()) |
| curvine-server/src/master/meta/inode/inode_path.rs | Remove redundant assert!(!v.is_file_entry()) in clone_last_file |

1. Raft run loop and config (raft_node.rs, journal_conf.rs)

Run loop: Replaced the previous pattern (poll_interval + timeout around recv(), then tick) with tokio::select! { biased; ticker.tick() => raw.tick(); recv() => handle one + batch-drain }. When the channel has backlog, each wake-up can process up to raft_batch_size (default 8) messages via try_recv(), reducing wake-ups and improving log throughput.
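
The batch-drain step can be sketched in isolation as follows. This is a minimal illustration, not the code in raft_node.rs: it uses std::sync::mpsc instead of a tokio channel so that it runs without dependencies, the helper name recv_batch is hypothetical, and it assumes the raft_batch_size cap counts the first message received.

```rust
use std::sync::mpsc::{Receiver, RecvError};

/// Waits for one message, then drains more without blocking until the batch
/// holds `batch_size` messages or the channel has no backlog left.
/// `recv_batch` is a hypothetical helper, not a function from the PR.
fn recv_batch<T>(rx: &Receiver<T>, batch_size: usize) -> Result<Vec<T>, RecvError> {
    // Blocking wait: stands in for the `recv()` arm of the select loop firing.
    let first = rx.recv()?;
    let mut batch = Vec::with_capacity(batch_size.max(1));
    batch.push(first);

    // Non-blocking drain of whatever is already queued, capped at `batch_size`.
    while batch.len() < batch_size {
        match rx.try_recv() {
            Ok(msg) => batch.push(msg),
            Err(_) => break, // channel empty or disconnected: stop draining
        }
    }
    Ok(batch)
}
```

With 20 queued messages and a batch size of 8, three calls return batches of 8, 8, and 4 messages, so three wake-ups handle what would otherwise take 20.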

Config: Removed raft_poll_interval_ms; added raft_batch_size (default 8). Enabled batch_append: true in Raft config. Reduced default writer_flush_batch_ms from 100 to 10.
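
For orientation, the new defaults could be expressed roughly like this. The struct name JournalConfSketch and the field types are assumptions made for illustration; only the field names and default values come from this description, and the real journal_conf.rs has more fields.

```rust
/// Hypothetical, trimmed-down stand-in for the journal config.
/// Field types are assumed; only names and defaults come from the PR text.
#[derive(Debug, Clone, PartialEq)]
pub struct JournalConfSketch {
    /// Upper bound on messages handled per run-loop wake-up.
    pub raft_batch_size: usize,
    /// Writer flush batch interval in milliseconds (previously 100).
    pub writer_flush_batch_ms: u64,
}

impl Default for JournalConfSketch {
    fn default() -> Self {
        Self {
            raft_batch_size: 8,
            writer_flush_batch_ms: 10,
        }
    }
}
```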

Snapshot: Snapshot trigger now uses op_id from FSM state (last_snapshot_op_id) instead of applied index (last_snapshot_applied).


2. on_ready order (raft_node.rs)

Order: Persist new log entries, then send persisted_messages, and only then apply committed entries. Followers can therefore be notified earlier, and FSM apply of already-committed entries no longer blocks the persistence pipeline.
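
The ordering can be made concrete with a small sketch. The ReadyHooks trait and the Recorder type are hypothetical stand-ins for the real storage, transport, and FSM; the sketch shows only the call order described here, not the actual on_ready body.

```rust
/// Hypothetical hooks standing in for the real storage, transport, and FSM.
trait ReadyHooks {
    fn persist_entries(&mut self);
    fn send_persisted_messages(&mut self);
    fn apply_committed_entries(&mut self);
}

/// Runs one ready cycle in the order described in this PR.
fn on_ready<H: ReadyHooks>(hooks: &mut H) {
    hooks.persist_entries(); // 1. write new log entries to storage
    hooks.send_persisted_messages(); // 2. send messages that depend on persistence
    hooks.apply_committed_entries(); // 3. apply committed entries to the FSM last
}

/// Test double that records the order in which the hooks are called.
#[derive(Default)]
struct Recorder {
    calls: Vec<&'static str>,
}

impl ReadyHooks for Recorder {
    fn persist_entries(&mut self) {
        self.calls.push("persist");
    }
    fn send_persisted_messages(&mut self) {
        self.calls.push("send");
    }
    fn apply_committed_entries(&mut self) {
        self.calls.push("apply");
    }
}
```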


3. Raft storage compact and delete (rocks_storage_core.rs, write_batch.rs, db_engine.rs)

  • Compact: If compact_index > last_index(), return an error instead of panicking; condition changed from > last_index() + 1 to > last_index() so we do not compact past the last log entry.
  • delete_entry: Use RocksDB range delete (delete_range_cf) instead of a per-key delete loop. Added WriteBatch::delete_range_cf in write_batch.rs.
  • DBEngine: flush(): when WAL is enabled call flush_wal(sync), otherwise flush_mem(sync). write_batch() now uses write_opt(batch, &self.write_opt).
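
The compact bound and the range delete can be illustrated with an in-memory log. This is a sketch under stated assumptions: a BTreeMap stands in for the RocksDB column family, split_off stands in for a single range delete, LogSketch and CompactError are hypothetical names, and compaction is assumed to drop entries with index below compact_index.

```rust
use std::collections::BTreeMap;

/// Hypothetical error type; the PR's actual error is not shown in the description.
#[derive(Debug, PartialEq)]
enum CompactError {
    OutOfRange { compact_index: u64, last_index: u64 },
}

/// In-memory stand-in for the Raft log storage, keyed by log index.
struct LogSketch {
    entries: BTreeMap<u64, Vec<u8>>,
}

impl LogSketch {
    fn last_index(&self) -> u64 {
        self.entries.keys().next_back().copied().unwrap_or(0)
    }

    /// Drops every entry with index < `compact_index` (assumed semantics).
    fn compact(&mut self, compact_index: u64) -> Result<(), CompactError> {
        let last_index = self.last_index();
        // New bound: reject anything past the last entry with an error, no panic.
        if compact_index > last_index {
            return Err(CompactError::OutOfRange { compact_index, last_index });
        }
        // One range operation instead of a per-key delete loop:
        // keep [compact_index, ..) and discard everything below it.
        let kept = self.entries.split_off(&compact_index);
        self.entries = kept;
        Ok(())
    }
}
```

With entries 1 through 5, compact(6) returns an error under the new bound, whereas the old `> last_index() + 1` check would have let it through.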

4. Journal apply and logging (journal_loader.rs, journal_writer.rs)

  • journal_loader: Set applied.op_id and applied.rpc_id at the start of each batch loop iteration (before the match on the entry), so applied position is updated per entry.
  • journal_writer: Removed debug!("send_entry ..."). Leader snapshot info log now includes inode_id and the existing cost/entries/dir fields.
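
The journal_loader ordering might look like the following in outline. All types here (Applied, Entry, Op) and the apply_batch function are hypothetical simplifications; only the idea of setting op_id and rpc_id before the match comes from this description.

```rust
/// Hypothetical record of the last applied position.
#[derive(Debug, Default, PartialEq)]
struct Applied {
    op_id: u64,
    rpc_id: i64,
}

/// Hypothetical journal operations, reduced to two cases.
enum Op {
    Mkdir(String),
    Delete(String),
}

struct Entry {
    op_id: u64,
    rpc_id: i64,
    op: Op,
}

fn apply_batch(applied: &mut Applied, batch: &[Entry], dirs: &mut Vec<String>) {
    for entry in batch {
        // Record the position first, before dispatching on the entry type,
        // so it is updated once per entry regardless of which arm runs.
        applied.op_id = entry.op_id;
        applied.rpc_id = entry.rpc_id;

        match &entry.op {
            Op::Mkdir(path) => dirs.push(path.clone()),
            Op::Delete(path) => dirs.retain(|p| p != path),
        }
    }
}
```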

5. Assert and inode cleanup (fs_dir.rs, inode_path.rs)

  • fs_dir: Removed two assert!(!inode.is_file_entry()) (in the status and path resolution paths).
  • inode_path: Removed assert!(!v.is_file_entry()) in clone_last_file().

Copilot AI review requested due to automatic review settings March 14, 2026 23:51
@bigbigxu requested a review from szbr9486 March 14, 2026 23:54
Contributor

Copilot AI left a comment

Pull request overview

This PR focuses on increasing Raft log processing throughput by reducing wake-ups in the Raft run loop and batching inbound message handling, while also applying related optimizations across configuration, RocksDB/Raft storage, journal apply ordering, and logging/assert cleanup.

Changes:

  • Reworked Raft node run loop to use tokio::select! with a tick interval plus batch-drain of queued messages (raft_batch_size).
  • Optimized Raft/RocksDB storage behaviors (range delete for log compaction paths, write options usage, updated flush behavior).
  • Adjusted journal apply bookkeeping ordering and reduced logging / redundant asserts in inode/fs paths.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| curvine-common/src/conf/journal_conf.rs | Removes poll interval config, adds batching config, enables batch_append, adjusts flush batch timing defaults. |
| curvine-common/src/raft/raft_node.rs | Implements select-based ticking + batched recv; reorders on_ready processing; snapshots triggered by FSM op_id. |
| curvine-common/src/raft/storage/rocks_storage_core.rs | Makes compaction bounds safer (error instead of panic) and switches entry deletion to RocksDB range deletes. |
| curvine-common/src/rocksdb/db_engine.rs | Changes flush behavior based on WAL and uses write_opt for batched writes. |
| curvine-common/src/rocksdb/write_batch.rs | Adds delete_range_cf helper used by Raft log storage. |
| curvine-server/src/master/journal/journal_loader.rs | Updates applied op_id/rpc_id earlier per batch iteration. |
| curvine-server/src/master/journal/journal_writer.rs | Removes per-entry debug log and enriches snapshot info log fields. |
| curvine-server/src/master/meta/fs_dir.rs | Removes redundant asserts in inode lookup/handling paths. |
| curvine-server/src/master/meta/inode/inode_path.rs | Removes redundant assert in clone_last_file(). |

@bigbigxu force-pushed the xuen1 branch 2 times, most recently from d1ab83c to 4606c58, March 15, 2026 01:56

- Raft run loop: switch to tokio::select! (ticker + recv), batch-drain up to
  raft_batch_size messages per wake-up via try_recv() to improve log throughput
- Config: replace raft_poll_interval_ms with raft_batch_size; enable batch_append;
  reduce writer_flush_batch_ms default 100→10
- on_ready: persist entries then send_messages before apply_committed_entries
  so followers are notified earlier; snapshot trigger uses op_id instead of
  applied index
- RocksDB: delete_entry uses delete_range_cf; write_batch uses write_opt; add WriteBatch::delete_range_cf
- Journal: apply op_id/rpc_id at start of batch step; reduce log noise (drop
  send_entry debug, add inode_id to snapshot info); remove redundant asserts
  in fs_dir and inode_path
@szbr9486 merged commit 23bf26c into main Mar 16, 2026
11 checks passed

3 participants