Append commit instead of individual transactions to commitlog #4140

kim · 2026-01-27T14:59:07Z

Changes the commitlog (and durability) write API so that the caller decides how many transactions go into a single commit and must supply the transaction offsets.

This simplifies commitlog-side buffering logic to essentially a BufWriter (which, of course, we must not forget to flush). This will help throughput, but offers less opportunity to retry failed writes. This is probably a good thing, as disks can fail in erratic ways, and we should rather crash and re-verify the commitlog (suffix) than continue writing.

To that end, this patch liberally panics wherever internal state could be "poisoned" by a partial write. Whether that is the right call is debatable.

Motivation

The main motivation is to avoid maintaining the transaction offset in two places in such a way that they could diverge. As ordering commits is the responsibility of the datastore, we make it authoritative on this matter -- the commitlog will still check that offsets are contiguous, and refuse to commit if that's not the case.
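For illustration only, the contiguity check described above might look roughly like the sketch below. `Tx`, `AppendError`, and `Log` are hypothetical names made up for this example, not the types touched by this patch, and the real commitlog writes to segments rather than to a `Vec`.

```rust
/// Hypothetical transaction: an offset chosen by the caller plus an opaque payload.
#[derive(Debug, Clone, PartialEq)]
pub struct Tx {
    pub offset: u64,
    pub payload: Vec<u8>,
}

/// Hypothetical error returned when the supplied offsets have a gap or overlap.
#[derive(Debug, PartialEq)]
pub enum AppendError {
    NonContiguous { expected: u64, actual: u64 },
}

/// Hypothetical in-memory log; it only remembers which offset must come next.
#[derive(Default)]
pub struct Log {
    next_offset: u64,
    commits: Vec<Vec<Tx>>,
}

impl Log {
    /// Append `txs` as a single commit, or reject the whole batch and leave
    /// the log unchanged if the offsets are not contiguous.
    pub fn commit(&mut self, txs: Vec<Tx>) -> Result<(), AppendError> {
        let mut expected = self.next_offset;
        for tx in &txs {
            if tx.offset != expected {
                return Err(AppendError::NonContiguous {
                    expected,
                    actual: tx.offset,
                });
            }
            expected += 1;
        }
        // Only advance our own view of the offset once the whole batch checked out.
        self.next_offset = expected;
        self.commits.push(txs);
        Ok(())
    }

    /// The offset the next commit must start at.
    pub fn next_offset(&self) -> u64 {
        self.next_offset
    }

    /// Number of commits accepted so far.
    pub fn commit_count(&self) -> usize {
        self.commits.len()
    }
}
```

In this sketch the caller owns the numbering and the log merely verifies it, so a rejected batch leaves `next_offset` where it was.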

A secondary, related motivation is the following:

A "commit" is an atomic unit of storage, meaning that a torn (partial) write of a commit will render the entire commit corrupt. There hasn't been a compelling case for batching several transactions into one commit, and we have always configured the server to write exactly one transaction per commit.
The code to handle buffering of transactions is, however, rather complex, as it tries hard to allow the caller to retry writes at commit boundaries. An unfortunate consequence of this is that we'd flush to the OS very often, leaving throughput on the table.

So, if there is a compelling case for batching multiple transactions in a commit, it should be the datastore's responsibility.
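To make the "atomic unit" point concrete, here is a minimal sketch of a length-prefixed, checksummed commit frame. The layout, the function names, and the FNV-1a checksum are all assumptions chosen to keep the example dependency-free; they are not the commitlog's actual on-disk format.

```rust
/// FNV-1a, used here only as a dependency-free stand-in checksum.
fn checksum(bytes: &[u8]) -> u32 {
    let mut h: u32 = 0x811c_9dc5;
    for b in bytes {
        h ^= u32::from(*b);
        h = h.wrapping_mul(0x0100_0193);
    }
    h
}

/// Read a little-endian u32 from the start of `bytes`, if there are enough bytes.
fn read_u32(bytes: &[u8]) -> Option<u32> {
    let b = bytes.get(..4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical frame: [body len][(tx len, tx bytes)*][checksum of body].
pub fn encode_commit(txs: &[Vec<u8>]) -> Vec<u8> {
    let mut body = Vec::new();
    for tx in txs {
        body.extend_from_slice(&(tx.len() as u32).to_le_bytes());
        body.extend_from_slice(tx);
    }
    let mut out = Vec::with_capacity(body.len() + 8);
    out.extend_from_slice(&(body.len() as u32).to_le_bytes());
    out.extend_from_slice(&body);
    out.extend_from_slice(&checksum(&body).to_le_bytes());
    out
}

/// Decode one frame. Any missing byte or checksum mismatch loses the whole
/// commit, including transactions whose bytes were written completely.
pub fn decode_commit(bytes: &[u8]) -> Option<Vec<Vec<u8>>> {
    let len = read_u32(bytes)? as usize;
    let body_end = 4usize.checked_add(len)?;
    let body = bytes.get(4..body_end)?;
    let stored = read_u32(bytes.get(body_end..)?)?;
    if stored != checksum(body) {
        return None;
    }
    let mut txs = Vec::new();
    let mut rest = body;
    while !rest.is_empty() {
        let n = read_u32(rest)? as usize;
        let end = 4usize.checked_add(n)?;
        txs.push(rest.get(4..end)?.to_vec());
        rest = &rest[end..];
    }
    Some(txs)
}
```

With a frame like this, every strict prefix of an encoded commit fails to decode, so the more transactions a commit carries, the more is lost to a single torn write.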

API and ABI breaking changes

Breaks internal APIs only.

Expected complexity level and risk

5 - Mostly for the risk

Testing

Existing tests.


kim · 2026-01-28T16:25:02Z

Nominating @gefjon and @Centril because they appeared in the reviewer suggestions.
@Centril specifically for hints on how to get rid of Box<[_]> allocations.
@Shubham8287 because of previous work on the commitlog.
@joshua-spacetime because of suggesting the change initially.

kim · 2026-01-28T16:28:22Z

crates/commitlog/src/tests/partial.rs

+/// Tests that, when a partial write occurs, we can read all flushed commits
+/// up until the faulty one.
 #[test]
-fn reopen() {


I'm not sure what this was supposed to test originally, so I removed it.

kim · 2026-01-28T16:29:04Z

crates/commitlog/src/tests/partial.rs

+/// up until the faulty one.
 #[test]
-fn reopen() {
+fn read_log_up_to_partial_write() {


This is basically the test previously named reopen.

kim · 2026-01-28T16:57:11Z

crates/commitlog/src/commitlog.rs

+        let writer = &mut self.head;
+        let committed = writer.commit(transactions)?;
+        if writer.len() >= self.opts.max_segment_size {
+            self.flush().expect("failed to flush segment");


This seemed a bit surprising to me at first -- but the BufWriter has no way of knowing how many bytes actually made it through. So if flush fails, the buffer is basically garbage.
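A small, self-contained sketch of the situation, assuming a made-up `FullDisk` sink that accepts a few bytes and then fails (it is not part of this patch): the write into the `BufWriter` succeeds, the error only surfaces on `flush`, and the `io::Error` itself says nothing about how much was written.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical sink that accepts `capacity` bytes and then fails,
/// loosely imitating a disk that runs out of space.
pub struct FullDisk {
    pub written: Vec<u8>,
    pub capacity: usize,
}

impl Write for FullDisk {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let room = self.capacity - self.written.len();
        if room == 0 {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                "no space left on device",
            ));
        }
        // Partial write: take only as many bytes as still fit.
        let n = room.min(buf.len());
        self.written.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Returns (did the buffered write succeed, did flush succeed,
/// how many bytes the sink actually received).
pub fn demo() -> (bool, bool, usize) {
    let disk = FullDisk {
        written: Vec::new(),
        capacity: 5,
    };
    let mut w = BufWriter::with_capacity(64, disk);
    // Ten bytes fit in the 64-byte buffer, so nothing reaches the sink yet.
    let write_ok = w.write_all(b"0123456789").is_ok();
    // Flushing pushes the buffer to the sink, which fails halfway through.
    let flush_ok = w.flush().is_ok();
    // We can only inspect this because the sink is an in-memory test double.
    let reached = w.get_ref().written.len();
    (write_ok, flush_ok, reached)
}
```

Here the sink ends up holding half a commit while the caller only sees a failed flush, which is the case where panicking and re-verifying the log suffix looks safer than retrying.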

kim added 11 commits January 27, 2026 15:56

Append commit instead of individual transactions to commitlog

06f9c2e

This moves the following responsibilities to the datastore:
- maintenance of the transaction offset
- deciding how many transactions are in a commit

Restore some commentary

a0de7d9

Clear commit before returning error

e73abcc

More commentary

d5b29cc

Panic if >u16::MAX transactions

de918d5

Allowing to restore `Committed` return

Docs

f722edb

set_epoch doesn't need to flush

550aa4b

Return Committed from all commit methods

876f07b

Docs

0cf8def

Use assert

a50b422

Restore the commit corruption after ENOSPC test

fd27477

kim linked an issue Jan 28, 2026 that may be closed by this pull request

Make datastore responsible for maintaining the transaction offset #4125

Open

kim added 2 commits January 28, 2026 17:17

Add TODO

f37fad3

Fix fallocate tests

4a16bf5

kim requested review from Centril, Shubham8287, gefjon and joshua-spacetime January 28, 2026 16:22

kim marked this pull request as ready for review January 28, 2026 16:25


2 participants
