-
Notifications
You must be signed in to change notification settings - Fork 680
Append commit instead of individual transactions to commitlog #4140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This moves the following responsibilities to the datastore: - maintenance of the transaction offset - deciding how many transactions are in a commit
Allowing to restore `Committed` return
|
Nominating @gefjon and @Centril because they appeared in the reviewer suggestions. |
| /// Tests that, when a partial write occurs, we can read all flushed commits | ||
| /// up until the faulty one. | ||
| #[test] | ||
| fn reopen() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this was supposed to test originally, so I removed it.
| /// up until the faulty one. | ||
| #[test] | ||
| fn reopen() { | ||
| fn read_log_up_to_partial_write() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically the test previously named reopen.
| let writer = &mut self.head; | ||
| let committed = writer.commit(transactions)?; | ||
| if writer.len() >= self.opts.max_segment_size { | ||
| self.flush().expect("failed to flush segment"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seemed a bit surprising to me at first -- but the BufWriter has no way of knowing how many bytes did make it. So if flush fails, the buffer is basically garbage.
Changes the commitlog (and durability) write API, such that the caller decides how many transactions are in a single commit, and has to supply the transaction offsets.
This simplifies commitlog-side buffering logic to essentially a
BufWriter(which, of course, we must not forget to flush). This will help throughput, but offers less opportunity to retry failed writes. This is probably a good thing, as disks can fail in erratic ways, and we should rather crash and re-verify the commitlog (suffix) than continue writing.To that end, this patch liberally raises panics when there is a chance that internal state could be "poisoned" by partial writes, which may be debatable.
Motivation
The main motivation is to avoid maintaining the transaction offset in two places in such a way that they could diverge. As ordering commits is the responsibility of the datastore, we make it authoritative on this matter -- the commitlog will still check that offsets are contiguous, and refuse to commit if that's not the case.
A secondary, related motivation is the following:
A "commit" is an atomic unit of storage, meaning that a torn (partial) write of a commit will render the entire commit corrupt. There hasn't been a compelling case where we would want this, and have always configured the server to write exactly one transaction per commit.
The code to handle buffering of transactions is, however, rather complex, as it tries hard to allow the caller to retry writes at commit boundaries. An unfortunate consequence of this is that we'd flush to the OS very often, leaving throughput performance on the table.
So, if there is a compelling case for batching multiple transactions in a commit, it should be the datastore's responsibility.
API and ABI breaking changes
Breaks internal APIs only.
Expected complexity level and risk
5 - Mostly for the risk
Testing
Existing tests.