use serde instead of hand rolled serialization #18
piscisaureus wants to merge 1 commit into evmar:main
Conversation
This is interesting! What does the ultimate serialization end up as? (I read it in a hex editor to debug it sometimes.) What is the performance impact? (I don't have a good way of measuring it, hmmmm.)
CBOR. The format was fairly arbitrarily chosen, the only constraint being that it needs to store its own serialized length so we can serialize multiple entries back-to-back. You can pick any other format listed on https://serde.rs/ if you wish.

If that's a concern, you could pick YAML or JSON.

I haven't measured it but I'm certain it's negligible (esp. when using a binary format like CBOR). Disassembled it for fun: https://gist.github.com/piscisaureus/d09b68b083d680295fa5dfa00ab95dbd
I'm mildly positive on this. I had considered serde initially but I was reluctant to take on dependencies I don't well understand, and in particular here I don't understand what the actual serialization ends up being, but I can read about CBOR I guess. I think JSON/YAML aren't acceptable for perf reasons. Reading this file is on the startup critical path, and in my recollection the comparable file in Ninja ends up being megabytes. I will look into a way of evaluating the perf of this and get back to you.
I tried this:

This measures the no-op time to check that clang-format is up to date, which involves reading this db.

With the in-tree impl I got

With this patch (and main merged, to make the comparison fair) I got

Some other notes: the binary size grew by ~200kb, which is about 15%; that's no problem. Based on my understanding of the serde docs, serde could be used to implement a serialization more similar to the current one (which was chosen for performance reasons). In other words, I think we could still use serde with some more custom logic instead of CBOR and maybe get most of the benefit? Not sure, I'm not too familiar with serde.
That's possible, albeit of somewhat debatable benefit. We could also try bincode.

I suspect that it is not the serialization itself that causes the slowdown, but the fact that it "flushes to disk" on every write. That's expected because:

```rust
serde_cbor::ser::to_writer(&mut self.w, &entry)?; // <-- self.w is a `File`.
```

I think if we put a buffered writer there it might get a little better.
I've updated the PR to use a BufReader/BufWriter, hopefully that will reduce time spent in the kernel. |
OK, I'm convinced now that (lack of) buffering was indeed the issue.

Hand rolled:

Serde without buffering:

Serde with BufReader/BufWriter:
force-pushed from 804ac03 to d3756e0
I think to resummarize this one: it adds a dependency, grows the binary by a bit, makes the code a bit slower, and saves ~140 lines of serialization code. I'm not sure it's really worth it. I wonder if using bincode directly would be better? Its encoding looks pretty similar to what I wrote.
force-pushed from 03de3da to afe500f
I tried porting this on top of #58: https://github.com/GrigorenkoPV/n2/tree/serde

Comparison between 05d11b9 (baseline) & f827571 (serde):

Surprisingly, it is ~4% faster. The binary did, however, get bigger. Stripped versions, size in bytes:

Another thing worth mentioning is that this allows us to get rid of some dubious unsafe code like `unsafe { MaybeUninit::uninit().assume_init() }` from here.

Should I create a PR? I feel like I should mention @piscisaureus's authorship in my comment, because it is mostly his code, but I don't actually know how to do that. Probably there's some git command for it.
Closing in favor of #59 |