…e message

Adds a bench-prost-bytes crate that builds prost types with prost-build's .bytes(['.']) substitution (bytes::Bytes for every bytes field) and decodes from bytes::Bytes input so prost's zero-copy copy_to_bytes slicing path is actually exercised. Only decode/merge are measured; the substitution does not affect the encode path.

Introduces a MediaFrame benchmark message (single large bytes body + repeated bytes chunks + map<string, bytes> attachments) so the new variant has bytes fields to work with. The existing four messages are string-heavy and leave the feature inert; MediaFrame exercises it.

Also:
- random_bytes in gen-datasets uses rng.fill_bytes (was elementwise).
- charts/generate.py gains a 'prost (bytes)' series and a MediaFrame row; README performance tables and explanatory note are updated.
- Taskfile.yml adds bench-prost-bytes + bench-cross-prost-bytes and integrates the new variant into bench-cross.

Throughput on MediaFrame decode (MiB/s, native, 50 payloads, ~11 KB each): buffa (view) 73,004 / prost (bytes) 23,516 / buffa 16,816 / prost 9,648. perf-stat evidence in the PR description.
asacamano
reviewed
Apr 23, 2026
Collaborator
Author
oh whoops yeah let me check if I can deterministically produce these at the start of runs instead of committing them
Collaborator
Author
actually will keep it this way for now, and replace with a pre-generation step via taskfile later
asacamano
previously approved these changes
Apr 23, 2026
MediaFrame's ~70 GiB/s binary-decode throughput compressed the other four messages' bars into a few pixels on the shared-scale composite charts. Replace each composite SVG with one chart per (chart-type × message) so each chart picks its own nice-max, making the smaller throughput differences readable again.

- charts/generate.py: loop over (chart, message) pairs; drop series with no value for the current message so empty bars don't render.
- Delete the four composite SVGs; add 20 per-message SVGs.
- README: list the 5 per-message charts vertically under each section.
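The per-message scaling described in this commit can be sketched roughly as follows. This is not the code in charts/generate.py: `nice_max`, `per_message_charts`, the 1/2/2.5/5/10 rounding ladder, and the `PlaceholderMsg` values are assumptions made up for illustration. Only the MediaFrame figures come from the commit messages in this PR.

```python
import math


def nice_max(value):
    """Round a positive value up to 1, 2, 2.5, 5, or 10 times a power of ten."""
    if value <= 0:
        return 1.0
    base = 10.0 ** math.floor(math.log10(value))
    for step in (1.0, 2.0, 2.5, 5.0, 10.0):
        if value <= step * base:
            return step * base
    return 10.0 * base  # guard only; the loop already covers 10 * base


def per_message_charts(results):
    """Map {series: {message: MiB/s or None}} to {message: (bars, axis_max)}.

    Series without a value for a message are dropped, so no empty bar is
    drawn, and every message gets its own axis maximum.
    """
    messages = sorted({m for values in results.values() for m in values})
    charts = {}
    for message in messages:
        bars = {
            series: values[message]
            for series, values in results.items()
            if values.get(message) is not None
        }
        if bars:  # skip messages for which no series reported a value
            charts[message] = (bars, nice_max(max(bars.values())))
    return charts


# MediaFrame values are the MiB/s decode figures quoted in this PR;
# PlaceholderMsg and its values are invented purely for the example.
RESULTS = {
    "buffa": {"MediaFrame": 16_816, "PlaceholderMsg": 900},
    "buffa (view)": {"MediaFrame": 73_004},
    "prost": {"MediaFrame": 9_648, "PlaceholderMsg": 1_200},
    "prost (bytes)": {"MediaFrame": 23_516, "PlaceholderMsg": None},
}

CHARTS = per_message_charts(RESULTS)
for name, (bars, axis_max) in CHARTS.items():
    print(name, sorted(bars), axis_max)
```

With a shared scale, the placeholder message's bars would be drawn against the MediaFrame maximum. With one chart per message, its axis tops out at 2,000 and the difference between 900 and 1,200 stays visible.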
Both bench_test.go and google/benches/protobuf.rs had a fixed list of four messages. Add MediaFrame so the new bytes-heavy dataset is exercised across all four implementations, not just buffa + prost variants. Updated numbers (MiB/s binary decode of MediaFrame): buffa 16,816, buffa (view) 73,004, prost 9,648, prost (bytes) 23,516, protobuf-v4 17,633, Go 1,241. protobuf-v4's arena allocator lands near buffa owned on decode but trails on encode (~10 GiB/s vs buffa's 46). Ancillary drift from a fresh cross-impl run refreshes the other four messages too.
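As a quick check on the arithmetic, the ratios implied by these MediaFrame decode numbers can be recomputed from the figures quoted above. The helper names `ratio` and `to_gib_s` are made up for this sketch, and the binary conversion (1 GiB = 1,024 MiB) is the one used elsewhere in this PR.

```python
MIB_PER_GIB = 1024

# MediaFrame binary decode throughput in MiB/s, as quoted in the commit message.
DECODE_MIB_S = {
    "buffa": 16_816,
    "buffa (view)": 73_004,
    "prost": 9_648,
    "prost (bytes)": 23_516,
    "protobuf-v4": 17_633,
    "Go": 1_241,
}


def ratio(numerator, denominator):
    """Throughput of one implementation relative to another."""
    return DECODE_MIB_S[numerator] / DECODE_MIB_S[denominator]


def to_gib_s(mib_s):
    """Convert MiB/s to GiB/s."""
    return mib_s / MIB_PER_GIB


print(round(ratio("prost (bytes)", "prost"), 2))         # 2.44
print(round(ratio("buffa (view)", "prost (bytes)"), 2))  # 3.1
print(round(to_gib_s(DECODE_MIB_S["buffa (view)"]), 2))  # 71.29
```

The first ratio matches the "~2.4× default prost" figure in the PR description, and the GiB/s conversion matches the "~70 GiB/s" quoted for MediaFrame binary decode.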
…ts to GiB/s

- Axis labels use plain integers with commas up to 9,999; '1.2k' for 1,200 was hard to read at a glance when '1,200' fit in the same space.
- When a chart's max value exceeds 10 GiB/s (10,240 MiB/s), rescale the whole chart to GiB/s so the axis doesn't need 'k' at all. MediaFrame binary decode / encode and the buffa (view) LogRecord benches trip this; '73.0' GiB/s is cleaner than '73k' or '75,000' MiB/s.
- Bar-inline values follow the same unit as the axis: integer MiB/s with thousands-separator commas, or two-decimal GiB/s.
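The unit rule in this commit message can be sketched as below. The function names are hypothetical and this is not the code in charts/generate.py. It also assumes the threshold is compared against the largest data value in the chart, which the commit message does not spell out.

```python
MIB_PER_GIB = 1024
GIB_THRESHOLD_MIB = 10 * MIB_PER_GIB  # 10 GiB/s expressed in MiB/s (10,240)


def choose_unit(chart_max_mib):
    """Pick one unit for the whole chart from its largest value (in MiB/s)."""
    return "GiB/s" if chart_max_mib > GIB_THRESHOLD_MIB else "MiB/s"


def format_bar_value(value_mib, unit):
    """Format a bar-inline value in the same unit as the chart axis."""
    if unit == "GiB/s":
        return f"{value_mib / MIB_PER_GIB:.2f}"  # two-decimal GiB/s
    return f"{round(value_mib):,}"  # integer MiB/s with thousands separators


# The largest MediaFrame decode value quoted in this PR trips the threshold,
# so every bar in that chart is shown in GiB/s.
unit = choose_unit(73_004)
print(unit, format_bar_value(73_004, unit), format_bar_value(9_648, unit))

# A chart whose largest value stays at or below 10,240 MiB/s keeps MiB/s.
unit = choose_unit(9_648)
print(unit, format_bar_value(9_648, unit), format_bar_value(1_241, unit))
```

Choosing the unit once per chart, rather than per bar, keeps the axis and the inline values consistent, which is what the third bullet asks for.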
Adds a benchmark variant that enables prost's bytes::Bytes substitution (prost_build::Config::bytes(["."])) and a bytes-heavy MediaFrame message so the feature has something to work with. Motivated by issue #56 ("Clarify zero-copy semantics"): the commenter correctly pointed out that prost has a similar-in-spirit feature and asked for a fair comparison; this PR puts one in the README.

What's new
- benchmarks/prost-bytes/: mirrors benchmarks/prost/ but builds with .bytes(["."]) and decodes from bytes::Bytes input (so Bytes::copy_to_bytes is the ref-count slice, not a copy). Only decode/merge are measured; the substitution does not affect the encode path, so those benches would be redundant with the existing prost crate.
- bench.MediaFrame: primary bytes body (1–10 KB) + repeated bytes chunks (2–6 × 0.2–2 KB) + map<string, bytes> attachments (0–4 × 50–500 B). Plumbed through gen-datasets, bench-buffa (owned + view + json), bench-prost, bench-prost-bytes.
- task bench-prost-bytes and task bench-cross-prost-bytes; task bench-cross now runs all three variants and produces benchmarks/results/prost-bytes.json.
- benchmarks/charts/generate.py gains a prost (bytes) series and a MediaFrame row; the README Performance section is updated.
- gen-datasets: random_bytes now uses rng.fill_bytes instead of an elementwise rng.random() loop.

Headline numbers (MiB/s, task bench-cross, Docker)

Binary decode
On the four non-bytes messages, prost (bytes) tracks default prost within noise: the substitution only affects proto bytes fields, and these messages have none. On MediaFrame it's ~2.4× default prost, confirming the feature lands when it has bytes fields to work with. buffa (view) stays well ahead of both prost variants because it borrows strings, messages, and map/repeated scaffolding from the input buffer too, not just bytes payloads.

perf stat evidence (MediaFrame decode, native)

Native run with perf stat -e task-clock,cycles,instructions,L1-dcache-load-misses,branch-misses around --measurement-time 6 --sample-size 10. Rates are per decoded message (non-hermetic, ±few-% run-to-run).

prost (bytes) closes the cache-miss gap (113 → 5 L1 misses/msg; allocator traffic from cloning bytes payloads is gone). It still runs ~2.8× the instructions of views because String::from_utf8 / HashMap::insert / Vec::push work is still happening for the non-bytes fields (frame_id, content_type, map keys, repeated scaffolding). Views skip that scaffolding: strings stay &'a str, attachments is a MapView, chunks is a RepeatedView.

Files touched
- benchmarks/prost-bytes/ + benchmarks/Dockerfile.bench-prost-bytes
- benchmarks/proto/bench_messages.proto: MediaFrame
- benchmarks/gen-datasets/src/main.rs: gen_media_frame, random_bytes → fill_bytes
- benchmarks/buffa/benches/protobuf.rs: MediaFrame owned + view + json
- benchmarks/prost/benches/protobuf.rs: MediaFrame decode + json
- benchmarks/charts/generate.py: prost (bytes) series, MediaFrame row
- Cargo.toml, Taskfile.yml, README.md, chart SVGs, tables.md

Notes for reviewers
- Encode benches for prost (bytes) are deliberately absent (same codepath as default prost; see the module docstring on benchmarks/prost-bytes/src/lib.rs).
- protobuf-v4 and Go don't have a MediaFrame row: those suites have a fixed set of messages we didn't extend for this experiment. bench-cross ignores missing rows cleanly.
- gen-datasets' module boilerplate gained #[allow(clippy::enum_variant_names, clippy::upper_case_acronyms, ...)] attributes: on current main the generated log_record::Severity enum variants (UNSPECIFIED/DEBUG/...) and the oneof Value* names trip those lints. The attributes match what's applied to generated code elsewhere in the workspace.