Improve msgspec benchmarks #24

Merged
leo-gan merged 2 commits into leo-gan:master from ofek:msgspec-improvements
Apr 30, 2026

Conversation

@ofek ofek commented Apr 29, 2026

Summary

This updates the Python msgspec benchmark to use the API patterns recommended for high-performance msgspec applications, and adds a separate native MessagePack benchmark using msgspec.msgpack.

The existing msgspec benchmark encoded the shared stdlib dataclass fixtures directly. While msgspec supports dataclasses, its documented and most efficient modeling API is msgspec.Struct. This PR changes the benchmark to convert the canonical dataclass fixtures to generated Struct models before timed repetitions, then measures serialization/deserialization of those Struct instances.

Changes

  • Generate msgspec Struct types dynamically from the canonical dataclasses in benchmark.data.models.
  • Use array_like=True for the generated Structs.
  • Pre-build and reuse msgspec.json.Encoder / typed msgspec.json.Decoder instances per benchmark data type.
  • Use encode_into with a reusable bytearray for stream serialization.
  • Add a new msgspec-msgpack serializer using msgspec.msgpack.Encoder / typed msgspec.msgpack.Decoder.
  • Add serializer lifecycle hooks so serializers can prepare schemas/codecs and serializer-native fixture objects outside the timed loop.
  • Keep correctness comparison against the original canonical dataclass fixture, rather than the serializer-native prepared object.
  • Update Python serializer docs and summary counts to include msgspec-msgpack.

Why

The goal is to benchmark idiomatic, high-performance msgspec usage rather than compatibility-path dataclass handling.

Moving fixture conversion outside the timed loop models an application that already uses msgspec Struct types in its data model. The timed region still measures the important runtime costs: encoding, decoding, payload size, allocation, and semantic roundtrip correctness.

Generating Structs from the existing dataclasses avoids maintaining parallel hand-written msgspec models, which would be easy to let drift from the canonical benchmark fixtures.

About array_like=True

array_like=True encodes Struct instances as positional arrays instead of maps keyed by field name. This reduces payload size and avoids repeatedly writing field names for every object.

This is not intended to artificially boost msgspec at the expense of idiomatic usage. It is a documented msgspec option for schema-oriented workloads where both sides share the model definition. That matches other schema-based serializers already in the suite:

  • protobuf encodes numbered field tags rather than full field names.
  • Avro uses a shared schema for schemaless binary encoding.

For a benchmark suite that includes schema-aware formats, array-like Structs are a fair representation of msgspec's compact schema-oriented mode.

Limitations

ObjectGraph remains unsupported for msgspec and msgspec-msgpack. The fixture contains circular references/object identity cycles, which JSON and MessagePack formats do not represent.
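The same constraint can be demonstrated with the stdlib json module, which rejects self-referential data outright:

```python
import json

# A minimal object identity cycle, similar in spirit to the ObjectGraph fixture.
node = {"name": "root", "children": []}
node["children"].append(node)  # node now contains itself

try:
    json.dumps(node)
    message = None
except ValueError as exc:
    message = str(exc)

assert message is not None and "Circular reference" in message
```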

Benchmark Results

I compared this branch against the previous implementation using:

```shell
python -m benchmark.runner 100 msgspec
```

The first repetition was excluded as warmup, matching the benchmark runner's reporting logic.

Overall:

  • Current msgspec JSON: 1.11x geomean faster for serialize+deserialize, average payload size 0.74x of the previous implementation.
  • New msgspec-msgpack: 1.24x geomean faster than the previous JSON baseline, average payload size 0.53x of the previous implementation.
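For reference, the 1.11x figure is the geometric mean of the per-row JSON speedups from the table below, computable with the stdlib:

```python
from statistics import geometric_mean

# JSON speedup column from the results table (bytes and stream rows).
json_speedups = [1.26, 1.23, 1.06, 0.94, 1.02, 1.01,
                 1.36, 1.14, 1.01, 0.98, 1.20, 1.21]
overall = geometric_mean(json_speedups)
assert round(overall, 2) == 1.11
```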
| Data / mode | Old msgspec ops/s | Current JSON ops/s | JSON speedup | JSON size (new vs old) | msgspec-msgpack ops/s | msgpack speedup | msgpack size |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EDI_835 bytes | 23,102 | 29,109 | 1.26x | 591 vs 1,730 | 34,055 | 1.47x | 644 |
| EDI_835 stream | 20,893 | 25,701 | 1.23x | 591 vs 1,730 | 31,008 | 1.48x | 644 |
| Integer bytes | 385,899 | 410,281 | 1.06x | 10 vs 10 | 399,707 | 1.04x | 5 |
| Integer stream | 220,591 | 206,503 | 0.94x | 10 vs 10 | 203,534 | 0.92x | 5 |
| Person bytes | 37,319 | 37,909 | 1.02x | 466 vs 917 | 42,494 | 1.14x | 363 |
| Person stream | 32,325 | 32,735 | 1.01x | 466 vs 917 | 34,379 | 1.06x | 363 |
| SimpleObject bytes | 156,629 | 212,551 | 1.36x | 67 vs 102 | 234,831 | 1.50x | 56 |
| SimpleObject stream | 129,110 | 147,000 | 1.14x | 67 vs 102 | 124,659 | 0.97x | 56 |
| StringArray bytes | 21,090 | 21,205 | 1.01x | 1,893 vs 1,901 | 22,454 | 1.06x | 1,694 |
| StringArray stream | 19,459 | 19,041 | 0.98x | 1,893 vs 1,901 | 19,929 | 1.02x | 1,694 |
| Telemetry bytes | 33,807 | 40,416 | 1.20x | 2,070 vs 2,185 | 59,339 | 1.76x | 1,024 |
| Telemetry stream | 33,698 | 40,845 | 1.21x | 2,070 vs 2,185 | 61,395 | 1.82x | 1,024 |

Validation

Ran:

```shell
uv run --no-sync python -m compileall src
uv run --no-sync python -m benchmark.runner 5 msgspec
uv run --no-sync python -m benchmark.runner 1
uv run --no-sync python -m benchmark.runner 100 msgspec
git diff --check
```

Disclaimer

I coaxed Codex 5.5 xhigh into achieving what I wanted throughout the day when I had some downtime.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces the msgspec-msgpack serializer and refactors the existing msgspec implementation to utilize msgspec.Struct models. By generating these models from canonical dataclasses and performing data conversion outside the timed benchmark loop, the suite now better reflects high-performance usage patterns. Additionally, the benchmark runner was updated with prepare and prepare_data hooks to support this pre-timing setup. The review feedback focuses on improving the robustness of stream-based serialization by correctly handling buffer offsets and lengths when using encode_into and getbuffer().

Comment on lines +157 to +158
```python
self._encoder.encode_into(obj, self._buffer)
stream.write(self._buffer)
```

Severity: medium

While msgspec.json.Encoder.encode_into resizes a bytearray to fit the encoded data when offset=0, relying on this behavior for serialization to a stream is correct but subtle. If msgspec were to change its implementation to only grow the buffer (similar to how many other libraries handle bytearray buffers), this would result in trailing garbage being written to the stream. A more robust approach would be to use the return value of encode_into (which returns the number of bytes written) to slice the buffer or use a memoryview.

Suggested change:

```diff
-self._encoder.encode_into(obj, self._buffer)
-stream.write(self._buffer)
+n = self._encoder.encode_into(obj, self._buffer)
+stream.write(memoryview(self._buffer)[:n])
```

@ofek ofek (author) replied:

It was locally verified that both msgspec.json.Encoder.encode_into and msgspec.msgpack.Encoder.encode_into return None; the docstring says the bytearray is truncated to the end of the serialized message. So using n = encode_into(...) would break.

Comment thread python/src/benchmark/serializers/json_msgspec.py

@leo-gan leo-gan left a comment


LGTM
I'll try to use the new base Serializer prepare() and prepare_data() methods on other serializers. Thanks for a good idea!!

@leo-gan leo-gan merged commit 2a1a7f2 into leo-gan:master Apr 30, 2026
4 checks passed
@ofek ofek deleted the msgspec-improvements branch April 30, 2026 02:35