A Dockerized benchmarking suite evaluating 10 Python serializers across 7 realistic data structures, designed to match the methodology and output format of the companion .NET (C#) benchmark.
| Group | Serializers | Notes |
|---|---|---|
| JSON | orjson, msgspec, rapidjson | Text-based, schema-optional. |
| Binary | msgpack, msgspec-msgpack, cbor2 | Compact binary, schema-optional. |
| Schema | protobuf, avro | Requires `.proto`/`.avsc` schemas and code generation. |
| Python-native | pickle, cloudpickle | Built-in serialization; handles arbitrary objects. |
All 7 test data types mirror the C# benchmark to enable cross-language comparisons:
| Test Class | Purpose & Stress Points |
|---|---|
| Person | Nested objects, enums, strings — the "gold standard" general-use POCO. |
| Integer | Primitive throughput ceiling. |
| Telemetry | Numeric arrays and high-frequency data; tests binary format efficiency. |
| SimpleObject | Minimal overhead baseline. |
| StringArray | Array of 100 strings; tests memory allocation and string encoding. |
| EDI_835 | Deeply nested health-care claim document; tests recursion depth. |
| ObjectGraph | Circular references; only pickle and cloudpickle are expected to pass. |
- `bytes` mode (analogous to C# `string`): the serializer produces/consumes `bytes` directly.
- `stream` mode (analogous to C# `Stream`): the serializer writes to/reads from `io.BytesIO`.
Every serializer is tested in both modes, matching the C# coverage. For libraries without a native stream API, the benchmark adapts by writing the `bytes` output to a `BytesIO` buffer.
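For illustration, a minimal sketch of that adaptation (the function names here are hypothetical, not the suite's actual API):

```python
import io

def serialize_to_stream(serialize_bytes, obj, stream: io.BytesIO) -> None:
    # Adapt a bytes-only serializer to stream mode by writing its
    # output into the provided BytesIO buffer.
    stream.write(serialize_bytes(obj))

def deserialize_from_stream(deserialize_bytes, stream: io.BytesIO):
    # Rewind and hand the full buffer back to the bytes-mode decoder.
    stream.seek(0)
    return deserialize_bytes(stream.read())
```

For example, `serialize_to_stream(orjson.dumps, person, io.BytesIO())` gives a bytes-only library such as orjson a stream-mode entry point.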
| Metric | How It Is Measured | Rationale |
|---|---|---|
| Throughput (ops/sec) | `1_000_000_000 / nanoseconds` for serialize, deserialize, and combined. | Matches the C# tick-based ops/sec. |
| Latency | Total elapsed nanoseconds per repetition (warm-up excluded when `repetitions > 1`). | Equivalent to the C# model; per-call p50/p99 omitted to avoid instrumentation overhead. |
| Memory Allocation | `tracemalloc` peak allocated bytes during each repetition. | Standard Python heap profiler; C-extension allocations (orjson, msgpack) may be under-counted (see below). |
| Output Size | `len(bytes)` in bytes mode; `BytesIO.tell()` in stream mode. | Directly comparable to the C# Size column. |
| Type Fidelity | Semantic roundtrip equality score (1.0 = perfect, 0.0 = failure). | Relaxes strict type identity: `datetime` vs ISO string, `tuple` vs `list`, etc. are considered equal if they represent the same logical value. |
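As a concrete illustration of the throughput formula, a sketch of timing one repetition (the helper name is hypothetical):

```python
import time

def time_one_repetition(fn) -> tuple[int, float]:
    # Measure elapsed wall-clock nanoseconds for one call, then derive
    # ops/sec exactly as in the table: 1_000_000_000 / nanoseconds.
    start = time.perf_counter_ns()
    fn()
    elapsed_ns = time.perf_counter_ns() - start
    return elapsed_ns, 1_000_000_000 / max(elapsed_ns, 1)
```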
- Format Parity: The C# suite writes a specific CSV schema (`StringOrStream,TestDataName,Repetitions,RepetitionIndex,SerializerName,TimeSer,TimeDeser,Size,TimeSerAndDeser,OpPerSecSer,OpPerSecDeser,OpPerSecSerAndDeser`). A custom runner guarantees an identical column layout plus the two Python-specific extensions (`MemoryPeakBytes`, `FidelityScore`).
- Warm-up Logic: C# excludes repetition index `0` when `repetitions > 1`. Replicating this exactly in a generic framework is fragile.
- Multi-metric Integration: pytest-benchmark is built around latency only. Adding `tracemalloc` peaks and semantic comparers inside a pytest fixture adds measurement noise and fixture overhead.
- Stream vs Bytes Dual Mode: pytest-benchmark's `benchmark()` fixture expects a single callable; orchestrating two distinct APIs (bytes vs stream) with shared error tracking is cleaner in a standalone loop.
Python has no direct equivalent of .NET's GC allocation counters. `tracemalloc` is the most reliable built-in tool for tracking Python heap allocations during a code block. It is well documented that C-extension allocations (e.g., inside orjson's or msgpack's C code) are not captured. This limitation is explicitly noted in the results to avoid misleading comparisons.
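A minimal sketch of how such a peak can be captured with the standard library (the suite's actual instrumentation may differ):

```python
import tracemalloc

def peak_alloc_bytes(fn) -> int:
    # Peak Python-heap bytes traced while fn runs. Allocations made
    # inside C extensions (orjson, msgpack) bypass Python's allocator
    # and will not appear here.
    tracemalloc.start()
    try:
        fn()
        _current, peak = tracemalloc.get_traced_memory()
        return peak
    finally:
        tracemalloc.stop()
```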
Python serializers vary wildly in type fidelity:
- JSON stores `datetime` as ISO strings.
- msgpack converts `tuple` → `list`.
- Schema serializers return generated classes, not the original dataclass.
A semantic comparer treats two values as equal if they represent the same logical data, even if Python types differ. This prevents false failures for well-behaved serializers while still catching genuine data loss.
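A simplified sketch of the idea (not the suite's actual comparer):

```python
from dataclasses import asdict, is_dataclass
from datetime import datetime

def semantically_equal(a, b) -> bool:
    # datetime vs its ISO-8601 string form (how JSON round-trips it).
    if isinstance(a, str) and isinstance(b, datetime):
        a, b = b, a
    if isinstance(a, datetime) and isinstance(b, str):
        try:
            return a == datetime.fromisoformat(b)
        except ValueError:
            return False
    # tuple vs list (msgpack round-trips tuples as lists).
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return len(a) == len(b) and all(
            semantically_equal(x, y) for x, y in zip(a, b)
        )
    # Dataclass instances compare field-by-field as dicts.
    if is_dataclass(a) and not isinstance(a, type):
        a = asdict(a)
    if is_dataclass(b) and not isinstance(b, type):
        b = asdict(b)
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(
            semantically_equal(a[k], b[k]) for k in a
        )
    return a == b
```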
Ensure Docker is installed.
```bash
cd python
./scripts/run-benchmarks.sh smoke
```

| Mode | Command | Description |
|---|---|---|
| Smoke | `./scripts/run-benchmarks.sh smoke` | 1 repetition of pickle on Person. Verifies the image and environment. |
| Verify All | `./scripts/run-benchmarks.sh all-single` | 1 repetition of all serializers on all data. Checks for compatibility issues. |
| Full Run | `./scripts/run-benchmarks.sh full` | 100 repetitions of all serializers. |
| Custom | `./scripts/run-benchmarks.sh custom 50 "json" "Person"` | Custom repetitions and filters. |
```bash
docker logs -f $(docker ps -lq)
```

Logs are saved to `logs/python/`:

- `benchmark-log.csv`: Raw per-repetition metrics.
- `benchmark-errors.csv`: Failure details.
Requires Python 3.12+ and uv.
```bash
cd python
uv sync
uv run python -m benchmark.runner 100
```

Usage:

```bash
python -m benchmark.runner <repetitions> [serializerFilter] [dataFilter]
```
Examples:
```bash
# 100 reps, all serializers, all data
uv run python -m benchmark.runner 100

# 10 reps, only JSON serializers, only Person
uv run python -m benchmark.runner 10 "json" "Person"

# 1 rep, only binary serializers
uv run python -m benchmark.runner 1 "msgpack" ""
```

To add a new serializer:

- Create a file in `src/benchmark/serializers/` implementing `Serializer` from `src/benchmark/serializers/base.py` (see the sketch after this list).
- Register it in `src/benchmark/runner.py` in `ALL_SERIALIZERS`.
- Add the dependency to `pyproject.toml` and run `uv sync`.
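A hedged sketch of the first step, assuming the `Serializer` base class exposes `serialize`/`deserialize` methods (check `base.py` for the real interface; the ujson example is invented for illustration):

```python
# src/benchmark/serializers/ujson_serializer.py -- illustrative only;
# the Serializer method names are assumed, not confirmed.
import ujson  # new dependency added to pyproject.toml

from benchmark.serializers.base import Serializer

class UjsonSerializer(Serializer):
    name = "ujson"

    def serialize(self, obj) -> bytes:
        # ujson.dumps returns str; encode to bytes for the benchmark.
        return ujson.dumps(obj).encode("utf-8")

    def deserialize(self, data: bytes):
        return ujson.loads(data)
```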
To add new test data:

- Define a `@dataclass` in `src/benchmark/data/models.py` (see the sketch after this list).
- Add a generator in `src/benchmark/data/generator.py`.
- Register the `(name, class)` pair in `src/benchmark/runner.py` in `ALL_TEST_DATA`.
- Add conversion logic in the schema serializers (`protobuf.py`, `avro.py`) if applicable.
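A hedged sketch of the first two steps (the class and field names are invented for illustration):

```python
# src/benchmark/data/models.py -- illustrative addition
from dataclasses import dataclass, field

@dataclass
class SensorReading:
    sensor_id: str
    values: list[float] = field(default_factory=list)

# src/benchmark/data/generator.py -- illustrative generator
def generate_sensor_reading() -> SensorReading:
    return SensorReading(sensor_id="sensor-001", values=[1.0, 2.5, 3.7])

# Step 3 would then register ("SensorReading", SensorReading)
# in ALL_TEST_DATA in src/benchmark/runner.py.
```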
The benchmark outputs `logs/python/benchmark-log.csv` with the following columns:

```
StringOrStream, TestDataName, Repetitions, RepetitionIndex,
SerializerName, TimeSer, TimeDeser, Size,
TimeSerAndDeser, OpPerSecSer, OpPerSecDeser, OpPerSecSerAndDeser,
MemoryPeakBytes, FidelityScore
```
Aggregate results are printed to stdout after each run in a format aligned with the C# console output.
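For further analysis, the CSV loads cleanly into pandas (pandas is not a suite dependency; this is just an example):

```python
import pandas as pd

df = pd.read_csv("logs/python/benchmark-log.csv")
# Drop the warm-up repetition, mirroring the C# aggregation rule.
warm_up = (df["Repetitions"] > 1) & (df["RepetitionIndex"] == 0)
summary = (
    df[~warm_up]
    .groupby(["SerializerName", "TestDataName", "StringOrStream"])
    [["OpPerSecSer", "OpPerSecDeser", "Size", "MemoryPeakBytes"]]
    .mean()
)
print(summary)
```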
Authored by Leonid Ganeline