feat: cross-runtime quantized comparison (pytorch-quantized vs onnx-quantized) by SAY-5 · Pull Request #1 · SAY-5/quant-explorer

SAY-5 · 2026-05-10T21:46:40Z

v4: cross-runtime ONNX comparison

Exports each PTQ config to ONNX (FP32 via torch.onnx.export, INT8 via onnxruntime.quantization) and benchmarks it under the ONNX Runtime CPU execution provider. Compares top-1 accuracy, p50 latency, and on-disk size against the PyTorch quantized runtime, and asserts top-1 parity within +/-1pp.
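As a rough illustration of how a p50 latency figure like the ones reported here could be collected, the sketch below times a callable and returns the median in milliseconds. The helper name `p50_latency_ms` and the warmup and iteration counts are assumptions for illustration, not the harness used in this PR.

```python
import statistics
import time
from typing import Callable


def p50_latency_ms(
    run_once: Callable[[], object],
    warmup: int = 10,
    iters: int = 100,
) -> float:
    """Return the median wall-clock latency of run_once in milliseconds.

    Hypothetical helper: warmup and iters are illustrative defaults.
    """
    # Warm-up calls are executed but not timed, so one-off setup costs
    # (allocation, caching) do not skew the median.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    return statistics.median(samples)
```

For the ONNX side, `run_once` could wrap a single `session.run(None, {input_name: batch})` call on an `onnxruntime.InferenceSession`; for the PyTorch side, a single forward pass under `torch.no_grad()`. Both wrappers are left out here so the sketch stays dependency-free.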

Numbers (full 10k CIFAR-10 test split, M-series CPU):

| config | pt_top1 | onnx_top1 | top1_pp | pt_p50_ms | onnx_p50_ms |
| --- | --- | --- | --- | --- | --- |
| fp32_baseline | 82.3% | 82.3% | 0.00 | 1.83 | 0.83 |
| dynamic_int8 | 82.3% | 82.3% | 0.00 | 1.14 | 0.38 |
| static_int8_per_tensor | 82.1% | 82.1% | -0.05 | 1.77 | 0.18 |
| static_int8_per_channel | 82.0% | 82.3% | +0.27 | 1.27 | 0.18 |

All four configs pass the +/-1pp structural-parity gate. See artifacts/results/cross_runtime.{json,md} and docs/cross_runtime.md.
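The gate can be sketched as a tolerance check on the per-config top-1 delta. This is a minimal sketch, not the PR's actual assertion: the function name `parity_failures` is hypothetical, and treating the tolerance as inclusive is an assumption. The deltas are the `top1_pp` values from the table above.

```python
from typing import Dict, List


def parity_failures(deltas_pp: Dict[str, float], tol_pp: float = 1.0) -> List[str]:
    """Return the configs whose |top-1 delta| exceeds tol_pp percentage points.

    Hypothetical helper; a delta exactly equal to tol_pp is assumed to pass.
    """
    return [name for name, delta in deltas_pp.items() if abs(delta) > tol_pp]


# top1_pp values copied from the results table (ONNX minus PyTorch, in pp).
deltas = {
    "fp32_baseline": 0.00,
    "dynamic_int8": 0.00,
    "static_int8_per_tensor": -0.05,
    "static_int8_per_channel": 0.27,
}

failures = parity_failures(deltas, tol_pp=1.0)
assert not failures, f"top-1 parity gate failed for: {failures}"
```

The same function would cover the looser CI smoke gate by passing `tol_pp=5.0`.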

SAY-5 and others added 5 commits May 10, 2026 14:45

feat: cross-runtime quantized comparison (pytorch-quantized vs onnx-quantized)

c33cefe

ci: enlarge cross-runtime smoke subset to 2000 samples for stable parity gate

c4925e2

ci: loosen cross-runtime smoke gate to +/-5pp (regression canary) and document split from publishable +/-1pp

b560208

ci: quote step name containing a colon to fix YAML parse

39e54e2

chore: remove em-dashes from comments

17ffca4

SAY-5 wants to merge 5 commits into main from SAY-5/quant-explorer
