
feat: cross-runtime quantized comparison (pytorch-quantized vs onnx-quantized) #1

Open: SAY-5 wants to merge 5 commits into main from SAY-5/quant-explorer

SAY-5 (Owner) commented May 10, 2026

v4: cross-runtime ONNX comparison

Exports each PTQ config to ONNX (FP32 via torch.onnx.export, INT8 via onnxruntime.quantization) and benchmarks it under the ONNX Runtime CPU execution provider. Compares top-1 accuracy, p50 latency, and on-disk size against the PyTorch quantized runtime, and asserts top-1 parity within +/-1pp for every config.
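For readers who want to reproduce the p50 latency columns, here is a minimal sketch of a median-latency timer. It is not taken from this PR's benchmark code: the helper name `p50_latency_ms` and the `warmup` and `iters` defaults are assumptions chosen for illustration.

```python
import statistics
import time
from typing import Callable


def p50_latency_ms(run_once: Callable[[], object],
                   warmup: int = 10,
                   iters: int = 100) -> float:
    """Return the median wall-clock latency of run_once, in milliseconds.

    Hypothetical helper: the warmup/iters defaults are illustrative and
    are not the values used to produce the numbers in this PR.
    """
    # Warm-up calls are executed but not timed, so one-off setup costs
    # (lazy initialization, cache fills) do not skew the median.
    for _ in range(warmup):
        run_once()

    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return statistics.median(samples_ms)
```

The same helper can wrap either runtime by passing a zero-argument callable, for example a lambda that runs one forward pass of the PyTorch model or one `session.run(...)` call on an ONNX Runtime session, so both sides are timed the same way.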

Numbers (full 10k CIFAR-10 test split, M-series CPU):

| config | pt_top1 | onnx_top1 | top1_pp | pt_p50_ms | onnx_p50_ms |
| --- | --- | --- | --- | --- | --- |
| fp32_baseline | 82.3% | 82.3% | 0.00 | 1.83 | 0.83 |
| dynamic_int8 | 82.3% | 82.3% | 0.00 | 1.14 | 0.38 |
| static_int8_per_tensor | 82.1% | 82.1% | -0.05 | 1.77 | 0.18 |
| static_int8_per_channel | 82.0% | 82.3% | +0.27 | 1.27 | 0.18 |

All four configs pass the +/-1pp structural-parity gate. See artifacts/results/cross_runtime.{json,md} and docs/cross_runtime.md.
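The parity gate can be sketched as a simple threshold check over the top1_pp column. This is an illustration, not the assertion code from this PR: the `parity_failures` function and the dictionary layout are assumptions, and the deltas are copied from the table above.

```python
# top1_pp values copied from the results table in this PR description.
TOP1_DELTA_PP = {
    "fp32_baseline": 0.00,
    "dynamic_int8": 0.00,
    "static_int8_per_tensor": -0.05,
    "static_int8_per_channel": 0.27,
}

# Gate width stated in the PR: +/-1 percentage point.
TOLERANCE_PP = 1.0


def parity_failures(deltas_pp: dict, tolerance_pp: float = TOLERANCE_PP) -> list:
    """Return the configs whose top-1 delta falls outside the tolerance.

    Hypothetical helper; a delta exactly at the boundary counts as passing.
    """
    return [name for name, delta in deltas_pp.items()
            if abs(delta) > tolerance_pp]


failures = parity_failures(TOP1_DELTA_PP)
assert not failures, f"top-1 parity gate failed for: {failures}"
```

Returning the list of failing config names instead of a single boolean keeps the assertion message specific when a config drifts outside the gate.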
