diff --git a/README.md b/README.md index aaa90b0..9bcf371 100644 --- a/README.md +++ b/README.md @@ -18,6 +18,7 @@ # News +- [May. 2026] 🚀 Added ONNX export support with a hybrid inference recipe (backbone + extracted classifier head), int8 quantization, and a FunASR-free runtime example. See [`scripts/onnx/`](./scripts/onnx/README.md). - [Oct. 2024] 🔧 We update the usage in the FunASR interface with source selection. "ms" or "modelscope" for China mainland users; "hf" or "huggingface" for other overseas users. **We recommend using FunASR interface for a smooth landing.** - [Jun. 2024] 🔧 We fix a bug in emotion2vec+. Please re-pull the latest code. - [May. 2024] 🔥 Speech emotion recognition foundation model: **emotion2vec+**, with 9-class emotions has been released on [Model Scope](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary) and [Hugging Face](https://huggingface.co/emotion2vec). Check out a series of emotion2vec+ (seed, base, large) models for SER with high performance **(We recommend this release instead of the Jan. 2024 release)**. diff --git a/scripts/onnx/README.md b/scripts/onnx/README.md new file mode 100644 index 0000000..fe43f71 --- /dev/null +++ b/scripts/onnx/README.md @@ -0,0 +1,150 @@ +# ONNX export workflow for emotion2vec + +End-to-end recipe for converting emotion2vec models (including the fine-tuned +`emotion2vec_plus_*` classifiers) to **ONNX**, running them with +`onnxruntime`, and validating the output against FunASR's `generate()`. + +## Background + +FunASR [PR #2359](https://github.com/modelscope/FunASR/pull/2359) (merged +January 2025, shipped in `funasr >= 1.2.3`) added a `model.export()` path +that traces the SSL backbone to ONNX. However: + +- The exported `forward` returns the **backbone output only** — shape + `[batch, sequence_length, embed_dim]` — i.e. the *features*, not the + 9-class emotion probabilities. 
+- For the fine-tuned classifier variants (`emotion2vec_plus_seed`, + `emotion2vec_plus_base`, `emotion2vec_plus_large`), the classification + head — a single `Linear(embed_dim, num_classes)` named `proj` — must be + **extracted separately from `model.pt`** and applied at inference time. +- The exported file is named `emotion2vec` (no extension) — rename to + `*.onnx` for clarity. + +This directory provides the missing scripts plus a corrected int8 +quantization workflow. + +## The hybrid inference recipe + +``` +raw 16 kHz Float32 waveform shape: [1, num_samples] + │ + ▼ ONNX backbone (in onnxruntime) +features shape: [1, T, embed_dim] + │ + ▼ mean-pool over the time axis +pooled shape: [embed_dim] + │ + ▼ proj head (extracted from model.pt): logits = W · pooled + b +logits shape: [num_classes] + │ + ▼ softmax +probabilities shape: [num_classes] +``` + +The waveform-normalization step (`(x - mean) / sqrt(var + 1e-5)`) is +**folded into the exported ONNX graph** by FunASR's `export_forward`, so +no JS/Python preprocessing of the audio is required — feed the raw +waveform straight in. + +## Files + +| File | Purpose | +|------|---------| +| `export_backbone.py` | Wraps `AutoModel(...).export(type='onnx', ...)`. | +| `extract_head.py` | Pulls `proj.weight`, `proj.bias`, and label names from `model.pt` + `tokens.txt` into a small JSON. | +| `quantize.py` | Dynamic int8 quantization, **with two refinements**: per-channel weight scales, and skipping activation×activation MatMul nodes (the attention's Q·Kᵀ and softmax·V, which quantize poorly). | +| `validate.py` | Runs FunASR `generate()` and the ONNX-hybrid path on the same audio and reports per-emotion drift. | +| `inference_example.py` | Minimal standalone runtime — WAV in, emotion out, **no FunASR or PyTorch at runtime**. | +| `requirements.txt` | Python dependencies. 
| + +## Usage + +Install dependencies (a fresh venv is recommended): + +```bash +pip install -r requirements.txt +``` + +### Step 1 — export the backbone + +```bash +python export_backbone.py --model iic/emotion2vec_plus_large +``` + +The exported file lands in the ModelScope cache directory, typically +`~/.cache/modelscope/hub/models//`. It is named `emotion2vec` +(no extension). Rename it: + +```bash +# Linux / macOS +mv ~/.cache/modelscope/hub/models/iic/emotion2vec_plus_large/emotion2vec \ + emotion2vec.onnx +``` + +### Step 2 — extract the classifier head + +```bash +python extract_head.py \ + --checkpoint ~/.cache/modelscope/hub/models/iic/emotion2vec_plus_large/model.pt \ + --tokens ~/.cache/modelscope/hub/models/iic/emotion2vec_plus_large/tokens.txt \ + --output emotion2vec_head.json +``` + +Produces a ~160 KB JSON: `{labels: [...], weight: [[...]], bias: [...]}`. + +### Step 3 (optional) — int8-quantize the ONNX + +```bash +python quantize.py --input emotion2vec.onnx --output emotion2vec.int8.onnx +``` + +Typical size reduction: ~3× (e.g. 649 MB → 195 MB). + +### Step 4 — validate numerically against FunASR + +```bash +python validate.py --model iic/emotion2vec_plus_large \ + --onnx emotion2vec.onnx \ + --head emotion2vec_head.json +``` + +On `emotion2vec_plus_large`, the fp32 ONNX matches FunASR `generate()` +within ~3e-05 (numerical fp32 noise). The int8 build (step 3) drifts on +the order of 1e-04 on confident inputs. + +### Step 5 — minimal runtime example + +```bash +python inference_example.py --onnx emotion2vec.onnx \ + --head emotion2vec_head.json \ + --wav some_clip_16k_mono.wav +``` + +This runs the entire hybrid pipeline using only `onnxruntime` + `numpy` + +the head JSON — no `funasr` or `torch` at runtime. Useful for porting +inference to other languages: the recipe (`session.run` → mean-pool → +linear → softmax) is a handful of lines. 
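The post-backbone part of the recipe in Step 5 (mean-pool → linear → softmax) can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the contents of `inference_example.py`: the function name `classify_from_features` is hypothetical, and the random arrays stand in for the real backbone output and the `weight` / `bias` entries of the head JSON. The sizes 50 and 8 are arbitrary; 9 matches the 9-class emotion2vec+ label set.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - x.max())
    return e / e.sum()


def classify_from_features(features: np.ndarray,
                           weight: np.ndarray,
                           bias: np.ndarray) -> np.ndarray:
    """Hypothetical helper: apply the extracted proj head to backbone output.

    features: [1, T, embed_dim]        (what the ONNX backbone returns)
    weight:   [num_classes, embed_dim] (head JSON "weight")
    bias:     [num_classes]            (head JSON "bias")
    """
    pooled = features[0].mean(axis=0)   # mean-pool over time -> [embed_dim]
    logits = weight @ pooled + bias     # linear head -> [num_classes]
    return softmax(logits)              # probabilities -> [num_classes]


# Stand-in data (assumption): in the real pipeline `features` would be the
# first output of the onnxruntime session run on the raw waveform.
rng = np.random.default_rng(0)
features = rng.standard_normal((1, 50, 8)).astype(np.float32)
weight = rng.standard_normal((9, 8)).astype(np.float32)
bias = np.zeros(9, dtype=np.float32)

probs = classify_from_features(features, weight, bias)
print(probs.shape, float(probs.sum()))
```

Because the head is a single matrix-vector product followed by a softmax, porting it to another language only requires reading the JSON and reproducing these three array operations.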
+ +## Notes + +- **`extract_features` vs full forward**: FunASR's `export_meta.py` wires + the export's `forward` to call `_original_forward(features_only=True)`, + which is equivalent to `extract_features`. The classifier `proj` is + applied *outside* this forward in `inference()`, which is why it's + absent from the ONNX. +- **`emotion2vec_base` (representation model)**: has no `proj` head. The + ONNX backbone is the whole story; use the features directly. + `extract_head.py` will exit with a clear error if `proj.weight` isn't + found in the checkpoint. +- **int8 quantization drift**: naive `quantize_dynamic` with + `op_types_to_quantize=['MatMul']` quantizes *every* MatMul including + the attention's activation×activation matmuls (Q·Kᵀ, softmax·V), which + causes heavy drift (worst-case ~0.17 of probability mass on uncertain + inputs). `quantize.py` excludes those nodes by inspecting which MatMul + inputs are graph initializers (i.e. weights). This mirrors what + `torch.quantization.quantize_dynamic(model, {nn.Linear})` does naturally: those + matmuls aren't `nn.Linear` modules, so torch leaves them alone. +- **Per-channel weights**: `per_channel=True` in `quantize_dynamic` + gives one scale per output channel rather than one per tensor; + standard practice for transformer weights and a meaningful drift + reduction. diff --git a/scripts/onnx/export_backbone.py b/scripts/onnx/export_backbone.py new file mode 100644 index 0000000..cf45bbd --- /dev/null +++ b/scripts/onnx/export_backbone.py @@ -0,0 +1,69 @@ +""" +Export the emotion2vec backbone to ONNX via FunASR's built-in exporter. + +This uses the model.export() path added in FunASR PR #2359 ("Make Emotion2vec +support onnx", merged January 2025, shipped in funasr >= 1.2.3). 
+ +The exported ONNX represents the SSL backbone only: + input float32 [batch, num_samples] (raw 16 kHz waveform) + output float32 [batch, T, embed_dim] (frame-level features) + +For fine-tuned classifier variants (emotion2vec_plus_*), the proj head is NOT +in the exported graph - extract it separately with extract_head.py. + +Usage: + python export_backbone.py --model iic/emotion2vec_plus_large + +The file is written to the ModelScope cache directory and is named +"emotion2vec" with no extension - rename to *.onnx for clarity. +""" + +import argparse +import os +import sys + +try: + sys.stdout.reconfigure(encoding="utf-8") +except Exception: + pass + +from funasr import AutoModel + + +def main() -> None: + ap = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + ap.add_argument("--model", default="iic/emotion2vec_plus_large", + help="ModelScope model id (default: iic/emotion2vec_plus_large)") + ap.add_argument("--opset", type=int, default=13, + help="ONNX opset version (default: 13, matches PR #2359)") + ap.add_argument("--quantize", action="store_true", + help="Apply FunASR's built-in quantization during export. 
" + "Not recommended - use quantize.py for a tuned int8 build.") + args = ap.parse_args() + + print(f"Loading {args.model} ...") + model = AutoModel(model=args.model, disable_update=True) + + print(f"Exporting to ONNX (opset={args.opset}, quantize={args.quantize}) ...") + result = model.export(type="onnx", quantize=args.quantize, opset_version=args.opset) + print(f"export() returned: {result}") + + paths = [result] if isinstance(result, (str, os.PathLike)) else list(result or []) + for p in paths: + p = str(p) + if os.path.isdir(p): + print(f"\nDIR {p}") + for f in sorted(os.listdir(p)): + fp = os.path.join(p, f) + if os.path.isfile(fp): + size = os.path.getsize(fp) / 1e6 + print(f" {f} ({size:.1f} MB)") + elif os.path.isfile(p): + print(f"\nFILE {p} ({os.path.getsize(p) / 1e6:.1f} MB)") + + print("\nNote: the exported ONNX is named 'emotion2vec' with no extension.") + print(" Rename it to 'emotion2vec.onnx' before passing to the next steps.") + + +if __name__ == "__main__": + main() diff --git a/scripts/onnx/extract_head.py b/scripts/onnx/extract_head.py new file mode 100644 index 0000000..8619c18 --- /dev/null +++ b/scripts/onnx/extract_head.py @@ -0,0 +1,105 @@ +""" +Extract the classification head (proj layer + labels) from a fine-tuned +emotion2vec_plus_* checkpoint into a JSON file. + +FunASR's model.export() exports the SSL backbone only. For the fine-tuned +classifier variants the model architecture is: + + backbone(waveform) -> features [T, embed_dim] + pooled = features.mean(time) # mean-pool over frames + logits = proj(pooled) # Linear(embed_dim, num_classes) + probs = softmax(logits) + +The proj layer (`Linear(embed_dim, num_classes)`) lives in the checkpoint +under the keys `proj.weight` and `proj.bias`. We dump those plus the label +names (read from tokens.txt) into a small JSON, so the classifier can be +applied at inference time in any language - it's just a matmul and a softmax. 
+ +Usage: + python extract_head.py \\ + --checkpoint ~/.cache/modelscope/hub/models/iic/emotion2vec_plus_large/model.pt \\ + --tokens ~/.cache/modelscope/hub/models/iic/emotion2vec_plus_large/tokens.txt \\ + --output emotion2vec_head.json + +For the SSL representation models (e.g. emotion2vec_base) there is no proj +head; the script exits with a clear error in that case. +""" + +import argparse +import json +import os +import sys + +try: + sys.stdout.reconfigure(encoding="utf-8") +except Exception: + pass + +import torch + + +def normalize_label(raw: str) -> str: + """Map a raw token to a clean english label. + + FunASR's tokens.txt entries look like "/english" (e.g. "生气/angry"), + plus a special "" token which we surface as "unknown". + """ + if not raw or raw.strip() == "": + return "unknown" + if "/" in raw: + return raw.split("/")[-1].strip().lower() + return raw.strip().lower() + + +def main() -> None: + ap = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + ap.add_argument("--checkpoint", required=True, help="path to model.pt") + ap.add_argument("--tokens", required=True, help="path to tokens.txt") + ap.add_argument("--output", default="emotion2vec_head.json", + help="output JSON path (default: emotion2vec_head.json)") + args = ap.parse_args() + + print(f"Loading checkpoint: {args.checkpoint}") + ck = torch.load(args.checkpoint, map_location="cpu") + # FunASR / fairseq checkpoints are dicts with a 'model' sub-dict; some are + # plain state_dicts. Handle both. + if isinstance(ck, dict) and "model" in ck: + sd = ck["model"] + else: + sd = ck + + if "proj.weight" not in sd or "proj.bias" not in sd: + sys.exit( + "ERROR: proj.weight / proj.bias not found in checkpoint.\n" + " This is likely an SSL/representation model (e.g. emotion2vec_base)\n" + " with no classification head. The ONNX backbone alone is the\n" + " complete inference graph for that variant - use its features\n" + " directly. 
This script is for fine-tuned classifier variants\n" + " (emotion2vec_plus_seed / _base / _large)." + ) + + W = sd["proj.weight"] + B = sd["proj.bias"] + print(f" proj.weight {tuple(W.shape)} proj.bias {tuple(B.shape)}") + + with open(args.tokens, encoding="utf-8") as f: + raw_labels = [line.strip() for line in f if line.strip()] + labels = [normalize_label(lab) for lab in raw_labels] + + if len(labels) != W.shape[0]: + sys.exit(f"ERROR: label count ({len(labels)}) != proj output dim ({W.shape[0]})") + + print(f" labels: {labels}") + + out = { + "labels": labels, + "weight": W.tolist(), # shape [num_classes, embed_dim] + "bias": B.tolist(), # shape [num_classes] + } + with open(args.output, "w") as f: + json.dump(out, f) + print(f"Wrote {args.output} ({os.path.getsize(args.output)} bytes)") + + +if __name__ == "__main__": + main() diff --git a/scripts/onnx/inference_example.py b/scripts/onnx/inference_example.py new file mode 100644 index 0000000..efff273 --- /dev/null +++ b/scripts/onnx/inference_example.py @@ -0,0 +1,85 @@ +""" +Minimal standalone inference: WAV file -> emotion label, using only the +exported ONNX backbone + extracted head JSON. No FunASR, no PyTorch at +runtime - just onnxruntime + numpy. + +This is what you'd ship to production. The whole "classifier" portion is +literally ~6 lines of numpy (pooling, matmul, softmax). + +The waveform-normalization step is folded into the exported ONNX graph by +FunASR's export_forward, so feeding raw 16 kHz Float32 samples is correct - +no preprocessing required. 
+ +Usage: + python inference_example.py --onnx emotion2vec.onnx \\ + --head emotion2vec_head.json \\ + --wav some_clip_16k_mono.wav +""" + +import argparse +import json +import sys +import wave + +try: + sys.stdout.reconfigure(encoding="utf-8") +except Exception: + pass + +import numpy as np +import onnxruntime as ort + + +def load_wav_16k_mono(path: str) -> np.ndarray: + """Load a 16-bit PCM mono 16 kHz WAV as Float32 in [-1, 1].""" + with wave.open(path, "rb") as w: + sr = w.getframerate() + if sr != 16000: + sys.exit(f"Expected 16 kHz WAV, got {sr} Hz") + if w.getnchannels() != 1: + sys.exit("Expected mono WAV") + if w.getsampwidth() != 2: + sys.exit("Expected 16-bit PCM WAV") + frames = w.readframes(w.getnframes()) + return np.frombuffer(frames, dtype=np.int16).astype(np.float32) / 32768.0 + + +def softmax(x: np.ndarray) -> np.ndarray: + e = np.exp(x - x.max()) + return e / e.sum() + + +def main() -> None: + ap = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + ap.add_argument("--onnx", required=True, help="path to emotion2vec ONNX") + ap.add_argument("--head", required=True, help="path to emotion2vec_head.json") + ap.add_argument("--wav", required=True, help="16 kHz mono 16-bit PCM WAV") + args = ap.parse_args() + + head = json.load(open(args.head)) + W = np.array(head["weight"], dtype=np.float32) # [num_classes, embed_dim] + B = np.array(head["bias"], dtype=np.float32) # [num_classes] + labels = head["labels"] + + audio = load_wav_16k_mono(args.wav) + print(f"Loaded {len(audio) / 16000:.2f} s of audio") + + sess = ort.InferenceSession(args.onnx, providers=["CPUExecutionProvider"]) + in_name = sess.get_inputs()[0].name + + # --- the entire hybrid runtime --- + feats = sess.run(None, {in_name: audio.reshape(1, -1)})[0] # [1, T, embed_dim] + pooled = feats[0].mean(axis=0) # [embed_dim] + probs = softmax(W @ pooled + B) # [num_classes] + # --------------------------------- + + order = np.argsort(-probs) + 
 print() + print(f" {'label':<12}{'score':>10}") + for i in order: + print(f" {labels[i]:<12}{probs[i]:>10.4f}") + print(f"\nTop emotion: {labels[order[0]]} (score {probs[order[0]]:.4f})") + + +if __name__ == "__main__": + main() diff --git a/scripts/onnx/quantize.py b/scripts/onnx/quantize.py new file mode 100644 index 0000000..ec922f1 --- /dev/null +++ b/scripts/onnx/quantize.py @@ -0,0 +1,106 @@ +""" +Optionally quantize the ONNX backbone to int8 (dynamic), with two +refinements that keep drift small. + +Two important refinements vs. a naive `op_types_to_quantize=['MatMul']`: + + 1. Skip activation x activation MatMul nodes - the attention's Q.K^T and + softmax.V. They have NO constant weight on either input. Quantizing them + to int8 is brutal: softmax outputs are mostly near-zero with a few sharp + peaks and don't survive 256 levels. torch.quantization.quantize_dynamic(model, + {nn.Linear}) naturally avoids them (those ops aren't nn.Linear modules); + the ONNX equivalent is excluding them by node name. We detect them by + "neither input is a graph initializer". + + 2. per_channel=True - one scale per output channel rather than one per + tensor. Standard practice for transformer weights, lower drift. + +Usage: + python quantize.py --input emotion2vec.onnx --output emotion2vec.int8.onnx + +Typical result: ~3x size reduction (e.g. 649 MB -> 195 MB). +""" + +import argparse +import os +import sys + +try: + sys.stdout.reconfigure(encoding="utf-8") +except Exception: + pass + +import onnx +from onnxruntime.quantization import quantize_dynamic, QuantType + + +def find_activation_matmuls(model_path: str): + """Return (exclude_names, weight_count, act_act_count, source_path_to_use). + + Scans the graph for MatMul nodes. A "weight matmul" has at least one input + that is a graph initializer (i.e. a constant weight tensor). An "activation + x activation matmul" has both inputs dynamic - we want to exclude those + from quantization. 
+ + If any such node lacks a name (some exporters skip them), assigns synthetic + names and writes a side-loaded copy of the model since quantize_dynamic + expects nodes_to_exclude to identify nodes by name. + """ + m = onnx.load(model_path) + inits = {init.name for init in m.graph.initializer} + exclude = [] + weight_mm = 0 + act_act_mm = 0 + modified = False + for i, node in enumerate(m.graph.node): + if node.op_type != "MatMul": + continue + if any(inp in inits for inp in node.input): + weight_mm += 1 + else: + act_act_mm += 1 + if not node.name: + node.name = f"matmul_{i}" + modified = True + exclude.append(node.name) + + src_to_use = model_path + if modified: + src_to_use = model_path + ".named.tmp.onnx" + onnx.save(m, src_to_use) + return exclude, weight_mm, act_act_mm, src_to_use + + +def main() -> None: + ap = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + ap.add_argument("--input", required=True, help="fp32 ONNX produced by step 1") + ap.add_argument("--output", required=True, help="output int8 ONNX path") + args = ap.parse_args() + + if not os.path.isfile(args.input): + sys.exit(f"input model not found: {args.input}") + + exclude, weight_mm, act_act_mm, src = find_activation_matmuls(args.input) + print(f"MatMul nodes - weight: {weight_mm} | activation x activation: {act_act_mm} (excluded)") + + print(f"Quantizing (dynamic int8, MatMul-weight only, per-channel)") + print(f" in : {args.input}") + print(f" out: {args.output}") + quantize_dynamic( + src, + args.output, + weight_type=QuantType.QInt8, + op_types_to_quantize=["MatMul"], + nodes_to_exclude=exclude, + per_channel=True, + ) + if src != args.input and os.path.exists(src): + os.remove(src) + + si = os.path.getsize(args.input) / 1e6 + so = os.path.getsize(args.output) / 1e6 + print(f"Done. 
{si:.0f} MB -> {so:.0f} MB ({so / si * 100:.0f}%)") + + +if __name__ == "__main__": + main() diff --git a/scripts/onnx/requirements.txt b/scripts/onnx/requirements.txt new file mode 100644 index 0000000..1b1b50f --- /dev/null +++ b/scripts/onnx/requirements.txt @@ -0,0 +1,5 @@ +funasr>=1.2.3 +torch>=2.0 +onnx>=1.14 +onnxruntime>=1.16 +numpy<2 diff --git a/scripts/onnx/validate.py b/scripts/onnx/validate.py new file mode 100644 index 0000000..f8968eb --- /dev/null +++ b/scripts/onnx/validate.py @@ -0,0 +1,108 @@ +""" +Validate the ONNX + extracted head against FunASR ground truth. + +Runs the same audio through two paths and reports per-emotion drift: + + A) FunASR AutoModel.generate() <- ground truth + B) ONNX backbone + mean-pool + proj + softmax <- hybrid recipe + +For the fp32 ONNX the worst-case drift should be at the fp rounding floor +(~3e-05 on emotion2vec_plus_large). For an int8 build (from quantize.py) +expect ~1e-04 on confident inputs, climbing on intentionally ambiguous +(random-noise) inputs - that's a property of dynamic quantization, not a +recipe bug. 
+ +Usage: + python validate.py --model iic/emotion2vec_plus_large \\ + --onnx emotion2vec.onnx \\ + --head emotion2vec_head.json +""" + +import argparse +import json +import os +import sys + +try: + sys.stdout.reconfigure(encoding="utf-8") +except Exception: + pass + +import numpy as np +import onnxruntime as ort +from funasr import AutoModel + + +def normalize_label(raw: str) -> str: + if not raw or raw.strip() == "": + return "unknown" + if "/" in raw: + return raw.split("/")[-1].strip().lower() + return raw.strip().lower() + + +def softmax(x: np.ndarray) -> np.ndarray: + e = np.exp(x - x.max()) + return e / e.sum() + + +def main() -> None: + ap = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + ap.add_argument("--model", default="iic/emotion2vec_plus_large") + ap.add_argument("--onnx", required=True) + ap.add_argument("--head", required=True) + ap.add_argument("--seconds", nargs="+", type=float, default=[2.0, 4.0, 6.5], + help="test clip lengths in seconds (default: 2.0 4.0 6.5)") + args = ap.parse_args() + + head = json.load(open(args.head)) + W = np.array(head["weight"], dtype=np.float32) + B = np.array(head["bias"], dtype=np.float32) + labels = head["labels"] + + print(f"Loading ONNX: {args.onnx}") + sess = ort.InferenceSession(args.onnx, providers=["CPUExecutionProvider"]) + in_name = sess.get_inputs()[0].name + + print(f"Loading FunASR {args.model} (ground truth) ...") + fa = AutoModel(model=args.model, disable_update=True) + + def path_onnx(audio: np.ndarray) -> np.ndarray: + feats = sess.run(None, {in_name: audio.reshape(1, -1).astype(np.float32)})[0] + pooled = feats[0].mean(axis=0) + return softmax(W @ pooled + B) + + def path_funasr(audio: np.ndarray) -> np.ndarray: + res = fa.generate(audio, granularity="utterance", extract_embedding=False)[0] + d = { + normalize_label(l): float(s) + for l, s in zip(res["labels"], res["scores"]) + } + return np.array([d.get(lab, 0.0) for lab in labels], 
dtype=np.float32) + + rng = np.random.default_rng(0) + worst = 0.0 + for secs in args.seconds: + audio = rng.standard_normal(int(secs * 16000)).astype(np.float32) + fa_s = path_funasr(audio) + on_s = path_onnx(audio) + diff = np.abs(fa_s - on_s) + worst = max(worst, float(diff.max())) + + print(f"\n=== {secs}s clip ===") + print(f" {'label':<11}{'funasr':>10}{'onnx':>10}{'diff':>12}") + for i, lab in enumerate(labels): + print(f" {lab:<11}{fa_s[i]:>10.5f}{on_s[i]:>10.5f}{diff[i]:>12.2e}") + + print(f"\n{'=' * 40}") + print(f"WORST absolute score difference: {worst:.2e}") + if worst < 1e-3: + print("PASS - recipe is exact within fp32 noise") + elif worst < 1e-2: + print("CLOSE - minor drift, typical of well-tuned int8") + else: + print("NOTABLE drift - inspect the per-label table above") + + +if __name__ == "__main__": + main()