diff --git a/README.md b/README.md
index aaa90b0..9bcf371 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,7 @@
 </div>
 
 # News
+- [May. 2026] 🚀 Added ONNX export support with a hybrid inference recipe (backbone + extracted classifier head), int8 quantization, and a FunASR-free runtime example. See [`scripts/onnx/`](./scripts/onnx/README.md).
 - [Oct. 2024] 🔧 We update the usage in the FunASR interface with source selection. "ms" or "modelscope" for China mainland users; "hf" or "huggingface" for other overseas users. **We recommend using FunASR interface for a smooth landing.**
 - [Jun. 2024] 🔧 We fix a bug in emotion2vec+. Please re-pull the latest code. 
 - [May. 2024] 🔥 Speech emotion recognition foundation model: **emotion2vec+**, with 9-class emotions has been released on [Model Scope](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary) and [Hugging Face](https://huggingface.co/emotion2vec). Check out a series of emotion2vec+ (seed, base, large) models for SER with high performance **(We recommend this release instead of the Jan. 2024 release)**. 
diff --git a/scripts/onnx/README.md b/scripts/onnx/README.md
new file mode 100644
index 0000000..fe43f71
--- /dev/null
+++ b/scripts/onnx/README.md
@@ -0,0 +1,150 @@
+# ONNX export workflow for emotion2vec
+
+End-to-end recipe for converting emotion2vec models (including the fine-tuned
+`emotion2vec_plus_*` classifiers) to **ONNX**, running them with
+`onnxruntime`, and validating the output against FunASR's `generate()`.
+
+## Background
+
+FunASR [PR #2359](https://github.com/modelscope/FunASR/pull/2359) (merged
+January 2025, shipped in `funasr >= 1.2.3`) added a `model.export()` path
+that traces the SSL backbone to ONNX. However:
+
+- The exported `forward` returns the **backbone output only** — shape
+  `[batch, sequence_length, embed_dim]` — i.e. the *features*, not the
+  9-class emotion probabilities.
+- For the fine-tuned classifier variants (`emotion2vec_plus_seed`,
+  `emotion2vec_plus_base`, `emotion2vec_plus_large`), the classification
+  head — a single `Linear(embed_dim, num_classes)` named `proj` — must be
+  **extracted separately from `model.pt`** and applied at inference time.
+- The exported file is named `emotion2vec` (no extension) — rename to
+  `*.onnx` for clarity.
+
+This directory provides the missing scripts plus a corrected int8
+quantization workflow.
+
+## The hybrid inference recipe
+
+```
+raw 16 kHz Float32 waveform   shape: [1, num_samples]
+            │
+            ▼  ONNX backbone (in onnxruntime)
+features                       shape: [1, T, embed_dim]
+            │
+            ▼  mean-pool over the time axis
+pooled                         shape: [embed_dim]
+            │
+            ▼  proj head (extracted from model.pt):  logits = W · pooled + b
+logits                         shape: [num_classes]
+            │
+            ▼  softmax
+probabilities                  shape: [num_classes]
+```
+
+The waveform-normalization step (`(x - mean) / sqrt(var + 1e-5)`) is
+**folded into the exported ONNX graph** by FunASR's `export_forward`, so
+the caller does no audio normalization of its own: feed the raw 16 kHz
+mono Float32 waveform straight in.
+
+## Files
+
+| File | Purpose |
+|------|---------|
+| `export_backbone.py`   | Wraps `AutoModel(...).export(type='onnx', ...)`. |
+| `extract_head.py`      | Pulls `proj.weight`, `proj.bias`, and label names from `model.pt` + `tokens.txt` into a small JSON. |
+| `quantize.py`          | Dynamic int8 quantization, **with two refinements**: per-channel weight scales, and skipping activation×activation MatMul nodes (the attention's Q·Kᵀ and softmax·V, which quantize poorly). |
+| `validate.py`          | Runs FunASR `generate()` and the ONNX-hybrid path on the same seeded random-noise clips and reports per-emotion drift. |
+| `inference_example.py` | Minimal standalone runtime — WAV in, emotion out, **no FunASR or PyTorch at runtime**. |
+| `requirements.txt`     | Python dependencies. |
+
+## Usage
+
+Install dependencies (a fresh venv is recommended):
+
+```bash
+pip install -r requirements.txt
+```
+
+### Step 1 — export the backbone
+
+```bash
+python export_backbone.py --model iic/emotion2vec_plus_large
+```
+
+The exported file lands in the ModelScope cache directory, typically
+`~/.cache/modelscope/hub/models/<model_id>/`. It is named `emotion2vec`
+(no extension). Rename it:
+
+```bash
+# Linux / macOS
+mv ~/.cache/modelscope/hub/models/iic/emotion2vec_plus_large/emotion2vec \
+   emotion2vec.onnx
+```
+
+### Step 2 — extract the classifier head
+
+```bash
+python extract_head.py \
+  --checkpoint ~/.cache/modelscope/hub/models/iic/emotion2vec_plus_large/model.pt \
+  --tokens     ~/.cache/modelscope/hub/models/iic/emotion2vec_plus_large/tokens.txt \
+  --output     emotion2vec_head.json
+```
+
+Produces a ~160 KB JSON: `{"labels": [...], "weight": [[...]], "bias": [...]}`.
+
+### Step 3 (optional) — int8-quantize the ONNX
+
+```bash
+python quantize.py --input emotion2vec.onnx --output emotion2vec.int8.onnx
+```
+
+Typical size reduction: ~3× (e.g. 649 MB → 195 MB).
+
+### Step 4 — validate numerically against FunASR
+
+```bash
+python validate.py --model iic/emotion2vec_plus_large \
+                   --onnx  emotion2vec.onnx \
+                   --head  emotion2vec_head.json
+```
+
+On `emotion2vec_plus_large`, the fp32 ONNX matches FunASR `generate()`
+within ~3e-05 (numerical fp32 noise). The int8 build (step 3) drifts on
+the order of 1e-04 on confident inputs.
+
+### Step 5 — minimal runtime example
+
+```bash
+python inference_example.py --onnx emotion2vec.onnx \
+                            --head emotion2vec_head.json \
+                            --wav  some_clip_16k_mono.wav
+```
+
+This runs the entire hybrid pipeline using only `onnxruntime` + `numpy` +
+the head JSON — no `funasr` or `torch` at runtime. Useful for porting
+inference to other languages: the recipe (`session.run` → mean-pool →
+linear → softmax) is a handful of lines.
+
+## Notes
+
+- **`extract_features` vs full forward** — FunASR's `export_meta.py` wires
+  the export's `forward` to call `_original_forward(features_only=True)`,
+  which is equivalent to `extract_features`. The classifier `proj` is
+  applied *outside* this forward in `inference()`, which is why it's
+  absent from the ONNX.
+- **`emotion2vec_base` (representation model)** — has no `proj` head. The
+  ONNX backbone is the whole story; use the features directly.
+  `extract_head.py` will exit with a clear error if `proj.weight` isn't
+  found in the checkpoint.
+- **int8 quantization drift** — naive `quantize_dynamic` with
+  `op_types_to_quantize=['MatMul']` quantizes *every* MatMul including
+  the attention's activation×activation matmuls (Q·Kᵀ, softmax·V), so
+  the output drifts heavily (worst case ~0.17 of probability mass on
+  uncertain inputs). `quantize.py` excludes those nodes by checking
+  whether any MatMul input is a graph initializer (i.e. a weight). This
+  mirrors `torch.quantization.quantize_dynamic(model, {nn.Linear})`:
+  those matmuls aren't `nn.Linear` modules, so torch leaves them alone.
+- **Per-channel weights** — `per_channel=True` in `quantize_dynamic`
+  gives one scale per output channel rather than one per tensor;
+  standard practice for transformer weights and a meaningful drift
+  reduction.
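The per-channel note in the README above can be illustrated with a toy example. This is a minimal pure-Python sketch, assuming a simplified symmetric int8 scheme (`scale = max|w| / 127`); it is not onnxruntime's exact arithmetic, and the weight values and helper names are made up for illustration.

```python
def quantize_symmetric(values, scale):
    """Round values to int8 levels with the given scale, then dequantize."""
    restored = []
    for v in values:
        q = max(-127, min(127, round(v / scale)))
        restored.append(q * scale)
    return restored


def max_abs_error(original, restored):
    """Largest absolute difference between two equal-length sequences."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Hypothetical weight matrix: row 0 has large weights, row 1 has small ones.
weights = [
    [4.0, -3.2, 2.5, -1.0],
    [0.04, -0.032, 0.025, -0.01],
]

# Per-tensor: one scale taken from the largest magnitude in the whole matrix.
tensor_scale = max(abs(v) for row in weights for v in row) / 127
per_tensor_err = [
    max_abs_error(row, quantize_symmetric(row, tensor_scale)) for row in weights
]

# Per-channel: one scale per output row.
per_channel_err = []
for row in weights:
    row_scale = max(abs(v) for v in row) / 127
    per_channel_err.append(max_abs_error(row, quantize_symmetric(row, row_scale)))

print("per-tensor  max error per row:", per_tensor_err)
print("per-channel max error per row:", per_channel_err)
```

With a single shared scale, the small-magnitude row is rounded on a grid sized for the large row, so its error is far larger than with a scale of its own. The large row is unaffected because its scale is the same in both cases.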
diff --git a/scripts/onnx/export_backbone.py b/scripts/onnx/export_backbone.py
new file mode 100644
index 0000000..cf45bbd
--- /dev/null
+++ b/scripts/onnx/export_backbone.py
@@ -0,0 +1,69 @@
+"""
+Export the emotion2vec backbone to ONNX via FunASR's built-in exporter.
+
+This uses the model.export() path added in FunASR PR #2359 ("Make Emotion2vec
+support onnx", merged January 2025, shipped in funasr >= 1.2.3).
+
+The exported ONNX represents the SSL backbone only:
+    input   float32  [batch, num_samples]    (raw 16 kHz waveform)
+    output  float32  [batch, T, embed_dim]   (frame-level features)
+
+For fine-tuned classifier variants (emotion2vec_plus_*), the proj head is NOT
+in the exported graph - extract it separately with extract_head.py.
+
+Usage:
+    python export_backbone.py --model iic/emotion2vec_plus_large
+
+The file is written to the ModelScope cache directory and is named
+"emotion2vec" with no extension - rename to *.onnx for clarity.
+"""
+
+import argparse
+import os
+import sys
+
+try:
+    sys.stdout.reconfigure(encoding="utf-8")
+except Exception:
+    pass
+
+from funasr import AutoModel
+
+
+def main() -> None:
+    ap = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter)
+    ap.add_argument("--model", default="iic/emotion2vec_plus_large",
+                    help="ModelScope model id (default: iic/emotion2vec_plus_large)")
+    ap.add_argument("--opset", type=int, default=13,
+                    help="ONNX opset version (default: 13, matches PR #2359)")
+    ap.add_argument("--quantize", action="store_true",
+                    help="Apply FunASR's built-in quantization during export. "
+                         "Not recommended - use quantize.py for a tuned int8 build.")
+    args = ap.parse_args()
+
+    print(f"Loading {args.model} ...")
+    model = AutoModel(model=args.model, disable_update=True)
+
+    print(f"Exporting to ONNX  (opset={args.opset}, quantize={args.quantize}) ...")
+    result = model.export(type="onnx", quantize=args.quantize, opset_version=args.opset)
+    print(f"export() returned: {result}")
+
+    paths = [result] if isinstance(result, (str, os.PathLike)) else list(result or [])
+    for p in paths:
+        p = str(p)
+        if os.path.isdir(p):
+            print(f"\nDIR  {p}")
+            for f in sorted(os.listdir(p)):
+                fp = os.path.join(p, f)
+                if os.path.isfile(fp):
+                    size = os.path.getsize(fp) / 1e6
+                    print(f"     {f}  ({size:.1f} MB)")
+        elif os.path.isfile(p):
+            print(f"\nFILE {p}  ({os.path.getsize(p) / 1e6:.1f} MB)")
+
+    print("\nNote: the exported ONNX is named 'emotion2vec' with no extension.")
+    print("      Rename it to 'emotion2vec.onnx' before passing to the next steps.")
+
+
+if __name__ == "__main__":
+    main()
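The rename step that `export_backbone.py` prints at the end is shown in the README only as a Linux/macOS `mv`. A portable alternative could look like the sketch below. The helper name `copy_export_as_onnx` is hypothetical, and the demo runs against a throwaway directory standing in for the ModelScope cache, not a real export.

```python
import shutil
import tempfile
from pathlib import Path


def copy_export_as_onnx(export_dir: Path, dest: Path) -> Path:
    """Copy the extension-less 'emotion2vec' export to a path ending in .onnx."""
    src = export_dir / "emotion2vec"
    if not src.is_file():
        raise FileNotFoundError(f"no exported graph at {src}")
    if dest.suffix != ".onnx":
        # Force a .onnx suffix (this replaces any other final suffix).
        dest = dest.with_suffix(".onnx")
    shutil.copyfile(src, dest)
    return dest


# Demo with placeholder bytes; a real run would point export_dir at the
# directory reported by export_backbone.py.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "emotion2vec").write_bytes(b"placeholder bytes, not a real ONNX graph")
copied = copy_export_as_onnx(demo_dir, demo_dir / "emotion2vec.onnx")
print(copied.name)
```

Copying rather than moving leaves the cached export in place, so the export step does not need to be repeated if the copy is later deleted.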
diff --git a/scripts/onnx/extract_head.py b/scripts/onnx/extract_head.py
new file mode 100644
index 0000000..8619c18
--- /dev/null
+++ b/scripts/onnx/extract_head.py
@@ -0,0 +1,105 @@
+"""
+Extract the classification head (proj layer + labels) from a fine-tuned
+emotion2vec_plus_* checkpoint into a JSON file.
+
+FunASR's model.export() exports the SSL backbone only. For the fine-tuned
+classifier variants the model architecture is:
+
+    backbone(waveform) -> features [T, embed_dim]
+    pooled = features.mean(time)                    # mean-pool over frames
+    logits = proj(pooled)                           # Linear(embed_dim, num_classes)
+    probs  = softmax(logits)
+
+The proj layer (`Linear(embed_dim, num_classes)`) lives in the checkpoint
+under the keys `proj.weight` and `proj.bias`. We dump those plus the label
+names (read from tokens.txt) into a small JSON, so the classifier can be
+applied at inference time in any language - it's just a matmul and a softmax.
+
+Usage:
+    python extract_head.py \\
+        --checkpoint ~/.cache/modelscope/hub/models/iic/emotion2vec_plus_large/model.pt \\
+        --tokens     ~/.cache/modelscope/hub/models/iic/emotion2vec_plus_large/tokens.txt \\
+        --output     emotion2vec_head.json
+
+For the SSL representation models (e.g. emotion2vec_base) there is no proj
+head; the script exits with a clear error in that case.
+"""
+
+import argparse
+import json
+import os
+import sys
+
+try:
+    sys.stdout.reconfigure(encoding="utf-8")
+except Exception:
+    pass
+
+import torch
+
+
+def normalize_label(raw: str) -> str:
+    """Map a raw token to a clean English label.
+
+    FunASR's tokens.txt entries look like "<chinese>/english" (e.g. "生气/angry"),
+    plus a special "<unk>" token which we surface as "unknown".
+    """
+    if not raw or raw.strip() == "<unk>":
+        return "unknown"
+    if "/" in raw:
+        return raw.split("/")[-1].strip().lower()
+    return raw.strip().lower()
+
+
+def main() -> None:
+    ap = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter)
+    ap.add_argument("--checkpoint", required=True, help="path to model.pt")
+    ap.add_argument("--tokens", required=True, help="path to tokens.txt")
+    ap.add_argument("--output", default="emotion2vec_head.json",
+                    help="output JSON path (default: emotion2vec_head.json)")
+    args = ap.parse_args()
+
+    print(f"Loading checkpoint: {args.checkpoint}")
+    ck = torch.load(args.checkpoint, map_location="cpu")
+    # FunASR / fairseq checkpoints are dicts with a 'model' sub-dict; some are
+    # plain state_dicts. Handle both.
+    if isinstance(ck, dict) and "model" in ck:
+        sd = ck["model"]
+    else:
+        sd = ck
+
+    if "proj.weight" not in sd or "proj.bias" not in sd:
+        sys.exit(
+            "ERROR: proj.weight / proj.bias not found in checkpoint.\n"
+            "       This is likely an SSL/representation model (e.g. emotion2vec_base)\n"
+            "       with no classification head. The ONNX backbone alone is the\n"
+            "       complete inference graph for that variant - use its features\n"
+            "       directly. This script is for fine-tuned classifier variants\n"
+            "       (emotion2vec_plus_seed / _base / _large)."
+        )
+
+    W = sd["proj.weight"]
+    B = sd["proj.bias"]
+    print(f"  proj.weight {tuple(W.shape)}   proj.bias {tuple(B.shape)}")
+
+    with open(args.tokens, encoding="utf-8") as f:
+        raw_labels = [line.strip() for line in f if line.strip()]
+    labels = [normalize_label(lab) for lab in raw_labels]
+
+    if len(labels) != W.shape[0]:
+        sys.exit(f"ERROR: label count ({len(labels)}) != proj output dim ({W.shape[0]})")
+
+    print(f"  labels: {labels}")
+
+    out = {
+        "labels": labels,
+        "weight": W.tolist(),  # shape [num_classes, embed_dim]
+        "bias": B.tolist(),    # shape [num_classes]
+    }
+    with open(args.output, "w", encoding="utf-8") as f:
+        json.dump(out, f)
+    print(f"Wrote {args.output} ({os.path.getsize(args.output)} bytes)")
+
+
+if __name__ == "__main__":
+    main()
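The JSON written by `extract_head.py` is meant to be consumable without PyTorch. As a rough illustration of the consumer side, the sketch below applies a head with the same three keys (`labels`, `weight`, `bias`) to one mean-pooled vector using only the standard library. The function name `apply_head` and the toy 3-class, 4-dimensional head are assumptions for the example, not part of the patch.

```python
import json
import math


def apply_head(head: dict, pooled: list) -> dict:
    """Return {label: probability} for one mean-pooled feature vector."""
    weight, bias, labels = head["weight"], head["bias"], head["labels"]
    # Shape checks: weight is [num_classes, embed_dim], bias is [num_classes].
    if not (len(weight) == len(bias) == len(labels)):
        raise ValueError("labels, weight rows and bias must have equal length")
    if any(len(row) != len(pooled) for row in weight):
        raise ValueError("every weight row must match the feature dimension")
    # logits = W . pooled + b
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weight, bias)]
    peak = max(logits)  # subtract the max before exp for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return {lab: e / total for lab, e in zip(labels, exps)}


# Toy head (made-up values) round-tripped through JSON text, as a file would be.
toy_head = json.loads(json.dumps({
    "labels": ["angry", "happy", "neutral"],
    "weight": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    "bias": [0.0, 0.0, 0.0],
}))
scores = apply_head(toy_head, [0.1, 2.0, 0.3, 0.0])
print(max(scores, key=scores.get))
```

The explicit shape checks mirror the label-count check in the script and fail early if a head JSON is paired with a backbone of a different embedding size.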
diff --git a/scripts/onnx/inference_example.py b/scripts/onnx/inference_example.py
new file mode 100644
index 0000000..efff273
--- /dev/null
+++ b/scripts/onnx/inference_example.py
@@ -0,0 +1,85 @@
+"""
+Minimal standalone inference: WAV file -> emotion label, using only the
+exported ONNX backbone + extracted head JSON. No FunASR, no PyTorch at
+runtime - just onnxruntime + numpy.
+
+This is what you'd ship to production. The whole "classifier" portion is
+just three numpy lines in main() plus a small softmax helper.
+
+The waveform-normalization step is folded into the exported ONNX graph by
+FunASR's export_forward, so feeding raw 16 kHz Float32 samples is correct -
+no preprocessing required.
+
+Usage:
+    python inference_example.py --onnx emotion2vec.onnx \\
+                                --head emotion2vec_head.json \\
+                                --wav  some_clip_16k_mono.wav
+"""
+
+import argparse
+import json
+import sys
+import wave
+
+try:
+    sys.stdout.reconfigure(encoding="utf-8")
+except Exception:
+    pass
+
+import numpy as np
+import onnxruntime as ort
+
+
+def load_wav_16k_mono(path: str) -> np.ndarray:
+    """Load a 16-bit PCM mono 16 kHz WAV as Float32 in [-1, 1]."""
+    with wave.open(path, "rb") as w:
+        sr = w.getframerate()
+        if sr != 16000:
+            sys.exit(f"Expected 16 kHz WAV, got {sr} Hz")
+        if w.getnchannels() != 1:
+            sys.exit("Expected mono WAV")
+        if w.getsampwidth() != 2:
+            sys.exit("Expected 16-bit PCM WAV")
+        frames = w.readframes(w.getnframes())
+    return np.frombuffer(frames, dtype=np.int16).astype(np.float32) / 32768.0
+
+
+def softmax(x: np.ndarray) -> np.ndarray:
+    e = np.exp(x - x.max())
+    return e / e.sum()
+
+
+def main() -> None:
+    ap = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter)
+    ap.add_argument("--onnx", required=True, help="path to emotion2vec ONNX")
+    ap.add_argument("--head", required=True, help="path to emotion2vec_head.json")
+    ap.add_argument("--wav", required=True, help="16 kHz mono 16-bit PCM WAV")
+    args = ap.parse_args()
+
+    with open(args.head, encoding="utf-8") as f:
+        head = json.load(f)
+    W = np.array(head["weight"], dtype=np.float32)   # [num_classes, embed_dim]
+    B = np.array(head["bias"], dtype=np.float32)     # [num_classes]
+    labels = head["labels"]
+    audio = load_wav_16k_mono(args.wav)
+    print(f"Loaded {len(audio) / 16000:.2f} s of audio")
+
+    sess = ort.InferenceSession(args.onnx, providers=["CPUExecutionProvider"])
+    in_name = sess.get_inputs()[0].name
+
+    # --- the entire hybrid runtime ---
+    feats = sess.run(None, {in_name: audio.reshape(1, -1)})[0]   # [1, T, embed_dim]
+    pooled = feats[0].mean(axis=0)                                # [embed_dim]
+    probs = softmax(W @ pooled + B)                               # [num_classes]
+    # ---------------------------------
+
+    order = np.argsort(-probs)
+    print()
+    print(f"  {'label':<12}{'score':>10}")
+    for i in order:
+        print(f"  {labels[i]:<12}{probs[i]:>10.4f}")
+    print(f"\nTop emotion: {labels[order[0]]}  (score {probs[order[0]]:.4f})")
+
+
+if __name__ == "__main__":
+    main()
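`load_wav_16k_mono` rejects anything that is not 16 kHz, mono, 16-bit PCM. For a quick smoke test, a compatible clip can be generated with the standard library alone. The sketch below writes a sine tone; the function name `write_test_tone`, the 440 Hz frequency and the output filename are arbitrary choices for illustration, and a pure tone carries no emotional content, so it only exercises the I/O path.

```python
import math
import os
import struct
import tempfile
import wave


def write_test_tone(path, seconds=1.0, freq_hz=440.0, sample_rate=16000):
    """Write a mono 16-bit PCM sine tone and return the number of frames."""
    n = int(seconds * sample_rate)
    # Amplitude 0.3 of full scale keeps every sample inside the int16 range.
    samples = (
        int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    )
    payload = b"".join(struct.pack("<h", s) for s in samples)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)        # mono
        w.setsampwidth(2)        # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(payload)
    return n


clip_path = os.path.join(tempfile.mkdtemp(), "tone_16k_mono.wav")
num_frames = write_test_tone(clip_path)
print(clip_path, num_frames)
```

The resulting path can be passed as `--wav` to the example script.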
diff --git a/scripts/onnx/quantize.py b/scripts/onnx/quantize.py
new file mode 100644
index 0000000..ec922f1
--- /dev/null
+++ b/scripts/onnx/quantize.py
@@ -0,0 +1,106 @@
+"""
+Optionally quantize the ONNX backbone to int8 (dynamic), with two
+refinements that keep drift small.
+
+Two important refinements vs. a naive `op_types_to_quantize=['MatMul']`:
+
+  1. Skip activation x activation MatMul nodes - the attention's Q.K^T and
+     softmax.V. They have NO constant weight on either input. Quantizing them
+     to int8 is brutal: softmax outputs are mostly near-zero with a few sharp
+     peaks and don't survive 256 levels. torch.quantization.quantize_dynamic(
+     model, {nn.Linear}) naturally avoids them (those ops aren't nn.Linear);
+     the ONNX equivalent is excluding them by node name. We detect them by
+     "neither input is a graph initializer".
+
+  2. per_channel=True - one scale per output channel rather than one per
+     tensor. Standard practice for transformer weights, lower drift.
+
+Usage:
+    python quantize.py --input emotion2vec.onnx --output emotion2vec.int8.onnx
+
+Typical result: ~3x size reduction (e.g. 649 MB -> 195 MB).
+"""
+
+import argparse
+import os
+import sys
+
+try:
+    sys.stdout.reconfigure(encoding="utf-8")
+except Exception:
+    pass
+
+import onnx
+from onnxruntime.quantization import quantize_dynamic, QuantType
+
+
+def find_activation_matmuls(model_path: str):
+    """Return (exclude_names, weight_count, act_act_count, source_path_to_use).
+
+    Scans the graph for MatMul nodes. A "weight matmul" has at least one input
+    that is a graph initializer (i.e. a constant weight tensor). An "activation
+    x activation matmul" has both inputs dynamic - we want to exclude those
+    from quantization.
+
+    If any such node lacks a name (some exporters omit names), this assigns a
+    synthetic one and saves a temporary copy of the model, because
+    quantize_dynamic's nodes_to_exclude identifies nodes by name.
+    """
+    m = onnx.load(model_path)
+    inits = {init.name for init in m.graph.initializer}
+    exclude = []
+    weight_mm = 0
+    act_act_mm = 0
+    modified = False
+    for i, node in enumerate(m.graph.node):
+        if node.op_type != "MatMul":
+            continue
+        if any(inp in inits for inp in node.input):
+            weight_mm += 1
+        else:
+            act_act_mm += 1
+            if not node.name:
+                node.name = f"matmul_{i}"
+                modified = True
+            exclude.append(node.name)
+
+    src_to_use = model_path
+    if modified:
+        src_to_use = model_path + ".named.tmp.onnx"
+        onnx.save(m, src_to_use)
+    return exclude, weight_mm, act_act_mm, src_to_use
+
+
+def main() -> None:
+    ap = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter)
+    ap.add_argument("--input", required=True, help="fp32 ONNX produced by step 1")
+    ap.add_argument("--output", required=True, help="output int8 ONNX path")
+    args = ap.parse_args()
+
+    if not os.path.isfile(args.input):
+        sys.exit(f"input model not found: {args.input}")
+
+    exclude, weight_mm, act_act_mm, src = find_activation_matmuls(args.input)
+    print(f"MatMul nodes - weight: {weight_mm}  |  activation x activation: {act_act_mm} (excluded)")
+
+    print("Quantizing (dynamic int8, MatMul-weight only, per-channel)")
+    print(f"  in : {args.input}")
+    print(f"  out: {args.output}")
+    quantize_dynamic(
+        src,
+        args.output,
+        weight_type=QuantType.QInt8,
+        op_types_to_quantize=["MatMul"],
+        nodes_to_exclude=exclude,
+        per_channel=True,
+    )
+    if src != args.input and os.path.exists(src):
+        os.remove(src)
+
+    si = os.path.getsize(args.input) / 1e6
+    so = os.path.getsize(args.output) / 1e6
+    print(f"Done.  {si:.0f} MB  ->  {so:.0f} MB  ({so / si * 100:.0f}%)")
+
+
+if __name__ == "__main__":
+    main()
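The docstring's claim that softmax outputs "don't survive 256 levels" can be checked on a toy attention row. The sketch below assumes a simplified 8-bit scheme that maps `[0, max]` linearly onto 256 levels; onnxruntime's dynamic quantization differs in detail, and the logit values and the helper name `quantize_uint8` are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def quantize_uint8(values):
    """Map [0, max(values)] onto 256 levels, then dequantize."""
    scale = max(values) / 255
    return [max(0, min(255, round(v / scale))) * scale for v in values]


# One attention row: a single dominant key among 64 (hypothetical logits).
probs = softmax([10.0] + [0.0] * 63)
restored = quantize_uint8(probs)

tail_before = sum(probs[1:])     # small but non-zero attention on 63 keys
tail_after = sum(restored[1:])   # what is left after the 8-bit round trip

print(f"tail mass before: {tail_before:.6f}  after: {tail_after:.6f}")
```

In this row every non-dominant entry is far below one quantization step, so the whole tail rounds to zero while the peak is preserved. Across many heads and layers such losses compound, which is consistent with the drift described above.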
diff --git a/scripts/onnx/requirements.txt b/scripts/onnx/requirements.txt
new file mode 100644
index 0000000..1b1b50f
--- /dev/null
+++ b/scripts/onnx/requirements.txt
@@ -0,0 +1,5 @@
+funasr>=1.2.3
+torch>=2.0
+onnx>=1.14
+onnxruntime>=1.16
+numpy<2
diff --git a/scripts/onnx/validate.py b/scripts/onnx/validate.py
new file mode 100644
index 0000000..f8968eb
--- /dev/null
+++ b/scripts/onnx/validate.py
@@ -0,0 +1,108 @@
+"""
+Validate the ONNX + extracted head against FunASR ground truth.
+
+Runs the same seeded random-noise clips through two paths and reports per-emotion drift:
+
+    A) FunASR  AutoModel.generate()                  <- ground truth
+    B) ONNX backbone + mean-pool + proj + softmax    <- hybrid recipe
+
+For the fp32 ONNX the worst-case drift should be at the fp rounding floor
+(~3e-05 on emotion2vec_plus_large). For an int8 build (from quantize.py)
+expect ~1e-04 on confident inputs, climbing on intentionally ambiguous
+(random-noise) inputs - that's a property of dynamic quantization, not a
+recipe bug.
+
+Usage:
+    python validate.py --model iic/emotion2vec_plus_large \\
+                       --onnx  emotion2vec.onnx \\
+                       --head  emotion2vec_head.json
+"""
+
+import argparse
+import json
+import os
+import sys
+
+try:
+    sys.stdout.reconfigure(encoding="utf-8")
+except Exception:
+    pass
+
+import numpy as np
+import onnxruntime as ort
+from funasr import AutoModel
+
+
+def normalize_label(raw: str) -> str:
+    if not raw or raw.strip() == "<unk>":
+        return "unknown"
+    if "/" in raw:
+        return raw.split("/")[-1].strip().lower()
+    return raw.strip().lower()
+
+
+def softmax(x: np.ndarray) -> np.ndarray:
+    e = np.exp(x - x.max())
+    return e / e.sum()
+
+
+def main() -> None:
+    ap = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter)
+    ap.add_argument("--model", default="iic/emotion2vec_plus_large")
+    ap.add_argument("--onnx", required=True)
+    ap.add_argument("--head", required=True)
+    ap.add_argument("--seconds", nargs="+", type=float, default=[2.0, 4.0, 6.5],
+                    help="test clip lengths in seconds (default: 2.0 4.0 6.5)")
+    args = ap.parse_args()
+
+    with open(args.head, encoding="utf-8") as f:
+        head = json.load(f)
+    W = np.array(head["weight"], dtype=np.float32)
+    B = np.array(head["bias"], dtype=np.float32)
+    labels = head["labels"]
+    print(f"Loading ONNX: {args.onnx}")
+    sess = ort.InferenceSession(args.onnx, providers=["CPUExecutionProvider"])
+    in_name = sess.get_inputs()[0].name
+
+    print(f"Loading FunASR {args.model} (ground truth) ...")
+    fa = AutoModel(model=args.model, disable_update=True)
+
+    def path_onnx(audio: np.ndarray) -> np.ndarray:
+        feats = sess.run(None, {in_name: audio.reshape(1, -1).astype(np.float32)})[0]
+        pooled = feats[0].mean(axis=0)
+        return softmax(W @ pooled + B)
+
+    def path_funasr(audio: np.ndarray) -> np.ndarray:
+        res = fa.generate(audio, granularity="utterance", extract_embedding=False)[0]
+        d = {
+            normalize_label(l): float(s)
+            for l, s in zip(res["labels"], res["scores"])
+        }
+        return np.array([d.get(lab, 0.0) for lab in labels], dtype=np.float32)
+
+    rng = np.random.default_rng(0)
+    worst = 0.0
+    for secs in args.seconds:
+        audio = rng.standard_normal(int(secs * 16000)).astype(np.float32)
+        fa_s = path_funasr(audio)
+        on_s = path_onnx(audio)
+        diff = np.abs(fa_s - on_s)
+        worst = max(worst, float(diff.max()))
+
+        print(f"\n=== {secs}s clip ===")
+        print(f"  {'label':<11}{'funasr':>10}{'onnx':>10}{'diff':>12}")
+        for i, lab in enumerate(labels):
+            print(f"  {lab:<11}{fa_s[i]:>10.5f}{on_s[i]:>10.5f}{diff[i]:>12.2e}")
+
+    print(f"\n{'=' * 40}")
+    print(f"WORST absolute score difference: {worst:.2e}")
+    if worst < 1e-3:
+        print("PASS  - worst drift below 1e-3 (fp32 noise or well-behaved int8)")
+    elif worst < 1e-2:
+        print("CLOSE - minor drift, typical of well-tuned int8")
+    else:
+        print("NOTABLE drift - inspect the per-label table above")
+
+
+if __name__ == "__main__":
+    main()
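The drift metric and verdict bands used by `validate.py` are simple enough to reuse in a test suite. The sketch below isolates them with no dependencies. The function names and the score vectors are hypothetical; the thresholds (`1e-3`, `1e-2`) are the ones in the script.

```python
def worst_abs_diff(reference, candidate):
    """Largest per-class absolute difference between two score vectors."""
    if len(reference) != len(candidate):
        raise ValueError("score vectors must have the same length")
    return max(abs(a - b) for a, b in zip(reference, candidate))


def verdict(worst):
    """Map a worst-case drift to the three bands used by validate.py."""
    if worst < 1e-3:
        return "PASS"
    if worst < 1e-2:
        return "CLOSE"
    return "NOTABLE"


# Made-up scores standing in for FunASR output and the ONNX hybrid path.
funasr_scores = [0.90, 0.07, 0.03]
onnx_scores = [0.9004, 0.0697, 0.0299]

worst = worst_abs_diff(funasr_scores, onnx_scores)
print(f"{worst:.2e} -> {verdict(worst)}")
```

Using the maximum rather than the mean keeps a single badly drifting class from being averaged away by the others.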