Zapformer preview by danpovey · Pull Request #2082 · k2-fsa/icefall

danpovey · 2026-05-13T14:17:17Z

We are working on the writeup but this is in case anyone wants to try the latest version. Also note the --use-giga=True option in train.py and the --giga=True option in decode scripts.

Summary by CodeRabbit

New Features
- Full Zapformer ASR: training, streaming & non‑streaming inference, many decoding modes (greedy/beam/LM/rescoring/oracle), ONNX/TorchScript export and runtimes, pretrained-model decode tools, profiling, and evaluation outputs.
- End-to-end decoding utilities for batch and streaming audio, plus export/import tooling for ONNX/TorchScript models.

Documentation
- Added Zapformer training and decoding results with usage notes.

Chores
- Updated .gitignore to exclude generated PDF artifacts.

danpovey · 2026-05-29T02:12:16Z

It seems the --base-lr default is set to an inappropriate value of 0.00065; it should be more like 0.02, although since you are using just one card I'd probably try something closer to 0.015.
Also, you are averaging over far too many epochs; try 2 or 3 instead (--max-copies means that later epochs take longer).
Also, try regular non-streaming decoding first.
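
For context on the --avg advice: the option sets how many checkpoints are averaged before decoding. The sketch below only illustrates the idea of element-wise parameter averaging. `average_params` is a hypothetical helper, plain floats stand in for tensors, and the values are made up; it is not icefall's averaging code.

```python
from typing import Dict, List


def average_params(checkpoints: List[Dict[str, float]]) -> Dict[str, float]:
    """Return the element-wise mean of several parameter dicts.

    Hypothetical illustration: real checkpoints hold tensors, not floats.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    return {
        name: sum(ckpt[name] for ckpt in checkpoints) / n
        for name in checkpoints[0]
    }


# Made-up parameter values for three consecutive epochs.
last_three = [{"w": 1.0, "b": 0.5}, {"w": 2.0, "b": 0.5}, {"w": 3.0, "b": 0.5}]
averaged = average_params(last_three)
print(averaged)  # {'w': 2.0, 'b': 0.5}
```

Averaging over fewer checkpoints keeps the result closer to the parameters of the final epoch, which is presumably the motivation for trying 2 or 3 rather than 15.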

pehonnet · 2026-05-29T08:39:57Z

For non-streaming decoding, I do indeed get much better results:

python ./zapformer/decode.py     --epoch 30     --avg 3     --exp-dir ./zapformer/exp     --max-duration 1000     --decoding-method greedy_search --causal true --chunk-size 32 --left-context-frames 256
greedy_search	3.56	best for dev-clean
greedy_search	9.85	best for dev-other
greedy_search	3.79	best for test-clean
greedy_search	9.82	best for test-other

If I try with avg 15 to compare, I get similar results (~3.5% - 9.9%)

With the streaming script, averaging over only 3 epochs is even worse than averaging over 15:

python ./zapformer/streaming_decode.py   --epoch 30   --avg 3  --causal 1   --chunk-size 32   --left-context-frames 256   --exp-dir ./zapformer/exp   --decoding-method greedy_search   --num-decode-streams 1000
greedy_search	31.47	best for dev-clean
greedy_search	30.95	best for test-clean
greedy_search	45.98	best for dev-other
greedy_search	46.06	best for test-other

So there may be something wrong in the streaming decoding.
It could also be due to some small changes I had to make to get the ONNX export to run in the first place.
If I have time I will test your updated branch and report again.

Thanks!

danpovey · 2026-05-30T13:52:28Z

Thanks for reporting your progress! Here are some more updates, including some optimizer changes.
Kangwei recently made a bunch of fixes to the ONNX export, I believe; I have included those commits in what I just pushed (note: this is my local branch 3240). He said that he has not yet tested the streaming ONNX export, although he may have made some fixes to it anyway.

danpovey · 2026-05-30T13:54:06Z

BTW, it's remarkable that it worked at all because that default LR was 30 times too low. If you have only one job you may need to reduce the LR a bit: for example, from the default of 0.02 to 0.015 or 0.013 or something like that.
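
The "30 times" figure can be checked directly from the two learning rates quoted in this thread. This is only the arithmetic; the variable names are illustrative and do not come from train.py.

```python
old_default_lr = 0.00065  # --base-lr default reported earlier in the thread
suggested_lr = 0.02       # value suggested as appropriate
single_job_lr = 0.015     # one of the values suggested for a single job

ratio = suggested_lr / old_default_lr
single_job_scale = single_job_lr / suggested_lr

print(f"suggested / old default = {ratio:.1f}x")
print(f"single-job LR as a fraction of 0.02 = {single_job_scale:.2f}")
```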

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

egs/librispeech/ASR/zapformer/batched_rubik.py (1)

193-211: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: matrix_shape computes the wrong (rows, cols) because cumprod isn’t cumulative

In egs/librispeech/ASR/zapformer/batched_rubik.py (lines 193-211), the docstring example matrix_shape([2, 3, 10]) = (6, 10) doesn’t match the implementation: current code uses cumprod = [2, 3, 10] and returns (10, 6). cumprod should contain cumulative products (e.g., [2, 6, 60]) derived from numel.

Proposed fix

 def matrix_shape(shape):
     """
     shape is expected to be a torch.Size or a list with at least two dimensions.
     Returns (rows, cols) such that a tensor of shape `shape` can be reshaped
     to size (rows, cols), by combining dimensions in a way that minimizes the
     difference between rows and cols.  e.g. matrix_shape([ 2, 3, 10 ]) = (6, 10)
     """
     shape = list(shape)
     cumprod = [ ]
     numel = 1
     for k in shape:
-        cumprod.append(k)
+        numel = numel * k
+        cumprod.append(numel)
-        numel = numel * k
     diffs = [ abs(k - numel // k) for k in cumprod ]
     min_diff = min(diffs)
     for i in range(len(shape)):
         if diffs[i] == min_diff:
             return cumprod[i], numel // cumprod[i]
     assert False, shape
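
As a quick check of the proposed fix, here is a self-contained restatement of the corrected function in plain Python. It is a sketch of the logic in the diff above, not necessarily the code as merged; the type hints and the use of `diffs.index` in place of the explicit loop are editorial choices.

```python
from typing import Sequence, Tuple


def matrix_shape(shape: Sequence[int]) -> Tuple[int, int]:
    """Pick (rows, cols) with rows * cols == numel and |rows - cols| minimal,
    where rows is the product of a leading prefix of `shape`."""
    shape = list(shape)
    cumprod = []
    numel = 1
    for k in shape:
        numel *= k
        cumprod.append(numel)  # running product, e.g. [2, 6, 60]
    # numel is now the total element count.
    diffs = [abs(cp - numel // cp) for cp in cumprod]
    best = diffs.index(min(diffs))  # first split with the smallest gap
    return cumprod[best], numel // cumprod[best]


print(matrix_shape([2, 3, 10]))  # (6, 10)
print(matrix_shape([4, 5]))      # (4, 5)
```

With the original ordering, `cumprod` would be `[2, 3, 10]` for the first example and the function would return `(10, 6)`, which is the mismatch with the docstring described above.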

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@egs/librispeech/ASR/zapformer/batched_rubik.py` around lines 193 - 211, The
function matrix_shape builds cumprod incorrectly as element copies instead of
cumulative products, causing wrong (rows, cols); fix matrix_shape by computing
cumulative products for cumprod (e.g., cumprod[i] = product of shape[0..i])
while maintaining numel as total product, then compute diffs = [abs(cp - numel
// cp) for cp in cumprod] and return (best_cp, numel // best_cp); update
references to cumprod, numel, diffs in matrix_shape accordingly and keep the
existing assert for fallback.

🧹 Nitpick comments (5)

egs/librispeech/ASR/zapformer/jit_pretrained.py (1)

159-159: 💤 Low value

Remove redundant no-op assignment.

current_encoder_out = current_encoder_out is a no-op and appears to be leftover dead code.

Suggested fix

         current_encoder_out = packed_encoder_out.data[start:end]
-        current_encoder_out = current_encoder_out
         # current_encoder_out's shape: (batch_size, encoder_out_dim)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@egs/librispeech/ASR/zapformer/jit_pretrained.py` at line 159, Remove the
redundant no-op assignment "current_encoder_out = current_encoder_out" in the
code path that uses the variable current_encoder_out (in jit_pretrained.py) —
simply delete that line so the code relies on the existing current_encoder_out
value without the unnecessary self-assignment; ensure no other logic depends on
that line before committing.

egs/librispeech/ASR/zapformer/batched_rubik.py (2)

562-570: 💤 Low value

Test assumes CUDA availability.

The test function hardcodes device = torch.device('cuda') which will fail on systems without CUDA. Consider adding a fallback or making device configurable.

Suggested fix

-    device = torch.device('cuda')
-    #device = torch.device("cpu")
+    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@egs/librispeech/ASR/zapformer/batched_rubik.py` around lines 562 - 570, The
test _test_batched_rubik currently forces device = torch.device('cuda') which
fails when CUDA is unavailable; update _test_batched_rubik to choose device via
torch.device('cuda' if torch.cuda.is_available() else 'cpu') or accept a device
parameter so tests can run on CPU, and ensure any downstream tensors/models
created in _test_batched_rubik use that device variable (references:
_test_batched_rubik, device).

30-36: 💤 Low value

Clean up commented-out code and hardcoded dtype.

The commented-out import block and hardcoded COMPUTE_DTYPE = torch.bfloat16 reduce maintainability. Consider making COMPUTE_DTYPE configurable or documenting why bfloat16 is required.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@egs/librispeech/ASR/zapformer/batched_rubik.py` around lines 30 - 36, Remove
the dead commented import block and stop hardcoding COMPUTE_DTYPE; instead
attempt to import COMPUTE_DTYPE from nanochat.common (or another config source)
with a safe fallback (e.g., torch.bfloat16 if available, else torch.float32),
and/or expose COMPUTE_DTYPE as a configurable parameter or environment-driven
value so callers can override it; update the module to use the symbol
COMPUTE_DTYPE consistently and add a short inline comment documenting the
default choice and why bfloat16 is chosen.

egs/librispeech/ASR/zapformer/export-onnx.py (2)

462-462: 💤 Low value

Redundant model.to(device) calls.

The model is already moved to device at line 462. The subsequent model.to(device) calls on lines 480, 491, 514, and 532 are redundant.

Suggested fix - remove redundant calls

             logging.info(f"averaging {filenames}")
-            model.to(device)
             model.load_state_dict(average_checkpoints(filenames, device=device))

Apply similar removal for lines 491, 514, and 532.

Also applies to: 480-480, 491-491, 514-514, 532-532

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@egs/librispeech/ASR/zapformer/export-onnx.py` at line 462, Remove the
redundant repeated model.to(device) calls: keep a single model.to(device) where
the model is initially moved (the existing call at line 462) and delete the
subsequent duplicate calls that reapply model.to(device) later in the script
(the extra invocations referencing model.to(device) around lines 480, 491, 514,
and 532); ensure only the initial model.to(device) remains so the model is
placed on device once before export logic that uses model.
314-333: ⚡ Quick win

enable_onnx_checker won’t break CI: PyTorch 1.13.1 supports it

requirements-ci.txt pins torch==1.13.1+cpu, and in that version torch.onnx.export accepts enable_onnx_checker (it controls whether the ONNX model checker runs during export). The “TypeError on newer PyTorch” risk depends on your runtime torch version, so if forward-compatibility is desired, retry the export without enable_onnx_checker on TypeError instead of removing it unconditionally.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@egs/librispeech/ASR/zapformer/export-onnx.py` around lines 314 - 333, The
export block calls torch.onnx.export with enable_onnx_checker set, which may
raise TypeError on some torch versions; update the export logic around the
encoder export (the torch.onnx.export call for encoder_model producing
encoder_filename using inputs (x, x_lens) and outputs
"encoder_out"/"encoder_out_lens") to catch TypeError specifically and retry the
export without the enable_onnx_checker kwarg, while preserving the existing
logging/exception behavior for other errors; ensure you only remove
enable_onnx_checker on the retry and re-raise other exceptions as currently
done.
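
The retry-on-TypeError pattern described above can be sketched without PyTorch. Everything here is hypothetical: `call_with_optional_kwarg`, `old_export`, and `new_export` are stand-ins invented for illustration, and the sketch makes no claim about which torch versions accept enable_onnx_checker.

```python
from typing import Any, Callable, Dict


def call_with_optional_kwarg(
    fn: Callable[..., Any], *args: Any, optional: Dict[str, Any], **kwargs: Any
) -> Any:
    """Call fn with the `optional` kwargs; on TypeError, retry once without them.

    Any other exception propagates unchanged. A TypeError raised for an
    unrelated reason is simply raised again by the retry.
    """
    try:
        return fn(*args, **optional, **kwargs)
    except TypeError:
        return fn(*args, **kwargs)


# Stand-in exporters: one accepts the keyword, one does not.
def old_export(model: str, f: str, enable_onnx_checker: bool = True):
    return ("old", f, enable_onnx_checker)


def new_export(model: str, f: str):
    return ("new", f)


opts = {"enable_onnx_checker": False}
print(call_with_optional_kwarg(old_export, "m", "enc.onnx", optional=opts))
print(call_with_optional_kwarg(new_export, "m", "enc.onnx", optional=opts))
```

One caveat with this approach is that the first attempt may do real work before failing, so it fits best when the TypeError comes from argument binding, as it would for an unsupported keyword.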

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1c0cfe2a-d520-4285-9860-df3fe0260b41

📥 Commits

Reviewing files that changed from the base of the PR and between cae74c4 and ee43592.

📒 Files selected for processing (15)

egs/librispeech/ASR/RESULTS.md
egs/librispeech/ASR/zapformer/batched_rubik.py
egs/librispeech/ASR/zapformer/export-onnx.py
egs/librispeech/ASR/zapformer/export.py
egs/librispeech/ASR/zapformer/jit_pretrained.py
egs/librispeech/ASR/zapformer/model.py
egs/librispeech/ASR/zapformer/onnx_check.py
egs/librispeech/ASR/zapformer/onnx_pretrained.py
egs/librispeech/ASR/zapformer/pretrained.py
egs/librispeech/ASR/zapformer/rubik.py
egs/librispeech/ASR/zapformer/subsampling.py
egs/librispeech/ASR/zapformer/train.py
egs/librispeech/ASR/zapformer/zapformer.py
egs/librispeech/ASR/zapformer/zapformer_modules.py
egs/librispeech/ASR/zapformer/zapformer_utils.py

💤 Files with no reviewable changes (2)

egs/librispeech/ASR/zapformer/model.py
egs/librispeech/ASR/zapformer/rubik.py

✅ Files skipped from review due to trivial changes (1)

egs/librispeech/ASR/RESULTS.md

🚧 Files skipped from review as they are similar to previous changes (4)

egs/librispeech/ASR/zapformer/pretrained.py
egs/librispeech/ASR/zapformer/subsampling.py
egs/librispeech/ASR/zapformer/train.py
egs/librispeech/ASR/zapformer/zapformer.py

danpovey and others added 30 commits March 11, 2026 13:04

Move self-attention weights input to after ff1.

027132a

Separate streaming and non-streaming versions of SequenceNorm and remove ballast from non-streaming version.

6f7999e

Make min_factor simply added linearly (not affect progress) and increase .1->.15.

1f908bf

Increase cubic_decay_proportion=0.75 back to cubic_decay_proportion=0.8 and direct=0.1 to direct=0.15

a217e3d

Increase cubic_decay_proportion from .8 to .85

e274669

Implement min_factor and max_factor in cosine scheduler via changing range of progress, use 0.95,0.05

af73215

Revert "Move self-attention weights input to after ff1."

d343e0a

This reverts commit 4da937c0f9eef0328f0fca13da836e48a51a5e58.

Documentation changes.

f0410f6

Merge branch 'deterministic_invertible2182conv' into deterministic_invertible2190conv

4ee3199

Large amount of code cleanup and removal.

1b32247

Increase CorrelationLimiter limit from .35 to .45

075b702

Replace CosineLRScheduler with HalfCosineLRScheduler

f8db837

Implement InterpCosineLRScheduler

f3fd4d8

Some configuration changes; CorrelationLimiter power 0.45->0.4, cubic_decay_proportion=0.85 to cubic_decay_proportion=0.8, beta1=0.998 to beta1=0.995.

3cd1645

Change where padding is done in ConvolutionModule and round up sequence length a little

56d283d

Simplify the interface of model.py, moving SpecAug augmentation out into the main training loop

8b1ed2e

Bug fix

39a6647

fix import

315471b

Bug fixes

ba0d20b

Merge branch 'deterministic_invertible2191conv' into deterministic_invertible2217conv # Conflicts: # egs/librispeech/ASR/zapformer/model.py

145d2cc

Change defaults and test code in optim.py, will not affect our runs.

d2ad0bb

Move code to batched_rubik, rubik instead of optim.py

2f227b3

add commonvoice dataset

096b914

Replace FftConv with BasisConv

27424b4

Fix wrong class names in super()

c3921f9

Implement WeightedMean to bypass convolutions; this breaks streaming test.

957b23b

Bug fix re src_key_padding_mask, use it.

a429e48

Use 4, not 2, copies of the data.

fd8147c

Fix assertion.

bc7d0b6

pkufool added 5 commits May 28, 2026 10:20

fix to onnx exporting, not working yet

c3fc7d6

export transducer works

b638c5a

fix onnx inference

6d0080c

minor fixes

d284af8

minor fix

9cb74d0

danpovey added 4 commits May 30, 2026 21:41

merge kangwei's branch zapformer_preview

f9869c1

Fix some dtypes in optimizer.

de5a49f

Update the results.

3d67a58

Set base-lr to 0.02.

ee43592

coderabbitai Bot reviewed May 30, 2026

View reviewed changes

danpovey and others added 15 commits May 30, 2026 22:01

Remove unnecessary state_dict/load_state_dict members.

6c2e9b6

Fix issue in matrix_shape() pointed out by AI on k2-fsa#2082

881a8ed

fix streaming jit export

3e78592

Fix from master for ctc_loss bug in torch

f96e36e

Take zipformer/model.py from master.

916a250

fix streaming export and pretrained inference

ae69eea

Use batched_rubik optimizer [muon-core] in zipformer, with interp-cosine LR schedule.

8f94d85

Merge remote-tracking branch 'kangwei/zapformer_preview' into deterministic_invertible3243conv

8896e65

Add giga/cv test sets for zipformer

60bcaa7

Make code more robust w.r.t. COMPUTE_DTYPE.

be8a101

Merge changes from origin/zapformer3127

73c7579

Remove comment.

a1de0b2

Merge branch 'deterministic_invertible3242conv' into deterministic_invertible3240conv

209e1c7

Remove muon.py

bc6955d

take zipformer/train.py from master, move this train.py to train_newoptim.py

7e077af

danpovey mentioned this pull request Jun 11, 2026

Zipformer performance suddenly drops after epoch 14 #2087

Open

Zapformer preview #2082
danpovey wants to merge 1275 commits into k2-fsa:master from danpovey:zapformer3127

4 participants
