Skip to content

Zapformer preview#2082

Open
danpovey wants to merge 1275 commits into
k2-fsa:masterfrom
danpovey:zapformer3127
Open

Zapformer preview#2082
danpovey wants to merge 1275 commits into
k2-fsa:masterfrom
danpovey:zapformer3127

Conversation

@danpovey

@danpovey danpovey commented May 13, 2026

Copy link
Copy Markdown
Collaborator

We are working on the writeup but this is in case anyone wants to try the latest version. Also note the --use-giga=True option in train.py and the --giga=True option in decode scripts.

Summary by CodeRabbit

  • New Features

    • Full Zapformer ASR: training, streaming & non‑streaming inference, many decoding modes (greedy/beam/LM/rescoring/oracle), ONNX/TorchScript export and runtimes, pretrained-model decode tools, profiling, and evaluation outputs.
    • End-to-end decoding utilities for batch and streaming audio, plus export/import tooling for ONNX/TorchScript models.
  • Documentation

    • Added Zapformer training and decoding results with usage notes.
  • Chores

    • Updated .gitignore to exclude generated PDF artifacts.

Review Change Stack

danpovey and others added 30 commits March 11, 2026 13:04
…ble2099conv_streaming' into deterministic_invertible2182conv

# Conflicts:
#	egs/librispeech/ASR/zapformer/combined_scheduler.py
#	egs/librispeech/ASR/zapformer/train.py
#	egs/librispeech/ASR/zipformer/optim.py
#	egs/librispeech/ASR/zipformer/zipformer.py
This reverts commit 4da937c0f9eef0328f0fca13da836e48a51a5e58.
…_decay_proportion=0.85 to cubic_decay_proportion=0.8, beta1=0.998 to beta1=0.995.
…vertible2217conv

# Conflicts:
#	egs/librispeech/ASR/zapformer/model.py
@danpovey

danpovey commented May 29, 2026

Copy link
Copy Markdown
Collaborator Author

It seems the --base-lr default is set at an inappropriate value of 0.00065, it should be more like 0.02, although since you are using just one card I'd probably try more like 0.015.
Also you are averaging over way too many epochs, try more like 2 or 3. (the --max-copies means that later epochs take longer)
Also try regular non-streaming decode first.

@pehonnet

Copy link
Copy Markdown
Contributor

For non streaming, I get indeed much better results:

python ./zapformer/decode.py     --epoch 30     --avg 3     --exp-dir ./zapformer/exp     --max-duration 1000     --decoding-method greedy_search --causal true --chunk-size 32 --left-context-frames 256
greedy_search	3.56	best for dev-clean
greedy_search	9.85	best for dev-other
greedy_search	3.79	best for test-clean
greedy_search	9.82	best for test-other

If I try with avg 15 to compare, I get similar results (~3.5% - 9.9%)

For the streaming script, with only 3 epochs it's even worse than 15:

python ./zapformer/streaming_decode.py   --epoch 30   --avg 3  --causal 1   --chunk-size 32   --left-context-frames 256   --exp-dir ./zapformer/exp   --decoding-method greedy_search   --num-decode-streams 1000
greedy_search	31.47	best for dev-clean
greedy_search	30.95	best for test-clean
greedy_search	45.98	best for dev-other
greedy_search	46.06	best for test-other

so there may be something wrong in the streaming decoding...
It could also be due to some small changes I had to do to make the onnx export run in the first place.
If I have time I will test your updated branch and report again.

Thanks!

@danpovey

Copy link
Copy Markdown
Collaborator Author

Thanks for reporting your progress! Here are some more updates, including some optimizer changes.
Kangwei recently made a bunch of fixes to the onnx-exporting, I believe, I have included those commits in what I just pushed. (note, this is my local branch 3240.). He said that he has not yet tested the streaming onnx-exporting though, although he may have made some fixes to it anyway.

@danpovey

Copy link
Copy Markdown
Collaborator Author

BTW, it's remarkable that it worked at all because that default LR was 30 times too low. If you have only one job you may need to reduce the LR a bit: for example, from the default of 0.02 to 0.015 or 0.013 or something like that.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
egs/librispeech/ASR/zapformer/batched_rubik.py (1)

193-211: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: matrix_shape computes the wrong (rows, cols) because cumprod isn’t cumulative

In egs/librispeech/ASR/zapformer/batched_rubik.py (lines 193-211), the docstring example matrix_shape([2, 3, 10]) = (6, 10) doesn’t match the implementation: current code uses cumprod = [2, 3, 10] and returns (10, 6). cumprod should contain cumulative products (e.g., [2, 6, 60]) derived from numel.

Proposed fix
 def matrix_shape(shape):
     """
     shape is expected to be a torch.Size or a list with at least two dimensions.
     Returns (rows, cols) such that a tensor of shape `shape` can be reshaped
     to size (rows, cols), by combining dimensions in a way that minimizes the
     difference between rows and cols.  e.g. matrix_shape([ 2, 3, 10 ]) = (6, 10)
     """
     shape = list(shape)
     cumprod = [ ]
     numel = 1
     for k in shape:
-        cumprod.append(k)
+        numel = numel * k
+        cumprod.append(numel)
-        numel = numel * k
     diffs = [ abs(k - numel // k) for k in cumprod ]
     min_diff = min(diffs)
     for i in range(len(shape)):
         if diffs[i] == min_diff:
             return cumprod[i], numel // cumprod[i]
     assert False, shape
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@egs/librispeech/ASR/zapformer/batched_rubik.py` around lines 193 - 211, The
function matrix_shape builds cumprod incorrectly as element copies instead of
cumulative products, causing wrong (rows, cols); fix matrix_shape by computing
cumulative products for cumprod (e.g., cumprod[i] = product of shape[0..i])
while maintaining numel as total product, then compute diffs = [abs(cp - numel
// cp) for cp in cumprod] and return (best_cp, numel // best_cp); update
references to cumprod, numel, diffs in matrix_shape accordingly and keep the
existing assert for fallback.
🧹 Nitpick comments (5)
egs/librispeech/ASR/zapformer/jit_pretrained.py (1)

159-159: 💤 Low value

Remove redundant no-op assignment.

current_encoder_out = current_encoder_out is a no-op and appears to be leftover dead code.

Suggested fix
         current_encoder_out = packed_encoder_out.data[start:end]
-        current_encoder_out = current_encoder_out
         # current_encoder_out's shape: (batch_size, encoder_out_dim)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@egs/librispeech/ASR/zapformer/jit_pretrained.py` at line 159, Remove the
redundant no-op assignment "current_encoder_out = current_encoder_out" in the
code path that uses the variable current_encoder_out (in jit_pretrained.py) —
simply delete that line so the code relies on the existing current_encoder_out
value without the unnecessary self-assignment; ensure no other logic depends on
that line before committing.
egs/librispeech/ASR/zapformer/batched_rubik.py (2)

562-570: 💤 Low value

Test assumes CUDA availability.

The test function hardcodes device = torch.device('cuda') which will fail on systems without CUDA. Consider adding a fallback or making device configurable.

Suggested fix
-    device = torch.device('cuda')
-    `#device` = torch.device("cpu")
+    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@egs/librispeech/ASR/zapformer/batched_rubik.py` around lines 562 - 570, The
test _test_batched_rubik currently forces device = torch.device('cuda') which
fails when CUDA is unavailable; update _test_batched_rubik to choose device via
torch.device('cuda' if torch.cuda.is_available() else 'cpu') or accept a device
parameter so tests can run on CPU, and ensure any downstream tensors/models
created in _test_batched_rubik use that device variable (references:
_test_batched_rubik, device).

30-36: 💤 Low value

Clean up commented-out code and hardcoded dtype.

The commented-out import block and hardcoded COMPUTE_DTYPE = torch.bfloat16 reduce maintainability. Consider making COMPUTE_DTYPE configurable or documenting why bfloat16 is required.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@egs/librispeech/ASR/zapformer/batched_rubik.py` around lines 30 - 36, Remove
the dead commented import block and stop hardcoding COMPUTE_DTYPE; instead
attempt to import COMPUTE_DTYPE from nanochat.common (or another config source)
with a safe fallback (e.g., torch.bfloat16 if available, else torch.float32),
and/or expose COMPUTE_DTYPE as a configurable parameter or environment-driven
value so callers can override it; update the module to use the symbol
COMPUTE_DTYPE consistently and add a short inline comment documenting the
default choice and why bfloat16 is chosen.
egs/librispeech/ASR/zapformer/export-onnx.py (2)

462-462: 💤 Low value

Redundant model.to(device) calls.

The model is already moved to device at line 462. The subsequent model.to(device) calls on lines 480, 491, 514, and 532 are redundant.

Suggested fix - remove redundant calls
             logging.info(f"averaging {filenames}")
-            model.to(device)
             model.load_state_dict(average_checkpoints(filenames, device=device))

Apply similar removal for lines 491, 514, and 532.

Also applies to: 480-480, 491-491, 514-514, 532-532

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@egs/librispeech/ASR/zapformer/export-onnx.py` at line 462, Remove the
redundant repeated model.to(device) calls: keep a single model.to(device) where
the model is initially moved (the existing call at line 462) and delete the
subsequent duplicate calls that reapply model.to(device) later in the script
(the extra invocations referencing model.to(device) around lines 480, 491, 514,
and 532); ensure only the initial model.to(device) remains so the model is
placed on device once before export logic that uses model.

314-333: ⚡ Quick win

enable_onnx_checker won’t break CI: PyTorch 1.13.1 supports it

requirements-ci.txt pins torch==1.13.1+cpu, and in that version torch.onnx.export accepts enable_onnx_checker (it controls whether the ONNX model checker runs during export). The “TypeError on newer PyTorch” risk depends on your runtime torch version, so if forward-compatibility is desired, retry the export without enable_onnx_checker on TypeError instead of removing it unconditionally.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@egs/librispeech/ASR/zapformer/export-onnx.py` around lines 314 - 333, The
export block calls torch.onnx.export with enable_onnx_checker set, which may
raise TypeError on some torch versions; update the export logic around the
encoder export (the torch.onnx.export call for encoder_model producing
encoder_filename using inputs (x, x_lens) and outputs
"encoder_out"/"encoder_out_lens") to catch TypeError specifically and retry the
export without the enable_onnx_checker kwarg, while preserving the existing
logging/exception behavior for other errors; ensure you only remove
enable_onnx_checker on the retry and re-raise other exceptions as currently
done.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@egs/librispeech/ASR/zapformer/batched_rubik.py`:
- Around line 193-211: The function matrix_shape builds cumprod incorrectly as
element copies instead of cumulative products, causing wrong (rows, cols); fix
matrix_shape by computing cumulative products for cumprod (e.g., cumprod[i] =
product of shape[0..i]) while maintaining numel as total product, then compute
diffs = [abs(cp - numel // cp) for cp in cumprod] and return (best_cp, numel //
best_cp); update references to cumprod, numel, diffs in matrix_shape accordingly
and keep the existing assert for fallback.

---

Nitpick comments:
In `@egs/librispeech/ASR/zapformer/batched_rubik.py`:
- Around line 562-570: The test _test_batched_rubik currently forces device =
torch.device('cuda') which fails when CUDA is unavailable; update
_test_batched_rubik to choose device via torch.device('cuda' if
torch.cuda.is_available() else 'cpu') or accept a device parameter so tests can
run on CPU, and ensure any downstream tensors/models created in
_test_batched_rubik use that device variable (references: _test_batched_rubik,
device).
- Around line 30-36: Remove the dead commented import block and stop hardcoding
COMPUTE_DTYPE; instead attempt to import COMPUTE_DTYPE from nanochat.common (or
another config source) with a safe fallback (e.g., torch.bfloat16 if available,
else torch.float32), and/or expose COMPUTE_DTYPE as a configurable parameter or
environment-driven value so callers can override it; update the module to use
the symbol COMPUTE_DTYPE consistently and add a short inline comment documenting
the default choice and why bfloat16 is chosen.

In `@egs/librispeech/ASR/zapformer/export-onnx.py`:
- Line 462: Remove the redundant repeated model.to(device) calls: keep a single
model.to(device) where the model is initially moved (the existing call at line
462) and delete the subsequent duplicate calls that reapply model.to(device)
later in the script (the extra invocations referencing model.to(device) around
lines 480, 491, 514, and 532); ensure only the initial model.to(device) remains
so the model is placed on device once before export logic that uses model.
- Around line 314-333: The export block calls torch.onnx.export with
enable_onnx_checker set, which may raise TypeError on some torch versions;
update the export logic around the encoder export (the torch.onnx.export call
for encoder_model producing encoder_filename using inputs (x, x_lens) and
outputs "encoder_out"/"encoder_out_lens") to catch TypeError specifically and
retry the export without the enable_onnx_checker kwarg, while preserving the
existing logging/exception behavior for other errors; ensure you only remove
enable_onnx_checker on the retry and re-raise other exceptions as currently
done.

In `@egs/librispeech/ASR/zapformer/jit_pretrained.py`:
- Line 159: Remove the redundant no-op assignment "current_encoder_out =
current_encoder_out" in the code path that uses the variable current_encoder_out
(in jit_pretrained.py) — simply delete that line so the code relies on the
existing current_encoder_out value without the unnecessary self-assignment;
ensure no other logic depends on that line before committing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1c0cfe2a-d520-4285-9860-df3fe0260b41

📥 Commits

Reviewing files that changed from the base of the PR and between cae74c4 and ee43592.

📒 Files selected for processing (15)
  • egs/librispeech/ASR/RESULTS.md
  • egs/librispeech/ASR/zapformer/batched_rubik.py
  • egs/librispeech/ASR/zapformer/export-onnx.py
  • egs/librispeech/ASR/zapformer/export.py
  • egs/librispeech/ASR/zapformer/jit_pretrained.py
  • egs/librispeech/ASR/zapformer/model.py
  • egs/librispeech/ASR/zapformer/onnx_check.py
  • egs/librispeech/ASR/zapformer/onnx_pretrained.py
  • egs/librispeech/ASR/zapformer/pretrained.py
  • egs/librispeech/ASR/zapformer/rubik.py
  • egs/librispeech/ASR/zapformer/subsampling.py
  • egs/librispeech/ASR/zapformer/train.py
  • egs/librispeech/ASR/zapformer/zapformer.py
  • egs/librispeech/ASR/zapformer/zapformer_modules.py
  • egs/librispeech/ASR/zapformer/zapformer_utils.py
💤 Files with no reviewable changes (2)
  • egs/librispeech/ASR/zapformer/model.py
  • egs/librispeech/ASR/zapformer/rubik.py
✅ Files skipped from review due to trivial changes (1)
  • egs/librispeech/ASR/RESULTS.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • egs/librispeech/ASR/zapformer/pretrained.py
  • egs/librispeech/ASR/zapformer/subsampling.py
  • egs/librispeech/ASR/zapformer/train.py
  • egs/librispeech/ASR/zapformer/zapformer.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants