Zapformer preview#2082
Conversation
…ble2099conv_streaming' into deterministic_invertible2182conv # Conflicts: # egs/librispeech/ASR/zapformer/combined_scheduler.py # egs/librispeech/ASR/zapformer/train.py # egs/librispeech/ASR/zipformer/optim.py # egs/librispeech/ASR/zipformer/zipformer.py
…ove ballast from non-streaming version.
….8 and direct=0.1 to direct=0.15
…range of progress, use 0.95,0.05
This reverts commit 4da937c0f9eef0328f0fca13da836e48a51a5e58.
…_decay_proportion=0.85 to cubic_decay_proportion=0.8, beta1=0.998 to beta1=0.995.
…ce length a little
…nto the main training loop
…vertible2217conv # Conflicts: # egs/librispeech/ASR/zapformer/model.py
|
It seems the --base-lr default is set at an inappropriate value of 0.00065, it should be more like 0.02, although since you are using just one card I'd probably try more like 0.015. |
|
For non streaming, I get indeed much better results: python ./zapformer/decode.py --epoch 30 --avg 3 --exp-dir ./zapformer/exp --max-duration 1000 --decoding-method greedy_search --causal true --chunk-size 32 --left-context-frames 256
greedy_search 3.56 best for dev-clean
greedy_search 9.85 best for dev-other
greedy_search 3.79 best for test-clean
greedy_search 9.82 best for test-otherIf I try with avg 15 to compare, I get similar results (~3.5% - 9.9%) For the streaming script, with only 3 epochs it's even worse than 15: python ./zapformer/streaming_decode.py --epoch 30 --avg 3 --causal 1 --chunk-size 32 --left-context-frames 256 --exp-dir ./zapformer/exp --decoding-method greedy_search --num-decode-streams 1000
greedy_search 31.47 best for dev-clean
greedy_search 30.95 best for test-clean
greedy_search 45.98 best for dev-other
greedy_search 46.06 best for test-otherso there may be something wrong in the streaming decoding... Thanks! |
|
Thanks for reporting your progress! Here are some more updates, including some optimizer changes. |
|
BTW, it's remarkable that it worked at all because that default LR was 30 times too low. If you have only one job you may need to reduce the LR a bit: for example, from the default of 0.02 to 0.015 or 0.013 or something like that. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
egs/librispeech/ASR/zapformer/batched_rubik.py (1)
193-211:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winCritical:
matrix_shapecomputes the wrong (rows, cols) becausecumprodisn’t cumulativeIn
egs/librispeech/ASR/zapformer/batched_rubik.py(lines 193-211), the docstring examplematrix_shape([2, 3, 10]) = (6, 10)doesn’t match the implementation: current code usescumprod = [2, 3, 10]and returns(10, 6).cumprodshould contain cumulative products (e.g.,[2, 6, 60]) derived fromnumel.Proposed fix
def matrix_shape(shape): """ shape is expected to be a torch.Size or a list with at least two dimensions. Returns (rows, cols) such that a tensor of shape `shape` can be reshaped to size (rows, cols), by combining dimensions in a way that minimizes the difference between rows and cols. e.g. matrix_shape([ 2, 3, 10 ]) = (6, 10) """ shape = list(shape) cumprod = [ ] numel = 1 for k in shape: - cumprod.append(k) + numel = numel * k + cumprod.append(numel) - numel = numel * k diffs = [ abs(k - numel // k) for k in cumprod ] min_diff = min(diffs) for i in range(len(shape)): if diffs[i] == min_diff: return cumprod[i], numel // cumprod[i] assert False, shape🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@egs/librispeech/ASR/zapformer/batched_rubik.py` around lines 193 - 211, The function matrix_shape builds cumprod incorrectly as element copies instead of cumulative products, causing wrong (rows, cols); fix matrix_shape by computing cumulative products for cumprod (e.g., cumprod[i] = product of shape[0..i]) while maintaining numel as total product, then compute diffs = [abs(cp - numel // cp) for cp in cumprod] and return (best_cp, numel // best_cp); update references to cumprod, numel, diffs in matrix_shape accordingly and keep the existing assert for fallback.
🧹 Nitpick comments (5)
egs/librispeech/ASR/zapformer/jit_pretrained.py (1)
159-159: 💤 Low valueRemove redundant no-op assignment.
current_encoder_out = current_encoder_outis a no-op and appears to be leftover dead code.Suggested fix
current_encoder_out = packed_encoder_out.data[start:end] - current_encoder_out = current_encoder_out # current_encoder_out's shape: (batch_size, encoder_out_dim)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@egs/librispeech/ASR/zapformer/jit_pretrained.py` at line 159, Remove the redundant no-op assignment "current_encoder_out = current_encoder_out" in the code path that uses the variable current_encoder_out (in jit_pretrained.py) — simply delete that line so the code relies on the existing current_encoder_out value without the unnecessary self-assignment; ensure no other logic depends on that line before committing.egs/librispeech/ASR/zapformer/batched_rubik.py (2)
562-570: 💤 Low valueTest assumes CUDA availability.
The test function hardcodes
device = torch.device('cuda')which will fail on systems without CUDA. Consider adding a fallback or making device configurable.Suggested fix
- device = torch.device('cuda') - `#device` = torch.device("cpu") + device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@egs/librispeech/ASR/zapformer/batched_rubik.py` around lines 562 - 570, The test _test_batched_rubik currently forces device = torch.device('cuda') which fails when CUDA is unavailable; update _test_batched_rubik to choose device via torch.device('cuda' if torch.cuda.is_available() else 'cpu') or accept a device parameter so tests can run on CPU, and ensure any downstream tensors/models created in _test_batched_rubik use that device variable (references: _test_batched_rubik, device).
30-36: 💤 Low valueClean up commented-out code and hardcoded dtype.
The commented-out import block and hardcoded
COMPUTE_DTYPE = torch.bfloat16reduce maintainability. Consider makingCOMPUTE_DTYPEconfigurable or documenting why bfloat16 is required.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@egs/librispeech/ASR/zapformer/batched_rubik.py` around lines 30 - 36, Remove the dead commented import block and stop hardcoding COMPUTE_DTYPE; instead attempt to import COMPUTE_DTYPE from nanochat.common (or another config source) with a safe fallback (e.g., torch.bfloat16 if available, else torch.float32), and/or expose COMPUTE_DTYPE as a configurable parameter or environment-driven value so callers can override it; update the module to use the symbol COMPUTE_DTYPE consistently and add a short inline comment documenting the default choice and why bfloat16 is chosen.egs/librispeech/ASR/zapformer/export-onnx.py (2)
462-462: 💤 Low valueRedundant
model.to(device)calls.The model is already moved to device at line 462. The subsequent
model.to(device)calls on lines 480, 491, 514, and 532 are redundant.Suggested fix - remove redundant calls
logging.info(f"averaging {filenames}") - model.to(device) model.load_state_dict(average_checkpoints(filenames, device=device))Apply similar removal for lines 491, 514, and 532.
Also applies to: 480-480, 491-491, 514-514, 532-532
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@egs/librispeech/ASR/zapformer/export-onnx.py` at line 462, Remove the redundant repeated model.to(device) calls: keep a single model.to(device) where the model is initially moved (the existing call at line 462) and delete the subsequent duplicate calls that reapply model.to(device) later in the script (the extra invocations referencing model.to(device) around lines 480, 491, 514, and 532); ensure only the initial model.to(device) remains so the model is placed on device once before export logic that uses model.
314-333: ⚡ Quick win
enable_onnx_checkerwon’t break CI: PyTorch 1.13.1 supports it
requirements-ci.txtpinstorch==1.13.1+cpu, and in that versiontorch.onnx.exportacceptsenable_onnx_checker(it controls whether the ONNX model checker runs during export). The “TypeError on newer PyTorch” risk depends on your runtime torch version, so if forward-compatibility is desired, retry the export withoutenable_onnx_checkeronTypeErrorinstead of removing it unconditionally.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@egs/librispeech/ASR/zapformer/export-onnx.py` around lines 314 - 333, The export block calls torch.onnx.export with enable_onnx_checker set, which may raise TypeError on some torch versions; update the export logic around the encoder export (the torch.onnx.export call for encoder_model producing encoder_filename using inputs (x, x_lens) and outputs "encoder_out"/"encoder_out_lens") to catch TypeError specifically and retry the export without the enable_onnx_checker kwarg, while preserving the existing logging/exception behavior for other errors; ensure you only remove enable_onnx_checker on the retry and re-raise other exceptions as currently done.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@egs/librispeech/ASR/zapformer/batched_rubik.py`:
- Around line 193-211: The function matrix_shape builds cumprod incorrectly as
element copies instead of cumulative products, causing wrong (rows, cols); fix
matrix_shape by computing cumulative products for cumprod (e.g., cumprod[i] =
product of shape[0..i]) while maintaining numel as total product, then compute
diffs = [abs(cp - numel // cp) for cp in cumprod] and return (best_cp, numel //
best_cp); update references to cumprod, numel, diffs in matrix_shape accordingly
and keep the existing assert for fallback.
---
Nitpick comments:
In `@egs/librispeech/ASR/zapformer/batched_rubik.py`:
- Around line 562-570: The test _test_batched_rubik currently forces device =
torch.device('cuda') which fails when CUDA is unavailable; update
_test_batched_rubik to choose device via torch.device('cuda' if
torch.cuda.is_available() else 'cpu') or accept a device parameter so tests can
run on CPU, and ensure any downstream tensors/models created in
_test_batched_rubik use that device variable (references: _test_batched_rubik,
device).
- Around line 30-36: Remove the dead commented import block and stop hardcoding
COMPUTE_DTYPE; instead attempt to import COMPUTE_DTYPE from nanochat.common (or
another config source) with a safe fallback (e.g., torch.bfloat16 if available,
else torch.float32), and/or expose COMPUTE_DTYPE as a configurable parameter or
environment-driven value so callers can override it; update the module to use
the symbol COMPUTE_DTYPE consistently and add a short inline comment documenting
the default choice and why bfloat16 is chosen.
In `@egs/librispeech/ASR/zapformer/export-onnx.py`:
- Line 462: Remove the redundant repeated model.to(device) calls: keep a single
model.to(device) where the model is initially moved (the existing call at line
462) and delete the subsequent duplicate calls that reapply model.to(device)
later in the script (the extra invocations referencing model.to(device) around
lines 480, 491, 514, and 532); ensure only the initial model.to(device) remains
so the model is placed on device once before export logic that uses model.
- Around line 314-333: The export block calls torch.onnx.export with
enable_onnx_checker set, which may raise TypeError on some torch versions;
update the export logic around the encoder export (the torch.onnx.export call
for encoder_model producing encoder_filename using inputs (x, x_lens) and
outputs "encoder_out"/"encoder_out_lens") to catch TypeError specifically and
retry the export without the enable_onnx_checker kwarg, while preserving the
existing logging/exception behavior for other errors; ensure you only remove
enable_onnx_checker on the retry and re-raise other exceptions as currently
done.
In `@egs/librispeech/ASR/zapformer/jit_pretrained.py`:
- Line 159: Remove the redundant no-op assignment "current_encoder_out =
current_encoder_out" in the code path that uses the variable current_encoder_out
(in jit_pretrained.py) — simply delete that line so the code relies on the
existing current_encoder_out value without the unnecessary self-assignment;
ensure no other logic depends on that line before committing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1c0cfe2a-d520-4285-9860-df3fe0260b41
📒 Files selected for processing (15)
egs/librispeech/ASR/RESULTS.mdegs/librispeech/ASR/zapformer/batched_rubik.pyegs/librispeech/ASR/zapformer/export-onnx.pyegs/librispeech/ASR/zapformer/export.pyegs/librispeech/ASR/zapformer/jit_pretrained.pyegs/librispeech/ASR/zapformer/model.pyegs/librispeech/ASR/zapformer/onnx_check.pyegs/librispeech/ASR/zapformer/onnx_pretrained.pyegs/librispeech/ASR/zapformer/pretrained.pyegs/librispeech/ASR/zapformer/rubik.pyegs/librispeech/ASR/zapformer/subsampling.pyegs/librispeech/ASR/zapformer/train.pyegs/librispeech/ASR/zapformer/zapformer.pyegs/librispeech/ASR/zapformer/zapformer_modules.pyegs/librispeech/ASR/zapformer/zapformer_utils.py
💤 Files with no reviewable changes (2)
- egs/librispeech/ASR/zapformer/model.py
- egs/librispeech/ASR/zapformer/rubik.py
✅ Files skipped from review due to trivial changes (1)
- egs/librispeech/ASR/RESULTS.md
🚧 Files skipped from review as they are similar to previous changes (4)
- egs/librispeech/ASR/zapformer/pretrained.py
- egs/librispeech/ASR/zapformer/subsampling.py
- egs/librispeech/ASR/zapformer/train.py
- egs/librispeech/ASR/zapformer/zapformer.py
…nistic_invertible3243conv
We are working on the writeup but this is in case anyone wants to try the latest version. Also note the --use-giga=True option in train.py and the --giga=True option in decode scripts.
Summary by CodeRabbit
New Features
Documentation
Chores