espnet-style attn_output_weight scaling and extra after-norm layer #204
The scaling is just so that, assuming the input variance is about 1, the variance going into the softmax is about 1.
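A quick way to see this: if the query and key entries are independent with zero mean and unit variance, each of the d_k products q_i * k_i has variance 1, so the dot product q·k has variance d_k, and multiplying by 1/sqrt(d_k) brings it back to about 1. The sketch below checks this numerically in plain Python. The helper name `dot_product_variance` and the Gaussian inputs are illustrative assumptions, not code from snowfall.

```python
import math
import random
import statistics


def dot_product_variance(d_k, num_samples=5000, scale=True, seed=0):
    """Empirical variance of q.k (optionally times 1/sqrt(d_k)) for q, k with i.i.d. N(0, 1) entries."""
    rng = random.Random(seed)
    factor = 1.0 / math.sqrt(d_k) if scale else 1.0
    scores = []
    for _ in range(num_samples):
        q = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
        k = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
        scores.append(factor * sum(qi * ki for qi, ki in zip(q, k)))
    return statistics.pvariance(scores)


for d_k in (16, 64):
    unscaled = dot_product_variance(d_k, scale=False)
    scaled = dot_product_variance(d_k)
    # Unscaled variance grows roughly like d_k; scaled variance stays near 1.
    print(f"d_k={d_k}: unscaled {unscaled:.1f}, scaled {scaled:.2f}")
```

The unscaled column should come out close to d_k and the scaled column close to 1, which keeps the softmax from saturating as the head dimension grows.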

Maybe the WER difference will be larger at worse WERs, e.g. before LM rescoring.

.. don't you have the test-other results?
Results before rescoring and on "test_other" will be posted soon (they are being re-tested).
The relative WER decrease does not seem to differ significantly before and after LM rescoring.
still better though.. good..
On Wednesday, June 2, 2021, LIyong.Guo wrote:
WER (%):

| avg epoch 16-20 | no rescore, test-clean | no rescore, test-other | 4-gram lattice rescore, test-clean | 4-gram lattice rescore, test-other |
|---|---|---|---|---|
| before | 4.33 | 8.96 | 3.87 | 8.08 |
| current | 4.26 | 8.61 | 3.77 | 7.86 |
| relative decrease | 1.62% | 3.91% | 2.58% | 2.72% |

| avg epoch 26-30 | no rescore, test-clean | no rescore, test-other | 4-gram lattice rescore, test-clean | 4-gram lattice rescore, test-other |
|---|---|---|---|---|
| before | 4.31 | 8.98 | 3.86 | 8.07 |
| current | 4.14 | 8.41 | 3.69 | 7.68 |
| relative decrease | 3.94% | 6.35% | 4.40% | 4.83% |
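The "relative decrease" rows are (before - current) / before, expressed as a percentage. A small helper to reproduce them from the avg epoch 26-30 numbers (the helper name `relative_decrease` is ours, not from the PR):

```python
def relative_decrease(before, current):
    """Relative WER reduction in percent: (before - current) / before * 100."""
    return (before - current) / before * 100.0


# WERs (%) for avg epoch 26-30: (before, current)
results = {
    "no rescore, test-clean": (4.31, 4.14),
    "no rescore, test-other": (8.98, 8.41),
    "4-gram lattice rescore, test-clean": (3.86, 3.69),
    "4-gram lattice rescore, test-other": (8.07, 7.68),
}

for name, (before, current) in results.items():
    print(f"{name}: {relative_decrease(before, current):.2f}%")
```

Rounded to two decimals this gives 3.94%, 6.35%, 4.40% and 4.83%, matching the table.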

Can you make this an option passed in from the user code, like in your other branch, so that we can

.. I'm just concerned it might be disruptive to make this change as-is.
To stay compatible with previously trained models, we could add an optional config, e.g. is_espnet_structure (or a more fitting name), which defaults to False.
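A minimal sketch of that idea, using dependency-free stand-ins: `ToyEncoder` and `layer_norm` are hypothetical (a real model would use a `torch.nn.LayerNorm` module with learned parameters); only the flag name `is_espnet_structure` comes from this PR. With the default of False the model keeps its original structure, so checkpoints trained before the change, which have no parameters for an extra layer, should still load.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


class ToyEncoder:
    """Hypothetical encoder: a stack of layers plus an optional final after-norm."""

    def __init__(self, layers, is_espnet_structure=False):
        self.layers = layers
        # Default False: no extra layer, i.e. the original (pre-PR) structure.
        self.after_norm = layer_norm if is_espnet_structure else None

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        if self.after_norm is not None:
            x = self.after_norm(x)
        return x


def double(x):
    """Stand-in for an encoder layer."""
    return [2.0 * v for v in x]


old_model = ToyEncoder([double])
new_model = ToyEncoder([double], is_espnet_structure=True)
print(old_model([1.0, 2.0, 3.0]))  # [2.0, 4.0, 6.0]
print(new_model([1.0, 2.0, 3.0]))  # normalized: zero mean, unit variance
```

Only callers that opt in (as the decoding script does with `is_espnet_structure=True`) get the extra normalization.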

Yes.
  num_decoder_layers=num_decoder_layers,
- vgg_frontend=True)
+ vgg_frontend=True,
+ is_espnet_structure=True)
Should have this in the training script too.
.. and it's better to change the directory name when you change the model structure.
You can remove a couple of the older components of the name to keep it from getting too long.
added.
- -noam-mmi-att-musan-sa-vgg
+ -mmi-att-sa-vgg-normlayer

Thanks a lot!
The Conformer structure differences were identified by loading an ESPnet-trained model into snowfall (#201).
With these two modifications and 30 epochs of training, the final result is slightly better than without them (3.69 vs. 3.86 as reported in #154).
Could you help verify whether they are actually effective (the gain might just be training variance)? @zhu-han @pzelasko
BTW, is there any mathematical background that explains when to apply the scaling during the attn_output_weights computation? I read several papers but couldn't find a clue about this.
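One point that bears on *where* the scaling goes: with a plain dot product, scaling q by 1/sqrt(d_k) before the matmul and scaling the score afterwards give identical results. Once a learned bias is added to the query (e.g. the `pos_bias_u`/`pos_bias_v` terms in Transformer-XL-style relative-position attention), the two orders differ, because only post-hoc scaling also scales the bias term. The single-head sketch below uses hypothetical helper names and assumes "ESPnet-style" means dividing the summed score by sqrt(d_k) after the bias is added.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def score_scale_after(q, u, k):
    """Add the bias u to q first, then scale the whole score by 1/sqrt(d_k)."""
    q_u = [qi + ui for qi, ui in zip(q, u)]
    return dot(q_u, k) / math.sqrt(len(q))


def score_scale_q_first(q, u, k):
    """Scale q by 1/sqrt(d_k) first, then add the (unscaled) bias u."""
    s = 1.0 / math.sqrt(len(q))
    q_u = [qi * s + ui for qi, ui in zip(q, u)]
    return dot(q_u, k)


q = [1.0, 2.0, 3.0, 4.0]
k = [1.0, 0.0, 1.0, 0.0]
no_bias = [0.0] * 4
bias = [0.5] * 4

print(score_scale_after(q, no_bias, k), score_scale_q_first(q, no_bias, k))  # 2.0 2.0
print(score_scale_after(q, bias, k), score_scale_q_first(q, bias, k))  # 2.5 3.0
```

The gap equals (u·k)(1 - 1/sqrt(d_k)), so the bias contribution is effectively sqrt(d_k) times larger relative to the content term when q is scaled first. A learned bias can in principle absorb that factor, which may be one reason the difference in WER is small.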
Rescoring WITH 4-gram LM lattice rescoring: [images: results with the modifications of this PR; 4-gram lattice rescoring results from #154]
Rescoring WITHOUT 4-gram LM lattice rescoring: [images: results with the modifications of this PR; results from #154]