support ivector training in pytorch model #3969
Conversation
Thanks a lot for reviewing!
…On Tue, Mar 3, 2020 at 9:47 AM Fangjun Kuang wrote:
In egs/aishell/s10/local/run_ivector_common.sh
<#3969 (comment)>:
> + ${temp_data_root}/${train_set}_sp_hires_max2 \
+ exp/nnet3${nnet3_affix}/extractor $ivectordir
+
+fi
+
+if [[ $stage -le 8 ]]; then
+ # Also extract iVectors for the test data, but in this case we don't need the speed
+ # perturbation (sp) or small-segment concatenation (comb).
+ for data in dev test; do
+ steps/online/nnet2/extract_ivectors_online.sh --cmd "$train_cmd" --nj 10 \
+ data/${data}_hires exp/nnet3${nnet3_affix}/extractor \
+ exp/nnet3${nnet3_affix}/ivectors_${data}_hires
+ done
+fi
+
+exit 0;
This file should end with a newline.
Guys, I just want to mention something...
I think it would be better if we shifted (not necessarily right now) to
exposing the Kaldi egs as a DataLoader instead of as a Dataset. That way
we could use the existing command-line tools for things like shuffling and
time-shifting, and it would be much more efficient for I/O.
The idea is that the dataloader would, on every epoch, create a suitable
command line and read from it as a pipe.
If it were a distributed data-loader, probably the easiest way to do it
would be to make sure there is an appropriately split scp file and give
each worker the appropriate one. We could use the scripts in
#3765
to generate the scp files. I want to merge this soon; one option is to
merge into the pybind11 branch first to test it.
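The epoch-wise pipe-reading idea above could be sketched roughly as follows. This is a minimal sketch, not the PR's actual code: `PipeEgsLoader`, its `make_cmd` callback, and the line-based `parse` hook are all hypothetical, and in practice the command would be a Kaldi pipeline (something shuffling egs with a per-epoch `--srand`) producing binary archives rather than text lines.

```python
import subprocess


class PipeEgsLoader:
    """Sketch of a loader that, on every epoch, builds a command line
    and reads the command's stdout as a pipe.

    make_cmd: hypothetical callback mapping an epoch number to a shell
        command string, e.g. a Kaldi pipeline whose shuffle seed
        depends on the epoch.
    parse: turns one output line into a training example; a real Kaldi
        reader would parse binary archives instead of text lines.
    """

    def __init__(self, make_cmd, parse=lambda line: line):
        self.make_cmd = make_cmd
        self.parse = parse
        self.epoch = 0

    def set_epoch(self, epoch):
        # Changing the epoch lets make_cmd reshuffle with a new seed.
        self.epoch = epoch

    def __iter__(self):
        cmd = self.make_cmd(self.epoch)
        # shell=True so the command can itself be an "a | b" pipeline.
        proc = subprocess.Popen(cmd, shell=True,
                                stdout=subprocess.PIPE, text=True)
        try:
            for line in proc.stdout:
                yield self.parse(line.rstrip('\n'))
        finally:
            proc.stdout.close()
            proc.wait()


# Toy usage: 'seq' stands in for the real Kaldi pipeline, so each
# epoch re-runs a (different) command.
loader = PipeEgsLoader(lambda epoch: f'seq {epoch + 3}', parse=int)
loader.set_epoch(1)
examples = list(loader)  # re-runs the command for this epoch
```

A distributed variant would only change `make_cmd`: each rank picks its own pre-split scp file (as generated by the #3765 scripts) when constructing the command.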
…On Tue, Mar 3, 2020 at 10:11 AM fanlu wrote:
In egs/aishell/s10/chain/feat_dataset.py
<#3969 (comment)>:
>
  with open(feats_scp, 'r') as f:
      for line in f:
          split = line.split()
          assert len(split) == 2
-         items.append(split)
-
- self.items = items
+         uttid, rxfilename = split
OK
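For reference, the parsing pattern under discussion (unpacking each scp line into `uttid` and `rxfilename`) could be written as a standalone helper like the sketch below; the function name is hypothetical and not part of the PR.

```python
def load_feats_scp(feats_scp):
    """Parse a Kaldi feats.scp file into (uttid, rxfilename) pairs.

    Each line maps an utterance id to an rxfilename, e.g.
        BAC009S0002W0122 /some/path/feats.ark:14
    """
    items = []
    with open(feats_scp, 'r') as f:
        for line in f:
            split = line.split()
            assert len(split) == 2, f'bad scp line: {line!r}'
            uttid, rxfilename = split
            items.append((uttid, rxfilename))
    return items
```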
@csukuangfj I have fixed the code with your suggestion. Please have a look.
OK, I'll run it after it's merged.
I'll take a look at that PR and start to do this.
OK, merging.
Great, thanks! Firstly, just doing the merge and figuring out how to use
those newer scripts to prepare the egs would be a great start.
…On Tue, Mar 3, 2020 at 11:19 AM Haowen Qiu wrote:
I'll take a look at that pr and start to do this.
Updated with the latest results.