forked from CGCL-codes/naturalcc
python.log: 2269 lines (2269 loc) · 253 KB
nohup: ignoring input
Using backend: pytorch
[2021-03-22 10:30:46] INFO >> Load arguments in /home/wanyao/yang/naturalcc-dev/run/summarization/neural_transformer/relative/python_wan/python.yml (train.py:302, cli_main())
[2021-03-22 10:30:46] INFO >> {'criterion': 'neural_transformer', 'optimizer': 'torch_adam', 'lr_scheduler': 'fixed', 'tokenizer': None, 'bpe': None, 'common': {'no_progress_bar': 0, 'log_interval': 500, 'log_format': 'simple', 'tensorboard_logdir': '', 'memory_efficient_fp16': 0, 'fp16_no_flatten_grads': 0, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'empty_cache_freq': 0, 'task': 'be_summarization', 'seed': 1, 'cpu': 0, 'fp16': 0, 'fp16_opt_level': '01', 'bf16': 0, 'memory_efficient_bf16': 0, 'server_ip': '', 'server_port': ''}, 'dataset': {'num_workers': 3, 'skip_invalid_size_inputs_valid_test': 1, 'max_tokens': None, 'max_sentences': 64, 'required_batch_size_multiple': 1, 'dataset_impl': 'mmap', 'train_subset': 'train', 'valid_subset': 'valid', 'validate_interval': 1, 'fixed_validation_seed': None, 'disable_validation': 0, 'max_tokens_valid': None, 'max_sentences_valid': 1024, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0}, 'distributed_training': {'distributed_world_size': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': None, 'distributed_port': -1, 'device_id': 0, 'pipeline_model_parallel': 0, 'distributed_no_spawn': 0, 'ddp_backend': 'c10d', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': None, 'find_unused_parameters': 0, 'fast_stat_sync': 0, 'broadcast_buffers': 0, 'global_sync_iter': 50, 'warmup_iterations': 500, 'local_rank': -1, 'block_momentum': 0.875, 'block_lr': 1, 'use_nbm': 0, 'average_sync': 0}, 'task': {'data': '/mnt/wanyao/.ncc/python_wan/summarization/data-mmap', 'source_lang': 'code_tokens', 'target_lang': 'docstring_tokens', 'load_alignments': 0, 'left_pad_source': 0, 'left_pad_target': 0, 'max_source_positions': 400, 'max_target_positions': 32, 'upsample_primary': 1, 'truncate_source': 1, 'truncate_target': 1, 'eval_bleu': 1, 'eval_bleu_detok': 'space', 'eval_bleu_detok_args': None, 
'eval_tokenized_bleu': 0, 'eval_bleu_remove_bpe': None, 'eval_bleu_args': None, 'eval_bleu_print_samples': 0}, 'model': {'arch': 'neural_transformer', 'pooler_dropout': 0.2, 'activation_fn': 'relu', 'dropout': 0.2, 'attention_dropout': 0.2, 'activation_dropout': 0.2, 'relu_dropout': 0.2, 'encoder_positional_embeddings': 1, 'encoder_learned_pos': 1, 'encoder_max_relative_len': 32, 'encoder_embed_path': 0, 'encoder_embed_dim': 512, 'encoder_ffn_embed_dim': 2048, 'encoder_layers': 6, 'encoder_attention_heads': 8, 'encoder_normalize_before': 0, 'encoder_position_encoding_version': 'ncc_learned', 'decoder_position_encoding_version': 'ncc_learned', 'multihead_attention_version': 'ncc', 'decoder_embed_path': '', 'decoder_positional_embeddings': 1, 'decoder_learned_pos': 1, 'decoder_max_relative_len': 0, 'decoder_embed_dim': 512, 'decoder_output_dim': 512, 'decoder_input_dim': 512, 'decoder_ffn_embed_dim': 2048, 'decoder_layers': 6, 'decoder_attention_heads': 8, 'decoder_normalize_before': 0, 'no_decoder_final_norm': 0, 'adaptive_softmax_cutoff': None, 'adaptive_softmax_dropout': 0.2, 'adaptive_softmax_factor': 0.0, 'share_decoder_input_output_embed': 1, 'share_all_embeddings': 0, 'adaptive_input': 0, 'adaptive_input_factor': 0.0, 'adaptive_input_cutoff': None, 'tie_adaptive_weights': 0, 'tie_adaptive_proj': 0, 'no_cross_attention': 0, 'cross_self_attention': 0, 'layer_wise_attention': 0, 'encoder_layerdrop': 0.0, 'decoder_layerdrop': 0.0, 'encoder_layers_to_keep': None, 'decoder_layers_to_keep': None, 'layernorm_embedding': 0, 'no_scale_embedding': 1, 'encoder_dropout_in': 0.2, 'encoder_dropout_out': 0.2, 'decoder_dropout_in': 0.2, 'decoder_dropout_out': 0.2, 'max_source_positions': 400, 'max_target_positions': 32}, 'optimization': {'max_epoch': 200, 'max_update': 0, 'clip_norm': 5, 'update_freq': [1], 'lrs': [0.0001], 'min_lr': -1, 'use_bmuf': 1, 'force_anneal': 0, 'warmup_updates': 0, 'end_learning_rate': 0.0, 'power': 1.0, 'total_num_update': 1000000, 'sentence_avg': 
1, 'adam': {'adam_betas': '(0.9, 0.999)', 'adam_eps': 1e-08, 'weight_decay': 0.0, 'use_old_adam': 1}, 'sgd': {'momentum': 0, 'weight_decay': 0, 'dampening': 0, 'nesterov': 0}, 'lr_shrink': 0.99}, 'checkpoint': {'restore_file': 'checkpoint_last.pt', 'reset_dataloader': None, 'reset_lr_scheduler': None, 'reset_meters': None, 'reset_optimizer': None, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': 0, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': 0, 'no_epoch_checkpoints': 1, 'no_last_checkpoints': 1, 'no_save_optimizer_state': None, 'best_checkpoint_metric': 'bleu', 'maximize_best_checkpoint_metric': 1, 'patience': -1, 'save_dir': '/mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints', 'should_continue': 0, 'model_name_or_path': None, 'cache_dir': None, 'logging_steps': 500, 'save_steps': 2000, 'save_total_limit': 2, 'overwrite_output_dir': 0, 'overwrite_cache': 0}, 'eval': {'path': '/mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt', 'remove_bpe': None, 'quiet': 0, 'results_path': None, 'model_overrides': '{}', 'max_sentences': 512, 'beam': 1, 'nbest': 1, 'max_len_a': 0, 'max_len_b': 30, 'min_len': 1, 'match_source_len': 0, 'no_early_stop': 1, 'unnormalized': 0, 'no_beamable_mm': 0, 'lenpen': 1, 'unkpen': 0, 'replace_unk': None, 'sacrebleu': 0, 'score_reference': 0, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': 0, 'sampling_topk': -1, 'sampling_topp': -1, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': 0, 'print_step': 0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': 0, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': 0, 'retain_iter_history': 0, 'decoding_format': None, 'nltk_bleu': 1, 'rouge': 1}} (train.py:304, cli_main())
[2021-03-22 10:30:46] INFO >> single GPU training... (train.py:333, cli_main())
[2021-03-22 10:30:46] INFO >> [code_tokens] dictionary: 50000 types (be_summarization.py:137, setup_task())
[2021-03-22 10:30:46] INFO >> [docstring_tokens] dictionary: 30000 types (be_summarization.py:138, setup_task())
[2021-03-22 10:30:46] INFO >> truncate valid.code_tokens to 400 (be_summarization.py:72, load_langpair_dataset())
[2021-03-22 10:30:46] INFO >> truncate valid.docstring_tokens to 30 (be_summarization.py:80, load_langpair_dataset())
[2021-03-22 10:30:46] INFO >> loaded 18505 examples from: /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/valid.code_tokens (be_summarization.py:89, load_langpair_dataset())
[2021-03-22 10:30:46] INFO >> loaded 18505 examples from: /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/valid.docstring_tokens (be_summarization.py:90, load_langpair_dataset())
[2021-03-22 10:30:47] INFO >> NeuralTransformerModel(
(encoder): NeuralTransformerEncoder(
(embed_tokens): Embedding(50000, 512, padding_idx=0)
(embed_positions): LearnedPositionalEmbedding(400, 512)
(layers): ModuleList(
(0): NeuralTransformerEncoderLayer(
(self_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
(relative_position_keys): Embedding(65, 64)
(relative_position_values): Embedding(65, 64)
)
(self_attn_layer_norm): LayerNorm()
(fc1): Linear(in_features=512, out_features=2048, bias=True)
(fc2): Linear(in_features=2048, out_features=512, bias=True)
(ff_layer_norm): LayerNorm()
)
(1): NeuralTransformerEncoderLayer(
(self_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
(relative_position_keys): Embedding(65, 64)
(relative_position_values): Embedding(65, 64)
)
(self_attn_layer_norm): LayerNorm()
(fc1): Linear(in_features=512, out_features=2048, bias=True)
(fc2): Linear(in_features=2048, out_features=512, bias=True)
(ff_layer_norm): LayerNorm()
)
(2): NeuralTransformerEncoderLayer(
(self_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
(relative_position_keys): Embedding(65, 64)
(relative_position_values): Embedding(65, 64)
)
(self_attn_layer_norm): LayerNorm()
(fc1): Linear(in_features=512, out_features=2048, bias=True)
(fc2): Linear(in_features=2048, out_features=512, bias=True)
(ff_layer_norm): LayerNorm()
)
(3): NeuralTransformerEncoderLayer(
(self_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
(relative_position_keys): Embedding(65, 64)
(relative_position_values): Embedding(65, 64)
)
(self_attn_layer_norm): LayerNorm()
(fc1): Linear(in_features=512, out_features=2048, bias=True)
(fc2): Linear(in_features=2048, out_features=512, bias=True)
(ff_layer_norm): LayerNorm()
)
(4): NeuralTransformerEncoderLayer(
(self_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
(relative_position_keys): Embedding(65, 64)
(relative_position_values): Embedding(65, 64)
)
(self_attn_layer_norm): LayerNorm()
(fc1): Linear(in_features=512, out_features=2048, bias=True)
(fc2): Linear(in_features=2048, out_features=512, bias=True)
(ff_layer_norm): LayerNorm()
)
(5): NeuralTransformerEncoderLayer(
(self_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
(relative_position_keys): Embedding(65, 64)
(relative_position_values): Embedding(65, 64)
)
(self_attn_layer_norm): LayerNorm()
(fc1): Linear(in_features=512, out_features=2048, bias=True)
(fc2): Linear(in_features=2048, out_features=512, bias=True)
(ff_layer_norm): LayerNorm()
)
)
)
(decoder): NeuralTransformerDecoder(
(embed_tokens): Embedding(30000, 512, padding_idx=0)
(embed_positions): LearnedPositionalEmbedding(32, 512)
(layers): ModuleList(
(0): NueralTransformerDecoderLayer(
(self_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(self_attn_layer_norm): LayerNorm()
(encoder_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(encoder_attn_layer_norm): LayerNorm()
(fc1): Linear(in_features=512, out_features=2048, bias=True)
(fc2): Linear(in_features=2048, out_features=512, bias=True)
(ff_layer_norm): LayerNorm()
)
(1): NueralTransformerDecoderLayer(
(self_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(self_attn_layer_norm): LayerNorm()
(encoder_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(encoder_attn_layer_norm): LayerNorm()
(fc1): Linear(in_features=512, out_features=2048, bias=True)
(fc2): Linear(in_features=2048, out_features=512, bias=True)
(ff_layer_norm): LayerNorm()
)
(2): NueralTransformerDecoderLayer(
(self_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(self_attn_layer_norm): LayerNorm()
(encoder_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(encoder_attn_layer_norm): LayerNorm()
(fc1): Linear(in_features=512, out_features=2048, bias=True)
(fc2): Linear(in_features=2048, out_features=512, bias=True)
(ff_layer_norm): LayerNorm()
)
(3): NueralTransformerDecoderLayer(
(self_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(self_attn_layer_norm): LayerNorm()
(encoder_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(encoder_attn_layer_norm): LayerNorm()
(fc1): Linear(in_features=512, out_features=2048, bias=True)
(fc2): Linear(in_features=2048, out_features=512, bias=True)
(ff_layer_norm): LayerNorm()
)
(4): NueralTransformerDecoderLayer(
(self_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(self_attn_layer_norm): LayerNorm()
(encoder_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(encoder_attn_layer_norm): LayerNorm()
(fc1): Linear(in_features=512, out_features=2048, bias=True)
(fc2): Linear(in_features=2048, out_features=512, bias=True)
(ff_layer_norm): LayerNorm()
)
(5): NueralTransformerDecoderLayer(
(self_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(self_attn_layer_norm): LayerNorm()
(encoder_attn): RelativeMultiheadAttention(
(k_proj): Linear(in_features=512, out_features=512, bias=True)
(v_proj): Linear(in_features=512, out_features=512, bias=True)
(q_proj): Linear(in_features=512, out_features=512, bias=True)
(out_proj): Linear(in_features=512, out_features=512, bias=True)
)
(encoder_attn_layer_norm): LayerNorm()
(fc1): Linear(in_features=512, out_features=2048, bias=True)
(fc2): Linear(in_features=2048, out_features=512, bias=True)
(ff_layer_norm): LayerNorm()
)
)
)
) (train.py:221, single_main())
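Note on the encoder's `Embedding(65, 64)` tables: the sizes are consistent with `encoder_max_relative_len: 32` (2 * 32 + 1 = 65 distinct clipped offsets) and a per-head dimension of 512 / 8 = 64. The sketch below shows one common way such a lookup index is built, by clipping the offset j - i and shifting it to be non-negative. The function name `relative_position_index` is hypothetical, and the log does not show how `RelativeMultiheadAttention` actually computes its indices, so treat this as an illustration under that assumption.

```python
def relative_position_index(length, max_relative_len):
    """Return index[i][j] = clip(j - i, -k, k) + k, a value in [0, 2k]."""
    k = max_relative_len
    return [
        [max(-k, min(k, j - i)) + k for j in range(length)]
        for i in range(length)
    ]


# Values taken from the logged config: max_source_positions=400,
# encoder_max_relative_len=32, encoder_embed_dim=512, encoder_attention_heads=8.
index = relative_position_index(length=400, max_relative_len=32)
num_embeddings = 2 * 32 + 1  # rows needed in each relative-position table
head_dim = 512 // 8          # embedding width if tables are per-head
```

Under this scheme every pair of positions more than 32 tokens apart shares the same embedding row, which is why the table size does not grow with the 400-token source limit.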
[2021-03-22 10:30:47] INFO >> model neural_transformer, criterion NeuralTransformerCriterion (train.py:222, single_main())
[2021-03-22 10:30:48] INFO >> num. model params: 85399600 (num. trained: 85399600) (train.py:223, single_main())
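Note on the parameter count: the logged total can be checked against the module printout with plain arithmetic. The tally below assumes each `LayerNorm()` holds a weight and a bias of size 512 (the printout does not show their shapes) and that the output projection reuses the decoder embedding, as `share_decoder_input_output_embed: 1` suggests. Under those assumptions the printed modules account for all but 30000 parameters. That remainder equals the docstring vocabulary size, which would be consistent with a vocabulary-sized output bias, though the log does not confirm this.

```python
def linear(n_in, n_out):
    """Parameters of a Linear layer with bias."""
    return n_in * n_out + n_out


D, FFN = 512, 2048            # embed_dim and ffn_embed_dim from the config
REL_ROWS, HEAD_DIM = 65, 64   # Embedding(65, 64) in the encoder self-attention

layer_norm = 2 * D            # assumption: weight + bias per LayerNorm
attention = 4 * linear(D, D)  # k_proj, v_proj, q_proj, out_proj
feed_forward = linear(D, FFN) + linear(FFN, D)

encoder_layer = attention + 2 * REL_ROWS * HEAD_DIM + 2 * layer_norm + feed_forward
decoder_layer = 2 * attention + 3 * layer_norm + feed_forward

encoder = 50000 * D + 400 * D + 6 * encoder_layer
decoder = 30000 * D + 32 * D + 6 * decoder_layer

printed_total = encoder + decoder
logged_total = 85399600       # "num. model params" from the log
unexplained = logged_total - printed_total
print(printed_total, unexplained)
```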
[2021-03-22 10:30:53] INFO >> training on 1 GPUs (train.py:230, single_main())
[2021-03-22 10:30:53] INFO >> max tokens per GPU = None and max sentences per GPU = 64 (train.py:231, single_main())
[2021-03-22 10:30:53] INFO >> no existing checkpoint found /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_last.pt (ncc_trainer.py:269, load_checkpoint())
[2021-03-22 10:30:53] INFO >> loading train data for epoch 1 (ncc_trainer.py:283, get_train_iterator())
[2021-03-22 10:30:53] INFO >> truncate train.code_tokens to 400 (be_summarization.py:72, load_langpair_dataset())
[2021-03-22 10:30:53] INFO >> truncate train.docstring_tokens to 30 (be_summarization.py:80, load_langpair_dataset())
[2021-03-22 10:30:53] INFO >> loaded 55538 examples from: /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/train.code_tokens (be_summarization.py:89, load_langpair_dataset())
[2021-03-22 10:30:53] INFO >> loaded 55538 examples from: /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/train.docstring_tokens (be_summarization.py:90, load_langpair_dataset())
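Note on the "/ 868" denominator in the progress lines that follow: it matches the number of batches needed to cover 55538 training examples at `max_sentences: 64`. The check below assumes that no examples are filtered out and that each batch produces one update (`update_freq: [1]` in the config).

```python
import math

train_examples = 55538   # "loaded 55538 examples" from the log
max_sentences = 64       # batch size from the config

updates_per_epoch = math.ceil(train_examples / max_sentences)
updates_after_10_epochs = 10 * updates_per_epoch
print(updates_per_epoch, updates_after_10_epochs)
```

The second value agrees with `num_updates 8680` reported at the end of epoch 10.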
[2021-03-22 10:30:53] INFO >> NOTE: your device may support faster training with fp16 (ncc_trainer.py:154, _setup_optimizer())
/home/wanyao/yang/naturalcc-dev/ncc/utils/utils.py:574: UserWarning: amp_C fused kernels unavailable, disabling multi_tensor_l2norm; you may get better performance by installing NVIDIA's apex library
warnings.warn(
[2021-03-22 10:33:18] INFO >> epoch 001: 500 / 868 loss=394.487, nll_loss=37.936, bleu=0, ppl=2.62903e+11, wps=2413.3, ups=3.63, wpb=665.2, bsz=64, num_updates=500, lr=0.0001, gnorm=615.135, clip=100, train_wall=136, wall=145 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 10:34:59] INFO >> epoch 001 | loss 327.69 | nll_loss 31.453 | bleu 0 | ppl 2.94031e+09 | wps 2420.3 | ups 3.63 | wpb 666.6 | bsz 64 | num_updates 868 | lr 0.0001 | gnorm 511.731 | clip 100 | train_wall 236 | wall 246 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 10:36:44] INFO >> epoch 001 | valid on 'valid' subset | loss 181.012 | nll_loss 17.342 | bleu 9.61818 | ppl 166186 | wps 1955.2 | wpb 10165.6 | bsz 973.9 | num_updates 868 (progress_bar.py:269, print())
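Note on the `ppl` column: the logged values are consistent with perplexity computed as 2 raised to `nll_loss`, meaning `nll_loss` is reported in bits. This relationship is inferred from the numbers in this log rather than from the training code. The sketch below compares a few logged validation pairs; small differences are expected because `nll_loss` is printed with only three decimals.

```python
def perplexity(nll_loss_bits):
    """Perplexity for a per-token negative log-likelihood given in bits."""
    return 2.0 ** nll_loss_bits


# (nll_loss, ppl) pairs copied from the validation lines for epochs 1 to 3.
logged = [(17.342, 166186.0), (12.149, 4542.49), (9.802, 892.56)]
rel_errors = [abs(perplexity(nll) - ppl) / ppl for nll, ppl in logged]
print(rel_errors)
```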
[2021-03-22 10:36:46] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 1 @ 868 updates, score 9.618184) (writing took 2.843599 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 10:37:29] INFO >> epoch 002: 132 / 868 loss=225.97, nll_loss=21.67, bleu=0, ppl=3.33585e+06, wps=1327.7, ups=1.99, wpb=667.4, bsz=64, num_updates=1000, lr=9.9e-05, gnorm=359.402, clip=100, train_wall=136, wall=396 (progress_bar.py:260, log())
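Note on the `lr` column: the drop from 0.0001 to 9.9e-05 at the start of epoch 2, and the values logged for later epochs, match the initial rate multiplied by `lr_shrink: 0.99` once per completed epoch. The formula below is inferred from the config and the logged values. The scheduler code itself is not part of this log, so the exact rule is an assumption.

```python
initial_lr = 1e-4   # 'lrs': [0.0001]
lr_shrink = 0.99    # 'lr_shrink': 0.99


def lr_for_epoch(epoch):
    """Assumed schedule: shrink once for every epoch already completed."""
    return initial_lr * lr_shrink ** (epoch - 1)


# Learning rates in units of 1e-05, rounded to one decimal as in the log.
rounded = [round(lr_for_epoch(epoch) * 1e5, 1) for epoch in range(1, 11)]
print(rounded)
```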
[2021-03-22 10:39:47] INFO >> epoch 002: 632 / 868 loss=168.945, nll_loss=16.199, bleu=0, ppl=75245.1, wps=2427.9, ups=3.64, wpb=667.5, bsz=64, num_updates=1500, lr=9.9e-05, gnorm=306.585, clip=100, train_wall=136, wall=534 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 10:40:52] INFO >> epoch 002 | loss 166.761 | nll_loss 16.007 | bleu 0 | ppl 65834.8 | wps 1637.7 | ups 2.46 | wpb 666.6 | bsz 64 | num_updates 1736 | lr 9.9e-05 | gnorm 304.354 | clip 100 | train_wall 236 | wall 600 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 10:42:23] INFO >> epoch 002 | valid on 'valid' subset | loss 126.808 | nll_loss 12.149 | bleu 14.827 | ppl 4542.49 | wps 2284.6 | wpb 10165.6 | bsz 973.9 | num_updates 1736 | best_bleu 14.827 (progress_bar.py:269, print())
[2021-03-22 10:42:50] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 2 @ 1736 updates, score 14.826974) (writing took 26.741436 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 10:44:11] INFO >> epoch 003: 264 / 868 loss=139.538, nll_loss=13.412, bleu=0, ppl=10898.3, wps=1260.2, ups=1.89, wpb=665.6, bsz=64, num_updates=2000, lr=9.8e-05, gnorm=273.68, clip=100, train_wall=138, wall=798 (progress_bar.py:260, log())
[2021-03-22 10:46:28] INFO >> epoch 003: 764 / 868 loss=122.857, nll_loss=11.773, bleu=0, ppl=3499.28, wps=2436.1, ups=3.65, wpb=667.6, bsz=64, num_updates=2500, lr=9.8e-05, gnorm=248.625, clip=100, train_wall=135, wall=935 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 10:46:57] INFO >> epoch 003 | loss 125.275 | nll_loss 12.025 | bleu 0 | ppl 4166.24 | wps 1587 | ups 2.38 | wpb 666.6 | bsz 64 | num_updates 2604 | lr 9.8e-05 | gnorm 250.568 | clip 100 | train_wall 237 | wall 964 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 10:48:45] INFO >> epoch 003 | valid on 'valid' subset | loss 102.306 | nll_loss 9.802 | bleu 11.0808 | ppl 892.56 | wps 1884.1 | wpb 10165.6 | bsz 973.9 | num_updates 2604 | best_bleu 14.827 (progress_bar.py:269, print())
[2021-03-22 10:50:43] INFO >> epoch 004: 396 / 868 loss=110.891, nll_loss=10.642, bleu=0, ppl=1598.34, wps=1307.4, ups=1.96, wpb=666.9, bsz=64, num_updates=3000, lr=9.7e-05, gnorm=226.288, clip=100, train_wall=139, wall=1190 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 10:52:52] INFO >> epoch 004 | loss 105.694 | nll_loss 10.145 | bleu 0 | ppl 1132.33 | wps 1631.2 | ups 2.45 | wpb 666.6 | bsz 64 | num_updates 3472 | lr 9.7e-05 | gnorm 216.924 | clip 100 | train_wall 237 | wall 1319 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 10:54:29] INFO >> epoch 004 | valid on 'valid' subset | loss 92.668 | nll_loss 8.878 | bleu 14.6408 | ppl 470.61 | wps 2132.7 | wpb 10165.6 | bsz 973.9 | num_updates 3472 | best_bleu 14.827 (progress_bar.py:269, print())
[2021-03-22 10:54:43] INFO >> epoch 005: 28 / 868 loss=102.145, nll_loss=9.802, bleu=0, ppl=892.7, wps=1389.6, ups=2.08, wpb=666.6, bsz=64, num_updates=3500, lr=9.6e-05, gnorm=209.805, clip=100, train_wall=135, wall=1430 (progress_bar.py:260, log())
[2021-03-22 10:57:00] INFO >> epoch 005: 528 / 868 loss=95.312, nll_loss=9.153, bleu=0, ppl=569.23, wps=2424.5, ups=3.64, wpb=666.2, bsz=64, num_updates=4000, lr=9.6e-05, gnorm=194.673, clip=100, train_wall=136, wall=1567 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 10:58:34] INFO >> epoch 005 | loss 93.832 | nll_loss 9.006 | bleu 0 | ppl 514.29 | wps 1688.7 | ups 2.53 | wpb 666.6 | bsz 64 | num_updates 4340 | lr 9.6e-05 | gnorm 192.144 | clip 100 | train_wall 236 | wall 1661 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 11:00:16] INFO >> epoch 005 | valid on 'valid' subset | loss 86.033 | nll_loss 8.243 | bleu 13.345 | ppl 302.91 | wps 2027.4 | wpb 10165.6 | bsz 973.9 | num_updates 4340 | best_bleu 14.827 (progress_bar.py:269, print())
[2021-03-22 11:01:06] INFO >> epoch 006: 160 / 868 loss=90.224, nll_loss=8.666, bleu=0, ppl=406.27, wps=1352.7, ups=2.03, wpb=666, bsz=64, num_updates=4500, lr=9.5e-05, gnorm=186.46, clip=100, train_wall=136, wall=1813 (progress_bar.py:260, log())
[2021-03-22 11:03:24] INFO >> epoch 006: 660 / 868 loss=85.759, nll_loss=8.233, bleu=0, ppl=300.97, wps=2415.8, ups=3.62, wpb=666.6, bsz=64, num_updates=5000, lr=9.5e-05, gnorm=171.258, clip=100, train_wall=136, wall=1951 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 11:04:23] INFO >> epoch 006 | loss 85.578 | nll_loss 8.214 | bleu 0 | ppl 296.98 | wps 1658.6 | ups 2.49 | wpb 666.6 | bsz 64 | num_updates 5208 | lr 9.5e-05 | gnorm 171.223 | clip 100 | train_wall 237 | wall 2010 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 11:06:03] INFO >> epoch 006 | valid on 'valid' subset | loss 81.219 | nll_loss 7.781 | bleu 14.6896 | ppl 220.02 | wps 2054.2 | wpb 10165.6 | bsz 973.9 | num_updates 5208 | best_bleu 14.827 (progress_bar.py:269, print())
[2021-03-22 11:07:29] INFO >> epoch 007: 292 / 868 loss=81.288, nll_loss=7.803, bleu=0, ppl=223.31, wps=1359.4, ups=2.04, wpb=666.7, bsz=64, num_updates=5500, lr=9.4e-05, gnorm=158.53, clip=100, train_wall=137, wall=2197 (progress_bar.py:260, log())
[2021-03-22 11:09:48] INFO >> epoch 007: 792 / 868 loss=78.757, nll_loss=7.551, bleu=0, ppl=187.49, wps=2407.7, ups=3.61, wpb=667.3, bsz=64, num_updates=6000, lr=9.4e-05, gnorm=149.138, clip=100, train_wall=137, wall=2335 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 11:10:11] INFO >> epoch 007 | loss 78.939 | nll_loss 7.577 | bleu 0 | ppl 190.94 | wps 1665.9 | ups 2.5 | wpb 666.6 | bsz 64 | num_updates 6076 | lr 9.4e-05 | gnorm 150.883 | clip 100 | train_wall 238 | wall 2358 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 11:11:45] INFO >> epoch 007 | valid on 'valid' subset | loss 77.463 | nll_loss 7.422 | bleu 15.2367 | ppl 171.45 | wps 2174.8 | wpb 10165.6 | bsz 973.9 | num_updates 6076 | best_bleu 15.2367 (progress_bar.py:269, print())
[2021-03-22 11:12:12] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 7 @ 6076 updates, score 15.236682) (writing took 26.899306 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 11:14:16] INFO >> epoch 008: 424 / 868 loss=74.738, nll_loss=7.16, bleu=0, ppl=143.04, wps=1247.2, ups=1.87, wpb=667.7, bsz=64, num_updates=6500, lr=9.3e-05, gnorm=137.295, clip=100, train_wall=137, wall=2603 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 11:16:17] INFO >> epoch 008 | loss 73.226 | nll_loss 7.029 | bleu 0 | ppl 130.56 | wps 1577.2 | ups 2.37 | wpb 666.6 | bsz 64 | num_updates 6944 | lr 9.3e-05 | gnorm 131.44 | clip 100 | train_wall 235 | wall 2725 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 11:17:50] INFO >> epoch 008 | valid on 'valid' subset | loss 73.69 | nll_loss 7.06 | bleu 16.7192 | ppl 133.45 | wps 2233 | wpb 10165.6 | bsz 973.9 | num_updates 6944 | best_bleu 16.7192 (progress_bar.py:269, print())
[2021-03-22 11:18:17] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 8 @ 6944 updates, score 16.719207) (writing took 26.829999 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 11:18:39] INFO >> epoch 009: 56 / 868 loss=71.751, nll_loss=6.904, bleu=0, ppl=119.78, wps=1264.9, ups=1.9, wpb=665.1, bsz=64, num_updates=7000, lr=9.2e-05, gnorm=125.54, clip=100, train_wall=135, wall=2866 (progress_bar.py:260, log())
[2021-03-22 11:20:56] INFO >> epoch 009: 556 / 868 loss=68.739, nll_loss=6.591, bleu=0, ppl=96.43, wps=2430.3, ups=3.64, wpb=667.1, bsz=64, num_updates=7500, lr=9.2e-05, gnorm=114.327, clip=100, train_wall=135, wall=3003 (progress_bar.py:260, log())
[2021-03-22 11:22:23] INFO >> epoch 009 | loss 68.222 | nll_loss 6.548 | bleu 0 | ppl 93.59 | wps 1584.3 | ups 2.38 | wpb 666.6 | bsz 64 | num_updates 7812 | lr 9.2e-05 | gnorm 111.834 | clip 100 | train_wall 236 | wall 3090 (progress_bar.py:269, print())
[2021-03-22 11:24:02] INFO >> epoch 009 | valid on 'valid' subset | loss 71.461 | nll_loss 6.847 | bleu 16.0783 | ppl 115.08 | wps 2066.2 | wpb 10165.6 | bsz 973.9 | num_updates 7812 | best_bleu 16.7192 (progress_bar.py:269, print())
[2021-03-22 11:25:01] INFO >> epoch 010: 188 / 868 loss=66.553, nll_loss=6.391, bleu=0, ppl=83.9, wps=1359.1, ups=2.04, wpb=666.5, bsz=64, num_updates=8000, lr=9.1e-05, gnorm=105.2, clip=100, train_wall=137, wall=3248 (progress_bar.py:260, log())
[2021-03-22 11:27:22] INFO >> epoch 010: 688 / 868 loss=64.251, nll_loss=6.168, bleu=0, ppl=71.91, wps=2355.6, ups=3.53, wpb=666.4, bsz=64, num_updates=8500, lr=9.1e-05, gnorm=97.537, clip=100, train_wall=140, wall=3390 (progress_bar.py:260, log())
[2021-03-22 11:28:12] INFO >> epoch 010 | loss 64.443 | nll_loss 6.186 | bleu 0 | ppl 72.78 | wps 1654.7 | ups 2.48 | wpb 666.6 | bsz 64 | num_updates 8680 | lr 9.1e-05 | gnorm 97.665 | clip 100 | train_wall 240 | wall 3439 (progress_bar.py:269, print())
[2021-03-22 11:29:51] INFO >> epoch 010 | valid on 'valid' subset | loss 69.329 | nll_loss 6.642 | bleu 16.922 | ppl 99.89 | wps 2086.7 | wpb 10165.6 | bsz 973.9 | num_updates 8680 | best_bleu 16.922 (progress_bar.py:269, print())
[2021-03-22 11:30:30] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 10 @ 8680 updates, score 16.921956) (writing took 38.756878 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 11:32:03] INFO >> epoch 011: 320 / 868 loss=62.584, nll_loss=6.005, bleu=0, ppl=64.23, wps=1189.8, ups=1.78, wpb=666.7, bsz=64, num_updates=9000, lr=9e-05, gnorm=92.05, clip=100, train_wall=134, wall=3670 (progress_bar.py:260, log())
[2021-03-22 11:34:19] INFO >> epoch 011: 820 / 868 loss=61.423, nll_loss=5.907, bleu=0, ppl=60, wps=2432.4, ups=3.66, wpb=665.5, bsz=64, num_updates=9500, lr=9e-05, gnorm=88.26, clip=100, train_wall=135, wall=3807 (progress_bar.py:260, log())
[2021-03-22 11:34:33] INFO >> epoch 011 | loss 61.605 | nll_loss 5.913 | bleu 0 | ppl 60.26 | wps 1518.2 | ups 2.28 | wpb 666.6 | bsz 64 | num_updates 9548 | lr 9e-05 | gnorm 89.454 | clip 100 | train_wall 234 | wall 3821 (progress_bar.py:269, print())
[2021-03-22 11:36:04] INFO >> epoch 011 | valid on 'valid' subset | loss 68.418 | nll_loss 6.555 | bleu 18.0218 | ppl 94.03 | wps 2271.8 | wpb 10165.6 | bsz 973.9 | num_updates 9548 | best_bleu 18.0218 (progress_bar.py:269, print())
[2021-03-22 11:36:31] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 11 @ 9548 updates, score 18.021818) (writing took 26.804324 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 11:38:43] INFO >> epoch 012: 452 / 868 loss=59.256, nll_loss=5.69, bleu=0, ppl=51.64, wps=1262.2, ups=1.89, wpb=666.2, bsz=64, num_updates=10000, lr=9e-05, gnorm=85.906, clip=100, train_wall=137, wall=4070 (progress_bar.py:260, log())
[2021-03-22 11:40:38] INFO >> epoch 012 | loss 59.238 | nll_loss 5.686 | bleu 0 | ppl 51.48 | wps 1586.6 | ups 2.38 | wpb 666.6 | bsz 64 | num_updates 10416 | lr 9e-05 | gnorm 84.697 | clip 100 | train_wall 237 | wall 4185 (progress_bar.py:269, print())
[2021-03-22 11:42:11] INFO >> epoch 012 | valid on 'valid' subset | loss 67.78 | nll_loss 6.494 | bleu 18.415 | ppl 90.13 | wps 2222.6 | wpb 10165.6 | bsz 973.9 | num_updates 10416 | best_bleu 18.415 (progress_bar.py:269, print())
[2021-03-22 11:42:38] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 12 @ 10416 updates, score 18.415004) (writing took 26.866578 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 11:43:08] INFO >> epoch 013: 84 / 868 loss=59.111, nll_loss=5.662, bleu=0, ppl=50.63, wps=1260.9, ups=1.89, wpb=668.2, bsz=64, num_updates=10500, lr=8.9e-05, gnorm=83.404, clip=100, train_wall=136, wall=4335 (progress_bar.py:260, log())
[2021-03-22 11:45:24] INFO >> epoch 013: 584 / 868 loss=56.987, nll_loss=5.473, bleu=0, ppl=44.42, wps=2450.4, ups=3.68, wpb=666.1, bsz=64, num_updates=11000, lr=8.9e-05, gnorm=81.487, clip=100, train_wall=134, wall=4471 (progress_bar.py:260, log())
[2021-03-22 11:46:43] INFO >> epoch 013 | loss 57.077 | nll_loss 5.479 | bleu 0 | ppl 44.59 | wps 1587.2 | ups 2.38 | wpb 666.6 | bsz 64 | num_updates 11284 | lr 8.9e-05 | gnorm 80.723 | clip 100 | train_wall 234 | wall 4550 (progress_bar.py:269, print())
[2021-03-22 11:48:18] INFO >> epoch 013 | valid on 'valid' subset | loss 66.999 | nll_loss 6.419 | bleu 18.5463 | ppl 85.57 | wps 2151 | wpb 10165.6 | bsz 973.9 | num_updates 11284 | best_bleu 18.5463 (progress_bar.py:269, print())
[2021-03-22 11:48:54] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 13 @ 11284 updates, score 18.546316) (writing took 35.850216 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 11:50:02] INFO >> epoch 014: 216 / 868 loss=56.119, nll_loss=5.384, bleu=0, ppl=41.77, wps=1200.5, ups=1.8, wpb=667.1, bsz=64, num_updates=11500, lr=8.8e-05, gnorm=78.713, clip=100, train_wall=138, wall=4749 (progress_bar.py:260, log())
[2021-03-22 11:52:19] INFO >> epoch 014: 716 / 868 loss=55.23, nll_loss=5.296, bleu=0, ppl=39.28, wps=2431.5, ups=3.64, wpb=667.2, bsz=64, num_updates=12000, lr=8.8e-05, gnorm=78.223, clip=100, train_wall=135, wall=4886 (progress_bar.py:260, log())
[2021-03-22 11:53:02] INFO >> epoch 014 | loss 55.096 | nll_loss 5.288 | bleu 0 | ppl 39.08 | wps 1527.2 | ups 2.29 | wpb 666.6 | bsz 64 | num_updates 12152 | lr 8.8e-05 | gnorm 77.994 | clip 100 | train_wall 237 | wall 4929 (progress_bar.py:269, print())
[2021-03-22 11:54:34] INFO >> epoch 014 | valid on 'valid' subset | loss 67.058 | nll_loss 6.425 | bleu 19.4293 | ppl 85.91 | wps 2252.2 | wpb 10165.6 | bsz 973.9 | num_updates 12152 | best_bleu 19.4293 (progress_bar.py:269, print())
[2021-03-22 11:55:00] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 14 @ 12152 updates, score 19.42935) (writing took 26.840527 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 11:56:43] INFO >> epoch 015: 348 / 868 loss=53.764, nll_loss=5.167, bleu=0, ppl=35.92, wps=1263.5, ups=1.9, wpb=666, bsz=64, num_updates=12500, lr=8.7e-05, gnorm=76.722, clip=100, train_wall=136, wall=5150 (progress_bar.py:260, log())
[2021-03-22 11:59:02] INFO >> epoch 015: 848 / 868 loss=53.355, nll_loss=5.123, bleu=0, ppl=34.86, wps=2399.8, ups=3.6, wpb=666.2, bsz=64, num_updates=13000, lr=8.7e-05, gnorm=75.881, clip=100, train_wall=137, wall=5289 (progress_bar.py:260, log())
[2021-03-22 11:59:08] INFO >> epoch 015 | loss 53.25 | nll_loss 5.111 | bleu 0 | ppl 34.56 | wps 1580 | ups 2.37 | wpb 666.6 | bsz 64 | num_updates 13020 | lr 8.7e-05 | gnorm 76.275 | clip 100 | train_wall 237 | wall 5295 (progress_bar.py:269, print())
[2021-03-22 12:00:40] INFO >> epoch 015 | valid on 'valid' subset | loss 66.282 | nll_loss 6.35 | bleu 19.5357 | ppl 81.59 | wps 2258.3 | wpb 10165.6 | bsz 973.9 | num_updates 13020 | best_bleu 19.5357 (progress_bar.py:269, print())
[2021-03-22 12:01:13] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 15 @ 13020 updates, score 19.535703) (writing took 33.248723 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 12:03:31] INFO >> epoch 016: 480 / 868 loss=51.187, nll_loss=4.923, bleu=0, ppl=30.33, wps=1235, ups=1.86, wpb=665.2, bsz=64, num_updates=13500, lr=8.6e-05, gnorm=74.615, clip=100, train_wall=135, wall=5558 (progress_bar.py:260, log())
[2021-03-22 12:05:20] INFO >> epoch 016 | loss 51.449 | nll_loss 4.938 | bleu 0 | ppl 30.66 | wps 1553.7 | ups 2.33 | wpb 666.6 | bsz 64 | num_updates 13888 | lr 8.6e-05 | gnorm 74.198 | clip 100 | train_wall 237 | wall 5667 (progress_bar.py:269, print())
[2021-03-22 12:06:54] INFO >> epoch 016 | valid on 'valid' subset | loss 66.115 | nll_loss 6.334 | bleu 20.2127 | ppl 80.69 | wps 2221.5 | wpb 10165.6 | bsz 973.9 | num_updates 13888 | best_bleu 20.2127 (progress_bar.py:269, print())
[2021-03-22 12:07:20] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 16 @ 13888 updates, score 20.21269) (writing took 26.896777 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 12:07:58] INFO >> epoch 017: 112 / 868 loss=51.325, nll_loss=4.912, bleu=0, ppl=30.11, wps=1250.7, ups=1.87, wpb=668.7, bsz=64, num_updates=14000, lr=8.5e-05, gnorm=73.801, clip=100, train_wall=138, wall=5825 (progress_bar.py:260, log())
[2021-03-22 12:10:14] INFO >> epoch 017: 612 / 868 loss=49.663, nll_loss=4.761, bleu=0, ppl=27.11, wps=2451.3, ups=3.67, wpb=667.4, bsz=64, num_updates=14500, lr=8.5e-05, gnorm=72.732, clip=100, train_wall=134, wall=5962 (progress_bar.py:260, log())
[2021-03-22 12:11:27] INFO >> epoch 017 | loss 49.772 | nll_loss 4.777 | bleu 0 | ppl 27.42 | wps 1576.3 | ups 2.36 | wpb 666.6 | bsz 64 | num_updates 14756 | lr 8.5e-05 | gnorm 72.853 | clip 100 | train_wall 237 | wall 6034 (progress_bar.py:269, print())
[2021-03-22 12:13:04] INFO >> epoch 017 | valid on 'valid' subset | loss 66.032 | nll_loss 6.326 | bleu 20.3689 | ppl 80.25 | wps 2145.7 | wpb 10165.6 | bsz 973.9 | num_updates 14756 | best_bleu 20.3689 (progress_bar.py:269, print())
[2021-03-22 12:13:53] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 17 @ 14756 updates, score 20.368872) (writing took 49.082980 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 12:15:07] INFO >> epoch 018: 244 / 868 loss=48.914, nll_loss=4.706, bleu=0, ppl=26.1, wps=1135.9, ups=1.71, wpb=665.2, bsz=64, num_updates=15000, lr=8.4e-05, gnorm=72.346, clip=100, train_wall=138, wall=6254 (progress_bar.py:260, log())
[2021-03-22 12:17:28] INFO >> epoch 018: 744 / 868 loss=48.4, nll_loss=4.642, bleu=0, ppl=24.97, wps=2371, ups=3.55, wpb=667, bsz=64, num_updates=15500, lr=8.4e-05, gnorm=71.855, clip=100, train_wall=139, wall=6395 (progress_bar.py:260, log())
[2021-03-22 12:18:03] INFO >> epoch 018 | loss 48.117 | nll_loss 4.618 | bleu 0 | ppl 24.56 | wps 1463.2 | ups 2.19 | wpb 666.6 | bsz 64 | num_updates 15624 | lr 8.4e-05 | gnorm 71.861 | clip 100 | train_wall 240 | wall 6430 (progress_bar.py:269, print())
[2021-03-22 12:19:35] INFO >> epoch 018 | valid on 'valid' subset | loss 65.946 | nll_loss 6.318 | bleu 20.8986 | ppl 79.79 | wps 2236.1 | wpb 10165.6 | bsz 973.9 | num_updates 15624 | best_bleu 20.8986 (progress_bar.py:269, print())
[2021-03-22 12:20:03] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 18 @ 15624 updates, score 20.898626) (writing took 27.478516 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 12:21:54] INFO >> epoch 019: 376 / 868 loss=46.475, nll_loss=4.47, bleu=0, ppl=22.17, wps=1249.9, ups=1.88, wpb=665.1, bsz=64, num_updates=16000, lr=8.3e-05, gnorm=71.685, clip=100, train_wall=137, wall=6661 (progress_bar.py:260, log())
[2021-03-22 12:24:12] INFO >> epoch 019 | loss 46.576 | nll_loss 4.471 | bleu 0 | ppl 22.17 | wps 1567.2 | ups 2.35 | wpb 666.6 | bsz 64 | num_updates 16492 | lr 8.3e-05 | gnorm 71.327 | clip 100 | train_wall 239 | wall 6799 (progress_bar.py:269, print())
[2021-03-22 12:25:46] INFO >> epoch 019 | valid on 'valid' subset | loss 66.167 | nll_loss 6.339 | bleu 21.3355 | ppl 80.97 | wps 2198.4 | wpb 10165.6 | bsz 973.9 | num_updates 16492 | best_bleu 21.3355 (progress_bar.py:269, print())
[2021-03-22 12:26:13] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 19 @ 16492 updates, score 21.335502) (writing took 27.214389 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 12:26:22] INFO >> epoch 020: 8 / 868 loss=47.003, nll_loss=4.506, bleu=0, ppl=22.72, wps=1247.3, ups=1.87, wpb=667.6, bsz=64, num_updates=16500, lr=8.3e-05, gnorm=70.985, clip=100, train_wall=138, wall=6929 (progress_bar.py:260, log())
[2021-03-22 12:28:38] INFO >> epoch 020: 508 / 868 loss=44.579, nll_loss=4.284, bleu=0, ppl=19.48, wps=2443.7, ups=3.67, wpb=666, bsz=64, num_updates=17000, lr=8.3e-05, gnorm=70.75, clip=100, train_wall=134, wall=7065 (progress_bar.py:260, log())
[2021-03-22 12:30:20] INFO >> epoch 020 | loss 45.083 | nll_loss 4.327 | bleu 0 | ppl 20.07 | wps 1570.3 | ups 2.36 | wpb 666.6 | bsz 64 | num_updates 17360 | lr 8.3e-05 | gnorm 70.524 | clip 100 | train_wall 237 | wall 7167 (progress_bar.py:269, print())
[2021-03-22 12:31:53] INFO >> epoch 020 | valid on 'valid' subset | loss 66.35 | nll_loss 6.357 | bleu 21.668 | ppl 81.96 | wps 2221.6 | wpb 10165.6 | bsz 973.9 | num_updates 17360 | best_bleu 21.668 (progress_bar.py:269, print())
[2021-03-22 12:32:20] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 20 @ 17360 updates, score 21.668022) (writing took 26.966966 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 12:33:04] INFO >> epoch 021: 140 / 868 loss=44.929, nll_loss=4.307, bleu=0, ppl=19.79, wps=1253.1, ups=1.88, wpb=667.1, bsz=63.9, num_updates=17500, lr=8.2e-05, gnorm=70.153, clip=100, train_wall=137, wall=7331 (progress_bar.py:260, log())
[2021-03-22 12:35:23] INFO >> epoch 021: 640 / 868 loss=43.607, nll_loss=4.183, bleu=0, ppl=18.16, wps=2407.4, ups=3.61, wpb=667.2, bsz=64, num_updates=18000, lr=8.2e-05, gnorm=69.759, clip=100, train_wall=137, wall=7470 (progress_bar.py:260, log())
[2021-03-22 12:36:26] INFO >> epoch 021 | loss 43.601 | nll_loss 4.185 | bleu 0 | ppl 18.19 | wps 1583.3 | ups 2.38 | wpb 666.6 | bsz 64 | num_updates 18228 | lr 8.2e-05 | gnorm 69.797 | clip 100 | train_wall 235 | wall 7533 (progress_bar.py:269, print())
[2021-03-22 12:37:58] INFO >> epoch 021 | valid on 'valid' subset | loss 65.834 | nll_loss 6.307 | bleu 22.107 | ppl 79.2 | wps 2235.5 | wpb 10165.6 | bsz 973.9 | num_updates 18228 | best_bleu 22.107 (progress_bar.py:269, print())
[2021-03-22 12:38:25] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 21 @ 18228 updates, score 22.106985) (writing took 27.333132 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 12:39:47] INFO >> epoch 022: 272 / 868 loss=42.912, nll_loss=4.11, bleu=0, ppl=17.27, wps=1264.9, ups=1.89, wpb=668.2, bsz=64, num_updates=18500, lr=8.1e-05, gnorm=69.911, clip=100, train_wall=136, wall=7734 (progress_bar.py:260, log())
[2021-03-22 12:42:06] INFO >> epoch 022: 772 / 868 loss=42.364, nll_loss=4.072, bleu=0, ppl=16.82, wps=2392.5, ups=3.59, wpb=665.6, bsz=64, num_updates=19000, lr=8.1e-05, gnorm=69.604, clip=100, train_wall=137, wall=7873 (progress_bar.py:260, log())
[2021-03-22 12:42:34] INFO >> epoch 022 | loss 42.196 | nll_loss 4.05 | bleu 0 | ppl 16.57 | wps 1572.6 | ups 2.36 | wpb 666.6 | bsz 64 | num_updates 19096 | lr 8.1e-05 | gnorm 69.661 | clip 100 | train_wall 238 | wall 7901 (progress_bar.py:269, print())
[2021-03-22 12:44:07] INFO >> epoch 022 | valid on 'valid' subset | loss 66.251 | nll_loss 6.347 | bleu 22.38 | ppl 81.42 | wps 2222.2 | wpb 10165.6 | bsz 973.9 | num_updates 19096 | best_bleu 22.38 (progress_bar.py:269, print())
[2021-03-22 12:44:34] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 22 @ 19096 updates, score 22.379962) (writing took 26.902600 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 12:46:30] INFO >> epoch 023: 404 / 868 loss=40.725, nll_loss=3.912, bleu=0, ppl=15.05, wps=1260.6, ups=1.89, wpb=666, bsz=64, num_updates=19500, lr=8e-05, gnorm=69.667, clip=100, train_wall=136, wall=8137 (progress_bar.py:260, log())
[2021-03-22 12:48:40] INFO >> epoch 023 | loss 40.844 | nll_loss 3.92 | bleu 0 | ppl 15.14 | wps 1577.5 | ups 2.37 | wpb 666.6 | bsz 64 | num_updates 19964 | lr 8e-05 | gnorm 69.442 | clip 100 | train_wall 237 | wall 8268 (progress_bar.py:269, print())
[2021-03-22 12:50:14] INFO >> epoch 023 | valid on 'valid' subset | loss 66.497 | nll_loss 6.371 | bleu 22.6359 | ppl 82.77 | wps 2222.3 | wpb 10165.6 | bsz 973.9 | num_updates 19964 | best_bleu 22.6359 (progress_bar.py:269, print())
[2021-03-22 12:50:45] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 23 @ 19964 updates, score 22.635918) (writing took 31.975021 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 12:51:02] INFO >> epoch 024: 36 / 868 loss=41.055, nll_loss=3.944, bleu=0, ppl=15.4, wps=1224.8, ups=1.84, wpb=666.1, bsz=64, num_updates=20000, lr=7.9e-05, gnorm=69.005, clip=100, train_wall=138, wall=8409 (progress_bar.py:260, log())
[2021-03-22 12:53:19] INFO >> epoch 024: 536 / 868 loss=39.186, nll_loss=3.76, bleu=0, ppl=13.55, wps=2424.8, ups=3.64, wpb=666.6, bsz=64, num_updates=20500, lr=7.9e-05, gnorm=69.445, clip=100, train_wall=136, wall=8546 (progress_bar.py:260, log())
[2021-03-22 12:54:53] INFO >> epoch 024 | loss 39.523 | nll_loss 3.794 | bleu 0 | ppl 13.87 | wps 1553 | ups 2.33 | wpb 666.6 | bsz 64 | num_updates 20832 | lr 7.9e-05 | gnorm 69.324 | clip 100 | train_wall 237 | wall 8640 (progress_bar.py:269, print())
[2021-03-22 12:56:27] INFO >> epoch 024 | valid on 'valid' subset | loss 67.136 | nll_loss 6.432 | bleu 22.958 | ppl 86.35 | wps 2213.4 | wpb 10165.6 | bsz 973.9 | num_updates 20832 | best_bleu 22.958 (progress_bar.py:269, print())
[2021-03-22 12:56:54] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 24 @ 20832 updates, score 22.958019) (writing took 27.322317 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 12:57:47] INFO >> epoch 025: 168 / 868 loss=39.258, nll_loss=3.763, bleu=0, ppl=13.58, wps=1248.7, ups=1.87, wpb=667.7, bsz=64, num_updates=21000, lr=7.9e-05, gnorm=69.385, clip=100, train_wall=138, wall=8814 (progress_bar.py:260, log())
[2021-03-22 13:00:05] INFO >> epoch 025: 668 / 868 loss=38.382, nll_loss=3.68, bleu=0, ppl=12.82, wps=2408, ups=3.61, wpb=667.5, bsz=64, num_updates=21500, lr=7.9e-05, gnorm=69.107, clip=100, train_wall=137, wall=8952 (progress_bar.py:260, log())
[2021-03-22 13:01:01] INFO >> epoch 025 | loss 38.255 | nll_loss 3.672 | bleu 0 | ppl 12.75 | wps 1574.2 | ups 2.36 | wpb 666.6 | bsz 64 | num_updates 21700 | lr 7.9e-05 | gnorm 69.114 | clip 100 | train_wall 236 | wall 9008 (progress_bar.py:269, print())
[2021-03-22 13:02:35] INFO >> epoch 025 | valid on 'valid' subset | loss 66.293 | nll_loss 6.351 | bleu 23.4399 | ppl 81.65 | wps 2183.8 | wpb 10165.6 | bsz 973.9 | num_updates 21700 | best_bleu 23.4399 (progress_bar.py:269, print())
[2021-03-22 13:03:03] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 25 @ 21700 updates, score 23.43993) (writing took 27.457098 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 13:04:32] INFO >> epoch 026: 300 / 868 loss=37.149, nll_loss=3.577, bleu=0, ppl=11.93, wps=1246.8, ups=1.88, wpb=664.4, bsz=64, num_updates=22000, lr=7.8e-05, gnorm=69.059, clip=100, train_wall=136, wall=9219 (progress_bar.py:260, log())
[2021-03-22 13:06:49] INFO >> epoch 026: 800 / 868 loss=37.528, nll_loss=3.604, bleu=0, ppl=12.16, wps=2432.1, ups=3.65, wpb=666.4, bsz=64, num_updates=22500, lr=7.8e-05, gnorm=69.223, clip=100, train_wall=135, wall=9356 (progress_bar.py:260, log())
[2021-03-22 13:07:07] INFO >> epoch 026 | loss 37.109 | nll_loss 3.562 | bleu 0 | ppl 11.81 | wps 1577.7 | ups 2.37 | wpb 666.6 | bsz 64 | num_updates 22568 | lr 7.8e-05 | gnorm 69.27 | clip 100 | train_wall 235 | wall 9375 (progress_bar.py:269, print())
[2021-03-22 13:08:43] INFO >> epoch 026 | valid on 'valid' subset | loss 67.315 | nll_loss 6.449 | bleu 23.7449 | ppl 87.39 | wps 2151.7 | wpb 10165.6 | bsz 973.9 | num_updates 22568 | best_bleu 23.7449 (progress_bar.py:269, print())
[2021-03-22 13:09:33] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 26 @ 22568 updates, score 23.744881) (writing took 49.491113 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 13:11:37] INFO >> epoch 027: 432 / 868 loss=35.682, nll_loss=3.423, bleu=0, ppl=10.73, wps=1156.5, ups=1.73, wpb=666.9, bsz=64, num_updates=23000, lr=7.7e-05, gnorm=69.339, clip=100, train_wall=134, wall=9644 (progress_bar.py:260, log())
[2021-03-22 13:13:37] INFO >> epoch 027 | loss 35.82 | nll_loss 3.438 | bleu 0 | ppl 10.84 | wps 1483.9 | ups 2.23 | wpb 666.6 | bsz 64 | num_updates 23436 | lr 7.7e-05 | gnorm 69.219 | clip 100 | train_wall 235 | wall 9764 (progress_bar.py:269, print())
[2021-03-22 13:15:12] INFO >> epoch 027 | valid on 'valid' subset | loss 67.229 | nll_loss 6.441 | bleu 23.3629 | ppl 86.89 | wps 2178.6 | wpb 10165.6 | bsz 973.9 | num_updates 23436 | best_bleu 23.7449 (progress_bar.py:269, print())
[2021-03-22 13:15:36] INFO >> epoch 028: 64 / 868 loss=35.906, nll_loss=3.446, bleu=0, ppl=10.9, wps=1393.6, ups=2.09, wpb=666.5, bsz=64, num_updates=23500, lr=7.6e-05, gnorm=69.13, clip=100, train_wall=136, wall=9883 (progress_bar.py:260, log())
[2021-03-22 13:17:55] INFO >> epoch 028: 564 / 868 loss=34.399, nll_loss=3.304, bleu=0, ppl=9.87, wps=2404.7, ups=3.61, wpb=666.1, bsz=64, num_updates=24000, lr=7.6e-05, gnorm=69.383, clip=100, train_wall=137, wall=10022 (progress_bar.py:260, log())
[2021-03-22 13:19:19] INFO >> epoch 028 | loss 34.687 | nll_loss 3.329 | bleu 0 | ppl 10.05 | wps 1693.5 | ups 2.54 | wpb 666.6 | bsz 64 | num_updates 24304 | lr 7.6e-05 | gnorm 69.343 | clip 100 | train_wall 237 | wall 10106 (progress_bar.py:269, print())
[2021-03-22 13:20:54] INFO >> epoch 028 | valid on 'valid' subset | loss 68.626 | nll_loss 6.575 | bleu 23.908 | ppl 95.34 | wps 2172.9 | wpb 10165.6 | bsz 973.9 | num_updates 24304 | best_bleu 23.908 (progress_bar.py:269, print())
[2021-03-22 13:21:21] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 28 @ 24304 updates, score 23.907996) (writing took 27.232985 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 13:22:20] INFO >> epoch 029: 196 / 868 loss=34.231, nll_loss=3.284, bleu=0, ppl=9.74, wps=1255, ups=1.88, wpb=666.8, bsz=64, num_updates=24500, lr=7.5e-05, gnorm=69.178, clip=100, train_wall=135, wall=10287 (progress_bar.py:260, log())
[2021-03-22 13:24:39] INFO >> epoch 029: 696 / 868 loss=33.85, nll_loss=3.242, bleu=0, ppl=9.46, wps=2401.2, ups=3.59, wpb=668.2, bsz=64, num_updates=25000, lr=7.5e-05, gnorm=69.581, clip=100, train_wall=137, wall=10427 (progress_bar.py:260, log())
[2021-03-22 13:25:27] INFO >> epoch 029 | loss 33.572 | nll_loss 3.222 | bleu 0 | ppl 9.33 | wps 1571.4 | ups 2.36 | wpb 666.6 | bsz 64 | num_updates 25172 | lr 7.5e-05 | gnorm 69.33 | clip 100 | train_wall 236 | wall 10474 (progress_bar.py:269, print())
[2021-03-22 13:27:02] INFO >> epoch 029 | valid on 'valid' subset | loss 69.099 | nll_loss 6.62 | bleu 24.2288 | ppl 98.38 | wps 2173.3 | wpb 10165.6 | bsz 973.9 | num_updates 25172 | best_bleu 24.2288 (progress_bar.py:269, print())
[2021-03-22 13:27:29] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 29 @ 25172 updates, score 24.228801) (writing took 27.032066 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 13:29:05] INFO >> epoch 030: 328 / 868 loss=32.903, nll_loss=3.155, bleu=0, ppl=8.91, wps=1257.9, ups=1.89, wpb=667.1, bsz=64, num_updates=25500, lr=7.5e-05, gnorm=69.695, clip=100, train_wall=135, wall=10692 (progress_bar.py:260, log())
[2021-03-22 13:31:22] INFO >> epoch 030: 828 / 868 loss=32.586, nll_loss=3.136, bleu=0, ppl=8.79, wps=2415.7, ups=3.63, wpb=665, bsz=64, num_updates=26000, lr=7.5e-05, gnorm=69.435, clip=100, train_wall=136, wall=10829 (progress_bar.py:260, log())
[2021-03-22 13:31:33] INFO >> epoch 030 | loss 32.495 | nll_loss 3.119 | bleu 0 | ppl 8.69 | wps 1580.4 | ups 2.37 | wpb 666.6 | bsz 64 | num_updates 26040 | lr 7.5e-05 | gnorm 69.681 | clip 100 | train_wall 234 | wall 10840 (progress_bar.py:269, print())
[2021-03-22 13:33:09] INFO >> epoch 030 | valid on 'valid' subset | loss 69.495 | nll_loss 6.658 | bleu 24.4606 | ppl 101 | wps 2166.6 | wpb 10165.6 | bsz 973.9 | num_updates 26040 | best_bleu 24.4606 (progress_bar.py:269, print())
[2021-03-22 13:33:36] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 30 @ 26040 updates, score 24.460579) (writing took 27.035607 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 13:35:48] INFO >> epoch 031: 460 / 868 loss=31.167, nll_loss=2.999, bleu=0, ppl=7.99, wps=1249.4, ups=1.88, wpb=665.2, bsz=64, num_updates=26500, lr=7.4e-05, gnorm=69.511, clip=100, train_wall=135, wall=11096 (progress_bar.py:260, log())
[2021-03-22 13:37:41] INFO >> epoch 031 | loss 31.469 | nll_loss 3.021 | bleu 0 | ppl 8.12 | wps 1572.5 | ups 2.36 | wpb 666.6 | bsz 64 | num_updates 26908 | lr 7.4e-05 | gnorm 69.718 | clip 100 | train_wall 236 | wall 11208 (progress_bar.py:269, print())
[2021-03-22 13:39:17] INFO >> epoch 031 | valid on 'valid' subset | loss 68.237 | nll_loss 6.538 | bleu 24.2246 | ppl 92.9 | wps 2155.6 | wpb 10165.6 | bsz 973.9 | num_updates 26908 | best_bleu 24.4606 (progress_bar.py:269, print())
[2021-03-22 13:39:48] INFO >> epoch 032: 92 / 868 loss=31.589, nll_loss=3.027, bleu=0, ppl=8.15, wps=1391.1, ups=2.08, wpb=667.6, bsz=64, num_updates=27000, lr=7.3e-05, gnorm=69.855, clip=100, train_wall=136, wall=11336 (progress_bar.py:260, log())
[2021-03-22 13:42:07] INFO >> epoch 032: 592 / 868 loss=30.188, nll_loss=2.898, bleu=0, ppl=7.45, wps=2405.1, ups=3.61, wpb=666.5, bsz=64, num_updates=27500, lr=7.3e-05, gnorm=69.995, clip=100, train_wall=137, wall=11474 (progress_bar.py:260, log())
[2021-03-22 13:43:25] INFO >> epoch 032 | loss 30.461 | nll_loss 2.924 | bleu 0 | ppl 7.59 | wps 1682.7 | ups 2.52 | wpb 666.6 | bsz 64 | num_updates 27776 | lr 7.3e-05 | gnorm 70.014 | clip 100 | train_wall 238 | wall 11552 (progress_bar.py:269, print())
[2021-03-22 13:44:57] INFO >> epoch 032 | valid on 'valid' subset | loss 70.504 | nll_loss 6.755 | bleu 24.6509 | ppl 108 | wps 2247.5 | wpb 10165.6 | bsz 973.9 | num_updates 27776 | best_bleu 24.6509 (progress_bar.py:269, print())
[2021-03-22 13:45:25] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 32 @ 27776 updates, score 24.6509) (writing took 27.805146 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 13:46:33] INFO >> epoch 033: 224 / 868 loss=30.35, nll_loss=2.901, bleu=0, ppl=7.47, wps=1257.2, ups=1.88, wpb=669.6, bsz=64, num_updates=28000, lr=7.2e-05, gnorm=70.135, clip=100, train_wall=137, wall=11740 (progress_bar.py:260, log())
[2021-03-22 13:48:54] INFO >> epoch 033: 724 / 868 loss=29.404, nll_loss=2.831, bleu=0, ppl=7.12, wps=2363.7, ups=3.56, wpb=664.4, bsz=64, num_updates=28500, lr=7.2e-05, gnorm=70.083, clip=100, train_wall=139, wall=11881 (progress_bar.py:260, log())
[2021-03-22 13:49:34] INFO >> epoch 033 | loss 29.462 | nll_loss 2.828 | bleu 0 | ppl 7.1 | wps 1566.7 | ups 2.35 | wpb 666.6 | bsz 64 | num_updates 28644 | lr 7.2e-05 | gnorm 70.064 | clip 100 | train_wall 239 | wall 11922 (progress_bar.py:269, print())
[2021-03-22 13:51:07] INFO >> epoch 033 | valid on 'valid' subset | loss 70.838 | nll_loss 6.787 | bleu 24.8017 | ppl 110.42 | wps 2246.2 | wpb 10165.6 | bsz 973.9 | num_updates 28644 | best_bleu 24.8017 (progress_bar.py:269, print())
[2021-03-22 13:51:34] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 33 @ 28644 updates, score 24.801742) (writing took 27.460146 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 13:53:17] INFO >> epoch 034: 356 / 868 loss=28.555, nll_loss=2.741, bleu=0, ppl=6.68, wps=1266, ups=1.9, wpb=666.8, bsz=64, num_updates=29000, lr=7.2e-05, gnorm=70.078, clip=100, train_wall=135, wall=12144 (progress_bar.py:260, log())
[2021-03-22 13:55:36] INFO >> epoch 034: 856 / 868 loss=28.98, nll_loss=2.782, bleu=0, ppl=6.88, wps=2394.4, ups=3.59, wpb=666.4, bsz=64, num_updates=29500, lr=7.2e-05, gnorm=70.193, clip=100, train_wall=137, wall=12284 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 13:55:40] INFO >> epoch 034 | loss 28.532 | nll_loss 2.739 | bleu 0 | ppl 6.67 | wps 1581.8 | ups 2.37 | wpb 666.6 | bsz 64 | num_updates 29512 | lr 7.2e-05 | gnorm 70.116 | clip 100 | train_wall 236 | wall 12287 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 13:57:15] INFO >> epoch 034 | valid on 'valid' subset | loss 70.348 | nll_loss 6.74 | bleu 25.2817 | ppl 106.88 | wps 2183.5 | wpb 10165.6 | bsz 973.9 | num_updates 29512 | best_bleu 25.2817 (progress_bar.py:269, print())
[2021-03-22 13:57:43] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 34 @ 29512 updates, score 25.281682) (writing took 27.806838 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 14:00:04] INFO >> epoch 035: 488 / 868 loss=27.367, nll_loss=2.626, bleu=0, ppl=6.17, wps=1246.4, ups=1.87, wpb=666.9, bsz=64, num_updates=30000, lr=7.1e-05, gnorm=70.641, clip=100, train_wall=137, wall=12551 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:01:48] INFO >> epoch 035 | loss 27.6 | nll_loss 2.649 | bleu 0 | ppl 6.27 | wps 1572.8 | ups 2.36 | wpb 666.6 | bsz 64 | num_updates 30380 | lr 7.1e-05 | gnorm 70.46 | clip 100 | train_wall 235 | wall 12655 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:03:20] INFO >> epoch 035 | valid on 'valid' subset | loss 72.343 | nll_loss 6.931 | bleu 25.2594 | ppl 122.03 | wps 2240.9 | wpb 10165.6 | bsz 973.9 | num_updates 30380 | best_bleu 25.2817 (progress_bar.py:269, print())
[2021-03-22 14:04:01] INFO >> epoch 036: 120 / 868 loss=27.362, nll_loss=2.628, bleu=0, ppl=6.18, wps=1407.7, ups=2.11, wpb=666.2, bsz=64, num_updates=30500, lr=7e-05, gnorm=70.209, clip=100, train_wall=136, wall=12788 (progress_bar.py:260, log())
[2021-03-22 14:06:21] INFO >> epoch 036: 620 / 868 loss=26.768, nll_loss=2.566, bleu=0, ppl=5.92, wps=2371.2, ups=3.55, wpb=667.3, bsz=64, num_updates=31000, lr=7e-05, gnorm=70.742, clip=100, train_wall=139, wall=12928 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:07:29] INFO >> epoch 036 | loss 26.72 | nll_loss 2.565 | bleu 0 | ppl 5.92 | wps 1699.8 | ups 2.55 | wpb 666.6 | bsz 64 | num_updates 31248 | lr 7e-05 | gnorm 70.583 | clip 100 | train_wall 238 | wall 12996 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:09:01] INFO >> epoch 036 | valid on 'valid' subset | loss 72.269 | nll_loss 6.924 | bleu 25.4656 | ppl 121.43 | wps 2229.6 | wpb 10165.6 | bsz 973.9 | num_updates 31248 | best_bleu 25.4656 (progress_bar.py:269, print())
[2021-03-22 14:09:29] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 36 @ 31248 updates, score 25.465555) (writing took 27.252723 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 14:10:46] INFO >> epoch 037: 252 / 868 loss=26.064, nll_loss=2.507, bleu=0, ppl=5.68, wps=1255.5, ups=1.89, wpb=665.4, bsz=64, num_updates=31500, lr=7e-05, gnorm=70.506, clip=100, train_wall=137, wall=13193 (progress_bar.py:260, log())
[2021-03-22 14:13:05] INFO >> epoch 037: 752 / 868 loss=26.137, nll_loss=2.506, bleu=0, ppl=5.68, wps=2410.1, ups=3.61, wpb=667.2, bsz=64, num_updates=32000, lr=7e-05, gnorm=70.545, clip=100, train_wall=136, wall=13332 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:13:37] INFO >> epoch 037 | loss 25.866 | nll_loss 2.483 | bleu 0 | ppl 5.59 | wps 1570.9 | ups 2.36 | wpb 666.6 | bsz 64 | num_updates 32116 | lr 7e-05 | gnorm 70.657 | clip 100 | train_wall 238 | wall 13364 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:15:09] INFO >> epoch 037 | valid on 'valid' subset | loss 71.283 | nll_loss 6.829 | bleu 25.4538 | ppl 113.73 | wps 2241.1 | wpb 10165.6 | bsz 973.9 | num_updates 32116 | best_bleu 25.4656 (progress_bar.py:269, print())
[2021-03-22 14:17:01] INFO >> epoch 038: 384 / 868 loss=25.105, nll_loss=2.407, bleu=0, ppl=5.3, wps=1410.7, ups=2.11, wpb=667.6, bsz=64, num_updates=32500, lr=6.9e-05, gnorm=70.9, clip=100, train_wall=136, wall=13568 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:19:16] INFO >> epoch 038 | loss 25.014 | nll_loss 2.401 | bleu 0 | ppl 5.28 | wps 1704 | ups 2.56 | wpb 666.6 | bsz 64 | num_updates 32984 | lr 6.9e-05 | gnorm 70.808 | clip 100 | train_wall 237 | wall 13704 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:20:48] INFO >> epoch 038 | valid on 'valid' subset | loss 72.577 | nll_loss 6.954 | bleu 25.9993 | ppl 123.94 | wps 2282.9 | wpb 10165.6 | bsz 973.9 | num_updates 32984 | best_bleu 25.9993 (progress_bar.py:269, print())
[2021-03-22 14:21:15] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 38 @ 32984 updates, score 25.999256) (writing took 27.395502 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 14:21:27] INFO >> epoch 039: 16 / 868 loss=25.338, nll_loss=2.432, bleu=0, ppl=5.4, wps=1255.8, ups=1.88, wpb=666.4, bsz=64, num_updates=33000, lr=6.8e-05, gnorm=70.865, clip=100, train_wall=138, wall=13834 (progress_bar.py:260, log())
[2021-03-22 14:23:42] INFO >> epoch 039: 516 / 868 loss=23.758, nll_loss=2.281, bleu=0, ppl=4.86, wps=2454.3, ups=3.68, wpb=666.6, bsz=64, num_updates=33500, lr=6.8e-05, gnorm=70.86, clip=100, train_wall=134, wall=13970 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:25:21] INFO >> epoch 039 | loss 24.232 | nll_loss 2.326 | bleu 0 | ppl 5.01 | wps 1586.5 | ups 2.38 | wpb 666.6 | bsz 64 | num_updates 33852 | lr 6.8e-05 | gnorm 70.895 | clip 100 | train_wall 236 | wall 14068 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:26:53] INFO >> epoch 039 | valid on 'valid' subset | loss 74.462 | nll_loss 7.134 | bleu 25.864 | ppl 140.47 | wps 2273.6 | wpb 10165.6 | bsz 973.9 | num_updates 33852 | best_bleu 25.9993 (progress_bar.py:269, print())
[2021-03-22 14:27:40] INFO >> epoch 040: 148 / 868 loss=24.046, nll_loss=2.312, bleu=0, ppl=4.97, wps=1402.7, ups=2.11, wpb=665.4, bsz=64, num_updates=34000, lr=6.8e-05, gnorm=70.703, clip=100, train_wall=137, wall=14207 (progress_bar.py:260, log())
[2021-03-22 14:29:54] INFO >> epoch 040: 648 / 868 loss=23.617, nll_loss=2.263, bleu=0, ppl=4.8, wps=2478.1, ups=3.71, wpb=667.7, bsz=64, num_updates=34500, lr=6.8e-05, gnorm=71.177, clip=100, train_wall=133, wall=14341 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:30:55] INFO >> epoch 040 | loss 23.469 | nll_loss 2.253 | bleu 0 | ppl 4.77 | wps 1734.7 | ups 2.6 | wpb 666.6 | bsz 64 | num_updates 34720 | lr 6.8e-05 | gnorm 70.983 | clip 100 | train_wall 232 | wall 14402 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:32:26] INFO >> epoch 040 | valid on 'valid' subset | loss 74.126 | nll_loss 7.102 | bleu 26.0453 | ppl 137.37 | wps 2274.4 | wpb 10165.6 | bsz 973.9 | num_updates 34720 | best_bleu 26.0453 (progress_bar.py:269, print())
[2021-03-22 14:32:54] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 40 @ 34720 updates, score 26.045284) (writing took 27.349718 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 14:34:16] INFO >> epoch 041: 280 / 868 loss=22.812, nll_loss=2.193, bleu=0, ppl=4.57, wps=1273.7, ups=1.91, wpb=665.6, bsz=64, num_updates=35000, lr=6.7e-05, gnorm=70.848, clip=100, train_wall=134, wall=14603 (progress_bar.py:260, log())
[2021-03-22 14:36:34] INFO >> epoch 041: 780 / 868 loss=23.046, nll_loss=2.213, bleu=0, ppl=4.64, wps=2398.7, ups=3.6, wpb=666.2, bsz=64, num_updates=35500, lr=6.7e-05, gnorm=71.265, clip=100, train_wall=137, wall=14742 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:36:59] INFO >> epoch 041 | loss 22.715 | nll_loss 2.18 | bleu 0 | ppl 4.53 | wps 1589.9 | ups 2.39 | wpb 666.6 | bsz 64 | num_updates 35588 | lr 6.7e-05 | gnorm 71.075 | clip 100 | train_wall 235 | wall 14766 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:38:30] INFO >> epoch 041 | valid on 'valid' subset | loss 74.429 | nll_loss 7.131 | bleu 26.1796 | ppl 140.16 | wps 2277 | wpb 10165.6 | bsz 973.9 | num_updates 35588 | best_bleu 26.1796 (progress_bar.py:269, print())
[2021-03-22 14:38:58] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 41 @ 35588 updates, score 26.179551) (writing took 27.455055 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 14:40:57] INFO >> epoch 042: 412 / 868 loss=21.882, nll_loss=2.096, bleu=0, ppl=4.28, wps=1272.2, ups=1.9, wpb=668.2, bsz=64, num_updates=36000, lr=6.6e-05, gnorm=71.145, clip=100, train_wall=135, wall=15004 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:43:03] INFO >> epoch 042 | loss 21.984 | nll_loss 2.11 | bleu 0 | ppl 4.32 | wps 1586.8 | ups 2.38 | wpb 666.6 | bsz 64 | num_updates 36456 | lr 6.6e-05 | gnorm 71.386 | clip 100 | train_wall 236 | wall 15130 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:44:35] INFO >> epoch 042 | valid on 'valid' subset | loss 76.137 | nll_loss 7.295 | bleu 26.2432 | ppl 156.99 | wps 2277.3 | wpb 10165.6 | bsz 973.9 | num_updates 36456 | best_bleu 26.2432 (progress_bar.py:269, print())
[2021-03-22 14:45:08] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 42 @ 36456 updates, score 26.243174) (writing took 32.884995 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 14:45:26] INFO >> epoch 043: 44 / 868 loss=22.166, nll_loss=2.131, bleu=0, ppl=4.38, wps=1235.4, ups=1.86, wpb=665.4, bsz=64, num_updates=36500, lr=6.6e-05, gnorm=71.499, clip=100, train_wall=136, wall=15274 (progress_bar.py:260, log())
[2021-03-22 14:47:43] INFO >> epoch 043: 544 / 868 loss=21.038, nll_loss=2.019, bleu=0, ppl=4.05, wps=2441.8, ups=3.66, wpb=666.5, bsz=64, num_updates=37000, lr=6.6e-05, gnorm=71.438, clip=100, train_wall=135, wall=15410 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:49:12] INFO >> epoch 043 | loss 21.33 | nll_loss 2.047 | bleu 0 | ppl 4.13 | wps 1567.1 | ups 2.35 | wpb 666.6 | bsz 64 | num_updates 37324 | lr 6.6e-05 | gnorm 71.511 | clip 100 | train_wall 235 | wall 15500 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:50:44] INFO >> epoch 043 | valid on 'valid' subset | loss 75.801 | nll_loss 7.262 | bleu 26.5206 | ppl 153.53 | wps 2266.4 | wpb 10165.6 | bsz 973.9 | num_updates 37324 | best_bleu 26.5206 (progress_bar.py:269, print())
[2021-03-22 14:51:27] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 43 @ 37324 updates, score 26.520588) (writing took 42.443440 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 14:52:21] INFO >> epoch 044: 176 / 868 loss=21.197, nll_loss=2.035, bleu=0, ppl=4.1, wps=1197.8, ups=1.8, wpb=666.6, bsz=64, num_updates=37500, lr=6.5e-05, gnorm=71.463, clip=100, train_wall=135, wall=15688 (progress_bar.py:260, log())
[2021-03-22 14:54:40] INFO >> epoch 044: 676 / 868 loss=20.648, nll_loss=1.981, bleu=0, ppl=3.95, wps=2399.3, ups=3.6, wpb=667.2, bsz=64, num_updates=38000, lr=6.5e-05, gnorm=71.345, clip=100, train_wall=137, wall=15827 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:55:33] INFO >> epoch 044 | loss 20.606 | nll_loss 1.978 | bleu 0 | ppl 3.94 | wps 1520.3 | ups 2.28 | wpb 666.6 | bsz 64 | num_updates 38192 | lr 6.5e-05 | gnorm 71.366 | clip 100 | train_wall 236 | wall 15880 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 14:57:05] INFO >> epoch 044 | valid on 'valid' subset | loss 76.573 | nll_loss 7.336 | bleu 26.4734 | ppl 161.6 | wps 2276.4 | wpb 10165.6 | bsz 973.9 | num_updates 38192 | best_bleu 26.5206 (progress_bar.py:269, print())
[2021-03-22 14:58:36] INFO >> epoch 045: 308 / 868 loss=19.977, nll_loss=1.92, bleu=0, ppl=3.78, wps=1411.7, ups=2.12, wpb=665.6, bsz=64, num_updates=38500, lr=6.4e-05, gnorm=71.259, clip=100, train_wall=135, wall=16063 (progress_bar.py:260, log())
[2021-03-22 15:00:54] INFO >> epoch 045: 808 / 868 loss=20.439, nll_loss=1.957, bleu=0, ppl=3.88, wps=2426.9, ups=3.63, wpb=668.3, bsz=64, num_updates=39000, lr=6.4e-05, gnorm=71.852, clip=100, train_wall=136, wall=16201 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:01:10] INFO >> epoch 045 | loss 19.979 | nll_loss 1.918 | bleu 0 | ppl 3.78 | wps 1716 | ups 2.57 | wpb 666.6 | bsz 64 | num_updates 39060 | lr 6.4e-05 | gnorm 71.444 | clip 100 | train_wall 235 | wall 16217 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:02:42] INFO >> epoch 045 | valid on 'valid' subset | loss 77.097 | nll_loss 7.387 | bleu 27.0232 | ppl 167.33 | wps 2277.8 | wpb 10165.6 | bsz 973.9 | num_updates 39060 | best_bleu 27.0232 (progress_bar.py:269, print())
[2021-03-22 15:03:34] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 45 @ 39060 updates, score 27.023169) (writing took 52.577673 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 15:05:42] INFO >> epoch 046: 440 / 868 loss=18.92, nll_loss=1.818, bleu=0, ppl=3.53, wps=1154, ups=1.73, wpb=666, bsz=64, num_updates=39500, lr=6.4e-05, gnorm=70.805, clip=100, train_wall=136, wall=16489 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:07:39] INFO >> epoch 046 | loss 19.363 | nll_loss 1.859 | bleu 0 | ppl 3.63 | wps 1487.8 | ups 2.23 | wpb 666.6 | bsz 64 | num_updates 39928 | lr 6.4e-05 | gnorm 71.346 | clip 100 | train_wall 235 | wall 16606 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:09:11] INFO >> epoch 046 | valid on 'valid' subset | loss 77.934 | nll_loss 7.467 | bleu 26.9381 | ppl 176.9 | wps 2278.7 | wpb 10165.6 | bsz 973.9 | num_updates 39928 | best_bleu 27.0232 (progress_bar.py:269, print())
[2021-03-22 15:09:37] INFO >> epoch 047: 72 / 868 loss=19.782, nll_loss=1.896, bleu=0, ppl=3.72, wps=1423.4, ups=2.13, wpb=667.6, bsz=64, num_updates=40000, lr=6.3e-05, gnorm=71.788, clip=100, train_wall=134, wall=16724 (progress_bar.py:260, log())
[2021-03-22 15:11:54] INFO >> epoch 047: 572 / 868 loss=18.314, nll_loss=1.764, bleu=0, ppl=3.4, wps=2411.2, ups=3.63, wpb=664.3, bsz=64, num_updates=40500, lr=6.3e-05, gnorm=71.224, clip=100, train_wall=136, wall=16862 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:13:16] INFO >> epoch 047 | loss 18.732 | nll_loss 1.798 | bleu 0 | ppl 3.48 | wps 1716.8 | ups 2.58 | wpb 666.6 | bsz 64 | num_updates 40796 | lr 6.3e-05 | gnorm 71.427 | clip 100 | train_wall 236 | wall 16943 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:14:47] INFO >> epoch 047 | valid on 'valid' subset | loss 79.517 | nll_loss 7.618 | bleu 27.051 | ppl 196.5 | wps 2275.3 | wpb 10165.6 | bsz 973.9 | num_updates 40796 | best_bleu 27.051 (progress_bar.py:269, print())
[2021-03-22 15:15:37] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 47 @ 40796 updates, score 27.050982) (writing took 49.568538 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 15:16:38] INFO >> epoch 048: 204 / 868 loss=18.54, nll_loss=1.781, bleu=0, ppl=3.44, wps=1173.7, ups=1.76, wpb=666, bsz=64, num_updates=41000, lr=6.2e-05, gnorm=71.257, clip=100, train_wall=134, wall=17145 (progress_bar.py:260, log())
[2021-03-22 15:18:56] INFO >> epoch 048: 704 / 868 loss=18.363, nll_loss=1.758, bleu=0, ppl=3.38, wps=2428.3, ups=3.63, wpb=668.2, bsz=64, num_updates=41500, lr=6.2e-05, gnorm=71.59, clip=100, train_wall=136, wall=17283 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:19:41] INFO >> epoch 048 | loss 18.183 | nll_loss 1.745 | bleu 0 | ppl 3.35 | wps 1505 | ups 2.26 | wpb 666.6 | bsz 64 | num_updates 41664 | lr 6.2e-05 | gnorm 71.329 | clip 100 | train_wall 233 | wall 17328 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:21:12] INFO >> epoch 048 | valid on 'valid' subset | loss 80.019 | nll_loss 7.666 | bleu 26.9652 | ppl 203.16 | wps 2263 | wpb 10165.6 | bsz 973.9 | num_updates 41664 | best_bleu 27.051 (progress_bar.py:269, print())
[2021-03-22 15:22:50] INFO >> epoch 049: 336 / 868 loss=17.72, nll_loss=1.699, bleu=0, ppl=3.25, wps=1425.2, ups=2.14, wpb=667.4, bsz=64, num_updates=42000, lr=6.2e-05, gnorm=71.171, clip=100, train_wall=134, wall=17517 (progress_bar.py:260, log())
[2021-03-22 15:25:08] INFO >> epoch 049: 836 / 868 loss=17.926, nll_loss=1.721, bleu=0, ppl=3.3, wps=2407.2, ups=3.61, wpb=666.1, bsz=64, num_updates=42500, lr=6.2e-05, gnorm=71.626, clip=100, train_wall=137, wall=17655 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:25:17] INFO >> epoch 049 | loss 17.616 | nll_loss 1.691 | bleu 0 | ppl 3.23 | wps 1719.8 | ups 2.58 | wpb 666.6 | bsz 64 | num_updates 42532 | lr 6.2e-05 | gnorm 71.344 | clip 100 | train_wall 235 | wall 17664 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:26:49] INFO >> epoch 049 | valid on 'valid' subset | loss 79.864 | nll_loss 7.652 | bleu 27.1337 | ppl 201.09 | wps 2256.1 | wpb 10165.6 | bsz 973.9 | num_updates 42532 | best_bleu 27.1337 (progress_bar.py:269, print())
[2021-03-22 15:27:31] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 49 @ 42532 updates, score 27.133714) (writing took 41.683056 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 15:29:45] INFO >> epoch 050: 468 / 868 loss=16.797, nll_loss=1.613, bleu=0, ppl=3.06, wps=1202.3, ups=1.8, wpb=666.3, bsz=64, num_updates=43000, lr=6.1e-05, gnorm=71.047, clip=100, train_wall=134, wall=17932 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:31:36] INFO >> epoch 050 | loss 17.056 | nll_loss 1.637 | bleu 0 | ppl 3.11 | wps 1525.7 | ups 2.29 | wpb 666.6 | bsz 64 | num_updates 43400 | lr 6.1e-05 | gnorm 71.262 | clip 100 | train_wall 235 | wall 18044 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:33:08] INFO >> epoch 050 | valid on 'valid' subset | loss 79.977 | nll_loss 7.662 | bleu 27.0056 | ppl 202.6 | wps 2262.8 | wpb 10165.6 | bsz 973.9 | num_updates 43400 | best_bleu 27.1337 (progress_bar.py:269, print())
[2021-03-22 15:33:43] INFO >> epoch 051: 100 / 868 loss=17.186, nll_loss=1.646, bleu=0, ppl=3.13, wps=1404.6, ups=2.1, wpb=668.3, bsz=64, num_updates=43500, lr=6.1e-05, gnorm=71.447, clip=100, train_wall=137, wall=18170 (progress_bar.py:260, log())
[2021-03-22 15:36:01] INFO >> epoch 051: 600 / 868 loss=16.365, nll_loss=1.576, bleu=0, ppl=2.98, wps=2405.8, ups=3.62, wpb=664.3, bsz=64, num_updates=44000, lr=6.1e-05, gnorm=71.063, clip=100, train_wall=136, wall=18308 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:37:15] INFO >> epoch 051 | loss 16.571 | nll_loss 1.591 | bleu 0 | ppl 3.01 | wps 1708.8 | ups 2.56 | wpb 666.6 | bsz 64 | num_updates 44268 | lr 6.1e-05 | gnorm 71.309 | clip 100 | train_wall 236 | wall 18382 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:38:47] INFO >> epoch 051 | valid on 'valid' subset | loss 80.976 | nll_loss 7.758 | bleu 27.4269 | ppl 216.49 | wps 2273 | wpb 10165.6 | bsz 973.9 | num_updates 44268 | best_bleu 27.4269 (progress_bar.py:269, print())
[2021-03-22 15:39:40] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 51 @ 44268 updates, score 27.426858) (writing took 53.302512 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 15:40:51] INFO >> epoch 052: 232 / 868 loss=16.399, nll_loss=1.571, bleu=0, ppl=2.97, wps=1152.5, ups=1.73, wpb=667.7, bsz=64, num_updates=44500, lr=6e-05, gnorm=71.298, clip=100, train_wall=136, wall=18598 (progress_bar.py:260, log())
[2021-03-22 15:43:08] INFO >> epoch 052: 732 / 868 loss=16.221, nll_loss=1.557, bleu=0, ppl=2.94, wps=2437, ups=3.66, wpb=666.6, bsz=64, num_updates=45000, lr=6e-05, gnorm=71.465, clip=100, train_wall=135, wall=18735 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:43:46] INFO >> epoch 052 | loss 16.068 | nll_loss 1.542 | bleu 0 | ppl 2.91 | wps 1479.4 | ups 2.22 | wpb 666.6 | bsz 64 | num_updates 45136 | lr 6e-05 | gnorm 71.232 | clip 100 | train_wall 236 | wall 18773 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:45:17] INFO >> epoch 052 | valid on 'valid' subset | loss 82.612 | nll_loss 7.915 | bleu 27.4746 | ppl 241.35 | wps 2277.2 | wpb 10165.6 | bsz 973.9 | num_updates 45136 | best_bleu 27.4746 (progress_bar.py:269, print())
[2021-03-22 15:46:05] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 52 @ 45136 updates, score 27.474639) (writing took 47.383790 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 15:47:52] INFO >> epoch 053: 364 / 868 loss=15.724, nll_loss=1.504, bleu=0, ppl=2.84, wps=1175.4, ups=1.76, wpb=668.8, bsz=64, num_updates=45500, lr=5.9e-05, gnorm=71.166, clip=100, train_wall=137, wall=19019 (progress_bar.py:260, log())
[2021-03-22 15:50:08] INFO >> epoch 053: 864 / 868 loss=15.787, nll_loss=1.521, bleu=0, ppl=2.87, wps=2443.3, ups=3.68, wpb=664.3, bsz=64, num_updates=46000, lr=5.9e-05, gnorm=71.175, clip=100, train_wall=134, wall=19155 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:50:10] INFO >> epoch 053 | loss 15.654 | nll_loss 1.503 | bleu 0 | ppl 2.83 | wps 1508.5 | ups 2.26 | wpb 666.6 | bsz 64 | num_updates 46004 | lr 5.9e-05 | gnorm 71.119 | clip 100 | train_wall 234 | wall 19157 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:51:41] INFO >> epoch 053 | valid on 'valid' subset | loss 81.633 | nll_loss 7.821 | bleu 27.5828 | ppl 226.15 | wps 2275.5 | wpb 10165.6 | bsz 973.9 | num_updates 46004 | best_bleu 27.5828 (progress_bar.py:269, print())
[2021-03-22 15:52:21] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 53 @ 46004 updates, score 27.582778) (writing took 40.603017 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 15:54:44] INFO >> epoch 054: 496 / 868 loss=14.856, nll_loss=1.428, bleu=0, ppl=2.69, wps=1205.2, ups=1.81, wpb=666, bsz=64, num_updates=46500, lr=5.9e-05, gnorm=70.364, clip=100, train_wall=135, wall=19432 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:56:27] INFO >> epoch 054 | loss 15.168 | nll_loss 1.456 | bleu 0 | ppl 2.74 | wps 1531.3 | ups 2.3 | wpb 666.6 | bsz 64 | num_updates 46872 | lr 5.9e-05 | gnorm 70.898 | clip 100 | train_wall 235 | wall 19535 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 15:57:59] INFO >> epoch 054 | valid on 'valid' subset | loss 83.003 | nll_loss 7.952 | bleu 27.623 | ppl 247.69 | wps 2266.6 | wpb 10165.6 | bsz 973.9 | num_updates 46872 | best_bleu 27.623 (progress_bar.py:269, print())
[2021-03-22 15:58:46] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 54 @ 46872 updates, score 27.622988) (writing took 47.066437 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 15:59:28] INFO >> epoch 055: 128 / 868 loss=15.182, nll_loss=1.456, bleu=0, ppl=2.74, wps=1174.5, ups=1.76, wpb=667, bsz=64, num_updates=47000, lr=5.8e-05, gnorm=71.062, clip=100, train_wall=136, wall=19716 (progress_bar.py:260, log())
[2021-03-22 16:01:45] INFO >> epoch 055: 628 / 868 loss=14.625, nll_loss=1.402, bleu=0, ppl=2.64, wps=2434.9, ups=3.65, wpb=667.5, bsz=64, num_updates=47500, lr=5.8e-05, gnorm=70.534, clip=100, train_wall=135, wall=19853 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:02:52] INFO >> epoch 055 | loss 14.647 | nll_loss 1.406 | bleu 0 | ppl 2.65 | wps 1505.1 | ups 2.26 | wpb 666.6 | bsz 64 | num_updates 47740 | lr 5.8e-05 | gnorm 70.649 | clip 100 | train_wall 235 | wall 19919 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:04:24] INFO >> epoch 055 | valid on 'valid' subset | loss 82.918 | nll_loss 7.944 | bleu 27.5704 | ppl 246.29 | wps 2259.4 | wpb 10165.6 | bsz 973.9 | num_updates 47740 | best_bleu 27.623 (progress_bar.py:269, print())
[2021-03-22 16:05:44] INFO >> epoch 056: 260 / 868 loss=14.442, nll_loss=1.388, bleu=0, ppl=2.62, wps=1395.7, ups=2.1, wpb=666.1, bsz=64, num_updates=48000, lr=5.8e-05, gnorm=70.778, clip=100, train_wall=138, wall=20091 (progress_bar.py:260, log())
[2021-03-22 16:07:59] INFO >> epoch 056: 760 / 868 loss=14.378, nll_loss=1.379, bleu=0, ppl=2.6, wps=2468.2, ups=3.7, wpb=667.5, bsz=64, num_updates=48500, lr=5.8e-05, gnorm=70.889, clip=100, train_wall=133, wall=20226 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:08:30] INFO >> epoch 056 | loss 14.259 | nll_loss 1.369 | bleu 0 | ppl 2.58 | wps 1713.9 | ups 2.57 | wpb 666.6 | bsz 64 | num_updates 48608 | lr 5.8e-05 | gnorm 70.597 | clip 100 | train_wall 235 | wall 20257 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:10:01] INFO >> epoch 056 | valid on 'valid' subset | loss 82.974 | nll_loss 7.95 | bleu 27.9833 | ppl 247.21 | wps 2277.9 | wpb 10165.6 | bsz 973.9 | num_updates 48608 | best_bleu 27.9833 (progress_bar.py:269, print())
[2021-03-22 16:10:50] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 56 @ 48608 updates, score 27.983333) (writing took 49.459572 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 16:12:45] INFO >> epoch 057: 392 / 868 loss=13.788, nll_loss=1.322, bleu=0, ppl=2.5, wps=1169, ups=1.75, wpb=667, bsz=64, num_updates=49000, lr=5.7e-05, gnorm=70.006, clip=100, train_wall=135, wall=20512 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:14:56] INFO >> epoch 057 | loss 13.872 | nll_loss 1.332 | bleu 0 | ppl 2.52 | wps 1496.8 | ups 2.25 | wpb 666.6 | bsz 64 | num_updates 49476 | lr 5.7e-05 | gnorm 70.32 | clip 100 | train_wall 235 | wall 20643 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:16:27] INFO >> epoch 057 | valid on 'valid' subset | loss 84.628 | nll_loss 8.108 | bleu 28.2319 | ppl 275.91 | wps 2285.2 | wpb 10165.6 | bsz 973.9 | num_updates 49476 | best_bleu 28.2319 (progress_bar.py:269, print())
[2021-03-22 16:17:09] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 57 @ 49476 updates, score 28.231939) (writing took 41.420880 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 16:17:22] INFO >> epoch 058: 24 / 868 loss=14.044, nll_loss=1.352, bleu=0, ppl=2.55, wps=1197.1, ups=1.8, wpb=664.4, bsz=64, num_updates=49500, lr=5.6e-05, gnorm=70.537, clip=100, train_wall=136, wall=20789 (progress_bar.py:260, log())
[2021-03-22 16:19:39] INFO >> epoch 058: 524 / 868 loss=13.24, nll_loss=1.271, bleu=0, ppl=2.41, wps=2440.4, ups=3.66, wpb=666.9, bsz=64, num_updates=50000, lr=5.6e-05, gnorm=69.738, clip=100, train_wall=135, wall=20926 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:21:13] INFO >> epoch 058 | loss 13.471 | nll_loss 1.293 | bleu 0 | ppl 2.45 | wps 1533.2 | ups 2.3 | wpb 666.6 | bsz 64 | num_updates 50344 | lr 5.6e-05 | gnorm 70.078 | clip 100 | train_wall 234 | wall 21021 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:22:44] INFO >> epoch 058 | valid on 'valid' subset | loss 85.069 | nll_loss 8.15 | bleu 28.165 | ppl 284.11 | wps 2288.1 | wpb 10165.6 | bsz 973.9 | num_updates 50344 | best_bleu 28.2319 (progress_bar.py:269, print())
[2021-03-22 16:23:34] INFO >> epoch 059: 156 / 868 loss=13.51, nll_loss=1.294, bleu=0, ppl=2.45, wps=1419.9, ups=2.13, wpb=667.6, bsz=63.9, num_updates=50500, lr=5.6e-05, gnorm=70.277, clip=100, train_wall=135, wall=21161 (progress_bar.py:260, log())
[2021-03-22 16:25:51] INFO >> epoch 059: 656 / 868 loss=13.13, nll_loss=1.262, bleu=0, ppl=2.4, wps=2422.8, ups=3.64, wpb=666.1, bsz=64, num_updates=51000, lr=5.6e-05, gnorm=70.007, clip=100, train_wall=136, wall=21298 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:26:50] INFO >> epoch 059 | loss 13.133 | nll_loss 1.261 | bleu 0 | ppl 2.4 | wps 1719.2 | ups 2.58 | wpb 666.6 | bsz 64 | num_updates 51212 | lr 5.6e-05 | gnorm 70.032 | clip 100 | train_wall 236 | wall 21357 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:28:21] INFO >> epoch 059 | valid on 'valid' subset | loss 85.368 | nll_loss 8.179 | bleu 28.2375 | ppl 289.8 | wps 2274.3 | wpb 10165.6 | bsz 973.9 | num_updates 51212 | best_bleu 28.2375 (progress_bar.py:269, print())
[2021-03-22 16:29:15] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 59 @ 51212 updates, score 28.237523) (writing took 53.321676 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 16:30:40] INFO >> epoch 060: 288 / 868 loss=12.735, nll_loss=1.223, bleu=0, ppl=2.33, wps=1153.9, ups=1.73, wpb=666.4, bsz=64, num_updates=51500, lr=5.5e-05, gnorm=69.631, clip=100, train_wall=135, wall=21587 (progress_bar.py:260, log())
[2021-03-22 16:32:57] INFO >> epoch 060: 788 / 868 loss=12.963, nll_loss=1.245, bleu=0, ppl=2.37, wps=2426.1, ups=3.64, wpb=666.2, bsz=64, num_updates=52000, lr=5.5e-05, gnorm=69.935, clip=100, train_wall=135, wall=21725 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:33:20] INFO >> epoch 060 | loss 12.725 | nll_loss 1.221 | bleu 0 | ppl 2.33 | wps 1484 | ups 2.23 | wpb 666.6 | bsz 64 | num_updates 52080 | lr 5.5e-05 | gnorm 69.623 | clip 100 | train_wall 235 | wall 21747 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:34:51] INFO >> epoch 060 | valid on 'valid' subset | loss 85.933 | nll_loss 8.233 | bleu 28.4536 | ppl 300.88 | wps 2284.5 | wpb 10165.6 | bsz 973.9 | num_updates 52080 | best_bleu 28.4536 (progress_bar.py:269, print())
[2021-03-22 16:35:43] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 60 @ 52080 updates, score 28.453575) (writing took 52.506082 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 16:37:47] INFO >> epoch 061: 420 / 868 loss=12.188, nll_loss=1.172, bleu=0, ppl=2.25, wps=1147, ups=1.72, wpb=665.5, bsz=64, num_updates=52500, lr=5.5e-05, gnorm=69.089, clip=100, train_wall=138, wall=22015 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:39:49] INFO >> epoch 061 | loss 12.407 | nll_loss 1.191 | bleu 0 | ppl 2.28 | wps 1487.9 | ups 2.23 | wpb 666.6 | bsz 64 | num_updates 52948 | lr 5.5e-05 | gnorm 69.633 | clip 100 | train_wall 235 | wall 22136 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:41:20] INFO >> epoch 061 | valid on 'valid' subset | loss 86.731 | nll_loss 8.31 | bleu 28.4272 | ppl 317.27 | wps 2274.1 | wpb 10165.6 | bsz 973.9 | num_updates 52948 | best_bleu 28.4536 (progress_bar.py:269, print())
[2021-03-22 16:41:41] INFO >> epoch 062: 52 / 868 loss=12.687, nll_loss=1.213, bleu=0, ppl=2.32, wps=1432.1, ups=2.14, wpb=668.5, bsz=63.9, num_updates=53000, lr=5.4e-05, gnorm=70.201, clip=100, train_wall=133, wall=22248 (progress_bar.py:260, log())
[2021-03-22 16:43:58] INFO >> epoch 062: 552 / 868 loss=11.927, nll_loss=1.145, bleu=0, ppl=2.21, wps=2429.9, ups=3.65, wpb=666.6, bsz=64, num_updates=53500, lr=5.4e-05, gnorm=68.979, clip=100, train_wall=135, wall=22385 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:45:25] INFO >> epoch 062 | loss 12.062 | nll_loss 1.158 | bleu 0 | ppl 2.23 | wps 1721.9 | ups 2.58 | wpb 666.6 | bsz 64 | num_updates 53816 | lr 5.4e-05 | gnorm 69.162 | clip 100 | train_wall 234 | wall 22472 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:46:57] INFO >> epoch 062 | valid on 'valid' subset | loss 87.47 | nll_loss 8.38 | bleu 28.6161 | ppl 333.23 | wps 2254.9 | wpb 10165.6 | bsz 973.9 | num_updates 53816 | best_bleu 28.6161 (progress_bar.py:269, print())
[2021-03-22 16:47:46] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 62 @ 53816 updates, score 28.616101) (writing took 48.468289 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 16:48:43] INFO >> epoch 063: 184 / 868 loss=11.916, nll_loss=1.144, bleu=0, ppl=2.21, wps=1169.4, ups=1.75, wpb=666.4, bsz=64, num_updates=54000, lr=5.4e-05, gnorm=68.938, clip=100, train_wall=135, wall=22670 (progress_bar.py:260, log())
[2021-03-22 16:51:00] INFO >> epoch 063: 684 / 868 loss=11.604, nll_loss=1.119, bleu=0, ppl=2.17, wps=2426.3, ups=3.66, wpb=663.6, bsz=64, num_updates=54500, lr=5.4e-05, gnorm=68.659, clip=100, train_wall=135, wall=22807 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:51:51] INFO >> epoch 063 | loss 11.727 | nll_loss 1.126 | bleu 0 | ppl 2.18 | wps 1497.7 | ups 2.25 | wpb 666.6 | bsz 64 | num_updates 54684 | lr 5.4e-05 | gnorm 68.851 | clip 100 | train_wall 235 | wall 22858 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:53:23] INFO >> epoch 063 | valid on 'valid' subset | loss 86.958 | nll_loss 8.331 | bleu 28.642 | ppl 322.09 | wps 2269.5 | wpb 10165.6 | bsz 973.9 | num_updates 54684 | best_bleu 28.642 (progress_bar.py:269, print())
[2021-03-22 16:54:03] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 63 @ 54684 updates, score 28.641964) (writing took 40.562146 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 16:55:36] INFO >> epoch 064: 316 / 868 loss=11.631, nll_loss=1.11, bleu=0, ppl=2.16, wps=1211.2, ups=1.81, wpb=670.3, bsz=64, num_updates=55000, lr=5.3e-05, gnorm=68.985, clip=100, train_wall=135, wall=23084 (progress_bar.py:260, log())
[2021-03-22 16:57:53] INFO >> epoch 064: 816 / 868 loss=11.535, nll_loss=1.111, bleu=0, ppl=2.16, wps=2436.9, ups=3.67, wpb=664.3, bsz=64, num_updates=55500, lr=5.3e-05, gnorm=68.992, clip=100, train_wall=135, wall=23220 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:58:08] INFO >> epoch 064 | loss 11.403 | nll_loss 1.095 | bleu 0 | ppl 2.14 | wps 1536.7 | ups 2.31 | wpb 666.6 | bsz 64 | num_updates 55552 | lr 5.3e-05 | gnorm 68.774 | clip 100 | train_wall 234 | wall 23235 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 16:59:40] INFO >> epoch 064 | valid on 'valid' subset | loss 88.856 | nll_loss 8.513 | bleu 28.6707 | ppl 365.36 | wps 2263.9 | wpb 10165.6 | bsz 973.9 | num_updates 55552 | best_bleu 28.6707 (progress_bar.py:269, print())
[2021-03-22 17:00:11] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 64 @ 55552 updates, score 28.670712) (writing took 31.042639 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 17:02:19] INFO >> epoch 065: 448 / 868 loss=10.992, nll_loss=1.053, bleu=0, ppl=2.08, wps=1252.2, ups=1.87, wpb=668, bsz=64, num_updates=56000, lr=5.3e-05, gnorm=68.226, clip=100, train_wall=135, wall=23487 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:04:16] INFO >> epoch 065 | loss 11.109 | nll_loss 1.066 | bleu 0 | ppl 2.09 | wps 1570.3 | ups 2.36 | wpb 666.6 | bsz 64 | num_updates 56420 | lr 5.3e-05 | gnorm 68.426 | clip 100 | train_wall 235 | wall 23603 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:05:47] INFO >> epoch 065 | valid on 'valid' subset | loss 89.639 | nll_loss 8.588 | bleu 28.5985 | ppl 384.84 | wps 2286 | wpb 10165.6 | bsz 973.9 | num_updates 56420 | best_bleu 28.6707 (progress_bar.py:269, print())
[2021-03-22 17:06:16] INFO >> epoch 066: 80 / 868 loss=11.173, nll_loss=1.074, bleu=0, ppl=2.1, wps=1406.2, ups=2.11, wpb=665.7, bsz=64, num_updates=56500, lr=5.2e-05, gnorm=68.574, clip=100, train_wall=136, wall=23723 (progress_bar.py:260, log())
[2021-03-22 17:08:34] INFO >> epoch 066: 580 / 868 loss=10.751, nll_loss=1.03, bleu=0, ppl=2.04, wps=2428.1, ups=3.64, wpb=667.8, bsz=64, num_updates=57000, lr=5.2e-05, gnorm=68.046, clip=100, train_wall=136, wall=23861 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:09:53] INFO >> epoch 066 | loss 10.836 | nll_loss 1.04 | bleu 0 | ppl 2.06 | wps 1718.2 | ups 2.58 | wpb 666.6 | bsz 64 | num_updates 57288 | lr 5.2e-05 | gnorm 68.099 | clip 100 | train_wall 235 | wall 23940 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:11:25] INFO >> epoch 066 | valid on 'valid' subset | loss 89.462 | nll_loss 8.571 | bleu 28.9446 | ppl 380.36 | wps 2273.7 | wpb 10165.6 | bsz 973.9 | num_updates 57288 | best_bleu 28.9446 (progress_bar.py:269, print())
[2021-03-22 17:11:52] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 66 @ 57288 updates, score 28.944583) (writing took 26.907204 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 17:12:57] INFO >> epoch 067: 212 / 868 loss=10.661, nll_loss=1.027, bleu=0, ppl=2.04, wps=1261.5, ups=1.9, wpb=664.6, bsz=64, num_updates=57500, lr=5.2e-05, gnorm=67.698, clip=100, train_wall=136, wall=24124 (progress_bar.py:260, log())
[2021-03-22 17:15:14] INFO >> epoch 067: 712 / 868 loss=10.69, nll_loss=1.022, bleu=0, ppl=2.03, wps=2451.9, ups=3.67, wpb=668.9, bsz=64, num_updates=58000, lr=5.2e-05, gnorm=68.105, clip=100, train_wall=135, wall=24261 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:15:57] INFO >> epoch 067 | loss 10.543 | nll_loss 1.012 | bleu 0 | ppl 2.02 | wps 1589.1 | ups 2.38 | wpb 666.6 | bsz 64 | num_updates 58156 | lr 5.2e-05 | gnorm 67.792 | clip 100 | train_wall 235 | wall 24304 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:17:28] INFO >> epoch 067 | valid on 'valid' subset | loss 90.782 | nll_loss 8.698 | bleu 29.0386 | ppl 415.19 | wps 2296.9 | wpb 10165.6 | bsz 973.9 | num_updates 58156 | best_bleu 29.0386 (progress_bar.py:269, print())
[2021-03-22 17:17:54] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 67 @ 58156 updates, score 29.038594) (writing took 26.800070 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 17:19:34] INFO >> epoch 068: 344 / 868 loss=10.313, nll_loss=0.989, bleu=0, ppl=1.99, wps=1281.2, ups=1.92, wpb=666.8, bsz=64, num_updates=58500, lr=5.1e-05, gnorm=67.538, clip=100, train_wall=134, wall=24521 (progress_bar.py:260, log())
[2021-03-22 17:21:52] INFO >> epoch 068: 844 / 868 loss=10.393, nll_loss=0.998, bleu=0, ppl=2, wps=2412.1, ups=3.62, wpb=666.2, bsz=64, num_updates=59000, lr=5.1e-05, gnorm=67.836, clip=100, train_wall=136, wall=24659 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:21:59] INFO >> epoch 068 | loss 10.263 | nll_loss 0.985 | bleu 0 | ppl 1.98 | wps 1599.4 | ups 2.4 | wpb 666.6 | bsz 64 | num_updates 59024 | lr 5.1e-05 | gnorm 67.571 | clip 100 | train_wall 234 | wall 24666 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:23:30] INFO >> epoch 068 | valid on 'valid' subset | loss 89.668 | nll_loss 8.591 | bleu 29.2278 | ppl 385.59 | wps 2286.5 | wpb 10165.6 | bsz 973.9 | num_updates 59024 | best_bleu 29.2278 (progress_bar.py:269, print())
[2021-03-22 17:24:06] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 68 @ 59024 updates, score 29.227844) (writing took 35.584728 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 17:26:23] INFO >> epoch 069: 476 / 868 loss=9.761, nll_loss=0.938, bleu=0, ppl=1.92, wps=1228.2, ups=1.85, wpb=665.6, bsz=64, num_updates=59500, lr=5e-05, gnorm=66.631, clip=100, train_wall=135, wall=24930 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:28:11] INFO >> epoch 069 | loss 10.03 | nll_loss 0.963 | bleu 0 | ppl 1.95 | wps 1553.5 | ups 2.33 | wpb 666.6 | bsz 64 | num_updates 59892 | lr 5e-05 | gnorm 67.333 | clip 100 | train_wall 235 | wall 25038 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:29:43] INFO >> epoch 069 | valid on 'valid' subset | loss 89.523 | nll_loss 8.577 | bleu 28.9564 | ppl 381.9 | wps 2270.4 | wpb 10165.6 | bsz 973.9 | num_updates 59892 | best_bleu 29.2278 (progress_bar.py:269, print())
[2021-03-22 17:30:19] INFO >> epoch 070: 108 / 868 loss=10.141, nll_loss=0.972, bleu=0, ppl=1.96, wps=1414.2, ups=2.12, wpb=667.9, bsz=64, num_updates=60000, lr=5e-05, gnorm=67.665, clip=100, train_wall=136, wall=25166 (progress_bar.py:260, log())
[2021-03-22 17:32:35] INFO >> epoch 070: 608 / 868 loss=9.719, nll_loss=0.934, bleu=0, ppl=1.91, wps=2452.7, ups=3.68, wpb=666, bsz=64, num_updates=60500, lr=5e-05, gnorm=66.811, clip=100, train_wall=134, wall=25302 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:33:47] INFO >> epoch 070 | loss 9.803 | nll_loss 0.941 | bleu 0 | ppl 1.92 | wps 1723.6 | ups 2.59 | wpb 666.6 | bsz 64 | num_updates 60760 | lr 5e-05 | gnorm 66.91 | clip 100 | train_wall 234 | wall 25374 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:35:18] INFO >> epoch 070 | valid on 'valid' subset | loss 91.002 | nll_loss 8.719 | bleu 29.3537 | ppl 421.32 | wps 2284.5 | wpb 10165.6 | bsz 973.9 | num_updates 60760 | best_bleu 29.3537 (progress_bar.py:269, print())
[2021-03-22 17:35:45] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 70 @ 60760 updates, score 29.353735) (writing took 27.075148 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 17:36:58] INFO >> epoch 071: 240 / 868 loss=9.761, nll_loss=0.937, bleu=0, ppl=1.91, wps=1265.1, ups=1.9, wpb=666.3, bsz=64, num_updates=61000, lr=4.9e-05, gnorm=66.723, clip=100, train_wall=136, wall=25565 (progress_bar.py:260, log())
[2021-03-22 17:39:16] INFO >> epoch 071: 740 / 868 loss=9.616, nll_loss=0.922, bleu=0, ppl=1.9, wps=2424, ups=3.64, wpb=666.9, bsz=64, num_updates=61500, lr=4.9e-05, gnorm=66.772, clip=100, train_wall=136, wall=25703 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:39:51] INFO >> epoch 071 | loss 9.576 | nll_loss 0.919 | bleu 0 | ppl 1.89 | wps 1589.3 | ups 2.38 | wpb 666.6 | bsz 64 | num_updates 61628 | lr 4.9e-05 | gnorm 66.625 | clip 100 | train_wall 235 | wall 25738 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:41:22] INFO >> epoch 071 | valid on 'valid' subset | loss 91.182 | nll_loss 8.736 | bleu 29.1959 | ppl 426.38 | wps 2287.5 | wpb 10165.6 | bsz 973.9 | num_updates 61628 | best_bleu 29.3537 (progress_bar.py:269, print())
[2021-03-22 17:43:10] INFO >> epoch 072: 372 / 868 loss=9.248, nll_loss=0.889, bleu=0, ppl=1.85, wps=1422.2, ups=2.14, wpb=665.4, bsz=64, num_updates=62000, lr=4.9e-05, gnorm=66.111, clip=100, train_wall=134, wall=25937 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:45:28] INFO >> epoch 072 | loss 9.338 | nll_loss 0.896 | bleu 0 | ppl 1.86 | wps 1719.6 | ups 2.58 | wpb 666.6 | bsz 64 | num_updates 62496 | lr 4.9e-05 | gnorm 66.257 | clip 100 | train_wall 236 | wall 26075 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:46:59] INFO >> epoch 072 | valid on 'valid' subset | loss 92.486 | nll_loss 8.861 | bleu 29.2969 | ppl 464.94 | wps 2261.8 | wpb 10165.6 | bsz 973.9 | num_updates 62496 | best_bleu 29.3537 (progress_bar.py:269, print())
[2021-03-22 17:47:07] INFO >> epoch 073: 4 / 868 loss=9.578, nll_loss=0.918, bleu=0, ppl=1.89, wps=1403.5, ups=2.1, wpb=667.6, bsz=64, num_updates=62500, lr=4.8e-05, gnorm=66.761, clip=100, train_wall=137, wall=26175 (progress_bar.py:260, log())
[2021-03-22 17:51:27] INFO >> epoch 073: 504 / 868 loss=9.009, nll_loss=0.863, bleu=0, ppl=1.82, wps=1285.8, ups=1.93, wpb=667.9, bsz=64, num_updates=63000, lr=4.8e-05, gnorm=65.775, clip=100, train_wall=258, wall=26434 (progress_bar.py:260, log())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:54:37] INFO >> epoch 073 | loss 9.108 | nll_loss 0.874 | bleu 0 | ppl 1.83 | wps 1053.8 | ups 1.58 | wpb 666.6 | bsz 64 | num_updates 63364 | lr 4.8e-05 | gnorm 66.006 | clip 100 | train_wall 446 | wall 26624 (progress_bar.py:269, print())
Using backend: pytorch
Using backend: pytorch
Using backend: pytorch
[2021-03-22 17:56:09] INFO >> epoch 073 | valid on 'valid' subset | loss 92.174 | nll_loss 8.831 | bleu 29.5286 | ppl 455.41 | wps 2256.7 | wpb 10165.6 | bsz 973.9 | num_updates 63364 | best_bleu 29.5286 (progress_bar.py:269, print())
[2021-03-22 17:56:36] INFO >> saved checkpoint /mnt/wanyao/.ncc/python_wan/summarization/data-mmap/relative_transformer/checkpoints/checkpoint_best.pt (epoch 73 @ 63364 updates, score 29.528629) (writing took 27.240011 seconds) (checkpoint_utils.py:79, save_checkpoint())
[2021-03-22 17:57:21] INFO >> epoch 074: 136 / 868 loss=9.017, nll_loss=0.867, bleu=0, ppl=1.82, wps=940.9, ups=1.41, wpb=665.3, bsz=64, num_updates=63500, lr=4.8e-05, gnorm=65.779, clip=100, train_wall=225, wall=26788 (progress_bar.py:260, log())
[2021-03-22 17:59:37] INFO >> epoch 074: 636 / 868 loss=8.848, nll_loss=0.849, bleu=0, ppl=1.8, wps=2441.7, ups=3.66, wpb=666.4, bsz=64, num_updates=64000, lr=4.8e-05, gnorm=65.563, clip=100, train_wall=135, wall=26924 (progress_bar.py:260, log())