
Do the wav files in the training data have to be stored locally, or can they be HTTP links? #78

@youxzAnt


As the title says: the audio paths in my training data are of the form http://mdn.alipayobjects.com/gov_gjj/afts/file/A*PIEES5D5xLIAAAAAQkAAAAgAdn11AQ. Can training be completed with HTTP links like this? I am currently getting the following error:
Error executing job with overrides: ['++model=/example/yaze.youxz/Fun-ASR/model', '++trust_remote_code=true', '++train_data_set_list=/ossfs/workspace/Fun-ASR/train_wuyang0113.jsonl', '++valid_data_set_list=/ossfs/workspace/Fun-ASR/val_wuyang0113.jsonl', '++dataset_conf.data_split_num=1', '++dataset_conf.batch_sampler=BatchSampler', '++dataset_conf.batch_size=6000', '++dataset_conf.sort_size=1024', '++dataset_conf.batch_type=token', '++dataset_conf.num_workers=4', '++train_conf.max_epoch=20', '++train_conf.log_interval=1', '++train_conf.resume=true', '++train_conf.validate_interval=2000', '++train_conf.save_checkpoint_interval=5000', '++train_conf.effective_save_name_excludes=None', '++train_conf.keep_nbest_models=20', '++train_conf.avg_nbest_model=10', '++train_conf.use_deepspeed=false', '++train_conf.deepspeed_config=/ossfs/workspace/Fun-ASR/deepspeed_conf/ds_stage1.json', '++optim_conf.lr=0.0002', '++audio_encoder_conf.freeze=true', '++audio_adaptor_conf.freeze=true', '++llm_conf.freeze=false', '++output_dir=/example/yaze.youxz/Fun-ASR/outputs']
[rank0]: Traceback (most recent call last):
[rank0]: File "/opt/conda/bin/funasr-train-ds", line 8, in
[rank0]: sys.exit(main_hydra())
[rank0]: File "/opt/conda/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
[rank0]: _run_hydra(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
[rank0]: _run_app(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
[rank0]: run_and_report(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
[rank0]: raise ex
[rank0]: File "/opt/conda/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
[rank0]: return func()
[rank0]: File "/opt/conda/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in
[rank0]: lambda: hydra.run(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
[rank0]: _ = ret.return_value
[rank0]: File "/opt/conda/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
[rank0]: raise self._return_value
[rank0]: File "/opt/conda/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
[rank0]: ret.return_value = task_function(task_cfg)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/funasr/bin/train_ds.py", line 56, in main_hydra
[rank0]: main(**kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/funasr/bin/train_ds.py", line 177, in main
[rank0]: trainer.train_epoch(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/funasr/train_utils/trainer_ds.py", line 603, in train_epoch
[rank0]: self.forward_step(model, batch, loss_dict=loss_dict)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/funasr/train_utils/trainer_ds.py", line 670, in forward_step
[rank0]: retval = model(**batch)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1639, in forward
[rank0]: inputs, kwargs = self._pre_forward(*inputs, **kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1528, in _pre_forward
[rank0]: if torch.is_grad_enabled() and self.reducer._rebuild_buckets():
[rank0]: RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by
[rank0]: making sure all forward function outputs participate in calculating loss.
[rank0]: If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).
[rank0]: Parameter indices which did not receive grad for rank 0: 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395
[rank0]: In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error

[rank0]:[W129 11:36:49.566637567 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
W0129 11:36:50.754000 14129 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 14202 closing signal SIGTERM
E0129 11:36:51.068000 14129 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 14201) of binary: /opt/conda/bin/python
Traceback (most recent call last):
File "/opt/conda/bin/torchrun", line 8, in
sys.exit(main())
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 355, in wrapper
return f(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 918, in main
run(args)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 909, in run
elastic_launch(
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
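
The error message itself suggests enabling find_unused_parameters=True on DistributedDataParallel. For clarity, below is a minimal sketch of that setting in plain PyTorch; it is not FunASR's own trainer code (the DDP wrapping happens inside funasr/train_utils/trainer_ds.py), so whether and how this flag is exposed through the training config is an assumption on my side.

```python
# Minimal sketch of the setting the error message suggests, in plain PyTorch.
# This is NOT FunASR's trainer code; it only shows where find_unused_parameters
# would go if the model were wrapped manually.
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_model(model: torch.nn.Module, local_rank: int) -> DDP:
    # find_unused_parameters=True lets the reducer tolerate parameters that
    # receive no gradient in a given iteration (e.g. branches skipped in forward).
    return DDP(
        model.to(local_rank),
        device_ids=[local_rank],
        find_unused_parameters=True,
    )
```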

What could be the cause of this error?
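
For reference, each line of my train jsonl looks roughly like the sketch below. The field names ("key", "source", "source_len", "target", "target_len") are my guess at the usual FunASR manifest layout rather than an exact copy of my file; the second entry shows the HTTP form I would like to use for "source".

```python
# Hypothetical sketch of one manifest line with a local wav path versus an
# HTTP link; the field names are assumed, not taken from FunASR documentation.
import json

local_entry = {
    "key": "utt_0001",
    "source": "/ossfs/workspace/data/utt_0001.wav",  # local wav file
    "source_len": 1600,
    "target": "transcript text",
    "target_len": 4,
}

# Same entry, but with "source" pointing at an HTTP link instead of a local path.
http_entry = {**local_entry,
              "source": "http://mdn.alipayobjects.com/gov_gjj/afts/file/A*PIEES5D5xLIAAAAAQkAAAAgAdn11AQ"}

print(json.dumps(local_entry, ensure_ascii=False))
print(json.dumps(http_entry, ensure_ascii=False))
```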
