Why does training fail? #14

@Hichengdong

My environment:

  • open3d 0.15.2
  • python 3.8.18
  • pytorch 1.10.0 (cuda11.3, cudnn8.2)
  • pytorch-lightning 1.5.10
  • pytorch3d 0.6.2
  • shapely 1.7.1
  • torchvision 0.11.0+cu113

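The only change I made was setting batch_size to 8 because of limited GPU memory. Training then fails with the following exception: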

model.backward(closure_loss, optimizer, *args, **kwargs)
File "/home/cd/.local/lib/python3.8/site-packages/pytorch_lightning-1.5.10-py3.8.egg/pytorch_lightning/core/lightning.py", line 1434, in backward
loss.backward(*args, **kwargs)
File "/home/cd/.local/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/cd/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [128, 8, 128]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Exception ignored in: <function tqdm.__del__ at 0x7f0535976dc0>
Traceback (most recent call last):
File "/home/cd/.local/lib/python3.8/site-packages/tqdm/std.py", line 1145, in __del__
File "/home/cd/.local/lib/python3.8/site-packages/tqdm/std.py", line 1299, in close
File "/home/cd/.local/lib/python3.8/site-packages/tqdm/std.py", line 1492, in display
File "/home/cd/.local/lib/python3.8/site-packages/tqdm/std.py", line 1148, in __str__
File "/home/cd/.local/lib/python3.8/site-packages/tqdm/std.py", line 1450, in format_dict
TypeError: cannot unpack non-iterable NoneType object
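
For context, this RuntimeError is raised when a tensor that autograd saved for the backward pass is modified in place before backward() runs. The sketch below reproduces the same message outside this repository. The function name run_backward and the `+= 1.0` edit are made up for illustration; the actual offending line in the model is not known from the traceback and may look different (for example, an nn.ReLU(inplace=True) or a `+=` applied to an activation).

```python
import torch


def run_backward(inplace_edit: bool) -> str:
    """Return "ok" if backward succeeds, otherwise the error message text.

    Hypothetical helper for illustration only; not part of the repository.
    """
    # Same shape as the tensor named in the traceback above.
    x = torch.randn(128, 8, 128, requires_grad=True)

    # ReLU keeps a reference to its output so it can compute its gradient.
    h = torch.relu(x)

    if inplace_edit:
        # In-place edit: bumps h's version counter from 0 to 1,
        # so the tensor ReLU saved no longer matches what it expects.
        h += 1.0
        out = h
    else:
        # Out-of-place edit: creates a new tensor and leaves h untouched.
        out = h + 1.0

    try:
        out.sum().backward()
        return "ok"
    except RuntimeError as err:
        return str(err)


if __name__ == "__main__":
    print(run_backward(inplace_edit=False))
    print(run_backward(inplace_edit=True))
```

The out-of-place variant runs cleanly, while the in-place variant fails with the "modified by an inplace operation" message. To locate the real culprit in the model, the hint in the traceback applies: calling torch.autograd.set_detect_anomaly(True) before training makes PyTorch also report the forward-pass operation whose gradient computation failed.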

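I then ran validation with the pretrained Pedestrian model. It completes without errors, but the results differ from those reported in the paper: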

2024-12-19 19:49:27,347 - INFO - Avg Prec/Succ=92.673/67.344 Frames=6088 Runtime=0.009268

DATALOADER:0 TEST RESULTS
{'n_frames': 2991.41943359375,
'precesion': 92.67328643798828,
'runtime': 0.009268070571124554,
'success': 67.3443615436554}

