Question about the pipeline being very heavy during training #4

@wlc185

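Hello, I am currently training this algorithm on navsimv1 and found that the GPU sits idle for much of the training time. Looking at the feature-building stage, I noticed a preprocessing step for camera features: SparseDriveFeatureBuilder.pipeline(self, features, targets, token, test_mode, vis=False). This step does not appear to be part of _get_camera_feature, which means the pipeline is not executed during run_dataset_caching and is instead run online during training by CacheOnlyDataset.__getitem__.

This seems to add a heavy preprocessing step every time image data is loaded (see the timings in the example below): the pipeline takes about 0.2 seconds. Since it is executed multiple times for a single batch during training, the actual CPU load becomes very heavy. In another algorithm I have read that is built on the same framework (navsim), diffusiondrivev2, the preprocessing of the raw images is placed inside _get_camera_feature, so it can be cached ahead of time, which makes training very fast.

I would like to know whether setting up the pipeline this way is reasonable, or whether I have not fully understood the code. Thank you for your reply.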

class CacheOnlyDataset(torch.utils.data.Dataset):
    ...
    def __getitem__(self, idx: int) -> Tuple[Dict[str, torch.Tensor], Dict[str, torch.Tensor]]:
        ...
        # Timestamps are in seconds; the trailing comments show one measured sample.
        print(time.perf_counter())  # 70886.107852798
        features, targets, token = self._load_scene_with_token(idx)
        print(time.perf_counter())  # 70886.109476427 (cache load: ~1.6 ms)
        if hasattr(self._feature_builders[0], 'pipeline'):
            features, targets, token = self._feature_builders[0].pipeline(features, targets, token, self.test_mode)
        print(time.perf_counter())  # 70886.299037729 (pipeline: ~190 ms)
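
To make the question concrete, here is a minimal sketch of the split I have in mind. It is not SparseDrive's or navsim's actual code: `expensive_preprocess`, `cheap_augment`, `build_cache`, and `CachedDataset` are hypothetical stand-ins, and the "images" are plain lists of numbers. The assumption is that the pipeline can be divided into a deterministic part (which could run once at caching time) and a light, possibly random part (which would have to stay in `__getitem__`); if the whole pipeline depends on random augmentation, this split would not apply as written.

```python
from typing import Dict, List


def expensive_preprocess(raw: List[int]) -> List[float]:
    """Hypothetical stand-in for deterministic image work (e.g. resize, normalize)."""
    expensive_preprocess.calls += 1  # count invocations to show where the cost lands
    peak = max(raw) or 1  # avoid dividing by zero on an all-zero sample
    return [value / peak for value in raw]


expensive_preprocess.calls = 0


def cheap_augment(feature: List[float], flip: bool) -> List[float]:
    """Hypothetical stand-in for light per-access augmentation."""
    return feature[::-1] if flip else feature


def build_cache(raw_samples: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Offline step: run the deterministic preprocessing once per token."""
    return {token: expensive_preprocess(raw) for token, raw in raw_samples.items()}


class CachedDataset:
    """Toy dataset whose __getitem__ only reads the cache and applies the light step."""

    def __init__(self, cache: Dict[str, List[float]]) -> None:
        self._cache = cache
        self._tokens = sorted(cache)

    def __len__(self) -> int:
        return len(self._tokens)

    def __getitem__(self, idx: int) -> List[float]:
        token = self._tokens[idx]
        # A real pipeline would draw the flip randomly; parity keeps the sketch deterministic.
        return cheap_augment(self._cache[token], flip=idx % 2 == 1)


raw_samples = {"a": [0, 5, 10], "b": [2, 4, 8]}
dataset = CachedDataset(build_cache(raw_samples))

# Three "epochs" of reads: the expensive step is not triggered again.
for _epoch in range(3):
    items = [dataset[i] for i in range(len(dataset))]
```

In this toy setup the expensive function runs once per sample when the cache is built, no matter how many epochs read from the dataset afterwards, which is the behavior I understood the `_get_camera_feature` placement to give.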
