feat(deployment): centerpoint deployment integration #181
vividf wants to merge 35 commits into `tier4:feat/new_deployment_and_evaluation_pipeline`
Conversation
```diff
 verification = dict(
     enabled=False,
-    tolerance=1e-1,
+    tolerance=1,
```
Could you explain what `tolerance` means here, and why it was updated from 0.1 to 1?
The value was originally set for calibration classification and later copied to CenterPoint, but it does not work correctly for CenterPoint.
```
INFO:deployment.core.evaluation.verification_mixin: tensorrt (cuda:0) latency: 205.08 ms
INFO:deployment.core.evaluation.verification_mixin: output[heatmap]: shape=(1, 5, 510, 510), max_diff=0.070197, mean_diff=0.007674
INFO:deployment.core.evaluation.verification_mixin: output[reg]: shape=(1, 2, 510, 510), max_diff=0.007944, mean_diff=0.001120
INFO:deployment.core.evaluation.verification_mixin: output[height]: shape=(1, 1, 510, 510), max_diff=0.025401, mean_diff=0.002122
INFO:deployment.core.evaluation.verification_mixin: output[dim]: shape=(1, 3, 510, 510), max_diff=0.031920, mean_diff=0.001143
INFO:deployment.core.evaluation.verification_mixin: output[rot]: shape=(1, 2, 510, 510), max_diff=0.075215, mean_diff=0.004582
INFO:deployment.core.evaluation.verification_mixin: output[vel]: shape=(1, 2, 510, 510), max_diff=0.221999, mean_diff=0.004940
INFO:deployment.core.evaluation.verification_mixin: Overall Max difference: 0.221999
INFO:deployment.core.evaluation.verification_mixin: Overall Mean difference: 0.004347
WARNING:deployment.core.evaluation.verification_mixin: tensorrt (cuda:0) verification FAILED ✗ (max diff: 0.221999 > tolerance: 0.100000)
```
Do you know the reason why it fails? Since this is a verification step, it is always better to investigate the cause rather than loosen the tolerance.
It doesn't necessarily indicate a failure.
When converting from PyTorch to TensorRT, some numerical differences are expected due to different kernels, precision handling, and TensorRT optimizations.
The verification is mainly used as a safeguard to detect major issues (e.g., incorrect conversion settings) rather than to enforce exact numerical equivalence.
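As a minimal sketch of such a safeguard (not the actual `verification_mixin` API; the function name and signature are illustrative), backend outputs can be compared tensor-by-tensor, gating only on the worst-case deviation:

```python
import numpy as np

def verify_outputs(reference: dict, candidate: dict, tolerance: float) -> bool:
    """Compare two backends' outputs tensor-by-tensor against a max-diff tolerance."""
    overall_max = 0.0
    for name, ref in reference.items():
        diff = np.abs(ref - candidate[name])
        print(f"output[{name}]: shape={ref.shape}, "
              f"max_diff={diff.max():.6f}, mean_diff={diff.mean():.6f}")
        overall_max = max(overall_max, float(diff.max()))
    # Pass only while the worst per-element deviation stays within tolerance.
    return overall_max <= tolerance
```

With a check like this, a grossly wrong conversion (e.g. mismatched input layout) fails loudly, while small kernel-level numeric drift passes.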
Since 1e-1 is the value we set for resnet18 in calibration classification, the two cases are different.
By the way, this is the verification result for TensorRT fp16, right? If that's the case, it makes sense.
Anyway, 5e-1 would be a better value.
```
Running onnx (cuda:0) reference...
2026-03-10 15:20:07.511273431 [V:onnxruntime:, execution_steps.cc:103 Execute] stream 0 activate notification with index 0
2026-03-10 15:20:07.567219724 [V:onnxruntime:, execution_steps.cc:47 Execute] stream 0 wait on Notification with id: 0
INFO:deployment.core.evaluation.verification_mixin: onnx (cuda:0) latency: 1423.80 ms

Running tensorrt (cuda:0) test...
INFO:deployment.core.evaluation.verification_mixin: tensorrt (cuda:0) latency: 1141.26 ms
INFO:deployment.core.evaluation.verification_mixin: output[heatmap]: shape=(1, 5, 510, 510), max_diff=0.464849, mean_diff=0.056135
INFO:deployment.core.evaluation.verification_mixin: output[reg]: shape=(1, 2, 510, 510), max_diff=0.056639, mean_diff=0.006198
INFO:deployment.core.evaluation.verification_mixin: output[height]: shape=(1, 1, 510, 510), max_diff=0.227012, mean_diff=0.065522
INFO:deployment.core.evaluation.verification_mixin: output[dim]: shape=(1, 3, 510, 510), max_diff=0.336713, mean_diff=0.028087
INFO:deployment.core.evaluation.verification_mixin: output[rot]: shape=(1, 2, 510, 510), max_diff=0.515039, mean_diff=0.023962
INFO:deployment.core.evaluation.verification_mixin: output[vel]: shape=(1, 2, 510, 510), max_diff=0.932002, mean_diff=0.034206
INFO:deployment.core.evaluation.verification_mixin: Overall Max difference: 0.932002
INFO:deployment.core.evaluation.verification_mixin: Overall Mean difference: 0.037279
WARNING:deployment.core.evaluation.verification_mixin: tensorrt (cuda:0) verification FAILED ✗ (max diff: 0.932002 > tolerance: 0.500000)
```
On a different computer, the values can differ. I will leave it at 1 for now.
Did you set a random seed for this validation? Randomness (for example, shuffling point clouds) significantly affects the results. Otherwise, I believe the difference between computers is too large.
Note that the reported difference corresponds to the maximum deviation; the mean difference is actually quite small.
Additionally, the magnitude of the difference depends heavily on the hardware. For example, on Blackwell GPUs (ONNX CUDA vs. TensorRT), the discrepancy is minimal. In contrast, on my laptop, the difference between ONNX CUDA and TensorRT is around 1. Even when forcing ONNX Runtime to use CUDA only, it still initializes a default CPU executor and executes some operations on the CPU, which can introduce discrepancies.
Interestingly, when comparing ONNX CPU with TensorRT on my laptop, the difference becomes very small. However, on Blackwell, the ONNX CPU vs. TensorRT comparison shows a larger gap.
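If run-to-run variation from input shuffling is a concern, fixing the RNG seeds before building the verification inputs removes that source of noise. A minimal sketch, assuming NumPy-based point-cloud handling (`seed_everything` and `shuffled_points` are illustrative names; a real deployment pipeline would also seed `torch` and force deterministic cuDNN kernels):

```python
import random
import numpy as np

def seed_everything(seed: int = 42) -> None:
    """Fix Python and NumPy RNGs so input shuffling is reproducible."""
    random.seed(seed)
    np.random.seed(seed)

def shuffled_points(points: np.ndarray, seed: int) -> np.ndarray:
    """Shuffle a point cloud deterministically under a fixed seed."""
    seed_everything(seed)
    idx = np.random.permutation(len(points))
    return points[idx]
```

With a fixed seed, shuffling the same point cloud twice yields an identical order, so any remaining per-run output difference comes from the backends themselves.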
Some of the modules, for example,

```python
model_cfg = Config.fromfile(args.model_cfg)
config = BaseDeploymentConfig(deploy_cfg)

_validate_required_components(config.components_cfg)
```
Move `_validate_required_components` to `BaseDeploymentConfig`.
This only validates the component names needed for CenterPoint.
```python
context = CenterPointExportContext(rot_y_axis_reference=bool(getattr(args, "rot_y_axis_reference", False)))
runner.run(context=context)
return 0
```
Do we need to return a status code here?
`run()` is annotated as `-> int` and documented as returning an exit code for the unified CLI (`main.py`).
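A minimal sketch of that convention (the function bodies are hypothetical; only the `-> int` exit-code pattern is taken from the discussion):

```python
import sys

def run() -> int:
    """Hypothetical entry point mirroring the runner: 0 on success, non-zero on failure."""
    try:
        # ... export / verification work would go here ...
        return 0
    except Exception as exc:
        print(f"deployment failed: {exc}", file=sys.stderr)
        return 1

def main() -> None:
    # The unified CLI forwards run()'s return value to the OS as the
    # process exit code, which is why run() is annotated as -> int.
    sys.exit(run())
```

Keeping the exit code as a return value (instead of calling `sys.exit` inside `run()`) also makes the function easy to call from tests.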
```python
def _release_gpu_resources(self) -> None:
    """Release TensorRT resources (engines and contexts) and CUDA events."""
    # Destroy CUDA events
    if hasattr(self, "_backbone_start_event"):
```
Use a for-loop to achieve this.
```python
}

for component_name, engine_path in engine_files.items():
    if not osp.exists(engine_path):
```
This error validation should be done in `resolve_artifact_path`.
Thanks, it was actually duplicated code! Fixed in 90d1404.
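A sketch of how the existence check could live in one place (the name `resolve_artifact_path` comes from the review; its signature and error type here are assumptions):

```python
import os.path as osp

def resolve_artifact_path(base_dir: str, filename: str) -> str:
    """Resolve an artifact path and fail fast if it does not exist.

    Centralizing the check here means call sites like the engine-loading
    loop no longer need their own osp.exists() guards.
    """
    path = osp.join(base_dir, filename)
    if not osp.exists(path):
        raise FileNotFoundError(f"Artifact not found: {path}")
    return path
```

Call sites then simply consume the returned path and let the single, consistent error surface any missing file.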
Signed-off-by: vividf <yihsiang.fang@tier4.jp>
@KSeangTan
Regarding this, I would like to change the names that can be reused for BEVFusion in another PR.
Summary
Integrates CenterPoint into the unified deployment framework, enabling deployment and evaluation of ONNX and TensorRT models.
Note: this PR includes the changes from #180.
Changes
- Moved `projects/CenterPoint` to `deployment/projects/centerpoint`
- Replaced the `deploy.py` script with the new unified CLI (`deployment.cli.main`)

Migration Notes

- The old deploy script (`projects/CenterPoint/scripts/deploy.py`) is removed; use `python -m deployment.cli.main centerpoint <deploy_config> <model_config>` instead
- ONNX model code now lives in `deployment.projects.centerpoint.onnx_models`
Exported ONNX (Same)
Voxel Encoder

Backbone Head
