
feat(deployment): centerpoint deployment integration#181

Open
vividf wants to merge 35 commits into tier4:feat/new_deployment_and_evaluation_pipeline from vividf:feat/centerpoint_deployment_integration

Conversation


@vividf vividf commented Feb 2, 2026

Summary

Integrates CenterPoint into the unified deployment framework, enabling deployment and evaluation of ONNX and TensorRT models.

Note: this PR includes the changes from #180.

Changes

  • Integrated CenterPoint with deployment framework:
    • Moved deployment code from projects/CenterPoint to deployment/projects/centerpoint
    • Implemented component-based export pipeline for ONNX and TensorRT
    • Added runtime inference support with PyTorch, ONNX Runtime, and TensorRT backends
  • Deployment capabilities:
    • Export CenterPoint models to ONNX format
    • Export CenterPoint models to TensorRT engines
    • Component-based architecture (voxel encoder, backbone+head) for flexible deployment
  • Evaluation capabilities:
    • Evaluate ONNX models using ONNX Runtime
    • Evaluate TensorRT engines
    • Integrated metrics evaluation with deployment pipeline
  • Updated CLI: Replaced old deploy.py script with new unified CLI (deployment.cli.main)
  • Added Docker support: Created Dockerfile for deployment environment with TensorRT dependencies
  • Updated documentation: Added deployment and evaluation instructions in README
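The component-based split above (voxel encoder vs. backbone+head) can be sketched as a per-component export plan; all names here (component keys, file names, tensor names) are illustrative, not the framework's actual API:

```python
# Hedged sketch of a component-based export plan for a CenterPoint-like model.
# Component keys, file names, and tensor names are assumptions for illustration.
from dataclasses import dataclass


@dataclass
class ComponentSpec:
    """Describes one ONNX component to export."""
    onnx_file: str
    input_names: list
    output_names: list


def build_export_plan() -> dict:
    """Return a mapping from component name to its export spec."""
    return {
        "voxel_encoder": ComponentSpec(
            onnx_file="voxel_encoder.onnx",
            input_names=["voxel_features"],
            output_names=["voxel_embeddings"],
        ),
        "backbone_head": ComponentSpec(
            onnx_file="backbone_head.onnx",
            input_names=["spatial_features"],
            # One output per detection head branch.
            output_names=["heatmap", "reg", "height", "dim", "rot", "vel"],
        ),
    }
```

Each spec would then be fed to `torch.onnx.export` (and on to the TensorRT builder) for the corresponding submodule.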

Migration Notes

  • Old deployment script (projects/CenterPoint/scripts/deploy.py) is removed
  • Use new CLI: python -m deployment.cli.main centerpoint <deploy_config> <model_config>
  • ONNX model variants are now registered via deployment.projects.centerpoint.onnx_models
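One common way such registration works is a module-level registry populated on import; this is a hedged sketch of the pattern, not the actual mechanism in deployment.projects.centerpoint.onnx_models:

```python
# Hedged sketch of import-time model-variant registration. The registry name,
# decorator, and class are illustrative, not the framework's real API.
ONNX_MODEL_REGISTRY: dict = {}


def register_onnx_model(name: str):
    """Decorator that records an ONNX model variant under a lookup name."""
    def wrapper(cls):
        ONNX_MODEL_REGISTRY[name] = cls
        return cls
    return wrapper


@register_onnx_model("centerpoint_voxel_encoder")
class VoxelEncoderONNX:
    """Illustrative ONNX-exportable wrapper for the voxel encoder."""
```

Importing the module runs the decorators, so variants become available to the CLI without explicit wiring.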

How to run

python -m deployment.cli.main centerpoint \
  deployment/projects/centerpoint/config/deploy_config.py \
  projects/CenterPoint/configs/t4dataset/Centerpoint/second_secfpn_8xb16_121m_j6gen2_base_amp_t4metric_v2.py \
  --rot-y-axis-reference

Exported ONNX (unchanged)

Voxel Encoder

Backbone Head

@vividf vividf changed the title Feat/centerpoint deployment integration feat(deployment): centerpoint deployment integration Feb 2, 2026
@vividf vividf requested review from KSeangTan and yamsam February 2, 2026 16:33
@vividf vividf self-assigned this Feb 2, 2026
@vividf vividf marked this pull request as ready for review February 3, 2026 04:31
@vividf vividf force-pushed the feat/centerpoint_deployment_integration branch 2 times, most recently from bfb778f to 441d06e Compare February 16, 2026 06:08
@KSeangTan KSeangTan left a comment

Done with the first round of review. Please consider using dataclasses or pydantic for the configs and doing the type checking there; then we can remove all the type checking from the code.
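A minimal sketch of that suggestion using a stdlib dataclass (pydantic's `BaseModel` would add automatic type coercion on top); the field names are illustrative:

```python
# Hedged sketch of a typed config object that validates once at construction,
# so call sites no longer need scattered isinstance checks. Field names are
# assumptions for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class VerificationConfig:
    enabled: bool = False
    tolerance: float = 1.0

    def __post_init__(self) -> None:
        # Validate here instead of repeating checks throughout the code.
        if not isinstance(self.enabled, bool):
            raise TypeError("enabled must be a bool")
        if self.tolerance <= 0:
            raise ValueError("tolerance must be positive")
```

Constructing the config with a bad value fails immediately, which is the "remove all the type checking in the code" payoff.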

verification = dict(
    enabled=False,
-   tolerance=1e-1,
+   tolerance=1,
Collaborator:

Explain what tolerance means here, and why it was updated from 0.1 to 1.

vividf (Author) commented Mar 4, 2026:

The value was originally set for calibration classification and later copied to CenterPoint, but it does not work correctly for CenterPoint.

INFO:deployment.core.evaluation.verification_mixin:  tensorrt (cuda:0) latency: 205.08 ms
INFO:deployment.core.evaluation.verification_mixin:  output[heatmap]: shape=(1, 5, 510, 510), max_diff=0.070197, mean_diff=0.007674
INFO:deployment.core.evaluation.verification_mixin:  output[reg]: shape=(1, 2, 510, 510), max_diff=0.007944, mean_diff=0.001120
INFO:deployment.core.evaluation.verification_mixin:  output[height]: shape=(1, 1, 510, 510), max_diff=0.025401, mean_diff=0.002122
INFO:deployment.core.evaluation.verification_mixin:  output[dim]: shape=(1, 3, 510, 510), max_diff=0.031920, mean_diff=0.001143
INFO:deployment.core.evaluation.verification_mixin:  output[rot]: shape=(1, 2, 510, 510), max_diff=0.075215, mean_diff=0.004582
INFO:deployment.core.evaluation.verification_mixin:  output[vel]: shape=(1, 2, 510, 510), max_diff=0.221999, mean_diff=0.004940
INFO:deployment.core.evaluation.verification_mixin:  Overall Max difference: 0.221999
INFO:deployment.core.evaluation.verification_mixin:  Overall Mean difference: 0.004347
WARNING:deployment.core.evaluation.verification_mixin:  tensorrt (cuda:0) verification FAILED ✗ (max diff: 0.221999 > tolerance: 0.100000)
INFO:deployment.core.evaluation.verification_mixin:

Collaborator:

Do you know why it fails? Since this is a verification, it's better to find the reason than to update the tolerance.

vividf (Author):

It doesn't necessarily indicate a failure.
When converting from PyTorch to TensorRT, some numerical differences are expected due to different kernels, precision handling, and TensorRT optimizations.

The verification is mainly used as a safeguard to detect major issues (e.g., incorrect conversion settings) rather than to enforce exact numerical equivalence.
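That safeguard role can be sketched as a simple max-difference check over the named outputs (this is an illustration, not the actual verification_mixin implementation):

```python
# Hedged sketch of cross-backend output verification: compare each named
# output elementwise and flag only deviations beyond a tolerance.
import numpy as np


def verify_outputs(reference: dict, candidate: dict, tolerance: float) -> bool:
    """Return True if every output's max abs difference is within tolerance."""
    overall_max = 0.0
    for name, ref in reference.items():
        diff = float(np.max(np.abs(ref - candidate[name])))
        overall_max = max(overall_max, diff)
    return overall_max <= tolerance
```

A loose tolerance catches broken conversions (wrong layouts, wrong settings) while tolerating the expected kernel- and precision-level noise.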

vividf (Author):

Since 1e-1 was the value we set for ResNet18 in calibration classification, the cases are different.

Collaborator:

Btw, this is the verification result for TensorRT FP16, right? If that's the case, it makes sense.

Collaborator:

Anyway, 5e-1 could be a better value.

vividf (Author):

Running onnx (cuda:0) reference...
2026-03-10 15:20:07.511273431 [V:onnxruntime:, execution_steps.cc:103 Execute] stream 0 activate notification with index 0
2026-03-10 15:20:07.567219724 [V:onnxruntime:, execution_steps.cc:47 Execute] stream 0 wait on Notification with id: 0
INFO:deployment.core.evaluation.verification_mixin:  onnx (cuda:0) latency: 1423.80 ms
INFO:deployment.core.evaluation.verification_mixin:
Running tensorrt (cuda:0) test...
INFO:deployment.core.evaluation.verification_mixin:  tensorrt (cuda:0) latency: 1141.26 ms
INFO:deployment.core.evaluation.verification_mixin:  output[heatmap]: shape=(1, 5, 510, 510), max_diff=0.464849, mean_diff=0.056135
INFO:deployment.core.evaluation.verification_mixin:  output[reg]: shape=(1, 2, 510, 510), max_diff=0.056639, mean_diff=0.006198
INFO:deployment.core.evaluation.verification_mixin:  output[height]: shape=(1, 1, 510, 510), max_diff=0.227012, mean_diff=0.065522
INFO:deployment.core.evaluation.verification_mixin:  output[dim]: shape=(1, 3, 510, 510), max_diff=0.336713, mean_diff=0.028087
INFO:deployment.core.evaluation.verification_mixin:  output[rot]: shape=(1, 2, 510, 510), max_diff=0.515039, mean_diff=0.023962
INFO:deployment.core.evaluation.verification_mixin:  output[vel]: shape=(1, 2, 510, 510), max_diff=0.932002, mean_diff=0.034206
INFO:deployment.core.evaluation.verification_mixin:  Overall Max difference: 0.932002
INFO:deployment.core.evaluation.verification_mixin:  Overall Mean difference: 0.037279
WARNING:deployment.core.evaluation.verification_mixin:  tensorrt (cuda:0) verification FAILED ✗ (max diff: 0.932002 > tolerance: 0.500000)

On a different computer the values can differ, so I will leave it at 1 for now.

Collaborator:

Did you set a random seed for this validation? Randomness (for example, shuffling point clouds) significantly affects the results; otherwise, I believe the difference between computers is too large.
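Seeding every randomness source before the verification run, as suggested, might look like this sketch (the torch part is optional and only matters if any shuffling happens through torch):

```python
# Hedged sketch of making the verification input deterministic by seeding all
# common randomness sources before the run. Helper name is illustrative.
import random

import numpy as np


def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy, and (if available) torch RNGs."""
    random.seed(seed)
    np.random.seed(seed)
    try:  # torch is optional for this helper
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
```

With a fixed seed, the same point-cloud ordering is fed to every backend, so remaining diffs reflect kernels and precision rather than input randomness.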

vividf (Author):

Note that the reported difference corresponds to the maximum deviation; the mean difference is actually quite small.

Additionally, the magnitude of the difference depends heavily on the hardware. For example, on Blackwell GPUs (ONNX CUDA vs. TensorRT), the discrepancy is minimal. In contrast, on my laptop, the difference between ONNX CUDA and TensorRT is around 1. Even when forcing ONNX Runtime to use CUDA only, it still initializes a default CPU executor and executes some operations on the CPU, which can introduce discrepancies.

Interestingly, when comparing ONNX CPU with TensorRT on my laptop, the difference becomes very small. However, on Blackwell, the ONNX CPU vs. TensorRT comparison shows a larger gap.
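For reference, pinning ONNX Runtime to the CUDA execution provider looks roughly like this; as noted above, ORT always keeps a CPU fallback registered, so the provider list only sets the preference order (helper names here are illustrative):

```python
# Hedged sketch of preferring the CUDA execution provider in ONNX Runtime.
# ORT may still place unsupported ops on the CPU fallback, which is one source
# of the cross-backend discrepancies discussed above.
def cuda_first_providers() -> list:
    """Provider preference: CUDA first, CPU as the unavoidable fallback."""
    return ["CUDAExecutionProvider", "CPUExecutionProvider"]


def make_session(onnx_path: str):
    # Imported lazily so the helper above is usable without onnxruntime.
    import onnxruntime as ort
    return ort.InferenceSession(onnx_path, providers=cuda_first_providers())
```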

@vividf vividf force-pushed the feat/centerpoint_deployment_integration branch from caa92a6 to 93e5558 Compare March 5, 2026 17:24
@vividf vividf changed the base branch from feat/new_deployment_and_evaluation_pipeline to main March 5, 2026 17:27
@vividf vividf changed the base branch from main to feat/new_deployment_and_evaluation_pipeline March 5, 2026 17:27
@vividf vividf force-pushed the feat/centerpoint_deployment_integration branch 3 times, most recently from de7020e to 6470ac5 Compare March 10, 2026 14:40
@KSeangTan (Collaborator):

Some of the modules, for example the dataloader, should be reusable across detection3d tasks, right?

model_cfg = Config.fromfile(args.model_cfg)
config = BaseDeploymentConfig(deploy_cfg)

_validate_required_components(config.components_cfg)
Collaborator:

move _validate_required_components to BaseDeploymentConfig

vividf (Author):

This only validates the component names needed for CenterPoint.


context = CenterPointExportContext(rot_y_axis_reference=bool(getattr(args, "rot_y_axis_reference", False)))
runner.run(context=context)
return 0
Collaborator:

Do we need to return a status code here?

vividf (Author):

run() is annotated as -> int and documented as returning an exit code for the unified CLI (main.py)
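That convention can be sketched as follows (handler and function names are illustrative):

```python
# Hedged sketch of the exit-code convention: subcommand handlers return an
# int, and the unified CLI forwards it to the process exit status via
# sys.exit(main()) in the entry point. Names are illustrative.
def run_centerpoint(argv) -> int:
    """Handle the `centerpoint` subcommand; return a process exit code."""
    # ... export / verification would happen here ...
    return 0


def main(argv=None) -> int:
    # Dispatch to the subcommand handler and propagate its exit code.
    return run_centerpoint(argv)
```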

def _release_gpu_resources(self) -> None:
    """Release TensorRT resources (engines and contexts) and CUDA events."""
    # Destroy CUDA events
    if hasattr(self, "_backbone_start_event"):
Collaborator:

Use a for-loop to achieve this.
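The suggested for-loop could look like this sketch; all event attribute names besides `_backbone_start_event` are hypothetical:

```python
# Hedged sketch of releasing CUDA events via a loop over attribute names
# rather than one hasattr branch per attribute. Names other than
# _backbone_start_event are hypothetical.
class ResourceHolder:
    _EVENT_ATTRS = (
        "_backbone_start_event",
        "_backbone_end_event",
        "_encoder_start_event",
        "_encoder_end_event",
    )

    def _release_gpu_resources(self) -> None:
        """Destroy CUDA events and drop the references."""
        for attr in self._EVENT_ATTRS:
            event = getattr(self, attr, None)
            if event is not None:
                # a real implementation would destroy the CUDA event here
                delattr(self, attr)
```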

vividf (Author):

fixed in f48f5f7

@vividf vividf requested a review from KSeangTan March 11, 2026 04:01
}

for component_name, engine_path in engine_files.items():
    if not osp.exists(engine_path):
Collaborator:

This error validation should be done in resolve_artifact_path
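Centralizing the check could look like this sketch of a `resolve_artifact_path` helper (its real signature may differ):

```python
# Hedged sketch of folding the existence check into resolve_artifact_path so
# callers never repeat osp.exists. The signature is an assumption.
import os.path as osp


def resolve_artifact_path(work_dir: str, filename: str) -> str:
    """Resolve an artifact path and fail fast if it does not exist."""
    path = osp.join(work_dir, filename)
    if not osp.exists(path):
        raise FileNotFoundError(f"Artifact not found: {path}")
    return path
```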

vividf (Author):

Thanks, it was actually duplicated code! Fixed in 90d1404.

@vividf vividf force-pushed the feat/new_deployment_and_evaluation_pipeline branch from 5256306 to 2b28f60 Compare March 11, 2026 04:27
@vividf vividf force-pushed the feat/centerpoint_deployment_integration branch from 1ca0e1c to a6b9840 Compare March 11, 2026 04:28
@vividf vividf force-pushed the feat/centerpoint_deployment_integration branch 2 times, most recently from 715bf79 to a209d2b Compare March 25, 2026 13:38
vividf added 4 commits March 26, 2026 00:23
Signed-off-by: vividf <yihsiang.fang@tier4.jp>
vividf and others added 26 commits March 26, 2026 00:23
@vividf vividf force-pushed the feat/centerpoint_deployment_integration branch from a209d2b to 90d1404 Compare March 25, 2026 15:23
@vividf vividf requested a review from KSeangTan March 25, 2026 16:23

vividf commented Mar 25, 2026

@KSeangTan
Thanks for the detailed review!!

> Some of the modules, for example the dataloader, should be reusable across detection3d tasks, right?

Regarding this, I would like to rename the modules that can be reused for BEVFusion in a separate PR.
