[WIP] Incorrect scale / alignment of depth or world points when using SparkFastVGGT in VGGT-SLAM2.0 #39
stepeos wants to merge 2 commits
@Dominic101 any advice on where to look would be greatly appreciated :)
Hi @stepeos, thanks for your interest in VGGT-SLAM. The problem is almost certainly that you need to modify FastVGGT's version of unproject_depth_map_to_point_map to return points that are defined with respect to each camera instead of with respect to the first camera. You can see how I changed VGGT's function in VGGT_SPARK here: https://github.com/MIT-SPARK/VGGT_SPARK/blob/6e6e16107b88e8e76c751826af10d4295d87ecd2/vggt/utils/geometry.py#L15. This change is needed because the homography matrices we compute assume points are defined with respect to each camera. By the way, if you get a chance to include a timing comparison showing the speed-up of VGGT-SLAM with FastVGGT, that would be awesome.
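To illustrate the difference between the two conventions, here is a minimal NumPy sketch. It is not the VGGT_SPARK or FastVGGT code: the function names are hypothetical, and it assumes a pinhole intrinsic matrix K and a 3x4 extrinsic [R|t] that maps first-camera ("world") coordinates into the current camera, which should be checked against the model's actual outputs.

```python
import numpy as np

def unproject_depth_to_camera_frame(depth, K):
    """Back-project a depth map (H, W) into 3D points expressed in the
    frame of the camera that captured it. Returns an (H, W, 3) array.
    Hypothetical helper; assumes a pinhole model without distortion."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel column / row grids
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def camera_to_first_camera_frame(points_cam, extrinsic):
    """Move per-camera points into the first camera's frame.
    Assumes extrinsic = [R|t] with X_cam = R @ X_world + t, where the
    world frame coincides with the first camera (an assumption here)."""
    R, t = extrinsic[:, :3], extrinsic[:, 3]
    # Inverse of the rigid transform: X_world = R^T (X_cam - t)
    return (points_cam - t) @ R
```

Under these assumptions, the suggestion above amounts to returning the output of the first function (points per camera) and skipping the second step.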
Wow, I completely missed that. Thank you so much for taking the time to help me out here, I appreciate it! I will definitely do that, but I can already say the speed-up is significant, since each inference batch can hold more images (because of lower VRAM usage) while also running faster. I will include a speed comparison script for the office loop.
Hi!
I’ve been working on integrating the FastVGGT model as a VGGT backend for VGGT-SLAM2.0.
Setup / Context
To achieve this, I merged the MIT Spark VGGT fork with the FastVGGT model into a combined repository:
https://github.com/stepeos/SparkFastVGGT.git
This combined model supports two modes:
Without compute_similarity (MIT Spark VGGT path):
With compute_similarity enabled (original VGGT behavior):
Both modes work as expected in isolation.
Problem
I encounter issues when using SparkFastVGGT as the backend in VGGT-SLAM2.0.
Specifically, I observe that the depth/scale of the reconstructed point cloud appears incorrect.
I suspect the issue may originate in one of the following stages:
set_point_cloud
get_points_in_world_frame
(I should add that I also tried using the world_points with enable_points enabled and reprojecting the points into the camera frame.)
It looks like there may be an additional transformation applied (possibly SL(4) on top of SE(3)) that affects the dense per-frame point cloud in world coordinates.
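One way to test this suspicion is to check whether a given 4x4 transform is rigid or a more general homography. The sketch below is only a diagnostic under stated assumptions: the helper names are hypothetical and are not part of VGGT-SLAM, and the matrices are assumed to act on homogeneous column vectors.

```python
import numpy as np

def apply_homography(H, points):
    """Apply a 4x4 homography to (N, 3) points, including the
    perspective divide by the fourth homogeneous coordinate."""
    homog = np.hstack([points, np.ones((len(points), 1))])
    mapped = homog @ H.T
    return mapped[:, :3] / mapped[:, 3:4]

def is_rigid(H, tol=1e-6):
    """True if H is an SE(3) transform: orthonormal rotation block with
    determinant +1 and a last row equal to [0, 0, 0, 1]."""
    R = H[:3, :3]
    return bool(
        np.allclose(H[3], [0.0, 0.0, 0.0, 1.0], atol=tol)
        and np.allclose(R @ R.T, np.eye(3), atol=tol)
        and abs(np.linalg.det(R) - 1.0) < tol
    )

def normalize_to_sl4(H):
    """Rescale H so that det(H) == 1. Assumes det(H) > 0."""
    return H / np.linalg.det(H) ** 0.25
```

A uniform scaling such as diag(2, 2, 2, 1) fails is_rigid, and normalizing it to determinant 1 does not remove its effect on the points, because the overall factor cancels in the perspective divide. A non-rigid transform therefore still changes the scale of a submap even when it is a valid SL(4) element.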
Observations
What makes this particularly confusing is that everything seems consistent in isolation:
In demo_viser.py from SparkFastVGGT, world_points match the points obtained from unproject_depth_map_to_point_map.
So the point cloud generation itself appears correct.
However, when integrated into VGGT-SLAM2.0, the scale / alignment becomes incorrect.
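To put a number on the mismatch, the two point sets can be compared directly. The following is a rough sketch with hypothetical helper names, assuming both arrays hold the same points in the same order. The scale estimate uses distances between consecutive points, so it is unaffected by any rigid transform between the two sets.

```python
import numpy as np

def relative_scale(points_a, points_b, eps=1e-9):
    """Median ratio of consecutive-point distances in A versus B.
    A value near 1.0 means both sets share the same scale.
    Returns NaN if no pair of points in B is farther apart than eps."""
    a = np.asarray(points_a, dtype=float).reshape(-1, 3)
    b = np.asarray(points_b, dtype=float).reshape(-1, 3)
    da = np.linalg.norm(a[1:] - a[:-1], axis=1)
    db = np.linalg.norm(b[1:] - b[:-1], axis=1)
    valid = db > eps  # skip degenerate pairs to avoid division by zero
    return float(np.median(da[valid] / db[valid]))

def max_abs_difference(points_a, points_b):
    """Largest per-coordinate difference; only meaningful when both
    sets are expressed in the same frame."""
    diff = np.asarray(points_a, dtype=float) - np.asarray(points_b, dtype=float)
    return float(np.abs(diff).max())
```

Comparing the per-frame points before and after they enter the SLAM pipeline with relative_scale could help narrow down which stage introduces the scale change.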
The same goes for poses. The scale of the SL(4) transform after optimization causes submaps to drift apart, as can be seen in the screenshot below.
Expected Behavior
The reconstructed point cloud should match the correct world-scale geometry as seen in demo_viser.py.
Actual Behavior
When running VGGT-SLAM2.0 with SparkFastVGGT backend, the reconstruction shows incorrect scale / misalignment.
Example (office loop):
SparkFastVGGT in VGGT-SLAM2.0:
Question
Could you point me in the right direction for tracking down the incorrect transformations?
Any pointers on where an unintended scale or transform might be introduced would be greatly appreciated.
I think it would be a huge win to get the SLAM-only part working with FastVGGT as the backend.