[WIP] Incorrect scale / alignment of depth or world points when using SparkFastVGGT in VGGT-SLAM2.0#39

Open
stepeos wants to merge 2 commits into
MIT-SPARK:main from
stepeos:main

stepeos commented Apr 12, 2026

Hi!

I’ve been working on integrating the FastVGGT model as a VGGT backend for VGGT-SLAM2.0.

Setup / Context

To achieve this, I merged the MIT Spark VGGT fork with the FastVGGT model into a combined repository:

https://github.com/stepeos/SparkFastVGGT.git

This combined model supports two modes:

  • Without compute_similarity (MIT Spark VGGT path):

    • Uses FastVGGT token merging in attention layers
    • Significantly more memory efficient
  • With compute_similarity enabled (original VGGT behavior):

    • Token merging is disabled, since it likely interferes with similarity computation

Both modes work as expected in isolation.


Problem

I encounter issues when using SparkFastVGGT as the backend in VGGT-SLAM2.0.

Specifically, I observe that the depth/scale of the reconstructed point cloud appears incorrect.

I suspect the issue may originate in one of the following stages:

  • set_point_cloud
  • get_points_in_world_frame
  • or an intermediate transformation step

(I also tried using the world_points with enable_points enabled and reprojecting those points into the camera frame.)

It looks like there may be an additional transformation applied (possibly SL(4) on top of SE(3)) that affects the dense per-frame point cloud in world coordinates.
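
To make the SL(4)-versus-SE(3) suspicion concrete, here is a minimal NumPy sketch of how a general 4x4 homography acts on 3D points. The function names `apply_homography` and `normalize_to_sl4` are hypothetical and not taken from VGGT-SLAM; the sketch only illustrates that, unlike a rigid SE(3) transform, a homography needs a homogeneous division and can change the scale of the point cloud.

```python
import numpy as np

def apply_homography(H, points):
    """Apply a 4x4 projective transform to (N, 3) points."""
    ones = np.ones((points.shape[0], 1))
    points_h = np.concatenate([points, ones], axis=1)  # (N, 4) homogeneous
    mapped = points_h @ H.T
    # The division by the last coordinate is what a rigid transform never needs.
    return mapped[:, :3] / mapped[:, 3:4]

def normalize_to_sl4(H):
    """Rescale H so that det(H) == 1 (assumes det(H) > 0)."""
    det = np.linalg.det(H)
    if det <= 0:
        raise ValueError("expected a homography with positive determinant")
    return H / det ** 0.25

# A pure scaling by 2 is a valid homography but not an SE(3) transform.
H_scale = np.diag([2.0, 2.0, 2.0, 1.0])
points = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 2.0]])
scaled = apply_homography(H_scale, points)
H_unit_det = normalize_to_sl4(H_scale)
```

Normalizing the determinant does not change how the points are mapped, so a scale component in the optimized homography survives the normalization and shows up in the transformed submap.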


Observations

What makes this particularly confusing is that everything seems consistent in isolation:

  • In demo_viser.py from SparkFastVGGT:
    • The world_points match the points obtained from unproject_depth_map_to_point_map
    • These results also match the original VGGT implementation

So the point cloud generation itself appears correct.
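
A check like the one described above can be sketched as follows. This is only an illustrative helper (the name `compare_point_maps` is hypothetical): it reports the largest absolute difference between two point maps and the least-squares scale factor that best maps one onto the other, which helps to tell a pure scale error apart from a frame mismatch.

```python
import numpy as np

def compare_point_maps(points_a, points_b, eps=1e-12):
    """Return (max_abs_diff, scale) for two point maps of identical shape.

    `scale` is the s minimizing ||s * points_a - points_b||^2.
    """
    a = np.asarray(points_a, dtype=np.float64).reshape(-1, 3)
    b = np.asarray(points_b, dtype=np.float64).reshape(-1, 3)
    max_abs_diff = float(np.abs(a - b).max())
    scale = float((a * b).sum() / max((a * a).sum(), eps))
    return max_abs_diff, scale

# Toy data: the second map is the first one scaled by 2.
reference = np.arange(12, dtype=np.float64).reshape(2, 2, 3) + 1.0
candidate = 2.0 * reference
max_abs_diff, scale = compare_point_maps(reference, candidate)
```

A scale close to 1 with a large difference would suggest that the two maps live in different coordinate frames rather than differing by a global scale.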

However, when integrated into VGGT-SLAM2.0, the scale / alignment becomes incorrect.

The same goes for poses: the scale of the SL(4) transform after optimization causes submaps to drift apart, as can be seen in the screenshot below.


Expected Behavior

The reconstructed point cloud should match the correct world-scale geometry as seen in:

  • demo_viser.py
  • original VGGT implementation

Actual Behavior

When running VGGT-SLAM2.0 with SparkFastVGGT backend, the reconstruction shows incorrect scale / misalignment.

Example (office loop):

SparkFastVGGT in VGGT-SLAM2.0:

[Screenshot: Office loop misalignment example]

Question

Could you point me in the right direction for tracking down the incorrect transformations?

Any pointers on where an unintended scale or transform might be introduced would be greatly appreciated.
I think it would be a huge win to get the SLAM-only part working with FastVGGT as the backend.

@stepeos stepeos changed the title Incorrect scale / alignment of depth or world points when using SparkFastVGGT in VGGT-SLAM2.0 [WIP] Incorrect scale / alignment of depth or world points when using SparkFastVGGT in VGGT-SLAM2.0 Apr 12, 2026
Author

stepeos commented May 6, 2026

@Dominic101 any advice on where to look would be greatly appreciated :)

Collaborator

Dominic101 commented May 7, 2026

Hi @stepeos, thanks for your interest in VGGT-SLAM. The problem is almost certainly that you need to modify FastVGGT's version of unproject_depth_map_to_point_map to return points that are defined wrt each camera instead of wrt the first camera. You can see how I changed VGGT's function in VGGT_SPARK here: https://github.com/MIT-SPARK/VGGT_SPARK/blob/6e6e16107b88e8e76c751826af10d4295d87ecd2/vggt/utils/geometry.py#L15. This change is needed because the homography matrices we compute assume points are defined wrt each camera.
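
The difference between the two conventions can be sketched with plain NumPy. This is not the VGGT_SPARK code (see the linked file for that); the function names are hypothetical, and the sketch assumes a pinhole intrinsic matrix and a 3x4 camera-from-world extrinsic `[R | t]`.

```python
import numpy as np

def unproject_depth_to_camera_points(depth, intrinsic):
    """Back-project an (H, W) depth map to (H, W, 3) points in that camera's own frame."""
    height, width = depth.shape
    u, v = np.meshgrid(np.arange(width), np.arange(height))  # pixel grid
    fx, fy = intrinsic[0, 0], intrinsic[1, 1]
    cx, cy = intrinsic[0, 2], intrinsic[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def camera_points_to_world(points_cam, extrinsic):
    """Map camera-frame points to the world (first-camera) frame.

    Assumes x_cam = R @ x_world + t, hence x_world = R^T @ (x_cam - t).
    """
    rotation, translation = extrinsic[:, :3], extrinsic[:, 3]
    return (points_cam - translation) @ rotation

# Toy example: constant depth of 2 and a camera shifted along x.
depth = np.full((2, 3), 2.0)
intrinsic = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]])
extrinsic = np.hstack([np.eye(3), np.array([[1.0], [0.0], [0.0]])])
points_cam = unproject_depth_to_camera_points(depth, intrinsic)
points_world = camera_points_to_world(points_cam, extrinsic)
```

Under these assumptions, returning `points_cam` corresponds to points defined wrt each camera, while returning `points_world` corresponds to points defined wrt the first camera.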

By the way: if you get a chance to include a timing comparison showing the speed-up of VGGT-SLAM with FastVGGT, that would be awesome.

Author

stepeos commented May 7, 2026

Wow, I completely missed that. Thank you so much for taking the time to help me out here, I appreciate it! I will definitely do that, but I can already say the speed-up is significant, since each inference batch can hold more images (because of lower VRAM usage) while also running faster. I will include a speed comparison script for the office loop.
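
A speed comparison of this kind could start from a small timing harness like the sketch below. The helper names are hypothetical and the callables are placeholders, not real VGGT or FastVGGT inference calls; for GPU inference, the callable would also need to synchronize the device before returning so that the measured time covers the whole computation.

```python
import statistics
import time

def time_callable(fn, warmup=1, repeats=5):
    """Return the median wall-clock time in seconds of calling fn()."""
    for _ in range(warmup):
        fn()  # warm-up runs are not measured
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds

# Placeholder workloads standing in for the two backends.
baseline_time = time_callable(lambda: sum(range(20000)))
candidate_time = time_callable(lambda: sum(range(10000)))
```

Reporting the median over several repeats, after a warm-up run, keeps one-off startup costs from dominating the comparison.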
