feat: Low-VRAM mode for GPUs with <=12GB #41
Open
jashshah999 wants to merge 1 commit into
Adds three new flags:
- --low_vram: Auto-detects GPU VRAM and configures submap_size, checkpointing, and sequential heads accordingly
- --checkpoint_inference: Enables gradient checkpointing during inference (recomputes activations to save ~40% VRAM)
- --sequential_heads: Runs depth/camera heads one at a time to reduce peak memory

Also adds torch.cuda.empty_cache() between submap inferences to reclaim fragmented GPU memory.

Tested configurations:
- 8GB GPU: submap_size=4 + checkpointing + sequential heads
- 12GB GPU: submap_size=6 + checkpointing + sequential heads
- 16GB GPU: submap_size=10 + checkpointing + sequential heads

Note: --checkpoint_inference and --sequential_heads require corresponding changes in VGGT_SPARK (see companion PR).
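The flag handling described in this commit message could be sketched roughly as follows. This is an illustrative sketch, not the code from the diff: `MemoryConfig`, `pick_low_vram_config`, `detect_total_vram_gb`, `build_parser`, and `resolve_config` are hypothetical names, the tier cut-offs (<=8, <=12, <=16 GB) and the fallback for larger GPUs are assumptions, and `--submap_size` is assumed to already exist as a CLI option. Only the tested configurations (4/6/10) and the default of 16 come from the PR text.

```python
import argparse
from dataclasses import dataclass

DEFAULT_SUBMAP_SIZE = 16  # default named in the PR description


@dataclass
class MemoryConfig:
    # Hypothetical container for the three memory-related settings.
    submap_size: int
    checkpoint_inference: bool
    sequential_heads: bool


def pick_low_vram_config(total_vram_gb: float) -> MemoryConfig:
    """Map total GPU memory to the tested configurations from the PR.

    The exact cut-offs between tiers are an assumption of this sketch.
    """
    if total_vram_gb <= 8:
        submap_size = 4
    elif total_vram_gb <= 12:
        submap_size = 6
    elif total_vram_gb <= 16:
        submap_size = 10
    else:
        # Assumption: larger GPUs keep the default and skip the memory savers.
        return MemoryConfig(DEFAULT_SUBMAP_SIZE, False, False)
    return MemoryConfig(submap_size, True, True)


def detect_total_vram_gb(device_index: int = 0) -> float:
    """Return the total memory of a CUDA device in GiB."""
    import torch  # imported lazily so the mapping above works without PyTorch

    props = torch.cuda.get_device_properties(device_index)
    return props.total_memory / 1024**3


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Assumed to exist already; the PR only states its default value.
    parser.add_argument("--submap_size", type=int, default=DEFAULT_SUBMAP_SIZE)
    parser.add_argument("--low_vram", action="store_true")
    parser.add_argument("--checkpoint_inference", action="store_true")
    parser.add_argument("--sequential_heads", action="store_true")
    return parser


def resolve_config(args, total_vram_gb=None) -> MemoryConfig:
    """Let --low_vram override the individual settings; otherwise pass them through."""
    if args.low_vram:
        if total_vram_gb is None:
            total_vram_gb = detect_total_vram_gb()
        return pick_low_vram_config(total_vram_gb)
    return MemoryConfig(
        args.submap_size, args.checkpoint_inference, args.sequential_heads
    )
```

Using `<=` cut-offs means a card that reports slightly less than its nominal size (for example about 11.6 GiB on a 12GB GPU) still lands in the intended tier.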
Author
Benchmark Results (NVIDIA L4, 24GB VRAM)
Tested on the office_loop dataset (473 images, 208 keyframes selected):
VRAM usage at full resolution (518x518) by submap_size:
This means:
Performance (submap_size=8, no loop closure):
Note: The images in office_loop are 294x518. Full 518x518 images use slightly more memory as shown above.
Summary
Adds a `--low_vram` flag that auto-configures VGGT-SLAM for GPUs with limited VRAM (8-16GB). Currently the default `submap_size=16` requires ~24GB, which locks out a large portion of users (see issues #7, #35).

Changes:
- `--low_vram` flag: auto-detects GPU VRAM and sets an appropriate submap_size (4 for 8GB, 6 for 12GB, 10 for 16GB)
- `--checkpoint_inference` flag: enables gradient checkpointing during eval (recomputes activations instead of storing them, ~40% VRAM savings)
- `--sequential_heads` flag: runs camera_head and depth_head one at a time instead of holding both sets of intermediates
- `torch.cuda.empty_cache()` between submap inferences to reclaim fragmented memory

Usage:
Note: The `--checkpoint_inference` and `--sequential_heads` flags require companion changes in VGGT_SPARK (the aggregator and model forward pass). I'll open a companion PR there. Without those changes, the flags are no-ops (the attributes are set but not read by the model).

Estimated VRAM usage
Test plan
- `--low_vram` (submap_size=6)
- `torch.cuda.max_memory_allocated()`
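One possible way to check peak memory with `torch.cuda.max_memory_allocated()`, as the test plan mentions, is sketched below. The helper names (`cuda_available`, `measure_peak_vram_mb`, `run_submaps`) and the `infer_fn` callable are hypothetical and not part of this PR. The sketch assumes per-submap inference can be wrapped in a single callable, and it falls back to returning `None` for the peak when PyTorch or CUDA is unavailable.

```python
try:
    import torch
except ImportError:  # lets the helpers be imported on machines without PyTorch
    torch = None


def cuda_available() -> bool:
    return torch is not None and torch.cuda.is_available()


def measure_peak_vram_mb(fn, *args, **kwargs):
    """Run fn and return (result, peak allocated MiB); peak is None without CUDA."""
    if not cuda_available():
        return fn(*args, **kwargs), None
    torch.cuda.empty_cache()              # start from a clean allocator cache
    torch.cuda.reset_peak_memory_stats()  # forget peaks from earlier work
    result = fn(*args, **kwargs)
    torch.cuda.synchronize()              # wait for queued kernels to finish
    peak_mb = torch.cuda.max_memory_allocated() / 1024**2
    return result, peak_mb


def run_submaps(submaps, infer_fn):
    """Process submaps one at a time, releasing cached blocks in between."""
    outputs = []
    for submap in submaps:
        outputs.append(infer_fn(submap))
        if cuda_available():
            # Returns unused cached blocks to the driver; tensors that are
            # still referenced (including anything kept in outputs) stay allocated.
            torch.cuda.empty_cache()
    return outputs
```

Note that `max_memory_allocated()` reports memory held by tensors, not the allocator's reserved pool, so the figure shown by `nvidia-smi` will typically be higher.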