Official implementation of the paper "VLM³: Vision Language Models Are Native 3D Learners".
image-matching depth-estimation camera-pose-estimation large-language-models vlms 3d-foundation-model object-level-3d
Updated Jun 1, 2026 - Jupyter Notebook