Hi authors,
Thanks for the great work on the Hypo3D benchmark!
I am currently trying to evaluate different models using video inputs. However, the RGB-D video data for 3D-VLMs mentioned in the paper does not seem to be available in the dataset. Additionally, the images currently provided appear to be heavily processed and lose fine-grained details compared to real-world observations.
Could you please consider releasing the corresponding high-quality image sequences (video data)? Alternatively, if the raw video data cannot be directly shared, would it be possible to provide the scripts used to acquire or render these video sequences?
This would be incredibly helpful for evaluating model performance under video inputs. Thanks in advance!