Python implementation of structure-based visual localization. Features are detected and matched using SIFT, and the matched points are triangulated to create a sparse 3D reconstruction, which is visualized below. For each new query image, 2D-3D matches are established, and the camera pose is then estimated using PnP inside a RANSAC loop.
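To make the triangulation step concrete, here is a minimal sketch of linear (DLT) two-view triangulation using only NumPy. It is an illustration, not the code in this repository: the function names, the intrinsics `K`, and the camera poses are all hypothetical values chosen for the example.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of a single point from two views.

    P1, P2 are 3x4 projection matrices; x1, x2 are pixel coordinates (u, v).
    """
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 projection matrix, returning pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical intrinsics and a two-camera setup with a sideways baseline.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

# Project a known point into both views, then recover it from the two projections.
X_true = np.array([0.2, -0.1, 4.0])
X_est = triangulate_point(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_est)
```

With noisy matches the linear solution is only approximate, so it is commonly followed by a nonlinear refinement of the reprojection error.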
SIFT and ORB features were initially compared to choose a feature detector and descriptor. Although ORB was significantly faster, it produced too few matches compared to SIFT, which resulted in a very sparse 3D reconstruction.
Below is an example showing a sequence of 2D images. The second image shows the 3D reconstruction and the camera pose relative to the book in each frame.
To run on your own data, replace the images and camera calibration files in the /data folder. Replace im1.png and im2.png with consecutive images from your dataset, and replace K1.txt and K2.txt with the camera calibration matrices of the cameras that captured im1.png and im2.png, respectively.
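Before running the scripts, it can help to sanity-check a calibration file. The sketch below assumes that each K file holds a plain 3x3 intrinsic matrix as whitespace-separated numbers, one row per line; that layout is an assumption and is not confirmed by this README. The helper `load_intrinsics` is hypothetical and not part of the repository.

```python
import numpy as np

def load_intrinsics(path):
    """Load and validate a 3x3 camera intrinsic matrix from a text file.

    Assumed file layout: three lines of three whitespace-separated numbers.
    """
    K = np.loadtxt(path)
    if K.shape != (3, 3):
        raise ValueError(f"expected a 3x3 matrix in {path}, got shape {K.shape}")
    # A standard intrinsic matrix has positive focal lengths and last row [0, 0, 1].
    if K[0, 0] <= 0 or K[1, 1] <= 0 or not np.allclose(K[2], [0.0, 0.0, 1.0]):
        raise ValueError(f"{path} does not look like a camera intrinsic matrix")
    return K
```

For example, `K1 = load_intrinsics("data/K1.txt")` would raise a `ValueError` if the file held a flattened row of nine values instead of a 3x3 grid.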
python3 feat_matcher.py
This script outputs the data/matches_sift.txt file needed for 3D reconstruction and pose estimation.
cd soln_python
python3 estimatepose.py