https://github.com/yunjinli/visual-slam

Introduction

It’s semester break right now, and I finally have time to do something I’ve wanted to do for a long time.

I would like to build a Visual-SLAM system on my own…

However, to be honest, it won’t be built purely on my own, since that would be rather time-consuming. Instead, I decided to build my Visual-SLAM system upon the Visual Odometry (VO) framework that we developed throughout the practical course Vision-based Navigation (IN2106). (You can check out my previous post on Visual-Inertial Tracking using Preintegrated Factors, which was the final project I built with my teammate Nils.) So, let’s go through what we already have in the VO system in the next section.

Visual Odometry

During the first phase of the practical course, we were expected to construct a visual odometry (VO) method that performs VO on image pairs from a stereo camera setup. Starting from a single initial stereo pair, the VO extracts ORB keypoints [1] in both images and matches them between the two images using their BRIEF descriptors [2]. The matches are further distilled using the epipolar constraint within a RANSAC scheme [3]. A map is then initialized from this initial camera pair: the matched features are triangulated between the two cameras and added to the map as 3D landmarks (map points).

After this initialization, we iteratively add new camera pairs and corresponding landmarks using PnP in a RANSAC scheme. Since minor errors occur in all of the previous steps, we optimize the 6-DoF poses of all cameras as well as the 3-DoF positions of all landmarks using Bundle Adjustment. To make this optimization real-time capable, we do not add all frames to the optimization problem, but only keyframes that are created based on a set of criteria. A sliding optimization window is additionally used to further decrease the number of parameters in the optimization problem and to keep its size relatively constant over time: old keyframes and their respective observed 3D landmarks are discarded and are no longer optimized in the Bundle Adjustment problem.

Even though this visual odometry can already accomplish quite nice results on the simple dataset used during the exercises (EuRoC V1_01_easy), its capability is still quite limited. The goal of this project is therefore to extend the current VO baseline into a more sophisticated Visual SLAM system.
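To make the initialization step a bit more concrete, here is a minimal sketch using OpenCV: ORB extraction, Hamming-distance descriptor matching, and distilling the matches with the epipolar constraint inside RANSAC via the essential matrix. This is not the course framework’s code; the image paths and camera intrinsics below are placeholders.

```cpp
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main() {
  // Load a stereo pair (paths are placeholders).
  cv::Mat left = cv::imread("left.png", cv::IMREAD_GRAYSCALE);
  cv::Mat right = cv::imread("right.png", cv::IMREAD_GRAYSCALE);

  // 1. Extract ORB keypoints and descriptors in both images.
  cv::Ptr<cv::ORB> orb = cv::ORB::create(2000);
  std::vector<cv::KeyPoint> kpL, kpR;
  cv::Mat descL, descR;
  orb->detectAndCompute(left, cv::noArray(), kpL, descL);
  orb->detectAndCompute(right, cv::noArray(), kpR, descR);

  // 2. Match the binary descriptors with Hamming distance and cross-check.
  cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
  std::vector<cv::DMatch> matches;
  matcher.match(descL, descR, matches);
  if (matches.size() < 8) return 1;  // too few matches for the 8-point model

  // 3. Distill the matches with the epipolar constraint inside RANSAC by
  //    estimating the essential matrix (K holds placeholder intrinsics).
  cv::Mat K = (cv::Mat_<double>(3, 3) << 458.0, 0.0, 367.0,
                                         0.0, 457.0, 248.0,
                                         0.0, 0.0, 1.0);
  std::vector<cv::Point2f> ptsL, ptsR;
  for (const auto &m : matches) {
    ptsL.push_back(kpL[m.queryIdx].pt);
    ptsR.push_back(kpR[m.trainIdx].pt);
  }
  cv::Mat inlierMask;
  cv::findEssentialMat(ptsL, ptsR, K, cv::RANSAC, 0.999, 1.0, inlierMask);

  std::cout << matches.size() << " matches, "
            << cv::countNonZero(inlierMask) << " epipolar inliers\n";
  return 0;
}
```

The surviving inlier matches are the ones that would be triangulated into 3D landmarks during map initialization.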

Additional Modules

Constructing the Covisibility Graph

In order to make all the magic happen, the covisibility graph needs to be constructed first. The graph $G=(\mathcal{N},\mathcal{E})$ has the keyframes as nodes $\in \mathcal{N}$ and weighted edges $\in \mathcal{E}$. An edge with weight $\theta$ links two keyframes whenever they observe more than 10 common map points ($\theta > 10$). Apart from the covisibility graph, we also maintain an essential graph, which combines the subset of covisibility edges with $\theta > 30$ and the spanning tree (a graph that links every two consecutive keyframes). A small construction sketch follows the figures below.

[Figure: covisibility graph]

[Figure: essential graph]

[Figure: spanning tree]
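Here is a minimal sketch of how such a graph could be built by counting shared map points between keyframe pairs. The Keyframe struct and the ID types are hypothetical stand-ins for the framework’s actual classes:

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical minimal types; the real framework has richer classes.
using KeyframeId = int;
using LandmarkId = int;

struct Keyframe {
  KeyframeId id;
  std::set<LandmarkId> observed;  // map points seen by this keyframe
};

// Covisibility graph: for each keyframe, its neighbours with edge weight theta
// (the number of shared map points). An edge is added only if theta > 10.
using CovisibilityGraph = std::map<KeyframeId, std::map<KeyframeId, int>>;

CovisibilityGraph buildCovisibilityGraph(const std::vector<Keyframe> &kfs,
                                         int minShared = 10) {
  CovisibilityGraph graph;
  for (size_t i = 0; i < kfs.size(); ++i) {
    for (size_t j = i + 1; j < kfs.size(); ++j) {
      // Count the map points observed by both keyframes.
      int theta = 0;
      for (LandmarkId lm : kfs[i].observed)
        if (kfs[j].observed.count(lm)) ++theta;
      if (theta > minShared) {
        graph[kfs[i].id][kfs[j].id] = theta;  // undirected edge,
        graph[kfs[j].id][kfs[i].id] = theta;  // stored in both directions
      }
    }
  }
  return graph;
}
```

This naive sketch rebuilds the graph from scratch; in practice one would update it incrementally whenever a new keyframe is inserted. The essential graph can then be derived by keeping only the edges with $\theta > 30$ and adding the spanning-tree edges.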

Bag-of-Words Place Recognition

In order to perform relocalization and loop closure, we need a criterion for recognizing previously visited keyframes. This is where the bag-of-words model comes in: it transforms an image into a bag-of-words vector built from its feature descriptors, and the similarity score between two such vectors gives a fairly robust measure for place recognition. For more detail, please check out the DBoW2 repository:

https://github.com/dorian3d/DBoW2
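Here is a rough sketch of how DBoW2 can score the similarity between two images. The vocabulary file and image paths are placeholders; a pre-trained ORB vocabulary has to be trained or downloaded separately.

```cpp
#include <DBoW2/DBoW2.h>  // defines OrbVocabulary (TemplatedVocabulary for ORB)
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

// DBoW2 expects one cv::Mat per descriptor rather than one row per descriptor.
static std::vector<cv::Mat> toDescriptorVector(const cv::Mat &descriptors) {
  std::vector<cv::Mat> out;
  out.reserve(descriptors.rows);
  for (int i = 0; i < descriptors.rows; ++i) out.push_back(descriptors.row(i));
  return out;
}

int main() {
  // Load a pre-trained ORB vocabulary (file name is a placeholder).
  OrbVocabulary voc("orb_vocabulary.yml.gz");

  cv::Mat img1 = cv::imread("frame1.png", cv::IMREAD_GRAYSCALE);
  cv::Mat img2 = cv::imread("frame2.png", cv::IMREAD_GRAYSCALE);

  // Extract ORB descriptors in both images.
  cv::Ptr<cv::ORB> orb = cv::ORB::create();
  std::vector<cv::KeyPoint> kps1, kps2;
  cv::Mat desc1, desc2;
  orb->detectAndCompute(img1, cv::noArray(), kps1, desc1);
  orb->detectAndCompute(img2, cv::noArray(), kps2, desc2);

  // Transform each image into a bag-of-words vector...
  DBoW2::BowVector v1, v2;
  voc.transform(toDescriptorVector(desc1), v1);
  voc.transform(toDescriptorVector(desc2), v2);

  // ...and score their similarity; a high score suggests the same place.
  std::cout << "similarity = " << voc.score(v1, v2) << std::endl;
  return 0;
}
```

In the SLAM system, every new keyframe is converted to such a vector once, so that relocalization and loop-closure candidates can be retrieved by comparing scores instead of matching raw descriptors against every stored keyframe.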

Relocalization