Demonstration of ORBSLAM2 with GPU enhancements running on a TX2 mode 0. On the TX1 without cuda the KFs was around 40.
Project Proposal
- ORB-SLAM2: an Open-Source SLAM Systemfor Monocular, Stereo and RGB-D Cameras. Graph SLAM solver in Python. A complete, extensible Graph SLAM solver written in Python that combines ease of use with g2o-like functionality.
- ORB-SLAM2 threads must be stopped to assure map is not being modify while saving. Actual directory must be set to the desired destination. Often a new directory exclusive for the map is created. Options must be set.
- We present ORB-SLAM2 a complete SLAM system for monocular, stereo and RGB-D cameras, including map reuse, loop closing and relocalization capabilities. The system works in real-time on standard CPUs in a wide variety of environments from small hand-held indoors sequences, to drones flying in industrial environments and cars driving around a city.
Project Demo
Enable GPU optimizations in ORB_SLAM2 to achieve real time SLAM on the Jetson TX1 embedded computer.
Orb Slam2 Ros
Switch from OpenCV 2.4 to OpenCV 3.1
Orb Slam2 Ros
OpenCV 3.1 introduces several features helpful to this project: custom memory allocator, CUDA stream and rewrite of some essential algorithms, such as FAST and ORB.These features allow us to fully utilize more CUDA APIs, such as Unified Memory.
Feature extraction reimplemented
There are several execution hotspots in the original ORB_SLAM2
, including but not limited toprocedures like FAST corner detection
, Gaussian filter
and ORB feature extraction
.For example, in their Key Point extraction
procedure, an image is divided into many small tilesand FAST
is invoked on each tile one or two times in order to achieve high accuracy.The algorithm was effective yet inefficient.Hence we implemented a slightly modified version of it in CUDA and parallelized the workof each tile. ORB feature extraction
is also a costly but parallelizable procedure, so it's implemented with CUDA, too.
Overlap CPU and GPU execution
However, there are still some irregular code segments that cannot be parallelized. So our next goal is to maximize CPU/GPU overlap. Ideally if a CPU work is completed before a GPU kernel ends, thenthe CPU work would be considered 'free'; unfortunately, many CPU work have data dependencies on other GPU results,thus CPU/GPU work scheduling must be done wisely.With the help of many profiles (thanks to NVVP), we've figured out a pretty good scheduling schemeto pipeline CPU and GPU work, such that GPU is kept as busy as possible while CPU can overlap manyof it's execution with GPU.
The purple bars on the row 'Default domain' indicates CPU work and the 'Compute' row indicates GPU work.
Results
Following are some charts of the speedups we achieved on an ordinary PC and on a Jetson TX1.The PC's CPU/GPU is Xeon E3 1231 / GTX 760.The statistics were mesured using chosen sequences of the KITTI dataset and live captured images from the camera module on top of TX1.
After enabling GPU optimization, the fps of live camera tracking is increased from 5.98 to 14.42 and frame processing time is reduced from 0.166s to 0.068s !
Orb Slam2 Ros
Contacts
Orb Slam2 Paper
- B03902035 黃兆緯 b03902035@ntu.edu.tw
- B03902074 陳耘志 b03902074@ntu.edu.tw
- B03902082 江懿友 b03902082@ntu.edu.tw