GigaSLAM: Large-Scale Monocular SLAM with Hierarchical Gaussian Splats

Summary

GigaSLAM is a groundbreaking monocular SLAM framework designed for kilometer-scale outdoor environments. It leverages hierarchical Gaussian splats and neural networks to achieve efficient, scalable mapping and high-fidelity rendering. This system addresses the challenges of large-scale tracking and mapping using only RGB input, extending the applicability of Gaussian Splatting SLAM to unbounded outdoor scenes.

Repository Info

Updated on February 28, 2026

Introduction

GigaSLAM introduces the first RGB NeRF / 3D Gaussian Splatting (3DGS)-based SLAM framework specifically engineered for kilometer-scale outdoor environments. Traditional Neural Radiance Fields (NeRF) and 3DGS SLAM methods are typically confined to smaller, bounded indoor settings. GigaSLAM overcomes these limitations by employing a hierarchical sparse voxel map representation, where Gaussians are decoded by neural networks at multiple levels of detail. This innovative design enables efficient, scalable mapping and high-fidelity viewpoint rendering across expansive, unbounded scenes, as demonstrated on challenging datasets like KITTI, KITTI 360, 4 Seasons, and A2D2.
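The multi-level idea behind such a map can be sketched as follows. This is an illustrative toy, not GigaSLAM's actual implementation: the class name, the voxel-hash layout, and the distance-based level-of-detail threshold are all assumptions, and the real system decodes Gaussians from per-level latent codes with neural networks rather than storing raw features.

```python
from collections import defaultdict

class HierarchicalVoxelMap:
    """Toy multi-level sparse voxel hash: coarse levels cover distant
    geometry with large cells, fine levels cover nearby geometry."""

    def __init__(self, base_size=0.5, num_levels=3):
        # level 0 is the finest; each level doubles the voxel edge length
        self.sizes = [base_size * (2 ** level) for level in range(num_levels)]
        self.levels = [defaultdict(list) for _ in range(num_levels)]

    def _key(self, point, level):
        size = self.sizes[level]
        return tuple(int(c // size) for c in point)

    def insert(self, point, feature):
        # store the feature at every level; a real system would instead keep
        # per-level latent codes that a neural network decodes into Gaussians
        for level in range(len(self.levels)):
            self.levels[level][self._key(point, level)].append(feature)

    def query_level(self, cam_pos, point):
        # pick a level of detail by distance to the camera: near points use
        # the finest voxels, far points the coarsest (threshold is assumed)
        dist = sum((a - b) ** 2 for a, b in zip(cam_pos, point)) ** 0.5
        for level, size in enumerate(self.sizes):
            if dist < size * 40:
                return level
        return len(self.sizes) - 1
```

Keeping only occupied voxels in a hash map is what makes the representation sparse: memory grows with observed surface area, not with the bounding volume of the scene.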

Installation

To get GigaSLAM up and running, follow these steps. Note that the project relies on CUDA/C++ components and benefits from a robust hardware environment, ideally with 32+ GiB CPU RAM for processing ultra-long sequences.

1. Hardware and System Environment

The project was developed and tested on systems with NVIDIA RTX 4090/L20 GPUs, Intel Xeon CPUs, and Ubuntu 22.04.3 LTS with CUDA 11.8.

2. Environment Setup

First, create a conda virtual environment and install PyTorch (version 2.2.0 with CUDA 11.8 is recommended):

conda create -n gigaslam python=3.10
conda activate gigaslam
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

Pay special attention to the torch_scatter and xformers dependencies. If torch_scatter fails to install, manually download and install a matching wheel from the PyTorch Geometric website. For xformers, use version 0.0.24, which is built against PyTorch 2.2.0; installing a mismatched xformers wheel can force pip to uninstall your pinned PyTorch version.
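A quick pre-flight check can catch this mismatch before pip does. The sketch below is a hypothetical helper, and its compatibility table only encodes the single pairing recommended in this guide (xformers 0.0.24 against torch 2.2.0):

```python
def check_xformers_compat(torch_version: str, xformers_version: str) -> bool:
    """Return True if the xformers wheel was built against this torch.

    The table only encodes the combination recommended in this guide
    (xformers 0.0.24 <-> torch 2.2.0); extend it for other setups.
    """
    compat = {"0.0.24": "2.2.0"}
    required = compat.get(xformers_version)
    # strip any local build tag like "2.2.0+cu118" before comparing
    base = torch_version.split("+")[0]
    return required is not None and base == required
```

For example, `check_xformers_compat("2.2.0+cu118", "0.0.24")` accepts the CUDA 11.8 build installed above, while a torch 2.3.x install would be flagged before xformers gets a chance to replace it.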

3. Compiling CUDA/C++ Modules

Install the necessary CUDA/C++ components:

  • 3D GS Rendering Module:

    pip install submodules/simple-knn
    pip install submodules/diff-gaussian-rasterization
    
  • Loop-Closure Detection Module: Install libopencv-dev, compile and install DBoW2, then install DPRetrieval.

    sudo apt-get install -y libopencv-dev
    cd DBoW2
    mkdir -p build && cd build
    cmake ..
    make
    sudo make install
    cd ../..
    pip install ./DPRetrieval
    
  • Loop-Closure Correction Module:

    python setup.py install
    

4. Bag of Words Model Setup

Download and extract the pre-trained Bag of Words vocabulary for DBoW2:

wget https://github.com/UZ-SLAMLab/ORB_SLAM3/raw/master/Vocabulary/ORBvoc.txt.tar.gz
tar -xzvf ORBvoc.txt.tar.gz
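The vocabulary lets the loop-closure module describe each frame as a bag of visual words and compare frames cheaply. The sketch below illustrates only the comparison step and is not DBoW2's API: real systems use a vocabulary tree with tf-idf weighting, while this toy compares raw word-count histograms.

```python
from collections import Counter
from math import sqrt

def bow_similarity(words_a, words_b):
    """Cosine similarity between two bag-of-visual-words histograms.

    words_a / words_b are lists of visual-word IDs (one per keypoint).
    DBoW2 additionally applies tf-idf weights from the vocabulary tree;
    this sketch compares raw counts only.
    """
    ha, hb = Counter(words_a), Counter(words_b)
    dot = sum(ha[w] * hb[w] for w in ha)
    norm_a = sqrt(sum(v * v for v in ha.values()))
    norm_b = sqrt(sum(v * v for v in hb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

A loop-closure candidate is a past frame whose similarity to the current frame exceeds a threshold; the candidate is then verified geometrically before any correction is applied.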

Examples

To run GigaSLAM, first modify the .yaml configuration files located in the ./configs directory: set the Dataset path to your local dataset (images in PNG or JPG format) and specify the camera intrinsics (fx, fy, cx, cy).

For instance, an example configuration might look like this:

Dataset:
  color_path: "/media/deng/Data/4SeasonsDataset/BusinessCampus_recording_2020-10-08_09-30-57/undistorted_images/cam0"
  Calibration:
    fx: 501.4757919305817
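A small sanity check on the parsed configuration can catch missing intrinsics before launch. This is a hypothetical helper, not part of GigaSLAM; it assumes the config has already been parsed into a dict (e.g. with PyYAML) and uses the key names from the snippet above:

```python
def validate_calibration(cfg: dict):
    """Check that the Dataset block carries an image path and the full
    pinhole intrinsics (fx, fy, cx, cy) before launching SLAM."""
    dataset = cfg.get("Dataset", {})
    missing = []
    if not dataset.get("color_path"):
        missing.append("Dataset.color_path")
    calib = dataset.get("Calibration", {})
    for key in ("fx", "fy", "cx", "cy"):
        if not isinstance(calib.get(key), (int, float)):
            missing.append(f"Dataset.Calibration.{key}")
    if missing:
        raise ValueError("config missing: " + ", ".join(missing))
```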

Once configured, execute the SLAM process using the following command:

python slam.py --config ./configs/kitti_06.yaml

Pretrained weights for DISK, LightGlue, and UniDepth will be automatically downloaded on the first execution. If you encounter issues loading UniDepth models from HuggingFace due to network restrictions, you can try setting export HF_ENDPOINT=https://hf-mirror.com or manually download the model weights and specify their local path in the configuration.

If the viz option under SLAM is set to True in your .yaml file, real-time visualization outputs showing the tracking and mapping process are written to the results/your_exps/ directory during execution.

Why Use GigaSLAM?

GigaSLAM offers significant advantages for researchers and developers working in computer vision and robotics:

  • Unprecedented Scale: It is the first RGB NeRF/3DGS-based SLAM system capable of operating in kilometer-scale outdoor environments, a major breakthrough for this class of methods.
  • High-Fidelity Mapping: Utilizes hierarchical Gaussian splats and neural networks to achieve highly detailed and visually faithful 3D reconstructions.
  • Robust Tracking: Employs a sophisticated front-end tracking mechanism combining a metric depth model, epipolar geometry, PnP algorithms, and a Bag-of-Words-based loop closure for accurate and consistent pose estimation over long trajectories.
  • Efficiency and Scalability: The hierarchical sparse voxel map representation ensures efficient data handling and scalability, making it suitable for large and unbounded scenes.
  • Monocular Input: Achieves impressive results using only monocular RGB input, reducing sensor complexity and cost.
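The effect of loop closure on long trajectories can be illustrated with a toy drift-distribution scheme. This is not GigaSLAM's correction module: a real backend optimizes a full SE(3) pose graph, whereas the hypothetical helper below only spreads the translational error measured at loop closure linearly along the path.

```python
def distribute_drift(positions, drift):
    """Spread the translation error measured at loop closure along the
    trajectory: the last pose absorbs the full drift, the first none.

    positions -- list of (x, y, z) camera positions
    drift     -- (dx, dy, dz) error between the loop-closing pose and
                 the revisited pose
    """
    n = len(positions)
    corrected = []
    for i, pos in enumerate(positions):
        weight = i / (n - 1) if n > 1 else 0.0  # grows from 0 to 1
        corrected.append(tuple(c - weight * d for c, d in zip(pos, drift)))
    return corrected
```

Linear distribution assumes drift accumulates uniformly, which real trajectories violate; that is why pose-graph optimization with per-edge covariances is used in practice.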
