GigaSLAM: Large-Scale Monocular SLAM with Hierarchical Gaussian Splats

Summary

GigaSLAM is a groundbreaking monocular SLAM framework designed for kilometer-scale outdoor environments. It leverages hierarchical Gaussian splats and neural networks to achieve efficient, scalable mapping and high-fidelity rendering. This system addresses the challenges of large-scale tracking and mapping using only RGB input, extending the applicability of Gaussian Splatting SLAM to unbounded outdoor scenes.

Repository Info

Updated on February 28, 2026

Introduction

GigaSLAM introduces the first RGB NeRF / 3D Gaussian Splatting (3DGS)-based SLAM framework specifically engineered for kilometer-scale outdoor environments. Traditional Neural Radiance Fields (NeRF) and 3DGS SLAM methods are typically confined to smaller, bounded indoor settings. GigaSLAM overcomes these limitations by employing a hierarchical sparse voxel map representation, where Gaussians are decoded by neural networks at multiple levels of detail. This innovative design enables efficient, scalable mapping and high-fidelity viewpoint rendering across expansive, unbounded scenes, as demonstrated on challenging datasets like KITTI, KITTI 360, 4 Seasons, and A2D2.
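The multi-level idea behind such a map can be sketched as follows. This is an illustrative toy, not GigaSLAM's actual implementation: the class name, the voxel-hash layout, and the distance-based level-of-detail threshold are all assumptions, and the real system decodes Gaussians from per-level latent codes with neural networks rather than storing raw features.

```python
from collections import defaultdict

class HierarchicalVoxelMap:
    """Toy multi-level sparse voxel hash: coarse levels cover distant
    geometry with large cells, fine levels cover nearby geometry."""

    def __init__(self, base_size=0.5, num_levels=3):
        # level 0 is the finest; each level doubles the voxel edge length
        self.sizes = [base_size * (2 ** level) for level in range(num_levels)]
        self.levels = [defaultdict(list) for _ in range(num_levels)]

    def _key(self, point, level):
        size = self.sizes[level]
        return tuple(int(c // size) for c in point)

    def insert(self, point, feature):
        # store the feature at every level; a real system would instead keep
        # per-level latent codes that a neural network decodes into Gaussians
        for level in range(len(self.levels)):
            self.levels[level][self._key(point, level)].append(feature)

    def query_level(self, cam_pos, point):
        # pick a level of detail by distance to the camera: near points use
        # the finest voxels, far points the coarsest (threshold is assumed)
        dist = sum((a - b) ** 2 for a, b in zip(cam_pos, point)) ** 0.5
        for level, size in enumerate(self.sizes):
            if dist < size * 40:
                return level
        return len(self.sizes) - 1
```

Keeping only occupied voxels in a hash map is what makes the representation sparse: memory grows with observed surface area, not with the bounding volume of the scene.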

Installation

To get GigaSLAM up and running, follow these steps. Note that the project relies on CUDA/C++ components and benefits from a robust hardware environment, ideally with 32+ GiB CPU RAM for processing ultra-long sequences.

1. Hardware and System Environment

The project was developed and tested on systems with NVIDIA RTX 4090/L20 GPUs, Intel Xeon CPUs, and Ubuntu 22.04.3 LTS with CUDA 11.8.

2. Environment Setup

First, create a conda virtual environment and install PyTorch (version 2.2.0 with CUDA 11.8 is recommended):

conda create -n gigaslam python=3.10
conda activate gigaslam
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

Pay special attention to the torch_scatter and xformers dependencies. If torch_scatter fails to install, manually download and install a matching wheel from the PyTorch Geometric website. For xformers, use version 0.0.24, which is built against PyTorch 2.2.0; installing a mismatched xformers wheel can force pip to uninstall your pinned PyTorch version.
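A quick pre-flight check can catch this mismatch before pip does. The sketch below is a hypothetical helper, and its compatibility table only encodes the single pairing recommended in this guide (xformers 0.0.24 against torch 2.2.0):

```python
def check_xformers_compat(torch_version: str, xformers_version: str) -> bool:
    """Return True if the xformers wheel was built against this torch.

    The table only encodes the combination recommended in this guide
    (xformers 0.0.24 <-> torch 2.2.0); extend it for other setups.
    """
    compat = {"0.0.24": "2.2.0"}
    required = compat.get(xformers_version)
    # strip any local build tag like "2.2.0+cu118" before comparing
    base = torch_version.split("+")[0]
    return required is not None and base == required
```

For example, `check_xformers_compat("2.2.0+cu118", "0.0.24")` accepts the CUDA 11.8 build installed above, while a torch 2.3.x install would be flagged before xformers gets a chance to replace it.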

3. Compiling CUDA/C++ Modules

Install the necessary CUDA/C++ components:

  • 3D GS Rendering Module:

    pip install submodules/simple-knn
    pip install submodules/diff-gaussian-rasterization
    
  • Loop-Closure Detection Module: Install libopencv-dev, compile and install DBoW2, then install DPRetrieval.

    sudo apt-get install -y libopencv-dev
    cd DBoW2
    mkdir -p build && cd build
    cmake ..
    make
    sudo make install
    cd ../..
    pip install ./DPRetrieval
    
  • Loop-Closure Correction Module:

    python setup.py install
    

4. Bag of Words Model Setup

Download and extract the pre-trained Bag of Words vocabulary for DBoW2:

wget https://github.com/UZ-SLAMLab/ORB_SLAM3/raw/master/Vocabulary/ORBvoc.txt.tar.gz
tar -xzvf ORBvoc.txt.tar.gz
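The vocabulary lets the loop-closure module describe each frame as a bag of visual words and compare frames cheaply. The sketch below illustrates only the comparison step and is not DBoW2's API: real systems use a vocabulary tree with tf-idf weighting, while this toy compares raw word-count histograms.

```python
from collections import Counter
from math import sqrt

def bow_similarity(words_a, words_b):
    """Cosine similarity between two bag-of-visual-words histograms.

    words_a / words_b are lists of visual-word IDs (one per keypoint).
    DBoW2 additionally applies tf-idf weights from the vocabulary tree;
    this sketch compares raw counts only.
    """
    ha, hb = Counter(words_a), Counter(words_b)
    dot = sum(ha[w] * hb[w] for w in ha)
    norm_a = sqrt(sum(v * v for v in ha.values()))
    norm_b = sqrt(sum(v * v for v in hb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

A loop-closure candidate is a past frame whose similarity to the current frame exceeds a threshold; the candidate is then verified geometrically before any correction is applied.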

Examples

To run GigaSLAM, first modify the .yaml configuration files located in the ./configs directory: set the Dataset path to your local dataset (images in PNG or JPG format) and specify the camera intrinsics (fx, fy, cx, cy).

For instance, an example configuration might look like this:

Dataset:
  color_path: "/media/deng/Data/4SeasonsDataset/BusinessCampus_recording_2020-10-08_09-30-57/undistorted_images/cam0"
  Calibration:
    fx: 501.4757919305817
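A small sanity check on the parsed configuration can catch missing intrinsics before launch. This is a hypothetical helper, not part of GigaSLAM; it assumes the config has already been parsed into a dict (e.g. with PyYAML) and uses the key names from the snippet above:

```python
def validate_calibration(cfg: dict):
    """Check that the Dataset block carries an image path and the full
    pinhole intrinsics (fx, fy, cx, cy) before launching SLAM."""
    dataset = cfg.get("Dataset", {})
    missing = []
    if not dataset.get("color_path"):
        missing.append("Dataset.color_path")
    calib = dataset.get("Calibration", {})
    for key in ("fx", "fy", "cx", "cy"):
        if not isinstance(calib.get(key), (int, float)):
            missing.append(f"Dataset.Calibration.{key}")
    if missing:
        raise ValueError("config missing: " + ", ".join(missing))
```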

Once configured, execute the SLAM process using the following command:

python slam.py --config ./configs/kitti_06.yaml

Pretrained weights for DISK, LightGlue, and UniDepth will be automatically downloaded on the first execution. If you encounter issues loading UniDepth models from HuggingFace due to network restrictions, you can try setting export HF_ENDPOINT=https://hf-mirror.com or manually download the model weights and specify their local path in the configuration.

If the viz option under SLAM is set to True in your .yaml file, real-time visualization outputs showing the tracking and mapping process are written to the results/your_exps/ directory during execution.

Why Use GigaSLAM?

GigaSLAM offers significant advantages for researchers and developers working in computer vision and robotics:

  • Unprecedented Scale: It is the first RGB NeRF/3DGS-based SLAM system capable of operating in kilometer-scale outdoor environments, a major breakthrough for this class of methods.
  • High-Fidelity Mapping: Utilizes hierarchical Gaussian splats and neural networks to achieve highly detailed and visually faithful 3D reconstructions.
  • Robust Tracking: Employs a sophisticated front-end tracking mechanism combining a metric depth model, epipolar geometry, PnP algorithms, and a Bag-of-Words-based loop closure for accurate and consistent pose estimation over long trajectories.
  • Efficiency and Scalability: The hierarchical sparse voxel map representation ensures efficient data handling and scalability, making it suitable for large and unbounded scenes.
  • Monocular Input: Achieves impressive results using only monocular RGB input, reducing sensor complexity and cost.
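The effect of loop closure on long trajectories can be illustrated with a toy drift-distribution scheme. This is not GigaSLAM's correction module: a real backend optimizes a full SE(3) pose graph, whereas the hypothetical helper below only spreads the translational error measured at loop closure linearly along the path.

```python
def distribute_drift(positions, drift):
    """Spread the translation error measured at loop closure along the
    trajectory: the last pose absorbs the full drift, the first none.

    positions -- list of (x, y, z) camera positions
    drift     -- (dx, dy, dz) error between the loop-closing pose and
                 the revisited pose
    """
    n = len(positions)
    corrected = []
    for i, pos in enumerate(positions):
        weight = i / (n - 1) if n > 1 else 0.0  # grows from 0 to 1
        corrected.append(tuple(c - weight * d for c, d in zip(pos, drift)))
    return corrected
```

Linear distribution assumes drift accumulates uniformly, which real trajectories violate; that is why pose-graph optimization with per-edge covariances is used in practice.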
