PPS-Ctrl: Controllable Sim-to-Real Translation for Colonoscopy Depth Estimation

Summary
PPS-Ctrl is an image translation framework for controllable sim-to-real translation in colonoscopy depth estimation. It builds on Stable Diffusion and ControlNet and conditions generation on a Per-Pixel Shading (PPS) map. The PPS map serves as a physics-informed structural prior that improves texture realism and structure preservation in the translated medical images.
Introduction
PPS-Ctrl is an image translation framework for controllable sim-to-real translation, tailored to colonoscopy depth estimation. It combines Stable Diffusion with ControlNet and guides generation with a Per-Pixel Shading (PPS) map. Whereas conventional sim-to-real methods condition on depth maps, PPS-Ctrl uses a physics-informed representation of how light interacts with the tissue surface, which yields a more faithful and geometrically consistent structural prior. The result is better texture realism and structure preservation, both of which are crucial for accurate medical image translation.
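To make the idea of a per-pixel shading map concrete, here is one common near-field shading model, written as a sketch. It assumes a single point light co-located with the camera center (a reasonable approximation for a colonoscope, whose light sources sit next to the lens), a Lambertian surface with constant albedo, a depth map D, pixel coordinates (u, v), and camera intrinsics K. These symbols and the exact normalization are illustrative assumptions, not necessarily the precise definition used by PPS-Ctrl.

```latex
\mathbf{X}(p) = D(p)\,K^{-1}\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \qquad
\mathbf{L}(p) = \mathbf{0} - \mathbf{X}(p), \qquad
\hat{\mathbf{N}}(p) = \frac{\partial_v \mathbf{X} \times \partial_u \mathbf{X}}{\lVert \partial_v \mathbf{X} \times \partial_u \mathbf{X} \rVert}

\mathrm{PPS}(p) = \frac{\max\!\left(0,\ \hat{\mathbf{N}}(p) \cdot \hat{\mathbf{L}}(p)\right)}{\lVert \mathbf{L}(p) \rVert^{2}}, \qquad
\hat{\mathbf{L}}(p) = \frac{\mathbf{L}(p)}{\lVert \mathbf{L}(p) \rVert}
```

The cosine term encodes surface orientation relative to the light, and the inverse-square term encodes the distance falloff, so the map carries both geometric and photometric cues that a raw depth map does not express directly.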
Installation
To get started with PPS-Ctrl, follow these steps to set up your environment and prepare your data.
1. Environment Setup
It is recommended to use Python 3.9 with PyTorch ≥ 2.0 and the Hugging Face diffusers library.
conda create -n ppsctrl python=3.9
conda activate ppsctrl
pip install -r requirements.txt
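After installing the requirements, a quick sanity check can confirm that the recommended versions are present. This is a minimal standard-library sketch, not a script shipped with PPS-Ctrl; the function names and the exact checks are assumptions.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Keep the leading numeric release part, e.g. "2.1.0+cu118" -> (2, 1, 0)."""
    release = text.split("+")[0]
    parts = []
    for piece in release.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break  # stop at suffixes such as "rc1" or "post1"
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(installed, required):
    """Compare two version strings after padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(required)
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)) >= b + (0,) * (n - len(b))


def check_environment():
    """Report whether the recommended Python, PyTorch and diffusers setup is present."""
    report = {"python": sys.version_info[:2] == (3, 9)}
    try:
        report["torch>=2.0"] = at_least(metadata.version("torch"), "2.0")
    except metadata.PackageNotFoundError:
        report["torch>=2.0"] = False
    try:
        metadata.version("diffusers")
        report["diffusers"] = True
    except metadata.PackageNotFoundError:
        report["diffusers"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or mismatched'}")
```

Saved under a hypothetical name such as check_env.py, it can be run inside the activated ppsctrl environment with `python check_env.py`.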
2. Prepare Data
Download the necessary datasets:
After downloading, precompute PPS maps using the provided utility script:
pip install opencv-python # Required for compute_pps.py
python utils/compute_pps.py --depth_dir path/to/depth --output_dir path/to/pps
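For readers who want to see what a depth-to-PPS conversion can look like, the following NumPy sketch computes a shading map from a single depth map. It is not the repository's compute_pps.py: it assumes a pinhole camera with intrinsics fx, fy, cx, cy, a point light at the camera center, and a Lambertian surface with constant albedo, and all function names are hypothetical.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to (H, W, 3) camera-space points (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


def normals_from_points(points):
    """Unit surface normals from finite differences, oriented toward the camera."""
    d_u = np.gradient(points, axis=1)  # change of the 3D point along image columns
    d_v = np.gradient(points, axis=0)  # change of the 3D point along image rows
    n = np.cross(d_v, d_u)  # this order points normals toward the camera (-z)
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.maximum(norm, 1e-12)


def per_pixel_shading(depth, fx, fy, cx, cy, eps=1e-6):
    """Lambertian shading with inverse-square falloff from a light at the camera center."""
    points = backproject(np.asarray(depth, dtype=np.float64), fx, fy, cx, cy)
    normals = normals_from_points(points)
    light = -points  # vector from each surface point to the light at the origin
    dist2 = np.sum(light ** 2, axis=-1)
    light_dir = light / np.sqrt(np.maximum(dist2, eps))[..., None]
    cos_term = np.clip(np.sum(normals * light_dir, axis=-1), 0.0, None)
    return cos_term / np.maximum(dist2, eps)
```

For example, `per_pixel_shading(np.load("frame_0001.npy"), fx, fy, cx, cy)` would return an (H, W) float array; the file name and the .npy storage format are assumptions. Finite-difference normals are noisy at depth discontinuities, so smoothing the depth map first may help.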
Examples
The PPS-Ctrl workflow consists of two training stages followed by inference.
1. Train
Stage 1: Fine-tune Stable Diffusion
bash scripts/train_sd.sh
Stage 2: Train ControlNet with PPS conditioning
bash scripts/train_controlnet.sh
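Stage 2 needs the PPS map in a form ControlNet can consume. The sketch below shows one plausible preprocessing step, assuming the conditioning input is a 3-channel array with values in [0, 1] at the training resolution (512 here). The function name, the min-max normalization, and the nearest-neighbour resize are hypothetical choices, not taken from train_controlnet.sh.

```python
import numpy as np


def pps_to_condition(pps, size=512, eps=1e-8):
    """Hypothetical preprocessing: (H, W) PPS map -> (3, size, size) float32 array in [0, 1]."""
    pps = np.asarray(pps, dtype=np.float32)
    lo, hi = float(pps.min()), float(pps.max())
    norm = (pps - lo) / max(hi - lo, eps)  # per-image min-max normalization
    # Nearest-neighbour resize via integer index maps (keeps the sketch NumPy-only)
    h, w = norm.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = norm[rows[:, None], cols[None, :]]
    # Replicate the single shading channel to the 3 channels an RGB-style input expects
    return np.repeat(resized[None, :, :], 3, axis=0)
```

Per-image min-max normalization discards the absolute brightness falloff. Dividing by a fixed, dataset-wide constant instead would preserve it, which may matter if absolute shading magnitude carries depth cues.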
2. Inference
Once both stages are trained, run inference to generate images:
python scripts/infer.py --depth path/to/test/depth --output path/to/save
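To run inference on several test sequences, a thin wrapper can call scripts/infer.py once per input. This sketch uses only the --depth and --output flags shown above and assumes --depth accepts a directory; the function names and the per-sequence output layout are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_infer_command(depth_path, output_path, script="scripts/infer.py"):
    """Build the argv list for one infer.py call using only --depth and --output."""
    return [sys.executable, script, "--depth", str(depth_path), "--output", str(output_path)]


def run_inference_batch(depth_dirs, output_root, dry_run=False):
    """Call infer.py once per depth directory, writing to output_root/<directory name>."""
    commands = []
    for depth_dir in depth_dirs:
        depth_dir = Path(depth_dir)
        out_dir = Path(output_root) / depth_dir.name
        cmd = build_infer_command(depth_dir, out_dir)
        commands.append(cmd)
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # stop on the first failing sequence
    return commands
```

For example, `run_inference_batch(["data/test_depth/seq01", "data/test_depth/seq02"], "results")` would write to results/seq01 and results/seq02; passing `dry_run=True` returns the commands without executing them.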
Why Use PPS-Ctrl?
PPS-Ctrl is aimed at researchers and developers working on medical image translation, particularly in colonoscopy. Using a Per-Pixel Shading map as the structural prior improves texture realism while preserving geometric consistency, both of which are critical for accurate depth estimation in complex anatomical environments. This physics-informed approach offers a solid foundation for more reliable sim-to-real translation models, with potential applications in surgical simulation, training, and diagnostic tools.