CoTracker: A Powerful Model for Tracking Any Point on a Video
Summary
CoTracker is a state-of-the-art model developed by Meta AI Research and the University of Oxford for tracking any point (pixel) across a video sequence. It is transformer-based and offers fast, accurate, quasi-dense point tracking, which makes it useful to computer vision researchers and developers who need to analyze motion in videos.
Introduction
CoTracker, developed by Meta AI Research and the University of Oxford, is a cutting-edge model designed to track any point or pixel within a video. Leveraging a fast transformer-based architecture, CoTracker brings the benefits of optical flow to point tracking, offering robust and accurate performance.
Key features of CoTracker include:
- Tracking any individual pixel in a video.
- Tracking a quasi-dense set of pixels simultaneously.
- Flexibility to select points manually or sample them on a grid in any video frame.
The project has seen continuous development, with significant updates including CoTracker3, which offers simpler and better point tracking through pseudo-labelling real videos, and the release of the Kubric Dataset for enhanced training. CoTracker has also been instrumental in other projects, such as VGGSfM, a fully differentiable SfM framework.
Installation
CoTracker can be easily integrated into your projects. A GPU is strongly recommended for optimal performance.
Via PyTorch Hub
The easiest way to get started is by loading a pretrained model from torch.hub.
Offline mode:
pip install imageio[ffmpeg]
import torch
import imageio.v3 as iio
url = 'https://github.com/facebookresearch/co-tracker/raw/refs/heads/main/assets/apple.mp4'
frames = iio.imread(url, plugin="FFMPEG")
device = 'cuda'
grid_size = 10
video = torch.tensor(frames).permute(0, 3, 1, 2)[None].float().to(device)
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker3_offline").to(device)
pred_tracks, pred_visibility = cotracker(video, grid_size=grid_size)
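With grid_size = 10, points are sampled on a regular 10 x 10 grid, so 100 points are tracked. The sketch below only illustrates how such a grid of query points could be laid out on one frame. The helper name grid_queries, the margin parameter, the frame size, and the (t, x, y) ordering are assumptions made for illustration, not CoTracker's own sampling code.

```python
def grid_queries(grid_size, width, height, frame=0, margin=0.0):
    """Return grid_size * grid_size query points as (t, x, y) tuples.

    Points are evenly spaced between `margin` and the last pixel minus
    `margin`. Both the helper and `margin` are hypothetical.
    """
    if grid_size < 1:
        raise ValueError("grid_size must be at least 1")

    def linspace(lo, hi, n):
        # n evenly spaced values from lo to hi inclusive; midpoint when n == 1.
        if n == 1:
            return [(lo + hi) / 2.0]
        step = (hi - lo) / (n - 1)
        return [lo + i * step for i in range(n)]

    xs = linspace(margin, width - 1 - margin, grid_size)
    ys = linspace(margin, height - 1 - margin, grid_size)
    # Row-major order: iterate over rows (y), then columns (x).
    return [(frame, x, y) for y in ys for x in xs]


# Example values only: a 640 x 480 frame, grid placed on frame 0.
queries = grid_queries(grid_size=10, width=640, height=480)
print(len(queries))  # 100
print(queries[0])    # (0, 0.0, 0.0)
```

The point count grows quadratically with grid_size, so a larger grid increases memory use and run time accordingly.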
Online mode:
import torch
import imageio.v3 as iio
url = 'https://github.com/facebookresearch/co-tracker/raw/refs/heads/main/assets/apple.mp4'
frames = iio.imread(url, plugin="FFMPEG")
device = 'cuda'
grid_size = 10
video = torch.tensor(frames).permute(0, 3, 1, 2)[None].float().to(device)
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker3_online").to(device)
cotracker(video_chunk=video, is_first_step=True, grid_size=grid_size)
# Process the video in overlapping chunks of 2 * step frames
for ind in range(0, video.shape[1] - cotracker.step, cotracker.step):
    pred_tracks, pred_visibility = cotracker(
        video_chunk=video[:, ind : ind + cotracker.step * 2]
    )
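The online loop feeds the model chunks of 2 * step frames and advances by step frames per iteration. A minimal sketch of that index arithmetic in plain Python, with no model involved, is shown here. chunk_windows is a hypothetical helper, and the frame counts and the step of 8 are example values, not values read from the model.

```python
def chunk_windows(num_frames, step):
    """List the (start, end) frame ranges visited by the online loop.

    Mirrors `range(0, T - step, step)` with slices `[ind : ind + 2 * step]`.
    `end` is clamped to num_frames, as Python slicing would do.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    return [
        (ind, min(ind + 2 * step, num_frames))
        for ind in range(0, num_frames - step, step)
    ]


print(chunk_windows(num_frames=24, step=8))  # [(0, 16), (8, 24)]
print(chunk_windows(num_frames=20, step=8))  # [(0, 16), (8, 20)]
```

Consecutive windows share step frames, and the final window is shorter when the frame count is not a multiple of step.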
Development Version
To install CoTracker from the GitHub repository, which is ideal for running local demos or for evaluation and training:
git clone https://github.com/facebookresearch/co-tracker
cd co-tracker
pip install -e .
pip install matplotlib flow_vis tqdm tensorboard
Examples
CoTracker can be used from a Python script, through interactive demos, or from the command line.
Visualize Predicted Tracks
After installation, you can visualize the tracks with a simple script:
from cotracker.utils.visualizer import Visualizer
vis = Visualizer(save_dir="./saved_videos", pad_value=120, linewidth=3)
vis.visualize(video, pred_tracks, pred_visibility)
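Besides rendering a video, the predicted tracks and visibility flags can be inspected numerically. The sketch below assumes they have already been converted to nested Python lists for a single video, with tracks[t][n] holding an (x, y) pair and visibility[t][n] a boolean. That layout, the helper name summarize_tracks, and the toy data are all assumptions for illustration.

```python
import math


def summarize_tracks(tracks, visibility):
    """Per-point visible-frame fraction and path length.

    tracks[t][n] is an (x, y) pair and visibility[t][n] a bool for
    frame t and point n (an assumed layout for a single video).
    Path length only counts steps where the point is visible in both
    consecutive frames.
    """
    num_frames = len(tracks)
    num_points = len(tracks[0]) if num_frames else 0
    summary = []
    for n in range(num_points):
        visible = sum(1 for t in range(num_frames) if visibility[t][n])
        length = 0.0
        for t in range(1, num_frames):
            if visibility[t - 1][n] and visibility[t][n]:
                (x0, y0), (x1, y1) = tracks[t - 1][n], tracks[t][n]
                length += math.hypot(x1 - x0, y1 - y0)
        summary.append(
            {"visible_fraction": visible / num_frames, "path_length": length}
        )
    return summary


# Toy data: 3 frames, 2 points. Point 1 is occluded in the last frame.
tracks = [
    [(0.0, 0.0), (10.0, 10.0)],
    [(3.0, 4.0), (10.0, 12.0)],
    [(6.0, 8.0), (50.0, 50.0)],
]
visibility = [
    [True, True],
    [True, True],
    [True, False],
]
for n, stats in enumerate(summarize_tracks(tracks, visibility)):
    print(n, stats)
```

Skipping steps that touch an occluded frame keeps unreliable positions, such as the jump of point 1 in the last frame, out of the path length.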
Interactive Demos
- Hugging Face Space: Explore the interactive demo on the facebook/cotracker Hugging Face Space.
- Google Colab: Run the notebook directly in Google Colab.
- Local Gradio Demo: Run the Gradio demo locally by installing the requirements (pip install -r gradio_demo/requirements.txt) and executing python -m gradio_demo.app.
Command Line Demos
- Offline Demo: python demo.py --grid_size 10 (results saved to ./saved_videos/demo.mp4).
- Online Demo: python online_demo.py.
Why Use CoTracker?
CoTracker is a strong choice for point tracking for several reasons:
- State-of-the-Art Performance: CoTracker3 reports strong results on benchmarks such as TAP-Vid, with higher accuracy than earlier CoTracker versions and other models.
- Versatility: It can track any pixel, a quasi-dense set of pixels, and supports both manual and grid-based point selection.
- Efficiency: The transformer-based architecture is designed for fast processing, and the online mode processes the video in chunks, which keeps memory use manageable for longer videos.
- Continuous Improvement: The project is actively maintained with regular updates, including new models, datasets, and features.
- Developer-Friendly: With PyTorch Hub integration, clear installation instructions, and various demo options, CoTracker is accessible for both quick experimentation and in-depth development.
Links
- GitHub Repository: https://github.com/facebookresearch/co-tracker
- Project Page: https://cotracker3.github.io/
- Paper #1 (ECCV 2024): https://arxiv.org/abs/2307.07635
- Paper #2 (arXiv 2024): https://arxiv.org/abs/2410.11831
- X Thread: https://twitter.com/n_karaev/status/1742638906355470772
- Colab Demo: https://colab.research.google.com/github/facebookresearch/co-tracker/blob/main/notebooks/demo.ipynb
- Hugging Face Space: https://huggingface.co/spaces/facebook/cotracker