Open R1: An Open-Source Reproduction of DeepSeek-R1 for Advanced LLM Training

Summary

Open R1 is a Hugging Face project dedicated to creating a fully open reproduction of DeepSeek-R1, a powerful reasoning language model. This initiative provides comprehensive tools and recipes for training, evaluating, and generating data for large language models. It fosters community collaboration in AI research, enabling developers to build upon and understand the complex R1 pipeline.

Introduction

Open R1 is an ambitious project by Hugging Face that aims to deliver a fully open-source reproduction of DeepSeek-R1, a state-of-the-art reasoning language model. The core goal is to build the missing pieces of the R1 pipeline so that anyone can reproduce it and innovate on top of it. The project is deliberately simple, consisting primarily of src/open_r1, which holds the training and synthetic data generation scripts, and a Makefile with streamlined targets for each step of the R1 pipeline.
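
At a glance, the layout looks roughly like this (an illustrative sketch based only on the scripts referenced in the examples below; the actual repository contains additional modules):

src/open_r1/
    sft.py      # supervised fine-tuning entry point
    grpo.py     # GRPO training entry point
    ...         # further training, evaluation, and data generation utilities
Makefile        # one-command targets for each step of the R1 pipeline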

The development follows a clear plan of attack, guided by the DeepSeek-R1 tech report. This includes replicating R1-Distill models through high-quality corpus distillation, establishing a pure Reinforcement Learning (RL) pipeline for R1-Zero, and demonstrating multi-stage training from base models to RL-tuned variants. Recent milestones include the release of the Mixture-of-Thoughts dataset, CodeForces-CoTs, and OpenR1-Math-220k, marking significant progress in replicating DeepSeek-R1's capabilities.

Installation

To get started with Open R1, ensure you have CUDA 12.4 and Python 3.11 available. The project uses uv for environment management, vLLM for fast generation, and FlashAttention for efficient attention kernels.

  1. Create a virtual environment and upgrade pip:

    uv venv openr1 --python 3.11 && source openr1/bin/activate && uv pip install --upgrade pip
    
  2. Install vLLM and FlashAttention:

    uv pip install vllm==0.8.5.post1
    uv pip install setuptools && uv pip install flash-attn --no-build-isolation
    

    Note: this also pins PyTorch to v2.6.0; keep that version, since the vLLM binaries are compiled against it.

  3. Install the remaining project dependencies (here with the dev extras):

    GIT_LFS_SKIP_SMUDGE=1 uv pip install -e ".[dev]"
    
  4. Log in to Hugging Face and Weights & Biases:

    huggingface-cli login
    wandb login
    
  5. Verify Git LFS installation:

    git-lfs --version
    

    If not installed, run: sudo apt-get install git-lfs.
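
Once everything is installed, a quick sanity check (a minimal one-liner, assuming a standard CUDA setup) confirms that the pinned PyTorch build is present and can see your GPUs:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# should report a 2.6.0 build and True on a GPU machine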

Examples

Open R1 provides robust tools for training, evaluating, and generating data for LLMs.

Training Models

The project supports Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO), run with either DDP or DeepSpeed ZeRO. The training commands below are configured for a node of 8× H100 GPUs, but can be adapted to other hardware.

SFT Example (using a YAML config):

accelerate launch --config_file recipes/accelerate_configs/zero3.yaml src/open_r1/sft.py \
    --config recipes/OpenR1-Distill-7B/sft/config_distill.yaml

GRPO Example (single-node with vLLM colocation):

ACCELERATE_LOG_LEVEL=info \
    accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    src/open_r1/grpo.py --config recipes/DeepSeek-R1-Distill-Qwen-1.5B/grpo/config_demo.yaml \
    --vllm_mode colocate

The project also includes a code reward function for training with a code interpreter, supporting E2B and Morph sandboxes, and specific reward functions for competitive programming problems like IOI and CodeForces.
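
As a rough illustration of the interface such rewards follow (a minimal sketch in the style of TRL's GRPOTrainer reward callables, assuming plain-text completions; this is not the project's actual implementation), a reward function maps a batch of completions to scalar scores:

import re

def format_reward(completions, **kwargs):
    """Toy reward: 1.0 if a completion wraps its reasoning in
    <think>...</think> tags before the final answer, else 0.0."""
    pattern = re.compile(r"<think>.*?</think>", re.DOTALL)
    return [1.0 if pattern.match(c) else 0.0 for c in completions]

# A code reward would instead extract the program from each completion,
# execute it in a sandbox (e.g. E2B or Morph, as noted above), and score
# it by the fraction of test cases it passes.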

Evaluating Models

Model evaluation is performed using lighteval with vLLM, supporting both single-GPU and multi-GPU (data or tensor parallel) setups. This allows for comprehensive benchmarking against various tasks.

Example Evaluation (AIME 2024 on a single GPU):

export VLLM_WORKER_MULTIPROC_METHOD=spawn # Required for vLLM
MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"
OUTPUT_DIR=data/evals/$MODEL

TASK=aime24
lighteval vllm $MODEL_ARGS "lighteval|$TASK|0|0" \
    --use-chat-template \
    --output-dir $OUTPUT_DIR

Open R1 provides detailed instructions and results for reproducing DeepSeek's reported performance on benchmarks like AIME 2024, MATH-500, GPQA Diamond, and LiveCodeBench.
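
For models too large for a single GPU, the same command scales out; for example (a hedged variant of the snippet above, assuming lighteval's vLLM backend accepts a tensor_parallel_size model argument, which you should check against your installed version):

# Shard the model across 8 GPUs with tensor parallelism
MODEL_ARGS="model_name=$MODEL,dtype=bfloat16,tensor_parallel_size=8,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"

The lighteval invocation itself is unchanged.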

Data Generation

Synthetic data can be generated using distilabel with vLLM, enabling the creation of high-quality datasets for training. This includes generating data from smaller distilled R1 models or the larger DeepSeek-R1 model using Slurm clusters.

Example (generating data from a smol distilled R1 model):

from datasets import load_dataset
from distilabel.models import vLLM
from distilabel.pipeline import Pipeline
from distilabel.steps.tasks import TextGeneration

# The backslash is escaped so the model sees the literal "\boxed{}"
prompt_template = """\
You will be given a problem. Please reason step by step, and put your final answer within \\boxed{}:
{{ instruction }}"""

dataset = load_dataset("AI-MO/NuminaMath-TIR", split="train").select(range(10))

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

with Pipeline(
    name="distill-qwen-7b-r1",
    description="A pipeline to generate data from a distilled r1 model",
) as pipeline:

    llm = vLLM(
        model=model_id,
        tokenizer=model_id,
        extra_kwargs={
            "tensor_parallel_size": 1,  # single-GPU inference
            "max_model_len": 8192,
        },
        generation_kwargs={
            "temperature": 0.6,
            "max_new_tokens": 8192,
        },
    )
    prompt_column = "problem"
    text_generation = TextGeneration(
        llm=llm,
        template=prompt_template,
        num_generations=4,  # sample four completions per problem
        input_mappings={"instruction": prompt_column} if prompt_column is not None else {},
    )


if __name__ == "__main__":
    distiset = pipeline.run(dataset=dataset)
    distiset.push_to_hub(repo_id="username/numina-deepseek-r1-qwen-7b")

The project also offers a decontamination script that uses 8-gram overlap to detect and drop training samples that also appear in evaluation benchmarks.
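
The idea behind n-gram decontamination is simple; the following minimal sketch (toy data and helper names are illustrative, not the project's actual script) keeps only training samples that share no 8-gram with any benchmark problem:

# Toy stand-ins for real benchmark prompts and training samples
benchmark_problems = [
    "What is the sum of all positive integers n such that n divides 24?",
]
train_samples = [
    "Compute 2 + 2.",
    "What is the sum of all positive integers n such that n divides 24? Answer: 60.",
]

def ngrams(text, n=8):
    """Word-level n-grams of a text, as a set of tuples."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

# Collect every 8-gram that occurs in a benchmark problem
benchmark_ngrams = set().union(*(ngrams(p) for p in benchmark_problems))

# Keep only training samples that share no 8-gram with the benchmarks
clean = [s for s in train_samples if not (ngrams(s) & benchmark_ngrams)]
print(clean)  # the contaminated second sample is dropped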

Why Use Open R1?

Open R1 offers a unique opportunity for researchers and developers to engage with the cutting-edge field of large language model development. By providing an open reproduction of DeepSeek-R1, it democratizes access to advanced reasoning capabilities. The project's comprehensive toolkit, including scalable training with GRPO, robust evaluation with lighteval, and flexible data generation with distilabel, makes it an invaluable resource. Its community-driven nature ensures continuous improvement and collaboration, pushing the boundaries of open AI research.

Links