Verifiers: Environments for LLM Reinforcement Learning and Evaluation

Summary
Verifiers is a Python library by Prime Intellect AI for building environments to train and evaluate Large Language Models (LLMs). It enables the creation of custom environments with datasets, model harnesses, and reward functions, supporting reinforcement learning, capability evaluation, and synthetic data generation. This library is tightly integrated with the Prime Intellect ecosystem, including their Environments Hub and training framework.
Introduction
Verifiers is a powerful Python library developed by Prime Intellect AI, specifically designed for creating robust environments to train and evaluate Large Language Models (LLMs). It provides a comprehensive framework where environments encapsulate everything needed to run and assess a model on a particular task. Each environment typically includes a dataset of task inputs, a harness for the model (managing tools, sandboxes, and context), and a reward function or rubric to score the model's performance. Verifiers is deeply integrated with the Prime Intellect Environments Hub, their prime-rl training framework, and their Hosted Training platform, offering a complete ecosystem for LLM development.
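To make the dataset / harness / rubric decomposition concrete, here is a toy sketch in plain Python. This is a conceptual illustration only, not the verifiers API: the `ToyEnvironment` class, its field names, and the stub model are all invented for this example.

```python
# Conceptual sketch (not the verifiers API): an "environment" bundles
# a dataset of tasks, a harness that runs the model, and a scoring rule.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToyEnvironment:
    dataset: List[Dict[str, str]]       # task inputs and reference answers
    run_model: Callable[[str], str]     # harness: maps a prompt to a completion
    score: Callable[[str, str], float]  # rubric: scores completion vs. answer

    def evaluate(self) -> float:
        # Run every task through the model, score it, and average the rewards.
        rewards = [
            self.score(self.run_model(row["question"]), row["answer"])
            for row in self.dataset
        ]
        return sum(rewards) / len(rewards)


# Usage with a stub "model" that always answers "4":
env = ToyEnvironment(
    dataset=[
        {"question": "2+2?", "answer": "4"},
        {"question": "3+3?", "answer": "6"},
    ],
    run_model=lambda prompt: "4",
    score=lambda completion, answer: 1.0 if completion == answer else 0.0,
)
print(env.evaluate())  # 0.5: one of the two tasks is answered correctly
```

The real library plays the same roles with richer machinery (chat-format messages, tool use, async rubrics), but the shape of the loop is the same.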
Installation
Getting started with Verifiers is straightforward. First, ensure you have uv and the prime CLI tool installed.
# install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# install the prime CLI
uv tool install prime
# log in to the Prime Intellect platform
prime login
To set up a new workspace for environment development, use:
# run inside your workspace directory, e.g. ~/dev/my-lab
prime lab setup
Alternatively, to add Verifiers to an existing project, run:
uv add verifiers && prime lab setup --skip-install
Examples
Verifiers allows you to easily initialize new environment templates. To create a fresh environment module, use the prime env init command:
prime env init my-env # creates a new template in ./environments/my_env
Environment modules are self-contained Python files that expose a load_environment function. Here's a basic example:
# my_env.py
import verifiers as vf

def load_environment(dataset_name: str = 'gsm8k') -> vf.Environment:
    dataset = vf.load_example_dataset(dataset_name)  # 'question'

    async def correct_answer(completion, answer) -> float:
        completion_ans = completion[-1]['content']
        return 1.0 if completion_ans == answer else 0.0

    rubric = vf.Rubric(funcs=[correct_answer])
    env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
    return env
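Because the rubric function is ordinary Python, its scoring logic can be exercised outside the library. A standalone sketch (no verifiers dependency; the sample chat messages below are made up for illustration):

```python
# The reward function receives the completion as a list of chat messages
# plus the reference answer, and returns a scalar reward.
def correct_answer(completion, answer):
    completion_ans = completion[-1]["content"]  # final assistant message
    return 1.0 if completion_ans == answer else 0.0


completion = [
    {"role": "user", "content": "What is 6 * 7?"},
    {"role": "assistant", "content": "42"},
]
print(correct_answer(completion, "42"))  # 1.0
print(correct_answer(completion, "7"))   # 0.0
```

The in-module version is declared `async` so the environment can await it alongside model calls; that does not change the scoring logic itself.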
You can also install environments from the Environments Hub:
prime env install primeintellect/math-python
To run a local evaluation with any OpenAI-compatible model:
prime eval run my-env -m gpt-5-nano # run and save eval results locally
Why Use Verifiers
Verifiers offers a robust solution for anyone working with LLMs, providing a structured and efficient way to:
- Create Custom Environments: Define specific tasks with tailored datasets, model interaction harnesses, and precise reward functions.
- Facilitate Reinforcement Learning: Design environments optimized for training LLMs using reinforcement learning techniques.
- Evaluate LLM Capabilities: Conduct thorough evaluations of model performance across various tasks and metrics.
- Generate Synthetic Data: Leverage environments to produce high-quality synthetic data for further model training or analysis.
- Seamless Integration: Benefit from tight integration with Prime Intellect's broader ecosystem, including their Environments Hub, prime-rl training framework, and Hosted Training platform.
- Streamlined Workflow: The prime CLI tool simplifies environment setup, installation, evaluation, and publishing.
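A practical note on precise reward functions: the exact string match in the earlier example is brittle against formatting noise ("1,234." vs. "1234"). A common refinement is to normalize both sides before comparing. The `normalize` helper below is illustrative, not part of the verifiers API:

```python
import re


def normalize(ans: str) -> str:
    # Strip whitespace and thousands separators, drop trailing periods,
    # and lowercase so formatting differences don't cost reward.
    return re.sub(r"[\s,]", "", ans).rstrip(".").lower()


def lenient_match(completion_text: str, answer: str) -> float:
    return 1.0 if normalize(completion_text) == normalize(answer) else 0.0


print(lenient_match(" 1,234. ", "1234"))  # 1.0
print(lenient_match("1235", "1234"))      # 0.0
```

How strict to be is a design choice per environment: stricter matching gives a cleaner training signal, while lenient matching avoids penalizing correct answers for cosmetic reasons.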