autoresearch: AI Agents for Autonomous LLM Training Research

Summary

autoresearch, by Andrej Karpathy, lets AI agents run LLM training experiments autonomously on a single GPU. An agent modifies the training code, trains a model for a fixed 5-minute budget, and keeps or discards each change based on the result, iterating toward better performance. The goal is to automate the experimental cycle of AI research.

Introduction

autoresearch is a project by Andrej Karpathy in which autonomous AI agents run experiments on Large Language Model (LLM) training. Its README, dated March 2026, is written from the vantage point of an imagined future in which the era of "meat computers" doing research is long gone, replaced by "autonomous swarms of AI agents." The repository is presented as the story of how that future began.

The core idea is simple: give an AI agent a working LLM training setup and let it experiment autonomously overnight. The agent modifies train.py, runs a training session with a fixed 5-minute budget, evaluates the result, and decides whether to keep or discard the change, then repeats the cycle. The human researcher, in turn, writes program.md, a Markdown file that gives the agents their context and instructions, effectively setting up an "autonomous research organization." The training code is a simplified, single-GPU implementation based on Karpathy's nanochat project.
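
The keep-or-discard cycle described above can be sketched as a greedy loop. This is a minimal illustration, not the repository's code: the names `research_loop` and `propose_and_run` are hypothetical, and the sketch assumes the score is val_bpb with lower values being better.

```python
from typing import Callable


def research_loop(
    baseline_score: float,
    propose_and_run: Callable[[int], float],
    num_experiments: int,
) -> tuple[float, list[int]]:
    """Greedy keep-or-discard loop over a sequence of experiments.

    propose_and_run is a hypothetical stand-in for "edit train.py, train
    for the fixed budget, return the validation score". A change is kept
    only if it lowers the best score seen so far (assumed: lower is better).
    """
    best = baseline_score
    kept: list[int] = []
    for i in range(num_experiments):
        score = propose_and_run(i)
        if score < best:
            best = score      # keep the change and make it the new baseline
            kept.append(i)
        # otherwise discard the change and continue from the previous best
    return best, kept


if __name__ == "__main__":
    # Stubbed scores in place of real training runs.
    fake_scores = [1.10, 0.98, 1.01, 0.95]
    best, kept = research_loop(1.00, lambda i: fake_scores[i], len(fake_scores))
    print(best, kept)  # prints: 0.95 [1, 3]
```

In a real session the agent itself plays the role of this loop; the sketch only makes the acceptance rule explicit.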

Installation

Getting started with autoresearch requires a specific environment, primarily a single NVIDIA GPU.

Requirements

A single NVIDIA GPU (tested on H100)
Python 3.10+
uv (for dependency management)
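
A quick preflight check for the requirements above might look like the following sketch. The function name `check_requirements` is hypothetical, and finding `nvidia-smi` on the PATH is only a rough proxy for having a usable NVIDIA GPU; it does not confirm the GPU model or driver state.

```python
import shutil
import sys


def check_requirements(min_python: tuple[int, int] = (3, 10)) -> dict[str, bool]:
    """Report which prerequisites appear to be present on this machine."""
    return {
        # Interpreter running this script meets the minimum version.
        "python": sys.version_info >= min_python,
        # uv executable is discoverable on PATH.
        "uv": shutil.which("uv") is not None,
        # NVIDIA driver tooling is discoverable on PATH (proxy for a GPU).
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```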

Quick Start Steps

Follow these commands to set up and run a manual training experiment:

# 1. Install uv project manager (if you don't already have it)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Install dependencies
uv sync

# 3. Download data and train tokenizer (one-time, ~2 min)
uv run prepare.py

# 4. Manually run a single training experiment (~5 min)
uv run train.py

If these steps complete successfully, your setup is ready for autonomous research mode.

Examples

Once your environment is set up, you can engage an AI agent to begin autonomous experimentation. The program.md file acts as the baseline instructions, or "skill," for your agent.

To initiate an experiment, start an AI agent (e.g., Claude or Codex) in the repository, with all permissions disabled for safety, and give it a prompt such as:

Hi have a look at program.md and let's kick off a new experiment! let's do the setup first.

The agent will then interpret program.md and begin its iterative process of modifying train.py and running experiments.

Why Use autoresearch?

autoresearch offers a compelling vision and practical framework for the future of AI development:

Autonomous Experimentation: The primary benefit is the automation of the research cycle. Agents can tirelessly explore architectural changes, hyperparameters, and optimizers, potentially discovering novel improvements faster than human-led efforts.
Fixed Time Budget: Each training run is strictly limited to 5 minutes. This design choice ensures that experiments are directly comparable on the same compute platform, regardless of the agent's modifications (e.g., model size, batch size). It also allows for a high throughput of experiments, with approximately 100 experiments possible overnight.
Focused and Manageable Scope: The agent interacts with a single file, train.py, which contains the full GPT model, optimizer (Muon + AdamW), and training loop. This focused approach keeps the scope manageable and agent-generated diffs reviewable.
Self-Contained and Simple: The project is designed to be self-contained with minimal external dependencies beyond PyTorch. It focuses on a single GPU, one file, and one metric (val_bpb), reducing complexity and making it accessible for rapid iteration.
Platform Optimization: The fixed time budget pushes the agent toward the best model it can find for your specific platform within that time constraint, so the results are tuned to your hardware.
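
The fixed time budget and the overnight throughput figure from the list above can be illustrated with a short sketch. It is written under stated assumptions rather than taken from train.py: `train_for_budget` and `experiments_per_night` are hypothetical names, the injectable `clock` parameter exists only to make the loop testable, and the 8-hour night is an assumed value.

```python
import time
from typing import Callable


def train_for_budget(
    step_fn: Callable[[], None],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run training steps until the wall-clock budget is spent.

    Returns the number of steps completed. A cheaper step (smaller model,
    smaller batch) fits more steps into the same budget, which is why runs
    stay comparable on one platform whatever the agent changes.
    """
    start = clock()
    steps = 0
    while clock() - start < budget_seconds:
        step_fn()
        steps += 1
    return steps


def experiments_per_night(hours: float, minutes_per_run: float = 5.0) -> int:
    """Upper bound on back-to-back runs, ignoring startup and evaluation overhead."""
    return int(hours * 60 // minutes_per_run)


if __name__ == "__main__":
    # A 5-minute run gives 12 runs per hour; an assumed 8-hour night gives 96,
    # in line with the "approximately 100" figure quoted above.
    print(experiments_per_night(1))  # prints: 12
    print(experiments_per_night(8))  # prints: 96
```

Because the budget is measured in time rather than steps or tokens, the step count returned by a loop like this depends on the hardware, which is what ties the resulting model to the platform it was trained on.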