GLM-5: Flagship Models for Long-Horizon Agentic Engineering

Introduction

The GLM-5 series, developed by zai-org, represents a significant advancement in large language models tailored for complex systems engineering and long-horizon agentic tasks. This repository showcases GLM-5, GLM-5.1, and the latest GLM-5.2, each building upon its predecessor with enhanced capabilities.

GLM-5.2

GLM-5.2 is the latest flagship model, making a substantial leap in long-horizon task capability with a solid 1M-token context. Its new features include robust 1M context stability, advanced coding with flexible effort levels, and an improved architecture featuring IndexShare, which reduces per-token FLOPs by 2.9x at 1M context length. GLM-5.2 demonstrates state-of-the-art performance on coding benchmarks, outperforming other open-source models and closing the gap with frontier closed-source models.

GLM-5.1

GLM-5.1 is designed for agentic engineering, offering significantly stronger coding capabilities. It achieves state-of-the-art performance on SWE-Bench Pro and excels in real-world terminal tasks. A key innovation of GLM-5.1 is its ability to remain effective over much longer horizons, handling ambiguous problems with better judgment and sustaining productivity through iterative reasoning, experimentation, and strategy revision over hundreds of rounds.

GLM-5

GLM-5 targets complex systems engineering and long-horizon agentic tasks. It scales significantly from GLM-4.5, increasing parameters and pre-training data. It integrates DeepSeek Sparse Attention (DSA) to reduce deployment costs while maintaining long-context capacity. GLM-5 also leverages slime, a novel asynchronous RL infrastructure, to improve training throughput and efficiency, leading to best-in-class performance among open-source models across reasoning, coding, and agentic tasks.

Installation

The GLM-5 series models are available for download and local deployment. You can access the models through Hugging Face and ModelScope.

To serve GLM-5 series models locally, several frameworks are supported:

SGLang (v0.5.13.post1+), see cookbook
vLLM (v0.23.0+), see recipes
Transformers (v0.5.12+), see transformers docs
KTransformers (v0.5.12+), see tutorial
For deployment on the Ascend NPU platform, inference frameworks such as vLLM-Ascend, xLLM, and SGLang are supported, see here.

Examples

GLM-5 models support controlling the thinking budget through the reasoning_effort parameter. This parameter accepts two levels: max (default) and high. If reasoning_effort is unset or set to any value other than high, the model runs at Max. To use the High level, you must explicitly pass reasoning_effort="high". Thinking can be turned off entirely by setting enable_thinking=false.

Why Use GLM-5?

The GLM-5 series offers compelling advantages for developers and researchers working with advanced AI:

Exceptional Long-Horizon Capability: GLM-5.2 provides a stable 1M-token context, enabling sustained work on complex, long-duration tasks.
State-of-the-Art Agentic Engineering: GLM-5.1 and GLM-5 excel in agentic tasks, demonstrating superior problem-solving, iterative reasoning, and strategic revision over extended sessions.
Advanced Coding Performance: The models achieve leading scores on standard coding benchmarks like Terminal-Bench and SWE-bench Pro.
Efficient Deployment: Features like DeepSeek Sparse Attention in GLM-5 reduce deployment costs while preserving long-context capacity.
Strong Benchmark Results: Consistent top performance across a wide range of academic and real-world benchmarks, including Vending Bench 2, showcasing robust planning and resource management.