GLM-4.5: Agentic, Reasoning, and Coding Foundation Models for Advanced AI

Summary
The GLM-4.5 GitHub repository introduces the GLM-4.5 and GLM-4.6 series of foundation models, designed for advanced agentic, reasoning, and coding capabilities. These models offer significant improvements, including longer context windows, enhanced coding performance, and superior reasoning, making them highly competitive in the LLM landscape. Developers can leverage these models for complex intelligent agent applications, backed by strong benchmark results.
Introduction
The zai-org/GLM-4.5 repository showcases the cutting-edge GLM-4.5 and GLM-4.6 series of foundation models, developed for Agentic, Reasoning, and Coding (ARC) tasks. These models are engineered to meet the complex demands of intelligent agent applications, providing robust capabilities for developers and researchers.
GLM-4.6, an advancement over GLM-4.5, brings several key improvements. It features an expanded context window from 128K to 200K tokens, enabling it to handle more intricate agentic tasks. GLM-4.6 also demonstrates superior coding performance across various benchmarks and real-world applications, alongside advanced reasoning capabilities that support tool use during inference. Furthermore, it exhibits stronger performance in tool-using and search-based agents, and offers refined writing that aligns better with human preferences.
GLM-4.5 models are foundational for intelligent agents, with the main GLM-4.5 model having 355 billion total parameters (32 billion active parameters) and GLM-4.5-Air offering a more compact design with 106 billion total parameters (12 billion active parameters). Both GLM-4.5 and GLM-4.5-Air are hybrid reasoning models, providing both a 'thinking mode' for complex reasoning and tool usage, and a 'non-thinking mode' for immediate responses. These models are open-sourced under the MIT license, allowing for commercial use and secondary development, and have achieved exceptional performance in industry-standard benchmarks.
Installation
To get started with the GLM-4.5 and GLM-4.6 models, first install the required Python packages. From the repository root, install the dependencies with pip:
pip install -r requirements.txt
Examples
Both GLM-4.5 and GLM-4.6 utilize the same inference methods across various frameworks.
Using Transformers
For inference with the transformers library, see the trans_infer_cli.py script in the inference folder of the repository.
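As a minimal sketch of that pattern (the generation settings below are illustrative assumptions; the repository's trans_infer_cli.py is the reference implementation), a chat-style generation call with transformers might look like this:

```python
def build_messages(user_prompt):
    # OpenAI-style chat messages consumed by the model's chat template
    return [{"role": "user", "content": user_prompt}]

def generate(user_prompt, max_new_tokens=512):
    # transformers is imported lazily so the helper above stays usable
    # without the library installed
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "zai-org/GLM-4.5-Air"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt
    return tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True)
```

Note that actually calling generate() requires hardware capable of hosting the 106B-parameter GLM-4.5-Air checkpoint.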
Using vLLM
To serve models like GLM-4.5-Air with vLLM in either BF16 or FP8 precision, use the following command:
vllm serve zai-org/GLM-4.5-Air \
--tensor-parallel-size 8 \
--tool-call-parser glm45 \
--reasoning-parser glm45 \
--enable-auto-tool-choice \
--served-model-name glm-4.5-air
Using SGLang (BF16 Example)
For BF16 inference with SGLang, you can launch a server using a command similar to this:
python3 -m sglang.launch_server \
--model-path zai-org/GLM-4.5-Air \
--tp-size 8 \
--tool-call-parser glm45 \
--reasoning-parser glm45 \
--speculative-algorithm EAGLE \
--speculative-num-steps 3 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4 \
--mem-fraction-static 0.7 \
--served-model-name glm-4.5-air \
--host 0.0.0.0 \
--port 8000
Request Parameter Instructions: When serving with vLLM or SGLang, thinking mode is enabled by default. To disable it, add the extra_body={"chat_template_kwargs": {"enable_thinking": False}} parameter to your request. Both frameworks also support OpenAI-style tool calling.
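A sketch of such a request against the OpenAI-compatible endpoint that both serve commands above expose, using only the Python standard library (the localhost URL and served model name mirror the example commands and are assumptions about your deployment):

```python
import json
import urllib.request

# Default port from the SGLang example above; vLLM defaults to 8000 as well
API_URL = "http://localhost:8000/v1/chat/completions"

def build_request(model, prompt, enable_thinking=True):
    # OpenAI chat-completions payload; chat_template_kwargs toggles thinking mode
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if not enable_thinking:
        body["chat_template_kwargs"] = {"enable_thinking": False}
    return body

def chat(model, prompt, enable_thinking=True):
    data = json.dumps(build_request(model, prompt, enable_thinking)).encode()
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

When using the official OpenAI Python SDK instead, the same toggle is passed via the extra_body keyword argument shown above.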
Why Use GLM-4.5?
GLM-4.5 and GLM-4.6 offer compelling advantages for developers working with advanced AI applications:
- Comprehensive Capabilities: Excelling in agentic tasks, complex reasoning, and superior code generation, these models provide a unified solution for diverse AI challenges.
- State-of-the-Art Performance: With GLM-4.6 showing clear gains over GLM-4.5 and competitive advantages against leading domestic and international models, users can expect top-tier results.
- Open-Source and Flexible: Released under the MIT license, the models are freely available for commercial use and secondary development. They come in various sizes (GLM-4.5, GLM-4.5-Air) and precisions (BF16, FP8) to suit different computational needs.
- Robust Ecosystem Support: Integration with popular inference frameworks such as transformers, vLLM, and SGLang ensures ease of deployment and experimentation.
- Continuous Innovation: The rapid evolution from GLM-4.5 to GLM-4.6 demonstrates a commitment to pushing the boundaries of foundation models, offering users access to the latest advancements.
Links
- GitHub Repository: zai-org/GLM-4.5
- GLM-4.6 Technical Blog: z.ai/blog/glm-4.6
- GLM-4.5 Technical Report: arxiv.org/abs/2508.06471
- Zhipu AI Technical Documentation: zhipu-ai.feishu.cn/wiki/Gv3swM0Yci7w7Zke9E0crhU7n7D
- GLM-4.6 API Platform: docs.z.ai/guides/llm/glm-4.6
- GLM-4.6 Chat: chat.z.ai
- GLM-4.5 Hugging Face Space: huggingface.co/spaces/zai-org/GLM-4.5-Space
- GLM-4.6 Hugging Face Model: huggingface.co/zai-org/GLM-4.6
- GLM-4.5 ModelScope Demo: modelscope.cn/studios/ZhipuAI/GLM-4.5-Demo
- GLM-4.6 ModelScope Model: modelscope.cn/models/ZhipuAI/GLM-4.6
- Transformers GLM-4 MoE Implementation: github.com/huggingface/transformers/tree/main/src/transformers/models/glm4_moe
- vLLM GLM-4 MoE Implementation: github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/glm4_moe_mtp.py
- SGLang GLM-4 MoE Implementation: github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/glm4_moe.py