GLM-4.5: Agentic, Reasoning, and Coding Foundation Models for Advanced AI

Summary
The GLM-4.5 GitHub repository introduces the GLM-4.5 and GLM-4.6 series of foundation models, designed for advanced agentic, reasoning, and coding capabilities. These models offer significant improvements, including longer context windows, enhanced coding performance, and superior reasoning, making them highly competitive in the LLM landscape. Developers can leverage these models for complex intelligent agent applications, backed by strong benchmark results.
Introduction
The zai-org/GLM-4.5 repository showcases the cutting-edge GLM-4.5 and GLM-4.6 series of foundation models, developed for Agentic, Reasoning, and Coding (ARC) tasks. These models are engineered to meet the complex demands of intelligent agent applications, providing robust capabilities for developers and researchers.
GLM-4.6, an advancement over GLM-4.5, brings several key improvements. It features an expanded context window from 128K to 200K tokens, enabling it to handle more intricate agentic tasks. GLM-4.6 also demonstrates superior coding performance across various benchmarks and real-world applications, alongside advanced reasoning capabilities that support tool use during inference. Furthermore, it exhibits stronger performance in tool-using and search-based agents, and offers refined writing that aligns better with human preferences.
GLM-4.5 models are foundational for intelligent agents, with the main GLM-4.5 model having 355 billion total parameters (32 billion active parameters) and GLM-4.5-Air offering a more compact design with 106 billion total parameters (12 billion active parameters). Both GLM-4.5 and GLM-4.5-Air are hybrid reasoning models, providing both a 'thinking mode' for complex reasoning and tool usage, and a 'non-thinking mode' for immediate responses. These models are open-sourced under the MIT license, allowing for commercial use and secondary development, and have achieved exceptional performance in industry-standard benchmarks.
Installation
To get started with the GLM-4.5 and GLM-4.6 models, first install the required Python packages. From the repository root, install the dependencies with pip:
pip install -r requirements.txt
Examples
Both GLM-4.5 and GLM-4.6 utilize the same inference methods across various frameworks.
Using Transformers
For inference with the transformers library, see the trans_infer_cli.py script in the inference folder of the repository.
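As a minimal sketch of that pattern (the generation settings below are illustrative assumptions; the repository's trans_infer_cli.py is the reference implementation), a chat-style generation call with transformers might look like this:

```python
def build_messages(user_prompt):
    # OpenAI-style chat messages consumed by the model's chat template
    return [{"role": "user", "content": user_prompt}]

def generate(user_prompt, max_new_tokens=512):
    # transformers is imported lazily so the helper above stays usable
    # without the library installed
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "zai-org/GLM-4.5-Air"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt
    return tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True)
```

Note that actually calling generate() requires hardware capable of hosting the 106B-parameter GLM-4.5-Air checkpoint.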
Using vLLM
To serve models like GLM-4.5-Air with vLLM in either BF16 or FP8 precision, use the following command:
vllm serve zai-org/GLM-4.5-Air \
--tensor-parallel-size 8 \
--tool-call-parser glm45 \
--reasoning-parser glm45 \
--enable-auto-tool-choice \
--served-model-name glm-4.5-air
Using SGLang (BF16 Example)
For BF16 inference with SGLang, you can launch a server using a command similar to this:
python3 -m sglang.launch_server \
--model-path zai-org/GLM-4.5-Air \
--tp-size 8 \
--tool-call-parser glm45 \
--reasoning-parser glm45 \
--speculative-algorithm EAGLE \
--speculative-num-steps 3 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4 \
--mem-fraction-static 0.7 \
--served-model-name glm-4.5-air \
--host 0.0.0.0 \
--port 8000
Request Parameter Instructions: When serving with vLLM or SGLang, thinking mode is enabled by default. To disable it, add the extra_body={"chat_template_kwargs": {"enable_thinking": False}} parameter to your request. Both frameworks also support OpenAI-style tool calling.
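A sketch of such a request against the OpenAI-compatible endpoint that both serve commands above expose, using only the Python standard library (the localhost URL and served model name mirror the example commands and are assumptions about your deployment):

```python
import json
import urllib.request

# Default port from the SGLang example above; vLLM defaults to 8000 as well
API_URL = "http://localhost:8000/v1/chat/completions"

def build_request(model, prompt, enable_thinking=True):
    # OpenAI chat-completions payload; chat_template_kwargs toggles thinking mode
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if not enable_thinking:
        body["chat_template_kwargs"] = {"enable_thinking": False}
    return body

def chat(model, prompt, enable_thinking=True):
    data = json.dumps(build_request(model, prompt, enable_thinking)).encode()
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

When using the official OpenAI Python SDK instead, the same toggle is passed via the extra_body keyword argument shown above.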
Why Use GLM-4.5?
GLM-4.5 and GLM-4.6 offer compelling advantages for developers working with advanced AI applications:
- Comprehensive Capabilities: Excelling in agentic tasks, complex reasoning, and superior code generation, these models provide a unified solution for diverse AI challenges.
- State-of-the-Art Performance: With GLM-4.6 showing clear gains over GLM-4.5 and competitive advantages against leading domestic and international models, users can expect top-tier results.
- Open-Source and Flexible: Released under the MIT license, the models are freely available for commercial use and secondary development. They come in various sizes (GLM-4.5, GLM-4.5-Air) and precisions (BF16, FP8) to suit different computational needs.
- Robust Ecosystem Support: Integration with popular inference frameworks such as transformers, vLLM, and SGLang ensures ease of deployment and experimentation.
- Continuous Innovation: The rapid evolution from GLM-4.5 to GLM-4.6 demonstrates a commitment to pushing the boundaries of foundation models, offering users access to the latest advancements.
Links
- GitHub Repository: zai-org/GLM-4.5
- GLM-4.6 Technical Blog: z.ai/blog/glm-4.6
- GLM-4.5 Technical Report: arxiv.org/abs/2508.06471
- Zhipu AI Technical Documentation: zhipu-ai.feishu.cn/wiki/Gv3swM0Yci7w7Zke9E0crhU7n7D
- GLM-4.6 API Platform: docs.z.ai/guides/llm/glm-4.6
- GLM-4.6 Chat: chat.z.ai
- GLM-4.5 Hugging Face Space: huggingface.co/spaces/zai-org/GLM-4.5-Space
- GLM-4.6 Hugging Face Model: huggingface.co/zai-org/GLM-4.6
- GLM-4.5 ModelScope Demo: modelscope.cn/studios/ZhipuAI/GLM-4.5-Demo
- GLM-4.6 ModelScope Model: modelscope.cn/models/ZhipuAI/GLM-4.6
- Transformers GLM-4 MoE Implementation: github.com/huggingface/transformers/tree/main/src/transformers/models/glm4_moe
- vLLM GLM-4 MoE Implementation: github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/glm4_moe_mtp.py
- SGLang GLM-4 MoE Implementation: github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/glm4_moe.py