Qwen3-Coder: Alibaba Cloud's Agentic Code LLM for Advanced Development

Summary
Qwen3-Coder is a powerful large language model series from Alibaba Cloud's Qwen team, specifically designed for agentic coding. It offers exceptional performance in coding and agentic tasks, boasting long-context capabilities and support for a vast array of programming languages. This model sets new state-of-the-art results among open models, comparable to leading commercial alternatives.
Introduction
Qwen3-Coder is the latest agentic code model from the Qwen team at Alibaba Cloud. The series, including the flagship Qwen3-Coder-480B-A35B-Instruct, is engineered to excel in agentic coding, browser-use, and tool-use tasks. It delivers strong performance, matching or exceeding other open models and competing with commercial solutions such as Claude Sonnet. With a native context length of 256K tokens, extendable up to 1M tokens, Qwen3-Coder is optimized for repository-scale understanding and supports 358 programming languages.
Installation
Getting started with Qwen3-Coder is straightforward using the transformers library. Below are examples for chatting with the model and performing fill-in-the-middle code completion.
Chat with Qwen3-Coder
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-Coder-480B-A35B-Instruct"

# Load the model and tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat prompt from the conversation messages.
prompt = "write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
# Strip the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)
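For reference, a typical completion for the quick sort prompt above looks like the following. This is ordinary reference code, not captured model output:

```python
def quicksort(arr):
    # Recursive quick sort: partition around a middle pivot, then sort each side.
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

print(quicksort([3, 6, 8, 10, 1, 2, 1]))  # [1, 1, 2, 3, 6, 8, 10]
```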
Fill in the Middle with Qwen3-Coder
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model.
TOKENIZER = AutoTokenizer.from_pretrained("Qwen/Qwen3-Coder-480B-A35B-Instruct")
MODEL = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Coder-480B-A35B-Instruct", device_map="auto").eval()

# The text before <|fim_suffix|> is the prefix, the text after it is the suffix,
# and the model generates the missing middle after <|fim_middle|>.
input_text = """<|fim_prefix|>def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
<|fim_suffix|>
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)<|fim_middle|>"""

messages = [
    {"role": "system", "content": "You are a code completion assistant."},
    {"role": "user", "content": input_text}
]
text = TOKENIZER.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = TOKENIZER([text], return_tensors="pt").to(MODEL.device)

# Use `max_new_tokens` to control the maximum output length.
# Stop generation at the FIM special tokens as well as the regular EOS tokens.
eos_token_ids = [151659, 151661, 151662, 151663, 151664, 151643, 151645]
generated_ids = MODEL.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=False, eos_token_id=eos_token_ids)[0]

# `generated_ids` includes the prompt tokens; decode only the tokens after them.
output_text = TOKENIZER.decode(generated_ids[len(model_inputs.input_ids[0]):], skip_special_tokens=True)
print(f"Prompt: {input_text}\n\nGenerated text: {output_text}")
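The special-token layout above can also be assembled programmatically. The helper below is a hypothetical convenience function, not part of transformers or the Qwen API; it simply concatenates the same token strings used in the example:

```python
# FIM marker strings as they appear in the example prompt above.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    # Hypothetical helper: prefix text, then the code that follows the gap,
    # then the <|fim_middle|> marker where the model begins generating.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = build_fim_prompt("def add(a, b):\n    return ", "\n")
```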
Examples
Qwen3-Coder demonstrates its versatility through various impressive use cases, including:
- Physics-Based Chimney Demolition Simulation: Generating complex 3D simulations with realistic physics using three.js and cannon-es.js.
- Multicolor and Interactive Animation: Creating dynamic and interactive animations with p5.js.
- 3D Google Earth: Developing web pages that simulate 3D terrain maps.
- Typing Game: Designing interactive typing games with modern UI and encouraging feedback.
- Bouncing Ball in Rotation Hypercube: Animating a ball bouncing within a rotating hypercube.
- Solar System Simulation: Building web pages to visualize the solar system.
- DUET Game: Creating a complete, single-file HTML game inspired by "Duet" with smooth animations and neon effects.
Why Use Qwen3-Coder?
- State-of-the-Art Agentic Coding: Achieves results comparable to top commercial models like Claude Sonnet in agentic coding, browser-use, and tool-use.
- Exceptional Long-Context Capabilities: Supports a native context length of 256K tokens, extendable up to 1M tokens, ideal for understanding large codebases.
- Broad Language Support: Capable of handling 358 different coding languages, making it highly versatile for diverse development environments.
- Robust General Capabilities: Retains strong performance in mathematical reasoning and general AI tasks from its base model.
- Multiple Model Sizes: Available in various sizes, allowing developers to choose the best fit for their computational resources and specific needs.
Links
- Qwen Chat: https://chat.qwenlm.ai/
- Hugging Face: https://huggingface.co/collections/Qwen/qwen3-coder-687fc861e53c939e52d52d10
- ModelScope: https://modelscope.cn/organization/qwen
- Documentation: https://qwen.readthedocs.io/
- Arxiv Paper: https://arxiv.org/abs/2505.09388
- Discord: https://discord.gg/CV4E9rpNSD