Qwen3-Coder: Alibaba Cloud's Agentic Code LLM for Advanced Development

Summary

Qwen3-Coder is a large language model series from Alibaba Cloud's Qwen team, designed specifically for agentic coding. It delivers strong performance on coding and agentic tasks, with a native 256K-token context and support for 358 programming languages. Among open models it sets new state-of-the-art results, comparable to leading commercial alternatives.

Repository Info

Updated on January 3, 2026

Introduction

Qwen3-Coder is the latest agentic code model from the Qwen team at Alibaba Cloud. The series, including the flagship Qwen3-Coder-480B-A35B-Instruct, is engineered to excel at agentic coding, browser-use, and tool-use tasks, matching or exceeding other open models and competing with commercial offerings such as Claude Sonnet. With native support for 256K tokens, extendable up to 1M tokens, Qwen3-Coder is optimized for repository-scale understanding and supports 358 coding languages.
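Context extension beyond the native window is typically done with RoPE scaling (e.g. YaRN). As an illustrative sketch only — the exact scaling factor, field names, and supported values should be checked against the model's own documentation before use — a `config.json` override might look like:

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```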

Installation

Getting started with Qwen3-Coder is straightforward using the transformers library. Below are examples for chatting with the model and performing fill-in-the-middle code completion.

Chat with Qwen3-Coder

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-Coder-480B-A35B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
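Under the hood, `apply_chat_template` serializes the message list into the model's chat format. For Qwen chat models this is a ChatML-style layout; the standalone sketch below (no model download required) approximates what the prompt string looks like. The authoritative template is defined by the tokenizer config, so treat this as illustrative only:

```python
# Illustrative only: approximates the ChatML-style prompt layout used by
# Qwen chat models. The real template comes from the tokenizer config.
def build_chatml_prompt(messages, add_generation_prompt=True):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Cue the model to begin its reply as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [{"role": "user", "content": "write a quick sort algorithm."}]
print(build_chatml_prompt(messages))
```

Passing `add_generation_prompt=True` is what appends the opening assistant turn, so the model continues as the assistant rather than predicting another user message.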

Fill in the Middle with Qwen3-Coder

from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the tokenizer and model; device_map="auto" places the weights automatically.

TOKENIZER = AutoTokenizer.from_pretrained("Qwen/Qwen3-Coder-480B-A35B-Instruct")
MODEL = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Coder-480B-A35B-Instruct", device_map="auto").eval()


input_text = """<|fim_prefix|>def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    <|fim_suffix|>
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)<|fim_middle|>"""
            
messages = [
    {"role": "system", "content": "You are a code completion assistant."},
    {"role": "user", "content": input_text}
]


text = TOKENIZER.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = TOKENIZER([text], return_tensors="pt").to(MODEL.device)

# Use `max_new_tokens` to control the maximum output length.
# Stop generation at the tokenizer's FIM and end-of-text special tokens.
eos_token_ids = [151659, 151661, 151662, 151663, 151664, 151643, 151645]
generated_ids = MODEL.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=False, eos_token_id=eos_token_ids)[0]
# The generated_ids include prompt_ids, we only need to decode the tokens after prompt_ids.
output_text = TOKENIZER.decode(generated_ids[len(model_inputs.input_ids[0]):], skip_special_tokens=True)

print(f"Prompt: {input_text}\n\nGenerated text: {output_text}")
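For reference, here is what the completed function looks like, assuming the model fills the gap with the expected `left` partition line:

```python
def quicksort(arr):
    # Recursive three-way partition around a middle pivot.
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]   # the line the model is asked to fill in
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

print(quicksort([3, 6, 8, 10, 1, 2, 1]))  # → [1, 1, 2, 3, 6, 8, 10]
```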

Examples

Qwen3-Coder demonstrates its versatility across a range of demo use cases, including:

  • Physics-Based Chimney Demolition Simulation: Generating complex 3D simulations with realistic physics using three.js and cannon-es.js.
  • Multicolor and Interactive Animation: Creating dynamic and interactive animations with p5.js.
  • 3D Google Earth: Developing web pages that simulate 3D terrain maps.
  • Typing Game: Designing interactive typing games with modern UI and encouraging feedback.
  • Bouncing Ball in Rotation Hypercube: Animating a ball bouncing within a rotating hypercube.
  • Solar System Simulation: Building web pages to visualize the solar system.
  • DUET Game: Creating a complete, single-file HTML game inspired by "Duet" with smooth animations and neon effects.

Why Use Qwen3-Coder?

  • State-of-the-Art Agentic Coding: Achieves results comparable to top commercial models like Claude Sonnet in agentic coding, browser-use, and tool-use.
  • Exceptional Long-Context Capabilities: Supports a native context length of 256K tokens, extendable up to 1M tokens, ideal for understanding large codebases.
  • Broad Language Support: Capable of handling 358 different coding languages, making it highly versatile for diverse development environments.
  • Robust General Capabilities: Retains strong performance in mathematical reasoning and general AI tasks from its base model.
  • Multiple Model Sizes: Available in various sizes, allowing developers to choose the best fit for their computational resources and specific needs.

Links