# Qwen3: Alibaba Cloud's Advanced Large Language Model Series

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Repository profile: https://osrepos.com/repo/qwenlm-qwen3

Qwen3 is a powerful series of large language models developed by the Qwen team at Alibaba Cloud. It offers advanced capabilities in reasoning, multilingual support, and long-context understanding, available in various sizes and modes for diverse applications. This repository provides comprehensive resources for running, deploying, and building with Qwen3 models.

GitHub: https://github.com/QwenLM/Qwen3

## Topics

- Large Language Model
- AI
- Machine Learning
- NLP
- Python
- Generative AI
- Alibaba Cloud
- Deep Learning

## Repository Information

Last analyzed by OSRepos: Sun May 10 2026 00:13:14 GMT+0100 (Western European Summer Time)

## Safety Notice

OSRepos shares public repositories for knowledge and discovery only. Review source code, dependencies, licenses, and security implications before running or installing anything.

## Content

## Introduction

Qwen3 is the latest generation of large language models from the Qwen team at Alibaba Cloud. Building on previous iterations, it improves general capabilities including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. The series includes both dense and Mixture-of-Experts (MoE) models in various sizes, and supports switching between a dedicated "thinking mode" for complex tasks and a "non-thinking" (instruct) mode for efficient, general-purpose chat. The Qwen3-2507 models also offer 256K-token long-context understanding, extendable up to 1 million tokens.

## Installation

To get started with Qwen3, the recommended approach is to use the Hugging Face Transformers library. Ensure you have `transformers>=4.51.0` installed.

```bash
# accelerate is required by device_map="auto" in the examples below
pip install "transformers>=4.51.0" torch accelerate
```


Alternatively, Qwen3 models are well-supported by various local inference frameworks:

*   **llama.cpp**: Requires `llama.cpp>=b5401`. Follow the instructions in the official [documentation](https://qwen.readthedocs.io/en/latest/run_locally/llama.cpp.html) for compilation and usage.
*   **Ollama**: Install Ollama (v0.9.0 or higher recommended) and run `ollama serve`, then `ollama run qwen3:8b` (or other sizes).
*   **LM Studio**: Directly use Qwen3 GGUF files within LM Studio.
*   **MLX LM**: For Apple Silicon users, `mlx-lm>=0.24.0` supports Qwen3 models.
*   **OpenVINO**: For Intel CPU/GPU, use the OpenVINO toolkit.
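
The minimum versions mentioned in this section (`transformers>=4.51.0`, `mlx-lm>=0.24.0`) can be checked before loading a model. The following is a minimal sketch using only the standard library; the helper names `parse_version`, `meets_minimum`, and `check_package` are hypothetical, and the comparison only looks at the leading numeric parts of a version string, so it is not a full PEP 440 implementation.

```python
from importlib import metadata


def parse_version(text):
    """Turn '4.51.0' into (4, 51, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so that '4.51' compares equal to '4.51.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(name, minimum):
    """Return 'ok', 'too old', or 'missing' for an installed distribution."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return "missing"
    return "ok" if meets_minimum(installed, minimum) else "too old"


if __name__ == "__main__":
    for name, minimum in [("transformers", "4.51.0"), ("mlx-lm", "0.24.0")]:
        print(f"{name}>={minimum}: {check_package(name, minimum)}")
```

A result of `missing` for `mlx-lm` is expected on machines that are not using the MLX route.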

## Examples

Here are basic examples demonstrating how to use Qwen3 models with Hugging Face Transformers.

### Qwen3-Instruct-2507 (Non-Thinking Mode)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
# keep only the newly generated tokens (drop the prompt tokens)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)
```

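The example above covers a single user turn. One way to continue the conversation, sketched here under the assumption that the chat template accepts the usual `assistant` role, is to append the decoded reply and the next user prompt to `messages` and then repeat the `apply_chat_template` and `generate` steps. The helper `extend_conversation` and the placeholder reply text are hypothetical.

```python
def extend_conversation(messages, assistant_reply, next_user_prompt):
    """Return a new message list with the reply and the next user turn appended."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_prompt},
    ]


messages = [
    {"role": "user", "content": "Give me a short introduction to large language model."}
]
messages = extend_conversation(
    messages,
    assistant_reply="(decoded `content` from the first generate call)",
    next_user_prompt="Now summarize that in one sentence.",
)

# `messages` can now be passed to tokenizer.apply_chat_template(...) as before.
print([m["role"] for m in messages])  # ['user', 'assistant', 'user']
```

Returning a new list rather than mutating the input keeps earlier conversation states available, which can help when retrying a turn.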

### Qwen3-Thinking-2507 (Thinking Mode)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B-Thinking-2507"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
# keep only the newly generated tokens (drop the prompt tokens)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# split the output into thinking content and final answer
try:
    # find the position just after the last occurrence of token 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # no </think> token found: treat the whole output as the final answer
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)  # no opening <think> tag
print("content:", content)
```

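The split at the last `</think>` token in the example above can be tried in isolation, without loading a model. The sketch below reuses the same reversed-list search on toy token IDs; the function name `split_at_last_think_end` and the toy IDs are illustrative only, while 151668 is the ID used in the example.

```python
THINK_END_ID = 151668  # token ID for </think>, as used in the example above


def split_at_last_think_end(output_ids, think_end_id=THINK_END_ID):
    """Split token IDs into (thinking_ids, answer_ids) at the last </think>."""
    try:
        # Searching the reversed list finds the LAST occurrence of the marker.
        index = len(output_ids) - output_ids[::-1].index(think_end_id)
    except ValueError:
        # Marker absent: treat the whole output as the answer.
        index = 0
    return output_ids[:index], output_ids[index:]


# Toy IDs standing in for real tokens (10, 11 = reasoning; 20, 21 = answer).
thinking_ids, answer_ids = split_at_last_think_end([10, 11, 151668, 20, 21])
print(thinking_ids)  # [10, 11, 151668]
print(answer_ids)    # [20, 21]
```

Note that the marker itself ends up at the end of the thinking slice, and that an output with no marker yields an empty thinking slice.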

## Why Use It

Qwen3 offers a compelling solution for various AI applications due to its advanced features:

*   **State-of-the-Art Performance**: Achieves significant improvements across general capabilities, including logical reasoning, mathematics, science, coding, and tool usage.
*   **Flexible Architectures**: Available in both dense and Mixture-of-Experts (MoE) models, providing options for different performance and efficiency needs.
*   **Dual Operating Modes**: Seamlessly switch between a highly capable "thinking mode" for complex problem-solving and an efficient "instruct mode" for general conversations.
*   **Extended Context Window**: Qwen3-2507 models handle 256K-token contexts, extendable up to 1 million tokens, enabling understanding and generation over ultra-long inputs.
*   **Multilingual Expertise**: Strong capabilities in over 100 languages and dialects, making it suitable for global applications.
*   **Robust Deployment Options**: Supported by popular inference frameworks like SGLang, vLLM, and TensorRT-LLM, facilitating large-scale deployment.
*   **Open-Source and Community-Driven**: Licensed under Apache 2.0, fostering an open environment for development and research.
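
The context window is shared between the prompt and the generated tokens, so the `max_new_tokens` value passed to `generate` limits how long a prompt can be. The following is a rough budgeting sketch; it assumes that "256K" means 256 × 1024 = 262,144 tokens, which this page does not state explicitly, and the helper `max_prompt_tokens` is hypothetical.

```python
NATIVE_CONTEXT = 256 * 1024  # assumed meaning of "256K": 262,144 tokens


def max_prompt_tokens(max_new_tokens, context_limit=NATIVE_CONTEXT):
    """Largest prompt that still leaves room for max_new_tokens of output."""
    if max_new_tokens > context_limit:
        raise ValueError("max_new_tokens exceeds the context limit")
    return context_limit - max_new_tokens


# Budgets for the max_new_tokens values used in the Examples section.
print(max_prompt_tokens(16384))  # 245760
print(max_prompt_tokens(32768))  # 229376
```

Comparing the tokenized prompt length against this budget before calling `generate` gives an early warning that a long input may need truncation or an extended-context configuration.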

## Links

*   **GitHub Repository**: [https://github.com/QwenLM/Qwen3](https://github.com/QwenLM/Qwen3)
*   **Qwen Chat**: [https://chat.qwen.ai/](https://chat.qwen.ai/)
*   **Hugging Face**: [https://huggingface.co/Qwen](https://huggingface.co/Qwen)
*   **ModelScope**: [https://modelscope.cn/organization/qwen](https://modelscope.cn/organization/qwen)
*   **Paper**: [https://arxiv.org/abs/2505.09388](https://arxiv.org/abs/2505.09388)
*   **Documentation**: [https://qwen.readthedocs.io/](https://qwen.readthedocs.io/)
*   **Demo**: [https://huggingface.co/spaces/Qwen/Qwen3-Demo](https://huggingface.co/spaces/Qwen/Qwen3-Demo)
*   **Discord**: [https://discord.gg/CV4E9rpNSD](https://discord.gg/CV4E9rpNSD)