Qwen3: Alibaba Cloud's Advanced Large Language Model Series

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

Qwen3 is a series of large language models developed by the Qwen team at Alibaba Cloud. The models target reasoning, multilingual use, and long-context understanding, and they come in several sizes and in both thinking and non-thinking modes. This repository provides resources for running, deploying, and building with Qwen3 models.

Repository Information

Analyzed by OSRepos on May 10, 2026

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

Qwen3 is the latest generation of large language models from the Qwen team at Alibaba Cloud. Building on previous iterations, it improves general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. The series includes both dense and Mixture-of-Experts (MoE) models in various sizes, and it offers a dedicated "thinking mode" for complex tasks alongside a "non-thinking" (instruct) mode for efficient, general-purpose chat. The Qwen3-2507 models support a 256K-token context, extendable up to 1 million tokens.

Installation

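To get started with Qwen3, the recommended approach is to use the Hugging Face Transformers library, version 4.51.0 or later (transformers>=4.51.0). The examples below also pass device_map="auto", which relies on the accelerate package, so install it alongside.

pip install "transformers>=4.51.0" torch accelerate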
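
To confirm that an environment meets the transformers>=4.51.0 requirement before loading a model, a small check along these lines may help. This is a minimal sketch using only the standard library; the helper names (version_tuple, meets_minimum, check_transformers) are hypothetical, and the parsing is deliberately simple (it ignores pre-release suffixes such as "rc1").

```python
from importlib import metadata

MIN_TRANSFORMERS = "4.51.0"  # minimum version stated in this section


def version_tuple(version: str) -> tuple:
    """Turn '4.51.0' into (4, 51, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad short versions such as '4.51' so comparisons stay consistent.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Return True when `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_transformers() -> str:
    """Describe whether the installed transformers package is new enough."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    status = "OK" if meets_minimum(installed) else "too old"
    return f"transformers {installed}: {status}"
```

Calling print(check_transformers()) reports the installed version and whether it satisfies the minimum.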

Alternatively, Qwen3 models are well-supported by various local inference frameworks:

  • llama.cpp: Requires llama.cpp>=b5401. Follow the instructions in the official documentation for compilation and usage.
  • Ollama: Install Ollama (v0.9.0 or higher recommended) and run ollama serve, then ollama run qwen3:8b (or other sizes).
  • LM Studio: Directly use Qwen3 GGUF files within LM Studio.
  • MLX LM: For Apple Silicon users, mlx-lm>=0.24.0 supports Qwen3 models.
  • OpenVINO: For Intel CPU/GPU, use the OpenVINO toolkit.
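
Once ollama serve is running and the qwen3:8b model has been pulled, the model can also be queried over Ollama's local HTTP API rather than the interactive prompt. The following is a minimal sketch, assuming Ollama listens on its default address (localhost:11434) and that the /api/chat endpoint is used without streaming; the function names build_chat_request and chat are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(prompt: str, model: str = "qwen3:8b") -> urllib.request.Request:
    """Build a non-streaming chat request for a single user message."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, model: str = "qwen3:8b") -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, model)) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["message"]["content"]
```

With the server running, chat("Give me a short introduction to large language models.") returns the reply as a string. Other model sizes can be selected through the model argument.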

Examples

Here are basic examples demonstrating how to use Qwen3 models with Hugging Face Transformers.

Qwen3-Instruct-2507 (Non-Thinking Mode)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)
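
The example above handles a single turn. For a follow-up question, the usual pattern is to append the model's reply and the next user message to the same messages list, then run apply_chat_template and generate again. A minimal sketch of that bookkeeping follows; add_turn is a hypothetical helper, and the reply text here is a placeholder rather than real model output.

```python
def add_turn(messages, reply, next_prompt):
    """Return a new history with the assistant reply and next user prompt appended."""
    return messages + [
        {"role": "assistant", "content": reply},
        {"role": "user", "content": next_prompt},
    ]


# First turn, in the same format as the example above.
first_turn = [{"role": "user", "content": "Give me a short introduction to large language models."}]

# Placeholder reply; in practice this is the decoded `content` from the first generation.
history = add_turn(first_turn, "(model reply goes here)", "Summarize that in one sentence.")

for message in history:
    print(message["role"], ":", message["content"])
```

Passing history to tokenizer.apply_chat_template in place of messages produces the prompt for the second turn.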

Qwen3-Thinking-2507 (Thinking Mode)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B-Thinking-2507"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

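# split the output at the last </think> token (id 151668)
try:
    # reverse search: index just past the last occurrence of 151668
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # no </think> token found: treat the whole output as the final answer
    index = 0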

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)  # no opening <think> tag
print("content:", content)
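
The split above works on token ids. When the output is only available as text (for example, from a local inference server), the same idea can be applied to the string. This is a minimal sketch, assuming the closing </think> tag appears literally in the text; split_thinking is a hypothetical helper, not part of any Qwen tooling.

```python
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_thinking(text: str) -> tuple:
    """Split decoded output into (thinking, answer) at the last </think> tag."""
    head, sep, tail = text.rpartition(THINK_CLOSE)
    if not sep:
        # No closing tag: mirror the token-based fallback (index = 0).
        return "", text.strip()
    thinking = head.strip()
    # The opening tag may or may not be present in the output.
    if thinking.startswith(THINK_OPEN):
        thinking = thinking[len(THINK_OPEN):].strip()
    return thinking, tail.strip()


thinking, answer = split_thinking("Let me think.\n</think>\n\nThe answer is 4.")
print("thinking content:", thinking)
print("content:", answer)
```

As in the token-based version, output without a closing tag is treated entirely as the final answer.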

Why Use It

Qwen3 offers a compelling solution for various AI applications due to its advanced features:

  • State-of-the-Art Performance: Improves on previous Qwen generations across general capabilities, including instruction following, logical reasoning, mathematics, science, coding, and tool usage.
  • Flexible Architectures: Available in both dense and Mixture-of-Experts (MoE) models, providing options for different performance and efficiency needs.
  • Dual Operating Modes: Offers a "thinking mode" for complex problem-solving and an efficient non-thinking (instruct) mode for general conversations. In the 2507 releases used in the examples above, each mode ships as a separate checkpoint (Instruct-2507 and Thinking-2507).
  • Extended Context Window: Qwen3-2507 models support a 256K-token context, extendable up to 1 million tokens, for ultra-long inputs.
  • Multilingual Expertise: Strong capabilities in over 100 languages and dialects, making it suitable for global applications.
  • Robust Deployment Options: Supported by popular inference frameworks like SGLang, vLLM, and TensorRT-LLM, facilitating large-scale deployment.
  • Open-Source and Community-Driven: Licensed under Apache 2.0, fostering an open environment for development and research.
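
Because the prompt and the generated tokens share the same context window, long inputs need a quick budget check before generation. The sketch below assumes that "256K" means 256 × 1024 = 262,144 tokens and uses hypothetical helper names; the actual limit for a given checkpoint should be taken from its model card or configuration.

```python
# Assumption: "256K" is read as 256 * 1024 tokens.
NATIVE_CONTEXT = 256 * 1024


def fits_context(prompt_tokens: int, max_new_tokens: int,
                 context_limit: int = NATIVE_CONTEXT) -> bool:
    """True when the prompt plus the generation budget fits in the window."""
    return prompt_tokens + max_new_tokens <= context_limit


def max_prompt_tokens(max_new_tokens: int,
                      context_limit: int = NATIVE_CONTEXT) -> int:
    """Largest prompt that still leaves room for `max_new_tokens`."""
    return max(context_limit - max_new_tokens, 0)


# With the thinking example's max_new_tokens=32768:
print(max_prompt_tokens(32768))     # 229376
print(fits_context(200_000, 32768))  # True
print(fits_context(250_000, 32768))  # False
```

In practice, prompt_tokens can be taken from len(model_inputs.input_ids[0]) as computed in the examples above.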
