{"name":"Qwen3: Alibaba Cloud's Advanced Large Language Model Series","description":"Qwen3 is a powerful series of large language models developed by the Qwen team at Alibaba Cloud. It offers advanced capabilities in reasoning, multilingual support, and long-context understanding, available in various sizes and modes for diverse applications. This repository provides comprehensive resources for running, deploying, and building with Qwen3 models.","github":"https://github.com/QwenLM/Qwen3","url":"https://osrepos.com/repo/qwenlm-qwen3","source":"osrepos.com","sourceDescription":"This repository profile is provided by osrepos.com, an open source repository discovery platform.","repositoryProfile":"https://osrepos.com/repo/qwenlm-qwen3","generatedFor":"open source discovery and AI-assisted research","markdown":"https://osrepos.com/repo/qwenlm-qwen3.md","json":"https://osrepos.com/repo/qwenlm-qwen3.json","topics":["Large Language Model","AI","Machine Learning","NLP","Python","Generative AI","Alibaba Cloud","Deep Learning"],"keywords":["Large Language Model","AI","Machine Learning","NLP","Python","Generative AI","Alibaba Cloud","Deep Learning"],"stars":null,"summary":"Qwen3 is a powerful series of large language models developed by the Qwen team at Alibaba Cloud. It offers advanced capabilities in reasoning, multilingual support, and long-context understanding, available in various sizes and modes for diverse applications. This repository provides comprehensive resources for running, deploying, and building with Qwen3 models.","content":"## Introduction\n\nQwen3 represents the latest generation of large language models from the Qwen team at Alibaba Cloud. Building on the success of previous iterations, Qwen3 introduces significant enhancements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. 
The series features both dense and Mixture-of-Experts (MoE) models, available in various sizes, and supports seamless switching between a dedicated \"thinking mode\" for complex tasks and a \"non-thinking\" (instruct) mode for efficient, general-purpose chat. Notably, Qwen3-2507 models support a 256K-token context window, extendable up to 1 million tokens.\n\n## Installation\n\nTo get started with Qwen3, the recommended approach is to use the Hugging Face Transformers library. Ensure you have `transformers>=4.51.0` installed.\n\nbash\npip install \"transformers>=4.51.0\" torch\n\n\nAlternatively, Qwen3 models are well-supported by various local inference frameworks:\n\n*   **llama.cpp**: Requires `llama.cpp>=b5401`. Follow the instructions in the official [documentation](https://qwen.readthedocs.io/en/latest/run_locally/llama.cpp.html) for compilation and usage.\n*   **Ollama**: Install Ollama (v0.9.0 or higher recommended) and run `ollama serve`, then `ollama run qwen3:8b` (or other sizes).\n*   **LM Studio**: Directly use Qwen3 GGUF files within LM Studio.\n*   **MLX LM**: For Apple Silicon users, `mlx-lm>=0.24.0` supports Qwen3 models.\n*   **OpenVINO**: For Intel CPU/GPU, use the OpenVINO toolkit.\n\n## Examples\n\nHere are basic examples demonstrating how to use Qwen3 models with Hugging Face Transformers.\n\n### Qwen3-Instruct-2507 (Non-Thinking Mode)\n\npython\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel_name = \"Qwen/Qwen3-30B-A3B-Instruct-2507\"\n\n# load the tokenizer and the model\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_name,\n    torch_dtype=\"auto\",\n    device_map=\"auto\"\n)\n\n# prepare the model input\nprompt = \"Give me a short introduction to large language model.\"\nmessages = [\n    {\"role\": \"user\", \"content\": prompt}\n]\ntext = tokenizer.apply_chat_template(\n    messages,\n    tokenize=False,\n    
add_generation_prompt=True,\n)\nmodel_inputs = tokenizer([text], return_tensors=\"pt\").to(model.device)\n\n# generate the completion\ngenerated_ids = model.generate(\n    **model_inputs,\n    max_new_tokens=16384\n)\n# keep only the newly generated tokens (drop the prompt tokens)\noutput_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()\n\ncontent = tokenizer.decode(output_ids, skip_special_tokens=True)\n\nprint(\"content:\", content)\n\n\n### Qwen3-Thinking-2507 (Thinking Mode)\n\npython\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel_name = \"Qwen/Qwen3-30B-A3B-Thinking-2507\"\n\n# load the tokenizer and the model\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_name,\n    torch_dtype=\"auto\",\n    device_map=\"auto\"\n)\n\n# prepare the model input\nprompt = \"Give me a short introduction to large language model.\"\nmessages = [\n    {\"role\": \"user\", \"content\": prompt}\n]\ntext = tokenizer.apply_chat_template(\n    messages,\n    tokenize=False,\n    add_generation_prompt=True,\n)\nmodel_inputs = tokenizer([text], return_tensors=\"pt\").to(model.device)\n\n# generate the completion\ngenerated_ids = model.generate(\n    **model_inputs,\n    max_new_tokens=32768\n)\n# keep only the newly generated tokens (drop the prompt tokens)\noutput_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()\n\n# split the output into thinking content and the final answer\ntry:\n    # position just after the last </think> token (id 151668)\n    index = len(output_ids) - output_ids[::-1].index(151668)\nexcept ValueError:\n    # no </think> token found: treat the whole output as the final answer\n    index = 0\n\nthinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip(\"\\n\")\ncontent = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip(\"\\n\")\n\nprint(\"thinking content:\", thinking_content)  # no opening <think> tag\nprint(\"content:\", content)\n\n\n## Why Use It\n\nQwen3 offers a compelling solution for various AI applications due to its advanced features:\n\n*   **State-of-the-Art Performance**: Achieves significant improvements across general capabilities, 
including logical reasoning, mathematics, science, coding, and tool usage.\n*   **Flexible Architectures**: Available in both dense and Mixture-of-Experts (MoE) models, providing options for different performance and efficiency needs.\n*   **Dual Operating Modes**: Seamlessly switch between a highly capable \"thinking mode\" for complex problem-solving and an efficient \"instruct mode\" for general conversations.\n*   **Extended Context Window**: Qwen3-2507 models handle 256K tokens of context, extendable up to 1 million tokens, enabling understanding and generation over ultra-long inputs.\n*   **Multilingual Expertise**: Strong capabilities in over 100 languages and dialects, making it suitable for global applications.\n*   **Robust Deployment Options**: Supported by popular inference frameworks like SGLang, vLLM, and TensorRT-LLM, facilitating large-scale deployment.\n*   **Open-Source and Community-Driven**: Licensed under Apache 2.0, fostering an open environment for development and research.\n\n## Links\n\n*   **GitHub Repository**: [https://github.com/QwenLM/Qwen3](https://github.com/QwenLM/Qwen3){:target=\"_blank\"}\n*   **Qwen Chat**: [https://chat.qwen.ai/](https://chat.qwen.ai/){:target=\"_blank\"}\n*   **Hugging Face**: [https://huggingface.co/Qwen](https://huggingface.co/Qwen){:target=\"_blank\"}\n*   **ModelScope**: [https://modelscope.cn/organization/qwen](https://modelscope.cn/organization/qwen){:target=\"_blank\"}\n*   **Paper**: [https://arxiv.org/abs/2505.09388](https://arxiv.org/abs/2505.09388){:target=\"_blank\"}\n*   **Documentation**: [https://qwen.readthedocs.io/](https://qwen.readthedocs.io/){:target=\"_blank\"}\n*   **Demo**: [https://huggingface.co/spaces/Qwen/Qwen3-Demo](https://huggingface.co/spaces/Qwen/Qwen3-Demo){:target=\"_blank\"}\n*   **Discord**: [https://discord.gg/CV4E9rpNSD](https://discord.gg/CV4E9rpNSD){:target=\"_blank\"}","metrics":{"detailViews":2,"githubClicks":2},"dates":{"published":null,"modified":"2026-05-09T23:13:14.000Z"}}