{"name":"torchchat: Run PyTorch LLMs Locally on Servers, Desktop, and Mobile","description":"torchchat is a PyTorch-native codebase designed to showcase the ability to run large language models (LLMs) seamlessly across various platforms. It enables local execution of LLMs using Python, within C/C++ applications on desktop or servers, and directly on iOS and Android devices. Although no longer under active development, it remains a valuable resource for understanding and implementing local LLM deployment strategies.","github":"https://github.com/pytorch/torchchat","url":"https://osrepos.com/repo/pytorch-torchchat","source":"osrepos.com","sourceDescription":"This repository profile is provided by osrepos.com, an open source repository discovery platform.","repositoryProfile":"https://osrepos.com/repo/pytorch-torchchat","generatedFor":"open source discovery and AI-assisted research","markdown":"https://osrepos.com/repo/pytorch-torchchat.md","json":"https://osrepos.com/repo/pytorch-torchchat.json","topics":["llm","local","pytorch","python","machine-learning","ai","mobile-ai","deep-learning"],"keywords":["llm","local","pytorch","python","machine-learning","ai","mobile-ai","deep-learning"],"stars":null,"summary":"torchchat is a PyTorch-native codebase designed to showcase the ability to run large language models (LLMs) seamlessly across various platforms. It enables local execution of LLMs using Python, within C/C++ applications on desktop or servers, and directly on iOS and Android devices. Although no longer under active development, it remains a valuable resource for understanding and implementing local LLM deployment strategies.","content":"## Introduction\n\ntorchchat is a powerful, PyTorch-native codebase that demonstrates how to run large language models (LLMs) efficiently and locally. 
It supports a range of deployment targets: Python on servers and desktops, C/C++ applications, and mobile platforms such as iOS and Android. The project emphasizes ease of use and inference performance, making it a useful reference for developers deploying LLMs in diverse settings.\n\nWhile torchchat is no longer under active development, it remains a comprehensive showcase for running LLMs everywhere. Its most recent updates added support for DeepSeek R1 Distill (8B) and multimodal support for Llama 3.2 11B.\n\n## Installation\n\nThe steps below require Python 3.10. Using a virtual environment to isolate dependencies is strongly recommended.\n\n1.  **Clone the repository and set up a virtual environment:**\n\n    ```bash\n    git clone https://github.com/pytorch/torchchat.git\n    cd torchchat\n    python3 -m venv .venv\n    source .venv/bin/activate\n    ./install/install_requirements.sh\n    mkdir exportedModels\n    ```\n\n2.  **Log into Hugging Face and download a model:**\n\n    Most models are distributed via Hugging Face. 
You'll need an account and a user access token with the `write` role.\n\n    ```bash\n    huggingface-cli login\n    ```\n\n    Then list the available models and download one, for example `llama3.1`:\n\n    ```bash\n    python3 torchchat.py list\n    python3 torchchat.py download llama3.1\n    ```\n\n    *Note: Some models are gated and require you to request access on their Hugging Face page before they can be downloaded.*\n\n## Examples\n\ntorchchat provides commands for interacting with LLMs, from interactive chat and one-shot text generation to serving models over a REST API.\n\n### Chat\n\nStart an interactive conversation with a downloaded LLM:\n\n```bash\npython3 torchchat.py chat llama3.1\n```\n\n### Generate\n\nGenerate text from a specific prompt:\n\n```bash\npython3 torchchat.py generate llama3.1 --prompt \"write me a story about a boy and his bear\"\n```\n\n### Server\n\nHost a local REST API server that follows the OpenAI API specification for chat completions. You'll need two terminals: one to run the server and another to query it.\n\n**Terminal 1 (Start Server):**\n\n```bash\npython3 torchchat.py server llama3.1\n```\n\n**Terminal 2 (Query Server):**\n\n```bash\ncurl http://127.0.0.1:5000/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"llama3.1\",\n    \"stream\": true,\n    \"max_tokens\": 200,\n    \"messages\": [\n      {\n        \"role\": \"system\",\n        \"content\": \"You are a helpful assistant.\"\n      },\n      {\n        \"role\": \"user\",\n        \"content\": \"Hello!\"\n      }\n    ]\n  }'\n```\n\n### Browser\n\nLaunch a basic browser interface for local chat that sends its requests to the local server. First start the server as shown above, then run the following in another terminal:\n\n```bash\nstreamlit run torchchat/usages/browser.py\n```\n\n### Desktop/Server Execution with AOT Inductor\n\nFor faster inference, you can compile a model ahead of time with AOT Inductor (AOTI), which produces a zipped PT2 package. 
The resulting package can be run from both Python and C++.\n\n**Export the model:**\n\n```bash\npython3 torchchat.py export llama3.1 --output-aoti-package-path exportedModels/llama3_1_artifacts.pt2\n```\n\n**Run in Python:**\n\n```bash\npython3 torchchat.py generate llama3.1 --aoti-package-path exportedModels/llama3_1_artifacts.pt2 --prompt \"Hello my name is\"\n```\n\n### Mobile Execution with ExecuTorch\n\nExecuTorch optimizes models for execution on mobile and embedded devices. After setting up ExecuTorch (see the official repository for detailed steps), you can export a model and run it.\n\n**Export for mobile:**\n\n```bash\npython3 torchchat.py export llama3.1 --quantize torchchat/quant_config/mobile.json --output-pte-path llama3.1.pte\n```\n\nThis creates a `.pte` artifact that can be deployed on iOS or Android devices.\n\n## Why Use It\n\ntorchchat follows PyTorch's design philosophy, prioritizing usability and native integration. It offers:\n\n*   **Local LLM Execution**: Run language models directly on your own hardware, keeping data on-device and avoiding network round-trips.\n*   **Cross-Platform Compatibility**: Deploy models on Linux, macOS (Apple Silicon: M1/M2/M3), Android, and iOS.\n*   **PyTorch-Native Performance**: Supports several PyTorch execution modes: eager mode, AOT Inductor compilation for desktops and servers, and ExecuTorch for mobile and embedded targets.\n*   **Flexibility**: Supports multiple data types (float32, float16, bfloat16) and several quantization schemes to trade off speed against model size.\n*   **Simplicity and Extensibility**: Built from modular building blocks that favor composition and clarity, so it is easy to understand, use, and extend for custom applications.\n*   **Rich Model Support**: Works with popular LLMs such as Llama 3, Llama 2, Mistral, and CodeLlama, including multimodal variants.\n\n## Links\n\n*   **GitHub Repository**: 
[https://github.com/pytorch/torchchat](https://github.com/pytorch/torchchat){:target=\"_blank\"}\n*   **Hugging Face Token Documentation**: [https://huggingface.co/docs/hub/en/security-tokens](https://huggingface.co/docs/hub/en/security-tokens){:target=\"_blank\"}\n*   **PyTorch AOT Inductor Blog**: [https://pytorch.org/blog/pytorch2-2/](https://pytorch.org/blog/pytorch2-2/){:target=\"_blank\"}\n*   **ExecuTorch GitHub**: [https://github.com/pytorch/executorch](https://github.com/pytorch/executorch){:target=\"_blank\"}\n*   **torchchat Discord**: [https://discord.gg/hm2Keduk3v](https://discord.gg/hm2Keduk3v){:target=\"_blank\"}","metrics":{"detailViews":1,"githubClicks":1},"dates":{"published":null,"modified":"2026-07-03T20:27:21.000Z"}}