# torchchat: Run PyTorch LLMs Locally on Servers, Desktop, and Mobile

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Source: osrepos.com
Repository profile: https://osrepos.com/repo/pytorch-torchchat
Generated for open source discovery and AI-assisted research.

torchchat is a PyTorch-native codebase designed to showcase the ability to run large language models (LLMs) seamlessly across various platforms. It enables local execution of LLMs using Python, within C/C++ applications on desktop or servers, and directly on iOS and Android devices. Although no longer under active development, it remains a valuable resource for understanding and implementing local LLM deployment strategies.

GitHub: https://github.com/pytorch/torchchat

## Topics

- llm
- local
- pytorch
- python
- machine-learning
- ai
- mobile-ai
- deep-learning

## Repository Information

Last analyzed by OSRepos: Fri Jul 03 2026 21:27:21 GMT+0100 (Western European Summer Time)

## Safety Notice

OSRepos shares public repositories for knowledge and discovery only. Review source code, dependencies, licenses, and security implications before running or installing anything.

## Content

## Introduction

torchchat is a PyTorch-native codebase that demonstrates how to run large language models (LLMs) locally. It covers three deployment paths: Python on servers and desktops, C/C++ applications, and on-device execution on iOS and Android.

torchchat is no longer under active development, but it remains a working showcase of these deployment paths. Its most recent updates added support for the DeepSeek R1 Distill 8B model and multimodal support for Llama 3.2 11B.

## Installation

To get started with torchchat, you'll need Python 3.10 installed. It's highly recommended to use a virtual environment to manage dependencies.

1.  **Clone the repository and set up a virtual environment:**

    ```bash
    git clone https://github.com/pytorch/torchchat.git
    cd torchchat
    python3 -m venv .venv
    source .venv/bin/activate
    ./install/install_requirements.sh
    mkdir exportedModels
    ```

2.  **Log into Hugging Face and download a model:**

    Most models are distributed via Hugging Face. You'll need an account and a user access token with the `write` role.

    ```bash
    huggingface-cli login
    ```

    Then, list available models and download one, for example, `llama3.1`:

    ```bash
    python3 torchchat.py list
    python3 torchchat.py download llama3.1
    ```

    *Note: Some models may require requesting access via Hugging Face before downloading.* 
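Before running the install steps, it can help to confirm that the active interpreter and environment match what the steps expect. The following is a minimal sketch, not part of torchchat: the helper names `python_matches` and `in_virtualenv` are hypothetical, and reading "Python 3.10" as "exactly the 3.10.x series" is an assumption.

```python
import sys

# Assumption: "Python 3.10" is read as exactly the 3.10.x series.
REQUIRED = (3, 10)


def python_matches(version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


def in_virtualenv(prefix, base_prefix):
    """A venv is active when sys.prefix differs from sys.base_prefix."""
    return prefix != base_prefix


if __name__ == "__main__":
    ok_version = python_matches(sys.version_info)
    ok_venv = in_virtualenv(sys.prefix, sys.base_prefix)
    found = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {found}: {'ok' if ok_version else 'expected 3.10'}")
    print(f"virtual environment: {'active' if ok_venv else 'not active'}")
```

Run it with the same `python3` you intend to use for `torchchat.py`, after `source .venv/bin/activate`.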

## Examples

torchchat provides various commands for interacting with LLMs, from interactive chat to generating text and serving models via a REST API.

### Chat

Engage in an interactive conversation with a downloaded LLM:

```bash
python3 torchchat.py chat llama3.1
```

### Generate

Generate text based on a specific prompt:

```bash
python3 torchchat.py generate llama3.1 --prompt "write me a story about a boy and his bear"
```

### Server

Host a local REST API server for model interaction, following the OpenAI API specification for chat completions. You'll need two terminals: one to start the server and another to query it.

**Terminal 1 (Start Server):**

```bash
python3 torchchat.py server llama3.1
```

**Terminal 2 (Query Server):**

```bash
curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "stream": "true",
    "max_tokens": 200,
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```
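The same request can be sent from Python using only the standard library. This is a sketch under stated assumptions rather than an official client: the URL, model alias, and request body mirror the curl example, while the function names (`build_payload`, `parse_stream_line`, `stream_chat`) are hypothetical. The parser assumes the server streams OpenAI-style server-sent events (`data:` lines carrying JSON chunks with a `choices[0].delta.content` field, ending with `data: [DONE]`); check the actual response if your output looks different.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust if your server differs.
URL = "http://127.0.0.1:5000/v1/chat/completions"


def build_payload(user_message, model="llama3.1", max_tokens=200):
    """Build the same JSON body the curl example sends."""
    return {
        "model": model,
        "stream": "true",  # kept as a string, exactly as in the curl example
        "max_tokens": max_tokens,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }


def parse_stream_line(line):
    """Return the text fragment carried by one streamed line, or None.

    Assumption: OpenAI-style chunks, i.e. `data: {json}` lines whose JSON
    holds choices[0].delta.content, followed by a final `data: [DONE]`.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    if not body or body == "[DONE]":
        return None
    chunk = json.loads(body)
    choices = chunk.get("choices") or [{}]
    delta = choices[0].get("delta") or {}
    return delta.get("content")


def stream_chat(user_message, url=URL):
    """Yield text fragments from a running server (start it first)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Passing `data` makes this a POST; the response is read line by line.
    with urllib.request.urlopen(request) as response:
        for raw in response:
            text = parse_stream_line(raw.decode("utf-8"))
            if text:
                yield text
```

With the server from Terminal 1 running, `for fragment in stream_chat("Hello!"): print(fragment, end="", flush=True)` prints the reply as it arrives. Keeping the payload builder and the line parser separate from the network call makes both easy to check without a server.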


### Browser

Launch a basic browser interface for local chat, which queries a local server. First, start the server as shown above, then in another terminal:

```bash
streamlit run torchchat/usages/browser.py
```

### Desktop/Server Execution with AOT Inductor

For faster inference, you can compile models using AOT Inductor (AOTI), which creates a zipped PT2 file. This can be run in both Python and C++ environments.

**Export the model:**

```bash
python3 torchchat.py export llama3.1 --output-aoti-package-path exportedModels/llama3_1_artifacts.pt2
```

**Run in Python:**

```bash
python3 torchchat.py generate llama3.1 --aoti-package-path exportedModels/llama3_1_artifacts.pt2 --prompt "Hello my name is"
```
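If you want to script the export-then-generate sequence, one option is a small wrapper that builds both commands and runs them in order. This is an illustrative sketch: the helper names are hypothetical, and the only CLI flags used are the ones shown in the two commands above. It assumes it is run from the repository root, where `torchchat.py` lives.

```python
import shlex
import subprocess


def export_command(model, package_path):
    """Argument list for the AOTI export step shown above."""
    return ["python3", "torchchat.py", "export", model,
            "--output-aoti-package-path", package_path]


def generate_command(model, package_path, prompt):
    """Argument list for generating from the exported package."""
    return ["python3", "torchchat.py", "generate", model,
            "--aoti-package-path", package_path, "--prompt", prompt]


def run_pipeline(model, package_path, prompt, runner=subprocess.run):
    """Export once, then generate from the compiled package."""
    steps = (
        export_command(model, package_path),
        generate_command(model, package_path, prompt),
    )
    for command in steps:
        print("$", shlex.join(command))  # echo the shell-equivalent line
        runner(command, check=True)      # check=True stops on a failed step
```

Passing each command as a list avoids shell quoting problems with prompts that contain spaces. The `runner` parameter exists only so the sequence can be dry-run, for example with `runner=lambda cmd, check: None`.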


### Mobile Execution with ExecuTorch

ExecuTorch optimizes models for execution on mobile or embedded devices. After setting up ExecuTorch (refer to the official repository for detailed steps), you can export and run models.

**Export for mobile:**

```bash
python3 torchchat.py export llama3.1 --quantize torchchat/quant_config/mobile.json --output-pte-path llama3.1.pte
```

This creates a `.pte` artifact that can be deployed on iOS or Android devices.

## Why Use It

torchchat stands out for its commitment to PyTorch's design philosophy, prioritizing usability and native integration. It offers:

*   **Local LLM Execution**: Run language models directly on your own hardware, so prompts and outputs stay on the machine and no network round trip is needed.
*   **Cross-Platform Compatibility**: Deploy models on Linux, macOS (M1/M2/M3), Android, and iOS, covering a broad spectrum of devices.
*   **PyTorch-Native Performance**: Leverages PyTorch's capabilities for efficient execution, including eager mode, AOT Inductor, and ExecuTorch for optimized inference.
*   **Flexibility**: Supports multiple data types (float32, float16, bfloat16) and various quantization schemes to balance performance and model size.
*   **Simplicity and Extensibility**: Designed with modular building blocks, favoring composition and clarity, making it easy to understand, use, and extend for custom applications.
*   **Rich Model Support**: Compatible with popular LLMs like Llama 3, Llama 2, Mistral, CodeLlama, and more, including multimodal variants.
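To make the data type and quantization trade-off above concrete, here is a back-of-the-envelope estimate of weight storage at different precisions. It is a rough sketch, not a torchchat API: the 8-billion parameter count is a nominal figure for an "8B" model, and the `int8` and `int4` rows are illustrative bit widths rather than a list of the schemes torchchat implements. The estimate covers weights only and ignores quantization scales, activations, and the KV cache.

```python
# Bytes needed to store one weight at each precision.
BYTES_PER_WEIGHT = {
    "float32": 4.0,
    "float16": 2.0,
    "bfloat16": 2.0,
    "int8": 1.0,   # illustrative 8-bit quantization
    "int4": 0.5,   # illustrative 4-bit quantization
}


def weight_gigabytes(num_params, dtype):
    """Approximate weight storage in GB (10**9 bytes) for a given dtype."""
    return num_params * BYTES_PER_WEIGHT[dtype] / 1e9


# Assumption: a nominal 8 billion parameters for an "8B" model.
PARAMS = 8_000_000_000

for dtype in BYTES_PER_WEIGHT:
    print(f"{dtype:>8}: {weight_gigabytes(PARAMS, dtype):5.1f} GB")
```

Under these assumptions, halving the bytes per weight halves the footprint: about 32 GB at float32, 16 GB at float16 or bfloat16, and 4 GB at 4 bits, which is why quantized exports are the usual choice for mobile targets.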

## Links

*   **GitHub Repository**: [https://github.com/pytorch/torchchat](https://github.com/pytorch/torchchat)
*   **Hugging Face Token Documentation**: [https://huggingface.co/docs/hub/en/security-tokens](https://huggingface.co/docs/hub/en/security-tokens)
*   **PyTorch AOT Inductor Blog**: [https://pytorch.org/blog/pytorch2-2/](https://pytorch.org/blog/pytorch2-2/)
*   **ExecuTorch GitHub**: [https://github.com/pytorch/executorch](https://github.com/pytorch/executorch)
*   **torchchat Discord**: [https://discord.gg/hm2Keduk3v](https://discord.gg/hm2Keduk3v)