torchchat: Run PyTorch LLMs Locally on Servers, Desktop, and Mobile

This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary

torchchat is a PyTorch-native codebase designed to showcase the ability to run large language models (LLMs) seamlessly across various platforms. It enables local execution of LLMs using Python, within C/C++ applications on desktop or servers, and directly on iOS and Android devices. Although no longer under active development, it remains a valuable resource for understanding and implementing local LLM deployment strategies.

Repository Information

Analyzed by OSRepos on July 3, 2026

Use at your own risk

OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.

Introduction

torchchat is a PyTorch-native codebase that demonstrates how to run large language models (LLMs) locally. It covers three deployment paths: Python environments on servers and desktops, integration into C/C++ applications, and on-device execution on iOS and Android. Because the same project spans all three, it is a useful reference for developers who need to deploy LLMs in more than one setting.

torchchat is no longer under active development, but it still works as a showcase for running LLMs across these platforms. Its recent updates included support for DeepSeek R1 Distill 8B and multimodal support for Llama 3.2 11B.

Installation

To get started with torchchat, you'll need Python 3.10 installed. It's highly recommended to use a virtual environment to manage dependencies.

1. Clone the repository, set up a virtual environment, and install the dependencies:

git clone https://github.com/pytorch/torchchat.git
cd torchchat
python3 -m venv .venv
source .venv/bin/activate
./install/install_requirements.sh
mkdir exportedModels

2. Log into Hugging Face and download a model:

Most models are distributed via Hugging Face. You'll need an account and a user access token with the write role.

huggingface-cli login

Then, list available models and download one, for example, llama3.1:

python3 torchchat.py list
python3 torchchat.py download llama3.1

Note: Some models may require requesting access via Hugging Face before downloading.
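
The two prerequisites from step 1 (Python 3.10 and an active virtual environment) can be checked before running the install script. The following is a minimal sketch using only the standard library; the function names are illustrative and not part of torchchat, and requiring exactly version 3.10 is an assumption based on the requirement stated above.

```python
import sys

REQUIRED = (3, 10)  # version stated in the installation notes above


def python_matches(version_info=sys.version_info, required=REQUIRED):
    """Return True if the interpreter's major.minor equals the required version."""
    return tuple(version_info[:2]) == required


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True inside a venv, where sys.prefix differs from sys.base_prefix."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


print("Python 3.10:", python_matches())
print("virtual environment active:", in_virtualenv())
```

Both checks should print True when the script is run with the interpreter from .venv after activation.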

Examples

torchchat provides various commands for interacting with LLMs, from interactive chat to generating text and serving models via a REST API.

Chat

Engage in an interactive conversation with a downloaded LLM:

python3 torchchat.py chat llama3.1

Generate

Generate text based on a specific prompt:

python3 torchchat.py generate llama3.1 --prompt "write me a story about a boy and his bear"

Server

Host a local REST API server for model interaction, following the OpenAI API specification for chat completions. You'll need two terminals: one to start the server and another to query it.

Terminal 1 (Start Server):

python3 torchchat.py server llama3.1

Terminal 2 (Query Server):

curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "stream": "true",
    "max_tokens": 200,
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
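
The same request can be sent from Python instead of curl. The sketch below uses only the standard library and copies the endpoint and JSON fields from the curl command above. The response parsing is an assumption: it expects OpenAI-style streamed lines of the form "data: {json}" with text under choices[0].delta.content, which may not match every server version. The function names are illustrative.

```python
import json
import urllib.request

# Endpoint and payload fields are copied from the curl example above.
URL = "http://127.0.0.1:5000/v1/chat/completions"


def build_payload(user_message, model="llama3.1", max_tokens=200):
    """Build the same JSON body that the curl command sends."""
    return {
        "model": model,
        "stream": "true",  # sent as a string, exactly as in the curl example
        "max_tokens": max_tokens,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }


def extract_text(line):
    """Return the text fragment carried by one streamed line, or "" if none.

    Assumption: lines look like 'data: {json chunk}' (OpenAI-style streaming).
    """
    line = line.strip()
    if not line.startswith("data:"):
        return ""
    body = line[len("data:"):].strip()
    if not body or body == "[DONE]":
        return ""
    chunk = json.loads(body)
    choices = chunk.get("choices") or [{}]
    delta = choices[0].get("delta") or {}
    return delta.get("content") or ""


def chat(user_message, url=URL):
    """POST the request and yield text fragments as they arrive."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        for raw in response:
            text = extract_text(raw.decode("utf-8"))
            if text:
                yield text


# Usage, with the server from Terminal 1 running:
#     for fragment in chat("Hello!"):
#         print(fragment, end="", flush=True)
```

Keeping payload construction and line parsing in separate functions lets both be checked without a running server.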

Browser

Launch a basic browser interface for local chat, which queries a local server. First, start the server as shown above, then in another terminal:

streamlit run torchchat/usages/browser.py

Desktop/Server Execution with AOT Inductor

For faster inference, you can compile models ahead of time with AOT Inductor (AOTI), which packages the compiled model as a zipped PT2 file. The resulting file can be run from both Python and C++ environments.

Export the model:

python3 torchchat.py export llama3.1 --output-aoti-package-path exportedModels/llama3_1_artifacts.pt2

Run in Python:

python3 torchchat.py generate llama3.1 --aoti-package-path exportedModels/llama3_1_artifacts.pt2 --prompt "Hello my name is"
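
When export and generation are repeated for several models, the two commands above can be scripted. This is a minimal sketch, not part of torchchat: the helper names are hypothetical, the flags are the ones shown above, and sys.executable is used in place of python3 on the assumption that the script is started with the virtual environment's interpreter from the repository root.

```python
import subprocess
import sys


def export_command(model, package_path):
    """Argument list for the AOTI export step shown above."""
    return [
        sys.executable, "torchchat.py", "export", model,
        "--output-aoti-package-path", package_path,
    ]


def generate_command(model, package_path, prompt):
    """Argument list for generating with the exported package."""
    return [
        sys.executable, "torchchat.py", "generate", model,
        "--aoti-package-path", package_path,
        "--prompt", prompt,
    ]


def export_then_generate(model, package_path, prompt, run=subprocess.run):
    """Run export, then generate; check=True stops at the first failing step."""
    commands = (
        export_command(model, package_path),
        generate_command(model, package_path, prompt),
    )
    for command in commands:
        run(command, check=True)


# Usage, from the repository root:
#     export_then_generate(
#         "llama3.1",
#         "exportedModels/llama3_1_artifacts.pt2",
#         "Hello my name is",
#     )
```

Passing arguments as a list avoids shell quoting problems with prompts that contain spaces or quotes.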

Mobile Execution with ExecuTorch

ExecuTorch optimizes models for execution on mobile or embedded devices. After setting up ExecuTorch (refer to the official repository for detailed steps), you can export and run models.

Export for mobile:

python3 torchchat.py export llama3.1 --quantize torchchat/quant_config/mobile.json --output-pte-path llama3.1.pte

This creates a .pte artifact that can be deployed on iOS or Android devices.

Why Use It

torchchat follows PyTorch's design philosophy, prioritizing usability and native integration. It offers:

  • Local LLM Execution: Run language models directly on your own hardware, which keeps data on the device and avoids network round trips.
  • Cross-Platform Compatibility: Deploy models on Linux, macOS (M1/M2/M3), Android, and iOS, covering a broad spectrum of devices.
  • PyTorch-Native Performance: Leverages PyTorch's capabilities for efficient execution, including eager mode, AOT Inductor, and ExecuTorch for optimized inference.
  • Flexibility: Supports multiple data types (float32, float16, bfloat16) and various quantization schemes to balance performance and model size.
  • Simplicity and Extensibility: Designed with modular building blocks, favoring composition and clarity, making it easy to understand, use, and extend for custom applications.
  • Rich Model Support: Compatible with popular LLMs like Llama 3, Llama 2, Mistral, CodeLlama, and more, including multimodal variants.
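
The trade-off between data type and model size mentioned in the list above comes down to simple arithmetic: weight storage is roughly the parameter count times the bytes per parameter. The sketch below works this out for a model with 8 billion parameters, a round figure chosen for illustration. The 4-bit row stands in for quantization in general and is not a statement about any specific torchchat scheme; the estimate also ignores activations, the KV cache, and quantization metadata.

```python
# Bytes used to store one parameter in each format.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "bfloat16": 2.0,
    "4-bit (illustrative)": 0.5,
}


def weight_gigabytes(num_params, bytes_per_param):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 8e9  # assumed round number, not an exact model size

for name, size in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_gigabytes(NUM_PARAMS, size):.0f} GB")
```

Under these assumptions the weights take about 32 GB in float32, 16 GB in float16 or bfloat16, and 4 GB at 4 bits per parameter, which is why reduced precision and quantization matter most on mobile devices.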

Related repositories

Similar repositories that may be relevant next.

Evidently: Open-Source ML and LLM Observability Framework

June 30, 2026

Evidently is an open-source Python library designed for evaluating, testing, and monitoring machine learning and large language model systems. It provides over 100 built-in metrics for various tasks, from data drift detection to LLM judges, supporting both tabular and text data. This framework helps ensure the quality and performance of AI-powered systems throughout their lifecycle.

Tags: data-science, machine-learning, llm

Guardrails: Enhancing LLM Reliability and Structured Data Generation

June 26, 2026

Guardrails is a Python framework designed to build reliable AI applications by adding guardrails to large language models. It helps detect, quantify, and mitigate risks in LLM inputs/outputs, and facilitates the generation of structured data. This framework ensures more predictable and safer interactions with AI models.

Tags: ai, foundation-model, llm

Loop Engineering: Orchestrating AI Agents with Practical Patterns and Tools

June 25, 2026

Loop Engineering is a GitHub repository offering practical patterns, starters, and CLI tools for building robust AI coding agent systems. It shifts the focus from individual prompt crafting to designing control systems that orchestrate agents over time. This project empowers developers to create autonomous, iterative AI workflows for various development tasks.

Tags: agentic-ai, ai-agents, loop-engineering

MarkLLM: An Open-Source Toolkit for LLM Watermarking

June 23, 2026

MarkLLM is an open-source toolkit designed to simplify the research and application of watermarking technologies for large language models (LLMs). It offers a unified framework for implementing various watermarking algorithms, alongside robust visualization and comprehensive evaluation tools. This toolkit helps researchers and the broader community understand and assess the authenticity and origin of machine-generated text.

Tags: large-language-models, llm, safety
