WeClone: Create Your AI Digital Twin from Chat History with LLMs

Summary
WeClone is an innovative open-source project that provides a comprehensive solution for creating your personal AI digital twin. It allows users to fine-tune Large Language Models (LLMs) using their chat history, capturing unique communication styles. The resulting AI can then be integrated with various chatbots, bringing your digital self to life.
Introduction
WeClone is an innovative open-source project designed to create your personal AI digital twin from your chat history. It offers a comprehensive, end-to-end solution for fine-tuning Large Language Models (LLMs) with your unique communication style, allowing you to bring a digital version of yourself to life. The project supports various chat data sources and deployment platforms, emphasizing privacy and localized control over your data.
Installation
To get started with WeClone, follow these steps. A CUDA environment (version 12.6 or above) is required.

1. Clone the repository:

   ```bash
   git clone https://github.com/xming521/WeClone.git && cd WeClone
   ```

2. Set up the environment with uv (recommended):

   ```bash
   uv venv .venv --python=3.12
   source .venv/bin/activate  # For Windows: .venv\Scripts\activate
   uv pip install --group main -e .
   ```

3. Copy the configuration file, then modify settings.jsonc for your specific needs:

   ```bash
   cp examples/tg.template.jsonc settings.jsonc
   ```

4. Download models. Downloading from Hugging Face is recommended, e.g.:

   ```bash
   git lfs install
   git clone https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct models/Qwen2.5-VL-7B-Instruct
   ```
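Note that settings.jsonc is JSONC (JSON with comments), which Python's standard json module cannot parse directly. As a minimal sketch of how to inspect it programmatically (this is not WeClone's own loader, and the comment stripping is naive — it ignores comment markers inside strings), you could strip the comments before loading:

```python
import json
import re

def load_jsonc(text: str) -> dict:
    """Parse JSONC by stripping /* */ block comments and full-line // comments.

    Naive sketch: does not handle comment markers appearing inside strings.
    """
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)      # block comments
    text = re.sub(r"^\s*//.*$", "", text, flags=re.MULTILINE)   # line comments
    return json.loads(text)

# Minimal example using two fields the documentation mentions;
# the real template contains many more options.
sample = """
{
  // language of the chat data
  "language": "en",
  "platform": "telegram"
}
"""
config = load_jsonc(sample)
print(config["platform"])  # -> telegram
```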
Examples
WeClone provides a clear workflow from data preparation to deployment.
Data Preparation: Export your chat history (e.g., from Telegram Desktop) as JSON and place it in the ./dataset/telegram directory.

Data Preprocessing: Configure settings.jsonc (e.g., language, platform, telegram_args.my_id) and run:

```bash
weclone-cli make-dataset
```

The project includes privacy filtering for sensitive information.
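The exact privacy-filtering rules live in WeClone's preprocessing pipeline; as an illustration of the general idea only (these patterns are hypothetical, not the project's actual code), a minimal regex-based redactor might look like this:

```python
import re

# Hypothetical patterns; WeClone's real privacy filter may differ.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact(message: str) -> str:
    """Replace sensitive substrings with placeholder tokens like [EMAIL]."""
    for label, pattern in PATTERNS.items():
        message = pattern.sub(f"[{label}]", message)
    return message

print(redact("Mail me at alice@example.com or call +1 555-123-4567"))
# -> Mail me at [EMAIL] or call [PHONE]
```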
Fine-tuning the Model: Adjust training parameters in settings.jsonc and execute:

```bash
weclone-cli train-sft
```

Multi-GPU training is also supported with DeepSpeed.
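The make-dataset step above produces the training data that train-sft consumes. The exact schema is defined by the project, but conceptually the chat history is turned into instruction/response pairs; a simplified sketch (illustrative shape only — WeClone's real preprocessing with context windows, merging, and filtering is richer) might look like:

```python
def to_sft_pairs(messages: list[dict], my_id: str) -> list[dict]:
    """Pair each incoming message with the user's reply that follows it.

    Illustrative chat-to-SFT conversion, not WeClone's actual code.
    """
    pairs = []
    for prev, cur in zip(messages, messages[1:]):
        if prev["from_id"] != my_id and cur["from_id"] == my_id:
            pairs.append({"instruction": prev["text"], "output": cur["text"]})
    return pairs

history = [
    {"from_id": "friend", "text": "Are you free tonight?"},
    {"from_id": "me", "text": "Yeah, after 8 works for me"},
]
print(to_sft_pairs(history, my_id="me"))
```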
Inference and Deployment:

- Webchat Demo: Test your fine-tuned model in a browser:

  ```bash
  weclone-cli webchat-demo
  ```

- API Server: Start an API service for integration:

  ```bash
  weclone-cli server
  ```

- Deploy to Chatbots: Integrate your AI twin with platforms like AstrBot or LangBot by configuring them to use the WeClone API service.
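Chatbot frameworks typically talk to such a service through an OpenAI-style chat-completions endpoint. Assuming WeClone's API follows that shape (the URL, port, and model name below are placeholders you would adjust to your own server configuration), a client request could be sketched as:

```python
import json
import urllib.request

# Placeholder endpoint; adjust host/port to your running WeClone server.
API_URL = "http://127.0.0.1:8005/v1/chat/completions"

def build_chat_request(user_message: str) -> dict:
    """Build an OpenAI-style chat completion payload (assumed API shape)."""
    return {
        "model": "weclone",  # placeholder model name
        "messages": [{"role": "user", "content": user_message}],
    }

payload = build_chat_request("hey, what's up?")
print(json.dumps(payload))

# To actually query a running server (requires `weclone-cli server`):
# req = urllib.request.Request(
#     API_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```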
Why Use WeClone?
WeClone stands out as a powerful tool for creating personalized AI avatars due to several key features:
- End-to-End Solution: It covers every step, from chat data export and preprocessing to model training and deployment.
- Personalized LLMs: Fine-tune models on your actual chat history, including image (multimodal) data, to capture your unique style and "flavor."
- Privacy and Control: Supports localized fine-tuning and deployment, along with privacy information filtering, ensuring your data remains secure and under your control.
- Multi-Platform Integration: Easily integrate your digital avatar with popular messaging platforms like Telegram, Discord, Slack, and WeChat.
- Active Development: The project is in rapid iteration, continuously adding new features and improvements.
Links
- GitHub Repository: https://github.com/xming521/WeClone
- Project Homepage: https://www.weclone.love/
- Documentation: https://docs.weclone.love/docs/introduce/what-is-weclone.html