WeClone: Create Your AI Digital Twin from Chat History with LLMs

Summary
WeClone is an innovative open-source project that provides a comprehensive solution for creating your personal AI digital twin. It allows users to fine-tune Large Language Models (LLMs) using their chat history, capturing unique communication styles. The resulting AI can then be integrated with various chatbots, bringing your digital self to life.
Introduction
WeClone is an innovative open-source project designed to create your personal AI digital twin from your chat history. It offers a comprehensive, end-to-end solution for fine-tuning Large Language Models (LLMs) with your unique communication style, allowing you to bring a digital version of yourself to life. The project supports various chat data sources and deployment platforms, emphasizing privacy and localized control over your data.
Installation
To get started with WeClone, follow these steps. A CUDA environment (version 12.6 or above) is required.

1. Clone the repository:

   ```bash
   git clone https://github.com/xming521/WeClone.git && cd WeClone
   ```

2. Set up the environment with uv (recommended):

   ```bash
   uv venv .venv --python=3.12
   source .venv/bin/activate  # For Windows: .venv\Scripts\activate
   uv pip install --group main -e .
   ```

3. Copy the configuration file, then modify settings.jsonc for your specific needs:

   ```bash
   cp examples/tg.template.jsonc settings.jsonc
   ```

4. Download models. Downloading from Hugging Face is recommended, e.g.:

   ```bash
   git lfs install
   git clone https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct models/Qwen2.5-VL-7B-Instruct
   ```
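Note that settings.jsonc is JSONC (JSON with comments), which Python's standard json module cannot parse directly. As a minimal sketch of how to inspect it programmatically (this is not WeClone's own loader, and the comment stripping is naive — it ignores comment markers inside strings), you could strip the comments before loading:

```python
import json
import re

def load_jsonc(text: str) -> dict:
    """Parse JSONC by stripping /* */ block comments and full-line // comments.

    Naive sketch: does not handle comment markers appearing inside strings.
    """
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)      # block comments
    text = re.sub(r"^\s*//.*$", "", text, flags=re.MULTILINE)   # line comments
    return json.loads(text)

# Minimal example using two fields the documentation mentions;
# the real template contains many more options.
sample = """
{
  // language of the chat data
  "language": "en",
  "platform": "telegram"
}
"""
config = load_jsonc(sample)
print(config["platform"])  # -> telegram
```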
Examples
WeClone provides a clear workflow from data preparation to deployment.
Data Preparation: Export your chat history (e.g., from Telegram Desktop) as JSON and place it in the ./dataset/telegram directory.

Data Preprocessing: Configure settings.jsonc (e.g., language, platform, telegram_args.my_id) and run:

```bash
weclone-cli make-dataset
```

The project includes privacy filtering for sensitive information.
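The exact privacy-filtering rules live in WeClone's preprocessing pipeline; as an illustration of the general idea only (these patterns are hypothetical, not the project's actual code), a minimal regex-based redactor might look like this:

```python
import re

# Hypothetical patterns; WeClone's real privacy filter may differ.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact(message: str) -> str:
    """Replace sensitive substrings with placeholder tokens like [EMAIL]."""
    for label, pattern in PATTERNS.items():
        message = pattern.sub(f"[{label}]", message)
    return message

print(redact("Mail me at alice@example.com or call +1 555-123-4567"))
# -> Mail me at [EMAIL] or call [PHONE]
```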
Fine-tuning the Model: Adjust training parameters in settings.jsonc and execute:

```bash
weclone-cli train-sft
```

Multi-GPU training is also supported with DeepSpeed.
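The make-dataset step above produces the training data that train-sft consumes. The exact schema is defined by the project, but conceptually the chat history is turned into instruction/response pairs; a simplified sketch (illustrative shape only — WeClone's real preprocessing with context windows, merging, and filtering is richer) might look like:

```python
def to_sft_pairs(messages: list[dict], my_id: str) -> list[dict]:
    """Pair each incoming message with the user's reply that follows it.

    Illustrative chat-to-SFT conversion, not WeClone's actual code.
    """
    pairs = []
    for prev, cur in zip(messages, messages[1:]):
        if prev["from_id"] != my_id and cur["from_id"] == my_id:
            pairs.append({"instruction": prev["text"], "output": cur["text"]})
    return pairs

history = [
    {"from_id": "friend", "text": "Are you free tonight?"},
    {"from_id": "me", "text": "Yeah, after 8 works for me"},
]
print(to_sft_pairs(history, my_id="me"))
```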
Inference and Deployment:

- Webchat Demo: Test your fine-tuned model in a browser:

  ```bash
  weclone-cli webchat-demo
  ```

- API Server: Start an API service for integration:

  ```bash
  weclone-cli server
  ```

- Deploy to Chatbots: Integrate your AI twin with platforms like AstrBot or LangBot by configuring them to use the WeClone API service.
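Chatbot frameworks typically talk to such a service through an OpenAI-style chat-completions endpoint. Assuming WeClone's API follows that shape (the URL, port, and model name below are placeholders you would adjust to your own server configuration), a client request could be sketched as:

```python
import json
import urllib.request

# Placeholder endpoint; adjust host/port to your running WeClone server.
API_URL = "http://127.0.0.1:8005/v1/chat/completions"

def build_chat_request(user_message: str) -> dict:
    """Build an OpenAI-style chat completion payload (assumed API shape)."""
    return {
        "model": "weclone",  # placeholder model name
        "messages": [{"role": "user", "content": user_message}],
    }

payload = build_chat_request("hey, what's up?")
print(json.dumps(payload))

# To actually query a running server (requires `weclone-cli server`):
# req = urllib.request.Request(
#     API_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```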
Why Use WeClone?
WeClone stands out as a powerful tool for creating personalized AI avatars due to several key features:
- End-to-End Solution: It covers every step, from chat data export and preprocessing to model training and deployment.
- Personalized LLMs: Fine-tune models on your actual chat history, including image (multimodal) data, to capture your unique style and "flavor."
- Privacy and Control: Supports localized fine-tuning and deployment, along with privacy information filtering, ensuring your data remains secure and under your control.
- Multi-Platform Integration: Easily integrate your digital avatar with popular messaging platforms like Telegram, Discord, Slack, and WeChat.
- Active Development: The project is in rapid iteration, continuously adding new features and improvements.
Links
- GitHub Repository: https://github.com/xming521/WeClone
- Project Homepage: https://www.weclone.love/
- Documentation: https://docs.weclone.love/docs/introduce/what-is-weclone.html