ChatArena: Multi-Agent Language Game Environments for LLMs
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
ChatArena is a Python library designed to provide multi-agent language game environments for Large Language Models (LLMs), aiming to foster the development of communication and collaboration capabilities in AI. It offers a flexible framework for defining players, environments, and interactions based on Markov Decision Processes. Please note that as of August 11, 2025, this project has been deprecated due to a lack of widespread community use and is no longer receiving updates or support.
Use at your own risk
OSRepos shares public repositories for knowledge and discovery only. Any installation, execution, configuration, or use of code from these repositories is the user's own responsibility. Always review the repository, source code, dependencies, licenses, and security implications before running or installing anything. OSRepos is not responsible for issues, damages, or losses resulting from third-party repositories.
Introduction
ChatArena (or Chat Arena) is a library that provides multi-agent language game environments for Large Language Models (LLMs) like GPT-3, GPT-4, and ChatGPT. The primary goal of ChatArena is to facilitate research into autonomous LLM agents and their social interactions, helping to develop their communication and collaboration capabilities.
It offers several key features:
- Abstraction: A flexible framework for defining multiple players, environments, and their interactions, based on Markov Decision Processes.
- Language Game Environments: A collection of environments to aid in understanding, benchmarking, or training agent LLMs.
- User-friendly Interfaces: Both Web UI and CLI are provided to develop and prompt engineer LLM agents within these environments.
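In the Markov Decision Process framing described above, each step works the same way. The environment decides whose turn it is, gives that player an observation, and applies the player's action to the shared state. The loop below is a minimal plain-Python sketch of that idea. `TurnBasedEnv`, `Message` and `run_game` are hypothetical names and not ChatArena's API, and the two players are stub functions standing in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    agent: str    # who produced the action
    content: str  # the action itself (an utterance)
    turn: int     # time step at which it was taken


# A policy maps an observation (visible message history) to an action (text).
Policy = Callable[[List[Message]], str]


@dataclass
class TurnBasedEnv:
    """Hypothetical round-robin environment whose state is the shared message log."""
    player_names: List[str]
    max_turns: int
    messages: List[Message] = field(default_factory=list)
    turn: int = 0

    def current_player(self) -> str:
        # Players act in a fixed round-robin order.
        return self.player_names[self.turn % len(self.player_names)]

    def observe(self, player: str) -> List[Message]:
        # Fully observable conversation: every player sees a copy of the whole log.
        return list(self.messages)

    def step(self, player: str, action: str) -> None:
        # Transition: append the action to the state and advance the clock.
        self.messages.append(Message(player, action, self.turn))
        self.turn += 1

    def is_terminal(self) -> bool:
        return self.turn >= self.max_turns


def run_game(env: TurnBasedEnv, policies: Dict[str, Policy]) -> List[Message]:
    # The MDP loop: whose turn -> observe -> act -> transition, until terminal.
    while not env.is_terminal():
        name = env.current_player()
        action = policies[name](env.observe(name))
        env.step(name, action)
    return env.messages


# Stub policies standing in for LLM-backed players.
policies: Dict[str, Policy] = {
    "Professor": lambda obs: f"Lecture point {len(obs)}",
    "Student": lambda obs: f"Question about: {obs[-1].content}",
}
log = run_game(TurnBasedEnv(["Professor", "Student"], max_turns=4), policies)
for m in log:
    print(m.turn, m.agent, m.content)
```

In a real system, each policy would wrap a prompt to a language model. Per-player observations could also be filtered (for example, hiding a secret word from one player) by changing `observe`.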
Important Note: As of August 11, 2025, this project has been deprecated due to a lack of widespread community use and will not receive any further updates or support.
Installation
To get started with ChatArena, ensure you have Python >= 3.7. An OpenAI API key is needed only if you plan to use GPT-3.5-turbo or GPT-4 as an LLM agent.
Install with pip:
pip install chatarena
Alternatively, install from source:
pip install git+https://github.com/chatarena/chatarena
To use GPT models, set your OpenAI API key:
export OPENAI_API_KEY="your_api_key_here"
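The key is only required for GPT-backed agents, so a script can check for it before deciding which kind of agent to set up. Below is a minimal standard-library sketch. The function `pick_backend` and its return labels are hypothetical and are not part of ChatArena.

```python
import os
from typing import Mapping, Optional


def pick_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a hypothetical backend label depending on whether OPENAI_API_KEY is set."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    # GPT-3.5-turbo / GPT-4 agents need the key; otherwise fall back to another option.
    return "openai" if key else "other"


print(pick_backend())
```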
Optional dependencies can be installed for full functionality:
pip install chatarena[all_backends] # for all supported backends
pip install chatarena[all_envs] # for all environments
pip install chatarena[all] # for full functionality
You can also launch a demo Web UI locally:
pip install chatarena[gradio]
git clone https://github.com/chatarena/chatarena.git
cd chatarena
gradio app.py
Examples
ChatArena provides a variety of language game environments:
- Conversation: A multi-player environment simulating a conversation, such as the "NLP Classroom" example.
- Moderator Conversation: Based on conversation, but with a moderator controlling game dynamics. Examples include "Rock-paper-scissors" and "Tic-tac-toe".
- Chameleon: A multi-player social deduction game where players describe clues about a secret word, and a "chameleon" tries to blend in without knowing the word.
- PettingZooChess: A two-player chess game environment integrated with PettingZoo.
- PettingZooTicTacToe: A two-player tic-tac-toe game environment driven by hard-coded rules, distinct from the moderator-driven version.
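The Chameleon game ends with a vote. The sketch below resolves one round under the common tabletop rules. Players vote for the suspected chameleon. If the vote does not single out the chameleon, the chameleon wins. If it is caught, it can still win by guessing the secret word. These rules, including the assumption that a tie lets the chameleon escape, are illustrative and may differ from ChatArena's implementation. `resolve_round` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Optional


def resolve_round(votes: Dict[str, str], chameleon: str, secret_word: str,
                  chameleon_guess: Optional[str] = None) -> str:
    """Return "chameleon" or "players" as the winner of one hypothetical round."""
    if not votes:
        return "chameleon"  # assumption: no votes means nobody is accused
    tally = Counter(votes.values()).most_common()
    top_target, top_count = tally[0]
    # Assumption: a tie for most votes means nobody is accused, so the chameleon escapes.
    tied = len(tally) > 1 and tally[1][1] == top_count
    if tied or top_target != chameleon:
        return "chameleon"
    # Caught: the chameleon can still win by naming the secret word.
    if chameleon_guess is not None and chameleon_guess.strip().lower() == secret_word.lower():
        return "chameleon"
    return "players"


votes = {"Alice": "Carol", "Bob": "Carol", "Carol": "Alice"}
print(resolve_round(votes, chameleon="Carol", secret_word="banana", chameleon_guess="apple"))
print(resolve_round(votes, chameleon="Carol", secret_word="banana", chameleon_guess="Banana"))
```

Rules like these are deterministic, so a rule-based environment can enforce them directly. In a moderator-driven game, an LLM would have to interpret the votes instead.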
Why Use It
ChatArena was designed to be a valuable tool for researchers and developers interested in the capabilities of Large Language Models in multi-agent settings. Its structured approach allows for:
- Research and Development: A dedicated platform for exploring and benchmarking LLM agents' communication and collaboration skills.
- Flexible Experimentation: The abstract framework enables easy definition and customization of new language games, players, and interaction rules.
- Diverse Environments: A suite of pre-built environments, from simple conversations to complex social deduction games and classic board games, provides immediate testing grounds.
- User-Friendly Interaction: Both command-line and web-based interfaces simplify the process of developing and prompt engineering LLM agents.
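Defining a new game mostly means writing its interaction rules. Here is an example of custom rules: a hard-coded judge for one rock-paper-scissors round, which a rule-based environment could use instead of an LLM moderator. It is a standalone sketch. `judge_rps`, `parse_move` and the keyword-based parsing of free-text replies are hypothetical, not ChatArena code.

```python
import re
from typing import Optional

# Each move maps to the move it defeats.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def parse_move(text: str) -> Optional[str]:
    """Extract the first recognised move from a free-text reply (hypothetical parsing rule)."""
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in BEATS:
            return word
    return None


def judge_rps(reply_a: str, reply_b: str) -> str:
    """Return 'a', 'b', 'draw', or 'invalid' for one round."""
    a, b = parse_move(reply_a), parse_move(reply_b)
    if a is None or b is None:
        return "invalid"  # a rule-based judge can reject malformed moves outright
    if a == b:
        return "draw"
    return "a" if BEATS[a] == b else "b"


print(judge_rps("I choose Rock!", "scissors"))
```

Keeping the rules in code makes outcomes reproducible. A moderator LLM is more flexible, but it may misjudge a round.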
While the project is now deprecated, its design principles and the environments it offered provide insights into the challenges and opportunities in multi-agent LLM research.
Links
- GitHub Repository: https://github.com/Farama-Foundation/ChatArena
- PyPI: https://pypi.org/project/chatarena/
- HuggingFace Demo: https://chatarena-chatarena-demo.hf.space
- Colab Notebook: https://colab.research.google.com/drive/1vKaskNMBtuGOVgn8fQxMgjCevn2wp1Ml?authuser=0#scrollTo=P5DCC0Y0Zbxi
- Vimeo Demo Videos:
  - WebUI Demo: https://vimeo.com/816979419
  - CLI Demo: https://vimeo.com/816989884
- Twitter: https://twitter.com/_chatarena
- Discord (Farama Server): https://discord.gg/Vrtdmu9Y8Q
Related repositories
Similar repositories that may be relevant next.
Agentarium: A Python Framework for AI Agent Simulations
July 1, 2026
Agentarium is an open-source Python framework designed for creating and managing simulations with AI-powered agents. It offers an intuitive platform for designing complex, interactive environments where agents can act, learn, and evolve. This powerful tool simplifies the orchestration of multiple AI agents and their interactions.
Lighteval: Your All-in-One Toolkit for LLM Evaluation
July 1, 2026
Lighteval is a comprehensive toolkit from Hugging Face for evaluating Large Language Models (LLMs) across various backends. It enables users to dive deep into model performance by saving detailed, sample-by-sample results and supports over 1000 evaluation tasks. The framework offers extensive customization options, allowing users to create custom tasks and metrics tailored to their specific needs.

PromptBench: A Unified Framework for LLM Evaluation and Robustness
July 1, 2026
PromptBench is a comprehensive Python library designed for the evaluation and understanding of Large Language Models (LLMs). It provides a unified framework for assessing model performance, exploring various prompt engineering techniques, and evaluating robustness against adversarial attacks. This tool empowers researchers to conduct in-depth analyses of LLMs across diverse datasets and models.

EvalPlus: Rigorous Evaluation for LLM-Synthesized Code
June 30, 2026
EvalPlus is a robust framework designed for the rigorous evaluation of code generated by Large Language Models (LLMs). It extends standard benchmarks like HumanEval and MBPP with significantly more tests, offering precise assessment of code correctness and efficiency. This tool is crucial for developers and researchers aiming to thoroughly validate LLM-synthesized code.