ElatoAI: Realtime AI Voice Agents for ESP32 with SoTA Models
This repository profile is provided by osrepos.com, an open source repository discovery platform.

Summary
ElatoAI brings realtime AI voice agents to Arduino ESP32 devices. It integrates state-of-the-art voice AI from providers such as OpenAI, Gemini, Grok, Eleven Labs, and Hume AI for uninterrupted conversations. Using secure WebSockets and Deno Edge Functions, ElatoAI delivers low-latency, high-quality speech-to-speech interactions globally, which suits AI toys, companions, and smart devices.
Introduction
ElatoAI enables realtime AI voice agents on Arduino ESP32 devices. It integrates state-of-the-art AI voice APIs: OpenAI Realtime API, Gemini Live API, xAI Grok Voice Agent API, Eleven Labs AI Agents, and Hume AI EVI-4. The system supports over 15 minutes of uninterrupted, globally accessible conversation, which makes it suitable for AI toys, companions, and other smart devices. Secure WebSockets and Deno Edge Functions deliver the low-latency, high-quality speech-to-speech interactions.
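To get a rough feel for the data volume behind a 15-minute speech session, the arithmetic can be sketched as below. The audio parameters (24 kHz, 16-bit mono PCM, and a 24 kbit/s Opus stream) are assumptions chosen for illustration, not settings confirmed by the project; substitute the real values from the repository before drawing conclusions.

```typescript
// Rough data-volume estimate for one voice session.
// All audio parameters here are illustrative assumptions, not ElatoAI's documented settings.

interface AudioFormat {
  sampleRateHz: number;  // samples per second
  bitsPerSample: number; // PCM bit depth
  channels: number;      // 1 = mono
}

/** Bytes of raw PCM produced per second for the given format. */
function pcmBytesPerSecond(fmt: AudioFormat): number {
  return (fmt.sampleRateHz * fmt.bitsPerSample * fmt.channels) / 8;
}

/** Total bytes for `seconds` of audio at a constant bitrate (bits per second). */
function bytesAtBitrate(bitsPerSecond: number, seconds: number): number {
  return (bitsPerSecond * seconds) / 8;
}

const assumedFormat: AudioFormat = { sampleRateHz: 24000, bitsPerSample: 16, channels: 1 };
const sessionSeconds = 15 * 60;   // the 15-minute figure from the text
const assumedOpusBitrate = 24000; // hypothetical compressed bitrate, bits per second

const rawBytes = pcmBytesPerSecond(assumedFormat) * sessionSeconds;
const opusBytes = bytesAtBitrate(assumedOpusBitrate, sessionSeconds);

console.log(`raw PCM: ${(rawBytes / 1e6).toFixed(1)} MB`); // raw PCM: 43.2 MB
console.log(`Opus:    ${(opusBytes / 1e6).toFixed(1)} MB`); // Opus:    2.7 MB
console.log(`ratio:   ${(rawBytes / opusBytes).toFixed(0)}x`); // ratio:   16x
```

Under these assumed numbers, compression cuts one direction of a session from tens of megabytes to a few, which is one plausible reason a compressed codec matters on a Wi-Fi microcontroller.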
Installation
To get started with ElatoAI, follow these key steps:
- Clone the repository: `git clone git@github.com:akdeb/ElatoAI.git`
- Start Supabase: Install the Supabase CLI and Docker Desktop, then run `supabase start` from the root directory.
- Set up your NextJS Frontend: Navigate to `frontend-nextjs`, install dependencies (`npm install`), configure environment variables in `.env.local`, and run `npm run dev`.
- Choose Edge Server Option: You can use the hosted ElatoAI server (ELATO MODE) or run your own local Deno edge server (DEV MODE). For local setup, navigate to `server-deno`, configure `.env` with API keys, and run `deno run -A --env-file=.env main.ts`.
- Set up ESP32 Device Firmware: In `firmware-arduino/Config.cpp`, set `ws_server` and `backend_server` to your local IP address. Build and upload the firmware.
- Set up ESP32 Device Wi-Fi: The ESP32 will create an `ELATO-DEVICE` captive portal. Connect to it and configure your Wi-Fi credentials via `http://192.168.4.1`.
- Turn on your device: After configuration, restart the ESP32 to connect to your Wi-Fi and server, enabling conversations with your AI character.
For detailed instructions, refer to the project's GitHub repository.
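The firmware step asks for your machine's local IP address. One way to find a candidate is sketched below; this helper is not part of ElatoAI, and the names `pickLanIPv4`, `InterfaceAddress`, and `InterfaceMap` are hypothetical. It assumes interface data shaped like the result of Node's `os.networkInterfaces()` and returns the first non-loopback IPv4 address it finds.

```typescript
// Hypothetical helper, not part of the ElatoAI codebase.
// Minimal shape of one interface entry, mirroring Node's os.networkInterfaces() output.
interface InterfaceAddress {
  address: string;
  family: string | number; // "IPv4"/"IPv6", or 4/6 on some Node versions
  internal: boolean;       // true for loopback addresses
}

type InterfaceMap = Record<string, InterfaceAddress[] | undefined>;

/** Returns the first non-internal IPv4 address, or undefined if there is none. */
function pickLanIPv4(interfaces: InterfaceMap): string | undefined {
  for (const name of Object.keys(interfaces)) {
    for (const entry of interfaces[name] || []) {
      const isIPv4 = entry.family === "IPv4" || entry.family === 4;
      if (isIPv4 && !entry.internal) return entry.address;
    }
  }
  return undefined;
}

// Made-up sample data for illustration.
const sample: InterfaceMap = {
  lo: [{ address: "127.0.0.1", family: "IPv4", internal: true }],
  wlan0: [
    { address: "fe80::1", family: "IPv6", internal: false },
    { address: "192.168.1.42", family: "IPv4", internal: false },
  ],
};

console.log(pickLanIPv4(sample)); // 192.168.1.42
```

On a real machine you would pass the result of `os.networkInterfaces()` (imported from `node:os`) instead of the sample. Machines with several adapters, such as VPNs or Docker bridges, may report more than one candidate, so confirm the address is on the same network the ESP32 will join.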
Examples
ElatoAI supports a variety of cutting-edge AI models for diverse conversational experiences. Demo videos showcase its capabilities with different providers and demonstrate the system's ability to handle complex, real-time speech interactions with various AI personalities.
Why Use ElatoAI?
ElatoAI stands out for its unique combination of features and performance:
- Realtime Speech-to-Speech: Experience instant, natural conversations powered by leading AI models.
- Hardware Integration: Seamlessly deploy advanced AI capabilities on affordable and widely available ESP32 microcontrollers.
- Global Performance: Deno Edge Functions ensure low latency and smooth interactions worldwide.
- Customizable Agents: Create and manage custom AI agents with distinct personalities and voices.
- Comprehensive Features: Includes secure WebSockets, server VAD turn detection, Opus audio compression, conversation history, device management, OTA updates, and more.
- DIY Friendly: Provides detailed instructions and hardware designs for building your own AI devices.
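To illustrate what turn detection by voice activity detection (VAD) means in general, here is a minimal energy-based sketch. It is not ElatoAI's implementation and not any provider's server VAD, which are likely more sophisticated; the class name `TurnDetector`, the event names, the threshold, and the frame size are all assumptions made for the example.

```typescript
// Illustrative energy-based end-of-turn detector over 16-bit PCM frames.
// Thresholds, frame size, and event names are assumptions, not ElatoAI or provider values.

interface VadOptions {
  energyThreshold: number;    // RMS level (0..1) at or above which a frame counts as speech
  silenceFramesToEnd: number; // consecutive quiet frames that close a turn
}

type VadEvent = "speech_started" | "speech_stopped" | null;

/** Root-mean-square level of a PCM16 frame, normalised to 0..1. */
function rms(frame: Int16Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    const s = frame[i] / 32768;
    sumSquares += s * s;
  }
  return Math.sqrt(sumSquares / frame.length);
}

class TurnDetector {
  private speaking = false;
  private quietFrames = 0;
  private opts: VadOptions;

  constructor(opts: VadOptions) {
    this.opts = opts;
  }

  /** Feed one frame; returns an event only when the turn state changes. */
  push(frame: Int16Array): VadEvent {
    const loud = rms(frame) >= this.opts.energyThreshold;
    if (loud) {
      this.quietFrames = 0;
      if (!this.speaking) {
        this.speaking = true;
        return "speech_started";
      }
      return null;
    }
    if (this.speaking) {
      this.quietFrames++;
      if (this.quietFrames >= this.opts.silenceFramesToEnd) {
        this.speaking = false;
        this.quietFrames = 0;
        return "speech_stopped";
      }
    }
    return null;
  }
}

// Synthetic frames: constant-amplitude samples stand in for speech and silence.
const detector = new TurnDetector({ energyThreshold: 0.05, silenceFramesToEnd: 3 });
const loudFrame = new Int16Array(480).fill(8000); // RMS about 0.24
const quietFrame = new Int16Array(480).fill(100); // RMS about 0.003
const events = [quietFrame, loudFrame, loudFrame, quietFrame, quietFrame, quietFrame]
  .map((f) => detector.push(f));
console.log(events); // [ null, 'speech_started', null, null, null, 'speech_stopped' ]
```

Requiring several quiet frames in a row before ending the turn keeps short pauses inside a sentence from cutting the speaker off, at the cost of a small delay before the reply starts.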
Links
- GitHub Repository: https://github.com/akdeb/ElatoAI
- Homepage: https://elatoai.com/
- Kickstarter (Pre-launch): https://www.kickstarter.com/projects/elatoai/elato-make-toys-talk-with-ai-voices
- OpenAI Cookbook Example: https://cookbook.openai.com/examples/voice_solutions/running_realtime_api_speech_on_esp32_arduino_edge_runtime_elatoai
- Hacker News Launch: https://news.ycombinator.com/item?id=43762409
- Adafruit Product Mention: https://blog.adafruit.com/2025/05/06/elatoai-realtime-speech-ai-agents-for-esp32/