ElatoAI: Realtime AI Voice Agents for ESP32 with SoTA Models

Summary
ElatoAI brings realtime AI voice agents to Arduino ESP32 devices. It integrates state-of-the-art voice models from providers such as OpenAI, Gemini, Grok, Eleven Labs, and Hume AI for uninterrupted conversations. Using secure WebSockets and Deno Edge Functions, ElatoAI delivers low-latency, high-quality speech-to-speech interactions worldwide, which suits AI toys, companions, and smart devices.
Introduction
ElatoAI enables realtime AI voice agents on Arduino ESP32 devices. It integrates state-of-the-art voice APIs: the OpenAI Realtime API, Gemini Live API, xAI Grok Voice Agent API, Eleven Labs AI Agents, and Hume AI EVI-4. The system supports over 15 minutes of uninterrupted, globally accessible conversation, making it well suited to AI toys, companions, and other smart devices. It relies on secure WebSockets and Deno Edge Functions to deliver low-latency, high-quality speech-to-speech interactions.
Installation
To get started with ElatoAI, follow these key steps:
- Clone the repository: `git clone git@github.com:akdeb/ElatoAI.git`
- Start Supabase: Install the Supabase CLI and Docker Desktop, then run `supabase start` from the root directory.
- Set up your NextJS frontend: Navigate to `frontend-nextjs`, install dependencies (`npm install`), configure environment variables in `.env.local`, and run `npm run dev`.
- Choose an edge server option: Use the hosted ElatoAI server (ELATO MODE) or run your own local Deno edge server (DEV MODE). For local setup, navigate to `server-deno`, configure `.env` with your API keys, and run `deno run -A --env-file=.env main.ts`.
- Set up the ESP32 device firmware: In `firmware-arduino/Config.cpp`, set `ws_server` and `backend_server` to your local IP address. Build and upload the firmware.
- Set up the ESP32 device Wi-Fi: The ESP32 creates an `ELATO-DEVICE` captive portal. Connect to it and configure your Wi-Fi credentials via `http://192.168.4.1`.
- Turn on your device: After configuration, restart the ESP32 so it connects to your Wi-Fi and server, enabling conversations with your AI character.
For detailed instructions, refer to the project's GitHub repository.
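A common stumbling block in the firmware step is entering a loopback address, which the ESP32 would resolve to itself rather than to your computer. The sketch below is one way to sanity-check the value before flashing. The function `isReachableLanHost` is a hypothetical helper written for illustration, not part of ElatoAI, and it assumes you are using a dotted IPv4 address as the steps above suggest.

```typescript
// Hypothetical helper (not part of ElatoAI): checks whether a host string is
// an IPv4 address that a device on the same Wi-Fi network could plausibly reach.
function isReachableLanHost(host: string): boolean {
  const h = host.trim().toLowerCase();
  // Loopback and wildcard values point back at the ESP32 itself.
  if (h === "" || h === "localhost" || h === "0.0.0.0" || h === "::1") {
    return false;
  }
  const parts = h.split(".");
  // This sketch only accepts dotted-quad IPv4 addresses.
  if (parts.length !== 4) {
    return false;
  }
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((o) => Number.isNaN(o) || o > 255)) {
    return false;
  }
  // The whole 127.0.0.0/8 range is loopback.
  return octets[0] !== 127;
}

// Example values only; substitute the address of your own machine.
console.log(isReachableLanHost("192.168.1.42")); // true
console.log(isReachableLanHost("127.0.0.1")); // false
```

The check is deliberately strict: it rejects hostnames entirely, since the setup steps ask for a local IP address. It cannot confirm that the address actually belongs to your machine.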
Examples
ElatoAI supports several AI voice providers, each offering a different conversational experience. The project's demo videos show the system handling real-time speech interactions with various AI personalities across these providers.
Why Use ElatoAI?
ElatoAI stands out for its unique combination of features and performance:
- Realtime Speech-to-Speech: Experience instant, natural conversations powered by leading AI models.
- Hardware Integration: Seamlessly deploy advanced AI capabilities on affordable and widely available ESP32 microcontrollers.
- Global Performance: Deno Edge Functions ensure low latency and smooth interactions worldwide.
- Customizable Agents: Create and manage custom AI agents with distinct personalities and voices.
- Comprehensive Features: Includes secure WebSockets, server VAD turn detection, Opus audio compression, conversation history, device management, OTA updates, and more.
- DIY Friendly: Provides detailed instructions and hardware designs for building your own AI devices.
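To give a rough sense of why the Opus audio compression listed above matters on a microcontroller's Wi-Fi link, the following back-of-the-envelope sketch compares uncompressed PCM bandwidth with a compressed stream. The sample rate, bit depth, and Opus bitrate are illustrative assumptions, not values taken from the ElatoAI source.

```typescript
// Bits per second for uncompressed PCM audio.
function pcmBitrate(sampleRateHz: number, bitsPerSample: number, channels = 1): number {
  return sampleRateHz * bitsPerSample * channels;
}

// Assumed values for illustration only; check the project for the real settings.
const ASSUMED_SAMPLE_RATE_HZ = 24000; // mono speech audio
const ASSUMED_BITS_PER_SAMPLE = 16;
const ASSUMED_OPUS_BITRATE = 12000; // bits per second

const rawBps = pcmBitrate(ASSUMED_SAMPLE_RATE_HZ, ASSUMED_BITS_PER_SAMPLE);
const ratio = rawBps / ASSUMED_OPUS_BITRATE;

console.log(`raw PCM: ${rawBps / 1000} kbit/s`); // raw PCM: 384 kbit/s
console.log(`Opus: ${ASSUMED_OPUS_BITRATE / 1000} kbit/s`); // Opus: 12 kbit/s
console.log(`reduction: ${ratio}x`); // reduction: 32x
```

Under these assumed numbers the compressed stream needs about one thirty-second of the raw bandwidth. The actual ratio depends on the bitrate and sample rate the firmware and server are configured to use.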
Links
- GitHub Repository: https://github.com/akdeb/ElatoAI
- Homepage: https://elatoai.com/
- Kickstarter (Pre-launch): https://www.kickstarter.com/projects/elatoai/elato-make-toys-talk-with-ai-voices
- OpenAI Cookbook Example: https://cookbook.openai.com/examples/voice_solutions/running_realtime_api_speech_on_esp32_arduino_edge_runtime_elatoai
- Hacker News Launch: https://news.ycombinator.com/item?id=43762409
- Adafruit Product Mention: https://blog.adafruit.com/2025/05/06/elatoai-realtime-speech-ai-agents-for-esp32/