ElatoAI: Realtime AI Voice Agents for ESP32 with SoTA Models

Introduction

ElatoAI runs realtime AI voice agents on Arduino ESP32 devices. It integrates state-of-the-art AI voice models: the OpenAI Realtime API, Gemini Live API, xAI Grok Voice Agent API, ElevenLabs AI Agents, and Hume AI EVI-4. Conversations can run for more than 15 minutes without interruption and are accessible globally, which makes the system a good fit for AI toys, companions, and other smart devices. The project uses secure WebSockets and Deno Edge Functions to deliver low-latency, high-quality speech-to-speech interactions.

Installation

To get started with ElatoAI, follow these key steps:

Clone the repository:

git clone git@github.com:akdeb/ElatoAI.git

Start Supabase: Install the Supabase CLI and Docker Desktop, then run supabase start from the repository root.
Set up the Next.js frontend: Navigate to frontend-nextjs, install dependencies (npm install), configure environment variables in .env.local, and run npm run dev.
Choose an edge server option: Use the hosted ElatoAI server (ELATO MODE) or run your own local Deno edge server (DEV MODE). For a local setup, navigate to server-deno, add your API keys to .env, and run deno run -A --env-file=.env main.ts.
Set up the ESP32 firmware: In firmware-arduino/Config.cpp, set ws_server and backend_server to your computer's local IP address. Build and upload the firmware.
Set up the ESP32 Wi-Fi: The ESP32 creates an ELATO-DEVICE captive portal. Connect to it and enter your Wi-Fi credentials at http://192.168.4.1.
Turn on your device: After configuration, restart the ESP32. It connects to your Wi-Fi and server, and you can start talking to your AI character.
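
The firmware step asks for a local IP address, and a common slip is to enter a loopback address such as 127.0.0.1, which on the ESP32 refers to the device itself rather than to your computer. The sketch below is a hypothetical helper, not part of ElatoAI, that checks whether a value intended for ws_server or backend_server is a private LAN IPv4 address. The function names parseIPv4, isPrivateLanAddress, and checkDeviceHost are assumptions made for illustration.

```typescript
// Hypothetical helper, not part of the ElatoAI repository.
// Checks a host value before it is pasted into firmware-arduino/Config.cpp.

/** Parse a dotted-quad IPv4 string into four octets, or return null if invalid. */
function parseIPv4(host: string): number[] | null {
  const parts = host.trim().split(".");
  if (parts.length !== 4) return null;
  const octets: number[] = [];
  for (const part of parts) {
    // Accept plain decimal digits only (no signs, spaces, or hex).
    if (!/^\d{1,3}$/.test(part)) return null;
    const value = Number(part);
    if (value > 255) return null;
    octets.push(value);
  }
  return octets;
}

/** True for the RFC 1918 private ranges: 10/8, 172.16/12, 192.168/16. */
function isPrivateLanAddress(host: string): boolean {
  const octets = parseIPv4(host);
  if (octets === null) return false;
  const a = octets[0];
  const b = octets[1];
  if (a === 10) return true;
  if (a === 172 && b >= 16 && b <= 31) return true;
  return a === 192 && b === 168;
}

/** Return a reason the host looks unsuitable, or null if it looks usable. */
function checkDeviceHost(host: string): string | null {
  const octets = parseIPv4(host);
  if (octets === null) return "not a dotted-quad IPv4 address";
  if (octets[0] === 127) return "loopback address: on the ESP32 this is the device itself";
  if (!isPrivateLanAddress(host)) return "not a private LAN address";
  return null;
}

console.log(checkDeviceHost("192.168.1.42")); // null
console.log(checkDeviceHost("127.0.0.1")); // loopback warning string
```

A null result only means the address has the right shape for a home or office network. It does not confirm that the Deno server or the frontend is reachable at that address.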

For detailed instructions, refer to the project's GitHub repository.

Examples

ElatoAI supports several realtime AI voice models, each offering a different conversational experience. The project's demo videos show the system handling complex realtime speech interactions with various AI personalities across the different providers.

Why Use ElatoAI?

ElatoAI stands out for its unique combination of features and performance:

Realtime Speech-to-Speech: Experience instant, natural conversations powered by leading AI models.
Hardware Integration: Seamlessly deploy advanced AI capabilities on affordable and widely available ESP32 microcontrollers.
Global Performance: Deno Edge Functions are used to keep latency low and interactions smooth worldwide.
Customizable Agents: Create and manage custom AI agents with distinct personalities and voices.
Comprehensive Features: Includes secure WebSockets, server VAD turn detection, Opus audio compression, conversation history, device management, OTA updates, and more.
DIY Friendly: Provides detailed instructions and hardware designs for building your own AI devices.
