VideoSDK AI Agents: Build Real-time Multimodal Conversational AI

Summary
VideoSDK AI Agents is an open-source Python framework for developing real-time, multimodal conversational AI agents. It enables natural voice and media interactions between users and intelligent agents within VideoSDK rooms. The framework integrates with AI models from providers such as OpenAI and Gemini, and agents can be extended with custom function tools.
Introduction
The framework provides the infrastructure to connect your agent worker, VideoSDK room, and user devices, enabling natural voice and media interactions between users and intelligent agents. It is built on top of the VideoSDK Python SDK, which lets AI-powered agents join VideoSDK rooms as participants.
Key features include:
- Real-time Communication (Audio/Video): Agents can listen, speak, and interact live in meetings.
- SIP & Telephony Integration: Connect agents to phone systems via SIP for call handling and routing.
- Virtual Avatars: Enhance interaction and presence with lifelike avatars using Simli.
- Multi-Model Support: Integrate with leading AI models like OpenAI, Gemini, AWS NovaSonic, and more.
- Cascading and Realtime Pipelines: Flexible pipeline options for STT, LLM, and TTS.
- Function Tools: Extend agent capabilities with custom functions for event scheduling, data retrieval, and more.
- Observability: Built-in OpenTelemetry tracing and metrics collection.
- CLI Tool: Run and test agents locally with the videosdk CLI.
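To make the cascading pipeline idea concrete, the following sketch chains three stages (speech-to-text, language model, text-to-speech) as plain Python callables. It does not use the videosdk-agents API: CascadingSketch and the stub stages are hypothetical names chosen for illustration, and real stages would call provider plugins and stream audio rather than pass whole byte strings.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; not part of the videosdk-agents API.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class CascadingSketch:
    """Runs one conversational turn through STT -> LLM -> TTS in sequence."""

    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)  # audio -> text
        reply = self.llm(transcript)     # text -> text
        return self.tts(reply)           # text -> audio


# Stub stages that stand in for real providers.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadingSketch(stt=fake_stt, llm=fake_llm, tts=fake_tts)
audio_out = pipeline.handle_turn(b"hello")
```

Because each stage is a separate callable, any one of them can be swapped without touching the others. A realtime pipeline would presumably hand the audio to a single speech-to-speech model instead of three separate stages.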
Installation
To get started with VideoSDK AI Agents, follow these steps:
Prerequisites
Before you begin, ensure you have:
- A VideoSDK authentication token (generate from app.videosdk.live)
- A VideoSDK meeting ID (generate using the Create Room API or dashboard)
- Python 3.12 or higher
- Third-Party API Keys for services like OpenAI, ElevenLabs, Google Gemini, etc.
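The prerequisites above can be checked before starting an agent. The sketch below is one way to do that; the environment variable names VIDEOSDK_AUTH_TOKEN and OPENAI_API_KEY are assumptions made for this example, so confirm the names each provider actually expects.

```python
import os
import sys
from typing import Mapping, Sequence

# Assumed environment variable names for this sketch; check each provider's
# documentation for the names it actually expects.
REQUIRED_VARS = ("VIDEOSDK_AUTH_TOKEN", "OPENAI_API_KEY")
MIN_PYTHON = (3, 12)


def missing_prerequisites(
    env: Mapping[str, str],
    python_version: Sequence[int],
    required_vars: Sequence[str] = REQUIRED_VARS,
) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.12 or higher is required")
    for name in required_vars:
        if not env.get(name):
            problems.append(f"environment variable {name} is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(os.environ, sys.version_info):
        print(problem)
```

Passing the environment and version in as arguments keeps the check easy to test without modifying the real process environment.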
Steps
- Create and activate a virtual environment with Python 3.12 or higher:
    python3 -m venv venv
    source venv/bin/activate
  (For Windows, use python -m venv venv and venv\Scripts\activate.)
- Install the core VideoSDK AI Agents package:
    pip install videosdk-agents
- Install optional plugins. Plugins integrate different providers for Realtime, STT, LLM, TTS, VAD, Avatar, and SIP. Install what your use case needs.
    # Example: install the Turn Detector plugin
    pip install videosdk-plugins-turn-detector
  You can also install with specific plugins:
    pip install videosdk-agents[openai,elevenlabs,silero]
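To confirm that the packages landed in the active virtual environment, a short script can query the installed distribution metadata. This is a minimal sketch using only the standard library; installed_version is a helper name introduced here, and the distribution names are the ones used in the pip commands in this section.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Distribution names taken from the pip commands in this section.
for name in ("videosdk-agents", "videosdk-plugins-turn-detector"):
    version = installed_version(name)
    print(f"{name}: {version or 'not installed'}")
```

Run it with the virtual environment activated; a "not installed" line usually means pip ran against a different interpreter.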
Examples
The framework offers various examples to demonstrate its capabilities and common use cases:
- AI Telephony Agent Quickstart: A hospital appointment booking agent that works over voice.
- AI WhatsApp Agent Quickstart: An agent that answers questions about available hotel rooms and takes bookings on the go.
- Multi Agent System: A customer care agent that transfers loan-related queries to a Loan Specialist Agent.
- Agent with Knowledge (RAG): An agent that answers questions based on documentation knowledge.
- Virtual Avatar Agent: A virtual avatar agent that presents weather forecasts.
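The hand-off in the Multi Agent System example can be pictured as a routing decision made on each user utterance. The sketch below is a deliberately simple, hypothetical version that routes on keywords; it is not taken from the example's source, and a real agent would more likely let the language model decide when to transfer.

```python
from dataclasses import dataclass

# Hypothetical keyword list used only for this illustration.
LOAN_KEYWORDS = ("loan", "mortgage", "interest rate")


@dataclass(frozen=True)
class Route:
    agent: str
    reason: str


def route_query(text: str) -> Route:
    """Pick which agent should handle a user utterance."""
    lowered = text.lower()
    for keyword in LOAN_KEYWORDS:
        if keyword in lowered:
            return Route("loan_specialist", f"matched keyword '{keyword}'")
    return Route("customer_care", "no loan keyword found")
```

For instance, route_query("I want to apply for a home loan") selects the loan specialist, while a question about a debit card stays with the customer care agent.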
Why Use VideoSDK AI Agents?
VideoSDK AI Agents combines real-time media, model integrations, and agent tooling in a single framework. Its key advantages include:
- Real-time, Natural Interactions: Facilitate seamless, low-latency voice and multimodal conversations, making agent interactions feel more human-like.
- Extensive AI Model Integration: Support for a wide array of Real-time, STT, LLM, and TTS providers, offering flexibility and choice for your AI stack.
- Flexible Pipeline Architecture: Choose between Cascading and Realtime pipelines to optimize for latency or complexity based on your application's needs.
- Powerful Function Tools: Easily extend agent intelligence with custom tools, allowing agents to perform actions, retrieve data, and interact with external systems.
- Telephony and Virtual Avatar Support: Integrate agents into traditional phone systems via SIP and enhance user engagement with virtual avatars.
- Open-Source and Pythonic: Leverage the power and flexibility of Python with an open-source framework, fostering community contributions and transparency.
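The function tools idea mentioned above can be illustrated with a small registry: functions are registered under their names, described to a model, and dispatched when the model asks for them. This is a sketch under stated assumptions, not the framework's own mechanism; TOOLS, tool, describe, call_tool, and get_room_price are all hypothetical names, and the prices are made up.

```python
import inspect
from typing import Any, Callable

# Hypothetical registry; the framework's own tool mechanism may differ.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so an agent loop can look it up by name."""
    TOOLS[func.__name__] = func
    return func


@tool
def get_room_price(room_type: str, nights: int) -> int:
    """Return a made-up total price for a stay."""
    nightly = {"single": 80, "double": 120}[room_type]
    return nightly * nights


def describe(name: str) -> dict[str, Any]:
    """Build a simple description a model could use to choose a tool."""
    func = TOOLS[name]
    params = list(inspect.signature(func).parameters)
    return {"name": name, "doc": inspect.getdoc(func), "params": params}


def call_tool(name: str, arguments: dict[str, Any]) -> Any:
    """Dispatch a model-issued call of the form {name, arguments}."""
    return TOOLS[name](**arguments)


result = call_tool("get_room_price", {"room_type": "double", "nights": 3})
```

Keeping the description and the dispatch separate mirrors how tool calling typically works: the model sees only the description, and the host process executes the function.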
Links
- GitHub Repository: https://github.com/videosdk-live/agents
- Official Documentation: https://docs.videosdk.live/ai_agents/introduction
- API Reference: https://docs.videosdk.live/agent-sdk-reference/agents/
- Discord Community: Join Us on Discord
- Twitter: Follow @video_sdk
- YouTube Channel: VideoSDK on YouTube
- LinkedIn: VideoSDK Company Page