Overview
VoiceBot is the standard agent for building real-time, speech-based conversational experiences in MeshAgent. It builds on the Python SingleRoomAgent base class from meshagent-agents and adds streaming audio input/output, LiveKit session management, speech recognition, and natural voice responses.
A VoiceBot joins a MeshAgent room, listens for voice_call messages, and connects to a LiveKit breakout room where it can speak with and listen to participants in real time. It automatically combines speech-to-text (STT), text-to-speech (TTS), voice activity detection (VAD), LLM reasoning, and tool calling.
The MeshAgent CLI is the recommended way to run and deploy VoiceBots. Configure speech, tools, and rules with CLI flags, then deploy the same configuration when you want the agent to stay available in a room.
In this guide you will learn:
- When to use VoiceBot
- How to run and deploy a VoiceBot with the MeshAgent CLI
- How VoiceBot works, including its lifecycle, voice sessions, conversation flow, hooks, and methods
When to use VoiceBot
Use the VoiceBot when you need an agent that:
- Talks and listens in real time using speech
- Manages live voice sessions automatically via LiveKit
- Runs LLM reasoning and tools during spoken interaction
- Supports natural interruptions and turn-taking
- Should feel like a phone call or meeting assistant rather than a text-based chat
Run and deploy a VoiceBot with the CLI
Step 1: Run a VoiceBot from the CLI
Let’s run a VoiceBot from the CLI with a base rule and a shared rules file that anyone in the room can edit. The room rules can change from one conversation turn to the next, while the base rule applies to the entire conversation.
bash
When you pass the --room-rules "agents/voiceagent/rules.md" flag, you supply a file path for the rules. The path is relative to room storage, and the file is created if it does not already exist.
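The split between a fixed base rule and an editable room rules file can be sketched as follows. This is a minimal model, not MeshAgent's implementation: it assumes the rules file holds one rule per line, and compose_rules and the read_room_rules callable are hypothetical names. In MeshAgent the file lives in room storage rather than in a Python dict.

```python
from typing import Callable

# Hypothetical base rule, fixed for the whole conversation.
BASE_RULES = ["You are a helpful assistant communicating through voice."]


def compose_rules(base_rules: list[str], read_room_rules: Callable[[], str]) -> list[str]:
    """Build the rule list for one conversation turn.

    The base rules never change; the room rules are re-read on every call,
    so an edit to the shared file takes effect on the next turn.
    """
    room_text = read_room_rules()
    # Assumption: each non-empty line of the rules file is one rule.
    room_rules = [line.strip() for line in room_text.splitlines() if line.strip()]
    return [*base_rules, *room_rules]


# Simulate a shared rules file that someone edits between two turns.
shared_file = {"text": "Keep answers short.\n"}

turn_1 = compose_rules(BASE_RULES, lambda: shared_file["text"])
shared_file["text"] += "Speak in French.\n"
turn_2 = compose_rules(BASE_RULES, lambda: shared_file["text"])

print(turn_1)
print(turn_2)
```

Re-reading on every turn is what lets room participants adjust the agent mid-conversation without restarting it.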
Step 2: Interact with the agent in MeshAgent Studio
- Go to MeshAgent Studio and log in
- Enter your room, quickstart
- Select the agent voiceagent and begin speaking!
Because you passed the --room-rules flag to your agent, you can modify the agent’s rules.md file to refine its behavior. Changes to rules.md are applied on each new message.
Tip: Mute your microphone after you finish speaking to prevent background noise from interfering with the agent.
Step 3: Package and deploy the agent
Once your agent works locally, package and deploy it as a project or room service to make it always available. You can do this with the CLI, with a YAML file, or from MeshAgent Studio. The two CLI options below deploy the same VoiceBot; choose based on your workflow:
- Option 1 (meshagent voicebot deploy): One command that deploys immediately (the fastest and easiest approach)
- Option 2 (meshagent voicebot spec + meshagent service create): Generates a YAML file you can review or further customize before deploying the VoiceBot to your room
bash
For Option 2, create a meshagent.yaml file that defines how the service should run, then deploy the agent to the room. You can generate the service spec from the CLI by running:
bash
Then create the service from the meshagent.yaml file:
bash
The VoiceBot is now deployed to the quickstart room, and the agent will always be available in the room to chat with. You can interact with the agent directly from Studio or from Powerboards. Powerboards lets you share your agents with others or use built-in agents.
How VoiceBot Works
Constructor Parameters
The CLI configures the underlying Python VoiceBot class. If you extend that class directly, it accepts the current SingleRoomAgent base options and adds voice-specific configuration options.
| Parameter | Type | Description |
|---|---|---|
| name | str \| None | Optional explicit participant name. Most CLI and deployment flows set the participant identity outside the class. |
| title | str \| None | Human-friendly name for clients, logs, and operator surfaces. |
| description | str \| None | Short description of what the voice agent does. |
| annotations | list[str] \| None | Optional string metadata for clients and services that inspect the agent. |
| voice | str | OpenAI Realtime voice name. Defaults to "echo". |
| rules | list[str] \| None | System or behavior rules sent to the LLM. Defaults to ["You are a helpful assistant communicating through voice."]. |
| auto_greet_message | str \| None | Optional message spoken to the participant when the session starts. |
| auto_greet_prompt | str \| None | Optional text prompt that seeds the first LLM response at session start. |
| tool_adapter | ToolResponseAdapter \| None | Optional adapter that converts tool responses into plain speech. |
| toolkits | list[Toolkit] \| None | Additional toolkits to expose beyond requires; pass instantiated toolkits you want available to the LLM. |
| requires | list[Requirement] \| None | Toolkits or schemas needed before running (for example, for tool access or shared data). |
| client_rules | dict[str, list[str]] \| None | Optional map keyed by the participant’s client attribute; when the attribute matches a key, that key's rules are appended. |
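How rules and client_rules could combine for one participant is sketched below. This is an illustrative model under stated assumptions, not the VoiceBot source: resolve_rules is a hypothetical helper, it assumes the default rule applies only when rules is None, and the client values "phone" and "web" are made up for the example.

```python
from typing import Optional

# Default taken from the parameter table above.
DEFAULT_RULES = ["You are a helpful assistant communicating through voice."]


def resolve_rules(
    rules: Optional[list[str]],
    client_rules: Optional[dict[str, list[str]]],
    client: Optional[str],
) -> list[str]:
    """Return the rules for a participant whose client attribute is `client`."""
    # Assumption: the default rule is used only when no rules are supplied.
    resolved = list(rules) if rules is not None else list(DEFAULT_RULES)
    if client_rules and client is not None:
        # Matching client: append its extra rules; no match: append nothing.
        resolved.extend(client_rules.get(client, []))
    return resolved


extra = {"phone": ["Do not read out URLs."]}
print(resolve_rules(None, None, None))
print(resolve_rules(["Be brief."], extra, "phone"))
print(resolve_rules(["Be brief."], extra, "web"))
```

Appending rather than replacing keeps the base behavior identical across clients while letting each client type add its own constraints.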
Lifecycle Overview
VoiceBot inherits all lifecycle hooks from SingleRoomAgent and adds voice session handling on top.
- await start(room: RoomClient): Registers the bot as a voice-capable participant by setting "supports_voice": True and listening for voice_call messages. When a voice call is received, the bot joins a LiveKit breakout room and starts a real-time session.
- await stop(): Clears the room reference. Active voice sessions end when the LiveKit room disconnects (for example, when the caller hangs up).
- room property: Returns the active RoomClient (inherited from SingleRoomAgent).
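The start/stop contract can be modeled with a stand-in room object. Everything here is hypothetical: FakeRoom is not the MeshAgent RoomClient, set_attribute and on_message are invented helper names, and the "breakout_room" message key is an assumption. The sketch only shows the order of operations: advertise voice support, listen for voice_call messages, and clear the room reference on stop.

```python
import asyncio
from typing import Any, Callable, Optional


class FakeRoom:
    """Hypothetical stand-in for RoomClient: stores attributes and message handlers."""

    def __init__(self) -> None:
        self.attributes: dict[str, Any] = {}
        self.handlers: list[Callable[[dict], None]] = []

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value

    def on_message(self, handler: Callable[[dict], None]) -> None:
        self.handlers.append(handler)


class VoiceBotLifecycleSketch:
    def __init__(self) -> None:
        self._room: Optional[FakeRoom] = None
        self.calls: list[dict] = []  # voice_call messages received so far

    @property
    def room(self) -> Optional[FakeRoom]:
        return self._room

    async def start(self, room: FakeRoom) -> None:
        self._room = room
        room.set_attribute("supports_voice", True)  # advertise voice capability
        room.on_message(self._handle_message)       # listen for voice_call

    async def stop(self) -> None:
        self._room = None  # only the reference is cleared here

    def _handle_message(self, message: dict) -> None:
        # Other message types are ignored by this voice-only sketch.
        if message.get("type") == "voice_call":
            self.calls.append(message)


async def main() -> tuple[FakeRoom, VoiceBotLifecycleSketch]:
    room = FakeRoom()
    bot = VoiceBotLifecycleSketch()
    await bot.start(room)
    for handler in room.handlers:
        handler({"type": "chat", "text": "hi"})
        handler({"type": "voice_call", "breakout_room": "call-1"})
    await bot.stop()
    return room, bot


room, bot = asyncio.run(main())
print(room.attributes, len(bot.calls), bot.room)
```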
Conversational Flow
When a user starts a voice call:
- The participant sends a message of type "voice_call" with a breakout room ID (and optionally a transcript_path).
- The VoiceBot receives it through on_message() and joins the corresponding LiveKit room.
- It creates a VoiceBotContext scoped to that participant.
- A new AgentSession is created, containing:
  - Speech-to-Text (STT)
  - Text-to-Speech (TTS)
  - Voice Activity Detection (VAD)
  - LLM model interface
- A conversational Agent is built with these components and any registered tools.
- The bot begins listening, thinking (with background “typing” sounds), and speaking responses in real time.
If a transcript_path is provided in the voice_call message, the session uses a Transcriber to log conversation_item_added events to that path while running.
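The flow above can be summarized as "parse the message, then run a fixed sequence of steps." The sketch below is hypothetical: VoiceCallRequest, parse_voice_call, and plan_session are illustrative names, and the "breakout_room" key is an assumption about the message shape. It only shows that a message without a transcript_path skips the transcript-logging step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceCallRequest:
    """Hypothetical parsed form of a voice_call message."""
    breakout_room: str
    transcript_path: Optional[str] = None


def parse_voice_call(message: dict) -> Optional[VoiceCallRequest]:
    """Return a request for voice_call messages and None for anything else."""
    if message.get("type") != "voice_call":
        return None
    room_id = message.get("breakout_room")  # assumed key name
    if not room_id:
        raise ValueError("voice_call message is missing a breakout room ID")
    return VoiceCallRequest(room_id, message.get("transcript_path"))


def plan_session(request: VoiceCallRequest) -> list[str]:
    """List the conversational-flow steps, in order, for one call."""
    steps = [
        f"join LiveKit room {request.breakout_room}",
        "create VoiceBotContext for the participant",
        "create AgentSession with STT, TTS, VAD, and LLM",
        "build Agent with rules and registered tools",
    ]
    if request.transcript_path is not None:
        # Only calls that supply a transcript_path get transcript logging.
        steps.append(f"log conversation_item_added events to {request.transcript_path}")
    steps.append("listen, think, and speak in real time")
    return steps


request = parse_voice_call(
    {"type": "voice_call", "breakout_room": "call-1", "transcript_path": "calls/call-1.log"}
)
for step in plan_session(request):
    print(step)
```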
Key Behaviors and Hooks
- Voice connection management: Each session is isolated in its own LiveKit breakout room. The internal VoiceConnection helper handles joining, connecting, and disconnecting from the session safely.
- Session creation: create_session() constructs an AgentSession with STT, TTS, VAD, and LLM components wired to the room’s proxy API.
- Agent creation: create_agent() builds the conversational logic layer that uses the LLM, applies your rules, and exposes tool functions like say().
- Greeting behavior: When configured, auto_greet_prompt triggers the LLM to generate an initial spoken message, and auto_greet_message plays a prewritten greeting.
- Tool integration: Tools are automatically converted into callable functions for the LLM via make_function_tools(). Responses are adapted to speech when a tool_adapter is provided.
- Transcript logging: When a transcript_path is supplied, create_agent() returns a Transcriber that logs conversation items to that location.
- Lifecycle hooks: Override on_session_created(), on_session_started(), or on_session_ended() to add custom logic around the session lifecycle.
- Interruptions and natural flow: Sessions allow interruption mid-speech and handle turn-taking automatically through VAD.
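The three lifecycle hooks fire at fixed points around a session, which this template-method sketch makes explicit. It is a hypothetical model: SessionLifecycleSketch, run_session, and the dict used as a session are illustrative, not VoiceBot internals. It does follow the hook semantics described here: created fires before the session starts, started right after, and ended after disconnect (even if waiting fails, because of the finally block).

```python
import asyncio
from typing import Any, Awaitable, Callable


class SessionLifecycleSketch:
    """Hypothetical template showing where the three session hooks fire."""

    async def on_session_created(self, context: Any, session: dict) -> None:
        pass

    async def on_session_started(self, context: Any, session: dict) -> None:
        pass

    async def on_session_ended(self, context: Any, session: dict) -> None:
        pass

    async def run_session(
        self, context: Any, session: dict, wait_for_disconnect: Callable[[], Awaitable[None]]
    ) -> None:
        await self.on_session_created(context, session)  # built, not yet started
        session["started"] = True                         # stand-in for starting the session
        await self.on_session_started(context, session)
        try:
            await wait_for_disconnect()                   # e.g. until the caller hangs up
        finally:
            await self.on_session_ended(context, session)


class LoggingBot(SessionLifecycleSketch):
    """Override the hooks to record when each one runs."""

    def __init__(self) -> None:
        self.events: list[tuple[str, bool]] = []

    async def on_session_created(self, context: Any, session: dict) -> None:
        self.events.append(("created", session["started"]))

    async def on_session_started(self, context: Any, session: dict) -> None:
        self.events.append(("started", session["started"]))

    async def on_session_ended(self, context: Any, session: dict) -> None:
        self.events.append(("ended", session["started"]))


async def main() -> list[tuple[str, bool]]:
    bot = LoggingBot()
    await bot.run_session({"participant": "alice"}, {"started": False}, lambda: asyncio.sleep(0))
    return bot.events


events = asyncio.run(main())
print(events)
```

on_session_created is the place to adjust a session before any audio flows; on_session_started is where a custom greeting or logging would naturally go.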
Key Methods
| Method | Description |
|---|---|
async def start(room) | Registers the bot for voice calls and listens for voice_call messages. |
async def run_voice_agent(participant, breakout_room) | Connects to the specified LiveKit room and starts a full voice session. |
async def create_session(context) | Creates a new AgentSession wired with STT, TTS, VAD, and LLM capabilities. |
async def create_agent(context, session) | Builds a conversational agent with your rules and available tools. |
async def make_function_tools(context) | Converts all registered toolkits into LLM-callable functions. |
async def _wait_for_disconnect(room) | Awaits the end of the voice call and cleans up resources. |
async def on_session_created(context, session) | Hook called after an AgentSession is constructed but before it starts. |
async def on_session_started(context, session) | Hook called immediately after the session starts. |
async def on_session_ended(context, session) | Hook called after the session ends. |
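The combination of make_function_tools() and tool_adapter can be illustrated with plain functions. This is a simplified sketch with hypothetical names (Toolkit here is a minimal stand-in class, and make_function_tools_sketch and the "toolkit.tool" naming scheme are invented); the real tool_adapter is a ToolResponseAdapter object, modeled below as a simple function from result to text.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Optional

ToolFn = Callable[..., Any]
SpokenFn = Callable[..., Awaitable[str]]


class Toolkit:
    """Hypothetical toolkit: a name plus a mapping of tool names to plain functions."""

    def __init__(self, name: str, tools: dict[str, ToolFn]) -> None:
        self.name = name
        self.tools = tools


def _wrap(fn: ToolFn, tool_adapter: Optional[Callable[[Any], str]]) -> SpokenFn:
    # A separate factory avoids the late-binding closure bug inside the loop below.
    async def call(**kwargs: Any) -> str:
        result = fn(**kwargs)
        if tool_adapter is not None:
            return tool_adapter(result)  # turn the raw result into speakable text
        return json.dumps(result)        # fallback: give the LLM the raw result as JSON
    return call


def make_function_tools_sketch(
    toolkits: list[Toolkit], tool_adapter: Optional[Callable[[Any], str]] = None
) -> dict[str, SpokenFn]:
    """Flatten every toolkit into one name -> async callable map for the LLM."""
    functions: dict[str, SpokenFn] = {}
    for toolkit in toolkits:
        for tool_name, fn in toolkit.tools.items():
            functions[f"{toolkit.name}.{tool_name}"] = _wrap(fn, tool_adapter)
    return functions


weather = Toolkit("weather", {"forecast": lambda city: {"city": city, "high_c": 21}})


def speak_forecast(result: dict) -> str:
    return f"It will reach {result['high_c']} degrees in {result['city']}."


plain = make_function_tools_sketch([weather])
spoken = make_function_tools_sketch([weather], tool_adapter=speak_forecast)
print(asyncio.run(plain["weather.forecast"](city="Oslo")))
print(asyncio.run(spoken["weather.forecast"](city="Oslo")))
```

Adapting results before they reach the LLM matters more for voice than for text: raw JSON read aloud sounds unnatural, while a short sentence can be spoken as-is.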
Built-in Components and Behavior
VoiceBot comes pre-integrated with:
- Speech-to-Text (STT): Converts live audio input into text using OpenAI STT via the room’s proxied client (override create_session to swap providers).
- Text-to-Speech (TTS): Streams generated responses as natural audio output using OpenAI TTS.
- Voice Activity Detection (VAD): Detects pauses and user interruptions automatically (Silero VAD).
- Room I/O defaults: Text input is disabled; audio output and transcription are enabled for the LiveKit room session.
- Background Audio Player: Plays gentle “thinking” keyboard sounds during LLM processing to indicate the bot is working.
- Tool Invocation: Voice commands can trigger registered tools and return spoken responses.
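The "thinking" sound behavior follows a common asyncio pattern: run a background task while awaiting the LLM, then cancel it before speaking. The sketch below is an illustration of that pattern only, not the Background Audio Player's code; play_thinking_sound, respond_with_thinking_sound, and fake_llm are hypothetical names, and "tick" entries in a list stand in for audio playback.

```python
import asyncio
from typing import Awaitable, Callable


async def play_thinking_sound(log: list[str], interval: float) -> None:
    """Stand-in for audio playback: record a tick until cancelled."""
    while True:
        log.append("tick")
        await asyncio.sleep(interval)


async def respond_with_thinking_sound(
    llm_call: Callable[[], Awaitable[str]], log: list[str], interval: float = 0.01
) -> str:
    player = asyncio.create_task(play_thinking_sound(log, interval))
    try:
        reply = await llm_call()  # the sound plays while the LLM is working
    finally:
        player.cancel()           # stop the sound before the bot speaks
        try:
            await player
        except asyncio.CancelledError:
            pass
    log.append(f"say: {reply}")
    return reply


async def fake_llm() -> str:
    await asyncio.sleep(0.05)  # simulated LLM latency
    return "Here is your answer."


log: list[str] = []
reply = asyncio.run(respond_with_thinking_sound(fake_llm, log))
print(log)
```

Cancelling in a finally block ensures the sound stops even if the LLM call raises, so the bot never keeps "typing" after a failure.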
Next Steps
VoiceBot builds directly on SingleRoomAgent, inheriting connection management and toolkit installation, and extends it with real-time audio and speech features. Where process-backed agents focus on text and background channels, VoiceBot handles spoken conversations end-to-end.
Discover other agents in MeshAgent:
- Process Agents Overview: Text-based and multi-channel process-backed agents.
- Agents Overview: How agents, channels, threads, and tools fit together.
- Service YAML: Write service manifests for voice agents.
- Secrets and Credentials: Learn how to store credentials securely for deployment.