Overview
VoiceBot is the standard agent for building real-time, speech-based conversational experiences in MeshAgent. It builds on SingleRoomAgent and adds streaming audio input/output, LiveKit session management, speech recognition, and natural voice responses.
A VoiceBot joins a MeshAgent room, listens for voice_call messages, and connects to a LiveKit breakout room where it can speak and listen to participants in real time. It combines speech-to-text (STT), text-to-speech (TTS), voice activity detection (VAD), LLM reasoning, and tool calling automatically.
Two ways to build a VoiceBot
- CLI: Run production-ready voice agents with a single command. Configure speech, tools, and rules using CLI flags. Ideal for most use cases.
- SDK: Extend the base `VoiceBot` with custom code when you need deeper integrations or specialized behaviors. Best for full control or more complex logic.
In this guide you will learn
- When to use `VoiceBot`
- How to run and deploy a `VoiceBot` with the MeshAgent CLI
- How to build and deploy a `VoiceBot` with the MeshAgent SDK
- How `VoiceBot` works, including lifecycle, voice sessions, conversation flow, hooks, and methods
When to use VoiceBot
Use the VoiceBot when you need an agent that:
- Talks and listens in real time using speech
- Manages live voice sessions automatically via LiveKit
- Runs LLM reasoning and tools during spoken interaction
- Supports natural interruptions and turn-taking
- Feels like a phone call or meeting assistant rather than a text-based chat
Run and deploy a VoiceBot with the CLI
Step 1: Run a VoiceBot from the CLI
Let's run a VoiceBot from the CLI with a custom base rule plus shared room rules that anyone in the room can edit. The room rules can be modified per conversation turn, while the base rule applies to the entire conversation.
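A sketch of the command is shown below; the `--room-rules` flag comes from this guide, while the subcommand and the remaining flag names are assumptions that may differ across CLI versions.

```bash
# Illustrative sketch -- verify flags against `meshagent voicebot --help`.
meshagent voicebot join \
  --room=quickstart \
  --agent-name=voiceagent \
  --rule="You are a friendly voice assistant." \
  --room-rules="agents/voiceagent/rules.txt"
```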
Pass the `--room-rules "agents/voiceagent/rules.txt"` flag to supply a file path for the shared rules. The file will be created if it does not already exist, and the path is relative to room storage.
Step 2: Interact with the agent in MeshAgent Studio
- Go to MeshAgent Studio and log in
- Enter your room `quickstart`
- Select the agent `voiceagent` and begin speaking!
Note: Because you passed the `--room-rules` flag to your agent, you can modify the agent's rules.txt file to refine the agent's behavior. Changes to rules.txt are applied per message.
Tip: Mute your microphone after you finish speaking to prevent background noise from interfering with the agent.
Step 3: Package and deploy the agent
Once your agent works locally, package and deploy it as a project or room service to make it always available. You can do this using the CLI, by creating a YAML file, or from MeshAgent Studio. Both options below deploy the same VoiceBot; choose based on your workflow:
- Option 1 (`meshagent voicebot deploy`): One command that deploys immediately (the fastest, easiest approach)
- Option 2 (`meshagent voicebot spec` + `meshagent service create`): Generates a YAML file you can review or further customize before deploying
Option 1 deploys the VoiceBot to your room with a single command:
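A sketch of that command is below; flags other than the subcommand are assumptions.

```bash
# Illustrative -- deploys the same agent configuration as the run command above.
meshagent voicebot deploy \
  --room=quickstart \
  --agent-name=voiceagent \
  --rule="You are a friendly voice assistant." \
  --room-rules="agents/voiceagent/rules.txt"
```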
Option 2 first creates a meshagent.yaml file that defines how the service should run, then deploys the agent to the room. The service spec can be generated dynamically from the CLI by running:
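A sketch; how the spec is emitted (stdout versus an output flag) is an assumption.

```bash
# Illustrative -- capture the generated spec into meshagent.yaml.
meshagent voicebot spec \
  --agent-name=voiceagent \
  --rule="You are a friendly voice assistant." > meshagent.yaml
```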
Then create the service from the generated meshagent.yaml file:
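A sketch with assumed flag names:

```bash
# Illustrative -- create the room service from the spec.
meshagent service create --room=quickstart --file=meshagent.yaml
```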
Your VoiceBot is now deployed to the quickstart room! The agent will always be available inside the room for you to chat with. You can interact with the agent directly from the Studio or from Powerboards. Powerboards make it easy to share your agents with others or use built-in agents.
Build and deploy a VoiceBot with the SDK
Step 1: Create a VoiceBot agent
This example shows a VoiceBot with a custom rule to guide the agent's behavior. For an agent this simple, the CLI VoiceBot would be sufficient; the Python SDK code here demonstrates how to achieve the same functionality in code.
To run the VoiceBot we’ll use the MeshAgent ServiceHost. The ServiceHost is a lightweight HTTP server that allows you to register one or more tools or agents on their own path (e.g., /agent). The host automatically exposes each path as a webhook. When a room makes a call to that path, ServiceHost handles the handshake, connects the agent to the room, and forwards requests and responses between your code and the MeshAgent infrastructure.
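Putting this together, a minimal sketch is shown below. The import paths and the ServiceHost registration API are assumptions based on the MeshAgent Python SDK; check the SDK reference for exact names.

```python
# simple_voicebot.py -- a minimal sketch. Import paths and the ServiceHost
# registration API are assumptions; check the MeshAgent SDK reference.
import asyncio

from meshagent.api.services import ServiceHost   # assumed import path
from meshagent.agents.voice import VoiceBot      # assumed import path

service = ServiceHost()


# Register the agent on its own path; the room calls this path as a webhook.
@service.path("/agent")
class SimpleVoiceBot(VoiceBot):
    def __init__(self):
        super().__init__(
            title="voiceagent",
            description="A friendly voice assistant",
            rules=[
                "You are a friendly voice assistant.",
                "Keep replies short and conversational.",
            ],
        )


# Assumed async entry point for the HTTP server.
asyncio.run(service.run())
```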
Step 2: Call the agent into a room
Run the VoiceBot locally and connect it to a Room:
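One plausible shape of the command (the exact invocation is an assumption; check `meshagent service run --help`):

```bash
# Illustrative -- start the ServiceHost and call the agent into the room.
meshagent service run "python simple_voicebot.py" --room=quickstart
```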
The agent is registered at the path `/agent`. If you are running multiple agents or tools, you can use the same ServiceHost and set a different path for each agent. The `meshagent service run` command automatically detects the different agent paths and identities (this is the recommended way to test your agents and tools).
Once the agent joins the room, you can converse with it in MeshAgent Studio.
Step 3: Interact with the agent in MeshAgent Studio
- Go to MeshAgent Studio and log in
- Enter your room `quickstart`
- Select the agent `voiceagent` and begin chatting!
To apply changes, press Ctrl+C in the terminal to stop the agent, then re-run the `meshagent service run` command.
Note: Building an agent usually takes multiple rounds of iteration, rewriting the system prompt and refining the agent's tools, before it's ready for deployment.
Step 4: Package and deploy the agent
To deploy your SDK VoiceBot permanently, you'll package your code with a meshagent.yaml file that defines the service configuration and a container image that MeshAgent can run.
For full details on the service spec and deployment flow, see Packaging Services and Deploying Services.
MeshAgent supports two deployment patterns for containers:
- Runtime image + code mount (recommended): Use a pre-built MeshAgent runtime image (like `python-sdk-slim`) that contains Python and all MeshAgent dependencies, and mount your lightweight code-only image on top. This keeps your code image tiny (a few KB), eliminates dependency installation time, and allows your service to start quickly.
- Single image: Bundle your code and all dependencies into one image. This is useful when you need to install additional libraries, but can result in larger images and slower pulls. If you build your own images, we recommend optimizing them with eStargz.
MeshAgent provides a pre-built `python-docs-examples` code image so you can run the documentation sample without building your own image.
If you want to build and push your own code image, follow the steps below and update the storage.images entry in meshagent.yaml.
Prepare your project structure
This example organizes the agent code and configuration in the same folder, making each agent self-contained:
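An illustrative layout (the folder names are assumptions based on this example):

```
project-root/
├── Dockerfile              # one Dockerfile shared by all samples
├── simple_voicebot/
│   ├── simple_voicebot.py  # the agent code
│   └── meshagent.yaml      # the service spec for this agent
└── another_sample/
    └── ...
```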
Note: If you’re building a single agent, you only need the simple_voicebot/ folder. The structure shown supports multiple samples sharing one Dockerfile.
Step 4a: Build a Docker container
If you want a code-only image, create a scratch Dockerfile and copy the files you want to run. This creates a minimal image that pairs with the runtime image + code mount pattern.
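A minimal sketch of such a Dockerfile, assuming you build from the project root as described below:

```dockerfile
# Code-only image: no OS or runtime, just your source files.
FROM scratch
COPY . /
```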
Build and push the image with `docker buildx`:
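For example (replace the tag with your own registry path):

```bash
# Illustrative -- build and push the code-only image from the project root.
docker buildx build \
  --platform linux/amd64 \
  -t <your-registry>/voicebot-example:latest \
  --push .
```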
Note: Building from the project root copies your entire project structure into the image. For a single agent, this is fine - your image will just contain one folder. For multi-agent projects, all agents will be in one image, but each can deploy independently using its own meshagent.yaml.
Step 4b: Package the agent
Define the service configuration in a meshagent.yaml file.
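A heavily hedged sketch of the file's shape is shown below. Only the `storage.images` key, the runtime and example image names, the `/src` mount, and the command come from this guide; the surrounding key names are assumptions, so see Packaging Services for the real schema.

```yaml
# Sketch only -- key names are assumptions except storage.images, the mount
# path, the command, and the image names, which come from this guide.
name: voiceagent
image: python-sdk-slim                       # pre-built MeshAgent runtime image
command: python /src/simple_voicebot/simple_voicebot.py
storage:
  path: /src                                 # where the code image is mounted
  images:
    - us-central1-docker.pkg.dev/meshagent-public/images/python-docs-examples
```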
- Your code image contains `simple_voicebot/simple_voicebot.py`
- It's mounted at `/src` in the runtime container
- The command runs `python /src/simple_voicebot/simple_voicebot.py`
Note: The default YAML in the docs uses us-central1-docker.pkg.dev/meshagent-public/images/python-docs-examples so you can test this example immediately without building your own image first. Replace this with your actual image tag when deploying your own code.
Step 4c: Deploy the agent
Next, from the CLI, in the directory containing your meshagent.yaml file, run:
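The flag names in this sketch are assumptions:

```bash
# Illustrative -- deploy the packaged SDK agent to the room.
meshagent service create --room=quickstart --file=meshagent.yaml
```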
Your VoiceBot is now deployed to the quickstart room! The agent will always be available inside the room for you to chat with. You can interact with the agent directly from the Studio or from Powerboards.
Note: If you previously deployed the CLI VoiceBot with the name "voiceagent", you will need to give your SDK-based VoiceBot a unique identity (token name).
How VoiceBot Works
Constructor Parameters
VoiceBot accepts everything from SingleRoomAgent (name, title, description, requires, labels) and adds voice-specific configuration options.
| Parameter | Type | Description |
|---|---|---|
| `name` | `str \| None` | Deprecated. Agent identity comes from the participant token; if provided, it is only used to default `title`. |
| `title` | `str \| None` | Human-friendly name. If omitted and you set `name`, it defaults to that value. |
| `description` | `str \| None` | Short description of what the voice agent does. |
| `labels` | `list[str] \| None` | Optional tags for discovery and filtering. |
| `rules` | `list[str] \| None` | System or behavior rules sent to the LLM. Defaults to `["You are a helpful assistant communicating through voice."]`. |
| `auto_greet_message` | `str \| None` | Optional message spoken to the participant when the session starts. |
| `auto_greet_prompt` | `str \| None` | Optional text prompt that seeds the first LLM response at session start. |
| `tool_adapter` | `ToolResponseAdapter \| None` | Optional adapter that converts tool responses into plain speech. |
| `toolkits` | `list[Toolkit] \| None` | Additional Toolkits to expose beyond `requires`; pass instantiated toolkits you want available to the LLM. |
| `requires` | `list[Requirement] \| None` | Toolkits or schemas needed before running (e.g., for tool access or shared data). |
| `client_rules` | `dict[str, list[str]] \| None` | Optional map keyed by the participant's `client` attribute that appends extra rules when matched. |
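As a sketch, constructing a VoiceBot with several of these options might look like this (the import path is an assumption):

```python
# Sketch only -- the import path is an assumption.
from meshagent.agents.voice import VoiceBot

bot = VoiceBot(
    title="voiceagent",
    description="Answers questions over voice",
    rules=["You are a helpful assistant communicating through voice."],
    auto_greet_message="Hi! How can I help you today?",
    # Extra rules appended when the participant's client attribute matches:
    client_rules={"mobile": ["Keep answers under two sentences."]},
)
```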
Lifecycle Overview
VoiceBot inherits all lifecycle hooks from SingleRoomAgent and adds voice session handling on top.
- `await start(room: RoomClient)`: Registers the bot as a voice-capable participant by setting `"supports_voice": True` and listening for `voice_call` messages. When a voice call is received, the bot joins a LiveKit breakout room and starts a real-time session.
- `await stop()`: Clears the room reference. Active voice sessions end when the LiveKit room disconnects (for example, when the caller hangs up).
- `room` property: Returns the active `RoomClient` (inherited from SingleRoomAgent).
Conversational Flow
When a user starts a voice call:
- The participant sends a message of type `"voice_call"` with a breakout room ID (and optionally a `transcript_path`).
- The VoiceBot receives it through `on_message()` and joins the corresponding LiveKit room.
- It creates a `ToolContext` scoped to that participant.
- A new `AgentSession` is created, containing:
  - Speech-to-Text (STT)
  - Text-to-Speech (TTS)
  - Voice Activity Detection (VAD)
  - LLM model interface
- A conversational Agent is built with these components and any registered tools.
- The bot begins listening, thinking (with background "typing" sounds), and speaking responses in real time.
If a `transcript_path` is provided in the `voice_call` message, the session uses a Transcriber to log `conversation_item_added` events to that path while running.
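For illustration, a hypothetical version of the message that starts a call is sketched below; the `"voice_call"` type and `transcript_path` field come from this guide, while the sending API and the other field names are assumptions.

```python
# Hypothetical sketch -- the sending API and exact field names are assumptions.
await room.messaging.send_message(
    to=voicebot_participant,
    type="voice_call",
    message={
        "breakout_room": "voice-session-1",
        "transcript_path": "transcripts/voice-session-1.json",  # optional
    },
)
```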
Key Behaviors and Hooks
- Voice connection management: Each session is isolated in its own LiveKit breakout room. The internal `VoiceConnection` helper handles joining, connecting, and disconnecting from the session safely.
- Session creation: `create_session()` constructs an AgentSession with STT, TTS, VAD, and LLM components wired to the room's proxy API.
- Agent creation: `create_agent()` builds the conversational logic layer that uses the LLM, applies your rules, and exposes tool functions like `say()`.
- Greeting behavior: When configured, `auto_greet_prompt` triggers the LLM to generate an initial spoken message, and `auto_greet_message` plays a prewritten greeting.
- Tool integration: Tools are automatically converted into callable functions for the LLM via `make_function_tools()`. Responses are adapted to speech when a `tool_adapter` is provided.
- Transcript logging: When a `transcript_path` is supplied, `create_agent()` returns a `Transcriber` that logs conversation items to that location.
- Lifecycle hooks: Override `on_session_created()`, `on_session_started()`, or `on_session_ended()` to add custom logic around the session lifecycle, as shown in the sketch after this list.
- Interruptions and natural flow: Sessions allow interruption mid-speech and handle turn-taking automatically through VAD.
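For example, a sketch of overriding the lifecycle hooks listed above:

```python
# Sketch -- hook signatures follow the descriptions in this guide.
class LoggingVoiceBot(VoiceBot):
    async def on_session_started(self, context, session):
        await super().on_session_started(context, session)
        print("voice session started")  # add metrics, logging, or setup here

    async def on_session_ended(self, context, session):
        print("voice session ended")    # add cleanup or persistence here
        await super().on_session_ended(context, session)
```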
Key Methods
| Method | Description |
|---|---|
| `async def start(room)` | Registers the bot for voice calls and listens for `voice_call` messages. |
| `async def run_voice_agent(participant, breakout_room)` | Connects to the specified LiveKit room and starts a full voice session. |
| `async def create_session(context)` | Creates a new AgentSession wired with STT, TTS, VAD, and LLM capabilities. |
| `async def create_agent(context, session)` | Builds a conversational agent with your rules and available tools. |
| `async def make_function_tools(context)` | Converts all registered toolkits into LLM-callable functions. |
| `async def _wait_for_disconnect(room)` | Awaits the end of the voice call and cleans up resources. |
| `async def on_session_created(context, session)` | Hook called after an AgentSession is constructed but before it starts. |
| `async def on_session_started(context, session)` | Hook called immediately after the session starts. |
| `async def on_session_ended(context, session)` | Hook called after the session ends. |
Built-in Components and Behavior
VoiceBot comes pre-integrated with:
- Speech-to-Text (STT): Converts live audio input into text using OpenAI STT via the room's proxied client (override `create_session` to swap providers).
- Text-to-Speech (TTS): Streams generated responses as natural audio output using OpenAI TTS.
- Voice Activity Detection (VAD): Detects pauses or user interruptions automatically (Silero VAD).
- Room I/O defaults: Text input is disabled; audio output and transcription are enabled for the LiveKit room session.
- Background Audio Player: Plays gentle “thinking” keyboard sounds during LLM processing to indicate the bot is working.
- Tool Invocation: Voice commands can trigger registered tools and return spoken responses.
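To swap a provider, override `create_session()`; here is a sketch under the assumption that the parent method returns the AgentSession described above:

```python
# Sketch -- customize or replace speech components before the session starts.
class CustomVoiceBot(VoiceBot):
    async def create_session(self, context):
        session = await super().create_session(context)
        # Inspect or swap the session's STT/TTS/VAD components here; the
        # available attributes depend on your installed LiveKit plugins.
        return session
```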
Next Steps
VoiceBot builds directly on SingleRoomAgent, inheriting connection management and toolkit installation, and extends it with real-time audio and speech features. Where ChatBot focuses on text threads, VoiceBot handles spoken conversations end-to-end.
Discover other agents in MeshAgent:
- ChatBot: Text-based LLM conversations.
- Worker and TaskRunner: Background or automation agents.
- MailBot: Email-based agents.
- Services & Containers: Understand different options for running, deploying, and managing agents with MeshAgent
- Secrets & Registries: Learn how to store credentials securely for deployment