
Overview

VoiceBot is the standard agent for building real-time, speech-based conversational experiences in MeshAgent. It builds on SingleRoomAgent and adds streaming audio input/output, LiveKit session management, speech recognition, and natural voice responses. A VoiceBot joins a MeshAgent room, listens for voice_call messages, and connects to a LiveKit breakout room where it can speak and listen to participants in real time. It combines speech-to-text (STT), text-to-speech (TTS), voice activity detection (VAD), LLM reasoning, and tool calling automatically.

Two ways to build a VoiceBot

  1. CLI: Run production-ready voice agents with a single command. Configure speech, tools, and rules using CLI flags. Ideal for most use cases.
  2. SDK: Extend the base VoiceBot with custom code when you need deeper integrations or specialized behaviors. Best for full control or more complex logic.
Both approaches deploy the same way and can operate together in the same Rooms. We recommend starting with the CLI, since it’s fastest and covers most scenarios, then moving to the SDK when you need further customization.

In this guide you will learn

  1. When to use VoiceBot
  2. How to run and deploy a VoiceBot with the MeshAgent CLI
  3. How to build and deploy a VoiceBot with the MeshAgent SDK
  4. How VoiceBot works, including lifecycle, voice sessions, conversation flow, hooks, and methods

When to use VoiceBot

Use the VoiceBot when you need an agent that:
  • Talks and listens in real time using speech
  • Manages live voice sessions automatically via LiveKit
  • Runs LLM reasoning and tools during spoken interaction
  • Supports natural interruptions and turn-taking
  • Feels like a phone call or meeting assistant rather than a text-based chat
If your agent only handles text-based chat, use ChatBot instead. For non-interactive background agents, see Worker or TaskRunner. For email-based agents, use MailBot.

Run and deploy a VoiceBot with the CLI

Step 1: Run a VoiceBot from the CLI

Let’s run a VoiceBot from the CLI with a custom rule plus shared room rules that anyone in the room can edit. The room rules can be modified between conversation turns, while the base rule applies to the entire conversation.
bash
# Authenticate to MeshAgent if not already signed in
meshagent setup

# Call a voicebot into your room
meshagent voicebot join --room quickstart --agent-name voiceagent --room-rules "agents/voiceagent/rules.txt" --rule "You are a helpful assistant"
When you add the --room-rules "agents/voiceagent/rules.txt" flag and supply a file path for the rules, the file will be created if it does not already exist. The path is relative to the room’s storage.
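The rules file is plain text. As a hypothetical example, agents/voiceagent/rules.txt might contain one instruction per line:

Always confirm the caller's name before answering questions.
Keep answers under three sentences.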

Step 2: Interact with the agent in MeshAgent Studio

  1. Go to MeshAgent Studio and log in
  2. Enter your room quickstart
  3. Select the agent voiceagent and begin speaking!
If you’ve added the --room-rules flag to your agent, you can edit the agent’s rules.txt file to refine its behavior. Changes to rules.txt are applied per message.
Tip: Mute your microphone after you finish speaking to prevent background noise from interfering with the agent.

Step 3: Package and deploy the agent

Once your agent works locally, make it always available by packaging and deploying it as a project or room service. You can do this using the CLI, by creating a YAML file, or from MeshAgent Studio. Both CLI options below deploy the same VoiceBot; choose based on your workflow:
  • Option 1 (meshagent voicebot deploy): One command that deploys immediately (fastest and easiest approach)
  • Option 2 (meshagent voicebot spec + meshagent service create): Generates a YAML file you can review or further customize before deploying
Option 1: Deploy directly
Use the CLI to deploy the VoiceBot to your room.
bash
meshagent voicebot deploy --service-name voiceagent --room quickstart --agent-name voiceagent --room-rules "agents/voiceagent/rules.txt" --rule "You are a helpful assistant"
Option 2: Generate a YAML spec
Create a meshagent.yaml file that defines how your service should run, then deploy the agent to your room. The service spec can be generated from the CLI by running:
bash
meshagent voicebot spec --service-name voiceagent --agent-name voiceagent --room-rules "agents/voiceagent/rules.txt" --rule "You are a helpful assistant"
Next, copy the output into a meshagent.yaml file:
kind: Service # switch to service Template if installing from link for Powerboards
version: v1
metadata:
  name: voiceagent
  description: "An agent that responds using voice"
  annotations:
    meshagent.service.id: "meshagent.voiceagent"
agents:
  - name: voiceagent
    description: "A voice agent"
    annotations:
      meshagent.agent.type: "VoiceBot"
ports:
- num: "*"                        # automatically assign an available MESHAGENT_PORT for the agent to run on 
  type: http
  liveness: "/"                   # ensure the service is alive before connecting to the room
  endpoints:
  - path: /agent                  # service path to call and run the agent on
    meshagent:
      identity: voiceagent        # name of the agent as it shows up in the Room
container:
  image: "us-central1-docker.pkg.dev/meshagent-public/images/cli:latest"
  command: "/usr/bin/meshagent voicebot service --agent-name=voiceagent --room-rules='agents/voiceagent/rules.txt'"
Then, deploy it to your Room.
bash
# Deploy as a room service (specific room only)
meshagent service create --file meshagent.yaml --room quickstart
The VoiceBot is now deployed to the quickstart room and will always be available there to chat with. You can interact with the agent directly from the Studio or from Powerboards. With Powerboards you can easily share your agents with others or use built-in agents.

Build and deploy a VoiceBot with the SDK

Step 1: Create a VoiceBot agent

This example shows a VoiceBot with a custom rule to guide the agent’s behavior. For an agent this simple, the CLI VoiceBot would be sufficient; the Python SDK code here demonstrates how to achieve the same functionality in code. To run the VoiceBot we’ll use the MeshAgent ServiceHost, a lightweight HTTP server that lets you register one or more tools or agents, each on its own path (e.g., /agent). The host automatically exposes each path as a webhook. When a room makes a call to that path, ServiceHost handles the handshake, connects the agent to the room, and forwards requests and responses between your code and the MeshAgent infrastructure.
import asyncio
from meshagent.api.services import ServiceHost
from meshagent.livekit.agents.voice import VoiceBot
from meshagent.otel import otel_config

service = ServiceHost()

otel_config(service_name="simple-voicebot")  # enable telemetry for this service

@service.path(path="/agent", identity="simple-voicebot")
class SimpleVoiceBot(VoiceBot):
    def __init__(self):
        super().__init__(
            title="Simple VoiceBot",
            description="A voice assistant",
            rules=[
                "Be concise and friendly.",
                "End each reply with a short fun fact.",
            ],
            auto_greet_message="Hi! I'm your voice assistant—ask me anything.",
        )

asyncio.run(service.run())

Step 2: Call the agent into a room

Run the VoiceBot locally and connect it to a Room:
meshagent setup # authenticate to MeshAgent
meshagent service run "main.py" --room=quickstart
This command starts the VoiceBot on an available port at the path /agent. If you are running multiple agents or tools, you can use the same ServiceHost and register each one on its own path (see the sketch below). The service run command automatically detects the agent paths and identities, and is the recommended way to test your agents and tools. Once the agent joins the room, you can converse with it in MeshAgent Studio.
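For example, a single ServiceHost can host multiple agents, each registered on its own path; a minimal sketch following the pattern above (the agent names, rules, and paths are illustrative):

import asyncio
from meshagent.api.services import ServiceHost
from meshagent.livekit.agents.voice import VoiceBot

service = ServiceHost()

# Each decorated class gets its own path and room identity.
@service.path(path="/support", identity="support-voicebot")
class SupportBot(VoiceBot):
    def __init__(self):
        super().__init__(title="Support VoiceBot", rules=["Help users troubleshoot issues."])

@service.path(path="/concierge", identity="concierge-voicebot")
class ConciergeBot(VoiceBot):
    def __init__(self):
        super().__init__(title="Concierge VoiceBot", rules=["Answer guest questions."])

# `meshagent service run` detects both paths and identities automatically.
asyncio.run(service.run())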

Step 3: Interact with the agent in MeshAgent Studio

  1. Go to MeshAgent Studio and log in
  2. Enter your room quickstart
  3. Select the agent simple-voicebot and begin speaking!
If you want to modify and restart the agent, press Ctrl+C in the terminal to stop it, then re-run the meshagent service run command.
Note: Building an agent typically takes multiple rounds of iterating on the system prompt and tool design before it’s ready for deployment.

Step 4: Package and deploy the agent

To deploy your SDK VoiceBot permanently, you’ll package your code with a meshagent.yaml file that defines the service configuration and a container image that MeshAgent can run. For full details on the service spec and deployment flow, see Packaging Services and Deploying Services. MeshAgent supports two deployment patterns for containers:
  1. Runtime image + code mount (recommended): Use a pre-built MeshAgent runtime image (like python-sdk-slim) that contains Python and all MeshAgent dependencies. Mount your lightweight code-only image on top. This keeps your code image tiny (~KB), eliminates dependency installation time, and allows your service to start quickly.
  2. Single Image: Bundle your code and all dependencies into one image. This is good when you need to install additional libraries, but can result in larger images and slower pulls. If you build your own images we recommend optimizing them with eStargz.
This example uses the runtime image + code mount pattern with the public python-docs-examples code image so you can run the documentation sample without building your own image. If you want to build and push your own code image, follow the steps below and update the storage.images entry in meshagent.yaml.
Prepare your project structure
This example keeps the agent code and configuration in the same folder, making each agent self-contained:
your-project/
├── Dockerfile                    # Shared by all samples
├── simple_voicebot/
│   ├── simple_voicebot.py
│   └── meshagent.yaml            # Config specific to this sample
└── another_sample/               # Other samples follow same pattern
    ├── another_sample.py
    └── meshagent.yaml
Note: If you’re building a single agent, you only need the simple_voicebot/ folder. The structure shown supports multiple samples sharing one Dockerfile.
Step 4a: Build a Docker container
If you want a code-only image, create a scratch Dockerfile and copy in the files you want to run. This produces a minimal image that pairs with the runtime image + code mount pattern.
FROM scratch

COPY . /
Build and push the image with docker buildx:
bash
docker buildx build . \
  -t "<REGISTRY>/<NAMESPACE>/<IMAGE_NAME>:<TAG>" \
  --platform linux/amd64 \
  --push
Note: Building from the project root copies your entire project structure into the image. For a single agent this is fine: your image will just contain one folder. For multi-agent projects, all agents will be in one image, but each can deploy independently using its own meshagent.yaml.
Step 4b: Package the agent
Define the service configuration in a meshagent.yaml file.
kind: Service
version: v1
metadata:
  name: simple-voicebot
  description: "A simple voicebot that can interact with users"
  annotations:
    meshagent.service.id: "simple.voicebot"
agents:
  - name: simple-voicebot
    description: "A conversational agent without tools"
    annotations:
      meshagent.agent.type: "VoiceBot"
ports:
- num: "*"
  endpoints:
  - path: /agent
    meshagent:
      identity: simple-voicebot
container:
  image: "us-central1-docker.pkg.dev/meshagent-public/images/python-sdk:{SERVER_VERSION}-esgz"
  command: python /src/simple_voicebot/simple_voicebot.py
  storage:
    images:
      # Replace this image tag with your own code-only image if you build one.
      - image: "us-central1-docker.pkg.dev/meshagent-public/images/python-docs-examples:{SERVER_VERSION}"
        path: /src
        read_only: true

How the paths work:
  • Your code image contains simple_voicebot/simple_voicebot.py
  • It’s mounted at /src in the runtime container
  • The command runs python /src/simple_voicebot/simple_voicebot.py
Note: The default YAML in the docs uses us-central1-docker.pkg.dev/meshagent-public/images/python-docs-examples so you can test this example immediately without building your own image first. Replace this with your actual image tag when deploying your own code.
Step 4c: Deploy the agent
From the CLI, in the directory containing your meshagent.yaml file, run:
meshagent service create --file "meshagent.yaml" --room=quickstart
The VoiceBot is now deployed to the quickstart room and will always be available there to chat with. You can interact with the agent directly from the Studio or from Powerboards.
Note: If you previously deployed the CLI VoiceBot with the name “voiceagent”, you will need to give your SDK-based VoiceBot a unique identity (token name).

How VoiceBot Works

Constructor Parameters

VoiceBot accepts everything from SingleRoomAgent (name, title, description, requires, labels) and adds voice-specific configuration options.
  • name (str | None): Deprecated. Agent identity comes from the participant token; if provided, it is only used to default title.
  • title (str | None): Human-friendly name. If omitted and you set name, it defaults to that value.
  • description (str | None): Short description of what the voice agent does.
  • labels (list[str] | None): Optional tags for discovery and filtering.
  • rules (list[str] | None): System or behavior rules sent to the LLM. Defaults to ["You are a helpful assistant communicating through voice."].
  • auto_greet_message (str | None): Optional message spoken to the participant when the session starts.
  • auto_greet_prompt (str | None): Optional text prompt that seeds the first LLM response at session start.
  • tool_adapter (ToolResponseAdapter | None): Optional adapter that converts tool responses into plain speech.
  • toolkits (list[Toolkit] | None): Additional Toolkits to expose beyond requires; pass instantiated toolkits you want available to the LLM.
  • requires (list[Requirement] | None): Toolkits or schemas needed before running (e.g., for tool access or shared data).
  • client_rules (dict[str, list[str]] | None): Optional map keyed by the participant’s client attribute that appends extra rules when matched.
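As a usage sketch, here is a constructor call that exercises several of these parameters (the rule text, greeting, and client key are illustrative):

from meshagent.livekit.agents.voice import VoiceBot

bot = VoiceBot(
    title="Concierge",
    description="Answers guest questions by voice",
    rules=["Be concise and friendly."],
    auto_greet_message="Hello! How can I help you today?",
    # Extra rules appended only when the participant's `client` attribute
    # matches this key ("mobile" is a hypothetical value).
    client_rules={"mobile": ["Keep answers especially short."]},
)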

Lifecycle Overview

VoiceBot inherits all lifecycle hooks from SingleRoomAgent and adds voice session handling on top.
  • await start(room: RoomClient): Registers the bot as a voice-capable participant by setting "supports_voice": True and listening for voice_call messages. When a voice call is received, the bot joins a LiveKit breakout room and starts a real-time session (see the override sketch after this list).
  • await stop(): Clears the room reference. Active voice sessions end when the LiveKit room disconnects (for example, when the caller hangs up).
  • room property: Returns the active RoomClient (inherited from SingleRoomAgent).
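A minimal sketch of extending these lifecycle methods while preserving the base behavior (the log lines are illustrative):

import logging
from meshagent.livekit.agents.voice import VoiceBot

class LoggingVoiceBot(VoiceBot):
    async def start(self, room):
        # Base class registers voice support and the voice_call listener.
        await super().start(room)
        logging.info("VoiceBot registered for voice calls")

    async def stop(self):
        logging.info("VoiceBot stopping; clearing room reference")
        await super().stop()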

Conversational Flow

When a user starts a voice call:
  1. The participant sends a message of type "voice_call" with a breakout room ID (and optionally a transcript_path).
  2. The VoiceBot receives it through on_message() and joins the corresponding LiveKit room.
  3. It creates a ToolContext scoped to that participant.
  4. A new AgentSession is created, containing:
    • Speech-to-Text (STT)
    • Text-to-Speech (TTS)
    • Voice Activity Detection (VAD)
    • LLM model interface
  5. A conversational Agent is built with these components and any registered tools.
  6. The bot begins listening, thinking (with background “typing” sounds), and speaking responses in real time.
When the participant hangs up or disconnects, the session ends automatically. If transcript_path is provided in the voice_call message, the session uses a Transcriber to log conversation_item_added events to that path while running.
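As a rough illustration, a voice_call message carries at least the breakout room ID and, optionally, a transcript path. A hypothetical payload shape (field names are assumed; check your SDK version):

# Hypothetical shape of a voice_call message; exact field names may differ.
voice_call = {
    "type": "voice_call",
    "breakout_room": "voice-7f3a",                # LiveKit breakout room to join
    "transcript_path": "transcripts/call.jsonl",  # optional: enables transcript logging
}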

Key Behaviors and Hooks

  • Voice connection management: Each session is isolated in its own LiveKit breakout room. The internal VoiceConnection helper handles joining, connecting, and disconnecting from the session safely.
  • Session creation: create_session() constructs an AgentSession with STT, TTS, VAD, and LLM components wired to the room’s proxy API.
  • Agent creation: create_agent() builds the conversational logic layer that uses the LLM, applies your rules, and exposes tool functions like say().
  • Greeting behavior: When configured, auto_greet_prompt triggers the LLM to generate an initial spoken message, and auto_greet_message plays a prewritten greeting.
  • Tool integration: Tools are automatically converted into callable functions for the LLM via make_function_tools(). Responses are adapted to speech when a tool_adapter is provided.
  • Transcript logging: When a transcript_path is supplied, create_agent() returns a Transcriber that logs conversation items to that location.
  • Lifecycle hooks: Override on_session_created(), on_session_started(), or on_session_ended() to add custom logic around the session lifecycle (see the sketch after this list).
  • Interruptions and natural flow: Sessions allow interruption mid-speech and handle turn-taking automatically through VAD.
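A sketch of overriding the session hooks, assuming each receives the ToolContext and AgentSession described above (the print statement is illustrative):

from meshagent.livekit.agents.voice import VoiceBot

class AuditedVoiceBot(VoiceBot):
    async def on_session_created(self, context, session):
        # Runs after the AgentSession is constructed, before it starts.
        await super().on_session_created(context, session)

    async def on_session_started(self, context, session):
        # Runs immediately after the session starts.
        await super().on_session_started(context, session)
        print("Voice session started")

    async def on_session_ended(self, context, session):
        # Runs after the session ends; a good place for cleanup or metrics.
        await super().on_session_ended(context, session)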

Key Methods

  • async def start(room): Registers the bot for voice calls and listens for voice_call messages.
  • async def run_voice_agent(participant, breakout_room): Connects to the specified LiveKit room and starts a full voice session.
  • async def create_session(context): Creates a new AgentSession wired with STT, TTS, VAD, and LLM capabilities.
  • async def create_agent(context, session): Builds a conversational agent with your rules and available tools.
  • async def make_function_tools(context): Converts all registered toolkits into LLM-callable functions.
  • async def _wait_for_disconnect(room): Awaits the end of the voice call and cleans up resources.
  • async def on_session_created(context, session): Hook called after an AgentSession is constructed but before it starts.
  • async def on_session_started(context, session): Hook called immediately after the session starts.
  • async def on_session_ended(context, session): Hook called after the session ends.

Built-in Components and Behavior

VoiceBot comes pre-integrated with:
  • Speech-to-Text (STT): Converts live audio input into text using OpenAI STT via the room’s proxied client (override create_session to swap providers).
  • Text-to-Speech (TTS): Streams generated responses as natural audio output using OpenAI TTS.
  • Voice Activity Detection (VAD): Detects pauses or user interruptions automatically (Silero VAD).
  • Room I/O defaults: Text input is disabled; audio output and transcription are enabled for the LiveKit room session.
  • Background Audio Player: Plays gentle “thinking” keyboard sounds during LLM processing to indicate the bot is working.
  • Tool Invocation: Voice commands can trigger registered tools and return spoken responses.
These components can be swapped or extended through adapters if you need different models or behaviors, as in the sketch below.
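For example, a sketch of swapping providers by overriding create_session, assuming the LiveKit Agents plugin APIs (livekit-plugins-openai and livekit-plugins-silero); adapt to your installed versions:

from livekit.agents import AgentSession
from livekit.plugins import openai, silero
from meshagent.livekit.agents.voice import VoiceBot

class CustomProvidersVoiceBot(VoiceBot):
    async def create_session(self, context):
        # Build the session with explicit STT/TTS/VAD/LLM choices
        # instead of the defaults wired up by the base class.
        return AgentSession(
            stt=openai.STT(),
            tts=openai.TTS(),
            vad=silero.VAD.load(),
            llm=openai.LLM(model="gpt-4o-mini"),
        )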

Next Steps

VoiceBot builds directly on SingleRoomAgent, inheriting connection management and toolkit installation, and extends it with real-time audio and speech features. Where ChatBot focuses on text threads, VoiceBot handles spoken conversations end-to-end. Discover other agents in MeshAgent, and see Packaging Services and Deploying Services to learn more about deploying agents with MeshAgent.