Overview

VoiceBot is the standard agent for building real-time, speech-based conversational experiences in MeshAgent. It builds on the Python SingleRoomAgent base class from meshagent-agents and adds streaming audio input and output, LiveKit session management, speech recognition, and natural voice responses.

A VoiceBot joins a MeshAgent room, listens for voice_call messages, and connects to a LiveKit breakout room where it can speak with and listen to participants in real time. It combines speech-to-text (STT), text-to-speech (TTS), voice activity detection (VAD), LLM reasoning, and tool calling automatically.

The MeshAgent CLI is the recommended way to run and deploy VoiceBots. Configure speech, tools, and rules with CLI flags, then deploy the same command when you want the agent to stay available in a room.

In this guide you will learn

  1. When to use VoiceBot
  2. How to run and deploy a VoiceBot with the MeshAgent CLI
  3. How VoiceBot works, including lifecycle, voice sessions, conversation flow, hooks, and methods

When to use VoiceBot

Use the VoiceBot when you need an agent that:
  • Talks and listens in real time using speech
  • Manages live voice sessions automatically via LiveKit
  • Runs LLM reasoning and tools during spoken interaction
  • Supports natural interruptions and turn-taking
  • Feels like a phone call or meeting assistant rather than a text-based chat
If your agent only handles text-based chat, background work, mail, queues, or toolkit entry points, use the process runtime.

Run and deploy a VoiceBot with the CLI

Step 1: Run a VoiceBot from the CLI

Let’s run a VoiceBot from the CLI with a custom rule and a shared rules file that anyone in the room can edit. The base rule (--rule) applies to the entire conversation, while the room rules (--room-rules) can be changed between conversation turns.
```bash
# Authenticate to MeshAgent if not already signed in
meshagent setup

# Call a voicebot into your room
meshagent voicebot join --room quickstart --agent-name voiceagent --room-rules "agents/voiceagent/rules.md" --rule "You are a helpful assistant"
```
When you pass the --room-rules "agents/voiceagent/rules.md" flag, the rules file is created if it does not already exist. The path is relative to the room storage.
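
To picture how the base rule and the room rules combine, here is a minimal sketch. It is an illustration only, not MeshAgent's implementation: the helper name effective_rules, the use of a local file in place of room storage, and the idea of treating each non-empty line as one rule are all assumptions. It shows the base rule staying fixed while the rules file is re-read on every turn.

```python
from pathlib import Path
import tempfile


def effective_rules(base_rules: list[str], room_rules_file: Path) -> list[str]:
    """Return the base rules followed by whatever the rules file holds right now.

    Hypothetical helper: treats each non-empty line of the file as one rule.
    """
    room_rules: list[str] = []
    if room_rules_file.exists():
        text = room_rules_file.read_text(encoding="utf-8")
        room_rules = [line.strip() for line in text.splitlines() if line.strip()]
    return [*base_rules, *room_rules]


if __name__ == "__main__":
    base = ["You are a helpful assistant"]
    with tempfile.TemporaryDirectory() as tmp:
        # Local stand-in for agents/voiceagent/rules.md in room storage.
        rules_file = Path(tmp) / "rules.md"

        # Turn 1: nothing has been written yet, so only the base rule applies.
        print(effective_rules(base, rules_file))

        # Someone in the room edits rules.md before the next turn.
        rules_file.write_text("Answer in one sentence.\n", encoding="utf-8")

        # Turn 2: the edit is picked up because the file is re-read per turn.
        print(effective_rules(base, rules_file))
```

Re-reading the file on each turn is what lets an edit take effect mid-conversation without restarting the agent.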

Step 2: Interact with the agent in MeshAgent Studio

  1. Go to MeshAgent Studio and log in
  2. Enter your room quickstart
  3. Select the agent voiceagent and begin speaking!
If you started the agent with the --room-rules flag, you can edit its rules.md file to refine the agent’s behavior. Changes to rules.md are applied per message.
Tip: Mute your microphone after you finish speaking to prevent background noise from interfering with the agent.

Step 3: Package and deploy the agent

Once your agent works locally, package and deploy it as a project or room service to make it always available. You can do this from the CLI, by creating a YAML file, or from MeshAgent Studio. The two CLI options below deploy the same VoiceBot; choose based on your workflow:
  • Option 1 (meshagent voicebot deploy): One command that deploys immediately (the fastest and easiest approach)
  • Option 2 (meshagent voicebot spec + meshagent service create): Generates a YAML file you can review or further customize before deploying
Option 1: Deploy directly

Use the CLI to automatically deploy the VoiceBot to your room.
```bash
meshagent voicebot deploy --service-name voiceagent --room quickstart --agent-name voiceagent --room-rules "agents/voiceagent/rules.md" --rule "You are a helpful assistant"
```
Option 2: Generate a YAML spec

Create a meshagent.yaml file that defines how the service should run, then deploy the agent to your room. You can generate the service spec from the CLI by running:
```bash
meshagent voicebot spec --service-name voiceagent --agent-name voiceagent --room-rules "agents/voiceagent/rules.md" --rule "You are a helpful assistant"
```
Next, copy the output to a meshagent.yaml file:
```yaml
kind: Service # switch to service Template if installing from link for Powerboards
version: v1
metadata:
  name: voiceagent
  description: "An agent that responds using voice"
  annotations:
    meshagent.service.id: "meshagent.voiceagent"
agents:
  - name: voiceagent
    description: "A voice agent"
    annotations:
      meshagent.agent.type: "VoiceBot"
container:
  image: "us-central1-docker.pkg.dev/meshagent-public/images/cli:latest"
  command: "/usr/bin/meshagent voicebot join --agent-name=voiceagent --room-rules='agents/voiceagent/rules.md'"
  environment:
    - name: MESHAGENT_TOKEN
      token:
        identity: voiceagent
        role: agent
```

Then, deploy it to your Room.
```bash
# Deploy as a room service (specific room only)
meshagent service create --file meshagent.yaml --room quickstart
```
The VoiceBot is now deployed to the quickstart room and will always be available there for you to talk to. You can interact with the agent directly from MeshAgent Studio or from Powerboards. With Powerboards you can easily share your agents with others or use built-in agents.

How VoiceBot Works

Constructor Parameters

The CLI configures the underlying Python VoiceBot class. If you extend that class directly, it accepts the current SingleRoomAgent base options and adds voice-specific configuration options.
| Parameter | Type | Description |
| --- | --- | --- |
| name | str \| None | Optional explicit participant name. Most CLI and deployment flows set the participant identity outside the class. |
| title | str \| None | Human-friendly name for clients, logs, and operator surfaces. |
| description | str \| None | Short description of what the voice agent does. |
| annotations | list[str] \| None | Optional string metadata for clients and services that inspect the agent. |
| voice | str | OpenAI Realtime voice name. Defaults to "echo". |
| rules | list[str] \| None | System or behavior rules sent to the LLM. Defaults to ["You are a helpful assistant communicating through voice."]. |
| auto_greet_message | str \| None | Optional message spoken to the participant when the session starts. |
| auto_greet_prompt | str \| None | Optional text prompt that seeds the first LLM response at session start. |
| tool_adapter | ToolResponseAdapter \| None | Optional adapter that converts tool responses into plain speech. |
| toolkits | list[Toolkit] \| None | Additional Toolkits to expose beyond requires; pass instantiated toolkits you want available to the LLM. |
| requires | list[Requirement] \| None | Toolkits or schemas needed before running (e.g., for tool access or shared data). |
| client_rules | dict[str, list[str]] \| None | Optional map keyed by the participant’s client attribute that appends extra rules when matched. |
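
To make the defaults and the client_rules behavior in the table concrete, here is a minimal sketch. VoiceBotOptions and rules_for are hypothetical stand-ins written for this example, not the real VoiceBot class or its API. Only the parameter names and default values come from the table, and "kiosk" is a made-up client attribute value.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VoiceBotOptions:
    """Hypothetical stand-in mirroring part of the parameter table; not the real VoiceBot."""

    name: Optional[str] = None
    title: Optional[str] = None
    description: Optional[str] = None
    voice: str = "echo"
    rules: list[str] = field(
        default_factory=lambda: ["You are a helpful assistant communicating through voice."]
    )
    auto_greet_message: Optional[str] = None
    auto_greet_prompt: Optional[str] = None
    client_rules: dict[str, list[str]] = field(default_factory=dict)

    def rules_for(self, client: Optional[str]) -> list[str]:
        """Return the base rules plus any extra rules registered for this client value."""
        extra = self.client_rules.get(client, []) if client is not None else []
        return [*self.rules, *extra]


options = VoiceBotOptions(
    rules=["Keep answers short."],
    client_rules={"kiosk": ["Speak slowly and avoid jargon."]},
)
print(options.voice)  # echo
print(options.rules_for("kiosk"))
print(options.rules_for(None))
```

A participant whose client attribute matches a key gets the extra rules appended after the base rules; everyone else gets the base rules alone.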

Lifecycle Overview

VoiceBot inherits all lifecycle hooks from SingleRoomAgent and adds voice session handling on top.
  • await start(room: RoomClient): Registers the bot as a voice-capable participant by setting "supports_voice": True and listening for voice_call messages. When a voice call is received, the bot joins a LiveKit breakout room and starts a real-time session.
  • await stop(): Clears the room reference. Active voice sessions end when the LiveKit room disconnects (for example, when the caller hangs up).
  • room property: Returns the active RoomClient (inherited from SingleRoomAgent).

Conversational Flow

When a user starts a voice call:
  1. The participant sends a message of type "voice_call" with a breakout room ID (and optionally a transcript_path).
  2. The VoiceBot receives it through on_message() and joins the corresponding LiveKit room.
  3. It creates a VoiceBotContext scoped to that participant.
  4. A new AgentSession is created, containing:
    • Speech-to-Text (STT)
    • Text-to-Speech (TTS)
    • Voice Activity Detection (VAD)
    • LLM model interface
  5. A conversational Agent is built with these components and any registered tools.
  6. The bot begins listening, thinking (with background “typing” sounds), and speaking responses in real time.
When the participant hangs up or disconnects, the session ends automatically. If transcript_path is provided in the voice_call message, the session uses a Transcriber to log conversation_item_added events to that path while running.
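
The first two steps of this flow can be sketched as a small message filter. This is an illustration under stated assumptions: VoiceCallRequest and parse_voice_call are hypothetical names, and the payload key "breakout_room" is a guess at the message shape. Only the "voice_call" message type and the optional transcript_path come from the description above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class VoiceCallRequest:
    """Hypothetical parsed form of a voice_call message."""

    breakout_room: str
    transcript_path: Optional[str] = None


def parse_voice_call(message_type: str, payload: dict[str, Any]) -> Optional[VoiceCallRequest]:
    """Return a request for voice_call messages, or None for any other message type.

    The payload key "breakout_room" is an assumption made for this sketch.
    """
    if message_type != "voice_call":
        return None
    breakout_room = payload.get("breakout_room")
    if not breakout_room:
        raise ValueError("voice_call message is missing a breakout room ID")
    return VoiceCallRequest(
        breakout_room=str(breakout_room),
        transcript_path=payload.get("transcript_path"),
    )


request = parse_voice_call("voice_call", {"breakout_room": "room-123"})
print(request)
print(parse_voice_call("chat", {}))  # None
```

A request without a transcript_path simply runs without transcript logging, which matches the optional behavior described above.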

Key Behaviors and Hooks

  • Voice connection management: Each session is isolated in its own LiveKit breakout room. The internal VoiceConnection helper handles joining, connecting, and disconnecting from the session safely.
  • Session creation: create_session() constructs an AgentSession with STT, TTS, VAD, and LLM components wired to the room’s proxy API.
  • Agent creation: create_agent() builds the conversational logic layer that uses the LLM, applies your rules, and exposes tool functions like say().
  • Greeting behavior: When configured, auto_greet_prompt triggers the LLM to generate an initial spoken message, and auto_greet_message plays a prewritten greeting.
  • Tool integration: Tools are automatically converted into callable functions for the LLM via make_function_tools(). Responses are adapted to speech when a tool_adapter is provided.
  • Transcript logging: When a transcript_path is supplied, create_agent() returns a Transcriber that logs conversation items to that location.
  • Lifecycle hooks: Override on_session_created(), on_session_started(), or on_session_ended() to add custom logic around the session lifecycle.
  • Interruptions and natural flow: Sessions allow interruption mid-speech and handle turn-taking automatically through VAD.
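
To see the order in which the three session hooks fire, here is a minimal sketch. SketchVoiceBot and run_session are hypothetical stand-ins written for this example, and the context and session are plain dicts; only the hook names come from VoiceBot, which drives a LiveKit AgentSession instead.

```python
import asyncio


class SketchVoiceBot:
    """Hypothetical stand-in that models only the order of the session hooks."""

    async def on_session_created(self, context, session) -> None:
        pass

    async def on_session_started(self, context, session) -> None:
        pass

    async def on_session_ended(self, context, session) -> None:
        pass

    async def run_session(self, context, session) -> None:
        # The session object exists but has not started yet.
        await self.on_session_created(context, session)
        session["state"] = "running"
        await self.on_session_started(context, session)
        # A real session would stay here until the caller disconnects.
        session["state"] = "ended"
        await self.on_session_ended(context, session)


class AuditedVoiceBot(SketchVoiceBot):
    """Overrides the hooks to record when each one fires."""

    def __init__(self) -> None:
        self.events: list[str] = []

    async def on_session_created(self, context, session) -> None:
        self.events.append(f"created for {context['participant']}")

    async def on_session_started(self, context, session) -> None:
        self.events.append(f"started ({session['state']})")

    async def on_session_ended(self, context, session) -> None:
        self.events.append(f"ended ({session['state']})")


bot = AuditedVoiceBot()
asyncio.run(bot.run_session({"participant": "alice"}, {"state": "new"}))
print(bot.events)
```

Overriding only the hooks keeps custom logic such as auditing or metrics separate from the session wiring itself.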

Key Methods

| Method | Description |
| --- | --- |
| async def start(room) | Registers the bot for voice calls and listens for voice_call messages. |
| async def run_voice_agent(participant, breakout_room) | Connects to the specified LiveKit room and starts a full voice session. |
| async def create_session(context) | Creates a new AgentSession wired with STT, TTS, VAD, and LLM capabilities. |
| async def create_agent(context, session) | Builds a conversational agent with your rules and available tools. |
| async def make_function_tools(context) | Converts all registered toolkits into LLM-callable functions. |
| async def _wait_for_disconnect(room) | Awaits the end of the voice call and cleans up resources. |
| async def on_session_created(context, session) | Hook called after an AgentSession is constructed but before it starts. |
| async def on_session_started(context, session) | Hook called immediately after the session starts. |
| async def on_session_ended(context, session) | Hook called after the session ends. |

Built-in Components and Behavior

VoiceBot comes pre-integrated with:
  • Speech-to-Text (STT): Converts live audio input into text using OpenAI STT via the room’s proxied client (override create_session to swap providers).
  • Text-to-Speech (TTS): Streams generated responses as natural audio output using OpenAI TTS.
  • Voice Activity Detection (VAD): Detects pauses or user interruptions automatically (Silero VAD).
  • Room I/O defaults: Text input is disabled; audio output and transcription are enabled for the LiveKit room session.
  • Background Audio Player: Plays gentle “thinking” keyboard sounds during LLM processing to indicate the bot is working.
  • Tool Invocation: Voice commands can trigger registered tools and return spoken responses.
These components can be swapped or extended through adapters if you need different models or behaviors.
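
As one example of what an adapter might do, the sketch below turns a structured tool result into a sentence that reads well aloud. It is a hypothetical helper: to_speech does not implement the real ToolResponseAdapter interface, which is not shown in this guide, and the phrasing rules are arbitrary choices made for illustration.

```python
from typing import Any


def to_speech(result: Any) -> str:
    """Turn a structured tool result into a speakable sentence.

    Hypothetical helper; it does not implement the real ToolResponseAdapter interface.
    """
    if isinstance(result, str):
        return result
    if isinstance(result, dict):
        # Read each field aloud as "<name> is <value>", with underscores spoken as spaces.
        parts = [f"{str(key).replace('_', ' ')} is {value}" for key, value in result.items()]
        return ", ".join(parts) + "." if parts else "The tool returned no data."
    if isinstance(result, (list, tuple)):
        if not result:
            return "The tool returned no results."
        return "Results: " + ", ".join(str(item) for item in result) + "."
    return str(result)


print(to_speech({"city": "Paris", "temp_c": 21}))  # city is Paris, temp c is 21.
print(to_speech([]))  # The tool returned no results.
```

Raw JSON or long identifiers are hard to follow when spoken, so a conversion step like this is where you would shorten or rephrase tool output before it reaches TTS.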

Next Steps

VoiceBot builds directly on SingleRoomAgent, inheriting connection management and toolkit installation, and extends it with real-time audio and speech features. Where process-backed agents focus on text and background channels, VoiceBot handles spoken conversations end-to-end.