Indexers watch room content, convert it to text, chunk and embed it, and store the results in room database tables so that `RagToolkit`-powered agents can search them.
## When to Use
- Automate RAG over room storage without manually rebuilding embeddings
- Keep crawled websites synchronized with a room before a chat session
- Pair with ChatBot or VoiceBot so they can answer questions using indexed documents
- Pre-process content for long-running workflows (document review, data labeling, etc.)
## How Indexers Work
- Detect content: listen for storage events or take a crawl input.
- Transform: convert the source into plain text (customizable via `read_file`).
- Chunk: split text with a `Chunker` (defaults use `chonkie.SemanticChunker`).
- Embed: call the configured `Embedder` (defaults use OpenAI embedding models).
- Persist: upsert rows into a room database table and build full-text / vector indexes.

Attach `RagToolkit(table="…")` to any conversational agent and the results become searchable.
## Built-in Indexers
### StorageIndexer
`StorageIndexer` extends `SingleRoomAgent` and watches `room.storage` events. Whenever a file is uploaded, updated, or deleted, it:
- calls `read_file(path=…)` to obtain text (you override this to run MarkitDown, OCR, etc.)
- chunks and embeds the content
- writes rows into the configured database table (defaults to `storage_index`)
- maintains vector and full-text indexes so downstream searches stay fast
| Parameter | Type | Description |
|---|---|---|
| `name` | str \| None | Deprecated. Agent identity comes from the participant token; if provided, it is only used to default `title`. |
| `title` / `description` | str \| None | Optional metadata displayed in Studio. If `title` is omitted and you set `name`, it defaults to that value. |
| `requires` | list[Requirement] \| None | Toolkits or schemas to install before indexing (e.g., MarkitDown). |
| `labels` | list[str] \| None | Tags for discovery/filtering. |
| `chunker` | Chunker \| None | Splits text into chunks; defaults to `ChonkieChunker`. |
| `embedder` | Embedder \| None | Produces embeddings; defaults to `OpenAIEmbedding3Large`. |
| `table` | str | Database table used to store rows and embeddings. |
Tip: Implement `async read_file(self, *, path: str) -> str | None` to pull text from storage. Returning `None` skips indexing that file.
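For example, a custom `read_file` can filter which files get indexed. A minimal sketch, assuming the import path shown and that the base class can already read plain-text files (both are assumptions; check your SDK version):

```python
from meshagent.agents.indexer import StorageIndexer  # assumed import path


class MarkdownOnlyIndexer(StorageIndexer):
    """Only indexes Markdown files uploaded to room storage."""

    async def read_file(self, *, path: str) -> str | None:
        if not path.endswith(".md"):
            return None  # returning None skips indexing this file

        # Fetch the file and convert it to text here (for example with
        # MarkitDown or OCR). The call below assumes the base class already
        # knows how to read plain-text files; otherwise read the bytes from
        # room storage and decode them yourself.
        return await super().read_file(path=path)
```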
### SiteIndexer
`SiteIndexer` is a `TaskRunner` that orchestrates crawls with the FireCrawl toolkit. Invoke it from MeshAgent Studio (“Toolkits…”) or via `room.agents.invoke_tool` to populate a specified table:
- Sends a crawl job to the FireCrawl queue.
- Streams page results back through the room queue.
- Chunks, embeds, and writes rows into the provided table (creating vector / FTS indexes automatically).
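For example, a crawl could be kicked off programmatically. This is a sketch only: the toolkit name, tool name, and argument keys below are assumptions; check the SiteIndexer's actual tool schema in Studio before using it.

```python
# Hypothetical invocation: names and argument keys are assumptions.
await room.agents.invoke_tool(
    toolkit="site-indexer",
    tool="index_site",
    arguments={
        "url": "https://docs.example.com",  # site to crawl
        "table": "site_index",              # destination table for rows
    },
)
```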
Once the crawl completes, attach `RagToolkit` the same way you would with `StorageIndexer`.
## RagToolkit
Add `RagToolkit(table="your_table")` to a ChatBot, VoiceBot, or TaskRunner to surface the rows produced by either indexer. The toolkit exposes a `rag_search` tool that optionally accepts a custom embedder, letting you reuse the same model for both indexing and querying.
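A sketch of constructing the toolkit with a matching embedder; the import paths, the embedder class name, and the `embedder` parameter are assumptions, so check the RagToolkit reference for exact names:

```python
from meshagent.tools.rag import RagToolkit                  # assumed import path
from meshagent.tools.embeddings import OpenAIEmbedding3Large  # assumed import path

# Query with the same embedding model the indexer used when writing rows,
# so query vectors and stored vectors share one vector space.
rag_toolkit = RagToolkit(
    table="storage_index",
    embedder=OpenAIEmbedding3Large(),
)
```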
## Example
You typically host a `StorageIndexer` alongside a chatbot that uses `RagToolkit`. The example below shows a MarkitDown-powered indexer paired with a RAG-enabled ChatBot.
### Prerequisites and Dependencies
- An embedding provider (default uses an OpenAI embedder, but you can set your own).
- `chonkie` for semantic chunking (installed with the meshagent agents package).
- Optional toolkits such as `meshagent.markitdown` or `meshagent.firecrawl`, depending on which indexer you deploy.
### Step 1: Create a RAG-based ChatBot and RAG tool
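A condensed sketch of the two services, assuming the import paths and constructor parameters shown (adapt them to your SDK version; see the StorageIndexer table above and the Chatbot reference for exact signatures):

```python
# Sketch of storage_rag/indexer.py and storage_rag/chatbot.py in one file.
# Import paths and constructor parameters are assumptions.
from meshagent.agents.chat import ChatBot
from meshagent.agents.indexer import StorageIndexer
from meshagent.tools.rag import RagToolkit

TABLE = "storage_index"  # shared by the indexer and the RAG tool


class DocumentIndexer(StorageIndexer):
    """Watches room storage and chunks/embeds uploads into TABLE."""

    def __init__(self):
        super().__init__(
            title="Document Indexer",
            description="Indexes uploaded documents for RAG",
            table=TABLE,
        )
        # Override read_file here to run MarkitDown/OCR (see the tip above).


class RagChatBot(ChatBot):
    """Answers questions using the rows DocumentIndexer produces."""

    def __init__(self):
        super().__init__(
            name="rag-chatbot",
            title="RAG ChatBot",
            description="Answers questions about indexed documents",
            toolkits=[RagToolkit(table=TABLE)],
        )


# Host each class with your usual MeshAgent service entry point so the agents
# join the room (entry-point wiring omitted here).
```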
### Step 2: Run the service and test locally
Now that we’ve defined our services, let’s start the room and connect the chatbot and indexer.
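Assuming the indexer and chatbot from Step 1 live in separate entry points under `storage_rag/` (matching the project layout in Step 3) and your MeshAgent credentials are configured, you might run them locally like this:

```bash
# Start each service in its own terminal so you can watch the indexing logs.
python storage_rag/indexer.py
python storage_rag/chatbot.py
```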
- Open the Studio
- Go to the `rag` room
- Upload a document to the room for processing. The indexer will chunk and embed the document. You will see logs in your terminal related to the document processing.
- Ask the agent about the document once it’s been processed!
### Step 3: Package and deploy the service
To deploy your RAG Indexer and RAG agent permanently, you’ll package your code with a `meshagent.yaml` file that defines the service configuration and a container image that MeshAgent can run.
For full details on the service spec and deployment flow, see Packaging Services and Deploying Services.
MeshAgent supports two deployment patterns for containers:
- Runtime image + code mount (recommended): Use a pre-built MeshAgent runtime image (like `python-sdk-slim`) that contains Python and all MeshAgent dependencies, and mount your lightweight code-only image on top. This keeps your code image tiny (on the order of kilobytes), eliminates dependency installation time, and allows your service to start quickly.
- Single image: Bundle your code and all dependencies into one image. This is a good fit when you need to install additional libraries, but it can result in larger images and slower pulls. If you build your own images, we recommend optimizing them with eStargz.
This example uses the pre-built `python-docs-examples` code image so you can run the documentation sample without building your own image. If you build your own code image, follow the steps below and update the `storage.images` entry in `meshagent.yaml`.
#### Prepare your project structure
This example organizes the agent code and configuration in the same folder, making each agent self-contained:
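One possible layout is sketched below; the file names are illustrative and match the paths referenced later in this guide:

```text
project/
├── Dockerfile                  # shared, code-only Dockerfile (Step 3a)
├── storage_rag/
│   ├── chatbot.py              # RAG ChatBot with RagToolkit
│   ├── indexer.py              # StorageIndexer service
│   └── meshagent.yaml          # service configuration for this sample
└── another_sample/             # other samples can share the same Dockerfile
    └── ...
```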
Note: If you’re building a single agent, you only need the `storage_rag/` folder. The structure shown supports multiple samples sharing one Dockerfile.
### Step 3a: Build a Docker container
If you want a code-only image, create a scratch Dockerfile and copy the files you want to run. This creates a minimal image that pairs with the runtime image + code mount pattern.
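A minimal sketch of such a Dockerfile; the copy destination is an assumption and just needs to line up with the mount path and command in your `meshagent.yaml`:

```dockerfile
# Code-only image: no Python, no dependencies; the MeshAgent runtime image
# supplies those at deploy time. FROM scratch keeps this image tiny.
FROM scratch

# Building from the project root copies the whole project structure in
# (see the note below), so storage_rag/indexer.py ends up at /storage_rag/.
COPY . /
```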
Then build and push the image with `docker buildx`:
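For example (the image tag is a placeholder; substitute your own registry path):

```bash
# Build for the target platform and push so MeshAgent can pull the image.
docker buildx build \
  --platform linux/amd64 \
  -t us-central1-docker.pkg.dev/YOUR_PROJECT/images/storage-rag:latest \
  --push \
  .
```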
Note: Building from the project root copies your entire project structure into the image. For a single agent, this is fine - your image will just contain one folder. For multi-agent projects, all agents will be in one image, but each can deploy independently using its own meshagent.yaml.
### Step 3b: Package the service
Define the service configuration in a `meshagent.yaml` file. This will be used to deploy the indexer and RAG ChatBot. In this configuration:
- Your code image contains `storage_rag/indexer.py`
- It’s mounted at `/src` in the runtime container
- The command runs `python /src/storage_rag/indexer.py`
Note: The default YAML in the docs uses `us-central1-docker.pkg.dev/meshagent-public/images/python-docs-examples` so you can test this example immediately without building your own image first. Replace this with your actual image tag when deploying your own code.
### Step 3c: Deploy the service
Next, from the CLI in the directory containing your `meshagent.yaml` file, run the deploy command described in Deploying Services.
## Next Steps
- Understand other MeshAgent agents:
  - Chatbot for conversation-based agents
  - Voicebot for voice-based agents
  - MailBot for email-based agents
  - TaskRunner and Worker for background and queue-based agents
- Services & Containers: Understand different options for running, deploying, and managing agents with MeshAgent
- Secrets & Registries: Learn how to store credentials securely for deployment