Indexers turn room content into chunked, embedded rows so that RagToolkit-powered agents can search it.
When to Use
- Automate RAG over room storage without manually rebuilding embeddings
- Keep crawled websites synchronized with a room before a chat session
- Pair with ChatBot or VoiceBot so they can answer questions using indexed documents
- Pre-process content for long-running workflows (document review, data labeling, etc.)
How Indexers Work
- Detect content: listen for storage events or take a crawl input.
- Transform: convert the source into plain text (customizable via `read_file`).
- Chunk: split text with a `Chunker` (defaults use `chonkie.SemanticChunker`).
- Embed: call the configured `Embedder` (defaults use OpenAI embedding models).
- Persist: upsert rows into a room database table and build full-text / vector indexes.
Add `RagToolkit(table="…")` to any conversational agent and the results become searchable.
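The steps above can be sketched as a plain-Python pipeline. Everything here is a stand-in: the fixed-size chunker stands in for `chonkie.SemanticChunker`, the hash-based embedder for an OpenAI embedding model, and the in-memory list for the room database table.

```python
import hashlib

def transform(raw: bytes) -> str:
    """Stand-in for read_file: convert a source document into plain text."""
    return raw.decode("utf-8")

def chunk(text: str, max_chars: int = 80) -> list[str]:
    """Naive fixed-size chunker; the real default is semantic chunking."""
    return [text[i : i + max_chars] for i in range(0, len(text), max_chars)]

def embed(chunks: list[str]) -> list[list[float]]:
    """Fake embedder: hash-derived vectors stand in for a real embedding model."""
    return [
        [b / 255 for b in hashlib.sha256(c.encode()).digest()[:8]]
        for c in chunks
    ]

def persist(table: list[dict], path: str, chunks: list[str],
            vectors: list[list[float]]) -> None:
    """Upsert rows keyed by path, mimicking the indexer's table writes."""
    table[:] = [row for row in table if row["path"] != path]  # drop stale rows
    table.extend(
        {"path": path, "chunk": i, "text": c, "embedding": v}
        for i, (c, v) in enumerate(zip(chunks, vectors))
    )

table: list[dict] = []
text = transform(b"MeshAgent indexers turn room content into searchable rows. " * 3)
chunks = chunk(text)
persist(table, "docs/readme.txt", chunks, embed(chunks))
print(len(table))  # one row per chunk
```

Re-persisting the same path replaces its rows rather than duplicating them, which mirrors the upsert behavior described above.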
Built-in Indexers
StorageIndexer
StorageIndexer extends SingleRoomAgent and watches `room.storage` events. Whenever a file is uploaded, updated, or deleted, it:
- calls `read_file(path=…)` to obtain text (you override this to run MarkitDown, OCR, etc.)
- chunks and embeds the content
- writes rows into the configured database table (defaults to `storage_index`)
- maintains vector and full-text indexes so downstream searches stay fast
| Parameter | Type | Description |
|---|---|---|
| `name` | `str` | Required agent identity shown in the room. |
| `title` / `description` | `str \| None` | Optional metadata displayed in Studio. |
| `requires` | `list[Requirement] \| None` | Toolkits or schemas to install before indexing (e.g., MarkitDown). |
| `labels` | `list[str] \| None` | Tags for discovery/filtering. |
| `chunker` | `Chunker \| None` | Splits text into chunks; defaults to `ChonkieChunker`. |
| `embedder` | `Embedder \| None` | Produces embeddings; defaults to `OpenAIEmbedding3Large`. |
| `table` | `str` | Database table used to store rows and embeddings. |
Tip: Implement `async read_file(self, *, path: str) -> str | None` to pull text from storage. Returning `None` skips indexing that file.
SiteIndexer
SiteIndexer is a TaskRunner that orchestrates crawls with the FireCrawl toolkit. Invoke it from MeshAgent Studio ("Run Task…") or via `room.agents.ask` to populate a specified table:
- Sends a crawl job to the FireCrawl queue.
- Streams page results back through the room queue.
- Chunks, embeds, and writes rows into the provided table (creating vector / FTS indexes automatically).
Then attach `RagToolkit` the same way you would with `StorageIndexer`.
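The crawl flow can be mimicked with a plain `asyncio.Queue`: a producer stands in for the FireCrawl job streaming page results through the room queue, while a consumer indexes each page as it arrives. The URLs and row shape here are invented for illustration, and the embed step is omitted.

```python
import asyncio

async def crawler(queue: asyncio.Queue) -> None:
    """Stand-in for the FireCrawl job streaming page results into the queue."""
    for url in ("https://example.com/", "https://example.com/docs"):
        await queue.put({"url": url, "markdown": f"# Page at {url}"})
    await queue.put(None)  # sentinel: crawl finished

async def indexer(queue: asyncio.Queue, table: list[dict]) -> None:
    """Consume pages as they arrive; chunk/embed/persist each one."""
    while (page := await queue.get()) is not None:
        table.append({"url": page["url"], "text": page["markdown"]})

async def main() -> list[dict]:
    queue: asyncio.Queue = asyncio.Queue()
    table: list[dict] = []
    await asyncio.gather(crawler(queue), indexer(queue, table))
    return table

rows = asyncio.run(main())
print(len(rows))  # 2
```

Because the consumer runs concurrently with the producer, pages are indexed while the crawl is still in flight rather than after it completes.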
RagToolkit
Add `RagToolkit(table="your_table")` to a ChatBot, VoiceBot, or TaskRunner to surface the rows produced by either indexer. The toolkit exposes a `rag_search` tool that optionally accepts a custom embedder, letting you reuse the same model for both indexing and querying.
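A search of that kind embeds the query with the same model used at index time and ranks stored rows by similarity. The toy `rag_search` below sketches the idea with cosine similarity over two hand-made rows; it is not the toolkit's actual implementation, and the two-dimensional "embeddings" are stand-ins.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy index rows: in practice the embedding column is written at index time.
rows = [
    {"text": "Indexers watch room storage events.", "embedding": [1.0, 0.0]},
    {"text": "FireCrawl streams crawled pages.",     "embedding": [0.0, 1.0]},
]

def rag_search(query_embedding: list[float], k: int = 1) -> list[str]:
    """Rank rows by cosine similarity to the query and return the top k texts."""
    ranked = sorted(rows, key=lambda r: cosine(query_embedding, r["embedding"]),
                    reverse=True)
    return [r["text"] for r in ranked[:k]]

print(rag_search([0.9, 0.1]))  # → ['Indexers watch room storage events.']
```

This is why using the same embedder for indexing and querying matters: vectors from different models are not comparable, so mixing them would make the ranking meaningless.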
Example
You typically host a StorageIndexer alongside a chatbot that uses RagToolkit. The example below shows a MarkitDown-powered indexer paired with a RAG-enabled ChatBot.
Prerequisites and Dependencies
- An embedding provider (the default requires the OpenAI Python SDK and `OPENAI_API_KEY`; this is provisioned automatically with the MeshAgent CLI).
- `chonkie` for semantic chunking (installed with the agents package).
- Optional toolkits such as `meshagent.markitdown` or `meshagent.firecrawl`, depending on which indexer you deploy.
Note: If running this locally, you will need to export `OPENAI_API_KEY`. When deployed and running in the room, it will be set automatically.
- Open the studio
- Go to the `rag` room
- Upload a document to the room for processing. The indexer will chunk and embed the document.
- Ask the agent about the document once it's been processed!
Next Steps
Understand other MeshAgent agents:
- ChatBot for conversation-based agents
- VoiceBot for voice-based agents
- MailWorker for email-based agents
- TaskRunner and Worker for background and queue-based agents
- Services & Containers: Understand different options for running, deploying, and managing agents with MeshAgent
- Secrets & Registries: Learn how to store credentials securely for deployment