Indexers keep a room’s vector database fresh so other agents can run retrieval-augmented generation (RAG) queries. They watch for new content, chunk and embed it, then store both the raw text and the embeddings in a table that RagToolkit-powered agents can search.

When to Use

  • Automate RAG over room storage without manually rebuilding embeddings
  • Keep crawled websites synchronized with a room before a chat session
  • Pair with ChatBot or VoiceBot so they can answer questions using indexed documents
  • Pre-process content for long-running workflows (document review, data labeling, etc.)

How Indexers Work

  1. Detect content: listen for storage events or take a crawl input.
  2. Transform: convert the source into plain text (customizable via read_file).
  3. Chunk: split text with a Chunker (defaults use chonkie.SemanticChunker).
  4. Embed: call the configured Embedder (defaults use OpenAI embedding models).
  5. Persist: upsert rows into a room database table and build full-text / vector indexes.
Once data is in the table, add RagToolkit(table="…") to any conversational agent and the results become searchable.
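The five steps above can be sketched in miniature. This is an illustrative stand-in, not the real Chunker/Embedder classes or the room database: the fixed-size chunker, hash-based "embedder", and dict-backed table are all assumptions made for the sake of a self-contained example.

```python
# Illustrative sketch of the indexing pipeline (steps 2-5).
# The chunker, embedder, and table here are simple stand-ins.
import hashlib


def chunk(text: str, size: int = 40) -> list[str]:
    # Stand-in for a semantic chunker: fixed-size splits.
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(chunks: list[str]) -> list[list[float]]:
    # Stand-in for an embedding model: hash bytes scaled to floats.
    return [[b / 255 for b in hashlib.sha256(c.encode()).digest()[:8]]
            for c in chunks]


def upsert(table: dict, path: str,
           chunks: list[str], vectors: list[list[float]]) -> None:
    # Stand-in for the room database: one row per chunk,
    # keyed by (source path, chunk index) so re-indexing overwrites.
    for i, (c, v) in enumerate(zip(chunks, vectors)):
        table[(path, i)] = {"text": c, "embedding": v}


table: dict = {}
text = "Indexers watch for new content, chunk and embed it, then store it."
chunks = chunk(text)
upsert(table, "notes.txt", chunks, embed(chunks))
```

Keying rows by source path and chunk index is what makes the operation an upsert: re-indexing an updated file replaces its old rows instead of accumulating duplicates.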

Built-in Indexers

StorageIndexer

StorageIndexer extends SingleRoomAgent and watches room.storage events. Whenever a file is uploaded, updated, or deleted it:
  • calls read_file(path=…) to obtain text (you override this to run MarkitDown, OCR, etc.)
  • chunks and embeds the content
  • writes rows into the configured database table (defaults to storage_index)
  • maintains vector and full-text indexes so downstream searches stay fast
Constructor Parameters
name (str): Required agent identity shown in the room.
title / description (str | None): Optional metadata displayed in Studio.
requires (list[Requirement] | None): Toolkits or schemas to install before indexing (e.g., MarkitDown).
labels (list[str] | None): Tags for discovery/filtering.
chunker (Chunker | None): Splits text into chunks; defaults to ChonkieChunker.
embedder (Embedder | None): Produces embeddings; defaults to OpenAIEmbedding3Large.
table (str): Database table used to store rows and embeddings.
Tip: Implement async read_file(self, *, path: str) -> str | None to pull text from storage. Returning None skips indexing that file.

SiteIndexer

SiteIndexer is a TaskRunner that orchestrates crawls with the FireCrawl toolkit. Invoke it from MeshAgent Studio (“Run Task…”) or via room.agents.ask to populate a specified table:
  1. Sends a crawl job to the FireCrawl queue.
  2. Streams page results back through the room queue.
  3. Chunks, embeds, and writes rows into the provided table (creating vector / FTS indexes automatically).
You can select the queue, destination table, and starting URL when you call the task. Pair the resulting table with RagToolkit the same way you would with StorageIndexer.
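Conceptually, the crawl flow is a producer/consumer loop: page results arrive on a queue and each one is chunked, embedded, and written to the table. The sketch below models that shape with an `asyncio.Queue` and stand-in payloads; the actual FireCrawl message format, chunking, and embedding are all different, and the field names here are assumptions.

```python
# Conceptual sketch of the SiteIndexer flow: consume crawled pages
# from a queue and index each one. Payload fields are assumptions.
import asyncio


async def index_pages(queue: asyncio.Queue, table: dict) -> None:
    while True:
        page = await queue.get()
        if page is None:  # sentinel: crawl finished
            break
        body = page["text"]
        chunks = [body[i:i + 40] for i in range(0, len(body), 40)]
        for i, c in enumerate(chunks):
            # Stand-in embedding; a real indexer calls its Embedder here.
            table[(page["url"], i)] = {"text": c, "embedding": [float(len(c))]}


async def main() -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    table: dict = {}
    for url in ("https://example.com/a", "https://example.com/b"):
        await queue.put({"url": url, "text": "Crawled page body for " + url})
    await queue.put(None)
    await index_pages(queue, table)
    return table


table = asyncio.run(main())
```

Because pages stream in as they are crawled, the table fills incrementally rather than waiting for the whole site to finish.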

RagToolkit

Add RagToolkit(table="your_table") to a ChatBot, VoiceBot, or TaskRunner to surface the rows produced by either indexer. The toolkit exposes a rag_search tool that optionally accepts a custom embedder, letting you reuse the same model for both indexing and querying.
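What a `rag_search`-style tool does can be illustrated conceptually: embed the query with the same model used at indexing time, score stored rows by cosine similarity, and return the top-k chunk texts. This is a sketch of the idea, not the RagToolkit implementation; the pre-computed vectors below are stand-ins.

```python
# Conceptual sketch of vector search as used by a rag_search tool.
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def rag_search(query_vec: list[float], rows: list[dict], k: int = 2) -> list[str]:
    # Rank stored rows by similarity to the query embedding.
    scored = sorted(rows, key=lambda r: cosine(query_vec, r["embedding"]),
                    reverse=True)
    return [r["text"] for r in scored[:k]]


rows = [
    {"text": "indexers chunk and embed", "embedding": [1.0, 0.0]},
    {"text": "voicebots answer questions", "embedding": [0.0, 1.0]},
    {"text": "rag searches the table", "embedding": [0.9, 0.1]},
]
print(rag_search([1.0, 0.0], rows))  # the two rows closest to the query
```

This is why reusing the same embedder for indexing and querying matters: vectors from different models live in different spaces, so their similarities are meaningless.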

Example

You typically host a StorageIndexer alongside a chatbot that uses RagToolkit. The example below shows a MarkitDown-powered indexer paired with a RAG-enabled ChatBot.

Prerequisites and Dependencies
  • An embedding provider (the default requires the OpenAI Python SDK and OPENAI_API_KEY; this is provisioned automatically by the MeshAgent CLI).
  • chonkie for semantic chunking (installed with the agents package).
  • Optional toolkits such as meshagent.markitdown or meshagent.firecrawl depending on which indexer you deploy.
Creating a RAG-based ChatBot and RAG tool
from meshagent.api import RequiredToolkit, RequiredSchema
from meshagent.agents.schemas.document import document_schema
from meshagent.tools.document_tools import (
    DocumentAuthoringToolkit,
    DocumentTypeAuthoringToolkit,
)
from meshagent.agents.chat import ChatBot
from meshagent.openai import OpenAIResponsesAdapter
from meshagent.agents.indexer import RagToolkit, StorageIndexer
from meshagent.api.services import ServiceHost
from meshagent.markitdown.tools import MarkItDownToolkit
from meshagent.tools import ToolContext


import asyncio

service = ServiceHost()


@service.path(path="/agent", identity="meshagent.chatbot.storage_rag")
class RagChatBot(ChatBot):
    def __init__(self):
        super().__init__(
            name="meshagent.chatbot.storage_rag",
            title="Storage RAG chatbot",
            description="a simple chatbot that does RAG; pair with an indexer",
            llm_adapter=OpenAIResponsesAdapter(),
            rules=[
                "after performing a rag search, do not include citations",
                "output document names MUST have the extension .document, automatically add the extension if it is not provided",
                "after opening a document, display it, before writing to it",
            ],
            requires=[
                RequiredSchema(name="document"),
                RequiredToolkit(
                    name="ui", tools=["ask_user", "display_document", "show_toast"]
                ),
            ],
            toolkits=[
                MarkItDownToolkit(),
                DocumentAuthoringToolkit(),
                DocumentTypeAuthoringToolkit(
                    schema=document_schema, document_type="document"
                ),
                RagToolkit(table="rag-index"),
            ],
            labels=["chatbot", "rag"],
        )


@service.path(path="/indexer", identity="storage_indexer")
class MarkitDownFileIndexer(StorageIndexer):
    def __init__(
        self,
        *,
        name="storage_indexer",
        title="storage indexer",
        description="watch storage and index any uploaded pdfs or office documents",
        labels=["watchers", "rag"],
        chunker=None,
        embedder=None,
        table="rag-index",
    ):
        self._markitdown = MarkItDownToolkit()

        super().__init__(
            name=name,
            title=title,
            description=description,
            labels=labels,
            chunker=chunker,
            embedder=embedder,
            table=table,
        )

    async def read_file(self, *, path: str):
        context = ToolContext(
            room=self.room,
            caller=self.room.local_participant,
        )
        response = await self._markitdown.execute(
            context=context,
            name="markitdown_from_file",
            arguments={"path": path},
        )
        return getattr(response, "text", None)


asyncio.run(service.run())

Now that we’ve defined our services, let’s start the room and connect the chatbot and indexer.
Note: If running this locally, you will need to export OPENAI_API_KEY. When the service is deployed and running in the room, it is set automatically.
bash
meshagent setup # authenticate if not already
meshagent service run "main.py" --room=rag
Now we can interact with our agent in MeshAgent Studio:
  1. Open the studio
  2. Go to the room rag
  3. Upload a document to the room for processing. The indexer will chunk and embed the document.
  4. Ask the agent about the document once it’s been processed!

Next Steps

  • Understand other MeshAgent agents
  • Learn more about deploying agents with MeshAgent