Indexers keep a room’s vector database fresh so other agents can run retrieval-augmented generation (RAG) queries. They watch for new content, chunk and embed it, and store both the raw text and the embeddings in a table that RagToolkit-powered agents can search.

When to Use

  • Automate RAG over room storage without manually rebuilding embeddings
  • Keep crawled websites synchronized with a room before a chat session
  • Pair with ChatBot or VoiceBot so they can answer questions using indexed documents
  • Pre-process content for long-running workflows (document review, data labeling, etc.)

How Indexers Work

  1. Detect content: listen for storage events or take a crawl input.
  2. Transform: convert the source into plain text (customizable via read_file).
  3. Chunk: split text with a Chunker (defaults use chonkie.SemanticChunker).
  4. Embed: call the configured Embedder (defaults use OpenAI embedding models).
  5. Persist: upsert rows into a room database table and build full-text / vector indexes.
Once data is in the table, add RagToolkit(table="…") to any conversational agent and the indexed content becomes searchable.
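The five steps above can be sketched as a toy pipeline. Everything here (the function names, fixed-size chunking, and the character-frequency "embedding") is an illustrative stand-in, not the MeshAgent API; real indexers delegate these steps to Chunker and Embedder implementations and persist rows to a room database table.

```python
# Illustrative sketch of the transform -> chunk -> embed -> persist pipeline.

def chunk(text: str, size: int = 40) -> list[str]:
    # Naive fixed-size chunking; the real default is semantic chunking.
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(chunk_text: str) -> list[float]:
    # Toy embedding: a 26-dim letter-frequency vector. Real indexers call
    # an embedding model such as text-embedding-3-large.
    vec = [0.0] * 26
    for ch in chunk_text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def index_document(path: str, text: str) -> list[dict]:
    # Persist step: one row per chunk, raw text stored next to its embedding.
    return [
        {"path": path, "chunk": c, "embedding": embed(c)}
        for c in chunk(text)
    ]

rows = index_document("notes.txt", "Indexers keep a room's vector database fresh.")
```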

Built-in Indexers

StorageIndexer

StorageIndexer extends SingleRoomAgent and watches room.storage events. Whenever a file is uploaded, updated, or deleted it:
  • calls read_file(path=…) to obtain text (you override this to run MarkitDown, OCR, etc.)
  • chunks and embeds the content
  • writes rows into the configured database table (defaults to storage_index)
  • maintains vector and full-text indexes so downstream searches stay fast
Constructor Parameters
  • name (str | None): Deprecated. Agent identity comes from the participant token; if provided, it is only used to default title.
  • title / description (str | None): Optional metadata displayed in Studio. If title is omitted and name is set, title defaults to name.
  • requires (list[Requirement] | None): Toolkits or schemas to install before indexing (e.g., MarkitDown).
  • labels (list[str] | None): Tags for discovery/filtering.
  • chunker (Chunker | None): Splits text into chunks; defaults to ChonkieChunker.
  • embedder (Embedder | None): Produces embeddings; defaults to OpenAIEmbedding3Large.
  • table (str): Database table used to store rows and embeddings.
Tip: Implement async read_file(self, *, path: str) -> str | None to pull text from storage. Returning None skips indexing that file.

SiteIndexer

SiteIndexer is a TaskRunner that orchestrates crawls with the FireCrawl toolkit. Invoke it from MeshAgent Studio (“Toolkits…”) or via room.agents.invoke_tool to populate a specified table:
  1. Sends a crawl job to the FireCrawl queue.
  2. Streams page results back through the room queue.
  3. Chunks, embeds, and writes rows into the provided table (creating vector / FTS indexes automatically).
You can select the queue, destination table, and starting URL when you call the task. Pair the resulting table with RagToolkit the same way you would with StorageIndexer.
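For illustration, invoking the crawl task boils down to passing the starting URL, destination table, and queue as tool arguments. The argument names below are hypothetical stand-ins; check the tool's schema in Studio for the exact parameters before calling room.agents.invoke_tool.

```python
# Hypothetical shape of a SiteIndexer invocation payload.
def build_crawl_arguments(url: str, table: str, queue: str = "firecrawl") -> dict:
    return {"url": url, "table": table, "queue": queue}

args = build_crawl_arguments("https://example.com", "site-index")
# A connected client would then pass these arguments via room.agents.invoke_tool.
```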

RagToolkit

Add RagToolkit(table="your_table") to a ChatBot, VoiceBot, or TaskRunner to surface the rows produced by either indexer. The toolkit exposes a rag_search tool that optionally accepts a custom embedder, letting you reuse the same model for both indexing and querying.
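Conceptually, a rag_search amounts to embedding the query with the same model used at indexing time, then ranking stored rows by vector similarity. A toy in-memory sketch of that idea (not the RagToolkit implementation; the row shape and cosine scoring are assumptions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def toy_rag_search(query_vec: list[float], rows: list[dict], top_k: int = 2) -> list[dict]:
    # Rank indexed rows by similarity to the query embedding.
    return sorted(rows, key=lambda r: cosine(query_vec, r["embedding"]), reverse=True)[:top_k]

rows = [
    {"chunk": "cats purr", "embedding": [1.0, 0.0]},
    {"chunk": "stock prices", "embedding": [0.0, 1.0]},
]
best = toy_rag_search([0.9, 0.1], rows, top_k=1)
```

Because the toolkit accepts a custom embedder, you can guarantee the query vector and the stored vectors come from the same model, which is essential for meaningful similarity scores.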

Example

You typically host a StorageIndexer alongside a chatbot that uses RagToolkit. The example below shows a MarkitDown-powered indexer paired with a RAG-enabled ChatBot.

Prerequisites and Dependencies

  • An embedding provider (the default is an OpenAI embedder, but you can supply your own).
  • chonkie for semantic chunking (installed with the meshagent agents package).
  • Optional toolkits such as meshagent.markitdown or meshagent.firecrawl depending on which indexer you deploy.

Step 1: Create a RAG-based ChatBot and RAG tool

import asyncio
import logging
from meshagent.api import RequiredToolkit, RequiredSchema
from meshagent.agents.schemas.document import document_schema
from meshagent.tools.document_tools import (
    DocumentAuthoringToolkit,
    DocumentTypeAuthoringToolkit,
)
from meshagent.agents.chat import ChatBot
from meshagent.openai import OpenAIResponsesAdapter
from meshagent.openai.proxy import get_client
from meshagent.agents.indexer import RagToolkit, StorageIndexer, OpenAIEmbedder
from meshagent.api.services import ServiceHost
from meshagent.markitdown.tools import MarkItDownToolkit
from meshagent.tools import ToolContext
from meshagent.otel import otel_config

otel_config(service_name="storage-rag") 
log = logging.getLogger("storage-rag")

service = ServiceHost()

@service.path(path="/agent", identity="meshagent.chatbot.storage_rag")
class RagChatBot(ChatBot):
    def __init__(self):
        self._rag_toolkit = None
        super().__init__(
            title="Storage RAG chatbot",
        description="a simple chatbot that performs RAG; pair with an indexer",
            llm_adapter=OpenAIResponsesAdapter(),
            rules=[
                "after performing a rag search, do not include citations",
                "output document names MUST have the extension .document, automatically add the extension if it is not provided",
                "after opening a document, display it, before writing to it",
            ],
            requires=[
                RequiredToolkit(
                    name="ui", tools=["ask_user", "display_document", "show_toast"]
                ),
            ],
            toolkits=[
                MarkItDownToolkit(),
                DocumentAuthoringToolkit(),
                DocumentTypeAuthoringToolkit(
                    schema=document_schema, document_type="document"
                ),
            ],
            annotations=["chatbot", "rag"],
        )
    async def start(self, *, room):
        rag_toolkit = RagToolkit(
            table="rag-index",
            embedder=OpenAIEmbedder(
                size=3072,
                max_length=8191,
                model="text-embedding-3-large",
                openai=get_client(room=room),
            ),
        )
        self._rag_toolkit = rag_toolkit
        self._toolkits.append(rag_toolkit)
        await super().start(room=room)


@service.path(path="/indexer", identity="storage_indexer")
class MarkitDownFileIndexer(StorageIndexer):
    def __init__(
        self,
        *,
        title="storage indexer",
        description="watch storage and index any uploaded pdfs or office documents",
        annotations=["watchers", "rag"],
        chunker=None,
        embedder=None,
        table="rag-index",
    ):
        self._markitdown = MarkItDownToolkit()

        super().__init__(
            title=title,
            description=description,
            annotations=annotations,
            chunker=chunker,
            embedder=embedder,
            table=table,
        )

    async def read_file(self, *, path: str):
        context = ToolContext(
            room=self.room,
            caller=self.room.local_participant,
        )
        response = await self._markitdown.execute(
            context=context,
            name="markitdown_from_file",
            arguments={"path": path},
        )
        return getattr(response, "text", None)


asyncio.run(service.run())

Step 2: Run the service and test locally

Now that we’ve defined our services, let’s start the room and connect the chatbot and indexer.
meshagent setup # authenticate if not already
meshagent service run "main.py" --room=rag
Now we can interact with our agent in MeshAgent Studio
  1. Open the studio
  2. Go to the room rag
  3. Upload a document to the room for processing. The indexer will chunk and embed the document. You will see logs in your terminal related to the document processing.
  4. Ask the agent about the document once it’s been processed!

Step 3: Package and deploy the service

To deploy your RAG Indexer and RAG agent permanently, you’ll package your code with a meshagent.yaml file that defines the service configuration and a container image that MeshAgent can run. For full details on the service spec and deployment flow, see Packaging Services and Deploying Services. MeshAgent supports two deployment patterns for containers:
  1. Runtime image + code mount (recommended): Use a pre-built MeshAgent runtime image (like python-sdk-slim) that contains Python and all MeshAgent dependencies. Mount your lightweight code-only image on top. This keeps your code image tiny (~KB), eliminates dependency installation time, and allows your service to start quickly.
  2. Single Image: Bundle your code and all dependencies into one image. This is good when you need to install additional libraries, but can result in larger images and slower pulls. If you build your own images we recommend optimizing them with eStargz.
This example uses the runtime image + code mount pattern with the public python-docs-examples code image so you can run the documentation sample without building your own image. If you build your own code image, follow the steps below and update the storage.images entry in meshagent.yaml.

Prepare your project structure

This example organizes the agent code and configuration in the same folder, making each agent self-contained:
your-project/
├── Dockerfile                    # Shared by all samples
├── storage_rag/
│   ├── indexer.py
│   └── meshagent.yaml           # Config specific to this sample
└── another_sample/              # Other samples follow same pattern
    ├── another_sample.py
    └── meshagent.yaml
Note: If you’re building a single agent, you only need the storage_rag/ folder. The structure shown supports multiple samples sharing one Dockerfile.
Step 3a: Build a Docker container

If you want a code-only image, create a scratch Dockerfile and copy the files you want to run. This creates a minimal image that pairs with the runtime image + code mount pattern.
FROM scratch

COPY . /
Build and push the image with docker buildx:
docker buildx build . \
  -t "<REGISTRY>/<NAMESPACE>/<IMAGE_NAME>:<TAG>" \
  --platform linux/amd64 \
  --push
Note: Building from the project root copies your entire project structure into the image. For a single agent, this is fine: your image will just contain one folder. For multi-agent projects, all agents will be in one image, but each can deploy independently using its own meshagent.yaml.
Step 3b: Package the service

Define the service configuration in a meshagent.yaml file. This will be used to deploy the indexer and RAG ChatBot.
kind: Service
version: v1
metadata:
  name: storage-rag
  description: "A storage RAG chatbot with a paired storage indexer"
  annotations:
    meshagent.service.id: "storage-rag"
agents:
  - name: meshagent.chatbot.storage_rag
    description: "A chatbot that uses RAG over room storage"
    annotations:
      meshagent.agent.type: "ChatBot"
  - name: storage_indexer
    description: "Watches storage and indexes documents for RAG"
    annotations:
      meshagent.agent.type: "Indexer"
ports:
- num: "*"
  endpoints:
  - path: /agent
    meshagent:
      identity: meshagent.chatbot.storage_rag
  - path: /indexer
    meshagent:
      identity: storage_indexer
container:
  image: "us-central1-docker.pkg.dev/meshagent-public/images/python-sdk:{SERVER_VERSION}-esgz"
  command: python /src/storage_rag/indexer.py
  storage:
    images:
      # Replace this image tag with your own code-only image if you build one.
      - image: "us-central1-docker.pkg.dev/meshagent-public/images/python-docs-examples:{SERVER_VERSION}"
        path: /src
        read_only: true

How the paths work:
  • Your code image contains storage_rag/indexer.py
  • It’s mounted at /src in the runtime container
  • The command runs python /src/storage_rag/indexer.py
Note: The default YAML in the docs uses us-central1-docker.pkg.dev/meshagent-public/images/python-docs-examples so you can test this example immediately without building your own image first. Replace this with your actual image tag when deploying your own code.
Step 3c: Deploy the service

Next, from the CLI, in the directory containing your meshagent.yaml file, run:
meshagent service create --file "meshagent.yaml" --room=quickstart
Now the agent and indexer will always be available for us to use!

Next Steps

  • Understand other MeshAgent agents
  • Learn more about deploying agents with MeshAgent