MeshAgent emits OpenTelemetry signals out of the box. When you need deeper insight, you can add your own spans, logs, and metrics using standard OTEL SDKs. This guide walks through a Weather Toolkit with custom instrumentation so you can see the pieces in action.

Enable telemetry in your service

Add two lines at startup before you create your ServiceHost:
Python
from meshagent.otel import otel_config
otel_config(service_name="weather-service")
This initializes tracing, logging, and metrics and tags everything with your project/room/session so it shows up live in MeshAgent Studio.

Example: Weather Toolkit with custom instrumentation

The example adds custom spans around validation, the external HTTP call, and response parsing. It also adds logs and metrics.
import time
import asyncio
import httpx
import logging
from meshagent.api.services import ServiceHost
from meshagent.tools import Tool, ToolContext, RemoteToolkit
from meshagent.otel import otel_config
from opentelemetry import trace, metrics
from opentelemetry.trace import Status, StatusCode

# Configure OpenTelemetry
otel_config(service_name="weather_tools")
service = ServiceHost()
log = logging.getLogger(__name__)
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)

# Counters
calls = meter.create_counter(
    "weather.calls", unit="1", description="Total weather tool invocations"
)
errors = meter.create_counter(
    "weather.errors", unit="1", description="Errors during execution"
)


class WeatherTool(Tool):
    def __init__(self):
        super().__init__(
            name="get_weather",
            title="Weather Tool",
            description="Get current weather for a city using wttr.in API",
            input_schema={
                "type": "object",
                "additionalProperties": False,
                "required": ["city", "units"],
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "units": {
                        "type": "string",
                        "enum": ["metric", "imperial"],
                        "description": "Units the temperature will be returned in",
                    },
                },
            },
        )

    async def execute(self, context: ToolContext, *, city: str, units: str):
        """
        This shows custom instrumentation that meshagent doesn't do automatically:
        1. Separate spans for validation, API call, parsing
        2. Custom attributes (API endpoint, response size)
        3. Events for important moments (rate limits etc.)
        4. Error handling with span status
        """
        log.info(f"Weather tool is running for city: {city} with units: {units}")
        calls.add(1, attributes={"city": city.lower(), "units": units})
        # Custom span for input validation
        with tracer.start_as_current_span("validate_input") as span:
            span.set_attribute("city", city)

            if not city or len(city) < 2:
                span.set_status(Status(StatusCode.ERROR, "Invalid city name"))
                span.add_event("validation_failed", {"reason": "city too short"})
                return {"error": "City name must be at least 2 characters"}

            span.add_event("validation_passed")

        # Custom span for external API call
        with tracer.start_as_current_span("fetch_weather_api") as span:
            # Add attributes about the API call
            api_url = f"https://wttr.in/{city}?format=j1"
            span.set_attribute("http.url", api_url)
            span.set_attribute("http.method", "GET")
            span.set_attribute("api.provider", "wttr.in")
            start = time.perf_counter()

            span.add_event("api_request_start")

            try:
                async with httpx.AsyncClient(timeout=10.0) as client:
                    response = await client.get(api_url)

                    # Record response attributes
                    span.set_attribute("http.status_code", response.status_code)
                    span.set_attribute("http.response_size", len(response.content))

                    if response.status_code == 429:
                        span.add_event("rate_limit_exceeded")
                        span.set_status(Status(StatusCode.ERROR, "Rate limited"))
                        return {"error": "Rate limit exceeded, try again later"}

                    response.raise_for_status()
                    data = response.json()

                    span.add_event(
                        "api_request_success",
                        {
                            "data_keys": list(data.keys()),
                        },
                    )

            except httpx.TimeoutException:
                span.set_status(Status(StatusCode.ERROR, "API timeout"))
                span.add_event("api_timeout")
                errors.add(1, attributes={"kind": "timeout", "city": city.lower()})
                return {"error": "Weather service timeout"}
            except Exception as e:
                errors.add(
                    1, attributes={"kind": type(e).__name__, "city": city.lower()}
                )
                span.set_status(Status(StatusCode.ERROR, str(e)))
                span.add_event("api_error", {"error_type": type(e).__name__})
                return {"error": f"Failed to fetch weather: {str(e)}"}

        # Custom span for parsing and formatting
        with tracer.start_as_current_span("parse_response") as span:
            try:
                current = data["current_condition"][0]
                location = data["nearest_area"][0]

                if units == "metric":
                    temperature = current["temp_C"]
                    degrees_in = "°C"
                elif units == "imperial":
                    temperature = current["temp_F"]
                    degrees_in = "°F"
                else:
                    log.warning(
                        f"Units {units} is not a valid unit. Must use metric or imperial"
                    )
                    return {"error": "Invalid units: must be 'metric' or 'imperial'"}

                result = {
                    "city": location["areaName"][0]["value"],
                    "country": location["country"][0]["value"],
                    "temperature": temperature,
                    "units": degrees_in,
                    "description": current["weatherDesc"][0]["value"],
                    "humidity": current["humidity"],
                    "wind_speed": current["windspeedKmph"],
                }

                # Record what we parsed
                span.set_attribute("parsed_fields", len(result))
                span.add_event("parse_success")

                return result

            except (KeyError, IndexError) as e:
                span.set_status(Status(StatusCode.ERROR, "Parse failed"))
                span.add_event("parse_error", {"error": str(e)})
                errors.add(1, attributes={"kind": "parse_error"})
                return {"error": "Failed to parse weather data"}


@service.path(path="/weather", identity="weather-toolkit")
class WeatherToolkit(RemoteToolkit):
    def __init__(self):
        super().__init__(
            name="weather-toolkit",
            title="Weather Toolkit",
            description="Tools for getting weather information",
            tools=[WeatherTool()],
        )


asyncio.run(service.run())
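
The parse_response step only depends on the decoded JSON, so its logic can be checked offline. The sketch below is one way to do that: parse_weather is a hypothetical helper (not part of MeshAgent or the sample), and SAMPLE_PAYLOAD is a made-up payload that contains only the keys the example reads, not a real wttr.in response.

```python
from typing import Any


def parse_weather(data: dict[str, Any], units: str) -> dict[str, Any]:
    """Hypothetical pure version of the lookups done inside the parse_response span."""
    current = data["current_condition"][0]
    location = data["nearest_area"][0]

    if units == "metric":
        temperature, degrees_in = current["temp_C"], "°C"
    elif units == "imperial":
        temperature, degrees_in = current["temp_F"], "°F"
    else:
        raise ValueError("units must be 'metric' or 'imperial'")

    return {
        "city": location["areaName"][0]["value"],
        "country": location["country"][0]["value"],
        "temperature": temperature,
        "units": degrees_in,
        "description": current["weatherDesc"][0]["value"],
        "humidity": current["humidity"],
        "wind_speed": current["windspeedKmph"],
    }


# Made-up payload for illustration; only the keys read above are present.
SAMPLE_PAYLOAD = {
    "current_condition": [
        {
            "temp_C": "20",
            "temp_F": "68",
            "humidity": "55",
            "windspeedKmph": "10",
            "weatherDesc": [{"value": "Sunny"}],
        }
    ],
    "nearest_area": [
        {
            "areaName": [{"value": "Costa Mesa"}],
            "country": [{"value": "United States of America"}],
        }
    ],
}

result = parse_weather(SAMPLE_PAYLOAD, "imperial")
```

Inside the tool, a helper like this would still be called from within the parse_response span, so the span attributes and events stay where they are.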

Custom Spans

Use spans to track a specific operation in a trace. Spans help you understand the timing of each step, and you can attach attributes and events to make debugging easier. This example records:
  • toolkit.execute (created by MeshAgent): the high-level tool call, including which tool was called and what arguments were passed to it
    • validate_input – checks the city name and adds events for pass/fail
    • fetch_weather_api – the external HTTP call with:
      • http.url, http.method, http.status_code, http.response_size
      • events like api_request_start, api_request_success, api_timeout, rate_limit_exceeded
    • parse_response – extracts the fields you return to the caller
Adding child spans inside the span that MeshAgent creates lets you capture finer-grained information about each step of your process. Pattern:
Python
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("fetch_weather_api") as span:
    span.set_attribute("http.url", api_url)
    span.add_event("api_request_start")
    ...
    if resp.status_code == 429:
        span.set_status(Status(StatusCode.ERROR, "Rate limited"))
These show up in the room’s Traces tab.

Custom Logs

otel_config() sets up an OTEL logging handler so your normal logging.info() lines are captured. Add extra logs to make it easier to debug and understand…
Python
import logging
log = logging.getLogger(__name__)

log.info(f"Weather tool is running for city: {city} with units: {units}")
These show up in the room’s Logs tab.
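
If you want log fields you can filter on, the standard library's extra argument attaches them to the log record instead of baking them into the message string. The sketch below uses a hypothetical in-memory handler (ListHandler) so you can inspect the records locally. Whether these fields appear as separate attributes in the Logs tab depends on the handler that otel_config() installs, which is an assumption here and not confirmed by this guide.

```python
import logging

log = logging.getLogger("weather.example")
log.setLevel(logging.INFO)


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps records in memory for inspection."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


handler = ListHandler()
log.addHandler(handler)


def log_invocation(city: str, units: str) -> None:
    # Keys passed via `extra` become attributes on the LogRecord.
    log.info("Weather tool invoked", extra={"city": city, "units": units})


log_invocation("Costa Mesa", "imperial")
record = handler.records[-1]
```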

Custom Metrics

You can also add counters and histograms to help track trends over time.
Python
from opentelemetry import metrics
meter = metrics.get_meter("weather.tools")

calls = meter.create_counter("weather.calls", unit="1",
                             description="Total weather tool invocations")

# When a call starts
calls.add(1, attributes={"city": city.lower(), "units": units})
These show up in the room’s Metrics tab.

Running the sample

You can deploy and run this toolkit as a service and invoke the weather tool, get_weather(), from Studio or the CLI.

Step 1: Package and deploy the room service

To deploy the otel service permanently, you’ll package your code with a meshagent.yaml file that defines the service configuration and a container image that MeshAgent can run. For full details on the service spec and deployment flow, see Packaging Services and Deploying Services. MeshAgent supports two deployment patterns for containers:
  1. Runtime image + code mount (recommended): Use a pre-built MeshAgent runtime image (like python-sdk-slim) that contains Python and all MeshAgent dependencies. Mount your lightweight code-only image on top. This keeps your code image tiny (~KB), eliminates dependency installation time, and allows your service to start quickly.
  2. Single Image: Bundle your code and all dependencies into one image. This is good when you need extra libraries that aren’t already in the runtime image, but can result in larger images and slower pulls.
This example demonstrates approach #1 with a code-only image. The default YAML points at the public python-docs-examples image so you can still run the documentation examples without building your own images. If you want to build and push your own code image, follow the steps below and update the image mount section of the meshagent.yaml file.

Prepare your project structure

This example organizes the tool code and configuration in the same folder, making each sample self-contained:
your-project/
├── Dockerfile                    # Shared by all samples
├── observability/
│   ├── observability.py
│   └── meshagent.yaml            # Config specific to this sample
└── another_sample/               # Other samples follow same pattern
    ├── another_sample.py
    └── meshagent.yaml
Note: If you’re building a single tool, you only need the observability/ folder. The structure shown supports multiple samples sharing one Dockerfile.

Step 1a: Build a Docker container

Create a scratch Dockerfile and copy the files you want to run. This creates a minimal image containing only your code files.
FROM scratch

COPY . /
Build and push the image with docker buildx:
bash
docker buildx build . \
  -t "<REGISTRY>/<NAMESPACE>/<IMAGE_NAME>:<TAG>" \
  --platform linux/amd64 \
  --push
Note: Building from the project root copies your entire project structure into the image. For a single tool, this is fine: your image will contain just one folder. For multi-tool projects, all samples will be in one image, but each can be deployed independently using its own meshagent.yaml.

Step 1b: Package the service

Define the service configuration in a meshagent.yaml file. This service will have a container section that references:
  • Runtime image: The MeshAgent Python SDK image with all dependencies
  • Code mount: Your code-only image mounted at /src
  • Command path: Points to your sample’s specific location
kind: Service
version: v1
metadata:
  name: otel-example
  description: "An example weather tool with otel instrumentation"
ports:
- num: "*"
  liveness: "/"
  endpoints:
  - path: /weather
    meshagent:
      identity: weather-toolkit
container:
  image: "us-central1-docker.pkg.dev/meshagent-public/images/python-sdk:{SERVER_VERSION}-esgz"
  command: python /src/observability/observability.py
  storage:
    images:
      # Replace this image tag with your own code-only image if you build one.
      - image: "us-central1-docker.pkg.dev/meshagent-public/images/python-docs-examples:{SERVER_VERSION}"
        path: /src
        read_only: true

How the paths work:
  • Your code image contains /observability/observability.py
  • It’s mounted at /src in the runtime container
  • The command runs python /src/observability/observability.py
Note: The default YAML in the docs uses us-central1-docker.pkg.dev/meshagent-public/images/python-docs-examples so you can test this example immediately without building your own image first. Replace this with your actual image tag when deploying your own code.
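
The path arithmetic above can be checked with a few lines of Python. This is only an illustration of how the mount path and the file's location inside the code image combine into the command path. The mounted_path helper is hypothetical and is not something MeshAgent provides.

```python
from pathlib import PurePosixPath

MOUNT_PATH = PurePosixPath("/src")  # `path` of the code mount in meshagent.yaml
PATH_IN_IMAGE = PurePosixPath("/observability/observability.py")  # file inside the code image


def mounted_path(mount: PurePosixPath, path_in_image: PurePosixPath) -> PurePosixPath:
    """Return where a file from the code image appears in the runtime container."""
    # Drop the leading "/" so the file path is joined underneath the mount point.
    return mount / path_in_image.relative_to("/")


command = f"python {mounted_path(MOUNT_PATH, PATH_IN_IMAGE)}"
```

If you change the folder layout in your own image, the command in meshagent.yaml needs the same adjustment.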

Step 1c: Deploy the service

From the CLI, in the directory that contains your meshagent.yaml file, run:
meshagent service create --file "meshagent.yaml" --room=otel

Step 2: Invoking the tool

Once you’ve created the service, you can invoke the tool directly from MeshAgent Studio: go into the room, click the menu icon, select Toolkits, then select the Weather Tool and click Invoke. The UI displays a dialog where you enter the city and units for the weather lookup. You can also invoke the tool using the MeshAgent CLI:
bash
meshagent room agents invoke-tool \
  --room=otel \
  --toolkit=weather-toolkit \
  --tool=get_weather \
  --arguments='{"city":"Costa Mesa","units":"imperial"}'
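
Hand-writing JSON inside shell quotes is easy to get wrong. If you script these invocations, a small helper can build the command for you. This is a sketch under the assumption that you only need the flags shown above. invoke_tool_command is a hypothetical helper, not part of the MeshAgent CLI or SDK.

```python
import json
import shlex


def invoke_tool_command(room: str, toolkit: str, tool: str, arguments: dict) -> str:
    """Build a shell-safe invoke-tool command line from Python values."""
    payload = json.dumps(arguments, separators=(",", ":"))
    parts = [
        "meshagent", "room", "agents", "invoke-tool",
        f"--room={room}",
        f"--toolkit={toolkit}",
        f"--tool={tool}",
        f"--arguments={payload}",
    ]
    # shlex.join quotes any part that contains spaces or shell metacharacters.
    return shlex.join(parts)


command = invoke_tool_command(
    "otel", "weather-toolkit", "get_weather",
    {"city": "Costa Mesa", "units": "imperial"},
)
```

The quoting in the generated string may look different from the hand-written command, but the shell parses both into the same arguments.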

Viewing telemetry

In MeshAgent Studio, Logs, Traces, and Metrics are available both in the Session view of the room and in the Developer Console. The Session view provides a focused view of session details, which are accessible during and after each session. The Developer Console appears in the lower part of the screen once you’re inside a room and shows live traces, logs, and metrics as you interact with your services. After invoking the weather tool in this example, you should see a trace tree appear in the Traces tab with a structure like this:
toolkit.execute (get_weather)
 ├─ validate_input
 ├─ fetch_weather_api
 └─ parse_response
Applicable logs and metrics will appear under their respective tabs.

Next Steps

  • Agents: Understand how agents work in MeshAgent and start building your first one!
  • Tools: Learn how to build tools human and agent participants can use inside a MeshAgent room.
  • Services & Containers: Learn how to deploy instrumented agents and tools as project-wide or room-specific services in MeshAgent.