Custom Logs, Traces, and Metrics

MeshAgent already gives you logs, traces, metrics, session data, and developer logs in MeshAgent Studio. Use this page when you want to add your own spans, logs, and metrics inside a custom Python service or room-connected toolkit. To learn the built-in observability model first, see Observability.

Enable telemetry in your process

Call otel_config() once at process startup:

Python

from meshagent.otel import otel_config
otel_config(service_name="weather-service")

When this code runs inside MeshAgent, otel_config() uses the injected OTEL_ENDPOINT and room/session environment to send telemetry to MeshAgent Studio.
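When you run the same process outside MeshAgent, it can help to check whether the telemetry environment was injected at all. The sketch below only reads the OTEL_ENDPOINT variable named above; the helper name telemetry_endpoint is hypothetical, and what otel_config() does when the variable is missing is not covered on this page.

```python
import os
from typing import Mapping, Optional


def telemetry_endpoint(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the injected OTEL_ENDPOINT value, or None if it is unset or blank."""
    value = env.get("OTEL_ENDPOINT", "").strip()
    return value or None


endpoint = telemetry_endpoint()
if endpoint is None:
    print("OTEL_ENDPOINT is not set; this process was probably not started by MeshAgent")
else:
    print(f"Telemetry endpoint: {endpoint}")
```

Running this before otel_config() makes it obvious whether you are inside a MeshAgent-provided environment (for example, one started with meshagent room connect) or a plain local shell.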

Example: Weather Toolkit with custom instrumentation

The example adds custom spans around validation, the external HTTP call, and response parsing. It also adds logs and metrics.

import asyncio
import logging

import httpx
from meshagent.agents import SingleRoomAgent
from meshagent.otel import otel_config
from meshagent.tools import FunctionTool, ToolContext, Toolkit
from opentelemetry import metrics, trace
from opentelemetry.trace import Status, StatusCode

# Configure OpenTelemetry
otel_config(service_name="weather_tools")
log = logging.getLogger(__name__)
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)

# Counters
calls = meter.create_counter(
    "weather.calls", unit="1", description="Total weather tool invocations"
)
errors = meter.create_counter(
    "weather.errors", unit="1", description="Errors during execution"
)


class WeatherTool(FunctionTool):
    def __init__(self):
        super().__init__(
            name="get_weather",
            title="Weather Tool",
            description="Get current weather for a city using wttr.in API",
            input_schema={
                "type": "object",
                "additionalProperties": False,
                "required": ["city", "units"],
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "units": {
                        "type": "string",
                        "enum": ["metric", "imperial"],
                        "description": "Units the temperature will be returned in",
                    },
                },
            },
        )

    async def execute(self, context: ToolContext, *, city: str, units: str):
        """
        This shows custom instrumentation that MeshAgent does not add automatically:
        1. Separate spans for validation, API call, parsing
        2. Custom attributes (API endpoint, response size)
        3. Events for important moments (rate limits etc.)
        4. Error handling with span status
        """
        log.info(f"Weather tool is running for city: {city} with units: {units}")
        calls.add(1, attributes={"city": city.lower(), "units": units})
        # Custom span for input validation
        with tracer.start_as_current_span("validate_input") as span:
            span.set_attribute("city", city)

            if not city or len(city) < 2:
                span.set_status(Status(StatusCode.ERROR, "Invalid city name"))
                span.add_event("validation_failed", {"reason": "city too short"})
                return {"error": "City name must be at least 2 characters"}

            span.add_event("validation_passed")

        # Custom span for external API call
        with tracer.start_as_current_span("fetch_weather_api") as span:
            # Add attributes about the API call
            api_url = f"https://wttr.in/{city}?format=j1"
            span.set_attribute("http.url", api_url)
            span.set_attribute("http.method", "GET")
            span.set_attribute("api.provider", "wttr.in")

            span.add_event("api_request_start")

            try:
                async with httpx.AsyncClient(timeout=10.0) as client:
                    response = await client.get(api_url)

                    # Record response attributes
                    span.set_attribute("http.status_code", response.status_code)
                    span.set_attribute("http.response_size", len(response.content))

                    if response.status_code == 429:
                        span.add_event("rate_limit_exceeded")
                        span.set_status(Status(StatusCode.ERROR, "Rate limited"))
                        return {"error": "Rate limit exceeded, try again later"}

                    response.raise_for_status()
                    data = response.json()

                    span.add_event(
                        "api_request_success",
                        {
                            "data_keys": list(data.keys()),
                        },
                    )

            except httpx.TimeoutException:
                span.set_status(Status(StatusCode.ERROR, "API timeout"))
                span.add_event("api_timeout")
                errors.add(1, attributes={"kind": "timeout", "city": city.lower()})
                return {"error": "Weather service timeout"}
            except Exception as e:
                errors.add(
                    1, attributes={"kind": type(e).__name__, "city": city.lower()}
                )
                span.set_status(Status(StatusCode.ERROR, str(e)))
                span.add_event("api_error", {"error_type": type(e).__name__})
                return {"error": f"Failed to fetch weather: {str(e)}"}

        # Custom span for parsing and formatting
        with tracer.start_as_current_span("parse_response") as span:
            try:
                current = data["current_condition"][0]
                location = data["nearest_area"][0]

                if units == "metric":
                    temperature = current["temp_C"]
                    degrees_in = "°C"
                elif units == "imperial":
                    temperature = current["temp_F"]
                    degrees_in = "°F"
                else:
                    log.warning(
                        f"Units {units} is not a valid unit. Must use metric or imperial"
                    )
                    return {"error": "Invalid units: must be 'metric' or 'imperial'"}

                result = {
                    "city": location["areaName"][0]["value"],
                    "country": location["country"][0]["value"],
                    "temperature": temperature,
                    "units": degrees_in,
                    "description": current["weatherDesc"][0]["value"],
                    "humidity": current["humidity"],
                    # Note: this field is in km/h even when units is "imperial"
                    "wind_speed": current["windspeedKmph"],
                }

                # Record what we parsed
                span.set_attribute("parsed_fields", len(result))
                span.add_event("parse_success")

                return result

            except (KeyError, IndexError) as e:
                span.set_status(Status(StatusCode.ERROR, "Parse failed"))
                span.add_event("parse_error", {"error": str(e)})
                errors.add(1, attributes={"kind": "parse_error"})
                return {"error": "Failed to parse weather data"}


class WeatherToolkit(Toolkit):
    def __init__(self):
        super().__init__(
            name="weather-toolkit",
            title="Weather Toolkit",
            description="Tools for getting weather information",
            tools=[WeatherTool()],
        )


class WeatherAgent(SingleRoomAgent):
    async def get_exposed_toolkits(self) -> list[Toolkit]:
        return [WeatherToolkit()]


async def main() -> None:
    agent = WeatherAgent(title="weather-toolkit-host")
    await agent.run()


if __name__ == "__main__":
    asyncio.run(main())
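The sample interpolates city into the wttr.in URL as-is, so a value containing characters such as "/", "?", or "#" would change the request path or query. A minimal sketch of percent-encoding the city first, using only the standard library; build_weather_url is a hypothetical helper, not part of the sample or of MeshAgent:

```python
from urllib.parse import quote


def build_weather_url(city: str) -> str:
    """Build the wttr.in JSON URL with the city percent-encoded as one path segment."""
    # safe="" also encodes "/", so the city can never add extra path segments.
    encoded_city = quote(city.strip(), safe="")
    return f"https://wttr.in/{encoded_city}?format=j1"


print(build_weather_url("Costa Mesa"))  # https://wttr.in/Costa%20Mesa?format=j1
```

The encoded URL is also a safer value to record in the http.url span attribute, since it is exactly what is sent on the wire.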

Custom spans

Use spans to track specific operations inside a trace. This example records:

execute.weather-toolkit.get_weather, which MeshAgent creates for the overall tool call
validate_input for city validation (the sample checks units later, inside parse_response)
fetch_weather_api for the outbound HTTP request, including attributes such as http.url, http.method, http.status_code, and http.response_size
parse_response for the final response shaping step

This gives you finer-grained visibility inside the spans MeshAgent already creates.

Pattern:

Python

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)

# api_url and response come from the surrounding tool code in the full example.
with tracer.start_as_current_span("fetch_weather_api") as span:
    span.set_attribute("http.url", api_url)
    span.add_event("api_request_start")
    ...  # perform the HTTP request here
    if response.status_code == 429:
        span.set_status(Status(StatusCode.ERROR, "Rate limited"))

These spans appear in the room’s Traces tab.

Custom logs

otel_config() sets up an OTEL logging handler so normal logging calls are captured too.

Python

import logging
log = logging.getLogger(__name__)

log.info(f"Weather tool is running for city: {city} with units: {units}")

These logs appear in the room’s Logs tab.
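To see why ordinary logging calls are enough, here is a standard-library sketch in which a stand-in handler plays the role of the OTEL handler. It assumes the OTEL handler behaves like any logging.Handler attached to the root logger; this page does not describe how otel_config() wires it, so treat ListHandler and the logger name as illustrative only.

```python
import logging


class ListHandler(logging.Handler):
    """Stand-in for an exporting handler: keeps every record it receives."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


capture = ListHandler()
root = logging.getLogger()
root.addHandler(capture)
root.setLevel(logging.INFO)  # the root logger defaults to WARNING, which drops info()

log = logging.getLogger("weather.tools")
city, units = "Costa Mesa", "imperial"

# %-style arguments defer formatting; `extra` attaches fields to the record itself.
log.info(
    "Weather tool is running for city: %s with units: %s",
    city,
    units,
    extra={"city": city.lower(), "units": units},
)

record = capture.records[-1]
print(record.getMessage())
print(record.city, record.units)
```

Records from a module-level logger propagate up to the root logger's handlers, which is why the tool code itself never needs to reference a telemetry handler.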

Custom metrics

You can also add counters and histograms to track trends over time.

Python

from opentelemetry import metrics
meter = metrics.get_meter("weather.tools")

calls = meter.create_counter("weather.calls", unit="1",
                             description="Total weather tool invocations")

# When a call starts
calls.add(1, attributes={"city": city.lower(), "units": units})

These metrics appear in the room’s Metrics tab.
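The sample only uses counters. For a histogram, one option is a small timing helper that hands the elapsed time to the instrument's record method. The sketch below is an illustration, not part of the sample: record_duration_ms and the instrument name weather.request.duration are hypothetical, and a stand-in recorder is used so the code runs without a configured meter.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List, Tuple


@contextmanager
def record_duration_ms(
    record: Callable[..., None], attributes: Dict[str, str]
) -> Iterator[None]:
    """Time the enclosed block and pass the elapsed milliseconds to `record`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        record(elapsed_ms, attributes=attributes)


# Stand-in for Histogram.record so the sketch runs on its own.
samples: List[Tuple[float, Dict[str, str]]] = []


def fake_record(amount: float, attributes: Dict[str, str]) -> None:
    samples.append((amount, attributes))


with record_duration_ms(fake_record, {"units": "imperial"}):
    time.sleep(0.01)  # placeholder for the outbound HTTP request

print(f"recorded {len(samples)} sample(s)")

# With a real meter (hypothetical instrument name):
#   latency = meter.create_histogram("weather.request.duration", unit="ms")
#   with record_duration_ms(latency.record, {"units": units}):
#       response = await client.get(api_url)
```

Because the measurement happens in a finally block, a duration is recorded even when the wrapped request raises.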

Deploy the sample

You can run this toolkit locally with meshagent room connect, deploy it as a service, and invoke get_weather() from Studio or the CLI.

Run it locally

During development, run the toolkit with meshagent room connect so MeshAgent provides the room token, room name, LLM proxy credentials, and telemetry environment:

bash

meshagent rooms create gettingstarted --if-not-exists
meshagent room connect --room=gettingstarted --identity=weather-toolkit -- python3 observability.py

Step 1: Package and deploy the room service

Package the sample with a meshagent.yaml file and a container image that MeshAgent can run. For the general deployment flow, see Service YAML.

This example uses a MeshAgent runtime image plus a lightweight code image. The default YAML points at the public python-docs-examples image so you can run the docs example without building your own image first.

Project structure:

your-project/
├── Dockerfile                    # Shared by all samples
├── observability/
│   ├── observability.py
│   └── meshagent.yaml            # Config specific to this sample
└── another_sample/               # Other samples follow same pattern
    ├── another_sample.py
    └── meshagent.yaml

If you are building a single tool, you only need the observability/ folder.

Step 1a: Build a Docker image

Create a scratch Dockerfile and copy the files you want to run:

FROM scratch

COPY . /

Build and push the image:

bash

docker buildx build . \
  -t "<REGISTRY>/<NAMESPACE>/<IMAGE_NAME>:<TAG>" \
  --platform linux/amd64 \
  --push

Step 1b: Define the service

Create a meshagent.yaml file that references:

Runtime image: The MeshAgent Python SDK image with all dependencies
Code mount: Your code-only image mounted at /src
Command path: Points to your sample’s specific location
Participant token: Injects MESHAGENT_TOKEN for the room-connected toolkit process

kind: Service
version: v1
metadata:
  name: otel-example
  description: "An example weather tool with otel instrumentation"
container:
  image: "us-central1-docker.pkg.dev/meshagent-public/images/python-sdk:{SERVER_VERSION}-esgz"
  command: python /src/observability/observability.py
  environment:
    - name: MESHAGENT_TOKEN
      token:
        identity: weather-toolkit
        role: agent
  storage:
    images:
      # Replace this image tag with your own code-only image if you build one.
      - image: "us-central1-docker.pkg.dev/meshagent-public/images/python-docs-examples:{SERVER_VERSION}"
        path: /src
        read_only: true

Path mapping:

Your code image contains /observability/observability.py
It’s mounted at /src in the runtime container
The command runs python /src/observability/observability.py
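The mapping above is plain path joining: the path inside the code image is appended to the mount point. A tiny sketch to sanity-check a command path before you deploy; mounted_path is a hypothetical helper and only mirrors the rule described here:

```python
from pathlib import PurePosixPath


def mounted_path(mount_point: str, path_in_image: str) -> str:
    """Return where a file from the code image appears inside the runtime container."""
    # Drop the leading "/" so the image path is appended under the mount point.
    relative = PurePosixPath(path_in_image).relative_to("/")
    return str(PurePosixPath(mount_point) / relative)


print(mounted_path("/src", "/observability/observability.py"))
# /src/observability/observability.py
```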

The default YAML uses the public us-central1-docker.pkg.dev/meshagent-public/images/python-docs-examples image. When you deploy your own code, replace it with your own image tag.

Step 1c: Deploy the service

From the directory that contains meshagent.yaml:

meshagent service create --file "meshagent.yaml" --room=gettingstarted

Step 2: Invoke the tool

Once the service is deployed, invoke the tool from MeshAgent Studio or with the MeshAgent CLI:

bash

meshagent room agents invoke-tool \
  --room=gettingstarted \
  --toolkit=weather-toolkit \
  --tool=get_weather \
  --arguments='{"city":"Costa Mesa","units":"imperial"}'
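Hand-writing the JSON for --arguments is easy to get wrong once values contain quotes or spaces. A minimal sketch that builds the payload from the two fields the tool's input schema requires and quotes it for a POSIX shell; build_arguments is a hypothetical helper, and the allowed units are copied from the schema in the sample:

```python
import json
import shlex

ALLOWED_UNITS = ("metric", "imperial")  # mirrors the enum in the tool's input schema


def build_arguments(city: str, units: str) -> str:
    """Serialize the get_weather arguments as compact JSON."""
    if units not in ALLOWED_UNITS:
        raise ValueError(f"units must be one of {ALLOWED_UNITS}, got {units!r}")
    return json.dumps({"city": city, "units": units}, separators=(",", ":"))


arguments = build_arguments("Costa Mesa", "imperial")
print(f"--arguments={shlex.quote(arguments)}")
# --arguments='{"city":"Costa Mesa","units":"imperial"}'
```

The printed flag can be pasted into the invoke-tool command shown above.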

View telemetry

In MeshAgent Studio, you can inspect the telemetry from this example in both the Session view and the Developer Console. After you invoke the weather tool, the Traces tab should show a tree like:

execute.weather-toolkit.get_weather
 ├─ validate_input
 ├─ fetch_weather_api
 └─ parse_response

Logs and metrics appear under their respective tabs.

Next Steps

Observability: understand the built-in telemetry model and where to inspect it in MeshAgent Studio
Agents: understand how agents work in MeshAgent and start building your first one
Tools and Toolkits: learn how tools are discovered, shared, and called inside a MeshAgent room
Service YAML: write service manifests for instrumented agents and tools
