Skip to main content
The OpenAI Responses adapter is our reference LLMAdapter implementation. It enables MeshAgent agents to use the OpenAI Responses API, handling streaming, tool calls, and model-specific settings.

Key features

  • Model defaults: Reads the model name from the constructor (model=) or the OPENAI_MODEL environment variable. Override per message by passing model in the chat payload.
  • Session context defaults: Creates AgentSessionContext(system_role=None) so system/developer prompts are driven by the caller or wrapper agent.
  • Tool bundling: Converts the supplied toolkits into OpenAI tool definitions (both standard JSON function tools and OpenAI-native tools like computer_use_preview, web_search_preview, image_generation).
  • Streaming support: Consumes the streaming response API, emitting events such as reasoning summaries, partial content, and tool call updates.
  • Parallel tool calls: Optionally enables OpenAI’s parallel_tool_calls setting (disabled automatically for models that do not support it).
  • Structured output: If output_schema is provided to create_response(), requests JSON schema output and validates the result.
  • Automatic compaction: Uses OpenAI Responses auto-compaction (context_management) by default.

Constructor parameters

Python
OpenAIResponsesAdapter(
    model: str = "gpt-5.2",
    parallel_tool_calls: Optional[bool] = None,
    client: Optional[AsyncOpenAI] = None,
    response_options: Optional[dict] = None,
    reasoning_effort: Optional[str] = None,
    provider: str = "openai",
    log_requests: bool = False,
    max_output_tokens: Optional[int] = 32000,
    context_management: Literal["auto", "standalone", "none"] = "auto",
    compaction_threshold: int = 200000,
    base_url: Optional[str] = None,
)
  • model – default model name; can be overridden per message.
  • parallel_tool_calls – request parallel tool execution when supported.
  • client – reuse an existing AsyncOpenAI client; otherwise the adapter creates one via meshagent.openai.proxy.get_client.
  • base_url – override the provider base URL used when the adapter creates its own client. Defaults to OPENAI_BASE_URL when omitted.
  • response_options – extra parameters passed to responses.create.
  • reasoning_effort – populates the Responses API reasoning options.
  • provider – label emitted in telemetry and logs.
  • log_requests – when true, logs HTTP requests for debugging.
  • max_output_tokens – cap output tokens per response; also used when deciding whether to compact the context.
  • context_management – controls compaction behavior:
    • auto attaches context_management to each request and lets Responses handle compaction.
    • standalone uses manual responses.compact preflight in the adapter.
    • none disables both auto and manual compaction.
  • compaction_threshold – threshold used for compaction (compact_threshold in Responses context_management entries and manual preflight trigger in standalone mode).

Tool provider integration

The adapter includes several OpenAI native tool wrappers. Agents can use them directly, or override them with agent-specific wrappers that add persistence and room-specific behavior.
  • Image generationImageGenerationTool
  • Shell executionShellTool
  • MCPMCPServer, MCPTool
  • Web search previewWebSearchTool
  • File Search - FileSearchTool
  • Code Interpreter - CodeInterpreterTool
  • Reasoning - ReasoningTool

Handling a turn

When create_response() is called it:
  1. Bundles tools - Collects the tools from your toolkits and packages them for OpenAI’s API
  2. Calls the model - Sends messages and tools to OpenAI’s API
  3. Handles responses - Processes text, tool calls, or structured output
  4. Executes tools - When the model requests tools, executes them and formats results
  5. Loops - Continues calling the model with tool results until it produces a final answer
  6. Returns Result - Gives you the final output (text or structured data)

Context compaction

OpenAIResponsesAdapter defaults to context_management="auto" and sends: context_management=[{"type":"compaction","compact_threshold":200000}] You can switch to:
  • context_management="standalone" to use manual responses.compact before a turn when usage crosses the threshold.
  • context_management="none" to disable compaction management.