Agent runtime — integration contract

Documents the integration between meshi-platform (the API) and meshi-agent-runtime (the AI agent backend at agent-runtime/). Read this when debugging auth failures, call-flow regressions, or MCP back-channel issues.

For the deploy runbook see coach-runtime-mcp-deploy.md. For the tool catalog see agent-runtime-tool-catalog.md. For the coach API surface see coach-api.md.

Backend mode matrix

MESHI_RUNTIME_BACKEND (primary) or ZEROCLAW_BACKEND (legacy fallback) selects which backend class handles every call to getBackend().chat() in packages/api/src/agent-runtime.ts. The two-call-direction diagram below applies only to local mode; the agent and fly modes do not touch the external runtime at all.

Mode	Env value	Class	LLM provider	Tool execution	Onboarding tools	MCP back-channel
Local (default)	`local`	`LocalBackend`	External `meshi-agent-runtime` service (its own model config + agent loop)	Runtime-native tools + MCP federated via `mcp_call_tool`	✗ — onboarding tools are not registered on the platform MCP server	Runtime → platform via Streamable HTTP `/mcp`
Agent (platform-native)	`agent`	`AgentBackend`	Cerebras / OpenRouter direct via `AGENT_API_KEY` / `AGENT_BASE_URL` / `AGENT_MODEL`	In-process: calls `@meshi/core` service functions directly	✓ — `get_onboarding_status`, `start_onboarding`, `update_linkedin_url`, `complete_onboarding`	None — bypasses MCP entirely
Fly (decommissioned)	`fly`	`FlyBackend`	Per-user ephemeral Fly Machines running the legacy Rust `zeroclaw` image	Machine-side tool loop, no MCP	✗	None
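
The selection rule in the matrix can be sketched as a small pure function. This is an illustration only, not the code in packages/api/src/agent-runtime.ts: the function name resolveBackendMode, the Env type, the treatment of empty values as unset, and the choice to throw on unknown values are all assumptions.

```typescript
// Hypothetical helper: only the variable names and the three mode values
// come from the matrix above.
type BackendMode = "local" | "agent" | "fly";
type Env = Record<string, string | undefined>;

const BACKEND_MODES: readonly BackendMode[] = ["local", "agent", "fly"];

function resolveBackendMode(env: Env): BackendMode {
  // Primary variable wins; the legacy name is consulted only when the
  // primary one is unset or empty (assumption); default is "local".
  const raw = (env.MESHI_RUNTIME_BACKEND || env.ZEROCLAW_BACKEND || "local")
    .trim()
    .toLowerCase();
  const mode = BACKEND_MODES.find((m) => m === raw);
  if (mode === undefined) {
    // Assumption: fail loudly rather than silently falling back.
    throw new Error(`Unknown backend mode: ${raw}`);
  }
  return mode;
}
```

In a Node process this would be called as resolveBackendMode(process.env).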

Selecting a backend

Set MESHI_RUNTIME_BACKEND on the platform Fly app (meshi-api-staging / meshi-api-prod):

# Default — routes through the shared TypeScript runtime
MESHI_RUNTIME_BACKEND=local
MESHI_RUNTIME_URL=http://meshi-runtime-ts-staging.internal:42617
MESHI_RUNTIME_API_KEY=<shared secret, must equal runtime's GATEWAY_API_KEY>

# Platform-native loop — LLM calls go direct, no external runtime hop
# Required when onboarding tools must be available to the coach agent
MESHI_RUNTIME_BACKEND=agent
AGENT_API_KEY=<Cerebras or OpenRouter key>
AGENT_BASE_URL=https://api.cerebras.ai/v1  # or any OpenAI-compatible endpoint
AGENT_MODEL=zai-glm-4.7                    # or any model on that endpoint

The agent backend also falls back through CEREBRAS_API_KEY → OPENROUTER_API_KEY when AGENT_API_KEY is not set, and defaults AGENT_BASE_URL to https://api.cerebras.ai/v1.
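
The fallback chain described above can be read as the following sketch. The names resolveAgentLlmConfig and AgentLlmConfig are hypothetical, and the real AgentBackend may differ in detail; for example, whether a missing key throws at startup or on first call is an assumption here.

```typescript
// Hypothetical shapes; only the env variable names and the default base
// URL come from the text above.
type Env = Record<string, string | undefined>;

interface AgentLlmConfig {
  apiKey: string;
  baseUrl: string;
  model: string | undefined; // no default is documented, so none is invented
}

function resolveAgentLlmConfig(env: Env): AgentLlmConfig {
  // AGENT_API_KEY first, then CEREBRAS_API_KEY, then OPENROUTER_API_KEY.
  const apiKey =
    env.AGENT_API_KEY || env.CEREBRAS_API_KEY || env.OPENROUTER_API_KEY;
  if (!apiKey) {
    // Assumption: a missing key is treated as a configuration error.
    throw new Error("No LLM API key configured for the agent backend");
  }
  return {
    apiKey,
    baseUrl: env.AGENT_BASE_URL || "https://api.cerebras.ai/v1",
    model: env.AGENT_MODEL,
  };
}
```

Under this reading, an OpenRouter key combined with the default base URL would still point at Cerebras, so setting AGENT_BASE_URL explicitly is likely the safer choice when relying on OPENROUTER_API_KEY.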

Do not flip to fly. The Rust machine pool (meshi-zeroclaw-jobs) is decommissioned. The FlyBackend class is preserved for emergency rollback only.

Why onboarding tools only exist in `AgentBackend`

The four onboarding tools call @meshi/core service functions directly (getOnboardingStatus, startOnboarding, updateLinkedinUrl, completeOnboarding). They are defined as AGENT_TOOLS in packages/api/src/agent-backend.ts. They have not been ported to the platform MCP server (packages/mcp/src/server.ts).

When MESHI_RUNTIME_BACKEND=local, the external runtime cannot reach these tools: even if it tries mcp_call_tool server=meshi-platform name=get_onboarding_status, the platform MCP server returns a “tool not found” error. The COACH_SYSTEM_PROMPT in packages/api/src/routes/coach.ts lists get_onboarding_status as an available tool; that listing is accurate only under the agent backend.

Two call directions

meshi-platform  ──────────────────────►  agent-runtime
  /api/v0/coach POST              /v1/chat/completions
  (platform calls runtime)

agent-runtime  ────────────────────────►  meshi-platform
  mcp_call_tool server=meshi-platform    /mcp (Streamable HTTP)
  (runtime calls back via MCP)

Direction 1: platform → runtime

Entry point in the platform

packages/api/src/agent-runtime.ts, LocalBackend.chat().

# Read when MESHI_RUNTIME_BACKEND=local
MESHI_RUNTIME_URL=http://meshi-runtime-ts-staging.internal:42617
MESHI_RUNTIME_API_KEY=<shared secret == runtime's GATEWAY_API_KEY>

Request headers the platform sends

Header	Value
`Authorization`	`Bearer <MESHI_RUNTIME_API_KEY>` — service-to-service auth
`x-user-id`	The authenticated user’s `auth_user_id` (the runtime acts as this user for all MCP tool calls)
`x-conversation-id`	The coach conversation UUID (optional; forwarded by `LocalBackend` only — `AgentBackend` receives it as `ChatRequest.conversationId` but does not use it in its agentic loop)
`x-entity-id`	The user’s entity UUID (optional; forwarded to Composio as `entity_id`)
`Content-Type`	`application/json`

Request body

POST /v1/chat/completions — OpenAI-compatible. Key fields:

{
  "messages": [
    { "role": "system", "content": "<system prompt>" },
    { "role": "user",   "content": "<user turn>" }
  ],
  "stream": true
}

The platform sets stream: true and tees the SSE stream to persist tool_events on the assistant message. Non-streaming calls are used only from test scripts.
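
Putting the header table and the request body together, the outbound call might be assembled roughly as follows. This is a sketch, not the actual LocalBackend.chat() code: buildRuntimeRequest and the option names are hypothetical, and only the header names, the path, and the body fields are taken from the contract above.

```typescript
// Hypothetical option and result shapes for illustration.
interface RuntimeCallOptions {
  runtimeUrl: string; // value of MESHI_RUNTIME_URL
  apiKey: string; // value of MESHI_RUNTIME_API_KEY
  userId: string; // the user's auth_user_id
  conversationId?: string;
  entityId?: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
}

interface RuntimeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildRuntimeRequest(opts: RuntimeCallOptions): RuntimeRequest {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${opts.apiKey}`,
    "x-user-id": opts.userId,
    "Content-Type": "application/json",
  };
  // Optional headers are sent only when a value is available.
  if (opts.conversationId) headers["x-conversation-id"] = opts.conversationId;
  if (opts.entityId) headers["x-entity-id"] = opts.entityId;

  return {
    // Strip trailing slashes so the path is not doubled.
    url: `${opts.runtimeUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    method: "POST",
    headers,
    body: JSON.stringify({ messages: opts.messages, stream: true }),
  };
}
```

The result can be handed to fetch as fetch(req.url, { method: req.method, headers: req.headers, body: req.body }); keeping the builder free of I/O makes the header contract easy to assert in a unit test.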

Response: `tool_events` on assistant messages

The coach message handler accumulates structured tool events from the SSE stream and persists them in response_object.tool_events on the assistant message row. This lets UI reloads reconstruct the full tool-call chrome without replaying the runtime.

type ToolEvent =
  | { type: "call";         index: number; id?: string; name?: string; arguments_json: string }
  | { type: "result";       tool_call_id: string; content: string }
  | { type: "pre_reasoning"; content: string };

The array is chronological. A typical multi-round turn is [pre_reasoning, call, result, pre_reasoning, call, result], followed by the final prose in message.content.

tool_events is present only on role='assistant' messages; older messages have response_object = null.
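
As an illustration of how a UI reload could rebuild tool-call chrome from a persisted array, the sketch below pairs each result with its call. The ToolEvent type is copied from above; ToolCallView and pairToolEvents are hypothetical names, and the fallback used when a call carries no id (attach the result to the oldest call still waiting for one) is an assumption, not documented behaviour.

```typescript
type ToolEvent =
  | { type: "call"; index: number; id?: string; name?: string; arguments_json: string }
  | { type: "result"; tool_call_id: string; content: string }
  | { type: "pre_reasoning"; content: string };

// Hypothetical view model for rendering one tool call.
interface ToolCallView {
  name: string | undefined;
  argumentsJson: string;
  result: string | undefined;
}

function pairToolEvents(events: ToolEvent[]): ToolCallView[] {
  const views: ToolCallView[] = [];
  const byId = new Map<string, ToolCallView>();

  for (const ev of events) {
    if (ev.type === "call") {
      const view: ToolCallView = {
        name: ev.name,
        argumentsJson: ev.arguments_json,
        result: undefined,
      };
      views.push(view);
      if (ev.id !== undefined) byId.set(ev.id, view);
    } else if (ev.type === "result") {
      // Match by id first; otherwise fall back to the oldest open call.
      const target =
        byId.get(ev.tool_call_id) ?? views.find((v) => v.result === undefined);
      if (target) target.result = ev.content;
    }
    // pre_reasoning events are ignored here; a real UI would render them.
  }
  return views;
}
```

Because the array is chronological, a single forward pass is enough; no sorting or lookahead is needed.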

Direction 2: runtime → platform (MCP back-channel)

How the runtime authenticates to `/mcp`

Authorization: Bearer mk_<service API key>

The key is the mk_… token from step 1 of coach-runtime-mcp-deploy.md, stored hashed in api_key with auth_user_id = '__mcp_service__' and scope meshi:service. The platform’s MCP auth middleware checks this key, then reads the per-user identity from x-meshi-runtime-user-id.

Per-user identity header

The runtime signs a short-lived JWT for each MCP call. It reads the signing secret from JWT_SECRET first, then from BETTER_AUTH_SECRET; set either, as long as the value matches the platform’s BETTER_AUTH_SECRET:

x-meshi-runtime-user-id: <HS256 JWT, sub=<auth_user_id>, iss=meshi-api, exp=+6h>

The platform verifies the JWT with the same BETTER_AUTH_SECRET. The MCP server then executes all tool calls in the context of that user — every tool call is scoped to the user whose coach session is running.

Legacy: x-zeroclaw-user-id is accepted as a fallback header name.
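
To make the sign-and-verify round trip concrete, here is a minimal HS256 sketch built on Node's built-in crypto module. It is not the code either app runs: the function names and the claims interface are hypothetical, only sub, iss, and the 6-hour expiry come from the header description above, and a production implementation would more likely rely on a vetted JWT library.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical claim shape based on the header description.
interface RuntimeUserClaims {
  sub: string; // auth_user_id
  iss: string; // "meshi-api"
  exp: number; // expiry, seconds since epoch
}

type Env = Record<string, string | undefined>;

// JWT_SECRET is consulted first, then BETTER_AUTH_SECRET.
function resolveJwtSecret(env: Env): string {
  const secret = env.JWT_SECRET || env.BETTER_AUTH_SECRET;
  if (!secret) throw new Error("JWT_SECRET not configured");
  return secret;
}

const b64url = (s: string): string => Buffer.from(s, "utf8").toString("base64url");

function signUserToken(userId: string, secret: string, nowSec: number): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const claims: RuntimeUserClaims = {
    sub: userId,
    iss: "meshi-api",
    exp: nowSec + 6 * 60 * 60,
  };
  const payload = b64url(JSON.stringify(claims));
  const sig = createHmac("sha256", secret)
    .update(`${header}.${payload}`)
    .digest("base64url");
  return `${header}.${payload}.${sig}`;
}

// Returns the claims when the token is valid, otherwise null.
function verifyUserToken(
  token: string,
  secret: string,
  nowSec: number,
): RuntimeUserClaims | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, sig] = parts;

  // Recompute the MAC and compare in constant time.
  const expected = createHmac("sha256", secret)
    .update(`${header}.${payload}`)
    .digest();
  const actual = Buffer.from(sig, "base64url");
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
    return null;
  }

  try {
    const alg = JSON.parse(Buffer.from(header, "base64url").toString("utf8")).alg;
    if (alg !== "HS256") return null;
    const claims = JSON.parse(
      Buffer.from(payload, "base64url").toString("utf8"),
    ) as RuntimeUserClaims;
    if (claims.iss !== "meshi-api" || claims.exp <= nowSec) return null;
    return claims;
  } catch {
    return null; // malformed JSON in header or payload
  }
}
```

Passing the clock in as nowSec keeps both functions deterministic, which makes secret drift and expiry straightforward to reproduce in a test.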

MCP transport

Streamable HTTP at https://api.staging.meshi.io/mcp. The runtime uses transport: "http" in its MCP_CONFIG_JSON. It follows the spec-correct MCP session handshake (the platform returns Mcp-Session-Id on first connection).

Secret lockstep

All three of these must be consistent or the integration breaks:

Secret	Where set	What breaks if wrong
`MESHI_RUNTIME_API_KEY`	`meshi-api-*` (Fly secret)	Platform → runtime calls get 401
`GATEWAY_API_KEY`	`meshi-runtime-ts-*` (Fly secret)	Platform → runtime calls get 401
`BETTER_AUTH_SECRET`	Both apps (Fly secret)	Runtime MCP calls get 401 (JWT verify fails)

MESHI_RUNTIME_API_KEY == GATEWAY_API_KEY (same value on both sides).

Env vars quick reference

On `meshi-api-staging` / `meshi-api-prod`

MESHI_RUNTIME_BACKEND=local
MESHI_RUNTIME_URL=http://meshi-runtime-ts-staging.internal:42617
MESHI_RUNTIME_API_KEY=<shared bearer>
BETTER_AUTH_SECRET=<shared JWT signing secret>

Legacy fallbacks (still read; migrate to MESHI_RUNTIME_* on next secret rotation):

ZEROCLAW_BACKEND=local     # → MESHI_RUNTIME_BACKEND
ZEROCLAW_URL=...           # → MESHI_RUNTIME_URL
ZEROCLAW_API_KEY=...       # → MESHI_RUNTIME_API_KEY

On `meshi-runtime-ts-staging` / `meshi-runtime-ts-prod`

PORT=42617
HOST=::                          # IPv6 for Fly .internal mesh
GATEWAY_API_KEY=<shared bearer>
BETTER_AUTH_SECRET=<same secret as platform, or set JWT_SECRET>
MCP_CONFIG_JSON={"servers":[{"name":"meshi-platform","transport":"http","url":"https://api.staging.meshi.io/mcp","headers":{"Authorization":"Bearer mk_<service key>"}}]}
OPENAI_API_KEY=<key for agent loop>
DATABASE_URL=<neon staging url>
AGENT_RUNTIME_SCHEMA=agent-runtime

Observability

Platform side

The agent_run table (migration 083) records one row per coach turn. Columns written by the POST /api/v0/coach/conversations/:id/messages handler:

Column	Set at	Notes
`conversation_id`	insert	The conversation this turn belongs to
`status`	insert → closeout	`running` on open; `succeeded`, `failed`, or `aborted` on close
`message_id`	closeout	UUID of the persisted assistant message; `null` if stream produced no content
`model`	closeout	Model name as emitted by the runtime; `null` if the backend did not report it
`tool_call_count`	closeout	Distinct tool-call indices observed across all rounds
`error_message`	closeout	Set on `failed` paths only
`finished_at`	closeout	Timestamp when the SSE stream closed

Rows with finished_at IS NULL are abandoned streams (client disconnected before any content arrived).

Runtime side

GET /health — liveness.
GET /metrics — Prometheus-style counters (requires auth — see security-audit.md MED findings).

Diagnosing failures

Symptom	Most likely cause
401 on `POST /v1/chat/completions`	`MESHI_RUNTIME_API_KEY` ≠ `GATEWAY_API_KEY`
401 on MCP tool calls inside the agent loop	`BETTER_AUTH_SECRET` drift between the two apps
`JWT_SECRET not configured` in runtime logs	`BETTER_AUTH_SECRET` or `JWT_SECRET` not set on runtime app
`MCP server did not return Mcp-Session-Id`	MCP transport misconfigured or `/mcp` endpoint unreachable
Platform → runtime connection refused	Runtime `HOST=0.0.0.0` instead of `::` (IPv4 only, Fly mesh is IPv6)
