
Agent runtime — integration contract

Documents the integration between meshi-platform (the API) and meshi-agent-runtime (the AI agent backend at agent-runtime/). Read this when debugging auth failures, call-flow regressions, or MCP back-channel issues.

For the deploy runbook see coach-runtime-mcp-deploy.md. For the tool catalog see agent-runtime-tool-catalog.md. For the coach API surface see coach-api.md.


Backend mode matrix

MESHI_RUNTIME_BACKEND (primary) or ZEROCLAW_BACKEND (legacy fallback) selects which backend class handles every call to getBackend().chat() in packages/api/src/agent-runtime.ts. The two-call-direction diagram below only applies to local mode. The other two modes do not touch the external runtime at all.

| Mode | Env value | Class | LLM provider | Tool execution | Onboarding tools | MCP back-channel |
| --- | --- | --- | --- | --- | --- | --- |
| Local (default) | local | LocalBackend | External meshi-agent-runtime service (its own model config + agent loop) | Runtime-native tools + MCP federated via mcp_call_tool | ✗ (onboarding tools are not registered on the platform MCP server) | Runtime → platform via Streamable HTTP /mcp |
| Agent (platform-native) | agent | AgentBackend | Cerebras / OpenRouter direct via AGENT_API_KEY / AGENT_BASE_URL / AGENT_MODEL | In-process: calls @meshi/core service functions directly | ✓ (get_onboarding_status, start_onboarding, update_linkedin_url, complete_onboarding) | None (bypasses MCP entirely) |
| Fly (decommissioned) | fly | FlyBackend | Per-user ephemeral Fly Machines running the legacy Rust zeroclaw image | Machine-side tool loop, no MCP | | None |
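
The selection rule described above can be sketched roughly as follows. This is an illustration only: `resolveBackendMode` is a hypothetical name, and rejecting unknown values (rather than defaulting) is an assumption. The actual logic lives in packages/api/src/agent-runtime.ts and may differ.

```typescript
type BackendMode = "local" | "agent" | "fly";
type Env = Record<string, string | undefined>;

const MODES: readonly string[] = ["local", "agent", "fly"];

// Hypothetical helper, not the platform's actual function.
function resolveBackendMode(env: Env): BackendMode {
  // The primary variable wins; the legacy name is consulted only when it is unset.
  const raw = env.MESHI_RUNTIME_BACKEND ?? env.ZEROCLAW_BACKEND ?? "local";
  const mode = raw.trim().toLowerCase();
  if (!MODES.includes(mode)) {
    // Assumption: unknown values are rejected instead of silently defaulting.
    throw new Error(`Unknown backend mode: ${raw}`);
  }
  return mode as BackendMode;
}
```

With neither variable set, the sketch resolves to `local`, matching the default row of the matrix.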

Selecting a backend

Set MESHI_RUNTIME_BACKEND on the platform Fly app (meshi-api-staging / meshi-api-prod):

# Default — routes through the shared TypeScript runtime
MESHI_RUNTIME_BACKEND=local
MESHI_RUNTIME_URL=http://meshi-runtime-ts-staging.internal:42617
MESHI_RUNTIME_API_KEY=<shared secret, must equal runtime's GATEWAY_API_KEY>
# Platform-native loop — LLM calls go direct, no external runtime hop
# Required when onboarding tools must be available to the coach agent
MESHI_RUNTIME_BACKEND=agent
AGENT_API_KEY=<Cerebras or OpenRouter key>
AGENT_BASE_URL=https://api.cerebras.ai/v1 # or any OpenAI-compatible endpoint
AGENT_MODEL=zai-glm-4.7 # or any model on that endpoint

When AGENT_API_KEY is not set, the agent backend falls back to CEREBRAS_API_KEY and then OPENROUTER_API_KEY. AGENT_BASE_URL defaults to https://api.cerebras.ai/v1.
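
A minimal sketch of that fallback chain, reading the order as AGENT_API_KEY, then CEREBRAS_API_KEY, then OPENROUTER_API_KEY (the ordering of the last two is an assumption). `resolveAgentConfig` and `AgentConfig` are hypothetical names, and no default model is assumed because none is documented here.

```typescript
type Env = Record<string, string | undefined>;

interface AgentConfig {
  apiKey: string;
  baseUrl: string;
  model: string | undefined; // no documented default, so left unset
}

// Hypothetical helper illustrating the documented fallbacks.
function resolveAgentConfig(env: Env): AgentConfig {
  // Assumed order: explicit agent key, then Cerebras, then OpenRouter.
  const apiKey = env.AGENT_API_KEY ?? env.CEREBRAS_API_KEY ?? env.OPENROUTER_API_KEY;
  if (!apiKey) {
    throw new Error("No LLM API key set (AGENT_API_KEY, CEREBRAS_API_KEY, OPENROUTER_API_KEY)");
  }
  return {
    apiKey,
    baseUrl: env.AGENT_BASE_URL ?? "https://api.cerebras.ai/v1",
    model: env.AGENT_MODEL,
  };
}
```

Under this reading, an OpenRouter key combined with the default base URL would be sent to Cerebras, so AGENT_BASE_URL likely needs to be set explicitly in that case.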

Do not flip to fly. The Rust machine pool (meshi-zeroclaw-jobs) is decommissioned. The FlyBackend class is preserved for emergency rollback only.

Why onboarding tools only exist in AgentBackend

The four onboarding tools call @meshi/core service functions directly (getOnboardingStatus, startOnboarding, updateLinkedinUrl, completeOnboarding). They are defined as AGENT_TOOLS in packages/api/src/agent-backend.ts. They have not been ported to the platform MCP server (packages/mcp/src/server.ts).

When MESHI_RUNTIME_BACKEND=local, the external runtime cannot reach these tools. Even if it tries mcp_call_tool server=meshi-platform name=get_onboarding_status, the platform MCP server returns a “tool not found” error. The COACH_SYSTEM_PROMPT in packages/api/src/routes/coach.ts lists get_onboarding_status as an available tool, but that listing is only accurate under the agent backend.


Two call directions

meshi-platform ──────────────────────► agent-runtime
  /api/v0/coach                          POST /v1/chat/completions
  (platform calls runtime)

agent-runtime ────────────────────────► meshi-platform
  mcp_call_tool server=meshi-platform    /mcp (Streamable HTTP)
  (runtime calls back via MCP)

Direction 1: platform → runtime

Entry point in the platform

packages/api/src/agent-runtime.ts, LocalBackend.chat().

# Read when MESHI_RUNTIME_BACKEND=local
MESHI_RUNTIME_URL=http://meshi-runtime-ts-staging.internal:42617
MESHI_RUNTIME_API_KEY=<shared secret == runtime's GATEWAY_API_KEY>

Request headers the platform sends

| Header | Value |
| --- | --- |
| Authorization | Bearer <MESHI_RUNTIME_API_KEY> (service-to-service auth) |
| x-user-id | The authenticated user’s auth_user_id (the runtime acts as this user for all MCP tool calls) |
| x-conversation-id | The coach conversation UUID (optional; forwarded by LocalBackend only. AgentBackend receives it as ChatRequest.conversationId but does not use it in its agentic loop) |
| x-entity-id | The user’s entity UUID (optional; forwarded to Composio as entity_id) |
| Content-Type | application/json |
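
As a rough illustration of the header set above, the following sketch assembles the headers for one call. `RuntimeCallContext` and `buildRuntimeHeaders` are hypothetical names, not the actual code in LocalBackend.chat().

```typescript
interface RuntimeCallContext {
  apiKey: string;          // value of MESHI_RUNTIME_API_KEY
  userId: string;          // the authenticated user's auth_user_id
  conversationId?: string; // coach conversation UUID, optional
  entityId?: string;       // user's entity UUID, optional
}

// Hypothetical helper: builds the platform -> runtime request headers.
function buildRuntimeHeaders(ctx: RuntimeCallContext): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${ctx.apiKey}`,
    "x-user-id": ctx.userId,
    "Content-Type": "application/json",
  };
  // Optional headers are omitted entirely rather than sent empty.
  if (ctx.conversationId) headers["x-conversation-id"] = ctx.conversationId;
  if (ctx.entityId) headers["x-entity-id"] = ctx.entityId;
  return headers;
}
```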

Request body

POST /v1/chat/completions — OpenAI-compatible. Key fields:

{
  "messages": [
    { "role": "system", "content": "<system prompt>" },
    { "role": "user", "content": "<user turn>" }
  ],
  "stream": true
}

The platform sets stream: true and tees the SSE stream to persist tool_events on the assistant message. Non-streaming calls are used only from test scripts.

Response: tool_events on assistant messages

The coach message handler accumulates structured tool events from the SSE stream and persists them in response_object.tool_events on the assistant message row. This lets UI reloads reconstruct the full tool-call chrome without replaying the runtime.

type ToolEvent =
  | { type: "call"; index: number; id?: string; name?: string; arguments_json: string }
  | { type: "result"; tool_call_id: string; content: string }
  | { type: "pre_reasoning"; content: string };

The array is chronological. A typical multi-round turn is [pre_reasoning, call, result, pre_reasoning, call, result], followed by the final prose in message.content.

tool_events is present only on messages with role='assistant'. Older messages have response_object = null.
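
To make the "call" variant of ToolEvent concrete, here is a hedged sketch of how streamed fragments could be merged into call events. It assumes the runtime emits OpenAI-style `delta.tool_calls` fragments keyed by `index`, which this document does not confirm. It covers a single round only, and `ToolCallDelta` and `accumulateCalls` are hypothetical names. How `result` and `pre_reasoning` events arrive on the stream is not modeled.

```typescript
type ToolEvent =
  | { type: "call"; index: number; id?: string; name?: string; arguments_json: string }
  | { type: "result"; tool_call_id: string; content: string }
  | { type: "pre_reasoning"; content: string };

type CallEvent = Extract<ToolEvent, { type: "call" }>;

// Assumed wire shape of one streamed tool-call fragment (OpenAI-compatible).
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

// Merge fragments that share an index into one "call" event,
// concatenating the argument string as it streams in.
function accumulateCalls(deltas: ToolCallDelta[]): CallEvent[] {
  const byIndex = new Map<number, CallEvent>();
  for (const d of deltas) {
    let ev = byIndex.get(d.index);
    if (!ev) {
      ev = { type: "call", index: d.index, arguments_json: "" };
      byIndex.set(d.index, ev);
    }
    if (d.id) ev.id = d.id;
    if (d.function?.name) ev.name = d.function.name;
    ev.arguments_json += d.function?.arguments ?? "";
  }
  // A Map iterates in insertion order, so events come back in first-seen order.
  return [...byIndex.values()];
}
```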


Direction 2: runtime → platform (MCP back-channel)

How the runtime authenticates to /mcp

Authorization: Bearer mk_<service API key>

The key is the mk_… token from step 1 of coach-runtime-mcp-deploy.md, stored hashed in api_key with auth_user_id = '__mcp_service__' and scope meshi:service. The platform’s MCP auth middleware checks this key, then reads the per-user identity from x-meshi-runtime-user-id.

Per-user identity header

The runtime signs a short-lived JWT for each MCP call. It reads the signing secret from JWT_SECRET first and falls back to BETTER_AUTH_SECRET, so set either one, using the same value as the platform's BETTER_AUTH_SECRET:

x-meshi-runtime-user-id: <HS256 JWT, sub=<auth_user_id>, iss=meshi-api, exp=+6h>

The platform verifies the JWT with the same BETTER_AUTH_SECRET. The MCP server then executes all tool calls in the context of that user — every tool call is scoped to the user whose coach session is running.
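
The sign-and-verify round trip can be sketched with Node's built-in crypto module. This is an illustration under stated assumptions, not the code either service runs: `signUserToken` and `verifyUserToken` are hypothetical names, only the claims listed above (sub, iss, exp) are included, and production code would more likely use a maintained JWT library.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const b64url = (s: string): string => Buffer.from(s, "utf8").toString("base64url");

function hmacSha256(data: string, secret: string): Buffer {
  return createHmac("sha256", secret).update(data).digest();
}

// Runtime side: mint the value for x-meshi-runtime-user-id.
function signUserToken(userId: string, secret: string, nowSec: number): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const payload = b64url(
    JSON.stringify({ sub: userId, iss: "meshi-api", exp: nowSec + 6 * 3600 }),
  );
  const sig = hmacSha256(`${header}.${payload}`, secret).toString("base64url");
  return `${header}.${payload}.${sig}`;
}

// Platform side: return the user id, or null if the token is invalid or expired.
function verifyUserToken(token: string, secret: string, nowSec: number): string | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, sig] = parts;
  const expected = hmacSha256(`${header}.${payload}`, secret);
  const actual = Buffer.from(sig, "base64url");
  // Constant-time comparison; the lengths must match before calling timingSafeEqual.
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  if (claims.iss !== "meshi-api") return null;
  if (typeof claims.exp !== "number" || claims.exp <= nowSec) return null;
  return typeof claims.sub === "string" ? claims.sub : null;
}
```

A token signed with one secret fails verification under any other, which is the failure mode behind the 401 described for BETTER_AUTH_SECRET drift.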

Legacy: x-zeroclaw-user-id is accepted as a fallback header name.

MCP transport

Streamable HTTP at https://api.staging.meshi.io/mcp. The runtime uses transport: "http" in its MCP_CONFIG_JSON. It follows the spec-correct MCP session handshake (the platform returns Mcp-Session-Id on first connection).
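
The session handshake mentioned above amounts to capturing Mcp-Session-Id from the initialize response and echoing it on later requests. A minimal sketch follows; `McpSession` is a hypothetical class, and the runtime's actual MCP client may be structured differently.

```typescript
// Hypothetical tracker for the Streamable HTTP session id.
class McpSession {
  private sessionId: string | null = null;

  // Call with the response headers of the initialize request.
  capture(responseHeaders: Headers): void {
    const id = responseHeaders.get("mcp-session-id"); // header lookup is case-insensitive
    if (!id) throw new Error("MCP server did not return Mcp-Session-Id");
    this.sessionId = id;
  }

  // Headers for every request sent after initialize.
  requestHeaders(base: Record<string, string>): Record<string, string> {
    if (!this.sessionId) throw new Error("MCP session not initialized");
    return { ...base, "Mcp-Session-Id": this.sessionId };
  }
}
```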


Secret lockstep

All three of these must be consistent or the integration breaks:

| Secret | Where set | What breaks if wrong |
| --- | --- | --- |
| MESHI_RUNTIME_API_KEY | meshi-api-* (Fly secret) | Platform → runtime calls get 401 |
| GATEWAY_API_KEY | meshi-runtime-ts-* (Fly secret) | Platform → runtime calls get 401 |
| BETTER_AUTH_SECRET | Both apps (Fly secret) | Runtime MCP calls get 401 (JWT verify fails) |

MESHI_RUNTIME_API_KEY == GATEWAY_API_KEY (same value on both sides).
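
The lockstep rules can be expressed as a small check, for instance in a pre-deploy script that has both sets of values available. This is a sketch under that assumption, and `checkSecretLockstep` is a hypothetical name.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical check: returns a list of problems, empty when the secrets line up.
function checkSecretLockstep(platform: Env, runtime: Env): string[] {
  const problems: string[] = [];

  const bearer = platform.MESHI_RUNTIME_API_KEY;
  if (!bearer || bearer !== runtime.GATEWAY_API_KEY) {
    problems.push("MESHI_RUNTIME_API_KEY (platform) must equal GATEWAY_API_KEY (runtime)");
  }

  // The runtime reads JWT_SECRET first, then BETTER_AUTH_SECRET.
  const runtimeJwtSecret = runtime.JWT_SECRET ?? runtime.BETTER_AUTH_SECRET;
  const platformJwtSecret = platform.BETTER_AUTH_SECRET;
  if (!platformJwtSecret || platformJwtSecret !== runtimeJwtSecret) {
    problems.push("BETTER_AUTH_SECRET (platform) must equal the runtime's JWT signing secret");
  }

  return problems;
}
```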


Env vars quick reference

On meshi-api-staging / meshi-api-prod

MESHI_RUNTIME_BACKEND=local
MESHI_RUNTIME_URL=http://meshi-runtime-ts-staging.internal:42617
MESHI_RUNTIME_API_KEY=<shared bearer>
BETTER_AUTH_SECRET=<shared JWT signing secret>

Legacy fallbacks (still read; migrate to MESHI_RUNTIME_* on next secret rotation):

ZEROCLAW_BACKEND=local # → MESHI_RUNTIME_BACKEND
ZEROCLAW_URL=... # → MESHI_RUNTIME_URL
ZEROCLAW_API_KEY=... # → MESHI_RUNTIME_API_KEY
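
To find which legacy names are still doing real work before a secret rotation, something like the following could help. It is a sketch only: `readRuntimeVar` and `legacyVarsInUse` are hypothetical helpers, and the mapping mirrors the three pairs listed above.

```typescript
type Env = Record<string, string | undefined>;

// Legacy name -> current name, as listed above.
const LEGACY_TO_CURRENT: Record<string, string> = {
  ZEROCLAW_BACKEND: "MESHI_RUNTIME_BACKEND",
  ZEROCLAW_URL: "MESHI_RUNTIME_URL",
  ZEROCLAW_API_KEY: "MESHI_RUNTIME_API_KEY",
};

// Read the current variable, falling back to its legacy alias when unset.
function readRuntimeVar(env: Env, current: string): string | undefined {
  if (env[current] !== undefined) return env[current];
  const legacy = Object.keys(LEGACY_TO_CURRENT).find(
    (name) => LEGACY_TO_CURRENT[name] === current,
  );
  return legacy ? env[legacy] : undefined;
}

// Legacy names that are set while their replacement is not.
function legacyVarsInUse(env: Env): string[] {
  return Object.entries(LEGACY_TO_CURRENT)
    .filter(([legacy, current]) => env[legacy] !== undefined && env[current] === undefined)
    .map(([legacy]) => legacy);
}
```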

On meshi-runtime-ts-staging / meshi-runtime-ts-prod

PORT=42617
HOST=:: # IPv6 for Fly .internal mesh
GATEWAY_API_KEY=<shared bearer>
BETTER_AUTH_SECRET=<same secret as platform, or set JWT_SECRET>
MCP_CONFIG_JSON={"servers":[{"name":"meshi-platform","transport":"http","url":"https://api.staging.meshi.io/mcp","headers":{"Authorization":"Bearer mk_<service key>"}}]}
OPENAI_API_KEY=<key for agent loop>
DATABASE_URL=<neon staging url>
AGENT_RUNTIME_SCHEMA=agent-runtime
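
The MCP_CONFIG_JSON value above has a simple shape that can be validated at startup. The sketch below infers that shape from the single example shown, so treat the field list as an assumption. `McpServerConfig` and `parseMcpConfig` are hypothetical names.

```typescript
// Shape inferred from the example value; the runtime may accept more fields.
interface McpServerConfig {
  name: string;
  transport: string;
  url: string;
  headers?: Record<string, string>;
}

// Hypothetical parser: throws with a readable message on malformed config.
function parseMcpConfig(raw: string): McpServerConfig[] {
  const parsed = JSON.parse(raw) as { servers?: unknown } | null;
  const servers = parsed?.servers;
  if (!Array.isArray(servers)) {
    throw new Error("MCP_CONFIG_JSON: 'servers' must be an array");
  }
  return servers.map((entry, i) => {
    const { name, transport, url, headers } = (entry ?? {}) as Partial<McpServerConfig>;
    if (typeof name !== "string" || typeof transport !== "string" || typeof url !== "string") {
      throw new Error(`MCP_CONFIG_JSON: server ${i} needs string name, transport and url`);
    }
    return { name, transport, url, headers };
  });
}
```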

Observability

Platform side

The agent_run table (migration 083) records one row per coach turn. The POST /api/v0/coach/conversations/:id/messages handler writes these columns:

| Column | Set at | Notes |
| --- | --- | --- |
| conversation_id | insert | The conversation this turn belongs to |
| status | insert → closeout | running on open; succeeded, failed, or aborted on close |
| message_id | closeout | UUID of the persisted assistant message; null if stream produced no content |
| model | closeout | Model name as emitted by the runtime; null if the backend did not report it |
| tool_call_count | closeout | Distinct tool-call indices observed across all rounds |
| error_message | closeout | Set on failed paths only |
| finished_at | closeout | Timestamp when the SSE stream closed |

Rows with finished_at IS NULL are abandoned streams (client disconnected before any content arrived).

Runtime side

GET /health — liveness.
GET /metrics — Prometheus-style counters (requires auth — see security-audit.md MED findings).

Diagnosing failures

| Symptom | Most likely cause |
| --- | --- |
| 401 on POST /v1/chat/completions | MESHI_RUNTIME_API_KEY does not match GATEWAY_API_KEY |
| 401 on MCP tool calls inside the agent loop | BETTER_AUTH_SECRET drift between the two apps |
| "JWT_SECRET not configured" in runtime logs | Neither BETTER_AUTH_SECRET nor JWT_SECRET is set on the runtime app |
| "MCP server did not return Mcp-Session-Id" | MCP transport misconfigured or /mcp endpoint unreachable |
| Platform → runtime connection refused | Runtime HOST=0.0.0.0 instead of :: (IPv4 only, Fly mesh is IPv6) |