Agent runtime — integration contract
Documents the integration between meshi-platform (the API) and meshi-agent-runtime (the AI
agent backend at agent-runtime/). Read this when debugging auth failures, call-flow regressions,
or MCP back-channel issues.
For the deploy runbook see coach-runtime-mcp-deploy.md. For the
tool catalog see agent-runtime-tool-catalog.md. For the coach
API surface see coach-api.md.
Backend mode matrix
MESHI_RUNTIME_BACKEND (primary) or ZEROCLAW_BACKEND (legacy fallback) selects which backend
class handles every call to getBackend().chat() in
packages/api/src/agent-runtime.ts. The two-call-direction diagram below only applies to local
mode. The other two modes do not touch the external runtime at all.
| Mode | Env value | Class | LLM provider | Tool execution | Onboarding tools | MCP back-channel |
|---|---|---|---|---|---|---|
| Local (default) | local | LocalBackend | External meshi-agent-runtime service (its own model config + agent loop) | Runtime-native tools + MCP federated via mcp_call_tool | ✗ — onboarding tools are not registered on the platform MCP server | Runtime → platform via Streamable HTTP /mcp |
| Agent (platform-native) | agent | AgentBackend | Cerebras / OpenRouter direct via AGENT_API_KEY / AGENT_BASE_URL / AGENT_MODEL | In-process: calls @meshi/core service functions directly | ✓ — get_onboarding_status, start_onboarding, update_linkedin_url, complete_onboarding | None — bypasses MCP entirely |
| Fly (decommissioned) | fly | FlyBackend | Per-user ephemeral Fly Machines running the legacy Rust zeroclaw image | Machine-side tool loop, no MCP | ✗ | None |
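The selection rule in the matrix can be sketched as a small resolver. This is an illustration only: `resolveBackendMode` is a hypothetical helper name, the real logic lives in packages/api/src/agent-runtime.ts and may differ, and rejecting unknown values (rather than silently defaulting) is an assumption.

```typescript
// Sketch only; resolveBackendMode is a hypothetical name, not the platform's API.
type BackendMode = "local" | "agent" | "fly";
type Env = Record<string, string | undefined>;

const MODES: readonly BackendMode[] = ["local", "agent", "fly"];

function resolveBackendMode(env: Env): BackendMode {
  // Primary variable wins; the legacy name is consulted only when the primary
  // is unset or empty. With neither set, the default is "local".
  const raw = (env.MESHI_RUNTIME_BACKEND || env.ZEROCLAW_BACKEND || "local")
    .trim()
    .toLowerCase();
  if (!MODES.includes(raw as BackendMode)) {
    // Assumption: an unknown value is a configuration error.
    throw new Error(`Unknown backend mode: ${raw}`);
  }
  return raw as BackendMode;
}
```

Passing the environment in as a plain object, instead of reading `process.env` inside the function, keeps the rule easy to unit-test.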
Selecting a backend
Set MESHI_RUNTIME_BACKEND on the platform Fly app (meshi-api-staging / meshi-api-prod):

    # Default: routes through the shared TypeScript runtime
    MESHI_RUNTIME_BACKEND=local
    MESHI_RUNTIME_URL=http://meshi-runtime-ts-staging.internal:42617
    MESHI_RUNTIME_API_KEY=<shared secret, must equal runtime's GATEWAY_API_KEY>


    # Platform-native loop: LLM calls go direct, no external runtime hop
    # Required when onboarding tools must be available to the coach agent
    MESHI_RUNTIME_BACKEND=agent
    AGENT_API_KEY=<Cerebras or OpenRouter key>
    AGENT_BASE_URL=https://api.cerebras.ai/v1   # or any OpenAI-compatible endpoint
    AGENT_MODEL=zai-glm-4.7                     # or any model on that endpoint

When AGENT_API_KEY is not set, the agent backend falls back through CEREBRAS_API_KEY → OPENROUTER_API_KEY.
AGENT_BASE_URL defaults to https://api.cerebras.ai/v1.
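A minimal sketch of that key fallback chain, assuming the three variables are tried in the order listed. `resolveAgentConfig` and `AgentConfig` are hypothetical names, and the source does not state a default model, so `model` is left undefined when AGENT_MODEL is unset.

```typescript
// Sketch only; names are illustrative, not taken from agent-backend.ts.
type Env = Record<string, string | undefined>;

interface AgentConfig {
  apiKey: string;
  baseUrl: string;
  model: string | undefined;
}

function resolveAgentConfig(env: Env): AgentConfig {
  // AGENT_API_KEY first, then CEREBRAS_API_KEY, then OPENROUTER_API_KEY.
  const apiKey = env.AGENT_API_KEY || env.CEREBRAS_API_KEY || env.OPENROUTER_API_KEY;
  if (!apiKey) {
    throw new Error("No LLM API key set (AGENT_API_KEY, CEREBRAS_API_KEY, OPENROUTER_API_KEY)");
  }
  return {
    apiKey,
    // Documented default when AGENT_BASE_URL is not set.
    baseUrl: env.AGENT_BASE_URL || "https://api.cerebras.ai/v1",
    model: env.AGENT_MODEL,
  };
}
```

In this sketch the base URL does not change with the key that was picked, so an OpenRouter key without an explicit AGENT_BASE_URL would still target the Cerebras endpoint. Setting AGENT_BASE_URL explicitly avoids relying on that.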
Do not flip to fly. The Rust machine pool (meshi-zeroclaw-jobs) is decommissioned. The FlyBackend class is preserved for emergency rollback only.
Why onboarding tools only exist in AgentBackend
The four onboarding tools call @meshi/core service functions directly (getOnboardingStatus,
startOnboarding, updateLinkedinUrl, completeOnboarding). They are defined as AGENT_TOOLS in
packages/api/src/agent-backend.ts. They have not been ported to the platform MCP server
(packages/mcp/src/server.ts).
When MESHI_RUNTIME_BACKEND=local, the external runtime cannot reach these tools. Even if it tries
mcp_call_tool server=meshi-platform name=get_onboarding_status, the platform MCP server returns a
“tool not found” error. The COACH_SYSTEM_PROMPT in packages/api/src/routes/coach.ts lists
get_onboarding_status as an available tool; that listing is accurate only under the agent backend.
Two call directions

    meshi-platform ──────────────────────► agent-runtime
      /api/v0/coach                         POST /v1/chat/completions
                                            (platform calls runtime)

    agent-runtime ───────────────────────► meshi-platform
      mcp_call_tool server=meshi-platform   /mcp (Streamable HTTP)
                                            (runtime calls back via MCP)

Direction 1: platform → runtime
Entry point in the platform
packages/api/src/agent-runtime.ts, LocalBackend.chat().

    // Resolved by MESHI_RUNTIME_BACKEND=local
    MESHI_RUNTIME_URL=http://meshi-runtime-ts-staging.internal:42617
    MESHI_RUNTIME_API_KEY=<shared secret == runtime's GATEWAY_API_KEY>

Request headers the platform sends
| Header | Value |
|---|---|
| Authorization | Bearer <MESHI_RUNTIME_API_KEY> (service-to-service auth) |
| x-user-id | The authenticated user’s auth_user_id (acts-as this user for all MCP tool calls) |
| x-conversation-id | The coach conversation UUID (optional; forwarded by LocalBackend only. AgentBackend receives it as ChatRequest.conversationId but does not use it in its agentic loop) |
| x-entity-id | The user’s entity UUID (optional; forwarded to Composio as entity_id) |
| Content-Type | application/json |
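The header table can be sketched as a small builder. `RuntimeCallContext` and `buildRuntimeHeaders` are hypothetical names for illustration; the actual construction inside LocalBackend.chat() may look different.

```typescript
// Sketch only; mirrors the header table, not the real LocalBackend code.
interface RuntimeCallContext {
  apiKey: string;          // value of MESHI_RUNTIME_API_KEY
  userId: string;          // the authenticated user's auth_user_id
  conversationId?: string; // coach conversation UUID, optional
  entityId?: string;       // user's entity UUID, optional
}

function buildRuntimeHeaders(ctx: RuntimeCallContext): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${ctx.apiKey}`,
    "x-user-id": ctx.userId,
    "Content-Type": "application/json",
  };
  // Optional headers are omitted entirely rather than sent empty.
  if (ctx.conversationId) headers["x-conversation-id"] = ctx.conversationId;
  if (ctx.entityId) headers["x-entity-id"] = ctx.entityId;
  return headers;
}
```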
Request body
POST /v1/chat/completions — OpenAI-compatible. Key fields:

    {
      "messages": [
        { "role": "system", "content": "<system prompt>" },
        { "role": "user", "content": "<user turn>" }
      ],
      "stream": true
    }

The platform sets stream: true and tees the SSE stream to persist tool_events on the assistant
message. Non-streaming calls are used only from test scripts.
Response: tool_events on assistant messages
The coach message handler accumulates structured tool events from the SSE stream and persists them
in response_object.tool_events on the assistant message row. This lets UI reloads reconstruct
the full tool-call chrome without replaying the runtime.

    type ToolEvent =
      | { type: "call"; index: number; id?: string; name?: string; arguments_json: string }
      | { type: "result"; tool_call_id: string; content: string }
      | { type: "pre_reasoning"; content: string };

The array is chronological. A typical multi-round turn is
[pre_reasoning, call, result, pre_reasoning, call, result], followed by the final prose in message.content.
tool_events is present only on role='assistant' messages; older messages have response_object = null.
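One way a UI reload could rebuild the tool-call chrome from a persisted tool_events array is sketched below. It assumes a result's tool_call_id equals the id of the call it answers, and that a pre_reasoning event belongs to the next call; neither is stated in this document. `ToolCallView` and `pairToolEvents` are hypothetical names.

```typescript
type ToolEvent =
  | { type: "call"; index: number; id?: string; name?: string; arguments_json: string }
  | { type: "result"; tool_call_id: string; content: string }
  | { type: "pre_reasoning"; content: string };

// Hypothetical view model: one entry per tool call.
interface ToolCallView {
  name: string;
  argumentsJson: string;
  reasoning: string | null; // pre_reasoning emitted just before the call, if any
  result: string | null;    // null when no matching result was recorded
}

function pairToolEvents(events: ToolEvent[]): ToolCallView[] {
  const views: ToolCallView[] = [];
  const byId = new Map<string, ToolCallView>();
  let pendingReasoning: string | null = null;

  for (const ev of events) {
    switch (ev.type) {
      case "pre_reasoning":
        pendingReasoning = ev.content;
        break;
      case "call": {
        const view: ToolCallView = {
          name: ev.name ?? "(unknown)",
          argumentsJson: ev.arguments_json,
          reasoning: pendingReasoning,
          result: null,
        };
        pendingReasoning = null;
        views.push(view);
        if (ev.id) byId.set(ev.id, view);
        break;
      }
      case "result": {
        // Assumption: tool_call_id matches the id on the originating call.
        const view = byId.get(ev.tool_call_id);
        if (view) view.result = ev.content;
        break;
      }
    }
  }
  return views;
}
```

Calls without an id cannot be matched in this sketch and keep `result: null`; a real renderer might fall back to positional pairing instead.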
Direction 2: runtime → platform (MCP back-channel)
How the runtime authenticates to /mcp

    Authorization: Bearer mk_<service API key>

The key is the mk_… token from step 1 of coach-runtime-mcp-deploy.md,
stored hashed in api_key with auth_user_id = '__mcp_service__' and scope meshi:service. The
platform’s MCP auth middleware checks this key, then reads the per-user identity from
x-meshi-runtime-user-id.
Per-user identity header
The runtime signs a short-lived JWT for each MCP call with the shared signing secret. It reads
JWT_SECRET first, then BETTER_AUTH_SECRET, so setting either one is enough:

    x-meshi-runtime-user-id: <HS256 JWT, sub=<auth_user_id>, iss=meshi-api, exp=+6h>

The platform verifies the JWT with the same BETTER_AUTH_SECRET. The MCP server then executes all
tool calls in the context of that user — every tool call is scoped to the user whose coach session
is running.
Legacy: x-zeroclaw-user-id is accepted as a fallback header name.
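The signing and verification steps can be illustrated with a dependency-free HS256 sketch built on Node's `node:crypto`. This is not the runtime's actual code, which may well use a JWT library; the function names are hypothetical, and only the claims (sub, iss=meshi-api, exp=+6h) and the secret lookup order come from this document.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type Env = Record<string, string | undefined>;

const SIX_HOURS_SEC = 6 * 60 * 60;
const b64url = (text: string): string => Buffer.from(text, "utf8").toString("base64url");
const nowInSeconds = (): number => Math.floor(Date.now() / 1000);

// JWT_SECRET is read first, then BETTER_AUTH_SECRET.
function resolveJwtSecret(env: Env): string {
  const secret = env.JWT_SECRET || env.BETTER_AUTH_SECRET;
  if (!secret) throw new Error("JWT_SECRET not configured");
  return secret;
}

// Hypothetical signer for the x-meshi-runtime-user-id header value.
function signRuntimeUserToken(authUserId: string, secret: string, nowSec = nowInSeconds()): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const payload = b64url(
    JSON.stringify({ sub: authUserId, iss: "meshi-api", exp: nowSec + SIX_HOURS_SEC }),
  );
  const signature = createHmac("sha256", secret).update(`${header}.${payload}`).digest("base64url");
  return `${header}.${payload}.${signature}`;
}

// Hypothetical verifier: returns the auth_user_id or throws.
function verifyRuntimeUserToken(token: string, secret: string, nowSec = nowInSeconds()): string {
  const [header, payload, signature] = token.split(".");
  if (!header || !payload || !signature) throw new Error("Malformed token");

  const expected = createHmac("sha256", secret).update(`${header}.${payload}`).digest();
  const actual = Buffer.from(signature, "base64url");
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
    throw new Error("Bad signature");
  }

  const alg = JSON.parse(Buffer.from(header, "base64url").toString("utf8")).alg;
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  if (alg !== "HS256") throw new Error("Unexpected algorithm");
  if (claims.iss !== "meshi-api" || typeof claims.sub !== "string") throw new Error("Invalid claims");
  if (typeof claims.exp !== "number" || claims.exp <= nowSec) throw new Error("Token expired");
  return claims.sub;
}
```

A token signed with one secret fails verification under any other, which is why secret drift between the two apps surfaces as a 401 on MCP tool calls.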
MCP transport
Streamable HTTP at https://api.staging.meshi.io/mcp. The runtime uses
transport: "http" in its MCP_CONFIG_JSON. It follows the spec-correct MCP session handshake
(the platform returns Mcp-Session-Id on first connection).
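A startup sanity check for the runtime's MCP_CONFIG_JSON might look like the sketch below. The config shape is inferred from the example value in the env var reference of this document; `findPlatformServer` is a hypothetical helper, and the specific checks are assumptions about what is worth validating.

```typescript
// Shape inferred from the documented MCP_CONFIG_JSON example; not an official schema.
interface McpServerConfig {
  name: string;
  transport: string;
  url: string;
  headers?: Record<string, string>;
}

function findPlatformServer(rawJson: string): McpServerConfig {
  const parsed = JSON.parse(rawJson) as { servers?: McpServerConfig[] };
  const server = parsed.servers?.find((s) => s.name === "meshi-platform");
  if (!server) throw new Error('MCP_CONFIG_JSON has no "meshi-platform" server');
  // The back-channel uses Streamable HTTP, configured as transport "http".
  if (server.transport !== "http") {
    throw new Error(`meshi-platform transport must be "http", got "${server.transport}"`);
  }
  if (!server.url.endsWith("/mcp")) throw new Error("meshi-platform url should point at /mcp");
  // The service key is an mk_ token sent as a bearer credential.
  const auth = server.headers?.Authorization ?? "";
  if (!auth.startsWith("Bearer mk_")) {
    throw new Error("meshi-platform Authorization header must be a Bearer mk_ key");
  }
  return server;
}
```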
Secret lockstep
All three of these must be consistent or the integration breaks:
| Secret | Where set | What breaks if wrong |
|---|---|---|
| MESHI_RUNTIME_API_KEY | meshi-api-* (Fly secret) | Platform → runtime calls get 401 |
| GATEWAY_API_KEY | meshi-runtime-ts-* (Fly secret) | Platform → runtime calls get 401 |
| BETTER_AUTH_SECRET | Both apps (Fly secret) | Runtime MCP calls get 401 (JWT verify fails) |
MESHI_RUNTIME_API_KEY == GATEWAY_API_KEY (same value on both sides).
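The lockstep rules can be expressed as a pre-deploy check. The sketch assumes both apps' secret values are available side by side (for example, from wherever they are generated before being pushed to Fly); `checkSecretLockstep` is a hypothetical helper, not an existing script.

```typescript
type Env = Record<string, string | undefined>;

// Returns a list of human-readable problems; an empty list means the secrets line up.
function checkSecretLockstep(platform: Env, runtime: Env): string[] {
  const problems: string[] = [];

  // Platform bearer (legacy name accepted) must equal the runtime's gateway key.
  const platformKey = platform.MESHI_RUNTIME_API_KEY || platform.ZEROCLAW_API_KEY;
  if (!platformKey || platformKey !== runtime.GATEWAY_API_KEY) {
    problems.push("MESHI_RUNTIME_API_KEY != GATEWAY_API_KEY: platform -> runtime calls get 401");
  }

  // The runtime signs with JWT_SECRET or BETTER_AUTH_SECRET; the platform verifies
  // with BETTER_AUTH_SECRET, so the effective values must match.
  const runtimeSigningSecret = runtime.JWT_SECRET || runtime.BETTER_AUTH_SECRET;
  if (!runtimeSigningSecret || runtimeSigningSecret !== platform.BETTER_AUTH_SECRET) {
    problems.push("JWT signing secret mismatch: runtime MCP calls get 401");
  }

  return problems;
}
```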
Env vars quick reference
On meshi-api-staging / meshi-api-prod

    MESHI_RUNTIME_BACKEND=local
    MESHI_RUNTIME_URL=http://meshi-runtime-ts-staging.internal:42617
    MESHI_RUNTIME_API_KEY=<shared bearer>
    BETTER_AUTH_SECRET=<shared JWT signing secret>

Legacy fallbacks (still read; migrate to MESHI_RUNTIME_* on next secret rotation):

    ZEROCLAW_BACKEND=local   # → MESHI_RUNTIME_BACKEND
    ZEROCLAW_URL=...         # → MESHI_RUNTIME_URL
    ZEROCLAW_API_KEY=...     # → MESHI_RUNTIME_API_KEY

On meshi-runtime-ts-staging / meshi-runtime-ts-prod

    PORT=42617
    HOST=::   # IPv6 for Fly .internal mesh
    GATEWAY_API_KEY=<shared bearer>
    BETTER_AUTH_SECRET=<same secret as platform, or set JWT_SECRET>
    MCP_CONFIG_JSON={"servers":[{"name":"meshi-platform","transport":"http","url":"https://api.staging.meshi.io/mcp","headers":{"Authorization":"Bearer mk_<service key>"}}]}
    OPENAI_API_KEY=<key for agent loop>
    DATABASE_URL=<neon staging url>
    AGENT_RUNTIME_SCHEMA=agent-runtime

Observability
Platform side
agent_run table (migration 083) records one row per coach turn. Columns written by the
POST /api/v0/coach/conversations/:id/messages handler:
| Column | Set at | Notes |
|---|---|---|
| conversation_id | insert | The conversation this turn belongs to |
| status | insert → closeout | running on open; succeeded, failed, or aborted on close |
| message_id | closeout | UUID of the persisted assistant message; null if stream produced no content |
| model | closeout | Model name as emitted by the runtime; null if the backend did not report it |
| tool_call_count | closeout | Distinct tool-call indices observed across all rounds |
| error_message | closeout | Set on failed paths only |
| finished_at | closeout | Timestamp when the SSE stream closed |
Rows with finished_at IS NULL are abandoned streams (client disconnected before any content arrived).
Runtime side
GET /health — liveness.
GET /metrics — Prometheus-style counters (requires auth — see security-audit.md MED findings).
Diagnosing failures
| Symptom | Most likely cause |
|---|---|
| 401 on POST /v1/chat/completions | MESHI_RUNTIME_API_KEY ≠ GATEWAY_API_KEY |
| 401 on MCP tool calls inside the agent loop | BETTER_AUTH_SECRET drift between the two apps |
| JWT_SECRET not configured in runtime logs | BETTER_AUTH_SECRET or JWT_SECRET not set on runtime app |
| MCP server did not return Mcp-Session-Id | MCP transport misconfigured or /mcp endpoint unreachable |
| Platform → runtime connection refused | Runtime HOST=0.0.0.0 instead of :: (IPv4 only, Fly mesh is IPv6) |