Coach + meshi-agent-runtime + MCP — deploy runbook
Runbook for deploying the coach chat with its MCP back-channel to staging or production. The agent
runtime source lives at agent-runtime/ (git submodule → github.com/myascendai/meshi-agent-runtime).
The MCP back-channel lets the runtime call platform tools (list_goals, get_brief, etc.) with
per-user identity. Three secrets must be in lockstep across two Fly apps.
Apps involved
| Fly app | Role | Important env |
|---|---|---|
| meshi-api-staging | Platform API at api.staging.meshi.io | BETTER_AUTH_SECRET, MESHI_RUNTIME_* (see docs/runtime-swap-to-ts.md) |
| meshi-runtime-ts-staging | Agent runtime at meshi-runtime-ts-staging.internal:42617 | GATEWAY_API_KEY, BETTER_AUTH_SECRET, MCP_CONFIG_PATH or MCP_CONFIG_JSON |
Step 1 — Mint the platform-side service API key
The runtime authenticates to /mcp with a meshi:service-scoped API key. The key’s auth_user_id
is the synthetic __mcp_service__ row (already exists on staging Neon). The actual user identity
per call comes from the x-meshi-runtime-user-id JWT, which the platform signs and the runtime
forwards. (x-zeroclaw-user-id is accepted as a legacy fallback.)

    -- Run against the staging Neon DB.
    -- Generate a key and HASH it with sha256 BEFORE inserting.
    -- The CLI flow:
    --   RAW="mk_$(openssl rand -hex 32)"
    --   HASH=$(printf '%s' "$RAW" | shasum -a 256 | awk '{print $1}')
    --   echo "$RAW"   ← this is what goes into the runtime's MCP_CONFIG (Bearer)
    INSERT INTO api_key (auth_user_id, name, key_hash, key_prefix, scopes)
    VALUES (
      '__mcp_service__',
      'meshi-agent-runtime-staging',
      '<sha256 hex of the raw key>',
      'mk_',
      ARRAY['meshi:service']
    );

Save the raw mk_… token to a password manager; it’s never recoverable from the database.
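
The CLI flow in the SQL comments can also be scripted. The following is a minimal sketch, assuming Node's built-in `node:crypto` module (also importable under Deno); `hashKey` and `mintServiceKey` are hypothetical helper names, not part of the platform codebase.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical helper: SHA-256 hex digest of the raw key, equivalent to
// `printf '%s' "$RAW" | shasum -a 256 | awk '{print $1}'`.
function hashKey(rawKey: string): string {
  return createHash("sha256").update(rawKey, "utf8").digest("hex");
}

// Hypothetical helper: mirrors RAW="mk_$(openssl rand -hex 32)".
// `raw` goes into the runtime's MCP config; `keyHash` and `keyPrefix`
// are the values for the api_key row.
function mintServiceKey(): { raw: string; keyHash: string; keyPrefix: string } {
  const raw = `mk_${randomBytes(32).toString("hex")}`;
  return { raw, keyHash: hashKey(raw), keyPrefix: "mk_" };
}
```

Only `keyHash` should ever reach the database; `raw` is shown once and then stored in the password manager.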
Step 2 — Sync BETTER_AUTH_SECRET across both apps
The platform signs x-meshi-runtime-user-id JWTs with BETTER_AUTH_SECRET; the runtime verifies
them with the same value (read as JWT_SECRET first, then BETTER_AUTH_SECRET — set either).
If the two values drift, every MCP call fails with a 401.
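
To make the failure mode concrete, here is a minimal sketch of shared-secret signing and verification. It assumes an HS256 compact JWT with the user id in a `sub` claim; the algorithm, the claim name, and the helpers `signUserJwt` and `verifyUserJwt` are illustrative assumptions, not the platform's actual implementation, and expiry checks are omitted. Only the `JWT_SECRET`-then-`BETTER_AUTH_SECRET` lookup order comes from this runbook.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

const b64url = (text: string): string => Buffer.from(text, "utf8").toString("base64url");

// Lookup order described above: JWT_SECRET first, then BETTER_AUTH_SECRET.
// Treating an empty string as unset is an assumption of this sketch.
function resolveJwtSecret(env: Record<string, string | undefined>): string | undefined {
  return env.JWT_SECRET || env.BETTER_AUTH_SECRET;
}

function hmac(data: string, secret: string): Buffer {
  return createHmac("sha256", secret).update(data).digest();
}

// Hypothetical signer (platform side): HS256, user id in `sub`.
function signUserJwt(userId: string, secret: string): string {
  const head = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify({ sub: userId }));
  return `${head}.${body}.${hmac(`${head}.${body}`, secret).toString("base64url")}`;
}

// Hypothetical verifier (runtime side): returns the user id, or null when
// the token is malformed or the signature does not match the secret.
function verifyUserJwt(token: string, secret: string): string | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [head, body, sig] = parts;
  const expected = hmac(`${head}.${body}`, secret);
  const actual = Buffer.from(sig, "base64url");
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;
  const claims = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as { sub?: unknown };
  return typeof claims.sub === "string" ? claims.sub : null;
}
```

A token signed with one secret and verified with another returns `null` here, which is the condition that would surface as a 401 on every MCP call.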

    # Read platform's existing secret (DO NOT print to terminal in shared sessions)
    PLAT_SECRET=$(fly ssh console -a meshi-api-staging -C 'printenv BETTER_AUTH_SECRET')

    # Set the same value on the runtime
    fly secrets set -a meshi-runtime-ts-staging BETTER_AUTH_SECRET="$PLAT_SECRET"

Step 3 — Build the runtime’s MCP_CONFIG and deploy it
Set the entire JSON as a Fly secret:

    fly secrets set -a meshi-runtime-ts-staging \
      MCP_CONFIG_JSON='{"servers":[{"name":"meshi-platform","transport":"http","url":"https://api.staging.meshi.io/mcp","headers":{"Authorization":"Bearer mk_<from step 1>"}}]}'

Then ensure the Dockerfile entrypoint (or a start script) writes it to disk at boot:

    echo "$MCP_CONFIG_JSON" > /tmp/mcp-config.json
    export MCP_CONFIG_PATH=/tmp/mcp-config.json
    exec deno task start

Do NOT include a static x-meshi-runtime-user-id header in the config. The runtime injects a
per-user JWT on every MCP call. A static one is an impersonation footgun.
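
A small pre-deploy check can catch a static identity header before it ships. The sketch below builds the same config shape used in this step and flags per-user headers; `buildMcpConfig` and `findStaticUserHeaders` are hypothetical helpers, and treating the legacy `x-zeroclaw-user-id` header as equally forbidden is an assumption of this sketch.

```typescript
interface McpServerConfig {
  name: string;
  transport: "http";
  url: string;
  headers: Record<string, string>;
}

interface McpConfig {
  servers: McpServerConfig[];
}

// Headers the runtime is expected to inject per call; they must never be
// hard-coded. Including the legacy header is an assumption of this sketch.
const PER_USER_HEADERS = ["x-meshi-runtime-user-id", "x-zeroclaw-user-id"];

// Hypothetical builder for the config shape stored in MCP_CONFIG_JSON.
function buildMcpConfig(platformMcpUrl: string, serviceKey: string): McpConfig {
  return {
    servers: [
      {
        name: "meshi-platform",
        transport: "http",
        url: platformMcpUrl,
        headers: { Authorization: `Bearer ${serviceKey}` },
      },
    ],
  };
}

// Returns "server: header" strings for every static per-user header found.
function findStaticUserHeaders(config: McpConfig): string[] {
  const offenders: string[] = [];
  for (const server of config.servers) {
    for (const header of Object.keys(server.headers)) {
      if (PER_USER_HEADERS.includes(header.toLowerCase())) {
        offenders.push(`${server.name}: ${header}`);
      }
    }
  }
  return offenders;
}
```

Running `findStaticUserHeaders` over any committed or generated config and failing the build on a non-empty result is one way to enforce the rule mechanically.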
Step 4 — Deploy the runtime

    cd agent-runtime
    git pull origin main
    fly deploy -c fly.staging.toml -a meshi-runtime-ts-staging

Watch logs for:
- `meshi-ts-runtime listening on :::42617`
- No `JWT_SECRET not configured` warnings
- No `MCP server "meshi-platform" did not return Mcp-Session-Id` errors
Step 5 — Smoke test on staging

    # Pick any user with goals; get their auth_user_id from Neon.
    USER_ID=<some staging auth_user_id with goals>

    curl -sS -X POST \
      -H "Authorization: Bearer $MESHI_RUNTIME_API_KEY" \
      -H "x-user-id: $USER_ID" \
      -H "Content-Type: application/json" \
      -d '{"messages":[{"role":"system","content":"Always call mcp_call_tool: server=meshi-platform method=tools/call params={name:list_goals,arguments:{}}. Then list goal titles."},{"role":"user","content":"List my goals."}],"stream":false}' \
      https://meshi-runtime-ts-staging.fly.dev/v1/chat/completions \
      | jq -r '.choices[0].message.content'

Expected: a list of that user’s actual goals from staging Neon. Fail = check secret sync (step 2) and the MCP_CONFIG (step 3).
Notes:
- The platform bearer env var is `MESHI_RUNTIME_API_KEY` (primary); `ZEROCLAW_API_KEY` is a legacy fallback.
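
The primary-then-legacy lookup in the note above can be sketched as follows. `resolveRuntimeApiKey` is a hypothetical helper; treating an empty string as unset and throwing when neither variable is present are assumptions of this sketch, not documented platform behaviour.

```typescript
// Hypothetical helper: prefer MESHI_RUNTIME_API_KEY, fall back to the
// legacy ZEROCLAW_API_KEY, and fail loudly if neither is set.
function resolveRuntimeApiKey(env: Record<string, string | undefined>): string {
  const key = env.MESHI_RUNTIME_API_KEY || env.ZEROCLAW_API_KEY;
  if (!key) {
    throw new Error("Neither MESHI_RUNTIME_API_KEY nor ZEROCLAW_API_KEY is set");
  }
  return key;
}
```

In a Node or Deno process this would typically be called with the process environment object.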
Persisted assistant message shape — response_object.tool_events
The coach /conversations/:id/messages POST handler tees the runtime’s SSE stream and accumulates a
structured tool_events: ToolEvent[] array saved on the assistant message’s response_object JSON
column. This lets reloads rebuild the tool-call chrome (cards + result bodies + pre-tool reasoning)
without replaying the runtime.
Any consumer displaying assistant messages (admin viewers, exports, analytics, replay tools) must handle this shape:

    type ToolEvent =
      | {
          type: "call";
          index: number;          // monotonic across rounds, dedupe key
          id?: string;            // model's tool_call_id (correlates with results)
          name?: string;          // tool name (e.g. "mcp_call_tool", "list_goals")
          arguments_json: string; // JSON-encoded args (model-emitted)
        }
      | {
          type: "result";
          tool_call_id: string;   // matches a `call`'s id
          content: string;        // tool result body (JSON; for MCP, outer envelope wrapping inner text)
        }
      | {
          type: "pre_reasoning";
          content: string;        // model's "thinking" before invoking a tool
        };

The array is in chronological order. A typical multi-round turn:
[pre_reasoning, call, result, pre_reasoning, call, result] followed by the final prose in
message.content.
Stored only on role='assistant' messages when the runtime emitted at least one event. Older
messages (pre-coach-migration) have response_object = null — consumers must handle both.
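
As an illustration of handling both shapes, the sketch below folds a `tool_events` array into one view per tool call, tolerating a null `response_object`. The `ToolCallView` type, the `toToolCallViews` function, and the `"(unknown tool)"` placeholder are hypothetical; attaching the most recent `pre_reasoning` to the next call is an assumption based on the chronological ordering described above.

```typescript
type ToolEvent =
  | { type: "call"; index: number; id?: string; name?: string; arguments_json: string }
  | { type: "result"; tool_call_id: string; content: string }
  | { type: "pre_reasoning"; content: string };

interface ToolCallView {
  index: number;
  name: string;
  argumentsJson: string;
  reasoning: string | null; // pre_reasoning seen just before this call, if any
  result: string | null;    // null when no matching result was persisted
}

function toToolCallViews(
  responseObject: { tool_events?: ToolEvent[] } | null,
): ToolCallView[] {
  // Older messages have response_object = null: render no tool chrome.
  const events = responseObject?.tool_events ?? [];
  const views: ToolCallView[] = [];
  const byCallId = new Map<string, ToolCallView>();
  const seenIndexes = new Set<number>();
  let pendingReasoning: string | null = null;

  for (const event of events) {
    if (event.type === "pre_reasoning") {
      pendingReasoning = event.content;
    } else if (event.type === "call") {
      if (seenIndexes.has(event.index)) continue; // index is the dedupe key
      seenIndexes.add(event.index);
      const view: ToolCallView = {
        index: event.index,
        name: event.name ?? "(unknown tool)",
        argumentsJson: event.arguments_json,
        reasoning: pendingReasoning,
        result: null,
      };
      pendingReasoning = null;
      views.push(view);
      if (event.id !== undefined) byCallId.set(event.id, view);
    } else {
      // result: correlate with its call through tool_call_id
      const view = byCallId.get(event.tool_call_id);
      if (view) view.result = event.content;
    }
  }
  return views;
}
```

A viewer can then render one card per returned entry and show the final prose from `message.content` underneath.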
Open infrastructure debt
- MCP session map is per-process, in-memory. Scaling the runtime beyond one Fly instance means sessions scatter across pods. Today `min_machines_running = 2` in `fly.staging.toml` with round-robin DNS; MCP is stateful, so sticky routing would be needed before scaling further.
- `agent_run` table (migration 083 in the platform DB) records token counts, tool-call counts, model name, and status per turn; populate it for runtime-call observability.
- Per-tool 15s MCP timeout in the runtime’s MCP client defends against a stuck platform tool exhausting `MAX_TOOL_ROUNDS`.
- Static `x-meshi-runtime-user-id` in any committed `mcp-config.json` is a single-user impersonation token. Never commit one for staging or prod.
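
For the per-tool timeout item, one common way to bound a single tool call is to race it against a timer. This is a generic sketch rather than the runtime's actual MCP client code; `withToolTimeout` and `ToolTimeoutError` are hypothetical names, and only the 15-second default comes from the list above. Note that the sketch only stops waiting: it does not cancel the underlying request.

```typescript
class ToolTimeoutError extends Error {
  constructor(toolName: string, ms: number) {
    super(`MCP tool "${toolName}" timed out after ${ms}ms`);
    this.name = "ToolTimeoutError";
  }
}

// Rejects with ToolTimeoutError if `call` has not settled within `ms`.
// The timer is cleared as soon as the call settles either way.
function withToolTimeout<T>(toolName: string, call: Promise<T>, ms = 15_000): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new ToolTimeoutError(toolName, ms)), ms);
    call.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}
```

Wrapping each tool invocation this way lets a tool round fail fast with a named error instead of hanging until the round budget is spent.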
Rollback

    fly releases -a meshi-runtime-ts-staging
    fly releases rollback <prior-version> -a meshi-runtime-ts-staging

Platform rollback is not needed; coach endpoints are additive and the stateful MCP handler is backwards-compatible with stateless clients.