Coach + meshi-agent-runtime + MCP — deploy runbook

Runbook for deploying the coach chat with its MCP back-channel to staging or production. The agent runtime source lives at agent-runtime/ (git submodule → github.com/myascendai/meshi-agent-runtime).

The MCP back-channel lets the runtime call platform tools (list_goals, get_brief, etc.) with per-user identity. Three secrets must be in lockstep across two Fly apps.

Apps involved

| Fly app | Role | Important env |
| --- | --- | --- |
| meshi-api-staging | Platform API at api.staging.meshi.io | BETTER_AUTH_SECRET, MESHI_RUNTIME_* (see docs/runtime-swap-to-ts.md) |
| meshi-runtime-ts-staging | Agent runtime at meshi-runtime-ts-staging.internal:42617 | GATEWAY_API_KEY, BETTER_AUTH_SECRET, MCP_CONFIG_PATH or MCP_CONFIG_JSON |

Step 1 — Mint the platform-side service API key

The runtime authenticates to /mcp with a meshi:service-scoped API key. The key’s auth_user_id is the synthetic __mcp_service__ row (already exists on staging Neon). The actual user identity per call comes from the x-meshi-runtime-user-id JWT, which the platform signs and the runtime forwards. (x-zeroclaw-user-id is accepted as a legacy fallback.)

-- Run against the staging Neon DB.
-- Generate a key and HASH it with sha256 BEFORE inserting.
-- The CLI flow:
-- RAW="mk_$(openssl rand -hex 32)"
-- HASH=$(printf '%s' "$RAW" | shasum -a 256 | awk '{print $1}')
-- echo "$RAW" ← this is what goes into the runtime's MCP_CONFIG (Bearer)
INSERT INTO api_key (auth_user_id, name, key_hash, key_prefix, scopes)
VALUES (
  '__mcp_service__',
  'meshi-agent-runtime-staging',
  '<sha256 hex of the raw key>',
  'mk_',
  ARRAY['meshi:service']
);

Save the raw mk_… token to a password manager — it’s never recoverable from the database.
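
The commented CLI flow above can also be expressed in TypeScript. This is a minimal sketch, assuming Node's built-in node:crypto module is available; hashKey and mintServiceKey are hypothetical helper names, not part of the platform codebase. It produces the raw mk_ token and the SHA-256 hex digest that goes into key_hash.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical helper: SHA-256 hex digest of the raw key, like `shasum -a 256`.
function hashKey(raw: string): string {
  return createHash("sha256").update(raw, "utf8").digest("hex");
}

// Hypothetical helper: mirrors RAW="mk_$(openssl rand -hex 32)" plus the hash step.
function mintServiceKey(): { raw: string; hash: string; prefix: string } {
  const raw = `mk_${randomBytes(32).toString("hex")}`;
  return { raw, hash: hashKey(raw), prefix: "mk_" };
}

const key = mintServiceKey();
// Only the hash belongs in the INSERT; the raw value goes to the password manager.
console.log(`key_hash = ${key.hash}`);
```

Printing only the hash keeps the raw token out of shell history and shared terminals; how the raw value is handed to the password manager is left to the operator.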

Step 2 — Sync BETTER_AUTH_SECRET across both apps

The platform signs x-meshi-runtime-user-id JWTs with BETTER_AUTH_SECRET; the runtime verifies them with the same value (it reads JWT_SECRET first, then BETTER_AUTH_SECRET; setting either is enough). If the two values drift, every MCP call fails with a 401.

# Read platform's existing secret (DO NOT print to terminal in shared sessions)
PLAT_SECRET=$(fly ssh console -a meshi-api-staging -C 'printenv BETTER_AUTH_SECRET')
# Set the same value on the runtime
fly secrets set -a meshi-runtime-ts-staging BETTER_AUTH_SECRET="$PLAT_SECRET"
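
To see why drift breaks every call, the sketch below signs and verifies a compact JWT with a shared secret. It assumes HS256 (HMAC-SHA256), which the runbook does not state, and uses Node's node:crypto; signJwt, verifyJwt, and the sub claim are illustrative names only, and expiry checks are left out for brevity.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: HS256 compact JWTs. The actual algorithm and claims may differ.
const b64url = (value: string): string =>
  Buffer.from(value, "utf8").toString("base64url");

function signJwt(payload: Record<string, unknown>, secret: string): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify(payload));
  const signature = createHmac("sha256", secret)
    .update(`${header}.${body}`)
    .digest("base64url");
  return `${header}.${body}.${signature}`;
}

// Returns the payload when the signature matches, otherwise null.
function verifyJwt(token: string, secret: string): Record<string, unknown> | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, body, signature] = parts;
  const expected = createHmac("sha256", secret).update(`${header}.${body}`).digest();
  const actual = Buffer.from(signature, "base64url");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
    return null;
  }
  return JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
}

const token = signJwt({ sub: "user-123" }, "platform-secret");
console.log(verifyJwt(token, "platform-secret")); // payload is returned
console.log(verifyJwt(token, "drifted-secret")); // null: signature does not match
```

A verifier holding a different secret can never reproduce the signature, so there is no partial failure mode: either both apps share the value or all identity tokens are rejected.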

Step 3 — Build the runtime’s MCP_CONFIG and deploy it

Set the entire JSON as a Fly secret:

fly secrets set -a meshi-runtime-ts-staging \
  MCP_CONFIG_JSON='{"servers":[{"name":"meshi-platform","transport":"http","url":"https://api.staging.meshi.io/mcp","headers":{"Authorization":"Bearer mk_<from step 1>"}}]}'

Then ensure the Dockerfile entrypoint (or a start script) writes it to disk at boot:

# printf avoids echo's shell-dependent handling of backslashes and leading dashes.
printf '%s\n' "$MCP_CONFIG_JSON" > /tmp/mcp-config.json
export MCP_CONFIG_PATH=/tmp/mcp-config.json
exec deno task start

Do NOT include a static x-meshi-runtime-user-id header in the config. The runtime injects a per-user JWT on every MCP call. A static one is an impersonation footgun.
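
A boot-time check can catch a static identity header before the config is used. The sketch below is hypothetical: checkMcpConfig is not an existing runtime function, the config shape is inferred from the MCP_CONFIG_JSON example above, and treating the legacy x-zeroclaw-user-id header as equally forbidden is an assumption.

```typescript
// Shape inferred from the MCP_CONFIG_JSON example in this step.
interface McpServerConfig {
  name: string;
  transport: string;
  url: string;
  headers?: Record<string, string>;
}

interface McpConfig {
  servers: McpServerConfig[];
}

// Identity headers that should only be injected per call, never configured statically.
const FORBIDDEN_HEADERS = ["x-meshi-runtime-user-id", "x-zeroclaw-user-id"];

// Hypothetical check: returns a list of problems; an empty list means the config passed.
function checkMcpConfig(json: string): string[] {
  let config: McpConfig;
  try {
    config = JSON.parse(json) as McpConfig;
  } catch {
    return ["MCP config is not valid JSON"];
  }
  if (!Array.isArray(config?.servers)) return ["MCP config has no servers array"];

  const problems: string[] = [];
  for (const server of config.servers) {
    const headers = server.headers ?? {};
    for (const name of Object.keys(headers)) {
      if (FORBIDDEN_HEADERS.includes(name.toLowerCase())) {
        problems.push(`${server.name}: static ${name} header is not allowed`);
      }
    }
    if (!/^Bearer mk_/.test(headers["Authorization"] ?? "")) {
      problems.push(`${server.name}: Authorization must be a Bearer mk_ key`);
    }
  }
  return problems;
}
```

Failing the boot when the returned list is non-empty would turn the impersonation footgun into a loud startup error rather than a silent misconfiguration.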

Step 4 — Deploy the runtime

cd agent-runtime
git pull origin main
fly deploy -c fly.staging.toml -a meshi-runtime-ts-staging

Watch logs for:

  • meshi-ts-runtime listening on :::42617
  • No "JWT_SECRET not configured" warnings
  • No 'MCP server "meshi-platform" did not return Mcp-Session-Id' errors

Step 5 — Smoke test on staging

# Pick any user with goals; get their auth_user_id from Neon.
# Quote the placeholder: unquoted angle brackets are parsed as shell redirections.
USER_ID='<some staging auth_user_id with goals>'
curl -sS -X POST \
  -H "Authorization: Bearer $MESHI_RUNTIME_API_KEY" \
  -H "x-user-id: $USER_ID" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"system","content":"Always call mcp_call_tool: server=meshi-platform method=tools/call params={name:list_goals,arguments:{}}. Then list goal titles."},{"role":"user","content":"List my goals."}],"stream":false}' \
  https://meshi-runtime-ts-staging.fly.dev/v1/chat/completions \
  | jq -r '.choices[0].message.content'

Expected: a list of that user’s actual goals from staging Neon. On failure, check the secret sync (step 2) and the MCP_CONFIG (step 3).

Notes:

  • The platform bearer env var is MESHI_RUNTIME_API_KEY (primary); ZEROCLAW_API_KEY is a legacy fallback.

Persisted assistant message shape — response_object.tool_events

The coach /conversations/:id/messages POST handler tees the runtime’s SSE stream and accumulates a structured tool_events: ToolEvent[] array saved on the assistant message’s response_object JSON column. This lets reloads rebuild the tool-call chrome (cards + result bodies + pre-tool reasoning) without replaying the runtime.

Any consumer displaying assistant messages (admin viewers, exports, analytics, replay tools) must handle this shape:

type ToolEvent =
  | {
      type: "call";
      index: number; // monotonic across rounds, dedupe key
      id?: string; // model's tool_call_id (correlates with results)
      name?: string; // tool name (e.g. "mcp_call_tool", "list_goals")
      arguments_json: string; // JSON-encoded args (model-emitted)
    }
  | {
      type: "result";
      tool_call_id: string; // matches a `call`'s id
      content: string; // tool result body (JSON; for MCP, outer envelope wrapping inner text)
    }
  | {
      type: "pre_reasoning";
      content: string; // model's "thinking" before invoking a tool
    };

The array is in chronological order. A typical multi-round turn: [pre_reasoning, call, result, pre_reasoning, call, result] followed by the final prose in message.content.

Stored only on role='assistant' messages when the runtime emitted at least one event. Older messages (pre-coach-migration) have response_object = null — consumers must handle both.
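
As an illustration of what a consumer has to do with this shape, the sketch below folds a tool_events array into one card per call. buildToolCards and ToolCard are hypothetical names, and attaching each pre_reasoning entry to the call that follows it is an assumption based on the chronological ordering described above.

```typescript
type ToolEvent =
  | { type: "call"; index: number; id?: string; name?: string; arguments_json: string }
  | { type: "result"; tool_call_id: string; content: string }
  | { type: "pre_reasoning"; content: string };

// Hypothetical view model for one tool-call card.
interface ToolCard {
  name: string;
  argumentsJson: string;
  reasoning?: string;
  result?: string;
}

// Accepts null so pre-coach-migration messages yield an empty list.
function buildToolCards(responseObject: { tool_events?: ToolEvent[] } | null): ToolCard[] {
  const cards: ToolCard[] = [];
  const byCallId = new Map<string, ToolCard>();
  const seenIndexes = new Set<number>();
  let pendingReasoning: string | undefined;

  for (const event of responseObject?.tool_events ?? []) {
    if (event.type === "pre_reasoning") {
      pendingReasoning = event.content;
    } else if (event.type === "call") {
      if (seenIndexes.has(event.index)) continue; // index is the dedupe key
      seenIndexes.add(event.index);
      const card: ToolCard = {
        name: event.name ?? "(unknown tool)",
        argumentsJson: event.arguments_json,
        reasoning: pendingReasoning,
      };
      pendingReasoning = undefined;
      cards.push(card);
      if (event.id !== undefined) byCallId.set(event.id, card);
    } else {
      // A result attaches to the call whose id matches tool_call_id, if any.
      const card = byCallId.get(event.tool_call_id);
      if (card) card.result = event.content;
    }
  }
  return cards;
}
```

Results with no matching call are dropped here for simplicity; a stricter consumer might log them instead.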

Open infrastructure debt

  1. MCP session map is per-process, in-memory. Running the runtime on more than one Fly Machine means sessions scatter across machines. Today min_machines_running = 2 in fly.staging.toml with round-robin DNS; because MCP is stateful, sticky routing would be needed before scaling further.
  2. agent_run table (migration 083 in the platform DB) records token counts, tool-call counts, model name, and status per turn — populate it for runtime-call observability.
  3. Per-tool 15s MCP timeout in the runtime’s MCP client defends against a stuck platform tool exhausting MAX_TOOL_ROUNDS.
  4. Static x-meshi-runtime-user-id in any committed mcp-config.json is a single-user impersonation token. Never commit one for staging or prod.
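
The per-tool timeout in item 3 can be pictured as a race between the tool call and a timer. This is a generic sketch, not the runtime's actual MCP client code; withTimeout and MCP_TOOL_TIMEOUT_MS are hypothetical names, with the 15s value taken from the list above.

```typescript
// Value from the list above; the constant name is illustrative.
const MCP_TOOL_TIMEOUT_MS = 15000;

// Rejects if `work` has not settled within `ms`; otherwise passes its outcome through.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
  });
  // Clear the timer either way so a fast call does not leave it pending.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Usage, with a hypothetical callTool function:
//   const result = await withTimeout(callTool("list_goals", {}), MCP_TOOL_TIMEOUT_MS, "list_goals");
```

Note that racing only stops the caller from waiting; the underlying request keeps running unless it is also cancelled, for example with an AbortSignal.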

Rollback

fly releases -a meshi-runtime-ts-staging
fly releases rollback <prior-version> -a meshi-runtime-ts-staging

Platform rollback is not needed — coach endpoints are additive and the stateful MCP handler is backwards-compatible with stateless clients.