
Production Agents with Claude Agent SDK + MCP Connector

Python or TypeScript developers who have used the Claude Messages API at least once and understand what an API key is. New to the Agent SDK, Managed Agents, and MCP.

What you'll learn
  • Migrate a project from the Claude Code SDK to the Claude Agent SDK without breaking changes
  • Choose between Managed Agents and Agent SDK for a production workload with confidence
  • Wire three MCP servers (stdio + HTTP + SSE) into a single agent with proper auth and error handling
  • Upload, reference, and manage files with the Files API across multi-turn agent sessions
  • Deploy a production agent with structured logging, cost circuit breakers, and observability hooks
  • Persist working context across Claude Code, Codex CLI, Cursor, Gemini CLI, and Agent SDK handoffs

Chapters in this course
  1. What changed when Claude Code SDK became Claude Agent SDK (35 min)
  2. Managed Agents beta — when to use it, when to roll your own (45 min)
  3. MCP connector: orchestrating multi-server agents (50 min)
  4. Files API + code execution: the complete agent IO surface (45 min)
  5. Production: deploy + observability + cost controls (45 min)
  6. Cross-CLI Context Persistence (50 min)

Chapter 1 · 35 min

What changed when Claude Code SDK became Claude Agent SDK

The Claude Agent SDK is Anthropic's official library for embedding an autonomous agent loop — including built-in file operations, shell execution, web access, and subagent spawning — directly into a Python or TypeScript application, renamed from the Claude Code SDK in April 2026 alongside the public beta of Claude Managed Agents. This chapter sets up the migration path for Managed Agents, MCP connectors, and production observability.

On April 8, 2026, Anthropic simultaneously shipped the renamed SDK, the Managed Agents REST API, and an explicit MCP connector guide. The rename wasn't a rebrand of the package alone; it came with a branding prohibition — partners may no longer call their products "Claude Code" or use Claude Code ASCII art — and with an SDK migration guide that names package changes, option-type changes, and configuration-loading changes you need to audit before shipping [6].

> Prerequisites: None (this is Chapter 1).
>
> Time: 35 minutes
>
> Learning objectives: By the end of this chapter you can install the renamed SDK, update your imports, run your first query() call, and explain what the rename means for your production roadmap.

Key facts

  1. The npm package changed from @anthropic-ai/claude-code to @anthropic-ai/claude-agent-sdk; the PyPI package changed from claude-code-sdk to claude-agent-sdk [6].
  2. The options type/class changed from ClaudeCodeOptions to ClaudeAgentOptions in both TypeScript and Python examples [6].
  3. The TypeScript SDK bundles a native Claude Code binary for your platform as an optional dependency — you no longer need a separate Claude Code installation [1].
  4. Authentication on Amazon Bedrock, Google Vertex AI, and Microsoft Azure Foundry is controlled entirely by environment variables (CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX, CLAUDE_CODE_USE_FOUNDRY), not constructor arguments [1].
  5. The branding guidelines explicitly prohibit partners from using the names "Claude Code," "Claude Code Agent," or Claude Code-branded ASCII art — a signal that the SDK is now a platform, not a feature of a specific product [1].
  6. Session state is stored as JSONL on your filesystem and can be resumed by passing resume: sessionId in your options [1].

The rename isn't cosmetic

Most developers saw the April 2026 announcement and ran npm install @anthropic-ai/claude-agent-sdk. Done, right? Not quite.

The rename matters strategically because it de-couples the SDK from Claude Code the developer product. Claude Code is a terminal app; the Claude Agent SDK is now a general-purpose platform library. By prohibiting partners from calling their products "Claude Code," Anthropic is drawing a hard line: Claude Code is the consumer app, the Agent SDK is the infrastructure you build on. If you're building a product on top of this SDK, that distinction matters for your own naming and positioning.

There's also a real technical signal in the migration guide: configuration that Claude Code users may have treated as implicit now deserves an explicit audit. Your package manager can make the import rename look trivial, but stale settings, old package names, and different defaults are what usually break production agents.

- The npm package was renamed from `@anthropic-ai/claude-code` to `@anthropic-ai/claude-agent-sdk`; the PyPI package was renamed from `claude-code-sdk` to `claude-agent-sdk`.
- The rename signals a strategic separation: Claude Code is the end-user terminal app; the Claude Agent SDK is infrastructure for custom autonomous agents.
- Partners may no longer use the names "Claude Code" or "Claude Code Agent" in their product naming — the SDK is now a platform, not a product feature.

Installing the renamed SDK

TypeScript

```bash
# Remove the old package
npm uninstall @anthropic-ai/claude-code

# Install the renamed package
npm install @anthropic-ai/claude-agent-sdk
```

Python

```bash
# Remove the old package
pip uninstall claude-code-sdk

# Install the renamed package
pip install claude-agent-sdk
```

After installing, verify the version:

```bash
# TypeScript: check package.json
cat package.json | grep claude-agent-sdk
# → "@anthropic-ai/claude-agent-sdk": "^0.1.0" or later

# Python: show the installed package and its version
pip show claude-agent-sdk
```

Updating your imports

Every import in your existing code needs to change. This is a search-and-replace operation, not a logic change.

TypeScript — before

import { query } from "@anthropic-ai/claude-code";
import type { ClaudeCodeOptions } from "@anthropic-ai/claude-code";

TypeScript — after

import { query } from "@anthropic-ai/claude-agent-sdk";
import type { ClaudeAgentOptions } from "@anthropic-ai/claude-agent-sdk";

Note: the options type was renamed from ClaudeCodeOptions to ClaudeAgentOptions.

Python — before

from claude_code_sdk import query, ClaudeCodeOptions

Python — after

from claude_agent_sdk import query, ClaudeAgentOptions
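
Because the change is a mechanical rename, a short script can list the files that still reference the old names before you run the replacement. The sketch below uses only the standard library. The function name `find_stale_imports` and the default set of file extensions are assumptions made for illustration; they are not part of the SDK.

```python
import re
from pathlib import Path

# Old identifiers named in the migration steps above.
OLD_NAMES = re.compile(
    r"@anthropic-ai/claude-code|claude[-_]code[-_]sdk|ClaudeCodeOptions"
)


def find_stale_imports(root, suffixes=(".py", ".ts", ".tsx", ".js")):
    """Return {file path: [line numbers]} for lines that still use old names.

    Hypothetical helper for a migration audit; not an SDK function.
    """
    hits = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        if "node_modules" in path.parts:
            continue  # skip vendored dependencies
        lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        numbers = [i for i, line in enumerate(lines, start=1) if OLD_NAMES.search(line)]
        if numbers:
            hits[str(path)] = numbers
    return hits


if __name__ == "__main__":
    for file, numbers in find_stale_imports(".").items():
        print(f"{file}: lines {numbers}")
```

An empty result suggests the rename is complete for the scanned extensions; dependency manifests such as requirements.txt would need to be added to `suffixes` separately.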

The query() API in 2 minutes

The core API hasn't changed between SDK versions. query() is an async generator that yields message objects as the agent works through a task. The simplest possible call:

```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions


async def main():
    async for message in query(
        prompt="What files are in this directory?",
        options=ClaudeAgentOptions(allowed_tools=["Bash", "Glob"]),
    ):
        if hasattr(message, "result"):
            print(message.result)


asyncio.run(main())
```

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "What files are in this directory?",
  options: { allowedTools: ["Bash", "Glob"] }
})) {
  if ("result" in message) console.log(message.result);
}
```

The generator yields several message types. The ones you'll care about most:

| Type | When it fires | What it contains |
|---|---|---|
| SystemMessage (subtype init) | First, before any work | Session ID, connected MCP servers |
| AssistantMessage | After each model turn | Claude's text + tool calls |
| ToolResultMessage | After each tool execution | The tool's output |
| ResultMessage | Last | Final answer, token usage, session ID |

Try this · claude-sonnet-4-6

What is the current working directory? List the files in it.

Expected output:
The agent calls Bash with `pwd` and `ls`, then returns the directory path and a list of files. You see AssistantMessage objects containing tool_use blocks, followed by ToolResultMessage objects with the shell output, ending with a ResultMessage containing the synthesized answer.
- `query()` is an async generator that yields `SystemMessage`, `AssistantMessage`, `ToolResultMessage`, and `ResultMessage` objects as the agent works.
- The `ResultMessage` is the final event and contains the synthesized answer, token usage, and session ID.
- The options type was renamed from `ClaudeCodeOptions` to `ClaudeAgentOptions`; update all imports before shipping to avoid runtime errors.

Capturing and resuming sessions

Session continuity is one of the most underused features of the SDK. When the SystemMessage with subtype init arrives, grab the session_id:

```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, SystemMessage, ResultMessage

session_id = None


async def first_query():
    global session_id
    async for message in query(
        prompt="Read auth.py and tell me what it does",
        options=ClaudeAgentOptions(allowed_tools=["Read", "Glob"]),
    ):
        if isinstance(message, SystemMessage) and message.subtype == "init":
            session_id = message.data["session_id"]
        if isinstance(message, ResultMessage):
            print(message.result)


async def follow_up():
    async for message in query(
        prompt="Now find every file that imports from auth.py",
        options=ClaudeAgentOptions(resume=session_id),
    ):
        if isinstance(message, ResultMessage):
            print(message.result)


async def main():
    await first_query()
    await follow_up()  # Claude already knows auth.py's contents


asyncio.run(main())
```

The resume option re-opens the existing JSONL session file on your filesystem. Claude picks up with full context from the previous turn — no re-reading files, no redundant tool calls.
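
Resuming from a later process run means the session ID has to be stored somewhere between runs. A minimal sketch, assuming a small JSON file is an acceptable store: the helper names `save_session_id` and `load_session_id` and the file name `.agent_session.json` are hypothetical and not part of the SDK.

```python
import json
from pathlib import Path
from typing import Optional


def save_session_id(session_id: str, path: str = ".agent_session.json") -> None:
    """Write the session ID to a small JSON file (hypothetical helper)."""
    Path(path).write_text(json.dumps({"session_id": session_id}), encoding="utf-8")


def load_session_id(path: str = ".agent_session.json") -> Optional[str]:
    """Return the stored session ID, or None if the file is missing or unreadable."""
    file = Path(path)
    if not file.exists():
        return None
    try:
        return json.loads(file.read_text(encoding="utf-8")).get("session_id")
    except (json.JSONDecodeError, AttributeError):
        return None
```

When `load_session_id()` returns a value, it can be passed as the `resume` option; when it returns `None`, start a fresh session and save the new ID once the init message arrives.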

Built-in tools: the complete list

The Agent SDK ships ten built-in tools. You must declare which ones you allow explicitly — there's no "allow all built-ins" shortcut:

| Tool | What it does | Safe to allow broadly? |
|---|---|---|
| Read | Read any file in the working directory | Yes |
| Write | Create new files | With caution |
| Edit | Make precise edits to existing files | With caution |
| Bash | Run terminal commands, scripts, git operations | No: scope carefully |
| Monitor | Watch a background script, react to each stdout line | Yes |
| Glob | Find files by pattern (`**/*.ts`, `src/**/*.py`) | Yes |
| Grep | Search file contents with regex | Yes |
| WebSearch | Search the web for current information | Yes |
| WebFetch | Fetch and parse web page content | Yes |
| AskUserQuestion | Ask the user clarifying questions with multiple choice | Yes |

The Bash tool is the one to be careful with. In a CI context with a fully sandboxed container it's fine. On a developer workstation, Bash can delete files, install packages, and run arbitrary code. If you don't need shell execution, don't include it.

- The Agent SDK ships ten built-in tools; you must declare each one explicitly in `allowed_tools` — there is no "allow all" shortcut.
- `Bash` is the highest-risk tool: on a developer workstation it can delete files, install packages, and run arbitrary code; omit it unless shell execution is explicitly required.
- Session JSONL files are stored under `~/.claude/sessions/` by default; in production, set `CLAUDE_SESSIONS_DIR` to a path with an appropriate retention policy.
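
Since every tool has to be listed by hand, a pre-flight check on the list can catch typos and surface the risky entries before an agent starts. The sketch below encodes the table above; `check_allowed_tools` is a hypothetical helper, and names outside the table (for example MCP tools, which it skips, or other tools your SDK version may expose) are reported as unrecognized rather than treated as errors.

```python
# Built-in tools and risk notes, copied from the table above.
BUILT_IN_TOOLS = {
    "Read", "Write", "Edit", "Bash", "Monitor",
    "Glob", "Grep", "WebSearch", "WebFetch", "AskUserQuestion",
}
NEEDS_REVIEW = {
    "Bash": "scope carefully",
    "Write": "with caution",
    "Edit": "with caution",
}


def check_allowed_tools(allowed):
    """Return (unrecognized, flagged) for a list of tool names.

    unrecognized: names that are neither in the table nor MCP tools.
    flagged: human-readable notes for tools the table marks as risky.
    """
    unrecognized = [
        name for name in allowed
        if name not in BUILT_IN_TOOLS and not name.startswith("mcp__")
    ]
    flagged = [f"{name}: {NEEDS_REVIEW[name]}" for name in allowed if name in NEEDS_REVIEW]
    return unrecognized, flagged


if __name__ == "__main__":
    unknown, risky = check_allowed_tools(["Read", "Bash", "Grpe"])
    print("unrecognized:", unknown)
    print("review:", risky)
```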

Multi-cloud authentication

If you run behind Bedrock, Vertex AI, or Azure, the SDK respects environment variables — you don't change any code:

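```bash
# Amazon Bedrock
export CLAUDE_CODE_USE_BEDROCK=1
# Then configure AWS credentials normally
aws configure  # or use IAM roles

# Google Vertex AI
export CLAUDE_CODE_USE_VERTEX=1

# Microsoft Azure Foundry
export CLAUDE_CODE_USE_FOUNDRY=1
```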

The ANTHROPIC_API_KEY environment variable is still checked first. If it's set, it wins over cloud provider credentials.

Try this · claude-sonnet-4-6

Find all TypeScript files in this project that import from '@anthropic-ai/claude-code' and list their paths.

Expected output:
The agent uses Grep with pattern '@anthropic-ai/claude-code' and glob '**/*.ts', returns a list of file paths that still use the old import. This is the first step of a real migration audit.

Hands-on exercise

Migrate a code-reviewer agent to the Claude Agent SDK.

Start with this minimal Claude Code SDK agent (or your own existing code):

```python
# reviewer_old.py — uses the old SDK
from claude_code_sdk import query, ClaudeCodeOptions


async def review_code(file_path: str):
    async for message in query(
        prompt=f"Review {file_path} for bugs and code quality issues",
        options=ClaudeCodeOptions(
            allowed_tools=["Read", "Glob", "Grep"],
        ),
    ):
        if hasattr(message, "result"):
            print(message.result)
```

Your tasks:

  1. Install claude-agent-sdk (Python) or @anthropic-ai/claude-agent-sdk (TypeScript)
  2. Update the import to from claude_agent_sdk import query, ClaudeAgentOptions
  3. Rename ClaudeCodeOptions to ClaudeAgentOptions
  4. Add session capture: print the session_id from the SystemMessage
  5. Run the agent against any .py or .ts file in your project

Verification: The agent runs without import errors, produces a code review, and prints a session ID that looks like sess_01XxXxxXx….

Estimated time: 15 minutes

Knowledge check · 1 of 1
You're migrating a Python project from the Claude Code SDK to the Claude Agent SDK. Which of the following changes is required?

Subagents: orchestrating specialized agents

One of the most powerful Agent SDK features is the ability to spawn specialized subagents from within a parent agent. Subagents handle focused subtasks and report back results, enabling you to build multi-agent pipelines entirely in Python or TypeScript:

```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition, ResultMessage


async def review_and_document(codebase_path: str):
    """Parent agent that delegates to two specialists."""
    async for message in query(
        prompt=f"Use the code-reviewer agent to review {codebase_path}, then use the doc-writer agent to create a README.",
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Glob", "Grep", "Write", "Agent"],
            agents={
                "code-reviewer": AgentDefinition(
                    description="Expert code reviewer for quality and security.",
                    prompt="Analyze code quality, identify bugs, suggest improvements.",
                    tools=["Read", "Glob", "Grep"],
                ),
                "doc-writer": AgentDefinition(
                    description="Technical writer who creates clear documentation.",
                    prompt="Write clear, accurate technical documentation.",
                    tools=["Read", "Write"],
                ),
            },
        ),
    ):
        if isinstance(message, ResultMessage):
            print(message.result)


asyncio.run(review_and_document("./src"))
```

The Agent tool must be in allowed_tools (allowedTools in TypeScript) for the parent to spawn subagents. Messages from within a subagent's context include a parent_tool_use_id field; use it to correlate subagent output back to the parent's tool call in your audit logs.

Note the pattern: the parent doesn't implement the reviewer or writer logic itself. It delegates, which keeps the parent's context window focused on orchestration rather than implementation. This is the right architecture for agents with more than two or three distinct skill sets.

- Subagents receive focused tasks via `AgentDefinition` with their own description, prompt, and tool list — keeping the parent's context window focused on orchestration.
- The `parent_tool_use_id` field in subagent messages correlates subagent output back to the parent tool call in audit logs.
- The `Agent` tool must appear in the parent's `allowedTools`; omitting it silently prevents subagent spawning.
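
The correlation step for audit logs can be sketched as a simple grouping pass. This example uses plain dictionaries as stand-ins for SDK message objects, so the `"parent_tool_use_id"` and `"text"` keys and the `group_by_parent` helper are assumptions for illustration; real messages are objects whose attribute access may differ.

```python
from collections import defaultdict


def group_by_parent(messages):
    """Group message dicts by the parent tool call that produced them.

    Messages without a parent_tool_use_id are treated as coming from the
    parent agent itself and are filed under "top-level".
    """
    grouped = defaultdict(list)
    for message in messages:
        parent = message.get("parent_tool_use_id") or "top-level"
        grouped[parent].append(message)
    return dict(grouped)


if __name__ == "__main__":
    log = [
        {"text": "Delegating review", "parent_tool_use_id": None},
        {"text": "Found 2 issues", "parent_tool_use_id": "toolu_A"},
        {"text": "README drafted", "parent_tool_use_id": "toolu_B"},
        {"text": "Checked auth module", "parent_tool_use_id": "toolu_A"},
    ]
    for parent, items in group_by_parent(log).items():
        print(parent, [m["text"] for m in items])
```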

Configuration file loading order

The SDK loads configuration from multiple sources, applied in a defined order. Understanding this prevents "why isn't my setting taking effect?" debugging sessions:

~/.claude/settings.json          # global user settings (lowest priority)
~/.claude/CLAUDE.md              # global system prompt additions
.claude/settings.json            # project settings
.claude/CLAUDE.md / CLAUDE.md    # project system prompt
inline ClaudeAgentOptions()      # runtime options (highest priority)

Later sources override earlier ones. This means you can set safe defaults globally and override them per-project or per-run without touching the global config.
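
The "later overrides earlier" rule can be illustrated with a shallow dictionary merge. This is only a model of the precedence described above, not the SDK's actual merge logic, which may treat nested or list-valued keys differently; the setting keys in the example are made up for illustration.

```python
def resolve_settings(*layers):
    """Merge setting layers, lowest priority first; later layers win per key."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


if __name__ == "__main__":
    global_settings = {"log_level": "info", "max_turns": 10}   # ~/.claude/settings.json
    project_settings = {"max_turns": 5}                        # .claude/settings.json
    runtime_options = {"log_level": "debug"}                   # inline options
    print(resolve_settings(global_settings, project_settings, runtime_options))
```

Here the project layer lowers `max_turns` and the runtime layer raises the log level, while neither needs to restate the keys it leaves alone.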

To restrict which sources load (for example, in a CI environment where the developer's ~/.claude settings should not affect the build), use settingSources in TypeScript or setting_sources in Python:

options = ClaudeAgentOptions(
    allowed_tools=["Read", "Glob", "Grep"],
    setting_sources=["project"],  # only load .claude/ in the current project
)
const options = {
  allowedTools: ["Read", "Glob", "Grep"],
  settingSources: ["project"],  // ignores ~/.claude entirely
};

This is important for reproducibility: a CI agent should behave identically regardless of what's installed in the developer's home directory.

Skills and slash commands

The Agent SDK supports two additional configuration primitives that most tutorials skip: Skills and slash commands. Both are defined in Markdown files and loaded from the project .claude/ directory.

Skills are specialist instructions that extend the agent's capabilities for specific domains. A SKILL.md file at .claude/skills/<name>/SKILL.md is loaded into context when the agent needs that capability. This is how the Koenig AI Academy's own agents are extended — each agent has skills for its specialized workflows without bloating the base system prompt.

Slash commands are shorthand for common task templates. A review.md file at .claude/commands/review.md becomes a /review command that the agent can invoke. In the SDK context, you can trigger slash commands by starting a prompt with /.

These are the same skill and command systems that power Claude Code's daily usage, now fully available to your programmatic agents.

What's next

In Chapter 2 you'll meet Managed Agents — Anthropic's hosted agent harness that launched the same day as this SDK rename. You'll learn the decision rule for when to let Anthropic run your agent infrastructure vs running it yourself, and you'll wire up your first session with full SSE streaming. The pricing model has a non-obvious trap that most tutorials skip: we'll name it explicitly.

References

[1] Claude Agent SDK Overview. https://code.claude.com/docs/en/agent-sdk/overview · retrieved 2026-04-30
[2] Agent Capabilities API announcement. https://claude.com/blog/agent-capabilities-api · retrieved 2026-04-30
[3] Claude Agent SDK TypeScript releases. https://github.com/anthropics/claude-agent-sdk-typescript/releases · retrieved 2026-05-27
[4] Claude Agent SDK MCP documentation. https://code.claude.com/docs/en/agent-sdk/mcp · retrieved 2026-04-30
[5] Claude Managed Agents Overview. https://platform.claude.com/docs/en/managed-agents/overview · retrieved 2026-04-30
[6] Claude Agent SDK migration guide. https://docs.claude.com/en/docs/claude-code/sdk/migration-guide · retrieved 2026-05-27

Chapter 2 · 45 min

Managed Agents beta — when to use it, when to roll your own

Claude Managed Agents is Anthropic's hosted REST API for running Claude as an autonomous agent in a sandboxed cloud environment — launched in public beta on April 8, 2026, requiring the managed-agents-2026-04-01 beta header. Where the Agent SDK runs the agent loop in your own process, Managed Agents runs it in Anthropic's infrastructure: you send user messages, you stream results back. Anthropic handles the container, tool execution, and session persistence [1]. Verify current pricing in the official quickstart before launch [2].

Key facts

  1. All API requests require the managed-agents-2026-04-01 beta header [1].
  2. Pricing: Managed Agents runtime plus standard Claude token costs; verify current rates before launch [2].
  3. Rate limits: 300 RPM for create endpoints (agents, sessions, environments); 600 RPM for read endpoints [1].
  4. agent_toolset_20260401 enables Bash, file ops, web search, and MCP; outcomes and multiagent are research preview requiring separate access [1].

The four core concepts

Agent — saved configuration (model, system prompt, tools). Create once, reuse by agent.id. Think Docker image: build once, run many sessions from it.

Environment: a cloud container template that defines installed packages and network rules. Its config has type cloud, with either unrestricted or restricted networking.

Session — one running agent+environment instance per task. Not reused; start a new one when the task is done.

Events — SSE stream: you send user.message; agent emits agent.message, agent.tool_use, then session.status_idle when done.

- Managed Agents uses four primitives: Agent (saved config), Environment (sandbox template), Session (running instance per task), and Events (SSE message stream).
- Sessions are not reused — one session equals one task; when the task is done, start a new session for the next task.
- Agent and Environment IDs are stable and should be created once and reused; only the Session is created per-task to avoid hitting the 300 create-requests-per-minute rate limit.
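
The event flow described above (messages and tool calls arrive until `session.status_idle`) can be modeled without any network access. The sketch below consumes plain dictionaries standing in for stream events; the dictionary shape and the `collect_session_output` helper are assumptions for illustration, and real SDK events are objects rather than dicts.

```python
def collect_session_output(events):
    """Accumulate agent text and tool names until the session goes idle."""
    text_parts = []
    tools = []
    finished = False
    for event in events:
        kind = event.get("type")
        if kind == "agent.message":
            text_parts.extend(block.get("text", "") for block in event.get("content", []))
        elif kind == "agent.tool_use":
            tools.append(event.get("name"))
        elif kind == "session.status_idle":
            finished = True
            break  # canonical end-of-work signal; ignore anything after it
    return {"text": "".join(text_parts), "tools": tools, "finished": finished}


if __name__ == "__main__":
    sample = [
        {"type": "agent.message", "content": [{"text": "Planning. "}]},
        {"type": "agent.tool_use", "name": "bash"},
        {"type": "agent.message", "content": [{"text": "Done."}]},
        {"type": "session.status_idle"},
    ]
    print(collect_session_output(sample))
```

A `finished` value of `False` means the stream ended without an idle event, which is worth logging as an abnormal termination.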

Creating your first agent

Install the Anthropic SDK (Managed Agents uses the standard client, not the Agent SDK):

pip install anthropic  # Python
npm install @anthropic-ai/sdk  # TypeScript

Create an agent once — save the returned agent.id:

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from env

agent = client.beta.agents.create(
    name="Data Analyst",
    model="claude-opus-4-7",
    system="You are a data analyst. When given a dataset, summarize it with statistics and key insights.",
    tools=[
        {"type": "agent_toolset_20260401"},  # enables Bash, file ops, web search
    ],
)

print(f"Agent ID: {agent.id}")  # save this
print(f"Agent version: {agent.version}")
```

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const agent = await client.beta.agents.create({
  name: "Data Analyst",
  model: "claude-opus-4-7",
  system: "You are a data analyst. When given a dataset, summarize it with statistics and key insights.",
  tools: [{ type: "agent_toolset_20260401" }],
});

// Template literals need backticks
console.log(`Agent ID: ${agent.id}`);
```

Creating an environment

```python
environment = client.beta.environments.create(
    name="analyst-env",
    config={
        "type": "cloud",
        "networking": {"type": "unrestricted"},  # allows outbound web access
    },
)

print(f"Environment ID: {environment.id}")  # save this too
```

The environment is a one-time setup. Use unrestricted for most workloads; restricted blocks outbound access for sensitive data.

Starting a session and streaming events

Open the stream, then immediately send the first user message:

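```python
from anthropic import Anthropic

client = Anthropic()

# Setup mirrors the TypeScript example below. AGENT_ID and ENVIRONMENT_ID are
# placeholders for the IDs saved earlier; confirm the Python call shapes
# against the SDK reference.
session = client.beta.sessions.create(
    agent=AGENT_ID, environment_id=ENVIRONMENT_ID, title="Analyze Q1 sales data"
)
stream = client.beta.sessions.events.stream(session.id)
prompt = "Sales data: [120, 340, 290, 410, 380]. Compute mean, median, std dev. Show Python code."
client.beta.sessions.events.send(
    session.id,
    events=[{"type": "user.message", "content": [{"type": "text", "text": prompt}]}],
)

for event in stream:
    match event.type:
        case "agent.message":
            for block in event.content:
                print(block.text, end="", flush=True)
        case "agent.tool_use":
            print(f"\n[Tool: {event.name}]", flush=True)
        case "session.status_idle":
            print("\n\n[Session complete]")
            break
```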

```typescript
const session = await client.beta.sessions.create({
  agent: agentId,
  environment_id: environmentId,
  title: "Analyze Q1 sales data",
});

const stream = await client.beta.sessions.events.stream(session.id);

await client.beta.sessions.events.send(session.id, {
  events: [{
    type: "user.message",
    content: [{
      type: "text",
      text: "Sales data: [120, 340, 290, 410, 380]. Compute mean, median, std dev. Show Python code."
    }]
  }]
});

for await (const event of stream) {
  if (event.type === "agent.message") {
    for (const block of event.content) process.stdout.write(block.text);
  } else if (event.type === "agent.tool_use") {
    // Template literals need backticks
    console.log(`\n[Tool: ${event.name}]`);
  } else if (event.type === "session.status_idle") {
    console.log("\n[Session complete]");
    break;
  }
}
```

Try this · claude-opus-4-7

You are running inside a Managed Agents session. The user has sent: 'Here is some sales data as a Python list: [120, 340, 290, 410, 380]. Compute mean, median, and standard deviation. Show your work i…

Expected output:
Claude emits an agent.message with a plan, then an agent.tool_use event for Bash, then another agent.message with results such as mean=308.0, median=340.0, and a standard deviation of about 114.3 (sample) or 102.3 (population). The session then emits session.status_idle.
- Open the SSE stream before sending the first `user.message` event; events arrive in real time, including `agent.tool_use` calls and `agent.message` responses.
- The `session.status_idle` event is the canonical signal that the agent has finished working; break the stream loop when you see it.
- Always close idle sessions explicitly with `client.beta.sessions.update(session.id, status="completed")` to avoid ongoing runtime cost exposure.
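
The summary statistics for this dataset can be checked locally with the standard library. The sketch below computes both the sample and the population standard deviation, since the two conventions give different values and the prompt does not say which one the agent should use.

```python
import statistics

data = [120, 340, 290, 410, 380]

mean = statistics.mean(data)
median = statistics.median(data)
sample_sd = statistics.stdev(data)       # divides by n - 1
population_sd = statistics.pstdev(data)  # divides by n

print(f"mean={mean} median={median}")
print(f"sample_sd={sample_sd:.1f} population_sd={population_sd:.1f}")
```

If the agent's reported standard deviation matches neither figure, inspect the code it ran in the corresponding `agent.tool_use` step.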

Session lifecycle and cost

Managed Agents cost depends on session lifetime, not just active generation time. A session left idle after session.status_idle can accrue runtime exposure — verify current billing rules in the official quickstart [2]. Never use Managed Agents for polling loops; use the Agent SDK with a cron job instead. For cost circuit breakers and audit hooks that protect production sessions, see Chapter 5.

Always close idle sessions explicitly:

client.beta.sessions.update(session.id, status="completed")

Decision rule: Managed Agents vs Agent SDK

| Scenario | Use |
|---|---|
| Long-running task (>5 min), async, need cloud sandbox | Managed Agents |
| Agent needs to operate on files on your own server/filesystem | Agent SDK |
| You need custom in-process tool execution (Python functions) | Agent SDK |
| You're prototyping locally; no cloud infra budget yet | Agent SDK |
| You need to serve many concurrent agent sessions to end users | Managed Agents (they handle the infrastructure) |
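
The table can be encoded as a small function, which is handy for design reviews or for documenting why a workload landed where it did. The `Workload` fields and the tie-breaking order (local constraints win over hosting convenience, and the Agent SDK is the default) are assumptions of this sketch, not rules stated by Anthropic.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    long_running: bool = False             # task expected to exceed about 5 minutes
    needs_cloud_sandbox: bool = False
    needs_local_files: bool = False        # must touch your own server/filesystem
    needs_in_process_tools: bool = False   # custom Python functions as tools
    local_prototype: bool = False
    many_concurrent_sessions: bool = False


def recommend(w: Workload) -> str:
    """Return "Agent SDK" or "Managed Agents" following the table above."""
    # Rows that require your own machine or process take precedence (assumption).
    if w.needs_local_files or w.needs_in_process_tools or w.local_prototype:
        return "Agent SDK"
    if w.many_concurrent_sessions or (w.long_running and w.needs_cloud_sandbox):
        return "Managed Agents"
    return "Agent SDK"  # default for short, locally executed work


if __name__ == "__main__":
    print(recommend(Workload(long_running=True, needs_cloud_sandbox=True)))
    print(recommend(Workload(needs_local_files=True, many_concurrent_sessions=True)))
```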

<Callout type="hot"> Managed Agents is in public beta as of April 2026. The managed-agents-2026-04-01 beta header is required on every request. Behaviors can be refined between releases. Two capabilities — outcomes and multiagent — are in research preview and require a separate access request at claude.com/form/claude-managed-agents. Do not build production features that depend on research-preview capabilities without direct Anthropic support. </Callout>

- Use Managed Agents for long-running (>5 min), async tasks needing a cloud sandbox; use the Agent SDK for short, stateless, webhook-triggered, or locally-executed work.
- Runtime pricing has two components: Managed Agents runtime plus standard Claude token costs — verify current rates in the official quickstart before launch.
- The `managed-agents-2026-04-01` beta header is required on every request; outcomes and multiagent are in research preview and require a separate access request.

Hands-on exercise

Ship a Managed Agents session streaming a data analysis task to your terminal.

  1. Create agent: model: "claude-opus-4-7", tools: [{ type: "agent_toolset_20260401" }]
  2. Create environment: type: "cloud", networking: { type: "unrestricted" }
  3. Create session; send: "Fetch https://jsonplaceholder.typicode.com/todos (10 items), filter completed, print titles. Run it."
  4. Print tool name per agent.tool_use, text per agent.message

Verify: At least one [Tool: bash] line, ending with [Session complete]. Est. time: 20 min

Rate limits

The 300 RPM create limit is shared across agent, environment, and session creates. Pre-create agents and environments once — only sessions are per-task:

```python
# Create once, store these IDs
AGENT_ID = "agt_01XxXxxXx"  # created once, reused forever
ENVIRONMENT_ID = "env_01YyYyyYy"  # created once, reused forever
```
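
Even with agents and environments pre-created, a burst of session creates can still approach the shared limit. One way to stay under it is a client-side sliding-window limiter placed in front of every create call. The sketch below is a generic technique rather than anything the Anthropic SDK provides; the class name and the injectable `clock` parameter are assumptions that keep the example testable.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds interval."""

    def __init__(self, max_calls=300, window_seconds=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock
        self._calls = deque()  # timestamps of accepted calls, oldest first

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_calls=300, window_seconds=60.0)
    if limiter.try_acquire():
        print("ok to create a session")
    else:
        print("back off before creating another session")
```

Setting `max_calls` somewhat below 300 leaves headroom for other processes that share the same API key.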

What's next

Chapter 3 covers MCP tool servers — three transport modes and the permission grants that make them work.

References

[1] Claude Managed Agents Overview. https://platform.claude.com/docs/en/managed-agents/overview · retrieved 2026-04-30
[2] Claude Managed Agents Quickstart. https://platform.claude.com/docs/en/managed-agents/quickstart · retrieved 2026-04-30
[3] Agent Capabilities API announcement. https://claude.com/blog/agent-capabilities-api · retrieved 2026-04-30
[4] Managed Agents Beta Header Documentation. https://platform.claude.com/docs/en/api/beta-headers · retrieved 2026-05-14
[5] Claude Agent SDK Overview. https://code.claude.com/docs/en/agent-sdk/overview · retrieved 2026-04-30
[6] Model Context Protocol introduction. https://modelcontextprotocol.io/docs/getting-started/intro · retrieved 2026-05-14

Chapter 3 · 50 min

MCP connector: orchestrating multi-server agents

The MCP connector in the Claude Agent SDK attaches external tool servers (databases, APIs, browsers) to an agent at runtime. The SDK supports three transport modes (stdio, HTTP, SSE) and handles connection management, tool discovery, and error signaling automatically [1]. For a breakdown of which community servers teams are actually deploying, see MCP server adoption 2026.

Key facts

  1. MCP tools are named mcp__<server-name>__<tool-name> — e.g., server "github" + tool list_issues = mcp__github__list_issues [1].
  2. MCP tools need explicit allowedTools grants; permissionMode: "acceptEdits" does NOT cover MCP [1].
  3. stdio: local process; HTTP: stateless remote; SSE: streaming remote. Default stdio timeout: 60 seconds [1].
  4. Tool search is enabled by default — withholds tool definitions from context and loads only what's needed per turn [1].

The MCP naming convention

Given mcpServers key "github", every tool is prefixed mcp__github__. Example:

mcp__github__list_issues
mcp__github__search_issues
mcp__github__create_issue
mcp__github__get_pull_request

mcp__github__* allows all tools from the server; mcp__github__list_issues allows only that one.

- MCP tools follow the naming pattern `mcp__<server-name>__<tool-name>` where the server name is the key used in `mcpServers`, not the package name.
- Use `mcp__<server>__*` wildcards during development; narrow to specific tool names in production to minimize blast radius.
- All MCP tools require explicit `allowedTools` grants — `permissionMode: "acceptEdits"` does not auto-approve MCP tool calls.
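
The naming rule and the wildcard grant can be modeled in a few lines, which is useful for unit-testing an allow list before handing it to the agent. The helpers `mcp_tool_name` and `is_allowed` are hypothetical, and the glob matching via `fnmatchcase` only approximates how the SDK evaluates `mcp__<server>__*`; treat it as a sketch of the convention, not the SDK's matcher.

```python
from fnmatch import fnmatchcase


def mcp_tool_name(server: str, tool: str) -> str:
    """Build the fully qualified tool name from the mcpServers key and tool name."""
    return f"mcp__{server}__{tool}"


def is_allowed(tool_name: str, allowed_tools) -> bool:
    """Return True if tool_name matches any exact or wildcard entry."""
    return any(fnmatchcase(tool_name, pattern) for pattern in allowed_tools)


if __name__ == "__main__":
    name = mcp_tool_name("github", "create_issue")
    print(name, is_allowed(name, ["mcp__github__*"]))
    print(name, is_allowed(name, ["mcp__github__list_issues"]))
```

Running the narrow production list through `is_allowed` for every tool the agent is expected to call is a cheap way to catch a missing grant before deployment.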

The three transport types

stdio — local process servers

stdio is the most common transport for development and for community-published servers on npm or PyPI. The SDK spawns a child process and communicates over stdin/stdout.

```python
import asyncio
import os

from claude_agent_sdk import query, ClaudeAgentOptions

options = ClaudeAgentOptions(
    mcp_servers={
        "github": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"],
            # Read the token from the environment instead of hardcoding it
            "env": {"GITHUB_TOKEN": os.environ["GITHUB_TOKEN"]},
        }
    },
    allowed_tools=["mcp__github__list_issues", "mcp__github__search_issues"],
)


async def main():
    # async for is only valid inside a coroutine
    async for message in query(
        prompt="List the 5 most recent open issues in anthropics/claude-code",
        options=options,
    ):
        if hasattr(message, "result"):
            print(message.result)


asyncio.run(main())
```

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "List the 5 most recent open issues in anthropics/claude-code",
  options: {
    mcpServers: {
      github: {
        command: "npx",
        args: ["-y", "@modelcontextprotocol/server-github"],
        env: { GITHUB_TOKEN: process.env.GITHUB_TOKEN }
      }
    },
    allowedTools: ["mcp__github__list_issues", "mcp__github__search_issues"]
  }
})) {
  if (message.type === "result" && message.subtype === "success") {
    console.log(message.result);
  }
}
```

HTTP — stateless remote servers

Use HTTP for cloud-hosted servers that expose a standard MCP endpoint. No child process, no local installation required:

options = ClaudeAgentOptions(
    mcp_servers={
        "claude-code-docs": {
            "type": "http",
            "url": "https://docs.anthropic.com/en/docs/claude-code/sdk/sdk-mcp",
        }
    },
    allowed_tools=["mcp__claude-code-docs__*"],
)
options = {
  mcpServers: {
    "remote-api": {
      type: "http",
      url: "https://api.yourcompany.com/mcp",
      headers: {
        Authorization: `Bearer ${process.env.API_TOKEN}`
      }
    }
  },
  allowedTools: ["mcp__remote-api__*"]
}

SSE — streaming remote servers

SSE is the right transport when the remote server needs to push events as it processes (e.g., long-running queries, real-time data feeds):

import os

options = ClaudeAgentOptions(
    mcp_servers={
        "analytics-stream": {
            "type": "sse",
            "url": "https://analytics.yourcompany.com/mcp/sse",
            "headers": {"Authorization": f"Bearer {os.environ['ANALYTICS_TOKEN']}"},
        }
    },
    allowed_tools=["mcp__analytics-stream__*"],
)

The SDK transparently handles SSE reconnection — you don't need to manage the event stream yourself.

Orchestrating three servers in one agent

Multiple servers with different transports go in one mcpServers dict:

```python
import asyncio
import os
from claude_agent_sdk import (
    query, ClaudeAgentOptions, SystemMessage, ResultMessage, AssistantMessage
)


async def investigate_issue(issue_ref: str, db_connection: str):
    """Pull a GitHub issue, query related DB records, write a summary."""
    options = ClaudeAgentOptions(
        mcp_servers={
            # stdio: GitHub MCP server
            "github": {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-github"],
                "env": {"GITHUB_TOKEN": os.environ["GITHUB_TOKEN"]},
            },
            # stdio: Postgres MCP server
            "postgres": {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-postgres", db_connection],
            },
            # HTTP: Cloud docs server
            "docs": {
                "type": "http",
                "url": "https://docs.anthropic.com/en/docs/claude-code/sdk/sdk-mcp",
            },
        },
        allowed_tools=[
            "mcp__github__get_issue",
            "mcp__github__list_comments",
            "mcp__postgres__query",  # read-only
            "mcp__docs__*",  # all doc tools
        ],
    )

    prompt = (
        f"1. Fetch the GitHub issue at {issue_ref}. "
        "2. Query the postgres DB for any records mentioning the issue number. "
        "3. Look up relevant documentation from the docs server. "
        "4. Write a one-paragraph summary of what the issue is about and whether the DB has related data."
    )

    async for message in query(prompt=prompt, options=options):
        # Verify all three servers connected on the first message
        if isinstance(message, SystemMessage) and message.subtype == "init":
            servers = message.data.get("mcp_servers", [])
            for server in servers:
                status = server.get("status")
                name = server.get("name")
                if status != "connected":
                    print(f"WARNING: {name} failed to connect: {server}")
        # Show which MCP tools are being called
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if hasattr(block, "name") and block.name.startswith("mcp__"):
                    print(f"[MCP call: {block.name}]")
        if isinstance(message, ResultMessage) and message.subtype == "success":
            print(message.result)


asyncio.run(investigate_issue(
    issue_ref="anthropics/claude-code#1234",
    db_connection=os.environ["DATABASE_URL"],
))
```

- Multiple MCP servers with different transport types (stdio, HTTP, SSE) can be configured in a single `mcpServers` dict; the agent uses whichever tools match the task.
- Check the `mcp_servers` list in the `SystemMessage` init event before the agent starts work to catch connection failures before tokens are wasted.
- Never hard-code secrets in `mcpServers.env` — use `os.environ["KEY"]` or `process.env.KEY` to pull credentials from environment variables.

Why permissionMode: "acceptEdits" is not enough

The Agent SDK has three permission modes:

| Mode | What it auto-approves | Auto-approves MCP? |
| --- | --- | --- |
| default | Nothing; every tool call prompts for approval | No |
| acceptEdits | File edits and filesystem Bash commands | No |
| bypassPermissions | Everything, including MCP | Yes (but dangerous) |

acceptEdits does not cover MCP. The agent sees the tools but refuses to call them without explicit grants:

```python
# WRONG: permission_mode alone doesn't cover MCP
options = ClaudeAgentOptions(
    permission_mode="acceptEdits",
    mcp_servers={"github": github_config},
)
```

bypassPermissions disables all safety checks — do not use it to work around missing allowedTools. The complete production-safe permission model — combining allowedTools, permissionMode, and cost circuit breakers — is detailed in Chapter 5.
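The missing-grant mistake is easy to catch before a query starts. A minimal sketch, assuming MCP tool names follow the `mcp__<server>__<tool>` pattern used throughout this chapter; `servers_without_grants` is a hypothetical helper, not part of the SDK:

```python
def servers_without_grants(mcp_servers: dict, allowed_tools: list[str]) -> list[str]:
    """Return names of configured MCP servers that no allowed_tools entry covers."""
    missing = []
    for name in mcp_servers:
        prefix = f"mcp__{name}__"
        # A grant covers the server if it names one of its tools or is a
        # wildcard such as mcp__github__* (both start with the same prefix).
        if not any(tool.startswith(prefix) for tool in allowed_tools):
            missing.append(name)
    return missing


# Hypothetical config: two servers, but only one has a grant.
config = {
    "github": {"command": "npx"},
    "db": {"type": "http", "url": "https://example.invalid/mcp"},
}
print(servers_without_grants(config, ["mcp__github__list_issues"]))  # ['db']
```

Running a check like this at startup turns a silent refusal at call time into an explicit configuration error.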

Detecting connection failures

The SystemMessage with subtype init arrives before the agent does any work. Check its mcp_servers list — servers fail silently otherwise:

async for message in query(prompt=..., options=options):
    if isinstance(message, SystemMessage) and message.subtype == "init":
        failed = [
            s for s in message.data.get("mcp_servers", [])
            if s.get("status") != "connected"
        ]
        if failed:
            # Abort or handle gracefully before the agent wastes tokens
            raise RuntimeError(f"MCP servers failed to connect: {failed}")

for await (const message of query({ prompt, options })) {
  if (message.type === "system" && message.subtype === "init") {
    const failed = message.mcp_servers.filter(s => s.status !== "connected");
    if (failed.length > 0) {
      throw new Error(`MCP servers failed: ${JSON.stringify(failed)}`);
    }
  }
}

Common failure causes by transport:

  • stdio: npx not on PATH, package not published, missing env vars
  • HTTP: URL unreachable, invalid SSL certificate, wrong endpoint path
  • SSE: CORS headers missing on the server, auth token expired

Pre-warm slow stdio servers before querying to avoid the 60-second connection timeout.

- Check the `mcp_servers` list in the `SystemMessage` init event before the agent does any work — servers fail silently if you don't inspect this event.
- The three most common stdio failure causes are: `npx` not on PATH, missing environment variables, and servers that take longer than 60 seconds to start.
- Pre-warm slow server processes before starting a query to avoid the default 60-second connection timeout.
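Two of the stdio failure causes listed above (the command missing from PATH, and empty environment values) can be checked locally before any query runs. A minimal sketch using only the standard library; `preflight_stdio` is a hypothetical helper, and it cannot detect an unpublished package or a slow start:

```python
import shutil


def preflight_stdio(name: str, config: dict) -> list[str]:
    """Return human-readable problems found in one stdio server config."""
    problems = []
    command = config.get("command", "")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(command) is None:
        problems.append(f"{name}: command {command!r} not found on PATH")
    # Empty or None values usually mean the variable was never exported.
    for key, value in config.get("env", {}).items():
        if not value:
            problems.append(f"{name}: env var {key} is empty or unset")
    return problems


# Hypothetical broken config for illustration.
broken = {"command": "no-such-binary-for-preflight-demo", "env": {"GITHUB_TOKEN": ""}}
for problem in preflight_stdio("github", broken):
    print(problem)
```

This complements the init-message check rather than replacing it: a server can pass the preflight and still fail to connect.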

Project-level config with .mcp.json

Put shared servers in .mcp.json at the project root — the SDK loads it automatically:

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_TOKEN": "${GITHUB_TOKEN}"
      }
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "${DATABASE_URL}"]
    }
  }
}

${VAR} expands environment variables at load time — credentials stay out of code.
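Because expansion happens at load time, an unset variable tends to surface only as a failed server connection. A minimal sketch of a pre-check, assuming the file has already been parsed into a dict (for example with `json.load`); `unresolved_vars` is a hypothetical helper and does not reproduce the SDK's own expansion logic:

```python
import re
from collections.abc import Mapping

VAR_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def unresolved_vars(node, environ: Mapping) -> set[str]:
    """Collect ${VAR} names used anywhere in a parsed config that environ lacks."""
    if isinstance(node, str):
        return {name for name in VAR_RE.findall(node) if name not in environ}
    if isinstance(node, dict):
        node = list(node.values())
    if isinstance(node, list):
        found = set()
        for item in node:
            found |= unresolved_vars(item, environ)
        return found
    return set()


example = {
    "mcpServers": {
        "github": {"command": "npx", "env": {"GITHUB_TOKEN": "${GITHUB_TOKEN}"}},
        "postgres": {"command": "npx", "args": ["-y", "${DATABASE_URL}"]},
    }
}
# Pass os.environ in real use; a plain dict keeps the example deterministic.
print(sorted(unresolved_vars(example, {"GITHUB_TOKEN": "placeholder"})))  # ['DATABASE_URL']
```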

Tool search for large tool sets

Tool search is enabled by default: the SDK withholds all tool definitions from context and loads only those relevant to each turn via vector similarity search. With 200 tools across servers, this prevents context exhaustion before any work begins. Disable per-server via mcpServers config if a server's tools always need to be in context.

- Tool search is enabled by default; it withholds all tool definitions from context and loads only tools relevant to each turn using vector similarity search.
- Without tool search, a system with 200 MCP tools sends every definition to Claude on every turn, consuming large amounts of context window before any work begins.
- Project-level `.mcp.json` files keep MCP config declarative and version-controllable; use `${VAR}` syntax for environment variable expansion.

Hands-on exercise

Wire GitHub (stdio) + Postgres (stdio) + Claude Code docs (HTTP) into one agent.

Setup: GITHUB_TOKEN (repo:read), DATABASE_URL (any Postgres instance).

Prompt: "Get the README from anthropics/claude-code. Check for an 'issues' table in postgres. Look up 'hooks' in the docs. Write a three-sentence summary."

Verify: init shows all 3 servers connected; at least 2 different mcp__* tool calls appear. Est. time: 25 min

Knowledge check (1 of 1)
Your agent is configured with `permissionMode: 'acceptEdits'` and an MCP server named `db`. You've added the server to `mcpServers` but NOT listed any MCP tools in `allowedTools`. What happens when Claude tries to call `mcp__db__query`?

What's next

Chapter 4 covers the Files API and code execution tool — upload documents once, reference by file_id, generate and download chart output.

References

[1] Agent SDK MCP Connector · https://code.claude.com/docs/en/sdk/sdk-mcp · retrieved 2026-06-14
[2] Model Context Protocol specification · https://modelcontextprotocol.io/docs/getting-started/intro · retrieved 2026-04-30
[3] MCP server registry · https://github.com/modelcontextprotocol/servers · retrieved 2026-04-30
[4] Claude Agent SDK Overview · https://code.claude.com/docs/en/agent-sdk/overview · retrieved 2026-04-30
[5] Agent Capabilities API announcement · https://claude.com/blog/agent-capabilities-api · retrieved 2026-04-30
[6] MCP OAuth 2.1 specification · https://modelcontextprotocol.io/specification/2025-03-26/basic/authorization · retrieved 2026-04-30

Chapter 4 · 45 min

Files API + code execution: the complete agent IO surface


The Anthropic Files API lets you upload a file once (up to 500 MB), receive a persistent file_id, and reference it across multiple Messages calls without retransmitting bytes. The savings are bandwidth and latency — you still pay full input tokens each time a file_id appears in a Messages request [1]. This chapter covers the complete IO surface: Files API for document persistence, code execution for computation, and downloading generated artifacts.

Key facts

  1. Beta header files-api-2025-04-14 required on every request [1].
  2. Max file size: 500 MB; workspace storage: 500 GB per org [1].
  3. Storage operations (upload, download, list, delete) are free; file content is billed as input tokens on each Messages reference [1].
  4. Code execution billed as container runtime (5-min minimum) plus normal token costs; verify current rate [7].
  5. Files API: not eligible for ZDR; not available on Bedrock or Vertex AI; any workspace API key can delete any file [1].
  6. You can only download files created by code execution or skills — not files you uploaded [1].

Content block types by file format

Each file type maps to a specific content block — using the wrong one returns a 400 error:

| File type | MIME type | Content block | Use case |
| --- | --- | --- | --- |
| PDF | application/pdf | document | Document analysis, citations |
| Plain text | text/plain | document | Logs, markdown, config files |
| JPEG, PNG, GIF, WebP | image/* | image | Visual analysis, screenshots |
| CSV, datasets, binaries | varies | container_upload | Code execution, data analysis |

For .docx, .xlsx, .md: convert to plain text or PDF first.

- PDFs and plain text use the `document` content block type; images use `image`; files passed to code execution use `container_upload` — using the wrong type returns a 400 error.
- The Files API beta header `files-api-2025-04-14` is required on every request.
- Maximum file size is 500 MB per file; total workspace storage is 500 GB per organization.
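The table above can be encoded as a small lookup so the block type is chosen in one place. A minimal sketch; `content_block_for` is a hypothetical helper, the `file_abc123` ID is a placeholder, and the shape of the `container_upload` block (a bare `file_id` field) is an assumption to verify against the current Files API reference:

```python
DOCUMENT_TYPES = {"application/pdf", "text/plain"}
IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def content_block_for(file_id: str, mime_type: str) -> dict:
    """Build the content block that matches an uploaded file's MIME type."""
    if mime_type in DOCUMENT_TYPES:
        return {"type": "document", "source": {"type": "file", "file_id": file_id}}
    if mime_type in IMAGE_TYPES:
        return {"type": "image", "source": {"type": "file", "file_id": file_id}}
    # Everything else (CSV, datasets, binaries) goes to code execution.
    return {"type": "container_upload", "file_id": file_id}


print(content_block_for("file_abc123", "text/csv")["type"])  # container_upload
```

Routing every unknown type to `container_upload` is a design choice; a stricter version could raise for formats such as .docx, which need converting to plain text or PDF first.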

Uploading files

Install the Anthropic SDK (not the Agent SDK):

pip install anthropic

Upload a PDF and an image:

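```python
from anthropic import Anthropic

client = Anthropic()

# Upload a PDF (mirrors the TypeScript example below)
with open("quarterly_report.pdf", "rb") as f:
    pdf_file = client.beta.files.upload(
        file=("quarterly_report.pdf", f, "application/pdf"),
    )
print(f"PDF file_id: {pdf_file.id}")
```

```typescript
import Anthropic, { toFile } from "@anthropic-ai/sdk";
import fs from "fs";

const anthropic = new Anthropic();

// Upload a PDF
const pdfFile = await anthropic.beta.files.upload({
  file: await toFile(
    fs.createReadStream("quarterly_report.pdf"),
    undefined,
    { type: "application/pdf" }
  ),
});
console.log(`PDF file_id: ${pdfFile.id}`);
```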

The returned file_id is permanent until you delete it. Store it in your database alongside the document metadata.

Referencing files in Messages calls

Once uploaded, reference the file_id using the appropriate content block type. You don't need the file's bytes — just the ID:

```python
# Three queries against the same PDF: only one upload needed
questions = [
    "What were the total revenues in Q1?",
    "List the top 3 risk factors mentioned in this report.",
    "What is management's outlook for Q2?",
]

for question in questions:
    response = client.beta.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "document",
                    "source": {
                        "type": "file",
                        "file_id": pdf_file.id,
                    },
                    "citations": {"enabled": True},  # request inline citations
                },
            ],
        }],
        betas=["files-api-2025-04-14"],
    )
    print(f"\n{question}")
    print(response.content[0].text)
```

For images, use the image content block type:

response = client.beta.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what this chart shows."},
            {
                "type": "image",
                "source": {
                    "type": "file",
                    "file_id": image_file.id,
                },
            },
        ],
    }],
    betas=["files-api-2025-04-14"],
)

The billing reality

Storage operations are free. Every Messages call that references a file_id bills the file content as input tokens — 100 queries against the same PDF cost 100× the document's token price. Code execution adds container runtime cost on top. Use extended prompt caching (1-hour TTL) when querying the same document many times in one session to drop repeated calls to ~10% of full input price. For session-level cost controls and PreToolUse circuit breakers, see Chapter 5.

- File storage operations (upload, download, list, metadata, delete) are free; file content is billed as input tokens every time a `file_id` is referenced in a Messages request.
- The "upload once" pitch saves bandwidth and latency but not token cost — 100 queries against the same file cost 100× the document's token price.
- Enable 1-hour extended prompt caching when running many queries against the same document in one session to reduce per-call costs to approximately 10% of full input price.
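The arithmetic behind these points is simple enough to sketch. The helper below is hypothetical and uses the simplified model stated in this section: the first call pays the full document price and later calls pay about 10%. It assumes every call lands inside the cache TTL and ignores any cache-write premium, so treat the result as a rough estimate:

```python
def billed_input_tokens(doc_tokens: int, calls: int, cached: bool = False,
                        cache_read_percent: int = 10) -> int:
    """Estimate full-price-equivalent input tokens billed for one document."""
    if calls <= 0:
        return 0
    if not cached:
        # Every reference to the file_id bills the whole document again.
        return doc_tokens * calls
    # First call at full price, the rest at the cache-read rate.
    return doc_tokens + (calls - 1) * doc_tokens * cache_read_percent // 100


# Illustrative numbers: a 20,000-token document queried 100 times.
print(billed_input_tokens(20_000, 100))               # 2000000
print(billed_input_tokens(20_000, 100, cached=True))  # 218000
```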

Code execution with the Files API

Unlike MCP tool servers in Chapter 3 — which connect to external services — code execution runs within Anthropic's infrastructure. Pass files via container_upload blocks, run code, and download output files:

```python
# Upload a dataset for code execution
with open("sales_data.csv", "rb") as f:
    dataset = client.beta.files.upload(
        file=("sales_data.csv", f, "text/plain"),
    )
```

Now download the generated chart:

# Download the generated chart
chart_content = client.beta.files.download(output_file_id)
chart_content.write_to_file("monthly_totals.png")
print("Chart downloaded to monthly_totals.png")
Try this · claude-sonnet-4-5

I have a CSV with columns: month, product, revenue. Using the code execution tool, compute the top 3 products by total revenue and create a horizontal bar chart. Return the file_id of the saved PNG.

Show expected output
Claude writes Python code using pandas and matplotlib. The code reads the CSV from the container, computes `.groupby('product')['revenue'].sum().nlargest(3)`, generates a horizontal bar chart with `plt.barh()`, saves it as `top_products.png`. The tool_result block includes a `file_id` for the output PNG that can be passed to `client.beta.files.download()`.

File lifecycle management

Files persist until you explicitly delete them. For production agents, you need a retention policy:

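```python
import datetime


def cleanup_old_files(client: Anthropic, max_age_days: int = 30) -> int:
    """Delete files older than max_age_days; return the number deleted."""
    cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=max_age_days)
    deleted = 0
    # Iterating the list result (rather than .data) walks every page, not just the first.
    for file in client.beta.files.list():
        created = file.created_at
        if isinstance(created, str):
            # ISO 8601 strings may end in "Z", which fromisoformat rejects before Python 3.11.
            created = datetime.datetime.fromisoformat(created.replace("Z", "+00:00"))
        if created < cutoff:
            client.beta.files.delete(file.id)
            deleted += 1
    return deleted
```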

- Files persist until explicitly deleted; build a retention policy from day one to avoid hitting the 500 GB per-organization storage limit.
- A `cleanup_old_files()` function that checks `created_at` and deletes stale entries is the minimum viable retention policy for production agents.
- The Files API rate limit during beta is approximately 100 requests per minute; batch bulk uploads during off-peak windows if you need to ingest many documents at once.

Extended prompt caching with Files API

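Add cache_control: {type: "ephemeral"} to a document block to cache it. The first call pays full token cost; subsequent calls within the TTL window pay ~10% of full input price. Without a ttl field the cache entry lasts 5 minutes; set "ttl": "1h" in cache_control for the 1-hour window: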

response = client.beta.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What are the payment terms?"},
            {
                "type": "document",
                "source": {"type": "file", "file_id": pdf_file.id},
                "cache_control": {"type": "ephemeral", "ttl": "1h"},  # cache this document for 1 hour
            },
        ],
    }],
    betas=["files-api-2025-04-14", "prompt-caching-2024-07-31"],
)

Cache is keyed on exact content — re-uploading a changed file resets it.

Hands-on exercise

Upload a PDF once and run three analytical queries; bonus: download a generated chart.

  1. Upload any PDF; store file_id
  2. Query 1: "What is the main topic? Summarize in 3 sentences."
  3. Query 2: "List all named organizations mentioned."
  4. Query 3: "What are the 5 most important statistics?" with citations: {enabled: true}
  5. Bonus: Pass Q2 org list as CSV to code execution; download output PNG

Verify: Same file_id in all 3 calls; Q3 includes inline citations. Est. time: 20 min (30 with bonus)

Knowledge check (1 of 1)
A developer uploads a 2 MB PDF to the Files API and then uses the same file_id in 50 separate Messages API calls over one month. Which costs does she pay?

What's next

Chapter 5 covers production hardening: hooks, cost circuit breakers, and the deployment checklist.

References

[1] Files API · https://platform.claude.com/docs/en/build-with-claude/files · retrieved 2026-04-30
[2] Agent Capabilities API announcement · https://claude.com/blog/agent-capabilities-api · retrieved 2026-04-30
[3] Claude Managed Agents Tools · https://platform.claude.com/docs/en/managed-agents/tools · retrieved 2026-04-30
[4] Code Execution Tool · https://platform.claude.com/docs/en/agents-and-tools/tool-use/code-execution-tool · retrieved 2026-04-30
[5] Files API Reference · https://platform.claude.com/docs/en/api/files-list · retrieved 2026-05-14
[6] Anthropic API and data retention · https://platform.claude.com/docs/en/build-with-claude/api-and-data-retention · retrieved 2026-05-14
[7] Current code execution tool reference · https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/code-execution-tool · retrieved 2026-05-27

Chapter 5 · 45 min

Production: deploy + observability + cost controls


The Agent SDK hook system attaches Python or TypeScript callbacks to agent events: PreToolUse, PostToolUse, UserPromptSubmit, Stop, and permission events. Python SDK callbacks do not support SessionStart or SessionEnd; TypeScript callbacks add those [3]. The biggest production failure mode is cost runaway — this chapter gives you the four hooks and deployment checklist to prevent it.

Key facts

  1. Python SDK callbacks: tool, prompt, stop, compaction, permission, notification, subagent events — no SessionStart/SessionEnd; TypeScript adds session lifecycle [3].
  2. bypassPermissions disables ALL safety checks including file-edit prompts and destructive Bash confirmations [1].
  3. Session JSONL files: ~/.claude/sessions/ by default; redirect with CLAUDE_SESSIONS_DIR [1].
  4. PreToolUse can deny/allow before execution; PostToolUse runs after — use for logging, not prevention [3].

The hook system

Hooks are callbacks that run in your process; in the Python SDK they are written as async functions, as in the examples in this chapter. HookMatcher filters by tool name via regex:

```python
from claude_agent_sdk import query, ClaudeAgentOptions, HookMatcher


async def my_hook(input_data: dict, tool_use_id: str, context: dict) -> dict:
    # Return {} to pass through, or raise to block
    return {}


options = ClaudeAgentOptions(
    hooks={
        "PostToolUse": [
            HookMatcher(matcher="Edit|Write", hooks=[my_hook])
        ]
    }
)
```

The matcher is a Python regex. "Edit|Write" matches any tool whose name contains "Edit" or "Write". Use ".*" to match everything.

- Hooks are callback functions (async in Python) that run in your process before or after tool calls; `HookMatcher` filters by tool name using a Python regex.
- `PreToolUse` runs before execution — use it to block risky calls before side effects occur; `PostToolUse` runs after — use it for logging and audit, not prevention.
- Python SDK callbacks do not support `SessionStart` or `SessionEnd`; TypeScript SDK callbacks add these session lifecycle events.
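The "contains" behaviour of the matcher is worth seeing concretely, because an unanchored pattern matches more tools than its literal names suggest. A minimal sketch using Python's `re.search`, which is an assumption about how the SDK applies the pattern based on the description above; the tool names are illustrative:

```python
import re


def matching_tools(matcher: str, tool_names: list[str]) -> list[str]:
    """Return the tool names an unanchored regex matcher would select."""
    pattern = re.compile(matcher)
    return [name for name in tool_names if pattern.search(name)]


tools = ["Read", "Edit", "Write", "MultiEdit", "NotebookEdit", "Bash"]
print(matching_tools("Edit|Write", tools))      # ['Edit', 'Write', 'MultiEdit', 'NotebookEdit']
print(matching_tools("^(Edit|Write)$", tools))  # ['Edit', 'Write']
```

Anchoring with `^` and `$` is the conservative choice when a hook should fire only for the exact tools named.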

Hook 1: Audit log (PostToolUse)

```python
import asyncio
import json
import logging
from datetime import datetime
from claude_agent_sdk import query, ClaudeAgentOptions, HookMatcher

logger = logging.getLogger("agent.audit")


async def audit_file_change(input_data: dict, tool_use_id: str, context: dict) -> dict:
    tool_input = input_data.get("tool_input", {})
    file_path = tool_input.get("file_path", tool_input.get("path", "unknown"))
    tool_name = input_data.get("tool_name", "unknown")
    log_entry = {
        "event": "file_modified",
        "timestamp": datetime.utcnow().isoformat() + "Z",
        "tool": tool_name,
        "file_path": file_path,
        "session_id": context.get("session_id", "unknown"),
        "tool_use_id": tool_use_id,
    }
    logger.info(json.dumps(log_entry))
    return {}  # pass through, don't block


options = ClaudeAgentOptions(
    allowed_tools=["Read", "Write", "Edit", "Bash", "Glob", "Grep"],
    hooks={
        "PostToolUse": [
            HookMatcher(matcher="Edit|Write", hooks=[audit_file_change])
        ]
    }
)
```

Sample audit output:

```json
{"event": "file_modified", "timestamp": "2026-04-30T10:23:44Z", "tool": "Edit", "file_path": "src/auth.py", "session_id": "sess_01XxXxxXx", "tool_use_id": "toolu_01Abc123"}
```

Hook 2: Cost circuit breaker (PreToolUse)

Use PreToolUse to block tool calls before filesystem or MCP side effects occur:

```python
class CostCircuitBreaker:
    """Deny the next tool call once the application-managed token cap is reached."""

    def __init__(self, max_input_tokens: int = 500_000):
        self.max_input_tokens = max_input_tokens
        self.total_input_tokens = 0

    def update_usage(self, usage: dict) -> None:
        # Call this from your message loop when result/usage metadata is available.
        self.total_input_tokens = usage.get("input_tokens", self.total_input_tokens)

    async def check_cost(self, input_data: dict, tool_use_id: str, context: dict) -> dict:
        if self.total_input_tokens > self.max_input_tokens:
            return {
                "hookSpecificOutput": {
                    "hookEventName": input_data["hook_event_name"],
                    "permissionDecision": "deny",
                    "permissionDecisionReason": (
                        f"Circuit breaker triggered: {self.total_input_tokens:,} input tokens "
                        f"exceeds cap of {self.max_input_tokens:,}. Tool call blocked before execution."
                    ),
                }
            }
        return {}


circuit_breaker = CostCircuitBreaker(max_input_tokens=500_000)

options = ClaudeAgentOptions(
    allowed_tools=["Read", "Write", "Edit", "Bash", "Glob", "Grep"],
    hooks={
        "PreToolUse": [
            HookMatcher(matcher=".*", hooks=[circuit_breaker.check_cost])
        ]
    }
)
```

When circuit_breaker.check_cost returns permissionDecision: "deny", the current tool call is blocked before it executes and Claude receives the denial reason as feedback. The session JSONL is preserved, so you can inspect exactly what happened.

<Callout type="hot"> Do NOT block silently inside hooks. When a hook denies a tool call in production, you need the context to diagnose it. Log the full input_data, tool_use_id, and denial reason before returning permissionDecision: "deny". </Callout>
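One way to follow the callout's advice is to route every denial through a single helper that logs first and builds the response second. A minimal sketch; `deny_with_log` and the `tool_call_denied` event name are hypothetical, and the return shape copies the deny payload used by the circuit breaker above:

```python
import json
import logging

logger = logging.getLogger("agent.hooks")


def deny_with_log(input_data: dict, tool_use_id: str, reason: str) -> dict:
    """Log the full denial context as structured JSON, then return a deny decision."""
    logger.warning(json.dumps({
        "event": "tool_call_denied",
        "tool_use_id": tool_use_id,
        "reason": reason,
        "input_data": input_data,
    }, default=str))  # default=str keeps non-JSON values from raising
    return {
        "hookSpecificOutput": {
            "hookEventName": input_data.get("hook_event_name", "PreToolUse"),
            "permissionDecision": "deny",
            "permissionDecisionReason": reason,
        }
    }


decision = deny_with_log(
    {"hook_event_name": "PreToolUse", "tool_name": "Edit"},
    "toolu_example",
    "token cap exceeded",
)
print(decision["hookSpecificOutput"]["permissionDecision"])  # deny
```

Since `input_data` can carry file paths and tool arguments, the same log pipeline should be covered by whatever redaction rules apply to prompts.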

Hook 3: Session lifecycle telemetry

SessionStart/SessionEnd are TypeScript-only SDK callbacks. In Python, emit the session-start event when the first message arrives:

async def log_session_start_from_first_message(message, logger):
    session_id = getattr(message, "session_id", "unknown")
    start_event = {
        "event": "session_started",
        "timestamp": datetime.utcnow().isoformat() + "Z",
        "session_id": session_id,
        "environment": os.environ.get("DEPLOY_ENV", "development"),
    }
    logger.info(json.dumps(start_event))

TypeScript for true SessionStart support:

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

const sessionStart = async (input, toolUseId, context) => {
  console.log(JSON.stringify({
    event: "session_started",
    session_id: input.session_id,
    cwd: input.cwd,
    timestamp: new Date().toISOString()
  }));
  return {};
};

for await (const message of query({
  prompt: "Run the production agent",
  options: {
    hooks: {
      SessionStart: [{ hooks: [sessionStart] }]
    }
  }
})) {
  console.log(message);
}
```

Hook 4: Prompt sanitization (UserPromptSubmit)

Fires before the user message reaches the model — strip PII here:

```python
import re

PHONE_RE = re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b')
SSN_RE = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')


async def sanitize_prompt(input_data: dict, tool_use_id: str, context: dict) -> dict:
    prompt = input_data.get("prompt", "")
    # Redact phone numbers and SSNs
    cleaned = PHONE_RE.sub("[PHONE_REDACTED]", prompt)
    cleaned = SSN_RE.sub("[SSN_REDACTED]", cleaned)
    if cleaned != prompt:
        # Record only the names of the patterns that matched, never the matched text
        patterns_found = [
            name
            for name, pattern in (("phone", PHONE_RE), ("ssn", SSN_RE))
            if pattern.search(prompt)
        ]
        logger.warning(json.dumps({
            "event": "pii_redacted",
            "session_id": context.get("session_id"),
            "patterns_found": patterns_found,
        }))
    # Return modified input_data with cleaned prompt
    return {**input_data, "prompt": cleaned}


options = ClaudeAgentOptions(
    hooks={
        "UserPromptSubmit": [
            HookMatcher(matcher=".*", hooks=[sanitize_prompt])
        ],
        # ... other hooks
    }
)
```

- `UserPromptSubmit` fires before the user message reaches the model — use it to strip PII, redact phone numbers and SSNs, and prevent sensitive data from entering the model's context.
- Always log when PII redaction occurs, including which pattern was found and the session ID, to maintain compliance audit trails.
- The four production hooks together cover the full agent lifecycle: input sanitization, pre-execution cost control, post-execution audit logging, and session telemetry.
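The redaction patterns can be exercised on their own, outside any hook, which makes them easy to unit-test. A minimal sketch reusing the two regexes from Hook 4; `redact` is a hypothetical helper and the sample strings are made up:

```python
import re

PHONE_RE = re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b')
SSN_RE = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')


def redact(prompt: str) -> tuple[str, list[str]]:
    """Return the redacted prompt and the names of the patterns that matched."""
    found = []
    if PHONE_RE.search(prompt):
        found.append("phone")
    if SSN_RE.search(prompt):
        found.append("ssn")
    cleaned = PHONE_RE.sub("[PHONE_REDACTED]", prompt)
    cleaned = SSN_RE.sub("[SSN_REDACTED]", cleaned)
    return cleaned, found


print(redact("Call 555-123-4567, SSN 123-45-6789"))
# ('Call [PHONE_REDACTED], SSN [SSN_REDACTED]', ['phone', 'ssn'])
print(redact("order 5551234567")[0])  # order [PHONE_REDACTED]
```

The second call shows a limitation: any bare 10-digit number is treated as a phone number, so identifiers such as order numbers will be redacted too.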

The complete production hook stack

def production_options(
    allowed_tools: list[str],
    mcp_servers: dict = None,
    max_input_tokens: int = 500_000,
    permission_mode: str = "acceptEdits",
) -> ClaudeAgentOptions:
    cb = CostCircuitBreaker(max_input_tokens=max_input_tokens)
    
    return ClaudeAgentOptions(
        allowed_tools=allowed_tools,
        mcp_servers=mcp_servers or {},
        permission_mode=permission_mode,
        hooks={
            "UserPromptSubmit": [
                HookMatcher(matcher=".*", hooks=[sanitize_prompt])
            ],
            "PreToolUse": [
                HookMatcher(matcher=".*", hooks=[cb.check_cost]),
            ],
            "PostToolUse": [
                HookMatcher(matcher="Edit|Write", hooks=[audit_file_change]),
            ],
        }
    )

Usage — apply to the MCP agent from Chapter 3 or any agent with multiple tool calls:

# Apply to the MCP agent from Chapter 3
async for message in query(
    prompt="Investigate issue #1234 and write a summary",
    options=production_options(
        allowed_tools=["mcp__github__*", "mcp__postgres__query", "mcp__docs__*"],
        mcp_servers={
            "github": github_config,
            "postgres": postgres_config,
            "docs": docs_config,
        },
        max_input_tokens=1_000_000,  # verify the matching token budget against current model pricing
    ),
):
    if hasattr(message, "result"):
        print(message.result)
Try this · claude-sonnet-4-6

I'm running an agent with a PreToolUse hook backed by an application-managed token counter. After 12 tool calls, the counter shows 505,000 input tokens against a cap of 500,000. The agent is about to …

Show expected output
Claude explains: the PreToolUse hook runs before the next Edit call executes. Because the tracked total is already over 500,000 input tokens, the hook returns permissionDecision: deny with a reason. The first pending Edit is blocked before it writes to disk, and Claude receives feedback that the budget cap was exceeded. A PostToolUse guard would be too late for the edit that triggered it.

Langfuse integration

For a broader look at Langfuse setup and why it fits agent workloads, see AI agent observability with Langfuse 2026. Create the trace on first session message; add spans from PostToolUse hooks:

```python
import os

from langfuse import Langfuse

langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host=os.environ.get("LANGFUSE_HOST", "http://localhost:3100"),
)

traces_by_session = {}


def langfuse_session_start(message):
    session_id = getattr(message, "session_id", "unknown")
    trace = langfuse.trace(
        id=session_id,
        name="agent_session",
        metadata={"environment": os.environ.get("DEPLOY_ENV", "dev")},
    )
    traces_by_session[session_id] = trace
    return trace


async def langfuse_tool_log(input_data: dict, tool_use_id: str, context: dict) -> dict:
    trace = traces_by_session.get(input_data.get("session_id"))
    if trace:
        trace.span(
            name=input_data.get("tool_name", "unknown_tool"),
            input=input_data.get("tool_input"),
            metadata={"tool_use_id": tool_use_id},
        )
    return {}
```

The five-step deployment checklist

  1. Permissions are minimal
     - allowedTools names specific tools; no .* wildcards
     - permissionMode: acceptEdits or default; never bypassPermissions
  2. Cost controls are wired
     - PreToolUse circuit breaker with a tested token cap
     - Session timeout (Managed Agents: explicit status="completed")
  3. Audit logging is active
     - Every Edit/Write logged: file path + session ID + timestamp
     - Structured JSON, not print statements
  4. Secrets are out of config
     - No API keys in mcpServers.env; use os.environ["KEY"]
     - .mcp.json uses ${VAR} syntax
  5. Session files have a retention policy
     - CLAUDE_SESSIONS_DIR with log rotation
     - JSONL files off user-facing storage

- Never use `bypassPermissions` in production; combine `permissionMode: "acceptEdits"` with explicit `allowedTools` grants to cover both file edits and MCP tool calls safely.
- Production agents must pass five checks: minimal permissions, wired cost controls, active audit logging, secrets out of config, and session files with a retention policy.
- Structured JSON logs — not print statements — enable per-session cost breakdown, error rate by tool type, and session duration distribution from day one.
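The first checklist item lends itself to an automated gate in CI. A minimal sketch; `lint_agent_config` is a hypothetical helper that covers only the permission checks, not cost controls, logging, or retention:

```python
def lint_agent_config(allowed_tools: list[str], permission_mode: str) -> list[str]:
    """Return checklist violations for the permission settings of one agent."""
    findings = []
    if permission_mode == "bypassPermissions":
        findings.append("permission_mode is bypassPermissions")
    for tool in allowed_tools:
        # Any '*' means the grant covers tools that were never reviewed by name.
        if "*" in tool:
            findings.append(f"wildcard grant: {tool}")
    return findings


print(lint_agent_config(["mcp__github__get_issue", "mcp__docs__*"], "acceptEdits"))
# ['wildcard grant: mcp__docs__*']
```

Note that wildcard grants such as `mcp__docs__*`, used in earlier examples for brevity, would be flagged here; under the checklist they should be replaced with explicit tool names before deployment.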

Hands-on exercise

Add the production hook stack to an existing agent and verify the circuit breaker fires.

  1. Apply production_options() to any multi-tool agent
  2. Set max_input_tokens=50_000 (intentionally low)
  3. Run: "Analyze every Python file in this directory and summarize each one's purpose"
  4. Confirm circuit breaker fires: permissionDecision: "deny" appears before all files are processed
  5. Check logs for file_modified and session_started entries

Verify: Session stops mid-run; raising cap to 2M allows full completion. Est. time: 20 min

Try this · claude-sonnet-4-6

I need to run a Claude Agent in a CI/CD pipeline where there's no human to approve tool calls. The agent reads test results, edits configuration files, and runs bash commands to restart services. What…

Show expected output
Claude recommends: use allowedTools with an explicit list (e.g. ['Read', 'Edit', 'Bash']) plus permissionMode: 'acceptEdits' — not bypassPermissions. This pre-approves file edits and Bash without disabling all safety checks. The agent can still be stopped by hooks. Risks to document: (1) Bash is allowed and can run destructive commands — scope the working directory; (2) Edit can overwrite production config — add a PostToolUse hook that logs every edit to a change log; (3) No human review means runaway loops go undetected — add a token circuit breaker.

What's next

The capstone project ties all five chapters together: a production research agent that orchestrates GitHub + Postgres + a cloud docs MCP server, uses the Files API for document context, and runs behind the complete hook stack. Details in the course outline.

References

[1] Claude Agent SDK Overview · https://code.claude.com/docs/en/sdk · retrieved 2026-06-14
[2] Claude Managed Agents Overview · https://platform.claude.com/docs/en/managed-agents/overview · retrieved 2026-04-30
[3] Agent SDK Hooks · https://code.claude.com/docs/en/agent-sdk/hooks · retrieved 2026-05-14
[4] Claude Agent SDK Permissions · https://code.claude.com/docs/en/agent-sdk/permissions · retrieved 2026-04-30
[5] Files API · https://platform.claude.com/docs/en/build-with-claude/files · retrieved 2026-04-30
[6] Langfuse Observability · https://langfuse.com · retrieved 2026-04-30

Chapter 6 · 50 min

Cross-CLI Context Persistence

When a developer shifts from Claude Code to a Codex CLI subtask, or hands a long-running analysis off to a specialised Claude Agent SDK script, something predictable happens: context evaporates. Not the files — those are on disk. Not the code — that is in git. What disappears is the cognitive state that accumulated in the previous session: decisions made, tools already run, intermediate results computed, and the implicit understanding of what was tried and why it was rejected.

In a single-agent workflow this is handled automatically. The conversation window carries every tool call and every result forward. But the moment you introduce a second agent — even one running the same model — you are starting from zero unless you designed the handoff deliberately. The second agent has no knowledge of the context window the first agent held. It cannot see the tool calls that shaped the prior session's output. It does not know what was rejected and why.

This is not a theoretical concern. Real multi-CLI workflows fail in exactly this way: the second agent proposes the architecture the first agent already rejected in the third turn, re-runs the data pipeline that took 15 minutes the first time, or diverges onto a different branch because it did not know which file was the active one. The result is duplicated compute, inconsistent outputs, and frustrated developers who assumed the agents were "working together."

This chapter is about designing the handoff that prevents those failures. You will learn to identify the four layers of context that matter across CLI agents, to implement a file-based relay that works with any two CLIs, to use the Files API for durable multi-session context, and to expose context through MCP so any downstream agent can query it selectively. By the end you will have a concrete pattern that scales from a two-agent workflow to a full multi-CLI pipeline.

The Context Gap No One Tells You About

The illusion of seamless multi-agent context comes from watching demos where every agent is pre-loaded with the same carefully crafted system prompt. Real workflows are not like that. Agents are spun up at different times for different sub-problems, each one carrying only what it was explicitly given at startup. The gap between "what the first agent knew" and "what the second agent starts with" is the cross-CLI context gap, and it is wider than most practitioners expect.

Three things make this gap worse than it looks on the surface. First, most context is implicit. The first agent never bothered to write down that it rejected SQLite in favour of PostgreSQL because of write concurrency, because that decision felt obvious in the moment. It knew — but it never said so aloud in a way that could be captured. Second, tool outputs are transient. The first agent ran a 10-minute analysis pipeline and the results lived in its context window. When that window closes, those results are gone. No other agent has access to them without re-running the work. Third, the agents do not share a session namespace. Claude Code's session IDs are meaningful only within Claude Code's runtime. A Codex CLI process does not know what they mean. A raw Anthropic API call does not know what they mean. Each CLI manages its own session space, and those spaces do not overlap.

The solution is not to build a distributed context store from day one. The solution is discipline: explicitly capture what matters, in a format that any subsequent agent can consume, before the current session ends. This means treating context handoff as a first-class concern in your workflow design — not an afterthought.

Anatomy of CLI Context

Not all context deserves equal treatment. Before you can design a persistence strategy you need to know what you are persisting and how costly it is to reconstruct each type. Context in a CLI agent workflow falls into four layers with different persistence costs, reconstruction costs, and relevance half-lives.

Conversation history is the literal message log — the back-and-forth between user and assistant. It is cheap to persist (plain JSON) and cheap to re-inject (just tokens). It is also the largest layer by volume and the one with the shortest relevance half-life. Most of a conversation is scaffolding, clarification, and dead-end exploration that is irrelevant to the downstream agent. The useful subset is usually small: final decisions, blocking constraints, key findings. Never pass the full conversation log forward. Distil it to the decisions and facts that the downstream agent actually needs to do its job.

Tool execution outputs are the results of tool calls made during the session — file contents read, API responses received, computation outputs generated, search results returned, data analysis completed. These are the most expensive layer to reconstruct. If the prior agent ran a 10-minute data pipeline, called five external APIs, and processed a 50-page PDF, re-running all of that to re-establish context is not acceptable in a production workflow. Tool outputs must be captured explicitly and passed forward as ground truth. They should be treated as immutable facts that the downstream agent can trust without needing to verify.

File and resource references are the handles to persistent state: file paths, file_id values from the Files API, database record IDs, git commit hashes, versioned artefact identifiers. These are lightweight to persist but critical to carry forward accurately. A second agent that does not know which file_id represents the uploaded dataset, or which branch was active, or which specific version of a schema file was in play, will immediately diverge from the established work. Even a small drift in file references compounds into large inconsistencies when multiple tool calls chain off the same reference.

Session metadata covers the active working directory, git branch, environment variable states, model settings, MCP server connection status, total cost spent so far in the pipeline, and any feature flags or configuration active during the prior session. This layer is often overlooked, but it causes some of the most subtle failures. A second agent that starts with the wrong working directory will write to the wrong location. An agent that does not know the first agent already established a particular MCP server connection may attempt to reconnect and receive a conflict error. An agent unaware of the cost already spent may proceed with expensive operations that push the total past the intended budget cap.

The practical rule for most workflows: carry layers 3 and 4 (file and resource references, session metadata) forward in full, pass a precise summary of layer 2 (tool outputs), and distil layer 1 (conversation history) down to decisions. Design your persistence strategy around that shape. Completeness for layers 3 and 4 is cheap and critical. Selective distillation for layers 1 and 2 is cheap and necessary. Passing layers 1 and 2 in full is expensive and counterproductive.
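The rule above can be read as a small selection function. The following is a minimal sketch, not a prescribed implementation: the function name `select_for_handoff`, the 400-character cap, and the raw `"message"` event type are hypothetical, while the four typed event names match the ones used in the relay schema later in this chapter.

```python
from typing import Any

# Hypothetical limit; tune it for your own pipeline.
MAX_TOOL_OUTPUT_CHARS = 400


def select_for_handoff(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep layers 3 and 4 whole, cap layer 2, and keep only decisions from layer 1."""
    selected = []
    for ev in events:
        kind = ev["type"]
        if kind in ("file_ref", "metadata"):
            # Layers 3 and 4: small and critical, so carry them forward unchanged.
            selected.append(ev)
        elif kind == "tool_output":
            # Layer 2: keep the result but cap its size (copy, do not mutate).
            payload = dict(ev["payload"])
            payload["output"] = str(payload.get("output", ""))[:MAX_TOOL_OUTPUT_CHARS]
            selected.append({**ev, "payload": payload})
        elif kind == "decision":
            # Layer 1, distilled: decisions survive, raw chat turns do not.
            selected.append(ev)
        # Any other type (for example a raw "message" event) is dropped.
    return selected
```

Because the function copies the payload before truncating, the original event list stays intact and can still be archived in full.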

File-Based Context: The Universal Bridge

The simplest cross-CLI persistence mechanism is a shared file. Every CLI agent can read and write files. No API integration required, no shared service, no network dependency. A well-structured context file on disk is the lowest-friction handoff that works with every CLI in the ecosystem.

The format that performs best in practice is JSONL (JSON Lines) — one JSON object per line, each representing a discrete context event. This gives you append-friendly writes without file locking, easy streaming reads without loading the entire file, and a natural audit trail of how context evolved across agents and sessions. When something goes wrong in a multi-agent pipeline, the JSONL file is where you look to understand the state each agent inherited.

The schema matters. A flat list of key-value pairs is not enough — you need typed events so downstream agents can selectively load only the context relevant to their task. A minimal event schema that covers the four layers looks like this:

```python
# context_writer.py: append context events from any agent session
import json
import time
from pathlib import Path
from typing import Any

CONTEXT_FILE = Path(".agent-context/session.jsonl")


def write_context_event(event_type: str, payload: dict[str, Any], agent: str = "unknown") -> None:
    """Append a typed context event to the shared JSONL relay file."""
    CONTEXT_FILE.parent.mkdir(exist_ok=True)
    event = {
        "ts": time.time(),
        "schema_version": 1,
        "type": event_type,  # "decision" | "tool_output" | "file_ref" | "metadata"
        "agent": agent,      # which CLI or script wrote this event
        "payload": payload,
    }
    with CONTEXT_FILE.open("a") as f:
        f.write(json.dumps(event) + "\n")
```

Claude Code's PostToolUse hooks can fire write_context_event automatically after every significant tool call — no changes to the agent's core behaviour needed. Codex CLI's --system-prompt flag injects the summary at startup. Any CLI or Agent SDK script that accepts a system prompt can consume context this way without knowing anything about where it came from.
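As an illustration of the hook side, here is a minimal sketch of a script body that a PostToolUse hook command could run. It assumes the hook receives a JSON payload on stdin with `tool_name`, `tool_input`, and `tool_response` fields; check those names against the hooks reference for your version. The helper names `event_from_hook` and `run_hook` and the 2000-character cap are hypothetical.

```python
import json
import time
from pathlib import Path
from typing import Any, TextIO

CONTEXT_FILE = Path(".agent-context/session.jsonl")
MAX_OUTPUT_CHARS = 2000  # hypothetical cap to keep the relay file small


def event_from_hook(hook_input: dict[str, Any]) -> dict[str, Any]:
    """Turn a PostToolUse hook payload into a typed tool_output event."""
    return {
        "ts": time.time(),
        "schema_version": 1,
        "type": "tool_output",
        "agent": "claude-code",
        "payload": {
            # Field names below are assumed from the hook input format.
            "key": hook_input.get("tool_name", "unknown"),
            "input": hook_input.get("tool_input", {}),
            "output": json.dumps(hook_input.get("tool_response", ""))[:MAX_OUTPUT_CHARS],
        },
    }


def run_hook(stdin: TextIO, context_file: Path = CONTEXT_FILE) -> None:
    """Read one hook payload from stdin and append it to the relay file."""
    event = event_from_hook(json.load(stdin))
    context_file.parent.mkdir(parents=True, exist_ok=True)
    with context_file.open("a") as f:
        f.write(json.dumps(event) + "\n")
```

Saved as a standalone script, the entry point would simply call `run_hook(sys.stdin)`. Keeping the conversion in a pure function makes it easy to filter out tool calls that are not worth recording.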

The receiving agent reads the JSONL file and injects a rendered summary into its system prompt. Crucially, it filters by event type — a code-generation agent needs decisions and file refs, not raw tool outputs from a separate analysis pass:

```python
# context_reader.py: load and render context for a new agent session
import json
from pathlib import Path

CONTEXT_FILE = Path(".agent-context/session.jsonl")


def load_context_summary(event_types: list[str] | None = None) -> str:
    """Load typed context events and render them as an injectable system prompt block."""
    if not CONTEXT_FILE.exists():
        return ""

    events = []
    with CONTEXT_FILE.open() as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            ev = json.loads(line)
            if event_types is None or ev["type"] in event_types:
                events.append(ev)

    if not events:
        return ""

    lines = ["## Context inherited from prior agent session\n"]
    for ev in events:
        payload = ev["payload"]
        if ev["type"] == "decision":
            # Plan-style decisions carry a title instead of a summary, so fall back to it.
            summary = payload.get("summary") or payload.get("title", "")
            lines.append(f"- Decision ({ev.get('agent', 'unknown')}): {summary}")
            if "reasoning" in payload:
                lines.append(f"  Rationale: {payload['reasoning']}")
        elif ev["type"] == "tool_output":
            truncated = str(payload.get("output", ""))[:400]
            lines.append(f"- {payload.get('key', 'output')}: {truncated}")
        elif ev["type"] == "file_ref":
            lines.append(f"- Active file {payload['path']} (role: {payload['role']})")
        elif ev["type"] == "metadata":
            for k, v in payload.items():
                lines.append(f"- {k}: {v}")

    return "\n".join(lines)
```

This pattern is portable across the entire CLI ecosystem. Chapter 3 (MCP connector: orchestrating multi-server agents) covers how to wire Claude Code's MCP connector for richer inter-agent communication; the JSONL relay is the simpler starting point that requires no MCP infrastructure.

CLAUDE.md and the Memory File System

Claude Code has a native mechanism for context persistence that most developers underuse: the CLAUDE.md file system. When Claude Code starts it reads CLAUDE.md files from the project root, all parent directories, and ~/.claude/. This makes CLAUDE.md a persistent, human-readable context layer that survives across sessions without any custom code, and is available to every future Claude Code session in the project automatically.

For cross-CLI workflows, CLAUDE.md serves a specific role: it is the right place for project-level facts that every Claude Code session and any human reading the project should know from the start. Think of it as the standing brief that never expires. A well-maintained CLAUDE.md for a multi-agent project should document the canonical working directory and active branch, which agents have been active and what each one owns, key architectural decisions already made and why the obvious alternatives were rejected, active file handles and their roles, MCP server configurations in use, and the cost and budget status if the project has a spending cap.

The memory file system at .claude/memory/ extends this to agent-specific accumulated notes. When Claude Code writes to a memory file via its internal tools, those notes persist across every future Claude Code session in the project. Memory files are the right home for session-to-session accumulated knowledge that a human editor would put in a notebook: patterns observed, caveats discovered, partial progress on ongoing tasks.

The important limitation is scope. CLAUDE.md and memory files are Claude Code-native. Codex CLI does not read them. An Agent SDK script does not read them. For genuine cross-CLI persistence you need both: keep CLAUDE.md as the human-readable shadow of the machine-readable JSONL context file. Write the same key facts to both. Each consumer reads whichever format it understands. The two representations should stay in sync, but they serve different audiences: CLAUDE.md for Claude Code and humans, JSONL for programmatic consumers.
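One way to keep the two representations in sync is to regenerate a delimited block of CLAUDE.md from the JSONL relay. The sketch below assumes decision events carry a `summary` field in their payload; the HTML-comment markers and the function names are hypothetical conventions, not something Claude Code defines.

```python
import json
from pathlib import Path

# Hypothetical markers delimiting the machine-managed block inside CLAUDE.md.
BEGIN = "<!-- agent-context:begin -->"
END = "<!-- agent-context:end -->"


def render_decisions(jsonl_path: Path) -> str:
    """Render decision events from the JSONL relay as Markdown bullets."""
    bullets = []
    for line in jsonl_path.read_text().splitlines():
        if not line.strip():
            continue
        ev = json.loads(line)
        if ev.get("type") == "decision":
            bullets.append(f"- {ev['payload'].get('summary', '')}")
    return "\n".join(bullets)


def sync_claude_md(jsonl_path: Path, claude_md: Path) -> None:
    """Replace (or append) the managed block in CLAUDE.md with current decisions."""
    block = f"{BEGIN}\n{render_decisions(jsonl_path)}\n{END}"
    text = claude_md.read_text() if claude_md.exists() else ""
    if BEGIN in text and END in text:
        # Swap only the managed block; human-written text around it is untouched.
        head, rest = text.split(BEGIN, 1)
        _, tail = rest.split(END, 1)
        text = head + block + tail
    else:
        text = (text.rstrip() + "\n\n" if text.strip() else "") + block + "\n"
    claude_md.write_text(text)
```

Treating the JSONL file as the source of truth and CLAUDE.md as a rendered view avoids having to reconcile edits made in both places.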

Session IDs and Native Resumption

The Claude Agent SDK provides session continuity that eliminates manual context serialisation when you are staying within the SDK. The messages streamed back by a query() call carry a session identifier, and subsequent calls that pass the same ID via options: { resume: sessionId } are treated as continuations of the same conversation, including the full tool call history and intermediate state accumulated during the prior call.

The critical distinction: the Agent SDK's query() function lives in @anthropic-ai/claude-agent-sdk, not in the general-purpose @anthropic-ai/sdk Anthropic Client package. Importing from the wrong package gives you a REST client with a different call shape, not the Agent SDK's native session resumption. Here is the correct pattern:

```typescript
// session_continuity.ts: persist and resume Agent SDK sessions across process restarts
import { query } from "@anthropic-ai/claude-agent-sdk";
import { readFile, writeFile } from "fs/promises";

const SESSION_FILE = ".agent-context/.session-id";

async function loadSessionId(): Promise<string | undefined> {
  try {
    return (await readFile(SESSION_FILE, "utf8")).trim() || undefined;
  } catch {
    return undefined;
  }
}

async function runWithResumption(prompt: string): Promise<string> {
  const existingSessionId = await loadSessionId();
  let result = "";
  let newSessionId = "";

  // Pass options.resume to resume a prior session; omit for a fresh session.
  // See: https://code.claude.com/docs/en/agent-sdk/sessions
  for await (const message of query({
    prompt,
    ...(existingSessionId ? { options: { resume: existingSessionId } } : {}),
  })) {
    if (message.type === "result" && message.subtype === "success") {
      result = message.result ?? "";
      newSessionId = message.session_id ?? "";
    }
  }

  if (newSessionId) {
    // Persist the session ID so the next invocation resumes from this exact state
    await writeFile(SESSION_FILE, newSessionId, "utf8");
  }
  return result;
}

// First call creates a new session
const analysis = await runWithResumption("Analyse the database schema");
console.log("Session persisted. Run again to continue from this point.");

// Subsequent call resumes from full prior context without re-injecting history
const migration = await runWithResumption("Now generate the migration script");
```

Session IDs are the right tool for intra-SDK resumption: same CLI, different process invocations, with full conversation continuity. They are not the right tool for cross-CLI handoffs. A Codex CLI process cannot resume an Agent SDK session. An Agent SDK script cannot resume a Claude Code session. Each CLI manages its own session namespace, and those namespaces do not intersect.

The practical rule is straightforward. Within a single SDK surface, across repeated invocations of the same script, use session IDs. At any CLI boundary, use the JSONL or Files API approach. The two mechanisms complement each other: the session ID provides seamless intra-SDK continuity, while the JSONL file and Files API provide the cross-CLI bridge. Persist the session ID into the JSONL context file alongside your semantic events so any orchestrating script has both the structured context and the resumable session handle available in one place.

The Files API as a Durable Context Layer

The Files API, available with the files-api-2025-04-14 beta header, provides a hosted document store that persists beyond any single session or process lifetime. A context snapshot uploaded as a file can be referenced by any subsequent API call regardless of which CLI or orchestration layer is making the call. The file_id becomes a portable, stable identifier for a specific version of your context state.

This makes the Files API the right persistence layer for context that needs to outlive the agent runtime: processed analysis results, compiled knowledge bases, and intermediate artefacts that required significant compute to produce. Rather than re-running expensive operations or injecting raw tool output text as part of a prompt, you upload the output once and reference it by ID in every downstream call that needs it.

```python
# context_files_api.py: upload context snapshots and reference them in downstream calls
import json
from io import BytesIO

import anthropic

client = anthropic.Anthropic()


def upload_context_snapshot(context: dict, label: str) -> str:
    """Upload a context snapshot dict and return the stable file_id."""
    payload = json.dumps(context, indent=2).encode("utf-8")
    response = client.beta.files.upload(
        # Uploaded as text/plain on the assumption that document blocks accept
        # plain text and PDF but not JSON MIME types; verify against the Files API docs.
        file=(f"context-{label}.json", BytesIO(payload), "text/plain"),
    )
    file_id = response.id  # e.g. "file_01XyzAbc..."
    print(f"Context snapshot uploaded: {file_id}")
    return file_id


def build_context_message(file_id: str, task: str) -> list[dict]:
    """Build a messages array that references the uploaded context snapshot."""
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {"type": "file", "file_id": file_id},
                },
                {
                    "type": "text",
                    "text": f"Using the context document above as ground truth, {task}",
                },
            ],
        }
    ]


# Placeholder snapshot for illustration; build the real one from your relay events.
context = {"decisions": [], "tool_outputs": [], "file_refs": {}, "metadata": {}}

file_id = upload_context_snapshot(context, "agent-a-handoff")
# Write file_id to JSONL context file so Agent B can find it
# write_context_event("file_ref", {"path": file_id, "role": "context_snapshot"}, agent="agent-a")
```

Two constraints are important for context persistence with the Files API. First, files are not zero-data-retention eligible — they are stored server-side under Anthropic's standard data handling policies. Sanitise context snapshots before upload if they contain personally identifiable information, credentials, or other sensitive data. Your JSONL context file on disk is a safe staging area where you can redact before uploading. Second, files count against your organisation's storage quota, which the Files API documentation states as 500 GB across all workspaces. Context snapshots are typically small (a few kilobytes), but long-running pipelines with many handoffs accumulate. Add a cleanup step to your pipeline that calls client.beta.files.delete(file_id) once the downstream agent has confirmed it consumed the context successfully.
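Both constraints lend themselves to small helpers. The sketch below is illustrative only: the key-name and email patterns are hypothetical and will not catch every secret, and `delete_consumed` takes the client as a parameter so it relies on nothing beyond the `client.beta.files.delete(file_id)` call named above.

```python
import re
from typing import Any, Iterable

# Hypothetical patterns; extend them for the secrets your pipeline handles.
SECRET_KEY_NAMES = re.compile(r"(api[_-]?key|token|password|secret)", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(value: Any) -> Any:
    """Return a copy of a JSON-like value with likely secrets and emails masked."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if SECRET_KEY_NAMES.search(str(k)) else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return EMAIL.sub("[EMAIL]", value)
    return value


def delete_consumed(client: Any, file_ids: Iterable[str]) -> list[str]:
    """Delete snapshots the downstream agent has confirmed; return the IDs removed."""
    removed = []
    for file_id in file_ids:
        client.beta.files.delete(file_id)
        removed.append(file_id)
    return removed
```

Run `redact` on the snapshot dict before uploading it, and call `delete_consumed` only with IDs the downstream agent has acknowledged, so a failed handoff can still be retried from the stored file.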

Use document content blocks for JSON and text context files. Use image blocks for visual artefacts. Use container_upload for code execution outputs that need downstream programmatic processing.

MCP Servers as Context Brokers

File-based handoff works well for two-agent workflows but does not scale cleanly to pipelines with three or more agents operating concurrently. When multiple agents read and write the same JSONL file, concurrent appends are safe but readers get inconsistent views of a rapidly changing file. Each agent reads a different snapshot of the context state depending on when it opens the file.

The cleaner architecture for multi-agent context is an MCP server that exposes context as queryable tools. Each agent connects to the context broker via the MCP Connector, calls structured tools to read or write context, and receives exactly the slice relevant to its current task — without loading the entire session history or dealing with concurrent file access. The Model Context Protocol specification defines the wire format for these tool calls, making the broker interoperable with any MCP-compatible client.

```python
# context_mcp_server.py: tool handlers for a minimal stdio MCP context broker
# (the stdio transport loop that routes incoming tool calls to handle_tool is not shown)
import json
from pathlib import Path

STORE = Path(".agent-context/mcp-store.json")


def load_store() -> dict:
    if STORE.exists():
        return json.loads(STORE.read_text())
    return {"decisions": [], "tool_outputs": [], "file_refs": {}, "metadata": {}}


def save_store(data: dict) -> None:
    STORE.parent.mkdir(exist_ok=True)
    STORE.write_text(json.dumps(data, indent=2))


def handle_tool(name: str, args: dict) -> str:
    store = load_store()

    if name == "write_decision":
        store["decisions"].append({
            "summary": args["summary"],
            "reasoning": args["reasoning"],
            "agent": args.get("agent", "unknown"),
        })
        save_store(store)
        return "Decision recorded."

    elif name == "write_tool_output":
        store["tool_outputs"].append({
            "key": args["key"],
            "output": args["output"],
            "agent": args.get("agent", "unknown"),
        })
        save_store(store)
        return "Output recorded."

    elif name == "get_context_summary":
        lines = []
        for d in store["decisions"]:
            lines.append(f"- Decision ({d.get('agent','?')}): {d['summary']}")
        for o in store["tool_outputs"]:
            lines.append(f"- Output [{o['key']}]: {str(o['output'])[:300]}")
        return "\n".join(lines) if lines else "No context recorded yet."

    elif name == "get_session_metadata":
        return json.dumps(store.get("metadata", {}), indent=2)

    return f"Unknown tool: {name}"
```

Wire this broker into any agent's .mcp.json using stdio transport:

{
  "mcpServers": {
    "context-broker": {
      "type": "stdio",
      "command": "python",
      "args": ["scripts/context_mcp_server.py"]
    }
  }
}

In the Agent SDK's query() call, scope tool access using allowedTools:

  • Read-only agents: ["mcp__context-broker__get_context_summary", "mcp__context-broker__get_session_metadata"]
  • Orchestrator agents: ["mcp__context-broker__*"] to permit writes
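The two scopes above can be generated from one place so the tool names never drift from the broker. This is a small sketch that only builds the list of strings; the role names `reader` and `orchestrator` and the helper name are hypothetical, and the result would be passed as the allowedTools value.

```python
BROKER = "context-broker"  # server name from the .mcp.json example
READ_TOOLS = ["get_context_summary", "get_session_metadata"]


def allowed_tools_for(role: str, server: str = BROKER) -> list[str]:
    """Build an allowedTools list for a hypothetical 'reader' or 'orchestrator' role."""
    if role == "orchestrator":
        # Wildcard form from the list above: permits every broker tool, writes included.
        return [f"mcp__{server}__*"]
    if role == "reader":
        return [f"mcp__{server}__{tool}" for tool in READ_TOOLS]
    raise ValueError(f"Unknown role: {role!r}")
```

Raising on an unknown role is deliberate: falling back to the wildcard would silently grant write access.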

The MCP approach has one key advantage over file-based handoff: the agent does not need to know anything about the context file format, storage location, or schema version. It calls get_context_summary and receives exactly what it needs. Schema evolution, storage backend changes, and concurrent access coordination are all handled by the broker, fully transparent to the agents using it. When the context schema needs to change, you update the broker in one place rather than updating every agent that reads the JSONL file.

Plan Mode, Rollback Coupling, and Approval-Mode Transitions

Three runtime behaviours of Claude Code interact directly with context persistence in ways the earlier sections do not cover: plan mode, git-based rollback, and approval-mode switching. Each creates a handoff-time obligation that most pipelines overlook until they hit a failure in production.

Plan mode and plan artefact persistence

Claude Code's plan mode (/plan command, or the plan permission mode) lets Claude draft an ordered list of steps, constraints, and expected outcomes before executing any tool calls. The plan is the result of alignment between the agent and the developer: it represents intent, not just code state. The problem in multi-CLI pipelines is that the plan exists only inside Claude Code's conversation context. When the session ends and a downstream Codex CLI or Agent SDK script continues the work, the plan evaporates. The second agent re-plans from scratch, frequently producing a different plan that diverges from the agreed approach.

The fix is to write the plan artefact to context at the moment of plan acceptance. Use a PostToolUse hook that fires when Claude Code finalises a plan and appends a decision event of sub-type plan to the JSONL relay:

write_context_event("decision", {
    "sub_type": "plan",
    "title": "Migration plan — accepted 2026-06-15",
    "steps": [
        "Step 1: audit schema for N+1 patterns",
        "Step 2: generate migration script",
        "Step 3: run test suite with coverage check",
    ],
    "constraints": [
        "No destructive DDL on the users table",
        "Total wall-clock time under 10 minutes",
    ],
}, agent="claude-code")

A downstream agent reads the plan event at startup, injects it into its system prompt, and inherits the agreed approach rather than generating a conflicting one. For longer-lived pipelines, store the active plan in the MCP context broker so any agent — not just the immediate successor — can query the currently agreed plan without loading the full JSONL history.

Rollback coupling

Every context event that modifies the codebase should carry the git SHA active at the time of the modification. This coupling between context state and git state makes rollback recoverable. Without it, a git reset --hard <sha> leaves your context file describing a codebase that no longer exists — the next agent inherits ground truth that contradicts the actual files on disk.

The pattern is one additional field on tool_output and file_ref events:

```python
import subprocess

from context_writer import write_context_event  # defined in context_writer.py earlier in this chapter


def current_sha() -> str:
    """Return the commit hash that HEAD currently points at."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip()


write_context_event("file_ref", {
    "path": "src/db/schema.sql",
    "role": "active_schema",
    "git_sha": current_sha(),  # pins context to the exact commit that introduced this ref
}, agent="claude-code")
```

When a rollback occurs, filter the JSONL relay file to events whose git_sha is an ancestor of the target SHA. Any event tied to a newer SHA describes a state that no longer exists and must be discarded before the next agent session starts. For observability on cost and timing, Chapter 5 (Production: deploy + observability + cost controls) covers how to instrument multi-agent pipelines so rollback events appear in your traces alongside cost attribution.
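The ancestor filter can be sketched with `git merge-base --is-ancestor`, which exits with status 0 when the first commit is an ancestor of the second. The function names are hypothetical, and keeping events that carry no `git_sha` at all (decisions, for instance) is an assumption you may want to tighten.

```python
import subprocess
from typing import Any, Callable


def git_is_ancestor(candidate: str, target: str) -> bool:
    """True if `candidate` is an ancestor of (or the same commit as) `target`."""
    proc = subprocess.run(
        ["git", "merge-base", "--is-ancestor", candidate, target],
        capture_output=True,
    )
    return proc.returncode == 0


def events_valid_after_rollback(
    events: list[dict[str, Any]],
    target_sha: str,
    is_ancestor: Callable[[str, str], bool] = git_is_ancestor,
) -> list[dict[str, Any]]:
    """Drop events pinned to commits that the rollback target does not contain."""
    kept = []
    for ev in events:
        sha = ev.get("payload", {}).get("git_sha")
        # Assumption: events without a git_sha do not describe code state, so keep them.
        if sha is None or is_ancestor(sha, target_sha):
            kept.append(ev)
    return kept
```

The ancestry check is injectable so the filter can be unit-tested without a repository; in the pipeline, call it with the default and rewrite the relay file from the returned list.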

Approval-mode switching

Claude Code's permission modes include default, acceptEdits, and bypassPermissions (alongside the plan mode discussed above), and agents may need to transition between them across session handoffs: for example, a planning agent in default mode handing off to an autonomous execution agent in acceptEdits mode. These transitions create two risks that context persistence must address.

First, the new session may not know which mode the prior session operated under. An agent that inherits a context file describing file modifications has no way to tell whether those modifications were human-approved (default) or autonomously applied (acceptEdits) unless the context explicitly records it. Record the active permission_mode in a metadata event at handoff; for per-change provenance, also annotate individual tool_output and file_ref events with it:

write_context_event("metadata", {
    "permission_mode": "acceptEdits",   # or "default" | "bypassPermissions"
    "mcp_servers_active": ["context-broker", "github"],
    "safe_paths_confirmed": ["src/", "tests/"],
    "cost_spent_usd": 0.42,             # carry forward for budget enforcement
})

Second, when a new session starts in a more permissive mode than the prior session, safe operating state is not automatically inherited. Safe file paths confirmed in the prior session are not confirmed in the new one — they are session-local. MCP server authentication must be re-established from scratch. The cost_spent_usd from the metadata event lets the new session's orchestration layer enforce the remaining budget without re-querying billing APIs. Write these session-start checks into your orchestration layer rather than relying on the agent to infer them from context alone.
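A session-start check in the orchestration layer could look like the following sketch. It reads the metadata payload shown above; the function name, the returned dictionary shape, and the ordering of modes from least to most permissive are assumptions.

```python
from typing import Any

# Assumed ordering from least to most permissive.
MODE_RANK = {"plan": 0, "default": 1, "acceptEdits": 2, "bypassPermissions": 3}


def session_start_checks(
    prior_metadata: dict[str, Any], new_mode: str, budget_cap_usd: float
) -> dict[str, Any]:
    """Compute what the new session must re-establish before it runs any tools."""
    if new_mode not in MODE_RANK:
        raise ValueError(f"Unknown permission mode: {new_mode!r}")
    spent = float(prior_metadata.get("cost_spent_usd", 0.0))
    remaining = budget_cap_usd - spent
    if remaining <= 0:
        raise RuntimeError(f"Budget exhausted: spent {spent:.2f} of {budget_cap_usd:.2f} USD")
    prior_mode = prior_metadata.get("permission_mode", "default")
    return {
        "remaining_budget_usd": round(remaining, 2),
        # True when the new session is more permissive than the one it inherits from.
        "escalated": MODE_RANK[new_mode] > MODE_RANK.get(prior_mode, 1),
        # Session-local state is never inherited, so both lists must be redone.
        "paths_to_reconfirm": list(prior_metadata.get("safe_paths_confirmed", [])),
        "mcp_servers_to_reconnect": list(prior_metadata.get("mcp_servers_active", [])),
    }
```

An orchestrator can refuse to start, or require a human sign-off, whenever `escalated` is true, rather than trusting the agent to notice the change of mode from its prompt.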

Anti-Patterns and Pitfalls

Injecting full conversation history into every new session. Conversation history grows fast: a two-hour Claude Code session can produce 100k or more tokens of log. Most of it is scaffolding, clarification, and dead-end exploration. Injecting all of it into a new session wastes tokens, buries the relevant decisions in a long context where the model is more likely to overlook them, and risks hitting the context window limit for longer sessions. Distil decisions and confirmed outputs; never dump raw conversation logs.

Using git as a context transport. Committing the JSONL context file after every session and expecting the next agent to pull it works, but it creates unintended coupling between your agent workflow and your project's git history. Context files accumulate rapidly, pollute the commit log with machine-generated noise, and create merge conflicts in multi-agent pipelines where more than one agent writes context concurrently. Keep context files in .gitignore and manage them entirely outside git.

Relying on environment variables for context handoff. Environment variables look appealing because they are universally accessible from any process. The fatal flaw is that they vanish the moment the shell exits. Any context stored in environment variables is scoped to the current process tree and is not persistent. Reserve environment variables for static configuration — API keys, model names, feature flags — never for runtime context that needs to survive a process boundary.

Mixing context layers in a flat undifferentiated blob. Storing decisions, tool outputs, file references, and session metadata as a single untyped text block makes it impossible for downstream agents to load only the context relevant to their task. A code generation agent does not need the full tool output history from a data analysis phase. Structure context by type from the start, even if you initially write everything to a single file. The overhead is negligible; the benefit at read time is large.

Not versioning the context schema. Context files written today will be read by agents running next month, after the context schema has evolved, tool names have changed, and models have been upgraded. Add a schema_version field to every context event from day one. When the schema changes, write an explicit migration function that transforms old events to the new shape rather than silently breaking backward compatibility. Consumers should check schema_version at read time and refuse to proceed if they encounter a version they do not understand.
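The version check and migration step described in the last pitfall can be sketched as a chain of per-version upgrade functions. Everything about version 2 here is invented for illustration (the chapter's schema is still at version 1): the rename of `agent` to `source`, the `CURRENT_VERSION` constant, and the function names are all hypothetical.

```python
from typing import Any, Callable

CURRENT_VERSION = 2  # hypothetical: pretend the schema has moved on to v2


def _v1_to_v2(ev: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical migration: v2 renames the 'agent' field to 'source'."""
    out = {k: v for k, v in ev.items() if k != "agent"}
    out["source"] = ev.get("agent", "unknown")
    out["schema_version"] = 2
    return out


# One entry per version step; each function upgrades by exactly one version.
MIGRATIONS: dict[int, Callable[[dict[str, Any]], dict[str, Any]]] = {1: _v1_to_v2}


def upgrade(ev: dict[str, Any]) -> dict[str, Any]:
    """Bring an event up to CURRENT_VERSION, or refuse if its version is unknown."""
    version = ev.get("schema_version")
    if not isinstance(version, int) or version < 1 or version > CURRENT_VERSION:
        raise ValueError(
            f"Unsupported schema_version {version!r}; this reader understands "
            f"versions 1 to {CURRENT_VERSION}."
        )
    while version < CURRENT_VERSION:
        ev = MIGRATIONS[version](ev)
        version = ev["schema_version"]
    return ev
```

Chaining single-step migrations keeps each function small and means an event written several versions ago is upgraded through every intermediate shape in order.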

Hands-On Exercise: Building a Cross-CLI Context Relay

Goal: Implement a complete context relay that captures output from an initial script, persists it through the Files API, and makes it available to a subsequent Agent SDK session.

Prerequisites:

  • Python 3.10+ with anthropic>=0.40.0 installed
  • An Anthropic API key with Files API access
  • A project directory with at least two Python source files you can use as analysis targets

Steps:

  1. Create .agent-context/session.jsonl in your project root. Manually add two decision events and one tool_output event as separate JSON lines, following the schema from the "File-Based Context" section. Each event must include schema_version: 1, a type, an agent, and a payload.
  2. Write load_and_run.py that calls load_context_summary() filtering for decision and file_ref events, injects the rendered result into an Agent SDK query() system prompt, and asks the agent to summarise the project's current state based on the inherited context. Run it and verify the response explicitly references at least one decision from your JSONL file.
  3. Extend load_and_run.py to upload the JSONL file to the Files API using upload_context_snapshot() with the raw JSONL content as the payload, write the returned file_id to .agent-context/.file-id, and modify the Agent SDK call to reference the uploaded file via a document content block. Run the extended version and confirm the agent response references the uploaded context document.
  4. Add schema version validation to context_reader.py: if any event has schema_version greater than 1, raise a descriptive ValueError instructing the caller to run the migration function. Test this by appending a synthetic event with "schema_version": 2 to your JSONL file and confirming the error surfaces cleanly.
  5. Add .agent-context/ to your .gitignore and confirm git status shows no untracked files from that directory.

Success criteria:

  • Agent response in step 2 mentions at least one decision from the JSONL file verbatim or by clear paraphrase
  • file_id in step 3 begins with file_ and can be listed via client.beta.files.list()
  • Agent response in step 3 explicitly references the uploaded context document
  • Schema version validation in step 4 raises ValueError on the synthetic v2 event
  • .agent-context/ is absent from git status after step 5

Stretch goal: Wire the context broker MCP server from the "MCP Servers as Context Brokers" section into a second standalone Agent SDK script. Call get_context_summary via MCP tool use and confirm the returned summary matches the decisions you recorded in the JSONL file, proving that the two persistence mechanisms stay consistent with each other.

The capstone project assembles every layer from this course (Managed Agents, MCP, the Files API, production hooks, and the cross-CLI context relay) into a single deployable production research agent that handles real-world workload patterns from start to finish.