240 min · 5 chapters · Builder · anthropic

Production Agents with Claude Agent SDK + MCP Connector

Python or TypeScript developers who have used the Claude Messages API at least once and understand what an API key is. New to the Agent SDK, Managed Agents, and MCP.

What you'll learn
  • Migrate a project from the Claude Code SDK to the Claude Agent SDK without breaking changes
  • Choose between Managed Agents and Agent SDK for a production workload with confidence
  • Wire three MCP servers (stdio + HTTP + SSE) into a single agent with proper auth and error handling
  • Upload, reference, and manage files with the Files API across multi-turn agent sessions
  • Deploy a production agent with structured logging, cost circuit breakers, and observability hooks
Chapters in this course
  • What changed when Claude Code SDK became Claude Agent SDK (35 min)
  • Managed Agents beta — when to use it, when to roll your own (45 min)
  • MCP connector: orchestrating multi-server agents (50 min)
  • Files API + code execution: the complete agent IO surface (45 min)
  • Production: deploy + observability + cost controls (45 min)

Chapter 1 · 35 min

What changed when Claude Code SDK became Claude Agent SDK


The Claude Agent SDK is Anthropic's official library for embedding an autonomous agent loop — including built-in file operations, shell execution, web access, and subagent spawning — directly into a Python or TypeScript application, renamed from the Claude Code SDK in April 2026 alongside the public beta of Claude Managed Agents. This chapter sets up the migration path for Managed Agents, MCP connectors, and production observability.

On April 8, 2026, Anthropic simultaneously shipped the renamed SDK, the Managed Agents REST API, and an explicit MCP connector guide. The rename wasn't a rebrand of the package alone; it came with a branding prohibition — partners may no longer call their products "Claude Code" or use Claude Code ASCII art — and with an SDK migration guide that names package changes, option-type changes, and configuration-loading changes you need to audit before shipping [6].

> Prerequisites: None; this is Chapter 1.
>
> Time: 35 minutes
>
> Learning objectives: By the end of this chapter you can install the renamed SDK, update your imports, run your first query() call, and explain what the rename means for your production roadmap.

Key facts

  1. The npm package changed from @anthropic-ai/claude-code to @anthropic-ai/claude-agent-sdk; the PyPI package changed from claude-code-sdk to claude-agent-sdk [6].
  2. The options type/class changed from ClaudeCodeOptions to ClaudeAgentOptions in both TypeScript and Python examples [6].
  3. The TypeScript SDK bundles a native Claude Code binary for your platform as an optional dependency — you no longer need a separate Claude Code installation [1].
  4. Authentication on Amazon Bedrock, Google Vertex AI, and Microsoft Azure Foundry is controlled entirely by environment variables (CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX, CLAUDE_CODE_USE_FOUNDRY), not constructor arguments [1].
  5. The branding guidelines explicitly prohibit partners from using the names "Claude Code," "Claude Code Agent," or Claude Code-branded ASCII art — a signal that the SDK is now a platform, not a feature of a specific product [1].
  6. Session state is stored as JSONL on your filesystem and can be resumed by passing resume: sessionId in your options [1].

The rename isn't cosmetic

Most developers saw the April 2026 announcement and ran npm install @anthropic-ai/claude-agent-sdk. Done, right? Not quite.

The rename matters strategically because it de-couples the SDK from Claude Code the developer product. Claude Code is a terminal app; the Claude Agent SDK is now a general-purpose platform library. By prohibiting partners from calling their products "Claude Code," Anthropic is drawing a hard line: Claude Code is the consumer app, the Agent SDK is the infrastructure you build on. If you're building a product on top of this SDK, that distinction matters for your own naming and positioning.

There's also a real technical signal in the migration guide: configuration that Claude Code users may have treated as implicit now deserves an explicit audit. Your package manager can make the import rename look trivial, but stale settings, old package names, and different defaults are what usually break production agents.

- The npm package was renamed from `@anthropic-ai/claude-code` to `@anthropic-ai/claude-agent-sdk`; the PyPI package was renamed from `claude-code-sdk` to `claude-agent-sdk`.
- The rename signals a strategic separation: Claude Code is the end-user terminal app; the Claude Agent SDK is infrastructure for custom autonomous agents.
- Partners may no longer use the names "Claude Code" or "Claude Code Agent" in their product naming — the SDK is now a platform, not a product feature.

Installing the renamed SDK

TypeScript

```bash
# Remove the old package
npm uninstall @anthropic-ai/claude-code

# Install the renamed package
npm install @anthropic-ai/claude-agent-sdk
```

Python

```bash
# Remove the old package
pip uninstall claude-code-sdk

# Install the renamed package
pip install claude-agent-sdk
```

After installing, verify the version:

```bash
# TypeScript: check package.json
cat package.json | grep claude-agent-sdk
# → "@anthropic-ai/claude-agent-sdk": "^0.1.0" or later

# Python: show the installed version
pip show claude-agent-sdk
```

Updating your imports

Every import in your existing code needs to change. This is a search-and-replace operation, not a logic change.

TypeScript — before

import { query } from "@anthropic-ai/claude-code";
import type { ClaudeCodeOptions } from "@anthropic-ai/claude-code";

TypeScript — after

import { query } from "@anthropic-ai/claude-agent-sdk";
import type { ClaudeAgentOptions } from "@anthropic-ai/claude-agent-sdk";

Note: the options type was renamed from ClaudeCodeOptions to ClaudeAgentOptions.

Python — before

from claude_code_sdk import query, ClaudeCodeOptions

Python — after

from claude_agent_sdk import query, ClaudeAgentOptions
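
Because the change is a pure rename, it can be scripted. The sketch below is one possible way to apply the renames listed above to a piece of source text. `RENAMES` and `migrate_source` are illustrative names, not part of the SDK, and a plain string replacement is assumed to be safe for your codebase, so review the resulting diff before committing.

```python
# Old name -> new name, taken from the package and type renames above.
RENAMES = {
    "@anthropic-ai/claude-code": "@anthropic-ai/claude-agent-sdk",
    "claude_code_sdk": "claude_agent_sdk",
    "claude-code-sdk": "claude-agent-sdk",
    "ClaudeCodeOptions": "ClaudeAgentOptions",
}


def migrate_source(text: str) -> str:
    """Return `text` with every old SDK name replaced by its new name."""
    for old, new in RENAMES.items():
        text = text.replace(old, new)
    return text


old_line = "from claude_code_sdk import query, ClaudeCodeOptions"
print(migrate_source(old_line))
# prints: from claude_agent_sdk import query, ClaudeAgentOptions
```

Running the function twice gives the same result as running it once, so it is safe to re-run over files that were already migrated.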

The query() API in 2 minutes

The core API hasn't changed between SDK versions. query() is an async generator that yields message objects as the agent works through a task. The simplest possible call:

```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions

async def main():
    async for message in query(
        prompt="What files are in this directory?",
        options=ClaudeAgentOptions(allowed_tools=["Bash", "Glob"]),
    ):
        if hasattr(message, "result"):
            print(message.result)

asyncio.run(main())
```

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "What files are in this directory?",
  options: { allowedTools: ["Bash", "Glob"] }
})) {
  if ("result" in message) console.log(message.result);
}
```

The generator yields several message types. The ones you'll care about most:

| Type | When it fires | What it contains |
|---|---|---|
| SystemMessage (subtype init) | First, before any work | Session ID, connected MCP servers |
| AssistantMessage | After each model turn | Claude's text + tool calls |
| ToolResultMessage | After each tool execution | The tool's output |
| ResultMessage | Last | Final answer, token usage, session ID |

Try this · claude-sonnet-4-6

What is the current working directory? List the files in it.

Expected output:
The agent calls Bash with `pwd` and `ls`, then returns the directory path and a list of files. You see AssistantMessage objects containing tool_use blocks, followed by ToolResultMessage objects with the shell output, ending with a ResultMessage containing the synthesized answer.
- `query()` is an async generator that yields `SystemMessage`, `AssistantMessage`, `ToolResultMessage`, and `ResultMessage` objects as the agent works.
- The `ResultMessage` is the final event and contains the synthesized answer, token usage, and session ID.
- The options type was renamed from `ClaudeCodeOptions` to `ClaudeAgentOptions`; update all imports before shipping to avoid runtime errors.
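
The consumption pattern can be tried without the SDK installed. The sketch below uses stand-in classes and a hypothetical `fake_query()` generator; they borrow only the message names from the table above, and their fields are assumptions made for illustration. What it shows is the shape of the loop: read the session ID from the init message, count assistant turns, and keep the final result.

```python
import asyncio
from dataclasses import dataclass


# Stand-in message classes so the sketch runs without the SDK installed.
# Only the names come from the table above; the fields are assumed.
@dataclass
class SystemMessage:
    subtype: str
    data: dict


@dataclass
class AssistantMessage:
    content: list


@dataclass
class ResultMessage:
    result: str


async def fake_query(prompt: str):
    """Hypothetical stand-in for query(): yields messages in the documented order."""
    yield SystemMessage(subtype="init", data={"session_id": "sess_demo"})
    yield AssistantMessage(content=["Listing files..."])
    yield ResultMessage(result="README.md, main.py")


async def run(prompt: str) -> dict:
    """Collect the session ID, the number of assistant turns, and the final answer."""
    summary = {"session_id": None, "turns": 0, "result": None}
    async for message in fake_query(prompt):
        if isinstance(message, SystemMessage) and message.subtype == "init":
            summary["session_id"] = message.data["session_id"]
        elif isinstance(message, AssistantMessage):
            summary["turns"] += 1
        elif isinstance(message, ResultMessage):
            summary["result"] = message.result
    return summary


print(asyncio.run(run("What files are in this directory?")))
# prints: {'session_id': 'sess_demo', 'turns': 1, 'result': 'README.md, main.py'}
```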

Capturing and resuming sessions

Session continuity is one of the most underused features of the SDK. When the SystemMessage with subtype init arrives, grab the session_id:

```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, SystemMessage, ResultMessage

session_id = None

async def first_query():
    global session_id
    async for message in query(
        prompt="Read auth.py and tell me what it does",
        options=ClaudeAgentOptions(allowed_tools=["Read", "Glob"]),
    ):
        if isinstance(message, SystemMessage) and message.subtype == "init":
            session_id = message.data["session_id"]
        if isinstance(message, ResultMessage):
            print(message.result)

async def follow_up():
    async for message in query(
        prompt="Now find every file that imports from auth.py",
        options=ClaudeAgentOptions(resume=session_id),
    ):
        if isinstance(message, ResultMessage):
            print(message.result)

async def main():
    await first_query()
    await follow_up()  # Claude already knows auth.py's contents

asyncio.run(main())
```

The resume option re-opens the existing JSONL session file on your filesystem. Claude picks up with full context from the previous turn — no re-reading files, no redundant tool calls.
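
The example above keeps `session_id` in a module-level variable, which only survives for the life of one process. If the follow-up runs later (a new request, a restarted worker), the ID has to be stored somewhere. The sketch below is a minimal way to do that with a JSON file; `save_session_id`, `load_session_id`, and the file name are hypothetical helpers, not SDK functions.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def save_session_id(path: Path, session_id: str) -> None:
    """Write the session ID to a small JSON file."""
    path.write_text(json.dumps({"session_id": session_id}))


def load_session_id(path: Path) -> Optional[str]:
    """Return the stored session ID, or None if nothing was saved yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text()).get("session_id")


# Usage: a temporary directory stands in for your application's state directory.
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "agent_state.json"
    print(load_session_id(state_file))        # prints: None
    save_session_id(state_file, "sess_demo")  # hypothetical ID
    print(load_session_id(state_file))        # prints: sess_demo
```

The loaded value would then be passed as the `resume` option on the next call.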

Built-in tools: the complete list

The Agent SDK ships ten built-in tools. You must declare which ones you allow explicitly — there's no "allow all built-ins" shortcut:

| Tool | What it does | Safe to allow broadly? |
|---|---|---|
| Read | Read any file in the working directory | Yes |
| Write | Create new files | With caution |
| Edit | Make precise edits to existing files | With caution |
| Bash | Run terminal commands, scripts, git operations | No; scope carefully |
| Monitor | Watch a background script, react to each stdout line | Yes |
| Glob | Find files by pattern (`**/*.ts`, `src/**/*.py`) | Yes |
| Grep | Search file contents with regex | Yes |
| WebSearch | Search the web for current information | Yes |
| WebFetch | Fetch and parse web page content | Yes |
| AskUserQuestion | Ask the user clarifying questions with multiple choice | Yes |

The Bash tool is the one to be careful with. In a CI context with a fully sandboxed container it's fine. On a developer workstation, Bash can delete files, install packages, and run arbitrary code. If you don't need shell execution, don't include it.

- The Agent SDK ships ten built-in tools; you must declare each one explicitly in `allowed_tools` — there is no "allow all" shortcut.
- `Bash` is the highest-risk tool: on a developer workstation it can delete files, install packages, and run arbitrary code; omit it unless shell execution is explicitly required.
- Session JSONL files are stored under `~/.claude/sessions/` by default; in production, set `CLAUDE_SESSIONS_DIR` to a path with an appropriate retention policy.
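
Since every tool has to be listed explicitly, it can help to build the list from the risk column of the table rather than typing it by hand each time. The sketch below is one way to do that; the grouping is copied from the table, while `allowed_tools()` and its two flags are hypothetical names chosen for this example.

```python
# Risk grouping copied from the "Safe to allow broadly?" column above.
BROAD = ["Read", "Monitor", "Glob", "Grep", "WebSearch", "WebFetch", "AskUserQuestion"]
CAUTION = ["Write", "Edit"]
SCOPED = ["Bash"]


def allowed_tools(can_write: bool = False, sandboxed_shell: bool = False) -> list[str]:
    """Build an explicit allow-list, adding riskier tools only on request."""
    tools = list(BROAD)
    if can_write:
        tools += CAUTION
    if sandboxed_shell:
        tools += SCOPED
    return tools


print(allowed_tools())                # broadly safe tools only
print(allowed_tools(can_write=True))  # may also create and edit files
print("Bash" in allowed_tools(sandboxed_shell=True))
# last line prints: True
```

The returned list is what you would pass as `allowed_tools`; Bash only appears when the caller states that the shell is sandboxed.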

Multi-cloud authentication

If you run behind Bedrock, Vertex AI, or Azure, the SDK respects environment variables — you don't change any code:

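```bash
# Set only the flag for the provider you use.

# Amazon Bedrock
export CLAUDE_CODE_USE_BEDROCK=1
# Then configure AWS credentials normally
aws configure  # or use IAM roles

# Google Vertex AI
export CLAUDE_CODE_USE_VERTEX=1

# Microsoft Azure Foundry
export CLAUDE_CODE_USE_FOUNDRY=1
```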

The ANTHROPIC_API_KEY environment variable is still checked first. If it's set, it wins over cloud provider credentials.
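
A pre-flight check can make that precedence visible before an agent starts. The sketch below mirrors the rule as stated here, not the SDK's own code: `resolve_provider` is a hypothetical helper, any non-empty value counts as "set", and the relative order of the three cloud flags is an assumption because the text does not specify it.

```python
def resolve_provider(env: dict) -> str:
    """Pick a provider using the precedence described above.

    ANTHROPIC_API_KEY wins when set. Any non-empty value counts as set.
    The order of the three cloud flags is an assumption of this sketch.
    """
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    for flag, provider in (
        ("CLAUDE_CODE_USE_BEDROCK", "bedrock"),
        ("CLAUDE_CODE_USE_VERTEX", "vertex"),
        ("CLAUDE_CODE_USE_FOUNDRY", "foundry"),
    ):
        if env.get(flag):
            return provider
    return "unconfigured"


print(resolve_provider({"CLAUDE_CODE_USE_BEDROCK": "1"}))
# prints: bedrock
print(resolve_provider({"CLAUDE_CODE_USE_BEDROCK": "1", "ANTHROPIC_API_KEY": "sk-example"}))
# prints: anthropic
```

In a deployment script you would pass `os.environ` and fail fast if the answer is not the provider you expected.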

Try this · claude-sonnet-4-6

Find all TypeScript files in this project that import from '@anthropic-ai/claude-code' and list their paths.

Expected output:
The agent uses Grep with pattern '@anthropic-ai/claude-code' and glob '**/*.ts', returns a list of file paths that still use the old import. This is the first step of a real migration audit.

Hands-on exercise

Migrate a code-reviewer agent to the Claude Agent SDK.

Start with this minimal Claude Code SDK agent (or your own existing code):

```python
# reviewer_old.py — uses the old SDK
from claude_code_sdk import query, ClaudeCodeOptions

async def review_code(file_path: str):
    async for message in query(
        prompt=f"Review {file_path} for bugs and code quality issues",
        options=ClaudeCodeOptions(
            allowed_tools=["Read", "Glob", "Grep"],
        ),
    ):
        if hasattr(message, "result"):
            print(message.result)
```

Your tasks:

  1. Install claude-agent-sdk (Python) or @anthropic-ai/claude-agent-sdk (TypeScript)
  2. Update the import to from claude_agent_sdk import query, ClaudeAgentOptions
  3. Rename ClaudeCodeOptions to ClaudeAgentOptions
  4. Add session capture: print the session_id from the SystemMessage
  5. Run the agent against any .py or .ts file in your project

Verification: The agent runs without import errors, produces a code review, and prints a session ID that looks like sess_01XxXxxXx….

Estimated time: 15 minutes

Knowledge check
You're migrating a Python project from the Claude Code SDK to the Claude Agent SDK. Which of the following changes is required?

Subagents: orchestrating specialized agents

One of the most powerful Agent SDK features is the ability to spawn specialized subagents from within a parent agent. Subagents handle focused subtasks and report back results, enabling you to build multi-agent pipelines entirely in Python or TypeScript:

```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition, ResultMessage

async def review_and_document(codebase_path: str):
    """Parent agent that delegates to two specialists."""
    async for message in query(
        prompt=f"Use the code-reviewer agent to review {codebase_path}, then use the doc-writer agent to create a README.",
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Glob", "Grep", "Write", "Agent"],
            agents={
                "code-reviewer": AgentDefinition(
                    description="Expert code reviewer for quality and security.",
                    prompt="Analyze code quality, identify bugs, suggest improvements.",
                    tools=["Read", "Glob", "Grep"],
                ),
                "doc-writer": AgentDefinition(
                    description="Technical writer who creates clear documentation.",
                    prompt="Write clear, accurate technical documentation.",
                    tools=["Read", "Write"],
                ),
            },
        ),
    ):
        if isinstance(message, ResultMessage):
            print(message.result)

asyncio.run(review_and_document("./src"))
```

The Agent tool must be in allowedTools for the parent to spawn subagents. Messages from within a subagent's context include a parent_tool_use_id field — use this to correlate subagent output back to the parent's tool call in your audit logs.

Note the pattern: the parent doesn't implement the reviewer or writer logic itself. It delegates, which keeps the parent's context window focused on orchestration rather than implementation. This is the right architecture for agents with more than two or three distinct skill sets.

- Subagents receive focused tasks via `AgentDefinition` with their own description, prompt, and tool list — keeping the parent's context window focused on orchestration.
- The `parent_tool_use_id` field in subagent messages correlates subagent output back to the parent tool call in audit logs.
- The `Agent` tool must appear in the parent's `allowedTools`; omitting it silently prevents subagent spawning.
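
For audit logs, the correlation described above amounts to grouping messages by `parent_tool_use_id`. The sketch below shows that grouping on simplified dict records. The record shape (a `text` key next to `parent_tool_use_id`) and the example IDs are assumptions for illustration; real SDK messages are objects, so you would read the field from them before logging.

```python
from collections import defaultdict


def group_by_parent(messages: list[dict]) -> dict:
    """Group message texts by their parent_tool_use_id.

    Messages without the field (the parent's own messages) go under None.
    """
    groups = defaultdict(list)
    for message in messages:
        groups[message.get("parent_tool_use_id")].append(message["text"])
    return dict(groups)


# Hypothetical, simplified log records.
log = [
    {"text": "Delegating review", "parent_tool_use_id": None},
    {"text": "Found 2 bugs", "parent_tool_use_id": "toolu_review"},
    {"text": "README drafted", "parent_tool_use_id": "toolu_docs"},
    {"text": "Suggested a fix", "parent_tool_use_id": "toolu_review"},
]
for parent, texts in group_by_parent(log).items():
    print(parent, texts)
```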

Configuration file loading order

The SDK loads configuration from multiple sources, applied in a defined order. Understanding this prevents "why isn't my setting taking effect?" debugging sessions:

~/.claude/settings.json          # global user settings (lowest priority)
~/.claude/CLAUDE.md              # global system prompt additions
.claude/settings.json            # project settings
.claude/CLAUDE.md / CLAUDE.md    # project system prompt
inline ClaudeAgentOptions()      # runtime options (highest priority)

Later sources override earlier ones. This means you can set safe defaults globally and override them per-project or per-run without touching the global config.

To restrict which sources load — for example, in a CI environment where you don't want the developer's ~/.claude settings to affect the build — use settingSources:

Python

options = ClaudeAgentOptions(
    allowed_tools=["Read", "Glob", "Grep"],
    setting_sources=["project"],  # only load .claude/ in the current project
)

TypeScript

const options = {
  allowedTools: ["Read", "Glob", "Grep"],
  settingSources: ["project"],  // ignores ~/.claude entirely
};

This is important for reproducibility: a CI agent should behave identically regardless of what's installed in the developer's home directory.
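
The "later sources override earlier ones" rule, combined with source filtering, behaves like a layered dictionary merge. The sketch below models that behavior only; it is not the SDK's loader. The source name `"user"` and the setting keys `model` and `verbose` are assumptions made for the example.

```python
def merge_settings(layers: dict, sources: list[str], runtime: dict) -> dict:
    """Merge setting layers so that later sources override earlier ones.

    `layers` maps a source name ("user", "project") to its settings dict.
    Only the sources listed in `sources` are loaded; `runtime` is applied last.
    """
    merged = {}
    for name in ("user", "project"):  # lowest to highest priority
        if name in sources:
            merged.update(layers.get(name, {}))
    merged.update(runtime)  # inline options always win
    return merged


layers = {
    "user": {"model": "user-default", "verbose": True},
    "project": {"model": "project-default"},
}
print(merge_settings(layers, ["user", "project"], {}))
# prints: {'model': 'project-default', 'verbose': True}
print(merge_settings(layers, ["project"], {"verbose": False}))
# prints: {'model': 'project-default', 'verbose': False}
```

The second call corresponds to the CI case: the user layer never loads, so nothing from the home directory can leak into the run.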

Skills and slash commands

The Agent SDK supports two additional configuration primitives that most tutorials skip: Skills and slash commands. Both are defined in Markdown files and loaded from the project .claude/ directory.

Skills are specialist instructions that extend the agent's capabilities for specific domains. A SKILL.md file at .claude/skills/<name>/SKILL.md is loaded into context when the agent needs that capability. This is how the Koenig AI Academy's own agents are extended — each agent has skills for its specialized workflows without bloating the base system prompt.

Slash commands are shorthand for common task templates. A review.md file at .claude/commands/review.md becomes a /review command that the agent can invoke. In the SDK context, you can trigger slash commands by starting a prompt with /.

These are the same skill and command systems that power Claude Code's daily usage, now fully available to your programmatic agents.

What's next

In Chapter 2 you'll meet Managed Agents — Anthropic's hosted agent harness that launched the same day as this SDK rename. You'll learn the decision rule for when to let Anthropic run your agent infrastructure vs running it yourself, and you'll wire up your first session with full SSE streaming. The pricing model has a non-obvious trap that most tutorials skip: we'll name it explicitly.

References

[1] Claude Agent SDK Overview — https://code.claude.com/docs/en/agent-sdk/overview · retrieved 2026-04-30
[2] Agent Capabilities API announcement — https://claude.com/blog/agent-capabilities-api · retrieved 2026-04-30
[3] Claude Agent SDK TypeScript releases — https://github.com/anthropics/claude-agent-sdk-typescript/releases · retrieved 2026-05-27
[4] Claude Agent SDK MCP documentation — https://code.claude.com/docs/en/agent-sdk/mcp · retrieved 2026-04-30
[5] Claude Managed Agents Overview — https://platform.claude.com/docs/en/managed-agents/overview · retrieved 2026-04-30
[6] Claude Agent SDK migration guide — https://docs.claude.com/en/docs/claude-code/sdk/migration-guide · retrieved 2026-05-27

Chapter 2 · 45 min

Managed Agents beta — when to use it, when to roll your own


Claude Managed Agents is Anthropic's hosted REST API for running Claude as an autonomous agent in a sandboxed cloud environment — launched in public beta on April 8, 2026, requiring the managed-agents-2026-04-01 beta header. Where the Agent SDK runs the agent loop in your own process, Managed Agents runs it in Anthropic's infrastructure: you send user messages, you stream results back. Anthropic handles the container, tool execution, and session persistence [1]. Verify current pricing in the official quickstart before launch [2].

Key facts

  1. All API requests require the managed-agents-2026-04-01 beta header [1].
  2. Pricing: Managed Agents runtime plus standard Claude token costs; verify current rates before launch [2].
  3. Rate limits: 300 RPM for create endpoints (agents, sessions, environments); 600 RPM for read endpoints [1].
  4. agent_toolset_20260401 enables Bash, file ops, web search, and MCP; outcomes and multiagent are research preview requiring separate access [1].

The four core concepts

Agent: a saved configuration (model, system prompt, tools). Create it once and reuse it by agent.id. Think of a Docker image: build once, run many sessions from it.

Environment: a cloud container template that defines packages and network rules. It uses a cloud config with either unrestricted or restricted networking.

Session: one running agent + environment instance per task. Sessions are not reused; start a new one when the task is done.

Events: the SSE stream. You send user.message; the agent emits agent.message and agent.tool_use events, then session.status_idle when it is done.

- Managed Agents uses four primitives: Agent (saved config), Environment (sandbox template), Session (running instance per task), and Events (SSE message stream).
- Sessions are not reused — one session equals one task; when the task is done, start a new session for the next task.
- Agent and Environment IDs are stable and should be created once and reused; only the Session is created per-task to avoid hitting the 300 create-requests-per-minute rate limit.
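
The "create once, reuse" advice in the last point can be enforced with a small cache. The sketch below stores IDs in a JSON file and only calls the creation function on a cache miss. `get_or_create_id`, the cache file name, and the fake creator are all hypothetical; in real code, `create` might wrap the agent or environment creation call and return the new ID.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def get_or_create_id(cache: Path, key: str, create: Callable[[], str]) -> str:
    """Return the cached ID for `key`, calling `create` only on a cache miss."""
    ids = json.loads(cache.read_text()) if cache.exists() else {}
    if key not in ids:
        ids[key] = create()  # the only place a create request is spent
        cache.write_text(json.dumps(ids))
    return ids[key]


# Usage with a fake creator that records how often it is called.
calls = []


def fake_create_agent() -> str:
    calls.append(1)
    return "agt_demo"  # hypothetical ID


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "managed_agent_ids.json"
    first = get_or_create_id(cache_file, "agent", fake_create_agent)
    second = get_or_create_id(cache_file, "agent", fake_create_agent)
    print(first, second, len(calls))
# prints: agt_demo agt_demo 1
```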

Creating your first agent

Install the Anthropic SDK (Managed Agents uses the standard client, not the Agent SDK):

pip install anthropic  # Python
npm install @anthropic-ai/sdk  # TypeScript

Create an agent once — save the returned agent.id:

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from env

agent = client.beta.agents.create(
    name="Data Analyst",
    model="claude-opus-4-7",
    system="You are a data analyst. When given a dataset, summarize it with statistics and key insights.",
    tools=[
        {"type": "agent_toolset_20260401"},  # enables Bash, file ops, web search
    ],
)

print(f"Agent ID: {agent.id}")  # save this
print(f"Agent version: {agent.version}")
```

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const agent = await client.beta.agents.create({
  name: "Data Analyst",
  model: "claude-opus-4-7",
  system: "You are a data analyst. When given a dataset, summarize it with statistics and key insights.",
  tools: [{ type: "agent_toolset_20260401" }],
});

console.log(`Agent ID: ${agent.id}`);
```

Creating an environment

```python
environment = client.beta.environments.create(
    name="analyst-env",
    config={
        "type": "cloud",
        "networking": {"type": "unrestricted"},  # allows outbound web access
    },
)

print(f"Environment ID: {environment.id}")  # save this too
```

The environment is a one-time setup. Use unrestricted for most workloads; restricted blocks outbound access for sensitive data.

Starting a session and streaming events

Open the stream, then immediately send the first user message:

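```python
from anthropic import Anthropic

client = Anthropic()

# The three calls below mirror the TypeScript example that follows; the Python
# argument shapes are assumed to match it. agent_id and environment_id are the
# IDs saved in the previous steps.
session = client.beta.sessions.create(
    agent=agent_id, environment_id=environment_id, title="Analyze Q1 sales data"
)
stream = client.beta.sessions.events.stream(session.id)  # open the stream first
client.beta.sessions.events.send(
    session.id,
    events=[{"type": "user.message", "content": [{"type": "text", "text": "Sales data: [120, 340, 290, 410, 380]. Compute mean, median, std dev. Show Python code."}]}],
)

for event in stream:
    match event.type:
        case "agent.message":
            for block in event.content:
                print(block.text, end="", flush=True)
        case "agent.tool_use":
            print(f"\n[Tool: {event.name}]", flush=True)
        case "session.status_idle":
            print("\n\n[Session complete]")
            break
```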

```typescript
const session = await client.beta.sessions.create({
  agent: agentId,
  environment_id: environmentId,
  title: "Analyze Q1 sales data",
});

const stream = await client.beta.sessions.events.stream(session.id);

await client.beta.sessions.events.send(session.id, {
  events: [{
    type: "user.message",
    content: [{
      type: "text",
      text: "Sales data: [120, 340, 290, 410, 380]. Compute mean, median, std dev. Show Python code."
    }]
  }]
});

for await (const event of stream) {
  if (event.type === "agent.message") {
    for (const block of event.content) process.stdout.write(block.text);
  } else if (event.type === "agent.tool_use") {
    console.log(`\n[Tool: ${event.name}]`);
  } else if (event.type === "session.status_idle") {
    console.log("\n[Session complete]");
    break;
  }
}
```

Try this · claude-opus-4-7

You are running inside a Managed Agents session. The user has sent: 'Here is some sales data as a Python list: [120, 340, 290, 410, 380]. Compute mean, median, and standard deviation. Show your work i…

Expected output:
Claude emits an agent.message with a plan, then an agent.tool_use event for Bash, then another agent.message with results like: mean=308.0, median=340.0, std_dev≈114.3 (sample; the population value is ≈102.3). The session then emits session.status_idle.
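
The figures for this prompt can be checked without an agent. The sketch below uses only Python's standard `statistics` module on the list from the prompt. It reports both the sample and the population standard deviation, since the prompt does not say which one is wanted and the two differ noticeably on five data points.

```python
import statistics

sales = [120, 340, 290, 410, 380]

mean = statistics.mean(sales)
median = statistics.median(sales)
sample_sd = statistics.stdev(sales)       # divides by n - 1
population_sd = statistics.pstdev(sales)  # divides by n

print(f"mean={mean}, median={median}")
# prints: mean=308, median=340
print(f"sample std dev={sample_sd:.1f}, population std dev={population_sd:.1f}")
# prints: sample std dev=114.3, population std dev=102.3
```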
- Open the SSE stream before sending the first `user.message` event; events arrive in real time, including `agent.tool_use` calls and `agent.message` responses.
- The `session.status_idle` event is the canonical signal that the agent has finished working; break the stream loop when you see it.
- Always close idle sessions explicitly with `client.beta.sessions.update(session.id, status="completed")` to avoid ongoing runtime cost exposure.

Session lifecycle and cost

Managed Agents cost depends on session lifetime, not just active generation time. A session left idle after session.status_idle can accrue runtime exposure — verify current billing rules in the official quickstart [2]. Never use Managed Agents for polling loops; use the Agent SDK with a cron job instead. For cost circuit breakers and audit hooks that protect production sessions, see Chapter 5.

Always close idle sessions explicitly:

client.beta.sessions.update(session.id, status="completed")

Decision rule: Managed Agents vs Agent SDK

| Scenario | Use |
|---|---|
| Long-running task (>5 min), async, need cloud sandbox | Managed Agents |
| Agent needs to operate on files on your own server/filesystem | Agent SDK |
| You need custom in-process tool execution (Python functions) | Agent SDK |
| You're prototyping locally; no cloud infra budget yet | Agent SDK |
| You need to serve many concurrent agent sessions to end users | Managed Agents (they handle the infrastructure) |
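
The table above can be read as a small decision function. The sketch below encodes one row per flag. The tie-breaking order (local constraints win over hosted needs) and the default when no row applies are assumptions of this sketch, not rules stated in the table, and `choose_runtime` is a hypothetical helper name.

```python
def choose_runtime(
    needs_local_files: bool = False,
    needs_in_process_tools: bool = False,
    local_prototype: bool = False,
    long_running_async: bool = False,
    many_concurrent_sessions: bool = False,
) -> str:
    """Encode the decision table: local constraints first, then hosted needs."""
    if needs_local_files or needs_in_process_tools or local_prototype:
        return "Agent SDK"
    if long_running_async or many_concurrent_sessions:
        return "Managed Agents"
    return "Agent SDK"  # assumed default when no row applies


print(choose_runtime(long_running_async=True))
# prints: Managed Agents
print(choose_runtime(long_running_async=True, needs_local_files=True))
# prints: Agent SDK
```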

<Callout type="hot"> Managed Agents is in public beta as of April 2026. The managed-agents-2026-04-01 beta header is required on every request. Behaviors can be refined between releases. Two capabilities — outcomes and multiagent — are in research preview and require a separate access request at claude.com/form/claude-managed-agents. Do not build production features that depend on research-preview capabilities without direct Anthropic support. </Callout>

- Use Managed Agents for long-running (>5 min), async tasks needing a cloud sandbox; use the Agent SDK for short, stateless, webhook-triggered, or locally-executed work.
- Runtime pricing has two components: Managed Agents runtime plus standard Claude token costs — verify current rates in the official quickstart before launch.
- The `managed-agents-2026-04-01` beta header is required on every request; outcomes and multiagent are in research preview and require a separate access request.

Hands-on exercise

Ship a Managed Agents session streaming a data analysis task to your terminal.

  1. Create agent: model: "claude-opus-4-7", tools: [{ type: "agent_toolset_20260401" }]
  2. Create environment: type: "cloud", networking: { type: "unrestricted" }
  3. Create session; send: "Fetch https://jsonplaceholder.typicode.com/todos (10 items), filter completed, print titles. Run it."
  4. Print tool name per agent.tool_use, text per agent.message

Verify: At least one [Tool: bash] line, ending with [Session complete]. Est. time: 20 min

Rate limits

The 300 RPM create limit is shared across agent, environment, and session creates. Pre-create agents and environments once — only sessions are per-task:

```python
# Create once, store these IDs
AGENT_ID = "agt_01XxXxxXx"        # created once, reused forever
ENVIRONMENT_ID = "env_01YyYyyYy"  # created once, reused forever
```

What's next

Chapter 3 covers MCP tool servers — three transport modes and the permission grants that make them work.

References

[1] Claude Managed Agents Overview — https://platform.claude.com/docs/en/managed-agents/overview · retrieved 2026-04-30
[2] Claude Managed Agents Quickstart — https://platform.claude.com/docs/en/managed-agents/quickstart · retrieved 2026-04-30
[3] Agent Capabilities API announcement — https://claude.com/blog/agent-capabilities-api · retrieved 2026-04-30
[4] Managed Agents Beta Header Documentation — https://platform.claude.com/docs/en/api/beta-headers · retrieved 2026-05-14
[5] Claude Agent SDK Overview — https://code.claude.com/docs/en/agent-sdk/overview · retrieved 2026-04-30
[6] Model Context Protocol introduction — https://modelcontextprotocol.io/docs/getting-started/intro · retrieved 2026-05-14

Chapter 3 · 50 min

MCP connector: orchestrating multi-server agents


The MCP connector in the Claude Agent SDK attaches external tool servers (databases, APIs, browsers) to an agent at runtime. It supports three transport modes (stdio, HTTP, SSE) and handles connection management, tool discovery, and error signaling automatically [1]. For a breakdown of which community servers teams are actually deploying, see MCP server adoption 2026.

Key facts

  1. MCP tools are named mcp__<server-name>__<tool-name> — e.g., server "github" + tool list_issues = mcp__github__list_issues [1].
  2. MCP tools need explicit allowedTools grants; permissionMode: "acceptEdits" does NOT cover MCP [1].
  3. stdio: local process; HTTP: stateless remote; SSE: streaming remote. Default stdio timeout: 60 seconds [1].
  4. Tool search is enabled by default — withholds tool definitions from context and loads only what's needed per turn [1].

The MCP naming convention

Given mcpServers key "github", every tool is prefixed mcp__github__. Example:

mcp__github__list_issues
mcp__github__search_issues
mcp__github__create_issue
mcp__github__get_pull_request

mcp__github__* allows all tools from the server; mcp__github__list_issues allows only that one.

- MCP tools follow the naming pattern `mcp__<server-name>__<tool-name>` where the server name is the key used in `mcpServers`, not the package name.
- Use `mcp__<server>__*` wildcards during development; narrow to specific tool names in production to minimize blast radius.
- All MCP tools require explicit `allowedTools` grants — `permissionMode: "acceptEdits"` does not auto-approve MCP tool calls.
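
The naming rule and the wildcard grants can be checked mechanically, for example in a test that guards your production allow-list. The sketch below builds full tool names and tests them against an allow-list. How the SDK matches wildcards internally is not shown in this chapter, so `fnmatchcase` from the standard library is used here as a stand-in, and both helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def mcp_tool_name(server: str, tool: str) -> str:
    """Build the full tool name from the mcpServers key and the tool name."""
    return f"mcp__{server}__{tool}"


def is_granted(tool_name: str, allowed_tools: list[str]) -> bool:
    """True if any allow-list entry matches; `*` is treated as a wildcard."""
    return any(fnmatchcase(tool_name, pattern) for pattern in allowed_tools)


name = mcp_tool_name("github", "list_issues")
print(name)
# prints: mcp__github__list_issues
print(is_granted(name, ["mcp__github__*"]))
# prints: True
print(is_granted("mcp__github__create_issue", ["mcp__github__list_issues"]))
# prints: False
```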

The three transport types

stdio — local process servers

stdio is the most common transport for development and for community-published servers on npm or PyPI. The SDK spawns a child process and communicates over stdin/stdout.

```python
import asyncio
import os
from claude_agent_sdk import query, ClaudeAgentOptions

options = ClaudeAgentOptions(
    mcp_servers={
        "github": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"],
            # Read the token from the environment instead of hard-coding it
            "env": {"GITHUB_TOKEN": os.environ["GITHUB_TOKEN"]},
        }
    },
    allowed_tools=["mcp__github__list_issues", "mcp__github__search_issues"],
)

async def main():
    # `async for` is only valid inside a coroutine
    async for message in query(
        prompt="List the 5 most recent open issues in anthropics/claude-code",
        options=options,
    ):
        if hasattr(message, "result"):
            print(message.result)

asyncio.run(main())
```

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "List the 5 most recent open issues in anthropics/claude-code",
  options: {
    mcpServers: {
      github: {
        command: "npx",
        args: ["-y", "@modelcontextprotocol/server-github"],
        env: { GITHUB_TOKEN: process.env.GITHUB_TOKEN }
      }
    },
    allowedTools: ["mcp__github__list_issues", "mcp__github__search_issues"]
  }
})) {
  if (message.type === "result" && message.subtype === "success") {
    console.log(message.result);
  }
}
```

HTTP — stateless remote servers

Use HTTP for cloud-hosted servers that expose a standard MCP endpoint. No child process, no local installation required:

Python

options = ClaudeAgentOptions(
    mcp_servers={
        "claude-code-docs": {
            "type": "http",
            "url": "https://docs.anthropic.com/en/docs/claude-code/sdk/sdk-mcp",
        }
    },
    allowed_tools=["mcp__claude-code-docs__*"],
)

TypeScript

const options = {
  mcpServers: {
    "remote-api": {
      type: "http",
      url: "https://api.yourcompany.com/mcp",
      headers: {
        Authorization: `Bearer ${process.env.API_TOKEN}`
      }
    }
  },
  allowedTools: ["mcp__remote-api__*"]
};

SSE — streaming remote servers

SSE is the right transport when the remote server needs to push events as it processes (e.g., long-running queries, real-time data feeds):

import os

options = ClaudeAgentOptions(
    mcp_servers={
        "analytics-stream": {
            "type": "sse",
            "url": "https://analytics.yourcompany.com/mcp/sse",
            "headers": {"Authorization": f"Bearer {os.environ['ANALYTICS_TOKEN']}"},
        }
    },
    allowed_tools=["mcp__analytics-stream__*"],
)

The SDK transparently handles SSE reconnection — you don't need to manage the event stream yourself.

Orchestrating three servers in one agent

Multiple servers with different transports go in one mcpServers dict:

```python
import asyncio
import os
from claude_agent_sdk import (
    query, ClaudeAgentOptions, SystemMessage, ResultMessage, AssistantMessage
)

async def investigate_issue(issue_ref: str, db_connection: str):
    """Pull a GitHub issue, query related DB records, write a summary."""
    options = ClaudeAgentOptions(
        mcp_servers={
            # stdio: GitHub MCP server
            "github": {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-github"],
                "env": {"GITHUB_TOKEN": os.environ["GITHUB_TOKEN"]},
            },
            # stdio: Postgres MCP server
            "postgres": {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-postgres", db_connection],
            },
            # HTTP: Cloud docs server
            "docs": {
                "type": "http",
                "url": "https://docs.anthropic.com/en/docs/claude-code/sdk/sdk-mcp",
            },
        },
        allowed_tools=[
            "mcp__github__get_issue",
            "mcp__github__list_comments",
            "mcp__postgres__query",  # read-only
            "mcp__docs__*",  # all doc tools
        ],
    )

    prompt = (
        f"1. Fetch the GitHub issue at {issue_ref}. "
        "2. Query the postgres DB for any records mentioning the issue number. "
        "3. Look up relevant documentation from the docs server. "
        "4. Write a one-paragraph summary of what the issue is about and whether the DB has related data."
    )

    async for message in query(prompt=prompt, options=options):
        # Verify all three servers connected on the first message
        if isinstance(message, SystemMessage) and message.subtype == "init":
            servers = message.data.get("mcp_servers", [])
            for server in servers:
                status = server.get("status")
                name = server.get("name")
                if status != "connected":
                    print(f"WARNING: {name} failed to connect — {server}")
        # Show which MCP tools are being called
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if hasattr(block, "name") and block.name.startswith("mcp__"):
                    print(f"[MCP call: {block.name}]")
        if isinstance(message, ResultMessage) and message.subtype == "success":
            print(message.result)

asyncio.run(investigate_issue(
    issue_ref="anthropics/claude-code#1234",
    db_connection=os.environ["DATABASE_URL"],
))
```

- Multiple MCP servers with different transport types (stdio, HTTP, SSE) can be configured in a single `mcpServers` dict; the agent uses whichever tools match the task.
- Check the `mcp_servers` list in the `SystemMessage` init event before the agent starts work to catch connection failures before tokens are wasted.
- Never hard-code secrets in `mcpServers.env` — use `os.environ["KEY"]` or `process.env.KEY` to pull credentials from environment variables.

Why permissionMode: "acceptEdits" is not enough

The Agent SDK has three permission modes:

| Mode | What it auto-approves | Auto-approves MCP? |
| --- | --- | --- |
| default | Nothing; every tool call prompts for approval | No |
| acceptEdits | File edits and filesystem Bash commands | No |
| bypassPermissions | Everything, including MCP | Yes (but dangerous) |

acceptEdits does not cover MCP. The agent sees the tools but refuses to call them without explicit grants:

```python
# WRONG: permissionMode doesn't cover MCP
options = ClaudeAgentOptions(
    permission_mode="acceptEdits",
    mcp_servers={"github": github_config},
)
```

bypassPermissions disables all safety checks — do not use it to work around missing allowedTools. The complete production-safe permission model — combining allowedTools, permissionMode, and cost circuit breakers — is detailed in Chapter 5.
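One way to catch a missing grant before a run is to compare the configured server names against the allowedTools list. The sketch below assumes the `mcp__<server>__<tool>` naming used throughout this chapter. `servers_without_grants` is a hypothetical helper, not an SDK function, and it only inspects names rather than reproducing the SDK's own permission matching.

```python
def servers_without_grants(mcp_servers: dict, allowed_tools: list[str]) -> list[str]:
    """Return configured MCP server names that no allowed_tools entry refers to."""
    missing = []
    for name in mcp_servers:
        prefix = f"mcp__{name}__"
        if not any(tool.startswith(prefix) for tool in allowed_tools):
            missing.append(name)
    return missing


# Hypothetical configuration: "db" is configured but never granted.
servers = {"github": {"command": "npx"}, "db": {"command": "npx"}}
grants = ["mcp__github__get_issue", "Read"]
print(servers_without_grants(servers, grants))  # ['db']
```

Running a check like this at startup turns a silent refusal at call time into an explicit configuration error.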

Detecting connection failures

The SystemMessage with subtype init arrives before the agent does any work. Check its mcp_servers list — servers fail silently otherwise:

async for message in query(prompt=..., options=options):
    if isinstance(message, SystemMessage) and message.subtype == "init":
        failed = [
            s for s in message.data.get("mcp_servers", [])
            if s.get("status") != "connected"
        ]
        if failed:
            # Abort or handle gracefully before the agent wastes tokens
            raise RuntimeError(f"MCP servers failed to connect: {failed}")
for await (const message of query({ prompt, options })) {
  if (message.type === "system" && message.subtype === "init") {
    const failed = message.mcp_servers.filter(s => s.status !== "connected");
    if (failed.length > 0) {
      throw new Error(`MCP servers failed: ${JSON.stringify(failed)}`);
    }
  }
}

Common failure causes by transport:

  • stdio: npx not on PATH, package not published, missing env vars
  • HTTP: URL unreachable, invalid SSL certificate, wrong endpoint path
  • SSE: CORS headers missing on the server, auth token expired

Pre-warm slow stdio servers before querying to avoid the 60-second connection timeout.

- Check the `mcp_servers` list in the `SystemMessage` init event before the agent does any work — servers fail silently if you don't inspect this event.
- The three most common stdio failure causes are: `npx` not on PATH, missing environment variables, and servers that take longer than 60 seconds to start.
- Pre-warm slow server processes before starting a query to avoid the default 60-second connection timeout.
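Two of the stdio failure causes listed above (command not on PATH, missing environment variables) can be checked locally before any query starts. The following is a minimal sketch using only the standard library. `preflight_stdio` and the example server entries are hypothetical, and the check does not start the servers or measure startup time.

```python
import shutil
import sys

def preflight_stdio(mcp_servers: dict) -> list[str]:
    """Return human-readable problems for stdio servers, without starting them."""
    problems = []
    for name, cfg in mcp_servers.items():
        if "command" not in cfg:  # http/sse entries have no command to check
            continue
        if shutil.which(cfg["command"]) is None:
            problems.append(f"{name}: command {cfg['command']!r} not found on PATH")
        for key, value in cfg.get("env", {}).items():
            if not value:
                problems.append(f"{name}: env var {key} is empty")
    return problems

# Hypothetical entries: one healthy, one broken, one remote (skipped).
example = {
    "ok": {"command": sys.executable},
    "broken": {"command": "no-such-binary-xyz", "env": {"TOKEN": ""}},
    "remote": {"type": "http", "url": "https://example.invalid/mcp"},
}
for problem in preflight_stdio(example):
    print(problem)
```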

Project-level config with .mcp.json

Put shared servers in .mcp.json at the project root — the SDK loads it automatically:

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_TOKEN": "${GITHUB_TOKEN}"
      }
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "${DATABASE_URL}"]
    }
  }
}

${VAR} expands environment variables at load time — credentials stay out of code.
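Because expansion happens at load time, an unset variable may only surface later as a failed connection. A small check can list the `${VAR}` references a config file depends on and report any that are missing. This is a sketch under the assumption that references use exactly the `${NAME}` form shown above; `missing_env_vars` is a hypothetical helper, not part of the SDK.

```python
import json
import os
import re

VAR_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def missing_env_vars(config_text: str, env=None) -> list[str]:
    """List ${VAR} names referenced in the config that are unset or empty."""
    env = os.environ if env is None else env
    json.loads(config_text)  # fail early on malformed JSON
    names = sorted(set(VAR_REF.findall(config_text)))
    return [name for name in names if not env.get(name)]

sample = '''
{"mcpServers": {"github": {"command": "npx",
  "env": {"GITHUB_TOKEN": "${GITHUB_TOKEN}"}},
 "postgres": {"command": "npx", "args": ["${DATABASE_URL}"]}}}
'''
print(missing_env_vars(sample, env={"GITHUB_TOKEN": "placeholder"}))  # ['DATABASE_URL']
```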

Tool search for large tool sets

Tool search is enabled by default: the SDK withholds all tool definitions from context and loads only those relevant to each turn via vector similarity search. With 200 tools across servers, this prevents context exhaustion before any work begins. Disable per-server via mcpServers config if a server's tools always need to be in context.

- Tool search is enabled by default; it withholds all tool definitions from context and loads only tools relevant to each turn using vector similarity search.
- Without tool search, a system with 200 MCP tools sends every definition to Claude on every turn, consuming large amounts of context window before any work begins.
- Project-level `.mcp.json` files keep MCP config declarative and version-controllable; use `${VAR}` syntax for environment variable expansion.

Hands-on exercise

Wire GitHub (stdio) + Postgres (stdio) + Claude Code docs (HTTP) into one agent.

Setup: GITHUB_TOKEN (repo:read), DATABASE_URL (any Postgres instance).

Prompt: "Get the README from anthropics/claude-code. Check for an 'issues' table in postgres. Look up 'hooks' in the docs. Write a three-sentence summary."

Verify: init shows all 3 servers connected; at least 2 different mcp__* tool calls appear. Est. time: 25 min

Knowledge check · 1 of 1
Your agent is configured with `permissionMode: 'acceptEdits'` and an MCP server named `db`. You've added the server to `mcpServers` but NOT listed any MCP tools in `allowedTools`. What happens when Claude tries to call `mcp__db__query`?

What's next

Chapter 4 covers the Files API and code execution tool — upload documents once, reference by file_id, generate and download chart output.

References

[1] Agent SDK MCP Connector - https://code.claude.com/docs/en/sdk/sdk-mcp · retrieved 2026-06-14
[2] Model Context Protocol specification - https://modelcontextprotocol.io/docs/getting-started/intro · retrieved 2026-04-30
[3] MCP server registry - https://github.com/modelcontextprotocol/servers · retrieved 2026-04-30
[4] Claude Agent SDK Overview - https://code.claude.com/docs/en/agent-sdk/overview · retrieved 2026-04-30
[5] Agent Capabilities API announcement - https://claude.com/blog/agent-capabilities-api · retrieved 2026-04-30
[6] MCP OAuth 2.1 specification - https://modelcontextprotocol.io/specification/2025-03-26/basic/authorization · retrieved 2026-04-30

Chapter 4 · 45 min

Files API + code execution: the complete agent IO surface


The Anthropic Files API lets you upload a file once (up to 500 MB), receive a persistent file_id, and reference it across multiple Messages calls without retransmitting bytes. The savings are bandwidth and latency — you still pay full input tokens each time a file_id appears in a Messages request [1]. This chapter covers the complete IO surface: Files API for document persistence, code execution for computation, and downloading generated artifacts.

Key facts

  1. Beta header files-api-2025-04-14 required on every request [1].
  2. Max file size: 500 MB; workspace storage: 500 GB per org [1].
  3. Storage operations (upload, download, list, delete) are free; file content is billed as input tokens on each Messages reference [1].
  4. Code execution billed as container runtime (5-min minimum) plus normal token costs; verify current rate [7].
  5. Files API: not eligible for ZDR; not available on Bedrock or Vertex AI; any workspace API key can delete any file [1].
  6. You can only download files created by code execution or skills — not files you uploaded [1].

Content block types by file format

Each file type maps to a specific content block — using the wrong one returns a 400 error:

| File type | MIME type | Content block | Use case |
| --- | --- | --- | --- |
| PDF | application/pdf | document | Document analysis, citations |
| Plain text | text/plain | document | Logs, markdown, config files |
| JPEG, PNG, GIF, WebP | image/* | image | Visual analysis, screenshots |
| CSV, datasets, binaries | varies | container_upload | Code execution, data analysis |

For .docx, .xlsx, .md: convert to plain text or PDF first.

- PDFs and plain text use the `document` content block type; images use `image`; files passed to code execution use `container_upload` — using the wrong type returns a 400 error.
- The Files API beta header `files-api-2025-04-14` is required on every request.
- Maximum file size is 500 MB per file; total workspace storage is 500 GB per organization.
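The mapping from file type to content block can be captured in one small function so the wrong block type never reaches the API. This is a sketch, not SDK code: `content_block_for` is a hypothetical helper, the `document` and `image` shapes follow the examples later in this chapter, and the `container_upload` shape is an assumption to verify against the current Files API docs.

```python
def content_block_for(file_id: str, mime_type: str) -> dict:
    """Build the content block for an uploaded file based on its MIME type."""
    if mime_type in ("application/pdf", "text/plain"):
        return {"type": "document", "source": {"type": "file", "file_id": file_id}}
    if mime_type in ("image/jpeg", "image/png", "image/gif", "image/webp"):
        return {"type": "image", "source": {"type": "file", "file_id": file_id}}
    # Everything else (CSV, datasets, binaries) is routed to code execution.
    # Assumed shape: verify against the current documentation.
    return {"type": "container_upload", "file_id": file_id}

print(content_block_for("file_example", "image/png")["type"])  # image
```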

Uploading files

Install the Anthropic SDK (not the Agent SDK):

pip install anthropic

Upload a PDF and an image:

```python
from anthropic import Anthropic

client = Anthropic()
```

```typescript
import Anthropic, { toFile } from "@anthropic-ai/sdk";
import fs from "fs";

const anthropic = new Anthropic();

// Upload a PDF
const pdfFile = await anthropic.beta.files.upload({
  file: await toFile(
    fs.createReadStream("quarterly_report.pdf"),
    undefined,
    { type: "application/pdf" }
  ),
});
console.log(`PDF file_id: ${pdfFile.id}`);
```

The returned file_id is permanent until you delete it. Store it in your database alongside the document metadata.
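A small wrapper keeps the upload call in one place and returns only the ID you need to store. The sketch below assumes the `client.beta.files.upload(file=(name, handle, mime_type))` call shape used elsewhere in this chapter. `upload_file` is a hypothetical helper, and the MIME type is guessed from the file extension, which may not suit every file.

```python
import mimetypes
from pathlib import Path

def upload_file(client, path: str) -> str:
    """Upload one file via client.beta.files.upload and return its file_id."""
    p = Path(path)
    # Guess the MIME type from the extension; fall back to a generic binary type.
    mime = mimetypes.guess_type(p.name)[0] or "application/octet-stream"
    with p.open("rb") as f:
        uploaded = client.beta.files.upload(file=(p.name, f, mime))
    return uploaded.id

# Usage, assuming `client = Anthropic()` and that these files exist locally:
# pdf_id = upload_file(client, "quarterly_report.pdf")
# image_id = upload_file(client, "chart.png")  # hypothetical file name
```

Taking the client as a parameter also makes the helper easy to exercise with a stub in unit tests.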

Referencing files in Messages calls

Once uploaded, reference the file_id using the appropriate content block type. You don't need the file's bytes — just the ID:

```python
# Three queries against the same PDF: only one upload needed
questions = [
    "What were the total revenues in Q1?",
    "List the top 3 risk factors mentioned in this report.",
    "What is management's outlook for Q2?",
]

for question in questions:
    response = client.beta.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "document",
                    "source": {
                        "type": "file",
                        "file_id": pdf_file.id,
                    },
                    "citations": {"enabled": True},  # request inline citations
                },
            ],
        }],
        betas=["files-api-2025-04-14"],
    )
    print(f"\n{question}")
    print(response.content[0].text)
```

For images, use the image content block type:

response = client.beta.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what this chart shows."},
            {
                "type": "image",
                "source": {
                    "type": "file",
                    "file_id": image_file.id,
                },
            },
        ],
    }],
    betas=["files-api-2025-04-14"],
)

The billing reality

Storage operations are free. Every Messages call that references a file_id bills the file content as input tokens — 100 queries against the same PDF cost 100× the document's token price. Code execution adds container runtime cost on top. Use extended prompt caching (1-hour TTL) when querying the same document many times in one session to drop repeated calls to ~10% of full input price. For session-level cost controls and PreToolUse circuit breakers, see Chapter 5.

- File storage operations (upload, download, list, metadata, delete) are free; file content is billed as input tokens every time a `file_id` is referenced in a Messages request.
- The "upload once" pitch saves bandwidth and latency but not token cost — 100 queries against the same file cost 100× the document's token price.
- Enable 1-hour extended prompt caching when running many queries against the same document in one session to reduce per-call costs to approximately 10% of full input price.
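The arithmetic behind these bullets is easy to sanity-check. The sketch below counts billed input in full-price token equivalents rather than dollars, so no pricing is assumed. The 40,000-token document is a hypothetical example, and the 0.10 factor simply mirrors the approximate 10% figure above. It ignores any cache-write surcharge and assumes every later call lands inside the cache TTL.

```python
def input_token_equivalents(doc_tokens: int, calls: int,
                            cached: bool = False,
                            cache_read_factor: float = 0.10) -> float:
    """Estimate billed input for `calls` requests that each reference one document."""
    if calls <= 0:
        return 0.0
    if not cached:
        return float(doc_tokens * calls)
    # First call at full price, the remaining calls at the cache-read rate.
    return doc_tokens + doc_tokens * cache_read_factor * (calls - 1)

# Hypothetical 40,000-token PDF queried 100 times in one session.
print(f"{input_token_equivalents(40_000, 100):,.0f}")               # 4,000,000
print(f"{input_token_equivalents(40_000, 100, cached=True):,.0f}")  # 436,000
```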

Code execution with the Files API

Unlike MCP tool servers in Chapter 3 — which connect to external services — code execution runs within Anthropic's infrastructure. Pass files via container_upload blocks, run code, and download output files:

```python
# Upload a dataset for code execution
with open("sales_data.csv", "rb") as f:
    dataset = client.beta.files.upload(
        file=("sales_data.csv", f, "text/plain"),
    )
```

Now download the generated chart:

# Download the generated chart
chart_content = client.beta.files.download(output_file_id)
chart_content.write_to_file("monthly_totals.png")
print("Chart downloaded to monthly_totals.png")
Try this · claude-sonnet-4-5

I have a CSV with columns: month, product, revenue. Using the code execution tool, compute the top 3 products by total revenue and create a horizontal bar chart. Return the file_id of the saved PNG.

Show expected output
Claude writes Python code using pandas and matplotlib. The code reads the CSV from the container, computes `.groupby('product')['revenue'].sum().nlargest(3)`, generates a horizontal bar chart with `plt.barh()`, saves it as `top_products.png`. The tool_result block includes a `file_id` for the output PNG that can be passed to `client.beta.files.download()`.
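Locating the output `file_id` depends on the exact shape of the tool result block, which varies by tool version. A structure-agnostic sketch is to walk the response as plain data and collect every `file_id` it contains. `collect_file_ids` and the sample payload below are hypothetical. Converting a real SDK response to a dict first (for example with `response.model_dump()`) is an assumption to verify against your SDK version.

```python
def collect_file_ids(node) -> list[str]:
    """Recursively gather every 'file_id' string in nested dicts and lists."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "file_id" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_file_ids(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_file_ids(item))
    return found

# Hypothetical, simplified response payload; real block names may differ.
payload = {"content": [
    {"type": "text", "text": "Chart saved."},
    {"type": "tool_result", "content": {"files": [{"file_id": "file_out_1"}]}},
]}
print(collect_file_ids(payload))  # ['file_out_1']
```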

File lifecycle management

Files persist until you explicitly delete them. For production agents, you need a retention policy:

```python
import datetime

def cleanup_old_files(client: Anthropic, max_age_days: int = 30) -> int:
    """Delete files older than max_age_days and return how many were removed."""
    cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=max_age_days)
    deleted = 0
    # Iterating the list result directly lets the SDK fetch later pages as needed.
    for file in client.beta.files.list():
        created = file.created_at
        if isinstance(created, str):  # tolerate an ISO 8601 string as well as a datetime
            created = datetime.datetime.fromisoformat(created)
        if created < cutoff:
            client.beta.files.delete(file.id)
            deleted += 1
    return deleted
```

- Files persist until explicitly deleted; build a retention policy from day one to avoid hitting the 500 GB per-organization storage limit.
- A `cleanup_old_files()` function that checks `created_at` and deletes stale entries is the minimum viable retention policy for production agents.
- The Files API rate limit during beta is approximately 100 requests per minute; batch bulk uploads during off-peak windows if you need to ingest many documents at once.
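For bulk ingestion, a simple pacing generator keeps a loop under a requests-per-minute budget. This is a sketch: `paced` is a hypothetical helper, the default of 100 per minute is taken from the approximate figure above, and the sleep and clock functions are injectable so the behavior can be tested without waiting.

```python
import time

def paced(items, per_minute: int = 100, sleep=time.sleep, clock=time.monotonic):
    """Yield items no faster than per_minute, sleeping between them when needed."""
    interval = 60.0 / per_minute
    next_slot = clock()
    for item in items:
        now = clock()
        if now < next_slot:
            sleep(next_slot - now)
        # Schedule the next slot one interval after this one.
        next_slot = max(now, next_slot) + interval
        yield item

# Usage sketch: wrap the list of paths before uploading each one.
# for path in paced(paths_to_upload):
#     ...upload path here...
```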

Extended prompt caching with Files API

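Add cache_control: {type: "ephemeral"} to a document block to cache it. The first call pays full token cost (cache writes may carry a surcharge, so check current pricing), and subsequent calls within the TTL window pay ~10% of full input price. A bare ephemeral entry uses the default short TTL; to get the 1-hour window, set the cache TTL explicitly as described in the current prompt caching docs: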

response = client.beta.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What are the payment terms?"},
            {
                "type": "document",
                "source": {"type": "file", "file_id": pdf_file.id},
                "cache_control": {"type": "ephemeral"},  # cache this document
            },
        ],
    }],
    betas=["files-api-2025-04-14", "prompt-caching-2024-07-31"],
)

Cache is keyed on exact content — re-uploading a changed file resets it.

Hands-on exercise

Upload a PDF once and run three analytical queries; bonus: download a generated chart.

  1. Upload any PDF; store file_id
  2. Query 1: "What is the main topic? Summarize in 3 sentences."
  3. Query 2: "List all named organizations mentioned."
  4. Query 3: "What are the 5 most important statistics?" with citations: {enabled: true}
  5. Bonus: Pass Q2 org list as CSV to code execution; download output PNG

Verify: Same file_id in all 3 calls; Q3 includes inline citations. Est. time: 20 min (30 with bonus)

Knowledge check · 1 of 1
A developer uploads a 2 MB PDF to the Files API and then uses the same file_id in 50 separate Messages API calls over one month. Which costs does she pay?

What's next

Chapter 5 covers production hardening: hooks, cost circuit breakers, and the deployment checklist.

References

[1] Files API - https://platform.claude.com/docs/en/build-with-claude/files · retrieved 2026-04-30
[2] Agent Capabilities API announcement - https://claude.com/blog/agent-capabilities-api · retrieved 2026-04-30
[3] Claude Managed Agents Tools - https://platform.claude.com/docs/en/managed-agents/tools · retrieved 2026-04-30
[4] Code Execution Tool - https://platform.claude.com/docs/en/agents-and-tools/tool-use/code-execution-tool · retrieved 2026-04-30
[5] Files API Reference - https://platform.claude.com/docs/en/api/files-list · retrieved 2026-05-14
[6] Anthropic API and data retention - https://platform.claude.com/docs/en/build-with-claude/api-and-data-retention · retrieved 2026-05-14
[7] Current code execution tool reference - https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/code-execution-tool · retrieved 2026-05-27

Chapter 5 · 45 min

Production: deploy + observability + cost controls


The Agent SDK hook system attaches Python or TypeScript callbacks to agent events: PreToolUse, PostToolUse, UserPromptSubmit, Stop, and permission events. Python SDK callbacks do not support SessionStart or SessionEnd; TypeScript callbacks add those [3]. The biggest production failure mode is cost runaway — this chapter gives you the four hooks and deployment checklist to prevent it.

Key facts

  1. Python SDK callbacks: tool, prompt, stop, compaction, permission, notification, subagent events — no SessionStart/SessionEnd; TypeScript adds session lifecycle [3].
  2. bypassPermissions disables ALL safety checks including file-edit prompts and destructive Bash confirmations [1].
  3. Session JSONL files: ~/.claude/sessions/ by default; redirect with CLAUDE_SESSIONS_DIR [1].
  4. PreToolUse can deny/allow before execution; PostToolUse runs after — use for logging, not prevention [3].

The hook system

Hooks are callbacks that run in your process; in the Python SDK they are declared as async functions. HookMatcher filters by tool name via regex:

```python
from claude_agent_sdk import query, ClaudeAgentOptions, HookMatcher

async def my_hook(input_data: dict, tool_use_id: str, context: dict) -> dict:
    # Return {} to pass through, or raise to block
    return {}

options = ClaudeAgentOptions(
    hooks={
        "PostToolUse": [
            HookMatcher(matcher="Edit|Write", hooks=[my_hook])
        ]
    }
)
```

The matcher is a Python regex. "Edit|Write" matches any tool whose name contains "Edit" or "Write". Use ".*" to match everything.

- Hooks are callback functions (async in Python) that run in your process before or after tool calls; `HookMatcher` filters by tool name using a Python regex.
- `PreToolUse` runs before execution — use it to block risky calls before side effects occur; `PostToolUse` runs after — use it for logging and audit, not prevention.
- Python SDK callbacks do not support `SessionStart` or `SessionEnd`; TypeScript SDK callbacks add these session lifecycle events.
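Because the matcher is described as matching any tool whose name contains the pattern, an unanchored pattern can select more tools than intended. The sketch below previews which names a pattern would pick, assuming search-style (substring) matching as described above. `matching_tools` is a hypothetical helper and the tool names are examples only.

```python
import re

def matching_tools(matcher: str, tool_names: list[str]) -> list[str]:
    """Return the tool names a matcher pattern would select under re.search."""
    pattern = re.compile(matcher)
    return [name for name in tool_names if pattern.search(name)]

tools = ["Read", "Edit", "MultiEdit", "Write", "Bash", "mcp__github__get_issue"]
print(matching_tools("Edit|Write", tools))      # ['Edit', 'MultiEdit', 'Write']
print(matching_tools("^(Edit|Write)$", tools))  # ['Edit', 'Write']
print(matching_tools("^mcp__", tools))          # ['mcp__github__get_issue']
```

Anchoring with `^` and `$` is the safer default when a hook should apply to exact tool names only.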

Hook 1: Audit log (PostToolUse)

```python
import asyncio
import json
import logging
from datetime import datetime
from claude_agent_sdk import query, ClaudeAgentOptions, HookMatcher

# Module-level logger used by the hook below
logger = logging.getLogger(__name__)

async def audit_file_change(input_data: dict, tool_use_id: str, context: dict) -> dict:
    tool_input = input_data.get("tool_input", {})
    file_path = tool_input.get("file_path", tool_input.get("path", "unknown"))
    tool_name = input_data.get("tool_name", "unknown")
    log_entry = {
        "event": "file_modified",
        "timestamp": datetime.utcnow().isoformat() + "Z",
        "tool": tool_name,
        "file_path": file_path,
        "session_id": context.get("session_id", "unknown"),
        "tool_use_id": tool_use_id,
    }
    logger.info(json.dumps(log_entry))
    return {}  # pass through, don't block

options = ClaudeAgentOptions(
    allowed_tools=["Read", "Write", "Edit", "Bash", "Glob", "Grep"],
    hooks={
        "PostToolUse": [
            HookMatcher(matcher="Edit|Write", hooks=[audit_file_change])
        ]
    }
)
```

Sample audit output:

```json
{"event": "file_modified", "timestamp": "2026-04-30T10:23:44Z", "tool": "Edit", "file_path": "src/auth.py", "session_id": "sess_01XxXxxXx", "tool_use_id": "toolu_01Abc123"}
```

Hook 2: Cost circuit breaker (PreToolUse)

Use PreToolUse to block tool calls before filesystem or MCP side effects occur:

```python
class CostCircuitBreaker:
    """Deny the next tool call once the application-managed token cap is reached."""

    def __init__(self, max_input_tokens: int = 500_000):
        self.max_input_tokens = max_input_tokens
        self.total_input_tokens = 0

    def update_usage(self, usage: dict) -> None:
        # Call this from your message loop when result/usage metadata is available.
        self.total_input_tokens = usage.get("input_tokens", self.total_input_tokens)

    async def check_cost(self, input_data: dict, tool_use_id: str, context: dict) -> dict:
        if self.total_input_tokens > self.max_input_tokens:
            return {
                "hookSpecificOutput": {
                    "hookEventName": input_data["hook_event_name"],
                    "permissionDecision": "deny",
                    "permissionDecisionReason": (
                        f"Circuit breaker triggered: {self.total_input_tokens:,} input tokens "
                        f"exceeds cap of {self.max_input_tokens:,}. Tool call blocked before execution."
                    ),
                }
            }
        return {}

circuit_breaker = CostCircuitBreaker(max_input_tokens=500_000)

options = ClaudeAgentOptions(
    allowed_tools=["Read", "Write", "Edit", "Bash", "Glob", "Grep"],
    hooks={
        "PreToolUse": [
            HookMatcher(matcher=".*", hooks=[circuit_breaker.check_cost])
        ]
    }
)
```

When circuit_breaker.check_cost returns permissionDecision: "deny", the current tool call is blocked before it executes and Claude receives the denial reason as feedback. The session JSONL is preserved, so you can inspect exactly what happened.

<Callout type="hot"> Do NOT block silently inside hooks. When a hook denies a tool call in production, you need the context to diagnose it. Log the full input_data, tool_use_id, and denial reason before returning permissionDecision: "deny". </Callout>
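The advice in the callout can be folded into one helper that logs first and then builds the deny response, so no hook can deny without leaving a trace. This is a sketch: `deny_with_log` is a hypothetical helper, the response shape copies the circuit breaker example above, and the sample hook input is simplified.

```python
import json
import logging

logger = logging.getLogger("agent.hooks")

def deny_with_log(input_data: dict, tool_use_id: str, reason: str) -> dict:
    """Log the full context of a denial, then build the deny response."""
    logger.warning(json.dumps({
        "event": "tool_denied",
        "tool_use_id": tool_use_id,
        "reason": reason,
        "input_data": input_data,
    }, default=str))  # default=str keeps non-JSON values from raising
    return {
        "hookSpecificOutput": {
            "hookEventName": input_data.get("hook_event_name", "PreToolUse"),
            "permissionDecision": "deny",
            "permissionDecisionReason": reason,
        }
    }

# Hypothetical hook input; a real payload carries more fields.
sample_input = {"hook_event_name": "PreToolUse", "tool_name": "Edit",
                "tool_input": {"file_path": "src/auth.py"}}
result = deny_with_log(sample_input, "toolu_example", "token cap exceeded")
print(result["hookSpecificOutput"]["permissionDecision"])  # deny
```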

Hook 3: Session lifecycle telemetry

SessionStart/SessionEnd are TypeScript-only SDK callbacks. In Python, emit the session-start event when the first message arrives:

async def log_session_start_from_first_message(message, logger):
    session_id = getattr(message, "session_id", "unknown")
    start_event = {
        "event": "session_started",
        "timestamp": datetime.utcnow().isoformat() + "Z",
        "session_id": session_id,
        "environment": os.environ.get("DEPLOY_ENV", "development"),
    }
    logger.info(json.dumps(start_event))

TypeScript for true SessionStart support:

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

const sessionStart = async (input, toolUseId, context) => {
  console.log(JSON.stringify({
    event: "session_started",
    session_id: input.session_id,
    cwd: input.cwd,
    timestamp: new Date().toISOString()
  }));
  return {};
};

for await (const message of query({
  prompt: "Run the production agent",
  options: {
    hooks: {
      SessionStart: [{ hooks: [sessionStart] }]
    }
  }
})) {
  console.log(message);
}
```

Hook 4: Prompt sanitization (UserPromptSubmit)

Fires before the user message reaches the model — strip PII here:

```python
import json
import logging
import re

logger = logging.getLogger(__name__)

PHONE_RE = re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b')
SSN_RE = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')

async def sanitize_prompt(input_data: dict, tool_use_id: str, context: dict) -> dict:
    prompt = input_data.get("prompt", "")
    # Redact phone numbers and SSNs
    cleaned = PHONE_RE.sub("[PHONE_REDACTED]", prompt)
    cleaned = SSN_RE.sub("[SSN_REDACTED]", cleaned)
    if cleaned != prompt:
        logger.warning(json.dumps({
            "event": "pii_redacted",
            "session_id": context.get("session_id"),
            # Record only the patterns that actually matched
            "patterns_found": [
                name for name, rx in (("phone", PHONE_RE), ("ssn", SSN_RE))
                if rx.search(prompt)
            ],
        }))
    # Return modified input_data with cleaned prompt
    return {**input_data, "prompt": cleaned}

options = ClaudeAgentOptions(
    hooks={
        "UserPromptSubmit": [
            HookMatcher(matcher=".*", hooks=[sanitize_prompt])
        ],
        # ... other hooks
    }
)
```

- `UserPromptSubmit` fires before the user message reaches the model — use it to strip PII, redact phone numbers and SSNs, and prevent sensitive data from entering the model's context.
- Always log when PII redaction occurs, including which pattern was found and the session ID, to maintain compliance audit trails.
- The four production hooks together cover the full agent lifecycle: input sanitization, pre-execution cost control, post-execution audit logging, and session telemetry.
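The redaction step is easiest to test when it is a pure function, separate from the hook plumbing. The sketch below reuses the two patterns from the sanitization hook. `redact` is a hypothetical helper, and two regexes are nowhere near a complete PII filter.

```python
import re

# Order is fixed so the reported labels are deterministic.
PATTERNS = [
    ("ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN_REDACTED]"),
    ("phone", re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b"), "[PHONE_REDACTED]"),
]

def redact(text: str) -> tuple[str, list[str]]:
    """Return the redacted text and the labels of the patterns that matched."""
    found = []
    for label, pattern, replacement in PATTERNS:
        text, count = pattern.subn(replacement, text)
        if count:
            found.append(label)
    return text, found

print(redact("Call 555-123-4567 about SSN 123-45-6789."))
# ('Call [PHONE_REDACTED] about SSN [SSN_REDACTED].', ['ssn', 'phone'])
```

A hook can call this function and log the returned labels without ever logging the original text.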

The complete production hook stack

def production_options(
    allowed_tools: list[str],
    mcp_servers: dict = None,
    max_input_tokens: int = 500_000,
    permission_mode: str = "acceptEdits",
    breaker: CostCircuitBreaker = None,
) -> ClaudeAgentOptions:
    # Pass in your own breaker and keep the reference so your message loop
    # can call breaker.update_usage(); without that the counter never moves.
    cb = breaker or CostCircuitBreaker(max_input_tokens=max_input_tokens)

    return ClaudeAgentOptions(
        allowed_tools=allowed_tools,
        mcp_servers=mcp_servers or {},
        permission_mode=permission_mode,
        hooks={
            "UserPromptSubmit": [
                HookMatcher(matcher=".*", hooks=[sanitize_prompt])
            ],
            "PreToolUse": [
                HookMatcher(matcher=".*", hooks=[cb.check_cost]),
            ],
            "PostToolUse": [
                HookMatcher(matcher="Edit|Write", hooks=[audit_file_change]),
            ],
        }
    )

Usage — apply to the MCP agent from Chapter 3 or any agent with multiple tool calls:

# Apply to the MCP agent from Chapter 3
async for message in query(
    prompt="Investigate issue #1234 and write a summary",
    options=production_options(
        allowed_tools=["mcp__github__*", "mcp__postgres__query", "mcp__docs__*"],
        mcp_servers={
            "github": github_config,
            "postgres": postgres_config,
            "docs": docs_config,
        },
        max_input_tokens=1_000_000,  # verify the matching token budget against current model pricing
    ),
):
    if hasattr(message, "result"):
        print(message.result)
Try this · claude-sonnet-4-6

I'm running an agent with a PreToolUse hook backed by an application-managed token counter. After 12 tool calls, the counter shows 505,000 input tokens against a cap of 500,000. The agent is about to …

Show expected output
Claude explains: the PreToolUse hook runs before the next Edit call executes. Because the tracked total is already over 500,000 input tokens, the hook returns permissionDecision: deny with a reason. The first pending Edit is blocked before it writes to disk, and Claude receives feedback that the budget cap was exceeded. A PostToolUse guard would be too late for the edit that triggered it.

Langfuse integration

For a broader look at Langfuse setup and why it fits agent workloads, see AI agent observability with Langfuse 2026. Create the trace on first session message; add spans from PostToolUse hooks:

```python
import os

from langfuse import Langfuse

langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host=os.environ.get("LANGFUSE_HOST", "http://localhost:3100"),
)

# One trace per agent session, keyed by session ID
traces_by_session = {}

def langfuse_session_start(message):
    session_id = getattr(message, "session_id", "unknown")
    trace = langfuse.trace(
        id=session_id,
        name="agent_session",
        metadata={"environment": os.environ.get("DEPLOY_ENV", "dev")},
    )
    traces_by_session[session_id] = trace
    return trace

async def langfuse_tool_log(input_data: dict, tool_use_id: str, context: dict) -> dict:
    trace = traces_by_session.get(input_data.get("session_id"))
    if trace:
        trace.span(
            name=input_data.get("tool_name", "unknown_tool"),
            input=input_data.get("tool_input"),
            metadata={"tool_use_id": tool_use_id},
        )
    return {}
```

The five-step deployment checklist

1. Permissions are minimal
   - allowedTools names specific tools; no .* wildcards
   - permissionMode: acceptEdits or default, never bypassPermissions

2. Cost controls are wired
   - PreToolUse circuit breaker with a tested token cap
   - Session timeout (Managed Agents: explicit status="completed")

3. Audit logging is active
   - Every Edit/Write logged: file path + session ID + timestamp
   - Structured JSON, not print statements

4. Secrets are out of config
   - No API keys in mcpServers.env; use os.environ["KEY"]
   - .mcp.json uses ${VAR} syntax

5. Session files have a retention policy
   - CLAUDE_SESSIONS_DIR with log rotation
   - JSONL files off user-facing storage

- Never use `bypassPermissions` in production; combine `permissionMode: "acceptEdits"` with explicit `allowedTools` grants to cover both file edits and MCP tool calls safely.
- Production agents must pass five checks: minimal permissions, wired cost controls, active audit logging, secrets out of config, and session files with a retention policy.
- Structured JSON logs — not print statements — enable per-session cost breakdown, error rate by tool type, and session duration distribution from day one.
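The first checklist item lends itself to an automated check in CI. The sketch below flags settings that conflict with it. `permission_findings` is a hypothetical helper, and treating a trailing `*` as a server-wide wildcard is an assumption about how you choose to write grants, not SDK behavior.

```python
def permission_findings(allowed_tools: list[str], permission_mode: str) -> list[str]:
    """Flag settings that conflict with the 'minimal permissions' check."""
    findings = []
    if permission_mode == "bypassPermissions":
        findings.append("permission_mode is bypassPermissions")
    if not allowed_tools:
        findings.append("allowed_tools is empty")
    for tool in allowed_tools:
        if tool in ("*", ".*"):
            findings.append(f"catch-all grant: {tool}")
        elif tool.endswith("*"):
            findings.append(f"server-wide wildcard: {tool}")
    return findings

print(permission_findings(["Read", "mcp__docs__*"], "acceptEdits"))
# ['server-wide wildcard: mcp__docs__*']
print(permission_findings(["Read", "Edit"], "bypassPermissions"))
# ['permission_mode is bypassPermissions']
```

Whether a server-wide wildcard is acceptable is a judgment call per server, so the function reports it rather than rejecting it.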

Hands-on exercise

Add the production hook stack to an existing agent and verify the circuit breaker fires.

  1. Apply production_options() to any multi-tool agent
  2. Set max_input_tokens=50_000 (intentionally low)
  3. Run: "Analyze every Python file in this directory and summarize each one's purpose"
  4. Confirm circuit breaker fires: permissionDecision: "deny" appears before all files are processed
  5. Check logs for file_modified and session_started entries

Verify: Session stops mid-run; raising cap to 2M allows full completion. Est. time: 20 min

Try this · claude-sonnet-4-6

I need to run a Claude Agent in a CI/CD pipeline where there's no human to approve tool calls. The agent reads test results, edits configuration files, and runs bash commands to restart services. What…

Show expected output
Claude recommends: use allowedTools with an explicit list (e.g. ['Read', 'Edit', 'Bash']) plus permissionMode: 'acceptEdits' — not bypassPermissions. This pre-approves file edits and Bash without disabling all safety checks. The agent can still be stopped by hooks. Risks to document: (1) Bash is allowed and can run destructive commands — scope the working directory; (2) Edit can overwrite production config — add a PostToolUse hook that logs every edit to a change log; (3) No human review means runaway loops go undetected — add a token circuit breaker.

What's next

The capstone project ties all five chapters together: a production research agent that orchestrates GitHub + Postgres + a cloud docs MCP server, uses the Files API for document context, and runs behind the complete hook stack. Details in the course outline.

References

[1] Claude Agent SDK Overview - https://code.claude.com/docs/en/sdk · retrieved 2026-06-14
[2] Claude Managed Agents Overview - https://platform.claude.com/docs/en/managed-agents/overview · retrieved 2026-04-30
[3] Agent SDK Hooks - https://code.claude.com/docs/en/agent-sdk/hooks · retrieved 2026-05-14
[4] Claude Agent SDK Permissions - https://code.claude.com/docs/en/agent-sdk/permissions · retrieved 2026-04-30
[5] Files API - https://platform.claude.com/docs/en/build-with-claude/files · retrieved 2026-04-30
[6] Langfuse Observability - https://langfuse.com · retrieved 2026-04-30