← All courses · 240 min · 5 chapters · Builder · Anthropic

Production Agents with Claude Agent SDK + MCP Connector

Python or TypeScript developers who have used the Claude Messages API at least once and understand what an API key is. New to the Agent SDK, Managed Agents, and MCP.

What you'll learn
  • Migrate a project from the Claude Code SDK to the Claude Agent SDK without breaking changes
  • Choose between Managed Agents and Agent SDK for a production workload with confidence
  • Wire three MCP servers (stdio + HTTP + SSE) into a single agent with proper auth and error handling
  • Upload, reference, and manage files with the Files API across multi-turn agent sessions
  • Deploy a production agent with structured logging, cost circuit breakers, and observability hooks
Chapters in this course
1. What changed when Claude Code SDK became Claude Agent SDK — 35m
2. Managed Agents beta — when to use it, when to roll your own — 45m
3. MCP connector: orchestrating multi-server agents — 50m
4. Files API + code execution: the complete agent IO surface — 45m
5. Production: deploy + observability + cost controls — 45m
Chapter 1 · 35 min

What changed when Claude Code SDK became Claude Agent SDK

The Claude Agent SDK is Anthropic's official library for embedding an autonomous agent loop — including built-in file operations, shell execution, web access, and subagent spawning — directly into a Python or TypeScript application. It was renamed from the Claude Code SDK in April 2026, alongside the public beta of Claude Managed Agents.

On April 8, 2026, Anthropic simultaneously shipped the renamed SDK, the Managed Agents REST API, and an explicit MCP connector guide. The rename wasn't a rebrand of the package alone; it came with a branding prohibition — partners may no longer call their products "Claude Code" or use Claude Code ASCII art — and with a note that Opus 4.7 requires SDK version v0.2.111 or later [1].

> Prerequisites: None — this is Chapter 1.
>
> Time: 35 minutes
>
> Learning objectives: By the end of this chapter you can install the renamed SDK, update your imports, run your first query() call, and explain what the rename means for your production roadmap.

Key facts

  1. The npm package changed from @anthropic-ai/claude-code-sdk to @anthropic-ai/claude-agent-sdk; the PyPI package changed from claude-code to claude-agent-sdk [1].
  2. Opus 4.7 (claude-opus-4-7) requires Agent SDK v0.2.111 or later; older SDK versions throw a thinking.type.enabled API error when targeting Opus 4.7 [1].
  3. The TypeScript SDK bundles a native Claude Code binary for your platform as an optional dependency — you no longer need a separate Claude Code installation [1].
  4. Authentication on Amazon Bedrock, Google Vertex AI, and Microsoft Azure Foundry is controlled entirely by environment variables (CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX, CLAUDE_CODE_USE_FOUNDRY), not constructor arguments [1].
  5. The branding guidelines explicitly prohibit partners from using the names "Claude Code," "Claude Code Agent," or Claude Code-branded ASCII art — a signal that the SDK is now a platform, not a feature of a specific product [1].
  6. Session state is stored as JSONL on your filesystem and can be resumed by passing resume: sessionId in your options [1].

The rename isn't cosmetic

Most developers saw the April 2026 announcement and ran npm install @anthropic-ai/claude-agent-sdk. Done, right? Not quite.

The rename matters strategically because it de-couples the SDK from Claude Code the developer product. Claude Code is a terminal app; the Claude Agent SDK is now a general-purpose platform library. By prohibiting partners from calling their products "Claude Code," Anthropic is drawing a hard line: Claude Code is the consumer app, the Agent SDK is the infrastructure you build on. If you're building a product on top of this SDK, that distinction matters for your own naming and positioning.

There's also a real technical signal in the version requirement. Requiring v0.2.111 for Opus 4.7 means Anthropic is now coupling model releases to SDK versions in a way they weren't before. You need to track SDK versions actively, not just pin to a major.
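One way to enforce that tracking is a small startup guard. The helper below is our own sketch, not part of the SDK; only the 0.2.111 floor and the `claude-agent-sdk` package name come from the facts above.

```python
# Startup guard (sketch): fail fast if the installed SDK is too old for the
# model you target. version_at_least is our own helper, not an SDK API.
from importlib.metadata import version, PackageNotFoundError

def version_at_least(installed: str, required: str) -> bool:
    """Compare dotted version strings numerically, e.g. '0.2.111' >= '0.2.100'."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(required)

def check_sdk(required: str = "0.2.111") -> None:
    try:
        installed = version("claude-agent-sdk")
    except PackageNotFoundError:
        raise RuntimeError("claude-agent-sdk is not installed")
    if not version_at_least(installed, required):
        raise RuntimeError(
            f"claude-agent-sdk {installed} < {required}; upgrade before targeting Opus 4.7"
        )
```

Call `check_sdk()` once at process start; a version mismatch then surfaces as a clear error instead of a confusing `thinking.type.enabled` API failure mid-run.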

Installing the renamed SDK

TypeScript

```bash
# Remove the old package
npm uninstall @anthropic-ai/claude-code-sdk

# Install the renamed package
npm install @anthropic-ai/claude-agent-sdk
```

Python

```bash
# Remove the old package
pip uninstall claude-code

# Install the renamed package
pip install claude-agent-sdk
```

After installing, verify the version:

```bash
# TypeScript: check package.json
cat package.json | grep claude-agent-sdk
# → "@anthropic-ai/claude-agent-sdk": "^0.2.111"

# Python: check the installed version
pip show claude-agent-sdk | grep Version
```

Updating your imports

Every import in your existing code needs to change. This is a search-and-replace operation, not a logic change.
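The replace step can be scripted. The sketch below uses GNU sed (on macOS, use `sed -i ''`); the `/tmp/migrate-demo` directory is purely illustrative — point the `grep` at your real source tree.

```shell
# Demo setup: a sample file that still uses the old import.
mkdir -p /tmp/migrate-demo/src
cat > /tmp/migrate-demo/src/agent.ts <<'EOF'
import { query } from "@anthropic-ai/claude-code-sdk";
EOF

# Rewrite every file that references the old package name (GNU sed syntax).
grep -rl "claude-code-sdk" /tmp/migrate-demo/src \
  | xargs sed -i 's/@anthropic-ai\/claude-code-sdk/@anthropic-ai\/claude-agent-sdk/g'

cat /tmp/migrate-demo/src/agent.ts
# → import { query } from "@anthropic-ai/claude-agent-sdk";
```

The type rename (ClaudeCodeOptions → ClaudeAgentOptions) and the Python `claude_code_sdk` → `claude_agent_sdk` imports need analogous passes.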

TypeScript — before

```typescript
import { query } from "@anthropic-ai/claude-code-sdk";
import type { ClaudeCodeOptions } from "@anthropic-ai/claude-code-sdk";
```

TypeScript — after

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";
import type { ClaudeAgentOptions } from "@anthropic-ai/claude-agent-sdk";
```

Note: the options type renamed from ClaudeCodeOptions to ClaudeAgentOptions.

Python — before

```python
from claude_code_sdk import query, ClaudeCodeOptions
```

Python — after

```python
from claude_agent_sdk import query, ClaudeAgentOptions
```

The query() API in 2 minutes

The core API hasn't changed between SDK versions. query() is an async generator that yields message objects as the agent works through a task. The simplest possible call:

```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions

async def main():
    async for message in query(
        prompt="What files are in this directory?",
        options=ClaudeAgentOptions(allowed_tools=["Bash", "Glob"]),
    ):
        if hasattr(message, "result"):
            print(message.result)

asyncio.run(main())
```

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "What files are in this directory?",
  options: { allowedTools: ["Bash", "Glob"] },
})) {
  if ("result" in message) console.log(message.result);
}
```

The generator yields several message types. The ones you'll care about most:

| Type | When it fires | What it contains |
|---|---|---|
| SystemMessage (subtype init) | First, before any work | Session ID, connected MCP servers |
| AssistantMessage | After each model turn | Claude's text + tool calls |
| ToolResultMessage | After each tool execution | The tool's output |
| ResultMessage | Last | Final answer, token usage, session ID |

▶ Try this · claude-sonnet-4-6

What is the current working directory? List the files in it.

Show expected output
The agent calls Bash with `pwd` and `ls`, then returns the directory path and a list of files. You see AssistantMessage objects containing tool_use blocks, followed by ToolResultMessage objects with the shell output, ending with a ResultMessage containing the synthesized answer.

Capturing and resuming sessions

Session continuity is one of the most underused features of the SDK. When the SystemMessage with subtype init arrives, grab the session_id:

```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, SystemMessage, ResultMessage

session_id = None

async def first_query():
    global session_id
    async for message in query(
        prompt="Read auth.py and tell me what it does",
        options=ClaudeAgentOptions(allowed_tools=["Read", "Glob"]),
    ):
        if isinstance(message, SystemMessage) and message.subtype == "init":
            session_id = message.data["session_id"]
        if isinstance(message, ResultMessage):
            print(message.result)

async def follow_up():
    async for message in query(
        prompt="Now find every file that imports from auth.py",
        options=ClaudeAgentOptions(resume=session_id),
    ):
        if isinstance(message, ResultMessage):
            print(message.result)

async def main():
    await first_query()
    await follow_up()  # Claude already knows auth.py's contents

asyncio.run(main())
```

The resume option re-opens the existing JSONL session file on your filesystem. Claude picks up with full context from the previous turn — no re-reading files, no redundant tool calls.

Built-in tools: the complete list

The Agent SDK ships ten built-in tools. You must declare which ones you allow explicitly — there's no "allow all built-ins" shortcut:

| Tool | What it does | Safe to allow broadly? |
|---|---|---|
| Read | Read any file in the working directory | Yes |
| Write | Create new files | With caution |
| Edit | Make precise edits to existing files | With caution |
| Bash | Run terminal commands, scripts, git operations | No — scope carefully |
| Monitor | Watch a background script, react to each stdout line | Yes |
| Glob | Find files by pattern (**/*.ts, src/**/*.py) | Yes |
| Grep | Search file contents with regex | Yes |
| WebSearch | Search the web for current information | Yes |
| WebFetch | Fetch and parse web page content | Yes |
| AskUserQuestion | Ask the user clarifying questions with multiple choice | Yes |

The Bash tool is the one to be careful with. In a CI context with a fully sandboxed container it's fine. On a developer workstation, Bash can delete files, install packages, and run arbitrary code. If you don't need shell execution, don't include it.

Multi-cloud authentication

If you run behind Bedrock, Vertex AI, or Azure, the SDK respects environment variables — you don't change any code:

```bash
# Amazon Bedrock
export CLAUDE_CODE_USE_BEDROCK=1
# Then configure AWS credentials normally
aws configure  # or use IAM roles

# Google Vertex AI
export CLAUDE_CODE_USE_VERTEX=1

# Microsoft Azure Foundry
export CLAUDE_CODE_USE_FOUNDRY=1
```

The ANTHROPIC_API_KEY environment variable is still checked first. If it's set, it wins over cloud provider credentials.
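Because of that precedence, a common pattern for Bedrock-backed environments is to clear the direct key before enabling the provider flag. A minimal sketch:

```shell
# Make sure a stray direct API key can't shadow Bedrock credentials.
unset ANTHROPIC_API_KEY
export CLAUDE_CODE_USE_BEDROCK=1
echo "${ANTHROPIC_API_KEY:-<unset>}"
# → <unset>
```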

▶ Try this · claude-sonnet-4-6

Find all TypeScript files in this project that import from '@anthropic-ai/claude-code-sdk' and list their paths.

Show expected output
The agent uses Grep with pattern '@anthropic-ai/claude-code-sdk' and glob '**/*.ts', returns a list of file paths that still use the old import. This is the first step of a real migration audit.

Hands-on exercise

Migrate a code-reviewer agent to the Claude Agent SDK.

Start with this minimal Claude Code SDK agent (or your own existing code):

```python
# reviewer_old.py — uses the old SDK
from claude_code_sdk import query, ClaudeCodeOptions

async def review_code(file_path: str):
    async for message in query(
        prompt=f"Review {file_path} for bugs and code quality issues",
        options=ClaudeCodeOptions(
            allowed_tools=["Read", "Glob", "Grep"],
        ),
    ):
        if hasattr(message, "result"):
            print(message.result)
```

Your tasks:

1. Install claude-agent-sdk (Python) or @anthropic-ai/claude-agent-sdk (TypeScript)
2. Update the import to from claude_agent_sdk import query, ClaudeAgentOptions
3. Rename ClaudeCodeOptions to ClaudeAgentOptions
4. Add session capture: print the session_id from the SystemMessage
5. Run the agent against any .py or .ts file in your project

Verification: The agent runs without import errors, produces a code review, and prints a session ID that looks like sess_01XxXxxXx….

Estimated time: 15 minutes

<KnowledgeCheck question="You're migrating a Python project from the Claude Code SDK to the Claude Agent SDK. Which of the following changes is required?" options={[ "Replace from claude_code_sdk import query with from claude_agent_sdk import query", "Replace from anthropic import Anthropic with from claude_agent_sdk import Anthropic", "Replace ClaudeCodeOptions with AgentOptions (not ClaudeAgentOptions)", "Add an agent_version parameter to every query() call" ]} correctIdx={0} explanation="The package rename is the only required import change. The class is ClaudeAgentOptions (not AgentOptions). The Anthropic client from the anthropic package is unchanged — it's the separate Messages/Managed Agents client, not the Agent SDK. The query() signature has no agent_version parameter." />

<KnowledgeCheck question="Your team has pinned to @anthropic-ai/claude-agent-sdk@^0.2.100 and wants to use claude-opus-4-7. What will happen and what should you do?" options={["self-check"]} correctIdx={0} explanation="Self-check: Opus 4.7 requires v0.2.111 or later. With ^0.2.100, npm will install the latest 0.2.x patch — which may or may not be ≥ 0.2.111 depending on when you run install. The safe fix is to pin to ^0.2.111 or later. If you see a thinking.type.enabled API error, that's the symptom of this version mismatch." />

Subagents: orchestrating specialized agents

One of the most powerful Agent SDK features is the ability to spawn specialized subagents from within a parent agent. Subagents handle focused subtasks and report back results, enabling you to build multi-agent pipelines entirely in Python or TypeScript:

```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition, ResultMessage

async def review_and_document(codebase_path: str):
    """Parent agent that delegates to two specialists."""
    async for message in query(
        prompt=f"Use the code-reviewer agent to review {codebase_path}, "
               "then use the doc-writer agent to create a README.",
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Glob", "Grep", "Write", "Agent"],
            agents={
                "code-reviewer": AgentDefinition(
                    description="Expert code reviewer for quality and security.",
                    prompt="Analyze code quality, identify bugs, suggest improvements.",
                    tools=["Read", "Glob", "Grep"],
                ),
                "doc-writer": AgentDefinition(
                    description="Technical writer who creates clear documentation.",
                    prompt="Write clear, accurate technical documentation.",
                    tools=["Read", "Write"],
                ),
            },
        ),
    ):
        if isinstance(message, ResultMessage):
            print(message.result)

asyncio.run(review_and_document("./src"))
```

The Agent tool must be in allowedTools for the parent to spawn subagents. Messages from within a subagent's context include a parent_tool_use_id field — use this to correlate subagent output back to the parent's tool call in your audit logs.

Note the pattern: the parent doesn't implement the reviewer or writer logic itself. It delegates, which keeps the parent's context window focused on orchestration rather than implementation. This is the right architecture for agents with more than two or three distinct skill sets.

Configuration file loading order

The SDK loads configuration from multiple sources, applied in a defined order. Understanding this prevents "why isn't my setting taking effect?" debugging sessions:

```
~/.claude/settings.json        # global user settings (lowest priority)
~/.claude/CLAUDE.md            # global system prompt additions
.claude/settings.json          # project settings
.claude/CLAUDE.md / CLAUDE.md  # project system prompt
inline ClaudeAgentOptions()    # runtime options (highest priority)
```

Later sources override earlier ones. This means you can set safe defaults globally and override them per-project or per-run without touching the global config.

To restrict which sources load — for example, in a CI environment where you don't want the developer's ~/.claude settings to affect the build — use settingSources:

```python
options = ClaudeAgentOptions(
    allowed_tools=["Read", "Glob", "Grep"],
    setting_sources=["project"],  # only load .claude/ in the current project
)
```

```typescript
const options = {
  allowedTools: ["Read", "Glob", "Grep"],
  settingSources: ["project"], // ignores ~/.claude entirely
};
```

This is important for reproducibility: a CI agent should behave identically regardless of what's installed in the developer's home directory.

Skills and slash commands

The Agent SDK supports two additional configuration primitives that most tutorials skip: Skills and slash commands. Both are defined in Markdown files and loaded from the project .claude/ directory.

Skills are specialist instructions that extend the agent's capabilities for specific domains. A SKILL.md file at .claude/skills/<name>/SKILL.md is loaded into context when the agent needs that capability. This is how the Koenig AI Academy's own agents are extended — each agent has skills for its specialized workflows without bloating the base system prompt.

Slash commands are shorthand for common task templates. A review.md file at .claude/commands/review.md becomes a /review command that the agent can invoke. In the SDK context, you can trigger slash commands by starting a prompt with /.

These are the same skill and command systems that power Claude Code's daily usage, now fully available to your programmatic agents.
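As an illustration, a hypothetical /review command could be set up like this. The file contents are our own; only the .claude/commands/ location and the /-prefixed invocation come from the text above.

```shell
# Demo in a scratch directory (in a real project, run from the repo root).
project=$(mktemp -d)
mkdir -p "$project/.claude/commands"

# The Markdown body becomes the command's task template (contents illustrative).
cat > "$project/.claude/commands/review.md" <<'EOF'
Review the given file for bugs, security issues, and code quality problems.
Summarize findings as a prioritized list.
EOF

ls "$project/.claude/commands"
# → review.md
```

From the SDK, a prompt beginning with "/review" would then invoke this command template.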

What's next

In Chapter 2 you'll meet Managed Agents — Anthropic's hosted agent harness that launched the same day as this SDK rename. You'll learn the decision rule for when to let Anthropic run your agent infrastructure vs running it yourself, and you'll wire up your first session with full SSE streaming. The pricing model has a non-obvious trap that most tutorials skip: we'll name it explicitly.

References

[1] Claude Agent SDK Overview — https://code.claude.com/docs/en/agent-sdk/overview · retrieved 2026-04-30
[2] Agent Capabilities API announcement — https://claude.com/blog/agent-capabilities-api · retrieved 2026-04-30
[3] @anthropic-ai/claude-agent-sdk on npm — https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk · retrieved 2026-04-30
[4] Claude Agent SDK MCP documentation — https://code.claude.com/docs/en/agent-sdk/mcp · retrieved 2026-04-30
[5] Claude Managed Agents Overview — https://platform.claude.com/docs/en/managed-agents/overview · retrieved 2026-04-30

Chapter 2 · 45 min

Managed Agents beta — when to use it, when to roll your own

Claude Managed Agents is Anthropic's fully managed REST API for running Claude as an autonomous agent inside a cloud-hosted, sandboxed environment — launched in public beta on April 8, 2026, with all endpoints requiring the managed-agents-2026-04-01 beta header.

On the same day the Claude Code SDK was renamed, Anthropic shipped this hosted counterpart. Where the Agent SDK runs the agent loop inside your own process, Managed Agents runs it inside Anthropic's infrastructure. Your application becomes an event producer and consumer: you send user messages, you stream back results. Anthropic handles the container, the tool execution, the session persistence, and the compute [1]. The pricing reflects this: $0.08 per runtime hour plus standard Claude model usage, meaning an agent running 24/7 costs roughly $58 per month in infrastructure before a single token is billed [2].

> Prerequisites: Chapter 1 (Claude Agent SDK installed and one successful query() call completed) > > Time: 45 minutes > > Learning objectives: By the end of this chapter you can create a Managed Agents session, stream its events, detect completion, and choose correctly between Managed Agents and Agent SDK for a given workload.

Key facts

  1. Claude Managed Agents launched in public beta on April 8, 2026; all API requests require the managed-agents-2026-04-01 beta header [1].
  2. Pricing: $0.08 per runtime hour + standard Claude model token costs; the runtime clock runs from session creation to session termination [2].
  3. Rate limits: 300 requests per minute for create endpoints (agents, sessions, environments); 600 requests per minute for read endpoints (retrieve, list, stream) [1].
  4. The agent_toolset_20260401 tool type enables the full built-in toolset: Bash, file operations, web search and fetch, and MCP servers [1].
  5. Two features — outcomes and multiagent — are in research preview and require separate access approval [1].
  6. Session state (event history) is persisted server-side by Anthropic, not on your filesystem [1].

The four core concepts

Managed Agents introduces four concepts that don't exist in the Agent SDK. You need to understand all four before writing a single line of code.

Agent — a saved configuration: model, system prompt, tools, MCP servers, and skills. Create it once, reference it by agent.id across every session you start. Think of it as a Docker image: you build it, then run containers from it.

Environment — a cloud container template: pre-installed packages, network access rules, and mounted files. Today the only supported config type is cloud with unrestricted or restricted networking. The environment determines what's in the sandbox; the agent determines how it thinks.

Session — a running instance of an agent inside an environment. One session = one task. Sessions are not reused. When the task is done, the session goes idle and you start a new one for the next task.

Events — the messages flowing between your application and the running session. You send user.message events; the agent emits agent.message, agent.tool_use, and eventually session.status_idle events back over SSE.

Creating your first agent

Install the Anthropic SDK (not the Agent SDK — Managed Agents uses the standard Anthropic client):

```bash
pip install anthropic          # Python
npm install @anthropic-ai/sdk  # TypeScript
```

Create an agent. This is a one-time operation — save the returned agent.id:

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from env

agent = client.beta.agents.create(
    name="Data Analyst",
    model="claude-opus-4-7",
    system="You are a data analyst. When given a dataset, summarize it with statistics and key insights.",
    tools=[
        {"type": "agent_toolset_20260401"},  # enables Bash, file ops, web search
    ],
)

print(f"Agent ID: {agent.id}")  # save this
print(f"Agent version: {agent.version}")
```

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const agent = await client.beta.agents.create({
  name: "Data Analyst",
  model: "claude-opus-4-7",
  system: "You are a data analyst. When given a dataset, summarize it with statistics and key insights.",
  tools: [{ type: "agent_toolset_20260401" }],
});

console.log(`Agent ID: ${agent.id}`);
```

Creating an environment

```python
environment = client.beta.environments.create(
    name="analyst-env",
    config={
        "type": "cloud",
        "networking": {"type": "unrestricted"},  # allows outbound web access
    },
)

print(f"Environment ID: {environment.id}")  # save this too
```

The environment is also a one-time setup. For most workloads, unrestricted networking is correct — your agent can fetch URLs, call APIs, and pull packages. For sensitive data processing, use restricted to block outbound access.

Starting a session and streaming events

This is where Managed Agents gets interesting. The pattern is: open a stream, then immediately send the first user message. Events arrive in real time via SSE:

```python
from anthropic import Anthropic

client = Anthropic()

# AGENT_ID and ENVIRONMENT_ID were saved during the one-time setup above
session = client.beta.sessions.create(
    agent=AGENT_ID,
    environment_id=ENVIRONMENT_ID,
    title="Analyze Q1 sales data",
)

stream = client.beta.sessions.events.stream(session.id)

client.beta.sessions.events.send(
    session.id,
    events=[{
        "type": "user.message",
        "content": [{
            "type": "text",
            "text": "Sales data: [120, 340, 290, 410, 380]. Compute mean, median, std dev. Show Python code.",
        }],
    }],
)

for event in stream:
    match event.type:
        case "agent.message":
            for block in event.content:
                print(block.text, end="", flush=True)
        case "agent.tool_use":
            print(f"\n[Tool: {event.name}]", flush=True)
        case "session.status_idle":
            print("\n\n[Session complete]")
            break
```

```typescript
const session = await client.beta.sessions.create({
  agent: agentId,
  environment_id: environmentId,
  title: "Analyze Q1 sales data",
});

const stream = await client.beta.sessions.events.stream(session.id);

await client.beta.sessions.events.send(session.id, {
  events: [{
    type: "user.message",
    content: [{
      type: "text",
      text: "Sales data: [120, 340, 290, 410, 380]. Compute mean, median, std dev. Show Python code.",
    }],
  }],
});

for await (const event of stream) {
  if (event.type === "agent.message") {
    for (const block of event.content) process.stdout.write(block.text);
  } else if (event.type === "agent.tool_use") {
    console.log(`\n[Tool: ${event.name}]`);
  } else if (event.type === "session.status_idle") {
    console.log("\n[Session complete]");
    break;
  }
}
```

▶ Try this · claude-opus-4-7

You are running inside a Managed Agents session. The user has sent: 'Here is some sales data as a Python list: [120, 340, 290, 410, 380]. Compute mean, median, and standard deviation. Show your work i…

Show expected output
Claude emits an agent.message with a plan, then an agent.tool_use event for Bash, then another agent.message with results like: mean=308.0, median=340.0, std_dev=109.3. The session then emits session.status_idle.

The pricing trap most tutorials skip

Here's the fact the quickstart buries: the $0.08 per runtime hour accrues from session creation to session termination — not from when the agent is actively processing. A session waiting for a user message, sleeping between tool calls, or paused after going idle but not explicitly closed still accrues runtime cost.

The operational implication:

  • Short, stateless tasks (under 5 minutes): Managed Agents is fine. The $0.08/hr works out to ~$0.007 per run.
  • Long interactive sessions (hours with gaps): runtime cost compounds fast. An agent session left open for 8 hours waiting for user input = $0.64 in runtime before tokens.
  • Polling loops ("check every 30 minutes"): never use Managed Agents for this. Use the Agent SDK with a cron job.

Always close idle sessions explicitly:

```python
client.beta.sessions.update(session.id, status="completed")
```
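One way to make that close call hard to forget is a small context manager. This helper is our own sketch, wrapping only the client calls shown in this chapter:

```python
# Sketch: guarantee session cleanup even if streaming raises.
# Assumes the client surface used above: client.beta.sessions.create / .update.
from contextlib import contextmanager

@contextmanager
def managed_session(client, **create_kwargs):
    session = client.beta.sessions.create(**create_kwargs)
    try:
        yield session
    finally:
        # Stop the runtime-hour clock no matter how we exit.
        client.beta.sessions.update(session.id, status="completed")
```

Usage: `with managed_session(client, agent=AGENT_ID, environment_id=ENV_ID, title="task") as session: ...` — the close fires even on exceptions or early returns.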

Decision rule: Managed Agents vs Agent SDK

Apply this five-scenario decision table:

| Scenario | Use |
|---|---|
| Long-running task (>5 min), async, need cloud sandbox | Managed Agents |
| Agent needs to operate on files on your own server/filesystem | Agent SDK |
| You need custom in-process tool execution (Python functions) | Agent SDK |
| You're prototyping locally; no cloud infra budget yet | Agent SDK |
| You need to serve many concurrent agent sessions to end users | Managed Agents (they handle the infrastructure) |

The canonical migration path Anthropic documents is: prototype locally with the Agent SDK, then move to Managed Agents for production. But that path only makes sense if your production workload is long-running and async. If your agents run in 30-second bursts triggered by webhooks, the Agent SDK on a serverless function is cheaper and simpler.

<Callout type="hot"> Managed Agents is in public beta as of April 2026. The managed-agents-2026-04-01 beta header is required on every request. Behaviors can be refined between releases. Two capabilities — outcomes and multiagent — are in research preview and require a separate access request at claude.com/form/claude-managed-agents. Do not build production features that depend on research-preview capabilities without direct Anthropic support. </Callout>

Steering a session mid-execution

You can send additional user events to a running session to change direction without starting a new session:

```python
# Session is running; you want to narrow the scope
client.beta.sessions.events.send(
    session.id,
    events=[{
        "type": "user.message",
        "content": [{"type": "text", "text": "Focus only on the top 3 products by revenue."}],
    }],
)
```

This is one of the most powerful Managed Agents features and the clearest difference from the Agent SDK: the agent is running remotely, and you can inject new instructions while it works. With the Agent SDK, you'd need to stop the generator and restart with a new prompt.

▶ Try this · claude-opus-4-7

Walk me through the Managed Agents session lifecycle: from agent creation through session completion. List each API call in order and what state change it produces.

Show expected output
Claude describes: (1) POST /v1/agents → returns agent.id; (2) POST /v1/environments → returns environment.id; (3) POST /v1/sessions with agent + environment_id → returns session.id in status 'created'; (4) GET /v1/sessions/{id}/stream (SSE) → stream opens; (5) POST /v1/sessions/{id}/events with user.message → agent begins work, emits agent.message and agent.tool_use events; (6) session.status_idle event signals completion.

Hands-on exercise

Ship a Managed Agents session that runs a multi-step data analysis task and streams all tool-use events to your terminal.

Steps:

1. Create an agent with model: "claude-opus-4-7" and tools: [{ type: "agent_toolset_20260401" }]
2. Create an environment with type: "cloud" and networking: { type: "unrestricted" }
3. Create a session referencing both
4. Send this user message: "Write a Python script that fetches the JSON from https://jsonplaceholder.typicode.com/todos (limit to 10 items), filters only completed todos, and prints each title. Run it."
5. Stream events and print: the tool name for every agent.tool_use event, and the text for every agent.message event

Verification: You see at least one [Tool: bash] line in your terminal output followed by the actual output of the Python script, ending with [Session complete].

Estimated time: 20 minutes

<KnowledgeCheck question="A team is building an AI coding assistant that responds to GitHub webhook events. Each request takes 15–30 seconds. The team is choosing between Managed Agents and Agent SDK. Which is more appropriate, and why?" options={[ "Agent SDK — short, stateless, webhook-triggered tasks don't benefit from Managed Agents' hosted runtime, and per-invocation costs are lower", "Managed Agents — it scales automatically to handle concurrent GitHub events", "Managed Agents — it includes a built-in GitHub webhook listener", "Agent SDK — the Managed Agents beta header makes it unsuitable for production webhooks" ]} correctIdx={0} explanation="For 15–30 second tasks triggered by webhooks, the Agent SDK on a serverless function (Lambda, Cloud Run) is the right call. Managed Agents charges $0.08/hr from session creation, meaning each short task costs the same as a task left running for an hour. The beta header caveat is real but not the primary reason — the cost and architecture fit is." />

<KnowledgeCheck question="You've created a Managed Agents session and opened the SSE stream. What event type signals that the agent has finished working and your application should stop listening?" options={["self-check"]} correctIdx={0} explanation="Self-check: The event type is session.status_idle. When you see this event in your stream loop, break out of the loop and optionally close the session with client.beta.sessions.update(session.id, status='completed'). Not breaking the loop means your stream stays open and the session continues accruing runtime cost." />

Fetching historical event data

One significant advantage of Managed Agents over the Agent SDK: event history is persisted server-side. If your stream disconnects mid-session, you don't lose the work. You can replay the full event log from the API:

```python
# Reconnect and fetch the full history of what happened
events = client.beta.sessions.events.list(session_id)

for event in events.data:
    print(f"{event.type}: {event}")
```

This is fundamentally different from the Agent SDK's JSONL session files, which live on your local filesystem. For Managed Agents, the source of truth is Anthropic's infrastructure, which means:

  • Network partitions don't corrupt the session
  • You can inspect completed sessions retroactively (e.g., for debugging or billing audit)
  • Multiple processes can query the same session's history

The trade-off is that you're locked into Anthropic's event retention policy, not your own. Keep this in mind for compliance-sensitive workloads.
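If retention matters, mirror the event log into storage you control. A minimal sketch — plain dicts stand in for SDK event objects, and the helper names are our own:

```python
# Sketch: keep your own copy of a session's event log as JSONL,
# so retention is governed by your policy rather than Anthropic's.
import json

def archive_events(events, path):
    """Write each event as one JSON line."""
    with open(path, "w") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")

def load_archive(path):
    """Read the archived event log back into a list of dicts."""
    with open(path) as f:
        return [json.loads(line) for line in f]
```

Run `archive_events(...)` after each `session.status_idle` so every completed session leaves an audit trail you own.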

Multi-agent sessions (research preview)

The most ambitious Managed Agents capability is multiagent: running multiple coordinated agents as a single session. As of April 2026, this is in research preview and requires a separate access request at claude.com/form/claude-managed-agents.

The pattern is: one orchestrator agent breaks the task, one or more worker agents execute subtasks, results flow back to the orchestrator. Each agent runs in its own environment container. This is architecturally equivalent to what the Agent SDK's subagent feature provides in-process, but fully hosted.

If you're building workflows that require true parallelism (multiple agents running simultaneously rather than sequentially) and don't want to manage the orchestration infrastructure yourself, the multiagent research preview is worth requesting access to.

Rate limits in practice

The rate limits deserve more attention than the documentation gives them. At 300 create requests per minute for agents, environments, and sessions — shared across those three endpoints — a system that spins up one session per user request could easily hit this ceiling at modest traffic:

  • 300 requests / minute = 5 requests / second
  • A web app where 100 users each trigger one new session per minute sits at 100 create RPM from session creation alone, leaving only 200 creates per minute for agents and environments
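To make the budget arithmetic concrete, here is a small sketch. The numbers come from the discussion above; the helper itself is illustrative, not part of any SDK:

```python
def create_rpm_budget(sessions_per_min: int,
                      agent_creates_per_min: int = 0,
                      environment_creates_per_min: int = 0,
                      limit: int = 300) -> int:
    """Remaining create requests/min under the shared agent + environment +
    session ceiling; a negative result means the workload exceeds the limit."""
    used = sessions_per_min + agent_creates_per_min + environment_creates_per_min
    return limit - used

# 100 users each starting one session per minute, agents/envs pre-created
print(create_rpm_budget(sessions_per_min=100))  # 200 creates/min of headroom
```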

In practice, pre-create your agents and environments once and reuse their IDs. The agent and environment IDs are stable — you don't need to recreate them for each session. Only the session is per-task:

```python
# Create once, store these IDs
AGENT_ID = "agt_01XxXxxXx"        # created once, reused forever
ENVIRONMENT_ID = "env_01YyYyyYy"  # created once, reused forever
```

With this pattern, you consume only session creates out of the shared 300 RPM budget, rather than burning agent and environment creates on every request.

What's next

In Chapter 3 you'll connect external tools to your agent via the Model Context Protocol. MCP is what turns a general-purpose Claude into a specialized agent that can query your database, interact with GitHub, and call internal APIs — all without you writing custom tool implementations. The connector has three transport modes, and choosing the wrong one for a given server is the most common setup mistake.

References

[1] Claude Managed Agents Overview — https://platform.claude.com/docs/en/managed-agents/overview · retrieved 2026-04-30
[2] Claude Managed Agents Quickstart — https://platform.claude.com/docs/en/managed-agents/quickstart · retrieved 2026-04-30
[3] Agent Capabilities API announcement — https://claude.com/blog/agent-capabilities-api · retrieved 2026-04-30
[4] Claude Managed Agents Community Guide — https://blog.laozhang.ai/en/posts/claude-managed-agents · retrieved 2026-04-30
[5] Claude Agent SDK Overview — https://code.claude.com/docs/en/agent-sdk/overview · retrieved 2026-04-30

Chapter 3 · 50 min

MCP connector: orchestrating multi-server agents

The Model Context Protocol (MCP) connector in the Claude Agent SDK is a built-in mechanism for attaching external tool servers — databases, APIs, browsers, and code execution environments — to an agent at runtime, using a standard open protocol that Anthropic co-developed with the broader AI ecosystem in 2024.

When Anthropic shipped the Agent SDK rename in April 2026, the MCP connector shipped with it as a first-class feature rather than a configuration hack. The connector supports three transport modes — stdio for local process servers, HTTP for stateless remote APIs, and SSE for streaming remote servers — and handles connection management, tool discovery, and error signaling automatically [1]. As of April 2026, the public MCP server registry lists hundreds of community servers for databases, SaaS tools, and developer infrastructure, though quality varies considerably.

> Prerequisites: Chapter 1 (Agent SDK installed, one successful query() call) > > Time: 50 minutes > > Learning objectives: By the end of this chapter you can wire three MCP servers of different transport types into a single agent, scope permissions correctly, and handle connection failures before the agent starts working.

Key facts

  1. MCP tools follow the naming pattern mcp__<server-name>__<tool-name> — e.g., the GitHub server named "github" with a list_issues tool becomes mcp__github__list_issues [1].
  2. MCP tools require explicit permission via allowedTools; permissionMode: "acceptEdits" does NOT auto-approve MCP tools [1].
  3. Three transport types: stdio (local processes), HTTP (stateless remote), SSE (streaming remote). A fourth option — SDK MCP servers — runs tools in-process as code [1].
  4. The default connection timeout for stdio servers is 60 seconds; servers that take longer to start fail silently unless you check the init system message [1].
  5. Tool search is enabled by default when many MCP tools are configured, withholding tool definitions from the context window and loading only what Claude needs per turn [1].
  6. OAuth2 credentials are handled manually: complete the OAuth flow in your application, then pass the access token via headers in the MCP server config [1].

The MCP naming convention

Understanding the naming pattern is the foundation for everything that follows. Given an mcpServers config entry with key "github", every tool that server exposes gets prefixed with mcp__github__. If the GitHub server exposes list_issues, search_issues, create_issue, and get_pull_request, their agent-visible names are:

```
mcp__github__list_issues
mcp__github__search_issues
mcp__github__create_issue
mcp__github__get_pull_request
```

This prefix structure matters because it's what you put in allowedTools. The wildcard pattern mcp__github__* allows all tools from the github server. The explicit pattern mcp__github__list_issues allows only that one tool.
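Assuming the grants behave like simple glob patterns (as the wildcard examples suggest), the matching semantics can be sketched with fnmatch:

```python
from fnmatch import fnmatchcase

def is_allowed(tool_name: str, allowed_tools: list[str]) -> bool:
    """Glob-style check of an MCP tool name against allowedTools entries.
    A sketch of the semantics, not the SDK's actual implementation."""
    return any(fnmatchcase(tool_name, pattern) for pattern in allowed_tools)

assert is_allowed("mcp__github__list_issues", ["mcp__github__*"])
assert is_allowed("mcp__github__list_issues", ["mcp__github__list_issues"])
assert not is_allowed("mcp__github__create_issue", ["mcp__github__list_issues"])
assert not is_allowed("mcp__postgres__query", ["mcp__github__*"])
```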

The three transport types

stdio — local process servers

stdio is the most common transport for development and for community-published servers on npm or PyPI. The SDK spawns a child process and communicates over stdin/stdout.

```python
from claude_agent_sdk import query, ClaudeAgentOptions

options = ClaudeAgentOptions(
    mcp_servers={
        "github": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"],
            "env": {"GITHUB_TOKEN": "ghp_xxxxxxxxxxxx"},
        }
    },
    allowed_tools=["mcp__github__list_issues", "mcp__github__search_issues"],
)

async for message in query(
    prompt="List the 5 most recent open issues in anthropics/claude-code",
    options=options,
):
    if hasattr(message, "result"):
        print(message.result)
```

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "List the 5 most recent open issues in anthropics/claude-code",
  options: {
    mcpServers: {
      github: {
        command: "npx",
        args: ["-y", "@modelcontextprotocol/server-github"],
        env: { GITHUB_TOKEN: process.env.GITHUB_TOKEN },
      },
    },
    allowedTools: ["mcp__github__list_issues", "mcp__github__search_issues"],
  },
})) {
  if (message.type === "result" && message.subtype === "success") {
    console.log(message.result);
  }
}
```

HTTP — stateless remote servers

Use HTTP for cloud-hosted servers that expose a standard MCP endpoint. No child process, no local installation required:

```python
options = ClaudeAgentOptions(
    mcp_servers={
        "claude-code-docs": {
            "type": "http",
            "url": "https://code.claude.com/docs/mcp",
        }
    },
    allowed_tools=["mcp__claude-code-docs__*"],
)
```

```typescript
options = {
  mcpServers: {
    "remote-api": {
      type: "http",
      url: "https://api.yourcompany.com/mcp",
      headers: { Authorization: `Bearer ${process.env.API_TOKEN}` },
    },
  },
  allowedTools: ["mcp__remote-api__*"],
};
```

SSE — streaming remote servers

SSE is the right transport when the remote server needs to push events as it processes (e.g., long-running queries, real-time data feeds):

```python
import os

options = ClaudeAgentOptions(
    mcp_servers={
        "analytics-stream": {
            "type": "sse",
            "url": "https://analytics.yourcompany.com/mcp/sse",
            "headers": {"Authorization": f"Bearer {os.environ['ANALYTICS_TOKEN']}"},
        }
    },
    allowed_tools=["mcp__analytics-stream__*"],
)
```

The SDK transparently handles SSE reconnection — you don't need to manage the event stream yourself.

Orchestrating three servers in one agent

This is where the real power emerges. You can configure multiple servers with different transport types in a single mcpServers dict. The agent uses whichever tools it needs based on the task:

```python
import asyncio
import os

from claude_agent_sdk import (
    query,
    ClaudeAgentOptions,
    SystemMessage,
    ResultMessage,
    AssistantMessage,
)


async def investigate_issue(issue_ref: str, db_connection: str):
    """Pull a GitHub issue, query related DB records, write a summary."""
    options = ClaudeAgentOptions(
        mcp_servers={
            # stdio: GitHub MCP server
            "github": {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-github"],
                "env": {"GITHUB_TOKEN": os.environ["GITHUB_TOKEN"]},
            },
            # stdio: Postgres MCP server
            "postgres": {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-postgres", db_connection],
            },
            # HTTP: Cloud docs server
            "docs": {
                "type": "http",
                "url": "https://code.claude.com/docs/mcp",
            },
        },
        allowed_tools=[
            "mcp__github__get_issue",
            "mcp__github__list_comments",
            "mcp__postgres__query",  # read-only
            "mcp__docs__*",          # all doc tools
        ],
    )

    prompt = (
        f"1. Fetch the GitHub issue at {issue_ref}. "
        "2. Query the postgres DB for any records mentioning the issue number. "
        "3. Look up relevant documentation from the docs server. "
        "4. Write a one-paragraph summary of what the issue is about "
        "and whether the DB has related data."
    )

    async for message in query(prompt=prompt, options=options):
        # Verify all three servers connected on the first message
        if isinstance(message, SystemMessage) and message.subtype == "init":
            for server in message.data.get("mcp_servers", []):
                if server.get("status") != "connected":
                    print(f"WARNING: {server.get('name')} failed to connect — {server}")
        # Show which MCP tools are being called
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if hasattr(block, "name") and block.name.startswith("mcp__"):
                    print(f"[MCP call: {block.name}]")
        if isinstance(message, ResultMessage) and message.subtype == "success":
            print(message.result)


asyncio.run(investigate_issue(
    issue_ref="anthropics/claude-code#1234",
    db_connection=os.environ["DATABASE_URL"],
))
```

Why permissionMode: "acceptEdits" is not enough

This is the most common production mistake with MCP. The Agent SDK has three permission modes:

| Mode | What it auto-approves | Auto-approves MCP? |
|---|---|---|
| default | Nothing — every tool call prompts for approval | No |
| acceptEdits | File edit and filesystem Bash commands | No |
| bypassPermissions | Everything including MCP | Yes (but dangerous) |

acceptEdits is useful for coding agents that need to read and write files without prompting. But it explicitly does not cover MCP tools. If you set acceptEdits and rely on it to green-light your GitHub server, the agent will see the tools but refuse to call them.

The correct pattern is allowedTools with explicit grants:

```python
# WRONG — permissionMode doesn't cover MCP
options = ClaudeAgentOptions(
    permission_mode="acceptEdits",
    mcp_servers={"github": github_config},
)

# RIGHT — grant the MCP tools explicitly
options = ClaudeAgentOptions(
    permission_mode="acceptEdits",
    mcp_servers={"github": github_config},
    allowed_tools=["mcp__github__*"],
)
```

Using bypassPermissions to work around this is not the answer — it disables every safety check in the SDK, including approval prompts for destructive Bash operations.

Detecting connection failures

MCP servers fail silently if you don't check for them. The SystemMessage with subtype init arrives before the agent does any work. It includes an mcp_servers list where each entry has a status field:

```python
async for message in query(prompt=..., options=options):
    if isinstance(message, SystemMessage) and message.subtype == "init":
        failed = [
            s for s in message.data.get("mcp_servers", [])
            if s.get("status") != "connected"
        ]
        if failed:
            # Abort or handle gracefully before the agent wastes tokens
            raise RuntimeError(f"MCP servers failed to connect: {failed}")
```

```typescript
for await (const message of query({ prompt, options })) {
  if (message.type === "system" && message.subtype === "init") {
    const failed = message.mcp_servers.filter((s) => s.status !== "connected");
    if (failed.length > 0) {
      throw new Error(`MCP servers failed: ${JSON.stringify(failed)}`);
    }
  }
}
```

Common failure causes by transport:

  • stdio: npx not on PATH, package not published, missing env vars
  • HTTP: URL unreachable, invalid SSL certificate, wrong endpoint path
  • SSE: CORS headers missing on the server, auth token expired

The default connection timeout for stdio servers is 60 seconds. If your server process takes longer than that to respond to its first handshake, it fails. Pre-warm slow servers before starting a query.
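A cheap pre-flight check catches most of the stdio failures above before the agent burns its 60-second handshake window. This helper is illustrative; it only inspects the config and never starts a server:

```python
import shutil

def preflight_stdio(server_configs: dict) -> list[str]:
    """Return problems likely to make a stdio MCP server fail its
    handshake: a missing launcher binary or an empty env var."""
    problems = []
    for name, cfg in server_configs.items():
        command = cfg.get("command")
        if command and shutil.which(command) is None:
            problems.append(f"{name}: '{command}' not found on PATH")
        for var, value in cfg.get("env", {}).items():
            if not value:
                problems.append(f"{name}: env var {var} is empty")
    return problems

# A config whose launcher doesn't exist produces a PATH problem
print(preflight_stdio({"github": {"command": "definitely-not-a-binary"}}))
```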

Project-level config with .mcp.json

For projects where the same servers are always needed, put them in .mcp.json at the project root. The SDK loads this file automatically when project is in settingSources (the default):

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_TOKEN": "${GITHUB_TOKEN}" }
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "${DATABASE_URL}"]
    }
  }
}
```

The ${VAR} syntax expands environment variables at load time. This keeps credentials out of your code while making the MCP config declarative and version-controllable.
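The expansion semantics can be sketched in a few lines (a simplified stand-in; the SDK's own loader is authoritative):

```python
import os
import re

def expand_env(value: str) -> str:
    """Replace ${VAR} references with environment variables; unset
    variables expand to an empty string in this sketch."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)

os.environ["GITHUB_TOKEN"] = "ghp_example"
print(expand_env("${GITHUB_TOKEN}"))  # ghp_example
```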

▶ Try this · claude-sonnet-4-6

I've configured an MCP server named 'github' with the @modelcontextprotocol/server-github package. What is the full tool name I should put in allowedTools to allow only the list_issues tool from this …

Show expected output
The correct value is `mcp__github__list_issues`. Claude explains the naming pattern: prefix `mcp__`, then the server name as it appears in the mcpServers key, then `__`, then the tool name. A wildcard to allow all GitHub tools would be `mcp__github__*`.

Tool search for large tool sets

When you configure many MCP servers simultaneously, their tool definitions can fill a significant portion of the context window. The SDK's tool search feature addresses this: it withholds tool definitions from context and loads only the ones Claude needs for each turn, based on a vector similarity search over the tool names and descriptions.

Tool search is enabled by default. You can verify it's active by checking that long tool definition lists no longer appear in your debug output. If you need to disable it for a specific server (e.g., a server whose tools must always be in context), configure it in the mcpServers entry per the tool search docs.

OAuth2 authentication

For servers that require OAuth 2.1, the SDK doesn't handle the OAuth flow — that's your application's job. After you complete the flow and receive an access token, pass it as a header:

```python
access_token = await your_oauth_flow()  # your app handles PKCE/redirect

options = ClaudeAgentOptions(
    mcp_servers={
        "oauth-service": {
            "type": "http",
            "url": "https://your-service.com/mcp",
            "headers": {"Authorization": f"Bearer {access_token}"},
        }
    },
    allowed_tools=["mcp__oauth-service__*"],
)
```

Refresh token handling is also your responsibility. Wire token refresh into your session initialization code, not into the agent loop.
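One way to keep refresh out of the agent loop is a small token provider that refreshes ahead of expiry and is consulted only when you build the MCP options. The refresh function below is a stub standing in for your real OAuth call:

```python
import time

class TokenProvider:
    """Caches an access token and refreshes it shortly before expiry."""

    def __init__(self, refresh_fn, skew_seconds: int = 60):
        self._refresh_fn = refresh_fn  # your real OAuth refresh call
        self._skew = skew_seconds
        self._token, self._expires_at = None, 0.0

    def get(self) -> str:
        if time.time() >= self._expires_at - self._skew:
            self._token, ttl = self._refresh_fn()
            self._expires_at = time.time() + ttl
        return self._token

calls = []
def stub_refresh():  # placeholder: returns (access_token, ttl_seconds)
    calls.append(1)
    return f"token-{len(calls)}", 3600

provider = TokenProvider(stub_refresh)
headers = {"Authorization": f"Bearer {provider.get()}"}
provider.get()  # still fresh: no second refresh happens
print(len(calls))  # 1
```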

▶ Try this · claude-sonnet-4-6

I see this in my init message: `{'name': 'postgres', 'status': 'failed', 'error': 'connection timeout'}`. What are the three most likely causes and how do I debug each one?

Show expected output
Claude explains: (1) npx not installed or @modelcontextprotocol/server-postgres package missing — fix: run `npx @modelcontextprotocol/server-postgres --version` manually; (2) DATABASE_URL env var not set or malformed — fix: echo the variable and test with psql; (3) server process takes >60s to start (large package install, slow network) — fix: pre-install the package globally with `npm install -g @modelcontextprotocol/server-postgres` to eliminate startup time.

Hands-on exercise

Wire a GitHub MCP server + a Postgres MCP server + the Claude Code docs HTTP server into one agent.

Setup:

  1. Install @modelcontextprotocol/server-github and @modelcontextprotocol/server-postgres via npx (they auto-install on first use)
  2. Set GITHUB_TOKEN to a GitHub personal access token with repo:read scope
  3. Set DATABASE_URL to a local Postgres instance (e.g., postgresql://localhost/testdb) — even a fresh empty DB works

Task prompt:

```
1. Get the README from the anthropics/claude-code repository on GitHub.
2. Search the postgres database for any table named 'issues' — if it doesn't exist, say so.
3. Look up what 'hooks' are in the Agent SDK using the docs MCP server.
4. Write a three-sentence summary combining what you found.
```

Verification:

  • The init message shows all three servers with status: "connected"
  • You see at least two different mcp__* tool calls in the output (one GitHub, one docs at minimum)
  • The summary references Claude Code and hooks with specific details from the docs

Estimated time: 25 minutes

<KnowledgeCheck question="Your agent is configured with permissionMode: 'acceptEdits' and an MCP server named db. You've added the server to mcpServers but NOT listed any MCP tools in allowedTools. What happens when Claude tries to call mcp__db__query?" options={[ "The tool call is blocked — MCP tools require explicit allowedTools grants regardless of permissionMode", "The tool call succeeds — acceptEdits covers all tool types including MCP", "The tool call prompts the user for approval", "The tool call succeeds but only for read operations" ]} correctIdx={0} explanation="MCP tools require explicit allowedTools grants. permissionMode: 'acceptEdits' covers only file edits and filesystem Bash commands — it does not extend to MCP servers. To allow all tools from the db server, add mcp__db__* to allowedTools. The only permission mode that auto-approves MCP is bypassPermissions, which also disables all other safety checks." />

<KnowledgeCheck question="You're building an agent that uses four MCP servers with a combined total of 200 tools. You notice that context window usage is high even before the agent has called any tools. What feature should you check and what does it do?" options={["self-check"]} correctIdx={0} explanation="Self-check: Tool search. When enabled (the default), the SDK withholds all MCP tool definitions from the context window and loads only the tools relevant to each turn using vector similarity search over tool names and descriptions. If tool search is disabled or misconfigured, all 200 tool definitions appear in context on every turn. Verify it's enabled by checking your agent SDK configuration per the tool search docs at code.claude.com/docs/en/agent-sdk/tool-search." />

What's next

In Chapter 4 you'll complete the agent's IO surface with the Files API and code execution tool. The Files API lets you upload a document once and reference it across multiple Messages calls — but the billing model is counterintuitive. The code execution tool gives your agent a Python sandbox for computation and chart generation, and the output files feed directly back into the Files API for download. Together they form the document and data layer that most production agents need.

References

[1] Agent SDK MCP Connector — https://code.claude.com/docs/en/agent-sdk/mcp · retrieved 2026-04-30
[2] Model Context Protocol specification — https://modelcontextprotocol.io/docs/getting-started/intro · retrieved 2026-04-30
[3] MCP server registry — https://github.com/modelcontextprotocol/servers · retrieved 2026-04-30
[4] Claude Agent SDK Overview — https://code.claude.com/docs/en/agent-sdk/overview · retrieved 2026-04-30
[5] Agent Capabilities API announcement — https://claude.com/blog/agent-capabilities-api · retrieved 2026-04-30
[6] MCP OAuth 2.1 specification — https://modelcontextprotocol.io/specification/2025-03-26/basic/authorization · retrieved 2026-04-30

Chapter 4 · 45 min

Files API + code execution: the complete agent IO surface

The Anthropic Files API is a beta document-storage layer that allows developers to upload a file once — up to 500 MB — receive a persistent file_id, and reference that ID across multiple Messages requests without re-transmitting the file content each time, launched alongside the MCP connector and code execution tool in May 2025.

The Files API solves a real problem. Without it, a 20-page PDF costs you full bandwidth and ingestion time on every API call that needs it. With it, you upload once and pay that cost once. But the "upload once, use many times" pitch hides a billing nuance that matters at scale: you still pay full input tokens every time you include a file in a Messages request. The savings are in bandwidth and latency, not token cost [1]. This chapter covers the complete IO surface — Files API for document persistence, code execution for computation, and the intersection of both.

> Prerequisites: Chapter 1 (Anthropic API key configured) > > Time: 45 minutes > > Learning objectives: By the end of this chapter you can upload files, reference them by file_id, select the correct content block type, use the code execution tool to generate artifacts, and download output files.

Key facts

  1. The Files API beta header is files-api-2025-04-14 — required on every request [1].
  2. Maximum file size: 500 MB per file; total workspace storage: 500 GB per organization [1].
  3. File storage operations (upload, download, list, retrieve, delete) are free; file content is billed as input tokens when referenced in a Messages request [1].
  4. Code execution pricing: 50 free hours per day, then $0.05 per hour; announced at the May 2025 agent capabilities launch [2].
  5. Files uploaded via the Files API are not eligible for Zero Data Retention (ZDR) — they are retained until explicitly deleted [1].
  6. The Files API is not available on Amazon Bedrock or Google Vertex AI — Anthropic-direct API only [1].
  7. You can only download files created by skills or the code execution tool — not files you uploaded yourself [1].

Content block types by file format

The Files API supports different file types that map to different content block types in the Messages API. Getting this wrong is the most common integration mistake:

| File type | MIME type | Content block | Use case |
|---|---|---|---|
| PDF | application/pdf | document | Document analysis, citations |
| Plain text | text/plain | document | Logs, markdown, config files |
| JPEG, PNG, GIF, WebP | image/* | image | Visual analysis, screenshots |
| CSV, datasets, binaries | varies | container_upload | Code execution, data analysis |

For file types not in this table (.docx, .xlsx, .md), the recommended approach is conversion: convert to plain text or PDF first, then upload.
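The table can be captured in a small helper. This sketch mirrors the rows above; anything outside the table should go through the conversion path rather than relying on the fallthrough:

```python
def content_block_type(mime: str) -> str:
    """Map a MIME type to its Messages API content block type, per the
    table above; unknown types fall through to container_upload."""
    if mime in ("application/pdf", "text/plain"):
        return "document"
    if mime.startswith("image/"):
        return "image"
    return "container_upload"

assert content_block_type("application/pdf") == "document"
assert content_block_type("image/png") == "image"
assert content_block_type("text/csv") == "container_upload"
```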

Uploading files

Install the Anthropic SDK (not the Agent SDK):

```
pip install anthropic
```

Upload a PDF and an image:

```python
from anthropic import Anthropic

client = Anthropic()

# Upload a PDF
with open("quarterly_report.pdf", "rb") as f:
    pdf_file = client.beta.files.upload(
        file=("quarterly_report.pdf", f, "application/pdf"),
    )
print(f"PDF file_id: {pdf_file.id}")

# Upload an image (referenced below as image_file)
with open("revenue_chart.png", "rb") as f:
    image_file = client.beta.files.upload(
        file=("revenue_chart.png", f, "image/png"),
    )
print(f"Image file_id: {image_file.id}")
```

```typescript
import Anthropic, { toFile } from "@anthropic-ai/sdk";
import fs from "fs";

const anthropic = new Anthropic();

// Upload a PDF
const pdfFile = await anthropic.beta.files.upload({
  file: await toFile(fs.createReadStream("quarterly_report.pdf"), undefined, {
    type: "application/pdf",
  }),
});
console.log(`PDF file_id: ${pdfFile.id}`);
```

The returned file_id is permanent until you delete it. Store it in your database alongside the document metadata.

Referencing files in Messages calls

Once uploaded, reference the file_id using the appropriate content block type. You don't need the file's bytes — just the ID:

```python
# Three queries against the same PDF — only one upload needed
questions = [
    "What were the total revenues in Q1?",
    "List the top 3 risk factors mentioned in this report.",
    "What is management's outlook for Q2?",
]

for question in questions:
    response = client.beta.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "document",
                    "source": {
                        "type": "file",
                        "file_id": pdf_file.id,
                    },
                    "citations": {"enabled": True},  # request inline citations
                },
            ],
        }],
        betas=["files-api-2025-04-14"],
    )
    print(f"\n{question}")
    print(response.content[0].text)
```

For images, use the image content block type:

```python
response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what this chart shows."},
            {
                "type": "image",
                "source": {
                    "type": "file",
                    "file_id": image_file.id,
                },
            },
        ],
    }],
    betas=["files-api-2025-04-14"],
)
```

The billing reality

The "upload once" pitch is accurate for bandwidth. Here's the complete billing picture:

Free operations:

  • POST /v1/files (upload)
  • GET /v1/files (list)
  • GET /v1/files/{id} (metadata)
  • DELETE /v1/files/{id} (delete)
  • GET /v1/files/{id}/content (download)

Billed as input tokens:

  • Every time a file_id is included in a Messages request, the file content is counted as input tokens

Billed as compute time:

  • Code execution: 50 free hours/day, then $0.05/hr

The implications for a document-heavy agent:

  • Uploading a 5 MB PDF once: free
  • Referencing that PDF in 100 Messages calls: 100× the input token cost of that document
  • The upload saves you 100 round trips of bandwidth, but you still pay tokens each time

For agents that run many queries against the same document in a single session, consider using extended prompt caching (1-hour TTL) to reduce the per-call token cost after the first invocation.

Code execution with the Files API

The code execution tool gives Claude a sandboxed Python environment. You can pass files to it via container_upload blocks, run code, and download output files via the Files API:

```python
# Upload a dataset for code execution
with open("sales_data.csv", "rb") as f:
    dataset = client.beta.files.upload(
        file=("sales_data.csv", f, "text/plain"),
    )

# Ask Claude to analyze it in the code execution sandbox
response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    tools=[{"type": "code_execution_20250522", "name": "code_execution"}],
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Compute monthly revenue totals and save a bar chart as monthly_totals.png."},
            {"type": "container_upload", "file_id": dataset.id},
        ],
    }],
    betas=["files-api-2025-04-14", "code-execution-2025-05-22"],
)

# The generated chart's file_id arrives in the code execution tool result;
# walk the response content to find it (result shape per the code execution docs)
output_file_id = None
for block in response.content:
    if getattr(block, "type", None) == "code_execution_tool_result":
        for item in getattr(block.content, "content", []):
            if getattr(item, "type", None) == "code_execution_output":
                output_file_id = item.file_id
```

Now download the generated chart:

```python
# Download the generated chart
chart_content = client.beta.files.download(output_file_id)
chart_content.write_to_file("monthly_totals.png")
print("Chart downloaded to monthly_totals.png")
```

▶ Try this · claude-opus-4-7

I have a CSV with columns: month, product, revenue. Using the code execution tool, compute the top 3 products by total revenue and create a horizontal bar chart. Return the file_id of the saved PNG.

Show expected output
Claude writes Python code using pandas and matplotlib. The code reads the CSV from the container, computes `.groupby('product')['revenue'].sum().nlargest(3)`, generates a horizontal bar chart with `plt.barh()`, saves it as `top_products.png`. The tool_result block includes a `file_id` for the output PNG that can be passed to `client.beta.files.download()`.

File lifecycle management

Files persist until you explicitly delete them. For production agents, you need a retention policy:

```python
import datetime

def cleanup_old_files(client: Anthropic, max_age_days: int = 30):
    """Delete files older than max_age_days."""
    files = client.beta.files.list()
    cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=max_age_days)
    deleted = 0
    for file in files.data:
        created = datetime.datetime.fromisoformat(file.created_at)
        if created < cutoff:
            client.beta.files.delete(file.id)
            deleted += 1
    return deleted
```

Extended prompt caching with Files API

When you use the same file across many Messages calls in a short window, extended prompt caching can significantly reduce costs. The standard cache TTL is 5 minutes; an optional 1-hour TTL is available:

```python
response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What are the payment terms?"},
            {
                "type": "document",
                "source": {"type": "file", "file_id": pdf_file.id},
                "cache_control": {"type": "ephemeral"},  # cache this document
            },
        ],
    }],
    betas=["files-api-2025-04-14", "prompt-caching-2024-07-31"],
)
```

With caching enabled, the first call to include a given file_id pays full input token cost. Subsequent calls within the TTL window pay cache read tokens — approximately 10% of the full input price for standard caching. For a 100-page PDF queried 50 times in one session, this can reduce document token costs by 85–90%.
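The arithmetic behind that 85–90% figure, assuming cache reads cost roughly 10% of the full input price and ignoring the smaller cache-write surcharge:

```python
def cached_cost_fraction(n_calls: int, cache_read_rate: float = 0.10) -> float:
    """Fraction of the uncached document-token cost paid when the first
    call writes the cache and the remaining calls read it."""
    cost = 1 + (n_calls - 1) * cache_read_rate  # in units of one full pass
    return cost / n_calls

print(f"{1 - cached_cost_fraction(50):.0%} saved over 50 calls")  # 88% saved
```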

The cache is keyed on the exact content. If the file's content changes (you re-upload), the cache key changes and you pay full tokens again. This is expected behavior: the cache reflects the actual bytes.

Managing files at scale

For production agents that ingest documents regularly, you need patterns beyond "upload once." A document ingestion pipeline typically has three stages:

Stage 1 — upload and register:

```python
import os
from datetime import datetime, timezone

def ingest_document(file_path: str, metadata: dict) -> str:
    """Upload file, return file_id. Store mapping in your DB."""
    mime = "application/pdf" if file_path.endswith(".pdf") else "text/plain"
    with open(file_path, "rb") as f:
        uploaded = client.beta.files.upload(
            file=(os.path.basename(file_path), f, mime),
        )
    # Store in your DB: document_id → file_id mapping
    db.insert("documents", {
        "document_id": metadata["id"],
        "file_id": uploaded.id,
        "uploaded_at": datetime.now(timezone.utc).isoformat(),
        "filename": os.path.basename(file_path),
    })
    return uploaded.id
```

Stage 2 — use by ID:

```python
def query_document(document_id: str, question: str) -> str:
    """Look up file_id from DB, query without re-uploading."""
    row = db.find("documents", {"document_id": document_id})
    file_id = row["file_id"]
    response = client.beta.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "document", "source": {"type": "file", "file_id": file_id}},
            ],
        }],
        betas=["files-api-2025-04-14"],
    )
    return response.content[0].text
```

Stage 3 — clean up stale files:

```python
def sync_file_storage() -> int:
    """Delete Files API objects for documents removed from DB."""
    all_files = {f.id for f in client.beta.files.list().data}
    active_ids = set(db.select_column("documents", "file_id"))
    stale = all_files - active_ids
    for file_id in stale:
        client.beta.files.delete(file_id)
    return len(stale)
```

The 500 GB per-organization limit seems generous until you have thousands of PDFs. Build the cleanup stage from day one.

What the Files API does NOT support

Knowing the limits prevents surprises:

  • Not available on Bedrock or Vertex AI: If your organization uses Claude through AWS or GCP, the Files API is not available. You'll need to pass file content inline in every request.
  • Downloaded files only from code execution / skills: You cannot download a file you uploaded. The download endpoint works only for files that were created as outputs by the code execution tool or skills.
  • No ZDR: If your organization has Zero Data Retention enabled, the Files API is ineligible. Files are stored until explicitly deleted regardless of ZDR settings.
  • Not an immutable store: Files can be deleted by any API key in your workspace. There's no access control within a workspace.
▶ Try this · claude-opus-4-7

I uploaded a PDF contract using the Files API. I now want to ask three questions about it: (1) what are the payment terms, (2) what are the termination conditions, and (3) who are the parties. Walk me…

Show expected output
Claude explains: upload the PDF once (free), then make three separate Messages API calls each referencing the same file_id. To minimize token cost, enable the 1-hour extended prompt caching TTL so the PDF tokens are cached after the first call — the second and third calls pay only cache read tokens (much cheaper) rather than full input tokens. Include citations: {enabled: true} to get inline references to specific clauses.

Hands-on exercise

Build a document-analysis agent that uploads a PDF once and runs three analytical queries with a downloaded chart.

Setup: Use any PDF you have — a research paper, a product manual, or a public SEC filing work well.

Steps:

  1. Upload the PDF to the Files API, store the returned file_id
  2. Run Query 1: "What is the main topic of this document? Summarize in 3 sentences."
  3. Run Query 2: "List all named organizations, companies, or institutions mentioned."
  4. Run Query 3: "What are the 5 most important numbers or statistics cited?" — with citations: {enabled: true}
  5. After Query 3, verify all three calls used the same file_id (no re-upload)
  6. Bonus: Add a code execution call that takes the organizations from Query 2 as a CSV and creates a simple word-frequency bar chart, then download the output PNG

Verification:

  • The file_id in all three Messages requests is identical
  • Query 3 response includes inline citations with page or section references
  • (Bonus) You have a PNG file on your filesystem with a chart

Estimated time: 20 minutes (30 minutes with bonus)

<KnowledgeCheck question="A developer uploads a 2 MB PDF to the Files API and then uses the same file_id in 50 separate Messages API calls over one month. Which costs does she pay?" options={[ "Zero for the upload; input tokens for each of the 50 Messages calls", "Zero for everything — Files API uploads and reads are free", "A one-time upload fee + zero for the 50 calls", "Input tokens once for the upload; zero for subsequent calls (cached)" ]} correctIdx={0} explanation="File operations (upload, download, list, delete) are free. However, each of the 50 Messages API calls that reference the file_id charges the PDF's content as input tokens — same as if she'd sent the bytes inline. The savings are bandwidth (no 2 MB per request) and latency. To reduce the per-call token cost, enable extended prompt caching so calls 2–50 pay cache read rates instead of full input rates." />

<KnowledgeCheck question="You want to download a PNG chart that Claude generated during a code execution call. Describe the correct sequence of API calls, including the beta headers needed." options={["self-check"]} correctIdx={0} explanation="Self-check: (1) Make a Messages API call with the code_execution tool enabled and the beta header files-api-2025-04-14. (2) In the response, find the tool_result block that contains a file_id for the generated output. (3) Call GET /v1/files/{file_id}/content with the header anthropic-beta: files-api-2025-04-14 to download the PNG bytes. Note: you can only download files that were CREATED by code execution or skills — not files you uploaded yourself." />

What's next

In Chapter 5 you'll harden everything built so far into production-ready agents. The focus shifts from capabilities to operations: structured logging with hooks, cost circuit breakers that stop runaway sessions, and the deployment checklist that prevents the most common production failures. The biggest surprise for most teams: model hallucination isn't the primary failure mode — it's uncontrolled token spend.

References

[1] Files API — https://platform.claude.com/docs/en/build-with-claude/files · retrieved 2026-04-30
[2] Agent Capabilities API announcement — https://claude.com/blog/agent-capabilities-api · retrieved 2026-04-30
[3] Claude Managed Agents Tools — https://platform.claude.com/docs/en/managed-agents/tools · retrieved 2026-04-30
[4] Code Execution Tool — https://platform.claude.com/docs/en/agents-and-tools/tool-use/code-execution-tool · retrieved 2026-04-30
[5] Anthropic Data Retention — https://platform.claude.com/docs/en/build-with-claude/api-and-data-retention · retrieved 2026-04-30

Chapter 5 · 45 min

Production: deploy + observability + cost controls

The Claude Agent SDK's hook system is a lifecycle callback framework — inspired by HTTP middleware — that lets you attach arbitrary Python or TypeScript functions to key agent events (PreToolUse, PostToolUse, Stop, SessionStart, SessionEnd, UserPromptSubmit) to implement audit logging, cost enforcement, and prompt sanitization without modifying the agent's core logic.

Most teams discover the need for this the hard way: an agent that works perfectly in development starts generating surprise API bills in production, or silently modifies files it shouldn't touch, or loops on a subtask for 40 minutes. The Agent SDK includes a hook system specifically for these scenarios [1]. The biggest production failure mode is not model hallucination — it's cost runaway. This chapter gives you the four hooks you need before any agent goes live, and the deployment checklist that ties everything together.

> Prerequisites: Chapters 1–4 > > Time: 45 minutes > > Learning objectives: By the end of this chapter you have a production hook stack, structured logging, a cost circuit breaker, and a deployment checklist you can apply to any new agent.

Key facts

  1. The Agent SDK supports six core hook event types — PreToolUse, PostToolUse, Stop, SessionStart, SessionEnd, and UserPromptSubmit — plus additional events loaded from .claude/settings.json [1].
  2. permissionMode: "bypassPermissions" disables ALL safety checks, including file-edit confirmations and destructive Bash command prompts — not just MCP [1].
  3. Session JSONL files are written to ~/.claude/sessions/ by default and can be redirected with the CLAUDE_SESSIONS_DIR environment variable [1].
  4. Setting sources load in order: global (~/.claude/), then project (.claude/), then inline options. Inline options override everything [1].
  5. The correct alternative to bypassPermissions is allowedTools wildcards for MCP tools plus permissionMode: "acceptEdits" for file edits — combine the two [1].
  6. Hooks receive the full tool input and can return a modified input, an error to block the call, or an empty dict to pass through unchanged [1].

The hook system

Hooks are callback functions attached to the agent lifecycle. They run synchronously in your process before or after every tool call. The SDK provides HookMatcher for filtering by tool name using regex:

```python
from claude_agent_sdk import query, ClaudeAgentOptions, HookMatcher

async def my_hook(input_data: dict, tool_use_id: str, context: dict) -> dict:
    # Return {} to pass through, or raise to block
    return {}

options = ClaudeAgentOptions(
    hooks={
        "PostToolUse": [
            HookMatcher(matcher="Edit|Write", hooks=[my_hook])
        ]
    }
)
```

The matcher is a Python regex. "Edit|Write" matches any tool whose name contains "Edit" or "Write". Use ".*" to match everything.
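You can sanity-check a matcher pattern with plain `re` before wiring it into options. This is a standalone check, not SDK code; it assumes the SDK's matching behaves like `re.search` (substring match anywhere in the tool name):

```python
import re

matcher = re.compile("Edit|Write")

# Which tool names would "Edit|Write" hit? Substring matching means
# MultiEdit and NotebookEdit match too, because they contain "Edit".
tools = ["Edit", "Write", "MultiEdit", "Read", "Bash", "NotebookEdit"]
matched = [t for t in tools if matcher.search(t)]
print(matched)  # ['Edit', 'Write', 'MultiEdit', 'NotebookEdit']
```

If you want an exact match on a single tool, anchor the pattern: `"^Edit$"`.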

Hook 1: Audit log (PostToolUse)

Every file modification should be logged with a timestamp, file path, and session ID. This hook runs after every successful Edit or Write call:

```python
import asyncio
import json
import logging
from datetime import datetime
from claude_agent_sdk import query, ClaudeAgentOptions, HookMatcher

logger = logging.getLogger("agent.audit")

async def audit_file_change(input_data: dict, tool_use_id: str, context: dict) -> dict:
    tool_input = input_data.get("tool_input", {})
    file_path = tool_input.get("file_path", tool_input.get("path", "unknown"))
    tool_name = input_data.get("tool_name", "unknown")
    log_entry = {
        "event": "file_modified",
        "timestamp": datetime.utcnow().isoformat() + "Z",
        "tool": tool_name,
        "file_path": file_path,
        "session_id": context.get("session_id", "unknown"),
        "tool_use_id": tool_use_id,
    }
    logger.info(json.dumps(log_entry))
    return {}  # pass through — don't block

options = ClaudeAgentOptions(
    allowed_tools=["Read", "Write", "Edit", "Bash", "Glob", "Grep"],
    hooks={
        "PostToolUse": [
            HookMatcher(matcher="Edit|Write", hooks=[audit_file_change])
        ]
    }
)
```

Sample audit output:

```json
{"event": "file_modified", "timestamp": "2026-04-30T10:23:44Z", "tool": "Edit", "file_path": "src/auth.py", "session_id": "sess_01XxXxxXx", "tool_use_id": "toolu_01Abc123"}
```

Hook 2: Cost circuit breaker (PostToolUse)

The Stop hook fires only once the agent has already decided it is finished, which is too late to cap a runaway session. For a real circuit breaker, attach a token counter that runs on every tool call. The version below registers on PostToolUse, so the check runs after each call completes; to block a call before it executes, use PreToolUse instead:

```python
class CostCircuitBreaker:
    """Track estimated token cost and abort if threshold is exceeded."""

    def __init__(self, max_input_tokens: int = 500_000):
        self.max_input_tokens = max_input_tokens
        self.total_input_tokens = 0

    async def check_cost(self, input_data: dict, tool_use_id: str, context: dict) -> dict:
        # Accumulate token usage from context (populated by the SDK)
        usage = context.get("cumulative_usage", {})
        self.total_input_tokens = usage.get("input_tokens", self.total_input_tokens)
        if self.total_input_tokens > self.max_input_tokens:
            raise RuntimeError(
                f"Circuit breaker triggered: {self.total_input_tokens:,} input tokens "
                f"exceeds cap of {self.max_input_tokens:,}. Session terminated."
            )
        return {}

circuit_breaker = CostCircuitBreaker(max_input_tokens=500_000)

options = ClaudeAgentOptions(
    allowed_tools=["Read", "Write", "Edit", "Bash", "Glob", "Grep"],
    hooks={
        "PostToolUse": [
            HookMatcher(matcher=".*", hooks=[circuit_breaker.check_cost])
        ]
    }
)
```

When circuit_breaker.check_cost raises, the query() generator raises the exception to your code and the agent stops. The session JSONL is preserved, so you can inspect exactly what happened.
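In your driver code, wrap the query loop so the breaker's exception is caught and logged rather than crashing the process. The sketch below simulates the propagation with a stand-in async generator; `fake_query` is a placeholder for the SDK's query(), not real SDK code:

```python
import asyncio

class CircuitBreakerTripped(RuntimeError):
    pass

async def fake_query():
    # Stand-in for query(): yields messages, then a hook raises mid-stream
    yield "tool call 1 ok"
    yield "tool call 2 ok"
    raise CircuitBreakerTripped("cap of 500,000 input tokens exceeded")

async def run_agent() -> str:
    try:
        async for message in fake_query():
            print(message)
    except CircuitBreakerTripped as exc:
        # The session JSONL is already on disk; log and alert here
        return f"terminated: {exc}"
    return "completed"

result = asyncio.run(run_agent())
print(result)  # terminated: cap of 500,000 input tokens exceeded
```

The pattern to copy: catch the breaker's specific exception type at the loop, record the session ID, and exit cleanly so your orchestrator can decide whether to retry with a higher cap.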

<Callout type="hot"> Do NOT raise exceptions silently inside hooks — always log before raising. When a hook exception terminates a session in production, you need the context to diagnose it. Log the full input_data, tool_use_id, and the reason for termination before re-raising. </Callout>

Hook 3: Session initialization (SessionStart)

Use SessionStart to inject session metadata into your observability system as soon as the session opens:

```python
import os

async def session_start(input_data: dict, tool_use_id: str, context: dict) -> dict:
    session_id = context.get("session_id", "unknown")
    # Emit a structured start event for Langfuse or your logging backend
    start_event = {
        "event": "session_started",
        "timestamp": datetime.utcnow().isoformat() + "Z",
        "session_id": session_id,
        "agent_version": context.get("agent_version", "unknown"),
        "environment": os.environ.get("DEPLOY_ENV", "development"),
    }
    logger.info(json.dumps(start_event))
    return {}

options = ClaudeAgentOptions(
    hooks={
        "SessionStart": [
            HookMatcher(matcher=".*", hooks=[session_start])
        ],
        "PostToolUse": [
            HookMatcher(matcher="Edit|Write", hooks=[audit_file_change]),
            HookMatcher(matcher=".*", hooks=[circuit_breaker.check_cost]),
        ]
    }
)
```

Hook 4: Prompt sanitization (UserPromptSubmit)

UserPromptSubmit fires when a user message is submitted to the agent. Use it to strip PII or dangerous patterns before they reach the model:

```python
import re

PHONE_RE = re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b')
SSN_RE = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')

async def sanitize_prompt(input_data: dict, tool_use_id: str, context: dict) -> dict:
    prompt = input_data.get("prompt", "")
    # Redact phone numbers and SSNs
    cleaned = PHONE_RE.sub("[PHONE_REDACTED]", prompt)
    cleaned = SSN_RE.sub("[SSN_REDACTED]", cleaned)
    if cleaned != prompt:
        patterns_found = [
            name for name, regex in (("phone", PHONE_RE), ("ssn", SSN_RE))
            if regex.search(prompt)
        ]
        logger.warning(json.dumps({
            "event": "pii_redacted",
            "session_id": context.get("session_id"),
            "patterns_found": patterns_found,
        }))
    # Return modified input_data with the cleaned prompt
    return {**input_data, "prompt": cleaned}

options = ClaudeAgentOptions(
    hooks={
        "UserPromptSubmit": [
            HookMatcher(matcher=".*", hooks=[sanitize_prompt])
        ],
        # ... other hooks
    }
)
```

The complete production hook stack

Put it all together into a factory function you can reuse across agents:

```python
def production_options(
    allowed_tools: list[str],
    mcp_servers: dict = None,
    max_input_tokens: int = 500_000,
    permission_mode: str = "acceptEdits",
) -> ClaudeAgentOptions:
    cb = CostCircuitBreaker(max_input_tokens=max_input_tokens)
    return ClaudeAgentOptions(
        allowed_tools=allowed_tools,
        mcp_servers=mcp_servers or {},
        permission_mode=permission_mode,
        hooks={
            "SessionStart": [
                HookMatcher(matcher=".*", hooks=[session_start])
            ],
            "UserPromptSubmit": [
                HookMatcher(matcher=".*", hooks=[sanitize_prompt])
            ],
            "PostToolUse": [
                HookMatcher(matcher="Edit|Write", hooks=[audit_file_change]),
                HookMatcher(matcher=".*", hooks=[cb.check_cost]),
            ],
        }
    )
```

Usage:

```python
# Apply to the MCP agent from Chapter 3
async for message in query(
    prompt="Investigate issue #1234 and write a summary",
    options=production_options(
        allowed_tools=["mcp__github__*", "mcp__postgres__query", "mcp__docs__*"],
        mcp_servers={
            "github": github_config,
            "postgres": postgres_config,
            "docs": docs_config,
        },
        max_input_tokens=1_000_000,  # ~$3 on Opus 4.7
    ),
):
    if hasattr(message, "result"):
        print(message.result)
```

▶ Try this · claude-sonnet-4-6

I'm running an agent with a PostToolUse hook that tracks cumulative input tokens. After 12 tool calls, cumulative_usage shows 480,000 input tokens against a cap of 500,000. The agent is about to call …

Show expected output
Claude explains: the circuit breaker runs after each tool call. After the first Edit (call 13), it checks cumulative input tokens — if the total has crossed 500,000 it raises RuntimeError, terminating the session immediately. If the first edit doesn't push over 500k, the second edit might. The key point: the breaker fires AFTER the tool call completes (PostToolUse), so the file edit that triggers the cap will have already been written to disk. To prevent the file write entirely, use a PreToolUse hook instead.
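The difference between the two hook positions can be simulated in plain Python. This is a toy model, not SDK code: a cap check either blocks before a mock tool write (PreToolUse) or observes after it (PostToolUse).

```python
def run_tool_with_hooks(tokens_used: int, cap: int, mode: str, writes: list) -> bool:
    """Simulate one tool call guarded by a cap check in either hook position."""
    if mode == "pre" and tokens_used > cap:
        return False               # PreToolUse: blocked, nothing written
    writes.append("file.txt")      # the tool executes and writes to disk
    if mode == "post" and tokens_used > cap:
        raise RuntimeError("cap exceeded")  # PostToolUse: too late, file exists
    return True

pre_writes: list = []
post_writes: list = []

blocked = run_tool_with_hooks(480_001, 480_000, "pre", pre_writes)
try:
    run_tool_with_hooks(480_001, 480_000, "post", post_writes)
except RuntimeError:
    pass

print(len(pre_writes), len(post_writes))  # 0 1
```

With PreToolUse nothing reaches disk; with PostToolUse the write lands before the breaker fires, which is exactly the behavior the knowledge check below probes.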

Langfuse integration for observability

Langfuse is the recommended observability backend for Koenig AI Academy's agent stack. Wire it into your SessionStart and PostToolUse hooks:

```python
import os
from langfuse import Langfuse

langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host=os.environ.get("LANGFUSE_HOST", "http://localhost:3100"),
)

async def langfuse_session_start(input_data: dict, tool_use_id: str, context: dict) -> dict:
    session_id = context.get("session_id", "unknown")
    trace = langfuse.trace(
        id=session_id,
        name="agent_session",
        metadata={"environment": os.environ.get("DEPLOY_ENV", "dev")},
    )
    context["langfuse_trace"] = trace
    return {}

async def langfuse_tool_log(input_data: dict, tool_use_id: str, context: dict) -> dict:
    trace = context.get("langfuse_trace")
    if trace:
        trace.span(
            name=input_data.get("tool_name", "unknown_tool"),
            input=input_data.get("tool_input"),
            metadata={"tool_use_id": tool_use_id},
        )
    return {}
```

The five-step deployment checklist

Before any agent goes to production, verify all five:

1. Permissions are minimal

  • allowedTools lists only the specific tools the agent needs — no .* wildcards in production
  • permissionMode is acceptEdits or default — never bypassPermissions
  • MCP tools are scoped to specific tool names where possible (not mcp__github__* for agents that only need list_issues)

2. Cost controls are wired

  • A PostToolUse circuit breaker with a tested token cap
  • A session timeout mechanism (for Managed Agents: explicit session.update to "completed")
  • Langfuse (or equivalent) traces enabled with cost annotations

3. Audit logging is active

  • Every Edit and Write logged with file path + session ID + timestamp
  • Bash tool calls logged with the command (be careful with secrets in commands)
  • Logs are structured JSON, not raw print statements

4. Secrets are out of config

  • No API keys in mcpServers.env values — use environment variable references
  • No hardcoded tokens in headers — use os.environ["KEY"] or process.env.KEY
  • .mcp.json uses ${VAR} syntax, committed to version control

5. Session files have a retention policy

  • CLAUDE_SESSIONS_DIR points to a location with log rotation
  • JSONL files are not written to a disk that's part of user-facing data storage
  • For Managed Agents: sessions are marked completed when done, not left idle
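Checklist item 4 in config form: a sketch of a .mcp.json entry that keeps the token in an environment reference. The server name and URL are illustrative placeholders; only the ${VAR} expansion pattern is the point.

```json
{
  "mcpServers": {
    "github": {
      "type": "http",
      "url": "https://example.com/mcp",
      "headers": {
        "Authorization": "Bearer ${GITHUB_TOKEN}"
      }
    }
  }
}
```

This file is safe to commit: the actual token lives only in the deployment environment's GITHUB_TOKEN variable.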

Hands-on exercise

Harden an existing agent with the production hook stack and verify the circuit breaker.

Setup: Use the MCP-wired agent from Chapter 3, or any agent that makes multiple tool calls.

Steps:
1. Apply the production_options() factory function from this chapter to your agent
2. Set max_input_tokens=50_000 (intentionally low to trigger the circuit breaker)
3. Run the agent with a prompt that requires multiple tool calls: "Analyze every Python file in this directory and write a summary of each one's purpose"
4. Observe the circuit breaker trigger — the session should terminate before completing all files
5. Check your structured logs for the session_started, file_modified, and any pii_redacted entries

Verification:

  • The session terminates before processing all files
  • The terminal output shows the RuntimeError message from the circuit breaker
  • At least one {"event": "file_modified"} log entry exists (for any Write/Edit the agent made before tripping the breaker)
  • Raising max_input_tokens to 2_000_000 allows the full run to complete

Estimated time: 20 minutes

▶ Try this · claude-sonnet-4-6

I need to run a Claude Agent in a CI/CD pipeline where there's no human to approve tool calls. The agent reads test results, edits configuration files, and runs bash commands to restart services. What…

Show expected output
Claude recommends: use allowedTools with an explicit list (e.g. ['Read', 'Edit', 'Bash']) plus permissionMode: 'acceptEdits' — not bypassPermissions. This pre-approves file edits and Bash without disabling all safety checks. The agent can still be stopped by hooks. Risks to document: (1) Bash is allowed and can run destructive commands — scope the working directory; (2) Edit can overwrite production config — add a PostToolUse hook that logs every edit to a change log; (3) No human review means runaway loops go undetected — add a token circuit breaker.

<KnowledgeCheck question="A PostToolUse hook raises an exception when cumulative tokens exceed the cap. However, your team reports that the file edit that triggered the cap was already written to disk. What hook type should you use instead to prevent the write, and why?" options={[ "PreToolUse — it runs before the tool executes, allowing you to block the call before any filesystem change occurs", "PostToolUse with a file rollback — reverse the write after detecting the breach", "SessionEnd — it fires before any tool results are persisted", "Stop — it intercepts the agent's stop signal before cleanup" ]} correctIdx={0} explanation="PostToolUse runs after the tool has already executed — the file is already written. PreToolUse runs before execution, giving you the chance to raise an exception that blocks the tool call entirely. For a cost circuit breaker that needs to prevent writes (not just log them), move the cap check to PreToolUse. For pure logging and alerting, PostToolUse is fine." />

<KnowledgeCheck question="You're deploying an agent that uses the GitHub MCP server and needs to read and write files. List the minimum allowedTools and permissionMode configuration to avoid using bypassPermissions." options={["self-check"]} correctIdx={0} explanation="Self-check: Set permissionMode to 'acceptEdits' (covers file read/write without prompting). Add to allowedTools: ['Read', 'Write', 'Edit', 'Glob', 'Grep'] for filesystem operations, plus 'mcp__github__list_issues' (or whichever specific GitHub tools you need — not mcp__github__* unless you genuinely need all of them). This gives the agent exactly what it needs with no bypassPermissions blast radius." />

Monitoring with structured logging in production

Structured JSON logs let you query agent behavior with standard log tooling. Here's the complete logging setup used by the Koenig AI Academy agent pipeline:

```python
import json
import logging
import os
import sys
from datetime import datetime

def setup_agent_logging(agent_name: str) -> logging.Logger:
    """Configure structured JSON logging for a production agent."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter('%(message)s'))
    logger = logging.getLogger(f"agent.{agent_name}")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger

def log_tool_event(logger: logging.Logger, event: str, tool_name: str,
                   session_id: str, extra: dict = None):
    """Emit a structured tool event."""
    entry = {
        "event": event,
        "tool": tool_name,
        "session_id": session_id,
        "timestamp": datetime.utcnow().isoformat() + "Z",
        "service": os.environ.get("SERVICE_NAME", "agent"),
        "env": os.environ.get("DEPLOY_ENV", "development"),
    }
    if extra:
        entry.update(extra)
    logger.info(json.dumps(entry))
```

In Langfuse, these log entries correlate to spans on a trace timeline. In Datadog or CloudWatch Logs Insights, they're filterable with structured queries over the JSON fields. In any system, they give you:

  • Per-session cost breakdown (how many tool calls, which tools, which files modified)
  • Error rate by tool type (which MCP servers fail most often)
  • Session duration distribution (identify runaway sessions before the circuit breaker)
  • Token efficiency (input tokens per useful tool call result)

Deploying to production environments

The Agent SDK runs in your process — you can deploy it anywhere Python or Node.js runs. The key differences between deployment targets:

Lambda / Cloud Functions (short-lived): best for agents that complete in under 15 minutes. Package the SDK and your agent code together. Set CLAUDE_SESSIONS_DIR to /tmp (ephemeral, disappears after the function cold-starts). Session resume doesn't work across invocations unless you serialize the session ID to a database.

Long-running container (EC2, Cloud Run, K8s): best for agents with sessions that span multiple turns or that need to resume. Sessions persist on the container's disk. The risk: unbounded session file growth. Add a cron job inside the container that trims JSONL files older than 7 days.
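The trim job can be a few lines of Python invoked from cron. A sketch under the assumptions above: 7-day retention and a flat directory of *.jsonl session files; adjust both to your policy.

```python
import os
import tempfile
import time
from pathlib import Path

def trim_old_sessions(sessions_dir: str, max_age_days: int = 7) -> int:
    """Delete session JSONL files older than max_age_days. Returns count removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = 0
    for f in Path(sessions_dir).glob("*.jsonl"):
        if f.stat().st_mtime < cutoff:
            f.unlink()
            removed += 1
    return removed

# Demo against a throwaway directory: one stale file, one fresh file
demo = tempfile.mkdtemp()
stale, fresh = Path(demo, "old.jsonl"), Path(demo, "new.jsonl")
stale.touch(); fresh.touch()
ten_days_ago = time.time() - 10 * 86400
os.utime(stale, (ten_days_ago, ten_days_ago))

print(trim_old_sessions(demo))  # 1
```

A crontab line such as `0 3 * * * python /opt/agent/trim_sessions.py` (path illustrative) keeps the container's disk bounded without touching recent, resumable sessions.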

Managed Agents (Anthropic-hosted): as covered in Chapter 2, the right choice for long-running async tasks where you don't want to manage the container. Event history persists server-side.

For all environments:

```bash
# Required environment variables in production
ANTHROPIC_API_KEY=sk-ant-...              # or use Bedrock/Vertex credentials
CLAUDE_SESSIONS_DIR=/var/agent/sessions   # writable, with retention policy
DEPLOY_ENV=production                     # for log filtering
SERVICE_NAME=research-agent               # for log correlation
```

The contrarian production advice: log before you optimize

Most teams' first instinct after deploying an agent is to optimize for cost — reduce token usage, tune the model size, add caching. The better first move is to log everything and let data drive optimization decisions.

Until you have structured logs for at least 100 real sessions, you don't know:

  • Which tool is called most often (and thus where caching would help most)
  • Which prompts consume the most tokens (and thus where prompt engineering ROI is highest)
  • What your actual p99 session cost is (different from the estimate you calculated before launch)

The production hook stack from this chapter gives you that data for free as a side effect of safe operations. Run for two weeks, then optimize from evidence.
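Once the logs exist, questions like "what is my real p99 session cost" become a few lines of analysis. A sketch over synthetic log lines: the field names mirror the hook stack above, the token counts are invented, and a nearest-rank percentile stands in for whatever your analytics stack provides.

```python
import json
from collections import defaultdict

# Synthetic structured log lines of the shape the hook stack emits
log_lines = [
    json.dumps({"event": "tool_call", "session_id": s, "input_tokens": t})
    for s, t in [("a", 1000), ("a", 2000), ("b", 500), ("c", 90000)]
]

# Aggregate input tokens per session
tokens_per_session = defaultdict(int)
for line in log_lines:
    entry = json.loads(line)
    tokens_per_session[entry["session_id"]] += entry["input_tokens"]

# Nearest-rank p99 over the observed sessions
costs = sorted(tokens_per_session.values())
p99 = costs[min(len(costs) - 1, int(0.99 * len(costs)))]
print(p99)  # 90000
```

Session "c" dominates the tail here, which is the typical real-world finding: one runaway session, not average-case usage, drives the bill.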

What's next

You've now completed the full five-chapter arc. The capstone project ties it together: you'll build a production research agent that orchestrates GitHub + Postgres + a cloud docs MCP server, uses the Files API for document context, and runs behind the complete hook stack from this chapter. The capstone repo is described in the course outline.

The field is moving fast. Watch the Claude Agent SDK changelog and the Managed Agents release notes for breaking changes — both are updated on a rolling basis.

References

[1] Claude Agent SDK Overview — https://code.claude.com/docs/en/agent-sdk/overview · retrieved 2026-04-30
[2] Claude Managed Agents Overview — https://platform.claude.com/docs/en/managed-agents/overview · retrieved 2026-04-30
[3] Agent SDK Hooks — https://code.claude.com/docs/en/agent-sdk/hooks · retrieved 2026-04-30
[4] Claude Agent SDK Permissions — https://code.claude.com/docs/en/agent-sdk/permissions · retrieved 2026-04-30
[5] Files API — https://platform.claude.com/docs/en/build-with-claude/files · retrieved 2026-04-30
[6] Langfuse Observability — https://langfuse.com · retrieved 2026-04-30