All courses 540 min10 chaptersBuilderanthropic

Claude Tool Use from Zero: From Basics to Production Connectors

Developers who want to master Claude's tool use capabilities, from simple function calling to building robust specialized MCP servers.

What you'll learn
  • Understand and implement Claude's native tool use
  • Build, test, and deploy compliant MCP servers
  • Design secure, observable tool connectors for real-world domains
  • Debug complex tool interaction and authorization issues
  • Implement structured logging and audit trails for tool operations
Chapters in this course
Introduction to Claude's Tool Use audio slides40m
Beyond Function Calling: Understanding MCP audio slides50m
Building Your First MCP Server audio slides60m
Handling Advanced Data and Resources audio slides60m
Observability and Logging in MCP audio slides60m
Security and Authentication audio slides60m
Creative Connectors audio slides50m
Legal and Regulatory Connectors in MCP audio slides60m
SMB and Growth Connectors audio slides60m
Claude Code Dynamic Workflows: Fan-Out, Checkpoint, and Verify (2026) audio slides65m
Chapter 1 · 40 min

Introduction to Claude's Tool Use

Slide deck · PDF
Open in new tab
Listen · deep-dive podcast
Download slides (.pptx) Open deck preview

This chapter gets you from "Claude can answer questions" to "Claude can decide when to call a structured function, receive the result, and continue the job." That is the first practical step toward real connectors.

Claude tool use is not magic plugin installation. In the Anthropic Messages API, you describe tools with names, descriptions, and JSON input schemas. When Claude decides a tool is needed, the response contains a tool_use content block and the API response stop reason is tool_use.[1] Your application runs the actual code, then sends a follow-up message containing a tool_result. Claude never reaches into your runtime by itself; your host application remains the executor and policy boundary.

That boundary matters. If your tool fetches a stock price, deletes a file, sends an invoice, or queries a customer database, Claude only proposes the call. Your software decides whether the call is valid, authorized, observable, and safe.

Prerequisites check

Before you start, verify that you can do three things:

  1. Read and write a small Python or TypeScript script.
  2. Store an API key in an environment variable rather than hard-coding it.
  3. Understand JSON objects, required fields, and string/number types.

If those are shaky, finish a basic API-client tutorial first. Tool use adds a multi-step protocol on top of ordinary API calls; it is not a good place to learn HTTP from scratch.

The mental model: model chooses, host executes

A tool-use exchange has four parts:

  1. You send Claude a user request plus a list of available tools.
  2. Claude returns either normal text or a tool_use block.
  3. Your host validates the tool name and input, executes local code, and returns a tool_result.
  4. Claude uses that result to produce the final answer or request another tool.
- Claude only proposes a tool call; your host application is always the executor and the security boundary.
- The stop reason `tool_use` signals that Claude is requesting a tool rather than producing a final answer.
- The four-step loop (request → tool_use block → tool_result → final answer) is the stable control flow regardless of SDK version.

The tool description is part of the model's context. It should say what the tool does, when to use it, and what each field means. The input schema is the contract your code can validate before execution. Anthropic recommends precise tool definitions because ambiguous descriptions make tool selection less reliable.[1]

Here is the smallest useful stock-price tool. It uses a fake in-memory price table so the first exercise is reproducible without a paid market-data provider.

Try this · claude-sonnet-4-6

You have a tool named get_stock_price that accepts {ticker: string}. Use it to answer: What is the current price of KOENIG?

[tool_use] name: get_stock_price input: {"ticker":"KOENIG"}

After your host returns the tool result: {"ticker":"KOENIG","price":42.15,"currency":"USD","as_of":"2026-05-14T12:00:00Z"}

Claude can answer: KOENIG is trading at 42.15 USD as of 2026-05-14T12:00:00Z.`} />

The important phrase is "your host returns." Claude does not know the price. The model selected the tool and filled the arguments; your program supplied the facts.

A first implementation shape

The exact SDK syntax can change, but the control flow is stable:

- SDK syntax evolves between releases, but the underlying request-tool_use-tool_result-answer loop does not change.
- The tool definition object must include a name, description, and a JSON input schema with required fields declared explicitly.
- In a full application, tool dispatch is a switch or map by tool name, not a single hardcoded function.

```python TOOLS = [ { "name": "get_stock_price", "description": "Return the latest known price for a ticker symbol from the demo portfolio feed.", "input_schema": { "type": "object", "properties": { "ticker": { "type": "string", "description": "Uppercase ticker symbol, for example KOENIG" } }, "required": ["ticker"] } } ]

def get_stock_price(ticker: str) -> dict: prices = { "KOENIG": {"price": 42.15, "currency": "USD"}, "PAPER": {"price": 18.40, "currency": "USD"}, } symbol = ticker.upper() if symbol not in prices: raise ValueError(f"Unknown demo ticker: {symbol}") return {"ticker": symbol, **prices[symbol], "as_of": "2026-05-14T12:00:00Z"} ```

In a full app, the code around this function calls the Messages API, checks for a tool_use block, dispatches by name, catches errors, and sends the result back as a tool result. Anthropic's Messages API is the primary API surface for this interaction.[2]

Why input schemas are not optional

Without a schema, every tool call becomes a guess. The model may send stock, symbol, ticker_symbol, or company_name. Your application then either breaks or accepts loose input that later creates security problems.

- Without a required input schema, the model has no contract and will use inconsistent field names across calls.
- A well-defined schema lets your host reject malformed input before it touches any external system.
- Specific descriptions ("return the latest quote for one uppercase ticker") are more reliable than vague descriptions ("fetch data").

Schemas protect both sides:

  • Claude gets a compact contract for which fields to fill.
  • Your host can reject malformed input before touching external systems.
  • Logs become comparable because the same tool always receives the same shape.

Use specific descriptions. "Fetch data" is weak. "Return the latest quote for one uppercase ticker in the demo portfolio feed" is useful.

<Callout type="warning"> Do not connect a write-capable production tool on your first pass. Start with a read-only tool whose output you can verify manually. Once parsing, validation, and logging work, then add mutation tools with explicit approval gates. </Callout>

Common first failures

The first failure is over-broad tools. A tool named run_python or query_database is easy to demo and dangerous to operate. It gives the model a low-level execution primitive instead of a business action. Prefer get_invoice_status, lookup_stock_price, or list_open_support_cases.

- Over-broad tools like `run_python` or `query_database` hand the model a general primitive instead of a bounded business action.
- Hidden side effects (a tool that reads, updates, and emails) must be named explicitly in the description and confirmed in host code.
- Model-produced JSON must still be validated in your runtime; schemas guide the model but do not replace server-side input checks.

The second failure is hidden side effects. A tool named sync_customer might read from Salesforce, update Stripe, and email an account manager. The model cannot reason about that safely from the name. If a tool changes state, say so in the description and require confirmation in your host.

The third failure is treating model output as trusted JSON. Even when using tool schemas, validate inputs in your own runtime. Consistency guidance from Anthropic emphasizes strengthening outputs through constraints and checks, not wishful parsing.[3]

Try this · claude-sonnet-4-6

I am designing a first Claude tool for a finance assistant. Compare these tool names and tell me which is safer: run_sql(query), get_customer_balance(customer_id), or update_account(anything). Explain…

Why: - It is narrow and business-specific. - Its input is constrained to a customer identifier. - It sounds read-only, which makes review and logging easier.

run_sql(query) is too powerful because it exposes a general database primitive. update_account(anything) is vague and write-capable, so it needs a much stronger schema, authorization check, and human approval step.`} />

Hands-on exercise

Build a local script that defines get_stock_price(ticker) and exposes it to Claude as a tool. Use only a hard-coded demo price table.

Success criteria:

  • The tool schema has exactly one required field: ticker.
  • A request for KOENIG produces a tool-use turn, then a final answer with price, currency, and timestamp.
  • A request for an unknown ticker returns a controlled error, not a stack trace.
  • You log the tool name, validated input, and success/failure.

What's next

Chapter 2 moves from one client-side function to MCP, the protocol that lets hosts discover and call tools, resources, and prompts from external servers.

[1]: Anthropic, "Tool use with Claude", https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview [2]: Anthropic, "Messages API", https://docs.anthropic.com/en/api/messages [3]: Anthropic, "Increase output consistency", https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/increase-consistency

References

  1. https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview
  2. https://docs.anthropic.com/en/api/messages
  3. https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/increase-consistency
Chapter 2 · 50 min

Beyond Function Calling: Understanding MCP

Slide deck · PDF
Open in new tab
Listen · deep-dive podcast
Download slides (.pptx) Open deck preview

Chapter 1 gave Claude one client-side function: a stock-price lookup that your host application described, executed, and returned to the model. That pattern is the foundation of tool use. It also exposes the first scaling problem. If every AI application has to hand-code its own GitHub connector, database connector, design-system connector, and finance connector, the industry ends up with dozens of one-off integrations that all solve discovery, credentials, logs, and safety in slightly different ways.

The Model Context Protocol, or MCP, is the standard layer that moves reusable capabilities behind a protocol boundary. The official architecture documentation describes MCP as a system where host applications connect to MCP servers through MCP clients, and those servers expose capabilities such as tools, resources, and prompts.[^architecture] The practical result is simple: instead of saying, "My app gave Claude a Python function," you can say, "Claude Code connected to a server that advertises a controlled set of capabilities."

That distinction matters for the rest of this course. Native function calling teaches the turn-by-turn mechanics: describe a tool, receive a tool request, run code, return the result. MCP teaches connector architecture: discover capabilities, isolate domain logic, reuse the same server across hosts, and keep policy enforcement out of the model prompt.

By the end of this chapter, you should be able to look at an MCP server and answer five production questions:

  1. Which application is the host?
  2. Which process or remote endpoint is the server?
  3. Which advertised capabilities are tools, resources, and prompts?
  4. Which transport connects them?
  5. Where do secrets and security decisions live?

Those five questions are enough to keep you out of most early MCP design mistakes.

Prerequisites check

Before continuing, make sure you can explain the Chapter 1 tool-use loop without looking at notes:

  1. Claude receives a user request plus a list of available tools.
  2. Claude returns a tool_use block when it wants one of those tools.
  3. Your host validates the tool name and input.
  4. Your host executes the real code.
  5. Your host sends a tool_result back to Claude.

If that loop is still fuzzy, repeat the Chapter 1 stock-price exercise first. MCP does not remove the loop. It puts a standard client-server protocol around the capabilities that the host can offer to Claude.

You also need a terminal where you can run basic commands. The hands-on exercise uses Claude Code's MCP configuration flow, but the conceptual work applies to Claude Desktop, API-hosted MCP connectors, and custom MCP clients as well.

From one-off functions to connector servers

Native tool use is local to an application. You describe a tool in the API request, and your application handles the result. This is ideal when the tool is small, private to your product, or not worth sharing across environments.

- Native tool use is per-application; MCP moves capability behind a protocol boundary that any MCP-compatible host can reuse.
- An MCP server encapsulates domain logic, credentials, and policy so every host application does not have to re-implement them.
- The official TypeScript SDK structure is: create an McpServer, register capabilities, create a transport, connect — the same shape applies in Python.

MCP becomes useful when a capability should be reusable. A finance connector might need to query invoices, summarize customer balances, expose an accounts-receivable policy, and offer a reusable collection-email prompt. Those pieces should not be copied into every AI application. They belong in a connector server maintained by the team that understands the finance system.

The official MCP TypeScript SDK frames server development around three steps: create an McpServer, register tools/resources/prompts, create a transport, and connect the server to that transport.[^typescript-sdk] That shape is what you will build in Chapter 3. For now, focus on the architecture:

  • The host is the user-facing AI application, such as Claude Code.
  • The client is the MCP protocol component inside that host.
  • The server is the external process or service that exposes capabilities.
  • The transport is the connection mechanism, such as local stdio or remote HTTP.

When someone says "Claude called my MCP server," translate that into the precise version: the host's MCP client discovered capabilities from the MCP server, the model chose or benefited from one of those capabilities, and the host sent the server a protocol request.

<Callout type="warning"> MCP is not a security product by itself. It standardizes how capabilities are exposed and called. Your server still owns authentication, authorization, input validation, rate limits, error handling, and audit logs. </Callout>

The three primitives: tools, resources, and prompts

MCP has several protocol concepts, but this course starts with the three primitives you will use constantly: tools, resources, and prompts.

- Tools are callable actions with side effects or query logic; resources are stable, readable context identified by URIs; prompts are reusable instruction templates.
- Modeling a stable policy document as a tool (`get_refund_policy()`) obscures its read-only nature and pollutes the action surface.
- Classifying capabilities correctly affects safety, UX, and observability — not just naming convention.

Tools are callable actions. The MCP tools specification describes tools as functions exposed by a server that can be invoked by clients, with names, descriptions, input schemas, and returned content.[^tools] Use a tool when something needs to happen: search invoices, create a ticket, list files, run a diagnostic query, or draft a document from live data.

Resources are readable context. The MCP resources specification describes resources as data exposed by servers that clients can read, often identified by URIs.[^resources] Use a resource when the model needs context rather than an action: a refund policy, a project README, a database schema summary, or a localized configuration file.

Prompts are reusable prompt templates. The MCP prompts specification describes prompts as server-provided templates that can accept arguments and return messages for a workflow.[^prompts] Use a prompt when the server knows a repeatable instruction pattern: draft a polite collection email, summarize a pull request against team standards, or prepare a compliance review checklist.

Here is the rule of thumb:

  • If the model should ask the server to do something, use a tool.
  • If the host should attach known context for the model to read, use a resource.
  • If the connector should provide a reusable instruction pattern, use a prompt.

That classification is not academic. It changes safety, UX, and observability. A refund policy modeled as get_refund_policy() makes a read-only document look like an action. A write-capable operation modeled as a resource hides its side effect. A long workflow prompt buried inside a tool description becomes hard to version and test.

Try this · claude-sonnet-4-6

Classify each MCP capability as a tool, resource, or prompt. Use one sentence of reasoning for each: (1) list_unpaid_invoices(customer_id), (2) company://policies/refund-policy, (3) draft_collection_e…

Notice that "read-only" does not automatically mean "resource." list_unpaid_invoices(customer_id) is read-only from the business user's perspective, but it is still an action because the server must execute a query with arguments. A resource is better when the item is stable enough to identify and retrieve directly.

A connector example: accounts receivable

Generic examples like foo() and bar() do not teach connector design. Use a domain.

Imagine a small business wants Claude Code to help with accounts receivable. The underlying system has customers, invoices, payments, and reminder-email templates. A weak MCP server mirrors the database:

query_database(sql)
http_request(method, url, body)
send_email(to, subject, body)

Those tools are flexible, but they force Claude to reason at the wrong layer. They also create serious safety problems. query_database(sql) can over-fetch sensitive records. http_request() can reach unapproved endpoints. send_email() can contact customers without a business approval step.

A stronger server exposes business capabilities:

```text Tools: - list_overdue_invoices(customer_id, max_age_days) - draft_invoice_reminder(customer_id, invoice_ids) - submit_reminder_for_approval(draft_id)

Resources: - finance://policies/collections - finance://customers/{customer_id}/account-summary

Prompts: - write_polite_payment_reminder(customer_name, invoice_summary, policy_uri) - summarize_receivables_risk(customer_summary_uri) ```

This design gives the model useful verbs without handing it raw infrastructure. It also gives the server natural enforcement points. list_overdue_invoices can cap result size. draft_invoice_reminder can redact sensitive notes. submit_reminder_for_approval can require a human approval state. The resources provide stable context, and the prompts encode the organization's preferred language.

The server boundary is where you turn a messy internal system into a model-friendly, policy-aware interface.

Host, client, server, and transport

The MCP architecture is easiest to understand as a responsibility split:

- The host is the user-facing application (e.g., Claude Code), the client is its protocol component, and the server is the external capability provider.
- Use local stdio for scripts that need direct machine access; use remote HTTP for cloud services; SSE is deprecated for new work.
- Configuration scope (local, project, user) controls who can see the server, and secrets should use environment variable expansion rather than committed values.
PartResponsibilityExample
HostUser-facing AI application and UXClaude Code
ClientProtocol connection managed by the hostClaude Code's MCP client for one configured server
ServerExternal capability providerA local Node.js file-browser server
TransportHow messages movestdio for local, HTTP for remote

Claude Code's MCP documentation shows three broad connection options: remote HTTP servers, remote SSE servers, and local stdio servers.[^claude-code] The same documentation marks HTTP as the recommended option for remote cloud services and explains that local stdio servers run as local processes on your machine.[^claude-code] It also notes that SSE is deprecated in favor of HTTP where available.[^claude-code]

For a beginner, this gives you a clear decision tree:

  • Use local stdio when the server is a local script or needs direct local machine access.
  • Use remote HTTP when the server is a cloud service or team-managed endpoint.
  • Avoid starting new SSE work unless you are integrating with an existing server that only supports it.

Configuration scope is the next decision. Claude Code supports local, project, and user scopes for MCP servers.[^claude-code] Local scope is private to your current project entry in your user configuration. Project scope writes a .mcp.json file that can be shared with the repository. User scope makes a server available across projects. For course work, start local unless the exercise explicitly asks for project sharing. For team connectors, project scope can be useful, but only when secrets are handled through environment variable expansion or secure authentication rather than committed values.

What discovery changes

In Chapter 1, your application passed tool definitions directly to Claude in the request. With MCP, the host can discover what the server offers. That discovery shift affects maintenance.

- MCP discovery means a server can advertise new capabilities without requiring every host application to be updated with hardcoded tool definitions.
- Discovery does not grant automatic trust; a responsible host can still filter, require approval, or disable advertised capabilities.
- Tool names and descriptions are part of the interface the model and host use to determine relevance, so a vague name is a bad MCP tool even if the code works.

Suppose the finance team adds a new explain_late_fee(customer_id, invoice_id) tool. In a one-off native integration, every host application might need code or configuration changes. In an MCP setup, the finance server can advertise the new capability through the protocol. The host still needs UX and approval policies, but the connector's capability surface is no longer embedded inside every application.

Discovery does not mean the model should automatically use everything. A responsible host can still filter tools, ask for user approval, display server trust state, or disable a capability. Discovery only means the server has a standard way to say what it can provide.

This is why good descriptions matter. The tool name and description are not documentation for humans only; they are part of the interface that helps the model and host understand when a capability is relevant. A tool named action() with a vague description is a bad MCP tool even if the code works.

Common anti-patterns

The first anti-pattern is the raw executor tool: run_shell, query_sql, http_request, or eval_code. These are attractive because they make demos feel powerful. In production, they shift too much decision-making to the model. Replace them with bounded domain tools such as list_failed_deployments, get_customer_balance, or search_contract_clauses.

The second anti-pattern is hiding writes behind harmless names. A tool named sync_customer might update a CRM, email an owner, and trigger billing workflows. If a tool changes state, name the side effect and require appropriate approval in the host or server.

The third anti-pattern is modeling everything as a tool. Policies, schemas, runbooks, and reference docs should usually be resources. If the model needs the same policy across many tasks, a stable resource URI is easier to inspect, cache, and cite.

The fourth anti-pattern is modeling a workflow prompt as application code only. If the connector's domain expertise includes "how our legal team wants contract-risk summaries formatted," expose that as an MCP prompt. Then the prompt can evolve with the connector instead of being copied into every host application.

The fifth anti-pattern is treating MCP connection success as production readiness. A server can connect and still be unsafe, unobservable, over-broad, or impossible to debug. Later chapters cover logging, security, and approval gates because a connected connector is only the starting line.

Reading a Claude Code MCP configuration

Now connect the architecture to the command line.

Claude Code can add a local stdio server with a command shaped like this:

claude mcp add --transport stdio --env FINANCE_API_KEY=demo finance-demo -- node ./server.js

Read it from left to right:

  • claude mcp add modifies Claude Code's MCP configuration.
  • --transport stdio says Claude Code will start a local process and speak MCP over standard input/output.
  • --env FINANCE_API_KEY=demo passes an environment variable to the server process.
  • finance-demo is the server name inside Claude Code.
  • -- node ./server.js is the command Claude Code runs as the server.

For a remote HTTP server, the shape changes:

claude mcp add --transport http sentry https://mcp.sentry.dev/mcp

In that case, Claude Code does not start a local Node.js process. It connects to a remote endpoint over HTTP. If the server needs authentication, Claude Code supports headers and OAuth flows for remote servers.[^claude-code]

Try this · claude-sonnet-4-6

Read this Claude Code MCP command and explain the host, server, transport, secret handling, and likely discovered capabilities: claude mcp add --transport stdio --env FINANCE_API_KEY=demo finance-demo…

Try this · claude-sonnet-4-6

A team wants to share an MCP server config in a repository. The config points to https://api.example.com/mcp and needs an API key. Recommend a Claude Code scope and explain how to avoid committing the…

Capability design checklist

Before you build your first server in Chapter 3, practice reviewing a server's advertised surface. For every capability, ask:

  1. Is the name a business action or a raw technical primitive?
  2. Does the description say when to use it and what it will not do?
  3. Are inputs narrow enough to validate?
  4. Is the operation read-only or state-changing?
  5. If it reads context, should it be a resource instead?
  6. If it encodes a repeatable workflow, should it be a prompt instead?
  7. Where will detailed errors be logged?
  8. What should Claude see when the server refuses a request?

Here is a practical rewrite exercise:

```text Bad: tool: http_request input: { method, url, body }

Better: tool: search_customer_invoices input: { customer_id, status, max_results }

resource: finance://policies/invoice-collection

prompt: draft_invoice_followup arguments: { customer_name, invoice_summary, policy_uri } ```

The better version gives the model enough flexibility to help while keeping control in the server. It also creates separate places for policy, workflow language, and business actions.

Hands-on exercise: connect and inspect a sandbox MCP server

Your goal is to connect Claude Code to one existing MCP server in a sandbox environment and inspect what it advertises. Do not use a production system. Do not use a server with write access to real customer data.

Use any safe server you already trust, or create a temporary demo server from official SDK examples. The TypeScript SDK repository includes server examples and a minimal server pattern that registers a tool and connects over stdio.[^typescript-sdk] The exact server is less important than the inspection habit.

Step 1: choose the server

Pick one of these:

  • A local demo MCP server from the official TypeScript SDK examples.
  • A read-only local server you wrote earlier.
  • A remote server owned by a trusted vendor, connected with a non-production account.

Avoid:

  • Servers that can delete files, send messages, change billing, or update customer records.
  • Servers that ask you to paste secrets into prompts.
  • Random packages you have not inspected.

Step 2: add the server to Claude Code

For a local stdio server, the command shape is:

claude mcp add --transport stdio sandbox-demo -- node ./server.js

For a remote HTTP server, the command shape is:

claude mcp add --transport http sandbox-demo https://example.com/mcp

If credentials are required, prefer environment variables, OAuth, or a dedicated sandbox token. Claude Code's documentation includes commands for server listing, detail inspection, removal, and /mcp status checking.[^claude-code]

Step 3: inspect capabilities

Run:

claude mcp list
claude mcp get sandbox-demo

Then open Claude Code's /mcp view and inspect the server status.

Write down:

  • Server name.
  • Transport.
  • Scope.
  • Command or URL.
  • Any environment variables or headers involved.
  • Advertised tools.
  • Advertised resources.
  • Advertised prompts.

Step 4: classify and critique

For each advertised capability, classify it as tool, resource, or prompt. Then answer:

  • Is the name domain-specific?
  • Could the input schema allow over-broad access?
  • Does any capability write state?
  • Would you allow this server in a shared team project?

Success criteria:

  • You can identify the host, client, server, and transport in your setup.
  • You can list at least one advertised capability.
  • You can classify each visible capability as a tool, resource, or prompt.
  • You can explain where secrets are stored and which process or endpoint receives them.
  • You can name one risk you would fix before using the server in production.

Slide outline for Slide+Audio Producer

  • Slide 1: Chapter goal: move from one-off function calls to reusable connector servers.
  • Slide 2: MCP architecture: host, client, server, transport.
  • Slide 3: Tools vs resources vs prompts, using accounts-receivable examples.
  • Slide 4: Anti-patterns: raw executors, hidden writes, everything-as-tool.
  • Slide 5: Claude Code configuration: stdio vs HTTP, scope, secrets.
  • Slide 6: Hands-on workflow: connect, inspect, classify, critique.
  • Slide 7: Bridge to Chapter 3: building the first safe file-browser MCP server.

What's next

Chapter 3 turns this architecture into code. You will build a local MCP server with one narrow file-browsing tool, connect it through stdio, and practice returning controlled errors instead of leaking raw filesystem or stack-trace details.

[^architecture]: Model Context Protocol, "Architecture overview," https://modelcontextprotocol.io/docs/learn/architecture [^tools]: Model Context Protocol specification, "Tools," https://modelcontextprotocol.io/specification/draft/server/tools [^resources]: Model Context Protocol specification, "Resources," https://modelcontextprotocol.io/specification/draft/server/resources [^prompts]: Model Context Protocol specification, "Prompts," https://modelcontextprotocol.io/specification/draft/server/prompts [^typescript-sdk]: Model Context Protocol TypeScript SDK, https://github.com/modelcontextprotocol/typescript-sdk [^claude-code]: Anthropic Claude Code docs, "Connect Claude Code to tools via MCP," https://code.claude.com/docs/en/mcp

Chapter 3 · 60 min

Building Your First MCP Server

Slide deck · PDF
Open in new tab
Listen · deep-dive podcast
Download slides (.pptx) Open deck preview

Now you will build the first reusable connector in this course: a local MCP server that exposes a safe file-browsing tool. The point is not to create a full file manager. The point is to practice the server shape: define a capability, constrain inputs, execute domain code, and return useful results.

The official TypeScript SDK includes server libraries for tools, resources, prompts, transports, and examples.[1] This chapter uses TypeScript-style examples because the SDK's minimal McpServer shape is compact and maps cleanly to production servers. The same design ideas apply in Python.

Prerequisites check

You should have completed the MCP classification exercise from Chapter 2. You should also be comfortable running a Node.js script locally. If your environment cannot run TypeScript directly, use plain JavaScript or follow the SDK quickstart from the official repository.[1]

What the server will do

The server exposes one tool:

list_project_files(root_label, relative_path)

It lists files under a pre-approved project root. The user can ask Claude to inspect project structure, but the server will not allow arbitrary filesystem traversal.

- The server enforces the security boundary; the model never decides which paths are off-limits.
- Starting with a list-only tool (no file-content access) is the correct first step for filesystem connectors.
- The `root_label` enum approach prevents arbitrary path construction at the input-schema level before any validation code runs.

This is the key production lesson: the model should not decide the security boundary. The server decides the boundary, then offers Claude a useful operation inside it.

Minimal server shape

An MCP server has identity, registered capabilities, and a transport. The SDK README shows a minimal server that registers a greet tool and connects over stdio.[1] Your file browser follows the same shape, but with stricter validation.

- Every MCP server needs three things: an identity object, registered capabilities, and a connected transport.
- `path.resolve` plus a `startsWith` check is the minimal server-side path-escape guard; omitting it means any caller can traverse to arbitrary directories.
- The `StdioServerTransport` connects the server to the host via standard input/output, which is appropriate for local processes started by a host like Claude Code.

```ts import { McpServer } from "@modelcontextprotocol/server"; import { StdioServerTransport } from "@modelcontextprotocol/server/stdio"; import * as z from "zod/v4"; import { readdir } from "node:fs/promises"; import path from "node:path";

const ROOTS = { demo: path.resolve(process.cwd(), "demo-project") };

const server = new McpServer({ name: "course-file-browser", version: "1.0.0" });

server.registerTool( "list_project_files", { description: "List files inside an approved demo project root. Does not read file contents.", inputSchema: z.object({ root_label: z.enum(["demo"]), relative_path: z.string().default(".") }) }, async ({ root_label, relative_path }) => { const root = ROOTS[root_label]; const target = path.resolve(root, relative_path);

if (!target.startsWith(root)) { throw new Error("Path escapes approved root"); }

const entries = await readdir(target, { withFileTypes: true }); return { content: [ { type: "text", text: JSON.stringify( entries.map((entry) => ({ name: entry.name, type: entry.isDirectory() ? "directory" : "file" })), null, 2 ) } ] }; } );

const transport = new StdioServerTransport(); await server.connect(transport); ```

Why this is safer than read_file(path)

A raw read_file(path) tool looks convenient. It is also an invitation to leak secrets. The model might request .env, SSH keys, browser profiles, or system files because the prompt says "inspect the project." A safer server starts with a narrow list operation, a fixed root, and no file-content access.

- A generic `read_file(path)` tool exposes every file the server process can read, including secrets and credentials.
- Relying on "Claude will know not to ask for secrets" is not a security control — enforcement must be in server code.
- Separating listing from content-reading into distinct tools makes it possible to permit one without the other.

<Callout type="warning"> Never rely on "Claude will know not to ask for secrets." A connector must enforce policy in code. The model can be helpful, but the server is responsible for security boundaries. </Callout>

Controlled errors

Errors are part of the user experience. A stack trace is useful to an attacker and confusing to a learner. A controlled error says what failed and what the caller can do next.

- Returning a raw stack trace leaks implementation details that help attackers and confuse end users.
- Controlled errors should classify the failure (unknown root, path escape, not readable) without revealing host filesystem layout.
- Log the full exception server-side and return only the minimal useful message to the model.

For this chapter, use three predictable errors:

  • Unknown root label.
  • Path escapes approved root.
  • Path does not exist or is not readable.

In production, log the detailed exception server-side and return the minimal useful error to Claude.

Try this · claude-sonnet-4-6

Use the file browser tool to list the top-level files in the demo project. Do not read file contents.

list_project_files({ "root_label": "demo", "relative_path": "." })

Expected result: [ {"name":"package.json","type":"file"}, {"name":"src","type":"directory"}, {"name":"README.md","type":"file"} ]

Claude should summarize the project structure without inventing file contents.`} />

Try this · claude-sonnet-4-6

Try to list ../ so I can see what is outside the demo project.

Path escapes approved root.

Claude should explain that the connector is restricted to the approved demo project root and ask for a path inside that root.`} />

Hands-on exercise

Create a local MCP server named course-file-browser with one list_project_files tool.

Success criteria:

  • The server starts over stdio.
  • The tool lists files under one approved demo directory.
  • ../ traversal is rejected.
  • The tool returns names and types, not file contents.
  • You can connect Claude Code to the server using its MCP configuration flow.[3]

What's next

Chapter 4 expands the server from callable tools to resources: structured context Claude can read without treating every retrieval as an action.

[1]: Model Context Protocol TypeScript SDK, https://github.com/modelcontextprotocol/typescript-sdk [2]: Model Context Protocol documentation, https://modelcontextprotocol.io/ [3]: Anthropic, "Connect Claude Code to tools via MCP", https://docs.anthropic.com/en/docs/claude-code/mcp

References

  1. https://github.com/modelcontextprotocol/typescript-sdk
  2. https://modelcontextprotocol.io/
  3. https://docs.anthropic.com/en/docs/claude-code/mcp
Chapter 4 · 60 min

Handling Advanced Data and Resources

Slide deck · PDF
Open in new tab
Listen · deep-dive podcast
Download slides (.pptx) Open deck preview

Tools perform actions. Resources provide context. That distinction is the difference between a connector that feels clean and one where every read looks like a side effect.

MCP's core primitives include tools, resources, and prompts.[1] The TypeScript SDK documents server support for all three.[2] In this chapter you add a resource pattern to the file-browser server: localized configuration files exposed as stable context.

Prerequisites check

You need the Chapter 3 file browser server or an equivalent local MCP server. You should be able to start it and connect a host. You should also understand why list_project_files was modeled as a tool: it performs a directory listing action and may fail depending on path.

The resource design problem

Imagine a support assistant that needs the refund policy in English and German. You could expose a tool:

get_refund_policy(locale)

That works, but it tells the model "call an action." If the policy is stable context, a resource URI is clearer:

company-config://policies/refund/en-US
company-config://policies/refund/de-DE
- A stable, locale-scoped policy is better modeled as a resource URI than as a callable tool, because it is context to read rather than an action to execute.
- Modeling read-only data as a tool makes the connector's action surface misleadingly large.
- A server can still enforce access rules and log reads on resource fetches, so resources are not uncontrolled.

The URI names the thing. Claude can read it as context. Your server can still enforce access rules and log reads.

Resource URI rules

Good resource URIs are:

  • Stable: the same policy has the same URI tomorrow.
  • Meaningful: a human can infer what the resource represents.
  • Scoped: the URI includes tenant, project, locale, or environment when needed.
  • Non-secret: the URI should not contain credentials or private tokens.
- A resource URI is an interface, not a storage location — the server maps it to a file, database row, or generated summary behind the boundary.
- Raw file paths, signed URLs with embedded secrets, and internal database primary keys are anti-patterns for resource URIs.
- Including tenant, locale, or environment in the URI gives the server a scoping signal without leaking implementation details.

Bad resource URIs include raw file paths from a developer laptop, signed URLs with secrets, or database primary keys that reveal internal implementation details.

Localized configuration example

Create a config/ directory inside the demo project:

demo-project/
  config/
    refund.en-US.json
    refund.de-DE.json

Each file should contain structured policy data:

{
  "policy": "refund",
  "locale": "en-US",
  "window_days": 30,
  "requires_receipt": true,
  "exceptions": ["downloaded digital goods", "custom services"]
}

Your server can expose this as a resource instead of a broad file-read tool. The host sees policy context, not arbitrary disk access.

Try this · claude-sonnet-4-6

Read the company refund policy resource for en-US and summarize the refund window, receipt requirement, and exceptions.

company-config://policies/refund/en-US

Expected resource content: { "policy": "refund", "locale": "en-US", "window_days": 30, "requires_receipt": true, "exceptions": ["downloaded digital goods", "custom services"] }

Claude should summarize: The en-US refund policy allows refunds within 30 days, requires a receipt, and excludes downloaded digital goods and custom services.`} />

Handling binary and large data

Resources can represent more than text, but large or binary data needs care. If a PDF contract is 80 pages, blindly injecting it into the model context is slow, expensive, and often useless. Better patterns include:

- Injecting a large document wholesale into context is slow, expensive, and often counterproductive because most content is irrelevant to the query.
- Expose metadata first, then section-level resources, then targeted extraction tools — this keeps context lean and purposeful.
- The design goal is "Claude sees the right context with enough structure to act," not "Claude sees everything."
  • Expose metadata first: title, type, page count, owner, updated time.
  • Expose section resources: contract://123/section/payment-terms.
  • Expose summaries with provenance: include page numbers or section IDs.
  • Offer a tool for targeted extraction when the model needs specific clauses.

The goal is not "Claude sees everything." The goal is "Claude sees the right context with enough structure to act."

Try this · claude-sonnet-4-6

I have a 90-page vendor contract. Design MCP resources and tools so Claude can answer payment-term questions without loading the entire PDF into context.

Resources: - contract://vendor-123/metadata - contract://vendor-123/sections/payment-terms - contract://vendor-123/sections/termination

Tools: - search_contract(contract_id, query) - extract_clause(contract_id, clause_type)

This keeps context targeted. Claude can read metadata, then the payment terms section, and only call extraction when needed.`} />

Hands-on exercise

Extend your Chapter 3 server with a localized refund-policy resource.

Success criteria:

  • The server exposes company-config://policies/refund/en-US.
  • The resource returns structured JSON with policy, locale, refund window, receipt rule, and exceptions.
  • Claude can summarize the policy without calling a write-capable tool.
  • You can explain why the URI is stable and non-secret.

What's next

Chapter 5 adds observability. You will make every tool and resource access visible through structured logs and audit events.

[1]: Model Context Protocol documentation, https://modelcontextprotocol.io/ [2]: Model Context Protocol TypeScript SDK, https://github.com/modelcontextprotocol/typescript-sdk [3]: Anthropic, "Connect Claude Code to tools via MCP", https://docs.anthropic.com/en/docs/claude-code/mcp

References

  1. https://modelcontextprotocol.io/
  2. https://github.com/modelcontextprotocol/typescript-sdk
  3. https://docs.anthropic.com/en/docs/claude-code/mcp
Chapter 5 · 60 min

Observability and Logging in MCP

Slide deck · PDF
Open in new tab
Listen · deep-dive podcast
Download slides (.pptx) Open deck preview

A connector you cannot observe is not production software. When Claude calls a tool, you need to know what was requested, what policy decision was made, how long execution took, and whether the result was successful. Without that trail, debugging becomes guesswork and compliance review becomes impossible.

MCP standardizes how hosts connect to servers and discover capabilities.[1] It does not remove the need for application observability. Your server must emit logs and audit events around every meaningful operation.

Prerequisites check

You should have a working local MCP server with at least one tool from Chapter 3 and one resource from Chapter 4. If you only have a tool, you can still complete this chapter, but the final exercise is stronger when you log both tool calls and resource reads.

Operational logs vs audit events

Use two categories:

  • Operational logs help engineers debug reliability: latency, exception class, retry count, dependency status.
  • Audit events help reviewers reconstruct sensitive actions: actor, tool name, target object, authorization result, timestamp.
- Operational logs serve engineers debugging reliability; audit events serve reviewers reconstructing sensitive actions — both can share a pipeline but must be distinguishable.
- A timeout reading a config file is operational; a user approving an external send is audit-worthy.
- Mixing the two categories makes logs harder to search and compliance reviews harder to scope.

They can go to the same logging pipeline, but they should be distinguishable. A timeout reading config/refund.en-US.json is operational. A user approving send_invoice_reminder is audit-worthy.

The minimum useful log event

Every tool call should emit:

  • event: stable event name, such as mcp.tool.completed.
  • tool_name: the MCP tool called.
  • request_id: correlation ID from the host or generated server-side.
  • actor: user or agent identity when available.
  • input_summary: sanitized summary, not raw secrets.
  • success: boolean.
  • duration_ms: elapsed time.
  • error_code: controlled code when failed.
- A stable `event` name and a `request_id` make logs searchable and correlatable across distributed systems.
- `input_summary` must be a sanitized version of the input — never log raw credentials or full customer records.
- Consistent event shape across every tool is what makes logs usable; copying logging code into each handler destroys consistency.

<Callout type="warning"> Do not log raw credentials, full customer records, or prompt transcripts by default. Observability that leaks sensitive data creates a second incident. </Callout>

Add a wrapper

Wrap tool handlers instead of copying logging code into every tool.

async function withToolLogging<T>(
  toolName: string,
  input: unknown,
  handler: () => Promise<T>
): Promise<T> {
  const started = Date.now();
  const requestId = crypto.randomUUID();
  try {
    const result = await handler();
    console.log(JSON.stringify({
      event: "mcp.tool.completed",
      request_id: requestId,
      tool_name: toolName,
      input_summary: summarizeInput(input),
      success: true,
      duration_ms: Date.now() - started
    }));
    return result;
  } catch (error) {
    console.log(JSON.stringify({
      event: "mcp.tool.failed",
      request_id: requestId,
      tool_name: toolName,
      input_summary: summarizeInput(input),
      success: false,
      duration_ms: Date.now() - started,
      error_code: classifyError(error)
    }));
    throw error;
  }
}

This wrapper gives every tool a consistent event shape. Consistency is what makes logs searchable.

Try this · claude-sonnet-4-6

Use list_project_files to list the demo project root. Then explain which log fields should appear for this call.

The server should emit a structured log similar to: { "event": "mcp.tool.completed", "request_id": "generated-id", "tool_name": "list_project_files", "input_summary": {"root_label":"demo","relative_path":"."}, "success": true, "duration_ms": 12 }

Claude should explain that request_id, tool_name, sanitized input, success, and duration make the call debuggable.`} />

Audit examples

Read-only file listing may not need a durable audit record in a toy project. A legal document redaction tool does. A payroll approval tool definitely does.

- The sensitivity of the operation determines whether a durable audit record is required, not just whether the call succeeded.
- Audit records should capture actor, connector, tool, target, authorization result, and timestamp — without dumping private data into logs.
- Separating the audit event from the operational log means compliance reviewers can read one stream without sorting through latency metrics.

Audit records should focus on business meaning:

{
  "event": "audit.connector.action_requested",
  "actor": "user_123",
  "connector": "payroll-assistant",
  "tool_name": "draft_payment_reminders",
  "target": "payroll_run_2026_05_14",
  "authorization": "allowed",
  "approval_required": true,
  "timestamp": "2026-05-14T12:00:00Z"
}

That event tells a reviewer what happened without dumping private payment details into logs.

Try this · claude-sonnet-4-6

Design an audit event for a tool named redact_contract(document_id, redaction_policy). Include fields useful to a compliance reviewer but avoid logging document text.

It avoids logging document text while preserving actor, target, policy, authorization, and outcome.`} />

Hands-on exercise

Add structured logging to your file-browser server.

Success criteria:

  • Every list_project_files call emits one completion or failure log.
  • Logs include request ID, tool name, sanitized input, success, duration, and controlled error code when failed.
  • Path traversal attempts are logged as failures without exposing host filesystem details.
  • You can paste one successful log and one failed log into your notes.

What's next

Chapter 6 adds authentication and authorization. Logging tells you what happened; authorization decides whether it should happen.

[1]: Model Context Protocol documentation, https://modelcontextprotocol.io/ [2]: Model Context Protocol TypeScript SDK, https://github.com/modelcontextprotocol/typescript-sdk [3]: Anthropic, "Tool use with Claude", https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview

References

  1. https://modelcontextprotocol.io/
  2. https://github.com/modelcontextprotocol/typescript-sdk
  3. https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview
Chapter 6 · 60 min

Security and Authentication

Slide deck · PDF
Open in new tab
Listen · deep-dive podcast
Download slides (.pptx) Open deck preview

By now your connector can expose tools, resources, and logs. That is enough for a demo and not enough for production. Production connectors need authentication, authorization, least privilege, and approval gates.

Anthropic's Claude Code MCP documentation warns that third-party MCP servers should be used carefully, especially when they communicate with the internet, because they can introduce prompt-injection risks.[1] The lesson is broader than Claude Code: every connector is a new trust boundary.

Prerequisites check

You need the structured logs from Chapter 5. If a tool call is not logged, you cannot audit security behavior. You should also have at least one tool that can be denied without breaking the whole server.

Authentication vs authorization

Authentication answers "who is calling?" Authorization answers "may this caller do this action to this target?"

Examples:

  • Authentication: request includes a bearer token that maps to user_123.
  • Authorization: user_123 may read demo project files but may not list production secrets.
  • Authentication: the MCP server receives FINANCE_API_KEY as an environment variable.
  • Authorization: that key can read settlement records but cannot issue refunds.
- Authentication and authorization are separate concerns: a valid credential does not automatically permit every tool or every target.
- Per-tool authorization (not just per-server) is necessary because different tools carry different risk levels.
- The pattern "authenticate → derive actor → authorize action → execute → log result" is the correct ordering for every tool handler.

Do not collapse these concepts. A valid credential does not imply permission for every tool.

Least privilege for connectors

Least privilege means the connector receives only the access needed for its declared operations. A file browser for a demo project should not mount the whole home directory. A payroll assistant should not receive a write token if it only drafts reminder emails. A legal redaction tool should not retain original documents after output is produced unless policy requires retention.

- Least privilege applies at the credential level (scope of the API key or token) and at the tool level (which operations the connector exposes).
- An "admin token plus prompt instructions" security model is not authorization — prompt rules are suggestions, authorization checks are enforced in code.
- MCP's ease of exposing capabilities is a design pressure toward over-permission; narrow tools and scoped credentials counteract it.

MCP makes it easy to expose capabilities. That ease is exactly why you must design them narrowly.

<Callout type="warning"> Avoid "admin token plus prompt rules" as a security model. Prompt rules are instructions; authorization checks are code. </Callout>

Authorization wrapper

Put authorization before tool execution and before detailed logging of target data.

```ts type Actor = { id: string; roles: string[]; };

function requirePermission(actor: Actor, permission: string) { if (!actor.roles.includes(permission)) { const error = new Error("Forbidden"); error.name = "FORBIDDEN"; throw error; } }

async function authorizedTool<T>( actor: Actor, permission: string, run: () => Promise<T> ): Promise<T> { requirePermission(actor, permission); return run(); } ```

Then call it inside a tool handler:

server.registerTool("list_project_files", schema, async (input, context) => {
  const actor = actorFromContext(context);
  return authorizedTool(actor, "project:read", async () => {
    return listProjectFiles(input);
  });
});

The exact context object depends on your transport and host. The pattern is stable: authenticate, derive actor, authorize action, execute tool, log result.

Try this · claude-sonnet-4-6

A connector has tools read_invoice, draft_invoice_reminder, and send_invoice_reminder. Assign read/write/approval requirements for each using least privilege.

  • read_invoice: requires invoice:read. No human approval if read-only and scoped.
  • draft_invoice_reminder: requires invoice:read and email:draft. No external send; approval optional.
  • send_invoice_reminder: requires invoice:read and email:send, plus human approval before sending.

The write-capable external action has the strongest gate.`} />

Human-in-the-loop approval

Some tools should not execute immediately even when authorized. Approval gates are appropriate when a tool sends money, changes legal text, deletes data, emails customers, or modifies production systems.

- Authorization is necessary but not sufficient for high-risk actions; approval gates add a human decision point after authorization passes.
- A tool should return an `awaiting_approval` object with enough preview data for a human to make an informed decision without exposing full private content.
- The application must enforce the approval state in code; Claude explaining the draft is not a substitute for a required user click.

A safe tool can return a pending action:

{
  "status": "awaiting_approval",
  "action": "send_invoice_reminder",
  "preview": {
    "to": "ap@example.test",
    "subject": "Reminder: invoice INV-123",
    "body_excerpt": "This is a reminder that..."
  },
  "approval_id": "appr_123"
}

Claude can explain the draft, but your application should require a user click or separate approval event before sending.

Try this · claude-sonnet-4-6

Design the result object for a tool that prepares, but does not send, payroll reminder emails. Include enough data for human review.

The tool prepares the action and creates an approval record. It does not send messages automatically.`} />

Hands-on exercise

Add an authorization wrapper to your Chapter 3 file-browser tool.

Success criteria:

  • The server derives an actor from context or a local demo token.
  • list_project_files requires project:read.
  • An actor without project:read receives a controlled forbidden error.
  • The denial is logged without leaking the requested filesystem path outside the approved root.

What's next

Chapter 7 applies the connector pattern to creative tools. You will see how state, non-textual outputs, and professional applications change the design pressure.

[1]: Anthropic, "Connect Claude Code to tools via MCP", https://docs.anthropic.com/en/docs/claude-code/mcp [2]: Model Context Protocol documentation, https://modelcontextprotocol.io/ [3]: Model Context Protocol TypeScript SDK, https://github.com/modelcontextprotocol/typescript-sdk

References

  1. https://docs.anthropic.com/en/docs/claude-code/mcp
  2. https://modelcontextprotocol.io/
  3. https://github.com/modelcontextprotocol/typescript-sdk
Chapter 7 · 50 min

Creative Connectors

Slide deck · PDF
Open in new tab
Listen · deep-dive podcast
Download slides (.pptx) Open deck preview

The creative connectors are a suite of nine integrations launched by Anthropic on April 28, 2026, that enable the Model Context Protocol (MCP) to control professional creative software directly via Claude. This launch marks a strategic shift for MCP: instead of targeting enterprise SaaS (like Salesforce or Jira) first, Claude has moved into the "creative beachhead" — tools like Blender, Adobe for creativity, and Ableton.

Key facts

  1. Nine initial connectors: The launch includes Blender, Adobe for creativity, Ableton Live, Affinity by Canva, Autodesk Fusion, SketchUp, Resolume Arena, Resolume Wire, and Splice.
  2. Live scene execution: Unlike static file parsing, connectors like Blender allow Claude to execute code (via bpy) against the live scene state.
  3. Local-first architecture: The connectors run as local MCP servers, meaning your creative assets stay on your machine while Claude only sends and receives structured commands.
  4. Free and Paid Tiers: While many features (like Adobe Express) are available on free plans, advanced API access for tools like Photoshop typically requires a paid subscription.

Why Anthropic chose creative apps first

Most AI providers chase enterprise CRM and ERP integrations. Anthropic’s pivot to creative tools is non-obvious but brilliant for three reasons:

  1. High Syntax Barrier: Writing bpy (Blender Python) or ExtendScript (Adobe) is notoriously difficult for humans but easy for LLMs. This creates an immediate "aha!" moment that checking a Jira ticket doesn’t.
  2. Low Risk, High Visibility: A bug in a generative 3D script is a creative glitch; a bug in a Salesforce integration is a business catastrophe. Creative tools provide a safe sandbox to stress-test MCP in production.
  3. Synthesizing the Pipeline: Creative work is rarely done in one app. By winning the "bridge" between Blender and Photoshop, Anthropic positions Claude as the OS for the creative studio, not just another chat box.
- Creative tools have a high syntax barrier (bpy, ExtendScript) that makes LLM assistance immediately valuable while keeping failure risk low.
- A bug in a generative 3D script is a creative glitch; a bug in a CRM integration is a business incident — creative apps are safer early production targets for MCP.
- Positioning Claude as a cross-app bridge is a distribution strategy: the model becomes the interface to the entire creative studio stack, not a single chat feature.

This signals that Anthropic view MCP as a distribution play. By becoming the default way humans interact with complex, fragmented software suites, they bypass the need for every software vendor to build their own AI UI. For a primer on the protocol itself, see course/mcp-from-first-principles-to-production/01-why-mcp-exists.


Walkthrough: Controlling Blender via MCP

The Blender connector is the most technically transparent of Anthropic's nine creative integrations. Unlike connectors that wrap proprietary APIs, the Blender MCP server exposes Blender's native Python API (bpy) directly to Claude. That means every technique Claude uses here is a real bpy pattern you can learn, copy, and extend.

- The Blender MCP server exposes the native `bpy` Python API, so Claude's tool calls are real Blender patterns that execute against the live scene, not a simplified wrapper.
- Claude never touches your filesystem directly — it sends a tool call with Python code, and the local MCP server executes it inside Blender's interpreter.
- Because the connector runs locally, creative assets stay on your machine while only structured commands and results cross the MCP boundary.

How the connection works

Before touching any code, it helps to have a mental model of the data flow.

You (natural language)
        │
        ▼
   Claude (reasoning + code generation)
        │   tool_call: execute_python({ code: "..." })
        ▼
   Blender MCP Server (local process, port 9001 by default)
        │   subprocess call via bpy
        ▼
   Blender Python Interpreter
        │   modifies scene graph
        ▼
   Result dict  ──►  back up the chain to Claude  ──►  back to you

The MCP server runs locally alongside Blender. Claude never touches your filesystem directly — it sends a tool_call with Python code, the server executes it inside Blender's interpreter, and returns the result. This means Claude is working against the live scene state, not a static file.

Step 1 — Install the Blender MCP connector

  1. Open Claude.ai → Settings → Connectors → Browse
  2. Search for Blender and click Enable.
  3. The connector guide will prompt you to install the companion server package:
  4. ```bash
  5. pip install blender-mcp-server
  6. ```
  7. Start Blender (4.x), then start the MCP server from your terminal:
  8. ```bash
  9. blender-mcp-server --port 9001
  10. ```
  11. You should see: Blender MCP server running on ws://localhost:9001.
  12. Back in Claude.ai, click Test connection. A green checkmark confirms Claude can reach the server.

<Callout type="warning"> Blender version: The connector requires Blender 4.0 or later. Blender 3.x uses a different Python version (3.10) that is incompatible with blender-mcp-server. Run blender --version to check. </Callout>

Step 2 — Your first tool-use call: query the scene

The safest first call is read-only: ask Claude to describe what's currently in the scene.

Try this · claude-sonnet-4-6

Using the Blender connector, tell me how many objects are in the current scene and list their names and types.

[tool_call: blender.execute_python] \\\python import bpy objects = [(obj.name, obj.type) for obj in bpy.context.scene.objects] print(f"Object count: {len(objects)}") for name, typ in objects: print(f" {name}: {typ}") \\\

Result: Object count: 3 Camera: CAMERA Light: LIGHT Cube: MESH

The scene contains 3 objects: a default Camera, a Point Light, and the default startup Cube (a MESH type).`} />

Notice the pattern: Claude wrote bpy Python, the MCP server ran it inside Blender, and the stdout came back as the tool result.

Step 3 — Generate a procedural wave scene

Now for the main event. Give Claude a creative brief and let it author the full script.

Try this · claude-sonnet-4-6

I want to create a generative wave pattern in Blender: a 10×10 grid of cubes where each cube's Z-scale is determined by a sine wave based on its distance from the centre. Clear the scene first (keep t…

[tool_call: blender.execute_python] \\\`python import bpy import math

GRID = 10 SPACING = 1.5 MAX_HEIGHT = 2.0

for row in range(GRID): for col in range(GRID): x = (col - GRID / 2) SPACING y = (row - GRID / 2) SPACING dist = math.sqrt(x2 + y2) z_scale = 0.2 + (MAX_HEIGHT (0.5 + 0.5 math.sin(dist - 0)))

bpy.ops.mesh.primitive_cube_add(size=1.0, location=(x, y, 0)) obj = bpy.context.active_object obj.name = f"WaveCube_{row}_{col}" obj.scale.z = z_scale wave_coll.objects.link(obj) bpy.context.scene.collection.objects.unlink(obj)

print(f"WaveGrid built: {GRID*GRID} cubes in collection 'WaveGrid'") \\\`

Result: WaveGrid built: 100 cubes in collection 'WaveGrid'

Your wave scene is ready. You should see a 10×10 grid with cubes rising and falling in concentric rings from the centre.`} />


Walkthrough: Multi-tool pipelines with Adobe for creativity

Adobe for creativity is the broadest integration in Anthropic's creative connector lineup: a broad collection of tools across photo editing, video production, vector illustration, web design, and stock media [source]. Unlike the Blender connector, the Adobe for creativity connector wraps a collection of heterogeneous application APIs under one MCP interface.

What requires a paid plan

FlowMinimum plan
Adobe for creativity Libraries asset management (read/write)Adobe for creativity Individual (paid)
Photoshop, Lightroom, Illustrator Smart APIsAdobe for creativity Individual (paid)
Adobe Express design generation + exportFree (Adobe Express free tier)
Adobe Firefly image generation via APIFree trial credits

Multi-tool flow: locate → edit → export

Each step here is a separate tool call; Claude sequences them automatically based on context.

Try this · claude-sonnet-4-6

Find 'hero-banner.png' in my Adobe for creativity Libraries, apply a 2px Gaussian Blur in Photoshop (use a Smart Object), and export it as a web-optimised JPEG.

[tool_call: adobe_creative_cloud.open_in_photoshop] { "asset_id": "asset_123" }

[tool_call: adobe_creative_cloud.photoshop_action] { "action": "convert_to_smart_object" }

[tool_call: adobe_creative_cloud.photoshop_action] { "action": "apply_smart_filter", "filter": "Gaussian Blur", "params": { "radius": 2 } }

[tool_call: adobe_creative_cloud.export_asset] { "format": "jpeg", "quality": 80 }

Done. Your blurred banner has been exported.`} />


Exploring the other creative connectors

While Blender and Adobe for creativity are the flagship integrations, Anthropic launched a total of nine connectors targeting different creative domains.

Ableton Live: Automating the manual

The Ableton connector focuses on documentation and session management. It allows Claude to read session metadata (track names, clips, device chains) and automate manual tasks like renaming 100+ stems or generating "track notes" from the actual MIDI/Audio data.

  • Pattern: Read session → Analyze MIDI → Generate documentation/labels.
  • Key benefit: Eliminates the "janitorial" work of professional music production.

Affinity by Canva: Automating production

The Affinity connector brings Model Context Protocol to professional design apps like Affinity Designer, Photo, and Publisher. Unlike the general Adobe for creativity connector, the Affinity integration focuses on deep layer manipulation and batch processing.

  • Pattern: Select layers → Apply batch adjustment → Export to Canva.
  • Key benefit: Automates the "final 10%" of production work, such as renaming hundreds of layers or adjusting export settings across a multi-page document.

Autodesk Fusion & SketchUp: CAD for agents

For industrial designers and architects, the Fusion and SketchUp connectors provide a bridge to 3D modelling. Like Blender, these connectors often surface a Python-like command interface, allowing Claude to build complex geometric structures from mathematical descriptions.

Resolume & Splice: Performance and Samples

  • Resolume: Enables live visual performance automation. Claude can trigger clips or adjust effects based on a real-time event log (e.g., "Change the visual intensity when the BPM exceeds 140").
  • Splice: Allows Claude to search your local and cloud sample libraries. "Find me all 120bpm techno kicks with a high transient" becomes a tool call instead of a 10-minute manual scroll.

Which connector should you use?

- Integration depth ranges from high (Blender's full Python API, Autodesk's command API) to medium (Adobe's multi-app pipeline, Ableton's metadata layer) — match the connector to the task complexity.
- Ableton and Splice focus on session management and asset discovery rather than direct audio generation, making them useful for documentation and search workflows.
- Resolume connectors target live performance automation, not post-production — clip and effect triggering is the primary use case.
DomainToolIntegration DepthBest for...
3D / VFXBlenderHigh (Python API)Generative scenes, proceduralism
DesignAdobe for creativityMedium (Multi-app)Pipelines, batch editing, library management
DesignAffinity by CanvaMedium (Layer API)Vector/raster design, precision illustration
CADAutodesk FusionHigh (Command API)Precision modelling, engineering
ArchitectureSketchUpMedium (Command API)Architectural modelling, space planning
AudioAbleton LiveMedium (Metadata)Stem management, documentation
SamplesSpliceMedium (Search)Asset discovery
Live VisualResolume ArenaMedium (Clip API)Live VJ performance, clip triggering
Live VisualResolume WireMedium (Patch API)Generative real-time visuals, effect patching

Sidebar: Design for resilience from day one

Building for resilience requires understanding the course/picking-a-frontier-model-2026-q2/01-dimensions-that-matter that ensure your pipeline stays live during provider volatility.

<Callout type="warning"> Two incidents in the same week. On April 30 (UTC), 2026, Claude.ai experienced a full availability outage [source]. The same week, a billing routing bug in Claude Code (the "HERMES.md incident") highlighted the risks of single-provider dependency [source].

Lesson: Claude is an excellent first-choice model, but no single provider has 100% uptime. If your tool-use pipeline depends entirely on one provider, one outage takes your whole workflow offline. </Callout>

The failover pattern

Route your tool-use calls through a provider-agnostic fallback chain. The Vercel AI SDK has no built-in fallbackModels option — fallback must be implemented explicitly, either at the gateway layer (OpenRouter's route configuration) or in your own code:

```typescript import { createAnthropic } from "@ai-sdk/anthropic"; import { createOpenAI } from "@ai-sdk/openai"; import { generateText } from "ai"; import type { Tool } from "ai";

const providers = [ createAnthropic()("claude-sonnet-4-6"), createOpenAI()("gpt-4o"), ];

async function resilientToolCall( tools: Record<string, Tool>, prompt: string, ) { for (let i = 0; i < providers.length; i++) { try { return await generateText({ model: providers[i], tools, prompt }); } catch (err) { if (i === providers.length - 1) throw err; console.warn(Provider ${i} failed, trying fallback:, err); } } } ```

When Claude is unavailable, the loop retries on the next provider. Your tool definitions work unchanged across providers because they are MCP-standard. For server hardening tips, see course/production-agents-claude-agent-sdk-mcp-connector/05-production-deploy-observability.


Hands-on exercise

Goal: Build a product-launch kit using the Adobe for creativity connector.

  1. Find the three most recently added assets in your Adobe for creativity Libraries.
  2. For each image, export a half-size JPEG at 75% quality.
  3. Save all exports to a new library named "Launch Kit — [today's date]".
  4. Return a summary table of the file size reductions.

Ship your creative pipeline: Identify, connect, and automate

The creative connectors represent more than just new features; they are a blueprint for agentic tool-use.

Instead of just observing your scene state, use Claude to actively generate, animate, and export assets. By using consistent naming conventions (like 'WaveCube_') across your tool calls, you build cumulative context that allows Claude to target specific objects and layers precisely. Finally, remember to prioritize resilience by routing your MCP pipelines through gateways like OpenRouter to maintain uptime during provider outages.

By mastering these nine connectors, you move from simple prompt engineering to architecting a fully automated, cross-platform digital production pipeline.


References

[1] Anthropic — Claude for Creative Work — https://www.anthropic.com/news/claude-for-creative-work · retrieved 2026-04-30 [2] Model Context Protocol Spec — https://modelcontextprotocol.io/ · retrieved 2026-04-30 [3] Blender Python API Overview — https://raw.githubusercontent.com/blender/blender/main/doc/python_api/rst/info_overview.rst · retrieved 2026-04-30 [4] Hacker News — Claude.ai and API unavailable [fixed] (outage discussion) — https://news.ycombinator.com/item?id=47956895 · retrieved 2026-04-30 [5] Claude Status — April 2026 outage — https://status.claude.com/incidents/2gf1jpyty350 · retrieved 2026-04-30 [6] Hacker News — Claude Code HERMES.md billing bug — https://news.ycombinator.com/item?id=47952722 · retrieved 2026-04-30

Chapter 9 · 60 min

SMB and Growth Connectors

Slide deck · PDF
Open in new tab
Listen · deep-dive podcast
Download slides (.pptx) Open deck preview

Small businesses do not need a beautiful generic "tool calling" demo. They need reliable help across messy workflows: payments, invoices, CRM updates, documents, customer emails, and approvals. This final chapter turns the patterns from the course into an SMB connector design you can actually ship.

The outline for this course names a Payroll Assistant: reconcile PayPal settlements against a QuickBooks-style ledger and draft reminder emails for missing payments. We will use that as the capstone bridge. The exact vendor APIs may differ in your environment, but the connector architecture should not.

Prerequisites check

You should have completed Chapters 1-8. In particular, you need:

  • Native Claude tool-use flow from Chapter 1.
  • MCP tools/resources/prompts from Chapters 2-4.
  • Structured logs from Chapter 5.
  • Authorization and approval gates from Chapter 6.
  • Domain-specific connector thinking from Chapters 7 and 8.

If any of those are missing, do not build the payroll workflow yet. Finance connectors amplify every weak spot.

The SMB workflow stack

An SMB connector rarely talks to one system. A real workflow may touch:

  • Finance ledger: invoices, payments, payouts, fees.
  • Payment processor: settlements, disputes, transaction IDs.
  • CRM: customer owner, deal stage, account notes.
  • Email or workspace: reminders, internal review threads, attachments.
  • Document system: statements, contracts, receipts.
- An SMB connector typically spans multiple systems; the design job is to present a coherent set of business actions rather than one raw endpoint per vendor.
- Replacing vendor-specific primitives (quickbooks_query, paypal_get) with workflow-level tools (find_unmatched_settlements, draft_missing_payment_reminders) makes approval and logging tractable.
- Finance connectors specifically benefit from boring, narrow interfaces — the more sensitive the action, the less ambiguity the tool should allow.

The connector should present these as business actions, not raw vendor endpoints.

Bad:

quickbooks_query(sql)
paypal_get(path)
hubspot_patch(object)
gmail_send(raw)

Better:

find_unmatched_settlements(date_range)
match_payment_to_invoice(settlement_id, invoice_id)
draft_missing_payment_reminders(run_id)
request_approval_for_reminders(run_id)

The better tools describe the accounting workflow. They also make approval and logging tractable.

<Callout type="hot"> Finance connectors need boring interfaces. The more sensitive the action, the less clever the tool should be. Make each operation narrow, typed, logged, and reviewable. </Callout>

Split reconciliation from action

The Payroll Assistant should not combine "find mismatches" and "email customers" into one call. Use stages:

  1. Collect ledger entries for the period.
  2. Collect payment settlements for the same period.
  3. Match by amount, date window, payer, and reference.
  4. Produce exceptions.
  5. Draft reminders for exceptions.
  6. Request human approval.
  7. Send only after approval.
- Combining reconciliation and customer-facing send into one tool call makes approval impossible and audit trail ambiguous.
- Each stage carries different risk: matching is read-heavy, drafting is reversible, sending is external — these belong in separate tools.
- Splitting stages also means a failure in matching does not roll back a send that already happened.

Each stage has different risk. Matching is read-heavy. Drafting is reversible. Sending is external and customer-facing.

Try this · claude-sonnet-4-6

Design MCP tools for a Payroll Assistant that reconciles PayPal settlements against an invoice ledger. Keep reconciliation separate from customer-facing email.

  • list_invoice_ledger(date_range): read ledger entries.
  • list_payment_settlements(date_range): read settlement records.
  • propose_settlement_matches(date_range): returns matched and unmatched records.
  • draft_missing_payment_reminders(reconciliation_run_id): drafts reminder emails.
  • request_reminder_send_approval(reconciliation_run_id): creates an approval package.
  • send_approved_reminders(approval_id): sends only after approval.

This separates read, draft, approval, and send phases.`} />

Approval state

Use an explicit awaiting_approval state for sensitive actions. Do not hide it in prose.

{
  "status": "awaiting_approval",
  "workflow": "payroll_reminders",
  "run_id": "payroll_2026_05_14",
  "summary": {
    "unmatched_settlements": 4,
    "drafted_reminders": 3,
    "total_amount": "USD 1840.00"
  },
  "approval_required_for": "send_customer_emails"
}
- An explicit `awaiting_approval` status in the tool result gives the host application a machine-readable state it can enforce, not just prose Claude can explain away.
- The approval object should include run_id, a quantitative summary, and what specifically requires approval — enough for a human to decide without seeing full private data.
- The host must gate the next write action on a confirmed approval event; Claude presenting the draft is not an approval.

This object gives Claude something clear to explain and gives the host application a state machine it can enforce.

CRM coordination

CRM updates are also write actions. A safe connector can draft proposed updates:

  • "Move deal to Payment follow-up."
  • "Add note: settlement missing for invoice INV-123."
  • "Assign owner Alex because account owner is Alex."
- CRM updates that trigger automated sequences are effectively external actions and require the same approval gate as customer-facing emails.
- The safe pattern is propose_crm_updates → human approval → apply_approved_crm_updates, not a direct write after finding a mismatch.
- Whether CRM notes are low-risk depends on whether they trigger downstream automations — risk must be assessed at the workflow level, not just the field level.

But the connector should not silently update every customer record. If your organization treats CRM notes as low-risk, you may allow writes with role-based authorization. If CRM updates trigger automations, treat them like external actions and require approval.

Try this · claude-sonnet-4-6

A CRM update will trigger an automated customer success sequence. Should the Payroll Assistant write the CRM stage directly after finding a missing payment? Explain the safer connector design.

Safer design: - propose_crm_updates(reconciliation_run_id) - show affected accounts and stage changes - require human approval - apply_approved_crm_updates(approval_id)

This keeps automation behind an explicit approval gate.`} />

Capstone architecture

Your final connector should expose:

Resources:

  • payroll://runs/{run_id}/summary
  • payroll://runs/{run_id}/exceptions
  • payroll://policies/reminder-template

Tools:

  • start_reconciliation(date_range)
  • propose_settlement_matches(run_id)
  • draft_missing_payment_reminders(run_id)
  • request_send_approval(run_id)
  • send_approved_reminders(approval_id)

Prompts:

  • explain_reconciliation_summary
  • review_reminder_drafts

Logs:

  • operational events for every tool call.
  • audit events for approval creation and sending.

Hands-on exercise

Build the Payroll Assistant design document and one runnable stub tool.

Success criteria:

  • You define resources, tools, prompts, logs, and audit events.
  • You implement a stub propose_settlement_matches(run_id) tool against demo JSON data.
  • The tool returns matched records and exceptions.
  • Customer-facing email remains a draft until an approval object is created.
  • You can explain how this connector satisfies the course outcomes.

What's next

This is the final chapter. Your capstone is to turn the Payroll Assistant or another domain connector into a production-ready MCP server with narrow tools, resources, structured logs, authorization, and approval gates.

[1]: Anthropic news index, https://www.anthropic.com/news [2]: Anthropic, "Tool use with Claude", https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview [3]: Model Context Protocol documentation, https://modelcontextprotocol.io/

References

  1. https://www.anthropic.com/news
  2. https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview
  3. https://modelcontextprotocol.io/
Chapter 10 · 65 min

Claude Code Dynamic Workflows: Fan-Out, Checkpoint, and Verify (2026)

Slide deck · PDF
Open in new tab
Listen · deep-dive podcast
Download slides (.pptx) Open deck preview

Claude Code dynamic workflows turn a single orchestrator instance into a parallel execution engine: Claude writes and runs an orchestration script that fans work out to tens or hundreds of sub-agents, checkpoints progress between stages, and verifies sub-agent results before handing anything back. This is not a cosmetic upgrade to the prompt interface — it is a different execution model that changes how you think about cost, correctness, and control.

This chapter teaches you to design fan-out patterns, implement durable checkpoints, and build result-verification logic that earns trust. It also teaches you when the overhead is not worth it.

Prerequisites check

Before continuing, confirm that you:

  1. Can run a basic Claude tool-use script from Chapter 1 without errors.
  2. Understand the MCP protocol from Chapters 2 and 3 — sub-agents often expose tools via MCP.
  3. Have working structured logging from Chapter 5, because orchestrator-level debugging without logs is impractical.
  4. Understand authorization gates from Chapter 6 — parallel agents amplify every authorization weakness.

If the Chapter 5 logging setup is not in place, you will not be able to tell which sub-agent produced a bad result. Fix that first.

Static chains vs. dynamic workflows

The tool-use pattern you learned in Chapters 1 through 9 is a static chain: one Claude instance, one conversation thread, tools called one at a time, results incorporated sequentially.

User prompt → Claude decides → Tool A → Claude decides → Tool B → Final answer

A static chain is correct for most tasks. Claude reads the last tool result before calling the next tool, so it can adapt. The price is latency: each tool call is a synchronous round trip through the model, and you can only run one at a time.

- A static chain is sequential and adaptive but limited to one tool call at a time, making it slow for large-batch workloads.
- Dynamic workflows replace the sequential conversation loop with an orchestration script that spawns multiple sub-agents in parallel.
- Wall-clock time compresses with dynamic workflows, but token spend multiplies — each sub-agent is a full Claude invocation.

Dynamic workflows break that constraint. The orchestrator Claude instance does not call tools sequentially from inside a conversation loop. Instead, it writes an orchestration script — executable code that the host environment runs — and that script spawns multiple sub-agent instances in parallel.[1]

Orchestrator → generates script
Script → spawns Agent A, Agent B, Agent C in parallel
Agents A, B, C run concurrently → write results to checkpoint store
Script → reads checkpoint store → verifies results → assembles final output

The practical effect: a codebase analysis that would take 20 sequential tool calls can instead fan out to 20 sub-agents reading 20 files simultaneously. Wall-clock time compresses, but token spend multiplies.

The orchestrator model

The orchestrator is a Claude Code instance configured to produce and run multi-agent scripts rather than to answer a single user query. Anthropic's Claude Code SDK supports this pattern via the --output-format and subprocess orchestration APIs.[3]

The orchestrator is responsible for four things:

  1. Decomposing the task into parallel units of work.
  2. Spawning sub-agents with scoped context (not the full conversation history).
  3. Checkpointing each sub-agent's output to a persistent store.
  4. Verifying that the checkpoint store is complete and correct before proceeding.
- Sub-agents receive only the context the orchestrator explicitly provides — they do not share memory with each other or with the orchestrator's conversation thread.
- Sub-agent isolation prevents one agent's error state from contaminating another's result, but also means dense sequential dependencies cannot be parallelized.
- The orchestrator's four responsibilities (decompose, spawn, checkpoint, verify) must all be present; missing one makes the others harder to operate.

Sub-agents are isolated. They receive only the context the orchestrator hands them — a slice of the task, any tools they need, and output format instructions. They do not share memory with each other or with the orchestrator's conversation thread.

This isolation is a feature, not a limitation. It prevents one sub-agent's error state from contaminating another's result. The downside: if your task has dense sequential dependencies (the output of step 3 is required input to step 4), dynamic workflows do not help and may hurt.

Fan-out patterns

There are two structurally different fan-out shapes.

- Homogeneous fan-out applies the same task structure to different inputs; results are structurally identical and easy to aggregate.
- Heterogeneous fan-out assigns genuinely independent analysis dimensions to different sub-agents; results must be merged by the orchestrator.
- Use sequential chaining inside the orchestrator for steps with dense dependencies — forcing them into a heterogeneous fan-out adds complexity without parallelism benefit.

Homogeneous fan-out

All sub-agents receive the same task structure applied to different inputs. Example: analyze 50 code files for security issues. Each sub-agent gets one file. Results are structurally identical.

```python import subprocess, json, pathlib, concurrent.futures

def analyze_file(file_path: str) -> dict: result = subprocess.run( ["claude", "-p", f"Analyze this Python file for security issues. Return JSON: {{issues: [...], severity: 'low'|'medium'|'high'}}", "--input-file", file_path, "--output-format", "json"], capture_output=True, text=True, timeout=120 ) return {"file": file_path, "result": json.loads(result.stdout)}

files = list(pathlib.Path("src/").glob("*/.py"))

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool: futures = {pool.submit(analyze_file, str(f)): f for f in files} results = [f.result() for f in concurrent.futures.as_completed(futures)] ```

The orchestrator script launches 20 parallel claude -p processes (the practical thread limit for a Max account without hitting rate ceilings). Each sub-agent returns structured JSON. The orchestrator collects and checkpoints all results before doing anything else.

Heterogeneous fan-out

Sub-agents receive different tasks that produce inputs for a downstream aggregation step. Example: one sub-agent audits for security, another for performance, a third for test coverage. Results are structurally different and must be merged by the orchestrator.

```python AGENTS = { "security": "Audit for OWASP Top 10 issues. Return {findings: [...]}", "perf": "Identify hot paths and O(n²) loops. Return {findings: [...]}", "coverage": "List untested public functions. Return {findings: [...]}", }

def run_agent(role: str, prompt: str, target: str) -> dict: result = subprocess.run( ["claude", "-p", prompt, "--input-file", target, "--output-format", "json"], capture_output=True, text=True, timeout=180 ) return {"role": role, "result": json.loads(result.stdout)}

with concurrent.futures.ThreadPoolExecutor() as pool: futures = [ pool.submit(run_agent, role, prompt, "src/payments.py") for role, prompt in AGENTS.items() ] specialist_results = [f.result() for f in concurrent.futures.as_completed(futures)] ```

Heterogeneous fan-out is useful when the analysis dimensions are genuinely independent — security review has nothing to say about test coverage. When they are not independent (e.g., you need the security result to scope the performance review), use sequential chaining inside the orchestrator script, not a heterogeneous fan-out.

Try this · claude-sonnet-4-6

I want to audit a Python monorepo with 80 files. Some files need security analysis, some need performance review, and some need both. Design a fan-out orchestration strategy. Consider: how do I route …

  1. Classification pass (single agent, sequential)
  2. Run one fast classification agent first: read file names + top-level imports, return a JSON map of {file_path: ["security", "perf"]} tags. Cost: ~1 agent × 80 short reads. Time: ~30s.
  1. Parallel audit agents (fan-out)
  2. From the classification map, build two lists: security_files, perf_files. Files needing both appear in both lists. Spawn two independent thread pools: one for security agents, one for perf agents. Max 20 workers each. Each agent receives one file and its assigned role.
  1. Checkpoint store shape
  2. {
  3. "run_id": "audit-2026-05-31-a1b2",
  4. "files": {
  5. "src/payments.py": {
  6. "security": { "status": "done", "findings": [...] },
  7. "perf": { "status": "done", "findings": [...] }
  8. },
  9. "src/models.py": {
  10. "security": { "status": "done", "findings": [...] },
  11. "perf": { "status": "pending" }
  12. }
  13. }
  14. }
  1. Deduplication
  2. Track each (file, role) pair. If the run crashes and you resume, skip pairs where status == "done". This is idempotent fan-out.`}
  3. />

Token budget math

Dynamic workflows multiply token spend. Every sub-agent is a full Claude invocation. A homogeneous fan-out of 50 sub-agents costs 50× the tokens of running one.

- Token spend scales linearly with sub-agent count: a 50-agent fan-out costs 50× a single invocation, and Claude Code plan weekly limits apply to the total.
- Log `input_tokens` and `output_tokens` from every sub-agent and aggregate them in the checkpoint store to create a cost record per orchestration run.
- Measure token spend on a small pilot run before scheduling a recurring large fan-out; cost accumulation is invisible without explicit tracking.

Rule of thumb for estimating cost before you build:

Total tokens ≈ (n_subagents × avg_context_per_agent) + orchestrator_tokens

For a code analysis run with 50 agents, each receiving a 2,000-token file plus a 500-token prompt:

50 × 2,500 = 125,000 input tokens
50 × 1,000 = 50,000 output tokens (estimated)
Orchestrator: ~5,000 tokens
Total: ~180,000 tokens

At Sonnet 4.6 pricing (as of 2026), this is a fraction of a dollar — but with Claude Code plans that enforce weekly [[token]] limits, burning 180,000 tokens on one orchestration run can noticeably dent your weekly quota.[5]

Community posts from May 2026 consistently flag this: users who turned on dynamic workflows for large fan-outs saw their weekly credit reset events become meaningful rather than routine.[2] The problem is not the per-run cost but the invisibility — if you do not log total tokens per orchestration run, you will not see the accumulation until the quota wall hits.

Mitigation: log input_tokens and output_tokens from every sub-agent's response. Aggregate them at orchestrator level. Write the total to your checkpoint store. You then have a cost record per run that persists across failures.

def run_agent_with_cost_tracking(role: str, prompt: str, file: str) -> dict:
    result = subprocess.run(
        ["claude", "-p", prompt,
         "--input-file", file,
         "--output-format", "json",
         "--verbose"],   # prints usage to stderr
        capture_output=True, text=True, timeout=180
    )
    payload = json.loads(result.stdout)
    # claude --verbose emits usage JSON on stderr
    try:
        usage = json.loads(result.stderr.splitlines()[-1])
    except Exception:
        usage = {}
    return {
        "role": role,
        "file": file,
        "result": payload,
        "tokens_in": usage.get("input_tokens", 0),
        "tokens_out": usage.get("output_tokens", 0),
    }

<Callout type="warning"> Weekly Claude Code plan limits apply per-account, not per-script. A dynamic workflow that fans out to 100 sub-agents burns plan quota as if a human ran 100 manual Claude Code tasks. On Pro, that can exhaust a week's allowance in a few runs. On Max, limits are higher but still finite. Before you schedule a recurring orchestration job, measure its token cost on one real run and multiply by run frequency. </Callout>

Checkpoint patterns

A [[checkpoint]] is a durable write of an intermediate result that the orchestration script can read on restart. Without checkpoints, a partial failure means re-running all sub-agents from scratch — wasting quota and time.

The minimal checkpoint pattern:

```python import json, pathlib, hashlib, time

CHECKPOINT_DIR = pathlib.Path(".checkpoints") CHECKPOINT_DIR.mkdir(exist_ok=True)

def checkpoint_key(run_id: str, file: str, role: str) -> str: content = f"{run_id}:{file}:{role}" return hashlib.sha256(content.encode()).hexdigest()[:16]

def write_checkpoint(run_id: str, file: str, role: str, result: dict): key = checkpoint_key(run_id, file, role) path = CHECKPOINT_DIR / f"{key}.json" path.write_text(json.dumps({ "run_id": run_id, "file": file, "role": role, "ts": time.time(), "result": result }))

def read_checkpoint(run_id: str, file: str, role: str) -> dict | None: key = checkpoint_key(run_id, file, role) path = CHECKPOINT_DIR / f"{key}.json" if path.exists(): return json.loads(path.read_text()) return None

def run_agent_checkpointed(run_id: str, role: str, prompt: str, file: str) -> dict: cached = read_checkpoint(run_id, file, role) if cached: return cached["result"] # skip sub-agent if already done result = run_agent_with_cost_tracking(role, prompt, file) write_checkpoint(run_id, file, role, result) return result ```

With this pattern, you can kill the orchestration script mid-run, fix a broken agent prompt, and restart. Only the incomplete (file, role) pairs re-run. Completed pairs read from disk in milliseconds.

Checkpoint granularity

Write checkpoints per unit of work (per file, per document, per API call), not per stage. A coarser checkpoint — "stage 2 is done" — forces a full stage re-run on failure. A finer checkpoint — "file X, role Y is done" — lets you resume at the exact failure point.

The cost of fine-grained checkpoints is small JSON files and slightly more I/O. The cost of coarse checkpoints is duplicated sub-agent invocations. Fine-grained is almost always the right choice.

Verification before handoff

A checkpoint stores what a sub-agent said. A [[verification]] step decides whether to trust it.

Without verification, a silent sub-agent failure — malformed JSON, hallucinated schema field, empty output, timeout — propagates into the final result. The orchestrator assembles garbage and the downstream consumer sees it.

Verification should happen before the orchestrator commits to using a checkpoint result in the final assembly.

```python from typing import TypedDict

class SecurityFinding(TypedDict): file: str line: int severity: str description: str

def verify_security_result(result: dict) -> tuple[bool, str]: """Returns (is_valid, reason). Reason is empty string on success.""" if "findings" not in result: return False, "missing 'findings' key" if not isinstance(result["findings"], list): return False, "'findings' must be a list" for i, f in enumerate(result["findings"]): if not isinstance(f.get("severity"), str): return False, f"finding[{i}] missing string 'severity'" if f["severity"] not in ("low", "medium", "high", "critical"): return False, f"finding[{i}] has invalid severity: {f['severity']}" return True, ""

def assemble_final_report(run_id: str, files: list[str]) -> dict: verified, rejected = [], [] for file in files: cp = read_checkpoint(run_id, file, "security") if cp is None: rejected.append({"file": file, "reason": "no checkpoint"}) continue ok, reason = verify_security_result(cp["result"]) if ok: verified.append({"file": file, "findings": cp["result"]["findings"]}) else: rejected.append({"file": file, "reason": reason})

if rejected: # surface failures explicitly rather than silently omitting them print(f"WARNING: {len(rejected)} files rejected during verification:") for r in rejected: print(f" {r['file']}: {r['reason']}")

return {"verified": verified, "rejected": rejected, "run_id": run_id} ```

The orchestrator should expose the rejected list to the caller. Silent omission is the worst outcome — the downstream system thinks all 80 files were analyzed when 5 were silently dropped.

Try this · claude-sonnet-4-6

An orchestration script fanned out to 20 sub-agents. Two returned empty output (timeout), one returned valid JSON with an unexpected extra field 'confidence_score' not in the agreed schema, and one re…

Empty output (2 agents — timeouts) Mark these as status: "failed", reason: "empty output / timeout". Do NOT silently drop them. Log which files were skipped. Depending on tolerance: either re-queue these specific files with a longer timeout, or surface them in the final report as "unanalyzed".

Extra schema field ('confidence_score') This is a schema evolution case, not a failure. If the field is not required by your contract, accept the result and strip or preserve the extra field based on policy. Do not reject valid outputs for forward-compatible additions. Log the unexpected field for schema tracking.

Python traceback instead of JSON This is a hard failure. The sub-agent crashed before producing output. Mark status: "failed", reason: "agent error — non-JSON output". Log the traceback for debugging. Do not attempt to parse it as a finding.

Orchestrator report Always include a summary: { "total_files": 20, "verified": 17, "failed": { "timeout": 2, "agent_error": 1 }, "warnings": { "schema_drift": 1 } }

Never present 17/20 as "complete". The 3 failures are load-bearing gaps in coverage.`} />

Rollback patterns

Not all dynamic workflows are read-only. Some fan-out to sub-agents that write to databases, call APIs, or modify files. When a verification step fails after some sub-agents have already committed writes, you need rollback.

The simplest rollback pattern is two-phase execution:

  1. Propose phase — sub-agents generate proposed changes, write them to the checkpoint store as diffs or new-state blobs. No actual writes to production systems.
  2. Commit phase — the orchestrator runs verification on all proposals. If all verify, it applies them. If any fail, it discards the entire set and surfaces the failures.

```python def execute_two_phase(run_id: str, tasks: list[dict]) -> dict: # Phase 1: propose proposals = {} for task in tasks: proposal = run_propose_agent(run_id, task) # writes to checkpoint, not to DB ok, reason = verify_proposal(proposal) if not ok: return {"status": "aborted", "reason": reason, "task": task} proposals[task["id"]] = proposal

return {"status": "committed", "count": len(committed)} ```

If you cannot implement two-phase (e.g., external API calls have no transactional rollback), design the fan-out so that each sub-agent is idempotent: running it twice has the same effect as running it once. Idempotency is the poor-person's rollback when real rollback is unavailable.

When NOT to use dynamic workflows

Dynamic workflows are not the default for tool use. They are a specialized pattern that adds complexity, cost, and operational overhead. Use them only when the task justifies it.

Do not use dynamic workflows when:

  1. The task fits in one prompt. If Claude can produce the correct result in a single turn with one or two tool calls, fan-out adds nothing except tokens and latency.
  1. Sequential dependencies prevent meaningful parallelism. If step 3 depends on step 2's output, you cannot run them in parallel. Sequential chaining inside one Claude instance is simpler.
  1. You need human approval at an undetermined mid-flight point. Dynamic workflows are designed for machine-to-machine completion. If a human must review and approve at a step that only becomes identifiable during execution, an interactive agent loop is the right tool, not an orchestration script.
  1. Latency matters more than throughput. Spawning sub-agents has overhead: process startup, context injection, output parsing, checkpoint I/O. For a task with five small sequential steps, that overhead is more than the parallelism saves.
  1. The token cost is not justified. If your job takes 10 minutes sequentially and 2 minutes with 50 sub-agents, but the sequential path costs 5,000 tokens and the fan-out costs 250,000 tokens, you are spending 50× to save 8 minutes. That trade-off only makes sense at scale or when wall-clock time has a direct dollar value to you.

Practical fan-out limits by plan tier

As of June 2026, the practical fan-out ceiling is determined by weekly token limits, not a hard API concurrency cap.[2][5]

PlanPractical max concurrent sub-agentsNotes
Claude Pro5–10Weekly credit resets quickly at high fan-out
Claude Max (×5)20–40Comfortable for mid-size batch jobs
Claude Max (×20)80–150Viable for large-scale orchestration
API (pay-as-you-go)Rate-limit dependentNo weekly cap; cost is per-token

These are community-reported heuristics, not Anthropic's official numbers.[1] Your optimal concurrency depends on your specific prompts and context sizes. Always measure token spend on a small pilot run before committing to a large fan-out.

For the API route (non-plan, pay-as-you-go), weekly limits do not apply but per-minute rate limits do. Throttle sub-agent launch rates with exponential backoff if you see 429s.

Summary of the dynamic workflow stack

A production dynamic workflow has five layers:

┌─────────────────────────────────────────────────────────┐
│ 1. Orchestrator   Decomposes task → spawns sub-agents   │
├─────────────────────────────────────────────────────────┤
│ 2. Sub-agents     Isolated Claude -p invocations        │
│                   Each receives scoped context + tools  │
├─────────────────────────────────────────────────────────┤
│ 3. Checkpoint store  JSON files per (run_id, unit, role)│
│                      Enables idempotent resume          │
├─────────────────────────────────────────────────────────┤
│ 4. Verification   Schema + value checks before assembly │
│                   Rejects or flags incomplete results   │
├─────────────────────────────────────────────────────────┤
│ 5. Cost tracking  input_tokens + output_tokens per agent│
│                   Aggregated per run in checkpoint store│
└─────────────────────────────────────────────────────────┘

Missing any layer makes the others harder to operate. Verification without checkpoints means you cannot recover from partial failures. Checkpoints without cost tracking means you cannot explain the quota bill.

Try this yourself

Hands-on exercise: build a checkpointed code reviewer

Build an orchestration script that:

  1. Accepts a directory path as input.
  2. Lists all .py files in the directory.
  3. Fans out to one claude -p sub-agent per file, asking each to return {file, issues: [{line, severity, message}]} as JSON.
  4. Writes each result to a checkpoint store keyed by (run_id, file_path).
  5. Verifies each result (checks that issues is a list, each issue has line as int and severity in ["low","medium","high","critical"]).
  6. Prints a summary: total files, verified, rejected (with reasons), total input + output tokens.

Success criteria: - Run the script on a 10-file sample directory. Kill it after 5 files complete. Restart it. Confirm that only the 5 incomplete files run again (not the already-complete 5). - Manually corrupt one checkpoint file (remove the issues key). Confirm the script flags it as a verification failure, not a crash. - Check the token totals in the summary against what you expect from your plan's usage dashboard.

Stretch goal: add a --dry-run flag that reads checkpoints and prints the summary without spawning any sub-agents.

What's next

You have now covered the full tool-use stack from basic function calling (Chapter 1) through production connector patterns (Chapters 7–9) and orchestration (this chapter). The Capstone Project applies everything: build a production-ready MCP Agentic Connector that bridges a secure domain system to Claude, includes structured logs for every tool call, authorization per tool, a compliance audit trail, and optionally a dynamic workflow layer for batch operations.


[1]: r/ClaudeAI thread on Claude Code dynamic workflows, 2026-05-29 — https://www.reddit.com/r/ClaudeAI/comments/1tq9ofy/introducing_dynamic_workflows_in_claude_code/ [2]: r/ClaudeAI thread on Claude Code credits and weekly limits, 2026-05-29 — https://www.reddit.com/r/ClaudeAI/comments/1tq9vqf/claude_code_credits_rebooted_after_coding_for/ [3]: Anthropic Claude Code SDK documentation — https://docs.anthropic.com/en/docs/claude-code/sdk [4]: Anthropic support: Claude Agent SDK with Claude plan — https://support.claude.com/en/articles/15036540-use-the-claude-agent-sdk-with-your-claude-plan [5]: Anthropic Claude Code settings documentation — https://docs.anthropic.com/en/docs/claude-code/settings

References

  1. https://www.reddit.com/r/ClaudeAI/comments/1tq9ofy/introducing_dynamic_workflows_in_claude_code/
  2. https://www.reddit.com/r/ClaudeAI/comments/1tq9vqf/claude_code_credits_rebooted_after_coding_for/
  3. https://docs.anthropic.com/en/docs/claude-code/sdk
  4. https://docs.anthropic.com/en/docs/claude-code/settings
  5. https://support.claude.com/en/articles/15036540-use-the-claude-agent-sdk-with-your-claude-plan
  6. https://docs.anthropic.com/en/docs/about-claude/models/overview