Is Cloudflare Agents a good production architecture for AI agents?

Yes, when the agent is event-driven, stateful, WebSocket-heavy, or cost-sensitive at idle. It is weaker for sustained CPU work, large local dependencies, or workloads that require a conventional always-on process.

What should own state in a Cloudflare Agents deployment?

The Durable Object should own short-lived coordination state, user/session state, scheduling, and WebSocket continuity. Large blobs belong in R2, retrieval memory in Vectorize, and long-running steps in Workflows or Project Think primitives.

Where does observability belong?

Put model traffic behind AI Gateway for request logging, analytics, caching, rate limiting, retries, and fallback; add application-level task and tenant metadata from the Worker or Agent.

How to Architect Cloudflare Agents on Workers Around Durable Objects (2026 Production Guide)

To architect a production Cloudflare Agents deployment, define each Agent as a Durable Object that owns identity, session state, tool routing, WebSocket continuity, and scheduling, then delegate everything else. Put blobs in R2, retrieval in Vectorize, model calls behind AI Gateway, and long-running steps in Workflows or Project Think. Design WebSockets for hibernation from day one, or idle Agents will dominate cost. The platform wins on millions of mostly idle stateful coordinators and loses on sustained CPU.

Cloudflare Agents is a production fit when you need stateful, event-driven AI workloads on Workers: durable per-user state, WebSockets, scheduled tasks, tool calls, and model traffic close to the edge. Cloudflare's Agents docs describe each Agent as a TypeScript class running on a Durable Object with its own SQL database, WebSockets, and scheduling [1].

The mistake is treating this as a cheaper AWS Lambda for AI. A Lambda mental model asks, "How much code can I fit in one invocation?" The Workers + Agents model asks, "Which named stateful object should wake up, coordinate a step, persist just enough, and go back to sleep?" That is a better fit for millions of mostly idle agents than for one agent doing minutes of CPU-bound work. The architecture win is state placement, not raw compute.

How to Put Coordination in the Agent and Heavy Work Outside It

The Durable Object Agent should own identity, session state, tool-routing decisions, WebSocket continuity, and scheduling. It should not own every blob, embedding, browser action, build job, or long-running research workflow. Cloudflare's current Agents limits list tens of millions of concurrent running Agents per account, 1 GB of state per unique Agent, and 30 seconds of compute time per Agent, refreshed by each HTTP request or incoming WebSocket message [2]. That shape is telling: the platform wants many small stateful coordinators, not a few overloaded workers.

Use the Agent as the control plane. Store a conversation pointer, task status, retry count, approval state, and a compact summary in the Durable Object database. Put documents, generated files, logs, exports, and artifacts in R2. Put semantic retrieval chunks in Vectorize. Route model calls through AI Gateway. Hand durable multi-step work to Workflows or Project Think primitives when a step may outlive a request.
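A minimal sketch of what control-plane state might look like, in plain TypeScript with no Cloudflare imports. The type name AgentTaskState, its field names, and the 2,000-character summary budget are illustrative assumptions, not part of the Agents SDK. The point is only that the Agent holds pointers and status, not payloads.

```typescript
// Hypothetical shape for the state an Agent keeps per task.
// Field names are illustrative, not part of the Agents SDK.
type TaskStatus = "queued" | "running" | "awaiting_approval" | "done" | "failed";

interface AgentTaskState {
  conversationId: string;   // pointer to the conversation, not the transcript
  taskId: string;
  status: TaskStatus;
  retryCount: number;
  approved: boolean | null; // null until a human decides
  summary: string;          // compact summary only
  artifactKeys: string[];   // object-storage keys, never the bytes themselves
}

const MAX_SUMMARY_CHARS = 2000; // assumed budget; tune for your workload

function newTask(conversationId: string, taskId: string): AgentTaskState {
  return {
    conversationId,
    taskId,
    status: "queued",
    retryCount: 0,
    approved: null,
    summary: "",
    artifactKeys: [],
  };
}

// Record a finished step: keep a pointer to the artifact and a trimmed summary.
function recordArtifact(state: AgentTaskState, key: string, summary: string): AgentTaskState {
  return {
    ...state,
    artifactKeys: [...state.artifactKeys, key],
    summary: summary.slice(0, MAX_SUMMARY_CHARS),
  };
}

// Record a failure: requeue until the retry budget is spent, then mark failed.
function recordFailure(state: AgentTaskState, maxRetries: number): AgentTaskState {
  const retryCount = state.retryCount + 1;
  return { ...state, retryCount, status: retryCount > maxRetries ? "failed" : "queued" };
}
```

Because each helper returns a new object, the result can be written to the Agent's storage in one step, and anything larger than the summary stays behind a key.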

Project Think makes the separation explicit. Cloudflare describes it as primitives for durable execution, sub-agents, persistent sessions, sandboxed code execution, and an execution ladder that ranges from workspace to isolate, npm, browser, and sandbox [3]. That is not a single bigger Worker. It is an admission that production agents need tiered execution surfaces.

Use Durable Objects for cost shape, but design for hibernation

Workers can be unusually attractive for agent workloads because idle agents should not look like idle containers. Durable Objects pricing shows how much that matters. In Cloudflare's own pricing example, 100 Durable Objects that keep WebSockets active all month without hibernation are estimated at $416.51/month; a hibernatable WebSocket example with much less active compute is estimated at $10.00/month [4]. The difference is not request count. It is duration.

That is the core production budget rule: keep the Agent awake only while it handles events. WebSocket-heavy support agents, tutors, notification bots, and workflow monitors fit because they spend most of their life waiting. A design that keeps a Durable Object busy like a daemon is fighting the platform's economics.
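A back-of-envelope estimate shows why duration dominates. The sketch below is not Cloudflare's billing formula. The memory figure and the per-GB-second rate are placeholder inputs to replace with values from the current Durable Objects pricing page [4], and it ignores request charges and included usage, so it will not reproduce the figures quoted above.

```typescript
// Rough duration-cost estimate. All rates are inputs, not facts:
// read the real values from the current pricing page.
interface DurationInputs {
  objects: number;            // number of Durable Objects
  awakeSecondsPerDay: number; // seconds each object is billed as active per day
  days: number;
  memoryGb: number;           // assumed billed memory per object
  usdPerMillionGbSeconds: number;
}

function gbSeconds(i: DurationInputs): number {
  return i.objects * i.awakeSecondsPerDay * i.days * i.memoryGb;
}

function durationCostUsd(i: DurationInputs): number {
  return (gbSeconds(i) / 1000000) * i.usdPerMillionGbSeconds;
}

// Placeholder inputs for illustration only.
const base = { objects: 100, days: 30, memoryGb: 0.125, usdPerMillionGbSeconds: 12.5 };

// Awake all day (no hibernation) versus awake ten minutes a day.
const alwaysAwake = durationCostUsd({ ...base, awakeSecondsPerDay: 86400 });
const mostlyAsleep = durationCostUsd({ ...base, awakeSecondsPerDay: 600 });

console.log(alwaysAwake.toFixed(2), mostlyAsleep.toFixed(2));
```

Whatever rates you plug in, the ratio between the two cases equals the ratio of awake seconds, which is why the budget rule is about wake time rather than message volume.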

Route model calls through AI Gateway before the first incident

AI Gateway is the production boundary for model calls, not an add-on dashboard. Cloudflare's AI Gateway docs position it as visibility and control for AI apps, with analytics, logging, caching, rate limiting, request retries, and model fallback across providers such as Workers AI, Anthropic, Google Gemini, and OpenAI [5]. That is exactly where agent systems fail first: unknown prompt growth, tool-result bloat, provider errors, model fallback gaps, and tenant-level cost spikes.

Wire the Agent so every model request carries application metadata: tenant_id, agent_id, task_id, conversation_id, tool_name, and workflow_step. The Durable Object can keep the state machine, but AI Gateway should see the LLM traffic. This also keeps provider choice from leaking into the rest of the architecture. A support agent can start with Workers AI, route premium tenants to Anthropic or OpenAI through Gateway, and apply rate limits before a runaway loop becomes a bill.
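One way to make that metadata hard to forget is a single helper that builds the headers for every model request. This is a sketch under assumptions: the helper name gatewayHeaders is hypothetical, and the metadata header name is an assumption to confirm in the AI Gateway docs, along with any limit on how many metadata entries a request may carry.

```typescript
// Hypothetical helper: attach application metadata to every model request.
interface ModelCallMeta {
  tenant_id: string;
  agent_id: string;
  task_id: string;
  conversation_id: string;
  tool_name?: string;
  workflow_step?: string;
}

// Assumed header name; confirm it (and any entry limit) in the AI Gateway docs.
const METADATA_HEADER = "cf-aig-metadata";

function gatewayHeaders(meta: ModelCallMeta, apiKey: string): Record<string, string> {
  // Copy only the fields that were actually set.
  const present: Record<string, string> = {};
  for (const key of Object.keys(meta) as (keyof ModelCallMeta)[]) {
    const value = meta[key];
    if (value !== undefined) present[key] = value;
  }
  return {
    "content-type": "application/json",
    authorization: `Bearer ${apiKey}`,
    [METADATA_HEADER]: JSON.stringify(present),
  };
}
```

The Agent would pass the returned object as the headers of whatever request it sends through the gateway, so tenant and task identifiers arrive with the model traffic instead of being reconstructed later from logs.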

Do not mistake Gateway logs for full agent observability. You still need application events for tool inputs, approval gates, retries, user-visible state, and failure reasons.

Store retrieval memory in Vectorize and blobs in R2

Do not put your knowledge base in the Agent's local state because it is convenient. Cloudflare Vectorize is the better fit for retrieval memory: the current limits list 10,000,000 vectors per index and 50,000 namespaces per index on Workers Paid [6]. That supports a natural production pattern: namespace by tenant, user, workspace, or agent, then retrieve only the top few chunks for each turn.

R2 is the artifact layer. Its pricing page says there are no egress bandwidth charges for any storage class, with Standard storage priced at $0.015 per GB-month and a monthly free tier of 10 GB-month, 1 million Class A operations, and 10 million Class B operations [7]. That makes it a sensible place for uploaded PDFs, generated reports, screenshots, crawl outputs, transcripts, and sandbox artifacts.

The reliable pattern is:

Upload the source file or generated artifact to R2.
Chunk and embed only the text that should be searchable.
Store vectors and compact metadata in Vectorize.
Store the task pointer and retrieval policy in the Agent.
Return short ranked passages to the model, not raw files.
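The steps above can be sketched as a pure planning function that decides what goes where without calling R2 or Vectorize. Everything here is hypothetical: the type names, the key layout, the 500-character chunk size, the per-tenant namespace choice, and the topK of 3 are assumptions for illustration, and a real implementation would hand this plan to the actual bindings.

```typescript
// Hypothetical plan for one uploaded file. Nothing here calls R2 or
// Vectorize; it only decides what should go where.
interface VectorRecord {
  id: string;
  namespace: string; // assumed: one namespace per tenant
  text: string;      // chunk to embed
  metadata: { r2Key: string; chunk: number };
}

interface IngestPlan {
  r2Key: string;           // where the raw file lives
  vectors: VectorRecord[]; // what gets embedded and indexed
  agentPointer: { taskId: string; r2Key: string; topK: number };
}

// Naive fixed-size chunker; real systems usually split on document structure.
function chunkText(text: string, size: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

function planIngest(
  tenantId: string,
  taskId: string,
  fileName: string,
  searchableText: string
): IngestPlan {
  const r2Key = `${tenantId}/${taskId}/${fileName}`;
  const vectors: VectorRecord[] = chunkText(searchableText, 500).map((text, chunk) => ({
    id: `${taskId}:${chunk}`,
    namespace: tenantId,
    text,
    metadata: { r2Key, chunk },
  }));
  // The Agent keeps only a pointer and a retrieval policy, never the text.
  return { r2Key, vectors, agentPointer: { taskId, r2Key, topK: 3 } };
}
```

Keeping the plan separate from the I/O makes it easy to check that the Agent's share of the data stays small before any storage call is wired in.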

Use Workflows when correctness needs checkpoints

Any agent step that must survive deploys, crashes, retries, or human waiting belongs outside the single Agent turn. Cloudflare's durable agent guide uses the Agents SDK with Workflows for a repository-research agent and describes real-time progress updates from a durable workflow [8]. That is the right pattern for tasks like crawling a site, evaluating many documents, waiting for human approval, generating reports, or retrying tool calls.

The production rule is simple: if replaying the step would be expensive, harmful, or confusing, checkpoint it. Let the Agent start the workflow, stream progress, and maintain the user-facing state. Let Workflows own the durable step sequence.
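The checkpoint idea itself is small enough to show without any platform code. The sketch below is an illustration of the technique, not the Cloudflare Workflows API: runStep and reportJob are hypothetical names, and an in-memory Map stands in for durable storage.

```typescript
// Illustrative checkpoint runner, not the Cloudflare Workflows API.
// Completed step results are stored by name and reused on replay.
type Checkpoints = Map<string, unknown>;

function runStep<T>(store: Checkpoints, name: string, fn: () => T): T {
  if (store.has(name)) return store.get(name) as T; // already done: skip the work
  const result = fn();
  store.set(name, result); // record the result before moving on
  return result;
}

// Hypothetical report job with two expensive steps.
let crawlCalls = 0;

function reportJob(store: Checkpoints, failOnSummarize: boolean): string {
  const pages = runStep(store, "crawl", () => {
    crawlCalls += 1;
    return ["p1", "p2"];
  });
  return runStep(store, "summarize", () => {
    if (failOnSummarize) throw new Error("model unavailable");
    return `summary of ${pages.length} pages`;
  });
}

const store: Checkpoints = new Map();
try {
  reportJob(store, true); // first attempt fails after the crawl completes
} catch {
  // a real system would log the failure and schedule a retry
}
const summary = reportJob(store, false); // replay: the crawl is not repeated
console.log(summary, crawlCalls);
```

On the second run the crawl step returns its stored result, so only the failed step executes again. That is the property to look for in whichever durable execution layer owns the step sequence.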

Here is the smallest runnable architecture probe: create an Agent project, then test that a request can carry the shape you need before adding model calls.

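npx create-cloudflare@latest --template cloudflare/agents-starter cloudflare-agent-pilot
cd cloudflare-agent-pilot
npm install
# Starts the local dev server and keeps running; leave this terminal open.
npm run dev

# In a second terminal, request the root path. Replace the placeholder host
# with the local URL the dev server prints (or your deployed Worker URL).
curl -s https://your-worker.workers.dev/ | head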

Expected output:

The dev server responds from the Workers/Agents starter. Your next production check is not "does chat work"; it is whether task_id, tenant_id, model route, and artifact pointers flow through the Agent boundary.

Choose Workers when agents are event-driven; choose containers when work is continuous

Cloudflare Agents wins when the workload is stateful, bursty, globally accessed, WebSocket-driven, and idle most of the time. It is a strong default for support copilots, learning tutors, notification agents, workflow monitors, MCP tools, and stateful chat over shared business data.

It loses when the workload is a conventional long-running process wearing an agent label. If you need sustained CPU, large native dependencies, GPU control, local daemons, custom networking, long filesystem-heavy jobs, or opaque third-party binaries, a container or VM runtime may be simpler. Cloudflare's own Project Think execution ladder points in the same direction: isolate the coordinator from the heavier execution tier [3].

The 30-day evaluation should be concrete. Build one Durable Object Agent with a stable tenant/session name. Put model calls behind AI Gateway. Store uploaded and generated files in R2. Put retrieval chunks in Vectorize. Move one long-running path into Workflows. Then inspect four numbers: active duration per task, model cost per tenant, retrieval payload size per turn, and failed-step retry rate.
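The four numbers can be computed from whatever per-task records your application already emits. A minimal sketch, assuming a hypothetical TaskRecord shape; none of these field names come from Cloudflare, and the inputs would be fed from your own application events and Gateway cost data.

```typescript
// Hypothetical per-task record emitted by your own application events.
interface TaskRecord {
  tenantId: string;
  activeMs: number;         // time the Agent was awake for this task
  modelCostUsd: number;
  retrievalBytes: number[]; // retrieval payload size for each turn
  steps: number;
  retriedSteps: number;
}

interface PilotSummary {
  avgActiveMsPerTask: number;
  modelCostByTenant: Record<string, number>;
  avgRetrievalBytesPerTurn: number;
  retryRate: number; // retried steps divided by total steps
}

function summarizePilot(tasks: TaskRecord[]): PilotSummary {
  const costByTenant: Record<string, number> = {};
  let activeMs = 0;
  let bytes = 0;
  let turns = 0;
  let steps = 0;
  let retried = 0;
  for (const t of tasks) {
    activeMs += t.activeMs;
    costByTenant[t.tenantId] = (costByTenant[t.tenantId] ?? 0) + t.modelCostUsd;
    for (const b of t.retrievalBytes) {
      bytes += b;
      turns += 1;
    }
    steps += t.steps;
    retried += t.retriedSteps;
  }
  // Guard every division so an empty pilot yields zeros, not NaN.
  return {
    avgActiveMsPerTask: tasks.length ? activeMs / tasks.length : 0,
    modelCostByTenant: costByTenant,
    avgRetrievalBytesPerTurn: turns ? bytes / turns : 0,
    retryRate: steps ? retried / steps : 0,
  };
}
```

Tracking these as averages is a simplification; percentiles would expose the slow or expensive tail that a pilot usually needs to find.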

If those numbers look clean, Workers is probably a better agent control plane than a container fleet. If the Agent spends most of its life doing continuous compute or shuffling large artifacts through memory, move the heavy work out before you scale.

Knowledge check: which design fits this architecture?

A. Put chat state, PDFs, embeddings, model calls, and report generation inside one Durable Object Agent.
B. Use a Durable Object Agent for session state and WebSockets, AI Gateway for model routing, R2 for PDFs/reports, Vectorize for retrieval, and Workflows for the multi-minute report job.
C. Use Vectorize as the primary database for all user state and skip Durable Objects.
D. Keep one always-on Worker polling for every user's jobs so no separate workflow system is needed.

Correct answer: B. The Durable Object coordinates state and live interaction; the surrounding Cloudflare services own model observability, artifacts, retrieval, and durable long-running execution.

For Koenig Academy readers, the upgrade path is to stop learning "agent frameworks" as chat wrappers and start learning runtime boundaries: state, model traffic, retrieval, artifacts, and durable execution. To connect this Cloudflare architecture to tool surfaces, authorization, and production MCP design, continue with "MCP from First Principles to Production: Why JSON-RPC over stdio beat WebSockets + OpenAPI".
