Gemini Enterprise Agent Platform: A Hands-On Tour — from Hello World to Production
Audience: developers who have shipped at least one LLM demo and want to run real agents on GCP — comfortable with Python, familiar with REST APIs, perhaps experienced with the OpenAI or Anthropic APIs, but new to Vertex. By the end of this course you will be able to:
- Build and locally run a working ADK agent with at least one tool
- Persist agent state across sessions using Agent Sessions and Memory Bank
- Wire together two or more agents in a supervisor/worker orchestration pattern
- Critically compare GEAP to Claude Agent SDK and Cloudflare Agents and choose the right platform for a given workload
What Gemini Enterprise Agent Platform Actually Is (and Isn't)
Google's Gemini Enterprise Agent Platform (GEAP), which reached general availability on 23 April 2026, is not a new product. It is a consolidation: every AI capability Google has shipped over the past four years — Vertex AI, Model Garden, Agent Builder, Dialogflow CX — now lives under one brand and one API surface. [1] For builders, understanding what that means in practice separates the people who benefit from the platform from those who spend three months wiring together services that already talk to each other.
Key facts
- GA date: 23 April 2026
- Replaces and absorbs: Vertex AI Agent Builder, Model Garden (as a sub-surface), Dialogflow CX (legacy path)
- Architecture: four pillars — Build, Scale, Govern, Optimize
- Model access: 200+ models including Gemini 3.1 Pro/Flash, Gemma 4, Anthropic Claude (Opus/Sonnet/Haiku), and third-party open models
- Entry points: Agent Studio (low-code visual), ADK — Agent Development Kit (code-first Python/TypeScript)
- State primitives: Agent Sessions (conversation-scoped) + Memory Bank (long-term cross-session)
- Security primitives: Agent Identity (cryptographic ID per agent), Agent Gateway (unified traffic control), Agent Anomaly Detection
- Strategic signal: All future Vertex AI roadmap ships exclusively through GEAP — standalone Vertex services get no new features [1]
The consolidation nobody saw coming
When Google announced GEAP, industry coverage focused on the new features: sub-agent networks, Memory Bank, Agent Identity. The buried lede was the strategic declaration at the end of the announcement: "All Vertex AI services and roadmap evolutions will be delivered exclusively through Agent Platform going forward, rather than as standalone services." [1]
That is a major commitment. It means if you are using Vertex AI Pipelines, Vertex AI Evaluation, or any other standalone Vertex service, your upgrade path is now through GEAP. Google is betting that agent-orchestration is the right abstraction for the next era of enterprise AI — not individual model calls, not standalone pipelines.
Whether that bet pays off for you depends on your use case. This chapter explains the architecture so you can make that judgment with clear eyes.
The four pillars
GEAP is organized into four operational domains. Every feature belongs to one of them. Knowing which pillar a feature belongs to tells you when in your development lifecycle you need it.
Pillar 1: Build
Build is where you create agents. There are two entry points:
Agent Studio is a low-code, visual interface. You drag components, define tools from a catalogue, set an instruction prompt, and get a deployable agent without writing code. Studio is fast for prototyping and useful for non-engineers who need to configure agents within guardrails set by an engineering team.
Agent Development Kit (ADK) is the code-first environment. It is a Python library (with TypeScript support) where agents are Python objects, tools are Python functions, and orchestration is expressed in code. ADK is where the rest of this course lives.
Within Build, two other features matter:
- Agent Garden: pre-built agent templates for common enterprise tasks (code modernization, invoice processing, financial analysis). Think of these as starting points, not production systems — they require customization.
- Workspaces: sandboxed environments where agents can execute bash commands and manage files. This is GEAP's answer to Code Interpreter: instead of running untrusted code on your infrastructure, agents get a hardened sandbox.
Pillar 2: Scale
Scale is where your agents move from working to production-grade. The key features:
Agent Runtime is GEAP's managed execution environment. It promises sub-second cold starts and provisions agents in seconds. Crucially, Agent Runtime does not require code changes — an ADK agent that runs locally deploys to Runtime with a configuration file, not a rewrite.
Agent Sessions provides conversation-scoped state management. Each session has a unique ID that you can map to an external record (a database row, a CRM contact). State stored in a session is available to any agent invocation that carries that session ID.
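To make the "map a session to an external record" idea concrete, here is a minimal plain-Python sketch. The `SESSIONS`/`CRM` dictionaries and helper names are illustrative stand-ins, not GEAP APIs — the point is only the shape of the mapping:

```python
# Hypothetical sketch: tying session IDs to your own records.
# SESSIONS, CRM, open_session, contact_for are illustrative, not GEAP APIs.
import uuid

CRM: dict[str, dict] = {"contact-042": {"name": "Dana", "tier": "pro"}}

# One row per conversation: session_id -> external record key.
SESSIONS: dict[str, str] = {}


def open_session(crm_contact_id: str) -> str:
    """Create a session ID and remember which CRM contact it belongs to."""
    session_id = str(uuid.uuid4())
    SESSIONS[session_id] = crm_contact_id
    return session_id


def contact_for(session_id: str) -> dict:
    """Any invocation carrying the session ID can recover the linked record."""
    return CRM[SESSIONS[session_id]]


sid = open_session("contact-042")
assert contact_for(sid)["name"] == "Dana"
```

Anything keyed by the session ID — a database row, a CRM contact — is reachable from any agent invocation that carries that ID.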
Memory Bank adds a layer above sessions: long-term, cross-session memory. Where a session holds raw conversation history, Memory Bank distills it — it uses a model to generate "Memory Profiles" (structured summaries) and retrieves them at low latency when a new session starts. The practical effect is that an agent that spoke to a user three months ago can recall relevant facts without loading three months of transcript.
Agent-to-agent orchestration supports both deterministic patterns (you define the routing logic) and generative patterns (the orchestrator model decides which sub-agent to invoke). This distinction matters more than it seems — we cover it in gemini-enterprise-agent-platform-hands-on-tour · 03-multi-agent-orchestration-with-vertex.
Pillar 3: Govern
Govern is GEAP's answer to the question "how do I run 50 agents in production without losing control of them?"
Agent Identity gives every deployed agent a unique cryptographic ID. Every action the agent takes is associated with that ID — tool calls, memory reads, external API calls. This creates an auditable trail you can query when something goes wrong.
Agent Registry is a central catalogue of approved tools, agents, and capabilities. Instead of every team defining their own version of a "get customer order" tool, Registry enforces a single canonical definition. Agents discover available tools through Registry rather than hardcoded imports.
Agent Gateway is the traffic layer. All agent-to-external and agent-to-agent traffic routes through Gateway, which applies consistent security policies, rate limits, and Model Armor protections. Model Armor is Google's term for prompt injection and data leakage defenses applied at the network layer — not inside the model.
Agent Security Dashboard integrates with Security Command Center for vulnerability scanning and asset discovery. If an agent starts making requests to unexpected IP ranges, this is where you see it.
Pillar 4: Optimize
Optimize is where you measure and improve agents after they are running.
Agent Simulation lets you test agents against synthetic user interactions and virtualized tools before deploying changes. This is the equivalent of a staging environment specifically designed for agent behavior — tools return canned responses so you can test routing logic without hitting real APIs.
Agent Evaluation provides continuous scoring. Multi-turn autoraters score agent responses against rubrics you define, and turnkey dashboards track those scores over time. This is important: agent quality degrades as your tools change and your user base grows. Without evaluation, you find out by reading support tickets.
Agent Observability provides visual tracing of agent reasoning. Every tool call, every model invocation, every memory read gets a trace entry. You can walk through what happened step-by-step — which matters when an agent made four tool calls and returned the wrong answer.
Agent Optimizer closes the loop: it clusters observed failures and suggests system instruction refinements. It is not autonomous (it suggests, not applies), but it reduces the manual work of reading failure logs to write better prompts.
What GEAP is not
The four pillars are comprehensive enough that it is easy to assume GEAP is everything. Three things it is not:
It is not model-agnostic in practice. GEAP supports 200+ models including Claude and open models. But the tightest integrations — Memory Bank, Agent Runtime telemetry, Agent Optimizer — are designed around Gemini. If you route all your traffic through Claude on GEAP, you are using GCP infrastructure with Anthropic's model, which works, but you lose some platform-level features that assume Gemini's specific capabilities.
It is not open-source. The ADK is open-source (Apache 2.0). The runtime, Memory Bank, Agent Gateway, and Govern features are fully proprietary GCP services. If you need to run your agent stack on-premises or on another cloud, you can use ADK locally, but you cannot replicate the platform layer.
It is not Dialogflow CX rebranded. Dialogflow CX was a flow-based, deterministic dialogue manager. GEAP's agents reason with LLMs and make probabilistic decisions. Existing Dialogflow CX flows can be migrated, but the mental model is fundamentally different. If you build a GEAP agent expecting it to follow a defined script reliably, you will be surprised.
<Callout type="warning"> Lock-in surface area: Using Agent Runtime, Memory Bank, and Agent Registry together creates deep GCP lock-in. Your agent logic is in ADK (portable), but your state, tool registry, and identity system are GCP-proprietary. Plan for this before you commit. gemini-enterprise-agent-platform-hands-on-tour · 04-comparing-to-claude-agent-sdk-and-cloudflare-agents compares exit paths across GEAP, Claude Agent SDK, and Cloudflare Agents. </Callout>
How the components connect
Here is the component map for a typical customer-support agent on GEAP. Read this as a data-flow diagram, left to right:
```
User request
│
▼
Agent Gateway ◄── Model Armor (prompt injection filter)
│
▼
Agent Runtime ── resolves Agent Identity (cryptographic ID)
│
├── loads Memory Bank profile (cross-session context)
│
├── loads Session state (current conversation)
│
▼
Agent (ADK)
│
├── Tool call A (via Agent Registry, approved tool)
├── Tool call B
│
▼
Agent Observability ── traces every step
│
▼
Response → User
│
▼
Agent Evaluation ── scores response, stores metric
│
▼
Agent Optimizer ── clusters failures, suggests instruction updates
```
Every box in this diagram is a managed GCP service. The only code you write is the Agent itself and the tool implementations. That is the core value proposition — and the core lock-in.
<KnowledgeCheck questions={[ { question: "Which GEAP pillar contains Memory Bank and Agent Sessions?", answers: [ "Build", "Scale", "Govern", "Optimize" ], correct: 1, explanation: "Memory Bank and Agent Sessions are Scale features — they exist to make agents production-grade by providing state continuity across restarts and long-term memory across session boundaries." }, { question: "An engineering team at a fintech company needs to ensure every tool call an agent makes is tied to a named, auditable identity for compliance reasons. Which GEAP feature addresses this?", answers: [ "Agent Registry", "Agent Simulation", "Agent Identity", "Agent Optimizer" ], correct: 2, explanation: "Agent Identity assigns a unique cryptographic ID to every deployed agent. All actions — tool calls, memory reads, API calls — are associated with that ID, creating the auditable trail compliance requires." }, { question: "Which statement about GEAP's model access is most accurate?", answers: [ "GEAP only supports Gemini models", "GEAP supports 200+ models including Claude, but tightest platform integrations are Gemini-optimised", "GEAP supports all models equally with no integration differences", "GEAP requires you to use Gemini as the orchestrator, with other models only as sub-agents" ], correct: 1, explanation: "GEAP supports Claude and open models, but features like Agent Optimizer and some Memory Bank integrations are designed around Gemini's capabilities. You can use other models, but with reduced platform integration." } ]} />
The contrarian view: is consolidation actually good?
GEAP's consolidation narrative is compelling, but it carries a real cost: surface area. When Vertex AI was a collection of loosely coupled services, a team could adopt Vertex Model Garden without adopting Vertex Evaluation. Now that everything is GEAP, the conceptual overhead of the platform is always in scope.
For a solo developer building a weekend project, GEAP is overkill. The four-pillar architecture is enterprise governance applied to a problem that might be solved with a single API call and a Postgres table. The marketing targets "enterprise scale" — and if your workload is not that, the platform actively gets in your way.
The more honest framing: GEAP is the right platform when you need at least two of the four pillars in production. If you need Build + Govern (multiple agents with compliance requirements), GEAP is compelling. If you only need Build, you are paying for three pillars you do not use. We make this trade-off concrete in gemini-enterprise-agent-platform-hands-on-tour · 04-comparing-to-claude-agent-sdk-and-cloudflare-agents.
Hands-on exercise
Draw the GEAP component map for your use case.
Pick a real (or plausible) agent you want to build. On paper or in a diagramming tool:
- Draw the data flow from user request to response.
- For each GEAP component you would use, label which pillar it belongs to.
- Mark any component you would not use and write one sentence explaining why.
- Identify: does your use case require two or more pillars? If not, write a note questioning whether GEAP is the right platform.
Success criteria: A diagram with at least 4 GEAP components labelled by pillar, and a written answer to the "two pillars?" question.
What's next
Chapter 2 gets hands-on: you will install ADK, define a Python function as a tool, wire it into an Agent, and add session and Memory Bank persistence. By the end you will have an agent that remembers your last session — even after a process restart.
See gemini-enterprise-agent-platform-hands-on-tour · 02-hello-world-agent-tool-state-persistence to continue.
References
[1] Google Cloud Blog. "Introducing Gemini Enterprise Agent Platform." 23 April 2026. https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-enterprise-agent-platform · retrieved 2026-04-30
[2] Google Agent Development Kit. Official documentation. https://adk.dev/ · retrieved 2026-04-30
[3] Google Cloud. Vertex AI documentation. https://cloud.google.com/vertex-ai/docs · retrieved 2026-04-30
[4] Cloudflare. Cloudflare Agents documentation. https://developers.cloudflare.com/agents/ · retrieved 2026-04-30
[5] Anthropic. Building with Claude — Agents and tools. https://docs.anthropic.com/en/docs/agents-and-tools · retrieved 2026-04-30
Hello World: Agent + 1 Tool + State Persistence
ADK (Agent Development Kit), released as part of Google's Gemini Enterprise Agent Platform on 23 April 2026, is a Python library that lets builders run a working agent with tool use and state persistence in under 10 lines of configuration code. By the end of this chapter you will have a local ADK agent that tracks expenses, remembers them across sessions, and summarises your spending history — without managing any database yourself. It is deliberately simple. The goal is to see exactly which lines of code map to which platform concepts before you layer in complexity.
Key facts
- ADK installs as a single pip package: `google-adk`
- Tools are plain Python functions — no decorator magic required in most patterns
- Session state is scoped to a conversation; Memory Bank is scoped to a user across all conversations
- Local development uses `InMemorySessionService`; production uses `VertexAiSessionService` with a config swap
- Agent Runtime (cloud deployment) requires no code changes — only a `deployment.yaml`
- The ADK web UI (`adk web`) lets you test agents interactively in a browser without writing a test harness
- Cold starts on Agent Runtime are sub-second for pre-warmed instances [1]
Prerequisites
Before continuing, confirm:
```bash
python --version                        # 3.10 or later
gcloud auth application-default login   # required for Vertex AI calls
gcloud config set project YOUR_PROJECT_ID
```
You also need a Gemini API key or a GCP project with Vertex AI enabled. The examples below use gemini-flash-latest because it is the fastest and cheapest Gemini model for development — swap to gemini-pro-latest for production reasoning tasks.
Step 1: Install ADK
```bash
pip install google-adk
```
That is the entire install. ADK is a pure-Python library with no system dependencies. Verify:
```bash
python -c "import google.adk; print(google.adk.__version__)"
```
You should see a version string beginning with 1. (the current release is in the 1.3x series). If you see an import error, check that your Python environment matches the python binary you ran above.
Step 2: Define your first tool
In ADK, a tool is a Python function. The function's docstring is the tool description — the model reads it to decide when to call the function. Type annotations are the parameter schema.
Create budget_tracker/tools.py:
```python
# budget_tracker/tools.py
from datetime import date
from typing import Optional

# Module-level store. This is deliberately naive — Step 5 replaces it
# with session state so expenses survive a process restart.
_expenses: list[dict] = []


def log_expense(amount: float, category: str, note: Optional[str] = None) -> str:
    """Record a new expense.

    Use this tool when the user says they spent money on something.

    Args:
        amount: The amount spent, in USD.
        category: Expense category, e.g. 'food', 'transport', 'software'.
        note: Optional description of what was purchased.

    Returns:
        Confirmation string with the logged entry.
    """
    entry = {
        "date": date.today().isoformat(),
        "amount": amount,
        "category": category,
        "note": note or "",
    }
    _expenses.append(entry)
    return f"Logged: ${amount:.2f} on {category} ({note or 'no note'})"


def get_expense_summary() -> str:
    """Return a summary of all logged expenses grouped by category.

    Use this tool when the user asks how much they have spent or wants a summary.

    Returns:
        A formatted summary of total spending per category.
    """
    if not _expenses:
        return "No expenses logged yet."
    totals: dict[str, float] = {}
    for exp in _expenses:
        totals[exp["category"]] = totals.get(exp["category"], 0.0) + exp["amount"]
    lines = [f" {cat}: ${total:.2f}" for cat, total in sorted(totals.items())]
    grand_total = sum(totals.values())
    lines.append(f" Total: ${grand_total:.2f}")
    return "Expense summary:\n" + "\n".join(lines)
```
Three things to notice:
- No decorator. There is no `@tool` magic. ADK infers the tool schema from the function signature and docstring at runtime.
- Docstring quality matters. The model reads the docstring — not the function name — to decide when to call this tool. "Use this tool when..." is the trigger phrase that shapes model behaviour.
- Return strings. Tools return strings (or JSON-serialisable values) that the model reads as tool output. Return structured data as JSON strings for complex results.
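The last point in practice: when a tool has a structured result, serialise it rather than inventing an ad-hoc text format. A minimal sketch — `get_expense_rows` is a hypothetical tool, not part of the budget tracker above:

```python
import json
from datetime import date


def get_expense_rows() -> str:
    """Return logged expenses as a JSON string.

    Use this tool when the caller needs machine-readable expense data.
    """
    rows = [
        {"date": date.today().isoformat(), "amount": 12.5, "category": "food"},
    ]
    # The model receives tool output as text, so complex results travel
    # best as a JSON string it can quote or re-parse reliably.
    return json.dumps(rows)


parsed = json.loads(get_expense_rows())
print(parsed[0]["category"])  # food
```

The model sees the JSON text; downstream code (or a later tool) can parse it back without guessing at a formatting convention.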
Step 3: Wire the agent
Create budget_tracker/agent.py:
```python
# budget_tracker/agent.py
from google.adk import Agent

from budget_tracker.tools import log_expense, get_expense_summary

budget_agent = Agent(
    name="budget_tracker",
    model="gemini-flash-latest",
    description="A personal budget tracker that logs and summarises expenses.",
    instruction="""You are a friendly budget tracker.

When the user mentions spending money, call log_expense with the amount,
category, and any note they provide. Always confirm what you logged.

When the user asks about their spending, call get_expense_summary and
present the results clearly.

Keep responses short. Do not invent expenses the user did not mention.""",
    tools=[log_expense, get_expense_summary],
)
```
The instruction field is the system prompt. It does four things here:
- Sets the persona
- Gives explicit rules for when to call each tool
- Tells the model not to hallucinate data
- Keeps output concise
<Callout type="warning"> Instruction quality is your most important variable. A poorly written instruction produces an agent that calls the wrong tool, invents data, or returns walls of text. Treat the instruction like production code: version it, test it, refine it when you see failures. </Callout>
Step 4: Run locally with the ADK web UI
ADK ships with a built-in development server that gives you a browser-based chat interface:
```bash
adk web budget_tracker/
```
Open http://localhost:8000. You should see a chat interface with your budget_tracker agent. Try:
- "I spent $12.50 on lunch"
- "I paid $45 for a software subscription"
- "How much have I spent?"
A sample run, sending one combined message:

```
User:  I spent $12.50 on lunch and $4 on coffee this morning.
       How much have I spent on food today?

[tool_call: log_expense] {"amount": 12.50, "category": "food", "note": "lunch"}
→ "Logged: $12.50 on food (lunch)"

[tool_call: log_expense] {"amount": 4.00, "category": "food", "note": "coffee"}
→ "Logged: $4.00 on food (coffee)"

[tool_call: get_expense_summary]
→ "Expense summary:\n food: $16.50\n Total: $16.50"

Agent: You've spent $16.50 on food so far today — $12.50 on lunch
       and $4.00 on coffee.
```
The agent correctly identifies two separate expenses from one message, calls log_expense twice, then calls get_expense_summary to answer the question. This multi-step tool use happens automatically — you did not write any routing logic.
<KnowledgeCheck
questions={[
{
question: "Where does ADK read the tool description that the model uses to decide when to call a function?",
answers: [
"A separate JSON schema file you provide",
"The function's docstring",
"A description parameter in the Agent constructor",
"A metadata decorator applied to the function"
],
correct: 1,
explanation: "ADK infers the tool description from the function's docstring. The quality of your docstring directly affects when and how accurately the model decides to invoke the tool."
}
]}
/>
Step 5: Add session state
Right now, expenses vanish when the process restarts. The _expenses list is in memory. Real agents need state that survives restarts. GEAP offers two layers: Session state (within a conversation) and Memory Bank (across all conversations for a user).
Let's start with Session state. Modify agent.py:
```python
# budget_tracker/agent.py (session-aware version)
from google.adk import Agent
from google.adk.sessions import InMemorySessionService, Session

from budget_tracker.tools import log_expense, get_expense_summary

session_service = InMemorySessionService()

budget_agent = Agent(
    name="budget_tracker",
    model="gemini-flash-latest",
    description="A personal budget tracker that logs and summarises expenses.",
    instruction="""...""",  # same as before
    tools=[log_expense, get_expense_summary],
    session_service=session_service,
)
```
Now update your tools to read and write session state instead of the module-level list:
```python
# budget_tracker/tools.py (session-aware version)
from datetime import date
from typing import Optional

from google.adk.sessions import Session


def log_expense(
    amount: float,
    category: str,
    session: Session,
    note: Optional[str] = None,
) -> str:
    """Record a new expense in the current session.

    Use this tool when the user says they spent money on something.

    Args:
        amount: The amount spent, in USD.
        category: Expense category, e.g. 'food', 'transport', 'software'.
        session: The current session (injected automatically by ADK).
        note: Optional description of what was purchased.

    Returns:
        Confirmation string with the logged entry.
    """
    expenses = session.state.get("expenses", [])
    entry = {
        "date": date.today().isoformat(),
        "amount": amount,
        "category": category,
        "note": note or "",
    }
    expenses.append(entry)
    session.state["expenses"] = expenses
    return f"Logged: ${amount:.2f} on {category} ({note or 'no note'})"


def get_expense_summary(session: Session) -> str:
    """Return a summary of all logged expenses grouped by category.

    Use this tool when the user asks how much they have spent.

    Args:
        session: The current session (injected automatically by ADK).

    Returns:
        A formatted summary of total spending per category.
    """
    expenses = session.state.get("expenses", [])
    if not expenses:
        return "No expenses logged yet."
    totals: dict[str, float] = {}
    for exp in expenses:
        totals[exp["category"]] = totals.get(exp["category"], 0.0) + exp["amount"]
    lines = [f" {cat}: ${total:.2f}" for cat, total in sorted(totals.items())]
    grand_total = sum(totals.values())
    lines.append(f" Total: ${grand_total:.2f}")
    return "Expense summary:\n" + "\n".join(lines)
```
Key insight: ADK injects session automatically when a tool function declares a Session parameter. You do not pass it yourself — the framework sees the type annotation and injects the current session. This is ADK's dependency injection pattern.
session.state is a dictionary that ADK persists through the conversation. If you restart the process but resume the same session ID, session.state is restored.
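The injection pattern itself is easy to model in plain Python. The sketch below is not ADK internals — `Session` here is a stand-in class and `call_tool` a hypothetical dispatcher — but it shows how annotation-based injection works: inspect the signature, and supply the session wherever a parameter is annotated with `Session`:

```python
import inspect
from typing import Any


class Session:
    """Stand-in for ADK's Session: exposes a mutable state dict."""

    def __init__(self) -> None:
        self.state: dict[str, Any] = {}


def call_tool(func, session: Session, /, **kwargs):
    """Inject `session` wherever the tool's signature is annotated Session."""
    for name, param in inspect.signature(func).parameters.items():
        if param.annotation is Session:
            kwargs[name] = session
    return func(**kwargs)


def log_expense(amount: float, session: Session) -> str:
    expenses = session.state.get("expenses", [])
    expenses.append(amount)
    session.state["expenses"] = expenses  # persisted by the framework
    return f"Logged: ${amount:.2f}"


s = Session()
call_tool(log_expense, s, amount=12.5)
call_tool(log_expense, s, amount=4.0)
assert s.state["expenses"] == [12.5, 4.0]
```

The tool author never touches wiring code: declaring the annotated parameter is the whole contract.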
Step 6: Understanding Session vs Memory Bank
The distinction between these two concepts is the most important architectural choice in this chapter:
| | Session state | Memory Bank |
|---|---|---|
| Scope | One conversation | All conversations for a user |
| Duration | Until session expires (configurable) | Long-term (days to indefinite) |
| Content | Raw conversation + structured state dict | Distilled "Memory Profiles" |
| Latency | Sub-millisecond (local dict) | Low-latency retrieval (indexed) |
| Who creates it | You (via session.state writes) | The platform (via model distillation) |
| Who reads it | Your tools, explicitly | The agent's instruction context, automatically |
Session state is for information that matters during the current conversation: a shopping cart, an in-progress form, the user's current task context. You write to it explicitly.
Memory Bank is for information that should survive across conversations: user preferences, past decisions, relationship context. The platform creates Memory Profiles automatically by running a model over completed sessions and distilling relevant facts. You enable it; the platform manages it.
For the budget tracker, the right model is:

- Session state: the list of expenses logged so far in this conversation
- Memory Bank profile: "This user tends to overspend on food; last month they spent $320 on dining"
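To see what distillation means concretely, here is a deterministic stand-in for the kind of profile Memory Bank's model-driven distillation might produce. The function and the profile keys are illustrative, not GEAP's actual schema:

```python
def distil_profile(expenses: list[dict]) -> dict:
    """Reduce raw session expenses into a compact cross-session profile.

    Memory Bank does this with a model over the whole transcript; this
    deterministic version only illustrates the raw-data -> profile shape.
    """
    totals: dict[str, float] = {}
    for exp in expenses:
        totals[exp["category"]] = totals.get(exp["category"], 0.0) + exp["amount"]
    top = max(totals, key=totals.get) if totals else None
    return {
        "top_category": top,                  # e.g. where the user overspends
        "monthly_total": sum(totals.values()),
    }


session_expenses = [
    {"category": "food", "amount": 250.0},
    {"category": "food", "amount": 70.0},
    {"category": "transport", "amount": 40.0},
]
profile = distil_profile(session_expenses)
assert profile == {"top_category": "food", "monthly_total": 360.0}
```

The profile is small enough to load into the next session's context at low latency, unlike the transcript it was distilled from.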
Step 7: Switching to production sessions
When you are ready to deploy, swap InMemorySessionService for VertexAiSessionService:
```python
from google.adk.sessions import VertexAiSessionService

session_service = VertexAiSessionService(
    project="your-gcp-project",
    location="us-central1",
    agent_engine_id="your-agent-engine-id",  # from Agent Runtime
)
```
Everything else stays the same. Your tool code, your agent instruction, your tool definitions — unchanged. The Session object your tools receive has the same API. This is the portability promise of ADK: develop locally with in-memory services, deploy to Vertex with a one-line swap. For a broader introduction to the Vertex AI infrastructure GEAP builds on, see vertex-ai-fundamentals.
Resuming the same session in production, the agent still sees the earlier expenses:

```
User:  I spent $85 on groceries yesterday. What's my total food spend this month?

[tool_call: log_expense] {"amount": 85.00, "category": "food", "note": "groceries"}
→ "Logged: $85.00 on food (groceries)"

[tool_call: get_expense_summary]
→ "Expense summary:\n food: $101.50\n Total: $101.50"

Agent: I've logged your $85.00 grocery run. Your total food spend this month
       is $101.50 — that's the $12.50 lunch, $4.00 coffee, and today's
       $85.00 groceries.
```
<KnowledgeCheck questions={[ { question: "Your budget tracker agent needs to remember a user's preferred currency (USD, EUR, GBP) across all future sessions. Which storage layer should you use?", answers: [ "session.state, because it persists within a conversation", "Memory Bank, because it preserves information across all sessions for a user", "A module-level Python variable, because it is fastest", "Agent Registry, because preferences are a form of tool configuration" ], correct: 1, explanation: "Preferred currency is a user preference that should persist indefinitely across conversations. Memory Bank is designed for exactly this: cross-session, long-lived context. session.state would reset at the end of each conversation." }, { question: "How does ADK inject the Session object into a tool function?", answers: [ "You manually pass it when calling the tool", "The agent reads a global session variable", "ADK sees the Session type annotation in the function signature and injects it automatically", "You register the session with a decorator before the function definition" ], correct: 2, explanation: "ADK uses type annotation-based dependency injection. If your function declares a parameter typed as Session, ADK automatically injects the current session when calling the tool. No manual wiring required." }, { question: "What is the only change required to move from local InMemorySessionService to production VertexAiSessionService?", answers: [ "Rewrite all tool functions to use a different Session API", "Replace the session_service constructor — all other code stays unchanged", "Add a @production_tool decorator to each tool", "Change the agent model from gemini-flash-latest to gemini-pro-latest" ], correct: 1, explanation: "ADK is designed for environment parity. The Session API is identical between InMemorySessionService and VertexAiSessionService — swap the constructor, everything else works." } ]} />
Hands-on exercise: Build the budget tracker
Goal: A working ADK agent with session state and (simulated) long-term memory.
Steps:
1. Create the project structure: budget_tracker/__init__.py, budget_tracker/tools.py, budget_tracker/agent.py
2. Implement log_expense and get_expense_summary with Session injection as shown above
3. Run adk web budget_tracker/ and test three messages: log two expenses, then ask for a summary
4. Stop the process, restart it, and resume the same session ID via the web UI. Confirm your expenses are still there.
5. Extension: Add a third tool clear_expenses(session: Session) -> str that deletes all logged expenses. Test that calling it and restarting the session returns "No expenses logged yet."
Success criteria:

- Agent correctly logs expenses from natural language input (not JSON)
- Session summary matches what you logged
- Expenses survive a process restart when using the same session ID
What's next
You now have a single-agent system with state. The next step is coordination: what happens when one agent is not enough? Chapter 3 introduces multi-agent orchestration — a supervisor agent that routes work to specialist sub-agents — and shows how Agent Registry makes those sub-agents discoverable.
See gemini-enterprise-agent-platform-hands-on-tour · 03-multi-agent-orchestration-with-vertex to continue.
References
[1] Google Cloud Blog. "Introducing Gemini Enterprise Agent Platform." 23 April 2026. — https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-enterprise-agent-platform · retrieved 2026-04-30
[2] Google Agent Development Kit. Official documentation and quickstart. — https://adk.dev/ · retrieved 2026-04-30
[3] Google Cloud. Vertex AI Agent Builder — ADK overview. — https://cloud.google.com/vertex-ai/docs/generative-ai/agent-builder/agent-development-kit/overview · retrieved 2026-04-30
[4] Google Cloud. Agent Sessions documentation. — https://cloud.google.com/vertex-ai/docs/generative-ai/agent-builder/sessions · retrieved 2026-04-30
[5] Google Cloud. Memory Bank guide. — https://cloud.google.com/vertex-ai/docs/generative-ai/agent-builder/memory · retrieved 2026-04-30
Multi-Agent Orchestration with Vertex
GEAP's agent-to-agent orchestration system, available since the platform reached general availability on 23 April 2026, lets a single coordinator delegate work to two or more specialist sub-agents — turning a fragile 20-tool monolith into a testable, independently deployable network. Production breaks the single-agent model fast: a customer-support agent covering account management, billing, and technical support accumulates enough tools and instruction length to produce correlated hallucinations. The correct answer is decomposition — split the monolith into specialist agents and give them a coordinator.
This chapter builds that coordinator. By the end you will have a two-agent research pipeline: a Planner that decomposes a question into sub-questions, and a Retriever that answers each one. You will wire them through Agent Registry so the Planner discovers the Retriever by name rather than by hardcoded import, and you will use Agent Observability to walk through a trace when the handoff breaks.
Key facts
- GEAP supports two orchestration patterns: deterministic (you define routing logic in code) and generative (the orchestrator model decides routing at runtime)
- Sub-agents are ADK Agent instances — same class, different instruction and tools
- `transfer_to_agent(agent_name)` is the built-in ADK mechanism for generative handoff; the orchestrator calls it as a tool
- Agent Registry is a GCP-managed catalogue; agents discover sub-agents by name via the Registry API, not by Python import
- Agent Anomaly Detection flags unusual reasoning patterns — including infinite handoff loops — without you writing watchdog code [1]
- ADK's `SequentialAgent` and `ParallelAgent` are the code primitives for deterministic orchestration
- Observability traces are available in the GCP console under GEAP > Observability within seconds of a completed invocation
Two orchestration patterns, one choice to make
Before writing code, you need to decide which pattern fits your use case. The choice has downstream consequences for debugging, cost, and reliability.
Deterministic orchestration
You write the routing logic. Sub-agent A always runs first, then sub-agent B gets A's output. Or: A and B run in parallel; their outputs are merged by a deterministic merge function.
ADK provides SequentialAgent and ParallelAgent for this:
```python
from google.adk.agents import SequentialAgent, ParallelAgent
```
When to use: When the routing logic is stable and you want predictable costs (you know exactly which agents run). Good for ETL-style pipelines, data enrichment, and report generation.
Tradeoff: Brittle under changing inputs. If the Planner sometimes determines that no sub-questions are needed, a rigid sequential pipeline still invokes the Retriever anyway.
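The control flow these two primitives encode can be understood independently of ADK. Below is a minimal plain-Python sketch, where the agent stand-ins and function names are illustrative, not ADK API:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Stand-in for an agent invocation; a real ADK agent call goes here.
AgentFn = Callable[[str], str]

def sequential(agents: list[AgentFn], user_input: str) -> str:
    """Each agent receives the previous agent's output (SequentialAgent-style)."""
    result = user_input
    for agent in agents:
        result = agent(result)
    return result

def parallel(agents: list[AgentFn], user_input: str) -> list[str]:
    """All agents receive the same input; outputs are collected for a merge step."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda a: a(user_input), agents))

# Demo "agents": pure functions standing in for model calls.
upper = lambda text: text.upper()
exclaim = lambda text: text + "!"

print(sequential([upper, exclaim], "hello"))  # HELLO!
print(parallel([upper, exclaim], "hello"))    # ['HELLO', 'hello!']
```

The point of the sketch: in the sequential case the routing is fixed in code, so cost and behaviour are fully predictable; in the parallel case the outputs still need a deterministic merge step that you write.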
Generative orchestration
The orchestrator is an Agent with a strong instruction and a special transfer_to_agent tool. The model reads the user's request and decides at runtime which sub-agent to invoke, whether to invoke multiple, and in what order.
When to use: When routing decisions depend on the content of user input in ways you cannot enumerate. Good for customer support triage, intent-based routing, and dynamic workflows.
Tradeoff: Non-deterministic costs (you do not know how many sub-agent calls occur), harder to test exhaustively, and more susceptible to jailbreak if the orchestrator instruction is weak.
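To make the contrast concrete, here is a conceptual sketch of generative routing: a "model decision" function picks the sub-agent at runtime and the runtime dispatches to it. The `decide` function and the routing table are illustrative stand-ins, not ADK API:

```python
# Conceptual sketch of generative routing. decide() stands in for the
# orchestrator model choosing a sub-agent at runtime based on content.

def decide(request: str) -> str:
    """Stand-in for the orchestrator model's runtime routing decision."""
    if "refund" in request or "invoice" in request:
        return "billing"
    if "password" in request or "login" in request:
        return "tech_support"
    return "account"

sub_agents = {
    "billing": lambda r: f"[billing] handling: {r}",
    "tech_support": lambda r: f"[tech] handling: {r}",
    "account": lambda r: f"[account] handling: {r}",
}

def orchestrate(request: str) -> str:
    target = decide(request)            # the model decides routing at runtime
    return sub_agents[target](request)  # the runtime performs the handoff

print(orchestrate("I can't login"))  # [tech] handling: I can't login
```

Because `decide` depends on request content, you cannot enumerate the call graph up front — which is exactly why generative orchestration has non-deterministic costs and is harder to test exhaustively.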
For this chapter we build a generative orchestration pipeline, because it showcases more GEAP-specific features. The Chapter 3 hands-on exercise includes a note on when to prefer the deterministic version.
Building the sub-agent: Retriever
Create research_pipeline/retriever.py:
```python
from google.adk import Agent


def search_knowledge_base(query: str) -> str:
    """Search the internal knowledge base for information relevant to a query.

    Use this tool when you have a specific factual question to answer.

    Args:
        query: The specific question to answer.

    Returns:
        A string containing the most relevant information found, or a
        'no results' message if nothing was found.
    """
    # In production, this calls a vector database, RAG pipeline, or search API.
    # For demo purposes, we return canned responses.
    knowledge = {
        "gemini enterprise agent platform ga date": "GEAP reached general availability on 23 April 2026.",
        "geap memory bank purpose": "Memory Bank stores long-term cross-session context as distilled Memory Profiles, enabling agents to recall user preferences and history across conversations.",
        "adk install command": "Install the Agent Development Kit with: pip install google-adk",
        "agent registry purpose": "Agent Registry is a centralized catalogue of approved tools, agents, and capabilities. Agents discover sub-agents by name via Registry rather than hardcoded imports.",
    }
    # Simple keyword match for demo; real implementation uses semantic search.
    query_lower = query.lower()
    for key, value in knowledge.items():
        if any(word in query_lower for word in key.split()):
            return value
    return f"No results found for: {query}"


retriever_agent = Agent(
    name="retriever",
    model="gemini-flash-latest",
    description="A specialist agent that answers specific factual questions by searching the knowledge base.",
    instruction="""You are a precise factual retriever.

When given a question, call search_knowledge_base with the question text.
Return only what you found — do not add interpretation or speculation.
If the search returns no results, say so clearly.""",
    tools=[search_knowledge_base],
)
```
Building the orchestrator: Planner
The Planner does two things: it decomposes a complex question into sub-questions, and it hands each sub-question to the Retriever using transfer_to_agent.
Create research_pipeline/planner.py:
```python
from google.adk import Agent
from google.adk.tools import transfer_to_agent

planner_agent = Agent(
    name="planner",
    model="gemini-pro-latest",  # use a stronger model for orchestration reasoning
    description="An orchestrator that decomposes research questions and coordinates specialist agents.",
    instruction="""You are a research coordinator. Your job:

- DECOMPOSE: When given a complex question, break it into 2-4 specific sub-questions.
- DELEGATE: For each sub-question, transfer to the 'retriever' agent to get the answer.
- SYNTHESISE: After all sub-questions are answered, compile a clear, complete response.

Rules:
- Always decompose before delegating. Never answer factual questions yourself.
- Transfer one sub-question at a time; wait for the result before the next transfer.
- If the original question is already specific enough (one fact to look up), skip decomposition and delegate directly.
- Your final response must cite which sub-questions were answered.""",
    tools=[transfer_to_agent],
    # Register sub-agents that this orchestrator can delegate to.
    # ADK resolves these by name at runtime; they do not need to be imported.
    sub_agents=["retriever"],
)
```
transfer_to_agent is a built-in ADK tool. When the Planner calls it with agent_name="retriever" and a message, ADK routes the message to the Retriever agent and returns the Retriever's response to the Planner. The Planner then continues its reasoning with that answer in context.
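The key idea, name-based resolution instead of imports, is worth seeing in isolation. Here is a toy registry that mirrors the concept (the decorator, registry dict, and `transfer` function are illustrative, not the actual ADK internals):

```python
# Toy illustration of name-based handoff: the caller knows only the string
# "retriever"; a registry maps it to a callable. This mirrors the idea behind
# transfer_to_agent resolution, not ADK's real implementation.
from typing import Callable

registry: dict[str, Callable[[str], str]] = {}

def register(name: str):
    """Decorator that makes an agent discoverable by name."""
    def wrap(fn):
        registry[name] = fn
        return fn
    return wrap

@register("retriever")
def retriever(message: str) -> str:
    return f"answer to: {message}"

def transfer(agent_name: str, message: str) -> str:
    if agent_name not in registry:
        raise LookupError(f"No agent registered under '{agent_name}'")
    return registry[agent_name](message)

print(transfer("retriever", "When did GEAP launch?"))
# answer to: When did GEAP launch?
```

Because the Planner only holds the string `"retriever"`, the Retriever's implementation can be swapped or redeployed without touching the Planner's code — the same decoupling Agent Registry provides in production.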
Step 3: Wire the pipeline
Create research_pipeline/agent.py:
```python
from research_pipeline.planner import planner_agent
from research_pipeline.retriever import retriever_agent
from google.adk.sessions import InMemorySessionService

session_service = InMemorySessionService()
```
Run it:
```bash
adk web research_pipeline/
```
Tell me about Gemini Enterprise Agent Platform: when it launched, how to install it, and what Memory Bank does.
Sub-question 1: When did GEAP launch? [transfer_to_agent: retriever] → "GEAP reached general availability on 23 April 2026."
Sub-question 2: How do I install the ADK? [transfer_to_agent: retriever] → "Install the Agent Development Kit with: pip install google-adk"
Sub-question 3: What does Memory Bank do? [transfer_to_agent: retriever] → "Memory Bank stores long-term cross-session context as distilled Memory Profiles, enabling agents to recall user preferences and history across conversations."
Here's what I found:
Gemini Enterprise Agent Platform launched on 23 April 2026 (GA).
To get started, install the ADK: `pip install google-adk`
Memory Bank provides long-term cross-session memory — instead of rebuilding context from raw conversation history, it stores distilled "Memory Profiles" so agents can recall what they need at low latency.
<KnowledgeCheck questions={[ { question: "The Planner agent calls transfer_to_agent('retriever', question). Where does ADK look to find the retriever agent at runtime?", answers: [ "It imports the retriever module directly from the Python path", "It queries Agent Registry, resolving 'retriever' to a registered agent definition", "It looks for a class named RetrieverAgent in the same file", "It sends an HTTP request to a hardcoded localhost endpoint" ], correct: 1, explanation: "In production on Vertex, ADK resolves agent names through Agent Registry — a centralized catalogue of approved agents. Locally, ADK uses the sub_agents declaration to resolve names within the same session." }, { question: "You are building a data enrichment pipeline where step B always runs after step A, regardless of what A returns. Which orchestration pattern is more appropriate?", answers: [ "Generative — use transfer_to_agent and let the model decide", "Deterministic — use SequentialAgent to define the fixed routing", "Either — they produce identical results for this case", "Neither — GEAP does not support data pipelines" ], correct: 1, explanation: "When routing is fixed and predictable, deterministic orchestration (SequentialAgent) is the right choice. It gives predictable costs, easier testing, and no risk of the model skipping or reordering steps." } ]} />
Step 4: Register agents in Agent Registry
In local development, agent resolution is handled in-process. In production on Vertex, you register agents in Agent Registry so the platform manages discovery, versioning, and access control.
Register the retriever via ADK CLI (requires a deployed Agent Runtime):
```bash
adk agents register retriever \
  --engine-id=YOUR_ENGINE_ID \
  --project=YOUR_PROJECT \
  --location=us-central1 \
  --description="Answers factual questions via knowledge base search"
```
After registration, any other agent in the same project can call transfer_to_agent("retriever", ...) and ADK resolves it through Registry — no hardcoded endpoints, no shared Python modules. This is the key governance benefit: the Registry owner controls which agents are discoverable and which are retired.
<Callout type="warning"> Registry is not import control. Agent Registry controls discovery, not execution security. A rogue agent that knows a sub-agent's name directly can still call it if it has the right IAM permissions. For true isolation, combine Registry with Agent Gateway policies that restrict which caller identities can invoke which agents. </Callout>
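The discovery-versus-enforcement split in the callout can be sketched in a few lines. All names below are illustrative; in production, discovery is Agent Registry and enforcement is IAM plus Agent Gateway policy:

```python
# Sketch of discovery (what exists) vs enforcement (who may invoke it).
# REGISTRY and GATEWAY_POLICY are illustrative stand-ins for the real services.

REGISTRY = {"retriever": "projects/demo/agents/retriever"}  # discovery
GATEWAY_POLICY = {                                          # enforcement
    "retriever": {"planner"},  # only the planner identity may invoke it
}

def invoke(caller_identity: str, agent_name: str, message: str) -> str:
    if agent_name not in REGISTRY:
        raise LookupError(f"'{agent_name}' is not discoverable")
    if caller_identity not in GATEWAY_POLICY.get(agent_name, set()):
        raise PermissionError(f"'{caller_identity}' may not invoke '{agent_name}'")
    return f"{REGISTRY[agent_name]} <- {message}"

print(invoke("planner", "retriever", "hi"))  # allowed: discoverable AND permitted
# invoke("rogue", "retriever", "hi") would raise PermissionError:
# knowing the name is not enough when a gateway policy is enforced.
```

Note that removing the `GATEWAY_POLICY` check reproduces the callout's warning exactly: any caller that knows the name can invoke the agent.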
Step 5: Reading an Observability trace
When the Planner hands off to the Retriever and the Retriever returns the wrong answer, how do you debug it? The Agent Observability console shows the full execution trace.
A trace for a multi-agent call looks like this:
```
Trace: user-request-7f3a
├─ [0.000s] planner: received user message
│    input: "Tell me about GEAP..."
├─ [0.312s] planner: model reasoning
│    thinking: "Decompose into 3 sub-questions..."
├─ [0.891s] planner: tool_call transfer_to_agent
│    args: {agent_name: "retriever", message: "When did GEAP launch?"}
│    ├─ [0.892s] retriever: received delegation
│    ├─ [0.904s] retriever: tool_call search_knowledge_base
│    │    args: {query: "gemini enterprise agent platform ga date"}
│    │    result: "GEAP reached general availability on 23 April 2026."
│    └─ [0.967s] retriever: returned result
├─ [1.201s] planner: received sub-answer
│    content: "GEAP reached GA on 23 April 2026."
├─ [1.203s] planner: tool_call transfer_to_agent (sub-question 2)
│    ...
└─ [2.891s] planner: final response assembled
```
Each node in the trace is clickable in the GCP console — you can inspect the exact input and output of every model call and every tool call. When a handoff fails (the Retriever returns "no results" unexpectedly), you click the search_knowledge_base node and see exactly what query string it received.
Common failure patterns in traces:
1. Query transformation: The Planner rephrases the sub-question before handing it to the Retriever, and the rephrased query does not match your knowledge base. Fix: tighten the Planner instruction to pass questions verbatim.
2. Infinite delegation: The Planner transfers to the Retriever; the Retriever, because its instruction is too loose, transfers back to the Planner. Agent Anomaly Detection flags this loop within 2-3 hops.
3. Silent tool failure: A tool returns an empty string instead of raising an exception. The model treats the empty string as a valid (if useless) result and continues. Always return explicit "no results" messages.
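The third pattern can be mitigated mechanically rather than by hoping every tool author remembers. A small wrapper, sketched below with illustrative names, converts empty results into an explicit message the model can reason about:

```python
# Guarding against silent tool failure (pattern 3): wrap tools so that empty
# or None results become an explicit message instead of an empty string.
# The decorator name is illustrative, not part of ADK.
from functools import wraps

def explicit_empty(tool):
    @wraps(tool)
    def wrapped(*args, **kwargs):
        result = tool(*args, **kwargs)
        if not result:  # empty string, None, empty list...
            return f"No results found (tool: {tool.__name__}, args: {args})"
        return result
    return wrapped

@explicit_empty
def search(query: str) -> str:
    return ""  # simulates a backend that fails silently

print(search("geap launch date"))
# No results found (tool: search, args: ('geap launch date',))
```

In a trace, the explicit message also makes the failure visible at the tool-call node instead of surfacing later as a confused model response.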
What is the Agent Registry and why does it matter for production multi-agent systems?
→ "Agent Registry is a centralized catalogue of approved tools, agents, and capabilities. Agents discover sub-agents by name via Registry rather than hardcoded imports."
Agent Registry is GEAP's centralized catalogue of agents and tools. In practice, it matters because it separates discovery from implementation: your Planner agent doesn't import the Retriever — it asks Registry for an agent named 'retriever'.
This means:
- Versioning: Update the Retriever without touching the Planner's code
- Governance: The Registry owner controls which agents are visible to which callers
- Auditability: Registry logs show exactly which agents called which sub-agents
<KnowledgeCheck questions={[ { question: "An Agent Observability trace shows the Planner calling transfer_to_agent three times in a row before the Retriever responds to the first call. What is the most likely cause?", answers: [ "The Planner instruction is missing the 'wait for result' constraint, causing it to fire multiple delegations simultaneously", "The Retriever is rate-limited, causing the Planner to retry", "Agent Registry is down", "The gemini-pro-latest model does not support multi-step tool use" ], correct: 0, explanation: "Without an explicit instruction to wait for each result before the next transfer, a generative orchestrator can issue multiple transfer_to_agent calls before processing any results. Fix: add 'wait for the result before the next transfer' to the Planner's instruction." }, { question: "Why is Agent Gateway recommended alongside Agent Registry in production multi-agent systems?", answers: [ "Gateway improves model response quality for sub-agent calls", "Registry controls discovery but not execution security; Gateway enforces IAM-backed access policies", "Gateway is required for Memory Bank to function", "Gateway reduces cold-start latency for sub-agent invocations" ], correct: 1, explanation: "Agent Registry controls which agents are discoverable by name. Agent Gateway enforces which callers are actually allowed to invoke them. For production security, you need both: Registry for governance and Gateway for enforcement." } ]} />
Hands-on exercise: Build the research pipeline
Goal: A two-agent system where the Planner decomposes questions and the Retriever answers them.
Steps:
1. Create the directory structure: research_pipeline/__init__.py, research_pipeline/retriever.py, research_pipeline/planner.py, research_pipeline/agent.py
2. Implement the Retriever with search_knowledge_base as shown. Add at least 3 additional knowledge base entries on a topic of your choice.
3. Implement the Planner with transfer_to_agent and the sub_agents=["retriever"] declaration.
4. Run adk web research_pipeline/ and ask a question that requires at least 2 sub-questions to answer fully.
5. In the ADK web UI, click on the trace view and identify the exact point where the Planner transferred to the Retriever.
6. Extension: Add a third agent — a Formatter that takes the Planner's synthesis and formats it as a structured markdown report. Wire it as a deterministic last step using SequentialAgent.
Success criteria:
- Planner correctly decomposes a multi-part question (visible in the trace)
- Retriever is called once per sub-question (not once per user message)
- Synthesis addresses all sub-questions without hallucinating new facts
- Trace in the UI shows the delegation chain clearly
What's next
You have now built a two-agent system on GEAP. Before going deeper into the platform, it is worth asking: is GEAP the right platform for your use case? Chapter 4 puts GEAP in an honest comparison with Claude Agent SDK and Cloudflare Agents — covering state management, deployment topology, lock-in, and the workloads each platform wins. For reference on the Memory Bank and session state primitives powering these agents, or for Vertex AI fundamentals, see the linked resources.
See gemini-enterprise-agent-platform-hands-on-tour · 04-comparing-to-claude-agent-sdk-and-cloudflare-agents to continue.
References
[1] Google Cloud Blog. "Introducing Gemini Enterprise Agent Platform." 23 April 2026. — https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-enterprise-agent-platform · retrieved 2026-04-30
[2] Google Agent Development Kit. Agent-to-agent orchestration guide. — https://adk.dev/ · retrieved 2026-04-30
[3] Google Cloud. Multi-agent documentation. — https://cloud.google.com/vertex-ai/docs/generative-ai/agent-builder/multi-agent · retrieved 2026-04-30
[4] Google Cloud. Agent Registry guide. — https://cloud.google.com/vertex-ai/docs/generative-ai/agent-builder/agent-registry · retrieved 2026-04-30
[5] Google Cloud. Agent Observability documentation. — https://cloud.google.com/vertex-ai/docs/generative-ai/agent-builder/observability · retrieved 2026-04-30
Comparing to Claude Agent SDK + Cloudflare Agents
Three production agent platforms — Google's Gemini Enterprise Agent Platform (GEAP), Anthropic's Claude Agent SDK, and Cloudflare Agents — reached general availability between 2024 and April 2026, each taking a divergent approach to state management, deployment topology, and vendor lock-in. This chapter is a structured comparison across five dimensions — state, deployment, model access, lock-in surface, and workload fit — so you can choose the right platform for your specific constraints without the marketing hype.
Key facts
- All three platforms support tool calling, multi-agent patterns, and long-running agents
- State management is the sharpest architectural divergence: managed SQL (GEAP), you-manage-it (Claude SDK), and Durable Objects with built-in SQLite (Cloudflare)
- Deployment topology: GCP-regional (GEAP), any infra (Claude SDK), Cloudflare global edge (Cloudflare Agents)
- Vendor lock-in surface: GEAP is highest (Memory Bank, Registry, Gateway), Claude SDK is lowest (just the Anthropic API), Cloudflare Agents is medium (Durable Objects are Cloudflare-proprietary)
- Model flexibility: GEAP (200+ models), Claude SDK (Claude models only without manual wiring), Cloudflare Agents (model-agnostic — bring your own provider)
- Cold-start: Cloudflare (sub-millisecond via edge), GEAP (sub-second with pre-warmed instances), Claude SDK (depends on your infra) [1]
Platform overview
Gemini Enterprise Agent Platform (GEAP)
GEAP is a fully managed, opinionated platform. You deploy agents to Agent Runtime, state lives in Agent Sessions and Memory Bank (GCP-managed), traffic routes through Agent Gateway, and the Govern/Optimize pillars give you compliance features out of the box. The platform assumes you are building on GCP and treats that as a feature, not a constraint. [1]
What you give up: portability. Moving a GEAP agent to AWS or on-premises means rewriting the state layer, the registry layer, and the gateway layer. The ADK (agent logic) is portable; the platform services are not.
Claude Agent SDK (Anthropic)
The Claude Agent SDK is Anthropic's code-first framework for building agents with Claude models. It is the least opinionated of the three platforms: the SDK gives you tool use, multi-agent coordination primitives, and model access — and leaves infrastructure, state management, and deployment entirely to you.
What you gain: maximum portability and model-specific quality. Claude Opus 4.7 is the strongest reasoning model in the current generation for complex multi-step tasks; if your workload requires the highest-quality reasoning and you can manage your own infrastructure, Claude SDK gives you direct access without platform overhead.
What you give up: the managed services. There is no equivalent of Memory Bank built in — you build your own long-term memory layer (typically with a vector database and a retrieval pipeline). There is no managed session service — you bring your own database. This is not a flaw; it is the design philosophy. Claude SDK is for builders who want control over every layer.
Cloudflare Agents
Cloudflare Agents is a TypeScript SDK that runs on Cloudflare Workers, with state managed by Durable Objects. Each agent instance is a Durable Object: a microserver with its own SQLite database, WebSocket support, and scheduling capabilities. The platform runs on Cloudflare's global edge network — 300+ locations, sub-millisecond cold starts. [4]
What you gain: edge latency, built-in WebSocket support for real-time interactions, and a state model that does not require an external database. Every agent has its own SQLite database that lives alongside the compute — no network round-trips for state reads.
What you give up: the GCP compliance features (no equivalent of Agent Identity, Agent Anomaly Detection, or Security Command Center integration) and the model ecosystem (you wire your own provider). Cloudflare Agents is TypeScript-only — no Python support.
State management: the deepest divergence
How each platform handles state is the most architecturally significant difference. It determines your data model, your failure modes, and your migration path.
GEAP: managed, layered, opinionated
```python
# GEAP: state is managed by the platform.
# You write to session.state; the platform persists it.
session.state["expenses"] = expenses
```
GEAP's state model has two layers: Session state (within-conversation, you write explicitly) and Memory Bank (cross-conversation, platform-distilled). The platform manages persistence, retrieval indexing, and cross-session loading. You do not write database schemas or manage connections.
Tradeoff: You cannot easily inspect or migrate raw state. Memory Profiles are generated by Gemini — if the distillation model changes, your Memory Bank contents change subtly. You trust the platform to handle this correctly.
Claude Agent SDK: bring-your-own-state
```python
# Claude SDK: you manage state yourself
import anthropic

from your_db import get_session, save_session

client = anthropic.Anthropic()
session_data = get_session(user_id)  # your database call

response = client.messages.create(
    model="claude-opus-4-7",
    system=build_system_prompt(session_data),  # you inject state
    messages=conversation_history,
    tools=your_tools,
)

save_session(user_id, updated_session_data)  # your database call
```
The Claude SDK has no built-in state management. Conversation history is a list of messages you pass. Long-term memory is whatever you load into the system prompt. This is complete control — and complete responsibility.
Tradeoff: You implement the database layer, the retrieval pipeline, the context compression (conversation history grows indefinitely otherwise), and the cross-session summarisation. This is weeks of engineering for a production-grade implementation. But you own every byte of your data, can inspect it directly, and can migrate it to any platform without data loss.
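One of the pieces you build yourself is history trimming, since the message list otherwise grows without bound. A minimal sketch follows; token counting is simplified to word counts for illustration, where a real implementation would use the provider's tokenizer:

```python
# One piece of the bring-your-own-state burden: bounding conversation history.
# Word counts stand in for real token counts here.

def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most recent messages that fit within the budget."""
    kept, total = [], 0
    for msg in reversed(messages):          # walk from newest to oldest
        cost = len(msg["content"].split())  # simplified "token" count
        if total + cost > max_tokens:
            break                           # oldest messages are dropped first
        kept.append(msg)
        total += cost
    return list(reversed(kept))

history = [
    {"role": "user", "content": "first question about billing"},
    {"role": "assistant", "content": "a long detailed answer " * 10},
    {"role": "user", "content": "short follow up"},
]
print(len(trim_history(history, max_tokens=10)))  # 1 -> only the follow-up fits
```

Production versions add summarisation of the dropped prefix rather than discarding it, which is roughly what Memory Bank's distillation does for you on GEAP.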
Cloudflare Agents: state as a first-class primitive
```typescript
// Cloudflare Agents: state is built into the agent object
import { Agent, callable } from "agents";

export class BudgetAgent extends Agent<Env, { expenses: Expense[] }> {
  initialState = { expenses: [] };

  @callable()
  logExpense(amount: number, category: string): string {
    const expense = { amount, category, date: new Date().toISOString() };
    this.setState({ expenses: [...this.state.expenses, expense] });
    return `Logged: $${amount} on ${category}`;
  }

  @callable()
  getSummary(): string {
    const totals = this.state.expenses.reduce((acc, exp) => {
      acc[exp.category] = (acc[exp.category] || 0) + exp.amount;
      return acc;
    }, {} as Record<string, number>);
    return JSON.stringify(totals);
  }
}
```
Cloudflare's model is elegantly simple: this.state is a typed object backed by Durable Object storage (SQLite under the hood). The setState call is atomic and immediately consistent. There is no distinction between "session state" and "long-term state" — it is all just state on the Durable Object.
Tradeoff: Durable Objects are Cloudflare-proprietary. You cannot run this on AWS or GCP without rewriting the state layer. And this.state is per-agent-instance — if a user talks to multiple instances (different edge locations), their state is isolated. Cloudflare has addressed this with Durable Objects' location hints, but cross-region consistency is a genuine complexity.
<KnowledgeCheck questions={[ { question: "Your agent needs to remember a user's document preferences across sessions, and your team has a strict data residency requirement that all user data must remain in a specific GCP region. Which platform is the best fit?", answers: [ "Cloudflare Agents, because it has the simplest state model", "GEAP, because Memory Bank supports regional data residency within GCP", "Claude Agent SDK, because you can deploy to any GCP-hosted database", "All three are equivalent for data residency compliance" ], correct: 1, explanation: "GEAP runs on GCP and supports regional deployment. Memory Bank and Agent Sessions respect GCP regional boundaries. Cloudflare runs on its edge network (not GCP), and Claude SDK state is wherever you host your database." } ]} />
Deployment topology
Where your agent runs determines latency, cost, and operational complexity.
| | GEAP | Claude SDK | Cloudflare Agents |
|---|---|---|---|
| Where it runs | GCP regions (us-central1, europe-west4, etc.) | Wherever you deploy | Cloudflare edge (300+ PoPs globally) |
| Cold start | Sub-second (pre-warmed) | Depends on your infra | Sub-millisecond |
| Long-running | Yes — multi-day workflows via Agent Runtime | Yes — depends on your infra | Yes — Durable Objects persist indefinitely |
| WebSocket | Via bidirectional streaming API | Manual implementation | Native, built into Durable Objects |
| Scheduling | Via Agent Simulation / GCP Cloud Scheduler | Your implementation | Native Durable Object alarms |
| Multi-region | Requires explicit configuration | You configure | Automatic global distribution |
Cloudflare wins on global latency and simplicity for real-time use cases. GEAP wins on compliance features and deep GCP ecosystem integration. Claude SDK wins on portability — you can run it on any cloud, on-premises, or in a hybrid setup.
Model flexibility
| | GEAP | Claude SDK | Cloudflare Agents |
|---|---|---|---|
| Models available | 200+ (Gemini, Claude, Gemma, open models) | Claude family only (without manual wiring) | Model-agnostic (bring any API) |
| Best reasoning model | Gemini 3.1 Pro (with GEAP integration) | Claude Opus 4.7 | Depends on what you wire |
| Multi-model agents | Yes — different sub-agents can use different models | Requires OpenRouter or manual API calls | Yes — each agent call can target a different provider |
| Platform-optimised model | Gemini (tightest integration) | Claude (native) | None (bring your own) |
A nuance worth naming: GEAP lists 200+ models including Claude, but features like Agent Optimizer and Memory Bank distillation are designed assuming Gemini. You can run Claude Opus on GEAP infrastructure, but you are paying GCP prices to call Anthropic's API and losing some platform-level features in the process. If you want Anthropic's models, the Claude SDK is a more natural fit.
<Callout type="warning"> The 200-model promise has a catch. GEAP's model diversity is real for inference. But the Govern and Optimize features — Agent Anomaly Detection, Agent Optimizer, Memory Bank distillation — are designed around Gemini's capabilities and output format. If you route through Claude or an open model, test these features explicitly before relying on them in production. </Callout>
Lock-in surface area
This is the most important section for anyone building a production system. Lock-in is not binary — it is a spectrum. The question is not "can I leave?" but "how much does it cost to leave?"
GEAP lock-in surface
High. The ADK is Apache 2.0 and portable. But:
- Memory Bank: proprietary GCP service, no export API at launch
- Agent Registry: your tool and agent catalogue lives in GCP
- Agent Gateway: traffic routing, rate limiting, and Model Armor are GCP-native
- Agent Identity: cryptographic IDs are GCP-issued; audit trails are in Cloud Audit Logs
- Agent Runtime: the execution environment is GCP-managed
If you leave GCP, you take your ADK code and rebuild every platform service. This is not unprecedented — it is the same trade-off you make with AWS Lambda (portable code, locked runtime) — but you should price it in.
Mitigation: Keep your business logic in ADK tools, not in platform-specific configurations. Avoid Memory Bank for any data you expect to migrate. Use agent instructions rather than Gateway rules for routing logic where possible.
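The mitigation amounts to a layering rule: business logic as a pure function (portable), platform registration as a thin wrapper (replaceable). A sketch, with illustrative names for both layers:

```python
# Layering sketch: only the thin wrapper changes when you migrate platforms.
# Function and decorator names are illustrative, not ADK API.

# --- portable layer: no GCP or ADK imports, trivially unit-testable ---
def categorize_expense(amount: float, merchant: str) -> str:
    if merchant.lower() in {"aws", "gcp", "cloudflare"}:
        return "infrastructure"
    return "travel" if amount > 500 else "general"

# --- platform layer: the only part rewritten on migration ---
def as_platform_tool(fn):
    """Stand-in for platform-specific registration (ADK, Claude SDK, etc.)."""
    fn.is_tool = True
    return fn

tool = as_platform_tool(categorize_expense)
print(tool(1200.0, "Delta"))  # travel
```

Because `categorize_expense` never touches a platform API, the same function can be handed to ADK, the Claude SDK, or a plain HTTP service without modification.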
Claude Agent SDK lock-in surface
Low. The SDK calls the Anthropic API. Your agent logic is plain Python. Your state is in your own database. To leave:
- Point your API calls at a different provider (or use a gateway like LiteLLM to abstract the provider)
- Your tool code, conversation logic, and state management are unchanged
The only hard dependency is model compatibility — prompts tuned for Claude Opus may need adjustment for Gemini or GPT-4o. But the code itself is portable.
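That portability argument is easiest to see when the provider hides behind an interface. The sketch below uses stub providers (the class names and `complete` signature are illustrative, not the vendors' real SDKs) to show that the agent code itself never changes:

```python
# Provider-abstraction sketch: agent logic talks to an interface, so swapping
# Anthropic for another provider touches one class. The providers are stubs.
from typing import Protocol

class ChatProvider(Protocol):
    def complete(self, system: str, messages: list[dict]) -> str: ...

class StubAnthropic:
    def complete(self, system: str, messages: list[dict]) -> str:
        return f"[claude] {messages[-1]['content']}"

class StubGemini:
    def complete(self, system: str, messages: list[dict]) -> str:
        return f"[gemini] {messages[-1]['content']}"

def run_agent(provider: ChatProvider, question: str) -> str:
    # Agent logic is identical regardless of which provider is injected.
    return provider.complete("You are helpful.", [{"role": "user", "content": question}])

print(run_agent(StubAnthropic(), "hi"))  # [claude] hi
print(run_agent(StubGemini(), "hi"))    # [gemini] hi
```

Tools like LiteLLM play the role of these stubs in practice, normalising provider APIs behind one call signature.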
Cloudflare Agents lock-in surface
Medium. The TypeScript SDK and @callable() pattern are open-source. But Durable Objects are Cloudflare-proprietary:
- If you leave Cloudflare, you rewrite the state layer (Durable Objects → Postgres, Redis, or a managed database)
- Scheduling (Durable Object alarms) needs replacement
- WebSocket connection management (built into Durable Objects) needs replacement
The agent logic itself — the methods decorated with @callable() — is portable. The infrastructure contract is not.
Decision framework: which platform for which workload
Use this framework when you are choosing a platform for a new agent workload.
Choose GEAP when
- You are already on GCP and your data is in BigQuery, Cloud SQL, or GCS. The integration story is compelling — your agents read your data without egress or cross-cloud plumbing.
- You need enterprise governance. Agent Identity, Agent Anomaly Detection, and Security Command Center integration are production-ready out of the box. Building equivalent features with Claude SDK takes months.
- You are building a multi-agent system with 5+ agents. Agent Registry, Agent Gateway, and the Govern pillar were designed for exactly this scale. Wrangling 10 agents with Claude SDK and a homegrown registry is painful.
- Your workload is long-running (multi-day workflows, invoice processing pipelines, autonomous research tasks). Agent Runtime's multi-day session support is purpose-built for this.
Choose Claude Agent SDK when
- Reasoning quality is the primary constraint. For complex multi-step tasks where accuracy matters more than speed, Claude Opus 4.7 is the strongest available model. If your agent needs to reason through ambiguous legal contracts, financial statements, or complex code, Claude SDK gives you direct access.
- You need to avoid vendor lock-in. If there is any possibility you will need to move infrastructure (M&A, cloud cost negotiation, regulatory requirement), Claude SDK's portability is worth the engineering investment in a self-managed state layer.
- Your stack is heterogeneous. Running on AWS with a team that knows Postgres and Redis? Claude SDK fits without requiring GCP knowledge.
- You need model comparisons. Building an eval harness where you test Claude vs GPT vs Gemini on the same task? Claude SDK is one implementation; swapping models via LiteLLM or OpenRouter is straightforward.
Choose Cloudflare Agents when
- Latency is the primary constraint. Customer-facing chat agents where every millisecond matters, real-time WebSocket applications, or interactive voice interfaces. Cloudflare's edge delivers sub-millisecond cold starts that no regional cloud can match.
- Your app is already on Cloudflare. If you are using Cloudflare Pages, Workers, or D1, Agents slots in natively. The Durable Object model is consistent with your existing Workers patterns.
- You are building in TypeScript and want to avoid Python. Cloudflare Agents is TypeScript-first. If your team is JavaScript/TypeScript-native, the ergonomics are significantly better than ADK.
- You want simple, co-located state. For agents where state is relatively simple (user preferences, conversation history, counters), this.setState() on a Durable Object is the simplest possible implementation — no database connection, no ORM, no schema migration.
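For contrast, the "self-managed state layer" the Claude SDK path leaves you to build is real work you own. A minimal stdlib sketch of what that layer might look like (table and function names here are illustrative, not any SDK's API):

```python
import json
import sqlite3

# A minimal self-managed session-state layer: the kind of plumbing the
# Claude SDK path requires you to own. Schema and names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_state (session_id TEXT PRIMARY KEY, state TEXT)")

def set_state(session_id: str, state: dict) -> None:
    # Upsert the JSON-serialized state for this session.
    conn.execute(
        "INSERT INTO agent_state VALUES (?, ?) "
        "ON CONFLICT(session_id) DO UPDATE SET state = excluded.state",
        (session_id, json.dumps(state)),
    )

def get_state(session_id: str) -> dict:
    row = conn.execute(
        "SELECT state FROM agent_state WHERE session_id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}

set_state("user-42", {"history": ["hi"], "count": 1})
print(get_state("user-42"))  # → {'history': ['hi'], 'count': 1}
```

With Cloudflare's Durable Objects, the equivalent is a single this.setState() call with no schema, connection, or serialization code to maintain.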
I'm building a compliance monitoring agent for a fintech company. It needs to: flag unusual transactions, retain 90 days of alert history per user, run continuously on GCP (we're all-in on GCP), and p…
This workload is a strong GEAP match for four reasons:
- You're already on GCP: Data and compute stay in one cloud, no egress costs or cross-cloud IAM complexity.
- Compliance is table stakes: Agent Identity gives you cryptographic audit trails per agent invocation. Security Command Center integration means your regulators can audit agent activity through tools they already know. Building equivalent compliance infrastructure on Claude SDK would take months.
- Anomaly detection is built in: Agent Anomaly Detection uses statistical models + LLM-as-judge to flag unusual agent reasoning — which maps naturally onto your unusual-transaction detection use case.
- 90-day alert history: Memory Bank and Agent Sessions handle cross-session retention within GCP's data residency guarantees. Your 90-day window is a configuration, not a database schema you build.
What you'd lose: If you ever leave GCP, the audit trail and anomaly detection live in GCP services. Build your GEAP tool logic to be portable (pure Python functions with no GCP API calls inside them) and you mitigate this risk significantly.
Claude SDK would apply if reasoning quality on ambiguous transaction patterns were the primary constraint — Claude Opus 4.7 may outperform Gemini 3.1 Pro on nuanced edge cases. You could use both: GEAP for orchestration, governance, and state, with a Claude Opus sub-agent for the high-stakes flagging decisions.`} />
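The portability advice in the worked example above can be made concrete: keep tool logic as pure Python functions with no cloud client calls inside. A sketch under stated assumptions (the threshold, field names, and retention helper are hypothetical, not GEAP APIs):

```python
from datetime import datetime, timedelta

# Pure-Python tool logic: no GCP client calls inside, so the same functions
# can be registered as ADK tools today and reused under another SDK later.
# The 3x-average threshold and field names are hypothetical.
UNUSUAL_MULTIPLIER = 3.0

def flag_unusual_transaction(amount: float, recent_amounts: list[float]) -> dict:
    """Flag a transaction that exceeds 3x the recent average."""
    if not recent_amounts:
        return {"flagged": False, "reason": "no history"}
    avg = sum(recent_amounts) / len(recent_amounts)
    flagged = amount > UNUSUAL_MULTIPLIER * avg
    return {"flagged": flagged, "reason": f"avg={avg:.2f}"}

def within_retention(alert_time: datetime, now: datetime, days: int = 90) -> bool:
    """The 90-day alert window expressed as configuration, not schema."""
    return now - alert_time <= timedelta(days=days)

print(flag_unusual_transaction(500.0, [100.0, 120.0, 80.0]))
# → {'flagged': True, 'reason': 'avg=100.00'}
```

Only the registration of these functions (as ADK tools, Anthropic tool definitions, or Cloudflare callables) is platform-specific; the logic itself migrates unchanged.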
The hybrid approach
Nothing in these three platforms is mutually exclusive. The most sophisticated production setups mix them:
- GEAP orchestration + Claude sub-agents: Use Agent Registry and Agent Gateway for governance, but route specific high-stakes decisions through a Claude Opus sub-agent via GEAP's Anthropic integration
- Cloudflare edge + GEAP backend: Real-time WebSocket connection via Cloudflare Agents for <50ms user-facing latency, with heavy processing delegated to GEAP Agent Runtime via an async queue
- Claude SDK + Cloudflare state: Use Claude for reasoning, Cloudflare Durable Objects as a simple, co-located state store, deploy on a VPS or Lambda
The lock-in analysis applies to each layer independently. You can use GEAP's agent governance while keeping your raw data in your own database — you just cannot use Memory Bank for that data.
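The first hybrid pattern (GEAP orchestration with a Claude sub-agent for high-stakes calls) reduces to a routing decision. A minimal sketch with hypothetical stub backends standing in for real GEAP and Anthropic clients:

```python
from typing import Callable

# Route routine calls to a fast model and high-stakes decisions to a
# stronger one. The two backends are hypothetical stubs, not real clients.
def gemini_flash(prompt: str) -> str:
    return "fast answer"            # would call GEAP / Gemini Flash

def claude_opus(prompt: str) -> str:
    return "careful answer"         # would call a Claude Opus sub-agent

def route(prompt: str, high_stakes: bool,
          fast: Callable[[str], str] = gemini_flash,
          strong: Callable[[str], str] = claude_opus) -> str:
    backend = strong if high_stakes else fast
    return backend(prompt)

print(route("Is this invoice a duplicate?", high_stakes=False))   # → fast answer
print(route("Approve this $2M wire transfer?", high_stakes=True)) # → careful answer
```

In production the high_stakes flag would come from your own policy (amount thresholds, confidence scores), while governance and audit stay on the GEAP side regardless of which backend answered.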
<KnowledgeCheck questions={[ { question: "A startup is building a real-time coding assistant that runs inside a VS Code extension. The agent needs to respond within 200ms, maintain per-user context (active file, recent edits), and support TypeScript. Which platform is the best primary fit?", answers: [ "GEAP — because it has the most comprehensive model access", "Claude Agent SDK — because Claude Opus 4.7 has the best code reasoning", "Cloudflare Agents — because sub-millisecond edge latency and TypeScript-native Durable Objects match all three constraints", "All three are equally suitable" ], correct: 2, explanation: "Sub-200ms response time favors Cloudflare's edge. Per-user context fits naturally into Durable Object state. TypeScript-native is Cloudflare's strength. GEAP would add unnecessary latency for this real-time use case; Claude SDK lacks managed state and edge deployment." }, { question: "Which GEAP feature has no direct equivalent in either Claude Agent SDK or Cloudflare Agents?", answers: [ "Tool calling", "Multi-agent orchestration", "Agent Identity (cryptographic per-agent audit ID)", "Long-running workflows" ], correct: 2, explanation: "Tool calling and multi-agent orchestration exist in all three platforms. Long-running workflows are supported by all three (Durable Objects persist indefinitely; Claude SDK runs on your infra). Agent Identity — a cryptographic ID tied to every agent invocation with auditable trails — is a GEAP-specific Govern feature with no built-in equivalent elsewhere." }, { question: "You are migrating an agent from GEAP to Claude Agent SDK. Which GEAP component requires the most migration engineering?", answers: [ "ADK tool functions", "The agent instruction (system prompt)", "Memory Bank (cross-session long-term memory)", "The model selection (gemini-flash-latest)" ], correct: 2, explanation: "ADK tool functions are plain Python — copy them. The instruction is text — copy it. Model selection is a configuration change. Memory Bank is a proprietary managed service with no export API at launch; you must rebuild the long-term memory layer from scratch using a vector DB or similar." } ]} />
Hands-on exercise: Map the budget tracker to three platforms
Goal: Understand what changes and what stays the same when you move an agent across platforms — without writing code.
Steps:
Take the budget tracker agent from Chapter 2 (the agent with log_expense, get_expense_summary, and session state). For each of the three platforms, answer these questions in writing:
GEAP (you already built this):
1. Where does session state live?
2. How would you implement long-term memory of the user's monthly spending patterns?
3. Which Govern feature would you enable first in production, and why?
Claude Agent SDK:
1. What database/service would you use for session state? (Be specific: Postgres, Redis, DynamoDB, etc.)
2. How would you implement long-term memory? (Describe the retrieval mechanism)
3. What changes in the tool function signatures when moving from ADK to the Anthropic SDK?
Cloudflare Agents:
1. Draw the BudgetAgent class structure (TypeScript, using this.setState()). What fields does the state object have?
2. How would you expose the logExpense and getSummary methods? (Hint: @callable())
3. What is the state isolation risk when the same user connects from two different Cloudflare edge locations?
Success criteria: A written comparison that correctly identifies: (a) the state management approach for each platform, (b) one genuine trade-off for each, and (c) which platform you would choose for your specific use case and why.
What's next
You have completed the Gemini Enterprise Agent Platform hands-on tour. The logical next step is the capstone: a two-agent invoice-processing pipeline that ties together everything from Chapters 1-4. The full capstone specification is in the gemini-enterprise-agent-platform-hands-on-tour outline.
If you are evaluating other agent platforms, see also:
- claude-tool-use-from-zero for a deep dive on Claude's tool-use patterns
- cloudflare-agents-edge-patterns for Durable Objects and edge agent architecture
References
[1] Google Cloud Blog. "Introducing Gemini Enterprise Agent Platform." 23 April 2026. — https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-enterprise-agent-platform · retrieved 2026-04-30
[2] Google Agent Development Kit. Official documentation. — https://adk.dev/ · retrieved 2026-04-30
[3] Anthropic. Claude platform API reference. — https://claude.com/platform/api · retrieved 2026-04-30
[4] Cloudflare. Cloudflare Agents documentation. — https://developers.cloudflare.com/agents/ · retrieved 2026-04-30
[5] Anthropic. "Agents and Tools." Anthropic Documentation. — https://docs.anthropic.com/en/docs/agents-and-tools · retrieved 2026-04-30