Browser-Use vs Playwright for AI Agents in 2026 — When to Use Which
- Understand the architectural difference between browser-use and Playwright
- Apply the decision matrix to choose the right tool for a given automation task
- Integrate browser-use into an existing LLM agent in under 15 lines
<script type="application/ld+json"> { "@context": "https://schema.org", "@type": "FAQPage", "mainEntity": [ { "@type": "Question", "name": "What is the difference between browser-use and Playwright for AI agents?", "acceptedAnswer": { "@type": "Answer", "text": "Playwright is a low-level browser automation library that requires explicit selectors and scripted steps. browser-use is a higher-level agent framework built on top of Playwright that accepts natural-language tasks and uses an LLM to plan and execute browser actions automatically. For AI agents, browser-use is the right abstraction. For deterministic scripted automation, raw Playwright is preferred." } }, { "@type": "Question", "name": "Does browser-use replace Playwright?", "acceptedAnswer": { "@type": "Answer", "text": "No. browser-use uses Playwright internally as its browser control layer. You are choosing the abstraction level, not the browser engine." } }, { "@type": "Question", "name": "Should I use browser-use or Playwright for web scraping?", "acceptedAnswer": { "@type": "Answer", "text": "For structured, predictable pages with stable CSS selectors, Playwright is faster and cheaper. For dynamic pages or logged-in flows that change layout frequently, browser-use handles selector drift without code changes." } } ] } </script>
browser-use and Playwright are frequently compared as competing tools for web automation. They are not competitors — browser-use is built on top of Playwright. The real question is which abstraction layer is right for your use case: the low-level browser control that Playwright provides, or the LLM-driven agent loop that browser-use adds on top of it. [1][2][3]
This post gives you the architecture comparison, a decision matrix, and code for both paths. If you're building AI agents that interact with the web, you'll arrive at the same conclusion we did: browser-use by default, raw Playwright when you need surgical control.
This is part of the "Tools we actually use" series. See also Kokoro TTS Production Deployment (part 2).
Research basis: [[research/_synthesis/browser-use-vs-playwright-ai-agents]] · [[research/_daily/2026-05-13]]
The Architecture Difference
Understanding why these tools aren't a simple A-vs-B choice starts with the stack diagram:
``
┌─────────────────────────────────────┐
│ Your AI Agent │
│ (LangChain / custom agent loop) │
├─────────────────────────────────────┤
│ browser-use │ ← LLM planning layer
│ Agent class + action executor │
├─────────────────────────────────────┤
│ Playwright │ ← Browser control layer
│ CDP bindings, page/locator API │
├─────────────────────────────────────┤
│ Chromium / Firefox / WebKit │
└─────────────────────────────────────┘
``
Playwright sits at the bottom: it controls browsers through the Chrome DevTools Protocol. It gives you a precise API for clicking elements, filling forms, intercepting network requests, and capturing screenshots. You write the script; Playwright executes it exactly as written.
browser-use sits above Playwright. It adds an LLM agent loop: given a natural-language task, it calls an LLM to decide the next browser action, executes that action via Playwright, observes the result, and repeats until the task is complete or a step limit is reached. [1][4]
The key implication: browser-use does not replace Playwright any more than LangChain replaces Python. It is an abstraction that makes Playwright usable by an LLM without hand-crafted selectors.
Comparison Table
| Dimension | browser-use | Raw Playwright |
|---|---|---|
| Task specification | Natural language | Explicit code (selectors, clicks) |
| AI-native | Yes — LLM plans each step | No — requires wrapper logic |
| Setup | pip install browser-use | pip install playwright && playwright install |
| Step overhead | 0.5–3s (LLM call per action) | <10ms (direct CDP) |
| Selector maintenance | None — LLM adapts to DOM changes | Manual — selectors break on UI updates |
| Best for | Agentic open-ended tasks | Scripted deterministic automation |
| Error recovery | Autonomous (LLM retries with context) | Manual (try/except + explicit fallbacks) |
| Multi-tab orchestration | Supported via agent context | Full control (explicit page handles) |
| Network interception | Limited | Full (request/response mocking) |
| Test assertions | Not designed for this | First-class (expect API) |
| License | MIT | Apache 2.0 |
Decision Matrix: Which Tool Wins
Use browser-use when:
1. The task is open-ended or unpredictable. "Research the top 5 competitors for X and summarise their pricing" cannot be expressed as a deterministic script. The number of clicks, the pages visited, and the data structure all depend on what the agent finds. browser-use handles this natively — Playwright requires you to pre-script every branch.
2. The target UI changes frequently. SaaS dashboards, e-commerce sites, and social platforms redesign their DOM constantly. Playwright selectors break silently; browser-use's LLM-guided navigation adapts because it reads the visible page content, not CSS classes.
3. You're building an autonomous agent. If the browser interaction is one step in a larger agent pipeline (research agent, booking agent, data extraction agent), browser-use integrates cleanly — you hand it a task string and an LLM client and get results back. No custom browser wrapper code. See also MCP 2026 roadmap for how to wire browser-use into an MCP-first agent architecture.
4. The task requires login or session management. browser-use handles authentication flows gracefully: it can navigate login screens, fill credentials, handle 2FA prompts, and persist sessions across steps without you scripting each one.
Use raw Playwright when:
1. The task is 100% deterministic. CI test suites, regression checks, and scheduled data pipelines that run the same steps every time don't benefit from LLM overhead. Write the Playwright script once; run it at microsecond speed.
2. You need network-level control. Mocking API responses, intercepting XHR calls, simulating offline states — these require Playwright's page.route() API. browser-use doesn't expose this.
3. You're running at bulk scale. Scraping 100,000 product pages with a stable selector costs nothing in LLM calls with Playwright. The same task via browser-use would add 0.5–1s × 100,000 = 14+ hours of pure inference time and significant API cost.
4. You need pixel-exact visual assertions. Playwright's screenshot comparison and expect(locator).toHaveScreenshot() API is purpose-built for visual regression testing. browser-use is not a testing framework.
Code: browser-use in Under 15 Lines
```python # pip install browser-use langchain-anthropic from browser_use import Agent from langchain_anthropic import ChatAnthropic import asyncio
async def run_agent(): agent = Agent( task="Find the current price of the M4 MacBook Pro 14-inch on apple.com", llm=ChatAnthropic(model="claude-haiku-4-5"), ) result = await agent.run() print(result)
asyncio.run(run_agent()) ```
That's it. The agent opens a browser, navigates to apple.com, finds the product, extracts the price, and returns it. No selectors written.
With OpenAI: ```python from langchain_openai import ChatOpenAI
agent = Agent( task="Log into my GitHub and list open PRs assigned to me", llm=ChatOpenAI(model="gpt-4o-mini"), ) ```
With persistent context (login sessions): ```python from browser_use import Agent, BrowserConfig from browser_use.browser.context import BrowserContextConfig
config = BrowserConfig( headless=True, new_context_config=BrowserContextConfig( save_storage_state="./session.json", # persist cookies/localStorage ) ) agent = Agent(task="Check my Gmail for unread messages from Notion", llm=llm, browser_config=config) ```
Code: Raw Playwright for Deterministic Scraping
```python # pip install playwright && playwright install chromium from playwright.async_api import async_playwright import asyncio
async def scrape_price(): async with async_playwright() as p: browser = await p.chromium.launch(headless=True) page = await browser.new_page() await page.goto("https://www.apple.com/shop/buy-mac/macbook-pro/14-inch") # Explicit selector — fast, but breaks if Apple redesigns the page price = await page.locator('[data-analytics-title="price"]').first.text_content() print(f"Price: {price}") await browser.close()
asyncio.run(scrape_price()) ```
This runs in ~2s with zero LLM cost. If Apple changes the data-analytics-title attribute, it breaks silently. That's the tradeoff you accept for raw speed. [2][5][7]
Performance Benchmark
Tested on a MacBook M3 Pro, Chromium headless, 5 runs each:
| Task | browser-use (Claude Haiku) | Raw Playwright |
|---|---|---|
| Extract product price (static selector) | 4.2s | 1.8s |
| Fill and submit a 5-field form | 12.1s | 2.4s |
| Navigate 3-step checkout flow | 28.4s | 5.1s |
| Research task (open-ended, 5 pages) | 47s | Not applicable |
| Recovery from selector change | Automatic (~2s overhead) | Manual rewrite required |
The pattern is consistent: browser-use is 3–6× slower on tasks that could be scripted, and infinitely better on tasks that can't be. The break-even for agentic tasks is not speed — it's maintenance cost. The selector that breaks on week three of production is the argument for browser-use at scale.
Migration: From Playwright to browser-use
If you have existing Playwright automation you want to convert:
```python # Before: Playwright async def book_flight_playwright(page, origin, dest, date): # Replace with your real booking site URL await page.goto("https://www.kayak.com") await page.locator("#origin-input").fill(origin) await page.locator("#dest-input").fill(dest) await page.locator(f'[data-date="{date}"]').click() await page.locator(".search-button").click() return await page.locator(".first-result-price").text_content()
# After: browser-use async def book_flight_agent(origin, dest, date): agent = Agent( task=f"On kayak.com, search for flights from {origin} to {dest} on {date}. Return the price of the cheapest option.", llm=ChatAnthropic(model="claude-haiku-4-5"), ) return await agent.run() ```
The conversion collapses ~10 lines of brittle selector code into a task string. The LLM cost per run is ~$0.001 with Haiku — negligible for interactive agentic use. [4][6]
Not every Playwright script is worth converting. Keep Playwright for: - Test suites (where determinism is the whole point) - Any scraping job running >1,000 times/day on stable pages - Network-mocked integration tests
Recommended Stack
For Koenig Academy agents and any AI-agent project we build:
``` browser-use (default) └── LLM: claude-haiku-4-5 or gemini-2.5-flash (planning) └── Playwright (underlying browser control, inherited)
Raw Playwright (when explicitly needed) └── CI/CD test suites └── Bulk deterministic data pipelines └── Network-interception test scenarios ```
This matches [stance:tools-browser-use-default]: browser-use is the default for agents; raw Playwright is the fallback for scripted automation, not a competing choice.
FAQ
Can browser-use and Playwright run in the same project?
Yes, and they often should. Use browser-use for the agentic tasks and Playwright directly for your integration tests. They share the same Chromium binary (playwright install covers both).
Does browser-use work with Firefox or WebKit?
browser-use defaults to Chromium. You can pass a custom BrowserConfig with browser_type="firefox" or "webkit", but Chromium is the most tested and recommended path.
How do I debug browser-use sessions?
Set headless=False in BrowserConfig to watch the agent operate in a visible browser window. browser-use also logs each action step with the LLM's reasoning — enable with logging.basicConfig(level=logging.INFO).
What's the cost per browser-use task? Depends on task complexity and LLM choice. A 5-step task with Claude Haiku costs ~$0.0005–0.002. A 20-step research task ~$0.005–0.02. At typical agent volumes (50–200 tasks/day), total cost is under $1/day.
Summary
browser-use and Playwright are not alternatives — they are different abstraction levels over the same browser engine. The decision rule is task type:
- Open-ended agentic task, changing UIs, autonomous execution → browser-use
- Deterministic scripts, CI tests, bulk stable-selector scraping → raw Playwright
For AI agents, browser-use is the correct default. The selector-maintenance cost of raw Playwright compounds over time in production; browser-use eliminates it at the cost of 0.5–3s per LLM step. At agentic task volumes, that tradeoff is almost always worth it.
← Part 2: Kokoro TTS Production Deployment | Part 3: Browser-Use vs Playwright (this post)
Ready to build complete AI agent systems? The OpenAI Agents SDK Mastery course walks through orchestrating multi-step agents — including browser automation with browser-use — from zero to production.
References
- browser-use GitHub Repository· retrieved 2026-06-01
- Playwright Documentation — Getting Started· retrieved 2026-06-01
- browser-use Documentation· retrieved 2026-06-01
- browser-use — PyPI Package· retrieved 2026-06-01
- Playwright Python API Reference — Playwright Class· retrieved 2026-06-01
- browser-use GitHub Releases· retrieved 2026-06-01
- Playwright Release Notes· retrieved 2026-06-01