← All guides

Claude Agent SDK + Playwright: Browser Automation Patterns

How to combine Claude Agent SDK with Playwright for AI-driven browser automation — web scraping, form filling, UI testing, and multi-step browser.

Claude Agent SDK + Playwright: Browser Automation Patterns

Combining Claude Agent SDK with Playwright gives you browser automation that can reason — not just click a fixed sequence of selectors, but decide what to do next based on what it sees on the page. Claude reads the page, decides which actions to take, calls Playwright tools to execute them, and interprets the results. This guide covers the integration patterns, the tool definitions that make it work reliably, and the error recovery that keeps it production-grade.


Architecture Overview

Claude acts as the reasoning layer. Playwright acts as the execution layer. You expose Playwright operations as Claude tools.

User goal → Claude (plan + reason) → Tool call → Playwright (execute) → Claude reads result → next step

The key design decision: keep each Playwright tool narrow and reliable. Don't build one do_everything_on_page tool — build click_element, fill_input, get_page_text, wait_for_element. Claude decides which combination to call.


Setting Up the Integration

pip install anthropic playwright
playwright install chromium
import anthropic
import json
import asyncio
from playwright.async_api import async_playwright, Page, Browser

client = anthropic.Anthropic()

# Global browser state
browser: Browser = None
page: Page = None


async def init_browser(headless: bool = True):
    """Initialize Playwright browser."""
    global browser, page
    playwright = await async_playwright().start()
    browser = await playwright.chromium.launch(headless=headless)
    page = await browser.new_page()
    return page


async def close_browser():
    global browser
    if browser:
        await browser.close()

Defining Playwright Tools for Claude

PLAYWRIGHT_TOOLS = [
    {
        "name": "navigate_to",
        "description": "Navigate the browser to a URL. Use when you need to load a new page.",
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "The full URL to navigate to, including https://"
                }
            },
            "required": ["url"]
        }
    },
    {
        "name": "get_page_content",
        "description": "Get the visible text content of the current page. Use to understand what's on screen before taking action.",
        "input_schema": {
            "type": "object",
            "properties": {
                "max_chars": {
                    "type": "integer",
                    "description": "Maximum characters to return (default 5000)",
                    "default": 5000
                }
            }
        }
    },
    {
        "name": "click_element",
        "description": "Click an element on the page. Provide a CSS selector or text to find the element.",
        "input_schema": {
            "type": "object",
            "properties": {
                "selector": {
                    "type": "string",
                    "description": "CSS selector OR text content of the element to click. For text: use 'text=Submit' format."
                },
                "wait_after_ms": {
                    "type": "integer",
                    "description": "Milliseconds to wait after clicking (default 500)",
                    "default": 500
                }
            },
            "required": ["selector"]
        }
    },
    {
        "name": "fill_input",
        "description": "Fill a text input or textarea with a value.",
        "input_schema": {
            "type": "object",
            "properties": {
                "selector": {
                    "type": "string",
                    "description": "CSS selector for the input field"
                },
                "value": {
                    "type": "string",
                    "description": "Text to type into the field"
                },
                "clear_first": {
                    "type": "boolean",
                    "description": "Clear existing content before typing (default true)",
                    "default": True
                }
            },
            "required": ["selector", "value"]
        }
    },
    {
        "name": "take_screenshot",
        "description": "Take a screenshot of the current page state. Use when you need to see the current visual state to decide next action.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "File path to save screenshot (e.g., /tmp/screenshot.png)"
                }
            },
            "required": ["path"]
        }
    },
    {
        "name": "wait_for_selector",
        "description": "Wait for an element to appear on the page. Use after navigation or after clicking something that triggers loading.",
        "input_schema": {
            "type": "object",
            "properties": {
                "selector": {
                    "type": "string",
                    "description": "CSS selector to wait for"
                },
                "timeout_ms": {
                    "type": "integer",
                    "description": "Maximum wait time in ms (default 5000)",
                    "default": 5000
                }
            },
            "required": ["selector"]
        }
    },
    {
        "name": "get_element_text",
        "description": "Get the text content of a specific element. More precise than get_page_content when you need a specific value.",
        "input_schema": {
            "type": "object",
            "properties": {
                "selector": {
                    "type": "string",
                    "description": "CSS selector for the element"
                }
            },
            "required": ["selector"]
        }
    },
    {
        "name": "extract_table",
        "description": "Extract data from an HTML table as JSON. Use for scraping tabular data.",
        "input_schema": {
            "type": "object",
            "properties": {
                "selector": {
                    "type": "string",
                    "description": "CSS selector for the table element",
                    "default": "table"
                },
                "max_rows": {
                    "type": "integer",
                    "description": "Maximum rows to extract",
                    "default": 100
                }
            }
        }
    }
]

Tool Execution Functions

async def execute_playwright_tool(tool_name: str, tool_input: dict) -> str:
    """Execute a Playwright tool and return result as string."""
    global page

    try:
        if tool_name == "navigate_to":
            await page.goto(tool_input["url"], wait_until="domcontentloaded")
            return f"Navigated to: {page.url}"

        elif tool_name == "get_page_content":
            max_chars = tool_input.get("max_chars", 5000)
            content = await page.evaluate("document.body.innerText")
            if len(content) > max_chars:
                content = content[:max_chars] + f"\n... [truncated at {max_chars} chars]"
            return content

        elif tool_name == "click_element":
            selector = tool_input["selector"]
            wait_ms = tool_input.get("wait_after_ms", 500)

            # Support 'text=...' format
            if selector.startswith("text="):
                text = selector[5:]
                await page.click(f"text={text}")
            else:
                await page.click(selector)

            await page.wait_for_timeout(wait_ms)
            return f"Clicked: {selector}"

        elif tool_name == "fill_input":
            selector = tool_input["selector"]
            value = tool_input["value"]
            clear_first = tool_input.get("clear_first", True)

            if clear_first:
                await page.fill(selector, "")
            await page.type(selector, value, delay=50)  # Simulate human typing
            return f"Filled {selector} with: {value[:50]}..."

        elif tool_name == "take_screenshot":
            path = tool_input["path"]
            await page.screenshot(path=path)
            return f"Screenshot saved: {path}"

        elif tool_name == "wait_for_selector":
            selector = tool_input["selector"]
            timeout = tool_input.get("timeout_ms", 5000)
            await page.wait_for_selector(selector, timeout=timeout)
            return f"Element found: {selector}"

        elif tool_name == "get_element_text":
            selector = tool_input["selector"]
            element = await page.query_selector(selector)
            if not element:
                return f"Element not found: {selector}"
            text = await element.inner_text()
            return text

        elif tool_name == "extract_table":
            selector = tool_input.get("selector", "table")
            max_rows = tool_input.get("max_rows", 100)

            table_data = await page.evaluate(f"""
                () => {{
                    const table = document.querySelector('{selector}');
                    if (!table) return null;
                    const rows = Array.from(table.querySelectorAll('tr')).slice(0, {max_rows + 1});
                    const headers = Array.from(rows[0]?.querySelectorAll('th, td') || []).map(c => c.innerText.trim());
                    const data = rows.slice(1).map(row =>
                        Object.fromEntries(
                            Array.from(row.querySelectorAll('td')).map((c, i) => [headers[i] || `col${{i}}`, c.innerText.trim()])
                        )
                    );
                    return data;
                }}
            """)

            if not table_data:
                return "No table found"
            return json.dumps(table_data[:max_rows], indent=2)

        else:
            return f"Unknown tool: {tool_name}"

    except Exception as e:
        return f"Tool error ({tool_name}): {type(e).__name__}: {str(e)}"

The Agent Loop

async def run_browser_agent(goal: str, max_turns: int = 20) -> str:
    """
    Run a Claude agent that uses Playwright to achieve a browser-based goal.
    """
    messages = []
    turn = 0

    system = """You are a browser automation agent. You have tools to control a web browser.

Strategy:
1. Start by navigating to the relevant URL
2. Use get_page_content to understand what's on the page before taking action
3. Take targeted actions (click, fill) based on what you see
4. Use wait_for_selector after actions that trigger loading
5. Extract the data you need with get_element_text or extract_table
6. Report your findings clearly when done

Important:
- If a selector fails, try a different approach (text=..., broader selector, etc.)
- Read page content before clicking — don't assume element locations
- For login flows, stop and report back rather than entering credentials"""

    messages.append({"role": "user", "content": goal})

    while turn < max_turns:
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=2048,
            system=system,
            tools=PLAYWRIGHT_TOOLS,
            messages=messages
        )

        # Add assistant response to history
        messages.append({"role": "assistant", "content": response.content})

        # Check if done
        if response.stop_reason == "end_turn":
            # Extract final text response
            for block in response.content:
                if hasattr(block, 'text'):
                    return block.text
            return "Task completed."

        # Process tool calls
        if response.stop_reason == "tool_use":
            tool_results = []

            for block in response.content:
                if block.type == "tool_use":
                    print(f"[Turn {turn+1}] Calling: {block.name}({json.dumps(block.input, ensure_ascii=False)[:100]})")
                    result = await execute_playwright_tool(block.name, block.input)
                    print(f"  → {result[:200]}")

                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })

            messages.append({"role": "user", "content": tool_results})

        turn += 1

    return f"Reached max turns ({max_turns}) without completing task."


# Example usage
async def main():
    await init_browser(headless=True)

    try:
        result = await run_browser_agent(
            "Go to https://news.ycombinator.com and extract the top 5 story titles and their scores"
        )
        print("\nResult:")
        print(result)
    finally:
        await close_browser()

if __name__ == "__main__":
    asyncio.run(main())

Common Patterns

Pattern 1: Data Extraction

# Extract product pricing from e-commerce sites
result = await run_browser_agent("""
Go to https://example-shop.com/products and extract:
- Product name
- Current price
- Whether in stock

Format as a JSON array.
""")

Pattern 2: Form Automation

# Fill out a contact form
result = await run_browser_agent("""
Go to https://example.com/contact
Fill out the contact form with:
- Name: "Test User"
- Email: "test@example.com"
- Message: "Testing the contact form"
Then click Submit and report whether it succeeded.
""")

Pattern 3: UI Testing

# Verify UI functionality
result = await run_browser_agent("""
Test the login flow at https://app.example.com:
1. Navigate to the login page
2. Verify the email and password fields exist
3. Verify the Submit button exists
4. Take a screenshot of the login page
5. Report what you found (don't actually log in)
""")

Frequently Asked Questions

Is Claude + Playwright reliable enough for production automation? For data extraction and read-only workflows: yes, with error handling. For form submissions and transactional workflows: use with approval gates. The combination is more reliable than traditional selector-based automation because Claude can adapt to page structure changes.

How do I handle login-protected pages? Pre-authenticate using Playwright's cookie/storage state mechanism before starting the agent. Save the authenticated session with context.storage_state() and load it at agent startup. Never let the agent handle credentials.

What happens when a selector fails? The tool returns an error string, which Claude reads and adapts to. Claude will typically try an alternative selector or a different strategy. Include "report the error and explain what you tried" in your system prompt to ensure failures surface clearly.

How do I prevent the agent from accidentally submitting forms? Add to the system prompt: "Do NOT click any button labeled Submit, Confirm, Purchase, Delete, or similar. Treat these as requiring human approval." For production: add a click_element pre-check that blocks certain selectors.

What's the cost of running browser automation with Claude? A typical extraction task (5-15 turns, reading pages) costs $0.05-0.20 at Sonnet pricing. For bulk automation of 100 tasks/day, budget $5-20/day. Use Haiku for simpler tasks to reduce costs 3-4x.


Related Guides


Go Deeper

Agent SDK Cookbook — $49 — Full Playwright integration including authenticated session management, anti-bot evasion patterns, parallel browser automation across multiple pages, and the approval gate pattern for transactional workflows.

→ Get the Agent SDK Cookbook — $49

30-day money-back guarantee. Instant download.

AI Disclosure: Written with Claude Code; patterns tested in production browser automation.

Tools and references