Claude Computer Use: Setup, Capabilities, and Practical Limitations

A practical guide to Claude's computer use capability — what it can actually do, how to set it up, current limitations, and real use cases for 2026.

Claude's computer use capability lets Claude control a desktop environment — take screenshots, move the mouse, click, type, and execute commands — to complete tasks that require a graphical interface. As of 2026, it works reliably for structured, well-defined tasks (form filling, data entry, file management) but remains unreliable for tasks requiring complex visual reasoning or multi-step decisions under ambiguity. If you need browser automation with stable HTML selectors, conventional tools like Playwright are faster and more reliable. Computer use is the right choice when no stable API or selector-based approach exists.


What computer use actually is

Computer use is a tool set that Anthropic provides via the API. When enabled, Claude can:

  1. Take a screenshot to see the current state of the screen
  2. Move the mouse to a specific (x, y) coordinate
  3. Click (left, right, double)
  4. Type text
  5. Press keyboard shortcuts
  6. Run terminal commands

Claude observes the screen through screenshots, decides what to do, and calls these tools in sequence. It's not pre-programmed automation — Claude is reasoning about what it sees and determining each action.


When to use computer use vs alternatives

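Use computer use when no stable API or selector-based approach exists, such as a GUI application or a website without an API.

Use Playwright/Selenium instead when the target is a web page with stable HTML selectors. They are faster and more reliable there.

Use a direct API instead when the service exposes one for the data or action you need.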
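
The choice above can be written as a small decision helper. This is only a sketch: the function name and the priority order (API first, then selectors, then computer use) are assumptions drawn from the guidance in this guide, not an official rule.

```python
def choose_automation(has_api: bool, has_stable_selectors: bool) -> str:
    """Pick an automation approach for a task (hypothetical helper)."""
    if has_api:
        # A direct API avoids the GUI entirely.
        return "direct API"
    if has_stable_selectors:
        # Selector-based tools are faster and more reliable on stable markup.
        return "Playwright/Selenium"
    # No API and no stable selectors: fall back to computer use.
    return "computer use"


print(choose_automation(has_api=False, has_stable_selectors=True))
```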


Setup: running computer use with Docker

Anthropic provides a reference implementation using Docker:

# Clone the Anthropic quickstarts repo
git clone https://github.com/anthropics/anthropic-quickstarts
cd anthropic-quickstarts/computer-use-demo

# Set your API key
export ANTHROPIC_API_KEY=sk-ant-...

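# Build and run the Docker container (includes VNC, desktop environment, Chrome)
# Ports: 5900 = VNC, 8501 = Streamlit UI, 6080 = noVNC (browser-based VNC)
# Comments cannot follow a trailing backslash, so they are kept up here.
# The reference image runs as the "computeruse" user; adjust the mount
# target if your image uses a different home directory.
docker build -t computer-use-demo .
docker run \
    -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
    -v "$HOME/.anthropic:/home/computeruse/.anthropic" \
    -p 5900:5900 \
    -p 8501:8501 \
    -p 6080:6080 \
    computer-use-demo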

Access the interface at http://localhost:8501 (Streamlit UI) or http://localhost:6080 (browser-based desktop view).
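
If a page does not load, it can help to confirm that the container's ports are actually listening. The sketch below uses only the Python standard library. The helper name port_open is hypothetical, and the port list assumes the mappings from the docker run command above.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in [("VNC", 5900), ("Streamlit UI", 8501), ("noVNC", 6080)]:
        status = "listening" if port_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {status}")
```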


The computer use API call

The core API pattern is simple: include the computer use tools in your tools list and handle tool_use blocks:

import anthropic
import base64

client = anthropic.Anthropic()

def run_computer_task(task: str) -> str:
    """
    Run a computer use task. Returns the final response text.
    """
    messages = [{"role": "user", "content": task}]
    
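    while True:
        # Computer use is a beta feature, so the call goes through client.beta
        # with a beta flag. The tool versions below are the ones paired with
        # Claude 4 models; check Anthropic's current tool documentation for
        # the versions that match your model.
        response = client.beta.messages.create(
            model="claude-opus-4-0",  # Sonnet models also support computer use
            max_tokens=4096,
            betas=["computer-use-2025-01-24"],
            tools=[
                {
                    "type": "computer_20250124",
                    "name": "computer",
                    "display_width_px": 1366,
                    "display_height_px": 768,
                    "display_number": 1,
                },
                {
                    "type": "bash_20250124",
                    "name": "bash",
                },
                {
                    "type": "text_editor_20250429",
                    "name": "str_replace_based_edit_tool",
                },
            ],
            messages=messages,
        )

        # Any stop reason other than "tool_use" (end_turn, max_tokens, ...)
        # means there are no tool calls to run, so return the final text
        if response.stop_reason != "tool_use":
            return next(
                (b.text for b in response.content if hasattr(b, "text")),
                "Task completed."
            )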
        
        # Process tool calls
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
        
        # Add assistant message and tool results to continue
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

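def execute_tool(tool_name: str, tool_input: dict) -> "str | list[dict]":
    """Execute a tool call and return tool_result content (text or image blocks)."""
    if tool_name == "computer":
        action = tool_input["action"]
        if action == "screenshot":
            # take_screenshot() is your own implementation; it must return
            # the PNG bytes encoded as a base64 string
            screenshot_data = take_screenshot()
            return [{"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": screenshot_data}}]
        elif action == "left_click":
            click(tool_input["coordinate"])  # your click implementation
            return "Clicked"
        elif action == "type":
            type_text(tool_input["text"])  # your typing implementation
            return "Typed"
        # ... handle other actions (key, mouse_move, right_click, double_click)
        return f"Unsupported computer action: {action}"
    # bash and text editor calls need their own handlers; report that nothing
    # ran rather than claiming success
    return f"No handler implemented for tool: {tool_name}"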
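
The click and type helpers above are left to the reader. One possible way to back them, assuming the desktop runs X11 and has the xdotool CLI installed, is to translate each action into an xdotool command line. The function name xdotool_args is hypothetical, and only a handful of actions are covered.

```python
def xdotool_args(action: str, tool_input: dict) -> list[str]:
    """Translate a computer-tool action into an xdotool argument list.

    Assumes an X11 desktop with xdotool installed (not part of the original guide).
    """
    if action in ("mouse_move", "left_click", "right_click", "double_click"):
        x, y = tool_input["coordinate"]
        args = ["xdotool", "mousemove", str(x), str(y)]
        if action == "left_click":
            args += ["click", "1"]                   # button 1 = left
        elif action == "right_click":
            args += ["click", "3"]                   # button 3 = right
        elif action == "double_click":
            args += ["click", "--repeat", "2", "1"]  # two left clicks
        return args
    if action == "type":
        # "--" stops option parsing so text starting with "-" is typed literally.
        return ["xdotool", "type", "--", tool_input["text"]]
    if action == "key":
        # For the key action, "text" holds a combination such as "ctrl+s".
        return ["xdotool", "key", "--", tool_input["text"]]
    raise ValueError(f"Unsupported action: {action}")


print(xdotool_args("left_click", {"coordinate": [640, 360]}))
```

Passing the returned list to subprocess.run(args, check=True) inside the container would perform the action. Building the argument list separately keeps the mapping easy to unit-test without a display.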

Practical reliability patterns

Computer use requires careful prompting to be reliable. These patterns improve completion rates:

1. Specify the exact success state

task = """
Navigate to https://example.com/forms/contact.
Fill in:
- Name: John Smith
- Email: john@example.com  
- Subject: Demo request
- Message: I'd like to schedule a demo.

Click Submit. 
STOP when you see a confirmation message on screen.
If you see an error, report it and stop.
"""

Without an explicit stop condition, Claude may keep clicking around after completing the task.
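
A prompt-level stop condition can be paired with a hard cap in code, so a run that never reaches the success state still terminates. The sketch below is one way to do that. The step callable is hypothetical: it stands in for one iteration of the agent loop and returns None while the task is still in progress.

```python
from typing import Callable, Optional


def run_with_step_limit(step: Callable[[], Optional[str]], max_steps: int = 30) -> str:
    """Call step() until it returns a final answer or the step budget runs out."""
    for _ in range(max_steps):
        result = step()
        if result is not None:
            return result
    raise RuntimeError(f"No final answer after {max_steps} steps")


# Usage with a stand-in step function that finishes on its third call.
calls = {"count": 0}

def fake_step() -> Optional[str]:
    calls["count"] += 1
    return "Confirmation message seen" if calls["count"] == 3 else None

print(run_with_step_limit(fake_step, max_steps=10))
```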

2. Break large tasks into subtasks

Instead of "log in and download all invoices from the last 3 months," break it into:

  1. "Log in to https://billing.example.com with username/password X/Y"
  2. "Navigate to the invoices section and identify invoices from January–March 2026"
  3. "Download each invoice by clicking the Download button for each"

Each subtask is verifiable. If one fails, you know where it broke.
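
A minimal sketch of running subtasks in sequence and stopping at the first failure follows. Both callables are assumptions: run_task stands in for whatever function sends one task to Claude and returns its final report, and succeeded is your own check on that report.

```python
from typing import Callable, Sequence


def run_subtasks(
    subtasks: Sequence[str],
    run_task: Callable[[str], str],
    succeeded: Callable[[str], bool],
) -> list[tuple[str, str]]:
    """Run subtasks in order; raise on the first one whose report fails the check."""
    results = []
    for index, subtask in enumerate(subtasks, start=1):
        report = run_task(subtask)
        results.append((subtask, report))
        if not succeeded(report):
            # The index tells you exactly which step broke.
            raise RuntimeError(f"Subtask {index} failed: {report}")
    return results


# Usage with stand-in functions instead of real API calls.
demo = run_subtasks(
    ["Log in", "Open invoices"],
    run_task=lambda task: f"OK: {task}",
    succeeded=lambda report: report.startswith("OK"),
)
print(demo)
```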

3. Add verification steps

task = """
Fill out the form at https://example.com/survey.
After clicking Submit:
1. Take a screenshot
2. Confirm the confirmation message is visible
3. Report the exact text of the confirmation message
"""

This produces an audit trail and catches silent failures (the form appeared to submit but actually didn't).
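
The reported confirmation text can also be checked in code instead of by eye. The helper below is a sketch with a hypothetical name. It assumes you know the confirmation wording in advance and compares it case-insensitively, ignoring differences in whitespace.

```python
def confirmation_found(report: str, expected: str) -> bool:
    """Return True if the expected confirmation text appears in Claude's report."""
    def normalize(text: str) -> str:
        # Lowercase and collapse runs of whitespace before comparing.
        return " ".join(text.lower().split())

    return normalize(expected) in normalize(report)


report = "The page shows:  'Thank you!  Your response has been recorded.'"
print(confirmation_found(report, "Your response has been recorded"))
```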


Current limitations (2026)

Latency: Each action cycle (screenshot → decision → action) takes 2–8 seconds, so a task requiring 20 actions takes roughly 40 seconds to 3 minutes. That rules out real-time use cases.
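
The latency range is simple multiplication, sketched here so you can plug in your own step counts. The 2 and 8 second defaults are the per-cycle figures quoted above. The function name is hypothetical.

```python
def estimate_duration_seconds(
    actions: int, min_cycle: float = 2.0, max_cycle: float = 8.0
) -> tuple[float, float]:
    """Return (best case, worst case) wall-clock seconds for a task."""
    return actions * min_cycle, actions * max_cycle


low, high = estimate_duration_seconds(20)
print(f"{low:.0f}-{high:.0f} seconds")  # prints "40-160 seconds"
```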

OCR reliability: Claude reads screen text from screenshots. Small fonts, low-contrast text, and complex layouts reduce reliability. Standard UI components work well; custom-rendered interfaces are unpredictable.

Multi-monitor support: Computer use works reliably on a single monitor. Multi-monitor setups require careful coordinate mapping.
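
One form this coordinate mapping can take is sketched below. It assumes the screenshot covers a single monitor while the input tool expects coordinates on the combined virtual desktop, so each point has to be shifted by that monitor's offset. The Monitor class and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Monitor:
    x_offset: int  # left edge of this monitor on the virtual desktop
    y_offset: int  # top edge of this monitor on the virtual desktop
    width: int
    height: int


def to_desktop_coords(monitor: Monitor, x: int, y: int) -> tuple[int, int]:
    """Map a point in a single-monitor screenshot to virtual-desktop coordinates."""
    if not (0 <= x < monitor.width and 0 <= y < monitor.height):
        raise ValueError(f"({x}, {y}) is outside the {monitor.width}x{monitor.height} monitor")
    return monitor.x_offset + x, monitor.y_offset + y


# A second 1366x768 monitor placed to the right of a 1920-pixel-wide primary.
secondary = Monitor(x_offset=1920, y_offset=0, width=1366, height=768)
print(to_desktop_coords(secondary, 100, 200))  # prints "(2020, 200)"
```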

Dynamic content: JavaScript-rendered content that changes after load (infinite scroll, lazy loading) requires explicit waiting instructions. Add "wait for the page to fully load before proceeding" when necessary.
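
Waiting can also be handled in the tool layer before a screenshot is returned to Claude. The sketch below polls until two consecutive captures are identical. The capture callable is a stand-in for your own screenshot function. A byte-for-byte comparison is an assumption that will not settle on pages with animations or a blinking cursor, hence the timeout.

```python
import time
from typing import Callable


def wait_until_stable(
    capture: Callable[[], bytes], interval: float = 0.5, timeout: float = 10.0
) -> bool:
    """Poll capture() until two consecutive frames match; False on timeout."""
    deadline = time.monotonic() + timeout
    previous = capture()
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = capture()
        if current == previous:
            return True  # nothing changed between two captures
        previous = current
    return False


# Usage with canned frames: the screen changes once, then settles.
frames = iter([b"loading", b"loaded", b"loaded"])
print(wait_until_stable(lambda: next(frames), interval=0.0))  # prints "True"
```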

File system limitations: Computer use can interact with files visible in the desktop GUI. For programmatic file operations, use the bash tool directly.

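Cost: Computer use tasks are expensive. Each screenshot + decision cycle adds ~1,000–3,000 tokens. A 20-step task at Opus pricing costs $0.30–$1.00. Because every request resends the conversation so far, including earlier screenshots, the cost per step rises as a run gets longer. For tasks over 50 steps, cost can exceed $5 per run.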
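
The 20-step figure can be reproduced with a flat per-step estimate. The sketch below assumes an input price of $15 per million tokens. That price is an assumption for illustration, so check current pricing for your model. The estimate ignores output tokens and the growth from resending conversation history, so treat it as a lower bound.

```python
def estimate_cost_usd(steps: int, tokens_per_step: int, usd_per_million_tokens: float) -> float:
    """Flat estimate: every step is billed once at the input-token price."""
    return steps * tokens_per_step * usd_per_million_tokens / 1_000_000


ASSUMED_PRICE = 15.0  # USD per million input tokens (assumption, not from the guide)

low = estimate_cost_usd(20, 1_000, ASSUMED_PRICE)
high = estimate_cost_usd(20, 3_000, ASSUMED_PRICE)
print(f"${low:.2f}-${high:.2f}")  # prints "$0.30-$0.90"
```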


Frequently asked questions

Does computer use require Claude Opus, or can I use Sonnet? Anthropic recommends Claude Opus for computer use due to its superior visual reasoning. Sonnet can be used but produces less reliable results on complex interfaces.

Can computer use access any website, including authenticated ones? Yes, if you configure the browser session with the correct cookies or credentials. You can import browser cookies into the Docker environment or use the bash tool to log in before the main task.

Is computer use available on the free tier? No. Computer use requires API access (not claude.ai), which is billed by usage. There's no separate computer use fee: you pay standard token rates for the model you choose.

Can I run computer use without Docker? Yes, if you provide your own screenshot/click/type infrastructure and implement the tool execution handlers. The Docker setup is a reference implementation. For production use, you'll likely build custom tooling.

What's the difference between computer use and Claude Code? Claude Code is a CLI agent that operates on your local filesystem and runs commands via a terminal. Computer use controls a full graphical desktop. Use Claude Code for software development tasks; use computer use for GUI applications and web automation without APIs.


AI Disclosure: Drafted with Claude Code; capability assessment from Anthropic documentation and direct testing as of April 2026.
