Skip to content

Claude computer use

Using Claude to complete tasks within Tallyfy

Claude can control computers by looking at screens, moving cursors, clicking buttons, and typing text. This “Computer Use” capability launched in October 2024 as a public beta, available through Anthropic’s API, Amazon Bedrock, and Google Cloud Vertex AI.

Start small with AI agent tasks

Put your step-by-step instructions in the Tallyfy task description. Start with short, mundane tasks. Don’t ask an AI agent to handle huge, decision-driven jobs - they’re prone to unpredictable behavior and hallucination, and costs add up fast.

Computer Use vs MCP integration

This article covers Claude Computer Use - where Claude sees and controls screens through screenshots, mouse movements, and keyboard actions. That’s different from Claude’s MCP integration, which gives text-based chat access to data sources and APIs.

When to use each:

  • Computer Use (this article): Automating visual UI tasks - clicking buttons, filling forms, working through menus
  • MCP Integration: Data queries, API-based workflow management, text-based automation

Both can complement each other in automation workflows.

How computer use works

Rather than building thousands of app-specific integrations, Anthropic gave Claude general computer skills. Claude uses an API to see and interact with any application inside a sandboxed environment.

Diagram

What to notice:

  • Tallyfy provides the task description and expected outputs that guide Claude’s actions
  • Claude loops through screenshot-analyze-act cycles until the task is done
  • Results, logs, and screenshots get captured back into Tallyfy fields

Model support and performance

Models with computer use support:

  • Claude Sonnet 4.6 - Best balance of performance and cost for most automation
  • Claude Opus 4.6 - Flagship model for the most demanding tasks
  • Claude Haiku 4.5 - Lighter option for simpler, faster automation

Performance benchmarks (OSWorld):

  • Sonnet 4.6 scores 72.5% - now matching human-level performance (72.4%)
  • Rapid improvement from earlier models (Sonnet 4.5 scored 61.4%)
  • Still experimental, so expect some errors on tricky UI interactions

The agent loop

Here’s how Tallyfy coordinates Claude’s computer use through an iterative loop - Claude perceives, acts, and gets feedback until your task is done.

Diagram

What to notice:

  • Tallyfy triggers your intermediary app via webhook with task data
  • The loop between Claude and the sandbox continues until the task is done
  • All tool execution happens in an isolated sandbox for security

Core components

Sandboxed environment: The Docker container typically includes:

  • A virtual X11 display server (like Xvfb) for rendering the desktop
  • A lightweight Linux desktop environment
  • Pre-installed apps (Firefox, LibreOffice, text editors)
  • Your implementations of Anthropic’s defined tools

Three core tools (Anthropic-defined, you execute them):

  • computer: Mouse/keyboard actions (clicks, typing, scrolling, cursor movement) and taking screenshots
  • text_editor: View, create, and edit files
  • bash: Run shell commands in the sandbox

Pricing

API pricing (verify current rates at Anthropic’s pricing page):

  • Claude Sonnet 4.6: $3 per million input tokens, $15 per million output tokens
  • Claude Haiku 4.5: $1 per million input tokens, $5 per million output tokens
  • Computer use adds extra tokens to each system prompt

Access requirements:

  • Anthropic API key with sufficient credits
  • Available through Anthropic API, Amazon Bedrock, or Google Cloud Vertex AI
  • Docker needed for the reference implementation

Real-world use cases

Computer Use works well for specific automation scenarios. Early adopters include Asana, Canva, Replit, and DoorDash.

Good applications:

  • Form filling across desktop apps
  • Extracting data from legacy systems without APIs
  • QA testing with synthetic test case generation
  • Multi-step workflows spanning multiple applications
  • Desktop file management tasks

Current limitations

Claude’s computer use is still developing. Anthropic acknowledges these constraints:

Technical:

  • Latency: Tasks with dozens or hundreds of steps can be slow
  • Error-prone: Scrolling, dragging, and zooming remain challenging
  • Resolution: May struggle above 1024x768 or 1280x800 due to image scaling
  • Reliability: Some actions people do effortlessly are still hard for Claude

Safety:

  • Claude may follow instructions found on-screen, even if they conflict with yours
  • Risk of prompt injection from webpages or images
  • Potential for misuse if not properly isolated

Rate limits:

  • API rate limits apply based on your tier
  • Processing time varies with task complexity

Getting started

You’ll need to build an intermediary app that connects Tallyfy to the Anthropic API. Anthropic provides a reference implementation with Docker.

  1. Get Anthropic API access:

    • Get an API key from the Anthropic Console
    • Review the API docs on “Tool Use” and “Computer Use”
  2. Install Docker:

    • Install the latest version of Docker
    • Required for the sandboxed environment
  3. Pull the reference implementation:

    • Anthropic provides a Docker-based reference implementation with containerized environment, tool implementations, and agent loop
    • Pull: docker pull ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
  4. Configure the environment:

    • Run the Docker container with proper security settings
    • Container runs with minimal privileges (1 CPU, 2GB RAM default)
    • Access the interface at http://localhost:8080
    • Never run Computer Use unattended - always monitor
  5. Build the intermediary app:

    • Receive webhook requests from Tallyfy
    • Build prompts and tool lists for the Claude API
    • Manage the agent loop between Claude and your sandbox
    • Send results back to Tallyfy
  6. Prompt tips:

    • Keep tasks simple and well-defined
    • Tell Claude to verify outcomes with screenshots after each step
    • Suggest keyboard shortcuts for tricky UI elements
    • Provide examples of successful interactions when you have them

Basic integration example

A simplified example of the integration flow:

from anthropic import Anthropic
import os
client = Anthropic(api_key=os.environ['ANTHROPIC_API_KEY'])
# Basic computer use request
response = client.messages.create(
model="claude-sonnet-4-6-20260220",
max_tokens=1024,
tools=[
{
"type": "computer_20250124",
"name": "computer",
"display_width_px": 1024,
"display_height_px": 768,
},
{
"type": "text_editor_20250124",
"name": "text_editor",
},
{
"type": "bash_20250124",
"name": "bash",
}
],
messages=[
{
"role": "user",
"content": "Open the file manager and navigate to Documents folder"
}
]
)
# Handle tool use requests in the response
# Execute tools in your sandbox
# Return results to Claude
# Continue loop until task complete

Note: This is simplified. Real implementations need full agent loop handling, tool execution in a Docker sandbox, and result processing.

Security best practices

Key measures:

  • Run Computer Use in a dedicated container or VM with minimal privileges
  • Limit internet access to approved domains only
  • Never give access to sensitive data or credentials
  • Keep Claude isolated from production systems
  • Require human confirmation for critical actions
  • Enable audit logging

Known risks:

  • Prompt injection - Claude may follow on-screen instructions
  • Code execution risks if not properly sandboxed
  • Information theft if given access to sensitive data

When to use it

Good fit:

  • Desktop app automation (Excel, legacy software)
  • Data extraction from systems without APIs
  • Automated testing of desktop apps
  • Form filling across multiple apps
  • Low-risk, repetitive UI tasks

Poor fit:

  • Real-time or time-critical operations
  • Tasks needing creative judgment
  • Social media content creation (restricted by Anthropic)
  • High-security environments without proper isolation

Tips for success:

  • Start simple and well-defined
  • Set strong security boundaries
  • Monitor closely and keep humans in the loop
  • Test with low-risk data first

Compared to alternatives

Advantages:

  • Works with any desktop or web app
  • No app-specific APIs or integrations needed
  • Adapts when UIs change

Disadvantages:

  • Slower than traditional RPA for simple tasks
  • Still experimental with some error-prone execution
  • Requires Docker and sandbox infrastructure
  • Higher latency than direct API calls

Alternatives to consider:

  • Traditional RPA for stable, high-volume workflows
  • Direct API integrations when available
  • Browser-only automation tools for web tasks

Getting started checklist

  • Identify repetitive desktop tasks worth automating
  • Document exact steps with screenshots
  • Set up Anthropic API access with credits
  • Install Docker and pull the reference implementation
  • Create a Tallyfy process with clear task instructions
  • Test with low-risk, non-sensitive data first
  • Set up security isolation and monitoring
  • Refine prompts based on success rates
  • Scale gradually with proven workflows

Integrations > Computer AI agents

Computer AI agents are programs that visually interpret and interact with any screen-based interface like a human would and Tallyfy provides the structured workflow layer that sends instructions and captures results so these agents can be monitored and managed alongside your broader business processes.

Mcp Server > Using Tallyfy MCP server with Claude (text chat)

Claude Desktop provides the most mature MCP client experience for connecting to Tallyfy’s API through natural language and this guide covers setup via manual JSON configuration or one-click Desktop Extensions along with practical workflow examples like task management and process analysis and hybrid automation combining text-based MCP data queries with Claude Computer Use for visual UI tasks.

Vendors > OpenAI agent capabilities

OpenAI’s agent tools including the Responses API and Agents SDK and Computer Use model connect with Tallyfy to automate web interactions like form filling and data extraction where Tallyfy provides the structured workflow layer with audit trails and error routing to humans while agents handle simple repetitive browser tasks though performance remains modest and you should always start small and keep a human fallback for critical processes.

Byo Ai > Claude integration

Claude connects to Tallyfy’s MCP server at mcp.tallyfy.com using OAuth 2.1 with PKCE so you can manage tasks and processes and templates and automations through natural language chat in both Claude.ai web and Claude Desktop with over 40 available tools spanning read/write access to every part of your workflow system.