Claude computer use

Using Claude to complete tasks within Tallyfy

Claude can control computers by looking at screens, moving cursors, clicking buttons, and typing text. This “Computer Use” capability launched in October 2024 as a public beta, available through Anthropic’s API, Amazon Bedrock, and Google Cloud Vertex AI.

Start small with AI agent tasks

Put your step-by-step instructions in the Tallyfy task description. Start with short, mundane tasks. Don’t ask an AI agent to handle huge, decision-driven jobs: on those, agents are prone to unpredictable behavior and hallucination, and costs add up fast.

Computer Use vs MCP integration

This article covers Claude Computer Use - where Claude sees and controls screens through screenshots, mouse movements, and keyboard actions. That’s different from Claude’s MCP integration, which gives text-based chat access to data sources and APIs.

When to use each:

Computer Use (this article): Automating visual UI tasks - clicking buttons, filling forms, working through menus
MCP Integration: Data queries, API-based workflow management, text-based automation

The two approaches can complement each other in automation workflows.

How computer use works

Rather than building thousands of app-specific integrations, Anthropic gave Claude general computer skills. Claude uses an API to see and interact with any application inside a sandboxed environment.

What to notice:

Tallyfy provides the task description and expected outputs that guide Claude’s actions
Claude loops through screenshot-analyze-act cycles until the task is done
Results, logs, and screenshots get captured back into Tallyfy fields

Model support and performance

Models with computer use support:

Claude Sonnet 4.6 - Best balance of performance and cost for most automation
Claude Opus 4.6 - Flagship model for the most demanding tasks
Claude Haiku 4.5 - Lighter option for simpler, faster automation

Performance benchmarks (OSWorld):

Sonnet 4.6 scores 72.5% - now matching human-level performance (72.4%)
Rapid improvement from earlier models (Sonnet 4.5 scored 61.4%)
Still experimental, so expect some errors on tricky UI interactions

The agent loop

Here’s how Tallyfy coordinates Claude’s computer use through an iterative loop - Claude perceives, acts, and gets feedback until your task is done.

What to notice:

Tallyfy triggers your intermediary app via webhook with task data
The loop between Claude and the sandbox continues until the task is done
All tool execution happens in an isolated sandbox for security
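
The loop described above can be sketched roughly as follows. This is a minimal illustration, not Anthropic's or Tallyfy's implementation: `create_message` and `execute_tool` are hypothetical callables you would supply (the first wraps the Claude API call, the second runs a tool call inside your sandbox). Only the `tool_use` and `tool_result` block shapes come from the Claude API.

```python
from typing import Any, Callable

Message = dict[str, Any]


def run_agent_loop(
    create_message: Callable[[list[Message]], Message],
    execute_tool: Callable[[str, dict], list[dict]],
    task_prompt: str,
    max_turns: int = 20,
) -> list[Message]:
    """Drive the perceive-act-feedback loop until Claude stops requesting tools.

    create_message: hypothetical wrapper around the Claude API call. It takes
        the conversation so far and returns a dict with a "content" list.
    execute_tool: hypothetical function that runs one tool call in the sandbox
        and returns content blocks (text or a screenshot) for the tool_result.
    """
    messages: list[Message] = [{"role": "user", "content": task_prompt}]
    for _ in range(max_turns):  # hard cap so a confused agent cannot loop forever
        reply = create_message(messages)
        messages.append({"role": "assistant", "content": reply["content"]})

        tool_calls = [b for b in reply["content"] if b.get("type") == "tool_use"]
        if not tool_calls:
            break  # no tool requests left: Claude considers the task finished

        results = [
            {
                "type": "tool_result",
                "tool_use_id": call["id"],
                "content": execute_tool(call["name"], call["input"]),
            }
            for call in tool_calls
        ]
        # Tool results go back to Claude as the next user turn.
        messages.append({"role": "user", "content": results})
    return messages
```

The `max_turns` cap is a design choice worth keeping: it bounds both cost and runtime when a task goes wrong.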

Core components

Sandboxed environment: The Docker container typically includes:

A virtual X11 display server (like Xvfb) for rendering the desktop
A lightweight Linux desktop environment
Pre-installed apps (Firefox, LibreOffice, text editors)
Your implementations of Anthropic’s defined tools

Three core tools (Anthropic-defined, you execute them):

computer: Mouse/keyboard actions (clicks, typing, scrolling, cursor movement) and taking screenshots
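text_editor: View, create, and edit files (the name the API expects depends on the tool version, for example str_replace_editor for text_editor_20250124)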
bash: Run shell commands in the sandbox
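
Because you execute the tools yourself, your sandbox needs a small translation layer from Claude's tool calls to real commands. The sketch below assumes an X11 sandbox with `xdotool` installed and covers only a handful of `computer` actions. The function names are hypothetical, and the text editor tool is left out for brevity.

```python
def xdotool_command(tool_input: dict) -> list[str]:
    """Translate one `computer` tool action into an xdotool argv list.

    Handles only mouse_move, left_click, type, and key. A real implementation
    also needs screenshots, scrolling, dragging, and coordinate validation.
    """
    action = tool_input["action"]
    coordinate = tool_input.get("coordinate")
    move = ["mousemove", str(coordinate[0]), str(coordinate[1])] if coordinate else []

    if action == "mouse_move":
        if not move:
            raise ValueError("mouse_move needs a coordinate")
        return ["xdotool", *move]
    if action == "left_click":
        # Move first when a coordinate is given, then press mouse button 1.
        return ["xdotool", *move, "click", "1"]
    if action == "type":
        return ["xdotool", "type", "--", tool_input["text"]]
    if action == "key":
        return ["xdotool", "key", "--", tool_input["text"]]
    raise ValueError(f"unsupported action: {action!r}")


def plan_tool_call(name: str, tool_input: dict) -> list[str]:
    """Pick the sandbox command for a tool call (text editor omitted here)."""
    if name == "computer":
        return xdotool_command(tool_input)
    if name == "bash":
        return ["bash", "-c", tool_input["command"]]
    raise ValueError(f"no handler for tool: {name!r}")
```

Returning an argv list rather than a shell string keeps typed text from being interpreted by a shell. Inside the container you would pass the list to something like `subprocess.run(argv, check=True)` with `DISPLAY` pointing at the virtual display.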

Pricing

API pricing (verify current rates at Anthropic’s pricing page ↗):

Claude Sonnet 4.6: $3 per million input tokens, $15 per million output tokens
Claude Haiku 4.5: $1 per million input tokens, $5 per million output tokens
Computer use adds extra input tokens to every request: a system prompt overhead, the tool definitions, and each screenshot (billed as image input)
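
To get a feel for what a task might cost, you can multiply token counts by the rates above. The sketch below hard-codes the listed prices, so verify them before relying on the output. The dictionary keys are plain labels rather than API model IDs, and the token counts in the usage note are made-up illustrative numbers.

```python
# USD per million tokens, copied from the rates listed above. Verify before use.
RATES = {
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "haiku-4.5": {"input": 1.00, "output": 5.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given total token counts."""
    rate = RATES[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000
```

For a hypothetical task that consumes 120,000 input tokens and 6,000 output tokens in total, `estimate_cost("sonnet-4.6", 120_000, 6_000)` comes to about $0.45 and the same task on `"haiku-4.5"` to about $0.15. Input usually dominates because each turn resends the conversation history, including earlier screenshots.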

Access requirements:

Anthropic API key with sufficient credits
Available through Anthropic API, Amazon Bedrock, or Google Cloud Vertex AI
Docker needed for the reference implementation

Real-world use cases

Computer Use works well for specific automation scenarios. Early adopters include Asana, Canva, Replit, and DoorDash.

Good applications:

Form filling across desktop apps
Extracting data from legacy systems without APIs
QA testing with synthetic test case generation
Multi-step workflows spanning multiple applications
Desktop file management tasks

Current limitations

Claude’s computer use is still developing. Anthropic acknowledges these constraints:

Technical:

Latency: Tasks with dozens or hundreds of steps can be slow
Error-prone: Scrolling, dragging, and zooming remain challenging
Resolution: Accuracy may drop above roughly 1024x768 or 1280x800, because larger screenshots get downscaled and small UI details are lost
Reliability: Some actions people do effortlessly are still hard for Claude
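
One common way to work within the resolution limit is to send Claude a downscaled screenshot and map the coordinates it returns back to real screen pixels. The helpers below are a minimal sketch of that idea with hypothetical function names. The 1280x800 cap is taken from the limit mentioned above.

```python
def downscale_size(
    screen_size: tuple[int, int], max_width: int = 1280, max_height: int = 800
) -> tuple[int, int]:
    """Largest size that fits within the cap while keeping the aspect ratio."""
    width, height = screen_size
    scale = min(max_width / width, max_height / height, 1.0)  # never upscale
    return round(width * scale), round(height * scale)


def scale_to_screen(
    x: int, y: int, shot_size: tuple[int, int], screen_size: tuple[int, int]
) -> tuple[int, int]:
    """Map a point on the downscaled screenshot back to real screen pixels."""
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    return round(x * screen_w / shot_w), round(y * screen_h / shot_h)
```

For a 2560x1600 display, `downscale_size((2560, 1600))` gives `(1280, 800)`, and a click Claude places at `(640, 400)` on that screenshot maps back to `(1280, 800)` on the real screen. The `display_width_px` and `display_height_px` you declare to the API should match the screenshot size you actually send.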

Safety:

Claude may follow instructions found on-screen, even if they conflict with yours
Risk of prompt injection from webpages or images
Potential for misuse if not properly isolated

Rate limits:

API rate limits apply based on your tier
Processing time varies with task complexity
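
When a request hits a rate limit mid-task, retrying with exponential backoff usually beats failing the whole Tallyfy task. The sketch below is generic: `RateLimited` is a hypothetical stand-in for whatever rate-limit exception your API client raises, and the delays are illustrative. The official SDK may already retry some errors on its own, so check its settings before layering this on top.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for your API client's rate-limit error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller route this to a human
            # Waits of roughly 1s, 2s, 4s, ... with a little random jitter.
            sleep(base_delay * 2**attempt + random.uniform(0, 0.1))
    raise RuntimeError("max_attempts must be at least 1")
```

The `sleep` parameter exists so the helper can be tested without real waiting.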

Getting started

You’ll need to build an intermediary app that connects Tallyfy to the Anthropic API. Anthropic provides a reference implementation with Docker.

Get Anthropic API access:
- Get an API key from the Anthropic Console
- Review the API docs on “Tool Use” and “Computer Use”
Install Docker:
- Install the latest version of Docker
- Required for the sandboxed environment
Pull the reference implementation:
- Anthropic provides a Docker-based reference implementation with containerized environment, tool implementations, and agent loop
- Pull: docker pull ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
Configure the environment:
- Run the Docker container with proper security settings
- Container runs with minimal privileges (1 CPU, 2GB RAM default)
- Access the interface at http://localhost:8080
- Never run Computer Use unattended - always monitor
Build the intermediary app:
- Receive webhook requests from Tallyfy
- Build prompts and tool lists for the Claude API
- Manage the agent loop between Claude and your sandbox
- Send results back to Tallyfy
Prompt tips:
- Keep tasks simple and well-defined
- Tell Claude to verify outcomes with screenshots after each step
- Suggest keyboard shortcuts for tricky UI elements
- Provide examples of successful interactions when you have them
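
As a rough sketch of the prompt-building part of the intermediary app, the function below turns a webhook body into a task prompt and bakes in the prompt tips above. The payload field names (`task`, `title`, `description`) are assumptions made for illustration. Check what your Tallyfy webhook actually sends before using them.

```python
import json

# Standing instructions taken from the prompt tips above.
VERIFY_HINT = (
    "After each step, take a screenshot and confirm the step succeeded before "
    "moving on. Prefer keyboard shortcuts when a UI element is hard to click."
)


def build_prompt(webhook_body: bytes) -> str:
    """Turn a Tallyfy webhook payload into a task prompt for Claude.

    The field names used here are hypothetical, not Tallyfy's documented schema.
    """
    payload = json.loads(webhook_body)
    task = payload["task"]
    return (
        f"Task: {task['title']}\n\n"
        f"Instructions:\n{task['description']}\n\n"
        f"{VERIFY_HINT}"
    )
```

Keeping the verification hint in one constant means every task gets the same guardrails, and you can refine the wording in one place as you learn what works.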

Basic integration example

A simplified example of the integration flow:

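import os

from anthropic import Anthropic

# Reads the API key from the environment; raises KeyError if it is missing.
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Basic computer use request. Computer use is a beta feature, so the call goes
# through client.beta.messages and names the beta flag for the tool version.
# Tool versions and beta flags are tied to the model; check Anthropic's docs
# for the ones your model supports.
response = client.beta.messages.create(
    model="claude-sonnet-4-6-20260220",
    max_tokens=1024,
    betas=["computer-use-2025-01-24"],
    tools=[
        {
            "type": "computer_20250124",
            "name": "computer",
            "display_width_px": 1024,
            "display_height_px": 768,
        },
        {
            # The 20250124 text editor must be named "str_replace_editor".
            "type": "text_editor_20250124",
            "name": "str_replace_editor",
        },
        {
            "type": "bash_20250124",
            "name": "bash",
        },
    ],
    messages=[
        {
            "role": "user",
            "content": "Open the file manager and navigate to the Documents folder",
        }
    ],
)

# Claude replies with tool_use blocks describing the actions it wants to take.
tool_calls = [block for block in response.content if block.type == "tool_use"]
for call in tool_calls:
    print(call.name, call.input)

# Next steps (not shown):
# - Execute each tool call in your sandbox
# - Return the results to Claude as tool_result blocks
# - Continue the loop until the response contains no tool_use blocks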

Note: This is simplified. Real implementations need full agent loop handling, tool execution in a Docker sandbox, and result processing.
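
For the result-processing part, each tool call needs a matching `tool_result` block sent back in the next user message. The helpers below sketch the two cases you will hit most often: returning a screenshot and reporting a failure. The function names are hypothetical, and the block layout follows the Claude Messages API format for tool results and base64 images.

```python
import base64


def screenshot_result(tool_use_id: str, png_bytes: bytes) -> dict:
    """Wrap a PNG screenshot as a tool_result block for the next user turn."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": base64.b64encode(png_bytes).decode("ascii"),
                },
            }
        ],
    }


def error_result(tool_use_id: str, message: str) -> dict:
    """Tell Claude a tool call failed so it can try a different approach."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": message,
        "is_error": True,
    }
```

Reporting failures with `is_error` instead of raising lets Claude see what went wrong and adjust, which matters for the error-prone interactions described earlier.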

Security best practices

Key measures:

Run Computer Use in a dedicated container or VM with minimal privileges
Limit internet access to approved domains only
Never give access to sensitive data or credentials
Keep Claude isolated from production systems
Require human confirmation for critical actions
Enable audit logging
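
Two of these measures, the domain allowlist and human confirmation, can be expressed as simple checks in the intermediary app. The sketch below uses hypothetical names and placeholder values. Treat it as a second line of defense only, since real enforcement belongs at the network layer (a proxy or firewall rules on the container).

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"example.com"}  # hypothetical allowlist
CRITICAL_WORDS = ("delete", "pay", "send")  # hypothetical confirmation triggers


def is_url_allowed(url: str, allowed: set[str] = ALLOWED_DOMAINS) -> bool:
    """True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in allowed)


def needs_confirmation(
    action_summary: str, words: tuple[str, ...] = CRITICAL_WORDS
) -> bool:
    """Flag actions whose description mentions a critical operation.

    Crude substring matching: expect false positives and tune the word list.
    """
    text = action_summary.lower()
    return any(word in text for word in words)
```

Matching on the parsed hostname rather than the raw string is deliberate: `https://example.com.evil.io/` contains an allowed domain as text but fails the check.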

Known risks:

Prompt injection - Claude may follow on-screen instructions
Code execution risks if not properly sandboxed
Information theft if given access to sensitive data

When to use it

Good fit:

Desktop app automation (Excel, legacy software)
Data extraction from systems without APIs
Automated testing of desktop apps
Form filling across multiple apps
Low-risk, repetitive UI tasks

Poor fit:

Real-time or time-critical operations
Tasks needing creative judgment
Social media content creation (restricted by Anthropic)
High-security environments without proper isolation

Tips for success:

Start simple and well-defined
Set strong security boundaries
Monitor closely and keep humans in the loop
Test with low-risk data first

Compared to alternatives

Advantages:

Works with any desktop or web app
No app-specific APIs or integrations needed
Adapts when UIs change

Disadvantages:

Slower than traditional RPA for simple tasks
Still experimental, so execution can be error-prone
Requires Docker and sandbox infrastructure
Higher latency than direct API calls

Alternatives to consider:

Traditional RPA for stable, high-volume workflows
Direct API integrations when available
Browser-only automation tools for web tasks

Getting started checklist

Identify repetitive desktop tasks worth automating
Document exact steps with screenshots
Set up Anthropic API access with credits
Install Docker and pull the reference implementation
Create a Tallyfy process with clear task instructions
Test with low-risk, non-sensitive data first
Set up security isolation and monitoring
Refine prompts based on success rates
Scale gradually with proven workflows
