A Node.js/TypeScript port of Anthropic's official Python computer-use demo. This implementation provides a complete TypeScript version of Claude's computer control capabilities, allowing Claude to interact with your computer through mouse movements, keyboard input, and screen captures.
This project converts Anthropic's Python implementation to TypeScript, preserving its core functionality and adding some TypeScript-specific enhancements. It enables Claude to:
- Control your computer's mouse and keyboard
- Capture and analyze screenshots
- Manage windows and applications
- Execute system commands
It suits developers who prefer Node.js/TypeScript or who want to integrate Claude's computer control capabilities into TypeScript projects.
- 🖱️ Mouse Control
  - Movement and clicks
  - Dragging and scrolling
  - Position tracking
  - Multiple button support
- ⌨️ Keyboard Actions
  - Key press and release
  - Text typing
  - Modifier key combinations
  - Multiple key sequences
- 🪟 Window Management
  - Focus control
  - Move and resize
  - Minimize/maximize
  - Cross-platform support
- 📸 Screen Capture
  - High-quality screenshots
  - Automatic compression
  - Organized storage
  - Metadata tracking
Create a .env file in the root directory:
# Required
ANTHROPIC_API_KEY=sk-ant-xxxx # Your Anthropic API key
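As a rough illustration of how a .env file like the one above can be read and checked, here is a minimal sketch. The helpers `parseEnv` and `requireApiKey` are hypothetical names introduced for this example and are not part of this project; how the project actually loads its environment is not shown here.

```typescript
// Hypothetical helper: parse the text of a .env file into key/value pairs.
// Handles blank lines, full-line "#" comments, and trailing " # ..." comments.
function parseEnv(text: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue; // not a KEY=value line
    const key = line.slice(0, eq).trim();
    // Drop an inline comment introduced by whitespace followed by '#'.
    const value = line.slice(eq + 1).replace(/\s+#.*$/, '').trim();
    result[key] = value;
  }
  return result;
}

// Hypothetical helper: fail early with a clear message if the key is missing.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env['ANTHROPIC_API_KEY'];
  if (!key) {
    throw new Error('ANTHROPIC_API_KEY is not set; add it to your .env file');
  }
  return key;
}
```

In a Node.js process, `requireApiKey(process.env)` would perform the same check against the live environment.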
# Install dependencies
pnpm install
# Build the project
pnpm run build
# Run example
pnpm run test:basic
import { ComputerTool } from './src/tools/computer';

const tool = new ComputerTool();

// Move mouse
await tool.execute({
  action: 'mouse_move',
  coordinate: [100, 100]
});

// Type text
await tool.execute({
  action: 'type',
  text: 'Hello, World!'
});

// Take screenshot
await tool.execute({
  action: 'screenshot'
});

// Mouse scroll with direction
await tool.execute({
  action: 'mouse_scroll',
  scrollAmount: 5,
  direction: 'down'
});

// Key combination
await tool.execute({
  action: 'key',
  text: 'Control+C'
});

// Window management
await tool.execute({
  action: 'focus_window',
  windowTitle: 'Chrome'
});
- mouse_move: Move cursor to coordinates
- left_click, right_click, middle_click: Mouse clicks
- left_click_drag: Click and drag
- mouse_scroll: Scroll in any direction
- mouse_toggle: Press/release mouse buttons
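To make the `scrollAmount` and `direction` parameters of `mouse_scroll` concrete, the sketch below converts them into signed horizontal and vertical deltas. The function `scrollDelta`, the direction names other than 'down', and the sign convention (positive `dy` means down, positive `dx` means right) are assumptions made for illustration; the project's actual mapping may differ.

```typescript
// Assumed set of directions; only 'down' appears in the usage example.
type ScrollDirection = 'up' | 'down' | 'left' | 'right';

// Hypothetical helper: turn an amount plus a direction into signed deltas.
// Assumed convention: positive dy scrolls down, positive dx scrolls right.
function scrollDelta(
  scrollAmount: number,
  direction: ScrollDirection
): { dx: number; dy: number } {
  const amount = Math.abs(scrollAmount); // direction alone decides the sign
  switch (direction) {
    case 'up':
      return { dx: 0, dy: -amount };
    case 'down':
      return { dx: 0, dy: amount };
    case 'left':
      return { dx: -amount, dy: 0 };
    case 'right':
      return { dx: amount, dy: 0 };
  }
}
```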
- key: Single key or combination press
- type: Type text string
- key_toggle: Press/release keys
- key_tap_multiple: Repeat key taps
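A combination string such as 'Control+C' has to be split into modifiers and a final key before it can be sent to the operating system. The following is a minimal sketch of that step; `parseKeyCombo` and `KeyCombo` are hypothetical names, and lowercasing the parts is an assumption (it matches the lowercase key names robotjs expects, but the project's own parsing is not shown here).

```typescript
// Hypothetical shape for a parsed combination.
interface KeyCombo {
  modifiers: string[]; // e.g. ['control', 'shift']
  key: string;         // the final, non-modifier key, e.g. 'c'
}

// Hypothetical helper: split "Mod+Mod+Key" on '+' and lowercase each part.
// Limitation of this sketch: the literal '+' key cannot be expressed.
function parseKeyCombo(text: string): KeyCombo {
  const parts = text
    .split('+')
    .map((part) => part.trim().toLowerCase())
    .filter((part) => part.length > 0);
  const key = parts.pop(); // the last part is the key itself
  if (key === undefined) {
    throw new Error(`Empty key combination: "${text}"`);
  }
  return { modifiers: parts, key };
}
```

For the usage example's `'Control+C'`, this yields one modifier (`control`) and the key `c`; a single key such as `'Enter'` yields no modifiers.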
- focus_window: Activate window
- move_window: Change window position
- resize_window: Adjust window size
- minimize_window, maximize_window: Window state
- screenshot: Capture screen
- cursor_position: Get current cursor location
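The action names listed above can be captured as a TypeScript union so that unknown action strings are rejected before they reach the tool. This is a sketch only: the constant names, the `Action` type, and the `isAction` guard are hypothetical and not taken from the project's source, which may type its actions differently.

```typescript
// Action names as listed above, grouped by category.
const MOUSE_ACTIONS = [
  'mouse_move', 'left_click', 'right_click', 'middle_click',
  'left_click_drag', 'mouse_scroll', 'mouse_toggle',
] as const;
const KEYBOARD_ACTIONS = ['key', 'type', 'key_toggle', 'key_tap_multiple'] as const;
const WINDOW_ACTIONS = [
  'focus_window', 'move_window', 'resize_window',
  'minimize_window', 'maximize_window',
] as const;
const SCREEN_ACTIONS = ['screenshot', 'cursor_position'] as const;

// Hypothetical union of every supported action name.
type Action =
  | (typeof MOUSE_ACTIONS)[number]
  | (typeof KEYBOARD_ACTIONS)[number]
  | (typeof WINDOW_ACTIONS)[number]
  | (typeof SCREEN_ACTIONS)[number];

const ALL_ACTIONS: readonly string[] = [
  ...MOUSE_ACTIONS, ...KEYBOARD_ACTIONS, ...WINDOW_ACTIONS, ...SCREEN_ACTIONS,
];

// Hypothetical type guard: narrows an arbitrary string to a known action.
function isAction(value: string): value is Action {
  return ALL_ACTIONS.indexOf(value) !== -1;
}
```

A guard like this is useful when the action name arrives as untyped JSON, for example from a model response.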
Screenshots are automatically organized:
screenshots/
├── metadata.json
└── YYYY/MM/DD/
    ├── screenshot-{timestamp}-original.png
    └── screenshot-{timestamp}-compressed.[png|jpg]
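The layout above can be expressed as a small path-building function. In this sketch, `screenshotPaths` is a hypothetical helper, and two details are assumptions because the layout does not specify them: the date folders use UTC, and `{timestamp}` is the capture time in milliseconds since the Unix epoch.

```typescript
// Hypothetical helper: build the original/compressed paths for one capture.
// Assumptions: UTC date folders; {timestamp} is epoch milliseconds.
function screenshotPaths(
  taken: Date,
  compressedExt: 'png' | 'jpg'
): { original: string; compressed: string } {
  const pad2 = (n: number): string => (n < 10 ? '0' + n : String(n));
  const yyyy = String(taken.getUTCFullYear());
  const mm = pad2(taken.getUTCMonth() + 1); // getUTCMonth() is zero-based
  const dd = pad2(taken.getUTCDate());
  const timestamp = taken.getTime();
  const dir = `screenshots/${yyyy}/${mm}/${dd}`;
  return {
    original: `${dir}/screenshot-${timestamp}-original.png`,
    compressed: `${dir}/screenshot-${timestamp}-compressed.${compressedExt}`,
  };
}
```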
Key settings can be modified in constants:
const TIMING = {
  TYPING_DELAY_MS: 12,
  SCREENSHOT_DELAY_MS: 2000,
  RETRY_DELAY_MS: 500
};
const MAX_IMAGE_SIZE = 5 * 1024 * 1024; // 5MB
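To show how these settings might be consumed, the sketch below repeats the constants and adds two hypothetical helpers. It assumes `TYPING_DELAY_MS` is applied once per typed character and that an image needs compression only when it is strictly larger than `MAX_IMAGE_SIZE`; neither detail is confirmed by the settings themselves.

```typescript
const TIMING = {
  TYPING_DELAY_MS: 12,
  SCREENSHOT_DELAY_MS: 2000,
  RETRY_DELAY_MS: 500,
};
const MAX_IMAGE_SIZE = 5 * 1024 * 1024; // 5MB

// Hypothetical helper: rough time to type a string,
// assuming one TYPING_DELAY_MS pause per character.
function estimateTypingMs(text: string): number {
  return text.length * TIMING.TYPING_DELAY_MS;
}

// Hypothetical helper: decide whether a captured image exceeds the size cap.
// Assumption: an image of exactly MAX_IMAGE_SIZE bytes is still acceptable.
function needsCompression(byteLength: number): boolean {
  return byteLength > MAX_IMAGE_SIZE;
}
```

Under the per-character assumption, typing the 13-character string 'Hello, World!' would take about 156 ms.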
- Node.js (v16+)
- TypeScript
- Dependencies:
  - robotjs
  - screenshot-desktop
  - sharp
- Relevant system libraries
# Debian/Ubuntu
sudo apt-get install -y \
  libxtst-dev \
  libpng-dev \
  libxss-dev \
  xvfb
# macOS (Homebrew)
brew install opencv@4
brew install cairo pango
- Requires windows-build-tools:
npm install --global windows-build-tools
License: MIT
- Fork the repo
- Create feature branch
- Commit changes
- Push to branch
- Create Pull Request