feat(tools): add use_computer tool to Strands tools repository #114

Open
wants to merge 2 commits into
base: main

Conversation

jimbrub
Contributor

@jimbrub jimbrub commented Jul 2, 2025

Description

Key Features

  • This tool was tested on macOS, so some features may not work on Windows/Linux

  • Basic Computer Control:

    • Mouse operations (click, move, drag)
    • Keyboard input (typing, key combinations, hotkeys)
    • Scrolling (vertical and horizontal)
    • Application management (open, close, focus)
  • Screen Analysis:

    • Screenshot capture and analysis
    • OCR text detection using Pytesseract
    • Coordinate mapping for detected elements
    • Text grouping and line organization
    • Returns the image as bytes to the LLM
  • Cross-Platform Support:

    • Windows, macOS, and Linux compatibility
    • Platform-specific optimizations (e.g., native macOS double-click)
    • Adaptive application naming conventions
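
The text grouping and coordinate mapping listed under Screen Analysis could look roughly like the sketch below. It is not the code from this PR: the `Word` class, `group_into_lines`, `describe_lines`, and the `y_tolerance` value are hypothetical names chosen for illustration. The word boxes mirror the `left`, `top`, `width`, and `height` fields that `pytesseract.image_to_data` reports, but the example uses hand-written data so it runs without Tesseract.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word box (hypothetical container, not the PR's type)."""
    text: str
    left: int
    top: int
    width: int
    height: int

    @property
    def center(self):
        # Center of the bounding box, usable as a click target.
        return (self.left + self.width // 2, self.top + self.height // 2)


def group_into_lines(words, y_tolerance=10):
    """Group words whose vertical centers are within y_tolerance pixels."""
    lines = []
    for word in sorted(words, key=lambda w: (w.center[1], w.left)):
        if lines and abs(lines[-1][-1].center[1] - word.center[1]) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Order each line left to right so the joined text reads naturally.
    return [sorted(line, key=lambda w: w.left) for line in lines]


def describe_lines(lines):
    """Return the text and center point of each line's bounding box."""
    described = []
    for line in lines:
        left = min(w.left for w in line)
        right = max(w.left + w.width for w in line)
        top = min(w.top for w in line)
        bottom = max(w.top + w.height for w in line)
        described.append({
            "text": " ".join(w.text for w in line),
            "center": ((left + right) // 2, (top + bottom) // 2),
        })
    return described


words = [
    Word("File", 10, 10, 40, 20),
    Word("Edit", 60, 12, 40, 20),
    Word("Save", 10, 100, 50, 20),
]
lines = describe_lines(group_into_lines(words))
print(lines)
```

With the sample data, "File" and "Edit" end up on one line and "Save" on another, each with a single center point the model can click.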

Technical Implementation

  • Utilizes pyautogui for computer control
  • Integrates OpenCV and Pytesseract for image processing and text detection
  • Implements error handling and safety measures
  • Provides a coordinate system for precise interaction
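
As an illustration of the coordinate handling and safety checks mentioned above, here is a minimal sketch under stated assumptions. `to_screen_coords` and `validate_point` are hypothetical helpers, not functions from this PR. The region tuple follows the `(left, top, width, height)` convention that `pyautogui.screenshot(region=...)` accepts, and in real use the screen size would come from `pyautogui.size()`; it is passed in explicitly here so the example runs without a display.

```python
def to_screen_coords(local_x, local_y, region=None):
    """Translate a point inside a captured region to absolute screen coordinates."""
    if region is None:
        # Full-screen capture: image coordinates already match the screen.
        return local_x, local_y
    left, top, _width, _height = region
    return left + local_x, top + local_y


def validate_point(x, y, screen_width, screen_height):
    """Reject points outside the screen before any mouse action is attempted."""
    if not (0 <= x < screen_width and 0 <= y < screen_height):
        raise ValueError(
            f"Point ({x}, {y}) is outside the {screen_width}x{screen_height} screen"
        )
    return x, y


# A word found at (40, 15) inside a screenshot of region (100, 200, 300, 150).
point = validate_point(*to_screen_coords(40, 15, region=(100, 200, 300, 150)), 1920, 1080)
print(point)  # (140, 215)
```

pyautogui also ships its own fail-safe (`pyautogui.FAILSAFE`), which aborts when the mouse is moved to a screen corner; a bounds check like this one would complement it rather than replace it.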

Dependencies

  • opencv-python
  • pytesseract
  • pyautogui
  • psutil
  • numpy
  • Note: Tesseract OCR engine needs to be installed separately on the system for text detection functionality.

Install Tesseract OCR on macOS: brew install tesseract
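
Because the Tesseract engine is installed separately from the Python packages, a tool could check for it up front and fail with a clear message. The sketch below is one possible way to do that, not what this PR implements; `find_tesseract` is a hypothetical helper, and the `which` parameter exists only so the lookup can be swapped out in tests.

```python
import shutil


def find_tesseract(which=shutil.which):
    """Return the path to the tesseract binary, or raise with an install hint."""
    path = which("tesseract")
    if path is None:
        raise RuntimeError(
            "Tesseract OCR engine not found on PATH; "
            "on macOS, install it with: brew install tesseract"
        )
    return path


if __name__ == "__main__":
    try:
        print(f"Using Tesseract at {find_tesseract()}")
    except RuntimeError as error:
        print(error)
```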

Related Issues

[Link to related issues using #issue-number format]

Documentation PR

TBD

Type of Change

  • Bug fix
  • New Tool
  • Breaking change
  • Other (please describe):

Testing

[How have you tested the change?]

  • hatch fmt --linter
  • hatch fmt --formatter
  • hatch test --all

Checklist

  • I have read the CONTRIBUTING document

  • I have added tests that prove my fix is effective or my feature works

  • I have updated the documentation accordingly

  • I have added an appropriate example to the documentation to outline the feature

  • My changes generate no new warnings

  • Any dependent changes have been merged and published

  • By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@jimbrub jimbrub requested a review from a team as a code owner July 2, 2025 20:20
@@ -66,7 +67,7 @@ pip install strands-agents-tools
To install the dependencies for optional tools:

```bash
-pip install strands-agents-tools[mem0_memory, use_browser]
+pip install strands-agents-tools[mem0_memory, use_browser, use_computer]
```
Member

pip install strands-agents-tools[use_computer] isn't working for me

Contributor Author

Do you need to do something like the following? pip install "strands-agents-tools[use_computer]@file:///Volumes/workplace/tools/" Until the pyproject.toml file is published, I think we have to install them that way. I tested in a new environment, and there are a few more dependencies I need to add under use_computer in the pyproject.toml file.
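
One way to spot dependencies missing from a fresh environment is to probe for the modules before importing them. This is a rough sketch, not part of the PR: `missing_packages` and `USE_COMPUTER_MODULES` are hypothetical names, and the mapping covers only the packages listed in the description (`opencv-python` is imported as `cv2`).

```python
from importlib.util import find_spec

# Package name (as installed with pip) -> top-level module it provides.
USE_COMPUTER_MODULES = {
    "opencv-python": "cv2",
    "pytesseract": "pytesseract",
    "pyautogui": "pyautogui",
    "psutil": "psutil",
    "numpy": "numpy",
}


def missing_packages(modules=USE_COMPUTER_MODULES):
    """Return the pip package names whose modules cannot be found."""
    return [package for package, module in modules.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing optional dependencies:", ", ".join(missing))
    else:
        print("All use_computer dependencies are importable.")
```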

Member

@cagataycali cagataycali Jul 19, 2025

I tested this in depth; here's the agent code and conversation history: https://gist.github.com/cagataycali/c1af4f368b4159bc3a2474372c884c03

Here are some screenshots captured by use_computer:

[Screenshots: screenshot_20250718_202247, screenshot_20250718_203338]

@mehtarac mehtarac self-assigned this Jul 9, 2025
@mehtarac mehtarac requested a review from a team July 9, 2025 15:05
@jimbrub jimbrub force-pushed the use_computer branch 3 times, most recently from 40f4ca5 to 2dfe4e6 Compare July 17, 2025 21:31
cagataycali
cagataycali previously approved these changes Jul 19, 2025
    image_path = screenshot_path
    should_delete = False
else:
    image_path = create_screenshot(region)
Member

Are we returning the screenshot parameters back to the LLM? It seems they are not being returned to the model ^^
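
To make the question concrete, one way a tool could hand both the screenshot parameters and the image bytes back to the model is sketched below. This is an illustration only: `build_screenshot_result` is a hypothetical helper, and the result layout (a `status` plus a `content` list with `text` and `image` blocks) is an assumption about the tool result format, not something confirmed by this PR.

```python
import json


def build_screenshot_result(image_bytes, region=None, screenshot_path=None):
    """Bundle screenshot parameters and image bytes into one tool result."""
    # Parameters the model may need to interpret coordinates in the image.
    details = {"region": region, "screenshot_path": screenshot_path}
    return {
        "status": "success",
        "content": [
            {"text": f"Screenshot captured: {json.dumps(details)}"},
            # Assumed image block layout; adjust to the actual result schema.
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
        ],
    }


result = build_screenshot_result(b"\x89PNG", region=[0, 0, 100, 50])
print(result["content"][0]["text"])
```

Including the region in the text block lets the model map coordinates in the image back to the screen even when only part of the screen was captured.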

3 participants