Skip to content

elyase/screenium

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🎯 Screenium

Screenium is like Selenium after discovering there's a whole world outside the browser. Find and click any text on your screen using OCR. That "Accept Cookies" popup for the millionth time? Banish it.

Demo

🔍 Installation

pip install screenium

or

uv add screenium

✨ Use Cases

  • Auto-Accept Cookies: Run a background task that automatically clicks "Accept Cookies" popups
  • Meeting Assistant: Auto-join video calls by detecting "Join Meeting" buttons and automatically enable/disable your camera/mic based on meeting names
  • Game Automation: Automate repetitive game tasks by detecting and clicking on in-game text (e.g., "Collect Rewards", "Start Battle")
  • System Dialog Monitor: Handling system or application dialogs, like "Are you sure?" or "Run command," in Cursor based on set rules.
  • Multi-Browser Testing: Perform UI tests on browsers that lack automation APIs or web drivers with no separate web driver installations needed.

🔍 Finding Text

The most basic way to find text is:

from screenium import Text

# Simple text matching
text = Text("Login")  # Finds "Login" anywhere on screen
text = Text("LOG")    # Matches the first of "LOGIN", "LOGOUT", etc.

# Exact matching
text = Text("Login", exact_match=True)  # Only matches "Login" exactly
text = Text("login", case_sensitive=True)  # Case-sensitive matching

📍 Spatial Relationships

You can find text based on its position relative to other text:

# Basic relationships
Text("Password").below("Username")  # Find "Password" below "Username"
Text("Cancel").left_of("Submit")    # Find "Cancel" left of "Submit"
Text("Help").right_of("Back")       # Find "Help" right of "Back"
Text("Title").above("Content")      # Find "Title" above "Content"

# Chain relationships
Text("Save").below("Options").right_of("Cancel")

📐 Aligned Elements

For more precise positioning, use aligned to require elements to line up:

# Elements must be vertically aligned (same x-axis)
Text("Email").aligned.below("Username")

# Elements must be horizontally aligned (same y-axis)
Text("Back").aligned.left_of("Next")

🎨 Background Color Matching

Find text by its background color:

# Match text with specific background colors
Text("Error", background_color="red")
Text("Success", background_color="green")
Text("Info", background_color="#0088ff")

# Adjust color matching tolerance (0-100)
Text("Warning", background_color="yellow", background_tolerance=50)

🖱️ Mouse Actions

Interact with matched text:

# Basic mouse actions
Text("Submit").click()
Text("Options").right_click()
Text("Link").double_click()
Text("Button").mouse_move()  # Just move mouse without clicking

# Type text
Text("Username").click().typewrite("myuser")
Text("Password").click().typewrite("mypass")

# Special keys
Text("Terminal").click().typewrite(["command", "k"])  # Clear terminal

⏰ Waiting and Monitoring

Handle timing and background monitoring:

# Wait for specific duration
Text("Loading").wait(2).click()  # Wait 2 seconds then click

🔍 Inspecting Matches

Get detailed information about matches:

# Get all matches
matches = Text("Button").matches
for match in matches:
    print(f"Found at: ({match['x']}, {match['y']})")
    print(f"Size: {match['width']}x{match['height']}")
    print(f"Confidence: {match['confidence']}")

🎨 Visual Debugging

Draw boxes around matches to debug:

# Draw green box for 3 seconds
Text("Username").draw(duration=3, color="green")

# Chain with other operations
Text("Password").below("Username").draw(duration=2, color="red").click()

🎯 Tips & Best Practices

  1. Start with basic text matching before adding spatial relationships
  2. Use draw() to visually verify matches
  3. Adjust background color tolerance if color matching is too strict/loose
  4. Use aligned when elements should line up precisely
  5. Chain operations for more precise matching

⚠️ Limitations

  • macOS only (uses Apple Vision framework)
  • Requires macOS Monterey (12.x) or higher
  • Screen resolution and scaling can affect matching

🔐 Required macOS Permissions

Screenium needs two key macOS permissions to function. The app running your Python script (e.g., Terminal, VS Code, PyCharm) will need:

  1. Screen Recording

    • Open System Settings > Privacy & Security > Screen Recording
    • Toggle on your terminal app/IDE
    • Required for:
      • Finding text on screen using OCR
      • Background monitoring
      • Visual debugging with draw()
  2. Accessibility

    • Open System Settings > Privacy & Security > Accessibility
    • Toggle on your terminal app/IDE
    • Required for:
      • Mouse actions (click(), right_click(), etc.)
      • Keyboard input (typewrite(), hotkey())

💡 Tip: If running from different apps, each one needs separate permissions. For example, if you run scripts from both Terminal and VS Code, you'll need to grant permissions to both.


About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages