Join us on Slack

OpenAdapt: AI-First Process Automation with Transformers

Enormous volumes of mental labor are wasted on repetitive GUI workflows.

Foundation Models (e.g. GPT-4, ACT-1) are powerful automation tools.

OpenAdapt connects Foundation Models to GUIs:

(Slides)

Welcome to OpenAdapt! This Python library implements AI-First Process Automation with the power of Transformers by:

Recording screenshots and associated user input
Aggregating and visualizing user input and recordings for development
Converting screenshots and user input into tokenized format
Generating synthetic input via transformer model completions
Replaying synthetic input to complete tasks
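To make the conversion and generation steps more concrete, here is a minimal sketch of how recorded input might be flattened into text for a language model, and how a completion might be parsed back into events. The event fields, the `serialize_events` and `parse_completion` helpers, and the one-JSON-object-per-line format are all assumptions made for illustration; this README does not specify OpenAdapt's actual tokenized format, and no model is called here.

```python
import json

# Hypothetical event records; the field names are illustrative only.
recorded_events = [
    {"name": "click", "x": 100, "y": 200},
    {"name": "type", "text": "hello"},
]


def serialize_events(events):
    """Render each event as one compact JSON line that a text model can consume."""
    return "\n".join(json.dumps(event, sort_keys=True) for event in events)


def parse_completion(completion):
    """Parse a completion (one JSON object per line) back into event dicts.

    Lines that are not valid JSON objects are skipped rather than raising,
    because model output is not guaranteed to be well formed.
    """
    events = []
    for line in completion.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict):
            events.append(event)
    return events


prompt = serialize_events(recorded_events)

# Stand-in for a model completion; a real system would get this from a model.
fake_completion = '{"name": "click", "x": 120, "y": 220}\nnot json\n'
synthetic_events = parse_completion(fake_completion)
```

A line-based text format keeps the round trip simple: anything the parser accepts can be handed to a replay step, and malformed lines are dropped instead of aborting the run.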

The goal is similar to that of Robotic Process Automation, except that we use transformers instead of conventional RPA tools.

The direction is adjacent to Adept.ai, with some key differences:

OpenAdapt is model agnostic;
OpenAdapt generates prompts automatically (auto-prompted, not user-prompted);
OpenAdapt works with all types of desktop GUIs, including virtualized (e.g. Citrix) and web;
OpenAdapt is open source! (license TBD, please see #246)

Install

Recommended: Install with Poetry:

git clone https://github.com/MLDSAI/OpenAdapt.git
cd OpenAdapt
pip install poetry
poetry install
poetry shell
alembic upgrade head
pytest

Manual:

git clone https://github.com/MLDSAI/OpenAdapt.git
cd OpenAdapt
python3.10 -m venv .venv
source .venv/bin/activate
pip install wheel
pip install -r requirements.txt
pip install -e .
python -m spacy download en_core_web_trf
alembic upgrade head
pytest

Permissions

See how to set up system permissions on macOS here.

Run

Record

Create a new recording by running the following command:

python -m openadapt.record "testing out openadapt"

Wait until all three event writers have started:

| INFO     | __mp_main__:write_events:230 - event_type='screen' starting
| INFO     | __mp_main__:write_events:230 - event_type='action' starting
| INFO     | __mp_main__:write_events:230 - event_type='window' starting

Type a few words into the terminal and move your mouse around the screen to generate some events, then stop the recording by pressing CTRL+C.
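The log lines above suggest one writer per event type. As a rough mental model, the sketch below starts three writers, waits until each has announced itself, and shuts them down with a sentinel. It is not the project's implementation: the queue layout, the `started` flags, and the in-memory `written` list are assumptions, and threads are used for brevity even though the `__mp_main__` prefix in the log hints that the real writers run in separate processes.

```python
import queue
import threading

EVENT_TYPES = ("screen", "action", "window")


def write_events(event_type, event_q, started, written):
    """Drain one queue until a None sentinel arrives, collecting events."""
    print(f"event_type={event_type!r} starting")
    started.set()  # signal that this writer is ready to receive events
    while True:
        event = event_q.get()
        if event is None:  # sentinel: the recorder is shutting down
            break
        written.append((event_type, event))


queues = {t: queue.Queue() for t in EVENT_TYPES}
started = {t: threading.Event() for t in EVENT_TYPES}
written = []  # stand-in for a database (hypothetical)

writers = [
    threading.Thread(target=write_events, args=(t, queues[t], started[t], written))
    for t in EVENT_TYPES
]
for writer in writers:
    writer.start()

# Block until every writer has announced itself, mirroring the "wait" step.
all_started = all(started[t].wait(timeout=5) for t in EVENT_TYPES)

# Simulate one user action, then ask each writer to stop (as CTRL+C would).
queues["action"].put({"name": "click", "x": 10, "y": 20})
for t in EVENT_TYPES:
    queues[t].put(None)
for writer in writers:
    writer.join()
```

Waiting on the `started` flags before generating input is the point of the exercise: events produced before a writer is ready may never be captured.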

Current limitations:

Recordings should be short (i.e. under a minute), as they are somewhat memory intensive, and there is currently an open issue describing a possible memory leak.
The only touchpad and trackpad gestures currently supported are pointing the cursor and left or right clicking, as described in this open issue.

Visualize

Visualize the latest recording you created by running the following command:

python -m openadapt.visualize

This will open a visualization of the recording in your browser.

Playback

You can play back the recording using the following command:

python -m openadapt.replay NaiveReplayStrategy

More ReplayStrategies coming soon! (see Contributing).
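This README does not describe what `NaiveReplayStrategy` does internally. One plausible reading of "naive" is replaying the recorded actions unchanged and in order; the sketch below illustrates that idea only. `RecordedAction`, `replay_in_order`, and the `max_delay` cap are hypothetical names introduced here, not OpenAdapt's API.

```python
import time
from dataclasses import dataclass


@dataclass
class RecordedAction:
    """Hypothetical stand-in for a recorded action; timestamp is in seconds."""

    timestamp: float
    name: str
    payload: dict


def replay_in_order(actions, perform, sleep=time.sleep, max_delay=1.0):
    """Call perform(action) for each action in timestamp order.

    The recorded gap between consecutive actions is preserved, but capped
    at max_delay seconds so that long pauses do not stall the replay.
    """
    previous = None
    for action in sorted(actions, key=lambda a: a.timestamp):
        if previous is not None:
            delay = min(action.timestamp - previous.timestamp, max_delay)
            sleep(max(delay, 0.0))
        perform(action)
        previous = action


# Usage: record what would be performed instead of driving a real mouse/keyboard.
performed = []
delays = []
actions = [
    RecordedAction(2.5, "type", {"text": "hi"}),
    RecordedAction(0.0, "click", {"x": 10, "y": 20}),
]
replay_in_order(actions, performed.append, sleep=delays.append)
```

Passing `perform` and `sleep` as arguments keeps the sketch testable without touching input devices; a real strategy would plug in functions that actually move the mouse and press keys.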

Contributing

Problem Statement

Our goal is to automate the task described and demonstrated in a Recording. That is, given a new Screenshot, we want to generate the appropriate ActionEvent(s) based on the previously recorded ActionEvents in order to accomplish the task specified in the Recording.task_description, while accounting for differences in screen resolution, window size, application behavior, etc.

If it's not clear what ActionEvent is appropriate for the given Screenshot, (e.g. if the GUI application is behaving in a way we haven't seen before), we can ask the user to take over temporarily to demonstrate the appropriate course of action.
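As a small illustration of the two ideas above, the sketch below rescales a recorded click to a different screen resolution and returns `None` to signal that the user should take over. The proportional mapping, the `confidence` score, and the `threshold` value are assumptions chosen for illustration; the README does not say how OpenAdapt handles either case.

```python
def rescale_point(x, y, recorded_size, current_size):
    """Map a point from the recorded screen to the current screen proportionally."""
    rec_w, rec_h = recorded_size
    cur_w, cur_h = current_size
    if rec_w <= 0 or rec_h <= 0:
        raise ValueError("recorded screen dimensions must be positive")
    return round(x * cur_w / rec_w), round(y * cur_h / rec_h)


def next_click(recorded_click, recorded_size, current_size, confidence, threshold=0.8):
    """Return a rescaled click, or None to signal that the user should take over.

    `confidence` is a hypothetical score in [0, 1] describing how closely the
    current screenshot matches what was seen during recording.
    """
    if confidence < threshold:
        return None
    x, y = rescale_point(
        recorded_click["x"], recorded_click["y"], recorded_size, current_size
    )
    return {"name": "click", "x": x, "y": y}


# Usage: a click recorded at the centre of a 1920x1080 screen,
# replayed on a 1280x720 screen.
recorded = {"x": 960, "y": 540}
confident = next_click(recorded, (1920, 1080), (1280, 720), confidence=0.9)
unsure = next_click(recorded, (1920, 1080), (1280, 720), confidence=0.5)
```

Proportional scaling only covers whole-screen resolution changes; differences in window size or application behavior would need information from the window events and screenshots as well.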

Dataset

The dataset consists of the following entities:

Recording: Contains information about the screen dimensions, platform, and other metadata.
ActionEvent: Represents a user action event such as a mouse click or key press. Each ActionEvent has an associated Screenshot taken immediately before the event occurred. ActionEvents are aggregated to remove unnecessary events (see Visualize).
Screenshot: Contains the PNG data of a screenshot taken during the recording.
WindowEvent: Represents a window event such as a change in window title, position, or size.
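The relationships between these entities can be sketched with plain dataclasses. Only `Recording.task_description` is named in this README; every other field name below is an assumption chosen to match the descriptions above, and the project's real models may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Screenshot:
    png_data: bytes  # raw PNG bytes captured during the recording


@dataclass
class ActionEvent:
    name: str  # e.g. "click" or "press" (hypothetical values)
    timestamp: float
    screenshot: Optional[Screenshot] = None  # taken immediately before the event


@dataclass
class WindowEvent:
    timestamp: float
    title: str
    left: int
    top: int
    width: int
    height: int


@dataclass
class Recording:
    task_description: str
    platform: str
    monitor_width: int
    monitor_height: int
    action_events: List[ActionEvent] = field(default_factory=list)
    window_events: List[WindowEvent] = field(default_factory=list)


# Usage: one click, preceded by a screenshot, inside a single recording.
recording = Recording("doing taxes", "darwin", 1920, 1080)
shot = Screenshot(png_data=b"\x89PNG\r\n\x1a\n")  # PNG signature bytes only
recording.action_events.append(ActionEvent("click", 12.5, shot))
recording.window_events.append(WindowEvent(12.0, "Terminal", 0, 0, 800, 600))
```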

You can assume that you have access to the following functions:

create_recording("doing taxes"): Creates a recording.
get_latest_recording(): Gets the latest recording.
get_events(recording): Returns a list of ActionEvent objects for the given recording.
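To show how these three functions fit together, here is a minimal sketch using in-memory stand-ins with the same names. The bodies, the dictionary layout, and the module-level `_recordings` list are assumptions for illustration only; the project's own implementations are presumably backed by its database and will differ.

```python
import itertools

_recordings = []  # in-memory stand-in for persistent storage (hypothetical)
_ids = itertools.count(1)


def create_recording(task_description):
    """Create a recording, store it, and return it."""
    recording = {
        "id": next(_ids),
        "task_description": task_description,
        "events": [],
    }
    _recordings.append(recording)
    return recording


def get_latest_recording():
    """Return the most recently created recording, or None if there is none."""
    return _recordings[-1] if _recordings else None


def get_events(recording):
    """Return a copy of the recording's events so callers cannot mutate storage."""
    return list(recording["events"])


# Usage: create a recording, attach one event, then read it back.
recording = create_recording("doing taxes")
recording["events"].append({"name": "click", "x": 10, "y": 20})
latest = get_latest_recording()
events = get_events(latest)
```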

Instructions

Join us on Slack. Then:

Fork this repository and clone it to your local machine.
Get OpenAdapt up and running by following the instructions under Install.
Look through the list of open issues at https://github.com/MLDSAI/OpenAdapt/issues and once you find one you would like to address, indicate your interest with a comment.
Implement a solution to the issue you selected. Write unit tests for your implementation.
Submit a Pull Request (PR) to this repository. Note: submitting a PR before your implementation is complete (e.g. with high level documentation and/or implementation stubs) is encouraged, as it provides us with the opportunity to provide early feedback and iterate on the approach.

Evaluation Criteria

Your submission will be evaluated based on the following criteria:

Functionality: Your implementation should correctly generate the new ActionEvent objects that can be replayed in order to accomplish the task in the original recording.
Code Quality: Your code should be well-structured, clean, and easy to understand.
Scalability: Your solution should be efficient and scale well with large datasets.
Testing: Your tests should cover various edge cases and scenarios to ensure the correctness of your implementation.

Submission

Commit your changes to your forked repository.
Create a pull request to the original repository with your changes.
In your pull request, include a brief summary of your approach, any assumptions you made, and how you integrated external libraries.
Bonus: interacting with ChatGPT and/or other language transformer models in order to generate code and/or evaluate design decisions is encouraged. If you choose to do so, please include the full transcript.

We're hiring!

If you're interested in getting paid for your work, please mention it in your Pull Request.

Troubleshooting

macOS: if you encounter system alert messages or run into issues when making and replaying recordings, make sure to set up permissions accordingly.

In summary (from https://stackoverflow.com/a/69673312):

Settings -> Security & Privacy
Click on the Privacy tab
Scroll and click on the Accessibility Row
Click +
Navigate to /System/Applications/Utilities/ (or wherever Terminal.app is installed)
Click okay.

Developing

Generate migration (after editing a model)

alembic revision --autogenerate -m "<msg>"

Submitting an Issue

Please submit any issues to https://github.com/MLDSAI/OpenAdapt/issues with the following information:

Problem description (please include any relevant console output and/or screenshots)
Steps to reproduce (please help others to help you!)
