🌐 Homepage • 🗃️ arXiv • 📃 PDF • 💻 Code • 🤗 Data
❗We tested exclusively on Android OS. Mobile-Agent-E does not support iOS at this time.
❗All experiments were done on a Samsung Galaxy A15 device; performance may vary on other devices. We encourage users to customize the initial tips for their own device and tasks.
```bash
conda create -n mobile_agent_e python=3.10 -y
conda activate mobile_agent_e
pip install -r requirements.txt
```
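After installing, a quick sanity check like the one below (a minimal sketch; it only verifies that the environment activates and that the installed packages have consistent dependencies) can save debugging time later:

```bash
# Confirm the environment is active and dependencies are consistent.
conda activate mobile_agent_e
python --version   # should report Python 3.10.x
pip check          # reports any broken or conflicting requirements
```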
- Download the Android Debug Bridge (ADB).
- Turn on ADB debugging on your Android phone; it must first be enabled in Developer Options.
- Connect your phone to the computer with a data cable and select "Transfer files".
- Test your ADB environment as follows:

  ```bash
  /path/to/adb devices
  ```
- If the connected device is displayed, the preparation is complete (see the sample output after this list).
- If you are using a Mac or Linux system, make sure to grant adb execute permission as follows:

  ```bash
  sudo chmod +x /path/to/adb
  ```
- If you are using a Windows system, the path will look like `xx/xx/adb.exe`.
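For reference, a successful `adb devices` call prints output like the snippet below; the serial number shown is purely illustrative:

```bash
$ /path/to/adb devices
List of devices attached
R58M1234ABC	device
```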
- Download the ADB Keyboard APK installation package.
- Open the APK on your mobile device to install it.
- Switch the default input method in the system settings to "ADB Keyboard".
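Alternatively, the installation and input-method switch can be scripted over adb. The sketch below assumes the widely used ADBKeyBoard build, whose IME id is `com.android.adbkeyboard/.AdbIME`; verify the id on your device with `adb shell ime list -a`:

```bash
# Install the downloaded APK and switch the active input method to ADB Keyboard.
/path/to/adb install ADBKeyboard.apk
/path/to/adb shell ime enable com.android.adbkeyboard/.AdbIME
/path/to/adb shell ime set com.android.adbkeyboard/.AdbIME
```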
Please refer to the `# Edit your Setting #` section in `inference_agent_E.py` for all configs for customizing your agent. You can directly modify the macros, or control some of them by setting environment variables as follows (a combined example is given after this list):
- ADB path:

  ```bash
  export ADB_PATH="your/path/to/adb"
  ```
- Backbone model and API keys: you can choose from OpenAI, Gemini, and Claude. Set the corresponding keys as follows:

  ```bash
  export BACKBONE_TYPE="OpenAI"
  export OPENAI_API_KEY="your-openai-key"

  export BACKBONE_TYPE="Gemini"
  export GEMINI_API_KEY="your-gemini-key"

  export BACKBONE_TYPE="Claude"
  export CLAUDE_API_KEY="your-claude-key"
  ```
- Perceptor: by default, the icon captioning model (`CAPTION_MODEL`) in the Perceptor uses "qwen-vl-plus" from the Qwen API.
  - Follow this to get a Qwen API key.
  - Set the Qwen API key:

    ```bash
    export QWEN_API_KEY="your-qwen-api-key"
    ```

  - You can set `CAPTION_MODEL` in `inference_agent_E.py` to "qwen-vl-max" for better perception performance at a higher price.
  - If your machine is equipped with a high-performance GPU, you can also host the icon captioning model locally: (1) set `CAPTION_CALL_METHOD` to "local"; (2) set `CAPTION_MODEL` to "qwen-vl-chat" or "qwen-vl-chat-int4" depending on the GPU spec.
- Customize initial tips: you can tailor the tips for the agent to suit your specific device and needs. To do so, modify `INIT_TIPS` in `inference_agent_E.py`. An example of customized tips for Chinese apps such as Xiaohongshu and Taobao is provided in `data/custom_tips_example_for_cn_apps.txt`.
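Putting the pieces together, a typical shell setup before launching the agent might look like the sketch below; it assumes the OpenAI backbone and the default API-based Perceptor, so substitute your own paths and keys:

```bash
# Example configuration: OpenAI backbone + Qwen-API icon captioning.
export ADB_PATH="your/path/to/adb"
export BACKBONE_TYPE="OpenAI"
export OPENAI_API_KEY="your-openai-key"
export QWEN_API_KEY="your-qwen-api-key"
```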
The agent can be run in either the individual setting (performing a standalone task) or the evolution setting (performing a sequence of tasks with self-evolution). We provide example shell scripts as follows:
- Run on a standalone task:

  ```bash
  bash scripts/run_task.sh
  ```
- Run on a sequence of tasks with self-evolution. This script loads a toy example JSON file from `data/custom_tasks_example.json`:

  ```bash
  bash scripts/run_tasks_evolution.sh
  ```
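Before launching either script, it is worth re-checking that the phone is still visible to adb; a minimal sketch, assuming `ADB_PATH` has been exported as above:

```bash
# Sanity-check the device connection, then launch a standalone task.
"$ADB_PATH" devices   # the phone should be listed as "device", not "unauthorized"
bash scripts/run_task.sh
```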
The proposed Mobile-Eval-E benchmark can be found in `data/Mobile-Eval-E` and also on Hugging Face Datasets.
```bibtex
@article{wang2025mobile,
  title={Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks},
  author={Wang, Zhenhailong and Xu, Haiyang and Wang, Junyang and Zhang, Xi and Yan, Ming and Zhang, Ji and Huang, Fei and Ji, Heng},
  journal={arXiv preprint arXiv:2501.11733},
  year={2025}
}
```