
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks

🌐 Homepage | 🗃️ arXiv | 📃 PDF | 💻 Code | 🤗 Data

Zhenhailong Wang¹†, Haiyang Xu²†, Junyang Wang², Xi Zhang², Ming Yan², Ji Zhang², Fei Huang², Heng Ji¹†

{wangz3, hengji}@illinois.edu, [email protected]

¹University of Illinois Urbana-Champaign  ²Alibaba Group
†Corresponding author


💻 Environment Setup

❗We tested exclusively on Android OS. Mobile-Agent-E does not support iOS at this time.

❗All experiments were conducted on a Samsung Galaxy A15 device; performance may vary on other devices. We encourage users to customize the initial tips for their own device and tasks.

Installation

conda create -n mobile_agent_e python=3.10 -y
conda activate mobile_agent_e
pip install -r requirements.txt
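
As a quick sanity check (illustrative, not part of the official instructions), confirm the environment is active and on Python 3.10:

    # Verify the conda environment (expects Python 3.10.x)
    conda activate mobile_agent_e
    python --version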

Preparation for Connecting a Mobile Device with ADB

  1. Download the Android Debug Bridge (ADB).
  2. Turn on USB debugging on your Android phone; it is located in Developer Options, which must be enabled first.
  3. Connect your phone to the computer with a data cable and select "Transfer files".
  4. Test your ADB environment as follows: /path/to/adb devices. If the connected device is listed, the preparation is complete.
  5. If you are using macOS or Linux, make sure adb is executable: sudo chmod +x /path/to/adb
  6. If you are using Windows, the path will be xx/xx/adb.exe
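
Putting the steps together, a minimal verification might look like this (the path below is a common install location, not a requirement; adjust it for your system):

    # Example only: point ADB_PATH at wherever you installed ADB
    export ADB_PATH="$HOME/Android/Sdk/platform-tools/adb"
    sudo chmod +x "$ADB_PATH"      # macOS/Linux only
    "$ADB_PATH" devices            # your phone should appear with state "device"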

Install the ADB Keyboard on your Mobile Device

  1. Download the ADB keyboard apk installation package.
  2. Open the apk on your mobile device to install it.
  3. Switch the default input method in the system settings to "ADB Keyboard".
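
If you prefer the command line, the keyboard can also be installed and activated over ADB. The sketch below assumes the downloaded file is named ADBKeyboard.apk and uses the package id of the common ADBKeyBoard project; verify both against your download:

    # Assumed file name and package id; check them against your apk
    "$ADB_PATH" install ADBKeyboard.apk
    "$ADB_PATH" shell ime enable com.android.adbkeyboard/.AdbIME
    "$ADB_PATH" shell ime set com.android.adbkeyboard/.AdbIME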

Agent Configs

Please refer to the # Edit your Setting # section in inference_agent_E.py for all options for customizing your agent. You can modify the macros directly or control some of them via environment variables as follows:

  1. ADB Path

    export ADB_PATH="your/path/to/adb"
    
  2. Backbone model and API keys: you can choose from OpenAI, Gemini, and Claude. Set the corresponding key as follows:

    export BACKBONE_TYPE="OpenAI"
    export OPENAI_API_KEY="your-openai-key"
    
    export BACKBONE_TYPE="Gemini"
    export GEMINI_API_KEY="your-gemini-key"
    
    export BACKBONE_TYPE="Claude"
    export CLAUDE_API_KEY="your-claude-key"
    
  3. Perceptor: By default, the icon captioning model (CAPTION_MODEL) in Perceptor uses "qwen-vl-plus" from Qwen API:

    • Follow this to get a Qwen API key
    • Set the Qwen API key:
      export QWEN_API_KEY="your-qwen-api-key"
      
    • You can set CAPTION_MODEL in inference_agent_E.py to "qwen-vl-max" for better perception performance, at a higher price.
    • If your machine is equipped with a high-performance GPU, you can also host the icon captioning model locally: (1) set CAPTION_CALL_METHOD to "local"; (2) set CAPTION_MODEL to "qwen-vl-chat" or "qwen-vl-chat-int4", depending on your GPU spec.
  4. Customize initial tips: You can tailor the tips to suit your specific device and needs by modifying INIT_TIPS in inference_agent_E.py. An example of customized tips for Chinese apps such as Xiaohongshu and Taobao is provided in data/custom_tips_example_for_cn_apps.txt.
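
For reference, a complete configuration using the OpenAI backbone and the default Qwen-based Perceptor might look like the following (all values are placeholders):

    # Illustrative summary of the environment variables described above
    export ADB_PATH="your/path/to/adb"
    export BACKBONE_TYPE="OpenAI"            # or "Gemini" / "Claude"
    export OPENAI_API_KEY="your-openai-key"  # key matching the chosen backbone
    export QWEN_API_KEY="your-qwen-api-key"  # for the default icon captioning model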

🚀 Quick Start

The agent can be run in either the individual setting (performing a standalone task) or the evolution setting (performing a sequence of tasks with self-evolution). We provide example shell scripts for both:

  • Run on a standalone task:

    bash scripts/run_task.sh
    
  • Run on a sequence of tasks with self-evolution. This script loads a toy example JSON file from data/custom_tasks_example.json.

    bash scripts/run_tasks_evolution.sh
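
To run your own task sequence, one plausible workflow (not prescribed by the repo) is to copy the toy file and point the script at your copy:

    # Hypothetical workflow: duplicate the toy task list, then edit it for your own tasks
    cp data/custom_tasks_example.json data/my_tasks.json
    # update the task-file path referenced in scripts/run_tasks_evolution.sh accordingly
    bash scripts/run_tasks_evolution.sh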
    

🤗 Mobile-Eval-E Benchmark

The proposed Mobile-Eval-E benchmark can be found in data/Mobile-Eval-E and also on Hugging Face Datasets.

📚 Citation

@article{wang2025mobile,
  title={Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks},
  author={Wang, Zhenhailong and Xu, Haiyang and Wang, Junyang and Zhang, Xi and Yan, Ming and Zhang, Ji and Huang, Fei and Ji, Heng},
  journal={arXiv preprint arXiv:2501.11733},
  year={2025}
}