From 56218181ce729b9548f16cfe42373f91a0cb3c55 Mon Sep 17 00:00:00 2001 From: David Chang Date: Tue, 30 Apr 2024 14:52:37 +0800 Subject: [PATCH] ver Apr30th updated documents and version number --- Changelog | 17 +++++++++++------ README-zh.md | 12 ++++++++++-- README.md | 11 ++++++++++- docs/other-tools-en.md | 17 +++++++++++++---- docs/other-tools-zh.md | 6 +++--- docs/task-definition-en.md | 19 ++++++++++++------- docs/task-definition-zh.md | 6 +++--- pyproject.toml | 2 +- setup.py | 2 +- 9 files changed, 64 insertions(+), 28 deletions(-) diff --git a/Changelog b/Changelog index be709c8..4c89c2d 100644 --- a/Changelog +++ b/Changelog @@ -1,15 +1,17 @@ -2024-03-25 Danyang Zhang +2024-04-30 Danyang Zhang - Fixed bugs in VhIoWrapper w.r.t. null view hierarchy from OS + v3.6 - * android_env/wrappers/vh_io_wrapper.py + Updated documents. -2024-03-23 Danyang Zhang +2024-03-25 Danyang Zhang Fixed bugs - * android_env/environment.py - * demos/openmoneybox.add_billings.textproto + * (Bugs caused by new remote_path arg): android_env/environment.py + * (Typos, bugs w.r.t. new episode end event, removed reset steps which + cause AVD halting): demos/openmoneybox.add_billings.textproto + * (Bugs w.r.t. null VH from OS): android_env/wrappers/vh_io_wrapper.py Updated new types of ADB operations for SetupStep @@ -91,6 +93,9 @@ Fixed bugs w.r.t. 
ResponseEvent + Updated the episode end event slot so that it is triggered only when the + returned value is True + * android_env/proto/task.proto * android_env/components/event_listeners.py * android_env/components/task_manager.py diff --git a/README-zh.md b/README-zh.md index 018fc47..701271e 100644 --- a/README-zh.md +++ b/README-zh.md @@ -4,6 +4,13 @@ ## 最近更新 +* (2024-04-30 v3.6) + * 更新了加载远程模拟器的函数,用以为远程资源提供不同于本地任务定义文件所在目录的路径 + * 更新了任务模板工具,增加了新的槽位修饰符与任务配置文件语法 + * 修复了已知的问题 + +具体信息请查看[更新日志](Changelog)和相关文档。 + * (2023-12-18 v3.5) * 由于检查视图框架和屏幕图像耗时较长,因此更新了机制来更灵活地管理在什么时机检查视图框架与屏幕图像,以平衡对历程事件的充分检查的需求和所带来交互时延升高 * 为`ResponseEvent`(回复事件)添加了多种评分方式:正则匹配、模糊匹配、向量编码匹配 @@ -139,10 +146,11 @@ pip install . ```bibtex @article{DanyangZhang2023_MobileEnv, - title = {{Mobile-Env}: An Evaluation Platform and Benchmark for Interactive Agents in LLM Era}, + title = {{Mobile-Env}: An Evaluation Platform and Benchmark for LLM-GUI Interaction}, author = {Danyang Zhang and - Lu Chen and + Hongshen Xu and Zihan Zhao and + Lu Chen and Ruisheng Cao and Kai Yu}, journal = {CoRR}, diff --git a/README.md b/README.md index 1ed9472..beaa0bb 100644 --- a/README.md +++ b/README.md @@ -3,6 +3,14 @@ ## NEWS!! +* (2024-04-30 v3.6) + * Updated the function that loads a remote simulator so that the remote + resources can be given a path different from that of the local task definition + file. + * Updated the task template toolkit, adding new slot modifiers and new syntax for + the task config file. + * Fixed known bugs. + * (2023-12-18 v3.5) * Owing to the long time delay of VH check and screenshot check, we updated the mechanism of managing the check time. 
By this way, the requirement of @@ -217,8 +225,9 @@ the following BibTeX: @article{DanyangZhang2023_MobileEnv, title = {{Mobile-Env}: An Evaluation Platform and Benchmark for LLM-GUI Interaction}, author = {Danyang Zhang and - Lu Chen and + Hongshen Xu and Zihan Zhao and + Lu Chen and Ruisheng Cao and Kai Yu}, journal = {CoRR}, diff --git a/docs/other-tools-en.md b/docs/other-tools-en.md index e01fedc..caff392 100644 --- a/docs/other-tools-en.md +++ b/docs/other-tools-en.md @@ -207,15 +207,24 @@ keywords: bake,lobster,tails ``` As for the config file of task token combination `.task`, each line -in the file should specify a file name of the task token config like: +in the file should specify a file name of the task token config and an optional +combination option like: ``` search-lobster -access_author-Bob +access_author-Bob sr ``` -The instances will be combined in order during instantiation and become the -small steps of the final large multi-step task. +Currently, there are two combination options: `s` and `r`. If `s` is specified, +the setup steps (`setup_steps`) of the current task token will be appended to +the final task definition. Similarly, if `r` is specified, the reset steps +(`reset_steps`) of the current task token will be appended. If no combination +option is specified, no setup or reset steps are appended, *i.e.*, only +the steps added for the preceding task tokens are preserved. By default, the +setup and reset steps of the first task token will always be preserved in the +final combined task definition. The instances will be combined in the +declaration order during instantiation and become the small steps of the final +large multi-step task. 
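The combination rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the toolkit's actual implementation: `TaskToken`, `parse_combination_line`, and `combine_steps` are hypothetical names, steps are modelled as plain strings, and a config name is assumed to contain no whitespace.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskToken:
    """Hypothetical stand-in for one instantiated task token."""
    name: str
    setup_steps: List[str] = field(default_factory=list)
    reset_steps: List[str] = field(default_factory=list)


def parse_combination_line(line: str) -> Tuple[str, str]:
    """Split one `.task` line into (config name, combination options)."""
    parts = line.split()
    options = parts[1] if len(parts) > 1 else ""
    return parts[0], options


def combine_steps(lines: List[str],
                  tokens: Dict[str, TaskToken]) -> Tuple[List[str], List[str]]:
    """Collect setup and reset steps in declaration order.

    The first task token always contributes both kinds of steps; a later
    token contributes setup steps only with `s` and reset steps only with `r`.
    """
    setup_steps: List[str] = []
    reset_steps: List[str] = []
    entries = [line for line in lines if line.strip()]  # ignore blank lines
    for index, line in enumerate(entries):
        name, options = parse_combination_line(line)
        token = tokens[name]
        is_first = index == 0
        if is_first or "s" in options:
            setup_steps.extend(token.setup_steps)
        if is_first or "r" in options:
            reset_steps.extend(token.reset_steps)
    return setup_steps, reset_steps
```

With the example file above (`search-lobster` followed by `access_author-Bob sr`), both tokens contribute their setup and reset steps; dropping `sr` from the second line keeps only the steps of `search-lobster`.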
##### The Syntax of the Modifiers diff --git a/docs/other-tools-zh.md b/docs/other-tools-zh.md index 869872d..73f9566 100644 --- a/docs/other-tools-zh.md +++ b/docs/other-tools-zh.md @@ -148,14 +148,14 @@ name: search-task keywords: bake,lobster,tails ``` -组合任务元的配置文件`.task`,每行指定一个任务元配置的文件名,如: +组合任务元的配置文件`.task`,每行指定一个任务元配置的文件名,以及一个可选的组合选项,如: ``` search-lobster -access_author-Bob +access_author-Bob sr ``` -实例化时,各实例会按序组合,成为最终的大的多步任务中的小步骤。 +目前有两种组合选项:`s`和`r`。若指定了`s`选项,则组合该任务元时会在最终任务定义的任务载入操作(`setup_steps`)中追加该任务元的载入操作;若指定了`r`选项,则会类似地追加该任务元的重置操作(`reset_steps`);若未指定任何组合选项,则不会加入该任务元的任何载入或重置操作,即,只保留前缀任务元已加入的那些操作。每个组合任务的首个任务元的载入与重置操作都会默认保留在最终的组合任务中。各任务元会在实例化时按声明顺序组合,成为最终的大的多步任务中的小步骤。 ##### 修饰符语法 diff --git a/docs/task-definition-en.md b/docs/task-definition-en.md index a3bd50f..12aae24 100644 --- a/docs/task-definition-en.md +++ b/docs/task-definition-en.md @@ -766,7 +766,10 @@ is the same with the screen text event sources. The response event source will react to the response from the agent to human user. If the response matches with the defined pattern, the event source will -be triggered and return a tuple comprising all the regex-captured groups. +be triggered. Mobile-Env supports several matching methods, namely regex +match, fuzzy match, and embedding match. If regex match is adopted, the source +will return a tuple comprising all the regex-captured groups; otherwise, it +will return the numeric match score. ##### The Event Slots @@ -789,8 +792,9 @@ The episode end event slot (`episode_end_listener`) indicated if the episode comes to the end and the platform will restart the task at the next step. This usually means that the agent has just achieved the task target. But it is also possible that several severe errors have occured and the system cannot resume -and has to restart. Only the triggering flag of this event slot makes sense and -it returns no further values to the agent. +and has to restart. 
~~Only the triggering flag of this event slot makes sense +and it returns no further values to the agent.~~ The episode will be restarted +only when the event slot is triggered and the returned value is `True`. The instruction event slot (`instruction_listener`) gives the agent the novel step supplementary instructions during the interaction. This slot accepts and @@ -876,10 +880,11 @@ The options of `event` is the aforementioned event sources: * `floating` - A floating reference + `log_event` - Matches the system log lines and requires two fields: - `filters` - An array of string for the log filters like `jd:D`. The system - logs are obtained by the command `adb logcat -v epoch FILTERS *:S`, where - `FILTERS` is all the filter names declared in the definition. All the - filters declared across the log event sources in the definition file will - be merged (with duplicates removed) before invoking the ADB command. + logs are obtained by the command [`adb logcat -v epoch FILTERS + *:S`](https://developer.android.com/tools/logcat), where `FILTERS` is all + the filter names declared in the definition. All the filters declared + across the log event sources in the definition file will be merged (with + duplicates removed) before invoking the ADB command. - `pattern` - The regex for the expected log line. + `response_event` - Matches the response to human user. - `mode` - Enum indicating the matching method. 
Valid options are: `REGEX`, diff --git a/docs/task-definition-zh.md b/docs/task-definition-zh.md index 2dcdcb0..69d51e2 100644 --- a/docs/task-definition-zh.md +++ b/docs/task-definition-zh.md @@ -580,7 +580,7 @@ $$ 日志事件源会监听系统运行日志,检查每一行是否匹配定义的正则表达式。若匹配,且满足触发条件,则事件会触发,并返回所有正则捕获组构成的元组,这与两类屏幕文本事件源相同。 -回复事件源会响应智能体给人类用户的回复。若回复匹配上了定义好的模式,则会触发该事件,并返回所有正则捕获组构成的元组。 +回复事件源会响应智能体给人类用户的回复。若回复匹配上了定义好的模式,则会触发该事件。本平台支持不同的匹配方式:正则匹配、模糊匹配、向量匹配。采用正则匹配方式时,会返回所有正则捕获组构成的元组;采用其他匹配方式(模糊匹配、向量匹配)时,则会返回匹配度数值。 ##### 事件槽 @@ -588,7 +588,7 @@ $$ 分数事件槽(`score_listener`)和回报事件槽(`reward_listener`),都用于解析要反馈给智能体的回报。不同之处在于,其对接收到的信号的解释:分数事件槽将接收到的信号解释为一个累积的分数,若有分数事件触发,则平台会将当前读到的新分数与上次触发时记录的分数作差,作为该步骤的单步回报;而回报事件槽则认为事件树呈递上来的信号就是单步回报,因此会直接返回。两个事件槽计算出的单步回报会在相加后,反馈给智能体。 -历程结束事件槽(`episode_end_listener`),用来指示交互到达了终点,平台会在下一步重启任务。这通常是由于智能体已达成了任务目标,但也可能是由于系统出现了难以恢复的错误而需要重启系统。历程终点事件槽,仅其触发与否的状态有意义,而不会给智能体反馈任何其他值。 +历程结束事件槽(`episode_end_listener`),用来指示交互到达了终点,平台会在下一步重启任务。这通常是由于智能体已达成了任务目标,但也可能是由于系统出现了难以恢复的错误而需要重启系统。~~历程终点事件槽,仅其触发与否的状态有意义,而不会给智能体反馈任何其他值,~~当且仅当历程结束事件槽触发且得到的值为`True`时,平台会重启任务。 指令事件槽(`instruction_listener`),用来定义任务进行中,某些关键步骤需要补充给智能体的新步骤指令。其要接收、返回的都是字符串列表,每个列表元素代表一行或一句指令。 @@ -626,7 +626,7 @@ $$ * `integer` - 提供整数参考值 * `floating` - 提供浮点数参考值 + `log_event` - 识别系统日志中的行,提供两个字段: - - `filters` - 字符串数组,提供要使用的过滤器,如`jd:D`;系统日志是由`adb logcat -v epoch FILTERS *:S`命令获得的,其中`FILTERS`为任务定义中指定的所有过滤器;该定义文件中所有的日志事件源中声明了的过滤器,会去重后混合在一起用于调用该ADB命令 + - `filters` - 字符串数组,提供要使用的过滤器,如`jd:D`;系统日志是由[`adb logcat -v epoch FILTERS *:S`](https://developer.android.com/tools/logcat)命令获得的,其中`FILTERS`为任务定义中指定的所有过滤器;该定义文件中所有的日志事件源中声明了的过滤器,会去重后混合在一起用于调用该ADB命令 - `pattern` - 提供要识别的日志行的正则表达式 + `response_event` - 识别智能体给人类用户的回复 - `mode` - 枚举值,指定匹配事件的方式,可选`REGEX`、`DIFFLIB`、`FUZZ`、`SBERT`。`REGEX`为采用正则匹配,`DIFFLIB`为采用`difflib`做模糊匹配,`FUZZ`为采用`rapidfuzz`库做模糊匹配,`SBERT`为采用`sentence-tranformers`库计算嵌入向量匹配(`FUZZ`模式下的匹配度范围为0~100) diff --git a/pyproject.toml b/pyproject.toml index 7c8d55f..a4df3d0 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ 
build-backend = "setuptools.build_meta" [project] name = "mobile-env-rl" -version = "3.5" +version = "3.6" authors = [{name = "Danyang Zhang @X-Lance", email = "zdy004007@126.com"}] license = {file = "LICENSE"} description = "A Universal Platform for Training and Evaluation of Mobile Interaction" diff --git a/setup.py b/setup.py index 7b3f575..59a59cd 100644 --- a/setup.py +++ b/setup.py @@ -112,7 +112,7 @@ def run(self): setup( name='mobile-env-rl', - version='3.5', + version='3.6', description='Mobile-Env: A Universal Platform for Training and Evaluation of Mobile Interaction', long_description=description, author='Danyang Zhang @X-Lance',