
Support Online-Mind2Web task evaluation. #39


Open · Syclus123 wants to merge 1 commit into main

Conversation

Syclus123

This PR adds the following features:

  1. Support screenshots of the evaluation process
  2. Support Online-Mind2Web task evaluation
  3. Support access to gpt-4.1, o3-mini, o4-mini, and other models (see the token-parameter note below)
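
One practical detail behind item 3: OpenAI's o-series reasoning models (o1, o3-mini, o4-mini) reject the legacy `max_tokens` parameter and require `max_completion_tokens` instead, which is presumably what the token-parameter change in agent/LLM/openai.py addresses. A minimal sketch of model-dependent parameter selection follows; the helper name and the prefix check are assumptions for illustration, not the PR's actual code:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def build_token_kwargs(model: str, limit: int) -> dict:
    """Select the token-limit parameter the target model accepts.

    o-series reasoning models (o1, o3-mini, o4-mini, ...) reject
    max_tokens and require max_completion_tokens instead.
    """
    if model.startswith(("o1", "o3", "o4")):
        return {"max_completion_tokens": limit}
    return {"max_tokens": limit}


response = client.chat.completions.create(
    model="o4-mini",
    messages=[{"role": "user", "content": "Hello"}],
    **build_token_kwargs("o4-mini", 1024),
)
print(response.choices[0].message.content)
```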

Tip: To run in a Linux environment without a display server, install Xvfb and start the evaluation under it:

```bash
sudo yum install -y xorg-x11-server-Xvfb
xvfb-run python batch_eval.py
```
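
(The package name above is for RHEL/CentOS-family distributions; on Debian or Ubuntu the equivalent is `sudo apt-get install -y xvfb`.)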

Copilot AI review requested due to automatic review settings · May 20, 2025 13:02

Copilot AI left a comment


Pull Request Overview

This PR adds support for Online-Mind2Web task evaluation, introduces screenshot capture during task execution, and extends model support to include gpt-4.1, o3-mini, and o4-mini.

  • Added utility functions for JSON file operations (a minimal sketch follows this list).
  • Integrated log parsing and processing for Online-Mind2Web tasks.
  • Updated configuration and evaluation logic to support new task modes and enhanced API parameters.
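
The PR's actual implementation isn't shown in this summary, but JSON load/save utilities of this kind typically look like the following minimal sketch; the function names here are assumptions, not the identifiers used in utils/utils.py:

```python
import json
import os


def load_json(path):
    """Read and deserialize a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_json(data, path):
    """Serialize data to a JSON file, creating parent directories as needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=4)
```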

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated no comments.

Summary per file:

| File | Description |
| --- | --- |
| utils/utils.py | Added JSON load/save utilities. |
| utils/parser.py | Introduced a parser for log files with adjusted key names. |
| utils/log_processor.py | Improved log processing with task mapping support. |
| logs.py | Updated log folder path and directory creation. |
| evaluate/evaluate_utils.py | Enhanced evaluation utility with screenshot support and token count. |
| eval.py | Extended evaluation flow with new task and screenshot parameters. |
| data/Online-Mind2Web/* | Added Git LFS filters and updated README for Online-Mind2Web. |
| configs/setting.toml | Updated default task mode, max time step, and screenshot settings. |
| configs/log_config.json | Adjusted log and task mapping file paths. |
| batch_eval.py | Created batch evaluation script to run multiple tasks sequentially (sketched after this table). |
| agent/LLM/openai.py | Modified token parameters and added new API parameter support. |
| agent/LLM/llm_instance.py | Updated model detection logic for JSON mode instantiation. |
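
As a rough picture of what batch_eval.py does, a sequential runner might look like the sketch below; the `--task_id` flag and the task ID list are illustrative assumptions, not the PR's actual interface:

```python
import subprocess
import sys

# Illustrative task IDs; the real script presumably derives these
# from the Online-Mind2Web task mapping file in configs/.
TASK_IDS = ["task_001", "task_002", "task_003"]


def main() -> None:
    for task_id in TASK_IDS:
        # Run each evaluation in its own process so a single
        # failing task does not abort the whole batch.
        result = subprocess.run(
            [sys.executable, "eval.py", "--task_id", task_id]
        )
        if result.returncode != 0:
            print(f"Task {task_id} exited with code {result.returncode}")


if __name__ == "__main__":
    main()
```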
