
update eval #450

Merged — 8 commits merged into main from hengguo/update_eval on Mar 6, 2025
Conversation

@n1ck-guo (Contributor) commented Mar 3, 2025

No description provided.

Signed-off-by: n1ck-guo <[email protected]>

PR Overview

This PR updates the evaluation-related functionality and argument parsing in the project. Key changes include:

  • Modifying argument aliases and default values for tasks and evaluation batch size.
  • Refactoring and enhancing exception handling in evaluation functions to better manage out-of-memory errors (a rough sketch of this kind of check follows this list).
  • Minor adjustments in utility functions and autoround module to improve error detection.
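
As a rough illustration of the out-of-memory handling mentioned above, here is a minimal sketch; the helper name, call site, and fallback are assumptions for illustration, not the actual auto_round code:

```python
import torch

def looks_like_oom(err: Exception) -> bool:
    """Hypothetical helper: treat CUDA OOM and generic 'out of memory' RuntimeErrors alike."""
    if isinstance(err, torch.cuda.OutOfMemoryError):
        return True
    return isinstance(err, RuntimeError) and "out of memory" in str(err).lower()

try:
    cache_inter_data(model, inputs)  # placeholder for the intermediate-data caching step
except (RuntimeError, torch.cuda.OutOfMemoryError) as e:
    if not looks_like_oom(e):
        raise
    torch.cuda.empty_cache()
    # fall back to a lower-memory path (e.g. cache on CPU) here
```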

Reviewed Changes

| File | Description |
| --- | --- |
| auto_round/script/llm.py | Updated argument names/aliases and refactored evaluation calls to use eval_sequence consistently. |
| auto_round/utils.py | Removed an extra blank line; no functional changes. |
| auto_round/__main__.py | Adjusted evaluation call parameters per the updated interface. |
| auto_round/autoround.py | Enhanced error handling when caching intermediate data by extending the memory error check. |

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

auto_round/script/llm.py:201

  • The introduction of both '--tasks' and '--task' may lead to confusion regarding which alias to use, especially when the default value implies a list. Consider standardizing on a single, consistent alias (e.g., '--tasks') throughout the code.
self.add_argument(
            "--tasks",
            "--task",

auto_round/script/llm.py:209

  • Renaming the evaluation batch size argument from '--eval_bs' to '--batch_size' might introduce ambiguity with training batch size parameters defined elsewhere. Consider using distinct names for evaluation and training batch sizes to avoid potential misconfiguration.
self.add_argument("--batch_size", "--eval_bs", "--bs", default=8, type=int, help="batch size in evaluation")
batch_size = 8
if not isinstance(model, str):
    parallelism = False
hflm = HFLM(
Contributor:

why not just call simple_eval_with_user_model

Contributor Author:

Trying to use this func to replace the whole eval folder.
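
For reference, here is a minimal sketch of evaluating an in-memory model directly through lm-eval, which is roughly what the HFLM snippet above is building toward; the variable names and task list are placeholders, and the actual wiring in auto_round may differ:

```python
import lm_eval
from lm_eval.models.huggingface import HFLM

# Wrap an already-loaded Hugging Face model and tokenizer (placeholders) for lm-eval.
hflm = HFLM(pretrained=user_model, tokenizer=tokenizer, batch_size=8)

# Run the requested tasks through lm-eval's standard entry point.
results = lm_eval.simple_evaluate(model=hflm, tasks=["lambada_openai", "hellaswag"])
print(results["results"])
```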

"--native_eval",
"--native",
action="store_true",
help="use the native lm_eval instead of eval task by task.")
Contributor:

use the native as default.

Contributor Author:

done

self.add_argument(
    "--disable_trust_remote_code", action='store_true', help="whether to disable trust_remote_code")
self.add_argument("--eval_bs", "--bs", "--batch_size", default=None, type=int, help="batch size in evaluation")
self.add_argument("--eval_bs", "--bs", "--batch_size", default=8, type=int, help="batch size in evaluation")
Contributor:

why change the default value

Contributor Author:

Default is set to None, and if bs is None, it will use auto.
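
A small sketch of the behavior described here, for illustration only (the actual resolution in the script may differ):

```python
# If no eval batch size is given, fall back to lm-eval's automatic batch sizing.
if args.eval_bs is None:
    args.eval_bs = "auto"
```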

@@ -580,14 +587,23 @@ def tune(args):

if args.eval_bs is None or args.eval_bs == "auto":
    args.eval_bs = 16
Contributor:

Could we set it to auto bs for the task-by-task scenario?

Contributor Author:

done
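
Illustratively, the task-by-task path could then pass the automatic batch size straight through; this is a sketch of one way to do it, not necessarily the change that was made:

```python
# Sketch: let lm-eval pick the batch size per task when none is specified.
batch_size = args.eval_bs if args.eval_bs is not None else "auto"
eval_task_by_task(user_model, device=device_str, tasks=args.tasks, batch_size=batch_size)
```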

Signed-off-by: n1ck-guo <[email protected]>
if args.eval_task_by_task:
    eval_task_by_task(user_model, device=device_str, tasks=args.tasks, batch_size=args.eval_bs)
else:
    if args.eval_bs is None:
Contributor:

Better double-check that it works well. I remember it had some issues.

Contributor Author:

Checked. Works well.

@wenhuach21 self-requested a review on March 6, 2025 at 06:28
@n1ck-guo merged commit d51bd5e into main on Mar 6, 2025
8 checks passed
@n1ck-guo deleted the hengguo/update_eval branch on March 6, 2025 at 07:15