
update eval #450

Merged — 8 commits merged into main from hengguo/update_eval on Mar 6, 2025
Conversation

@n1ck-guo (Contributor) commented Mar 3, 2025

No description provided.

Signed-off-by: n1ck-guo <[email protected]>

PR Overview

This PR updates the evaluation-related functionality and argument parsing in the project. Key changes include:

  • Modifying argument aliases and default values for tasks and evaluation batch size.
  • Refactoring and enhancing exception handling in evaluation functions to better manage out-of-memory errors (a rough sketch of this kind of check follows this list).
  • Minor adjustments in utility functions and autoround module to improve error detection.
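
As a rough illustration of the out-of-memory handling mentioned above, here is a minimal sketch; the helper name, call site, and fallback are assumptions for illustration, not the actual auto_round code:

```python
import torch

def looks_like_oom(err: Exception) -> bool:
    """Hypothetical helper: treat CUDA OOM and generic 'out of memory' RuntimeErrors alike."""
    if isinstance(err, torch.cuda.OutOfMemoryError):
        return True
    return isinstance(err, RuntimeError) and "out of memory" in str(err).lower()

try:
    cache_inter_data(model, inputs)  # placeholder for the intermediate-data caching step
except (RuntimeError, torch.cuda.OutOfMemoryError) as e:
    if not looks_like_oom(e):
        raise
    torch.cuda.empty_cache()
    # fall back to a lower-memory path (e.g. cache on CPU) here
```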

Reviewed Changes

| File | Description |
| --- | --- |
| auto_round/script/llm.py | Updated argument names/aliases and refactored evaluation calls to use eval_sequence consistently. |
| auto_round/utils.py | Removed an extra blank line; no functional changes. |
| auto_round/__main__.py | Adjusted evaluation call parameters per the updated interface. |
| auto_round/autoround.py | Enhanced error handling when caching intermediate data by extending the memory error check. |

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

auto_round/script/llm.py:201

  • The introduction of both '--tasks' and '--task' may lead to confusion regarding which alias to use, especially when the default value implies a list. Consider standardizing on a single, consistent alias (e.g., '--tasks') throughout the code.
self.add_argument(
            "--tasks",
            "--task",

auto_round/script/llm.py:209

  • Renaming the evaluation batch size argument from '--eval_bs' to '--batch_size' might introduce ambiguity with training batch size parameters defined elsewhere. Consider using distinct names for evaluation and training batch sizes to avoid potential misconfiguration.
self.add_argument("--batch_size", "--eval_bs", "--bs", default=8, type=int, help="batch size in evaluation")
batch_size = 8
if not isinstance(model, str):
    parallelism = False
hflm = HFLM(
Contributor:

why not just call simple_eval_with_user_model

Contributor Author:

Trying to use this func to replace the whole eval folder.
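
For reference, here is a minimal sketch of evaluating an in-memory model directly through lm-eval, which is roughly what the HFLM snippet above is building toward; the variable names and task list are placeholders, and the actual wiring in auto_round may differ:

```python
import lm_eval
from lm_eval.models.huggingface import HFLM

# Wrap an already-loaded Hugging Face model and tokenizer (placeholders) for lm-eval.
hflm = HFLM(pretrained=user_model, tokenizer=tokenizer, batch_size=8)

# Run the requested tasks through lm-eval's standard entry point.
results = lm_eval.simple_evaluate(model=hflm, tasks=["lambada_openai", "hellaswag"])
print(results["results"])
```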

"--native_eval",
"--native",
action="store_true",
help="use the native lm_eval instead of eval task by task.")
Contributor:

use the native as default.

Contributor Author:

done

self.add_argument(
    "--disable_trust_remote_code", action='store_true', help="whether to disable trust_remote_code")
self.add_argument("--eval_bs", "--bs", "--batch_size", default=None, type=int, help="batch size in evaluation")
self.add_argument("--eval_bs", "--bs", "--batch_size", default=8, type=int, help="batch size in evaluation")
Contributor:

why change the default value

Contributor Author:

Default is set to None, and if bs is None, it will use auto.
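
A small sketch of the behavior described here, for illustration only (the actual resolution in the script may differ):

```python
# If no eval batch size is given, fall back to lm-eval's automatic batch sizing.
if args.eval_bs is None:
    args.eval_bs = "auto"
```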

@@ -580,14 +587,23 @@ def tune(args):

if args.eval_bs is None or args.eval_bs == "auto":
    args.eval_bs = 16
Contributor:

Could we set it to auto bs for the task-by-task scenario?

Contributor Author:

done
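
Illustratively, the task-by-task path could then pass the automatic batch size straight through; this is a sketch of one way to do it, not necessarily the change that was made:

```python
# Sketch: let lm-eval pick the batch size per task when none is specified.
batch_size = args.eval_bs if args.eval_bs is not None else "auto"
eval_task_by_task(user_model, device=device_str, tasks=args.tasks, batch_size=batch_size)
```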

Signed-off-by: n1ck-guo <[email protected]>
if args.eval_task_by_task:
    eval_task_by_task(user_model, device=device_str, tasks=args.tasks, batch_size=args.eval_bs)
else:
    if args.eval_bs is None:
Contributor:

Better double-check that it works well. I remember it had some issues.

Contributor Author:

Checked. Works well.

@wenhuach21 self-requested a review on March 6, 2025 at 06:28
@n1ck-guo merged commit d51bd5e into main on Mar 6, 2025
8 checks passed
@n1ck-guo deleted the hengguo/update_eval branch on March 6, 2025 at 07:15