update eval #450
Conversation
Signed-off-by: n1ck-guo <[email protected]>
PR Overview
This PR updates the evaluation-related functionality and argument parsing in the project. Key changes include:
- Modifying argument aliases and default values for tasks and evaluation batch size.
- Refactoring and enhancing exception handling in evaluation functions to better manage out-of-memory errors (see the sketch after this list).
- Minor adjustments in utility functions and the autoround module to improve error detection.
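As a rough illustration of what such out-of-memory handling can look like, here is a minimal, self-contained sketch of a memory-error check; the helper name `is_oom_error` and the exact message substrings are assumptions for illustration, not the PR's actual code, and it assumes PyTorch >= 1.13 where `torch.cuda.OutOfMemoryError` exists:

```python
import torch

def is_oom_error(exc: Exception) -> bool:
    """Heuristically decide whether an exception signals an out-of-memory
    condition (hypothetical helper, in the spirit of the PR's extended check)."""
    if isinstance(exc, torch.cuda.OutOfMemoryError):
        return True
    # Some backends only surface OOM through the exception message text.
    msg = str(exc).lower()
    return "out of memory" in msg or "cannot allocate memory" in msg
```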
Reviewed Changes
File | Description
---|---
auto_round/script/llm.py | Updated argument names/aliases and refactored evaluation calls to use eval_sequence consistently.
auto_round/utils.py | Removed an extra blank line; no functional changes.
auto_round/main.py | Adjusted evaluation call parameters to match the updated interface.
auto_round/autoround.py | Enhanced error handling when caching intermediate data by extending the memory error check.
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (2)
auto_round/script/llm.py:201
- The introduction of both '--tasks' and '--task' may lead to confusion regarding which alias to use, especially when the default value implies a list. Consider standardizing on a single, consistent alias (e.g., '--tasks') throughout the code.

```python
self.add_argument(
    "--tasks",
    "--task",
```
auto_round/script/llm.py:209
- Renaming the evaluation batch size argument from '--eval_bs' to '--batch_size' might introduce ambiguity with training batch size parameters defined elsewhere. Consider using distinct names for evaluation and training batch sizes to avoid potential misconfiguration.

```python
self.add_argument("--batch_size", "--eval_bs", "--bs", default=8, type=int, help="batch size in evaluation")
```
```python
batch_size = 8
if not isinstance(model, str):
    parallelism = False
hflm = HFLM(
```
Why not just call simple_eval_with_user_model?
Trying to use this function to replace the whole eval folder.
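For readers unfamiliar with the harness, here is a minimal sketch of driving lm-evaluation-harness through its HFLM wrapper, roughly the pattern the snippet above sets up; the model name and task list are placeholders, and this is not the PR's actual code:

```python
from lm_eval import simple_evaluate
from lm_eval.models.huggingface import HFLM
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM works here.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# Wrap the in-memory model so the harness can evaluate it directly.
hflm = HFLM(pretrained=model, tokenizer=tokenizer, batch_size=8)
results = simple_evaluate(model=hflm, tasks=["lambada_openai"])
print(results["results"])
```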
Signed-off-by: n1ck-guo <[email protected]>
Signed-off-by: n1ck-guo <[email protected]>
Signed-off-by: n1ck-guo <[email protected]>
auto_round/script/llm.py (outdated)
"--native_eval", | ||
"--native", | ||
action="store_true", | ||
help="use the native lm_eval instead of eval task by task.") |
Use the native lm_eval as the default.
done
Signed-off-by: n1ck-guo <[email protected]>
auto_round/script/llm.py (outdated)
```diff
 self.add_argument(
     "--disable_trust_remote_code", action='store_true', help="whether to disable trust_remote_code")
-self.add_argument("--eval_bs", "--bs", "--batch_size", default=None, type=int, help="batch size in evaluation")
+self.add_argument("--eval_bs", "--bs", "--batch_size", default=8, type=int, help="batch size in evaluation")
```
Why change the default value?
The default is now set to None, and if the batch size is None, "auto" will be used.
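A minimal sketch of what that resolution can look like (a hypothetical helper, assuming lm-eval's support for batch_size="auto"; not the PR's actual code):

```python
def resolve_eval_batch_size(eval_bs):
    """Hypothetical helper: map a missing batch size to lm-eval's 'auto' mode."""
    return eval_bs if eval_bs is not None else "auto"

assert resolve_eval_batch_size(None) == "auto"
assert resolve_eval_batch_size(16) == 16
```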
auto_round/script/llm.py (outdated)
```diff
@@ -580,14 +587,23 @@ def tune(args):
     if args.eval_bs is None or args.eval_bs == "auto":
         args.eval_bs = 16
```
Could we set it to auto batch size for the task-by-task scenario?
done
Signed-off-by: n1ck-guo <[email protected]>
```python
if args.eval_task_by_task:
    eval_task_by_task(user_model, device=device_str, tasks=args.tasks, batch_size=args.eval_bs)
else:
    if args.eval_bs is None:
```
Better to double-check that it works well; I remember it had some issues.
Checked. It works well.
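For illustration, here is a minimal self-contained sketch of evaluating task by task with an out-of-memory fallback, in the spirit of eval_task_by_task; the evaluate_one_task callable and the batch-halving policy are assumptions for illustration, not the project's implementation:

```python
from typing import Callable

def eval_task_by_task_sketch(tasks: list[str],
                             batch_size: int,
                             evaluate_one_task: Callable[[str, int], dict]) -> dict:
    """Run each task separately, halving the batch size on OOM (hypothetical)."""
    results = {}
    for task in tasks:
        bs = batch_size
        while True:
            try:
                results[task] = evaluate_one_task(task, bs)
                break
            except RuntimeError as exc:
                # Retry with a smaller batch only for memory errors.
                if "out of memory" in str(exc).lower() and bs > 1:
                    bs //= 2
                else:
                    raise
    return results
```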