Extend accuracy tests for models that we support #824
base: habana_main
Branch updated from 9385f80 to 9216947.
How much time do the new test suites add?
Time taken by the new test suites:
model_name: "/mnt/weka/data/pytorch/mistral/Mistral-7B-Instruct-v0.3"
@AnetaKaczynska I was more interested in the time taken by the test itself. Those measurements cover the whole suite, including environment preparation, which can vary for many reasons (and will also vary for the existing suites). The exact test time can be measured from the timestamp of the "============================= test session starts ==============================" line to the end of the test.
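The measurement suggested above (from the `test session starts` line to the end of the test) could be automated roughly as follows. This is a minimal sketch that assumes a hypothetical log format in which every line starts with an ISO-8601 timestamp; real CI logs may format or place timestamps differently. The end marker `PASSED MODEL` is taken from this thread, not from pytest itself, and the helper name `session_duration` and the sample log lines are illustrative only.

```python
from datetime import datetime

# Assumption: each log line looks like "<ISO-8601 timestamp> <message>",
# e.g. "2024-02-20T10:02:30 ===== test session starts =====".
SESSION_START = "test session starts"


def session_duration(log_lines, end_marker="PASSED MODEL"):
    """Return the seconds between the 'test session starts' line and the
    first later line containing end_marker, or None if either is missing."""
    start = end = None
    for line in log_lines:
        stamp, _, message = line.partition(" ")
        if start is None and SESSION_START in message:
            start = datetime.fromisoformat(stamp)
        elif start is not None and end_marker in message:
            end = datetime.fromisoformat(stamp)
            break
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


if __name__ == "__main__":
    # Hypothetical log: environment setup is excluded from the measurement.
    log = [
        "2024-02-20T10:00:00 setting up environment",
        "2024-02-20T10:02:30 ============ test session starts ============",
        "2024-02-20T10:04:21 PASSED MODEL",
    ]
    print(session_duration(log))  # 111.0
```

Anchoring the start at the session-start line rather than at the job start is what excludes the variable environment-preparation time.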
suite | model | time1 (pytest-reported) | time2 (session start to PASSED MODEL) |
---|---|---|---|
gsm8k_small_g3_tp1_part2 | granite-8b.yaml | 00:01:21 | 00:01:24 |
gsm8k_small_g3_tp1_part2 | granite-20b.yaml | 00:01:48 | 00:01:51 |
gsm8k_small_g3_tp1_part3 | Qwen2-7b-Instruct.yaml | 00:01:13 | 00:01:16 |
gsm8k_small_g3_tp1_part3 | Mistral-7B-Instruct-v0.3.yaml | 00:01:13 | 00:01:16 |
gsm8k_large_g3_tp2_part2 | Mixtral-8x7B-Instruct-v0.1.yaml | 00:01:39 | 00:06:29 |
gsm8k_small_g3_tp1_fp8 | granite-8b-fp8.yaml | 00:01:23 | 00:05:42 |
gsm8k_small_g3_tp1_fp8 | granite-20b-fp8.yaml | 00:01:51 | 00:05:38 |
For clarity, I gathered two values:
- time1: the test time as reported directly in the logs (e.g. `================== 1 passed, 3 warnings in 111.16s (0:01:51) ===================`)
- time2: the time between 'test session starts' and 'PASSED MODEL', which for the large and fp8 models is usually several minutes longer.
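Both values can be derived mechanically. The sketch below, using hypothetical helper names, reads the duration from pytest's final summary line (which ends with `in <seconds>s`) and computes time2 minus time1 for each row of the table, i.e. the time spent outside the pytest-reported test run. It assumes the `HH:MM:SS` format used in the table; the numbers are copied from it.

```python
import re

# pytest's final summary line contains "in <seconds>s", optionally followed
# by "(H:MM:SS)", e.g. "== 1 passed, 3 warnings in 111.16s (0:01:51) ==".
SUMMARY_RE = re.compile(r"\bin (\d+(?:\.\d+)?)s\b")


def summary_seconds(summary_line):
    """Extract the duration pytest reports in its summary line, or None."""
    match = SUMMARY_RE.search(summary_line)
    return float(match.group(1)) if match else None


def hms_to_seconds(hms):
    """Convert an 'HH:MM:SS' string (as used in the table) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (time1, time2) per model, copied from the table.
TABLE_TIMES = {
    "granite-8b": ("00:01:21", "00:01:24"),
    "granite-20b": ("00:01:48", "00:01:51"),
    "Qwen2-7b-Instruct": ("00:01:13", "00:01:16"),
    "Mistral-7B-Instruct-v0.3": ("00:01:13", "00:01:16"),
    "Mixtral-8x7B-Instruct-v0.1": ("00:01:39", "00:06:29"),
    "granite-8b-fp8": ("00:01:23", "00:05:42"),
    "granite-20b-fp8": ("00:01:51", "00:05:38"),
}

if __name__ == "__main__":
    line = "================== 1 passed, 3 warnings in 111.16s (0:01:51) ==================="
    print(summary_seconds(line))  # 111.16
    for model, (time1, time2) in TABLE_TIMES.items():
        overhead = hms_to_seconds(time2) - hms_to_seconds(time1)
        print(f"{model}: {overhead} s outside the pytest-reported time")
```

With these numbers, Mixtral-8x7B-Instruct-v0.1 spends 290 s (4:50) outside the reported test time and the fp8 models 259 s and 227 s, while the remaining models differ by only 3 s.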
This PR extends the dev Nightly test suite with accuracy tests for a broader range of models.