Extend accuracy tests for models that we support #824

Open
wants to merge 9 commits into
base: habana_main
AnetaKaczynska
This PR extends the dev nightly test suite with accuracy tests for a broader range of models.

@michalkuligowski michalkuligowski left a comment

How much additional time do the new test suites take?

@AnetaKaczynska
Author

AnetaKaczynska commented Feb 27, 2025

Time taken by the new test suites:

| name | time |
| --- | --- |
| gsm8k_small_g3_tp1_part2 | 6m 41s |
| gsm8k_small_g3_tp1_part3 | 9m 2s |
| gsm8k_large_g3_tp2_part2 | 11m 15s |
| gsm8k_small_g3_tp1_fp8 | 22m 26s |

@@ -0,0 +1,11 @@
model_name: "/mnt/weka/data/pytorch/mistral/Mistral-7B-Instruct-v0.3"

@AnetaKaczynska I was more interested in the time taken by the tests themselves. Those measurements cover the whole suite, including environment preparation (which can vary for many reasons, and will vary for the existing suites as well). The exact test time can be measured as the interval between the "============================= test session starts ==============================" line and the test finish.
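For illustration, the interval described above could be computed from a saved log roughly as follows. This is only a sketch under assumptions: it supposes every log line starts with a `YYYY-MM-DDTHH:MM:SS` timestamp (the actual CI log format is not shown in this thread), and the helper names `parse_timestamp` and `elapsed_between`, as well as the sample lines, are hypothetical.

```python
from datetime import datetime, timedelta


def parse_timestamp(line: str) -> datetime:
    # Assumption: each log line begins with "YYYY-MM-DDTHH:MM:SS".
    return datetime.strptime(line[:19], "%Y-%m-%dT%H:%M:%S")


def elapsed_between(lines, start_marker: str, end_marker: str) -> timedelta:
    """Time from the first line containing start_marker to the
    first later line containing end_marker."""
    start = None
    for line in lines:
        if start is None:
            if start_marker in line:
                start = parse_timestamp(line)
        elif end_marker in line:
            return parse_timestamp(line) - start
    raise ValueError("start or end marker not found in log")


# Hypothetical log excerpt; the timestamps and setup line are made up.
sample = [
    "2025-02-27T10:00:00 preparing environment",
    "2025-02-27T10:04:10 ===== test session starts =====",
    "2025-02-27T10:05:31 ===== 1 passed in 81.00s =====",
]
print(elapsed_between(sample, "test session starts", " passed"))  # 0:01:21
```

Environment preparation happens before the start marker, so it does not count toward the result.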

Author

| suite | model | time1 | time2 |
| --- | --- | --- | --- |
| gsm8k_small_g3_tp1_part2 | granite-8b.yaml | 00:01:21 | 00:01:24 |
| gsm8k_small_g3_tp1_part2 | granite-20b.yaml | 00:01:48 | 00:01:51 |
| gsm8k_small_g3_tp1_part3 | Qwen2-7b-Instruct.yaml | 00:01:13 | 00:01:16 |
| gsm8k_small_g3_tp1_part3 | Mistral-7B-Instruct-v0.3.yaml | 00:01:13 | 00:01:16 |
| gsm8k_large_g3_tp2_part2 | Mixtral-8x7B-Instruct-v0.1.yaml | 00:01:39 | 00:06:29 |
| gsm8k_small_g3_tp1_fp8 | granite-8b-fp8.yaml | 00:01:23 | 00:05:42 |
| gsm8k_small_g3_tp1_fp8 | granite-20b-fp8.yaml | 00:01:51 | 00:05:38 |

OK, for clarity I gathered two values:

  1. time1: the test time as reported directly in the logs (e.g. ================== 1 passed, 3 warnings in 111.16s (0:01:51) ===================)
  2. time2: the time between 'test session starts' and 'PASSED MODEL', which for large and fp8 models is usually several minutes longer than time1.
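As a rough sketch of how the first value could be pulled out of a log automatically, the snippet below parses the duration from a pytest summary line such as the one quoted above. The helper names `summary_seconds` and `as_hms` are hypothetical, and the regular expression assumes the `in <seconds>s` wording shown in that example.

```python
import re

# Matches the "in 111.16s" part of a pytest summary line.
SUMMARY_RE = re.compile(r"in (\d+(?:\.\d+)?)s\b")


def summary_seconds(line: str) -> float:
    """Return the duration in seconds reported by a pytest summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("no pytest duration found in line")
    return float(match.group(1))


def as_hms(seconds: float) -> str:
    """Format a duration as hh:mm:ss, truncating fractional seconds."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


line = "================== 1 passed, 3 warnings in 111.16s (0:01:51) ==================="
print(summary_seconds(line))          # 111.16
print(as_hms(summary_seconds(line)))  # 00:01:51
```

The second value needs timestamps from the surrounding log, so it cannot be recovered from the summary line alone.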
