Extend accuracy tests for models that we support #824

AnetaKaczynska · 2025-02-13T10:59:36Z

This PR extends the dev Nightly test suite with accuracy tests for a broader range of models.

michalkuligowski

What is the time delta with the new test suites?

AnetaKaczynska · 2025-02-27T16:19:36Z

Time taken by the new test suites:

name	time
gsm8k_small_g3_tp1_part2	6m 41s
gsm8k_small_g3_tp1_part3	9m 2s
gsm8k_large_g3_tp2_part2	11m 15s
gsm8k_small_g3_tp1_fp8	22m 26s

michalkuligowski · 2025-02-28T06:41:07Z

.jenkins/lm-eval-harness/configs/Mistral-7B-Instruct-v0.3.yaml

@@ -0,0 +1,11 @@
+model_name: "/mnt/weka/data/pytorch/mistral/Mistral-7B-Instruct-v0.3"


@AnetaKaczynska I was more interested in the time taken by the test itself. Those measurements cover the whole suite, including environment preparation (which can vary for many reasons, and will also vary for existing suites). The exact time for the suite can be measured by taking the time between the "============================= test session starts ==============================" line and the test finish.

AnetaKaczynska · 2025-03-01

OK, for clarity I gathered two values:

time1: test time as reported directly in the logs (e.g. ================== 1 passed, 3 warnings in 111.16s (0:01:51) ===================)

time2: time elapsed between 'test session starts' and 'PASSED MODEL', which for large and fp8 models is usually several minutes longer.

suite	model	time1	time2
gsm8k_small_g3_tp1_part2	granite-8b.yaml	00:01:21	00:01:24
gsm8k_small_g3_tp1_part2	granite-20b.yaml	00:01:48	00:01:51
gsm8k_small_g3_tp1_part3	Qwen2-7b-Instruct.yaml	00:01:13	00:01:16
gsm8k_small_g3_tp1_part3	Mistral-7B-Instruct-v0.3.yaml	00:01:13	00:01:16
gsm8k_large_g3_tp2_part2	Mixtral-8x7B-Instruct-v0.1.yaml	00:01:39	00:06:29
gsm8k_small_g3_tp1_fp8	granite-8b-fp8.yaml	00:01:23	00:05:42
gsm8k_small_g3_tp1_fp8	granite-20b-fp8.yaml	00:01:51	00:05:38
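For anyone who wants to reproduce these numbers from a log, here is a minimal Python sketch. It is not part of this PR. It assumes the pytest summary line has the shape quoted above, and the helper names (`pytest_reported_seconds`, `parse_hms`, `overhead`) are hypothetical. It extracts the pytest-reported duration and computes how much longer the second value in the table is than the first.

```python
import re
from datetime import timedelta

# Assumption: the duration appears as "in <seconds>s" in the pytest summary line,
# e.g. "==== 1 passed, 3 warnings in 111.16s (0:01:51) ====".
SUMMARY_RE = re.compile(r"\bin (\d+(?:\.\d+)?)s\b")


def pytest_reported_seconds(summary_line: str) -> float:
    """Return the seconds pytest reports in its final summary line."""
    match = SUMMARY_RE.search(summary_line)
    if match is None:
        raise ValueError(f"no duration found in: {summary_line!r}")
    return float(match.group(1))


def parse_hms(text: str) -> timedelta:
    """Parse an HH:MM:SS string such as the values in the table."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def overhead(reported: str, session_to_passed: str) -> timedelta:
    """Second table value minus the first: time outside the pytest-reported window."""
    return parse_hms(session_to_passed) - parse_hms(reported)


line = "================== 1 passed, 3 warnings in 111.16s (0:01:51) ==================="
print(pytest_reported_seconds(line))     # 111.16
print(overhead("00:01:39", "00:06:29"))  # 0:04:50 (Mixtral-8x7B-Instruct-v0.1)
print(overhead("00:01:51", "00:05:38"))  # 0:03:47 (granite-20b-fp8)
```

With the values from the table, the gap is about 3 seconds for the small bf16 models and roughly 4 to 5 minutes for Mixtral and the fp8 models.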

AnetaKaczynska requested review from kzawora-intel, madamczykhabana, michalkuligowski, mgawarkiewicz, vivekgoe and afierka-intel as code owners February 13, 2025 10:59

AnetaKaczynska added 9 commits February 24, 2025 11:00

5df25da Extend accuracy tests for models that we support
9a1c5fe Remove Llama-2-7B
6d88711 Move models to existing test suites
3326256 Limit number of samples to 500 to speed up tests
ecb863a Run new small models only on G3 with TP=1
d5e7947 Run new fp8 models only on G3 with TP=1
cefd270 Limit number of samples to 250 for Mixtral
a57179c Run Mixtral only on G3 with TP=2
9216947 Split long tests into smaller parts

michalkuligowski force-pushed the dev/akaczynska/tests branch from 9385f80 to 9216947 Compare February 24, 2025 10:00

michalkuligowski requested changes Feb 27, 2025

michalkuligowski requested changes Feb 28, 2025


AnetaKaczynska Mar 1, 2025
