[sharktank] Evaluation - Add Perplexity test #286

archana-ramalingam · 2024-10-16T17:06:46Z

Add Perplexity test for LLM evaluation
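
For readers unfamiliar with the metric, here is a minimal sketch of how perplexity is conventionally computed: the exponential of the mean negative log-likelihood over the scored tokens. It is not the code in `sharktank/sharktank/evaluate/perplexity.py`. The function name `perplexity` and the sample log-probabilities are assumptions made for illustration. The rounding to 6 decimals mirrors a commit in this PR.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for a list of natural-log probabilities.

    Hypothetical helper for illustration; not the sharktank implementation.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Made-up log-probabilities that a model assigned to three target tokens.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.125)]

# Round to 6 decimals before comparing against a stored baseline.
score = round(perplexity(log_probs), 6)
```

A lower score means the model assigned higher probability to the reference tokens. A model that is uniformly unsure among k choices at every step scores a perplexity of k.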

ScottTodd

The diff between the original and this PR looks good to make progress while keeping the other CI builds working.

I would like to see my other comments addressed at some point though:

Have the test download/cache the input gguf/json files instead of assuming that they exist at some /data/extra/models/ path on the CI runner, so developers can also run these tests themselves.
Organize requirements.txt into sharktank/requirements-tests.txt so we can track what each subproject needs for devs/users/tests at a fine granularity.
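
The download/cache suggestion above could be sketched roughly as follows. This is an assumption about one possible approach, not code from the PR. `fetch_cached`, its parameters, and the cache layout are hypothetical names, and the real test may need authentication or a different storage backend.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_cached(url: str, cache_dir: Path, filename: str) -> Path:
    """Return a local path for `filename`, downloading from `url` only on a cache miss.

    Hypothetical helper for illustration; not part of sharktank.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if target.exists():
        return target  # Cache hit: reuse the file from an earlier run.

    # Write to a temporary file in the same directory, then rename it, so an
    # interrupted download is never mistaken for a complete cached file.
    with urllib.request.urlopen(url) as response, tempfile.NamedTemporaryFile(
        dir=cache_dir, delete=False
    ) as tmp:
        shutil.copyfileobj(response, tmp)
    Path(tmp.name).replace(target)
    return target
```

A test fixture could call this helper with a cache directory under the developer's home directory, so the same test runs on a CI runner and on a local machine without depending on a fixed /data/extra/models/ path.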

.github/workflows/eval_test.yaml

docs/model_cookbook.md

sharktank/tests/evaluate/perplexity_test.py

Co-authored-by: Scott Todd <[email protected]>

sharktank/sharktank/evaluate/data/eval_prompts.txt

sharktank/tests/evaluate/perplexity_test.py

IanNod

Some minor comments that could be done in a follow up PR

sharktank/sharktank/evaluate/perplexity.py

sharktank/tests/evaluate/perplexity_test.py

IanNod · 2024-10-18T15:13:34Z

sharktank/tests/evaluate/baseline_perplexity_scores.json

@@ -0,0 +1,213 @@
+{


Can we just store these values in an npy file or something and retrieve them from sharkblobs?

archana-ramalingam and others added 30 commits September 27, 2024 04:19

Add 'datasets' package to load golden dataset

aee0d58

Isolate padding function in tokenizer

1ee8594

Add utility function to load/run LLMs for evaluation pipeline

0103293

Add perplexity test

1dfdbc6

Cleanup

14050f5

delete file

b7c75f3

Add perplexity test

a44a8a2

Fix dataset loading

8034432

Update page_cache_size

a26b17d

add run_perplexity and prompts

cd079a7

Merge branch 'main' into perplexity-test

df84163

Shift logits and change activation dtype

3e0871e

Add Grok model

4a74107

Remove decode and run prefill on every turn

64e812d

Change activation dtype to enable quantized models

29e6031

Add timing wrapper

9c168f3

Add instructions to run evaluation-perplexity

38590bb

Add prompts text file

2bf8739

Add logging + cleanup

70f6ba5

Add CI perplexity test

848da59

Update prompt file path

7e49580

Remove unit tests for nightly

ec6968f

Add relative path + push attention_mask to device

f7667ec

Remove debug changes

7054141

Merge branch 'main' into perplexity-test

70a7b10

Update dtype to F32 for compatibility across torch versions

134c77f

Merge branch 'main' into perplexity-test

27f4e15

Add decode

8a0a081

Fix padding logits

e47fe4a

Add local model path

b15c06d

archana-ramalingam added 5 commits October 15, 2024 23:44

Clean up

4af53c3

Update argument

e2eb98c

Update baseline perplexity data for 405B

ae63d5b

Add 'longrun' pytest marker to skip perplexity tests for presubmit CI

76b1253

Remove Shortfin dependency

30be2f9

archana-ramalingam requested review from ScottTodd and IanNod October 16, 2024 18:56

Add 30F as a runner

26526e7

ScottTodd approved these changes Oct 16, 2024

.github/workflows/eval_test.yaml (outdated, resolved)

docs/model_cookbook.md (outdated, resolved)

sharktank/tests/evaluate/perplexity_test.py (outdated, resolved)

archana-ramalingam and others added 6 commits October 16, 2024 12:41

Add 27F as a runner the right way

0c4e4dc

Co-authored-by: Scott Todd <[email protected]>

Abstract prompt cleaning to get_prompts function

64e4654

Round perplexity scores to 6 decimals

f31da28

Add instructions to run perplexity tests for existing/new models

79880b7

Add README.md and requirements.txt for evaluation tests

cf75a4d

Merge branch 'perplexity-test' of https://github.com/nod-ai/SHARK-Platform into perplexity-test

0ef1eab

IanNod reviewed Oct 17, 2024

archana-ramalingam and others added 6 commits October 17, 2024 12:04

Remove prompts.txt file

274cd62

Merge branch 'main' into perplexity-test

b3c752b

Install requirements-tests.txt to fix CI failure

bd4e2f3

Merge branch 'perplexity-test' of https://github.com/nod-ai/SHARK-Platform into perplexity-test

f809c2f

Install requirements-tests.txt to fix CI failure

9a27623

Add model paths as args + refactor perplexity test

42b35f0

IanNod approved these changes Oct 18, 2024

archana-ramalingam and others added 4 commits October 18, 2024 18:58

Add sharding fixes

da66539

Merge branch 'main' into perplexity-test

7d440e3

Fix conflict with changes in main

726ec7e

Merge branch 'perplexity-test' of https://github.com/nod-ai/SHARK-Platform into perplexity-test

71a2f1e

archana-ramalingam merged commit d181d67 into main Oct 19, 2024
9 checks passed

archana-ramalingam deleted the perplexity-test branch October 19, 2024 06:59

ScottTodd mentioned this pull request Oct 22, 2024

Benchmark Llama 3.1 f16 and fp8 with CI #284

Merged
