Commit

Merge branch 'main' into fiber-dist-tweak
monorimet authored Nov 22, 2024
2 parents 9ac9735 + 779adc3 commit 3efc767
Showing 10 changed files with 114 additions and 51 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/ci-llama-large-tests.yaml
@@ -76,14 +76,14 @@ jobs:
iree-base-runtime
- name: Run llama tests
run: pytest sharktank/tests/models/llama/benchmark_amdgpu_test.py -v -s --run-nightly-llama-tests --iree-hip-target=gfx942 --html=out/index.html
run: pytest sharktank/tests/models/llama/benchmark_amdgpu_test.py -v -s --run-nightly-llama-tests --iree-hip-target=gfx942 --html=out/llm/llama/benchmark/index.html

- name: Deploy to GitHub Pages
uses: peaceiris/actions-gh-pages@4f9cc6602d3f66b9c108549d475ec49e8ef4d45e # v4.0.0
with:
github_token: ${{ secrets.SHARK_PLATFORM_GH_TOKEN }}
publish_dir: ./out/llm/llama/benchmarks
destination_dir: ./llm/llama/benchmarks
publish_dir: ./out/llm/llama/benchmark
destination_dir: ./llm/llama/benchmark
keep_files: true

- name: Upload llama executable files
30 changes: 23 additions & 7 deletions .github/workflows/ci_eval.yaml
@@ -21,10 +21,10 @@ concurrency:
cancel-in-progress: true

jobs:
test_perplexity_vmfb:
test_perplexity_iree:
if: ${{ github.repository_owner == 'nod-ai' || github.event_name != 'schedule' }}
timeout-minutes: 1000
name: "IREE/vmfb"
name: "Perplexity-IREE"
strategy:
matrix:
version: [3.11]
@@ -74,13 +74,21 @@ jobs:
iree-base-compiler \
iree-base-runtime
- name: Run perplexity test with vmfb
run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_vmfb_test.py --longrun --iree-device='hip://7' --iree-hip-target=gfx942 --iree-hal-target-backends=rocm --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json
- name: Run perplexity test with IREE
run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_iree_test.py --longrun --iree-device='hip://7' --iree-hip-target=gfx942 --iree-hal-target-backends=rocm --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json --html=out/llm/llama/perplexity/iree_perplexity/index.html

- name: Deploy to GitHub Pages
uses: peaceiris/actions-gh-pages@4f9cc6602d3f66b9c108549d475ec49e8ef4d45e # v4.0.0
with:
github_token: ${{ secrets.SHARK_PLATFORM_GH_TOKEN }}
publish_dir: ./out/llm/llama/perplexity/iree_perplexity
destination_dir: ./llm/llama/perplexity/iree_perplexity
keep_files: true

test_perplexity_torch:
if: ${{ github.repository_owner == 'nod-ai' || github.event_name != 'schedule' }}
timeout-minutes: 1000
name: "Torch/eager mode"
name: "Perplexity-Torch"
strategy:
matrix:
version: [3.11]
@@ -123,5 +131,13 @@ jobs:
pip install --no-compile -f https://iree.dev/pip-release-links.html --src deps \
-e "git+https://github.com/iree-org/iree-turbine.git#egg=iree-turbine"
- name: Run perplexity test in eager mode
run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_torch_test.py --longrun --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json
- name: Run perplexity test with Torch
run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_torch_test.py --longrun --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json --html=out/llm/llama/perplexity/torch_perplexity/index.html

- name: Deploy to GitHub Pages
uses: peaceiris/actions-gh-pages@4f9cc6602d3f66b9c108549d475ec49e8ef4d45e # v4.0.0
with:
github_token: ${{ secrets.SHARK_PLATFORM_GH_TOKEN }}
publish_dir: ./out/llm/llama/perplexity/torch_perplexity
destination_dir: ./llm/llama/perplexity/torch_perplexity
keep_files: true
6 changes: 4 additions & 2 deletions docs/developer_guide.md
@@ -15,15 +15,17 @@ sudo apt update && sudo apt install -y clang lld

Install:

```
python-is-python3 python3-venv python3-dev
```bash
sudo apt install python-is-python3 python3-venv python3-dev
```

<details>

<summary> Or, alternatively, use `pyenv` to manage a separate python installation for more control over its version: </summary>


The following instructions are taken from pyenv's guide here: https://github.com/pyenv/pyenv?tab=readme-ov-file#a-getting-pyenv

First, install pyenv and its dependencies.

```bash
20 changes: 18 additions & 2 deletions sharktank/sharktank/evaluate/README.md
@@ -9,16 +9,32 @@ pip install -r sharktank/requirements-tests.txt

### Perplexity

Test perplexity for Llama3.1 8B & 405B (FP16 & FP8) models:
Perplexity measures the ability of a language model to predict the next token in a sequence. A lower score indicates that the model has higher certainty in its predictions. Perplexity acts as an intrinsic evaluation metric of model quality, independent of any downstream task.

In SHARK-Platform, we use perplexity to track code regressions and quality loss across quantized models (with FP16 as the baseline). We use 100 prompts randomly selected from the Wikitext-2 test set and report the mean perplexities shown below. These numbers are not comparable across models with different tokenizers, or with other projects, due to differing implementations.
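
To make the metric concrete, here is a minimal sketch (not part of this commit) of how a perplexity score can be computed from per-token log-probabilities; the helper name and inputs are illustrative only:

```python
import math

def perplexity_from_log_probs(token_log_probs):
    """Hypothetical helper: perplexity is the exponential of the mean
    negative log-likelihood of the next-token predictions."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Example: fairly confident predictions yield a low perplexity (~1.28 here).
print(perplexity_from_log_probs([-0.2, -0.1, -0.4, -0.3]))
```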

* Test perplexity for Llama3.1 8B (FP16) model:

```bash
pytest sharktank/tests/evaluate/perplexity_test.py --longrun
```

Get perplexity for a new model:
* Calculate perplexity for a new model:

```bash
python -m sharktank.evaluate.perplexity \
--gguf-file=llama3_70b_f16.gguf \
--tokenizer-config-json=tokenizer_config.json
```

### Perplexity Scoreboard

| CPU | GPU |
|:-------------: |:----------:|
| AMD EPYC 9554 | MI300X |

#### LLaMA 3.1

|Models |Model size (GB) |Torch score |IREE score |
|:----------------------|:---------------|:-------------|:-------------|
|8B FP16 TP1 decomposed |16.07 |14.930181 |14.991893 |
@@ -9,6 +9,7 @@
import json
import time
import random
import re
from datetime import timedelta
from tqdm import tqdm

@@ -83,11 +84,18 @@ def wrapper(*args, **kwargs):
start = time.time()
result = func(*args, **kwargs)
end = time.time()
seconds = end - start
time_taken = abs(timedelta(seconds=round(seconds)))

if seconds < 1:
time_taken = f" {seconds * 1000} ms"
total_seconds = end - start
time_taken = abs(timedelta(seconds=total_seconds))
hours, minutes, seconds = re.split(":", str(time_taken))

if total_seconds < 1:
time_taken = f" {round(total_seconds * 1000, 3)} ms"
elif total_seconds < 60:
time_taken = "{:.2f} secs".format(round(float(total_seconds), 2))
else:
time_taken = "{:02d} hrs : {:02d} mins : {:.2f} secs".format(
int(hours), int(minutes), round(float(seconds), 2)
)

func_name = func.__name__
if func_name == "get_perplexity":
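
The same elapsed-time formatting change is applied in perplexity_torch.py and export_artifacts.py below. As a consolidated reference, here is a minimal sketch of a decorator along these lines; it is an illustration rather than the exact code from this commit, and splitting str(timedelta) on ":" assumes runs shorter than one day:

```python
import re
import time
from datetime import timedelta
from functools import wraps

def timeit(func):
    """Sketch of a timing decorator that reports ms, seconds, or h:m:s."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        total_seconds = time.time() - start

        if total_seconds < 1:
            taken = f"{round(total_seconds * 1000, 3)} ms"
        elif total_seconds < 60:
            taken = f"{total_seconds:.2f} secs"
        else:
            # str(timedelta) looks like "H:MM:SS.ffffff" for durations under a day.
            hours, minutes, seconds = re.split(":", str(timedelta(seconds=total_seconds)))
            taken = f"{int(hours):02d} hrs : {int(minutes):02d} mins : {float(seconds):.2f} secs"

        print(f"{func.__name__}: {taken}")
        return result
    return wrapper

@timeit
def example():
    time.sleep(0.05)

example()  # prints something like "example: 50.123 ms"
```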
18 changes: 13 additions & 5 deletions sharktank/sharktank/evaluate/perplexity_torch.py
@@ -8,6 +8,7 @@
import logging
import time
import random
import re
from datetime import timedelta
import json
import numpy as np
@@ -69,11 +70,18 @@ def wrapper(*args, **kwargs):
start = time.time()
result = func(*args, **kwargs)
end = time.time()
seconds = end - start
time_taken = abs(timedelta(seconds=round(seconds)))

if seconds < 1:
time_taken = f" {seconds * 1000} ms"
total_seconds = end - start
time_taken = abs(timedelta(seconds=total_seconds))
hours, minutes, seconds = re.split(":", str(time_taken))

if total_seconds < 1:
time_taken = f" {round(total_seconds * 1000, 3)} ms"
elif total_seconds < 60:
time_taken = "{:.2f} secs".format(round(float(total_seconds), 2))
else:
time_taken = "{:02d} hrs : {:02d} mins : {:.2f} secs".format(
int(hours), int(minutes), round(float(seconds), 2)
)

func_name = func.__name__
if func_name == "get_perplexity":
18 changes: 13 additions & 5 deletions sharktank/sharktank/utils/export_artifacts.py
@@ -9,6 +9,7 @@
import subprocess
import logging
import time
import re
from pathlib import Path
from datetime import timedelta
from typing import List, Optional
@@ -107,11 +108,18 @@ def wrapper(*args, **kwargs):
start = time.time()
result = func(*args, **kwargs)
end = time.time()
seconds = end - start
time_taken = abs(timedelta(seconds=round(seconds)))

if seconds < 1:
time_taken = f" {seconds * 1000} ms"
total_seconds = end - start
time_taken = abs(timedelta(seconds=total_seconds))
hours, minutes, seconds = re.split(":", str(time_taken))

if total_seconds < 1:
time_taken = f" {round(total_seconds * 1000, 3)} ms"
elif total_seconds < 60:
time_taken = "{:.2f} secs".format(round(float(total_seconds), 2))
else:
time_taken = "{:02d} hrs : {:02d} mins : {:.2f} secs".format(
int(hours), int(minutes), round(float(seconds), 2)
)

func_name = func.__name__
logger.info(f" {func_name}: {time_taken}")
2 changes: 1 addition & 1 deletion sharktank/tests/evaluate/baseline_perplexity_scores.json
@@ -210,7 +210,7 @@
],
"mean_perplexity": 6.060831
},
"llama3_8B_f16_decomposed_vmfb": {
"llama3_8B_f16_decomposed_iree": {
"perplexities": [
6.651368,
22.059452,
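
The renamed llama3_8B_f16_decomposed_iree entry is consumed by the perplexity tests below, which compare current scores against these stored baselines. A minimal sketch of that kind of regression check, with an assumed tolerance value, might look like:

```python
import json

def check_perplexity_regression(baseline_path, model_name, current_mean, tolerance=0.5):
    """Hypothetical check: fail if the current mean perplexity drifts
    beyond `tolerance` from the stored baseline (the tolerance is assumed)."""
    with open(baseline_path) as f:
        baselines = json.load(f)
    baseline_mean = baselines[model_name]["mean_perplexity"]
    delta = current_mean - baseline_mean
    assert abs(delta) <= tolerance, (
        f"{model_name}: mean perplexity {current_mean} deviates from "
        f"baseline {baseline_mean} by {delta:+.4f}"
    )

# Example usage against the file touched in this commit:
# check_perplexity_regression(
#     "sharktank/tests/evaluate/baseline_perplexity_scores.json",
#     "llama3_8B_f16_decomposed_iree",
#     current_mean=6.07,
# )
```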
@@ -8,7 +8,7 @@
import pytest
import json

from sharktank.evaluate import perplexity_vmfb
from sharktank.evaluate import perplexity_iree

longrun = pytest.mark.skipif("not config.getoption('longrun')")

@@ -32,10 +32,10 @@ def test_llama3_8B_f16_decomposed(self):

# Llama 3.1 8B decomposed

model_name = "llama3_8B_f16_decomposed_vmfb"
model_name = "llama3_8B_f16_decomposed_iree"
baseline_perplexity = self.baseline_perplexity[model_name]

current_perplexity = perplexity_vmfb.main(
current_perplexity = perplexity_iree.main(
[
f"--irpa-file={self.llama3_8b_f16_model}",
f"--tokenizer-config-json={self.llama3_8b_tokenizer}",
@@ -67,10 +67,10 @@ def test_llama3_8B_f16(self):

# Llama 3.1 8B non-decomposed

model_name = "llama3_8B_f16_vmfb"
model_name = "llama3_8B_f16_iree"
baseline_perplexity = self.baseline_perplexity[model_name]

current_perplexity = perplexity_vmfb.main(
current_perplexity = perplexity_iree.main(
[
f"--irpa-file={self.llama3_8b_f16_model}",
f"--tokenizer-config-json={self.llama3_8b_tokenizer}",
@@ -102,10 +102,10 @@ def test_llama3_8B_fp8_decomposed(self):

# Llama 3.1 8B decomposed

model_name = "llama3_8B_fp8_decomposed_vmfb"
model_name = "llama3_8B_fp8_decomposed_iree"
baseline_perplexity = self.baseline_perplexity[model_name]

current_perplexity = perplexity_vmfb.main(
current_perplexity = perplexity_iree.main(
[
f"--irpa-file={self.llama3_8b_fp8_model}",
f"--tokenizer-config-json={self.llama3_8b_tokenizer}",
@@ -137,10 +137,10 @@ def test_llama3_8B_fp8(self):

# Llama 3.1 8B non-decomposed

model_name = "llama3_8B_fp8_vmfb"
model_name = "llama3_8B_fp8_iree"
baseline_perplexity = self.baseline_perplexity[model_name]

current_perplexity = perplexity_vmfb.main(
current_perplexity = perplexity_iree.main(
[
f"--irpa-file={self.llama3_8b_fp8_model}",
f"--tokenizer-config-json={self.llama3_8b_tokenizer}",
@@ -172,10 +172,10 @@ def test_llama3_405B_f16_decomposed(self):

# Llama 3.1 405B decomposed

model_name = "llama3_405B_f16_decomposed_vmfb"
model_name = "llama3_405B_f16_decomposed_iree"
baseline_perplexity = self.baseline_perplexity[model_name]

current_perplexity = perplexity_vmfb.main(
current_perplexity = perplexity_iree.main(
[
f"--irpa-file={self.llama3_405b_f16_model}",
f"--tokenizer-config-json={self.llama3_405b_tokenizer}",
@@ -207,10 +207,10 @@ def test_llama3_405B_f16(self):

# Llama 3.1 405B non-decomposed

model_name = "llama3_405B_f16_vmfb"
model_name = "llama3_405B_f16_iree"
baseline_perplexity = self.baseline_perplexity[model_name]

current_perplexity = perplexity_vmfb.main(
current_perplexity = perplexity_iree.main(
[
f"--irpa-file={self.llama3_405b_f16_model}",
f"--tokenizer-config-json={self.llama3_405b_tokenizer}",
@@ -242,10 +242,10 @@ def test_llama3_405B_fp8_decomposed(self):

# Llama 3.1 405B decomposed

model_name = "llama3_405B_fp8_decomposed_vmfb"
model_name = "llama3_405B_fp8_decomposed_iree"
baseline_perplexity = self.baseline_perplexity[model_name]

current_perplexity = perplexity_vmfb.main(
current_perplexity = perplexity_iree.main(
[
f"--irpa-file={self.llama3_405b_fp8_model}",
f"--tokenizer-config-json={self.llama3_405b_tokenizer}",
@@ -277,10 +277,10 @@ def test_llama3_405B_fp8(self):

# Llama 3.1 405B non-decomposed

model_name = "llama3_405B_fp8_vmfb"
model_name = "llama3_405B_fp8_iree"
baseline_perplexity = self.baseline_perplexity[model_name]

current_perplexity = perplexity_vmfb.main(
current_perplexity = perplexity_iree.main(
[
f"--irpa-file={self.llama3_405b_fp8_model}",
f"--tokenizer-config-json={self.llama3_405b_tokenizer}",
13 changes: 9 additions & 4 deletions sharktank/tests/models/llama/benchmark_amdgpu_test.py
@@ -197,7 +197,6 @@ def testBenchmark8B_f16_Decomposed(self):
)

@skipif_run_quick_llama_test
@pytest.mark.xfail(reason="Compile Error", strict=True, raises=IreeCompileException)
def testBenchmark8B_f16_Non_Decomposed_Prefill(self):
output_file_name = self.dir_path_8b / "f16_torch_prefill"
output_mlir = self.llama8b_f16_torch_sdpa_artifacts.create_file(
@@ -780,7 +779,9 @@ def testBenchmark405B_f16_TP8_Decomposed(self):
cwd=self.repo_root,
)

@pytest.mark.xfail(reason="Compile Error", strict=True, raises=IreeCompileException)
@pytest.mark.xfail(
reason="Benchmarking Error", strict=True, raises=IreeBenchmarkException
)
def testBenchmark405B_f16_TP8_Non_Decomposed(self):
output_file_name = self.dir_path_405b / "f16_torch"
output_mlir = self.llama405b_f16_torch_sdpa_artifacts.create_file(
@@ -828,7 +829,9 @@ def testBenchmark405B_f16_TP8_Non_Decomposed(self):
cwd=self.repo_root,
)

@pytest.mark.xfail(reason="Compile Error", strict=True, raises=IreeCompileException)
@pytest.mark.xfail(
reason="KeyError in theta.py", strict=True, raises=ExportMlirException
)
def testBenchmark405B_fp8_TP8_Decomposed(self):
output_file_name = self.dir_path_405b / "fp8_decomposed"
output_mlir = self.llama405b_fp8_decomposed_artifacts.create_file(
@@ -874,7 +877,9 @@ def testBenchmark405B_fp8_TP8_Decomposed(self):
cwd=self.repo_root,
)

@pytest.mark.xfail(reason="Compile Error", strict=True, raises=IreeCompileException)
@pytest.mark.xfail(
reason="KeyError in theta.py", strict=True, raises=ExportMlirException
)
def testBenchmark405B_fp8_TP8_Non_Decomposed(self):
output_file_name = self.dir_path_405b / "fp8_torch"
output_mlir = self.llama405b_fp8_torch_sdpa_artifacts.create_file(
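
Several benchmarks above now carry @pytest.mark.xfail with strict=True and an explicit raises= exception. As a reminder of the semantics, here is a small self-contained sketch (the test names and exception class are made up): with strict=True an unexpected pass is reported as a failure, and raises= means only the named exception counts as the expected failure.

```python
import pytest

class FakeCompileError(RuntimeError):
    """Stand-in for an exporter or benchmark exception such as ExportMlirException."""

@pytest.mark.xfail(reason="known failure", strict=True, raises=FakeCompileError)
def test_expected_to_fail():
    # Raises the declared exception, so pytest reports this as XFAIL.
    raise FakeCompileError("simulated failure")

@pytest.mark.xfail(reason="known failure", strict=True, raises=FakeCompileError)
def test_unexpectedly_passes():
    # Passes instead of failing; with strict=True pytest reports this as a failure (strict xpass).
    assert True
```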
