Add test script and workflow for llama export, compile, serve. #70

Merged: 5 commits into nod-ai:main on Jun 28, 2024

Conversation

@ScottTodd (Member) commented Jun 25, 2024

Progress on #22

Sample runs on my fork:

I decided to run this on a nightly schedule and on workflow_dispatch. It takes around 10 minutes, so it could run on pull_request if we want to.

As these components stabilize and we spend less time hacking on individual steps using the full toolkit (Python -> manual iree-compile vs. using the in-process compiler API), we can switch the test from a bash script to a pytest file. Need to start somewhere :)
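For illustration, a minimal sketch of the trigger setup described above. This is not the actual `.github/workflows/test.yaml` added in this PR; the workflow name, cron time, and setup steps are placeholders.

```yaml
# Illustrative sketch only -- not the workflow file added in this PR.
# Nightly schedule plus manual dispatch, with pull_request left as a
# possible later addition since the job takes roughly 10 minutes.
name: Llama export / compile / serve test

on:
  workflow_dispatch:        # allow manual runs from the Actions tab
  schedule:
    - cron: "0 8 * * *"     # nightly; the actual time here is a placeholder
  # pull_request:           # could be enabled if ~10 minutes per PR is acceptable

jobs:
  test-llama:
    runs-on: ubuntu-latest  # standard hosted runner (no GPU)
    steps:
      - uses: actions/checkout@v4
      # Python / IREE / sharktank setup steps are omitted here for brevity.
      - name: Export, compile, and serve the llama model
        run: bash tests/llama_export_compile_serve.sh
```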

@ScottTodd (Member, Author) commented:

Is https://github.com/nod-ai/sharktank/tree/main/sharktank/sharktank/models/punet running through IREE yet? We could set up a similar set of tests for that model too.

Contributor:

Yes, but fragile. Can start with just eager and import/export. Need to fetch from an HF repo to do it.

Resolved review threads: tests/llama_export_compile_serve.sh (outdated, 2 threads), .github/workflows/test.yaml
@stellaraccident (Contributor) left a comment:

Thanks. I suspect we'll want to run a variant of this on a real runner with a GPU, and that should cache pytorch/rocm and HF download locally. Will need that for punet since that really needs to run on a GPU.

Resolved review thread: .github/workflows/test.yaml
Resolved review thread: tests/llama_export_compile_serve.sh (outdated)
@ScottTodd (Member, Author) commented:

> Thanks. I suspect we'll want to run a variant of this on a real runner with a GPU, and that should cache pytorch/rocm and HF download locally. Will need that for punet since that really needs to run on a GPU.

Good to know for punet. In other repos (IREE, SHARK-TestSuite) we have a few persistent runners and test scripts are already set up to use a local cache.

This is currently using a 7GB .gguf file (3B parameters in f16) from an open model on Hugging Face. We can test with other quantized data types and larger parameter sizes too. That may work well enough on nightly runs using standard runners, but yeah - we should use larger runners with persistent caches for best results.
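As a rough sketch of the caching idea on standard hosted runners (not part of this PR), an `actions/cache` step could slot into the job sketched earlier. The path assumes the model lands in the default huggingface_hub cache directory; if the test script downloads the file directly, the cached path would be wherever it writes the .gguf. The cache key is a placeholder.

```yaml
# Illustrative only: a cache step that could slot into the job sketched earlier.
# A persistent self-hosted GPU runner would instead keep these files on local disk.
- name: Cache Hugging Face downloads
  uses: actions/cache@v4
  with:
    path: ~/.cache/huggingface      # assumes the .gguf lands in the default HF cache
    key: hf-open-llama-3b-v2-gguf   # placeholder key; bump to force a fresh download
```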

ScottTodd merged commit 0db27e1 into nod-ai:main on Jun 28, 2024 (2 of 3 checks passed).
ScottTodd added a commit to nod-ai/SHARK-TestSuite that referenced this pull request on Jun 28, 2024:
Progress on nod-ai/sharktank#22

This adds one test for a llama model running through
https://github.com/nod-ai/sharktank. That project is still getting set
up, so new docs for this particular workflow are coming in at
nod-ai/sharktank#69 and tests in that repo are
in nod-ai/sharktank#70.

Specifically, this exercises:
* [`sharktank/models/llama/llama.py`](https://github.com/nod-ai/sharktank/blob/main/sharktank/sharktank/models/llama/llama.py)
* [`sharktank/examples/export_paged_llm_v1.py`](https://github.com/nod-ai/sharktank/blob/main/sharktank/sharktank/examples/export_paged_llm_v1.py) with batch sizes == [4]
* The `open-llama-3b-v2-f16.gguf` file from https://huggingface.co/SlyEcho/open_llama_3b_v2_gguf
* Compilation and crashless execution, _not_ numerical correctness (yet)

Ideas for future work:

* Test cases for the same model/parameters
  * Other batch sizes
  * `decode()` as well as `prefill()`
* Real inputs with expected outputs (`decode()` crashes on some faked inputs still 🤔)
* Other flag combinations and target configurations (starting simple though)
* Test cases for other models/parameters
  * 8b / 70b parameter models
  * Mistral, Mixtral, Gemma, etc.
renxida pushed a commit to nod-ai/SHARK-TestSuite that referenced this pull request on Jul 18, 2024 (same commit message as above).