Add test script and workflow for llama export, compile, serve. #70

Merged: 5 commits into nod-ai:main on Jun 28, 2024

Conversation

@ScottTodd (Member) commented Jun 25, 2024

Progress on #22

Sample runs on my fork:

I decided to run this on a nightly schedule and on workflow_dispatch. It takes around 10 minutes, so it could run on pull_request if we want to.

As these components stabilize and we spend less time hacking on individual steps using the full toolkit (Python -> manual iree-compile vs. using the in-process compiler API), we can switch the test from a bash script to a pytest file. Need to start somewhere :)
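For illustration, a minimal sketch of the trigger setup described above. This is not the actual `.github/workflows/test.yaml` added in this PR; the workflow name, cron time, and setup steps are placeholders.

```yaml
# Illustrative sketch only -- not the workflow file added in this PR.
# Nightly schedule plus manual dispatch, with pull_request left as a
# possible later addition since the job takes roughly 10 minutes.
name: Llama export / compile / serve test

on:
  workflow_dispatch:        # allow manual runs from the Actions tab
  schedule:
    - cron: "0 8 * * *"     # nightly; the actual time here is a placeholder
  # pull_request:           # could be enabled if ~10 minutes per PR is acceptable

jobs:
  test-llama:
    runs-on: ubuntu-latest  # standard hosted runner (no GPU)
    steps:
      - uses: actions/checkout@v4
      # Python / IREE / sharktank setup steps are omitted here for brevity.
      - name: Export, compile, and serve the llama model
        run: bash tests/llama_export_compile_serve.sh
```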

@ScottTodd (Member, Author) commented:

Is https://github.com/nod-ai/sharktank/tree/main/sharktank/sharktank/models/punet running through IREE yet? We could set up a similar set of tests for that model too.

Contributor:

Yes, but fragile. Can start with just eager and import/export. Need to fetch from an HF repo to do it.

Resolved review threads: tests/llama_export_compile_serve.sh (outdated, 2 threads), .github/workflows/test.yaml
@stellaraccident (Contributor) left a comment:

Thanks. I suspect we'll want to run a variant of this on a real runner with a GPU, and that should cache pytorch/rocm and HF download locally. Will need that for punet since that really needs to run on a GPU.

Resolved review thread: .github/workflows/test.yaml
Resolved review thread: tests/llama_export_compile_serve.sh (outdated)
@ScottTodd (Member, Author) commented:

> Thanks. I suspect we'll want to run a variant of this on a real runner with a GPU, and that should cache pytorch/rocm and HF download locally. Will need that for punet since that really needs to run on a GPU.

Good to know for punet. In other repos (IREE, SHARK-TestSuite) we have a few persistent runners and test scripts are already set up to use a local cache.

This is currently using a 7GB .gguf file (3B parameters in f16) from an open model on Hugging Face. We can test with other quantized data types and larger parameter sizes too. That may work well enough on nightly runs using standard runners, but yeah - we should use larger runners with persistent caches for best results.
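As a rough sketch of the caching idea on standard hosted runners (not part of this PR), an `actions/cache` step could slot into the job sketched earlier. The path assumes the model lands in the default huggingface_hub cache directory; if the test script downloads the file directly, the cached path would be wherever it writes the .gguf. The cache key is a placeholder.

```yaml
# Illustrative only: a cache step that could slot into the job sketched earlier.
# A persistent self-hosted GPU runner would instead keep these files on local disk.
- name: Cache Hugging Face downloads
  uses: actions/cache@v4
  with:
    path: ~/.cache/huggingface      # assumes the .gguf lands in the default HF cache
    key: hf-open-llama-3b-v2-gguf   # placeholder key; bump to force a fresh download
```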

ScottTodd merged commit 0db27e1 into nod-ai:main on Jun 28, 2024 (2 of 3 checks passed).
ScottTodd added a commit to nod-ai/SHARK-TestSuite that referenced this pull request on Jun 28, 2024:
Progress on nod-ai/sharktank#22

This adds one test for a llama model running through
https://github.com/nod-ai/sharktank. That project is still getting set
up, so new docs for this particular workflow are coming in at
nod-ai/sharktank#69 and tests in that repo are
in nod-ai/sharktank#70.

Specifically, this exercises:
* [`sharktank/models/llama/llama.py`](https://github.com/nod-ai/sharktank/blob/main/sharktank/sharktank/models/llama/llama.py)
* [`sharktank/examples/export_paged_llm_v1.py`](https://github.com/nod-ai/sharktank/blob/main/sharktank/sharktank/examples/export_paged_llm_v1.py) with batch sizes == [4]
* The `open-llama-3b-v2-f16.gguf` file from https://huggingface.co/SlyEcho/open_llama_3b_v2_gguf
* Compilation and crashless execution, _not_ numerical correctness (yet)

Ideas for future work:

* Test cases for the same model/parameters
  * Other batch sizes
  * `decode()` as well as `prefill()`
* Real inputs with expected outputs (`decode()` crashes on some faked inputs still 🤔)
* Other flag combinations and target configurations (starting simple though)
* Test cases for other models/parameters
  * 8b / 70b parameter models
  * Mistral, Mixtral, Gemma, etc.
renxida pushed a commit to nod-ai/SHARK-TestSuite that referenced this pull request on Jul 18, 2024 (same commit message as above).