Introduce CausalLMModel interface and add IREE numerics test for Llama 3.1 8B FP16 TP8 #375
Conversation
We do not have a clearly defined interface for LMs: decode and prefill have different signatures when exporting to IREE. This change adds a new ABC, CausalLMModel, that makes a distinction between the two variants. BaseCausalLMModel provides a default implementation for the new prefill_from_seq_lens and decode_from_seq_lens methods.

The export script export_paged_llm_v1 does too much in its exported functions: first it computes the attention mask, then it shards its arguments and results. This change moves that logic into a separate class that conforms to the CausalLMModel interface.

Introduce a new CausalLMIreeModel that conforms to CausalLMModel but is backed by an IREE module. It is not performant and is only meant for testing, as it marshals tensors and uses the IREE Python bindings. It can then be used, for example, in paged_llm_v1.TorchGenerator or anywhere else an LM is expected.

Refactor the sharded Llama tests: increase code reuse and use the TorchGenerator in the toy-sized tests. Use the shard_llm_dataset and export_paged_llm_v1 scripts in the test flow to increase their test coverage.

Introduce a Llama 3.1 8B FP16 TP8 test that appears to not have good numerical accuracy. It is compared against an fp64 unsharded torch variant to ensure that the reference itself is of high accuracy.
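A rough sketch of how such an interface might look is below. This is a sketch only: the method names prefill_from_seq_lens and decode_from_seq_lens come from this PR, but the exact argument lists are assumptions, not the actual sharktank signatures.

    from abc import ABC, abstractmethod

    import torch


    class CausalLMModel(ABC):
        """Interface that keeps prefill and decode as distinct entry points."""

        @abstractmethod
        def prefill_from_seq_lens(
            self, tokens: torch.Tensor, *, seq_lens: torch.Tensor, cache_state: list
        ) -> torch.Tensor:
            """Process the whole prompt and return logits (argument list assumed)."""

        @abstractmethod
        def decode_from_seq_lens(
            self,
            tokens: torch.Tensor,
            *,
            seq_lens: torch.Tensor,
            start_positions: torch.Tensor,
            cache_state: list,
        ) -> torch.Tensor:
            """Process one new token per sequence and return logits (argument list assumed)."""

Under this split, BaseCausalLMModel would supply default implementations of the two methods, and CausalLMIreeModel would implement them by marshaling tensors through the IREE Python bindings.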
There is way too much in one PR, making this almost impossible to review reasonably. Can you please break this into multiple PRs with clear goals? At this scale of change it is almost impossible to tell whether it will trigger more failures.
@@ -27,7 +27,7 @@
 ################################################################################

-class PagedLlamaModelV1(BaseCausalLMModel):
+class PagedLlamaModelV1(BaseCausalLMModel, CausalLMModel):
Double inheritance is a giant red flag to me. It feels extremely wrong to have two sets of CausalLMModel.
I guess the naming is not good. One of them is an implementation and the other is an ABC. I will rename them to something clearer. I did it like that because BaseCausalLMModel implements just a part of the whole CausalLMModel interface.
 def main():
+def dtype_from_str(s: str) -> torch.dtype:
Use a map rather than string manipulations. It is as simple as
{
    "fp8": torch.fp8,
    "f32": torch.f32,
    ....
}
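A runnable version of that suggestion might look like the following. This is a sketch: the set of accepted names is an assumption, and note that the actual torch attributes are torch.float16 / torch.float32 / torch.float64 rather than torch.f32.

    import torch

    # Map of accepted dtype spellings to torch dtypes (accepted names are assumed).
    _DTYPE_MAP = {
        "float16": torch.float16,
        "float32": torch.float32,
        "float64": torch.float64,
    }


    def dtype_from_str(s: str) -> torch.dtype:
        # Plain dictionary lookup instead of string manipulation.
        try:
            return _DTYPE_MAP[s]
        except KeyError as e:
            raise ValueError(f"Unsupported dtype name: {s!r}") from e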
@@ -59,8 +69,18 @@ def main():
         default="decomposed",
         choices=["decomposed", "torch"],
     )
+    parser.add_argument(
+        "--attention-dtype",
Can you provide a justification for adding these in the first place? They should be inferred from the data types of the functions.
There is one test that uses fp32.
@rsuderman here is one part of this PR split out as a separate PR: #383.
@rsuderman, per your request I have broken this up into multiple PRs. The last one is #394, which references all its dependencies.