Refactor llama / mixtral / grok for shared features #267
Conversation
Force-pushed 62aecb3 to bb5be7e
@@ -122,33 +123,32 @@ def prefill(
     self._assert_device(seq_block_ids)
     self._assert_device(*cache_state, dtype=self.activation_dtype)
     h = self.token_embedding(tokens)
-    h *= 78.38367176906169
+    h *= math.sqrt(h.shape[-1])
This requires `import math`.
Better as torch? I don't think `math` traces well.
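A minimal sketch of the two spellings under discussion (function names are illustrative, not the model's actual code): the `math.sqrt` form evaluates to a Python float when the embedding dimension is a static int, while the torch form keeps the scale inside the graph, which may be what the reviewer is after for tracing/export.

```python
import math
import torch

def scale_embeddings_math(h: torch.Tensor) -> torch.Tensor:
    # Spelling from the diff: math.sqrt over the (static) embedding dim,
    # which bakes a Python float constant into the traced graph.
    return h * math.sqrt(h.shape[-1])

def scale_embeddings_torch(h: torch.Tensor) -> torch.Tensor:
    # Torch-only alternative suggested in review: compute the scale as a
    # tensor op so it stays inside the traced/exported graph.
    dim = torch.tensor(h.shape[-1], dtype=h.dtype, device=h.device)
    return h * torch.sqrt(dim)

# For a 6144-wide embedding both reproduce the old hardcoded constant:
# math.sqrt(6144) == 78.38367176906169
```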
Do we not have prefill here testing export? If not, that needs to be added.
We should try to make faked versions of each model that are locally exportable. @KyleHerndon can you look into making faked theta parameters for the model? It can just be a single layer with smaller tensors.
Here are the theta generators for attention/FFN/MoE blocks. We already have attention and prefill tests; we might need to add an export component in there.
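A rough sketch of what a faked single-layer theta could look like, written as plain PyTorch. The parameter names (`token_embd`, `blk.0.*`, etc.) are hypothetical placeholders, not necessarily the repo's actual theta layout or generator API:

```python
import torch

def make_fake_layer_theta(hidden_dim: int = 64, ffn_dim: int = 128,
                          vocab_size: int = 256) -> dict:
    """Build tiny random parameters for a single transformer block so the
    model can be constructed and exported locally without real weights."""
    g = torch.Generator().manual_seed(0)  # deterministic fakes for tests
    rand = lambda *shape: torch.randn(*shape, generator=g, dtype=torch.float16)
    return {
        "token_embd.weight": rand(vocab_size, hidden_dim),
        "blk.0.attn_q.weight": rand(hidden_dim, hidden_dim),
        "blk.0.attn_k.weight": rand(hidden_dim, hidden_dim),
        "blk.0.attn_v.weight": rand(hidden_dim, hidden_dim),
        "blk.0.attn_output.weight": rand(hidden_dim, hidden_dim),
        "blk.0.ffn_gate.weight": rand(ffn_dim, hidden_dim),
        "blk.0.ffn_up.weight": rand(ffn_dim, hidden_dim),
        "blk.0.ffn_down.weight": rand(hidden_dim, ffn_dim),
        "output_norm.weight": rand(hidden_dim),
        "output.weight": rand(vocab_size, hidden_dim),
    }
```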
Force-pushed bb5be7e to 770f83f
Force-pushed 770f83f to 2bd1bbc
Force-pushed 5beb357 to 4fb7663
Looks good to me
Force-pushed 4fb7663 to 80891fe
Many of these features can be toggled depending on architecture. Replumbing the configurations separately allows better reuse and a clearer understanding of how the models vary from each other.

grok uses a softcap; plumbing a value enables `sc * tanh(v / sc)`.

grok has some hardcoded values that have better representations, e.g. `sqrt(6144)` and `sqrt(3)`.

Output normalization is optional but used by mixtral. Presence of the tensor is sufficient for performing the normalization.

We remove the sparse moe block as we now know it will not be used due to poor performance.
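A rough sketch of two of the toggles described above. The names (`softcap`, `output_norm_weight`) are illustrative stand-ins for whatever the refactored config exposes, and the RMS-style form of the output normalization is an assumption, not confirmed by this PR:

```python
import torch
from typing import Optional

def apply_softcap(logits: torch.Tensor, softcap: Optional[float]) -> torch.Tensor:
    # grok-style softcapping: sc * tanh(v / sc); a no-op when no value
    # is plumbed through the config.
    if softcap is None:
        return logits
    return softcap * torch.tanh(logits / softcap)

def maybe_output_norm(h: torch.Tensor,
                      output_norm_weight: Optional[torch.Tensor],
                      eps: float = 1e-6) -> torch.Tensor:
    # Optional output normalization (used by mixtral): presence of the
    # weight tensor alone decides whether the normalization runs.
    if output_norm_weight is None:
        return h
    variance = h.pow(2).mean(-1, keepdim=True)
    return output_norm_weight * (h * torch.rsqrt(variance + eps))
```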