Add block with Llama-like implementations #346

Merged: 13 commits from shanea/llama-block into main on Nov 2, 2023

Conversation

@2015aroras 2015aroras (Collaborator) commented Oct 26, 2023

While investigating why OLMo and Llama produce different results, we found a few distinct causes. This change adds a Llama block that we can use to address the following:

  • Fused output dimensions cause differing results on CUDA despite attempts to make the computation deterministic.
  • Torch's attention output does not match Llama's. This appears to be caused by torch's F.scaled_dot_product_attention.

Also, OLMo always applies rotary embeddings in fp32, whereas Llama applies them in the current dtype (which can be bf16). I have added a config option that lets us control how rotary embeddings are applied.
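
For illustration, a minimal sketch of the idea (not this PR's actual code): the helper below and its rope_fn parameter are hypothetical stand-ins, and it only shows how q and k can be cast to a configured precision before the rotary embeddings are applied and cast back afterwards.

import torch

def apply_rope_in_precision(q: torch.Tensor, k: torch.Tensor, rope_fn, dtype: torch.dtype):
    # Hypothetical helper: cast q/k to the configured precision (e.g. torch.float32 or
    # torch.bfloat16) before applying the rotary embeddings, then cast back so downstream
    # attention sees the original dtype.
    q_, k_ = q.to(dtype=dtype), k.to(dtype=dtype)
    q_, k_ = rope_fn(q_, k_)
    return q_.to(q.dtype), k_.to(k.dtype)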

Closes #345.

@2015aroras 2015aroras marked this pull request as ready for review October 30, 2023 19:44
olmo/model.py Outdated
@@ -309,10 +309,12 @@ def apply_rotary_pos_emb(self, pos_sin: torch.Tensor, pos_cos: torch.Tensor, t:
        return out.to(t.dtype)

    def forward(self, q: torch.Tensor, k: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
-       q_, k_ = q.float(), k.float()
+       q_, k_ = q.to(dtype=self.config.rope_precision), k.to(dtype=self.config.rope_precision)
@2015aroras 2015aroras (Collaborator, Author) commented Oct 30, 2023

These changes to rope should not affect the overall result if rope_precision_type = fp32 (the default).

@dirkgr dirkgr (Member) left a comment

No requested changes, but some questions.

        with torch.autocast(q.device.type, enabled=False):
            query_len, key_len = q_.shape[-2], k_.shape[-2]  # could be different if layer_past not None
            pos_sin, pos_cos = self.get_rotary_embedding(key_len, q_.device)
            pos_sin = pos_sin.type_as(q_)
Member

Is this going to change any existing runs that use Rope?

@2015aroras 2015aroras (Collaborator, Author) commented Oct 30, 2023

The rotary embeddings are fp32 by construction, so if rope_precision_type = fp32 (the default) then this shouldn't change the type.

olmo/model.py Outdated
    ) -> torch.Tensor:
        attn_weights = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(q.size(-1))

        attn_bias = torch.zeros_like(attn_weights)
Member

It does this every batch?

Collaborator Author

Yes. I can change this to re-use the attn_bias (Llama does this). I think there might be some code in place already that I can leverage.

Member

The problem is that it changes attn_bias in place?

@2015aroras 2015aroras (Collaborator, Author) commented Oct 31, 2023

I've updated the code to leverage the existing get_causal_attention_bias. The bias should now be re-used between calls (in the Llama implementation only), though moving get_causal_attention_bias around to achieve this is not the cleanest.
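
Roughly, the re-use amounts to something like the sketch below; the cache dict and helper name are illustrative stand-ins, not the repo's actual get_causal_attention_bias.

import torch

# Illustrative module-level cache keyed by (seq_len, device, dtype).
_causal_bias_cache: dict = {}

def cached_causal_attention_bias(seq_len: int, device: torch.device, dtype: torch.dtype) -> torch.Tensor:
    key = (seq_len, device, dtype)
    bias = _causal_bias_cache.get(key)
    if bias is None:
        # Additive causal bias: 0 on and below the diagonal, a very negative value strictly above it,
        # built once instead of allocating torch.zeros_like(attn_weights) and masking it every batch.
        bias = torch.full((1, 1, seq_len, seq_len), torch.finfo(dtype).min, device=device, dtype=dtype)
        bias = torch.triu(bias, diagonal=1)
        _causal_bias_cache[key] = bias
    # Callers should combine extra masks out-of-place (e.g. bias + attn_mask) rather than
    # mutating the cached tensor in place.
    return bias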

olmo/model.py Outdated
        attn_bias.masked_fill_(context_mask.logical_not(), torch.finfo(attn_bias.dtype).min)

        if attn_mask is not None:
            attn_bias += attn_mask.to(q.dtype)
Member

This happens every batch?

Collaborator Author

Yes. I'm not sure how we can avoid doing this addition, though. Llama does this addition every time it computes attention.

@2015aroras 2015aroras (Collaborator, Author) commented Oct 31, 2023

Once this change is in, using Llama will require setting model.rope_precision_type = amp_bf16 and model.block_type = llama (in addition to any other changes that were needed before).
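
Purely as an illustration of those two settings (the repo's actual config schema and override mechanism may differ, and rope_precision_type is replaced by a rope_full_precision flag later in this review), a config fragment might look like:

from omegaconf import OmegaConf

# Hypothetical illustration only; option names follow this comment, not necessarily the merged code.
overrides = OmegaConf.create(
    {
        "model": {
            "block_type": "llama",
            "rope_precision_type": "amp_bf16",
        }
    }
)
print(OmegaConf.to_yaml(overrides))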

@epwalsh epwalsh (Member) left a comment

Just a couple comments

olmo/config.py Outdated
@@ -280,6 +286,11 @@ class ModelConfig(BaseConfig):
    Use rotary positional embeddings (RoPE). Mutually exclusive with ``alibi``.
    """

    rope_precision_type: str = "fp32"
Member

Consider making a StrEnum for the options here.

Member

Or never mind if you go with my other suggestion.

olmo/config.py Outdated
@@ -280,6 +286,11 @@ class ModelConfig(BaseConfig):
    Use rotary positional embeddings (RoPE). Mutually exclusive with ``alibi``.
    """

    rope_precision_type: str = "fp32"
    """
    Precision with which to apply RoPE (e.g. "amp_bf16", "amp_fp16", or "fp32").
Member

I don't think "amp_*" is meaningful here. It seems like there should really be two options:

  • fp32, or
  • whatever type q and k are

So maybe change this to a flag called rope_full_precision or something?

Collaborator Author

I went with this suggestion instead of making it a StrEnum as in your other one.
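
A minimal sketch of what the flag-based approach might look like; the RopeConfig dataclass and rope_inputs helper below are assumptions for illustration, not the merged code.

from dataclasses import dataclass

import torch

@dataclass
class RopeConfig:
    # Assumed flag: when True, apply RoPE in fp32 (OLMo's original behavior); when False,
    # apply it in whatever dtype q and k already are (as Llama does, e.g. bf16).
    rope_full_precision: bool = True

def rope_inputs(q: torch.Tensor, k: torch.Tensor, config: RopeConfig):
    if config.rope_full_precision:
        q_, k_ = q.float(), k.float()
    else:
        q_, k_ = q, k
    # ... the rotary embeddings would be applied to q_ and k_ here ...
    # Cast back so downstream attention sees the original dtype.
    return q_.to(q.dtype), k_.to(k.dtype)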

@epwalsh epwalsh (Member) left a comment

LGTM

@2015aroras 2015aroras merged commit 4ccf2bd into main Nov 2, 2023
10 checks passed
@2015aroras 2015aroras deleted the shanea/llama-block branch November 2, 2023 17:23