
[Model] Add OLMo November 2024 model #10503

Merged: 14 commits into vllm-project:main on Nov 25, 2024
Conversation

2015aroras
Contributor

@2015aroras commented Nov 20, 2024

An updated OLMo model will be released in November. The new model has a few small architecture changes compared to the original model (sketched below):

  • RMSNorm is used instead of standard layer norm.
  • Norm is applied to attention queries and keys.
  • Norm is applied after attention/feedforward rather than before.

The model has been implemented in transformers (huggingface/transformers#34551). This PR implements the model in vLLM.
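
To make these changes concrete, here is a rough PyTorch-style sketch of a decoder layer with all three modifications. It is illustrative only, not the transformers or vLLM implementation; it omits RoPE, grouped-query attention, and the KV cache, and the module names are assumptions.

import torch
import torch.nn as nn

class Olmo2LayerSketch(nn.Module):
    """Illustrative decoder layer reflecting the three changes listed above."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.qkv_proj = nn.Linear(hidden_size, 3 * hidden_size, bias=False)
        self.o_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size, bias=False),
            nn.SiLU(),
            nn.Linear(4 * hidden_size, hidden_size, bias=False),
        )
        # Change 1: parametric RMSNorm instead of non-parametric layer norm.
        # Change 2: norms applied to the attention query and key projections.
        self.q_norm = nn.RMSNorm(hidden_size)
        self.k_norm = nn.RMSNorm(hidden_size)
        # Change 3: norms applied after attention/feedforward (post-norm).
        self.post_attention_layernorm = nn.RMSNorm(hidden_size)
        self.post_feedforward_layernorm = nn.RMSNorm(hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, _ = x.shape
        residual = x
        q, k, v = self.qkv_proj(x).chunk(3, dim=-1)
        q, k = self.q_norm(q), self.k_norm(k)  # QK-norm before splitting heads
        q = q.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        attn = nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(b, s, -1)
        x = residual + self.post_attention_layernorm(self.o_proj(attn))
        residual = x
        return residual + self.post_feedforward_layernorm(self.mlp(x))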


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@mergify bot added the documentation (Improvements or additions to documentation) label on Nov 20, 2024
Collaborator

do you think it would make sense to have the existing olmo model definition (with expanded functionality) cover this one?

Contributor Author

I'm happy to do that if you prefer. We went that route first with transformers, and they told us to do separate models instead 😆.

Collaborator

If you think the model definitions are sufficiently similar, then I think that would be a good move!

Contributor Author

@2015aroras Nov 20, 2024

It's pretty messy since:

  • The new norm is parametric while the old one is not.
  • Supporting norm placement both before and after attention/feedforward (depending on the model) makes the forward pass pretty messy (see the sketch below). Also, the modules have different names depending on whether they come before or after.

We would strongly prefer to have separate models, but if you insist, we will follow whatever you folks decide.
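
A rough sketch of the branching a single shared layer would need (illustrative only; norm_after and the placeholder modules are not names from either codebase):

import torch.nn as nn

class SharedLayerSketch(nn.Module):
    """Hypothetical combined OLMo/OLMo2 layer, showing the repeated branching."""

    def __init__(self, hidden_size: int, norm_after: bool):
        super().__init__()
        self.norm_after = norm_after
        # OLMo uses a non-parametric layer norm, OLMo2 a parametric RMSNorm;
        # plain LayerNorms stand in for both here.
        self.attn_norm = nn.LayerNorm(hidden_size)
        self.mlp_norm = nn.LayerNorm(hidden_size)
        self.self_attn = nn.Identity()  # placeholder for the attention block
        self.mlp = nn.Identity()        # placeholder for the feedforward block

    def forward(self, hidden_states):
        # Attention sub-block: norm placement depends on the flag.
        residual = hidden_states
        if not self.norm_after:
            hidden_states = self.attn_norm(hidden_states)
        hidden_states = self.self_attn(hidden_states)
        if self.norm_after:
            hidden_states = self.attn_norm(hidden_states)
        hidden_states = residual + hidden_states
        # Feedforward sub-block: the same branching again.
        residual = hidden_states
        if not self.norm_after:
            hidden_states = self.mlp_norm(hidden_states)
        hidden_states = self.mlp(hidden_states)
        if self.norm_after:
            hidden_states = self.mlp_norm(hidden_states)
        return residual + hidden_states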

Collaborator

Is this what the model definition would look like if we kept them together: https://github.com/huggingface/transformers/pull/34497/files#diff-bcd9325f22ada9d41cdb22d8497a1c31dd874d1a6c2ea4315f1bf795aabb9a43?

I see what you mean about the if self.norm_after: checks all over the place. OK by me.

Contributor Author

@2015aroras Nov 20, 2024

Yes, that's the model definition (barring minor improvements/cleanup). Thank you!

Member

I agree it should be fine to have separate models in this case

num_kv_heads=self.num_kv_heads,
cache_config=vllm_config.cache_config,
quant_config=vllm_config.quant_config,
)
Member

Please pass in the prefix; it is required for attention now.

Contributor Author

19e26b7: I wasn't sure what to pass in for this, since it doesn't correspond to a separate set of weights. I followed exaone and just passed prefix as is.
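
For readers following along, the pattern under discussion looks roughly like this: the module prefix is threaded from the model constructor down into vLLM's Attention layer unchanged. This is an illustrative sketch, not the PR's code; the names, dimensions, and exact Attention signature may differ between vLLM versions.

import torch.nn as nn
from vllm.attention import Attention

class AttentionBlockSketch(nn.Module):
    """Hypothetical attention block showing the prefix being passed through as-is."""

    def __init__(self, vllm_config, hidden_size: int = 4096, num_heads: int = 32,
                 num_kv_heads: int = 32, prefix: str = ""):
        super().__init__()
        head_dim = hidden_size // num_heads
        self.attn = Attention(
            num_heads,
            head_dim,
            scale=head_dim**-0.5,
            num_kv_heads=num_kv_heads,
            cache_config=vllm_config.cache_config,
            quant_config=vllm_config.quant_config,
            # The prefix does not map to a separate set of weights here,
            # so it is forwarded unchanged, following the exaone definition.
            prefix=prefix,
        )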

@2015aroras
Contributor Author

We renamed the model to OLMo2 in transformers (huggingface/transformers#34864). I have updated the model here accordingly.

Member

@mgoin left a comment

This looks good to me, thanks for the quick responses

@mgoin added the new model (Requests to new models) and ready (ONLY add when PR is ready to merge/full CI is needed) labels on Nov 25, 2024
@2015aroras
Contributor Author

2015aroras commented Nov 25, 2024

A test fails because Olmo2Config is not in the version of transformers being used in the vLLM test. The rename was only merged a few hours ago. I believe transformers is looking to do a new release very soon.
https://buildkite.com/vllm/ci-aws/builds/11766#01936444-39a8-48ad-9085-7d964b2f56fc/277-553

@youkaichao
Member

A test fails because Olmo2Config is not in the version of transformers being used in the vLLM test. The rename was only merged a few hours ago. I believe transformers is looking to do a new release very soon. buildkite.com/vllm/ci-aws/builds/11766#01936444-39a8-48ad-9085-7d964b2f56fc/277-553

It is also fine to directly copy the config file into vllm, e.g.:

class ChatGLMConfig(PretrainedConfig):

@2015aroras
Contributor Author

it is also fine to directly copy the config file into vllm, e.g.

class ChatGLMConfig(PretrainedConfig):

Wow, I didn't know! I've taken this approach now (b50a3bf) and verified that vllm serve works with an old transformers version.
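
For reference, a vendored config in this style is just a PretrainedConfig subclass kept inside vLLM, so that transformers releases without Olmo2Config still work; vLLM can then resolve the olmo2 model type to this class instead of relying on the installed transformers version. The sketch below is illustrative only; the fields and defaults are placeholders rather than the actual contents of b50a3bf.

from transformers import PretrainedConfig

class Olmo2Config(PretrainedConfig):
    """Illustrative stand-in for a config class copied into vLLM."""

    model_type = "olmo2"

    def __init__(
        self,
        vocab_size: int = 100352,
        hidden_size: int = 4096,
        num_hidden_layers: int = 32,
        num_attention_heads: int = 32,
        rms_norm_eps: float = 1e-5,
        **kwargs,
    ):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.rms_norm_eps = rms_norm_eps
        super().__init__(**kwargs)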

@youkaichao
Member

Another option is to keep an AutoConfig field in your config.json, like https://huggingface.co/deepseek-ai/DeepSeek-V2.5/blob/98b11844770b2c3ffc18b175c758a803640f4e77/config.json#L8

@2015aroras
Contributor Author

I'm happy with the current state of the implementation, and all checks are passing. Please let me know if anything else needs to be done on my end to get this merged.

@mgoin merged commit 9db713a into vllm-project:main on Nov 25, 2024
53 checks passed
afeldman-nm pushed a commit to neuralmagic/vllm that referenced this pull request Dec 2, 2024
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
Labels
documentation (Improvements or additions to documentation), new model (Requests to new models), ready (ONLY add when PR is ready to merge/full CI is needed)
4 participants