[Model] Add OLMo November 2024 model #10503
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Do you think it would make sense to have the existing OLMo model definition (with expanded functionality) cover this one?
I'm happy to do that if you prefer. We went that route first with transformers and they told us to do separate models instead 😆.
If you think the model definitions are sufficiently similar, then I think that would be a good move!
It's pretty messy, since:
- The new norm is parametric while the old one is not.
- Allowing the norm to be placed either before or after the feedforward/attention blocks (depending on the model) makes the forward pass pretty messy, and the modules have different names depending on whether they come before or after (see the sketch below).

We would strongly prefer separate models, but if you insist we will follow whatever you folks decide.
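To make the mess concrete, here is a minimal hypothetical sketch in plain PyTorch (not the vLLM code from this PR; `nn.RMSNorm` needs PyTorch ≥ 2.4, and the `norm_after` flag and module names are invented for illustration) of what a single layer definition covering both variants would look like:

```python
import torch
from torch import nn


class UnifiedDecoderLayer(nn.Module):
    """Illustrative only: one decoder layer trying to cover both variants."""

    def __init__(self, hidden_size: int, num_heads: int, norm_after: bool):
        super().__init__()
        self.norm_after = norm_after
        if norm_after:
            # New model: parametric norms, applied after each block.
            self.post_attention_layernorm = nn.RMSNorm(hidden_size)
            self.post_feedforward_layernorm = nn.RMSNorm(hidden_size)
        else:
            # Old model: non-parametric norms, applied before each block,
            # and under different (pre-norm) names.
            self.input_layernorm = nn.LayerNorm(hidden_size, elementwise_affine=False)
            self.post_attention_layernorm = nn.LayerNorm(hidden_size, elementwise_affine=False)
        self.self_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.SiLU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Every step needs a norm_after branch, and the module names
        # no longer line up between the two variants.
        residual = x
        h = x if self.norm_after else self.input_layernorm(x)
        h, _ = self.self_attn(h, h, h, need_weights=False)
        if self.norm_after:
            h = self.post_attention_layernorm(h)
        x = residual + h

        residual = x
        h = x if self.norm_after else self.post_attention_layernorm(x)
        h = self.mlp(h)
        if self.norm_after:
            h = self.post_feedforward_layernorm(h)
        return residual + h
```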
Is this what the model definition would look like if we kept them together: https://github.com/huggingface/transformers/pull/34497/files#diff-bcd9325f22ada9d41cdb22d8497a1c31dd874d1a6c2ea4315f1bf795aabb9a43?
I see what you mean about the `if self.norm_after:` checks all over the place. OK by me.
Yes, that's the model definition (barring minor improvements/cleanup). Thank you!
I agree it should be fine to have separate models in this case
```python
            num_kv_heads=self.num_kv_heads,
            cache_config=vllm_config.cache_config,
            quant_config=vllm_config.quant_config,
        )
```
Please pass in the `prefix`; it is required for attention now.
19e26b7: Wasn't sure what to pass in for this, since it doesn't correspond to a separate set of weights. I followed exaone and just passed `prefix` as is.
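For readers following along, a rough sketch of the pattern being discussed; the class name and config attributes here are assumed from the surrounding diff rather than copied from the final code, so treat it as illustrative:

```python
from torch import nn
from vllm.attention import Attention


class Olmo2Attention(nn.Module):
    def __init__(self, vllm_config, prefix: str = "") -> None:
        super().__init__()
        hf_config = vllm_config.model_config.hf_config
        self.num_heads = hf_config.num_attention_heads
        self.num_kv_heads = hf_config.num_key_value_heads
        self.head_dim = hf_config.hidden_size // self.num_heads
        self.scaling = self.head_dim**-0.5
        self.attn = Attention(
            self.num_heads,
            self.head_dim,
            self.scaling,
            num_kv_heads=self.num_kv_heads,
            cache_config=vllm_config.cache_config,
            quant_config=vllm_config.quant_config,
            # `prefix` names this attention layer within the model tree
            # (e.g. "model.layers.0.self_attn.attn"); it identifies the
            # layer for the runtime rather than a separate set of
            # weights, which is why passing the parent prefix through
            # as-is works.
            prefix=f"{prefix}.attn",
        )
```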
We renamed the model to OLMo2 in transformers (huggingface/transformers#34864). I have updated the model here accordingly.
This looks good to me, thanks for the quick responses
A test fails because the new model's config is not yet available in a released version of transformers.
It is also fine to directly copy the config file into vllm, e.g.
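Roughly, per the linked design doc, copying the config means vendoring the config class and registering it. The snippet below is a sketch under the assumption that vLLM's `_CONFIG_REGISTRY` in `vllm/transformers_utils/config.py` still works this way; fields shown are placeholders, not the full Olmo2 config:

```python
# vllm/transformers_utils/configs/olmo2.py -- config class vendored from
# the transformers PR so vLLM does not depend on an unreleased version.
from transformers import PretrainedConfig


class Olmo2Config(PretrainedConfig):
    model_type = "olmo2"

    def __init__(self, hidden_size=4096, num_attention_heads=32, **kwargs):
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        super().__init__(**kwargs)


# vllm/transformers_utils/config.py -- map the checkpoint's model_type
# to the vendored class so config loading bypasses transformers.
_CONFIG_REGISTRY = {
    "olmo2": Olmo2Config,
}
```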
Wow, I didn't know! I've done this approach now (b50a3bf) and verified that
Another option is to keep an
See https://docs.vllm.ai/en/latest/design/huggingface_integration.html for more details.
I'm happy with the current state of the implementation, and all checks are passing. Please let me know if anything else needs to be done on my end to get this merged.
An updated OLMo model will be released in November. The new model has a few small architecture changes compared to the original model:
- The layer norms are parametric (the original model's are non-parametric).
- The norms are applied after the attention and feedforward blocks rather than before.
The model has been implemented in transformers (huggingface/transformers#34551). This PR implements the model in vLLM.
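Once merged, serving the model should follow the usual vLLM pattern; a quick smoke test, assuming the OLMo 2 November 2024 checkpoint name on the Hugging Face Hub:

```python
from vllm import LLM, SamplingParams

# "allenai/OLMo-2-1124-7B" is assumed here as the November 2024 release;
# substitute the actual checkpoint you want to serve.
llm = LLM(model="allenai/OLMo-2-1124-7B")
params = SamplingParams(temperature=0.0, max_tokens=32)
outputs = llm.generate(["The OLMo project is"], params)
print(outputs[0].outputs[0].text)
```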