add support granite and granitemoe models #1099
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Related to huggingface/optimum-intel#1099. Added the ability to test these models via llm_bench. Co-authored-by: Ilya Lavrenov
LGTM, thanks for investigating the MoE tracing problem!
LGTM!
# copied from https://github.com/huggingface/transformers/blob/v4.47.1/src/transformers/models/granitemoe/modeling_granitemoe.py#L281
def _granite_moe_parallel_experts_forward(self, inputs, expert_size):
    output_list = []
    # difference with original
    # 1) expert_size is tensor instead of list of ints after gating patching, that does not allow use original inputs.split(expert_size)
    # 2) use index_start:next_index for obtaining expert inputs splits one by one instead of precomputed splits once before cycle
super helpful, thanks!
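For readers following along, the two differences listed in the code comments above can be illustrated with a small, hypothetical sketch. It is not the PR's code: plain Python lists stand in for tensors, and the names linear, parallel_experts_forward, and weights are assumptions made for illustration. The point it shows is that each expert's rows are taken with a running index_start:next_index slice inside the loop, rather than from splits computed once before the loop.

```python
# Illustrative sketch only: plain Python lists stand in for tensors, and
# every name below is hypothetical rather than taken from the PR.

def linear(rows, weight):
    """Compute rows @ weight.T for list-of-list matrices (no bias)."""
    return [[sum(x * w for x, w in zip(row, w_row)) for w_row in weight]
            for row in rows]


def parallel_experts_forward(inputs, expert_size, weights):
    """Apply each expert to its contiguous chunk of rows via a running offset."""
    outputs = []
    index_start = 0
    for i, weight in enumerate(weights):
        next_index = index_start + expert_size[i]
        # Slice this expert's rows on the fly instead of pre-splitting inputs.
        outputs.extend(linear(inputs[index_start:next_index], weight))
        index_start = next_index
    return outputs


inputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
expert_size = [2, 0, 1]  # rows routed to experts 0, 1, and 2
weights = [
    [[1.0, 0.0]],  # expert 0 keeps the first feature
    [[0.0, 0.0]],  # expert 1 receives no rows in this example
    [[0.0, 1.0]],  # expert 2 keeps the second feature
]
result = parallel_experts_forward(inputs, expert_size, weights)
```

An expert with zero assigned rows simply gets an empty slice, so the loop needs no special case for it.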
Thank you eaidova and team for your work on this! It was very instructive about how the IR format works. I checked out this branch in a fresh conda environment and inspected the changes locally. However, I am still getting errors that the optimum exporters extension is missing when running via the CLI tool or through export=True in from_pretrained. I still have a lot to learn about advanced package management with Python, but once the unrecognized export config error was resolved I figured it might be useful to share here.
a54202f to 3609dd1
@SearchSavior in a clean env, after you checkout to this branch or even on main you should do
@IlyasMoutawwakil Sorry for the delay! I did end up getting that working. However, in my dev env, inference performance with Granite has been much worse than expected on GPU and CPU across the model family. It's possible my conversion was borked in some way. Multiple conversions are uploaded here: https://huggingface.co/Echo9Zulu I will try converting again soon now that support is merged, and will open an issue to present something more concrete.
What does this PR do?
Fixes #1097
Before submitting