jamba liger fused linear+xentropy #102
base: main
Conversation
Awesome! Please make sure you add both convergence tests (with logits and without logits) and unit tests. We are very focused on testing.
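To make the "with logits vs. without logits" distinction concrete, here is a toy sketch (not this repo's actual convergence harness): the plain cross-entropy path materializes logits, which can be compared against the baseline directly, while the fused linear + cross-entropy path never materializes them, so only the loss (and gradients/weights) can be compared. The `LigerFusedLinearCrossEntropyLoss` call order `(weight, input, target)` is an assumption to verify against the library.

```python
# Toy sketch of the two comparison modes; requires a CUDA device.
import torch
from liger_kernel.transformers.fused_linear_cross_entropy import (
    LigerFusedLinearCrossEntropyLoss,
)

torch.manual_seed(0)
B, T, H, V = 2, 16, 64, 100
hidden = torch.randn(B * T, H, device="cuda")
labels = torch.randint(0, V, (B * T,), device="cuda")
lm_head = torch.nn.Linear(H, V, bias=False, device="cuda")

# "With logits": materialize logits and compute the reference loss, so both the
# logits tensor and the loss are available for comparison against a baseline.
logits = lm_head(hidden)
ref_loss = torch.nn.functional.cross_entropy(logits, labels)

# "Without logits": hidden states + lm_head weight go straight into the fused
# kernel, so there is no logits tensor to compare -- only the loss.
fused_loss = LigerFusedLinearCrossEntropyLoss()(lm_head.weight, hidden, labels)

torch.testing.assert_close(fused_loss, ref_loss, rtol=1e-4, atol=1e-4)
```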
I added the following additional monkey patch for Jamba:

```python
from transformers.models.jamba import modeling_jamba

if rms_norm:
    # https://github.com/huggingface/transformers/blob/v4.44.2/src/transformers/models/gemma/modeling_gemma.py#L109
    modeling_jamba.JambaRMSNorm = LigerRMSNorm
if cross_entropy:
    modeling_jamba.CrossEntropyLoss = LigerCrossEntropyLoss
if swiglu:
    modeling_jamba.JambaMLP = LigerSwiGLUMLP
```

However, the convergence test seems to be failing for some values in the tensor. I tracked this down to LigerRMSNorm, but I need more time to investigate why there is a difference.
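A minimal sketch of how that difference could be isolated, by comparing the two RMSNorm modules directly on identical inputs and weights. The constructor arguments shown for `LigerRMSNorm` and the `liger_kernel.transformers.rms_norm` import path are assumptions; the Triton kernel also needs a CUDA device.

```python
# Sketch: compare JambaRMSNorm vs LigerRMSNorm on the same input and weight.
# Assumes both constructors accept (hidden_size, eps) and a GPU is available.
import torch
from transformers.models.jamba.modeling_jamba import JambaRMSNorm
from liger_kernel.transformers.rms_norm import LigerRMSNorm

hidden_size, eps = 4096, 1e-6
torch.manual_seed(0)

ref = JambaRMSNorm(hidden_size, eps=eps).cuda()
liger = LigerRMSNorm(hidden_size, eps=eps).cuda()
liger.weight.data.copy_(ref.weight.data)  # make the learnable scales identical

x = torch.randn(2, 128, hidden_size, device="cuda", dtype=torch.bfloat16)

out_ref = ref(x)
out_liger = liger(x)

# Start with loose tolerances, then tighten: if the mismatch survives fp32
# inputs and tight tolerances, it is a real kernel difference rather than
# bf16 rounding between the Triton and eager code paths.
torch.testing.assert_close(out_liger, out_ref, rtol=1e-2, atol=1e-2)
```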
```python
if rope:
    modeling_jamba.apply_rotary_pos_emb = liger_rotary_pos_emb
if rms_norm:
    # https://github.com/huggingface/transformers/blob/v4.44.2/src/transformers/models/gemma/modeling_gemma.py#L109
```
nit: the comment is wrong (it links to the Gemma RMSNorm rather than Jamba's)
```python
if cross_entropy:
    modeling_jamba.CrossEntropyLoss = LigerCrossEntropyLoss
if swiglu:
    modeling_jamba.JambaMLP = LigerSwiGLUMLP
```
where is lce_forward?
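For reference, a rough sketch of what an `lce_forward` for Jamba could look like, following the fused linear + cross-entropy pattern Liger-Kernel uses for other models: skip materializing logits during training and feed the hidden states plus the `lm_head` weight into `LigerFusedLinearCrossEntropyLoss`. The simplified signature, the dict return value, and the omitted Jamba-specific arguments (cache, router logits, `num_logits_to_keep`, ...) are all assumptions, not this PR's actual implementation.

```python
# Sketch only: a simplified lce_forward; not the actual patch in this PR.
from liger_kernel.transformers.fused_linear_cross_entropy import (
    LigerFusedLinearCrossEntropyLoss,
)


def lce_forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
    outputs = self.model(input_ids=input_ids, attention_mask=attention_mask, **kwargs)
    hidden_states = outputs[0]

    loss, logits = None, None
    if labels is not None:
        # Shift so tokens < n predict token n, flatten, and let the fused kernel
        # compute linear + cross entropy without materializing the logits.
        shift_hidden = hidden_states[..., :-1, :].contiguous().view(-1, self.config.hidden_size)
        shift_labels = labels[..., 1:].contiguous().view(-1).to(shift_hidden.device)
        loss = LigerFusedLinearCrossEntropyLoss()(self.lm_head.weight, shift_hidden, shift_labels)
    else:
        # Inference path unchanged: materialize logits as usual.
        logits = self.lm_head(hidden_states)

    return {"loss": loss, "logits": logits}
```

The patch would then assign it with something like `modeling_jamba.JambaForCausalLM.forward = lce_forward` when fused linear cross entropy is enabled (again, an assumed wiring).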
Hi @winglian, I created a PR against the main branch of your fork: winglian#1. Do you want to merge that first and then rebase this PR on it? Or I can create a separate PR to linkedin:main (#214). @ByronHsu thoughts?
@yubofredwang if your PR captures all the changes, I'm happy to have it supersede mine. Thanks!
@yubofredwang there are a few conflicts.
Summary
Testing Done
- `make test` to ensure correctness
- `make checkstyle` to ensure code style
- `make test-convergence` to ensure convergence