
Support Granite 3.0 and 3.1 models #558

Open
wants to merge 5 commits into main
Conversation


@JamesKunstle commented Feb 5, 2025

Granite 3.(0,1) models are Llama-architecture models with different scaling terms in various places. This commit adds Granite model patching for decoder-only Granite 3 models (not multimodal) and the corresponding tests.

Summary

This change enables patching Granite 3.(0,1) models with Liger kernels. We would like to use Liger kernels in our training implementation, but we're a Granite-first codebase for the moment.
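For readers new to Liger, usage would presumably mirror the library's existing per-architecture entry points. A minimal sketch, assuming this PR follows the apply_liger_kernel_to_<model> naming convention used by the Llama and Mistral patchers (the function name and keyword set here are assumptions, not read off the diff):

```python
from transformers import AutoModelForCausalLM

# Assumed entry point, by analogy with apply_liger_kernel_to_llama et al.
from liger_kernel.transformers import apply_liger_kernel_to_granite

# Monkey-patch the transformers Granite modeling classes BEFORE instantiation.
apply_liger_kernel_to_granite(
    rope=True,      # Liger rotary embeddings
    rms_norm=True,  # Liger RMSNorm
    swiglu=True,    # Liger SwiGLU MLP
)

model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-3.1-8b-instruct")
```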

Testing Done

Convergence tests confirm that loss and model parameters are equivalent with and without Liger kernels. Logits, however, are not equivalent even when only swapping in the SwiGLUMLP layer. The atol and rtol may need to be tuned for Granite vs. Llama; I'm going to continue investigating before this PR is merged.
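(For context, the logits check being tuned amounts to an elementwise comparison of this shape; the tolerance values below are illustrative placeholders, not the repo's actual test settings.)

```python
import torch

def assert_logits_match(liger_logits, hf_logits, atol=1e-3, rtol=1e-2):
    # Fails wherever |liger - hf| > atol + rtol * |hf|; these two knobs are
    # what may need Granite-specific values instead of the Llama defaults.
    torch.testing.assert_close(liger_logits, hf_logits, atol=atol, rtol=rtol)
```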

  • Hardware Type: EC2 g6e.12xlarge; 4xL40s
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@JamesKunstle

Fixes #557

@DRXD1000 left a comment

Convergence tests (except for the multimodal ones) have been run with the changes in here.

@JamesKunstle

@DRXD1000 Thank you so much for the support! I really appreciate that you caught that config PEBCAK; I've been trying to figure out why the layer has different output behavior from Llama!

A comment from @DRXD1000 was marked as outdated.

@DRXD1000

@JamesKunstle it was a pleasure! I can't see what's causing the merge conflict; if I can be of further assistance, let me know :)

@JamesKunstle

JamesKunstle commented Feb 17, 2025

@DRXD1000 I'd like to request pausing the merge for just a bit, for two reasons:

  1. I want to run the convergence tests myself so I can support this code in the future,
  2. I need to get the FusedLinearCrossEntropy layer working correctly for Granite, and to raise a clear error if someone selects it before that support lands. I think Granite may require logit materialization for scaling during the backward pass, but I'm going to try to find a solution so we can still use that kernel (see the sketch below).

I'll also fix the merge conflicts.
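To make point 2 concrete, here is a sketch (not Liger's kernel code) of the unfused reference semantics a fused linear + cross-entropy path must reproduce for Granite; the extra division is exactly what a fusion that never materializes the full logits matrix has to fold into both the forward loss and the backward gradient:

```python
import torch
import torch.nn.functional as F

def granite_unfused_loss(hidden, lm_head_weight, labels, logits_scaling):
    # hidden: [tokens, hidden_dim], lm_head_weight: [vocab, hidden_dim],
    # labels: [tokens]
    logits = hidden @ lm_head_weight.T  # materializes the [tokens, vocab] matrix
    logits = logits / logits_scaling    # the Granite-specific scaling step
    return F.cross_entropy(logits, labels)
```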

@JamesKunstle

@DRXD1000 For my future education: how did you debug the logit scaling value problem and pick the values per data type?

@DRXD1000

DRXD1000 commented Feb 18, 2025

@JamesKunstle I was looking at the Granite implementation in Hugging Face and tried to load the model with Llama directly (in transformers, not Liger, to see if it was possible to skip a separate implementation). After a short benchmark I noticed this did not work.

After reading modeling_granite.py in transformers again, I noticed the logits_scaling with the very obvious #Main difference to llama comment (I must have been blind the first time reading it...).
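(For anyone following along, the difference being described looks roughly like the toy module below; this is a paraphrase of the idea, not the verbatim transformers source.)

```python
import torch
import torch.nn as nn

class ToyGraniteHead(nn.Module):
    """Toy reproduction of the Granite-vs-Llama difference: the lm_head
    output is divided by the config's logits_scaling before the loss."""

    def __init__(self, hidden_dim, vocab_size, logits_scaling):
        super().__init__()
        self.lm_head = nn.Linear(hidden_dim, vocab_size, bias=False)
        self.logits_scaling = logits_scaling

    def forward(self, hidden_states):
        logits = self.lm_head(hidden_states)
        return logits / self.logits_scaling  # Llama has no such division
```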

After that I checked your PR, changed the tests, and ran them. The test failed at the logits comparison. With the source code in mind, I then looked at the default settings in GraniteConfig and at some values from the trained models. Since they scaled with model size, I gradually increased them until the test passed.

Since normal users will not do pretraining, and a proper value of logits_scaling will already exist, it was fine for me to take this trial-and-error approach.

Wow this got longer than planned 😅

@JamesKunstle

@DRXD1000 That's an excellent explanation; thank you, it helps a lot! I hadn't considered that value: I figured it was defaulted in the config, so I didn't inspect it. I was trying to debug by isolating the SwiGLUMLP layer from the GraniteMLP layer and comparing the individual logits; those were different too in my testing, so I was pretty confused. Your approach seems like a much better way to investigate.

JamesKunstle and others added 5 commits February 18, 2025 15:32
Granite 3.(0,1) models are Llama-architecture models with some different scaling
terms in various places. This commit adds granite model patching for
decoder-only granite 3 models (not multimodal) and the corresponding
tests.

Signed-off-by: James Kunstle <[email protected]>
@JamesKunstle

Sorry to any reviewers: ruff reformatted a lot of code when I ran make checkstyle.

@JamesKunstle

@ByronHsu I'd like to have the workflow tests approved for merge!

@lancerts self-requested a review February 19, 2025