
add Gemma2 support to MaxText #814

Merged
merged 1 commit into main from gemma2-2b on Aug 10, 2024
Conversation

@ZhaoyueCheng (Collaborator) commented Aug 6, 2024

Add Gemma2 support to MaxText, on top of a prior PR.

  • Add a Gemma2 decoder block that merges each [local_sliding_attention, global_attention] pair into one decoder layer, with post_attn_norm and post_ffw_norm support.
  • Add convert_gemma2_chkpt.py to convert checkpoints from the Gemma2 architecture:
    • merges each [local_sliding_attention, global_attention] pair into one decoder layer
    • adds support for post_attn_norm, post_ffw_norm, transpose_gating_einsum, and query_pre_attn_scalar, which are new relative to Gemma1
  • Enable post_attn_norm and post_ffw_norm in the Gemma2 decoder architecture.
  • Add final_logits_soft_cap support (see the sketch after this list).
  • Add gemma2-2b, gemma2-9b, and gemma2-27b config YAML files, a 2b end_to_end test script, and test golden logits dumped from Flax.
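For reference, final_logits_soft_cap bounds the output logits with a scaled tanh rather than a hard clip, so gradients stay nonzero near the cap. A minimal JAX sketch (the helper name is ours; the cap value of 30.0 follows the published Gemma2 config and should be treated as an assumption here):

import jax.numpy as jnp

def apply_soft_cap(logits: jnp.ndarray, cap: float) -> jnp.ndarray:
  """Squash logits smoothly into (-cap, cap) via a scaled tanh."""
  return cap * jnp.tanh(logits / cap)

# Example: with final_logits_soft_cap = 30.0, extreme logits saturate near ±30.
logits = jnp.array([-100.0, 0.0, 100.0])
capped = apply_soft_cap(logits, cap=30.0)  # ≈ [-29.9, 0.0, 29.9]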

@gagika (Collaborator) left a comment:

Some quick comments

Review threads on:
  • MaxText/configs/base.yml
  • MaxText/configs/models/gemma2-9b.yml
  • end_to_end/tpu/gemma2/9b/1_test_gemma.sh
  • end_to_end/tpu/gemma2/9b/2_test_gemma.sh
  • MaxText/pyconfig.py
On a new notebook file (@@ -0,0 +1,331 @@), a collaborator asked:

Do we want to open source all the notebooks?

@ZhaoyueCheng (Collaborator, Author) replied:

I think we had the original Gemma export notebook and other notebooks (Llama, Mixtral, etc.) open sourced, so I added the Gemma2 notebook to the scratch_code folder alongside the other open-source notebooks.

@salrowili commented:

Please note that if we merge local_sliding_attention and global_attention and assign half the decoder-layer count to "base_num_decoder_layers", the TFLOPS calculation will report half the actual TFLOPS. This is due to how maxtext_utils calculates the TFLOPS:
https://github.com/google/maxtext/blob/644eb87ae90dd8b210ce17f1c16ca7a54e80fceb/MaxText/maxtext_utils.py#L139

@gobbleturk (Collaborator) replied:

Good catch! Thank you for noting this.

@ZhaoyueCheng (Collaborator, Author) replied:

Updated, thanks for the comment!
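To make salrowili's undercounting point concrete: if per-layer FLOPs are scaled by num_decoder_layers alone, the second attention sub-layer inside each fused Gemma2 layer goes uncounted. A hypothetical sketch (function and argument names are illustrative, not the actual maxtext_utils code):

def estimated_tflops(flops_per_sublayer: float,
                     num_decoder_layers: int,
                     sublayers_per_layer: int = 1) -> float:
  """Estimate training TFLOPS from per-sub-layer FLOPs.

  Gemma2 fuses each [local_sliding_attention, global_attention] pair into
  one "decoder layer", so num_decoder_layers counts only half the attention
  sub-layers actually executed; sublayers_per_layer must be 2 for the
  estimate to be right.
  """
  return flops_per_sublayer * num_decoder_layers * sublayers_per_layer / 1e12

naive = estimated_tflops(1.0e12, 13)                          # undercounts by 2x
fixed = estimated_tflops(1.0e12, 13, sublayers_per_layer=2)   # correct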

@khatwanimohit (Collaborator) left a comment:

Thanks for adding this! Left a few nits.

QQ: Are we going to add a logit checker for Gemma2-9B in a separate PR? Since you have the code for golden logits here, can you also add the golden-logits JSONL file for Gemma2-9B in this PR?
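For context, a golden-logits file just stores reference outputs from the original Flax model so tests can compare against them later. A minimal, hypothetical sketch of producing one (the JSONL keys and the tokenize/reference_model callables are assumptions, not MaxText's actual format):

import json
import numpy as np

def dump_golden_logits(prompts, tokenize, reference_model, path):
  """Write one JSONL record per prompt: the prompt, its tokens, and logits."""
  with open(path, "w") as f:
    for prompt in prompts:
      tokens = tokenize(prompt)                        # list[int]
      logits = reference_model(np.array([tokens]))[0]  # [seq_len, vocab]
      record = {"prompt": prompt,
                "tokens": tokens,
                "logits": np.asarray(logits).tolist()}
      f.write(json.dumps(record) + "\n")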

On a new file (@@ -0,0 +1,246 @@), whose header reads:

    """
    Copyright 2023 Google LLC

A collaborator commented:

nit: Update the copyright to 2024

@ZhaoyueCheng (Collaborator, Author) replied:

Updated, thanks for the review!

Review thread on MaxText/pyconfig.py (resolved).
@khatwanimohit removed their assignment Aug 7, 2024
@gobbleturk (Collaborator) left a comment:

Overall looks very good, thank you for adding this! Just want to clean up the flops calculation.

Review threads on MaxText/maxtext_utils.py and MaxText/pyconfig.py (resolved).
@gobbleturk (Collaborator) left a comment:

Awesome, thank you for fixing the tflops calculation!

@gobbleturk assigned ZhaoyueCheng and unassigned themselves Aug 9, 2024
@ZhaoyueCheng (Collaborator, Author) replied:

Thanks for the detailed review and suggestions!

Commit (title truncated): …erter, Config Files, Flop Calculation and Run Scripts
@copybara-service bot merged commit da50760 into main Aug 10, 2024 (13 of 14 checks passed)
@copybara-service bot deleted the gemma2-2b branch August 10, 2024 00:24
@salrowili commented:

@ZhaoyueCheng I would like to thank you for adding Gemma 2 support; it is really appreciated.

I have pre-trained Gemma 2 using MaxText, but now I am stuck since I could not convert the MaxText checkpoint to Hugging Face format for the SFT stage. Gemma2, unlike Llama, Gemma, and Mistral, uses local and global attention. Is there anything you can do to the "MaxText/llama_or_mistral_ckpt.py" script to include Gemma 2 checkpoint conversion to HF format? The global and local attention are used in the following weights (a rough un-fusing sketch follows after this list):

mlp_global
mlp_local
post_ffw_norm_global
post_ffw_norm_local
post_self_attention_norm_global
post_self_attention_norm_local
pre_ffw_norm_global
pre_ffw_norm_local
pre_self_attention_norm_global
pre_self_attention_norm_local
self_attention_global

This is related to #829.
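For what it's worth, such a converter would mostly need to un-fuse each merged MaxText layer back into the two Hugging Face layers it came from. A rough, hypothetical sketch (the leading layer axis, the even/odd local/global interleaving, and the HF key names are all assumptions, not MaxText's or HF's actual checkpoint structure):

import numpy as np

FUSED_WEIGHT_NAMES = ("mlp", "pre_ffw_norm", "post_ffw_norm",
                      "pre_self_attention_norm", "post_self_attention_norm",
                      "self_attention")

def unfuse_gemma2_layers(fused_params: dict, num_fused_layers: int) -> dict:
  """Split fused [local, global] MaxText layers into per-layer HF weights.

  Assumes each fused weight is stacked along a leading layer axis of length
  num_fused_layers, and that the *_local / *_global halves map to even / odd
  Hugging Face layer indices respectively.
  """
  hf_params = {}
  for i in range(num_fused_layers):
    for suffix, offset in (("local", 0), ("global", 1)):
      hf_layer = 2 * i + offset
      for name in FUSED_WEIGHT_NAMES:
        key = f"{name}_{suffix}"
        if key in fused_params:
          # Slice out this fused layer's copy and assign it to its HF layer.
          hf_params[f"model.layers.{hf_layer}.{name}"] = (
              np.asarray(fused_params[key])[i])
  return hf_params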
