# Gemma

Gemma is a family of lightweight, state-of-the-art open models built from the research and technology used to create the Gemini models.

Following the instructions on Kaggle lets you download the Gemma model weights. You will have to consent to the Gemma license and use your Kaggle account's API credentials for the download.
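As a rough illustration, the snippet below shows one way to supply the Kaggle API credentials and unpack a downloaded checkpoint; the archive name is a placeholder for whatever Kaggle serves for the variant you choose, and `~/.kaggle/kaggle.json` works equally well for credentials.

```bash
# A minimal sketch, assuming Kaggle API credentials are passed as environment
# variables (the Kaggle API also reads ~/.kaggle/kaggle.json).
export KAGGLE_USERNAME=<your-kaggle-username>
export KAGGLE_KEY=<your-kaggle-api-key>

# After accepting the Gemma license on Kaggle, download the checkpoint archive
# per Kaggle's instructions and unpack it locally.
# <gemma-archive>.tar.gz is a placeholder for the file Kaggle provides.
mkdir -p ~/gemma-weights
tar -xf <gemma-archive>.tar.gz -C ~/gemma-weights
```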

After downloading the weights, run convert_gemma_chkpt.py, which converts the checkpoint to be compatible with MaxText and uploads it to a GCS bucket. You can then run decoding and finetuning by following the instructions in the test scripts at end_to_end/tpu/gemma; a rough sketch of these steps follows below.
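The commands below are a hedged sketch of the conversion and a subsequent decode run, not the authoritative invocations: the flag names for the conversion script and the key=value overrides for decode.py are assumptions about the current scripts, and the GCS paths are placeholders. Consult the test scripts in end_to_end/tpu/gemma for the exact commands.

```bash
# Convert the downloaded checkpoint into MaxText's format and write it to a
# GCS bucket (flag names assumed; check convert_gemma_chkpt.py's arguments).
python MaxText/convert_gemma_chkpt.py \
  --base_model_path <path-to-downloaded-gemma-checkpoint> \
  --maxtext_model_path gs://<your-bucket>/gemma-2b \
  --model_size 2b

# Run decoding against the converted checkpoint. key=value overrides are
# MaxText's usual config style; the exact keys may differ by version.
python MaxText/decode.py MaxText/configs/base.yml \
  run_name=gemma-2b-decode \
  model_name=gemma-2b \
  load_parameters_path=gs://<your-bucket>/gemma-2b/<checkpoint-path> \
  prompt="I love to"
```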

MaxText supports high-performance pretraining and finetuning of Gemma.
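For orientation, a finetuning run might look like the sketch below. The config keys and paths are assumptions in MaxText's key=value override style; take a verified command line from the test scripts in end_to_end/tpu/gemma.

```bash
# A minimal finetuning sketch (assumed config keys and placeholder paths).
python MaxText/train.py MaxText/configs/base.yml \
  run_name=gemma-2b-finetune \
  model_name=gemma-2b \
  load_parameters_path=gs://<your-bucket>/gemma-2b/<checkpoint-path> \
  base_output_directory=gs://<your-bucket>/maxtext-runs \
  dataset_path=gs://<your-dataset-bucket> \
  steps=100
```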

Model FLOP utilization (MFU) for training on TPU v5e and v5p:

| Model | v5e-256 (bf16) | v5p-128 (bf16) | v5e-256 (int8) | v5p-128 (int8) |
| --- | --- | --- | --- | --- |
| Gemma-2b | 58% | 55% | 64% | 68% |
| Gemma-7b | 58% | 60% | 70% | 70% |