# Mixtral

Mixtral is a state-of-the-art AI model developed by Mistral AI, utilizing a sparse mixture-of-experts (MoE) architecture.

To get started, follow the instructions at [mistral-inference](https://github.com/mistralai/mistral-inference) to download the model. Once downloaded, run `llama_or_mistral_ckpt.py` to convert the checkpoint into a MaxText-compatible format. You can then proceed with decoding, pretraining, and finetuning. A complete Mixtral 8x7B example is available in the `end_to_end/tpu/mixtral/8x7b` test scripts.
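The outline below is a minimal sketch of that workflow, not a tested command: the bucket paths are placeholders, and the flag names for `llama_or_mistral_ckpt.py` follow the pattern used in the `end_to_end/tpu/mixtral/8x7b` scripts, so verify them against your MaxText checkout before running.

```bash
# Sketch only: paths are placeholders; flag names assumed from the end_to_end scripts.

# 1. Download the Mixtral 8x7B weights following the mistral-inference
#    instructions, then stage them where MaxText can read them, e.g. a GCS bucket.
export BASE_CKPT=gs://<your-bucket>/mixtral-8x7B-v0.1        # placeholder
export MAXTEXT_CKPT=gs://<your-bucket>/maxtext/mixtral-8x7B  # placeholder

# 2. Convert the original checkpoint into a MaxText-compatible checkpoint.
python3 MaxText/llama_or_mistral_ckpt.py \
  --base-model-path ${BASE_CKPT} \
  --maxtext-model-path ${MAXTEXT_CKPT} \
  --model-size mixtral-8x7b

# 3. Point your decoding, pretraining, or finetuning run at ${MAXTEXT_CKPT};
#    the end_to_end/tpu/mixtral/8x7b test scripts show complete invocations.
```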

Additionally, MaxText's Mixtral implementation integrates MegaBlocks, an efficient dropless MoE strategy, which is enabled by setting the `megablox` flag to True (the default).
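As a rough illustration of how that looks on the command line, the run below passes `megablox=True` as a config override; the config file, model name, and other arguments are assumptions for this sketch, so consult the `end_to_end/tpu/mixtral/8x7b` scripts for exact, tested invocations.

```bash
# Illustrative only: enabling (or disabling) the MegaBlocks path via the
# megablox override when launching a MaxText training job. All other values
# are placeholders.
python3 MaxText/train.py MaxText/configs/base.yml \
  run_name=mixtral-megablocks-demo \
  model_name=mixtral-8x7b \
  base_output_directory=gs://<your-bucket>/output \
  dataset_type=synthetic \
  megablox=True \
  per_device_batch_size=1
```

Setting `megablox=False` should fall back to MaxText's alternative MoE implementation.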

## MaxText supports pretraining and finetuning with high performance

Model Flop utilization for training on v5p TPUs.

| Model size | Accelerator type | TFLOP/chip/sec | Model flops utilization (MFU) |
| --- | --- | --- | --- |
| Mixtral 8x7B | v5p-128 | 251.94 | 54.89% |