Added Support for ReLU Mistral and Llama Huggingface models #400

mmoffatt2 · 2025-02-18T21:19:49Z

Similar to the gpt_train.py and gpt_model.py files, these files add Hugging Face Mistral and Llama models with customizable attention implementations.

To run, call python3 mistral_train.py from the huggingface_models folder. This loads the ReLU attention Mistral model (with pre-trained weights) from mistral_model.py and fine-tunes it on Wikitext103.
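
For readers unfamiliar with the idea, here is a minimal sketch of what "ReLU attention" usually means: the softmax over scaled dot-product scores is replaced with an elementwise ReLU. It uses plain Python lists so it runs without any dependencies. It is not the code in mistral_model.py; the function name `relu_attention`, the `causal` flag, and the `divisor` parameter (some formulations divide the weights by the sequence length) are assumptions made for illustration.

```python
import math


def relu_attention(q, k, v, causal=True, divisor=1.0):
    """Illustrative single-head attention with ReLU in place of softmax.

    q, k, v are lists of vectors (seq_len x dim). `divisor` is a
    hypothetical normalization knob; it is not taken from the PR.
    """
    dim = len(q[0])
    outputs = []
    for i, q_i in enumerate(q):
        acc = [0.0] * len(v[0])
        for j, k_j in enumerate(k):
            if causal and j > i:
                break  # causal mask: position i ignores later positions
            # Scaled dot-product score, as in standard attention.
            score = sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(dim)
            # ReLU instead of softmax: negative scores contribute nothing,
            # and the weights are not forced to sum to 1.
            weight = max(score, 0.0) / divisor
            for c, v_c in enumerate(v[j]):
                acc[c] += weight * v_c
        outputs.append(acc)
    return outputs
```

Calling `relu_attention(q, k, v)` with three equally sized lists of vectors returns one output vector per query position. Because the weights are unnormalized, a position whose scores are all non-positive produces an all-zero output, which is one behavioral difference from softmax attention.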

added training options for ReLU mistral and llama

cf8d85f
