Releases: unslothai/unsloth
Vision finetuning
- We support Llama 3.2 Vision 11B, 90B; Pixtral; Qwen2VL 2B, 7B, 72B; and any Llava variants like Llava NeXT!
- We support 16bit LoRA or 4bit QLoRA. Both are accelerated and use much less memory!
- Llama 3.2 Vision finetuning - Radiography use case. Free Colab / Kaggle notebooks
- Qwen 2 VL Vision finetuning - Maths OCR to LaTeX. Free Colab / Kaggle notebooks
- Pixtral 12B Vision finetuning - General QA datasets. Free Colab notebook
- Please install or upgrade Unsloth via:
pip install --upgrade --no-cache-dir unsloth unsloth_zoo
from unsloth import FastVisionModel # NEW instead of FastLanguageModel
import torch
model, tokenizer = FastVisionModel.from_pretrained(
"unsloth/Llama-3.2-11B-Vision-Instruct",
load_in_4bit = True, # Use 4bit quantization to reduce memory usage. Can be False.
use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context
)
model = FastVisionModel.get_peft_model(
model,
finetune_vision_layers = True, # False if not finetuning vision part
finetune_language_layers = True, # False if not finetuning language part
finetune_attention_modules = True, # False if not finetuning attention layers
finetune_mlp_modules = True, # False if not finetuning MLP layers
r = 16, # The larger, the higher the accuracy, but might overfit
lora_alpha = 16, # Recommended alpha == r at least
lora_dropout = 0,
bias = "none",
random_state = 3407,
use_rslora = False, # We support rank stabilized LoRA
loftq_config = None, # And LoftQ
# target_modules = "all-linear", # Optional now! Can specify a list if needed
)
from datasets import load_dataset
dataset = load_dataset("unsloth/llava-instruct-mix-vsft-mini", split = "train")
from unsloth import is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig
FastVisionModel.for_training(model) # Enable for training!
trainer = SFTTrainer(
model = model,
tokenizer = tokenizer,
data_collator = UnslothVisionDataCollator(model, tokenizer), # Must use!
train_dataset = dataset,
args = SFTConfig(
per_device_train_batch_size = 1, # Reduce to 1 to make Pixtral fit!
gradient_accumulation_steps = 4,
warmup_steps = 5,
max_steps = 30,
# num_train_epochs = 1, # Set this instead of max_steps for full training runs
learning_rate = 2e-4,
fp16 = not is_bf16_supported(),
bf16 = is_bf16_supported(),
logging_steps = 1,
optim = "adamw_8bit",
weight_decay = 0.01,
lr_scheduler_type = "linear",
seed = 3407,
output_dir = "outputs",
report_to = "none", # For Weights and Biases
# You MUST put the below items for vision finetuning:
remove_unused_columns = False,
dataset_text_field = "",
dataset_kwargs = {"skip_prepare_dataset": True},
dataset_num_proc = 4,
max_seq_length = 2048,
),
)
trainer_stats = trainer.train()
After finetuning, you can also do inference:
FastVisionModel.for_inference(model) # Enable for inference!
image = dataset[2]["images"][0]
instruction = "Is there something interesting about this image?"
messages = [
{"role": "user", "content": [
{"type": "image"},
{"type": "text", "text": instruction}
]}
]
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt = True)
inputs = tokenizer(
image,
input_text,
add_special_tokens = False,
return_tensors = "pt",
).to("cuda")
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,
use_cache = True, temperature = 1.5, min_p = 0.1)
We also support merging QLoRA / LoRA directly into 16bit weights for serving:
# Select ONLY 1 to save! (Both not needed!)
# Save locally to 16bit
if False: model.save_pretrained_merged("unsloth_finetune", tokenizer,)
# To export and save to your Hugging Face account
if False: model.push_to_hub_merged("YOUR_USERNAME/unsloth_finetune", tokenizer, token = "PUT_HERE")
What's Changed
- Llama 3.2 by @danielhanchen in #1058
- Fix merges by @danielhanchen in #1079
- Handle absolute paths for save_to_gguf using pathlib by @giuliabaldini in #1120
- Only remove folder in sentencepiece check if it was created by @giuliabaldini in #1121
- Gradient Accumulation Fix by @danielhanchen in #1134
- Gradient Accumulation Fix by @danielhanchen in #1146
- fix: compute_loss bug by @vo1d-ai in #1151
- Windows installation guide in README by @timothelaborie in #1165
- chore: update chat_templates.py by @eltociear in #1166
- Many bug fixes by @danielhanchen in #1162
- Fix/patch tokenizer by @Erland366 in #1171
- Fix DPO, ORPO by @danielhanchen in #1177
- fix/transformers-unpack by @Erland366 in #1180
- Fix 4.47 issue by @danielhanchen in #1182
- 25% less mem and 10% faster training: Do not upcast lm_head and embedding to float32 by @Datta0 in #1186
- Cleanup upcast logs by @Datta0 in #1188
- Fix/phi-longrope by @Erland366 in #1193
- Bug fixes by @danielhanchen in #1195
- Fix/casting continue pretraining by @Erland366 in #1200
- Feat/all tmp by @danielhanchen in #1219
- Bug fixes by @danielhanchen in #1245
- Bug fix by @danielhanchen in #1249
- Bug fixes by @danielhanchen in #1255
- Fix: cast logits to float32 in cross_entropy_forward to prevent errors by @Erland366 in #1254
- Throw error when inferencing longer than max_position_embeddings by @Datta0 in #1236
- CLI now handles user input strings for dtype correctly by @Rabbidon in #1235
- Bug fixes by @danielhanchen in #1259
- Qwen 2.5 by @danielhanchen in #1280
- Fix/export mistral by @Erland366 in #1281
- DOC Update - Update README.md with os.environ in example by @udaygirish in #1269
- fix/get_chat_template by @Erland366 in #1246
- fix/sft-trainer by @Erland366 in #1276
- Bug fixes by @danielhanchen in #1288
- fix/sfttrainer-compatibility by @Erland366 in #1293
New Contributors
- @giuliabaldini made their first contribution in #1120
- @vo1d-ai made their first contribution in #1151
- @timothelaborie made their first contribution in #1165
- @eltociear made their first contribution in #1166
- @Erland366 made their first contribution in #1171
- @Datta0 made their first contribution in #1186
- @Rabbidon made their first contribution in #1235
- @udaygirish made their first contribution in #1269
Full Changelog: September-2024...November-2024
Gradient Accumulation Fix
We fixed a gradient accumulation bug that was first reported back in 2021 here and recently rediscovered here. Read more in our blog post: https://unsloth.ai/blog/gradient
We have a Colab Notebook for Llama 3.2 using the fixed trainer and a Kaggle Notebook as well.
In theory, training with a per-device batch size of bsz and ga gradient accumulation steps should be equivalent to full-batch training with batch size bsz * ga, but weirdly the training losses do not match up, as the small numerical sketch below illustrates.
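Here is a small numerical sketch of the mismatch (illustrative numbers only), under the assumption described in the blog post that each micro-batch normalizes its cross-entropy loss by its own token count before the accumulated gradients are averaged:
import torch

# Per-token cross-entropy losses for two micro-batches with different numbers of unpadded tokens
micro1 = torch.tensor([1.0, 2.0, 3.0])  # 3 unpadded tokens
micro2 = torch.tensor([4.0])            # 1 unpadded token

# Naive accumulation: average the per-micro-batch mean losses
naive = (micro1.mean() + micro2.mean()) / 2       # (2.0 + 4.0) / 2 = 3.0

# Full-batch equivalent: a single mean over all tokens
full_batch = torch.cat([micro1, micro2]).mean()   # 10.0 / 4 = 2.5

print(naive.item(), full_batch.item())  # 3.0 vs 2.5 -- the losses (and gradients) diverge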
To use Unsloth's fixed trainer with gradient accumulation, use:
from unsloth import unsloth_train
# trainer_stats = trainer.train() << Buggy if using gradient accumulation
trainer_stats = unsloth_train(trainer) # << Fixed gradient accumulation
Please update Unsloth on local machines (not needed on Colab / Kaggle) via:
pip uninstall unsloth -y
pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
Read our blog post: https://unsloth.ai/blog/gradient for more details!
What's Changed
- Llama 3.2 by @danielhanchen in #1058
- Fix merges by @danielhanchen in #1079
- Handle absolute paths for save_to_gguf using pathlib by @giuliabaldini in #1120
- Only remove folder in sentencepiece check if it was created by @giuliabaldini in #1121
- Gradient Accumulation Fix by @danielhanchen in #1134
New Contributors
- @giuliabaldini made their first contribution in #1120
Full Changelog: September-2024...October-2024
Qwen 2.5 Support
Qwen 2.5 Support is here!
There are some issues with Qwen 2.5 models which Unsloth has fixed!
- Kaggle Base model finetuning notebook: https://www.kaggle.com/code/danielhanchen/kaggle-qwen-2-5-unsloth-notebook/notebook
- Kaggle Instruct model finetuning notebook: https://www.kaggle.com/code/danielhanchen/kaggle-qwen-2-5-conversational-unsloth
- Colab finetuning notebook: https://colab.research.google.com/drive/1Kose-ucXO1IBaZq5BvbwWieuubP7hxvQ?usp=sharing
- Colab conversational notebook: https://colab.research.google.com/drive/1qN1CEalC70EO1wGKhNxs1go1W9So61R5?usp=sharing
EOS token issues
Qwen 2.5 Base models (0.5b all the way up to 72b) - the EOS token should be <|endoftext|>, not <|im_end|>. The base models' <|im_end|> token is actually untrained, so it'll cause NaN gradients if you use it. You should re-pull the tokenizer from source, or you can download fixed base models from https://huggingface.co/unsloth if that helps.
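As a quick sanity check (a minimal sketch; the model name is just one example), you can confirm which EOS token the re-pulled tokenizer reports:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
print(tokenizer.eos_token)  # Base models should report <|endoftext|>, not <|im_end|>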
Chat template issues
- Qwen 2.5 Base models should NOT have a chat_template; having one will actually cause errors, especially in Unsloth's finetuning notebooks, since we check whether untrained tokens appear in the chat template to counteract NaN gradients (a sketch of such a check is shown after this list).
- Do NOT use Qwen 2.5's chat template for the base models. This will cause NaN gradients!
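Below is a minimal sketch (model and tokenizer names are illustrative, not a prescribed workflow) of one way to spot an untrained token: its embedding row has a much smaller mean magnitude than the rest of the embedding matrix, which is what produces NaN gradients once it shows up in training data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", torch_dtype = torch.bfloat16)

embeddings = model.get_input_embeddings().weight
token_id = tokenizer.convert_tokens_to_ids("<|im_end|>")
print(embeddings[token_id].abs().mean().item())  # mean |weight| of the suspect token
print(embeddings.abs().mean().item())            # mean |weight| over the whole vocabulary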
4bit uploaded models
| Base | Base (4bit) | Instruct | Instruct (4bit) |
|---|---|---|---|
| Qwen 2.5 0.5b | 4bit 0.5b | Instruct 0.5b | 4bit Instruct 0.5b |
| Qwen 2.5 1.5b | 4bit 1.5b | Instruct 1.5b | 4bit Instruct 1.5b |
| Qwen 2.5 3b | 4bit 3b | Instruct 3b | 4bit Instruct 3b |
| Qwen 2.5 7b | 4bit 7b | Instruct 7b | 4bit Instruct 7b |
| Qwen 2.5 14b | 4bit 14b | Instruct 14b | 4bit Instruct 14b |
| Qwen 2.5 32b | 4bit 32b | Instruct 32b | 4bit Instruct 32b |
| Qwen 2.5 72b | 4bit 72b | Instruct 72b | 4bit Instruct 72b |
What's Changed
- Phi 3.5 by @danielhanchen in #940
- Phi 3.5 by @danielhanchen in #941
- Fix DPO by @danielhanchen in #947
- Phi 3.5 bug fix by @danielhanchen in #955
- Cohere, Bug fixes by @danielhanchen in #984
- Gemma faster inference by @danielhanchen in #987
- Bug fixes by @danielhanchen in #1004
- Update README.md by @danielhanchen in #1033
- Update README.md by @danielhanchen in #1036
- fix: chat_templates.py bug by @NazimHAli in #1048
New Contributors
- @NazimHAli made their first contribution in #1048
Full Changelog: August-2024...September-2024
Phi 3.5
Phi 3.5 is here!
Try it out here: https://colab.research.google.com/drive/1lN6hPQveB_mHSnTOYifygFcrO8C1bxq4?usp=sharing
What's Changed
- Llama 3.1 by @danielhanchen in #797
- Better debugging by @danielhanchen in #826
- fix UnboundLocalError by @xyangk in #834
- Gemma by @danielhanchen in #843
- Fix ROPE extension issue and device mismatch by @xyangk in #840
- Fix RoPE extension by @danielhanchen in #846
- fix: fix config.torch_dtype bug by @relic-yuexi in #874
- pascal support by @emuchogu in #870
- Fix tokenizers by @danielhanchen in #887
- Torch 2.4, Xformers>0.0.27, TRL>0.9, Python 3.12 + bug fixes by @danielhanchen in #902
- Fix DPO stats by @danielhanchen in #906
- Fix Chat Templates by @danielhanchen in #916
- Fix chat templates by @danielhanchen in #917
- Bug Fixes by @danielhanchen in #920
- Fix mapping by @danielhanchen in #921
- untrained tokens llama 3.1 base by @danielhanchen in #929
- Bug #930 by @danielhanchen in #931
- Fix NEFTune by @danielhanchen in #937
- Update README.md by @danielhanchen in #938
New Contributors
- @relic-yuexi made their first contribution in #874
- @emuchogu made their first contribution in #870
Full Changelog: July-Mistral-2024...August-2024
Llama 3.1 Support
Excited to announce that Unsloth makes finetuning Llama 3.1 2.1x faster with 60% less VRAM! Read up on our release here: https://unsloth.ai/blog/llama3-1
We uploaded a Google Colab notebook to finetune Llama 3.1 (8B) on a free Tesla T4: Llama 3.1 (8B) Notebook. We also have a new UI on Google Colab for chatting with your Llama 3.1 Instruct models which uses our own 2x faster inference engine.
Run UI Preview
We created a new chat UI using Gradio where users can upload and chat with their Llama 3.1 Instruct models online for free on Google Colab.
We uploaded 4bit bitsandbytes quants here: https://huggingface.co/unsloth
To finetune Llama 3.1, please update Unsloth:
pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git
July-Mistral-2024
Mistral NeMo, Ollama & CSV support
See https://unsloth.ai/blog/mistral-nemo for more details. 4bit pre-quantized weights are at https://huggingface.co/unsloth.
Finetuning is 2x faster and uses 60% less VRAM. Our free Colab finetuning notebook is here and our Kaggle notebook is here.
Export to Ollama & CSV Support
To use it, create and customize your chat template with a dataset, and Unsloth will automatically export the finetune to Ollama, creating the Modelfile for you. We also created a 'Step-by-Step Tutorial on How to Finetune Llama-3 and Deploy to Ollama'. Check out our Ollama Llama-3 Alpaca and CSV/Excel Ollama Guide notebooks.
Unlike regular chat templates that use 3 columns, Ollama simplifies the process with just 2 columns: instruction and output. And with Ollama, you can save, run, and deploy your finetuned models locally on your own device.
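For reference, here is a minimal sketch of that 2-column layout (column names taken from the description above; the rows are made up):
from datasets import Dataset

dataset = Dataset.from_dict({
    "instruction": ["What is the capital of France?"],
    "output":      ["The capital of France is Paris."],
})
print(dataset.column_names)  # ['instruction', 'output']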
Train on Completions / Inputs
We now support training only on the output tokens and not the inputs, which can increase accuracy. Try it with:
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
trainer = SFTTrainer(
model = model,
tokenizer = tokenizer,
train_dataset = dataset,
data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
...
args = TrainingArguments(
...
),
)
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(trainer)
RoPE Scaling for all models
We now allow you to finetune Gemma 2, Mistral, Mistral NeMo, Qwen2, and more models with "unlimited" context lengths via RoPE linear scaling in Unsloth. Coupled with our 4x longer context support, Unsloth can handle extremely long contexts!
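As a rough sketch (the model name and context figures are assumptions, not from this release), you simply request a longer max_seq_length at load time and Unsloth applies the RoPE scaling for you:
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/gemma-2-9b-bnb-4bit",
    max_seq_length = 16384,  # longer than Gemma 2's native 8192 context; RoPE linear scaling handles it
    load_in_4bit = True,
)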
New Docs!
Introducing our new Documentation site which has all the most important info about Unsloth in one place. If you'd like to contribute, please contact us! Docs: https://docs.unsloth.ai/
Update instructions
Please update Unsloth on local machines (on Colab and Kaggle, just refresh and reload the notebook) via:
pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git
2x faster Gemma 2
Gemma 2 support
We now support Gemma 2! It's 2x faster and uses 63% less VRAM than HF+FA2!
We have a Gemma 2 9b notebook here: https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing
To use Gemma 2, please update Unsloth:
pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git
Head over to our blog post: https://unsloth.ai/blog/gemma2 for more details.
We uploaded 4bit quants for 4x faster downloading to:
https://huggingface.co/unsloth/gemma-2-9b-bnb-4bit
https://huggingface.co/unsloth/gemma-2-27b-bnb-4bit
https://huggingface.co/unsloth/gemma-2-9b-it-bnb-4bit
https://huggingface.co/unsloth/gemma-2-27b-it-bnb-4bit
Continued pretraining
You can now do continued pretraining with Unsloth. See https://unsloth.ai/blog/contpretraining for more details!
Continued pretraining is 2x faster and uses 50% less VRAM than HF + FA2 QLoRA. We offload embed_tokens and lm_head to disk to save VRAM!
You can now simply include both in the target modules like below:
model = FastLanguageModel.get_peft_model(
model,
r = 128, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",
"embed_tokens", "lm_head",], # Add for continual pretraining
lora_alpha = 32,
lora_dropout = 0, # Supports any, but = 0 is optimized
bias = "none", # Supports any, but = "none" is optimized
# [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
random_state = 3407,
use_rslora = True, # We support rank stabilized LoRA
loftq_config = None, # And LoftQ
)
We also allow 2 learning rates - one for the embedding matrices and another for the LoRA adapters:
from unsloth import is_bfloat16_supported
from unsloth import UnslothTrainer, UnslothTrainingArguments
trainer = UnslothTrainer(
args = UnslothTrainingArguments(
....
learning_rate = 5e-5,
embedding_learning_rate = 5e-6,
),
)
We also share a free Colab to finetune Mistral v3 to learn Korean (you can select any language you like) using Wikipedia and the Aya Dataset: https://colab.research.google.com/drive/1tEd1FrOXWMnCU9UIvdYhs61tkxdMuKZu?usp=sharing
And we're sharing our free Colab notebook for continued pretraining for text completion: https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing
What's Changed
- Ollama Chat Templates by @danielhanchen in #582
- Fix case where GGUF saving fails when model_dtype is torch.float16 ("f16") by @chrehall68 in #630
- Support revision parameter in FastLanguageModel.from_pretrained by @chrehall68 in #629
- clears any selected_adapters before calling internal_model.save_pretr… by @neph1 in #609
- Check for incompatible modules before importing unsloth by @xyangk in #602
- Fix #603 handling of formatting_func in tokenizer_utils for assistant/chat/completion training by @Oseltamivir in #604
- Add GGML saving option to Unsloth for easier Ollama model creation and testing. by @mahiatlinux in #345
- Add Documentation for LoraConfig Parameters by @sebdg in #619
- llama.cpp failing by @bet0x in #371
- fix libcuda_dirs import for triton 3.0 by @t-vi in #227
- Nightly by @danielhanchen in #632
- README: Fix minor typo. by @shaper in #559
- Qwen bug fixes by @danielhanchen in #639
- Fix segfaults by @danielhanchen in #641
- Nightly by @danielhanchen in #646
- Nightly by @danielhanchen in #648
- Nightly by @danielhanchen in #649
- Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf by @ArcadaLabs-Jason in #651
- Revert "Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf" by @danielhanchen in #652
- Revert "Revert "Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf"" by @danielhanchen in #653
- Fix GGUF by @danielhanchen in #654
- Fix continuing LoRA finetuning by @danielhanchen in #656
New Contributors
- @chrehall68 made their first contribution in #630
- @neph1 made their first contribution in #609
- @xyangk made their first contribution in #602
- @Oseltamivir made their first contribution in #604
- @mahiatlinux made their first contribution in #345
- @sebdg made their first contribution in #619
- @bet0x made their first contribution in #371
- @t-vi made their first contribution in #227
- @shaper made their first contribution in #559
- @ArcadaLabs-Jason made their first contribution in #651
Full Changelog: https://github.com/unslothai/unsloth/commits/June-2024