Docs: fix GaLore optimizer code example (#32249)
Docs: fix GaLore optimizer example

Fix incorrect usage of the GaLore optimizer in the Transformers Trainer code examples.

The GaLore optimizer uses low-rank gradient updates to reduce memory usage. GaLore is quite popular and is implemented by the authors in [https://github.com/jiaweizzhao/GaLore](https://github.com/jiaweizzhao/GaLore). A few months ago, GaLore was added to the Hugging Face Transformers library in #29588.

Documentation of the Trainer module includes a few code examples of how to use GaLore. However, the `optim_target_modules` argument passed to `TrainingArguments` is incorrect, as discussed in #29588 (comment): the target modules need to be specified as regex patterns that match the model's layer names, not as bare substrings. This pull request fixes that issue.
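
For reference, below is a minimal sketch of the corrected usage, adapted from the trainer.md example rather than a new API. It assumes `galore-torch`, `trl`, and `datasets` are installed (`pip install galore-torch trl datasets`) and that you have access to the gated `google/gemma-2b` checkpoint; the `dataset_text_field`/`max_seq_length` keyword arguments follow the `trl` version used in the docs and may need to move onto `SFTConfig` with newer `trl` releases.

```python
import datasets
import trl
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

args = TrainingArguments(
    output_dir="./test-galore",
    max_steps=100,
    per_device_train_batch_size=2,
    optim="galore_adamw",
    # Regex patterns, not bare substrings, so GaLore can match the attention
    # and MLP projection layers by name -- this is what the fix changes.
    optim_target_modules=[r".*.attn.*", r".*.mlp.*"],
)

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

train_dataset = datasets.load_dataset("imdb", split="train")

# dataset_text_field / max_seq_length mirror the docs example; newer trl
# releases expect these on SFTConfig instead of SFTTrainer.
trainer = trl.SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=args,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=512,
)
trainer.train()
```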
gil2rok authored Jul 30, 2024
1 parent f0bc49e commit 3e8106d
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions docs/source/en/trainer.md
```diff
@@ -278,7 +278,7 @@ args = TrainingArguments(
     max_steps=100,
     per_device_train_batch_size=2,
     optim="galore_adamw",
-    optim_target_modules=["attn", "mlp"]
+    optim_target_modules=[r".*.attn.*", r".*.mlp.*"]
 )
 
 model_id = "google/gemma-2b"
@@ -315,7 +315,7 @@ args = TrainingArguments(
     max_steps=100,
     per_device_train_batch_size=2,
     optim="galore_adamw",
-    optim_target_modules=["attn", "mlp"],
+    optim_target_modules=[r".*.attn.*", r".*.mlp.*"],
     optim_args="rank=64, update_proj_gap=100, scale=0.10",
 )
 
@@ -359,7 +359,7 @@ args = TrainingArguments(
     max_steps=100,
     per_device_train_batch_size=2,
     optim="galore_adamw_layerwise",
-    optim_target_modules=["attn", "mlp"]
+    optim_target_modules=[r".*.attn.*", r".*.mlp.*"]
 )
 
 model_id = "google/gemma-2b"
```
