TRL upgrade #2307

winglian · 2025-02-03T02:43:21Z

wip towards adding support for GRPO

src/axolotl/core/trainers/grpo/__init__.py

SalmanMohammadi · 2025-02-06T10:34:27Z

src/axolotl/utils/config/models/input/v0_4_1/trl.py

+from pydantic import BaseModel, Field
+
+
+class TrlConfig(BaseModel):


This looks great : )


abodacs · 2025-02-10T14:44:40Z

src/axolotl/core/trainers/grpo/trainer.py

+                    texts,
+                    return_tensors="pt",
+                    padding=True,
+                    padding_side="right",


Why is padding_side mismatched between reward_inputs and prompt_inputs?

winglian · 2025-02-10

This was copied from the upstream TRL trainer; I needed that PR (https://github.com/huggingface/trl/pull/2817/files) to land to simplify things on our end. Now that it is merged, I've removed this method from this class.
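To make the padding question concrete: prompts fed to a decoder-only model for generation are typically left-padded, so every prompt ends at the same position and new tokens continue from real text, while inputs that are only scored (as the quoted `padding_side="right"` call appears to do for reward inputs) can be right-padded. The sketch below is a plain-Python illustration of the two layouts. `pad_batch` and the pad id `0` are hypothetical and are not taken from TRL or axolotl.

```python
from typing import List


def pad_batch(seqs: List[List[int]], pad_id: int, side: str) -> List[List[int]]:
    """Pad token-id lists to the longest length (hypothetical helper)."""
    if side not in ("left", "right"):
        raise ValueError(f"side must be 'left' or 'right', got {side!r}")
    width = max((len(s) for s in seqs), default=0)
    padded = []
    for s in seqs:
        fill = [pad_id] * (width - len(s))
        # Left padding puts the filler before the tokens, right padding after.
        padded.append(fill + s if side == "left" else s + fill)
    return padded


prompts = [[11, 12, 13], [21]]
left = pad_batch(prompts, pad_id=0, side="left")    # layout typical for generation
right = pad_batch(prompts, pad_id=0, side="right")  # layout typical for scoring
```

With left padding the last column of every row holds a real token, which is what generation needs; with right padding the first column does.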

SalmanMohammadi · 2025-02-10T15:45:08Z

src/axolotl/utils/trainer.py

@@ -576,7 +576,7 @@ def prepare_opinionated_env(cfg):
 def setup_trainer(
    cfg, train_dataset, eval_dataset, model, tokenizer, processor, total_num_steps
 ):
-    if cfg.rl in ("dpo", "ipo", "orpo", "kto", "simpo"):
+    if cfg.rl in ("dpo", "grpo", "ipo", "orpo", "kto", "simpo"):


I think we can just check for a truthy cfg.rl here, right?
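The two conditions under discussion agree for the listed methods but differ for unknown values. A minimal sketch, using `types.SimpleNamespace` as a stand-in for the real `cfg` object; the helper names and the `SUPPORTED_RL` constant are hypothetical:

```python
from types import SimpleNamespace

# Tuple copied from the diff above.
SUPPORTED_RL = ("dpo", "grpo", "ipo", "orpo", "kto", "simpo")


def uses_rl_by_membership(cfg) -> bool:
    """Condition as written in the diff: explicit allow-list."""
    return cfg.rl in SUPPORTED_RL


def uses_rl_by_truthiness(cfg) -> bool:
    """Condition suggested in review: any non-empty value counts."""
    return bool(cfg.rl)


for rl in ("grpo", None, "", "unknown"):
    cfg = SimpleNamespace(rl=rl)
    print(repr(rl), uses_rl_by_membership(cfg), uses_rl_by_truthiness(cfg))
```

The checks diverge only for a non-empty value outside the tuple, so the shorter form is equivalent only if `cfg.rl` is validated somewhere else. Whether that is the case in axolotl is not stated in this thread.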

SalmanMohammadi reviewed Feb 4, 2025

src/axolotl/core/trainers/grpo/__init__.py

winglian force-pushed the grpo branch from b3fca89 to 753146b Compare February 4, 2025 16:06

SalmanMohammadi reviewed Feb 6, 2025

winglian marked this pull request as ready for review February 7, 2025 13:22

winglian force-pushed the grpo branch from 189a3e4 to 1222ca4 Compare February 7, 2025 13:26

winglian and others added 25 commits February 7, 2025 21:35

upgrade trl to 0.14.0

5b1b67c

refactor dpo trainer into own module

2b79e54

respect dotenv for cli

2936caf

refactor a bit for better grpo support

fffb763

passthrough dataset parser for dpo/grpo

48ec475

honor skip prepare for rl

681906f

support custom module prompt strategy for rl

a2ef745

collator for grpo and prompt loader

b966eb5

use correct builder

2beb09f

load the class from strat

1613676

make it a dataclass

c78b125

order matters

b156d46

be nice with self.cfg.dataset_processes

43137d7

remove unusable args kwargs

846eaa9

more fixes to get grpo working

202314f

add support for passing map kwargs to dataset map in rl

441add5

don't shrink embeddings unless told to

336ee8c

bump pydantic to support vllm

167a8e6

add support for num_generations

f6819b7

fix dpo config and add use_logits_to_keep

b11deac

fix failure case in prompter loading

976c5fb

fix config cls

de09b7e

max_length moved to reward config

1143023

adding reward fn verification

63d5434

adding 'reward_processing_classes'

452ed5f

winglian added 21 commits February 7, 2025 21:35

separately include max_completion_len

ee3b699

use cfg.max_completion_length, not sequence_len

419dfdc

refactor cfg.grpo_* to use cfg.trl.*

56b7ea2

set default on trl config

81f238f

include vllm in build

f2319e2

check for src axolotl in PYTHONPATH before removing it

2904690

fix num_processes in passing to accelerate

b5a9ff7

make sure to handle num-processes with cloud

b802073

make sure to pass kwargs when using accelerate

370d548

test not deleting pythonpath for custom code bundling

529fd4c

clean path and add mounts handle mounting

cleanup pythonpath if axo in it

f12f733

don't set total num steps for grpo

04edc99

set max steps to -1 when empty

9a6995d

make sure to set num train epochs for rl

1048deb

chore: lint

4241c2a

set vllm so it's compatible with older pytorch

5dbb6fd

fix for UnboundLocalError when reward not valid

f5053a1

fix comparator

faa9474

only pre-install vllm on 2.5.1 since there is no package compatible with 2.4.1

b5a973c

loosen xformers for vllm support

fada21e

support log completions

f16e861

winglian force-pushed the grpo branch from 9b05aac to f16e861 Compare February 8, 2025 02:35

get state dict for lora models

ceb2004

abodacs reviewed Feb 10, 2025

nit: trlconfig

dc52c3b

SalmanMohammadi reviewed Feb 10, 2025

winglian and others added 4 commits February 10, 2025 10:49

simplify method override now that upstream has refactored

1d457d1

updating num_processes

1fe2917

Merge branch 'grpo' of github.com:axolotl-ai-cloud/axolotl into grpo

4f1315c

updating num_processes

2e2b876
