[misc] fix reward model issue with TokenClassification model and support running particular steps instead of epochs (#99)

* support user specify training steps
* fix typo
* update ci
* add ci
* fix reward model and write more ci script
* update ci
* lint
* align
* delete post training val
* fix script
Showing 12 changed files with 267 additions and 26 deletions.
@@ -0,0 +1,52 @@
name: e2e_gsm8k

on:
  # Trigger the workflow on push or pull request,
  # but only for the main branch
  push:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/e2e_gsm8k.yml
  pull_request:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/e2e_gsm8k.yml

jobs:
  e2e_gsm8k:
    runs-on: [self-hosted, l20-1]
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1"
      HF_HUB_ENABLE_HF_TRANSFER: 1
    container:
      image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3
      options: --gpus all --shm-size=10g
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install hf_transfer
          pip3 install -e .[test]
      - name: Prepare gsm8k dataset
        run: |
          python3 examples/data_preprocess/gsm8k.py
      - name: Running gsm8k e2e training tests on 8 L20 GPUs with rmpad using function rm
        run: |
          bash tests/e2e/run_qwen_gsm8k_function_rm.sh
      - name: Running gsm8k e2e without rmpad using function rm
        run: |
          bash tests/e2e/run_qwen_gsm8k_function_rm_no_rmpad.sh
      - name: Running gsm8k e2e with rmpad using model rm
        run: |
          bash tests/e2e/run_qwen_gsm8k_model_rm.sh
      - name: Running gsm8k e2e without rmpad using model rm
        run: |
          bash tests/e2e/run_qwen_gsm8k_model_rm_no_rmpad.sh
@@ -0,0 +1,40 @@
set -x

python3 -m verl.trainer.main_ppo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=512 \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=32 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=128 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=128 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    critic.optim.lr=1e-5 \
    critic.model.use_remove_padding=True \
    critic.model.path=Qwen/Qwen2.5-0.5B \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size=32 \
    critic.model.fsdp_config.param_offload=False \
    critic.model.fsdp_config.grad_offload=False \
    critic.model.fsdp_config.optimizer_offload=False \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console'] \
    trainer.project_name='verl_example_gsm8k' \
    trainer.experiment_name='qwen_e2e_ci_function_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.total_training_steps=1 "$@"
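The script above caps the CI run with `trainer.total_training_steps=1`, which is the "particular steps instead of epochs" behaviour named in the commit title. As a rough sketch of that idea (an explicit step budget taking precedence over an epoch count), the helper below is hypothetical: `resolve_total_steps`, `steps_per_epoch`, and `total_epochs` are illustrative names and the numbers are made up; none of this is taken from the repository's trainer code.

```python
from typing import Optional


def resolve_total_steps(steps_per_epoch: int,
                        total_epochs: int,
                        total_training_steps: Optional[int] = None) -> int:
    """Return how many training steps to run.

    Hypothetical helper: an explicit step budget wins; otherwise the
    budget falls back to steps_per_epoch * total_epochs.
    """
    if steps_per_epoch <= 0 or total_epochs <= 0:
        raise ValueError("steps_per_epoch and total_epochs must be positive")
    if total_training_steps is not None:
        if total_training_steps <= 0:
            raise ValueError("total_training_steps must be positive")
        return total_training_steps
    return steps_per_epoch * total_epochs


# Illustrative numbers only: 7 steps per epoch, 15 epochs.
epoch_budget = resolve_total_steps(steps_per_epoch=7, total_epochs=15)
ci_budget = resolve_total_steps(steps_per_epoch=7, total_epochs=15,
                                total_training_steps=1)
print(epoch_budget, ci_budget)
```

With a budget of one step, a CI job can exercise the full rollout and update path without paying for a whole epoch.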
@@ -0,0 +1,40 @@
set -x

python3 -m verl.trainer.main_ppo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=512 \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=False \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=32 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=128 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=128 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    critic.optim.lr=1e-5 \
    critic.model.use_remove_padding=False \
    critic.model.path=Qwen/Qwen2.5-0.5B \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size=32 \
    critic.model.fsdp_config.param_offload=False \
    critic.model.fsdp_config.grad_offload=False \
    critic.model.fsdp_config.optimizer_offload=False \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console'] \
    trainer.project_name='verl_example_gsm8k' \
    trainer.experiment_name='qwen_e2e_ci_function_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.total_training_steps=1 "$@"
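This script and the preceding one differ only in the two `use_remove_padding` overrides. A small way to confirm that kind of claim for any pair of launch scripts is to parse the `key=value` overrides and compare them. The sketch below is illustrative only: `parse_overrides` and `diff_overrides` are hypothetical helpers, values are kept as plain strings, and only a few of the overrides are reproduced.

```python
def parse_overrides(args):
    """Turn ['a.b=1', 'c=2'] into {'a.b': '1', 'c': '2'}; values stay strings."""
    overrides = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"not a key=value override: {arg!r}")
        overrides[key] = value
    return overrides


def diff_overrides(left, right):
    """Return {key: (left_value, right_value)} for keys whose values differ."""
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k), right.get(k))
            for k in keys if left.get(k) != right.get(k)}


# A subset of the overrides from the two function-rm scripts.
with_rmpad = parse_overrides([
    "actor_rollout_ref.model.use_remove_padding=True",
    "critic.model.use_remove_padding=True",
    "trainer.total_training_steps=1",
])
without_rmpad = parse_overrides([
    "actor_rollout_ref.model.use_remove_padding=False",
    "critic.model.use_remove_padding=False",
    "trainer.total_training_steps=1",
])

for key, (a, b) in diff_overrides(with_rmpad, without_rmpad).items():
    print(f"{key}: {a} -> {b}")
```

Keys present in only one script show up with `None` on the missing side, which also surfaces additions such as the `reward_model.*` overrides in the model-rm scripts.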
@@ -0,0 +1,48 @@
set -x

python3 -m verl.trainer.main_ppo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=512 \
    data.return_raw_chat=True \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.optim.lr_warmup_steps_ratio=0.1 \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=32 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=128 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=128 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    critic.optim.lr=1e-5 \
    critic.model.use_remove_padding=True \
    critic.optim.lr_warmup_steps_ratio=0.05 \
    critic.model.path=Qwen/Qwen2.5-0.5B \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size=32 \
    critic.model.fsdp_config.param_offload=False \
    critic.model.fsdp_config.grad_offload=False \
    critic.model.fsdp_config.optimizer_offload=False \
    reward_model.enable=True \
    reward_model.model.path=Qwen/Qwen2.5-0.5B \
    reward_model.model.use_remove_padding=True \
    reward_model.model.fsdp_config.param_offload=True \
    reward_model.micro_batch_size=16 \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console'] \
    trainer.project_name='verl_example' \
    trainer.experiment_name='Qwen2.5-0.5B-ci_hybrid_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.total_training_steps=1 "$@"
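The batch-size overrides in this script nest inside one another: a rollout batch of 1024 prompts, PPO mini-batches of 256, and micro-batches of 32. The arithmetic below is a sketch under stated assumptions, not a description of the trainer's internals: it assumes the sizes are global and split evenly across the 8 GPUs from `trainer.n_gpus_per_node`, which the diff itself does not state. `batch_plan` is a hypothetical helper.

```python
def batch_plan(train_batch_size: int, mini_batch_size: int,
               micro_batch_size: int, world_size: int = 1) -> dict:
    """Derive how a rollout batch breaks down into mini- and micro-batches.

    Assumption (not confirmed by the diff): all sizes are global and are
    divided evenly across `world_size` data-parallel workers.
    """
    sizes = (train_batch_size, mini_batch_size, micro_batch_size, world_size)
    if any(v <= 0 for v in sizes):
        raise ValueError("all sizes must be positive")
    if train_batch_size % mini_batch_size:
        raise ValueError("train_batch_size must be a multiple of mini_batch_size")
    if mini_batch_size % micro_batch_size:
        raise ValueError("mini_batch_size must be a multiple of micro_batch_size")
    if micro_batch_size % world_size:
        raise ValueError("micro_batch_size must be a multiple of world_size")
    return {
        "mini_batches_per_rollout_batch": train_batch_size // mini_batch_size,
        "micro_batches_per_mini_batch": mini_batch_size // micro_batch_size,
        "mini_batch_per_gpu": mini_batch_size // world_size,
        "micro_batch_per_gpu": micro_batch_size // world_size,
    }


# Values taken from the overrides above; world_size=8 is the assumed GPU count.
plan = batch_plan(train_batch_size=1024, mini_batch_size=256,
                  micro_batch_size=32, world_size=8)
print(plan)
```

The two ratios (4 mini-batches per rollout batch, 8 micro-batches per mini-batch) hold regardless of how the sizes are divided across workers; only the per-GPU figures depend on the even-split assumption.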
@@ -0,0 +1,48 @@
set -x

python3 -m verl.trainer.main_ppo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=512 \
    data.return_raw_chat=True \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=False \
    actor_rollout_ref.actor.optim.lr_warmup_steps_ratio=0.1 \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=32 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=128 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=128 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    critic.optim.lr=1e-5 \
    critic.model.use_remove_padding=False \
    critic.optim.lr_warmup_steps_ratio=0.05 \
    critic.model.path=Qwen/Qwen2.5-0.5B \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size=32 \
    critic.model.fsdp_config.param_offload=False \
    critic.model.fsdp_config.grad_offload=False \
    critic.model.fsdp_config.optimizer_offload=False \
    reward_model.enable=True \
    reward_model.model.path=Qwen/Qwen2.5-0.5B \
    reward_model.model.use_remove_padding=False \
    reward_model.model.fsdp_config.param_offload=True \
    reward_model.micro_batch_size=16 \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console'] \
    trainer.project_name='verl_example' \
    trainer.experiment_name='Qwen2.5-0.5B-ci_hybrid_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.total_training_steps=1 "$@"