
OLMO + RL #424

Draft · wants to merge 48 commits into base: olmo_again
Conversation

@vwxyzjn (Collaborator) commented on Nov 8, 2024

I put the code here. To reproduce my work, `pip install ai2_olmo` and run:

for beta in 0.05
do
for lr in 3e-7
do
python mason.py \
    --cluster ai2/augusta-google-1 --image nathanl/open_instruct_auto --pure_docker_mode \
    --workspace ai2/tulu-3-dev \
    --priority high \
    --preemptible \
    --num_nodes 1 \
    --image costah/open_instruct_ppo_ray_olmo \
    --budget ai2/allennlp \
    --gpus 8 --  pip install --upgrade transformers \&\& python open_instruct/ppo_vllm_thread_ray_gtrl_olmo.py \
    --exp_name "ppo_olmo_rm_init_one_epoch_beta_${beta}_lr_${lr}" \
    --beta $beta \
    --learning_rate $lr \
    --dataset_mixer "{\"ai2-adapt-dev/gsm8k_ground_truth\": 1.0}" \
    --dataset_train_splits train \
    --dataset_eval_mixer "{\"ai2-adapt-dev/gsm8k_math_ground_truth\": 1.0}" \
    --dataset_eval_splits test \
    --max_token_length 2048 \
    --max_prompt_token_length 2048 \
    --response_length 1024 \
    --model_name_or_path allenai/open_instruct_dev \
    --model_revision olmo_7b_soup_anneal_v3.9_4_DPO___model__42__1730863426 \
    --reward_model_path allenai/open_instruct_dev \
    --reward_model_revision reward_modeling__1__1730930663 \
    --non_stop_penalty \
    --stop_token eos \
    --temperature 1.0 \
    --ground_truths_key ground_truth \
    --chat_template tulu \
    --sft_messages_key messages \
    --total_episodes 200000 \
    --penalty_reward_value -10.0 \
    --deepspeed_stage 3 \
    --per_device_train_batch_size 4 \
    --local_rollout_forward_batch_size 8 \
    --local_mini_batch_size 32 \
    --local_rollout_batch_size 32 \
    --actor_num_gpus_per_node 7 \
    --vllm_tensor_parallel_size 1 \
    --num_epochs 1 \
    --apply_verifiable_reward true \
    --output_dir /output \
    --seed 3 \
    --num_evals 3 \
    --reward_model_multiplier 0.0 \
    --no_try_launch_beaker_eval_jobs \
    --gradient_checkpointing \
    --with_tracking
done
done
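A note on the reward flags above: with --reward_model_multiplier 0.0, the learned reward model contributes nothing, so the rollout score comes from the verifiable (ground-truth) reward enabled by --apply_verifiable_reward, compared against the dataset column named by --ground_truths_key. The snippet below is only a minimal sketch of that idea for a GSM8K-style dataset, not the open_instruct implementation: the function names, the answer-extraction regex, and the reward values are illustrative assumptions, and it ignores the -10.0 --penalty_reward_value, which appears to be the score given to responses that never emit the stop token (--non_stop_penalty).

```python
import re
from typing import Optional


def extract_final_number(text: str) -> Optional[str]:
    """Return the last number in a response, GSM8K-style (illustrative helper)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None


def verifiable_reward(response: str, ground_truth: str,
                      correct_reward: float = 10.0,
                      incorrect_reward: float = 0.0) -> float:
    """Score one rollout against the dataset's ground_truth column.

    Hypothetical reward values; the real trainer also adds the RM score
    scaled by --reward_model_multiplier (0.0 in the run above).
    """
    predicted = extract_final_number(response)
    if predicted is not None and predicted == extract_final_number(ground_truth):
        return correct_reward
    return incorrect_reward


# Example on GSM8K-style completions:
print(verifiable_reward("The trains meet after 42 minutes. The answer is 42.", "42"))  # 10.0
print(verifiable_reward("The answer is 41.", "42"))                                    # 0.0
```

If you wanted a different verification rule (exact string match, math-expression equivalence, unit tests), only this scoring function would change; the launch command itself stays the same.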

hamishivi and others added 30 commits October 1, 2024 09:31
* Prototype ppo + ray

* reduce gradient

* push

* push changes

* quick push

* cache changes; this actually works with 6 nodes

* push changes

* push changes

* push the latest change

* push changes

* Fix uploading

* Make style

* style and quality

* update docs

* update mason.py

* log wandb tables

* update docs

* make style quality

* make sure to save the right thing

* push changes

* push

* push

* push changes

* push

* push

* fix

* remove preemption code

* fix

* push changes

* push

* quick fix

* quick push
* add ability to use alternate image

* revert
@vwxyzjn mentioned this pull request on Nov 8, 2024
@vwxyzjn changed the base branch from main to olmo_again on November 8, 2024 17:35
pdasigi and others added 17 commits November 10, 2024 16:49
* unseen evals

* default value

* conditional

* flag for unseen evals
* safety evals and parquet files

* import Dataset
* update data dist plots

* nit

* smooth operation

* updates

* nits

* clean git

* cleaning for final SFT version
* oe eval priority

* up

---------

Co-authored-by: Nathan Lambert <[email protected]>
* fix and add script

* update

* fix copilot typo
* Reorganize the data preparation scripts for tulu v1 and v2.

* Minor improvement

* Remove open_platypus_commercial subset from Daring-Anteater

* Use hard-coded examples repo.

* Fix some bugs.

* Add OpenMathInstruct.

* Add a few more v3.5.x SFT mix ablations for the cleaner datasets.

* More experiments on mixes.

* help merge

* prep for merge

* reapply changes

* fix naming

---------

Co-authored-by: Nathan Lambert <[email protected]>
* Use vllm for all evaluations

* Do not use VLLM only for MMLU and TruthfulQA
* Quick change

* weight converter
* Support weka eval

* quick fix
* Quick change

* weight converter

* Add olmo1124 converter