
OLMO + RL #424

Draft · wants to merge 48 commits into base: olmo_again
Conversation

@vwxyzjn (Collaborator) commented on Nov 8, 2024

I put the code here. To reproduce my work, `pip install ai2_olmo` and run:

for beta in 0.05
do
for lr in 3e-7
do
python mason.py \
    --cluster ai2/augusta-google-1 --image nathanl/open_instruct_auto --pure_docker_mode \
    --workspace ai2/tulu-3-dev \
    --priority high \
    --preemptible \
    --num_nodes 1 \
    --image costah/open_instruct_ppo_ray_olmo \
    --budget ai2/allennlp \
    --gpus 8 --  pip install --upgrade transformers \&\& python open_instruct/ppo_vllm_thread_ray_gtrl_olmo.py \
    --exp_name "ppo_olmo_rm_init_one_epoch_beta_${beta}_lr_${lr}" \
    --beta $beta \
    --learning_rate $lr \
    --dataset_mixer "{\"ai2-adapt-dev/gsm8k_ground_truth\": 1.0}" \
    --dataset_train_splits train \
    --dataset_eval_mixer "{\"ai2-adapt-dev/gsm8k_math_ground_truth\": 1.0}" \
    --dataset_eval_splits test \
    --max_token_length 2048 \
    --max_prompt_token_length 2048 \
    --response_length 1024 \
    --model_name_or_path allenai/open_instruct_dev \
    --model_revision olmo_7b_soup_anneal_v3.9_4_DPO___model__42__1730863426 \
    --reward_model_path allenai/open_instruct_dev \
    --reward_model_revision reward_modeling__1__1730930663 \
    --non_stop_penalty \
    --stop_token eos \
    --temperature 1.0 \
    --ground_truths_key ground_truth \
    --chat_template tulu \
    --sft_messages_key messages \
    --total_episodes 200000 \
    --penalty_reward_value -10.0 \
    --deepspeed_stage 3 \
    --per_device_train_batch_size 4 \
    --local_rollout_forward_batch_size 8 \
    --local_mini_batch_size 32 \
    --local_rollout_batch_size 32 \
    --actor_num_gpus_per_node 7 \
    --vllm_tensor_parallel_size 1 \
    --num_epochs 1 \
    --apply_verifiable_reward true \
    --output_dir /output \
    --seed 3 \
    --num_evals 3 \
    --reward_model_multiplier 0.0 \
    --no_try_launch_beaker_eval_jobs \
    --gradient_checkpointing \
    --with_tracking
done
done
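A note on the reward flags above: with --reward_model_multiplier 0.0, the learned reward model contributes nothing, so the rollout score comes from the verifiable (ground-truth) reward enabled by --apply_verifiable_reward, compared against the dataset column named by --ground_truths_key. The snippet below is only a minimal sketch of that idea for a GSM8K-style dataset, not the open_instruct implementation: the function names, the answer-extraction regex, and the reward values are illustrative assumptions, and it ignores the -10.0 --penalty_reward_value, which appears to be the score given to responses that never emit the stop token (--non_stop_penalty).

```python
import re
from typing import Optional


def extract_final_number(text: str) -> Optional[str]:
    """Return the last number in a response, GSM8K-style (illustrative helper)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None


def verifiable_reward(response: str, ground_truth: str,
                      correct_reward: float = 10.0,
                      incorrect_reward: float = 0.0) -> float:
    """Score one rollout against the dataset's ground_truth column.

    Hypothetical reward values; the real trainer also adds the RM score
    scaled by --reward_model_multiplier (0.0 in the run above).
    """
    predicted = extract_final_number(response)
    if predicted is not None and predicted == extract_final_number(ground_truth):
        return correct_reward
    return incorrect_reward


# Example on GSM8K-style completions:
print(verifiable_reward("The trains meet after 42 minutes. The answer is 42.", "42"))  # 10.0
print(verifiable_reward("The answer is 41.", "42"))                                    # 0.0
```

If you wanted a different verification rule (exact string match, math-expression equivalence, unit tests), only this scoring function would change; the launch command itself stays the same.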

hamishivi and others added 30 commits October 1, 2024 09:31
* Prototype ppo + ray

* reduce gradient

* push

* push changes

* quick push

* cache changes; this actually works with 6 nodes

* push changes

* push changes

* push the latest change

* push changes

* Fix uploading

* Make style

* style and quality

* update docs

* update mason.py

* log wandb tables

* update docs

* make style quality

* make sure to save the right thing

* push changes

* push

* push

* push changes

* push

* push

* fix

* remove preemption code

* fix

* push changes

* push

* quick fix

* quick push
* add ability to use alternate image

* revert
@vwxyzjn mentioned this pull request on Nov 8, 2024
@vwxyzjn changed the base branch from main to olmo_again on November 8, 2024 17:35
pdasigi and others added 17 commits November 10, 2024 16:49
* unseen evals

* default value

* conditional

* flag for unseen evals
* safety evals and parquet files

* import Dataset
* update data dist plots

* nit

* smooth operation

* updates

* nits

* clean git

* cleaning for final SFT version
* oe eval priority

* up

---------

Co-authored-by: Nathan Lambert <[email protected]>
* fix and add script

* update

* fix copilot typo
* Reorganize the data preparation scripts for tulu v1 and v2.

* Minor improvement

* Remove open_platypus_commercial subset from Daring-Anteater

* Use hard-coded examples repo.

* Fix some bugs.

* Add OpenMathInstruct.

* Add a few more v3.5.x SFT mix ablations for the cleaner datasets.

* More experiments on mixes.

* help merge

* prep for merge

* reapply changes

* fix naming

---------

Co-authored-by: Nathan Lambert <[email protected]>
* Use vllm for all evaluations

* Do not use VLLM only for MMLU and TruthfulQA
* Quick change

* weight converter
* Support weka eval

* quick fix
* Quick change

* weight converter

* Add olmo1124 converter