OLMo 2 Retrofit #895
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -159,3 +159,4 @@ dmypy.json | |
cache/ | ||
local_dataset_cache/ | ||
scratch/ | ||
vllm_olmo2.5/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -127,10 +127,11 @@ fi | |
# Set wandb run path to upload to wandb if available | ||
WANDB_ARG="" | ||
if [[ -n "$WANDB_RUN_PATH" ]]; then | ||
beaker_user=$(beaker account whoami --format json | jq -r '.[0].name') | ||
beaker_user=$(beaker account whoami --format text | awk 'NR==2 {print $2}') | ||
Bug: Username Extraction Reliability Issue. The … |
||
echo "Using WANDB_API_KEY from ${beaker_user}" | ||
if ! beaker secret list --workspace ai2/tulu-3-results | grep -q "${beaker_user}_WANDB_API_KEY"; then | ||
echo "WARNING: No ${beaker_user}_WANDB_API_KEY secret found in workspace ai2/tulu-3-results." | ||
echo "add your WANDB_API_KEY as a secret to this workspace in order to use --oe_eval_log_to_wandb" | ||
echo "add your WANDB_API_KEY as a secret to this workspace in order to log oe-eval results to wandb" | ||
else | ||
WANDB_ARG=" --wandb-run-path $WANDB_RUN_PATH --gantry-secret-wandb-api-key ${beaker_user}_WANDB_API_KEY" | ||
fi | ||
|
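The review comment above concerns how the username is pulled out of the `beaker account whoami` output. The sketch below shows what the `awk 'NR==2 {print $2}'` call does to a two-line text table and adds a guard for an empty result. The sample text is a hypothetical stand-in (the real `--format text` layout is not shown in this diff), and `extract_user_from_text` and `require_user` are illustrative names, not part of the repository.

```shell
#!/bin/bash
# Sketch only: the sample text below is a hypothetical stand-in, not real
# `beaker account whoami --format text` output.

# Print the second whitespace-separated field of the second input line,
# which is what the awk call in the diff does.
extract_user_from_text() {
    awk 'NR==2 {print $2}'
}

# Fail loudly when extraction yields nothing, instead of silently building
# a secret name such as "_WANDB_API_KEY" from an empty username.
require_user() {
    local user="$1"
    if [[ -z "$user" ]]; then
        echo "ERROR: could not determine beaker user" >&2
        return 1
    fi
    printf '%s\n' "$user"
}

# Hypothetical two-line table: a header row, then one data row.
sample_text=$'ID NAME EMAIL\n01ABC jdoe jdoe@example.com'
user=$(printf '%s\n' "$sample_text" | extract_user_from_text)
require_user "$user"
```

Because the extraction depends on both the line number and the column position, any change to the text layout would return a different field or nothing at all, which is why a guard like `require_user` may be worth keeping next to either parsing approach.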
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
#!/bin/bash | ||
|
||
# OLMo 2.5 model | ||
MODEL_NAME_OR_PATH="/weka/oe-eval-default/ai2-llm/checkpoints/lucas/olmo25_7b_lc_64k_6T_M100B_round5-sparkle_6634-pre_s2pdf_gzip2080_cweN-yake-all-olmo_yarn-fullonly_50B-740666e3/step11921-hf" | ||
GS_MODEL_NAME="olmo25_7b_lc_beta_740666e3" | ||
|
||
# english only DAPO | ||
DATASETS="mnoukhov/DAPO-Math-14k-Processed-RLVR 1.0" | ||
# DATASETS="TTTXXX01/MATH_3000_Filtered 1.0" | ||
|
||
# math evals | ||
EVALS="minerva_math::hamish_zs_reasoning_deepseek,minerva_math_500::hamish_zs_reasoning_deepseek,aime:zs_cot_r1::pass_at_32_2024_deepseek,aime:zs_cot_r1::pass_at_32_2025_deepseek" | ||
|
||
# AIME 2024, 2025 local evals | ||
LOCAL_EVALS="mnoukhov/aime2024-25-rlvr 1.0 mnoukhov/aime2024-25-rlvr 1.0" | ||
LOCAL_EVAL_SPLITS="test_2024 test_2024 test_2025 test_2025" | ||
Bug: Duplicate Dataset Configurations Cause Redundant Evaluations. The … |
||
# tengmath3k | ||
EXP_NAME="grpo_tengmath3k_k16_${GS_MODEL_NAME}" | ||
# EXP_NAME="grpo_dapo14k_${GS_MODEL_NAME}" | ||
|
||
cluster=ai2/augusta | ||
|
||
python mason.py \ | ||
--task_name ${EXP_NAME} \ | ||
--cluster ${cluster} \ | ||
--workspace ai2/tulu-thinker \ | ||
--priority high \ | ||
--pure_docker_mode \ | ||
--image ${1:-michaeln/open_instruct_olmo2_retrofit} \ | ||
--preemptible \ | ||
--num_nodes 4 \ | ||
--env VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 \ | ||
--env VLLM_ATTENTION_BACKEND="FLASH_ATTN" \ | ||
--gs_model_name $GS_MODEL_NAME \ | ||
--gpus 8 \ | ||
--budget ai2/oe-adapt \ | ||
-- \ | ||
source configs/beaker_configs/ray_node_setup.sh \&\& \ | ||
source configs/beaker_configs/code_api_setup.sh \&\& \ | ||
python open_instruct/grpo_fast.py \ | ||
--exp_name ${EXP_NAME} \ | ||
--beta 0.0 \ | ||
--num_samples_per_prompt_rollout 16 \ | ||
--num_unique_prompts_rollout 24 \ | ||
--num_mini_batches 1 \ | ||
--learning_rate 1e-6 \ | ||
--per_device_train_batch_size 1 \ | ||
--kl_estimator kl3 \ | ||
--dataset_mixer_list $DATASETS \ | ||
--dataset_mixer_list_splits train \ | ||
--dataset_mixer_eval_list $LOCAL_EVALS \ | ||
--dataset_mixer_eval_list_splits $LOCAL_EVAL_SPLITS \ | ||
--max_token_length 2048 \ | ||
--max_prompt_token_length 2048 \ | ||
--response_length 8192 \ | ||
--pack_length 32768 \ | ||
--model_name_or_path ${MODEL_NAME_OR_PATH} \ | ||
--chat_template_name olmo_thinker_r1_style_nochat \ | ||
--stop_strings "</answer>" \ | ||
--non_stop_penalty False \ | ||
--temperature 1.0 \ | ||
--total_episodes 38400 \ | ||
--deepspeed_stage 3 \ | ||
--num_learners_per_node 8 \ | ||
--vllm_num_engines 24 \ | ||
--vllm_tensor_parallel_size 1 \ | ||
--lr_scheduler_type constant \ | ||
--apply_verifiable_reward true \ | ||
--seed 1 \ | ||
--local_eval_every 50 \ | ||
--save_freq 50 \ | ||
--checkpoint_state_freq 50 \ | ||
--gradient_checkpointing \ | ||
--with_tracking \ | ||
--vllm_enable_prefix_caching \ | ||
--clip_higher 0.272 \ | ||
--mask_truncated_completions True \ | ||
--oe_eval_max_length 32000 \ | ||
--eval_priority high \ | ||
--try_launch_beaker_eval_jobs_on_weka True \ | ||
--oe_eval_tasks $EVALS \ | ||
--oe_eval_beaker_image oe-eval-beaker/oe_eval_olmo2_retrofit_auto |
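The `LOCAL_EVALS` and `LOCAL_EVAL_SPLITS` values in this script list two dataset entries but four splits, which is what the duplicate-configuration comment points at. A minimal pre-launch check could look like the sketch below. It assumes each dataset entry is a `name weight` pair and that one split is expected per entry. That contract is an assumption here, not something confirmed by this diff, and `count_words` and `check_eval_lists` are hypothetical helpers.

```shell
#!/bin/bash
# Sketch only: assumes each dataset entry is a "name weight" pair and that
# one split is expected per entry; the trainer's real contract may differ.

LOCAL_EVALS="mnoukhov/aime2024-25-rlvr 1.0 mnoukhov/aime2024-25-rlvr 1.0"
LOCAL_EVAL_SPLITS="test_2024 test_2024 test_2025 test_2025"

# Count whitespace-separated words in a string.
count_words() {
    local -a words
    read -r -a words <<< "$1"
    echo "${#words[@]}"
}

# Succeed only when the number of "name weight" pairs equals the number of splits.
check_eval_lists() {
    local n_words n_splits
    n_words=$(count_words "$1")
    n_splits=$(count_words "$2")
    if (( n_words % 2 != 0 )); then
        echo "dataset list has an odd number of tokens: $n_words" >&2
        return 1
    fi
    if (( n_words / 2 != n_splits )); then
        echo "mismatch: $(( n_words / 2 )) datasets vs $n_splits splits" >&2
        return 1
    fi
}

if check_eval_lists "$LOCAL_EVALS" "$LOCAL_EVAL_SPLITS"; then
    echo "eval lists are consistent"
else
    echo "eval lists need attention"
fi
```

With the values copied from the script, the check reports a mismatch (2 datasets against 4 splits), so it would flag the configuration before a job is launched.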
Bug: Docker Build Fails on Missing Directory
The `COPY` command for `oe-eval-internal` changed from an optional wildcard pattern to a direct copy. This removes the intended fault tolerance, causing Docker builds to fail if the `oe-eval-internal` directory is missing, instead of silently skipping.
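
If the direct `COPY` is kept, one possible workaround is to make sure the directory exists in the build context before the build runs. The sketch below is only an illustration of that idea: `ensure_optional_dir` is a hypothetical helper, the build context is assumed to be the directory passed as its first argument, and it assumes no `.dockerignore` rule excludes the directory.

```shell
#!/bin/bash
# Sketch only: `ensure_optional_dir` is a hypothetical helper, and the build
# context is assumed to be the directory passed as the first argument.

# Create an empty placeholder when the optional directory is absent, so a
# direct COPY of that directory in the Dockerfile still has a source.
ensure_optional_dir() {
    local context="$1" name="$2"
    if [[ ! -d "$context/$name" ]]; then
        echo "NOTE: $name not found in $context; creating an empty placeholder" >&2
        mkdir -p "$context/$name"
    fi
}

# Demonstrate on a throwaway directory standing in for the build context.
context=$(mktemp -d)
ensure_optional_dir "$context" "oe-eval-internal"
ls -d "$context/oe-eval-internal"
# The actual image build would run after this point.
```

This keeps the Dockerfile simple at the cost of moving the fault tolerance into the build script; restoring the wildcard pattern, as the comment suggests, keeps it inside the Dockerfile instead.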