2.5 rl0 #1004 (Draft)

Commits (74)
77f91c3
all changes from olmo3 but for olmo2.5
mnoukhov Aug 14, 2025
13c057b
example script
mnoukhov Aug 14, 2025
ee61222
fix path and uv lock
mnoukhov Aug 14, 2025
1423264
olmo2 retrofit naming
mnoukhov Aug 15, 2025
ed2ec83
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 16, 2025
a663287
updated script
mnoukhov Aug 19, 2025
abe3902
makefile delete old image
mnoukhov Aug 19, 2025
7d74b69
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 19, 2025
e5002cc
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 19, 2025
8dcf71c
resumable
mnoukhov Aug 20, 2025
f9b82f2
logging oe eval to wandb when using new oe-eval-internal
mnoukhov Aug 20, 2025
2ea2e37
fix for 4 nodes maybe
mnoukhov Aug 20, 2025
f2d6e97
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 20, 2025
93b88e2
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 21, 2025
667963b
revert change, 3 - 1 node still not working
mnoukhov Aug 21, 2025
e2925ec
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 21, 2025
98af8e3
wandb run step arg
mnoukhov Aug 21, 2025
18e3f7c
custom vllm in pyproject no need to clone
mnoukhov Aug 21, 2025
5820756
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 22, 2025
512651d
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 25, 2025
c30b5b8
Merge branch 'main' into log-oe-eval-wandb
mnoukhov Aug 25, 2025
ebcac11
vllm is extra dependency
mnoukhov Aug 26, 2025
6cbafa3
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 26, 2025
5d0e81a
Merge branch 'main' of github.com:allenai/open-instruct into log-oe-e…
mnoukhov Aug 26, 2025
1e5e1f9
make vllm a dependency either way but do local vllm as extra
mnoukhov Aug 27, 2025
24c8dc8
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 27, 2025
77048d2
back to basics, make setup to git clone
mnoukhov Aug 27, 2025
cb87b45
editable
mnoukhov Aug 27, 2025
d0b6bfc
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 28, 2025
6cac122
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Aug 28, 2025
db7e3d4
merge in main (#962)
jacob-morrison Aug 28, 2025
3ed8657
debug script
mnoukhov Aug 29, 2025
0c432cd
smaller run on one node
mnoukhov Sep 3, 2025
782337c
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Sep 3, 2025
1c609b8
attention type fix
mnoukhov Sep 3, 2025
20354e3
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Sep 4, 2025
ef9e855
synchronous weight sync
mnoukhov Sep 5, 2025
bd28584
start generate thread trigger event
mnoukhov Sep 5, 2025
e43aa8e
single weight sync and generate thread
mnoukhov Sep 5, 2025
5aef3bd
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Sep 6, 2025
1bea281
sync weight sync
mnoukhov Sep 6, 2025
ce3fec0
cleanup
mnoukhov Sep 8, 2025
ee243ef
fix env var check
mnoukhov Sep 8, 2025
755ac15
temporary logging
mnoukhov Sep 8, 2025
782ac53
disable log stats
mnoukhov Sep 8, 2025
ad89b37
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Sep 8, 2025
54ed043
fix lock file and revert extra logging
mnoukhov Sep 8, 2025
d9dd800
un-revert weight sync
mnoukhov Sep 8, 2025
3a833c0
olmo dapo
mnoukhov Sep 8, 2025
c4497c5
olmo simple thinker
mnoukhov Sep 9, 2025
b7bd670
Merge branch 'main' into log-oe-eval-wandb
mnoukhov Sep 9, 2025
013c6b7
undo formatting
mnoukhov Sep 9, 2025
1b69161
Merge branch 'log-oe-eval-wandb' of github.com:allenai/open-instruct …
mnoukhov Sep 11, 2025
54c9a39
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Sep 11, 2025
551f58c
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Sep 14, 2025
4b3dd54
good r1-zero script and olmo simple thinker template
mnoukhov Sep 14, 2025
3f21704
2 epochs
mnoukhov Sep 14, 2025
5455f64
deepseek evals
mnoukhov Sep 15, 2025
f188425
shorter run
mnoukhov Sep 19, 2025
7177b0a
Merge branch 'main' of github.com:allenai/open-instruct into olmo2-re…
mnoukhov Sep 19, 2025
e00ef62
filtering vllm top p
mnoukhov Sep 19, 2025
cfc9b8d
fix copy since we need the folder
mnoukhov Sep 21, 2025
68fe5ef
Merge branch 'olmo2-retrofit' of github.com:allenai/open-instruct int…
mnoukhov Sep 21, 2025
bec6c40
generate script
mnoukhov Sep 23, 2025
c4ec086
test run of RL 0
mnoukhov Sep 24, 2025
e85463a
Merge branch 'main' of github.com:allenai/open-instruct into 2.5-rl0
mnoukhov Sep 25, 2025
6c152b5
fix oe eval and eval on 0
mnoukhov Sep 25, 2025
3a844c1
whoami without jq
mnoukhov Sep 25, 2025
f7c572f
correct whoami
mnoukhov Sep 25, 2025
5f6c75d
simpler template
mnoukhov Sep 27, 2025
9aaae46
gpu multiplier
mnoukhov Sep 27, 2025
0c2a1c6
new hyperparams
mnoukhov Sep 27, 2025
a9527ed
actually nochat template
mnoukhov Sep 28, 2025
4a27a61
nearly there
mnoukhov Sep 28, 2025
1 change: 1 addition & 0 deletions .gitignore
@@ -159,3 +159,4 @@ dmypy.json
cache/
local_dataset_cache/
scratch/
vllm_olmo2.5/
4 changes: 3 additions & 1 deletion Dockerfile
@@ -65,6 +65,8 @@ ENV UV_CACHE_DIR=/root/.cache/uv
ENV HF_HUB_ENABLE_HF_TRANSFER=1
ENV UV_COMPILE_BYTECODE=0

RUN git clone -b shanea/olmo2-retrofit https://github.com/2015aroras/vllm.git vllm_olmo2.5

# Install dependencies
RUN --mount=type=cache,target=${UV_CACHE_DIR} \
--mount=type=bind,source=uv.lock,target=uv.lock \
@@ -78,7 +80,7 @@ COPY configs configs
COPY scripts scripts
COPY mason.py mason.py
# Copy oe-eval-internal if it exists (wildcard pattern won't fail if missing)
COPY oe-eval-interna[l] oe-eval-internal/
COPY oe-eval-internal oe-eval-internal
COPY open_instruct open_instruct

# Add build arguments for git information
12 changes: 11 additions & 1 deletion Makefile
@@ -1,4 +1,4 @@
.PHONY: style quality
.PHONY: style quality docker

# make sure to test the local checkout in scripts and not the pre-installed one (don't use quotes!)
export PYTHONPATH = open_instruct
@@ -16,3 +16,13 @@ style-check: ## *fail* if anything needs rewriting

quality-check: ## *fail* if any rewrite was needed
uv run ruff check --exit-non-zero-on-fix $(check_dirs)

setup:
git clone -b shanea/olmo2-retrofit https://github.com/2015aroras/vllm.git vllm_olmo2.5

docker:
DOCKER_BUILDKIT=1 docker build -f Dockerfile --build-arg UV_CACHE_DIR=$(UV_CACHE_DIR) -t open_instruct_olmo2_retrofit .
# if you are internal to AI2, you can create an image like this:
$(eval beaker_user := $(shell beaker account whoami --format json | jq -r '.[0].name'))
beaker image delete $(beaker_user)/open_instruct_olmo2_retrofit
beaker image create open_instruct_olmo2_retrofit -n open_instruct_olmo2_retrofit -w ai2/$(beaker_user)
33 changes: 33 additions & 0 deletions generate_olmo25.sh
@@ -0,0 +1,33 @@
#!/bin/bash

MODEL_NAME_OR_PATH="/weka/oe-training-default/ai2-llm/checkpoints/tylerr/long-context/olmo25_7b_lc_64k_6T_M100B_round5-sparkle_6634-pre_s2pdf_gzip2080_cweN-yake-all-olmo_packing_yarn-fullonly_50B-fb13a737/step11921-hf"
# DATASET="mnoukhov/DAPO-Math-14k-Processed-RLVR"
DATASET="TTTXXX01/MATH_3000_Filtered"
EXP_NAME="generate_olmo25_teng3k"

python mason.py \
--task_name ${EXP_NAME} \
--cluster ai2/jupiter \
--image ${1:-michaeln/open_instruct_olmo2_retrofit} \
--workspace ai2/tulu-thinker \
--priority high \
--pure_docker_mode \
--preemptible \
--gpus 2 \
--num_nodes 1 \
--max_retries 0 \
--budget ai2/oe-adapt \
--env VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 \
--env VLLM_ATTENTION_BACKEND="FLASH_ATTN" \
-- \
python scripts/data/rlvr/filtering_vllm.py \
--model $MODEL_NAME_OR_PATH \
--dataset $DATASET \
--split train \
--temperature 0.7 \
--top_p 0.95 \
--offset 0 \
--size 100000 \
--chat_template olmo_thinker_r1_style_nochat \
--output-file filtered_datasets/olmo25_7b_lc_dapo.jsonl \
--number_samples 16
27 changes: 27 additions & 0 deletions open_instruct/dataset_transformation.py
@@ -442,6 +442,33 @@ def visualize_token_role(tokens: list[int], masks: list[int], tokenizer: PreTrai
"{% endif %}"
"{% endfor %}"
),
"olmo_thinker_r1_style_nochat": (
"Solve the following math problem step by step. "
"Reason about the question in <think> </think> tags "
"then provide the final answer in <answer> </answer> tags "
"so the full response is <think> reasoning process here </think> "
"<answer> answer here </answer>."
"\n\n"
"{% for message in messages %}"
"{{ '\n\n' if not loop.first else '' }}"
"{{ message['content'] + '\n' }}"
"{% if loop.last and add_generation_prompt %}"
"{{ 'Solving step by step\n<think>' }}"
"{% endif %}"
"{% endfor %}"
),
"olmo_thinker_dapo": (
"Solve the following math problem step by step. "
"The last line of your response should be the answer to the problem in form Answer: $Answer (without quotes) where $Answer is the answer to the problem."
"\n\n"
"{% for message in messages %}"
"{{ '\n\n' if not loop.first else '' }}"
"{{ message['content'] + '\n' }}"
"{% if loop.last and add_generation_prompt %}"
"{{ '\nRemember to put your answer on its own line after \"Answer:\"' }}"
"{% endif %}"
"{% endfor %}"
),
# template is taken from https://arxiv.org/abs/2501.12948.
"r1_simple_chat": (
"A conversation between User and Assistant. "
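For reference, a minimal sketch of how the new `olmo_thinker_r1_style_nochat` template renders. It uses plain `jinja2` rather than the tokenizer chat-template machinery open-instruct actually routes through, and the example message is illustrative:

```python
# Hedged sketch: render the "nochat" template with jinja2 directly.
# The template string mirrors the one added above; the message is made up.
from jinja2 import Template

NOCHAT_TEMPLATE = (
    "Solve the following math problem step by step. "
    "Reason about the question in <think> </think> tags "
    "then provide the final answer in <answer> </answer> tags "
    "so the full response is <think> reasoning process here </think> "
    "<answer> answer here </answer>.\n\n"
    "{% for message in messages %}"
    "{{ '\n\n' if not loop.first else '' }}"
    "{{ message['content'] + '\n' }}"
    "{% if loop.last and add_generation_prompt %}"
    "{{ 'Solving step by step\n<think>' }}"
    "{% endif %}"
    "{% endfor %}"
)

prompt = Template(NOCHAT_TEMPLATE).render(
    messages=[{"role": "user", "content": "What is 7 * 8?"}],
    add_generation_prompt=True,
)
print(prompt)
# <preamble>
#
# What is 7 * 8?
# Solving step by step
# <think>
```

The `olmo_thinker_dapo` template renders the same way, swapping in its "Answer: $Answer" preamble and reminder line.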
5 changes: 4 additions & 1 deletion open_instruct/grpo_fast.py
@@ -405,6 +405,8 @@ class Args:
"""the max generation length for evaluation for oe-eval"""
oe_eval_beaker_image: Optional[str] = None
"""the docker image for evaluation for oe-eval"""
oe_eval_gpu_multiplier: Optional[int] = 1
"""gpu mulitplier for eval jobs"""
eval_priority: Literal["low", "normal", "high", "urgent"] = "normal"
"""the priority of auto-launched evaluation jobs"""

@@ -1224,6 +1226,7 @@ def launch_ai2_evals_on_weka_wrapper(self, step_dir, leaderboard_name, wandb_url
args.gs_bucket_path,
args.eval_priority,
args.oe_eval_beaker_image,
args.oe_eval_gpu_multiplier,
)


@@ -2366,7 +2369,7 @@ def one_training_step(
)

save_time = 0
if args.save_freq > 0 and training_step % args.save_freq == 0 and (args.eval_on_step_0 or training_step > 1):
if args.save_freq > 0 and (training_step % args.save_freq == 0 or (training_step == 1 and args.eval_on_step_0)):
with Timer("[Main Thread] 🗡️ Saving model") as timer:
checkpoint_dir = f"{args.output_dir}_checkpoints"
step_dir = os.path.join(checkpoint_dir, f"step_{training_step}")
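The condition change here is easy to misread: previously, `--eval_on_step_0` only permitted a step-1 save when step 1 already fell on a `save_freq` boundary; now step 1 saves whenever the flag is set, and periodic saves are unchanged. A hedged re-statement as a standalone sketch (not the actual code path; the argument names mirror the `Args` fields):

```python
# Standalone restatement of the new checkpoint condition, for illustration.
def should_save(training_step: int, save_freq: int, eval_on_step_0: bool) -> bool:
    # Save every `save_freq` steps, plus on step 1 when step-0 eval is requested.
    if save_freq <= 0:
        return False
    return training_step % save_freq == 0 or (training_step == 1 and eval_on_step_0)

assert should_save(1, 25, eval_on_step_0=True)       # new: step-1 checkpoint for step-0 eval
assert not should_save(1, 25, eval_on_step_0=False)  # step 1 alone is still skipped
assert should_save(50, 25, eval_on_step_0=False)     # periodic saves unchanged
```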
2 changes: 2 additions & 0 deletions open_instruct/utils.py
@@ -1145,6 +1145,7 @@ def launch_ai2_evals_on_weka(
gs_bucket_path: Optional[str] = None,
eval_priority: Optional[str] = "normal",
beaker_image: Optional[str] = None,
oe_eval_gpu_multiplier: Optional[int] = 1,
) -> None:
weka_cluster = "ai2/saturn-cirrascale ai2/neptune-cirrascale"
gcp_cluster = "ai2/augusta-google-1"
@@ -1174,6 +1175,7 @@

command = f"""\
python scripts/submit_eval_jobs.py \
--gpu_multiplier {oe_eval_gpu_multiplier} \
--model_name {leaderboard_name} \
--location {path} \
--cluster {cluster} \
9 changes: 6 additions & 3 deletions pyproject.toml
@@ -19,17 +19,17 @@ dependencies = [
"nvitop>=1.4.2",
"packaging>=24.2",
"peft>=0.13.2",
"ray[default]>=2.44.1",
"ray[default]==2.46.0",
"setuptools>=75.6.0,<80.0.0",
"tensorboard>=2.18.0",
"torch>=2.7.0,<2.8",
"transformers>=4.52.4,<4.54.0", # see https://github.com/vllm-project/vllm-ascend/issues/2046
"vllm==0.9.1",
"transformers @ git+https://github.com/2015aroras/transformers.git@shanea/olmo2-retrofit",
"wandb==0.18.1",
"langdetect==1.0.9",
"immutabledict==1.2.0",
"flash-attn>=2.8.0.post2; platform_system != 'Darwin'",
"liger-kernel>=0.5.4; platform_system != 'Darwin'",
"vllm" # installed locally with git clone because otherwise errors
]

[build-system]
@@ -44,12 +44,14 @@ flash-attn = [{ requirement = "torch", match-runtime = true }]

[tool.uv.extra-build-variables]
flash-attn = { FLASH_ATTENTION_SKIP_CUDA_BUILD = "TRUE" }
vllm = { VLLM_USE_PRECOMPILED = "1" }

# pytorch related setups
[tool.uv.sources]
torch = [
{ index = "pytorch-cu128", marker = "platform_system != 'Darwin'"},
]
vllm = { path = "vllm_olmo2.5", editable = true }

[[tool.uv.index]]
name = "pytorch-cu128"
@@ -95,6 +97,7 @@ target-version = ['py310']

[tool.isort]
known_first_party = ["open_instruct"]
known-third-party = ["wandb"]
profile = "black"
src_paths = ["open_instruct"]

86 changes: 18 additions & 68 deletions scripts/data/rlvr/filtering_vllm.py
@@ -1,4 +1,4 @@
'''
"""
python mason.py \
--cluster ai2/jupiter-cirrascale-2 --image nathanl/open_instruct_auto \
--workspace ai2/tulu-thinker \
@@ -15,7 +15,8 @@
--size 100000 \
--output-file filtered_datasets/qwen2_5_openthoughts2/orz.jsonl \
--number_samples 8
'''
"""

import argparse
import json

@@ -27,65 +28,20 @@


def main():
parser = argparse.ArgumentParser(
description="Bulk-generate N samples per HF dataset record using vLLM."
)
parser.add_argument(
"--model",
required=True,
help="vLLM model ID (e.g. facebook/opt-125m)"
)
parser.add_argument(
"--dataset",
required=True,
help="HF dataset name (e.g. squad)"
)
parser = argparse.ArgumentParser(description="Bulk-generate N samples per HF dataset record using vLLM.")
parser.add_argument("--model", required=True, help="vLLM model ID (e.g. facebook/opt-125m)")
parser.add_argument("--dataset", required=True, help="HF dataset name (e.g. squad)")
parser.add_argument("--split", default="train", help="Which split to load")
parser.add_argument("--offset", type=int, required=True, help="Start index into the split")
parser.add_argument("--size", type=int, required=True, help="Number of records to process")
parser.add_argument("--output-file", default=None, help="Path for output JSONL")
parser.add_argument(
"--split",
default="train",
help="Which split to load"
)
parser.add_argument(
"--offset",
type=int,
required=True,
help="Start index into the split"
)
parser.add_argument(
"--size",
type=int,
required=True,
help="Number of records to process"
)
parser.add_argument(
"--output-file",
default=None,
help="Path for output JSONL"
)
parser.add_argument(
"--push_to_hub",
default=None,
type=str,
help="Give a dataset name to push this data to the hub."
)
parser.add_argument(
"--chat_template",
type=str,
default=None,
help="Chat template name"
)
parser.add_argument(
"--number_samples",
type=int,
default=8,
help="Number of samples to generate per record"
)
parser.add_argument(
"--temperature",
type=float,
default=1.0,
help="Sampling temperature"
"--push_to_hub", default=None, type=str, help="Give a dataset name to push this data to the hub."
)
parser.add_argument("--chat_template", type=str, default=None, help="Chat template name")
parser.add_argument("--number_samples", type=int, default=8, help="Number of samples to generate per record")
parser.add_argument("--temperature", type=float, default=1.0, help="Sampling temperature")
parser.add_argument("--top_p", type=float, default=1.0, help="Sampling temperature")
args = parser.parse_args()

# 1. Load and slice dataset
@@ -106,20 +62,14 @@ def main():
tokenizer.apply_chat_template(
sample["messages"][:-1] if len(sample["messages"]) > 1 else sample["messages"],
add_generation_prompt=True,
tokenize=False
tokenize=False,
)
for sample in subset
]
# 4. vLLM bulk generate
llm = LLM(
model=args.model,
dtype="bfloat16",
enable_prefix_caching=True
)
llm = LLM(model=args.model, dtype="bfloat16", enable_prefix_caching=True)
sampling_params = SamplingParams(
temperature=args.temperature,
n=args.number_samples,
max_tokens=32768,
temperature=args.temperature, top_p=args.top_p, n=args.number_samples, max_tokens=32768
)
outputs = llm.generate(prompts, sampling_params)

3 changes: 2 additions & 1 deletion scripts/eval/oe-eval.sh
@@ -127,7 +127,8 @@ fi
# Set wandb run path to upload to wandb if available
WANDB_ARG=""
if [[ -n "$WANDB_RUN_PATH" ]]; then
beaker_user=$(beaker account whoami --format json | jq -r '.[0].name')
beaker_user=$(beaker account whoami --format text | awk 'NR==2 {print $2}')
echo "Assuming beaker user $beaker_user"
if ! beaker secret list --workspace ai2/tulu-3-results | grep -q "${beaker_user}_WANDB_API_KEY"; then
echo "WARNING: No ${beaker_user}_WANDB_API_KEY secret found in workspace ai2/tulu-3-results."
echo "add your WANDB_API_KEY as a secret to this workspace in order to use --oe_eval_log_to_wandb"
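The jq-free lookup trades JSON parsing for an assumption about the text format: `beaker account whoami --format text` is expected to print a header row followed by a data row whose second column is the account name. A speculative Python equivalent of the same parsing, for illustration only (the CLI output shape is assumed, not verified):

```python
# Mirrors `beaker account whoami --format text | awk 'NR==2 {print $2}'`.
import subprocess

def beaker_user() -> str:
    lines = subprocess.run(
        ["beaker", "account", "whoami", "--format", "text"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return lines[1].split()[1]  # NR==2 {print $2}
```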
2 changes: 1 addition & 1 deletion scripts/train/build_image_and_launch.sh
@@ -21,7 +21,7 @@ git_branch=$(git rev-parse --abbrev-ref HEAD)
# Sanitize the branch name to remove invalid characters for Beaker names
# Beaker names can only contain letters, numbers, -_. and may not start with -
sanitized_branch=$(echo "$git_branch" | sed 's/[^a-zA-Z0-9._-]/-/g' | sed 's/^-//')
image_name=open-instruct-integration-test-${sanitized_branch}
image_name=open-instruct-integration-test-${sanitized_branch}-${git_hash}

# Build the Docker image exactly like push-image.yml does, passing git info as build args
docker build --platform=linux/amd64 \
32 changes: 27 additions & 5 deletions scripts/train/debug/grpo_fast.sh
@@ -1,3 +1,19 @@
#!/bin/bash

python mason.py \
--task_name grpo_debug_small \
--cluster ai2/augusta \
--workspace ai2/oe-adapt-code \
--priority high \
--pure_docker_mode \
--image michaeln/open_instruct_2.5-rl0 \
--preemptible \
--num_nodes 1 \
--env VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 \
--env VLLM_ATTENTION_BACKEND="FLASH_ATTN" \
--gpus 1 \
--budget ai2/oe-adapt \
-- \
uv run python open_instruct/grpo_fast.py \
--dataset_mixer_list ai2-adapt-dev/rlvr_gsm8k_zs 64 \
--dataset_mixer_list_splits train \
@@ -18,19 +34,25 @@ uv run python open_instruct/grpo_fast.py \
--ground_truths_key ground_truth \
--chat_template_name r1_simple_chat_postpend_think \
--learning_rate 3e-7 \
--total_episodes 200 \
--total_episodes 1600 \
--deepspeed_stage 2 \
--num_epochs 1 \
--num_learners_per_node 1 \
--vllm_tensor_parallel_size 1 \
--beta 0.01 \
--beta 0. \
--seed 3 \
--local_eval_every 1 \
--local_eval_every 25 \
--vllm_sync_backend gloo \
--vllm_gpu_memory_utilization 0.3 \
--save_traces \
--vllm_enforce_eager \
--gradient_checkpointing \
--single_gpu_mode \
--push_to_hub false \
# --with_tracking
--with_tracking \
--save_freq 25 \
--eval_on_step_0 \
--oe_eval_max_length 512 \
--try_launch_beaker_eval_jobs_on_weka True \
--oe_eval_tasks gsm8k \
--oe_eval_beaker_image michaeln/oe_eval_olmo2_retrofit \
--eval_priority high