2.5 rl0 #1004
base: main
Conversation
requires PR from oe-eval-internal allenai/oe-eval-internal#636
Summary of Changes
Hello @mnoukhov, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request focuses on refining the project's operational environment and training configurations. It cleans up outdated dependencies, integrates robust logging for evaluation runs with Weights & Biases, and updates key parameters for reinforcement learning training, specifically for models that employ a 'thinking' process. These changes aim to improve stability, traceability, and performance of the training and evaluation pipelines.
Highlights
- **Dependency Management Refinement:** Removed `flashinfer-python` and several associated CUDA-related packages from `pyproject.toml` and `requirements.txt`, streamlining the project's dependencies. Various other package versions in `uv.lock` were also updated.
- **Enhanced Evaluation Logging with WandB:** Introduced new functionality to log evaluation results to Weights & Biases (WandB). This includes adding a `log_eval_to_wandb` argument, a utility function to convert WandB URLs to run paths, and updating the evaluation scripts (`oe-eval.sh`, `submit_eval_jobs.py`) to support this feature.
- **RL Training Configuration Updates:** Adjusted the configuration for the `grpo_olmo25.sh` training script, including changing the dataset mix to `mnoukhov/DAPO-Math-14k-Processed-RLVR`, updating model paths, and significantly increasing `vllm_num_engines` from 4 to 24 and `num_learners_per_node` from 4 to 8.
- **New Chat Template for Thinking Models:** Added a new chat template, `olmo_simple_thinker_r1_style`, to `open_instruct/dataset_transformation.py`. This template is designed for models that use explicit `<think>` and `<answer>` tags for reasoning and response.
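As an aside on the tag format: R1-style templates wrap the chain of thought in `<think>` tags and the final response in `<answer>` tags, which makes the answer easy to extract downstream. A hypothetical illustration of the structure such a template targets (this is not the actual `olmo_simple_thinker_r1_style` template string, which lives in `open_instruct/dataset_transformation.py`):

```python
import re

# Hypothetical model output in the <think>/<answer> format; the
# completion text here is made up for illustration.
completion = (
    "<think>\n"
    "The user asked for 17 * 3. 17 * 3 = 51.\n"
    "</think>\n"
    "<answer>\n"
    "51\n"
    "</answer>"
)

# Downstream verification code typically extracts only the <answer> span.
match = re.search(r"<answer>\s*(.*?)\s*</answer>", completion, re.DOTALL)
answer = match.group(1) if match else None
print(answer)
```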
Code Review
This pull request introduces functionality to log evaluation results to Weights & Biases by adding a `wandb_run_path` argument across several scripts. It also includes dependency cleanups and updates to training configurations. The core logic for the new feature is sound, but I've identified a bug in `scripts/eval/oe-eval.sh` that could lead to malformed commands, and suggested improvements for robustness and clarity in `open_instruct/utils.py` and `scripts/submit_eval_jobs.py`.
`scripts/eval/oe-eval.sh` (outdated)

```shell
fi
if [[ -n "$STEP" ]]; then
    DATALAKE_ARGS+=",step=$STEP"
    WANDB_ARG+="--wandb-run-step $STEP"
```
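The malformed-command bug called out in the review summary comes from appending a flag to a non-empty string without a separating space (`...run-path foo--wandb-run-step 5`). A minimal sketch of one common fix, collecting flags in a bash array instead of concatenating strings (variable values here are illustrative, not the script's actual ones):

```shell
#!/usr/bin/env bash
# Sketch: an array keeps each flag and value a separate word, so no
# space-concatenation bug is possible regardless of append order.
# WANDB_RUN_PATH and STEP values are made up for illustration.
WANDB_ARGS=()
WANDB_RUN_PATH="ai2/open_instruct/abc123"
STEP=100
if [[ -n "$WANDB_RUN_PATH" ]]; then
    WANDB_ARGS+=(--wandb-run-path "$WANDB_RUN_PATH")
fi
if [[ -n "$STEP" ]]; then
    WANDB_ARGS+=(--wandb-run-step "$STEP")
fi
# Expanding "${WANDB_ARGS[@]}" passes each element as its own argument.
echo "${WANDB_ARGS[@]}"
```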
`open_instruct/utils.py` (outdated)

```python
def wandb_url_to_run_path(url):
    """
    Convert a wandb URL to a wandb run path.

    Args:
        url (str): wandb URL in format https://wandb.ai/entity/project/runs/run_id

    Returns:
        str: wandb run path in format entity/project/run_id
    """
    # Remove the base URL and split by '/'
    path_parts = url.replace("https://wandb.ai/", "").split("/")

    # Extract entity, project, and run_id
    entity = path_parts[0]
    project = path_parts[1]
    run_id = path_parts[3]  # Skip 'runs' at index 2

    return f"{entity}/{project}/{run_id}"
```
The implementation of `wandb_url_to_run_path` is a bit fragile: it relies on string replacement and splitting, which can fail on slightly different but valid URL formats (e.g., different domains, query parameters, or trailing slashes). Using `urllib.parse` would make it more robust, and adding type hints and better error handling would improve maintainability.

Please also add `from urllib.parse import urlparse` at the top of the file.
```python
def wandb_url_to_run_path(url: str) -> str:
    """
    Convert a wandb URL to a wandb run path.

    Args:
        url (str): wandb URL in format https://wandb.ai/entity/project/runs/run_id

    Returns:
        str: wandb run path in format entity/project/run_id

    Raises:
        ValueError: if the URL format is invalid.
    """
    try:
        path_parts = urlparse(url).path.strip('/').split('/')
        # The path is expected to be /<entity>/<project>/runs/<run_id>
        if len(path_parts) >= 4 and path_parts[2] == 'runs':
            return f"{path_parts[0]}/{path_parts[1]}/{path_parts[3]}"
    except (ValueError, IndexError) as e:
        raise ValueError(f"Could not parse wandb URL '{url}'") from e
    raise ValueError(f"Invalid wandb URL format: {url}")
```
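For context, here is a quick self-contained check of the `urlparse`-based approach (the helper is reproduced so the snippet runs standalone, and the sample entity/project/run IDs are illustrative), showing that trailing slashes and query parameters no longer break parsing:

```python
from urllib.parse import urlparse

def wandb_url_to_run_path(url: str) -> str:
    """Convert a wandb URL to an entity/project/run_id run path."""
    try:
        # urlparse strips the scheme, domain, and query string for us.
        path_parts = urlparse(url).path.strip("/").split("/")
        # The path is expected to be /<entity>/<project>/runs/<run_id>
        if len(path_parts) >= 4 and path_parts[2] == "runs":
            return f"{path_parts[0]}/{path_parts[1]}/{path_parts[3]}"
    except (ValueError, IndexError) as e:
        raise ValueError(f"Could not parse wandb URL '{url}'") from e
    raise ValueError(f"Invalid wandb URL format: {url}")

# Plain URL, trailing slash, and query parameters all parse identically:
print(wandb_url_to_run_path("https://wandb.ai/ai2/open_instruct/runs/abc123"))
print(wandb_url_to_run_path("https://wandb.ai/ai2/open_instruct/runs/abc123/?workspace=user"))
```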
`scripts/submit_eval_jobs.py`

```python
parser.add_argument("--oe_eval_tasks", type=str, default=None, help="Evaluate OE eval on Beaker.")
parser.add_argument("--step", type=int, default=None, help="Step number for postgresql logging.")
parser.add_argument("--run_id", type=str, default=None, help="A unique run ID for postgresql logging.")
parser.add_argument("--wandb_run_path", type=str, default=None, help="A unique run ID for postgresql logging.")
```
The help text for `--wandb_run_path` seems to be a copy-paste error from another argument; it should describe that this argument is the Weights & Biases run path. Suggested change:

```python
parser.add_argument("--wandb_run_path", type=str, default=None, help="Weights & Biases run path for logging evaluation results.")
```
No description provided.