Conversation

mnoukhov
Contributor

No description provided.

mnoukhov and others added 30 commits August 14, 2025 20:43
@mnoukhov mnoukhov marked this pull request as draft September 11, 2025 18:46
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @mnoukhov, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on refining the project's operational environment and training configurations. It cleans up outdated dependencies, integrates robust logging for evaluation runs with Weights & Biases, and updates key parameters for reinforcement learning training, specifically for models that employ a 'thinking' process. These changes aim to improve stability, traceability, and performance of the training and evaluation pipelines.

Highlights

  • Dependency Management Refinement: Removed flashinfer-python and several associated CUDA-related packages from pyproject.toml and requirements.txt, streamlining the project's dependencies. Various other package versions in uv.lock were also updated.
  • Enhanced Evaluation Logging with WandB: Introduced new functionality to log evaluation results to Weights & Biases (WandB). This includes adding a log_eval_to_wandb argument, a utility function to convert WandB URLs to run paths, and updating evaluation scripts (oe-eval.sh, submit_eval_jobs.py) to support this feature.
  • RL Training Configuration Updates: Adjusted the configuration for the grpo_olmo25.sh training script, including changing the dataset mix to mnoukhov/DAPO-Math-14k-Processed-RLVR, updating model paths, and significantly increasing vllm_num_engines from 4 to 24 and num_learners_per_node from 4 to 8.
  • New Chat Template for Thinking Models: Added a new chat template, olmo_simple_thinker_r1_style, to open_instruct/dataset_transformation.py. This template is designed for models that use explicit <think> and <answer> tags for reasoning and response.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. commenting on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces functionality to log evaluation results to Weights & Biases by adding a wandb_run_path argument across several scripts. It also includes dependency cleanups and updates to training configurations. The core logic for the new feature is sound, but I've identified a bug in scripts/eval/oe-eval.sh that could lead to malformed commands, and suggested improvements for robustness and clarity in open_instruct/utils.py and scripts/submit_eval_jobs.py.

fi
if [[ -n "$STEP" ]]; then
    DATALAKE_ARGS+=",step=$STEP"
    WANDB_ARG+="--wandb-run-step $STEP"

Severity: high

There's a missing space when appending --wandb-run-step to WANDB_ARG. If WANDB_ARG is already set, this will result in a malformed command argument like ..._API_KEY--wandb-run-step....

Suggested change
WANDB_ARG+="--wandb-run-step $STEP"
WANDB_ARG+=" --wandb-run-step $STEP"
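To see why the leading space matters, here is a minimal bash reproduction (the flag values are illustrative, not taken from the script):

```shell
#!/usr/bin/env bash
# Without a leading space, += fuses the new flag onto the previous token,
# producing something like "...abc123--wandb-run-step 100".
WANDB_ARG="--wandb-run-path ai2/open-instruct/abc123"
STEP=100
WANDB_ARG+=" --wandb-run-step $STEP"   # note the leading space before the flag
echo "$WANDB_ARG"
```

With the space, the variable expands to two well-separated flags when passed to the evaluation command.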

Comment on lines 1204 to 1222
def wandb_url_to_run_path(url):
    """
    Convert a wandb URL to a wandb run path.

    Args:
        url (str): wandb URL in format https://wandb.ai/entity/project/runs/run_id

    Returns:
        str: wandb run path in format entity/project/run_id
    """
    # Remove the base URL and split by '/'
    path_parts = url.replace("https://wandb.ai/", "").split("/")

    # Extract entity, project, and run_id
    entity = path_parts[0]
    project = path_parts[1]
    run_id = path_parts[3]  # Skip 'runs' at index 2

    return f"{entity}/{project}/{run_id}"

Severity: medium

The implementation of wandb_url_to_run_path is a bit fragile as it relies on string replacement and splitting, which might fail with slightly different but valid URL formats (e.g., different domains, query parameters, or trailing slashes). Using urllib.parse would make it more robust. Also, adding type hints and better error handling would improve maintainability.

Please also add from urllib.parse import urlparse at the top of the file.

def wandb_url_to_run_path(url: str) -> str:
    """
    Convert a wandb URL to a wandb run path.

    Args:
        url (str): wandb URL in format https://wandb.ai/entity/project/runs/run_id

    Returns:
        str: wandb run path in format entity/project/run_id

    Raises:
        ValueError: if the URL format is invalid.
    """
    try:
        path_parts = urlparse(url).path.strip('/').split('/')
        # The path is expected to be /<entity>/<project>/runs/<run_id>
        if len(path_parts) >= 4 and path_parts[2] == 'runs':
            return f"{path_parts[0]}/{path_parts[1]}/{path_parts[3]}"
    except (ValueError, IndexError) as e:
        raise ValueError(f"Could not parse wandb URL '{url}'") from e

    raise ValueError(f"Invalid wandb URL format: {url}")
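A quick usage check of the urlparse-based approach (using a simplified version of the suggested parser; the ai2/open-instruct entity and project names are made up for the demo) shows that trailing slashes and query strings no longer break parsing:

```python
from urllib.parse import urlparse

def wandb_url_to_run_path(url: str) -> str:
    """Simplified version of the suggested parser, reproduced so the demo is self-contained."""
    path_parts = urlparse(url).path.strip("/").split("/")
    # The path is expected to be /<entity>/<project>/runs/<run_id>
    if len(path_parts) >= 4 and path_parts[2] == "runs":
        return f"{path_parts[0]}/{path_parts[1]}/{path_parts[3]}"
    raise ValueError(f"Invalid wandb URL format: {url}")

# All three variants resolve to the same run path.
for url in (
    "https://wandb.ai/ai2/open-instruct/runs/abc123",
    "https://wandb.ai/ai2/open-instruct/runs/abc123/",
    "https://wandb.ai/ai2/open-instruct/runs/abc123?workspace=user",
):
    assert wandb_url_to_run_path(url) == "ai2/open-instruct/abc123"
```

The original string-replacement version would have mishandled the last two variants, since the extra characters end up inside the split segments.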

parser.add_argument("--oe_eval_tasks", type=str, default=None, help="Evaluate OE eval on Beaker.")
parser.add_argument("--step", type=int, default=None, help="Step number for postgresql logging.")
parser.add_argument("--run_id", type=str, default=None, help="A unique run ID for postgresql logging.")
parser.add_argument("--wandb_run_path", type=str, default=None, help="A unique run ID for postgresql logging.")

Severity: medium

The help text for --wandb_run_path seems to be a copy-paste error from another argument. It should describe that this argument is for the Weights & Biases run path.

Suggested change
parser.add_argument("--wandb_run_path", type=str, default=None, help="A unique run ID for postgresql logging.")
parser.add_argument("--wandb_run_path", type=str, default=None, help="Weights & Biases run path for logging evaluation results.")

@mnoukhov mnoukhov changed the base branch from olmo2-retrofit to main September 26, 2025 05:13