2.5 rl0 #1004
base: main
Conversation
requires PR from oe-eval-internal allenai/oe-eval-internal#636
Summary of Changes
Hello @mnoukhov, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request focuses on refining the project's operational environment and training configurations. It cleans up outdated dependencies, integrates robust logging for evaluation runs with Weights & Biases, and updates key parameters for reinforcement learning training, specifically for models that employ a 'thinking' process. These changes aim to improve stability, traceability, and performance of the training and evaluation pipelines.
Highlights
- **Dependency Management Refinement:** Removed `flashinfer-python` and several associated CUDA-related packages from `pyproject.toml` and `requirements.txt`, streamlining the project's dependencies. Various other package versions in `uv.lock` were also updated.
- **Enhanced Evaluation Logging with WandB:** Introduced new functionality to log evaluation results to Weights & Biases (WandB). This includes adding a `log_eval_to_wandb` argument, a utility function to convert WandB URLs to run paths, and updating the evaluation scripts (`oe-eval.sh`, `submit_eval_jobs.py`) to support this feature.
- **RL Training Configuration Updates:** Adjusted the configuration for the `grpo_olmo25.sh` training script, including changing the dataset mix to `mnoukhov/DAPO-Math-14k-Processed-RLVR`, updating model paths, and significantly increasing `vllm_num_engines` from 4 to 24 and `num_learners_per_node` from 4 to 8.
- **New Chat Template for Thinking Models:** Added a new chat template, `olmo_simple_thinker_r1_style`, to `open_instruct/dataset_transformation.py`. This template is designed for models that use explicit `<think>` and `<answer>` tags for reasoning and response.
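As an aside on the tag format: R1-style templates wrap the chain of thought in `<think>` tags and the final response in `<answer>` tags, which makes the answer easy to extract downstream. A hypothetical illustration of the structure such a template targets (this is not the actual `olmo_simple_thinker_r1_style` template string, which lives in `open_instruct/dataset_transformation.py`):

```python
import re

# Hypothetical model output in the <think>/<answer> format; the
# completion text here is made up for illustration.
completion = (
    "<think>\n"
    "The user asked for 17 * 3. 17 * 3 = 51.\n"
    "</think>\n"
    "<answer>\n"
    "51\n"
    "</answer>"
)

# Downstream verification code typically extracts only the <answer> span.
match = re.search(r"<answer>\s*(.*?)\s*</answer>", completion, re.DOTALL)
answer = match.group(1) if match else None
print(answer)
```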
Code Review
This pull request introduces functionality to log evaluation results to Weights & Biases by adding a `wandb_run_path` argument across several scripts. It also includes dependency cleanups and updates to training configurations. The core logic for the new feature is sound, but I've identified a bug in `scripts/eval/oe-eval.sh` that could lead to malformed commands, and suggested improvements for robustness and clarity in `open_instruct/utils.py` and `scripts/submit_eval_jobs.py`.
`scripts/eval/oe-eval.sh` (outdated)

```shell
fi
if [[ -n "$STEP" ]]; then
    DATALAKE_ARGS+=",step=$STEP"
    WANDB_ARG+="--wandb-run-step $STEP"
```
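The malformed-command bug called out in the review summary comes from appending a flag to a non-empty string without a separating space (`...run-path foo--wandb-run-step 5`). A minimal sketch of one common fix, collecting flags in a bash array instead of concatenating strings (variable values here are illustrative, not the script's actual ones):

```shell
#!/usr/bin/env bash
# Sketch: an array keeps each flag and value a separate word, so no
# space-concatenation bug is possible regardless of append order.
# WANDB_RUN_PATH and STEP values are made up for illustration.
WANDB_ARGS=()
WANDB_RUN_PATH="ai2/open_instruct/abc123"
STEP=100
if [[ -n "$WANDB_RUN_PATH" ]]; then
    WANDB_ARGS+=(--wandb-run-path "$WANDB_RUN_PATH")
fi
if [[ -n "$STEP" ]]; then
    WANDB_ARGS+=(--wandb-run-step "$STEP")
fi
# Expanding "${WANDB_ARGS[@]}" passes each element as its own argument.
echo "${WANDB_ARGS[@]}"
```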
`open_instruct/utils.py` (outdated)

```python
def wandb_url_to_run_path(url):
    """
    Convert a wandb URL to a wandb run path.

    Args:
        url (str): wandb URL in format https://wandb.ai/entity/project/runs/run_id

    Returns:
        str: wandb run path in format entity/project/run_id
    """
    # Remove the base URL and split by '/'
    path_parts = url.replace("https://wandb.ai/", "").split("/")

    # Extract entity, project, and run_id
    entity = path_parts[0]
    project = path_parts[1]
    run_id = path_parts[3]  # Skip 'runs' at index 2

    return f"{entity}/{project}/{run_id}"
```
The implementation of `wandb_url_to_run_path` is a bit fragile: it relies on string replacement and splitting, which can fail on slightly different but valid URL formats (e.g., different domains, query parameters, or trailing slashes). Using `urllib.parse` would make it more robust, and adding type hints and better error handling would improve maintainability.

Please also add `from urllib.parse import urlparse` at the top of the file.
```python
def wandb_url_to_run_path(url: str) -> str:
    """
    Convert a wandb URL to a wandb run path.

    Args:
        url (str): wandb URL in format https://wandb.ai/entity/project/runs/run_id

    Returns:
        str: wandb run path in format entity/project/run_id

    Raises:
        ValueError: if the URL format is invalid.
    """
    try:
        path_parts = urlparse(url).path.strip('/').split('/')
        # The path is expected to be /<entity>/<project>/runs/<run_id>
        if len(path_parts) >= 4 and path_parts[2] == 'runs':
            return f"{path_parts[0]}/{path_parts[1]}/{path_parts[3]}"
    except (ValueError, IndexError) as e:
        raise ValueError(f"Could not parse wandb URL '{url}'") from e
    raise ValueError(f"Invalid wandb URL format: {url}")
```
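For context, here is a quick self-contained check of the `urlparse`-based approach (the helper is reproduced so the snippet runs standalone, and the sample entity/project/run IDs are illustrative), showing that trailing slashes and query parameters no longer break parsing:

```python
from urllib.parse import urlparse

def wandb_url_to_run_path(url: str) -> str:
    """Convert a wandb URL to an entity/project/run_id run path."""
    try:
        # urlparse strips the scheme, domain, and query string for us.
        path_parts = urlparse(url).path.strip("/").split("/")
        # The path is expected to be /<entity>/<project>/runs/<run_id>
        if len(path_parts) >= 4 and path_parts[2] == "runs":
            return f"{path_parts[0]}/{path_parts[1]}/{path_parts[3]}"
    except (ValueError, IndexError) as e:
        raise ValueError(f"Could not parse wandb URL '{url}'") from e
    raise ValueError(f"Invalid wandb URL format: {url}")

# Plain URL, trailing slash, and query parameters all parse identically:
print(wandb_url_to_run_path("https://wandb.ai/ai2/open_instruct/runs/abc123"))
print(wandb_url_to_run_path("https://wandb.ai/ai2/open_instruct/runs/abc123/?workspace=user"))
```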
`scripts/submit_eval_jobs.py`

```python
parser.add_argument("--oe_eval_tasks", type=str, default=None, help="Evaluate OE eval on Beaker.")
parser.add_argument("--step", type=int, default=None, help="Step number for postgresql logging.")
parser.add_argument("--run_id", type=str, default=None, help="A unique run ID for postgresql logging.")
parser.add_argument("--wandb_run_path", type=str, default=None, help="A unique run ID for postgresql logging.")
```
The help text for `--wandb_run_path` seems to be a copy-paste error from another argument; it should describe that this argument is the Weights & Biases run path. Suggested change:

```python
parser.add_argument("--wandb_run_path", type=str, default=None, help="Weights & Biases run path for logging evaluation results.")
```
No description provided.