[Draft] Comparing toolu with main #963
Draft
shatu wants to merge 223 commits into toolu from main
223 commits
e2be8d0
Update oe-eval.sh to set a default timeout of 48h. (#789)
finbarrtimbers ca55010
Updated configs to support changes. (#790)
finbarrtimbers c56efad
Add benchmark scripts (#786)
finbarrtimbers 004a48d
Add remap verifier (#773)
hamishivi 3b5b354
Ran the linter. (#792)
finbarrtimbers 7f7308c
fix the URL for code api setup (#791)
mnoukhov 829796a
Add nltk setup to uv dockerfile (#785)
hamishivi e4e5dfb
Switches the actors to use the Ray queue. (#784)
finbarrtimbers 7eb6c4d
Set new default value for num_samples
finbarrtimbers 7bf039f
Updates the benchmark script (#795)
finbarrtimbers d5e7160
install nginx in uv (#793)
mnoukhov bb7477d
allow passing local models, bubble up dataset cache errors (#797)
mnoukhov 839a806
binary reward for code (#798)
saurabh111233212 541058c
Now, we run individual prompts through the queue. (#796)
finbarrtimbers 774edca
Adds flashinfer dep. (#800)
finbarrtimbers b3e8e70
new beaker names (#803)
hamishivi 266f214
Remove Unused DPO Function (#794)
fabianlim 8048c9a
extra reporting (#799)
garrett361 4659dca
Revert "Now, we run individual prompts through the queue. (#796)" (#804)
saurabh111233212 45ae474
Fix misnamed variables. (#808)
finbarrtimbers 9d1620d
Fix broken syntax. (#809)
finbarrtimbers de8a14f
Add new olmo chat templates, and improve data mixing/tokenization (#765)
jacob-morrison d944d42
Fixes from last PR (#810)
hamishivi 207268a
Delete run_repro.sh (#813)
finbarrtimbers cc33540
Fix disk space error on image creation (#814)
hamishivi 8c45fd8
fix: chat template kwarg in save_with_accelerate (#824)
garrett361 6f30008
Fixes the ray double init error. (#822)
finbarrtimbers b28ac65
[WIP] Filtering: Remove special tokens, Chinese characters, readme (#…
natolambert 7b92068
Makes pytest ignore oe-eval-internal. (#829)
finbarrtimbers 4477615
repetition filtering for reasoning traces (#801)
natolambert 535ff07
fix avg_loss rename (#828)
garrett361 ec3e61b
look at changes (#825)
jacob-morrison 80d6566
Fix head so grpo_fast.py can run. (#821)
finbarrtimbers a388f3e
perf penalty (#805)
saurabh111233212 e8d4cf0
delete ref to async moe (#834)
saurabh111233212 bf22d22
fix some format in code_stdio (#835)
saurabh111233212 9d59a72
Adds an integration test, which launches a beaker experiment that run…
finbarrtimbers bcd65a1
oops! (#840)
saurabh111233212 cecdea6
Added comments explaining why we set NCCL_CUMEM_ENABLE. (#838)
finbarrtimbers d407b09
Adds an optional timeout flag to mason, and sets it on the single GPU…
finbarrtimbers 2497ce0
Adds pytest-xdist for parallel tests, and only installs needed deps f…
finbarrtimbers f25ae11
Minor fix to scripts. (#842)
finbarrtimbers b245fb7
add verify script (#845)
saurabh111233212 44358c2
Silences TORCH_CUDA_ARCH_LIST warning by passing the env var through.…
finbarrtimbers 7a3011c
Added merge_group trigger.
finbarrtimbers 815a73a
dataset statistics fix (#846)
jacob-morrison 63baaf4
Added Claude.md. (#847)
finbarrtimbers 6760a65
skip_oi_evals skips everything including print statements (#817)
mnoukhov 5fd117a
force manual set of local model name for upload to GS (#849)
mnoukhov c6b061e
use a proper async client for code verification (#851)
saurabh111233212 8d64677
Update beaker-experiment.yml (#852)
finbarrtimbers c051698
turn on evals, update aime (#855)
saumyamalik 08f26d6
run bfcl by default, add logic (#857)
saumyamalik aa5b573
Fix to beaker integration test. (#854)
finbarrtimbers ff0049e
Refactors the eval setup in `grpo_fast.py` (#841)
finbarrtimbers 68274f6
Uses ThreadPoolExecutor to manage threads. (#856)
finbarrtimbers 514de1e
Removed debugging code that made it to main.
finbarrtimbers 6b71677
Fixes head. (#863)
finbarrtimbers 3fd39f8
Change num_evals so the user sets after how many steps it runs, rathe…
finbarrtimbers 3f1f098
Disables wandb in the Beaker experiment integration test. (#866)
finbarrtimbers e977af6
Fix readme (#858)
finbarrtimbers c35ac3d
Adds a longer timeout for `accumulate_inference_batches`. (#869)
finbarrtimbers 0c3bcec
flag for eval'ing on step 0 (#872)
saurabh111233212 d6b8805
Fix build by syncing all deps, not just dev. (#874)
finbarrtimbers 9431d17
Avoid pickle-imports slowing things down (#871)
hamishivi 72d32f6
Removes logging message. (#878)
finbarrtimbers 391f84b
Switches `data_preparation_thread` to use a sentinel for shutdown ins…
finbarrtimbers bf91c08
Adds git hash to description. (#867)
finbarrtimbers d65e143
also do apply eval on 0 to local eval (#873)
mnoukhov 56a937b
Makes runs for the Beaker integration test smaller. (#877)
finbarrtimbers b981c6f
Tests pass. (#879)
finbarrtimbers 16fb751
Renamed file. (#881)
finbarrtimbers 2927e51
Clean up Docker images + instructions (#876)
finbarrtimbers df713bc
No more semaphore warnings. (#883)
finbarrtimbers 6d3b832
Added test script. (#860)
finbarrtimbers c659faa
Fixed build command to specify platform. (#888)
finbarrtimbers c9ee3cf
Fix up finetune (#884)
hamishivi d861a49
Switch from TensorBoard to wandb-only logging in grpo_fast.py, improv…
mnoukhov 020a39f
Adds the wandb url to the experiment description. (#889)
finbarrtimbers 81c9e51
Remove extraneous try/except (#891)
finbarrtimbers 0cd6092
pin hf transformers to lower (#890)
hamishivi 94e0476
Removes git from the Docker image. (#885)
finbarrtimbers 1275d36
switch to omega500 eval by default (#893)
saumyamalik cd11ed9
Fix crash for debug script (#892)
hamishivi aa5d328
Update pyproject.toml to remove redundant dep (#899)
finbarrtimbers ed2d8a6
Add refill strategy Implementation and some evaluation results (#816)
tengxiao1 4ca89ba
Log Train vs Generation time to see GPU idling (#853)
mnoukhov 29c994d
beaker eval image (#896)
mnoukhov cb0e96e
add back global_step to maintain backwards compatibility (#898)
mnoukhov 0f98271
Adds --frozen flag to all calls to uv in CI. (#900)
finbarrtimbers 564f8a4
Increase backend timeouts (#901)
hamishivi 84f5359
bugfix (#903)
hamishivi 7dc0210
Update grpo_fast.py (#908)
finbarrtimbers 89c1878
Tidies up the way we do leak reporting. (#907)
finbarrtimbers 0394a8e
Switches LLMRayActor to use the LLMEngine instead of LLM. (#837)
finbarrtimbers 6a4bf5b
Removed outdated deps. (#909)
finbarrtimbers 5e7e247
Smaller docker (#886)
finbarrtimbers 8f82091
Update Dockerfile (#912)
finbarrtimbers dfc2261
Removes wrong `output_dir` argument. (#904)
finbarrtimbers d1834cf
Fixes open_instruct/grpo_fast.py so that when there's an error in the…
finbarrtimbers 7392a10
Fixes to build. (#887)
finbarrtimbers e51f371
Strips trailing whitespace. (#913)
finbarrtimbers 1b6d913
resumption fix! i think (#910)
saurabh111233212 a916940
`actually_fix_build_v2_final.txt` (#915)
finbarrtimbers 22e3e37
default resumable commnand to be resumable (#919)
saurabh111233212 aae7df3
Prevent duplicate runs in Github Actions. (#916)
finbarrtimbers 4564130
Eval weka cluster logic (#920)
saumyamalik 4f8ad0d
Fixes batching bug. (#918)
finbarrtimbers 1f8bc4a
Logs the git commit and branch to the Beaker description so we can re…
finbarrtimbers 67befc4
Updated workflows to prevent duplicates. (#921)
finbarrtimbers 2d49234
Fix the way we do caching in Docker (#922)
finbarrtimbers 4c475ef
rename code folder and fix extracting user query and judge template (…
fabrahman 2598720
Remove duplicate olmo-instruct templates (#928)
hamishivi 0a39fc2
shorter tags (#930)
hamishivi 8b90699
Makes logging consistent across all of `open-instruct`. (#927)
finbarrtimbers 55137f0
fix query extraction for olmo chat template (#933)
fabrahman 4e43ea4
Adds a progress bar to the Wandb description (#929)
finbarrtimbers b6a812c
autoset RAY CGRAPH timeout if not explicitly set (fix tp crash) (#934)
hamishivi 252d265
Runs the Beaker experiment integration test on a larger runner (#936)
finbarrtimbers 2938bd6
Fix benchmark so it runs (#935)
finbarrtimbers 8997c30
Explicitly bind ray dashboard host (#938)
hamishivi efb1965
Change cluster so that we can schedule the gantry benchmark script (#…
finbarrtimbers ac4e3fa
ANother attempt at fixing the experiment. (#940)
finbarrtimbers afa8400
Another attempt at fixing the experiment (#942)
finbarrtimbers c523f66
More benchmark fixes! (#941)
finbarrtimbers bc1b7be
checkpoint rng states and data iterator stuff, gs logic (#932)
saurabh111233212 79b32bf
Saurabhs/high fidelity checkpoints (#945)
saurabh111233212 cc0a49e
Use the LLM judge tokenizer (#946)
fabrahman 64794ef
Commit (#948)
finbarrtimbers 000d77c
Fixes the MFU calculation in our benchmark script. (#943)
finbarrtimbers 6f39d22
Don't block main thread or trainer on weight update to generator (#937)
mnoukhov a044400
Fix ray bundles indices, max len config. (#949)
hamishivi 2f60852
Update SFT Gantry command (#951)
tyler-romero 2c97cd5
Cleaner progress code. (#953)
finbarrtimbers bf979e9
fix context truncation tokenization and add tests (#952)
fabrahman 10599c4
console.warn -> console.log (#954)
saurabh111233212 853ce80
Fix UnboundLocalError where metrics weren't defined (#958)
finbarrtimbers 0ab9d91
Add Claude Code GitHub Workflow (#959)
finbarrtimbers e762ff3
Fix DPO script (#956)
hamishivi 8c624e2
Added a decorator to profile functions.
finbarrtimbers 504a8d3
Revert "Added a decorator to profile functions."
finbarrtimbers 8afdc2e
Remove claude (#966)
finbarrtimbers cc8fe54
Better logging of errors in threads (#967)
hamishivi bea92da
use a smaller model (#968)
hamishivi c740fad
Less logging. (#969)
finbarrtimbers f1d7223
Removed explicit cache reset in favour of cache_salt. (#947)
finbarrtimbers 5301917
Adds a MBU calculation to our benchmark script (#957)
finbarrtimbers f455ef0
Adds a dashboard to `ActorManager` that makes it easier to track what…
finbarrtimbers cf92137
Caches `should_stop` so we're not hammering Ray. (#973)
finbarrtimbers 49144f4
Fix calculation. (#974)
finbarrtimbers 5b5a027
Update script priorities (#971)
finbarrtimbers fa7e608
load pretokenized user query (v0) (#965)
fabrahman 01daf56
Fix progress update so that it doesn't duplicate components. (#977)
finbarrtimbers 674c706
Adds a bunch of logs to explain what's happening during filtering. (#…
finbarrtimbers 91e0310
Manually duplicate both tool and non-tool requests. (#978)
finbarrtimbers 8e8f4ac
Upgrade beaker version. (#981)
finbarrtimbers 106005c
fix grouped layer names for kv cache info (#983)
mnoukhov a529fbf
FP8 KV Cache (#984)
hamishivi 5715660
Fix finetune keys (#987)
hamishivi 628c7df
Fix final save when job crashes early. (#988)
hamishivi a56bbc5
Upperbound litellm version (#991)
hamishivi af3f921
Modify grpo_fast.py so that we now pass individual prompts through th…
finbarrtimbers c76471b
fix env var check (#997)
mnoukhov 4a5c351
fixes the recurring crash issue (#999)
mnoukhov b39af14
remove flash infer from dependencies, mason, and benchmark script (#1…
saurabh111233212 e1d948b
update eval names (#1001)
hamishivi 6b2ebdc
Puzzle verifier (#1003)
hamishivi 9df8110
fixes for manual eval jobs with new cluster names (#1002)
fabrahman 34685ce
Disables cascade attention. (#1006)
finbarrtimbers 3dc4a23
whoops wrong name (#1007)
hamishivi f18e2c5
Added script. (#1009)
finbarrtimbers f2fa726
Fixed errors and testing with fake asserts. (#1010)
finbarrtimbers 4ca27a1
Small refactor to prepare for the `backfill-prompts` PR (#1011)
finbarrtimbers 4f41042
Check we have enough data for bsz/prefill (#1012)
hamishivi 8a51bcf
Adds a script that launches benchmarks via Mason (#1015)
finbarrtimbers ab90d5a
As requests finish inside LLMRayActor, pulls new ones from the queue …
finbarrtimbers f7f3d3f
Adds support for inflight updates in LLMRayActor. (#1013)
finbarrtimbers cc6fd17
Cleanup eval (#1020)
finbarrtimbers c7735f5
Gracefully shuts down ray. (#986)
finbarrtimbers ce5571a
Removed references to old scripts. (#1022)
finbarrtimbers afe6b9b
Removed outdated files (#1023)
finbarrtimbers 9107188
Removes `nvcc` and the "delete big files" steps from our tests as the…
finbarrtimbers 1b695ca
Remove duplicate cluster names in mason (#1017)
finbarrtimbers bb98dbc
Removes the `reduce_loss` option, as we don't need to support `reduce…
finbarrtimbers 43bbe30
Update filtering script (#1029)
hamishivi 6d141b6
Fixes bug in token metrics (#1032)
finbarrtimbers b174166
Calculates tokens per second for actors. (#1034)
finbarrtimbers 965c359
Adds MFU/MBU reporting to the actors (#1030)
finbarrtimbers b4657b4
Fix filtering judge script (#1037)
hamishivi d880e86
Default to tokenizer chat template, configurable random seed. (#1035)
hamishivi d7ca66b
Logs MFU for the learner to wandb (#1038)
finbarrtimbers 3006543
logging oe eval results to wandb when using new oe-eval-interal (#923)
mnoukhov 045cc0a
Adds timing for the individual weight sync events (#1033)
finbarrtimbers 89e350b
fix beaker whoami and update warning message (#1040)
mnoukhov 6f9feba
Skips uploading/renaming the beaker image when it hasn't changed. (#…
finbarrtimbers 5e5c93d
Update cluster names for open-instruct scripts (#1046)
finbarrtimbers 0ce2138
Removed all references to reduce_loss. (#1048)
finbarrtimbers 9489f33
Modifies `mason.py` so that we set default env vars in one central pl…
finbarrtimbers 7ba1bba
Cleaned up code (#1050)
finbarrtimbers 9724932
Make sure we're using vllm's v1 API everywhere. (#1052)
finbarrtimbers 0913deb
Silences flash attention warning by loading models directly on device…
finbarrtimbers e8db3e7
Silences NCCL warning (#1055)
finbarrtimbers 33d9f4b
Revert "Silences NCCL warning (#1055)" (#1060)
finbarrtimbers 8867f45
Added reasonable timeouts to our debug scripts to ensure they don't h…
finbarrtimbers c3f79a3
Fixed formatting (#1059)
finbarrtimbers e320ff0
Removes the generate thread (#1054)
finbarrtimbers 938e08a
Update filter script (#1042)
hamishivi 3a21f22
Removed a bunch of dead code (#1057)
finbarrtimbers bd26188
Now, `LLMRayActor` returns logprobs, and we calculate some stats abou…
finbarrtimbers cde0f5b
Use less steps for tool_grpo_fast.sh (#1062)
finbarrtimbers c30f27e
Removed self.logger (#1061)
finbarrtimbers d7c569c
Changes LLMRayActor so that we pass the dtype as a string into `updat…
finbarrtimbers 96af030
Silences trust remote code warning (#1065)
finbarrtimbers e765706
Refactors `LLMRayActor.__init__` by splitting it into smaller methods…
finbarrtimbers cc20841
Creates triton cache, silencing warning (#1066)
finbarrtimbers 0212cbd
Updated shared memory to silence Docker warning (#1067)
finbarrtimbers 3704b71
Silence warning. (#1068)
finbarrtimbers 070c1e1
Cleaned up logging (#1069)
finbarrtimbers 9f788ac
Update grpo_fast.py (#1072)
finbarrtimbers 3a679b7
Removes invalid `string.replace` (#1071)
finbarrtimbers 940a360
Added type annotations (#1070)
finbarrtimbers 50cd847
Updates beaker-py version to >2. (#1021)
finbarrtimbers 69fb6d9
Constant denom for loss (#1076)
hamishivi
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
# Git directories (can be large) | ||
**/.git/ | ||
.gitignore | ||
.github/ | ||
|
||
# Python cache and compiled files | ||
__pycache__/ | ||
*.py[cod] | ||
*$py.class | ||
*.so | ||
.Python | ||
*.egg-info/ | ||
dist/ | ||
build/ | ||
*.egg | ||
local_dataset_cache/ | ||
|
||
|
||
# Virtual environments | ||
.venv/ | ||
venv/ | ||
ENV/ | ||
env/ | ||
|
||
# IDE and editor files | ||
.vscode/ | ||
.idea/ | ||
*.swp | ||
*.swo | ||
*~ | ||
.DS_Store | ||
|
||
# Test and documentation | ||
tests/ | ||
test/ | ||
docs/ | ||
*.md | ||
!README.md | ||
.pytest_cache/ | ||
.coverage | ||
htmlcov/ | ||
.tox/ | ||
|
||
# Logs and databases | ||
*.log | ||
*.sql | ||
*.sqlite | ||
*.db | ||
|
||
# Package manager files | ||
node_modules/ | ||
npm-debug.log* | ||
yarn-debug.log* | ||
yarn-error.log* | ||
|
||
# Jupyter notebooks checkpoints | ||
.ipynb_checkpoints/ | ||
|
||
# Model and data files (if stored locally) | ||
*.ckpt | ||
*.pth | ||
*.h5 | ||
*.safetensors | ||
data/ | ||
models/ | ||
checkpoints/ | ||
|
||
# Temporary files | ||
tmp/ | ||
temp/ | ||
*.tmp | ||
|
||
# Build artifacts | ||
*.o | ||
*.a | ||
*.so | ||
*.dylib | ||
|
||
# Cache directories | ||
.cache/ | ||
.mypy_cache/ | ||
.ruff_cache/ | ||
|
||
# Docker files (avoid recursion) | ||
Dockerfile* | ||
docker-compose* | ||
.dockerignore |
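
The `*.md` / `!README.md` pair above relies on exception rules: Docker documents that the last pattern matching a file decides whether it is excluded. The sketch below illustrates that last-match-wins idea for plain file names only. It is a simplification built on Python's `fnmatchcase`, not Docker's actual matcher (it ignores directory semantics and `**`), and `is_ignored` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if `path` is excluded; the last matching pattern wins."""
    ignored = False
    for pattern in patterns:
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]  # strip the "!" before matching
        if fnmatchcase(path, pattern):
            # A plain pattern excludes the file, a "!" pattern re-includes it.
            ignored = not negate
    return ignored


# A small subset of the patterns from the file above.
patterns = ["*.md", "!README.md", "*.log"]
print(is_ignored("CHANGELOG.md", patterns))  # True
print(is_ignored("README.md", patterns))     # False
print(is_ignored("train.py", patterns))      # False
```

Order matters in this model: placing `!README.md` before `*.md` would leave `README.md` excluded, because the later `*.md` match would override the exception.
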
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,191 @@ | ||
name: Beaker Experiment Launch | ||
|
||
on: | ||
merge_group: | ||
|
||
# Adding a comment to trigger a run. | ||
workflow_dispatch: # This allows us to manually trigger a build through the GitHub UI. | ||
# pull_request: | ||
# branches: [main] | ||
# paths: | ||
# - 'open_instruct/**' | ||
# - '!open_instruct/README.md' | ||
# - 'requirements.txt' | ||
# - 'Dockerfile' | ||
# - '.github/workflows/beaker-experiment.yml' | ||
|
||
concurrency: | ||
group: ${{ github.workflow }}-${{ github.ref }} | ||
cancel-in-progress: true | ||
|
||
env: | ||
DOCKER_BUILDKIT: "1" | ||
|
||
jobs: | ||
launch-experiment: | ||
name: Launch Beaker Experiment | ||
runs-on: 8-Core-XL-Runner-Ubuntu-Latest | ||
timeout-minutes: 35 | ||
|
||
steps: | ||
- name: Checkout repository | ||
uses: actions/checkout@v4 | ||
with: | ||
fetch-depth: 0 # Need full history to get commit author info | ||
|
||
- name: Checkout oe-eval-internal | ||
uses: actions/checkout@v4 | ||
with: | ||
repository: allenai/oe-eval-internal | ||
path: './oe-eval-internal' | ||
ssh-key: ${{ secrets.OE_EVAL_GIT_CLONE_ACCESS_PRIVATE_SSH_DEPLOY_KEY }} | ||
fetch-depth: 1 | ||
filter: 'blob:none' | ||
|
||
- name: Get trigger information | ||
id: get-trigger-info | ||
run: | | ||
if [ "${{ github.event_name }}" = "push" ]; then | ||
# Get the commit author for push events | ||
AUTHOR_NAME=$(git log -1 --pretty=format:'%an') | ||
echo "trigger_info=Push by ${AUTHOR_NAME}" >> $GITHUB_OUTPUT | ||
elif [ "${{ github.event_name }}" = "workflow_dispatch" ]; then | ||
# Get the user who triggered the manual dispatch | ||
echo "trigger_info=Manual dispatch by ${{ github.actor }}" >> $GITHUB_OUTPUT | ||
else | ||
# Any other event, e.g. merge_group (this workflow defines no schedule trigger) | |
echo "trigger_info=Triggered by ${{ github.event_name }}" >> $GITHUB_OUTPUT | |
fi | ||
|
||
- name: Setup Python environment | ||
uses: actions/setup-python@v5 | ||
with: | ||
python-version: '3.10' | ||
|
||
- name: Install uv | ||
uses: astral-sh/setup-uv@v3 | ||
|
||
- name: Setup Beaker | ||
uses: allenai/setup-beaker@v2 | ||
with: | ||
token: ${{ secrets.BEAKER_TOKEN }} | ||
workspace: ai2/tulu-thinker | ||
|
||
- name: Install dependencies | ||
run: | | ||
# Install development dependencies needed for mason.py | ||
uv sync --frozen | ||
|
||
- name: Build image and launch experiment | ||
id: launch | ||
env: | ||
BEAKER_TOKEN: ${{ secrets.BEAKER_TOKEN }} | ||
GITHUB_RUN_ID: ${{ github.run_id }} | ||
GITHUB_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }} | ||
run: | | ||
set -euo pipefail | ||
|
||
# Make scripts executable | ||
chmod +x scripts/train/build_image_and_launch.sh scripts/train/debug/single_gpu_on_beaker.sh | ||
|
||
echo "Building Docker image and launching experiment..." | ||
echo "Git commit: $(git rev-parse --short HEAD)" | ||
|
||
# Build image and launch experiment | ||
# Use tee to both stream output and capture it for parsing | ||
./scripts/train/build_image_and_launch.sh scripts/train/debug/single_gpu_integration_test.sh 2>&1 | tee /tmp/beaker_output.log || { | ||
EXIT_CODE=$? | ||
echo "ERROR: build_image_and_launch.sh failed with exit code $EXIT_CODE" | ||
exit $EXIT_CODE | ||
} | ||
|
||
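# Extract experiment ID from the saved output | |
# `|| true` stops `set -e`/`pipefail` from aborting on no match, so the check below can report it | |
EXPERIMENT_ID=$(grep -oP 'https://beaker\.org/ex/\K[a-zA-Z0-9]+' /tmp/beaker_output.log | tail -1 || true) | |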
if [ -z "$EXPERIMENT_ID" ]; then | ||
echo "ERROR: Failed to extract experiment ID from output" | ||
echo "DEBUG: Full output log:" | ||
cat /tmp/beaker_output.log | ||
echo "---" | ||
echo "Please check that the experiment was created successfully." | ||
exit 1 | ||
fi | ||
|
||
echo "experiment_id=$EXPERIMENT_ID" >> $GITHUB_OUTPUT | ||
echo "Experiment ID: $EXPERIMENT_ID" | ||
echo "Experiment URL: https://beaker.org/ex/$EXPERIMENT_ID" | ||
|
||
- name: Wait for Beaker experiment completion | ||
env: | ||
BEAKER_TOKEN: ${{ secrets.BEAKER_TOKEN }} | ||
run: | | ||
EXPERIMENT_ID="${{ steps.launch.outputs.experiment_id }}" | ||
echo "Waiting for experiment $EXPERIMENT_ID to complete..." | ||
|
||
# Maximum wait time: 20 minutes (1200 seconds) | ||
MAX_WAIT_TIME=1200 | ||
CHECK_INTERVAL=30 | ||
ELAPSED_TIME=0 | ||
|
||
while [ $ELAPSED_TIME -lt $MAX_WAIT_TIME ]; do | ||
# Get job status directly | ||
JOB_STATUS=$(beaker experiment get $EXPERIMENT_ID --format json | jq -r '.[0].jobs[0].status' 2>/dev/null || echo "null") | ||
|
||
# Check if exitCode exists (experiment is done) | ||
if [ "$JOB_STATUS" = "null" ]; then | ||
EXIT_CODE="pending" | ||
else | ||
EXIT_CODE=$(echo "$JOB_STATUS" | jq -r '.exitCode // "pending"') | ||
fi | ||
|
||
if [ "$EXIT_CODE" = "pending" ]; then | ||
echo "=== Experiment still running (elapsed: ${ELAPSED_TIME}s) ===" | ||
else | ||
echo "=== Experiment finished with exit code: $EXIT_CODE (elapsed: ${ELAPSED_TIME}s) ===" | ||
fi | ||
|
||
# Stream new logs since last check | ||
echo "--- Recent logs ---" | ||
beaker experiment logs $EXPERIMENT_ID 2>/dev/null | tail -n 50 || echo "No logs available yet" | ||
echo "--- End of logs ---" | ||
|
||
# Check if experiment has completed | ||
if [ "$EXIT_CODE" != "pending" ]; then | ||
if [ "$EXIT_CODE" = "0" ]; then | ||
echo "✅ Experiment completed successfully!" | ||
# Show final logs | ||
echo "=== Final logs ===" | ||
beaker experiment logs $EXPERIMENT_ID | tail -n 100 | ||
exit 0 | ||
else | ||
echo "❌ Experiment failed with exit code $EXIT_CODE" | ||
# Show error logs | ||
echo "=== Error logs ===" | ||
beaker experiment logs $EXPERIMENT_ID | tail -n 200 | ||
exit 1 | ||
fi | ||
fi | ||
|
||
# Wait before next check | ||
sleep $CHECK_INTERVAL | ||
ELAPSED_TIME=$((ELAPSED_TIME + CHECK_INTERVAL)) | ||
done | ||
|
||
echo "⏱️ Timeout: Experiment did not complete within 20 minutes" | ||
exit 1 | ||
|
||
- name: Summary | ||
if: always() | ||
run: | | ||
echo "## Beaker Experiment Summary" >> $GITHUB_STEP_SUMMARY | ||
echo "" >> $GITHUB_STEP_SUMMARY | ||
echo "**Trigger:** ${{ steps.get-trigger-info.outputs.trigger_info }}" >> $GITHUB_STEP_SUMMARY | ||
echo "**Docker Image:** Built locally by build_image_and_launch.sh" >> $GITHUB_STEP_SUMMARY | ||
if [ -n "${{ steps.launch.outputs.experiment_id }}" ]; then | ||
echo "**Beaker Experiment:** [View on Beaker](https://beaker.org/ex/${{ steps.launch.outputs.experiment_id }})" >> $GITHUB_STEP_SUMMARY | ||
fi | ||
echo "" >> $GITHUB_STEP_SUMMARY | ||
if [ "${{ job.status }}" = "success" ]; then | ||
echo "✅ **Status:** Experiment completed successfully!" >> $GITHUB_STEP_SUMMARY | ||
else | ||
echo "❌ **Status:** Experiment failed or timed out" >> $GITHUB_STEP_SUMMARY | ||
fi |
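
The launch and wait steps above follow a common pattern: pull the experiment ID out of captured output, then poll at a fixed interval until an exit code appears or a deadline passes. A minimal Python sketch of that pattern follows, assuming the same URL shape and the 1200 s / 30 s values from the workflow. The names `extract_experiment_id`, `wait_for_exit_code`, and `get_exit_code` are hypothetical, and the status lookup is passed in as a callable rather than calling the Beaker CLI.

```python
import re
import time
from typing import Callable, Optional

# Same URL shape the workflow greps for; the captured group is the experiment ID.
EXPERIMENT_URL_RE = re.compile(r"https://beaker\.org/ex/([a-zA-Z0-9]+)")


def extract_experiment_id(log_text: str) -> Optional[str]:
    """Return the last experiment ID found in the log text, or None."""
    matches = EXPERIMENT_URL_RE.findall(log_text)
    return matches[-1] if matches else None


def wait_for_exit_code(
    get_exit_code: Callable[[], Optional[int]],
    max_wait: float = 1200,
    interval: float = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll until an exit code is available; return None on timeout."""
    elapsed = 0.0
    while elapsed < max_wait:
        exit_code = get_exit_code()  # None means "still pending"
        if exit_code is not None:
            return exit_code
        sleep(interval)
        elapsed += interval
    return None


# Fake status source: pending twice, then exit code 0. No real sleeping.
responses = iter([None, None, 0])
print(wait_for_exit_code(lambda: next(responses), sleep=lambda _: None))  # 0
```

Injecting `sleep` and `get_exit_code` keeps the loop testable without a live experiment. A caller would map a return value of `0` to success, any other integer to failure, and `None` to the timeout branch.
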
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,6 +5,7 @@ on: | |
- master | ||
- main | ||
- benchmark | ||
merge_group: | ||
permissions: | ||
contents: write | ||
jobs: | ||
|
There appears to be a typo in this `echo` statement. The variable `${{ inputs.beaker }}` is not being expanded, so the log message will incorrectly print "Creating image .../inputs.beaker." instead of the actual image name.
`echo "Creating image $beaker_user/${{ inputs.beaker }}."`