Draft
223 commits
e2be8d0
Update oe-eval.sh to set a default timeout of 48h. (#789)
finbarrtimbers Jul 16, 2025
ca55010
Updated configs to support changes. (#790)
finbarrtimbers Jul 16, 2025
c56efad
Add benchmark scripts (#786)
finbarrtimbers Jul 16, 2025
004a48d
Add remap verifier (#773)
hamishivi Jul 16, 2025
3b5b354
Ran the linter. (#792)
finbarrtimbers Jul 16, 2025
7f7308c
fix the URL for code api setup (#791)
mnoukhov Jul 16, 2025
829796a
Add nltk setup to uv dockerfile (#785)
hamishivi Jul 16, 2025
e4e5dfb
Switches the actors to use the Ray queue. (#784)
finbarrtimbers Jul 16, 2025
7eb6c4d
Set new default value for num_samples
finbarrtimbers Jul 16, 2025
7bf039f
Updates the benchmark script (#795)
finbarrtimbers Jul 17, 2025
d5e7160
install nginx in uv (#793)
mnoukhov Jul 17, 2025
bb7477d
allow passing local models, bubble up dataset cache errors (#797)
mnoukhov Jul 17, 2025
839a806
binary reward for code (#798)
saurabh111233212 Jul 18, 2025
541058c
Now, we run individual prompts through the queue. (#796)
finbarrtimbers Jul 18, 2025
774edca
Adds flashinfer dep. (#800)
finbarrtimbers Jul 18, 2025
b3e8e70
new beaker names (#803)
hamishivi Jul 21, 2025
266f214
Remove Unused DPO Function (#794)
fabianlim Jul 21, 2025
8048c9a
extra reporting (#799)
garrett361 Jul 21, 2025
4659dca
Revert "Now, we run individual prompts through the queue. (#796)" (#804)
saurabh111233212 Jul 21, 2025
45ae474
Fix misnamed variables. (#808)
finbarrtimbers Jul 21, 2025
9d1620d
Fix broken syntax. (#809)
finbarrtimbers Jul 21, 2025
de8a14f
Add new olmo chat templates, and improve data mixing/tokenization (#765)
jacob-morrison Jul 21, 2025
d944d42
Fixes from last PR (#810)
hamishivi Jul 22, 2025
207268a
Delete run_repro.sh (#813)
finbarrtimbers Jul 22, 2025
cc33540
Fix disk space error on image creation (#814)
hamishivi Jul 22, 2025
8c45fd8
fix: chat template kwarg in save_with_accelerate (#824)
garrett361 Jul 24, 2025
6f30008
Fixes the ray double init error. (#822)
finbarrtimbers Jul 24, 2025
b28ac65
[WIP] Filtering: Remove special tokens, Chinese characters, readme (#…
natolambert Jul 25, 2025
7b92068
Makes pytest ignore oe-eval-internal. (#829)
finbarrtimbers Jul 25, 2025
4477615
repetition filtering for reasoning traces (#801)
natolambert Jul 25, 2025
535ff07
fix avg_loss rename (#828)
garrett361 Jul 25, 2025
ec3e61b
look at changes (#825)
jacob-morrison Jul 28, 2025
80d6566
Fix head so grpo_fast.py can run. (#821)
finbarrtimbers Jul 28, 2025
a388f3e
perf penalty (#805)
saurabh111233212 Jul 28, 2025
e8d4cf0
delete ref to async moe (#834)
saurabh111233212 Jul 28, 2025
bf22d22
fix some format in code_stdio (#835)
saurabh111233212 Jul 29, 2025
9d59a72
Adds an integration test, which launches a beaker experiment that run…
finbarrtimbers Jul 29, 2025
bcd65a1
oops! (#840)
saurabh111233212 Jul 29, 2025
cecdea6
Added comments explaining why we set NCCL_CUMEM_ENABLE. (#838)
finbarrtimbers Jul 29, 2025
d407b09
Adds an optional timeout flag to mason, and sets it on the single GPU…
finbarrtimbers Jul 29, 2025
2497ce0
Adds pytest-xdist for parallel tests, and only installs needed deps f…
finbarrtimbers Jul 29, 2025
f25ae11
Minor fix to scripts. (#842)
finbarrtimbers Jul 29, 2025
b245fb7
add verify script (#845)
saurabh111233212 Jul 29, 2025
44358c2
Silences TORCH_CUDA_ARCH_LIST warning by passing the env var through.…
finbarrtimbers Jul 30, 2025
7a3011c
Added merge_group trigger.
finbarrtimbers Jul 30, 2025
815a73a
dataset statistics fix (#846)
jacob-morrison Jul 30, 2025
63baaf4
Added Claude.md. (#847)
finbarrtimbers Jul 30, 2025
6760a65
skip_oi_evals skips everything including print statements (#817)
mnoukhov Aug 2, 2025
5fd117a
force manual set of local model name for upload to GS (#849)
mnoukhov Aug 4, 2025
c6b061e
use a proper async client for code verification (#851)
saurabh111233212 Aug 6, 2025
8d64677
Update beaker-experiment.yml (#852)
finbarrtimbers Aug 6, 2025
c051698
turn on evals, update aime (#855)
saumyamalik Aug 6, 2025
08f26d6
run bfcl by default, add logic (#857)
saumyamalik Aug 6, 2025
aa5b573
Fix to beaker integration test. (#854)
finbarrtimbers Aug 6, 2025
ff0049e
Refactors the eval setup in `grpo_fast.py` (#841)
finbarrtimbers Aug 7, 2025
68274f6
Uses ThreadPoolExecutor to manage threads. (#856)
finbarrtimbers Aug 7, 2025
514de1e
Removed debugging code that made it to main.
finbarrtimbers Aug 7, 2025
6b71677
Fixes head. (#863)
finbarrtimbers Aug 7, 2025
3fd39f8
Change num_evals so the user sets after how many steps it runs, rathe…
finbarrtimbers Aug 8, 2025
3f1f098
Disables wandb in the Beaker experiment integration test. (#866)
finbarrtimbers Aug 8, 2025
e977af6
Fix readme (#858)
finbarrtimbers Aug 8, 2025
c35ac3d
Adds a longer timeout for `accumulate_inference_batches`. (#869)
finbarrtimbers Aug 8, 2025
0c3bcec
flag for eval'ing on step 0 (#872)
saurabh111233212 Aug 8, 2025
d6b8805
Fix build by syncing all deps, not just dev. (#874)
finbarrtimbers Aug 11, 2025
9431d17
Avoid pickle-imports slowing things down (#871)
hamishivi Aug 11, 2025
72d32f6
Removes logging message. (#878)
finbarrtimbers Aug 11, 2025
391f84b
Switches `data_preparation_thread` to use a sentinel for shutdown ins…
finbarrtimbers Aug 12, 2025
bf91c08
Adds git hash to description. (#867)
finbarrtimbers Aug 12, 2025
d65e143
also do apply eval on 0 to local eval (#873)
mnoukhov Aug 12, 2025
56a937b
Makes runs for the Beaker integration test smaller. (#877)
finbarrtimbers Aug 12, 2025
b981c6f
Tests pass. (#879)
finbarrtimbers Aug 12, 2025
16fb751
Renamed file. (#881)
finbarrtimbers Aug 12, 2025
2927e51
Clean up Docker images + instructions (#876)
finbarrtimbers Aug 12, 2025
df713bc
No more semaphore warnings. (#883)
finbarrtimbers Aug 12, 2025
6d3b832
Added test script. (#860)
finbarrtimbers Aug 13, 2025
c659faa
Fixed build command to specify platform. (#888)
finbarrtimbers Aug 13, 2025
c9ee3cf
Fix up finetune (#884)
hamishivi Aug 13, 2025
d861a49
Switch from TensorBoard to wandb-only logging in grpo_fast.py, improv…
mnoukhov Aug 13, 2025
020a39f
Adds the wandb url to the experiment description. (#889)
finbarrtimbers Aug 13, 2025
81c9e51
Remove extraneous try/except (#891)
finbarrtimbers Aug 13, 2025
0cd6092
pin hf transformers to lower (#890)
hamishivi Aug 13, 2025
94e0476
Removes git from the Docker image. (#885)
finbarrtimbers Aug 14, 2025
1275d36
switch to omega500 eval by default (#893)
saumyamalik Aug 14, 2025
cd11ed9
Fix crash for debug script (#892)
hamishivi Aug 15, 2025
aa5d328
Update pyproject.toml to remove redundant dep (#899)
finbarrtimbers Aug 15, 2025
ed2d8a6
Add refill strategy Implementation and some evaluation results (#816)
tengxiao1 Aug 15, 2025
4ca89ba
Log Train vs Generation time to see GPU idling (#853)
mnoukhov Aug 15, 2025
29c994d
beaker eval image (#896)
mnoukhov Aug 15, 2025
cb0e96e
add back global_step to maintain backwards compatibility (#898)
mnoukhov Aug 15, 2025
0f98271
Adds --frozen flag to all calls to uv in CI. (#900)
finbarrtimbers Aug 16, 2025
564f8a4
Increase backend timeouts (#901)
hamishivi Aug 18, 2025
84f5359
bugfix (#903)
hamishivi Aug 18, 2025
7dc0210
Update grpo_fast.py (#908)
finbarrtimbers Aug 18, 2025
89c1878
Tidies up the way we do leak reporting. (#907)
finbarrtimbers Aug 18, 2025
0394a8e
Switches LLMRayActor to use the LLMEngine instead of LLM. (#837)
finbarrtimbers Aug 18, 2025
6a4bf5b
Removed outdated deps. (#909)
finbarrtimbers Aug 18, 2025
5e7e247
Smaller docker (#886)
finbarrtimbers Aug 18, 2025
8f82091
Update Dockerfile (#912)
finbarrtimbers Aug 19, 2025
dfc2261
Removes wrong `output_dir` argument. (#904)
finbarrtimbers Aug 19, 2025
d1834cf
Fixes open_instruct/grpo_fast.py so that when there's an error in the…
finbarrtimbers Aug 19, 2025
7392a10
Fixes to build. (#887)
finbarrtimbers Aug 19, 2025
e51f371
Strips trailing whitespace. (#913)
finbarrtimbers Aug 19, 2025
1b6d913
resumption fix! i think (#910)
saurabh111233212 Aug 19, 2025
a916940
`actually_fix_build_v2_final.txt` (#915)
finbarrtimbers Aug 19, 2025
22e3e37
default resumable commnand to be resumable (#919)
saurabh111233212 Aug 19, 2025
aae7df3
Prevent duplicate runs in Github Actions. (#916)
finbarrtimbers Aug 19, 2025
4564130
Eval weka cluster logic (#920)
saumyamalik Aug 19, 2025
4f8ad0d
Fixes batching bug. (#918)
finbarrtimbers Aug 20, 2025
1f8bc4a
Logs the git commit and branch to the Beaker description so we can re…
finbarrtimbers Aug 20, 2025
67befc4
Updated workflows to prevent duplicates. (#921)
finbarrtimbers Aug 20, 2025
2d49234
Fix the way we do caching in Docker (#922)
finbarrtimbers Aug 20, 2025
4c475ef
rename code folder and fix extracting user query and judge template (…
fabrahman Aug 21, 2025
2598720
Remove duplicate olmo-instruct templates (#928)
hamishivi Aug 21, 2025
0a39fc2
shorter tags (#930)
hamishivi Aug 21, 2025
8b90699
Makes logging consistent across all of `open-instruct`. (#927)
finbarrtimbers Aug 21, 2025
55137f0
fix query extraction for olmo chat template (#933)
fabrahman Aug 21, 2025
4e43ea4
Adds a progress bar to the Wandb description (#929)
finbarrtimbers Aug 21, 2025
b6a812c
autoset RAY CGRAPH timeout if not explicitly set (fix tp crash) (#934)
hamishivi Aug 22, 2025
252d265
Runs the Beaker experiment integration test on a larger runner (#936)
finbarrtimbers Aug 22, 2025
2938bd6
Fix benchmark so it runs (#935)
finbarrtimbers Aug 22, 2025
8997c30
Explicitly bind ray dashboard host (#938)
hamishivi Aug 22, 2025
efb1965
Change cluster so that we can schedule the gantry benchmark script (#…
finbarrtimbers Aug 22, 2025
ac4e3fa
ANother attempt at fixing the experiment. (#940)
finbarrtimbers Aug 22, 2025
afa8400
Another attempt at fixing the experiment (#942)
finbarrtimbers Aug 25, 2025
c523f66
More benchmark fixes! (#941)
finbarrtimbers Aug 25, 2025
bc1b7be
checkpoint rng states and data iterator stuff, gs logic (#932)
saurabh111233212 Aug 25, 2025
79b32bf
Saurabhs/high fidelity checkpoints (#945)
saurabh111233212 Aug 25, 2025
cc0a49e
Use the LLM judge tokenizer (#946)
fabrahman Aug 26, 2025
64794ef
Commit (#948)
finbarrtimbers Aug 26, 2025
000d77c
Fixes the MFU calculation in our benchmark script. (#943)
finbarrtimbers Aug 26, 2025
6f39d22
Don't block main thread or trainer on weight update to generator (#937)
mnoukhov Aug 26, 2025
a044400
Fix ray bundles indices, max len config. (#949)
hamishivi Aug 26, 2025
2f60852
Update SFT Gantry command (#951)
tyler-romero Aug 26, 2025
2c97cd5
Cleaner progress code. (#953)
finbarrtimbers Aug 27, 2025
bf979e9
fix context truncation tokenization and add tests (#952)
fabrahman Aug 27, 2025
10599c4
console.warn -> console.log (#954)
saurabh111233212 Aug 27, 2025
853ce80
Fix UnboundLocalError where metrics weren't defined (#958)
finbarrtimbers Aug 28, 2025
0ab9d91
Add Claude Code GitHub Workflow (#959)
finbarrtimbers Aug 28, 2025
e762ff3
Fix DPO script (#956)
hamishivi Aug 28, 2025
8c624e2
Added a decorator to profile functions.
finbarrtimbers Aug 29, 2025
504a8d3
Revert "Added a decorator to profile functions."
finbarrtimbers Aug 29, 2025
8afdc2e
Remove claude (#966)
finbarrtimbers Aug 29, 2025
cc8fe54
Better logging of errors in threads (#967)
hamishivi Aug 29, 2025
bea92da
use a smaller model (#968)
hamishivi Aug 29, 2025
c740fad
Less logging. (#969)
finbarrtimbers Aug 29, 2025
f1d7223
Removed explicit cache reset in favour of cache_salt. (#947)
finbarrtimbers Aug 29, 2025
5301917
Adds a MBU calculation to our benchmark script (#957)
finbarrtimbers Aug 29, 2025
f455ef0
Adds a dashboard to `ActorManager` that makes it easier to track what…
finbarrtimbers Aug 30, 2025
cf92137
Caches `should_stop` so we're not hammering Ray. (#973)
finbarrtimbers Aug 30, 2025
49144f4
Fix calculation. (#974)
finbarrtimbers Aug 31, 2025
5b5a027
Update script priorities (#971)
finbarrtimbers Sep 1, 2025
fa7e608
load pretokenized user query (v0) (#965)
fabrahman Sep 2, 2025
01daf56
Fix progress update so that it doesn't duplicate components. (#977)
finbarrtimbers Sep 2, 2025
674c706
Adds a bunch of logs to explain what's happening during filtering. (#…
finbarrtimbers Sep 3, 2025
91e0310
Manually duplicate both tool and non-tool requests. (#978)
finbarrtimbers Sep 3, 2025
8e8f4ac
Upgrade beaker version. (#981)
finbarrtimbers Sep 3, 2025
106005c
fix grouped layer names for kv cache info (#983)
mnoukhov Sep 4, 2025
a529fbf
FP8 KV Cache (#984)
hamishivi Sep 4, 2025
5715660
Fix finetune keys (#987)
hamishivi Sep 4, 2025
628c7df
Fix final save when job crashes early. (#988)
hamishivi Sep 5, 2025
a56bbc5
Upperbound litellm version (#991)
hamishivi Sep 5, 2025
af3f921
Modify grpo_fast.py so that we now pass individual prompts through th…
finbarrtimbers Sep 5, 2025
c76471b
fix env var check (#997)
mnoukhov Sep 8, 2025
4a5c351
fixes the recurring crash issue (#999)
mnoukhov Sep 8, 2025
b39af14
remove flash infer from dependencies, mason, and benchmark script (#1…
saurabh111233212 Sep 9, 2025
e1d948b
update eval names (#1001)
hamishivi Sep 11, 2025
6b2ebdc
Puzzle verifier (#1003)
hamishivi Sep 11, 2025
9df8110
fixes for manual eval jobs with new cluster names (#1002)
fabrahman Sep 12, 2025
34685ce
Disables cascade attention. (#1006)
finbarrtimbers Sep 12, 2025
3dc4a23
whoops wrong name (#1007)
hamishivi Sep 12, 2025
f18e2c5
Added script. (#1009)
finbarrtimbers Sep 15, 2025
f2fa726
Fixed errors and testing with fake asserts. (#1010)
finbarrtimbers Sep 15, 2025
4ca27a1
Small refactor to prepare for the `backfill-prompts` PR (#1011)
finbarrtimbers Sep 15, 2025
4f41042
Check we have enough data for bsz/prefill (#1012)
hamishivi Sep 15, 2025
8a51bcf
Adds a script that launches benchmarks via Mason (#1015)
finbarrtimbers Sep 16, 2025
ab90d5a
As requests finish inside LLMRayActor, pulls new ones from the queue …
finbarrtimbers Sep 16, 2025
f7f3d3f
Adds support for inflight updates in LLMRayActor. (#1013)
finbarrtimbers Sep 17, 2025
cc6fd17
Cleanup eval (#1020)
finbarrtimbers Sep 18, 2025
c7735f5
Gracefully shuts down ray. (#986)
finbarrtimbers Sep 18, 2025
ce5571a
Removed references to old scripts. (#1022)
finbarrtimbers Sep 18, 2025
afe6b9b
Removed outdated files (#1023)
finbarrtimbers Sep 19, 2025
9107188
Removes `nvcc` and the "delete big files" steps from our tests as the…
finbarrtimbers Sep 19, 2025
1b695ca
Remove duplicate cluster names in mason (#1017)
finbarrtimbers Sep 19, 2025
bb98dbc
Removes the `reduce_loss` option, as we don't need to support `reduce…
finbarrtimbers Sep 19, 2025
43bbe30
Update filtering script (#1029)
hamishivi Sep 23, 2025
6d141b6
Fixes bug in token metrics (#1032)
finbarrtimbers Sep 23, 2025
b174166
Calculates tokens per second for actors. (#1034)
finbarrtimbers Sep 24, 2025
965c359
Adds MFU/MBU reporting to the actors (#1030)
finbarrtimbers Sep 24, 2025
b4657b4
Fix filtering judge script (#1037)
hamishivi Sep 24, 2025
d880e86
Default to tokenizer chat template, configurable random seed. (#1035)
hamishivi Sep 24, 2025
d7ca66b
Logs MFU for the learner to wandb (#1038)
finbarrtimbers Sep 25, 2025
3006543
logging oe eval results to wandb when using new oe-eval-interal (#923)
mnoukhov Sep 25, 2025
045cc0a
Adds timing for the individual weight sync events (#1033)
finbarrtimbers Sep 25, 2025
89e350b
fix beaker whoami and update warning message (#1040)
mnoukhov Sep 26, 2025
6f9feba
Skips uploading/renaming the beaker image when it hasn't changed. (#…
finbarrtimbers Oct 1, 2025
5e5c93d
Update cluster names for open-instruct scripts (#1046)
finbarrtimbers Oct 1, 2025
0ce2138
Removed all references to reduce_loss. (#1048)
finbarrtimbers Oct 1, 2025
9489f33
Modifies `mason.py` so that we set default env vars in one central pl…
finbarrtimbers Oct 1, 2025
7ba1bba
Cleaned up code (#1050)
finbarrtimbers Oct 1, 2025
9724932
Make sure we're using vllm's v1 API everywhere. (#1052)
finbarrtimbers Oct 3, 2025
0913deb
Silences flash attention warning by loading models directly on device…
finbarrtimbers Oct 3, 2025
e8db3e7
Silences NCCL warning (#1055)
finbarrtimbers Oct 5, 2025
33d9f4b
Revert "Silences NCCL warning (#1055)" (#1060)
finbarrtimbers Oct 6, 2025
8867f45
Added reasonable timeouts to our debug scripts to ensure they don't h…
finbarrtimbers Oct 6, 2025
c3f79a3
Fixed formatting (#1059)
finbarrtimbers Oct 6, 2025
e320ff0
Removes the generate thread (#1054)
finbarrtimbers Oct 7, 2025
938e08a
Update filter script (#1042)
hamishivi Oct 7, 2025
3a21f22
Removed a bunch of dead code (#1057)
finbarrtimbers Oct 8, 2025
bd26188
Now, `LLMRayActor` returns logprobs, and we calculate some stats abou…
finbarrtimbers Oct 8, 2025
cde0f5b
Use less steps for tool_grpo_fast.sh (#1062)
finbarrtimbers Oct 9, 2025
c30f27e
Removed self.logger (#1061)
finbarrtimbers Oct 9, 2025
d7c569c
Changes LLMRayActor so that we pass the dtype as a string into `updat…
finbarrtimbers Oct 9, 2025
96af030
Silences trust remote code warning (#1065)
finbarrtimbers Oct 9, 2025
e765706
Refactors `LLMRayActor.__init__` by splitting it into smaller methods…
finbarrtimbers Oct 9, 2025
cc20841
Creates triton cache, silencing warning (#1066)
finbarrtimbers Oct 9, 2025
0212cbd
Updated shared memory to silence Docker warning (#1067)
finbarrtimbers Oct 9, 2025
3704b71
Silence warning. (#1068)
finbarrtimbers Oct 9, 2025
070c1e1
Cleaned up logging (#1069)
finbarrtimbers Oct 10, 2025
9f788ac
Update grpo_fast.py (#1072)
finbarrtimbers Oct 10, 2025
3a679b7
Removes invalid `string.replace` (#1071)
finbarrtimbers Oct 10, 2025
940a360
Added type annotations (#1070)
finbarrtimbers Oct 10, 2025
50cd847
Updates beaker-py version to >2. (#1021)
finbarrtimbers Oct 10, 2025
69fb6d9
Constant denom for loss (#1076)
hamishivi Oct 10, 2025
87 changes: 87 additions & 0 deletions .dockerignore
@@ -0,0 +1,87 @@
# Git directories (can be large)
**/.git/
.gitignore
.github/

# Python cache and compiled files
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
*.egg-info/
dist/
build/
*.egg
local_dataset_cache/


# Virtual environments
.venv/
venv/
ENV/
env/

# IDE and editor files
.vscode/
.idea/
*.swp
*.swo
*~
.DS_Store

# Test and documentation
tests/
test/
docs/
*.md
!README.md
.pytest_cache/
.coverage
htmlcov/
.tox/

# Logs and databases
*.log
*.sql
*.sqlite
*.db

# Package manager files
node_modules/
npm-debug.log*
yarn-debug.log*
yarn-error.log*

# Jupyter notebooks checkpoints
.ipynb_checkpoints/

# Model and data files (if stored locally)
*.ckpt
*.pth
*.h5
*.safetensors
data/
models/
checkpoints/

# Temporary files
tmp/
temp/
*.tmp

# Build artifacts
*.o
*.a
*.so
*.dylib

# Cache directories
.cache/
.mypy_cache/
.ruff_cache/

# Docker files (avoid recursion)
Dockerfile*
docker-compose*
.dockerignore
14 changes: 11 additions & 3 deletions .github/actions/push/action.yaml
@@ -20,10 +20,18 @@ runs:
- shell: bash
if: inputs.beaker != '' # previously startsWith(github.ref, 'refs/tags/') && ...
run: |
beaker_user=$(beaker account whoami --format json | jq -r '.[0].name')
# Push release to Beaker.
SHORT_SHA=$(git rev-parse --short HEAD)
beaker image create --name "${{ inputs.beaker }}-${SHORT_SHA}-${{ github.run_id }}" ${{ inputs.image }}
DESCRIPTION="Created from commit: ${SHORT_SHA}"
beaker image create \
--name "${{ inputs.beaker }}-${SHORT_SHA}-${{ github.run_id }}" ${{ inputs.image }} \
--description "$DESCRIPTION"
# We can't delete the old image because it might be used by a running job. Instead, we rename it to an empty
# string, so it will not be resolved by the Beaker client.
beaker image rename nathanl/${{ inputs.beaker }} "" || true
beaker image create --name ${{ inputs.beaker }} ${{ inputs.image }}
echo "Deleting image $beaker_user/${{ inputs.beaker }}."
beaker image rename $beaker_user/${{ inputs.beaker }} "" || true
echo "Creating image $beaker_user/inputs.beaker."

medium

There appears to be a typo in this echo statement. The variable ${{ inputs.beaker }} is not being expanded, so the log message will incorrectly print "Creating image .../inputs.beaker." instead of the actual image name.

        echo "Creating image $beaker_user/${{ inputs.beaker }}."
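
The difference can be reproduced locally with a minimal bash sketch. This is only an illustration: `IMAGE_NAME` and the sample values are assumptions standing in for whatever GitHub Actions substitutes for `${{ inputs.beaker }}` before the shell runs, and no Beaker command is invoked.

```shell
#!/usr/bin/env bash
# Sketch only: IMAGE_NAME stands in for the value that GitHub Actions
# substitutes for ${{ inputs.beaker }} before the shell ever sees the script.
beaker_user="example-user"   # hypothetical value
IMAGE_NAME="open_instruct"   # hypothetical value

# As written in the diff: "inputs.beaker" is plain text, so it is printed literally.
buggy_message="Creating image $beaker_user/inputs.beaker."

# With the expression restored, the substituted image name appears instead.
fixed_message="Creating image $beaker_user/$IMAGE_NAME."

echo "$buggy_message"
echo "$fixed_message"
```

The first line printed ends in the literal text `inputs.beaker`, which is what the current log message would show for every image.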

beaker image create \
--name ${{ inputs.beaker }} ${{ inputs.image }} \
--description "$DESCRIPTION"
191 changes: 191 additions & 0 deletions .github/workflows/beaker-experiment.yml
@@ -0,0 +1,191 @@
name: Beaker Experiment Launch

on:
  merge_group:

  # Adding a comment to trigger a run.
  workflow_dispatch: # This allows us to manually trigger a build through the GitHub UI.
  # pull_request:
  # branches: [main]
  # paths:
  # - 'open_instruct/**'
  # - '!open_instruct/README.md'
  # - 'requirements.txt'
  # - 'Dockerfile'
  # - '.github/workflows/beaker-experiment.yml'

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

env:
  DOCKER_BUILDKIT: "1"

jobs:
  launch-experiment:
    name: Launch Beaker Experiment
    runs-on: 8-Core-XL-Runner-Ubuntu-Latest
    timeout-minutes: 35

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Need full history to get commit author info

      - name: Checkout oe-eval-internal
        uses: actions/checkout@v4
        with:
          repository: allenai/oe-eval-internal
          path: './oe-eval-internal'
          ssh-key: ${{ secrets.OE_EVAL_GIT_CLONE_ACCESS_PRIVATE_SSH_DEPLOY_KEY }}
          fetch-depth: 1
          filter: 'blob:none'

      - name: Get trigger information
        id: get-trigger-info
        run: |
          if [ "${{ github.event_name }}" = "push" ]; then
            # Get the commit author for push events
            AUTHOR_NAME=$(git log -1 --pretty=format:'%an')
            echo "trigger_info=Push by ${AUTHOR_NAME}" >> $GITHUB_OUTPUT
          elif [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            # Get the user who triggered the manual dispatch
            echo "trigger_info=Manual dispatch by ${{ github.actor }}" >> $GITHUB_OUTPUT
          else
            # For scheduled runs
            echo "trigger_info=Scheduled run" >> $GITHUB_OUTPUT
          fi

      - name: Setup Python environment
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install uv
        uses: astral-sh/setup-uv@v3

      - name: Setup Beaker
        uses: allenai/setup-beaker@v2
        with:
          token: ${{ secrets.BEAKER_TOKEN }}
          workspace: ai2/tulu-thinker

      - name: Install dependencies
        run: |
          # Install development dependencies needed for mason.py
          uv sync --frozen

      - name: Build image and launch experiment
        id: launch
        env:
          BEAKER_TOKEN: ${{ secrets.BEAKER_TOKEN }}
          GITHUB_RUN_ID: ${{ github.run_id }}
          GITHUB_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
        run: |
          set -euo pipefail

          # Make scripts executable
          chmod +x scripts/train/build_image_and_launch.sh scripts/train/debug/single_gpu_on_beaker.sh

          echo "Building Docker image and launching experiment..."
          echo "Git commit: $(git rev-parse --short HEAD)"

          # Build image and launch experiment
          # Use tee to both stream output and capture it for parsing
          ./scripts/train/build_image_and_launch.sh scripts/train/debug/single_gpu_integration_test.sh 2>&1 | tee /tmp/beaker_output.log || {
            EXIT_CODE=$?
            echo "ERROR: build_image_and_launch.sh failed with exit code $EXIT_CODE"
            exit $EXIT_CODE
          }

          # Extract experiment ID from the saved output
          EXPERIMENT_ID=$(grep -oP 'https://beaker.org/ex/\K[a-zA-Z0-9]+' /tmp/beaker_output.log | tail -1)
          if [ -z "$EXPERIMENT_ID" ]; then
            echo "ERROR: Failed to extract experiment ID from output"
            echo "DEBUG: Full output log:"
            cat /tmp/beaker_output.log
            echo "---"
            echo "Please check that the experiment was created successfully."
            exit 1
          fi

          echo "experiment_id=$EXPERIMENT_ID" >> $GITHUB_OUTPUT
          echo "Experiment ID: $EXPERIMENT_ID"
          echo "Experiment URL: https://beaker.org/ex/$EXPERIMENT_ID"

      - name: Wait for Beaker experiment completion
        env:
          BEAKER_TOKEN: ${{ secrets.BEAKER_TOKEN }}
        run: |
          EXPERIMENT_ID="${{ steps.launch.outputs.experiment_id }}"
          echo "Waiting for experiment $EXPERIMENT_ID to complete..."

          # Maximum wait time: 20 minutes (1200 seconds)
          MAX_WAIT_TIME=1200
          CHECK_INTERVAL=30
          ELAPSED_TIME=0

          while [ $ELAPSED_TIME -lt $MAX_WAIT_TIME ]; do
            # Get job status directly
            JOB_STATUS=$(beaker experiment get $EXPERIMENT_ID --format json | jq -r '.[0].jobs[0].status' 2>/dev/null || echo "null")

            # Check if exitCode exists (experiment is done)
            if [ "$JOB_STATUS" = "null" ]; then
              EXIT_CODE="pending"
            else
              EXIT_CODE=$(echo "$JOB_STATUS" | jq -r '.exitCode // "pending"')
            fi

            if [ "$EXIT_CODE" = "pending" ]; then
              echo "=== Experiment still running (elapsed: ${ELAPSED_TIME}s) ==="
            else
              echo "=== Experiment finished with exit code: $EXIT_CODE (elapsed: ${ELAPSED_TIME}s) ==="
            fi

            # Stream new logs since last check
            echo "--- Recent logs ---"
            beaker experiment logs $EXPERIMENT_ID 2>/dev/null | tail -n 50 || echo "No logs available yet"
            echo "--- End of logs ---"

            # Check if experiment has completed
            if [ "$EXIT_CODE" != "pending" ]; then
              if [ "$EXIT_CODE" = "0" ]; then
                echo "✅ Experiment completed successfully!"
                # Show final logs
                echo "=== Final logs ==="
                beaker experiment logs $EXPERIMENT_ID | tail -n 100
                exit 0
              else
                echo "❌ Experiment failed with exit code $EXIT_CODE"
                # Show error logs
                echo "=== Error logs ==="
                beaker experiment logs $EXPERIMENT_ID | tail -n 200
                exit 1
              fi
            fi

            # Wait before next check
            sleep $CHECK_INTERVAL
            ELAPSED_TIME=$((ELAPSED_TIME + CHECK_INTERVAL))
          done

          echo "⏱️ Timeout: Experiment did not complete within 20 minutes"
          exit 1

      - name: Summary
        if: always()
        run: |
          echo "## Beaker Experiment Summary" >> $GITHUB_STEP_SUMMARY
          echo "" >> $GITHUB_STEP_SUMMARY
          echo "**Trigger:** ${{ steps.get-trigger-info.outputs.trigger_info }}" >> $GITHUB_STEP_SUMMARY
          echo "**Docker Image:** Built locally by build_image_and_launch.sh" >> $GITHUB_STEP_SUMMARY
          if [ -n "${{ steps.launch.outputs.experiment_id }}" ]; then
            echo "**Beaker Experiment:** [View on Beaker](https://beaker.org/ex/${{ steps.launch.outputs.experiment_id }})" >> $GITHUB_STEP_SUMMARY
          fi
          echo "" >> $GITHUB_STEP_SUMMARY
          if [ "${{ job.status }}" = "success" ]; then
            echo "✅ **Status:** Experiment completed successfully!" >> $GITHUB_STEP_SUMMARY
          else
            echo "❌ **Status:** Experiment failed or timed out" >> $GITHUB_STEP_SUMMARY
          fi
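
Note on the "Wait for Beaker experiment completion" step: it is a poll-with-timeout loop that treats a missing exit code as "pending" and maps the final exit code to the step result. The control flow can be exercised without Beaker using the rough sketch below. Everything in it is an assumption for illustration: `fake_exit_code` is a hypothetical stub replacing the `beaker experiment get ... | jq` call, `wait_for_exit_code` is not a function in this PR, and the interval is shortened to one second.

```shell
#!/usr/bin/env bash
# Sketch of the poll-until-exit-code loop, with a stub in place of Beaker.

POLLS=0
STATUS="pending"

# Hypothetical stub: reports "pending" for the first two polls, then exit code 0.
# It sets the global STATUS instead of printing, so the poll counter is not
# lost inside a command-substitution subshell.
fake_exit_code() {
  POLLS=$((POLLS + 1))
  if [ "$POLLS" -lt 3 ]; then
    STATUS="pending"
  else
    STATUS="0"
  fi
}

# Polls every $2 seconds for at most $1 seconds.
# Returns 0 if the job exits with code 0, and 1 on a non-zero exit or a timeout.
wait_for_exit_code() {
  local max_wait=$1 interval=$2 elapsed=0
  while [ "$elapsed" -lt "$max_wait" ]; do
    fake_exit_code
    if [ "$STATUS" != "pending" ]; then
      echo "finished with exit code $STATUS (elapsed: ${elapsed}s)"
      if [ "$STATUS" = "0" ]; then
        return 0
      fi
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timeout after ${max_wait}s"
  return 1
}

wait_for_exit_code 10 1
RESULT=$?
```

Keeping the loop in a function with the status source stubbed out makes the timeout and failure branches easy to check before the workflow spends runner time on a real experiment.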
1 change: 1 addition & 0 deletions .github/workflows/docs.yml
@@ -5,6 +5,7 @@ on:
- master
- main
- benchmark
merge_group:
permissions:
contents: write
jobs:
35 changes: 12 additions & 23 deletions .github/workflows/push-image.yml
@@ -12,17 +12,6 @@ concurrency:
cancel-in-progress: true

on:
push:
# Run this workflow anytime a push updates one of the files in the image's directory
# (other than the README), and anytime there's a new release tag for this image.
paths:
- 'open_instruct/**'
- '!open_instruct/README.md'
- 'requirements.txt'
- 'Dockerfile'
- '.github/workflows/push-image.yml'
# Note, add .olmo dockerfile + requirements if adding auto build to those
branches: [main]
# pull_request: # note, comment this out for running on every push
# # Also run on PRs that update the files in the image's directory (other than README).
# branches: [main]
@@ -32,6 +21,7 @@ on:
# - 'requirements.txt'
# - 'Dockerfile'
# - '.github/workflows/push-image.yml'
merge_group:
workflow_dispatch: # This allows us to manually trigger a build through the GitHub UI.

env:
@@ -42,7 +32,6 @@ jobs:
name: open_instruct
runs-on: ubuntu-latest
timeout-minutes: 60
if: (github.event_name != 'workflow_run') || (github.event.workflow_run.conclusion == 'success')
steps:
- uses: actions/checkout@v3

@@ -60,22 +49,22 @@ jobs:
# ghcr_user: ${{ secrets.GHCR_USER }}

# big images fail, trying this
# reference for big files in runner: https://github.com/actions/runner-images/issues/10386
- name: Delete huge unnecessary tools folder
run: rm -rf /opt/hostedtoolcache /usr/share/dotnet "$AGENT_TOOLSDIRECTORY"

run: rm -rf /opt/hostedtoolcache /usr/share/dotnet "$AGENT_TOOLSDIRECTORY" /usr/local/lib/android/sdk/ndk

- name: Check remaining disk space
run: df -h

- name: Build image
run: |
docker build \
--build-arg BUILDKIT_INLINE_CACHE=1 \
--build-arg CUDA=12.1.0 --build-arg \
TARGET=cudnn8-devel --build-arg DIST=ubuntu20.04 \
--build-arg REQUIRE=requirements.txt . \
-t open_instruct

docker build --platform=linux/amd64 \
--build-arg GIT_COMMIT="$(git rev-parse --short HEAD)" \
--build-arg GIT_BRANCH="${GITHUB_REF#refs/heads/}" \
-t open_instruct .

- name: Check image
run: |
docker run --rm open_instruct
run: docker run --rm open_instruct

- name: Push image
# if: github.event_name != 'pull_request'