update config
epwalsh committed Mar 27, 2024
1 parent 0f96e2f commit ea0532b
Showing 1 changed file with 12 additions and 3 deletions.
15 changes: 12 additions & 3 deletions configs/mcli/mitchish70.yaml
@@ -1,8 +1,12 @@
 name: olmo-70b
 image: mosaicml/pytorch:2.2.1_cu121-python3.11-ubuntu20.04
+scheduling:
+  priority: auto
+  # preemptible: true # means it can be retried
+  # max_retries: 3
 compute:
-  cluster: r9z3
-  gpus: 256
+  cluster: r15z1p1
+  gpus: 128
   gpu_type: h100_80gb
 integrations:
 - integration_type: git_repo
@@ -48,10 +52,15 @@ command: |-
scripts/train.py configs/mitchish70-s3.yaml \
--run_name=mitchish70-002 \
--wandb.group=mitchish70-official \
'--load_path=${path.last_checkpoint:${remote_save_folder}}' \
--load_path=s3://ai2-llm/checkpoints/OLMo-large/mitchish70-002/step34700-unsharded \
--global_train_batch_size=1536 \
--device_train_microbatch_size=3 \
--time_limit=604800 \
--save_overwrite
# '--load_path=${path.last_checkpoint:${remote_save_folder}}' \
# --load_path=s3://ai2-llm/checkpoints/OLMo-large/mitchish70-002/step32310 \
# --load_path_sharded_checkpointer=torch_new \
# --load_path=s3://ai2-llm/checkpoints/OLMo-large/mitchish70-002/step32050-unsharded \
# --load_path=s3://ai2-llm/checkpoints/OLMo-large/mitchish70-002/step32300-unsharded \
# --load_path=s3://ai2-llm/checkpoints/OLMo-large/mitchish70-002/step32300 \
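
Note on the batch settings: this change halves `gpus` from 256 to 128 while `--global_train_batch_size=1536` and `--device_train_microbatch_size=3` stay the same. The sketch below works out what that would imply per device, assuming each GPU is one data-parallel rank and the global batch is split evenly across ranks. That is an assumption about the trainer, not something the commit states, and the helper name `accumulation_steps` is hypothetical.

```python
def accumulation_steps(global_batch: int, gpus: int, microbatch: int) -> tuple[int, int]:
    """Return (per-device batch size, micro-steps per optimizer step).

    Assumes pure data parallelism: one rank per GPU and an even split
    of the global batch across ranks (an assumption, not confirmed here).
    """
    if global_batch % gpus != 0:
        raise ValueError("global batch must divide evenly across GPUs")
    per_device = global_batch // gpus
    if per_device % microbatch != 0:
        raise ValueError("per-device batch must be a multiple of the microbatch")
    return per_device, per_device // microbatch


# Values taken from the diff: same batch flags, GPU count 256 -> 128.
old = accumulation_steps(1536, 256, 3)
new = accumulation_steps(1536, 128, 3)
print(old, new)  # (6, 2) (12, 4)
```

Under those assumptions, each device would process twice as many sequences per optimizer step after the change, in four microbatches instead of two, while the global batch size is unchanged.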
