Commit

update configs
epwalsh committed Oct 8, 2023
1 parent c110059 commit 7906bd2
Showing 4 changed files with 17 additions and 0 deletions.
2 changes: 2 additions & 0 deletions configs/mcli/v1-mix-medium-mitch-ish.yaml
@@ -12,9 +12,11 @@ integrations:
command: |-
pip install urllib3==1.26.17
pip freeze
mkdir -p /root/.cache/torch/
export OMP_NUM_THREADS=8
export LOG_FILTER_TYPE=local_rank0_only
export OLMO_NO_SSL=1 # we get SSLErrors all the time on this cluster
#export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
cd LLM
7 changes: 7 additions & 0 deletions configs/mcli/v1-mix-medium.yaml
@@ -10,10 +10,17 @@ integrations:
pip_install: -e .
ssh_clone: true
command: |-
pip install urllib3==1.26.17
pip freeze
mkdir -p /root/.cache/torch/
export OMP_NUM_THREADS=8
export LOG_FILTER_TYPE=local_rank0_only
export OLMO_NO_SSL=1 # we get SSLErrors all the time on this cluster
#export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
cd LLM
torchrun \
--master_addr $MASTER_ADDR \
--master_port $MASTER_PORT \
1 change: 1 addition & 0 deletions configs/mcli/v1_5-mix-medium-mitch-ish.yaml
@@ -12,6 +12,7 @@ integrations:
command: |-
pip install urllib3==1.26.17
pip freeze
mkdir -p /root/.cache/torch/
export OMP_NUM_THREADS=8
export LOG_FILTER_TYPE=local_rank0_only
7 changes: 7 additions & 0 deletions configs/mcli/v1_5-mix-medium.yaml
@@ -10,10 +10,17 @@ integrations:
pip_install: -e .
ssh_clone: true
command: |-
pip install urllib3==1.26.17
pip freeze
mkdir -p /root/.cache/torch/
export OMP_NUM_THREADS=8
export LOG_FILTER_TYPE=local_rank0_only
export OLMO_NO_SSL=1 # we get SSLErrors all the time on this cluster
#export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
cd LLM
torchrun \
--master_addr $MASTER_ADDR \
--master_port $MASTER_PORT \
