
Commit 974ff4f

[Docs] Fix some issues with Managed Jobs example. (#4361)

* [Docs] Fix some issues with Managed Jobs example.
* fix
* Fix env

1 parent a404e3f

1 file changed: +44 −49 lines changed


docs/source/examples/managed-jobs.rst

@@ -78,49 +78,47 @@ We can launch it with the following:
 
 .. code-block:: console
 
+  $ git clone https://github.com/huggingface/transformers.git ~/transformers -b v4.30.1
   $ sky jobs launch -n bert-qa bert_qa.yaml
 
-
 .. code-block:: yaml
 
   # bert_qa.yaml
   name: bert-qa
 
   resources:
     accelerators: V100:1
-    # Use spot instances to save cost.
-    use_spot: true
-
-  # Assume your working directory is under `~/transformers`.
-  # To make this example work, please run the following command:
-  # git clone https://github.com/huggingface/transformers.git ~/transformers -b v4.30.1
-  workdir: ~/transformers
+    use_spot: true  # Use spot instances to save cost.
 
-  setup: |
+  envs:
     # Fill in your wandb key: copy from https://wandb.ai/authorize
     # Alternatively, you can use `--env WANDB_API_KEY=$WANDB_API_KEY`
     # to pass the key in the command line, during `sky jobs launch`.
-    echo export WANDB_API_KEY=[YOUR-WANDB-API-KEY] >> ~/.bashrc
+    WANDB_API_KEY:
+
+  # Assume your working directory is under `~/transformers`.
+  workdir: ~/transformers
 
+  setup: |
     pip install -e .
     cd examples/pytorch/question-answering/
     pip install -r requirements.txt torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
     pip install wandb
 
   run: |
-    cd ./examples/pytorch/question-answering/
+    cd examples/pytorch/question-answering/
     python run_qa.py \
-    --model_name_or_path bert-base-uncased \
-    --dataset_name squad \
-    --do_train \
-    --do_eval \
-    --per_device_train_batch_size 12 \
-    --learning_rate 3e-5 \
-    --num_train_epochs 50 \
-    --max_seq_length 384 \
-    --doc_stride 128 \
-    --report_to wandb
-
+      --model_name_or_path bert-base-uncased \
+      --dataset_name squad \
+      --do_train \
+      --do_eval \
+      --per_device_train_batch_size 12 \
+      --learning_rate 3e-5 \
+      --num_train_epochs 50 \
+      --max_seq_length 384 \
+      --doc_stride 128 \
+      --report_to wandb \
+      --output_dir /tmp/bert_qa/
 
 .. note::
 

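The hunk above replaces an `echo export WANDB_API_KEY=... >> ~/.bashrc` hack with a declared `envs:` entry. As a quick recap of the pattern (this restates the diff; the launch command is the one suggested in the diff's own comments):

```yaml
# Recap of the env-injection pattern from the diff above.
envs:
  # Leaving the value empty means it is expected to be supplied at launch, e.g.:
  #   sky jobs launch -n bert-qa bert_qa.yaml --env WANDB_API_KEY=$WANDB_API_KEY
  WANDB_API_KEY:
```

Declaring the key under `envs:` keeps the secret out of the checked-in YAML and out of `~/.bashrc` on the remote VM.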
@@ -162,55 +160,52 @@ An End-to-End Example
 Below we show an `example <https://github.com/skypilot-org/skypilot/blob/master/examples/spot/bert_qa.yaml>`_ for fine-tuning a BERT model on a question-answering task with HuggingFace.
 
 .. code-block:: yaml
-  :emphasize-lines: 13-16,42-45
+  :emphasize-lines: 8-11,41-44
 
   # bert_qa.yaml
   name: bert-qa
 
   resources:
     accelerators: V100:1
-    use_spot: true
-
-  # Assume your working directory is under `~/transformers`.
-  # To make this example work, please run the following command:
-  # git clone https://github.com/huggingface/transformers.git ~/transformers -b v4.30.1
-  workdir: ~/transformers
+    use_spot: true  # Use spot instances to save cost.
 
   file_mounts:
     /checkpoint:
       name: # NOTE: Fill in your bucket name
       mode: MOUNT
 
-  setup: |
+  envs:
     # Fill in your wandb key: copy from https://wandb.ai/authorize
     # Alternatively, you can use `--env WANDB_API_KEY=$WANDB_API_KEY`
     # to pass the key in the command line, during `sky jobs launch`.
-    echo export WANDB_API_KEY=[YOUR-WANDB-API-KEY] >> ~/.bashrc
+    WANDB_API_KEY:
+
+  # Assume your working directory is under `~/transformers`.
+  workdir: ~/transformers
 
+  setup: |
     pip install -e .
     cd examples/pytorch/question-answering/
-    pip install -r requirements.txt
+    pip install -r requirements.txt torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
     pip install wandb
 
   run: |
-    cd ./examples/pytorch/question-answering/
+    cd examples/pytorch/question-answering/
     python run_qa.py \
-    --model_name_or_path bert-base-uncased \
-    --dataset_name squad \
-    --do_train \
-    --do_eval \
-    --per_device_train_batch_size 12 \
-    --learning_rate 3e-5 \
-    --num_train_epochs 50 \
-    --max_seq_length 384 \
-    --doc_stride 128 \
-    --report_to wandb \
-    --run_name $SKYPILOT_TASK_ID \
-    --output_dir /checkpoint/bert_qa/ \
-    --save_total_limit 10 \
-    --save_steps 1000
-
-
+      --model_name_or_path bert-base-uncased \
+      --dataset_name squad \
+      --do_train \
+      --do_eval \
+      --per_device_train_batch_size 12 \
+      --learning_rate 3e-5 \
+      --num_train_epochs 50 \
+      --max_seq_length 384 \
+      --doc_stride 128 \
+      --report_to wandb \
+      --output_dir /checkpoint/bert_qa/ \
+      --run_name $SKYPILOT_TASK_ID \
+      --save_total_limit 10 \
+      --save_steps 1000
 
 As HuggingFace has built-in support for periodically checkpointing, we only need to pass the highlighted arguments for setting up
 the output directory and frequency of checkpointing (see more

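In the second hunk, the checkpointing setup spans two places in the YAML. Read together (bucket name left blank, exactly as in the diff):

```yaml
# Recap of the checkpointing-related lines from the diff above.
file_mounts:
  /checkpoint:    # cloud bucket mounted at /checkpoint on the job's VM
    name:         # NOTE: Fill in your bucket name
    mode: MOUNT

# and, in `run:`, the trainer writes into the mounted bucket:
#   --output_dir /checkpoint/bert_qa/   # checkpoints land in the bucket
#   --save_steps 1000                   # checkpoint frequency
#   --save_total_limit 10               # keep at most 10 checkpoints
```

Because the bucket outlives any single spot instance, a recovered job can pick up earlier checkpoints under `/checkpoint/bert_qa/`.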