@@ -89,7 +89,7 @@ your client:
 
 ```bash
 cd $RECIPE_ROOT
-export WORKLOAD_NAME=$USER-a4-llama3-1-70b-seq8192-gbs2048-mbs1-gpus16-2node
+export WORKLOAD_NAME=$USER-a4-llama3-1-70b
 helm install $WORKLOAD_NAME . -f values.yaml \
 --set-file workload_launcher=launcher.sh \
 --set-file workload_config=llama3-1-70b-seq8192-gbs2048-mbs1-gpus16.py \
@@ -107,7 +107,7 @@ your client:
 
 ```bash
 cd $RECIPE_ROOT
-export WORKLOAD_NAME=$USER-a4-llama3-1-70b-seq8192-gbs2048-mbs1-gpus16-2node
+export WORKLOAD_NAME=$USER-a4-llama3-1-70b
 helm install $WORKLOAD_NAME . -f values.yaml \
 --set-file workload_launcher=launcher.sh \
 --set-file workload_config=llama3-1-70b-seq8192-gbs2048-mbs1-gpus16.py \
@@ -124,12 +124,12 @@ your client:
 To check the status of pods in your job, run the following command:
 
 ```
-kubectl get pods | grep $USER-a4-llama3-1-70b-seq8192-gbs2048-mbs1-gpus16-2node
+kubectl get pods | grep $USER-a4-llama3-1-70b
 ```
 
 Replace the following:
 
-- JOB_NAME_PREFIX - your job name prefix. For example $USER-a4-llama3-1-70b-seq8192-gbs2048-mbs1-gpus16-2node.
+- JOB_NAME_PREFIX - your job name prefix. For example $USER-a4-llama3-1-70b.
 
 To get the logs for one of the pods, run the following command:
 
@@ -141,13 +141,13 @@ Information about the training job's progress, including crucial details such as
 loss, step count, and step time, is generated by the rank 0 process.
 This process runs on the pod whose name begins with
 `JOB_NAME_PREFIX-workload-0-0`.
-For example: `$USER-a4-llama3-1-70b-seq8192-gbs2048-mbs1-gpus16-2node-workload-0-0-s9zrv`.
+For example: `$USER-a4-llama3-1-70b-workload-0-0-s9zrv`.
 
 ### Uninstall the Helm release
 
 You can delete the job and other resources created by the Helm chart. To
 uninstall Helm, run the following command from your client:
 
 ```bash
-helm uninstall $USER-a4-llama3-1-70b-seq8192-gbs2048-mbs1-gpus16-2node
-```
+helm uninstall $USER-a4-llama3-1-70b
+```