This sample shows how to run training jobs on Cloud Machine Learning Engine with Cloud TPUs using TensorFlow's tf.metrics.
This sample is adapted from the official samples for training ResNet-50 with Cloud TPUs to run on Cloud Machine Learning Engine.
- Install the Google Cloud Platform SDK. The SDK includes gcloud, the command-line tool for submitting training jobs to Cloud Machine Learning Engine.
- Enable Cloud Storage.
- Follow the steps here to authorize Cloud TPU to access your project.
- Clone the repository.

      git clone https://github.com/GoogleCloudPlatform/cloudml-samples.git
- If you do not already have a Cloud Storage bucket, create one to be used for the training job.

      gsutil mb gs://[YOUR_GCS_BUCKET]
      export GCS_BUCKET="gs://[YOUR_GCS_BUCKET]"
- Run the sample. The included script will train ResNet-50 for 1024 steps using a fake dataset.

      cd cloudml-samples/tpu/training/resnet
      bash submit_resnet.sh
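
Before running the sample, it can help to confirm that the bucket variable from the earlier step holds a real bucket URL rather than the unfilled `[YOUR_GCS_BUCKET]` placeholder. The following is a minimal sketch, assuming that submit_resnet.sh reads the exported `GCS_BUCKET` variable. The helper `check_bucket_var` is hypothetical and not part of the sample repository. It only inspects the string and does not contact Cloud Storage.

```shell
# Hypothetical preflight helper (not part of cloudml-samples).
# Returns 0 only if the argument looks like gs://<bucket-name>
# and no [PLACEHOLDER] brackets remain in it.
check_bucket_var() {
  bucket="$1"
  case "$bucket" in
    *"["*|*"]"*)
      echo "placeholder not replaced: $bucket" >&2
      return 1 ;;
    gs://?*)
      return 0 ;;
    *)
      echo "expected gs://<bucket-name>, got: '$bucket'" >&2
      return 1 ;;
  esac
}

# Report on the current environment without aborting the shell.
if check_bucket_var "${GCS_BUCKET:-}"; then
  echo "GCS_BUCKET looks usable: ${GCS_BUCKET}"
else
  echo "Set GCS_BUCKET before running submit_resnet.sh" >&2
fi
```

The check is a string test only, so a well-formed name for a bucket that does not exist will still pass. Creating the bucket with `gsutil mb` as described above remains a separate step.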