Deploy a DL Workspace cluster on Azure

This document describes the procedure to deploy a DL Workspace cluster on Azure. With autoscaling enabled, DL Workspace creates (and releases) VMs on demand when you launch DL jobs (e.g., TensorFlow, PyTorch, Caffe2), which saves money in operation.

Please note that the procedure below does not deploy HDFS/Spark on the DLWorkspace cluster on Azure, so Spark jobs cannot be executed on an Azure cluster.

  1. Set up the DLWorkspace dev environment by following its setup document, then log in to your Azure subscription on your dev machine:
az login

  2. Configure your Azure cluster.

  3. Set up authentication.

  4. Initialize the cluster and generate certificates and keys:
./deploy.py -y build

  5. Create the Azure cluster:
./az_tools.py create

  6. Generate the cluster config file:
./az_tools.py genconfig

Please note that if you are not a Microsoft user, you should remove the

  7. Run the Azure deployment script block:
./deploy.py --verbose scriptblocks azure
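After the cluster is configured and authentication is set up, the remaining commands can be chained so that the sequence stops at the first command that fails. The minimal sketch below splits them into two functions so you can edit the generated config file between genconfig and the script block (see the note above for non-Microsoft users). It assumes you source it from the directory that contains deploy.py and az_tools.py and that az login has succeeded. The function names and the DRY_RUN switch are hypothetical helpers for this sketch, not part of DLWorkspace.

```shell
#!/usr/bin/env bash
# Minimal sketch (not part of DLWorkspace): chain the Azure deployment commands
# and stop at the first one that fails.
# Assumptions: sourced from the directory that contains deploy.py and
# az_tools.py, `az login` has succeeded, and the cluster configuration and
# authentication are already set up. DRY_RUN is a hypothetical switch used only
# by this sketch: DRY_RUN=1 prints the commands without running them.

run_step() {
  echo "+ $*"                         # show the command being run
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"                              # run it; its exit status becomes ours
  fi
}

# Build certificates/keys, create the Azure VMs, and generate the cluster config.
prepare_azure_cluster() {
  run_step ./deploy.py -y build || return 1
  run_step ./az_tools.py create || return 1
  run_step ./az_tools.py genconfig
}

# Run the Azure deployment script block (edit the generated config first if needed).
deploy_azure_cluster() {
  run_step ./deploy.py --verbose scriptblocks azure
}
```

For example, source the file, run DRY_RUN=1 prepare_azure_cluster to preview the commands, then run prepare_azure_cluster, review the generated config, and finish with deploy_azure_cluster.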

After the script completes, you may still need to wait a few minutes while the relevant Docker images are pulled to the target machines. You can then access your cluster at:

http://machine1.westus.cloudapp.azure.com/

where machine1 is your Azure infrastructure node (run ./deploy.py display to get its address).
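Instead of guessing how long the image pulls take, you can poll the portal until it answers. This is a minimal sketch, assuming curl is installed on your dev machine and that you pass the address reported by ./deploy.py display (the machine1 URL above is only an example). The function name wait_for_portal and its default timeout and polling interval are hypothetical choices, not DLWorkspace settings.

```shell
#!/usr/bin/env bash
# Minimal sketch: poll a URL until it responds, or give up after a timeout.
# Usage: wait_for_portal URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# The defaults below are arbitrary assumptions, not DLWorkspace values.

wait_for_portal() {
  local url="$1"
  local timeout_s="${2:-900}"     # give up after 15 minutes by default
  local interval_s="${3:-15}"     # seconds to wait between attempts
  local deadline=$(( $(date +%s) + timeout_s ))

  while [ "$(date +%s)" -lt "$deadline" ]; do
    # --fail: non-zero exit on HTTP status >= 400 (redirects still count as up);
    # --max-time: bound each attempt so an unresponsive host cannot hang the loop.
    if curl --silent --fail --output /dev/null --max-time 10 "$url"; then
      echo "portal is up: $url"
      return 0
    fi
    sleep "$interval_s"
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

For example: wait_for_portal http://machine1.westus.cloudapp.azure.com/ 900 15. A non-zero return only means the portal did not answer in time; it does not tell you which service is still starting.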

The script block executes the following commands in sequence. You do NOT need to run them yourself if you have already run the scriptblocks azure step above.

  1. Set up basic tools on the Ubuntu image:
./deploy.py runscriptonall ./scripts/prepare_ubuntu.sh
  2. Deploy etcd/master and workers:
./deploy.py -y deploy
./deploy.py -y updateworker
  3. Label nodes and deploy services:
./deploy.py -y kubernetes labels
  4. Build and deploy jobmanager, restfulapi, and webportal, then mount storage:
./deploy.py docker push restfulapi
./deploy.py docker push webui
./deploy.py webui
./deploy.py mount
./deploy.py kubernetes start jobmanager restfulapi webportal
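If you run the commands above by hand, grouping them into phases lets you re-run only the phase that failed after fixing a problem. The sketch below copies the commands exactly as listed and assumes it is sourced from the directory that contains deploy.py (the ./scripts path is relative to it). The phase function names, run_step, and the DRY_RUN switch are hypothetical helpers for illustration; whether a particular deploy.py command is safe to repeat is not covered here.

```shell
#!/usr/bin/env bash
# Minimal sketch: the manual command sequence above, split into re-runnable phases.
# Assumption: sourced from the directory that contains deploy.py.
# DRY_RUN=1 (a hypothetical switch for this sketch) prints commands instead of running them.

run_step() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Set up basic tools on the Ubuntu image of every node.
phase_prepare_nodes() {
  run_step ./deploy.py runscriptonall ./scripts/prepare_ubuntu.sh
}

# Deploy etcd/master, then the workers.
phase_deploy_kubernetes() {
  run_step ./deploy.py -y deploy || return 1
  run_step ./deploy.py -y updateworker
}

# Label the Kubernetes nodes.
phase_label_nodes() {
  run_step ./deploy.py -y kubernetes labels
}

# Build/push images, mount storage, and start the services.
phase_services() {
  run_step ./deploy.py docker push restfulapi || return 1
  run_step ./deploy.py docker push webui || return 1
  run_step ./deploy.py webui || return 1
  run_step ./deploy.py mount || return 1
  run_step ./deploy.py kubernetes start jobmanager restfulapi webportal
}

# Run every phase in order, stopping at the first failure.
run_all_phases() {
  phase_prepare_nodes && phase_deploy_kubernetes && phase_label_nodes && phase_services
}
```

For example, if mounting storage fails, fix the cause and run phase_services again rather than repeating the whole sequence.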
  8. If you run into a deployment issue, please check the troubleshooting page first.

  9. If you want to deploy a DLWorkspace cluster that can autoscale (i.e., automatically create and release VMs as needed), complete the following additional steps.