Kubeflow demo - Simple pipeline

Hyperparameter tuning and autoprovisioning GPU nodes

This folder contains a demonstration of Kubeflow capabilities, suitable for presentation to public audiences.

This demo highlights the use of pipelines and hyperparameter tuning on a GKE cluster with node autoprovisioning (NAP). A simple pipeline requests GPU resources, which triggers node pool creation. This demo includes the following steps:

  1. Set up your environment
  2. Run a simple pipeline
  3. Perform hyperparameter tuning
  4. Run a better pipeline

1. Set up your environment

Follow the instructions in demo_setup/README.md to set up your environment and install Kubeflow with pipelines on an autoprovisioning GKE cluster.

View the installed components in the GCP Console.

  • In the Kubernetes Engine section, you will see a new cluster ${CLUSTER} with 3 n1-standard-1 nodes.
  • Under Workloads, you will see all the default Kubeflow and pipeline components.

Source the environment file and activate the conda environment for pipelines:

source kubeflow-demo-simple-pipeline.env
source activate kfp

2. Run a simple pipeline

Show the file gpu-example-pipeline.py as an example of a simple pipeline.

Compile it to create the archive gpu-example-pipeline.py.tar.gz:

./gpu-example-pipeline.py
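Before uploading, it can help to confirm that the compile step actually produced an archive. The following is a minimal sketch, assuming the output file is named gpu-example-pipeline.py.tar.gz as in the upload step of this demo; list_pipeline_archive is a hypothetical helper name, not part of Kubeflow.

```shell
# Hypothetical helper (not part of Kubeflow): list the files inside a
# compiled pipeline archive. Defaults to the archive name used in this demo.
# Returns non-zero if the archive does not exist.
list_pipeline_archive() {
  archive="${1:-gpu-example-pipeline.py.tar.gz}"
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  # -t lists contents, -z reads gzip, -f names the archive file
  tar -tzf "$archive"
}
```

Running list_pipeline_archive with no arguments in this folder should print the files packed into the archive, or an error if compilation did not produce one.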

View the pipelines UI locally by forwarding a port to the ml-pipeline-ui pod:

PIPELINES_POD=$(kubectl get po -l app=ml-pipeline-ui | \
  grep ml-pipeline-ui | \
  head -n 1 | \
  cut -d " " -f 1 )
kubectl port-forward ${PIPELINES_POD} 8080:3000
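As an alternative to parsing the table output with grep and cut, the pod name can be read directly with kubectl's jsonpath output. This is a sketch only: first_pod_name is a hypothetical helper, and it assumes at least one pod matches the selector in the current namespace.

```shell
# Hypothetical helper: print the name of the first pod matching a label
# selector. Assumes at least one pod matches; kubectl reports an error
# if the result list is empty.
first_pod_name() {
  selector="$1"
  kubectl get pods -l "$selector" -o jsonpath='{.items[0].metadata.name}'
}
```

With this helper defined, the port forward above could be written as kubectl port-forward "$(first_pod_name app=ml-pipeline-ui)" 8080:3000. The same helper would work for other label selectors used in this demo.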

In the browser, navigate to localhost:8080 and create a new pipeline by uploading gpu-example-pipeline.py.tar.gz. Select the pipeline and click Create experiment. Use all suggested defaults.

Watch autoprovisioning take effect as the number of nodes in the cluster increases.
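One way to follow the node count from a terminal is sketched below. The helper names count_nodes and watch_node_count are hypothetical, and the default interval and number of samples are arbitrary choices for a live demo rather than recommended values.

```shell
# Hypothetical helper: print the current number of nodes in the cluster.
count_nodes() {
  kubectl get nodes --no-headers | wc -l | tr -d ' '
}

# Hypothetical helper: print a timestamped node count every INTERVAL
# seconds, SAMPLES times in total (defaults: 15 seconds, 20 samples).
watch_node_count() {
  interval="${1:-15}"
  samples="${2:-20}"
  i=0
  while [ "$i" -lt "$samples" ]; do
    echo "$(date +%T) nodes: $(count_nodes)"
    i=$((i + 1))
    if [ "$i" -lt "$samples" ]; then
      sleep "$interval"
    fi
  done
}
```

For a continuously updating view without a helper, kubectl get nodes --watch streams changes as nodes join the cluster.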

Select Experiments from the left-hand side, then Runs. Click on the experiment run to view the graph and watch it execute.

View the container logs for the training step and take note of the low accuracy (~0.113).

3. Perform hyperparameter tuning

To find parameters that result in higher accuracy, use Katib to execute a Study. A Study defines a search space and runs training repeatedly with different parameter values drawn from it.

Create a Study by applying an example file to the cluster:

kubectl apply -f gpu-example-katib.yaml

This creates a StudyJob object. To view it:

kubectl get studyjob
kubectl describe studyjobs gpu-example
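To check progress without reading the full describe output, the status can be queried directly. This is a sketch under an assumption: it supposes the StudyJob resource exposes its state in a status.condition field, which may differ between Katib versions. If the command prints nothing, inspect the object with -o yaml to find the right field. studyjob_condition is a hypothetical helper name.

```shell
# Hypothetical helper: print the condition of a StudyJob.
# ASSUMPTION: the StudyJob status has a "condition" field; verify against
# the output of "kubectl get studyjob <name> -o yaml" on your cluster.
studyjob_condition() {
  name="${1:-gpu-example}"
  kubectl get studyjob "$name" -o jsonpath='{.status.condition}'
}
```

Calling studyjob_condition with no arguments queries the gpu-example StudyJob created above.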

To view the Katib UI, connect to the modeldb-frontend pod:

KATIB_POD=$(kubectl get po -l app=modeldb,component=frontend | \
  grep modeldb-frontend | \
  head -n 1 | \
  cut -d " " -f 1 )
kubectl port-forward ${KATIB_POD} 8081:3000

In the browser, navigate to localhost:8081/katib and click on the gpu-example project. In the Explore Visualizations section, select Optimizer in the Group By dropdown, then click Compare.

While you're waiting, watch for autoprovisioning to occur. View the pods in Pending status.
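Pending pods can be listed from the command line with a field selector. The sketch below assumes the study's pods run in the current namespace; pending_pods is a hypothetical helper name.

```shell
# Hypothetical helper: list the names of pods that are still Pending,
# i.e. not yet scheduled onto a node or still starting.
pending_pods() {
  kubectl get pods --field-selector=status.phase=Pending -o name
}
```

Pods that request GPUs are expected to stay Pending until the new node pool has nodes ready to schedule them.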

View the creation of a new GPU node pool:

gcloud container node-pools list --cluster ${CLUSTER}

View the creation of new nodes:

kubectl get nodes

In the Katib UI, interact with the various graphs to determine which combination of parameters results in the highest accuracy. Grouping by optimizer type is one way to find consistently higher accuracies. Gather a set of parameters to use in a new run of the pipeline.

4. Run a better pipeline

In the pipelines UI, clone the previous experiment run and update the arguments to match the parameters for one of the runs with higher accuracies from the Katib UI. Execute the pipeline and watch for the resulting accuracy, which should be closer to 0.98.

Approximately 5 minutes after the last run completes, check the cluster nodes to verify that GPU nodes have disappeared.
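The scale-down check can be scripted as a polling loop. This is a sketch, not part of the demo: it assumes GPU nodes carry the cloud.google.com/gke-accelerator label that GKE applies to nodes with accelerators, and the helper names and default timings are hypothetical.

```shell
# Hypothetical helper: count nodes that carry the GKE accelerator label.
gpu_node_count() {
  kubectl get nodes -l cloud.google.com/gke-accelerator --no-headers 2>/dev/null \
    | wc -l | tr -d ' '
}

# Hypothetical helper: poll until no GPU nodes remain.
# Checks up to ATTEMPTS times, sleeping INTERVAL seconds between checks
# (defaults: 20 attempts, 30 seconds). Returns 0 once the count reaches
# zero and 1 if GPU nodes are still present after the last check.
wait_for_gpu_scale_down() {
  interval="${1:-30}"
  attempts="${2:-20}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(gpu_node_count)" -eq 0 ]; then
      echo "no GPU nodes remain"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "GPU nodes still present after $attempts checks" >&2
  return 1
}
```

Running wait_for_gpu_scale_down after the final run completes polls for about ten minutes with the defaults, which leaves some margin around the roughly five-minute scale-down delay noted above.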