This pull request enables requesting Nvidia GPUs when the `operator-engine` is deployed on a Kubernetes cluster, especially on a Google Kubernetes Engine (GKE) cluster. This way, on GKE, it is not necessary to have a dedicated machine with a GPU; one can be requested just for the algorithm job.
Changes proposed in this PR:
- If the `gpuType` passed to the operator-engine contains "nvidia", configure `resource/limits` to include `nvidia.com/gpu` with the value of the ENV variable `nGPU`. For example, if `nGPU = 1`, generate `resource/limits/nvidia.com/gpu = 1`.
- If `gpuType` includes the string `cloud.google.com/gke-accelerator`, consider that it also contains the requested GKE GPU type, separated by `:` from the previous string. Use that combination as a `nodeSelector` so that an instance providing that GPU is started on GKE. For example, if `gpuType = cloud.google.com/gke-accelerator:nvidia-tesla-t4`, generate `nodeSelector/cloud.google.com/gke-accelerator = nvidia-tesla-t4`.
- If `gpuType` contains the string `nvidia.com/gpu.product`, consider that it also contains the requested Nvidia GPU product, separated by `:` from the previous string. Use that combination as a `nodeSelector` to route the execution to the node providing that particular Nvidia GPU. For example, if `gpuType = nvidia.com/gpu.product:Quadro-P1000`, generate `nodeSelector/nvidia.com/gpu.product = Quadro-P1000`.
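The three rules above can be sketched in Python roughly as follows. This is a minimal illustration, not the code in this PR: the function name `build_gpu_spec`, its parameters, and the shape of the returned dictionary are assumptions made for the example; only the label keys and the `nvidia.com/gpu` resource name come from the description.

```python
from typing import Any, Dict

# Label keys and resource name taken from the PR description.
GKE_ACCELERATOR_KEY = "cloud.google.com/gke-accelerator"
NVIDIA_PRODUCT_KEY = "nvidia.com/gpu.product"
NVIDIA_RESOURCE = "nvidia.com/gpu"


def build_gpu_spec(gpu_type: str, n_gpu: int) -> Dict[str, Any]:
    """Hypothetical helper: map gpuType / nGPU to limits and a nodeSelector."""
    spec: Dict[str, Any] = {"limits": {}, "nodeSelector": {}}
    if not gpu_type:
        return spec

    # Rule 1: any gpuType containing "nvidia" requests nGPU Nvidia GPUs.
    if "nvidia" in gpu_type:
        spec["limits"][NVIDIA_RESOURCE] = n_gpu

    # Rules 2 and 3: "<label key>:<value>" becomes a nodeSelector entry.
    for key in (GKE_ACCELERATOR_KEY, NVIDIA_PRODUCT_KEY):
        prefix = key + ":"
        if prefix in gpu_type:
            # Everything after the first "<key>:" is the selector value.
            spec["nodeSelector"][key] = gpu_type.split(prefix, 1)[1]

    return spec


if __name__ == "__main__":
    print(build_gpu_spec("cloud.google.com/gke-accelerator:nvidia-tesla-t4", 1))
    print(build_gpu_spec("nvidia.com/gpu.product:Quadro-P1000", 1))
```

In this sketch both example values of `gpuType` also contain "nvidia", so they produce a GPU limit as well as a node selector.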
Additional Information for GKE
The previous additions make it possible to add much cheaper support for GPUs to your Ocean Provider running on GKE. An existing GKE cluster can be combined with a node pool that provides GPU support with cheaper preemptible instances, which run just when needed by an algorithm job.
For instance, to create a node pool with just one preemptible `n1-standard-4` instance providing an Nvidia T4 (but in potentially 3 locations to increase the chances), the following `gcloud` command can be used.

To fit into the `n1-standard-4`, also configure `operator-engine` to request `nCPU = 3` and `ramGB = 10`. If more resources are needed, switch to a more powerful N1 machine type.