[Feature] Add end to end tests to apiserver

Fixes #1388
ray-project · Oct 5, 2023 · 1de910a
1 parent 38e3527
commit 1de910a

Showing 19 changed files with 2,282 additions and 163 deletions.
diff --git a/.github/workflows/test-job.yaml b/.github/workflows/test-job.yaml
@@ -144,7 +144,7 @@ jobs:
         working-directory: ${{env.working-directory}}
 
       - name: Test
-        run: go test ./...
+        run: go test ./pkg/... ./cmd/... -race -parallel 4
         working-directory: ${{env.working-directory}}
 
       - name: Set up Docker

diff --git a/apiserver/DEVELOPMENT.md b/apiserver/DEVELOPMENT.md
@@ -61,6 +61,35 @@ make build
 make test
 ```
 
+#### End to End Testing
+
+There are two `make` targets that execute the end to end tests (integration between the Kuberay API server and the Kuberay Operator):
+
+* `make e2e-test` executes all the tests defined in the [test/e2e package](./test/e2e/). It uses the cluster defined in `~/.kube/config` to submit the workloads.
+* `make local-e2e-test` creates a local kind cluster, deploys the nightly operator image and a freshly built Kuberay API server into the kind cluster, and shuts down the kind cluster upon successful execution of the end to end tests.
+
+The `e2e` test targets use two variables to control what version of Ray images to use in the end to end tests:
+
+* `E2E_API_SERVER_RAY_IMAGE` -- for the ray docker image. Currently set to `rayproject/ray:2.7.0-py310`. On Apple silicon or arm64 development machines the `-aarch64` suffix is added.
+* `E2E_API_SERVER_URL` -- for the base URL of the deployed Kuberay API server. The default value is: `http://localhost:31888`
+
+The end to end test targets share the `GO_TEST_FLAGS` variable. Overriding this Makefile variable with the `-v` option lets both unit and end to end tests print any output / debug messages. By default, those messages are shown only when a test fails.
+
+The default values of the variables can be overridden using the `-e` make command line arguments.
+
+Examples:
+
+```bash
+# To run end to end test using default cluster
+make e2e-test
+
+# To run end to end test in fresh cluster. 
+# Please note that: 
+# * the cluster created for this test is the same as the cluster created by make cluster.
+# * if the end to end tests fail, the cluster will still be up and will have to be explicitly shut down by executing make clean-cluster
+make local-e2e-test
+```
+
 #### Swagger UI updates
 
 To update the swagger ui files deployed with the Kuberay API server, you'll need to:
@@ -117,7 +146,7 @@ make run
 
 #### Access
 
-Access the service at `localhost:8888` for http, and `locahost:8887` for the RPC port.
+Access the service at `localhost:8888` for http, and `localhost:8887` for the RPC port.
 
 ### Kubernetes Deployment
 
@@ -160,9 +189,9 @@ As a convenience for local development the following `make` targets are provided
 * `make cluster` -- creates a local kind cluster, using the configuration from `hack/kind-cluster-config.yaml`. It creates a port mapping allowing for the service running in the kind cluster to be accessed on  `localhost:31888` for HTTP and `localhost:31887` for RPC.
 * `make clean-cluster` -- deletes the local kind cluster created with `make cluster`
 * `load-image` -- loads the docker image defined by the `IMG` make variable into the kind cluster. The default value for variable is: `kuberay/apiserver:latest`. The name of the image can be changed by using `make load-image -e IMG=<your image name and tag>`
-* `operator-image` -- Build the operator image to be loaded in your kind cluster. The tag for the operator image is `kuberay/operator:latest`. This step is optional.
-* `load-operator-image` -- Load the operator image to the kind cluster created with `create-kind-cluster`. The tag for the operator image is `kuberay/operator:latest`, and the tag can be overridden using `make load-operator-image -E OPERATOR_IMAGE_TAG=<operator tag>`. To use the nightly operator tag, set `OPERATOR_IMAGE_TAG` to `nightly`.
-* `deploy-operator` -- Deploy operator into your cluster.  The tag for the operator image is `kuberay/operator:latest`.
+* `operator-image` -- Build the operator image to be loaded in your kind cluster. You must specify a value for the operator image tag. Since the default value is set to `nightly`, the local image with this value will be overridden if `make deploy-operator` is used later. This step is optional. Example: `make operator-image -e OPERATOR_IMAGE_TAG=latest`
+* `load-operator-image` -- Load the operator image to the kind cluster created with `make cluster`. The tag for the operator image is `kuberay/operator:nightly`, and the tag can be overridden using `make load-operator-image -e OPERATOR_IMAGE_TAG=<operator tag>`.
+* `deploy-operator` -- Deploy operator into your cluster.  The tag for the operator image is `kuberay/operator:nightly`.
 * `undeploy-operator` -- Undeploy operator from your cluster
 
 When developing and testing with kind you might want to execute these targets together:
@@ -172,9 +201,12 @@ When developing and testing with kind you might want to execute these targets to
 make docker-image cluster load-image deploy
 
 #To create a new API server image, operator image and deploy them on a new cluster
-make docker-image operator-image cluster load-image load-operator-image deploy deploy-operator
+make docker-image operator-image cluster load-image load-operator-image deploy deploy-operator -e OPERATOR_IMAGE_TAG=latest
+
+#To execute end to end tests with a locally built operator and verbose output
+make operator-image local-e2e-test -e OPERATOR_IMAGE_TAG=latest -e GO_TEST_FLAGS="-v"
 ```
 
 #### Access API Server in the Cluster
 
-Access the service at `localhost:31888` for http and `locahost:31887` for the RPC port.
+Access the service at `localhost:31888` for http and `localhost:31887` for the RPC port.
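
Note on the DEVELOPMENT.md change above: the image selection it describes (appending `-aarch64` on Apple silicon or arm64 machines) could be sketched in shell roughly as follows. This is an illustration only. The function name `ray_image_for_arch` and the use of `uname -m` are assumptions, not code from this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): choose the Ray image for
# the end to end tests based on the machine architecture.
ray_image_for_arch() {
  # $1: base image, defaulting to the value documented in DEVELOPMENT.md
  # $2: architecture string, defaulting to what `uname -m` reports
  local base="${1:-rayproject/ray:2.7.0-py310}"
  local arch="${2:-$(uname -m)}"
  case "$arch" in
    arm64|aarch64) printf '%s-aarch64\n' "$base" ;;
    *)             printf '%s\n' "$base" ;;
  esac
}

# Print (without running) the make invocation that would use the resolved image.
echo "make e2e-test -e E2E_API_SERVER_RAY_IMAGE=$(ray_image_for_arch)"
```

Passing the architecture as an explicit second argument keeps the helper easy to check on any machine.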
diff --git a/apiserver/Makefile b/apiserver/Makefile
@@ -7,6 +7,13 @@ REPO_ROOT_BIN	:= $(REPO_ROOT)/bin
 IMG_TAG ?=latest
 IMG ?= kuberay/apiserver:$(IMG_TAG)
 
+# Allow for additional test flags (-v, etc)
+GO_TEST_FLAGS ?= 
+# Ray docker images to use for end to end tests
+E2E_API_SERVER_RAY_IMAGE ?=rayproject/ray:2.7.0-py310
+# Kuberay API Server base URL to use in end to end tests
+E2E_API_SERVER_URL ?=http://localhost:31888
+
 # Get the currently used golang install path (in GOPATH/bin, unless GOBIN is set)
 ifeq (,$(shell go env GOBIN))
 GOBIN=$(shell go env GOPATH)/bin
@@ -56,11 +63,18 @@ imports: goimports ## Run goimports against code.
 	$(GOIMPORTS) -l -w .	
 
 test: fmt vet fumpt imports lint  ## Run unit tests.
-	go test ./... -race -coverprofile ray-kube-api-server-coverage.out
+	go test ./pkg/... ./cmd/... $(GO_TEST_FLAGS) -race -coverprofile ray-kube-api-server-coverage.out  -parallel 4
 
 lint: golangci-lint fmt vet fumpt imports ## Run the linter.
 	$(GOLANGCI_LINT) run  --timeout=3m	
 
+.PHONY: e2e-test
+e2e-test: ## Run end to end tests using a pre-existing cluster.
+	go test ./test/e2e/... $(GO_TEST_FLAGS) -timeout 30m  -race -parallel 4 -count=1
+
+.PHONY: local-e2e-test
+local-e2e-test: docker-image cluster load-image load-operator-image deploy-operator deploy e2e-test clean-cluster ## Run end to end tests, creating a fresh kind cluster with all components deployed.
+
 ##@ Build
 
 build: fmt vet fumpt imports lint ## Build api server binary.
@@ -70,10 +84,10 @@ run: fmt vet fumpt imports lint ## Run the api server from your host.
 	go run -race cmd/main.go -localSwaggerPath ${REPO_ROOT}/proto/swagger
 
 docker-image: test ## Build image with the api server.
-	${ENGINE} build -t ${IMG} -f Dockerfile ..
+	$(ENGINE) build -t ${IMG} -f Dockerfile ..
 
 docker-push: ## Push image with the api server.
-	${ENGINE} push ${IMG}
+	$(ENGINE) push ${IMG}
 
 .PHONY: build-swagger
 build-swagger: go-bindata
@@ -170,7 +184,7 @@ clean-dev-tools: ## Remove all development tools
 ##@ Testing Setup and Tools
 KIND_CONFIG ?= hack/kind-cluster-config.yaml
 KIND_CLUSTER_NAME ?= ray-api-server-cluster
-OPERATOR_IMAGE_TAG ?= latest
+OPERATOR_IMAGE_TAG ?= nightly
 .PHONY: cluster
 cluster: kind ## Start kind development cluster.
 	$(KIND) create cluster -n $(KIND_CLUSTER_NAME) --config $(KIND_CONFIG)
@@ -200,4 +214,7 @@ undeploy-operator: ## Undeploy operator via helm from the K8s cluster specified
 
 .PHONY: load-operator-image
 load-operator-image: ## Load the operator image to the kind cluster created with create-kind-cluster.
+ifeq (${OPERATOR_IMAGE_TAG}, nightly)
+	$(ENGINE) pull kuberay/operator:$(OPERATOR_IMAGE_TAG)
+endif	
 	$(KIND) load docker-image kuberay/operator:$(OPERATOR_IMAGE_TAG) -n $(KIND_CLUSTER_NAME)
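
Note on the `load-operator-image` change above: the target now pulls the operator image only when the tag is `nightly`, then loads it into the kind cluster. A dry-run sketch of that control flow in shell is shown below. The function name `load_operator_image_plan`, the `docker` fallback for `ENGINE`, and the bare `kind` binary name are assumptions for illustration; the sketch prints commands rather than running them.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of this commit): prints the commands
# the load-operator-image target would run for a given operator tag.
load_operator_image_plan() {
  local tag="${1:-nightly}"                     # mirrors OPERATOR_IMAGE_TAG ?= nightly
  local cluster="${2:-ray-api-server-cluster}"  # mirrors KIND_CLUSTER_NAME
  local engine="${ENGINE:-docker}"              # assumption: docker unless ENGINE is set
  # The pull step applies only to the nightly tag; other tags are expected
  # to exist locally already (for example, built with `make operator-image`).
  if [ "$tag" = "nightly" ]; then
    echo "$engine pull kuberay/operator:$tag"
  fi
  echo "kind load docker-image kuberay/operator:$tag -n $cluster"
}

load_operator_image_plan nightly
load_operator_image_plan latest
```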
diff --git a/apiserver/Volumes.md b/apiserver/Volumes.md
@@ -7,96 +7,81 @@ API server allows to specify multiple types of volumes mounted to the Ray pods (
 [config maps](https://kubernetes.io/docs/concepts/storage/volumes/#configmap),
 [secrets](https://kubernetes.io/docs/concepts/storage/volumes/#secret),
 and [empty dir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir).
-Multiple volumes of different type can be mounted to both head and worker nodes, by defining a volume array for them 
-
+Multiple volumes of different types can be mounted to both head and worker nodes by defining a volume array for them.
 
 ## HostPath volumes
 
-A hostPath volume mounts a file or directory from the host node's filesystem into your Pod. This is not something that 
-most Pods will need, but it offers a powerful escape hatch for some applications.
+A hostPath volume mounts a file or directory from the host node's filesystem into your Pod. This is not something that most Pods will need, but it offers a powerful escape hatch for some applications.
 
 For example, some uses for a hostPath are:
 
 * running a container that needs access to Docker internals; use a hostPath of /var/lib/docker
 * running cAdvisor in a container; use a hostPath of /sys
-* allowing a Pod to specify whether a given hostPath should exist prior to the Pod running, whether it should be 
-created, and what it should exist as
+* allowing a Pod to specify whether a given hostPath should exist prior to the Pod running, whether it should be created, and what it should exist as
 
 The code below gives an example of hostPath volume definition:
 
-````
+```json
 {
-	"name": "hostPath",             # unique name
-	"source": "/tmp",               # data location on host
-	"mountPath": "/tmp/hostPath",   # mounting path
-	"volumeType": 1,                # volume type - host path
-	"hostPathType": 0,              # host path type - directory
-	"mountPropagationMode": 1       # mount propagation - host to container
+    "name": "hostPath",             # unique name
+    "source": "/tmp",               # data location on host
+    "mountPath": "/tmp/hostPath",   # mounting path
+    "volumeType": 1,                # volume type - host path
+    "hostPathType": 0,              # host path type - directory
+    "mountPropagationMode": 1       # mount propagation - host to container
 }
-````
+```
 
 ## PVC volumes
 
-A Persistent Volume Claim (PVC) is a request for storage by a user. It is similar to a Pod. Pods consume node resources 
-and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request 
-specific size and access modes (e.g., they can be mounted `ReadWriteOnce`, `ReadOnlyMany` or `ReadWriteMany`).
+A Persistent Volume Claim (PVC) is a request for storage by a user. It is similar to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., they can be mounted `ReadWriteOnce`, `ReadOnlyMany` or `ReadWriteMany`).
 
-The caveat of using PVC volumes is that the same PVC is mounted to all nodes. As a result only PVCs with access
-mode `ReadOnlyMany` can be used in this case.
+The caveat of using PVC volumes is that the same PVC is mounted to all nodes. As a result only PVCs with access mode `ReadOnlyMany` can be used in this case.
 
 The code below gives an example of PVC volume definition:
 
-````
+```json
 {
-	"name": "pvc",              # unique name
-	"mountPath": "/tmp/pvc",    # mounting path
-	"volumeType": 0,            # volume type - PVC
-	"mountPropagationMode": 2,  # mount propagation mode - bidirectional
-	"readOnly": false           # read only
+    "name": "pvc",              # unique name
+    "mountPath": "/tmp/pvc",    # mounting path
+    "volumeType": 0,            # volume type - PVC
+    "mountPropagationMode": 2,  # mount propagation mode - bidirectional
+    "readOnly": false           # read only
 }
-````
+```
 
 ## Ephemeral volumes
 
-Some application need additional storage but don't care whether that data is stored persistently across restarts. For 
-example, caching services are often limited by memory size and can move infrequently used data into storage that is 
-slower than memory with little impact on overall performance. Ephemeral volumes are designed for these use cases. 
-Because volumes follow the Pod's lifetime and get created and deleted along with the Pod, Pods can be stopped and 
-restarted without being limited to where some persistent volume is available.
+Some applications need additional storage but don't care whether that data is stored persistently across restarts. For example, caching services are often limited by memory size and can move infrequently used data into storage that is slower than memory with little impact on overall performance. Ephemeral volumes are designed for these use cases. Because volumes follow the Pod's lifetime and get created and deleted along with the Pod, Pods can be stopped and restarted without being limited to where some persistent volume is available.
 
-Although there are several option of ephemeral volumes, here we are using generic ephemeral volumes, which can be 
-provided by all storage drivers that also support persistent volumes. Generic ephemeral volumes are similar to emptyDir 
-volumes in the sense that they provide a per-pod directory for scratch data that is usually empty after provisioning. 
-But they may also have additional features:
+Although there are several options for ephemeral volumes, here we are using generic ephemeral volumes, which can be provided by all storage drivers that also support persistent volumes. Generic ephemeral volumes are similar to emptyDir volumes in the sense that they provide a per-pod directory for scratch data that is usually empty after provisioning. But they may also have additional features:
 
 * Storage can be local or network-attached.
 * Volumes can have a fixed size that Pods are not able to exceed.
 
 The code below gives an example of ephemeral volume definition:
 
-````
+```json
 {
-	"name": "ephemeral",            # unique name
-	"mountPath": "/tmp/ephemeral"   # mounting path,
-	"mountPropagationMode": 0,      # mount propagation mode - None
-	"volumeType": 2,                # volume type - ephemeral
-	"storage": "5Gi",               # disk size
-	"storageClass": "default"       # storage class - optional
-	"accessMode": 0                 # access mode RWO - optional
+    "name": "ephemeral",            # unique name
+    "mountPath": "/tmp/ephemeral",  # mounting path
+    "mountPropagationMode": 0,      # mount propagation mode - None
+    "volumeType": 2,                # volume type - ephemeral
+    "storage": "5Gi",               # disk size
+    "storageClass": "default",      # storage class - optional
+    "accessMode": 0                 # access mode RWO - optional
 }
-````
+```
 
 ## Config map volumes
 
-A ConfigMap provides a way to inject configuration data into pods. The data stored in a ConfigMap can be referenced in 
-a volume of type configMap and then consumed by containerized applications running in a pod.
+A ConfigMap provides a way to inject configuration data into pods. The data stored in a ConfigMap can be referenced in a volume of type configMap and then consumed by containerized applications running in a pod.
 
-When referencing a ConfigMap, you provide the name of the ConfigMap in the volume. You can customize the path to use 
-for a specific entry in the ConfigMap.
+When referencing a ConfigMap, you provide the name of the ConfigMap in the volume. You can customize the path to use for a specific entry in the ConfigMap.
 
 The code below gives an example of config map volume definition:
 
-````
+```json
 {
     "name":"code-sample",               # Unique name
     "mountPath":"/home/ray/samples",    # mounting path
@@ -106,17 +91,15 @@ The code below gives an example of config map volume definition:
         "sample_code.py":"sample_code.py"
     }
 }
-````
+```
 
 ## Secret volumes
 
-A secret volume is used to pass sensitive information, such as passwords, to Pods. You can store secrets in the 
-Kubernetes API and mount them as files for use by pods without coupling to Kubernetes directly. Secret volumes are 
-backed by tmpfs (a RAM-backed filesystem) so they are never written to non-volatile storage.
+A secret volume is used to pass sensitive information, such as passwords, to Pods. You can store secrets in the Kubernetes API and mount them as files for use by pods without coupling to Kubernetes directly. Secret volumes are backed by tmpfs (a RAM-backed filesystem) so they are never written to non-volatile storage.
 
 The code below gives an example of secret volume definition:
 
-````
+```json
 {
     "name":"important-secret",          # Unique name
     "mountPath":"/home/ray/sensitive",  # mounting path
@@ -126,22 +109,19 @@ The code below gives an example of secret volume definition:
         "subPath": "password"
     }
 }
-````
+```
 
 ## Emptydir volumes
 
-An emptyDir volume is first created when a Pod is assigned to a node, and exists as long as that Pod is running on 
-that node. As the name says, the emptyDir volume is initially empty. All containers in the Pod can read and write the 
-same files in the emptyDir volume, though that volume can be mounted at the same or different paths in each container. 
-When a Pod is removed from a node for any reason, the data in the emptyDir is deleted permanently.
+An emptyDir volume is first created when a Pod is assigned to a node, and exists as long as that Pod is running on that node. As the name says, the emptyDir volume is initially empty. All containers in the Pod can read and write the same files in the emptyDir volume, though that volume can be mounted at the same or different paths in each container. When a Pod is removed from a node for any reason, the data in the emptyDir is deleted permanently.
 
-The code below gives an example of empydir volume definition:
+The code below gives an example of empty directory volume definition:
 
-````
+```json
 {
-	"name": "emptyDir",            # unique name
-	"mountPath": "/tmp/emptydir"   # mounting path,
-	"volumeType": 5,                # vlume type - ephemeral
-	"storage": "5Gi",               # max storage size - optional
+    "name": "emptyDir",             # unique name
+    "mountPath": "/tmp/emptydir",   # mounting path
+    "volumeType": 5,                # volume type - emptyDir
+    "storage": "5Gi"                # max storage size - optional
 }
-````
+```
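
Note on the Volumes.md examples above: the blocks are fenced as `json` but carry `#` annotations, which standard JSON does not allow. One way to turn an annotated example into a plain payload is sketched below. The function name `strip_annotations` is hypothetical, and the sketch assumes no value itself contains whitespace followed by `#`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): drop trailing "# ..."
# annotations from each line read on stdin.
strip_annotations() {
  # Delete the first run of whitespace that is followed by '#', together
  # with everything after it on that line.
  sed -E 's/[[:space:]]+#.*$//'
}

# Usage with the PVC example from Volumes.md.
strip_annotations <<'EOF'
{
    "name": "pvc",              # unique name
    "mountPath": "/tmp/pvc",    # mounting path
    "volumeType": 0,            # volume type - PVC
    "mountPropagationMode": 2,  # mount propagation mode - bidirectional
    "readOnly": false           # read only
}
EOF
```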
diff --git a/apiserver/go.mod b/apiserver/go.mod
@@ -20,12 +20,15 @@ require (
 )
 
 require (
+	github.com/dustinkirkland/golang-petname v0.0.0-20230626224747-e794b9370d49
 	github.com/elazarl/go-bindata-assetfs v1.0.1
 	github.com/grpc-ecosystem/go-grpc-middleware v1.3.0
 	github.com/grpc-ecosystem/go-grpc-prometheus v1.2.0
 	github.com/grpc-ecosystem/grpc-gateway/v2 v2.6.0
 )
 
+require github.com/pmezard/go-difflib v1.0.0 // indirect
+
 require (
 	github.com/asaskevich/govalidator v0.0.0-20200428143746-21a406dcc535 // indirect
 	github.com/beorn7/perks v1.0.1 // indirect
@@ -48,7 +51,6 @@ require (
 	github.com/mitchellh/mapstructure v1.4.1 // indirect
 	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
 	github.com/modern-go/reflect2 v1.0.2 // indirect
-	github.com/pmezard/go-difflib v1.0.0 // indirect
 	github.com/prometheus/client_model v0.2.0 // indirect
 	github.com/prometheus/common v0.28.0 // indirect
 	github.com/prometheus/procfs v0.6.0 // indirect