Initial e2e tests #136

Merged
merged 34 commits into from
Jul 5, 2023
34 commits
7cd925b
Initial e2e tests
astefanutti Jun 13, 2023
75da2ab
test: Submit MNIST RayJob
astefanutti Jun 19, 2023
3dc70d3
test: Fix RayCluster labels selector
astefanutti Jun 19, 2023
b824ef1
e2e: Print KubeRay operator logs
astefanutti Jun 19, 2023
a6011cf
test: Fix RayJob runtime environment
astefanutti Jun 19, 2023
aed9d26
test: Print RayJob logs
astefanutti Jun 20, 2023
d98ff93
test: Document how to run e2e tests locally
astefanutti Jun 21, 2023
0a38102
test: Polish MNIST RayJob test
astefanutti Jun 21, 2023
48a6084
test: Add MNIST training with MCAD Job
astefanutti Jun 22, 2023
849a415
test: Print MNIST batch job logs
astefanutti Jun 22, 2023
e356b4b
test: Use RayCluster 'complete' configuration
astefanutti Jun 22, 2023
9d1ad86
test: Add step log statements
astefanutti Jun 22, 2023
b143732
test: Add deferred troubleshooting logs
astefanutti Jun 22, 2023
e95e08a
test: Add MNIST training in RayCluster with CodeFlare SDK
astefanutti Jun 22, 2023
49563ef
test: Customize test timeouts
astefanutti Jun 22, 2023
9a96e45
test: Pass MNIST training with CodeFlare SDK on OpenShift
astefanutti Jun 26, 2023
12d106d
test: Print Job logs after successful or failed completion
astefanutti Jun 27, 2023
ecf16dc
test: Re-use pip requirements file
astefanutti Jun 27, 2023
210b102
test: Parameterize CodeFlare SDK version
astefanutti Jun 27, 2023
08ed883
test: Remove ray_lightning from requirements
astefanutti Jun 27, 2023
06a8659
test: Parameterize Ray image and version
astefanutti Jun 27, 2023
f2c6618
test: Parameterize PyTorch image
astefanutti Jun 27, 2023
646876d
test: Add FIXME for SDK user base image
astefanutti Jun 27, 2023
9823d64
Align go.mod with MCAD version
astefanutti Jun 27, 2023
02ee9e4
test: Print Ray job logs after successful or failed completion
astefanutti Jun 28, 2023
5ab5db0
test: Upload job logs
astefanutti Jun 29, 2023
d4d8f00
test: Remove unused functions
astefanutti Jun 29, 2023
99682d0
test: Fix Unexpected kind-action input
astefanutti Jun 30, 2023
494afe6
test: Format test output using gotestfmt
astefanutti Jun 30, 2023
7394965
test: Add codeflare stack logs to uploaded artifacts
astefanutti Jun 30, 2023
2045c7a
test: Add description to e2e tests
astefanutti Jun 30, 2023
9eabd95
test: Factorize e2e tests setup
astefanutti Jun 30, 2023
ff32292
test: Update e2e tests local run documentation
astefanutti Jun 30, 2023
4463ec5
test: Write logs also for jobs that have timed out
astefanutti Jul 5, 2023
157 changes: 157 additions & 0 deletions .github/workflows/e2e_tests.yaml
@@ -0,0 +1,157 @@
name: e2e

on:
  pull_request:
    branches:
      - main
      - 'release-*'
    paths-ignore:
      - 'docs/**'
      - '**.adoc'
      - '**.md'
      - 'LICENSE'
  push:
    branches:
      - main
      - 'release-*'
    paths-ignore:
      - 'docs/**'
      - '**.adoc'
      - '**.md'
      - 'LICENSE'

concurrency:
  group: ${{ github.head_ref }}-${{ github.workflow }}
  cancel-in-progress: true

jobs:
  kubernetes:

    runs-on: ubuntu-20.04

    steps:
      - name: Cleanup
        run: |
          ls -lart
          echo "Initial status:"
          df -h

          echo "Cleaning up resources:"
          sudo swapoff -a
          sudo rm -f /swapfile
          sudo apt clean
          sudo rm -rf /usr/share/dotnet
          sudo rm -rf /opt/ghc
          sudo rm -rf "/usr/local/share/boost"
          sudo rm -rf "$AGENT_TOOLSDIRECTORY"
          docker rmi $(docker image ls -aq)

          echo "Final status:"
          df -h

      - name: Checkout code
        uses: actions/checkout@v3
        with:
          submodules: recursive

      - name: Init directories
        run: |
          TEMP_DIR="$(pwd)/tmp"
          mkdir -p "${TEMP_DIR}"
          echo "TEMP_DIR=${TEMP_DIR}" >> $GITHUB_ENV

          mkdir -p "$(pwd)/bin"
          echo "$(pwd)/bin" >> $GITHUB_PATH

      - name: Set Go
        uses: actions/setup-go@v3
        with:
          go-version: v1.18

      - name: Set up gotestfmt
        uses: gotesttools/gotestfmt-action@v2
        with:
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Container image registry
        run: |
          podman run -d -p 5000:5000 --name registry registry:2.8.1

          export REGISTRY_ADDRESS=$(hostname -i):5000
          echo "REGISTRY_ADDRESS=${REGISTRY_ADDRESS}" >> $GITHUB_ENV
          echo "Container image registry started at ${REGISTRY_ADDRESS}"

          KIND_CONFIG_FILE=${{ env.TEMP_DIR }}/kind.yaml
          echo "KIND_CONFIG_FILE=${KIND_CONFIG_FILE}" >> $GITHUB_ENV
          envsubst < ./test/e2e/kind.yaml > ${KIND_CONFIG_FILE}

          sudo --preserve-env=REGISTRY_ADDRESS sh -c 'cat > /etc/containers/registries.conf.d/local.conf <<EOF
          [[registry]]
          prefix = "$REGISTRY_ADDRESS"
          insecure = true
          location = "$REGISTRY_ADDRESS"
          EOF'

      - name: Setup KinD cluster
        uses: helm/[email protected]
        with:
          cluster_name: cluster
          version: v0.17.0
          config: ${{ env.KIND_CONFIG_FILE }}

      - name: Print cluster info
        run: |
          echo "KinD cluster:"
          kubectl cluster-info
          kubectl describe nodes

      - name: Deploy CodeFlare stack
        id: deploy
        run: |
          echo Deploying CodeFlare operator
          IMG="${REGISTRY_ADDRESS}"/codeflare-operator
          make image-push -e IMG="${IMG}"
          make deploy -e IMG="${IMG}"
          kubectl wait --timeout=120s --for=condition=Available=true deployment -n openshift-operators codeflare-operator-manager

          echo Setting up CodeFlare stack
          make setup-e2e

      - name: Run e2e tests
        run: |
          export CODEFLARE_TEST_TIMEOUT_SHORT=1m
          export CODEFLARE_TEST_TIMEOUT_MEDIUM=3m
          export CODEFLARE_TEST_TIMEOUT_LONG=8m

          export CODEFLARE_TEST_OUTPUT_DIR=${{ env.TEMP_DIR }}
          echo "CODEFLARE_TEST_OUTPUT_DIR=${CODEFLARE_TEST_OUTPUT_DIR}" >> $GITHUB_ENV

          set -euo pipefail
          go test -timeout 30m -v ./test/e2e -json 2>&1 | tee ${CODEFLARE_TEST_OUTPUT_DIR}/gotest.log | gotestfmt

      - name: Print CodeFlare operator logs
        if: always() && steps.deploy.outcome == 'success'
        run: |
          echo "Printing CodeFlare operator logs"
          kubectl logs -n openshift-operators --tail -1 -l app.kubernetes.io/name=codeflare-operator | tee ${CODEFLARE_TEST_OUTPUT_DIR}/codeflare-operator.log

      - name: Print MCAD controller logs
        if: always() && steps.deploy.outcome == 'success'
        run: |
          echo "Printing MCAD controller logs"
          kubectl logs -n codeflare-system --tail -1 -l component=multi-cluster-application-dispatcher | tee ${CODEFLARE_TEST_OUTPUT_DIR}/mcad.log

      - name: Print KubeRay operator logs
        if: always() && steps.deploy.outcome == 'success'
        run: |
          echo "Printing KubeRay operator logs"
          kubectl logs -n ray-system --tail -1 -l app.kubernetes.io/name=kuberay | tee ${CODEFLARE_TEST_OUTPUT_DIR}/kuberay.log

      - name: Upload logs
        uses: actions/upload-artifact@v3
        if: always() && steps.deploy.outcome == 'success'
        with:
          name: logs
          retention-days: 10
          path: |
            ${{ env.CODEFLARE_TEST_OUTPUT_DIR }}/**/*.log
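
The "Run e2e tests" step exports `CODEFLARE_TEST_TIMEOUT_SHORT`, `CODEFLARE_TEST_TIMEOUT_MEDIUM` and `CODEFLARE_TEST_TIMEOUT_LONG`, which the Go test support code presumably reads. The sketch below shows one way such a lookup could work, assuming the values are Go duration strings (as `1m`, `3m` and `8m` are). The helper names `durationOrDefault` and `timeoutFromEnv` and the fallback value are hypothetical and not taken from this PR.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// durationOrDefault parses value as a Go duration string (e.g. "3m").
// It returns fallback when value is empty or cannot be parsed.
// Hypothetical helper, not part of the PR.
func durationOrDefault(value string, fallback time.Duration) time.Duration {
	if value == "" {
		return fallback
	}
	d, err := time.ParseDuration(value)
	if err != nil {
		return fallback
	}
	return d
}

// timeoutFromEnv reads the named environment variable and falls back to
// the given default when the variable is unset or invalid.
func timeoutFromEnv(name string, fallback time.Duration) time.Duration {
	return durationOrDefault(os.Getenv(name), fallback)
}

func main() {
	// The one-minute fallback is an illustrative assumption.
	short := timeoutFromEnv("CODEFLARE_TEST_TIMEOUT_SHORT", time.Minute)
	fmt.Println("short timeout:", short)
}
```

Falling back silently keeps local runs working when the variables are not exported; a stricter variant could fail the test on an unparsable value instead.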
1 change: 0 additions & 1 deletion .pre-commit-config.yaml
@@ -25,5 +25,4 @@ repos:
hooks:
- id: go-fmt
- id: golangci-lint
- id: go-build
- id: go-mod-tidy
52 changes: 46 additions & 6 deletions Makefile
@@ -21,6 +21,15 @@ MCAD_REF ?= release-${MCAD_VERSION}
MCAD_REPO ?= github.com/project-codeflare/multi-cluster-app-dispatcher
MCAD_CRD ?= ${MCAD_REPO}/config/crd?ref=${MCAD_REF}

# KUBERAY_VERSION defines the default version of the KubeRay operator (used for testing)
KUBERAY_VERSION ?= v0.5.0

# RAY_VERSION defines the default version of Ray (used for testing)
RAY_VERSION ?= 2.5.0

# CODEFLARE_SDK_VERSION defines the default version of the CodeFlare SDK
CODEFLARE_SDK_VERSION ?= 0.4.4

# OPERATORS_REPO_ORG points to GitHub repository organization where bundle PR is opened against
# OPERATORS_REPO_FORK_ORG points to GitHub repository fork organization where bundle build is pushed to
OPERATORS_REPO_ORG ?= redhat-openshift-ecosystem
@@ -61,6 +70,9 @@ MCAD_IMAGE ?= $(IMAGE_ORG_BASE)/mcad-controller:$(MCAD_REF)
# INSTASCALE_IMAGE defines the default container image for the InstaScale controller
INSTASCALE_IMAGE ?= $(IMAGE_ORG_BASE)/instascale-controller:$(INSTASCALE_VERSION)

# RAY_IMAGE defines the default container image for Ray (used for testing)
RAY_IMAGE ?= rayproject/ray:$(RAY_VERSION)

# BUNDLE_IMG defines the image:tag used for the bundle.
# You can use it as an arg. (E.g make bundle-build BUNDLE_IMG=<some-registry>/<project-name-bundle>:<tag>)
BUNDLE_IMG ?= $(IMAGE_TAG_BASE)-bundle:$(VERSION)
@@ -116,6 +128,7 @@ help: ## Display this help.
##@ Development

DEFAULTS_FILE := controllers/defaults.go
DEFAULTS_TEST_FILE := test/support/defaults.go

.PHONY: defaults
defaults:
@@ -133,7 +146,22 @@ defaults:
@echo ")" >> $(DEFAULTS_FILE)
@echo "" >> $(DEFAULTS_FILE)

-gofmt -w $(DEFAULTS_FILE)
$(info Regenerating $(DEFAULTS_TEST_FILE))
@echo "package support" > $(DEFAULTS_TEST_FILE)
@echo "" >> $(DEFAULTS_TEST_FILE)
@echo "// ***********************" >> $(DEFAULTS_TEST_FILE)
@echo "// DO NOT EDIT THIS FILE" >> $(DEFAULTS_TEST_FILE)
@echo "// ***********************" >> $(DEFAULTS_TEST_FILE)
@echo "" >> $(DEFAULTS_TEST_FILE)
@echo "const (" >> $(DEFAULTS_TEST_FILE)
@echo " CodeFlareSDKVersion = \"$(CODEFLARE_SDK_VERSION)\"" >> $(DEFAULTS_TEST_FILE)
@echo " RayVersion = \"$(RAY_VERSION)\"" >> $(DEFAULTS_TEST_FILE)
@echo " RayImage = \"$(RAY_IMAGE)\"" >> $(DEFAULTS_TEST_FILE)
@echo "" >> $(DEFAULTS_TEST_FILE)
@echo ")" >> $(DEFAULTS_TEST_FILE)
@echo "" >> $(DEFAULTS_TEST_FILE)

+gofmt -w $(DEFAULTS_FILE) $(DEFAULTS_TEST_FILE)

.PHONY: manifests
manifests: controller-gen ## Generate WebhookConfiguration, ClusterRole and CustomResourceDefinition objects.
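
With the Makefile defaults introduced in this PR (`CODEFLARE_SDK_VERSION ?= 0.4.4`, `RAY_VERSION ?= 2.5.0`, `RAY_IMAGE ?= rayproject/ray:$(RAY_VERSION)`), the `defaults` target should write constants along these lines to `test/support/defaults.go`. This is a sketch of the expected result rather than the committed file: the generated file declares `package support`, and `package main` with a `main` function is used here only so the example runs on its own.

```go
package main

import "fmt"

// Values mirror the Makefile defaults shown in the diff
// (CODEFLARE_SDK_VERSION, RAY_VERSION, RAY_IMAGE).
// The real generated file declares `package support`.
const (
	CodeFlareSDKVersion = "0.4.4"
	RayVersion          = "2.5.0"
	RayImage            = "rayproject/ray:2.5.0"
)

func main() {
	// Print the values a test could use to pin images and SDK versions.
	fmt.Println("CodeFlare SDK:", CodeFlareSDKVersion)
	fmt.Println("Ray version:", RayVersion)
	fmt.Println("Ray image:", RayImage)
}
```

Overriding a variable on the command line, for example `make defaults RAY_VERSION=<version>`, would change the generated constants accordingly, since all three are declared with `?=`.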
@@ -186,20 +214,24 @@ vet: ## Run go vet against code.

##@ Build

.PHONY: modules
modules: ## Update Go dependencies.
go get $(MCAD_REPO)@$(MCAD_VERSION)

.PHONY: build
-build: defaults generate fmt vet ## Build manager binary.
+build: modules defaults generate fmt vet ## Build manager binary.
go build -o bin/manager main.go

.PHONY: run
-run: defaults manifests generate fmt vet ## Run a controller from your host.
+run: modules defaults manifests generate fmt vet ## Run a controller from your host.
go run ./main.go

.PHONY: image-build
image-build: test-unit ## Build container image with the manager.
podman build -t ${IMG} .

.PHONY: image-push
-image-push: ## Push container image with the manager.
+image-push: image-build ## Push container image with the manager.
podman push ${IMG}

##@ Deployment
@@ -383,5 +415,13 @@ catalog-push: ## Push a catalog image.
$(MAKE) image-push IMG=$(CATALOG_IMG)

.PHONY: test-unit
-test-unit: defaults manifests generate fmt vet envtest ## Run tests.
-KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir $(LOCALBIN) -p path)" go test ./... -coverprofile cover.out
+test-unit: defaults manifests generate fmt vet envtest ## Run unit tests.
+KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir $(LOCALBIN) -p path)" go test $$(go list ./... | grep -v /test/) -coverprofile cover.out

.PHONY: test-e2e
test-e2e: defaults manifests generate fmt vet ## Run e2e tests.
go test -timeout 30m -v ./test/e2e

.PHONY: setup-e2e
setup-e2e: ## Set up e2e tests.
KUBERAY_VERSION=$(KUBERAY_VERSION) test/e2e/setup.sh
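
The `test-unit` target is meant to skip the e2e packages by filtering the output of `go list ./...` through `grep -v /test/`. The sketch below reproduces that filter in Go to make its behaviour explicit: any import path containing the substring `/test/` is dropped. The function name `excludeTestPackages` and the `example.com/operator` module path are placeholders for illustration, not names from the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// excludeTestPackages keeps only the import paths that do not contain
// "/test/", mirroring `grep -v /test/`.
func excludeTestPackages(pkgs []string) []string {
	var kept []string
	for _, p := range pkgs {
		if !strings.Contains(p, "/test/") {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	// Placeholder module path; the layout follows the test/e2e and
	// test/support directories referenced in the diff.
	pkgs := []string{
		"example.com/operator",
		"example.com/operator/controllers",
		"example.com/operator/test/e2e",
		"example.com/operator/test/support",
	}
	for _, p := range excludeTestPackages(pkgs) {
		fmt.Println(p)
	}
}
```

Because the match is on `/test/` with a trailing slash, a package whose path ends in `/test` would not be excluded; only packages nested below a `test` directory are.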
39 changes: 38 additions & 1 deletion README.md
@@ -1,4 +1,5 @@
# codeflare-operator

Operator for installation and lifecycle management of CodeFlare distributed workload stack, starting with MCAD and InstaScale

<!-- Don't delete these comments, they are used to generate Compatibility Matrix table for release automation -->
@@ -14,7 +15,43 @@ CodeFlare Stack Compatibility Matrix
| KubeRay | v0.5.0 |
<!-- Compatibility Matrix end -->

-## Release process
+## Development

### Testing

The e2e tests can be executed locally by running the following commands:

1. Use an existing cluster, or set up a test cluster, e.g.:

```bash
# Create a KinD cluster
$ kind create cluster --image kindest/node:v1.25.8
# Install the CRDs
$ make install
```

2. Set up the CodeFlare stack:
```bash
$ make setup-e2e
```

3. Start the operator locally:

```bash
$ make run
```

Alternatively, you can run the operator from your IDE / debugger.

4. In a separate terminal, run the e2e suite:

```bash
$ make test-e2e
```

Alternatively, you can run the e2e test(s) from your IDE / debugger.

## Release

Prerequisite:
- Build and release [MCAD](https://github.com/project-codeflare/multi-cluster-app-dispatcher)
2 changes: 2 additions & 0 deletions go.mod
@@ -8,6 +8,8 @@ require (
github.com/manifestival/manifestival v0.7.2
github.com/onsi/ginkgo/v2 v2.9.2
github.com/onsi/gomega v1.27.6
github.com/project-codeflare/multi-cluster-app-dispatcher v1.31.0
github.com/ray-project/kuberay/ray-operator v0.0.0-20230614221720-085c29d40fa9
go.uber.org/zap v1.24.0
k8s.io/api v0.26.3
k8s.io/apimachinery v0.26.3
4 changes: 4 additions & 0 deletions go.sum
@@ -428,6 +428,8 @@ github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINE
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/pquerna/cachecontrol v0.0.0-20171018203845-0dec1b30a021/go.mod h1:prYjPmNq4d1NPVmpShWobRqXY3q7Vp+80DqgxxUrUIA=
github.com/project-codeflare/multi-cluster-app-dispatcher v1.31.0 h1:vq4fAuvlv4Zvnx0dA53WaYkTWG1BFxAkamxuzHfZO2M=
github.com/project-codeflare/multi-cluster-app-dispatcher v1.31.0/go.mod h1:fmbU5LuV1Z2Sbu1FCEoVuw8qxDFcalXvkPyMfGZHHTc=
github.com/prometheus/client_golang v0.9.1/go.mod h1:7SWBe2y4D6OKWSNQJUaRYU/AaXPKyh/dDVn+NZz0KFw=
github.com/prometheus/client_golang v0.9.3/go.mod h1:/TN21ttK/J9q6uSwhBd54HahCDft0ttaMvbicHlPoso=
github.com/prometheus/client_golang v1.0.0/go.mod h1:db9x61etRT2tGnBNRi70OPL5FsnadC4Ky3P0J6CfImo=
@@ -459,6 +461,8 @@ github.com/prometheus/procfs v0.7.3/go.mod h1:cz+aTbrPOrUb4q7XlbU9ygM+/jj0fzG6c1
github.com/prometheus/procfs v0.8.0 h1:ODq8ZFEaYeCaZOJlZZdJA2AbQR98dSHSM1KW/You5mo=
github.com/prometheus/procfs v0.8.0/go.mod h1:z7EfXMXOkbkqb9IINtpCn86r/to3BnA0uaxHdg830/4=
github.com/prometheus/tsdb v0.7.1/go.mod h1:qhTCs0VvXwvX/y3TZrWD7rabWM+ijKTux40TwIPHuXU=
github.com/ray-project/kuberay/ray-operator v0.0.0-20230614221720-085c29d40fa9 h1:qIThU9GGqEay/y78y4Y9e1FVfrdkH5MFnT0zEJ9yh0A=
github.com/ray-project/kuberay/ray-operator v0.0.0-20230614221720-085c29d40fa9/go.mod h1:2auArgwD9dXXJz1oc7SqQ4U/rHdpwnrBwG98kr8OWXA=
github.com/rogpeppe/fastuuid v0.0.0-20150106093220-6724a57986af/go.mod h1:XWv6SoW27p1b0cqNHllgS5HIMJraePCO15w5zCzIWYg=
github.com/rogpeppe/go-internal v1.3.0/go.mod h1:M8bDsm7K2OlrFYOpmOWEs/qY81heoFRclV5y23lUDJ4=
github.com/russross/blackfriday/v2 v2.0.1/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
31 changes: 31 additions & 0 deletions test/e2e/kind.yaml
@@ -0,0 +1,31 @@
# ---------------------------------------------------------------------------
# Copyright 2023.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ---------------------------------------------------------------------------

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    image: kindest/node:v1.25.3@sha256:f52781bc0d7a19fb6c405c2af83abfeb311f130707a0e219175677e366cc45d1
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
containerdConfigPatches:
  - |-
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."${REGISTRY_ADDRESS}"]
      endpoint = ["http://${REGISTRY_ADDRESS}"]
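
The `${REGISTRY_ADDRESS}` placeholders in this file are filled in by `envsubst` in the workflow's "Container image registry" step before the configuration is passed to KinD. The sketch below imitates that substitution with Go's `os.Expand` to show what the rendered containerd patch looks like. The `render` helper and the address `10.1.0.4:5000` are made up for illustration; the workflow derives the real address from `hostname -i`.

```go
package main

import (
	"fmt"
	"os"
)

// containerdPatch mirrors the templated lines from test/e2e/kind.yaml.
const containerdPatch = `[plugins."io.containerd.grpc.v1.cri".registry.mirrors."${REGISTRY_ADDRESS}"]
  endpoint = ["http://${REGISTRY_ADDRESS}"]`

// render replaces ${NAME} references with values from vars.
// Names missing from vars expand to the empty string, which is also
// what envsubst does for unset variables.
func render(template string, vars map[string]string) string {
	return os.Expand(template, func(name string) string {
		return vars[name]
	})
}

func main() {
	// Made-up registry address, used only for this example.
	vars := map[string]string{"REGISTRY_ADDRESS": "10.1.0.4:5000"}
	fmt.Println(render(containerdPatch, vars))
}
```

The rendered mirror entry tells containerd inside the KinD node to reach the local registry over plain HTTP, which matches the `insecure = true` setting written to `registries.conf.d` on the runner.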