
tests: Scripts for e2e tests #2128

Merged: 10 commits into kubeflow:master from feature-kimwnasptd-e2e-tests on Feb 10, 2022


kimwnasptd
Member

This is a first step for #2099

The PR reworks the entire tests directory to include a simple e2e test that:

  1. Installs all the KF manifests
  2. Runs the E2E Notebooks we have in https://github.com/kubeflow/pipelines/blob/master/samples/contrib/kubeflow-e2e-mnist/kubeflow-e2e-mnist.ipynb
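Step 1 could be scripted roughly as follows. This is a minimal sketch, not the PR's actual tests/e2e code: it assumes `kustomize` and `kubectl` are on PATH and that `example` is the kustomization being installed, and the attempt count and delay are arbitrary. The retry loop is there because custom resources applied in the same pass as their CRDs can be rejected until those CRDs are registered.

```python
import subprocess
import sys
import time

# Assumed commands: render the example kustomization and pipe it to kubectl.
# The PR's real scripts may invoke these differently.
BUILD_CMD = ["kustomize", "build", "example"]
APPLY_CMD = ["kubectl", "apply", "-f", "-"]


def apply_manifests(build_cmd=BUILD_CMD, apply_cmd=APPLY_CMD,
                    attempts=20, delay=10.0, sleep=time.sleep):
    """Render the manifests once, then apply them until kubectl succeeds.

    Returns the 1-based attempt number that succeeded; raises RuntimeError
    if every attempt fails.
    """
    # Rendering is deterministic, so a build failure is fatal (check=True).
    rendered = subprocess.run(build_cmd, capture_output=True, text=True,
                              check=True).stdout
    for attempt in range(1, attempts + 1):
        applied = subprocess.run(apply_cmd, input=rendered,
                                 capture_output=True, text=True)
        if applied.returncode == 0:
            return attempt
        # Custom resources may be rejected until their CRDs are ready,
        # so wait and re-apply the whole set.
        if attempt < attempts:
            sleep(delay)
    raise RuntimeError(f"kubectl apply still failing after {attempts} attempts")


if __name__ == "__main__":
    n = apply_manifests()
    print(f"manifests applied after {n} attempt(s)", file=sys.stderr)
```

Step 2 (running the notebook) is left out, since the thread does not say how the notebook is executed.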

While developing the tests locally I was using KinD to spin up clusters and test everything from scratch. With this mindset, and given that we don't have many maintainers in the kubeflow/testing repo, I tried to use GH actions to run these E2E tests.

But the performance was abysmal. Even with optimizations, such as keeping etcd in memory (kubernetes-sigs/kind#845), I couldn't get the tests to succeed. Applying all the manifests took more than 45 minutes, and then the test failed due to etcd timeouts.

I've included the GH workflow files to give it one last try, but I don't have high expectations. The next step is to see how we could use the AWS infra/Prow to spin up the clusters and try our luck there. The current blocking PR for this is kubeflow/testing#972.

cc @kubeflow/release-team
/cc @StefanoFioravanzo @elikatsis

Signed-off-by: Kimonas Sotirchos <[email protected]>
@kimwnasptd
Member Author

The PR also broke the unit tests; I'll send a fix for them. We should still be able to run the unit tests in GH actions, though, since they are not resource intensive.

/assign @kimwnasptd

@kimwnasptd
Member Author

After inspecting the VM sizes of the GH actions runners I realized that the machines were running out of memory while I was installing all the manifests. Each runner is a 7 GB VM.

I managed to create a "light" version of the manifests that only installs the barebone components, to get the tests running. This time the installation took 6 minutes, but the training step eventually timed out, most probably due to CPU limitations.

@StefanoFioravanzo @elikatsis I'm going to remove the GH action and fix unit tests. Then as a next step I'll implement this by using Prow.

To avoid having to use --load_restrictor none we'll need to wrap the
KServe manifests inside a kustomization.yaml file.

Signed-off-by: Kimonas Sotirchos <[email protected]>
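Background on the kserve commit: kustomize's default load restrictor only lets a kustomization read files inside its own root, so a parent that points directly at a KServe YAML file in another directory needs `--load_restrictor none`. If that directory gets its own kustomization.yaml, the parent can list the directory instead. The helper below is a hypothetical sketch that generates such a wrapper; `render_wrapper_kustomization`, `write_wrapper_kustomization`, and the file names in the test are placeholders, not necessarily what the repo uses.

```python
import sys
from pathlib import Path, PurePosixPath


def render_wrapper_kustomization(manifest_files):
    """Return kustomization.yaml text that lists manifest files as resources.

    Each file must live under the directory holding the kustomization.yaml,
    which is what kustomize's default load restrictor requires.
    """
    lines = [
        "apiVersion: kustomize.config.k8s.io/v1beta1",
        "kind: Kustomization",
        "resources:",
    ]
    for name in manifest_files:
        path = PurePosixPath(name)
        # Reject paths that would escape the kustomization root.
        if path.is_absolute() or ".." in path.parts:
            raise ValueError(f"{name!r} is outside the kustomization root")
        lines.append(f"- {name}")
    return "\n".join(lines) + "\n"


def write_wrapper_kustomization(directory, manifest_files):
    """Write the wrapper kustomization.yaml into `directory`; return its path."""
    target = Path(directory) / "kustomization.yaml"
    target.write_text(render_wrapper_kustomization(manifest_files))
    return target


if __name__ == "__main__":
    # Hypothetical usage: python wrap.py <kserve-dir> <file.yaml> [...]
    print(write_wrapper_kustomization(sys.argv[1], sys.argv[2:]))
```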
We should use prow instead to trigger our e2e tests.

Signed-off-by: Kimonas Sotirchos <[email protected]>
@kimwnasptd
Member Author

@elikatsis @StefanoFioravanzo this should be ready for review. To summarize:

  1. I've removed the old, deprecated files in the tests dir
  2. I've updated the unit tests to build the example kustomization, as a quick sanity check that most of the components' files can be generated
  3. I've prepared some helpers in hack for spinning up a KinD cluster locally, so that someone can test the e2e scripts
  4. I've created tests/e2e scripts, which we can later use with Prow
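The sanity check in item 2 might look something like this minimal sketch. It assumes the unit tests shell out to a `kustomize` binary on PATH; `kustomize_build` is a hypothetical helper name, not the repo's actual test code. A failing build raises with kustomize's stderr, so a component whose files cannot be generated shows up as a failed test.

```python
import subprocess
import sys


def kustomize_build(path, kustomize_cmd=("kustomize",)):
    """Run `kustomize build <path>` and return the rendered YAML.

    Raises RuntimeError carrying kustomize's stderr if the build fails.
    """
    result = subprocess.run([*kustomize_cmd, "build", path],
                            capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"kustomize build {path} failed:\n{result.stderr}")
    return result.stdout


if __name__ == "__main__":
    rendered = kustomize_build(sys.argv[1] if len(sys.argv) > 1 else "example")
    # Rough count of YAML documents, which kustomize separates with "---".
    docs = rendered.count("\n---\n") + 1
    print(f"{docs} documents rendered")
```

Injecting `kustomize_cmd` keeps the helper testable without a real kustomize install.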

The next step will be to use Prow, with workflows similar to what we do for Notebooks, in order to run our e2e tests there. https://github.com/kubeflow/kubeflow/blob/master/prow_config.yaml

@elikatsis
Member

Thanks for all the effort and the troubleshooting @kimwnasptd. Orchestrating all of this testing.. phew.. That's great!

/lgtm
/approve

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elikatsis, kimwnasptd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [elikatsis,kimwnasptd]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit d9d3fe2 into kubeflow:master Feb 10, 2022
@elikatsis elikatsis deleted the feature-kimwnasptd-e2e-tests branch February 10, 2022 19:27
kimwnasptd added a commit to arrikto/kubeflow-manifests that referenced this pull request Feb 15, 2022
* remove old test files

Signed-off-by: Kimonas Sotirchos <[email protected]>

* gitignore: Don't track pyc files

Signed-off-by: Kimonas Sotirchos <[email protected]>

* flake8: Introduce linting file

Signed-off-by: Kimonas Sotirchos <[email protected]>

* hack: Introduce scripts for cluster manipulation

Signed-off-by: Kimonas Sotirchos <[email protected]>

* tests: Add e2e test

Signed-off-by: Kimonas Sotirchos <[email protected]>

* GH action for running e2e test

Signed-off-by: Kimonas Sotirchos <[email protected]>

* Reduce the installed components and system reqs

Signed-off-by: Kimonas Sotirchos <[email protected]>

* kserve: Add simple kustomization file

To avoid having to use --load_restrictor none we'll need to wrap the
KServe manifests inside a kustomization.yaml file.

Signed-off-by: Kimonas Sotirchos <[email protected]>

* unittests: Fix unit tests

Signed-off-by: Kimonas Sotirchos <[email protected]>

* gh: Remove action for e2e tests

We should use prow instead to trigger our e2e tests.

Signed-off-by: Kimonas Sotirchos <[email protected]>
google-oss-prow bot pushed a commit that referenced this pull request Feb 16, 2022
* tests: Scripts for e2e tests (#2128)

* remove old test files

Signed-off-by: Kimonas Sotirchos <[email protected]>

* gitignore: Don't track pyc files

Signed-off-by: Kimonas Sotirchos <[email protected]>

* flake8: Introduce linting file

Signed-off-by: Kimonas Sotirchos <[email protected]>

* hack: Introduce scripts for cluster manipulation

Signed-off-by: Kimonas Sotirchos <[email protected]>

* tests: Add e2e test

Signed-off-by: Kimonas Sotirchos <[email protected]>

* GH action for running e2e test

Signed-off-by: Kimonas Sotirchos <[email protected]>

* Reduce the installed components and system reqs

Signed-off-by: Kimonas Sotirchos <[email protected]>

* kserve: Add simple kustomization file

To avoid having to use --load_restrictor none we'll need to wrap the
KServe manifests inside a kustomization.yaml file.

Signed-off-by: Kimonas Sotirchos <[email protected]>

* unittests: Fix unit tests

Signed-off-by: Kimonas Sotirchos <[email protected]>

* gh: Remove action for e2e tests

We should use prow instead to trigger our e2e tests.

Signed-off-by: Kimonas Sotirchos <[email protected]>

* Add networkpolicies under /contrib/networkpolicies (#2121)

* Create .gitkeep

* Add files via upload

* Create OWNERS

* Create README.md

* Delete default-deny-not-istio-system.yaml

* Create default-allow-same-namespace.yaml

* Create centraldashboard.yaml

* Create jupyter-web-app.yaml

* Create katib-ui.yaml

* Create kfserving-models-web-app.yaml

* Create ml-pipeline-ui.yaml

* Update ml-pipeline.yaml

* Create volumes-web-app.yaml

* Update kustomization.yaml

* Update OWNERS

* Sync kubeflow pipelines manifests 1.8.0 rc.2 (#2131)

* hack: Update pipelines sync script to change README

Signed-off-by: Kimonas Sotirchos <[email protected]>

* Update kubeflow/pipelines manifests from 1.8.0-rc.2

* Sync kubeflow kubeflow manifests v1.5.0 rc.1 (#2134)

* hack: Sync README for kubeflow/kubeflow sync-script

Extend the sync-script for kubeflow/kubeflow to also update the
components versions in the readme.

Signed-off-by: Kimonas Sotirchos <[email protected]>

* Update kubeflow/kubeflow manifests from v1.5.0-rc.1

* Sync kserve/models-web-app manifests (#2135)

* kserve: Rename from upstream to kserve

We will be including both kserve/kserve and kserve/models-web-app into
the manifests, so the names will need to reflect this.

Signed-off-by: Kimonas Sotirchos <[email protected]>

* kserve: Add manifests for the models-web-app

Include the MWA manifests from the v0.7.0 tag.
https://github.com/kserve/models-web-app/tree/v0.7.0

Signed-off-by: Kimonas Sotirchos <[email protected]>

* kserve: Include both kserve and mwa manifests

Signed-off-by: Kimonas Sotirchos <[email protected]>

* Update kubeflow/kfp-tekton manifests from v1.1.1 (#2141)

* hack: Update tekton script to edit README

The hack script for updating the kfp-tekton manifests should also be
updating the README file as well.

Signed-off-by: Kimonas Sotirchos <[email protected]>

* Update kubeflow/kfp-tekton manifests from v1.1.1

* Update manifests for Katib v0.13.0-rc.1 release (#2139)

* Update manifests for Katib v0.13.0-rc.1 release

* Change README

* readme: Remove MPI reference and add ingress distributions link (#2143)

* Closes #1963
* Remove unused MPI reference (PR #2119)

* Update kubeflow/pipelines manifests from 1.8.0 (#2144)

Signed-off-by: Kimonas Sotirchos <[email protected]>

* hack: Don't error if namespace kubeflow exists (#2140)

The helper setup scripts should not error when the namespaces already
exist.

Signed-off-by: Kimonas Sotirchos <[email protected]>

Co-authored-by: juliusvonkohout <[email protected]>
Co-authored-by: Andrey Velichkevich <[email protected]>
Co-authored-by: a9p <[email protected]>
VaishnaviHire pushed a commit to VaishnaviHire/manifests that referenced this pull request Aug 11, 2022
kevin85421 pushed a commit to juliusvonkohout/manifests that referenced this pull request Feb 28, 2023