Expand test framework to include upstream k8s testing #2826

Closed
CecileRobertMichon opened this issue Mar 30, 2020 · 38 comments
Labels
  • area/testing: Issues or PRs related to testing
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@CecileRobertMichon (Contributor) commented Mar 30, 2020

⚠️ Cluster API maintainers can ask to turn an issue-proposal into a CAEP when necessary, this is to be expected for large changes that impact multiple components, breaking changes, or new large features.

i.e., use CAPI to test Kubernetes

Goals

  1. Use CAPX as a deployer to test upstream k/k changes
  2. Use CAPX as a deployer to test k8s-sigs projects such as out-of-tree cloud providers
  3. Run upstream k8s conformance tests against CAPI
  4. Encourage reuse across different infra providers instead of maintaining bash scripts in each provider repo (right now there are scripts in CAPG, CAPA, and CAPZ, with significant overlap). We intend to extend the current test/framework to allow this proposal to be implemented there.

Non-Goals/Future Work

  1. Add a kubetest deployer for CAPI.
  2. Run the tests as PR gates on k/k.

User Story

As a developer, I would like to run k/k E2E tests on a CAPI cluster to test changes made to a k8s component.

Detailed Description

NOTE: this is a very rough draft based on the working group meeting (recording here). It will evolve as we continue the discussion with the wider community and work out implementation details; for now the goal is just to get the discussion started with this issue.

  1. Build k/k binaries (e2e.test, kubectl, ginkgo)
  2. (optionally) build k8s component images from private SHA (if the images aren't already available on a registry)
  3. Create a cluster with custom k8s binaries & container images
    In order to use a custom k8s build (e.g. k/k master), there are a few different options:
  • Build a custom image with image-builder as part of CI and use that image in the cluster
    • pros: can reuse the image for multiple nodes
    • cons: time-consuming; building a VM image with Packer takes ~20 minutes
  • Use an existing image (possibly with a different k8s version) and, in the KubeadmConfig, pass in a PreKubeadmCommand script that replaces the k8s version with the one we want (see the sketch after this list).
    • pros: doesn't require building an image, faster
    • cons: we have to do this for every VM; hacky (the bash script might be error-prone); different from the user experience with CAPI (with images)
  • Modify capi infra providers to take custom k8s component images
    • pros: can be reused more easily by users not familiar with the project and CI, doesn't require the preKubeadm "hack" script, or reusing a VM image.
    • cons: more work and changes involved.
  4. Run test suites: k/k E2E, cloud provider E2E, other k8s-sigs E2E, etc.
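
For the second option above (preKubeadmCommands), the sketch below shows roughly what a KubeadmConfigTemplate excerpt could look like. It is only an illustration: the artifact URL, the CI_VERSION value, and the script path are placeholders, and the real scripts in the provider repos handle more components and edge cases.

# Illustrative KubeadmConfigTemplate excerpt; the artifact URL, CI_VERSION, and
# script path are placeholders, not the exact scripts living in CAPA/CAPG/CAPZ today.
spec:
  template:
    spec:
      files:
        - path: /tmp/replace-k8s-binaries.sh
          owner: root:root
          permissions: "0744"
          content: |
            #!/bin/bash
            set -euo pipefail
            CI_VERSION="v1.19.0-alpha.1.175+7b1a531976be0d"
            CI_URL="https://example.com/k8s-ci-artifacts/${CI_VERSION}/bin/linux/amd64"
            # Overwrite the binaries baked into the VM image with the CI build under test.
            for bin in kubeadm kubectl kubelet; do
              curl -fsSL "${CI_URL}/${bin}" -o "/usr/bin/${bin}"
              chmod +x "/usr/bin/${bin}"
            done
            # Pick up the replaced kubelet binary.
            systemctl restart kubelet
      preKubeadmCommands:
        - bash /tmp/replace-k8s-binaries.sh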

Related to #2141, which might overlap in implementation details but has different objectives: #2141 aims to test whether capi/capz/capa/capv/capg/etc. passes k8s conformance, whereas this proposal is about using CAPI as a dev tool to test k8s and k8s-sigs changes.

/kind proposal

cc @dims @vincepri @alexeldeib @fabriziopandini @ritazh @chewong @randomvariable @rbitia

@k8s-ci-robot added the kind/proposal label Mar 30, 2020
@vincepri (Member)

Thanks for the write-up Cecile!

I'd expand the 4th goal by mentioning that we intend to extend the current test/framework to allow this proposal to be implemented there.

Modify capi infra providers to take custom k8s component images

Would this entail changes to image-builder to take some custom scripts that can set up images in a custom fashion? We should probably tackle this separately; it's a really interesting idea and would make the images generic, although I assume folks will probably need internet access, so it might not work in every environment.

@timothysc (Member)

Modify capi infra providers to take custom k8s component images
pros: can be reused more easily by users not familiar with the project and CI, doesn't require the preKubeadm "hack" script, or reusing a VM image.

+1 ^ this is generally more useful for testing, but I still see a problem with the kubelet. You can override almost everything else, but the kubelet running on the base OS built by image-builder is not easily replaced unless you combine an rpm/deb update/install via cloud-init.

@CecileRobertMichon (Contributor, Author) commented Mar 31, 2020

@vincepri @timothysc what I meant by

Modify capi infra providers to take custom k8s component images

Is that instead of using preKubeadmCommand to pass in the script that overrides the k8s version, we add a new property, maybe behind a feature gate, to pass in a "custom" k8s version (what we call CI_VERSION in the script above) or custom k8s component images, and run a script to install that version on the VMs before running the bootstrap script or as part of the bootstrap script.

A better place for this might actually be the bootstrap provider, not the infra providers, now that I think about it. @vincepri I don't think this entails changes to image-builder, as I'm not talking about building any new images but rather using cloud-init to install k8s components during provisioning. This does require internet access, but so does our current preKubeadmCommand solution. The advantage here is that it would be more reusable, and we could use it in combination with a user-provided preKubeadmCommand.
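
To make that concrete, here is a purely hypothetical sketch of what such a feature-gated field on the bootstrap provider could look like; the customKubernetesVersion field name is invented only for illustration and does not exist in the KubeadmConfig API today.

# Hypothetical sketch only; customKubernetesVersion is an invented, feature-gated
# field, not part of the bootstrap provider's API.
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfig
metadata:
  name: example-custom-ci-build
spec:
  # Proposed idea: install this CI build on the VM before running kubeadm,
  # replacing today's preKubeadmCommands "hack" script.
  customKubernetesVersion: v1.19.0-alpha.1.175+7b1a531976be0d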

@timothysc for kubelet we'd need to do a systemctl restart kubelet after installing the desired kubelet binary, just like we do in the preKubeadmCommand right now.

The other possibility is to change kubeadm to allow passing in custom component images (if it's not already supported; I don't think it is, from what I've seen). So your kubeadm config would look something like:

kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        name: '{{ ds.meta_data["local_hostname"] }}'
        # customKubeletVersion is a proposed field; it does not exist in kubeadm today
        customKubeletVersion: v1.19.0-alpha.1.175+7b1a531976be0d
        kubeletExtraArgs:
          cloud-provider: azure
          cloud-config: /etc/kubernetes/azure.json
    joinConfiguration:
      nodeRegistration:
        name: '{{ ds.meta_data["local_hostname"] }}'
        customKubeletVersion: v1.19.0-alpha.1.175+7b1a531976be0d
        kubeletExtraArgs:
          cloud-provider: azure
          cloud-config: /etc/kubernetes/azure.json
    clusterConfiguration:
      apiServer:
        timeoutForControlPlane: 20m
        # customImage is a proposed field; it does not exist in kubeadm today
        customImage: myDockerHubUser/custom-api-server-build:v1.19.0-dirty
        extraArgs:
          cloud-provider: azure
          cloud-config: /etc/kubernetes/azure.json
        extraVolumes:
          - [...]
      controllerManager:
        customImage: myDockerHubUser/custom-controller-manager-build:v1.19.0-dirty
        extraArgs:
          cloud-provider: azure
          cloud-config: /etc/kubernetes/azure.json
          allocate-node-cidrs: "false"
        extraVolumes:
          - [...]

And have kubeadm pull the right images / components before init/join in cloud-init. Basically I'm just trying to think of ways we can build a k8s cluster with custom builds of various k8s components installed, without having to build VM images in every test.
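
For contrast with the proposed customImage/customKubeletVersion fields above, here is a minimal sketch of the closest knobs kubeadm's existing ClusterConfiguration already exposes, assuming the custom builds are pushed to a single registry under a consistent tag; the registry name and version/tag values are placeholders, and these fields map onto the clusterConfiguration section shown above.

# Existing kubeadm v1beta2 ClusterConfiguration fields only (no proposed fields);
# the registry name and version/tag values are illustrative placeholders.
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
# Control plane images are pulled as <imageRepository>/<component>:<tag derived from
# kubernetesVersion>, so one registry and one tag covers apiserver, controller-manager,
# scheduler, and kube-proxy, but there is no per-component image override here.
kubernetesVersion: v1.19.0-alpha.1
imageRepository: registry.example.com/custom-k8s
etcd:
  local:
    # etcd (and CoreDNS) images can be overridden independently.
    imageRepository: k8s.gcr.io
    imageTag: 3.4.3-0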

@alexeldeib (Contributor)

FYI, in 1.16 kubeadm started supporting Kustomize patches (-k flag) on the static manifests. Might be useful:
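
As a rough illustration, a strategic merge patch like the one below, dropped into the directory passed to the -k flag, could swap the kube-apiserver image in the generated static pod manifest; treat this as a sketch of that experimental 1.16 feature rather than a tested recipe, and the image reference is a placeholder.

# kube-apiserver patch applied over the generated static pod manifest;
# the image reference is an illustrative placeholder.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
    - name: kube-apiserver
      image: myDockerHubUser/custom-api-server-build:v1.19.0-dirty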

@detiber (Member) commented Mar 31, 2020

I kind of like the idea of adding support to the bootstrap provider (and hiding it behind a feature gate). It would allow us to recreate the existing functionality in a more centralized and re-usable way than exists today.

If nothing else it would provide a good stopgap until we can better define an automated pipeline where we could consume images that are automatically built using image-builder from the latest k8s artifacts.

@detiber (Member) commented Mar 31, 2020

FYI, in 1.16 kubeadm started supporting Kustomize patches (-k flag) on the static manifests. Might be useful:

We would either need to add validation of kustomize patches against the requested k8s version, or wait until we are ready to declare that we are only willing to support workload clusters >= v1.16, if we go down that path.

@fabriziopandini (Member)

I'd expand the 4th goal

+1 to this. I would also like to consider the idea of having Cluster API conformance tests (as a next step for the work started with #2753).

The other possibility is to change kubeadm to allow passing in custom component images

This should already be possible; I can give examples if required.

@CecileRobertMichon (Contributor, Author)

@fabriziopandini would love examples if you have them

@elmiko (Contributor) commented Apr 1, 2020

just to add an extra layer to this conversation, i am looking at contributing some e2e tests for the kubernetes autoscaler that use cluster-api. although we will start by using the docker provider to help keep the resources low, i think it would not be difficult to have these tests also use cloud providers at some point.

@vincepri (Member) commented Apr 1, 2020

/milestone v0.3.x

@k8s-ci-robot added this to the v0.3.x milestone Apr 1, 2020
@CecileRobertMichon (Contributor, Author)

@vincepri with the new v1alpha3+ roadmap should this be 0.3.x or 0.4.x?

@vincepri (Member)

@CecileRobertMichon This could be added to v0.3.x in a backward compatible way. I'm unclear though if we have folks interested in working on it.

@CecileRobertMichon (Contributor, Author)

is it okay to mark this with help wanted? I can probably help with some of it but I don't think I have bandwidth to work on it full time right away.

@vincepri (Member)

/help

@k8s-ci-robot (Contributor)

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the help wanted label Apr 17, 2020
@vincepri (Member)

/kind cleanup

@k8s-ci-robot added the kind/cleanup label Apr 27, 2020
@vincepri added the kind/feature label and removed the kind/proposal label Apr 27, 2020
@randomvariable (Member)

/assign

CAPA conformance was giving me grief so i kind of started doing it.

/lifecycle active

@k8s-ci-robot added the lifecycle/active label Jun 18, 2020
@elmiko (Contributor) commented Jun 18, 2020

hey @randomvariable, just an update to the comment i made previously in this thread. i have started to hack on an experiment where i have broken out the autoscaler tests from upstream and started to make them into a library.

the general idea is that currently the upstream autoscaler tests are heavily tied to the gce/gke provider. i am working towards rewriting the provider interface so that it could be used generically (i.e. more widely applicable abstractions). the end result would be a series of tests that can be consumed as a library, with the user passing in a provider at compile time, in essence providing a generic suite of tests that can be consumed from the outside (no included providers).

i certainly don't think you should wait for me, but i wanted to let you know what i've been hacking on.

@vincepri (Member)

/milestone v0.4.0

@fabriziopandini (Member)

@CecileRobertMichon FYI, in kubeadm we are using images from CI only, because we determined that having a small delay from the tip of Kubernetes is not a problem, especially given the code freeze near release.
So I personally see CAPD - with its own build from source - as an exception, not the rule we have to follow.

@jsturtevant (Contributor)

Trying to understand my options and roadmap for running upstream k8s e2e tests using CAPI (starting to look at this for Windows). Looks like we have a few options.

This is an attempt to summarize where we are right now:

In CAPZ we also have:

  • ci-entrypoint.sh (custom script that builds k8s and runs those binaries)
    • this could be removed once we have a kubetest2 deployer with the ability to support --build?

Is the end goal to be able to support all of this functionality via a kubetest2 deployer?

@fabriziopandini (Member)

@jsturtevant IMO #3652 and #4041 are dealing with two different goals:

  1. Ensure Cluster API stability (Test Cluster API itself)
  2. Test Kubernetes using Cluster API (Cluster API as a release blocker in Kubernetes)

I think that for this specific issue, 2/#4041 is the most appropriate answer, but I'm not sure if/how the two things could converge.
With respect to this, it might be that using kubetest as a package name in #3652 wasn't a good choice...

@vincepri (Member)

/milestone v0.4.x

@k8s-ci-robot modified the milestones: v0.4.0, v0.4.x Feb 19, 2021
@CecileRobertMichon modified the milestones: v0.4.x, v0.4 Mar 22, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jun 21, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jul 21, 2021
@fabriziopandini (Member)

/remove-lifecycle rotten
@CecileRobertMichon what about breaking down this issue into a set of small actionable items?

@k8s-ci-robot removed the lifecycle/rotten label Jul 21, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Oct 19, 2021
@vincepri modified the milestones: v0.4, v1.1 Oct 22, 2021
@randomvariable (Member)

/unassign

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Dec 5, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor)

@k8s-triage-robot: Closing this issue.

In response to this:


/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
