Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable e2e tests using emulators for cloud providers [ aws, gcp, azure ] #743

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

anveshreddy18
Copy link
Contributor

@anveshreddy18 anveshreddy18 commented May 22, 2024

What this PR does / why we need it:

This PR enables e2e tests ( cluster based tests ) to be run with the emulators of 3 major cloud providers i.e AWS, GCP, AZURE. And of course can be run with the real infra as well. The reason for using emulators is to reduce the cost of using real infra every time a PR is raised because with the enablement of e2e tests with this PR, these tests will be run as prow job for every PR.

This PR also partially updates the outdated charts to reflect the changes made to the backup-restore over the years. This will help the users with setting up backup-restore effortlessly and also helps the developers with testing etcdbr standalone

Which issue(s) this PR fixes:
Fixes #722

Special notes for your reviewer:

Release note:

Enable e2e tests with cloud provider emulators and update helm charts for latest version of `etcdbr` to be used with etcd-wrapper

@gardener-robot gardener-robot added the needs/rebase Needs git rebase label May 22, 2024
@gardener-robot
Copy link

@anveshreddy18 You need rebase this pull request with latest master branch. Please check.

@gardener-robot gardener-robot added needs/review Needs review size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py) needs/second-opinion Needs second review by someone else labels May 22, 2024
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 22, 2024
Copy link

gitguardian bot commented May 22, 2024

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
- RSA Private Key 5454c29 hack/e2e-test/infrastructure/kind/kubeconfig View secret
- RSA Private Key dd4a097 hack/e2e-test/infrastructure/kind/kubeconfig View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@gardener-robot-ci-2 gardener-robot-ci-2 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels May 22, 2024
@gardener-robot-ci-3 gardener-robot-ci-3 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels May 22, 2024
@anveshreddy18 anveshreddy18 self-assigned this May 23, 2024
@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels May 29, 2024
@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels May 29, 2024
@shreyas-s-rao shreyas-s-rao added this to the v0.30.0 milestone Jun 5, 2024
@gardener-robot-ci-3 gardener-robot-ci-3 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jun 6, 2024
@anveshreddy18 anveshreddy18 marked this pull request as ready for review June 6, 2024 09:07
@anveshreddy18 anveshreddy18 requested a review from a team as a code owner June 6, 2024 09:07
@renormalize renormalize self-assigned this Jun 6, 2024
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jun 10, 2024
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jun 10, 2024
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jun 10, 2024
@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jun 18, 2024
@shreyas-s-rao shreyas-s-rao self-assigned this Jun 24, 2024
Copy link
Member

@renormalize renormalize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the much needed PR @anveshreddy18!

I haven't taken a look at the changes in test/integrationcluster just yet. Will take a look in a following review.

@@ -155,6 +158,9 @@ EOF
}
EOF
export AWS_APPLICATION_CREDENTIALS_JSON="${credentials_file}"
export AWS_ACCESS_KEY_ID=$1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these variables being exported as env variables when they weren't needed to be exposed before?
The aws cli picks up credentials from .aws that is created in this function anyway, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

I would like to avoid using environment variables to pass credential information to the etcd-backup-restore process (not just for the main codebase, but for the tests as well).

pullPolicy: IfNotPresent
# etcd-backup-restore image to use
etcdBackupRestore:
repository: europe-docker.pkg.dev/gardener-project/public/gardener/etcdbrctl
tag: v0.12.1
tag: v0.28.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a latest tag available for etcd-backup-restore the way there is for etcd-druid?
It is done that way in etcd-druid's values.yaml

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point @renormalize . Etcd-druid pipeline_definitions defines tag_as_latest field for the docker image here, but is missing for etcdbrctl and etcd-wrapper. Thanks for pointing it out. I'll raise PRs to fix this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've raised gardener/etcd-wrapper#39 and #824 to add latest tag, but only to the upcoming releases. I have also tagged the latest versions (etcd-wrapper:v0.3.0 and etcdbrctl:v0.32.0) as latest in the artefact registry, so you can go ahead and use the latest tag already.

- ETCD_VERSION: optional, defaults to `v3.4.13-bootstrap-1`
- ETCDBR_VERSION: optional, defaults to `v0.12.1`
- ETCD_VERSION: optional, defaults to etcd-wrapper `v0.1.1`
- ETCDBR_VERSION: optional, defaults to `v0.28.0`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the latest tag is available as mentioned in this comment, this would have to be updated as well.


The e2e tests for etcd-backup-restore are the integrationcluster tests in the `test/integrationcluster` package. These tests are run on a Kubernetes cluster and test the full functionality of etcd-backup-restore. The tests create a provider namespace on the cluster and deploy the [etcd-backup-restore helm chart](../../chart/etcd-backup-restore) which in turn deploys the required secrets, configmap, services and finally the statefulset which deploys the pod that runs etcd and backup-restore as a sidecar.

These tests are setup to be run with both emulators and real cloud providers. The emulators can be used for local development and testing as well as prow job to test code changes when a PR is raised. The real cloud providers can be used for testing in a real cloud environment to ensure that the changes work as expected in a real environment.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't necessarily think mentioning testing through prow jobs here is relevant. But don't have a strong opinion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @renormalize . This doc should simply mention about how e2e tests can be run against any kubernetes cluster, and that etcd-backup-restore leverages a KIND cluster for this purpose. The code/tests have no idea about where they are run (in a prow cluster or otherwise), so that need not be mentioned here.

make ci-e2e-kind PROVIDERS="aws,gcp"
```


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Comment on lines +75 to +94
#### AWS

For AWS, first get the AWS credentials and update the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_DEFAULT_REGION` variables of the `hack/config/aws_config.sh` file with the correct values and also update the creation of `/tmp/aws.json` file with the correct values. It should look like below snippet after removing the `endpoint` and `s3ForcePathStyle` fields:

```sh
export AWS_APPLICATION_CREDENTIALS_JSON="/tmp/aws.json"
echo "{ \"accessKeyID\": \"${AWS_ACCESS_KEY_ID}\", \"secretAccessKey\": \"${AWS_SECRET_ACCESS_KEY}\", \"region\": \"${AWS_DEFAULT_REGION}\" }" > "${AWS_APPLICATION_CREDENTIALS_JSON}"
```

Apart from providing the required credentials for real provider, one should also remove the variables that signify the usage of an emulator, which are `AWS_ENDPOINT_URL_S3` and `LOCALSTACK_HOST` for Localstack. Remove them from the make command of the `hack/ci-e2e-kind.sh` script before running the tests.

With these changes made, the tests can be run in the same way as with the emulators.

#### GCP

For GCP, first get the GCP credential service account json file and your project ID and replace the variables `GOOGLE_APPLICATION_CREDENTIALS` and `GCP_PROJECT_ID` in configuration file located at `hack/config/gcp_config.sh` with the path to this service account file and your project ID respectively. We also need to remove the environment variables for the fakegcs provider from the make command of the `hack/ci-e2e-kind.sh` file, for that one should remove the variables `GOOGLE_EMULATOR_ENABLED`, `GCS_EMULATOR_HOST` and `GOOGLE_STORAGE_API_ENDPOINT` and run the tests in the same way as with the emulators.

#### Azure

For Azure, first get the Azure credentials and update the `STORAGE_ACCOUNT` and `STORAGE_KEY` env variables in the configuration file located at `hack/config/azure_config.sh` with the correct values. Also, remove the variables for the azurite, which are `AZURE_STORAGE_API_ENDPOINT`, `AZURE_EMULATOR_ENABLED`, `AZURITE_HOST` and `AZURE_STORAGE_CONNECTION_STRING` from the make command of the `hack/ci-e2e-kind.sh` script. With these changes made, the tests can be run in the same way as with the emulators.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, there is no heading signifying that these paragraphs are describing running end to end tests with real infrastructure.
A sub-heading could be added, and these three sections could be nested in that.

Also, I do not agree with the recommendation of changing the hardcoded credentials for the emulators with the credentials for real infrastructure in the scripts.
It is very likely that a user would forget that they updated the emulator credentials with real credentials, and they check those credentials into their code, and push it to their remotes.

If someone wants to do it at their own risk they could, but we should not specify in the documentation that this is the recommended way of testing with real infrastructure.

Comment on lines +20 to +21
/bin
/hack/tools/bin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to add the / in front of bin and hack?
It would be cleaner if these lines are consistent with the rest of the file where / is not added to the beginning.

function cleanup() {
if containsElement $STEPS "cleanup" || [ $cleanup_required = true ]; then
kind delete cluster --name etcdbr-e2e
echo "Cleaning up completed."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the kubeconfig file that is auto generated through kind, also be removed in the cleanup?
If you think that's unnecessary then we don't need to do it.

Copy link
Collaborator

@shreyas-s-rao shreyas-s-rao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anveshreddy18 thanks for the awesome and well-written PR! I have a few review comments from my side, mainly to do with renaming integrationcluster to e2e, and to avoid the use of environment variables to pass credential information, and a few nits. PTAL, thanks.

@@ -155,6 +158,9 @@ EOF
}
EOF
export AWS_APPLICATION_CREDENTIALS_JSON="${credentials_file}"
export AWS_ACCESS_KEY_ID=$1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

I would like to avoid using environment variables to pass credential information to the etcd-backup-restore process (not just for the main codebase, but for the tests as well).

Comment on lines +130 to +132
export AWS_DEFAULT_REGION=${REGION}
export AWS_ACCESS_KEY_ID=`cat ${HOME}/.aws/credentials | grep -e "^.*aws_access_key_id.*$" | sed "s/^.*aws_access_key_id[ ]*=[ ]*//"`
export AWS_SECRET_ACCESS_KEY=`cat ${HOME}/.aws/credentials | grep -e "^.*aws_secret_access_key.*$" | sed "s/^.*aws_secret_access_key[ ]*=[ ]*//"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. If the credentials can be fetched from the .aws folder, or from an applicationCredentials JSON file, then I would prefer that over exposing the env vars

Comment on lines +296 to +299
export ETCD_VERSION=${ETCD_VERSION:-"v0.1.1"}
echo "Etcd version: ${ETCD_VERSION}"

export ETCDBR_VERSION=${ETCDBR_VERSION:-${ETCDBR_VER:-"v0.12.1"}}
export ETCDBR_VERSION=${ETCDBR_VERSION:-${ETCDBR_VER:-"v0.29.0-dev"}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please update the etcd-wrapper and etcd-backup-restore versions to the latest versions?

echo "Etcd-backup-restore version: ${ETCDBR_VERSION}"

echo "Starting integration tests on k8s cluster."

set +e

if [ -r "$INTEGRATION_TEST_KUBECONFIG" ]; then
KUBECONFIG=$INTEGRATION_TEST_KUBECONFIG STORAGE_CONTAINER=$TEST_ID ginkgo -v -timeout=15m -mod=vendor test/e2e/integrationcluster
KUBECONFIG=$INTEGRATION_TEST_KUBECONFIG STORAGE_CONTAINER=$TEST_ID PROVIDER=aws ginkgo -v -timeout=15m -mod=vendor test/integrationcluster
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we defaulting to aws here, given that the PR adds support for all the 3 major providers? Shouldn't the providers option be taking from the args passed to the integration_test script?

Comment on lines +19 to +20
TOOLS_DIR := hack/tools
include $(REPO_ROOT)/hack/tools.mk
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Etcdbr master branch now includes

include hack/tools.mk

Can you please replace that with your changes - basically rebase on latest master and take care of the merge conflicts.

providerAWS = "aws"
providerAzure = "azure"
providerGCP = "gcp"
providerNone = "none"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable

Comment on lines +55 to +65
envS3AccessKeyID = "AWS_ACCESS_KEY_ID"
envS3SecretAccessKey = "AWS_SECRET_ACCESS_KEY"
envS3Region = "AWS_DEFAULT_REGION"
envLocalStackHost = "LOCALSTACK_HOST"
envFakeGCSHost = "GOOGLE_EMULATOR_HOST"
envGoogleCreds = "GOOGLE_APPLICATION_CREDENTIALS"
envGoogleEmulatorEnabled = "GOOGLE_EMULATOR_ENABLED"
envAzureStorageAccount = "STORAGE_ACCOUNT"
envAzureStorageKey = "STORAGE_KEY"
envAzureEmulatorEnabled = "AZURE_EMULATOR_ENABLED"
envAzuriteHost = "AZURITE_HOST"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace env vars with file reads.

Comment on lines +51 to +53
abs = "ABS"
s3 = "S3"
gcs = "GCS"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already defined at pkg/types/snapstore. Please reuse those.

cmdArgs = append(cmdArgs, "--timeout", timeout.String())
}

cmd := exec.Command("helm", cmdArgs...)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recommend executing external commands on the host when the same can be achieved natively using libraries. Parsing command output from the exec() call is not easy, and error handling is even harder.

We previously used helm package to render the charts. Any particular reason to remove that?

close(doneCh)
return
default:
cmd = fmt.Sprintf("ETCDCTL_API=3 ./nonroot/hacks/etcdctl put foo-%d bar-%d", i, i)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can call the etcdctl command using etcdctl directly, since /nonroot/hacks is part of the PATH already. You can test this out once.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs/changes Needs (more) changes needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/rebase Needs git rebase needs/review Needs review needs/second-opinion Needs second review by someone else size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enhancement] Run cluster based integration tests as e2e tests & enable them in prow using emulators
8 participants