Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terraform test: Add AWS persistent #30809

Merged
merged 3 commits into from
Feb 14, 2025
Merged

Conversation

def-
Copy link
Contributor

@def- def- commented Dec 12, 2024

Test run: https://buildkite.com/materialize/qa-canary/builds/404
Second test run: https://buildkite.com/materialize/qa-canary/builds/408

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@def- def- requested a review from a team as a code owner December 12, 2024 18:02
@def-
Copy link
Contributor Author

def- commented Dec 12, 2024

@bobbyiliev The test works locally, but fails in CI: https://buildkite.com/materialize/qa-canary/builds/338#0193bc0c-9a96-41f3-a136-927eba3f5477

An error occurred (AccessDeniedException) when calling the DescribeCluster operation: User: arn:aws:sts::834237029485:assumed-role/buildkite-aarch64-small-d306b64-Role/i-0346d14d0e83e2d8c is not authorized to perform: eks:DescribeCluster on resource: arn:aws:eks:us-east-1:834237029485:cluster/aws-persistent-cluster

Can I easily tell terraform that anyone in the Materialize org can have access to this cluster?

@def- def- force-pushed the pr-aws-persistent branch 2 times, most recently from 200e187 to c6b838c Compare December 12, 2024 18:20
@def-
Copy link
Contributor Author

def- commented Dec 12, 2024

@bobbyiliev There seems to be another problem. I think the security group and subnet are not using a prefix. Now that I added the persistent aws setup in this PR the temporary aws terraform setup is failing: https://buildkite.com/materialize/nightly/builds/10674#0193bc9c-f90a-45eb-8494-11b525a7a606

│ Error: deleting Security Group (sg-0dfe7eb420640183b): operation error EC2: DeleteSecurityGroup, https response error StatusCode: 400, RequestID: eba72284-122d-4bdb-8c17-fdb6cca811ea, api error DependencyViolation: resource sg-0dfe7eb420640183b has a dependent object
│ Error: deleting EC2 Subnet (subnet-04a826bd3eab3b285): operation error EC2: DeleteSubnet, https response error StatusCode: 400, RequestID: 332610d1-f9a5-4df5-ae4c-cd430fd29596, api error DependencyViolation: The subnet 'subnet-04a826bd3eab3b285' has dependencies and cannot be deleted.

Can you take a look please?

@def-
Copy link
Contributor Author

def- commented Dec 12, 2024

I also tried granting the CI role permissions to the EKS cluster, but still seeing the same:

An error occurred (AccessDeniedException) when calling the DescribeCluster operation: User: arn:aws:sts::834237029485:assumed-role/buildkite-aarch64-small-d306b64-Role/i-00838a09531c9f213 is not authorized to perform: eks:DescribeCluster on resource: arn:aws:eks:us-east-1:834237029485:cluster/aws-persistent-cluster

@bobbyiliev
Copy link
Contributor

@bobbyiliev There seems to be another problem. I think the security group and subnet are not using a prefix. Now that I added the persistent aws setup in this PR the temporary aws terraform setup is failing: https://buildkite.com/materialize/nightly/builds/10674#0193bc9c-f90a-45eb-8494-11b525a7a606

Just submitted a PR to update a hardcoded prefix.

I also tried granting the CI role permissions to the EKS cluster, but still seeing the same:

Is there an easy way for us to get the CI role during the run itself? I think that we could extend the terraform module to accept an extra parameter and add the role to the cluster dynamically.

@def- def- force-pushed the pr-aws-persistent branch from c6b838c to 8adb562 Compare January 13, 2025 15:40
@def-
Copy link
Contributor Author

def- commented Jan 13, 2025

Is there an easy way for us to get the CI role during the run itself? I think that we could extend the terraform module to accept an extra parameter and add the role to the cluster dynamically.

I don't think this would work because the terraform setup already exists. For now I'd like some way to extend the AWS cluster so that every role has access to it. It's the only thing still blocking this PR: https://buildkite.com/materialize/qa-canary/builds/372#01946058-36fc-42a1-81fb-e010fe36479e @bobbyiliev @jseldess Do you have any idea how to achieve that?

@bobbyiliev
Copy link
Contributor

I don't think this would work because the terraform setup already exists. For now I'd like some way to extend the AWS cluster so that every role has access to it. It's the only thing still blocking this PR: https://buildkite.com/materialize/qa-canary/builds/372#01946058-36fc-42a1-81fb-e010fe36479e @bobbyiliev @jseldess Do you have any idea how to achieve that?

I am not familiar with how Buildkite is setup but as far as I can tell the issue is not with the Kubernetes and the Terraform setup but with the IAM assumed role used by the Buildkite agent which is lacking permissions to perform the eks:DescribeCluster action.

So the Buildkite job is failing at the IAM layer, well before even talking to Kubernetes RBAC, because its IAM principal does not have permission to run eks:DescribeCluster.

That said, I don't know if we can modify that Builtkite role during runtime?

@def- def- force-pushed the pr-aws-persistent branch 2 times, most recently from 36ad20f to 773baaf Compare February 1, 2025 10:23
@def- def- force-pushed the pr-aws-persistent branch from 773baaf to ab342d5 Compare February 7, 2025 17:20
@def- def- requested a review from jubrad February 7, 2025 17:31
@def-
Copy link
Contributor Author

def- commented Feb 7, 2025

Ready for review now

@def- def- force-pushed the pr-aws-persistent branch 2 times, most recently from 002af81 to 1fab6a8 Compare February 7, 2025 22:19
@def- def- enabled auto-merge February 7, 2025 22:20
@def- def- force-pushed the pr-aws-persistent branch from 1fab6a8 to 225fecd Compare February 10, 2025 08:03
@def- def- requested a review from jubrad February 10, 2025 08:03
@def- def- force-pushed the pr-aws-persistent branch 7 times, most recently from 18e517f to d8c8e84 Compare February 12, 2025 17:36
@def-
Copy link
Contributor Author

def- commented Feb 12, 2025

@jubrad @bobbyiliev In the last commit I have switched to the approach in MaterializeInc/terraform-aws-materialize#27 (install_materialize_operator = true), but now Materialize seems to have the wrong helm chart version:

==> mzcompose: test case workflow-aws-temporary failed: builtins.AssertionError: Actual version: v0.133.0-dev.0 (9ff02f79d, helm chart: v25.1.0), expected to contain v25.2.0-beta.1
Traceback (most recent call last):
  File "/home/deen/git/materialize3/misc/python/materialize/mzcompose/composition.py", line 603, in test_case
    yield
  File "/home/deen/git/materialize3/misc/python/materialize/cli/mzcompose.py", line 692, in handle_composition
    composition.workflow(
  File "/home/deen/git/materialize3/misc/python/materialize/mzcompose/composition.py", line 485, in workflow
    func(self, parser)
  File "/home/deen/git/materialize3/test/terraform/mzcompose.py", line 578, in workflow_aws_temporary
    assert version.endswith(
AssertionError: Actual version: v0.133.0-dev.0 (9ff02f79d, helm chart: v25.1.0), expected to contain v25.2.0-beta.1

Can I make it use the actual misc/helm-charts/operator folder?

Edit: Removed the assert for now

@def- def- force-pushed the pr-aws-persistent branch 4 times, most recently from 6d646d2 to 15206e7 Compare February 13, 2025 08:53
@def- def- force-pushed the pr-aws-persistent branch from 15206e7 to 24ce98c Compare February 13, 2025 09:30
@def-
Copy link
Contributor Author

def- commented Feb 13, 2025

This can be submitted in this state, green test run: https://buildkite.com/materialize/nightly/builds/11128

We can improve it later.

database_password = "zdUXjK4dRBBqBiTMK9gbkL9zPMYMSTsj"
db_identifier = "aws-persistent-metadata-db"
postgres_version = "15"
db_instance_class = "db.t3.micro"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we really run on a t3.micro? 1 core/2 vcpu, and 1G mem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's working so far, but we don't have much of a load. I can scale it up if it becomes a problem

Comment on lines 251 to 265
spawn.runv(
[
"helm",
"install",
"materialize-operator",
"misc/helm-charts/operator",
"--namespace",
"materialize",
"--create-namespace",
"-f",
"-",
],
cwd=MZ_ROOT,
stdin=yaml.dump(materialize_values).encode(),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should n't the terraform-aws-materialize module be installing the materialize-operator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This code has been removed already)

@def- def- merged commit b793976 into MaterializeInc:main Feb 14, 2025
82 checks passed
@def- def- deleted the pr-aws-persistent branch February 14, 2025 04:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants