
Some Pre-built Kubernetes AMIs are “disappearing” #5131

Closed
juanjoku opened this issue Oct 2, 2024 · 15 comments

Labels
kind/bug Categorizes issue or PR as related to a bug. needs-priority triage/accepted Indicates an issue or PR is ready to be actively worked on.


@juanjoku

juanjoku commented Oct 2, 2024

/kind bug

What steps did you take and what happened:

Certain pre-built Kubernetes AMIs that existed a few days ago are no longer available.
I can reproduce the problem without Cluster API, simply by trying to use the AMI directly in AWS.

What did you expect to happen:

The documentation explains that specific AMIs ("pre-built Kubernetes AMIs") are available:
https://cluster-api-aws.sigs.k8s.io/topics/images/built-amis

Several days ago, using the clusterawsadm CLI, I could see that there were indeed AMIs in the eu-west-3 region.
I refined the search and found the one I was interested in (Ubuntu 20.04, Kubernetes v1.25.6).

$ clusterawsadm ami list --region eu-west-3 --kubernetes-version v1.25.6 --os
KUBERNETES VERSION   REGION           OS             NAME                                       AMI ID
v1.25.6              eu-west-3        ubuntu-20.04   capa-ami-ubuntu-20.04-v1.25.6-1675792181   ami-0dc483fa659842781

BUT...

Since yesterday, this AMI is no longer available,
and many others have been disappearing too!

Right now, I don't see any in that region:

$ clusterawsadm ami list --region eu-west-3
No AMIs found

I don't think the problem is limited to that region; the AMIs are being deleted (I don't know by what criteria, but it doesn't seem to be related to the age of the Kubernetes versions).

For example, a few days ago these commands returned a dozen AMIs, but today:

$ clusterawsadm ami list --os ubuntu-20.04 --kubernetes-version v1.25.6
No AMIs found

$ clusterawsadm ami list --os ubuntu-20.04 --kubernetes-version v1.28.2
No AMIs found

It's not just that AMIs have disappeared from the listing.
Yesterday, when using an AMI that DID appear in the ami list output, I was getting errors like this:

$ aws ec2 describe-images --image-id ami-03986b93868a41e8f
<response OK, no errors>

$ aws ec2 run-instances --image-id ami-03986b93868a41e8f --instance-type t2.large --count 1
An error occurred (InvalidSnapshot.NotFound) when calling the RunInstances operation: Failed trying to describe product codes for snapshot '216734523443'
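The RunInstances error points at a snapshot rather than at the AMI itself. One way to see which EBS snapshots an AMI references is to ask describe-images for its block device mappings. This is a minimal sketch assuming an installed and configured AWS CLI (credentials and default region); the function name ami_snapshot_ids is hypothetical and not part of clusterawsadm or CAPA.

```shell
# Hypothetical helper: print the EBS snapshot IDs behind an AMI, one per line.
# Assumes the AWS CLI is installed and configured (credentials + region).
ami_snapshot_ids() {
  # JMESPath projection over the block device mappings; mappings without
  # an Ebs section (e.g. instance-store volumes) contribute no value.
  # Text output prints the list on one tab-separated line, so split it.
  aws ec2 describe-images \
    --image-ids "$1" \
    --query 'Images[0].BlockDeviceMappings[].Ebs.SnapshotId' \
    --output text | tr '\t' '\n'
}
```

For example: ami_snapshot_ids ami-03986b93868a41e8f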

Anything else you would like to add:

I did some tests, and this only happened with Cluster API AMIs.

Environment:

This happens when simply using the AMIs, regardless of the Cluster API version.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 2, 2024
@nrb
Contributor

nrb commented Oct 2, 2024

/triage accept

@k8s-ci-robot
Contributor

@nrb: The label(s) triage/accept cannot be applied, because the repository doesn't have them.

In response to this:

/triage accept

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@richardcase
Member

We think the images are being deleted according to the deprecation time set on them.
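A minimal sketch for checking whether a given AMI has reached its deprecation time. It assumes a configured AWS CLI and that DeprecationTime comes back in EC2's fixed ISO-8601 UTC form (for example 2024-10-01T00:00:00.000Z); both function names are hypothetical. --include-deprecated asks EC2 to return the image even if it is deprecated. Deprecating an AMI does not by itself deregister it, so this only tells you whether the date has passed, not whether the image still exists.

```shell
# Hypothetical helpers (not part of clusterawsadm or CAPA).
# Assumption: DeprecationTime is an ISO-8601 UTC string in EC2's fixed
# format, e.g. "2024-10-01T00:00:00.000Z". Strings in one fixed-width
# format sort in time order, so a byte-wise comparison is enough.

# ami_deprecation_time AMI_ID [REGION]
# Prints the image's DeprecationTime, or "None" if none is set.
ami_deprecation_time() {
  aws ec2 describe-images \
    --image-ids "$1" \
    --include-deprecated \
    ${2:+--region "$2"} \
    --query 'Images[0].DeprecationTime' \
    --output text
}

# is_past_deprecation DEPRECATION_TIME [NOW]
# Succeeds (exit 0) if DEPRECATION_TIME is at or before NOW.
# NOW defaults to the current UTC time in the same format.
is_past_deprecation() {
  local deprecation_time="$1"
  local now="${2:-$(date -u +%Y-%m-%dT%H:%M:%S.000Z)}"
  case "$deprecation_time" in
    ""|None) return 1 ;;  # no deprecation time available
  esac
  # Sort both timestamps byte-wise; if the deprecation time sorts first
  # (or they are equal), it has already been reached.
  [ "$(printf '%s\n%s\n' "$deprecation_time" "$now" | LC_ALL=C sort | head -n 1)" = "$deprecation_time" ]
}
```

For example: is_past_deprecation "$(ami_deprecation_time ami-03986b93868a41e8f)" && echo "past its deprecation time". If the lookup fails (for instance because the image is gone), the empty result makes is_past_deprecation return non-zero.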

@nrb
Contributor

nrb commented Oct 2, 2024

/triage accepted

The account the images are stored in appears to be this one https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/test/e2e/shared/defaults.go#L40C42-L40C54

I can see AMIs from that account in us-east-1, but not other regions.

[Screenshot (2024-10-02, 1:32:01 PM): AMIs from that account in us-east-1]

As Richard pointed out, the oldest of these are reaching their deprecation dates, which is likely why they're no longer available for use.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 2, 2024
@nrb
Contributor

nrb commented Oct 2, 2024

kubernetes/k8s.io#6517 is in flight to move AMIs into a community-owned account. I'm not sure how we'll handle images passing their deprecation time; it seems like the ones that are expiring are 1.24-1.26. Will these be worth rebuilding?

@richardcase
Member

I thought we might be able to manually copy some of the existing images to a different account using these instructions, but that doesn't appear to be possible; I'm getting this error:

You do not have permission to access the storage of this ami
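For reference, an AMI copy with the AWS CLI is started with aws ec2 copy-image, run from the destination account. EC2 generally requires that, when copying an AMI shared by another account, the owner has granted read access to the snapshots backing it, which fits the error above. The wrapper below is a hypothetical sketch, not the instructions linked in the comment; the AMI IDs, regions, and name are placeholders.

```shell
# Hypothetical wrapper around EC2 CopyImage (not part of clusterawsadm/CAPA).
# Runs as the calling (destination) account; prints the new AMI ID.
# copy_ami SOURCE_AMI_ID SOURCE_REGION TARGET_REGION NAME
copy_ami() {
  aws ec2 copy-image \
    --source-image-id "$1" \
    --source-region "$2" \
    --region "$3" \
    --name "$4" \
    --query 'ImageId' \
    --output text
}
```

For example: copy_ami ami-0413b3957eabc41fe us-west-2 eu-west-3 my-capa-copy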

@richardcase
Member

richardcase commented Oct 2, 2024

A quick update. We are working to get this resolved ASAP. We have access to a new community AWS account and will start publishing AMIs to that account. There is some work to be done first: setting up the account and putting automation in place. I'll update this issue as more progress is made.

@richardcase
Member

it seems like the ones that are expiring are 1.24-1.26. Will these be worth rebuilding?

Good question. I'd suggest we don't go back as far as 1.24, and perhaps limit ourselves to going back to 1.27 or 1.28.

Does anyone have a strong opinion on this?

@juanjoku
Author

juanjoku commented Oct 3, 2024

FYI: I experienced the problem not only with v1.26, but also with v1.28, and even later.

In our case, we generated our own images (using the image-builder repository). I opened this issue mainly to find out whether those pre-built AMIs should exist.

Thx!!

@damdo
Copy link
Member

damdo commented Oct 23, 2024

@richardcase this is fixed by #5133 right?
Should we close this as resolved?

@richardcase
Copy link
Member

@damdo - I think maybe we should leave it open until there is a release with the new AMI account in it. WDYT?

@richardcase
Member

For anyone reading this: until the new release is out, you can see which AMIs are available using:

clusterawsadm ami list --region us-west-2 --owner-id 819546954734
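To pull a single AMI ID out of that listing for use in a template, a small awk filter is enough. This is a sketch that assumes the whitespace-separated column layout shown earlier in this issue (KUBERNETES VERSION, REGION, OS, NAME, AMI ID), with the version in the first field, the OS in the third, and the AMI ID last; the function name ami_id_for is hypothetical.

```shell
# Hypothetical filter (not part of clusterawsadm).
# ami_id_for OS K8S_VERSION
# Reads `clusterawsadm ami list` output on stdin and prints the AMI ID of
# the first row whose OS and Kubernetes version columns match.
# Assumed layout: $1 = version, $3 = OS, last field = AMI ID.
# The header row never matches because its first field is "KUBERNETES".
ami_id_for() {
  awk -v os="$1" -v ver="$2" '
    $1 == ver && $3 == os { print $NF; exit }
  '
}
```

For example, with the OS and version from the original report (whether the new account publishes that combination is not stated here): clusterawsadm ami list --region us-west-2 --owner-id 819546954734 | ami_id_for ubuntu-20.04 v1.25.6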

Then use the AMI ID in your AWSMachineTemplate like this:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: test1-md-0
  namespace: default
spec:
  template:
    spec:
      ami:
        id: ami-0413b3957eabc41fe
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      instanceType: t3.large
      rootVolume:
        size: 500

@damdo
Member

damdo commented Oct 27, 2024

@damdo - I think maybe we should leave it open until there is a release with the new AMI account in it. WDYT?

Sounds reasonable yeah.

@richardcase
Member

Release v2.7.1, which uses the new AWS account for AMIs, has been published. So:

/close

@k8s-ci-robot
Contributor

@richardcase: Closing this issue.

In response to this:

Release v2.7.1, which uses the new AWS account for AMIs, has been published. So:

/close


5 participants