Deprecate ECR repo provided by optional-test-infra #1008

Closed
PatrickXYS opened this issue Jun 14, 2022 · 17 comments

@PatrickXYS
Member

As a follow-up to deprecating optional-test-infra, we'll deprecate all the ECR repos that optional-test-infra provides.

Private ECR Repo List

kubeflow/katib: 809251082950.dkr.ecr.us-west-2.amazonaws.com/katib/v1beta1/

  • cert-generator
  • earlystopping-medianstop
  • file-metrics-collector
  • katib-db-manager
  • suggestion-chocolate
  • suggestion-darts
  • suggestion-enas
  • suggestion-goptuna
  • suggestion-hyperband
  • suggestion-hyperopt
  • suggestion-optuna
  • suggestion-skopt
  • tfevent-metrics-collector
  • trial-darts-cnn-cifar10
  • trial-enas-cnn-cifar10-cpu
  • trial-enas-cnn-cifar10-gpu
  • trial-mxnet-mnist
  • trial-pytorch-mnist
  • trial-tf-mnist-with-summaries

kserve/kserve: 809251082950.dkr.ecr.us-west-2.amazonaws.com/kserve/

  • agent
  • aix-explainer
  • alibi-explainer
  • art-explainer
  • batcher
  • image-transformer
  • kserve-controller
  • lgbserver
  • paddleserver
  • pmmlserver
  • pytorchserver
  • sklearnserver
  • storage-initializer
  • xgbserver

kubeflow/training related: 809251082950.dkr.ecr.us-west-2.amazonaws.com/kserve/

  • pytorch-operator
  • tf-operator
  • training-operator

Public ECR Repo List

kubeflow/kubeflow: public.ecr.aws/j1r0q0g6/notebooks

  • access-management
  • admission-webhook
  • central-dashboard
  • jupyter-web-app
  • notebook-controller
  • notebook-servers/base
  • notebook-servers/codeserver
  • notebook-servers/jupyter
  • notebook-servers/jupyter-cuda
  • notebook-servers/jupyter-pytorch-cuda
  • notebook-servers/jupyter-pytorch-cuda-full
  • notebook-servers/jupyter-pytorch-full
  • notebook-servers/jupyter-scipy
  • notebook-servers/jupyter-tensorflow
  • notebook-servers/jupyter-tensorflow-cuda
  • notebook-servers/jupyter-tensorflow-cuda-full
  • notebook-servers/jupyter-tensorflow-full
  • notebook-servers/rstudio
  • notebook-servers/rstudio-tidyverse
  • profile-controller
  • tensorboard-controller
  • tensorboards-web-app
  • volumes-web-app

kubeflow/training related: public.ecr.aws/j1r0q0g6/training

  • tf-operator
  • training-operator

@johnugeorge
Member

johnugeorge commented Jun 14, 2022

The training operator has used the ECR repo for its recent releases (v1.4.0, v1.3.0, v1.21, v1.1): https://github.com/kubeflow/training-operator/blob/master/manifests/overlays/kubeflow/kustomization.yaml#L9

We should not deprecate the ECR repos for the next few releases. Although we will find an alternative solution starting from this release, we have to let current users continue using the existing release manifests.

@surajkota

surajkota commented Jun 14, 2022

@PatrickXYS we understand the need to deprecate these repositories, and we ask that you work with us while we figure out the next generation of this infrastructure. Please see details in #1006 (comment).

In the meantime, we will work on funding the account with the needed credits to use these repositories.

@surajkota

@PatrickXYS are you still available on Kubeflow Slack, or is [email protected] the correct email to reach you?

@thesuperzapper
Member

@PatrickXYS can you please clarify whether we can keep the old ECR public images available (even if we don't add new images) so that people's old manifests don't break?

@PatrickXYS
Member Author

PatrickXYS commented Jun 27, 2022

I don't think any Kubeflow repos are actively using those ECR images for deployment, except those of the notebooks and training WGs.

The reason for this observation: only the notebooks and training ECR images have version tags such as v1.1, v1.2, etc.

So the other repos are not a concern, but notebooks and training may be special cases.
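
The version-tag heuristic above can be sketched in Python. This is only an illustration: the `has_release_tags` helper, the regular expression, and the sample tag lists are assumptions made up for the example, not data read from the actual ECR repositories.

```python
import re

# Assumed pattern for a release-style tag such as "v1.1" or "v1.4.0".
VERSION_TAG = re.compile(r"^v\d+(\.\d+)+$")


def has_release_tags(tags):
    """Return True if any tag in the list looks like a version tag."""
    return any(VERSION_TAG.match(tag) for tag in tags)


# Hypothetical tag lists, for illustration only.
print(has_release_tags(["v1.1", "v1.2", "latest"]))  # release-style tags present
print(has_release_tags(["latest", "a1b2c3d"]))       # only test/commit-style tags
```

A repository for which this returns False would, under this heuristic, be a candidate for deprecation without further checks. One that returns True would need a closer look at which release manifests reference it.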


Due to my current situation, I have very limited bandwidth and am slow to respond.

@thesuperzapper
Member

thesuperzapper commented Jun 29, 2022

I have mirrored all the public.ecr.aws/j1r0q0g6/{IMAGE_NAME} ones to GitHub Container Registry under ghcr.io/kubeflow/{IMAGE_NAME}, for example:

These are just a one-time backup of the historical tags, but the working groups are free to start using these images for new manifests/releases if they like (GitHub Actions within the corresponding kubeflow GitHub repo can push to these registries).


@surajkota clarified that we are still planning to keep the old ECR public.ecr.aws/j1r0q0g6/{IMAGE_NAME} images available by migrating the ECR to be owned by a new AWS account.

I can share the Python script I used to migrate the ECR images to GHCR (this migration can be a bit of a nightmare because ECR lacks the ability to list tags).
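
As a rough illustration of the naming scheme described in this comment (this is not the migration script itself), the mapping from an ECR reference to its GHCR mirror could be sketched as below. The `to_ghcr` helper is hypothetical, the example tag is illustrative, and the sketch assumes `{IMAGE_NAME}` means everything after the registry alias, including the `notebooks/` or `training/` segment.

```python
# Prefixes taken from the comment above.
ECR_PREFIX = "public.ecr.aws/j1r0q0g6/"
GHCR_PREFIX = "ghcr.io/kubeflow/"


def to_ghcr(image_ref: str) -> str:
    """Rewrite a public ECR image reference to its assumed GHCR mirror name.

    References that do not start with the ECR prefix are returned unchanged.
    """
    if image_ref.startswith(ECR_PREFIX):
        return GHCR_PREFIX + image_ref[len(ECR_PREFIX):]
    return image_ref


# Illustrative reference; the tag is an assumption.
print(to_ghcr("public.ecr.aws/j1r0q0g6/training/training-operator:v1.4.0"))
```

This only rewrites names. Copying the image layers and tags between registries is a separate step that this sketch does not cover.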

@thesuperzapper
Member

I have raised kubeflow/kubeflow#6560 about using ghcr.io/kubeflow/{IMAGE_NAME} as the default image registry for Kubeflow images (in addition to a DockerHub / ECR mirror).

@surajkota

The WGs have confirmed that the private ECR registries were used only for testing purposes and have not been used in release manifests. Hence, the private ECR registries can be deprecated.

As @thesuperzapper commented, I am looking into migrating the public ECR repositories to a new account (with the same registry alias) and will post an update here as soon as I have confirmation on the process. Repositories that need to be migrated:

  • public.ecr.aws/j1r0q0g6/notebooks
  • public.ecr.aws/j1r0q0g6/training

@surajkota

surajkota commented Jul 18, 2022

Hi @PatrickXYS, thanks for your patience. We have made progress on securing the credits for the new AWS account (#1006 (comment)), which unblocks the next steps for migrating these ECR repositories.

Update on migrating the existing repositories to a new account: the registry alias (j1r0q0g6) is a unique key, and only one registry can be linked to it at a time. So although there is a way to migrate the repositories to a new account, it involves a few minutes of downtime.

Instead of that approach, I am proposing another option: moving the AWS account (809251082950) into the new AWS organization, under the following conditions:

  • Move this account from a personal to non-personal email address. @andreyvelich has agreed to support us in this matter
  • Clean up the account further to remove all existing IAM users, roles and resources except the following ECR repositories:
    • public.ecr.aws/j1r0q0g6/notebooks
    • public.ecr.aws/j1r0q0g6/training
  • @PatrickXYS I am checking with the AWS support team whether it is possible to remove your credit card from the account once it's part of the AWS organization. If it is not, would you be OK with just freezing that card and getting a new card?

For this to happen, we need a shared email address between the Notebooks and Training WGs. We can look into other options for decoupling the training and notebook repositories later if needed.

This is the fastest and cleanest way to mitigate this and we do not need any help from the AWS ECR team.

@kimwnasptd and @johnugeorge Please let me know your thoughts on this approach.

@johnugeorge
Member

@kimwnasptd Given these complications, should we just create and track a mapping of previous image location -> new image location for earlier releases (without any extra changes)?
Release 1.4 and earlier don't support k8s 1.21+ anyway, so there is very little chance of new installations. In any case, if users run into issues with images, they can override the image name/tag in the kustomization file and move forward.
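
For illustration, an override of this kind could be generated as follows. This is a minimal sketch: the `kustomize_image_override` helper is hypothetical, and the target location and tag in the usage line are assumptions. Only the `images` field with its `name`, `newName`, and `newTag` keys comes from kustomize itself.

```python
def kustomize_image_override(old_name: str, new_name: str, new_tag: str) -> str:
    """Return a kustomize `images:` entry that swaps one image location.

    The tag is quoted so that numeric-looking tags stay strings in YAML.
    """
    return (
        "images:\n"
        f"  - name: {old_name}\n"
        f"    newName: {new_name}\n"
        f'    newTag: "{new_tag}"\n'
    )


# Hypothetical old -> new mapping for one image; the new location and tag are assumptions.
print(
    kustomize_image_override(
        "public.ecr.aws/j1r0q0g6/training/training-operator",
        "ghcr.io/kubeflow/training/training-operator",
        "v1.4.0",
    )
)
```

The printed snippet would go into the user's own kustomization file, which leaves the published release manifests untouched.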

@surajkota

@johnugeorge To clarify, the only action required from one of you (anyone from the Notebooks or Training WG) is to create an email address. The repository will stay in the same account as before, and there is no impact on user experience. Please help us understand which part of this looks complicated to you.

Re: 1.5 or earlier, and as per your comment on June 13: we need to consider the users who have not migrated to newer versions of K8s or Kubeflow. It's not only about new installations; existing installations will also break. For example, notebook server images will suddenly become unavailable. I disagree with your proposal to just document the mapping, given that the only effort is to create an email address, and this account will be part of the AWS Organization like the other WG accounts. It's similar to how you might have created a Docker Hub account for publishing 1.6 images.

@PatrickXYS
Member Author

PatrickXYS commented Jul 21, 2022

If it is not, would you be ok with just freezing that card and getting a new card?

Let's try to avoid this. Please get a credit card from your team and replace my card with yours, with help from @andreyvelich. After that, feel free to take any action. I believe my proposal is a better way to move forward.

@surajkota

surajkota commented Jul 27, 2022

Hi everyone, the credits applied to account 809251082950 were about to expire at the end of this week, on 07/31. Given that we cannot deprecate the repositories without a proper plan and timeline, and do not want to break any customer deployments, we have completed the steps outlined in the comment above to ensure Yao's credit card does not get charged.

This account is now part of the new AWS organization. Yao, I have opened a case with AWS support to remove your credit card since it's no longer required. I will keep you updated.

@PatrickXYS
Member Author

Great, just saw the message. Please keep me updated on the credit card removal. I can't access the AWS account now, so I'm concerned that AWS may still bill my credit card.

@PatrickXYS
Member Author

@surajkota Could you please keep me updated on the card removal process? Billing could still be applied to my card, and I have no access to the AWS account.

As I mentioned in the previous comment:

Let's try to avoid this. Please get a credit card from your team and replace my card with yours, with help from @andreyvelich. After that, feel free to take any action. I believe my proposal is a better way to move forward.

I asked that the card be replaced before any action was taken, but it seems that didn't work on the AWS side. Could you please expedite the process to resolve my concern?

@surajkota

Hi @PatrickXYS, your credit card has been removed from the account. Thanks for your patience.

For the record: because this was previously a standalone account, it is not possible to remove the default payment information. We added the same card as on the management account, with help from @kimwnasptd and Amber Graner.

We can close this issue now.
/close

@google-oss-prow

@surajkota: Closing this issue.

