Unable to deploy #5

Open
pouryas opened this issue Jan 8, 2023 · 3 comments

Comments


pouryas commented Jan 8, 2023

Hi there

I appreciate the great work in posting this. I'm struggling to deploy the infrastructure as is.

I've created a domain in Route 53 and inserted the domain name and zone ID into the code. I also had to change the Postgres version from 11.10 to 11, because it threw an error with 11.10. Now I'm getting a bunch of errors related to Traefik.

Here are the diagnostics from the Pulumi deployment:

```
kubernetes:apiextensions.k8s.io/v1beta1:CustomResourceDefinition (middlewares.traefik.containo.us):
warning: apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.
error: resource middlewares.traefik.containo.us was not successfully created by the Kubernetes API server : apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.

kubernetes:apiextensions.k8s.io/v1beta1:CustomResourceDefinition (ingressroutes.traefik.containo.us):
warning: apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.
error: resource ingressroutes.traefik.containo.us was not successfully created by the Kubernetes API server : apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.

kubernetes:apiextensions.k8s.io/v1beta1:CustomResourceDefinition (tlsoptions.traefik.containo.us):
warning: apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.
error: resource tlsoptions.traefik.containo.us was not successfully created by the Kubernetes API server : apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.

kubernetes:traefik.containo.us/v1alpha1:IngressRoute (traefik-dashboard):
warning: This resource contains Helm hooks that are not currently supported by Pulumi. The resource will be created, but any hooks will not be executed. Hooks support is tracked at https://github.com/pulumi/pulumi-kubernetes/issues/555 -- This warning can be disabled by setting the PULUMI_K8S_SUPPRESS_HELM_HOOK_WARNINGS environment variable
error: creation of resource default/traefik-dashboard failed because the Kubernetes API server reported that the apiVersion for this resource does not exist. Verify that any required CRDs have been created: no matches for kind "IngressRoute" in version "traefik.containo.us/v1alpha1"

kubernetes:apiextensions.k8s.io/v1beta1:CustomResourceDefinition (ingressroutetcps.traefik.containo.us):
warning: apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.
error: resource ingressroutetcps.traefik.containo.us was not successfully created by the Kubernetes API server : apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.

kubernetes:traefik.containo.us/v1alpha1:Middleware (mlflow-strip-prefix):
error: creation of resource mlflow/mlflow-strip-prefix-33bf2e4f failed because the Kubernetes API server reported that the apiVersion for this resource does not exist. Verify that any required CRDs have been created: no matches for kind "Middleware" in version "traefik.containo.us/v1alpha1"

kubernetes:traefik.containo.us/v1alpha1:Middleware (mlflow-trailing-slash):
error: creation of resource mlflow/mlflow-trailing-slash-0d17ce3f failed because the Kubernetes API server reported that the apiVersion for this resource does not exist. Verify that any required CRDs have been created: no matches for kind "Middleware" in version "traefik.containo.us/v1alpha1"

kubernetes:apiextensions.k8s.io/v1beta1:CustomResourceDefinition (ingressrouteudps.traefik.containo.us):
warning: apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.
error: resource ingressrouteudps.traefik.containo.us was not successfully created by the Kubernetes API server : apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.

kubernetes:apiextensions.k8s.io/v1beta1:CustomResourceDefinition (traefikservices.traefik.containo.us):
warning: apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.
error: resource traefikservices.traefik.containo.us was not successfully created by the Kubernetes API server : apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.

pulumi:pulumi:Stack (ml-infra-dev):
error: update failed
error: Error: invocation of kubernetes:helm:template returned an error: error reading from server: read tcp 127.0.0.1:53702->127.0.0.1:53700: use of closed network connection
    at Object.callback (/Users/Programming/projects/ml-pipeline/ml-infra/node_modules/@pulumi/runtime/invoke.ts:172:33)
    at Object.onReceiveStatus (/Users/Programming/projects/ml-pipeline/ml-infra/node_modules/@grpc/grpc-js/src/client.ts:338:26)
    at Object.onReceiveStatus (/Users/Programming/projects/ml-pipeline/ml-infra/node_modules/@grpc/grpc-js/src/client-interceptors.ts:426:34)
    at Object.onReceiveStatus (/Users/Programming/projects/ml-pipeline/ml-infra/node_modules/@grpc/grpc-js/src/client-interceptors.ts:389:48)
    at /Users/Programming/projects/ml-pipeline/ml-infra/node_modules/@grpc/grpc-js/src/call-stream.ts:276:24
    at processTicksAndRejections (node:internal/process/task_queues:77:11)

I0108 12:50:16.966189    2838 request.go:682] Waited for 1.032607959s due to client-side throttling, not priority and fairness, request: GET:https://2DD036C575E8E66E280A78402AFB414F.gr7.us-west-2.eks.amazonaws.com/apis/apps/v1?timeout=32s
I0108 12:50:27.164277    2838 request.go:682] Waited for 4.434218083s due to client-side throttling, not priority and fairness, request: GET:https://2DD036C575E8E66E280A78402AFB414F.gr7.us-west-2.eks.amazonaws.com/apis/batch/v1beta1?timeout=32s
I0108 12:50:37.364251    2838 request.go:682] Waited for 1.036100542s due to client-side throttling, not priority and fairness, request: GET:https://2DD036C575E8E66E280A78402AFB414F.gr7.us-west-2.eks.amazonaws.com/apis/rbac.authorization.k8s.io/v1?timeout=32s
I0108 12:50:47.564198    2838 request.go:682] Waited for 4.433233583s due to client-side throttling, not priority and fairness, request: GET:https://2DD036C575E8E66E280A78402AFB414F.gr7.us-west-2.eks.amazonaws.com/apis/storage.k8s.io/v1?timeout=32s
I0108 12:50:57.764186    2838 request.go:682] Waited for 1.032272375s due to client-side throttling, not priority and fairness, request: GET:https://2DD036C575E8E66E280A78402AFB414F.gr7.us-west-2.eks.amazonaws.com/apis/node.k8s.io/v1beta1?timeout=32s
I0108 12:51:07.964178    2838 request.go:682] Waited for 4.432762875s due to client-side throttling, not priority and fairness, request: GET:https://2DD036C575E8E66E280A78402AFB414F.gr7.us-west-2.eks.amazonaws.com/apis/flowcontrol.apiserver.k8s.io/v1beta2?timeout=32s
I0108 12:51:20.563948    2838 request.go:682] Waited for 1.09841225s due to client-side throttling, not priority and fairness, request: GET:https://2DD036C575E8E66E280A78402AFB414F.gr7.us-west-2.eks.amazonaws.com/apis/batch/v1beta1?timeout=32s
I0108 12:51:34.163858    2838 request.go:682] Waited for 1.102064041s due to client-side throttling, not priority and fairness, request: GET:https://2DD036C575E8E66E280A78402AFB414F.gr7.us-west-2.eks.amazonaws.com/apis/rbac.authorization.k8s.io/v1?timeout=32s

kubernetes:apiextensions.k8s.io/v1beta1:CustomResourceDefinition (tlsstores.traefik.containo.us):
warning: apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.
error: resource tlsstores.traefik.containo.us was not successfully created by the Kubernetes API server : apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.
```

I'd appreciate it if someone could guide me through fixing this.


pouryas commented Jan 8, 2023

Resolved this issue. It seems the Traefik Helm chart isn't compatible with the latest version of EKS (its CRDs are defined with `apiextensions.k8s.io/v1beta1`, which Kubernetes 1.22 removed), so I had to manually specify version 1.21 when instantiating the EKS cluster:

```ts
const cluster = new eks.Cluster('mlplatform-eks', {
  createOidcProvider: true,
  version: "1.21",
});
```
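Every failure in the log traces back to manifests using an API version that the cluster no longer serves, so a pre-flight check can catch this before deploying. Below is a minimal sketch, assuming you can extract each rendered manifest's `apiVersion`, `kind` and name (for example from `helm template` output). `findRemovedApis`, `minorOf` and the `REMOVED_IN` table are hypothetical names, and the table lists only a few well-known removals; it is not exhaustive.

```ts
// Hypothetical pre-flight check: flags manifests whose apiVersion/kind was
// removed at or before the target Kubernetes minor version.

interface ManifestRef {
  apiVersion: string;
  kind: string;
  name: string;
}

// "group/version:Kind" -> Kubernetes 1.x minor version that removed it.
// Only a small, well-known subset; not a complete removal list.
const REMOVED_IN: Record<string, number> = {
  "apiextensions.k8s.io/v1beta1:CustomResourceDefinition": 22,
  "extensions/v1beta1:Ingress": 22,
  "networking.k8s.io/v1beta1:Ingress": 22,
  "batch/v1beta1:CronJob": 25,
  "policy/v1beta1:PodSecurityPolicy": 25,
};

// Parse an EKS-style version string such as "1.21" into its minor number.
function minorOf(version: string): number {
  const match = /^1\.(\d+)$/.exec(version.trim());
  if (!match) {
    throw new Error(`unexpected Kubernetes version: ${version}`);
  }
  return Number(match[1]);
}

// Return the manifests that the given cluster version would reject.
function findRemovedApis(manifests: ManifestRef[], clusterVersion: string): ManifestRef[] {
  const minor = minorOf(clusterVersion);
  return manifests.filter((m) => {
    const removedIn = REMOVED_IN[`${m.apiVersion}:${m.kind}`];
    return removedIn !== undefined && minor >= removedIn;
  });
}

// Two of the Traefik CRDs named in the log.
const traefikCrds: ManifestRef[] = [
  { apiVersion: "apiextensions.k8s.io/v1beta1", kind: "CustomResourceDefinition", name: "ingressroutes.traefik.containo.us" },
  { apiVersion: "apiextensions.k8s.io/v1beta1", kind: "CustomResourceDefinition", name: "middlewares.traefik.containo.us" },
];

console.log(findRemovedApis(traefikCrds, "1.21").length); // 0
console.log(findRemovedApis(traefikCrds, "1.22").length); // 2
```

This also shows why pinning to 1.21 only postpones the problem: once the cluster must move to 1.22 or later, the chart itself has to ship `apiextensions.k8s.io/v1` CRDs.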


shabieh2 commented Feb 3, 2023

Thanks for updating this! Major help!


shabieh2 commented Mar 8, 2023

AWS no longer supports EKS version 1.21, so here is revised `index.ts` code for 1.22+.

As pouryas mentioned, we need to change the Postgres version (currently 12.7). We also needed to change the Traefik Helm chart and make another minor change to `traefik.getResource`. Here is the `index.ts` for ml-infra:

```ts
import * as aws from '@pulumi/aws';
import * as eks from '@pulumi/eks';
import * as k8s from '@pulumi/kubernetes';
import * as random from '@pulumi/random';
import S3ServiceAccount from './S3ServiceAccount';
import TraefikRoute from './TraefikRoute';

// Create a Kubernetes cluster.
const cluster = new eks.Cluster('mlplatform-eks', {
  createOidcProvider: true,
});

// Install Traefik
const traefik = new k8s.helm.v3.Chart('traefik', {
  chart: 'traefik',
  fetchOpts: { repo: 'https://traefik.github.io/charts' },
}, { provider: cluster.provider });

// Create PostgreSQL database for MLFlow - this will save model metadata
const dbPassword = new random.RandomPassword('mlplatform-db-password', { length: 16, special: false });
const db = new aws.rds.Instance('mlflow-db', {
  allocatedStorage: 10,
  engine: "postgres",
  engineVersion: "12.7",
  instanceClass: "db.t3.micro",
  name: "mlflow", // newer @pulumi/aws releases deprecate `name` in favor of `dbName`
  password: dbPassword.result,
  skipFinalSnapshot: true,
  vpcSecurityGroupIds: [cluster.clusterSecurityGroup.id, cluster.nodeSecurityGroup.id],
  username: "postgres",
});

// Create S3 bucket for MLFlow
const mlflowBucket = new aws.s3.Bucket("mlflow-bucket", {
  acl: "public-read-write",
});

// Install MLFlow
const mlflowNamespace = new k8s.core.v1.Namespace('mlflow-namespace', {
  metadata: { name: 'mlflow' },
}, { provider: cluster.provider });

const mlflowServiceAccount = new S3ServiceAccount('mlflow-service-account', {
  namespace: 'default',
  oidcProvider: cluster.core.oidcProvider!,
  readOnly: false,
}, { provider: cluster.provider });

const mlflow = new k8s.helm.v3.Chart("mlflow", {
  chart: "mlflow",
  values: {
    "backendStore": {
      "postgres": {
        "username": db.username,
        "password": db.password,
        "host": db.address,
        "port": db.port,
        "database": "mlflow"
      }
    },
    "defaultArtifactRoot": mlflowBucket.bucket.apply((bucketName: string) => `s3://${bucketName}`),
    "serviceAccount": {
      "create": false,
      "name": mlflowServiceAccount.name,
    }
  },
  fetchOpts: { repo: "https://larribas.me/helm-charts" },
}, { provider: cluster.provider });

// Expose MLFlow in Traefik as /mlflow
new TraefikRoute('mlflow-route', {
  prefix: '/mlflow',
  service: mlflow.getResource('v1/Service', 'mlflow'),
  namespace: 'default',
}, { provider: cluster.provider });

// Service account for models with read only access to models
const modelsServiceAccount = new S3ServiceAccount('models-service-account', {
  namespace: 'default',
  oidcProvider: cluster.core.oidcProvider!,
  readOnly: true,
}, { provider: cluster.provider });

// Set ml.yourcompany.com DNS record in Route53
new aws.route53.Record("record", {
  zoneId: "<YOUR_ZONE_ID>", // replace with your Route 53 hosted zone ID
  name: "ml.yourcompany.com",
  type: "CNAME",
  ttl: 300,
  records: [traefik.getResource('v1/Service', 'default/traefik').status.loadBalancer.ingress[0].hostname],
});

export const kubeconfig = cluster.kubeconfig;

export const modelsServiceAccountName = modelsServiceAccount.name;

export const traefik_hn = traefik.getResource('v1/Service', 'default/traefik').status.loadBalancer.ingress[0].hostname;
export const mlflow_info = mlflow.getResource('v1/Service', 'default/mlflow');
```
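For debugging the MLflow side, it can help to see the `backendStore.postgres` values as the single connection string MLflow itself works with: `mlflow server --backend-store-uri` accepts a SQLAlchemy-style URL of the form `postgresql://user:password@host:port/database`. The sketch below assembles one from the same five fields; `buildBackendStoreUri` is a hypothetical helper, the host is a placeholder, and this does not claim to be exactly how the chart builds its URI internally. Credentials are percent-encoded so characters such as `@` or `:` cannot break the URL (the program above uses `special: false`, so this is mainly a safeguard).

```ts
// The same fields the chart receives under backendStore.postgres.
interface PostgresSettings {
  username: string;
  password: string;
  host: string;
  port: number;
  database: string;
}

// Hypothetical helper: builds postgresql://user:password@host:port/database,
// the SQLAlchemy URL form accepted by `mlflow server --backend-store-uri`.
function buildBackendStoreUri(pg: PostgresSettings): string {
  // Percent-encode credentials so reserved characters stay unambiguous.
  const user = encodeURIComponent(pg.username);
  const pass = encodeURIComponent(pg.password);
  return `postgresql://${user}:${pass}@${pg.host}:${pg.port}/${pg.database}`;
}

const uri = buildBackendStoreUri({
  username: "postgres",
  password: "p@ss:word",
  host: "mlflow-db.example.us-west-2.rds.amazonaws.com", // placeholder host
  port: 5432,
  database: "mlflow",
});
console.log(uri);
// postgresql://postgres:p%40ss%3Aword@mlflow-db.example.us-west-2.rds.amazonaws.com:5432/mlflow
```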
