INFRA and WORKLOAD machinesets are scaled because they have no infra and workload label #153

Open
qiliRedHat opened this issue May 17, 2022 · 4 comments

qiliRedHat commented May 17, 2022

On GCP I scaled the workers to 3 and ran INFRA_WORKLOAD_INSTALL, then scaled the cluster again to 120 nodes.
All machinesets were scaled, including the infra and workload ones.

oc get machinesets -A
NAMESPACE               NAME                      DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   infra-qili-gcp-kn95ma     15        15        15      15          50m
openshift-machine-api   infra-qili-gcp-kn95mb     15        15        11      11          50m
openshift-machine-api   infra-qili-gcp-kn95mc     15        15        1       1           50m
openshift-machine-api   qili-gcp-kn95m-worker-a   15        15        3       3           5h51m
openshift-machine-api   qili-gcp-kn95m-worker-b   15        15                            5h51m
openshift-machine-api   qili-gcp-kn95m-worker-c   15        15                            5h51m
openshift-machine-api   qili-gcp-kn95m-worker-f   15        15                            5h51m
openshift-machine-api   workload-qili-gcp-kn95m   15        15        1       1           50m

#147 fixed this issue and the fix worked on Azure.

I found that the GCP machinesets carry no infra or workload label:

% oc get --no-headers machinesets -A --show-labels                                              
openshift-machine-api   infra-qili-gcp-kn95ma     1     1     1     1     147m    machine.openshift.io/cluster-api-cluster=qili-gcp-kn95m
openshift-machine-api   infra-qili-gcp-kn95mb     1     1     1     1     147m    machine.openshift.io/cluster-api-cluster=qili-gcp-kn95m
openshift-machine-api   infra-qili-gcp-kn95mc     1     1     1     1     147m    machine.openshift.io/cluster-api-cluster=qili-gcp-kn95m
openshift-machine-api   qili-gcp-kn95m-worker-a   15    15    8     8     7h29m   machine.openshift.io/cluster-api-cluster=qili-gcp-kn95m
openshift-machine-api   qili-gcp-kn95m-worker-b   15    15                7h29m   machine.openshift.io/cluster-api-cluster=qili-gcp-kn95m
openshift-machine-api   qili-gcp-kn95m-worker-c   15    15                7h29m   machine.openshift.io/cluster-api-cluster=qili-gcp-kn95m
openshift-machine-api   qili-gcp-kn95m-worker-f   15    15                7h29m   machine.openshift.io/cluster-api-cluster=qili-gcp-kn95m
openshift-machine-api   workload-qili-gcp-kn95m   1     1     1     1     147m    machine.openshift.io/cluster-api-cluster=qili-gcp-kn95m

So the label selector introduced in #147 matches all machinesets:

oc get --no-headers machinesets -A -l machine.openshift.io/cluster-api-machine-role!=infra,machine.openshift.io/cluster-api-machine-role!=workload | awk '{print $2}'
infra-qili-gcp-kn95ma
infra-qili-gcp-kn95mb
infra-qili-gcp-kn95mc
qili-gcp-kn95m-worker-a
qili-gcp-kn95m-worker-b
qili-gcp-kn95m-worker-c
qili-gcp-kn95m-worker-f
workload-qili-gcp-kn95m
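
The result above is consistent with Kubernetes label-selector semantics: a `key!=value` requirement also matches objects that do not have the key at all, so machinesets with no `cluster-api-machine-role` label pass both `!=` tests. One possible workaround, sketched below, is to filter on the role recorded in the machine template instead of the top-level labels. The function name `filter_worker_machinesets` and the tab-separated input format are assumptions made for illustration; they are not part of the existing scaling script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "NAME<TAB>ROLE" lines on stdin and prints the
# names whose role is neither "infra" nor "workload".
filter_worker_machinesets() {
  awk -F'\t' '$2 != "infra" && $2 != "workload" { print $1 }'
}

# Offline demonstration with names taken from this issue. The "worker" role
# for the worker machineset is an assumption; the issue does not show it.
printf '%s\t%s\n' \
  infra-qili-gcp-kn95ma infra \
  qili-gcp-kn95m-worker-a worker \
  workload-qili-gcp-kn95m workload |
  filter_worker_machinesets
```

Against a live cluster the input could come from something like `oc get machinesets -n openshift-machine-api -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.metadata.labels.machine\.openshift\.io/cluster-api-machine-role}{"\n"}{end}'`, though that invocation has not been run against the clusters in this issue.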

qiliRedHat commented May 17, 2022

https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/cluster-workers-scaling/718/consoleFull

05-17 15:19:06.296  ++ oc get --no-headers machinesets -A -l 'machine.openshift.io/cluster-api-machine-role!=infra,machine.openshift.io/cluster-api-machine-role!=workload'
05-17 15:19:06.296  ++ awk '{print $2}'
05-17 15:19:06.556  + for machineset in '$(oc get --no-headers machinesets -A -l machine.openshift.io/cluster-api-machine-role!=infra,machine.openshift.io/cluster-api-machine-role!=workload | awk '\''{print $2}'\'')'
05-17 15:19:06.556  + oc scale machinesets -n openshift-machine-api infra-qili-gcp-kn95ma --replicas 15
05-17 15:19:06.881  machineset.machine.openshift.io/infra-qili-gcp-kn95ma scaled
05-17 15:19:06.881  + for machineset in '$(oc get --no-headers machinesets -A -l machine.openshift.io/cluster-api-machine-role!=infra,machine.openshift.io/cluster-api-machine-role!=workload | awk '\''{print $2}'\'')'
05-17 15:19:06.881  + oc scale machinesets -n openshift-machine-api infra-qili-gcp-kn95mb --replicas 15
05-17 15:19:07.155  machineset.machine.openshift.io/infra-qili-gcp-kn95mb scaled
05-17 15:19:07.155  + for machineset in '$(oc get --no-headers machinesets -A -l machine.openshift.io/cluster-api-machine-role!=infra,machine.openshift.io/cluster-api-machine-role!=workload | awk '\''{print $2}'\'')'
05-17 15:19:07.155  + oc scale machinesets -n openshift-machine-api infra-qili-gcp-kn95mc --replicas 15
05-17 15:19:07.725  machineset.machine.openshift.io/infra-qili-gcp-kn95mc scaled
05-17 15:19:07.725  + for machineset in '$(oc get --no-headers machinesets -A -l machine.openshift.io/cluster-api-machine-role!=infra,machine.openshift.io/cluster-api-machine-role!=workload | awk '\''{print $2}'\'')'
05-17 15:19:07.725  + oc scale machinesets -n openshift-machine-api qili-gcp-kn95m-worker-a --replicas 15
05-17 15:19:07.982  machineset.machine.openshift.io/qili-gcp-kn95m-worker-a scaled
05-17 15:19:07.982  + for machineset in '$(oc get --no-headers machinesets -A -l machine.openshift.io/cluster-api-machine-role!=infra,machine.openshift.io/cluster-api-machine-role!=workload | awk '\''{print $2}'\'')'
05-17 15:19:07.982  + oc scale machinesets -n openshift-machine-api qili-gcp-kn95m-worker-b --replicas 15
05-17 15:19:08.254  machineset.machine.openshift.io/qili-gcp-kn95m-worker-b scaled
05-17 15:19:08.254  + for machineset in '$(oc get --no-headers machinesets -A -l machine.openshift.io/cluster-api-machine-role!=infra,machine.openshift.io/cluster-api-machine-role!=workload | awk '\''{print $2}'\'')'
05-17 15:19:08.254  + oc scale machinesets -n openshift-machine-api qili-gcp-kn95m-worker-c --replicas 15
05-17 15:19:08.843  machineset.machine.openshift.io/qili-gcp-kn95m-worker-c scaled
05-17 15:19:08.843  + for machineset in '$(oc get --no-headers machinesets -A -l machine.openshift.io/cluster-api-machine-role!=infra,machine.openshift.io/cluster-api-machine-role!=workload | awk '\''{print $2}'\'')'
05-17 15:19:08.843  + oc scale machinesets -n openshift-machine-api qili-gcp-kn95m-worker-f --replicas 15
05-17 15:19:09.129  machineset.machine.openshift.io/qili-gcp-kn95m-worker-f scaled
05-17 15:19:09.129  + for machineset in '$(oc get --no-headers machinesets -A -l machine.openshift.io/cluster-api-machine-role!=infra,machine.openshift.io/cluster-api-machine-role!=workload | awk '\''{print $2}'\'')'
05-17 15:19:09.129  + oc scale machinesets -n openshift-machine-api workload-qili-gcp-kn95m --replicas 15
05-17 15:19:09.386  machineset.machine.openshift.io/workload-qili-gcp-kn95m scaled
05-17 15:19:09.386  + [[ 0 != 0 ]]
05-17 15:19:09.386  + set +x
05-17 15:19:09.658  17-05-2022T07:19:09 Following nodes are currently present, waiting for desired count 120 to be met.
05-17 15:19:09.658  17-05-2022T07:19:09 Machinesets:
05-17 15:19:10.223  NAMESPACE               NAME                      DESIRED   CURRENT   READY   AVAILABLE   AGE
05-17 15:19:10.223  openshift-machine-api   infra-qili-gcp-kn95ma     15        15        1       1           46m
05-17 15:19:10.223  openshift-machine-api   infra-qili-gcp-kn95mb     15        15        1       1           46m
05-17 15:19:10.223  openshift-machine-api   infra-qili-gcp-kn95mc     15        15        1       1           46m
05-17 15:19:10.223  openshift-machine-api   qili-gcp-kn95m-worker-a   15        15        3       3           5h47m
05-17 15:19:10.223  openshift-machine-api   qili-gcp-kn95m-worker-b   15        15                            5h47m
05-17 15:19:10.223  openshift-machine-api   qili-gcp-kn95m-worker-c   15        0                             5h47m
05-17 15:19:10.223  openshift-machine-api   qili-gcp-kn95m-worker-f   15        0                             5h47m
05-17 15:19:10.223  openshift-machine-api   workload-qili-gcp-kn95m   15        1         1       1           46m


qiliRedHat commented May 17, 2022

Installed a new cluster: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/103526/
Scaled up to 4 nodes with INFRA_WORKLOAD_INSTALL checked:
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/cluster-workers-scaling/720/
The infra and workload machinesets have no infra or workload label, and the worker machinesets do not even have machine.openshift.io/cluster-api-machine-role=worker,machine.openshift.io/cluster-api-machine-type=worker.

% oc get --no-headers machinesets -A --show-labels
openshift-machine-api   infra-qili-gcp-2-psnlfa     1     1     1     1     10m   machine.openshift.io/cluster-api-cluster=qili-gcp-2-psnlf
openshift-machine-api   infra-qili-gcp-2-psnlfb     1     1     1     1     10m   machine.openshift.io/cluster-api-cluster=qili-gcp-2-psnlf
openshift-machine-api   infra-qili-gcp-2-psnlfc     1     1     1     1     10m   machine.openshift.io/cluster-api-cluster=qili-gcp-2-psnlf
openshift-machine-api   qili-gcp-2-psnlf-worker-a   1     1     1     1     68m   machine.openshift.io/cluster-api-cluster=qili-gcp-2-psnlf
openshift-machine-api   qili-gcp-2-psnlf-worker-b   1     1     1     1     68m   machine.openshift.io/cluster-api-cluster=qili-gcp-2-psnlf
openshift-machine-api   qili-gcp-2-psnlf-worker-c   1     1     1     1     68m   machine.openshift.io/cluster-api-cluster=qili-gcp-2-psnlf
openshift-machine-api   qili-gcp-2-psnlf-worker-f   1     1     1     1     68m   machine.openshift.io/cluster-api-cluster=qili-gcp-2-psnlf
openshift-machine-api   workload-qili-gcp-2-psnlf   1     1     1     1     10m   machine.openshift.io/cluster-api-cluster=qili-gcp-2-psnlf

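But the YAML shows machine.openshift.io/cluster-api-machine-role: infra under spec.selector.matchLabels and spec.template.metadata.labels, while metadata.labels contains only the cluster-api-cluster label: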

% oc get machinesets infra-qili-gcp-2-psnlfa -n openshift-machine-api -o yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"machine.openshift.io/v1beta1","kind":"MachineSet","metadata":{"annotations":{},"generation":1,"labels":{"machine.openshift.io/cluster-api-cluster":"qili-gcp-2-psnlf"},"name":"infra-qili-gcp-2-psnlfa","namespace":"openshift-machine-api"},"spec":{"replicas":1,"selector":{"matchLabels":{"machine.openshift.io/cluster-api-cluster":"qili-gcp-2-psnlf","machine.openshift.io/cluster-api-machine-role":"infra","machine.openshift.io/cluster-api-machine-type":"infra","machine.openshift.io/cluster-api-machineset":"infra-a"}},"template":{"metadata":{"creationTimestamp":null,"labels":{"machine.openshift.io/cluster-api-cluster":"qili-gcp-2-psnlf","machine.openshift.io/cluster-api-machine-role":"infra","machine.openshift.io/cluster-api-machine-type":"infra","machine.openshift.io/cluster-api-machineset":"infra-a"}},"spec":{"metadata":{"creationTimestamp":null,"labels":{"node-role.kubernetes.io/infra":""}},"providerSpec":{"value":{"apiVersion":"gcpprovider.openshift.io/v1beta1","canIPForward":false,"credentialsSecret":{"name":"gcp-cloud-credentials"},"deletionProtection":false,"disks":[{"autoDelete":false,"boot":true,"image":"projects/rhcos-cloud/global/images/rhcos-411-85-202203181601-0-gcp-x86-64","labels":null,"sizeGb":100,"type":"pd-ssd"}],"kind":"GCPMachineProviderSpec","machineType":"n1-standard-64","metadata":{"creationTimestamp":null},"networkInterfaces":[{"network":"qili-gcp-2-psnlf-network","subnetwork":"qili-gcp-2-psnlf-worker-subnet"}],"projectID":"openshift-qe","region":"us-central1","serviceAccounts":[{"email":"[email protected]","scopes":["https://www.googleapis.com/auth/cloud-platform"]}],"tags":["qili-gcp-2-psnlf-worker"],"userDataSecret":{"name":"worker-user-data"},"zone":"us-central1-a"}}}}},"status":{"replicas":1}}
    machine.openshift.io/GPU: "0"
    machine.openshift.io/memoryMb: "245760"
    machine.openshift.io/vCPU: "64"
  creationTimestamp: "2022-05-17T11:14:34Z"
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: qili-gcp-2-psnlf
  name: infra-qili-gcp-2-psnlfa
  namespace: openshift-machine-api
  resourceVersion: "45074"
  uid: 827566c5-d251-4135-a2ea-b2dbf748ba29
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: qili-gcp-2-psnlf
      machine.openshift.io/cluster-api-machine-role: infra
      machine.openshift.io/cluster-api-machine-type: infra
      machine.openshift.io/cluster-api-machineset: infra-a
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: qili-gcp-2-psnlf
        machine.openshift.io/cluster-api-machine-role: infra
        machine.openshift.io/cluster-api-machine-type: infra
        machine.openshift.io/cluster-api-machineset: infra-a
    spec:
      lifecycleHooks: {}
      metadata:
        labels:
          node-role.kubernetes.io/infra: ""
      providerSpec:
        value:
          apiVersion: gcpprovider.openshift.io/v1beta1
          canIPForward: false
          credentialsSecret:
            name: gcp-cloud-credentials
          deletionProtection: false
          disks:
          - autoDelete: false
            boot: true
            image: projects/rhcos-cloud/global/images/rhcos-411-85-202203181601-0-gcp-x86-64
            labels: null
            sizeGb: 100
            type: pd-ssd
          kind: GCPMachineProviderSpec
          machineType: n1-standard-64
          metadata:
            creationTimestamp: null
          networkInterfaces:
          - network: qili-gcp-2-psnlf-network
            subnetwork: qili-gcp-2-psnlf-worker-subnet
          projectID: openshift-qe
          region: us-central1
          serviceAccounts:
          - email: [email protected]
            scopes:
            - https://www.googleapis.com/auth/cloud-platform
          tags:
          - qili-gcp-2-psnlf-worker
          userDataSecret:
            name: worker-user-data
          zone: us-central1-a
status:
  availableReplicas: 1
  fullyLabeledReplicas: 1
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
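
The mismatch visible in the YAML above, where the role is set in the machine template but absent from the machineset's own `metadata.labels`, could be checked across all machinesets with a small filter like the one below. This is only a sketch: the function name `find_unlabeled_machinesets` and the three-column tab-separated input are hypothetical, and the sample rows are built from names in this issue rather than from live `oc` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "NAME<TAB>TOP_LEVEL_ROLE<TAB>TEMPLATE_ROLE" lines
# on stdin and prints "NAME<TAB>TEMPLATE_ROLE" for every machineset whose
# top-level role label is empty while its template role is set.
find_unlabeled_machinesets() {
  awk -F'\t' '$2 == "" && $3 != "" { printf "%s\t%s\n", $1, $3 }'
}

# Offline demonstration: the GCP infra machineset has no top-level role label,
# the Azure worker machineset does.
printf '%s\t%s\t%s\n' \
  infra-qili-gcp-2-psnlfa '' infra \
  qili-preserve-az-bv6ml-worker-centralus1 worker worker |
  find_unlabeled_machinesets
```

Only the GCP machineset is reported, because it is the only row whose first role column is empty.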


qiliRedHat commented May 17, 2022

Compare with the machineset labels on an Azure cluster: the Azure machinesets have machine.openshift.io/cluster-api-machine-role=worker,machine.openshift.io/cluster-api-machine-type=worker

% oc get --no-headers machinesets -A --show-labels
openshift-machine-api   qili-preserve-az-bv6ml-worker-centralus1   1     1     1     1     9h    machine.openshift.io/cluster-api-cluster=qili-preserve-az-bv6ml,machine.openshift.io/cluster-api-machine-role=worker,machine.openshift.io/cluster-api-machine-type=worker
openshift-machine-api   qili-preserve-az-bv6ml-worker-centralus2   1     1     1     1     9h    machine.openshift.io/cluster-api-cluster=qili-preserve-az-bv6ml,machine.openshift.io/cluster-api-machine-role=worker,machine.openshift.io/cluster-api-machine-type=worker
openshift-machine-api   qili-preserve-az-bv6ml-worker-centralus3   1     1     1     1     9h    machine.openshift.io/cluster-api-cluster=qili-preserve-az-bv6ml,machine.openshift.io/cluster-api-machine-role=worker,machine.openshift.io/cluster-api-machine-type=worker

@qiliRedHat

I think I found the root cause: the GCP machineset definition is missing the role and type labels in metadata. It sets only:

${MACHINESET_METADATA_LABEL_PREFIX}/cluster-api-cluster: ${CLUSTER_NAME}

Compare with Azure:
${MACHINESET_METADATA_LABEL_PREFIX}/cluster-api-cluster: ${CLUSTER_NAME}
${MACHINESET_METADATA_LABEL_PREFIX}/cluster-api-machine-role: infra
${MACHINESET_METADATA_LABEL_PREFIX}/cluster-api-machine-type: infra
