HA CSI sidecars using K8s Deployment #602

Closed

nitishkumar4
Contributor

Pull request checklist


Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature Enhancement
  • Test Automation
  • Code Refactoring (no functional changes, no api changes)
  • Build related changes
  • Community Operator listing
  • Other (please describe):

What is the current behavior?

What is the new behavior?

How risky is this change?

  • Small, isolated change
  • Medium, requires regression testing
  • Large, requires functional and regression testing

@nitishkumar4 nitishkumar4 force-pushed the poc_single_sidecar_deployment branch from 2e0fc2c to d760935 Compare February 8, 2022 06:20
- Updated generated files using `make manifests`
- sidecar pods are combined into single pod with multiple containers
- Create Using Deployment over Statefulsets for CSI Sidecars.md
- Handling node tolerations
- added pod anti affinity to ensure only one sidecar pod is scheduled per node
- added liveness probe in containers
- Added upgrade strategy
- different cluster roles for sidecar controllers
@nitishkumar4
Contributor Author

nitishkumar4 commented Feb 8, 2022

The resources below are added or updated as part of this pull request.

  • A single service account, ibm-spectrum-scale-csi-controller, is used for the controller pod and its sidecar containers.
  • Each sidecar ClusterRole includes permissions for Kind: Leases.
  • ClusterRoleBindings bind all the sidecar ClusterRoles to the single service account used by the controller pod.
  • Sidecars are deployed using a Deployment instead of StatefulSets.
  • Sidecars run with more than one replica and with leader election enabled, which uses Kind: Leases to store lease holder information.
  • Controller pods use podAntiAffinity to ensure each pod is scheduled on a different node.
  • The Deployment update strategy is adjusted so that rolling updates still work with podAntiAffinity in place.
  • The CSIScaleOperator controller removes deprecated resources, i.e. StatefulSets and ServiceAccounts, from the cluster.
  • Pod tolerations are added with the timeout reduced from 300 sec (default) to 60 sec, so that pods are evicted from a failed node sooner.
  • Sidecar controller pods use the nodeSelector controllerNodeSelector: {scale = true} to schedule pods.

ServiceAccounts

# oc get serviceaccount -n ibm-spectrum-scale-csi-driver | grep ibm-spectrum-scale-csi
ibm-spectrum-scale-csi-controller   2         18m
ibm-spectrum-scale-csi-node         2         23m
ibm-spectrum-scale-csi-operator     2         4d20h

ClusterRoles

# oc get clusterrole -n ibm-spectrum-scale-csi-driver | grep ibm-spectrum-scale-csi
ibm-spectrum-scale-csi-attacher                                             2022-02-08T07:08:17Z
ibm-spectrum-scale-csi-node                                                 2022-02-08T07:08:19Z
ibm-spectrum-scale-csi-operator                                             2022-01-07T09:24:46Z
ibm-spectrum-scale-csi-provisioner                                          2022-02-08T07:08:18Z
ibm-spectrum-scale-csi-resizer                                              2022-02-08T07:08:18Z
ibm-spectrum-scale-csi-snapshotter                                          2022-02-08T07:08:18Z
# oc describe clusterrole ibm-spectrum-scale-csi-attacher
Name:         ibm-spectrum-scale-csi-attacher
...
PolicyRule:
  Resources                                Non-Resource URLs  Resource Names  Verbs
  ---------                                -----------------  --------------  -----
  leases.coordination.k8s.io               []                 []              [create get list patch update delete]
  persistentvolumes                        []                 []              [get list watch patch]
  volumeattachments.storage.k8s.io         []                 []              [get list watch patch]
  events                                   []                 []              [get list watch update]
  csinodes.storage.k8s.io                  []                 []              [get list watch]
  volumeattachments.storage.k8s.io/status  []                 []              [patch]
# oc describe clusterrole ibm-spectrum-scale-csi-provisioner
Name:         ibm-spectrum-scale-csi-provisioner
...
PolicyRule:
  Resources                                       Non-Resource URLs  Resource Names  Verbs
  ---------                                       -----------------  --------------  -----
  leases.coordination.k8s.io                      []                 []              [create get list patch update delete]
  persistentvolumes                               []                 []              [get list watch create delete]
  persistentvolumeclaims                          []                 []              [get list watch update]
  nodes                                           []                 []              [get list watch]
  csinodes.storage.k8s.io                         []                 []              [get list watch]
  storageclasses.storage.k8s.io                   []                 []              [get list watch]
  volumeattachments.storage.k8s.io                []                 []              [get list watch]
  volumesnapshotcontents.snapshot.storage.k8s.io  []                 []              [get list]
  volumesnapshots.snapshot.storage.k8s.io         []                 []              [get list]
  events                                          []                 []              [list watch create update patch]
# oc describe clusterrole ibm-spectrum-scale-csi-resizer
Name:         ibm-spectrum-scale-csi-resizer
...
PolicyRule:
  Resources                      Non-Resource URLs  Resource Names  Verbs
  ---------                      -----------------  --------------  -----
  leases.coordination.k8s.io     []                 []              [create get list patch update delete]
  persistentvolumes              []                 []              [get list watch patch update]
  persistentvolumeclaims         []                 []              [get list watch]
  pods                           []                 []              [get list watch]
  storageclasses.storage.k8s.io  []                 []              [get list watch]
  events                         []                 []              [list watch create update patch]
  persistentvolumeclaims/status  []                 []              [patch update]
# oc describe clusterrole ibm-spectrum-scale-csi-snapshotter
Name:         ibm-spectrum-scale-csi-snapshotter
...
PolicyRule:
  Resources                                              Non-Resource URLs  Resource Names  Verbs
  ---------                                              -----------------  --------------  -----
  leases.coordination.k8s.io                             []                 []              [create get list patch update delete]
  volumesnapshotcontents.snapshot.storage.k8s.io         []                 []              [create get list watch update delete patch]
  volumesnapshotclasses.snapshot.storage.k8s.io          []                 []              [get list watch]
  events                                                 []                 []              [list watch create update patch]
  volumesnapshotcontents.snapshot.storage.k8s.io/status  []                 []              [update patch]

ClusterRoleBindings

# oc get clusterrolebinding -n ibm-spectrum-scale-csi-driver | grep ibm-spectrum-scale-csi
ibm-spectrum-scale-csi-attacher                                             ClusterRole/ibm-spectrum-scale-csi-attacher                                             13m
ibm-spectrum-scale-csi-node                                                 ClusterRole/ibm-spectrum-scale-csi-node                                                 13m
ibm-spectrum-scale-csi-operator                                             ClusterRole/ibm-spectrum-scale-csi-operator                                             10d
ibm-spectrum-scale-csi-provisioner                                          ClusterRole/ibm-spectrum-scale-csi-provisioner                                          13m
ibm-spectrum-scale-csi-resizer                                              ClusterRole/ibm-spectrum-scale-csi-resizer                                              13m
ibm-spectrum-scale-csi-snapshotter                                          ClusterRole/ibm-spectrum-scale-csi-snapshotter                                          13m
# oc describe clusterrolebinding ibm-spectrum-scale-csi-attacher
Name:         ibm-spectrum-scale-csi-attacher
...
Role:
  Kind:  ClusterRole
  Name:  ibm-spectrum-scale-csi-attacher
Subjects:
  Kind            Name                               Namespace
  ----            ----                               ---------
  ServiceAccount  ibm-spectrum-scale-csi-controller  ibm-spectrum-scale-csi-driver
# oc describe clusterrolebinding ibm-spectrum-scale-csi-provisioner
Name:         ibm-spectrum-scale-csi-provisioner
...
Role:
  Kind:  ClusterRole
  Name:  ibm-spectrum-scale-csi-provisioner
Subjects:
  Kind            Name                               Namespace
  ----            ----                               ---------
  ServiceAccount  ibm-spectrum-scale-csi-controller  ibm-spectrum-scale-csi-driver
# oc describe clusterrolebinding ibm-spectrum-scale-csi-snapshotter
Name:         ibm-spectrum-scale-csi-snapshotter
...
Role:
  Kind:  ClusterRole
  Name:  ibm-spectrum-scale-csi-snapshotter
Subjects:
  Kind            Name                               Namespace
  ----            ----                               ---------
  ServiceAccount  ibm-spectrum-scale-csi-controller  ibm-spectrum-scale-csi-driver
# oc describe clusterrolebinding ibm-spectrum-scale-csi-resizer
Name:         ibm-spectrum-scale-csi-resizer
...
Role:
  Kind:  ClusterRole
  Name:  ibm-spectrum-scale-csi-resizer
Subjects:
  Kind            Name                               Namespace
  ----            ----                               ---------
  ServiceAccount  ibm-spectrum-scale-csi-controller  ibm-spectrum-scale-csi-driver

Sidecar Controller Deployment

# oc get deployment ibm-spectrum-scale-csi-controller -n ibm-spectrum-scale-csi-driver
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
ibm-spectrum-scale-csi-controller   2/2     2            2           4m41s
# oc get replicaset -n ibm-spectrum-scale-csi-driver
NAME                                           DESIRED   CURRENT   READY   AGE
ibm-spectrum-scale-csi-controller-86b77885b9   2         2         2       4m43s
# oc get pods -n ibm-spectrum-scale-csi-driver | grep ibm-spectrum-scale-csi-controller
NAME                                                 READY   STATUS    RESTARTS   AGE
ibm-spectrum-scale-csi-controller-86b77885b9-l95wf   4/4     Running   3          4m39s
ibm-spectrum-scale-csi-controller-86b77885b9-rbqvc   4/4     Running   4          4m39s
...

Leases

# kubectl get leases -n ibm-spectrum-scale-csi-driver  | grep spectrumscale
NAME                                                    HOLDER                                              AGE
external-attacher-leader-spectrumscale-csi-ibm-com      ibm-spectrum-scale-csi-controller-86b77885b9-l95wf   4d5h
external-resizer-spectrumscale-csi-ibm-com              ibm-spectrum-scale-csi-controller-86b77885b9-l95wf   4d5h
external-snapshotter-leader-spectrumscale-csi-ibm-com   ibm-spectrum-scale-csi-controller-86b77885b9-l95wf   4d5h
spectrumscale-csi-ibm-com                               ibm-spectrum-scale-csi-controller-86b77885b9-l95wf   4d5h
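
The lease listing above shows one holder per sidecar, with the other replica acting as a standby. A minimal sketch of that idea in Python, assuming an in-memory record instead of the real `coordination.k8s.io` Lease API. The `Lease` class, the `try_acquire` function, the candidate names, and the 15-second duration are hypothetical and not taken from the sidecars:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """In-memory stand-in for a Lease object (hypothetical, for illustration)."""
    holder: Optional[str] = None
    renew_time: float = 0.0
    duration: float = 15.0  # assumed lease duration in seconds


def try_acquire(lease: Lease, candidate: str, now: float) -> bool:
    """Return True if `candidate` holds the lease after this attempt."""
    expired = now - lease.renew_time > lease.duration
    if lease.holder is None or lease.holder == candidate or expired:
        # Free, already ours (renewal), or stale: take it and stamp the time.
        lease.holder = candidate
        lease.renew_time = now
        return True
    return False


lease = Lease()
print(try_acquire(lease, "controller-a", now=0.0))   # first candidate becomes leader
print(try_acquire(lease, "controller-b", now=5.0))   # lease still valid, b stays standby
print(try_acquire(lease, "controller-b", now=30.0))  # a stopped renewing, b takes over
```

A real implementation also has to make the read-and-update step atomic against the API server, which this sketch does not attempt.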

NodeSelector

# oc describe csiscaleoperator ibm-spectrum-scale-csi -n ibm-spectrum-scale-csi-driver
Spec:
  Clusters:
  ...
  Controller Node Selector:
    Key:                  scale
    Value:                true
  ...
  Plugin Node Selector:
    Key:    scale
    Value:  true

LivenessProbe for Sidecar containers

# oc describe pod ibm-spectrum-scale-csi-controller-dc88d5b85-k6ljv -n ibm-spectrum-scale-csi-driver
spec:
  Containers:
    ibm-spectrum-scale-csi-attacher
      Port:          8080/TCP
      Args:
        ...
        --http-endpoint=:8080
      Liveness:       http-get http://:http-endpoint/healthz/leader-election delay=30s timeout=10s period=20s
      ...
    ibm-spectrum-scale-csi-provisioner
      Port:          8081/TCP
      Args:
        ...
        --http-endpoint=:8081
      Liveness:       http-get http://:http-endpoint/healthz/leader-election delay=30s timeout=10s period=20s
    ...
    ibm-spectrum-scale-csi-snapshotter
      Port:          8082/TCP
      Args:
        ...
        --http-endpoint=:8082
      Liveness:       http-get http://:http-endpoint/healthz/leader-election delay=30s timeout=10s period=20s
    ...
    ibm-spectrum-scale-csi-resizer
      Port:          8083/TCP
      Args:
        ...
        --http-endpoint=:8083
      Liveness:       http-get http://:http-endpoint/healthz/leader-election delay=30s timeout=10s period=20s

PodAntiAffinity

# oc get deploy ibm-spectrum-scale-csi-controller -n ibm-spectrum-scale-csi-driver -o yaml
Spec:
  ...
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 50%
    type: RollingUpdate
  ...
  template:
    spec: 
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - ibm-spectrum-scale-csi-controller
            topologyKey: kubernetes.io/hostname
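
To illustrate why the strategy above matters with required anti-affinity: Kubernetes resolves a percentage `maxSurge` by rounding up and a percentage `maxUnavailable` by rounding down, and both default to 25%. The sketch below is a simplified model, not the Deployment controller's logic; `resolve`, `rollout_can_progress`, and the `eligible_nodes` parameter are hypothetical names, and the model assumes at most one controller pod per node:

```python
import math
from typing import Tuple


def resolve(replicas: int, max_surge_pct: int, max_unavailable_pct: int) -> Tuple[int, int]:
    """Resolve percentages to pod counts (surge rounds up, unavailable rounds down)."""
    surge = math.ceil(replicas * max_surge_pct / 100)
    unavailable = math.floor(replicas * max_unavailable_pct / 100)
    return surge, unavailable


def rollout_can_progress(replicas: int, eligible_nodes: int,
                         surge: int, unavailable: int) -> bool:
    """Simplified model: required anti-affinity allows one pod per node."""
    free_nodes = eligible_nodes - replicas
    # A new pod can start on a free node (surge), or on a node that is
    # freed by taking an old pod down first (unavailable).
    return (surge > 0 and free_nodes > 0) or unavailable > 0


# Two replicas on two eligible nodes, as in the deployment shown earlier.
print(resolve(2, 25, 25), rollout_can_progress(2, 2, *resolve(2, 25, 25)))
print(resolve(2, 25, 50), rollout_can_progress(2, 2, *resolve(2, 25, 50)))
```

Under these assumptions the default 25% `maxUnavailable` resolves to zero pods for two replicas, so the surge pod has nowhere to schedule, while 50% lets one old pod go down first.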

Pod Toleration

# oc describe pod ibm-spectrum-scale-csi-controller-5f7cc965f9-qphtr -n ibm-spectrum-scale-csi-driver
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 60s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 60s
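
As a rough sketch of what these tolerations mean for eviction timing: a pod stays bound to a node carrying a matching NoExecute taint for `tolerationSeconds` and is evicted afterwards. The `eviction_delay` helper below is hypothetical and only handles the `Exists` operator with an exact key match, which is all the output above uses:

```python
from typing import Optional

# Tolerations as shown in the `oc describe pod` output above.
TOLERATIONS = [
    {"key": "node.kubernetes.io/not-ready", "operator": "Exists",
     "effect": "NoExecute", "tolerationSeconds": 60},
    {"key": "node.kubernetes.io/unreachable", "operator": "Exists",
     "effect": "NoExecute", "tolerationSeconds": 60},
]


def eviction_delay(taint_key: str, tolerations: list) -> Optional[int]:
    """Seconds a pod stays bound after a NoExecute taint with `taint_key` is added.

    Returns 0 when nothing tolerates the taint (evicted immediately) and
    None when the toleration has no tolerationSeconds (never evicted).
    """
    for tol in tolerations:
        if (tol.get("effect") == "NoExecute"
                and tol.get("operator") == "Exists"
                and tol.get("key") == taint_key):
            return tol.get("tolerationSeconds")
    return 0


print(eviction_delay("node.kubernetes.io/unreachable", TOLERATIONS))
```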

@nitishkumar4 nitishkumar4 force-pushed the poc_single_sidecar_deployment branch from dec5301 to 31fbf00 Compare February 8, 2022 13:30
@nitishkumar4 nitishkumar4 changed the title Poc single sidecar deployment HA CSI sidecars using K8s Deployment Feb 8, 2022
@nitishkumar4 nitishkumar4 marked this pull request as ready for review February 8, 2022 14:15
"file": "/home/madhu/goop_code/ibm-spectrum-scale-csi/operator/controllers/config/constants.go",
"code": "109: \n110: \tDefaultImagePullSecret = \"ibm-spectrum-scale-csi-registrykey\"\n111: \tDefaultLogLevel = \"DEBUG\"\n",
"line": "110",
"file": "/mnt/c/Users/NitishKumar/Documents/GitHub/IBM/ibm-spectrum-scale-csi/operator/controllers/config/constants.go",
Member

Why do we have the gosec results checked in?

@@ -1,4 +1,5 @@
main.go:69:10: if block ends with a return statement, so drop this else and outdent its block
main.go:44:7: exported const OCPControllerNamespace should have comment or be unexported
Member

Why do we need the golint results checked in?

Contributor Author

@nitishkumar4 nitishkumar4 Feb 9, 2022

gosec and golint outputs are stored in ibm-spectrum-scale-csi/operator/doc/dev/security. This is handled in PR#60. The output is redirected there so that it can be shared if required.

Are you asking why these files are checked in as part of the current PR? gosec and golint are linked to the make command, so when a developer tests the build with make after development, a code scan also takes place. This ensures the code is scanned with gosec and golint at least once before a PR is merged.

Let me know your views on this.

@nitishkumar4
Contributor Author

nitishkumar4 commented Feb 9, 2022

Upgrade

  • All existing node selectors, i.e. attacherNodeSelector, provisionerNodeSelector, snapshotterNodeSelector and resizerNodeSelector, are deleted during upgrade.

  • Controller pods are scheduled using the node selector controllerNodeSelector.

CSI version 2.3.1 (Ansible Operator)

# oc describe csiscaleoperator -n ibm-spectrum-scale-csi -n ibm-spectrum-scale-csi-driver
Spec:
...
  Attacher Node Selector:
    Key:    scale
    Value:  true
    Key:    sidecar
    Value:  attacher
  Provisioner Node Selector:
    Key:    scale
    Value:  true
    Key:    sidecar
    Value:  provisioner
  Snapshotter Node Selector:
    Key:    scale
    Value:  true
    Key:    sidecar
    Value:  snapshotter
...
# oc get nodes --show-labels | grep worker
worker0.csitest.cp.fyre.ibm.com   ... scale=true,sidecar=attacher
worker1.csitest.cp.fyre.ibm.com   ... scale=true,sidecar=provisioner
worker3.csitest.cp.fyre.ibm.com   ... scale=true,sidecar=snapshotter
# oc get pods -o wide -n ibm-spectrum-scale-csi-driver
ibm-spectrum-scale-csi-attacher-0                  1/1     Running    ... worker0.csitest.cp.fyre.ibm.com 
ibm-spectrum-scale-csi-provisioner-0               1/1     Running    ... worker1.csitest.cp.fyre.ibm.com
ibm-spectrum-scale-csi-snapshotter-0               1/1     Running    ... worker3.csitest.cp.fyre.ibm.com

CSI version 2.5.0 (Go Operator-Statefulset based)

# oc describe csiscaleoperator -n ibm-spectrum-scale-csi -n ibm-spectrum-scale-csi-driver
Spec:
  Attacher Node Selector:
    Key:    scale
    Value:  true
    Key:    sidecar
    Value:  attacher
  Plugin Node Selector:
    Key:    scale
    Value:  true
  Provisioner Node Selector:
    Key:    scale
    Value:  true
    Key:    sidecar
    Value:  provisioner
  Resizer Node Selector:
    Key:    scale
    Value:  true
    Key:    sidecar
    Value:  resizer
  Snapshotter Node Selector:
    Key:    scale
    Value:  true
    Key:    sidecar
    Value:  snapshotter
# oc get pods -o wide
ibm-spectrum-scale-csi-attacher-0         ...   worker0.csitest.cp.fyre.ibm.com
ibm-spectrum-scale-csi-provisioner-0      ...   worker1.csitest.cp.fyre.ibm.com
ibm-spectrum-scale-csi-resizer-0          ...   worker2.csitest.cp.fyre.ibm.com
ibm-spectrum-scale-csi-snapshotter-0      ...   worker3.csitest.cp.fyre.ibm.com

CSI version 2.5.0 (Go Operator-Deployment based)

# oc describe csiscaleoperator -n ibm-spectrum-scale-csi -n ibm-spectrum-scale-csi-driver
Spec:
  ...
  Controller Node Selector:
    Key:                  scale
    Value:                true
  Plugin Node Selector:
    Key:    scale
    Value:  true
# oc describe pod ibm-spectrum-scale-csi-controller-5f7cc965f9-z5gw2 -n ibm-spectrum-scale-csi-driver
Node-Selectors:              scale=true
# oc get pods -o wide -n ibm-spectrum-scale-csi-driver
ibm-spectrum-scale-csi-controller-5f7cc965f9-ddqvh   ...    worker0.csitest.cp.fyre.ibm.com
ibm-spectrum-scale-csi-controller-5f7cc965f9-z5gw2   ...    worker1.csitest.cp.fyre.ibm.com
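
A small sketch of how the selector change affects placement, assuming plain nodeSelector semantics (a node matches when it carries every key/value pair in the selector). The `matching_nodes` helper is hypothetical, and the label data covers only the three workers whose labels are listed in the 2.3.1 output above:

```python
# Node labels as listed for CSI 2.3.1 above (other labels are elided in the source).
NODES = {
    "worker0.csitest.cp.fyre.ibm.com": {"scale": "true", "sidecar": "attacher"},
    "worker1.csitest.cp.fyre.ibm.com": {"scale": "true", "sidecar": "provisioner"},
    "worker3.csitest.cp.fyre.ibm.com": {"scale": "true", "sidecar": "snapshotter"},
}


def matching_nodes(selector: dict, nodes: dict) -> list:
    """Return the sorted names of nodes whose labels include every selector pair."""
    return sorted(
        name for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


# Old per-sidecar selector: pins the attacher to a single node.
print(matching_nodes({"scale": "true", "sidecar": "attacher"}, NODES))
# New controllerNodeSelector: any node labelled scale=true is eligible.
print(matching_nodes({"scale": "true"}, NODES))
```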

@deeghuge deeghuge changed the title HA CSI sidecars using K8s Deployment HA CSI sidecars using K8s Deployment - Target 2.6.0 - do not merge Feb 15, 2022
@Jainbrt Jainbrt added this to the v2.6.0 milestone Feb 15, 2022
@Jainbrt Jainbrt changed the title HA CSI sidecars using K8s Deployment - Target 2.6.0 - do not merge HA CSI sidecars using K8s Deployment Apr 1, 2022
@nitishkumar4
Contributor Author

This feature is no longer required. An alternate approach has already been merged in #683. Closing the PR. Please reopen it if required in the future.

Development

Successfully merging this pull request may close these issues.

PVC atached to a pod doesn't migrate across nodes when Kubelet Service is stopped
3 participants