
helm: Add missing RBAC for nodes to cephfs chart #5126

Merged
merged 1 commit into ceph:devel on Feb 6, 2025

Conversation

Lirt
Contributor

@Lirt Lirt commented Feb 5, 2025

Fixes: #5125

All relevant information is included in the linked issue.
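
For context, a minimal sketch of the kind of ClusterRole rule this change adds for the provisioner ServiceAccount. The name and labels below are illustrative only; the Helm chart template and the static deploy/cephfs/kubernetes/csi-provisioner-rbac.yaml are authoritative.

# Illustrative only -- object name is hypothetical, not the chart's exact template.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cephfs-csi-provisioner
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]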

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request.
  • Pending release notes updated with breaking and/or notable changes for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@mergify mergify bot added the component/deployment Helm chart, kubernetes templates and configuration Issues/PRs label Feb 5, 2025
Member

@nixpanic nixpanic left a comment


lgtm, these permissions are also listed in deploy/cephfs/kubernetes/csi-provisioner-rbac.yaml

@nixpanic nixpanic added the ci/skip/multi-arch-build skip building on multiple architectures label Feb 6, 2025
@nixpanic nixpanic requested a review from a team February 6, 2025 08:10
@nixpanic nixpanic added bug Something isn't working backport-to-release-v3.13 Label to backport from devel to release-v3.13 branch labels Feb 6, 2025
@Madhu-1
Collaborator

Madhu-1 commented Feb 6, 2025

I think something else is missing here. If these RBACs are missing, how is CI passing for the Helm charts?

@nixpanic @iPraveenParihar any idea?

@Lirt
Contributor Author

Lirt commented Feb 6, 2025

That one is not logging anything useful. I tried to label/unlabel the PVC and also to recreate it (the same goes for the other 3 provisioner pods I have):

kl ceph-csi-fs-ceph-csi-cephfs-provisioner-68c847d56b-ptfk6

I0206 08:58:28.442213       1 utils.go:266] ID: 2407 GRPC call: /csi.v1.Identity/Probe
I0206 08:59:28.500902       1 utils.go:266] ID: 2408 GRPC call: /csi.v1.Identity/Probe
I0206 09:00:28.442390       1 utils.go:266] ID: 2409 GRPC call: /csi.v1.Identity/Probe
I0206 09:01:28.442180       1 utils.go:266] ID: 2410 GRPC call: /csi.v1.Identity/Probe

@iPraveenParihar
Contributor

AFAIK, the CephFS provisioner doesn't require access to the nodes resource. Let me try it on my machine.

@iPraveenParihar
Contributor

Using the release-v3.13 branch, it worked for me:

$ k get po --show-labels
NAME                                            READY   STATUS    RESTARTS   AGE     LABELS
csi-cephfsplugin-4fnqj                          3/3     Running   0          6m52s   app.kubernetes.io/managed-by=helm,app.kubernetes.io/name=ceph-csi-cephfs,app=ceph-csi-cephfs,chart=ceph-csi-cephfs-3-canary,component=nodeplugin,controller-revision-hash=5b98b59465,heritage=Helm,pod-template-generation=1,release=ceph-csi-cephfs
csi-cephfsplugin-provisioner-6b94b86f4d-cscs9   5/5     Running   0          3m24s   app.kubernetes.io/managed-by=helm,app.kubernetes.io/name=ceph-csi-cephfs,app=ceph-csi-cephfs,chart=ceph-csi-cephfs-3-canary,component=provisioner,heritage=Helm,pod-template-hash=6b94b86f4d,release=ceph-csi-cephfs
csi-rbdplugin-provisioner-74c9864df6-tmf55      7/7     Running   0          3m24s   app.kubernetes.io/managed-by=helm,app.kubernetes.io/name=ceph-csi-rbd,app=ceph-csi-rbd,chart=ceph-csi-rbd-3-canary,component=provisioner,heritage=Helm,pod-template-hash=74c9864df6,release=ceph-csi-rbd
csi-rbdplugin-xtfpt                             3/3     Running   0          6m39s   app.kubernetes.io/managed-by=helm,app.kubernetes.io/name=ceph-csi-rbd,app=ceph-csi-rbd,chart=ceph-csi-rbd-3-canary,component=nodeplugin,controller-revision-hash=765fc779c5,heritage=Helm,pod-template-generation=1,release=ceph-csi-rbd

$ k get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    VOLUMEATTRIBUTESCLASS   AGE
csi-cephfs-pvc   Bound    pvc-f6697a91-52e8-4bf2-9f58-6b956e318333   1Gi        RWX            csi-cephfs-sc   <unset>                 3m15s

$ k get clusterrole csi-cephfsplugin-provisioner -oyaml | grep "node"
$

lgtm, these permissions are also listed in deploy/cephfs/kubernetes/csi-provisioner-rbac.yaml

It was added in PR #3460 here. But I'm not sure why it was added; I don't see any requirement for it 😕.

@nixpanic
Member

nixpanic commented Feb 6, 2025

I think something else is missing here. If these RBACs are missing, how is CI passing for the Helm charts?

@nixpanic @iPraveenParihar any idea?

I wondered about that as well. Possibly minikube does not require RBACs?

@iPraveenParihar
Contributor

iPraveenParihar commented Feb 6, 2025

@nixpanic, found this rook/rook#11697 by @Madhu-1.
It seems Node access is required for StorageClasses with volumeBindingMode: WaitForFirstConsumer

verified it

 Warning  ProvisioningFailed    56s (x8 over 2m)      cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-6b94b86f4d-cscs9_8a3521ba-e903-4f81-a048-af8164a4174c  failed to get target node: nodes "dr1" is forbidden: User "system:serviceaccount:test:csi-cephfsplugin-provisioner" cannot get resource "nodes" in API group "" at the cluster scope
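
For context: with volumeBindingMode: WaitForFirstConsumer, the external-provisioner waits for a pod to be scheduled and then reads the selected Node object to derive topology, which is the nodes GET that is forbidden above. A minimal sketch to trigger that code path (PVC plus a consumer pod; the names, image, and StorageClass reference are illustrative, not from this PR):

# Hypothetical PVC + consumer pod; provisioning (and the nodes lookup) only
# happens once the pod is scheduled, because the StorageClass uses
# volumeBindingMode: WaitForFirstConsumer.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wffc-test-pvc
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-cephfs-sc   # any CephFS SC with WaitForFirstConsumer
---
apiVersion: v1
kind: Pod
metadata:
  name: wffc-test-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: wffc-test-pvc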

@Madhu-1
Collaborator

Madhu-1 commented Feb 6, 2025

I think something else is missing here. If these RBACs are missing, how is CI passing for the Helm charts?

@nixpanic @iPraveenParihar any idea?

I wondered about that as well. Possibly minikube does not require RBACs?

IMO it's not related to minikube; it could be related to the external-provisioner version.

@Lirt, what is the external-provisioner version in your cluster? Also, can you paste the YAML output of the CephFS deployment?

@Madhu-1
Collaborator

Madhu-1 commented Feb 6, 2025

@Mergifyio queue

Contributor

mergify bot commented Feb 6, 2025

queue

✅ The pull request has been merged automatically at 72b9d5a

@mergify mergify bot added the ok-to-test Label to trigger E2E tests label Feb 6, 2025
@ceph-csi-bot
Collaborator

/test ci/centos/upgrade-tests-cephfs
/test ci/centos/upgrade-tests-rbd
/test ci/centos/k8s-e2e-external-storage/1.32
/test ci/centos/k8s-e2e-external-storage/1.31
/test ci/centos/mini-e2e-helm/k8s-1.32
/test ci/centos/k8s-e2e-external-storage/1.30
/test ci/centos/mini-e2e-helm/k8s-1.31
/test ci/centos/mini-e2e/k8s-1.32
/test ci/centos/mini-e2e-helm/k8s-1.30
/test ci/centos/mini-e2e/k8s-1.31
/test ci/centos/mini-e2e/k8s-1.30

@Madhu-1
Collaborator

Madhu-1 commented Feb 6, 2025

@nixpanic, found this rook/rook#11697 by @Madhu-1. It seems Node access is required for StorageClasses with volumeBindingMode: WaitForFirstConsumer

verified it

 Warning  ProvisioningFailed    56s (x8 over 2m)      cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-6b94b86f4d-cscs9_8a3521ba-e903-4f81-a048-af8164a4174c  failed to get target node: nodes "dr1" is forbidden: User "system:serviceaccount:test:csi-cephfsplugin-provisioner" cannot get resource "nodes" in API group "" at the cluster scope

We need to cover this in our E2E as well :)

@ceph-csi-bot ceph-csi-bot removed the ok-to-test Label to trigger E2E tests label Feb 6, 2025
@Lirt
Contributor Author

Lirt commented Feb 6, 2025

I used all the default image tags from Helm chart 3.13.0 (unless I have a mistake in my values.yaml). Here is the ceph-csi-fs-ceph-csi-cephfs-provisioner deployment; I am also using WaitForFirstConsumer in the StorageClass.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ceph-csi-fs-ceph-csi-cephfs-provisioner
  namespace: storage
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ceph-csi-cephfs
      component: provisioner
      release: ceph-csi-fs
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 50%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: ceph-csi-cephfs
        chart: ceph-csi-cephfs-3.13.0
        component: provisioner
        heritage: Helm
        release: ceph-csi-fs
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - ceph-csi-cephfs
              - key: component
                operator: In
                values:
                - provisioner
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - --nodeid=$(NODE_ID)
        - --type=cephfs
        - --controllerserver=true
        - --pidlimit=-1
        - --endpoint=$(CSI_ENDPOINT)
        - --v=4
        - --drivername=$(DRIVER_NAME)
        - --setmetadata=true
        - --logslowopinterval=30s
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: DRIVER_NAME
          value: cephfs.csi.ceph.com
        - name: NODE_ID
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: CSI_ENDPOINT
          value: unix:///csi/csi-provisioner.sock
        image: artifactory.devops.telekom.de/quay.io/cephcsi/cephcsi:v3.13.0
        imagePullPolicy: IfNotPresent
        name: csi-cephfsplugin
        resources:
          limits:
            cpu: 500m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /csi
          name: socket-dir
        - mountPath: /sys
          name: host-sys
        - mountPath: /lib/modules
          name: lib-modules
          readOnly: true
        - mountPath: /dev
          name: host-dev
        - mountPath: /etc/ceph/
          name: ceph-config
        - mountPath: /etc/ceph-csi-config/
          name: ceph-csi-config
        - mountPath: /tmp/csi/keys
          name: keys-tmp-dir
      - args:
        - --csi-address=$(ADDRESS)
        - --v=1
        - --timeout=60s
        - --leader-election=true
        - --retry-interval-start=500ms
        - --extra-create-metadata=true
        - --feature-gates=HonorPVReclaimPolicy=true
        - --prevent-volume-mode-conversion=true
        env:
        - name: ADDRESS
          value: unix:///csi/csi-provisioner.sock
        image: artifactory.devops.telekom.de/registry.k8s.io/sig-storage/csi-provisioner:v5.0.1
        imagePullPolicy: IfNotPresent
        name: csi-provisioner
        resources:
          limits:
            cpu: 250m
            memory: 128Mi
          requests:
            cpu: 50m
            memory: 64Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /csi
          name: socket-dir
      - args:
        - --csi-address=$(ADDRESS)
        - --v=1
        - --timeout=60s
        - --leader-election=true
        - --extra-create-metadata=true
        - --enable-volume-group-snapshots=false
        env:
        - name: ADDRESS
          value: unix:///csi/csi-provisioner.sock
        image: artifactory.devops.telekom.de/registry.k8s.io/sig-storage/csi-snapshotter:v8.0.1
        imagePullPolicy: IfNotPresent
        name: csi-snapshotter
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 256Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /csi
          name: socket-dir
      - args:
        - --v=1
        - --csi-address=$(ADDRESS)
        - --timeout=60s
        - --leader-election
        - --retry-interval-start=500ms
        - --handle-volume-inuse-error=false
        - --feature-gates=RecoverVolumeExpansionFailure=true
        env:
        - name: ADDRESS
          value: unix:///csi/csi-provisioner.sock
        image: artifactory.devops.telekom.de/registry.k8s.io/sig-storage/csi-resizer:v1.11.1
        imagePullPolicy: IfNotPresent
        name: csi-resizer
        resources:
          limits:
            cpu: 500m
            memory: 256Mi
          requests:
            cpu: 50m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /csi
          name: socket-dir
      - args:
        - --type=liveness
        - --endpoint=$(CSI_ENDPOINT)
        - --metricsport=8080
        - --metricspath=/metrics
        - --polltime=60s
        - --timeout=3s
        env:
        - name: CSI_ENDPOINT
          value: unix:///csi/csi-provisioner.sock
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: artifactory.devops.telekom.de/quay.io/cephcsi/cephcsi:v3.13.0
        imagePullPolicy: IfNotPresent
        name: liveness-prometheus
        ports:
        - containerPort: 8080
          name: metrics
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /csi
          name: socket-dir
      dnsPolicy: ClusterFirst
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: ceph-csi-fs-ceph-csi-cephfs-provisioner
      serviceAccountName: ceph-csi-fs-ceph-csi-cephfs-provisioner
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
      volumes:
      - emptyDir:
          medium: Memory
        name: socket-dir
      - hostPath:
          path: /sys
          type: ""
        name: host-sys
      - hostPath:
          path: /lib/modules
          type: ""
        name: lib-modules
      - hostPath:
          path: /dev
          type: ""
        name: host-dev
      - configMap:
          defaultMode: 420
          name: ceph-config-cephfs
        name: ceph-config
      - configMap:
          defaultMode: 420
          name: ceph-csi-config-cephfs
        name: ceph-csi-config
      - emptyDir:
          medium: Memory
        name: keys-tmp-dir

Storage Class:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-rwx-retain
parameters:
  clusterID: ...
  csi.storage.k8s.io/controller-expand-secret-name: ceph-rwx-pool-01
  csi.storage.k8s.io/controller-expand-secret-namespace: storage-namespace
  csi.storage.k8s.io/node-stage-secret-name: ceph-rwx-pool-01
  csi.storage.k8s.io/node-stage-secret-namespace: storage-namespace
  csi.storage.k8s.io/provisioner-secret-name: ceph-rwx-pool-01
  csi.storage.k8s.io/provisioner-secret-namespace: storage-namespace
  fsName: ...
  pool: ...
provisioner: cephfs.csi.ceph.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer

@mergify mergify bot merged commit 72b9d5a into ceph:devel Feb 6, 2025
49 of 50 checks passed
Successfully merging this pull request may close these issues.

Missing "nodes" RBAC for cephfs provisioner clusterrole (v3.13.0)