Volumes cannot be mounted if a portal is unavailable #115

Open
snir-dream opened this issue Dec 10, 2024 · 2 comments

@snir-dream

Describe the bug
We are simulating network failures to test our HA performance and found that once one of the iSCSI portals becomes unavailable, the CSI driver fails to mount any volume.
In the log below, the network link to 10.0.20.10 was blocked using iptables rules that drop all packets to and from that IP.

seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:26 iscsi.go:160: waitForPathToExistImpl (/dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4)
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:26 iscsi.go:170: [0] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:26 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:27 iscsi.go:170: [1] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:27 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:28 iscsi.go:170: [2] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:28 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node I1210 17:11:29.194677       1 node.go:96] >>> /csi.v1.Identity/Probe
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:29 iscsi.go:170: [3] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:29 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:30 iscsi.go:170: [4] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:30 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node I1210 17:11:31.036371       1 node.go:96] >>> /csi.v1.Node/NodeGetCapabilities
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:31 iscsi.go:170: [5] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:31 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:32 iscsi.go:170: [6] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:32 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:33 iscsi.go:170: [7] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:33 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:34 iscsi.go:170: [8] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:34 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:35 iscsi.go:170: [9] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:35 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:36 iscsi.go:170: [10] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:36 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:37 iscsi.go:170: [11] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:37 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:38 iscsi.go:170: [12] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:38 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:39 iscsi.go:170: [13] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:39 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:40 iscsi.go:170: [14] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:40 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:41 iscsi.go:170: [15] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:41 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:42 iscsi.go:170: [16] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:42 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:43 iscsi.go:170: [17] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:43 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:44 iscsi.go:170: [18] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:44 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:45 iscsi.go:170: [19] os stat device: exist false device /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:45 iscsi.go:175: Device not found for: /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:45 iscsi.go:199: device does NOT exist [20*1s] (/dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4)
seagate-exos-x-csi-node-server-kmm5x seagate-exos-x-csi-node DEBUG: 2024/12/10 17:11:45 iscsi.go:316: waitForPathToExist: exists=false err=stat /dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4: no such file or directory 

We think the problem is the error handling inside the waitForPathToExistImpl function. When a path is unavailable, the device for that path is never created, so the function returns a non-nil err value, which fails the entire mount procedure. We think that this function shouldn't propagate existence errors.
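
A minimal sketch of the suggested behavior, written as a standalone Go program rather than a patch to the driver. The function name `waitForAnyPath`, its parameters, and the second portal address are hypothetical and do not come from the driver's source. The assumption is that a mount can proceed when at least one by-path device exists, so a missing device on one portal is reported as "not found" instead of as an error.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// pathExists reports whether a device path is present. Any stat error,
// including "no such file or directory", is treated as "not present".
func pathExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// waitForAnyPath polls every candidate path up to `attempts` times, sleeping
// `interval` between rounds. It returns the paths found in the first round
// where at least one exists. A missing path is not an error by itself; an
// error is returned only when no candidate exists after the last attempt.
func waitForAnyPath(paths []string, exists func(string) bool, attempts int, interval time.Duration) ([]string, error) {
	for i := 0; i < attempts; i++ {
		var found []string
		for _, p := range paths {
			if exists(p) {
				found = append(found, p)
			}
		}
		if len(found) > 0 {
			return found, nil
		}
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return nil, errors.New("no device path appeared on any portal")
}

func main() {
	// The first path is taken from the log above. The second portal
	// address (10.0.20.11) is a made-up example of a healthy portal.
	paths := []string{
		"/dev/disk/by-path/ip-10.0.20.10:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4",
		"/dev/disk/by-path/ip-10.0.20.11:3260-iscsi-iqn.1988-11.com.dell:01.array.bc305b6893fb-lun-4",
	}
	found, err := waitForAnyPath(paths, pathExists, 3, time.Second)
	if err != nil {
		fmt.Println("mount should fail:", err)
		return
	}
	fmt.Println("usable paths:", found)
}
```

The sketch returns as soon as one path is present, so a path that shows up a moment later would be missed. A real fix might keep polling for a short grace period before handing the devices to multipath.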

To Reproduce

  1. Start a pod with a volume. This step is important to ensure the session to the portal is logged in.
  2. Block access to one of the paths by running:
iptables -A INPUT -s 10.0.20.10 -j DROP
iptables -A OUTPUT -d 10.0.20.10 -j DROP
  3. Restart the pod.

Expected behavior
The pod is restarted and enters the "running" state.

Screenshots
None

Storage System (please complete the following information):

  • Vendor: DELL
  • Model: ME5012
  • Firmware Version: ME5.1.2.1.1

Environment:

  • Kubernetes version: v1.30.5+rke2r1
  • Host OS: Ubuntu 24.04.1 LTS

Additional context
We found that the following two issues might affect reproducibility:

  1. We previously had a different issue in a similar scenario, where a volume mount hit a gRPC timeout if some of the portals were unavailable. To resolve this, we lowered the discovery timeout and max retries in iscsid.conf by adding:
discovery.sendtargets.timeo.login_timeout = 5
discovery.sendtargets.reopen_max = 1
  2. If the session on the unavailable portal is not logged in, this issue will not reproduce, because the login will fail and the driver will continue to the next portal as expected. This shows up in the logs as:
seagate-exos-x-csi-node-server-4bh6w seagate-exos-x-csi-node DEBUG: 2024/12/10 19:28:02 iscsiadm.go:68: Output of iscsiadm command: {output: iscsiadm: connect to 10.0.20.10 timed out\niscsiadm: connect to 10.0.20.10 timed out\niscsiadm: connection login retries (reopen_max) 1 exceeded\niscsiadm: Could not perform SendTargets discovery: iSCSI PDU timed out\n}
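
To check that a node actually carries the two iscsid.conf overrides listed above, something like the following could be used. It is a rough sketch: `parseIscsidConf` is a hypothetical helper, not part of the driver or of open-iscsi, and it assumes the usual `key = value` layout with `#` comment lines. The sample text is embedded in the program; on a real node the file contents would be read from disk instead.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseIscsidConf extracts "key = value" settings from iscsid.conf-style
// text, skipping blank lines, '#' comment lines, and lines without '='.
func parseIscsidConf(text string) map[string]string {
	settings := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		settings[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return settings
}

func main() {
	// The two settings quoted in this issue.
	conf := parseIscsidConf(`
# discovery overrides
discovery.sendtargets.timeo.login_timeout = 5
discovery.sendtargets.reopen_max = 1
`)
	keys := []string{
		"discovery.sendtargets.timeo.login_timeout",
		"discovery.sendtargets.reopen_max",
	}
	for _, key := range keys {
		value, ok := conf[key]
		if !ok {
			fmt.Printf("%s is not set\n", key)
			continue
		}
		fmt.Printf("%s = %s\n", key, value)
	}
}
```
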
David-T-White self-assigned this Dec 10, 2024
@David-T-White
Collaborator

Hi, thanks for the detailed report, as well as the iscsid.conf suggestions. We will review and update you soon.

@unclebenel

Any news?
