tcmu-runner auto-restarts after timeout and targets cannot be found #665
After I restarted tcmu-runner & rbd-target-api on 66.66.66.2, the issue was fixed.
What's your
The rbd handler will set the
The settings of the tcmu-runner service are as follows:
I didn't find any log in /var/log/tcmu-runner.log explaining why tcmu-runner restarted. What I pasted in this issue are all the logs from my test. In the Ceph cluster I had one PG inactive.
I expected the gateway to automatically reconnect (and log in) after being disconnected by the timeout, but I can't discover or log in to the gateway on the node where tcmu-runner auto-restarted.
Are you using a container or something else? I know that in some container environments the container service monitors the tcmu-runner service's status, and if it is dead the container will try to pull it back up. By default, tcmu-runner itself won't be auto-restarted by its systemd service, so I am curious how this could happen for you.
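To rule that out, you could check whether systemd (or anything else) is restarting the daemon, roughly like this (the unit name is assumed to be the stock tcmu-runner.service):

```sh
# Is a restart policy configured on the unit? (the stock unit does not set one)
systemctl show tcmu-runner.service -p Restart

# Did systemd log a restart or crash around the time of the timeouts?
journalctl -u tcmu-runner.service --since "2021-07-26 14:40" --until "2021-07-26 15:00"

# Current PID and start time, to compare against the PID in /var/log/tcmu-runner.log
systemctl status tcmu-runner.service
```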
Were you building tcmu-runner from source? And on the other node, are you building tcmu-runner against a higher Ceph version?
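You could compare what each node is actually linked against with something like this (assuming the binary is installed at /usr/bin/tcmu-runner and the usual CentOS package names):

```sh
# Which Ceph client libraries is this tcmu-runner binary using?
ldd /usr/bin/tcmu-runner | grep -E 'librbd|librados'

# Installed Ceph client packages and the cluster's version
rpm -q librbd1 librados2
ceph --version
```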
Thank you for replying.
What I can find on the gateway node (storage node) is:
What the client iscsid service logged:
I will set log_level = 5 in /etc/tcmu/tcmu.conf and try to reproduce this error.
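For reference, this is the only change I plan to make (assuming the stock /etc/tcmu/tcmu.conf layout; tcmu-runner watches this file with inotify, as the dyn_config_start line in the log below shows, so it should be picked up without a restart):

```ini
# /etc/tcmu/tcmu.conf
# 5 is the most verbose level (SCSI command debugging)
log_level = 5
```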
Hello!
tcmu version: tcmu-runner-1.4.0
Linux distribution & kernel version: CentOS 7.7.1908, kernel 3.10.0-1127
Ceph version: 13.2.8 Mimic
272 OSDs across 8 storage nodes
32 RBD volumes (LUNs) across 8 clients; 8 gateways (each client logs in to 1 gateway); 1 target
Cluster IO: 650 MB/s, 1K IOPS write
What I did
I simulated inactive PGs while watching the resulting changes in cluster IO and on the iSCSI LUNs.
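Roughly what I mean by simulating inactive PGs (illustrative only; the OSD ids below are made up):

```sh
# Stop enough OSDs that some PGs drop below min_size and go inactive
systemctl stop ceph-osd@12 ceph-osd@57

# Watch PG states and client IO from a monitor node
ceph -s
ceph pg dump_stuck inactive
ceph health detail
```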
What I expected to see
LUN times out -> disconnects -> re-login
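On the client side this timeout/re-login behaviour is driven by the usual open-iscsi settings; a minimal sketch of the relevant /etc/iscsi/iscsid.conf knobs (the values shown are the open-iscsi defaults, not necessarily my configuration):

```ini
# /etc/iscsi/iscsid.conf (client side)
node.startup = automatic
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
```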
But what didn't perform as expected:
(1). Some tcmu-runner services seem to have restarted after the timeout (4 of the 8 tcmu-runner services were restarted).
(2). Cannot discover the target using the gateway IP:port (see the example below).
Example:
vm-node11 (the client) was connected to 66.66.66.2:3260, but after tcmu-runner auto-restarted, the VM can't discover targets with the command: iscsiadm -m discovery -t sendtargets -p 66.66.66.2:3260
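For completeness, the client-side commands I am referring to (the target IQN here is just a placeholder; the real one is redacted in the logs):

```sh
# Discovery against the gateway whose tcmu-runner restarted - this no longer returns targets
iscsiadm -m discovery -t sendtargets -p 66.66.66.2:3260

# Attempt to log back in to the (placeholder) target and list active sessions
iscsiadm -m node -T iqn.2021-04.com.example:target -p 66.66.66.2:3260 --login
iscsiadm -m session
```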
Command output on the client:
Last 25 lines of the tcmu-runner log at 66.66.66.2:
25970 2021-07-26 14:47:10.815 1825526 [WARN] tcmu_print_cdb_info:1193 rbd/rbd.xxx30T-11-1: a3 c 1 12 0 0 0 0 2 0 0 0 is not supported
25971 2021-07-26 14:47:10.826 1825526 [WARN] tcmu_print_cdb_info:1193 rbd/rbd.xxx30T-11-2: a3 c 1 12 0 0 0 0 2 0 0 0 is not supported
25972 2021-07-26 14:47:10.840 1825526 [WARN] tcmu_print_cdb_info:1193 rbd/rbd.xxx30T-11-3: a3 c 1 12 0 0 0 0 2 0 0 0 is not supported
25973 2021-07-26 14:47:10.840 1825526 [INFO] alua_implicit_transition:569 rbd/rbd.xxx30T-11-1: Starting lock acquisition operation.
25974 2021-07-26 14:47:10.848 1825526 [WARN] tcmu_print_cdb_info:1193 rbd/rbd.xxx30T-11-4: a3 c 1 12 0 0 0 0 2 0 0 0 is not supported
25975 2021-07-26 14:47:10.849 1825526 [INFO] alua_implicit_transition:569 rbd/rbd.xxx30T-11-2: Starting lock acquisition operation.
25976 2021-07-26 14:47:10.857 1825526 [INFO] alua_implicit_transition:569 rbd/rbd.xxx30T-11-3: Starting lock acquisition operation.
25977 2021-07-26 14:47:10.865 1825526 [INFO] alua_implicit_transition:569 rbd/rbd.xxx30T-11-4: Starting lock acquisition operation.
25978 2021-07-26 14:47:10.921 1825526 [WARN] tcmu_rbd_lock:757 rbd/rbd.xxx30T-11-1: Acquired exclusive lock.
25979 2021-07-26 14:47:10.927 1825526 [WARN] tcmu_rbd_lock:757 rbd/rbd.xxx30T-11-2: Acquired exclusive lock.
25980 2021-07-26 14:47:10.997 1825526 [WARN] tcmu_rbd_lock:757 rbd/rbd.xxx30T-11-4: Acquired exclusive lock.
25981 2021-07-26 14:47:11.013 1825526 [WARN] tcmu_rbd_lock:757 rbd/rbd.xxx30T-11-3: Acquired exclusive lock.
25982 2021-07-26 14:53:21.238 1825526 [ERROR] tcmu_rbd_handle_timedout_cmd:992 rbd/rbd.xxx30T-11-3: Timing out cmd.
25983 2021-07-26 14:53:21.238 1825526 [ERROR] tcmu_notify_conn_lost:187 rbd/rbd.xxx30T-11-3: Handler connection lost (lock state 1)
25984 2021-07-26 14:53:21.239 1825526 [ERROR] tcmu_rbd_handle_timedout_cmd:992 rbd/rbd.xxx30T-11-3: Timing out cmd.
25985 2021-07-26 14:53:21.240 1825526 [ERROR] tcmu_rbd_handle_timedout_cmd:992 rbd/rbd.xxx30T-11-3: Timing out cmd.
25986 2021-07-26 14:53:21.240 1825526 [ERROR] tcmu_rbd_handle_timedout_cmd:992 rbd/rbd.xxx30T-11-3: Timing out cmd.
25987 2021-07-26 14:53:21.240 1825526 [ERROR] tcmu_rbd_handle_timedout_cmd:992 rbd/rbd.xxx30T-11-3: Timing out cmd.
25988 2021-07-26 14:53:21.240 1825526 [ERROR] tcmu_rbd_handle_timedout_cmd:992 rbd/rbd.xxx30T-11-3: Timing out cmd.
25989 2021-07-26 14:53:21.240 1825526 [ERROR] tcmu_rbd_handle_timedout_cmd:992 rbd/rbd.xxx30T-11-3: Timing out cmd.
25990 2021-07-26 14:53:38.087 1825526 [ERROR] tcmu_rbd_handle_timedout_cmd:992 rbd/rbd.xxx30T-11-3: Timing out cmd.
25991 2021-07-26 14:53:38.093 1825526 [ERROR] tcmu_rbd_handle_timedout_cmd:992 rbd/rbd.xxx30T-11-3: Timing out cmd.
25992 2021-07-26 14:53:38.097 1825526 [INFO] tgt_port_grp_recovery_thread_fn:245: Disabled iscsi/iqn.2021-04.com.fixxxxxxe.iscsi-gw:xxxtor/tpgt_7.
25993 2021-07-26 14:54:41.430 1832626 [INFO] dyn_config_start:422: Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS
The log ends here; nothing more was written for the next 3 hours.
END
Thank you for reading~