
nvmeof Gateway fails to start up in brand new cluster #669

Open
madkiss opened this issue May 23, 2024 · 1 comment

Comments

@madkiss

madkiss commented May 23, 2024

I am trying to set up ceph-nvmeof 1.2.9 on Reef, on a fresh cluster installed a few hours ago with cephadm and deployed as per the documentation. nvmeof fails to come up; the log messages I see are:

May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.333+0000 70f605860640 0 nvmeofgw void NVMeofGwMonitorClient::tick()
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.333+0000 70f605860640 0 nvmeofgw bool get_gw_state(const char*, const std::map<std::pair<std::__cxx11::basic_string, std::__cxx11::basic_string >, std::map<std::__cxx11::basic_string, NvmeGwState> >&, const NvmeGroupKey&, const NvmeGwId&, NvmeGwState&) can not find group (nvme,None) old map map: {}
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.333+0000 70f605860640 0 nvmeofgw void NVMeofGwMonitorClient::send_beacon() sending beacon as gid 24694 availability 0 osdmap_epoch 0 gwmap_epoch 0
May 23 12:55:14 ceph2 bash[76745]: debug 2024-05-23T12:55:14.333+0000 785f205e5700 0 can't decode unknown message type 2049 MSG_AUTH=17
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.333+0000 70f609868640 0 client.0 ms_handle_reset on v2:10.4.3.11:3300/0
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.333+0000 70f609868640 0 client.0 ms_handle_reset on v2:10.4.3.11:3300/0
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.337+0000 70f609868640 0 nvmeofgw virtual bool NVMeofGwMonitorClient::ms_dispatch2(ceph::ref_t&) got map type 4
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.337+0000 70f609868640 0 ms_deliver_dispatch: unhandled message 0x5e584cc24820 mon_map magic: 0 from mon.1 v2:10.4.3.11:3300/0
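For completeness, the gateway was deployed with a standard cephadm service spec, roughly as below. The pool name matches the "(nvme,None)" group key in the log above; the hostnames and placement are approximate, from memory:

service_type: nvmeof
service_id: nvme
# pool name as it appears in the log's group key
pool: nvme
placement:
  hosts:
    - ceph1
    - ceph2

This was applied with ceph orch apply -i nvmeof.yaml.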

Another excerpt, captured shortly before the daemon aborts, is:

May 23 12:57:26 ceph1 bash[146371]: 1: [v2:10.4.3.11:3300/0,v1:10.4.3.11:6789/0] mon.ceph2
May 23 12:57:26 ceph1 bash[146371]: 2: [v2:10.4.3.12:3300/0,v1:10.4.3.12:6789/0] mon.ceph3
May 23 12:57:26 ceph1 bash[146371]: -12> 2024-05-23T12:57:24.746+0000 73e68f1de640 0 nvmeofgw virtual bool NVMeofGwMonitorClient::ms_dispatch2(ceph::ref_t&) got map type 4
May 23 12:57:26 ceph1 bash[146371]: -11> 2024-05-23T12:57:24.746+0000 73e68f1de640 0 ms_deliver_dispatch: unhandled message 0x5757d2e9d380 mon_map magic: 0 from mon.0 v2:10.4.3.10:3300/0
May 23 12:57:26 ceph1 bash[146371]: -10> 2024-05-23T12:57:24.746+0000 73e68f1de640 10 monclient: handle_config config(2 keys)
May 23 12:57:26 ceph1 bash[146371]: -9> 2024-05-23T12:57:24.746+0000 73e68d9db640 4 set_mon_vals callback ignored cluster_network
May 23 12:57:26 ceph1 bash[146371]: -8> 2024-05-23T12:57:24.746+0000 73e68d9db640 4 set_mon_vals callback ignored container_image
May 23 12:57:26 ceph1 bash[146371]: -7> 2024-05-23T12:57:24.746+0000 73e68d9db640 4 nvmeofgw NVMeofGwMonitorClient::init()::<lambda()> nvmeof monc config notify callback
May 23 12:57:26 ceph1 bash[146371]: -6> 2024-05-23T12:57:25.654+0000 73e68d1da640 10 monclient: tick
May 23 12:57:26 ceph1 bash[146371]: -5> 2024-05-23T12:57:25.654+0000 73e68d1da640 10 monclient: _check_auth_tickets
May 23 12:57:26 ceph1 bash[146371]: -4> 2024-05-23T12:57:26.654+0000 73e68d1da640 10 monclient: tick
May 23 12:57:26 ceph1 bash[146371]: -3> 2024-05-23T12:57:26.654+0000 73e68d1da640 10 monclient: _check_auth_tickets
May 23 12:57:26 ceph1 bash[146371]: -2> 2024-05-23T12:57:26.742+0000 73e68b1d6640 0 nvmeofgw void NVMeofGwMonitorClient::tick()
May 23 12:57:26 ceph1 bash[146371]: -1> 2024-05-23T12:57:26.742+0000 73e68b1d6640 4 nvmeofgw void NVMeofGwMonitorClient::disconnect_panic() Triggering a panic upon disconnection from the monitor, elapsed 102, configured disconnect panic duration 100
May 23 12:57:26 ceph1 bash[146371]: 0> 2024-05-23T12:57:26.746+0000 73e68b1d6640 -1 *** Caught signal (Aborted) **
May 23 12:57:26 ceph1 bash[146371]: in thread 73e68b1d6640 thread_name:safe_timer

The cluster has a cluster network configured, and I saw some messages about that option not being changeable at runtime. I did add it to ceph.conf for the target, though, so that should be fine.
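For reference, the fragment I added to ceph.conf on the gateway host looks roughly like this (the subnet shown is illustrative, not my actual value):

[global]
# illustrative subnet, not the real value
cluster_network = 10.4.4.0/24

Any help will be greatly appreciated. Thank you in advance.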

@madkiss

madkiss commented May 23, 2024

Some additional info that I forgot to include in the first message: the target version is 1.2.9, and the Ceph version is ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable).

The setup is brand new with no previous configuration in place, and there is nothing extraordinarily strange in the configuration either. Any help will be greatly appreciated. Thank you very much in advance again.
