
nvmeof Gateway fails to start up in brand new cluster #669

Open
madkiss opened this issue May 23, 2024 · 1 comment

Comments

@madkiss

madkiss commented May 23, 2024

I am trying to set up ceph-nvmeof 1.2.9 on Reef, on a fresh cluster installed a few hours ago with cephadm and deployed as per the documentation. nvmeof fails to come up; the log messages I see are:

May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.333+0000 70f605860640 0 nvmeofgw void NVMeofGwMonitorClient::tick()
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.333+0000 70f605860640 0 nvmeofgw bool get_gw_state(const char*, const std::map<std::pair<std::__cxx11::basic_string, std::__cxx11::basic_string >, std::map<std::__cxx11::basic_string, NvmeGwState> >&, const NvmeGroupKey&, const NvmeGwId&, NvmeGwState&) can not find group (nvme,None) old map map: {}
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.333+0000 70f605860640 0 nvmeofgw void NVMeofGwMonitorClient::send_beacon() sending beacon as gid 24694 availability 0 osdmap_epoch 0 gwmap_epoch 0
May 23 12:55:14 ceph2 bash[76745]: debug 2024-05-23T12:55:14.333+0000 785f205e5700 0 can't decode unknown message type 2049 MSG_AUTH=17
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.333+0000 70f609868640 0 client.0 ms_handle_reset on v2:10.4.3.11:3300/0
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.333+0000 70f609868640 0 client.0 ms_handle_reset on v2:10.4.3.11:3300/0
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.337+0000 70f609868640 0 nvmeofgw virtual bool NVMeofGwMonitorClient::ms_dispatch2(ceph::ref_t&) got map type 4
May 23 12:55:14 ceph2 bash[119529]: 2024-05-23T12:55:14.337+0000 70f609868640 0 ms_deliver_dispatch: unhandled message 0x5e584cc24820 mon_map magic: 0 from mon.1 v2:10.4.3.11:3300/0
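For completeness, the gateway was deployed with a standard cephadm service spec, roughly as below. The pool name matches the "(nvme,None)" group key in the log above; the hostnames and placement are approximate, from memory:

service_type: nvmeof
service_id: nvme
# pool name as it appears in the log's group key
pool: nvme
placement:
  hosts:
    - ceph1
    - ceph2

This was applied with ceph orch apply -i nvmeof.yaml.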

Another excerpt, captured shortly before the daemon aborts, is:

May 23 12:57:26 ceph1 bash[146371]: 1: [v2:10.4.3.11:3300/0,v1:10.4.3.11:6789/0] mon.ceph2
May 23 12:57:26 ceph1 bash[146371]: 2: [v2:10.4.3.12:3300/0,v1:10.4.3.12:6789/0] mon.ceph3
May 23 12:57:26 ceph1 bash[146371]: -12> 2024-05-23T12:57:24.746+0000 73e68f1de640 0 nvmeofgw virtual bool NVMeofGwMonitorClient::ms_dispatch2(ceph::ref_t&) got map type 4
May 23 12:57:26 ceph1 bash[146371]: -11> 2024-05-23T12:57:24.746+0000 73e68f1de640 0 ms_deliver_dispatch: unhandled message 0x5757d2e9d380 mon_map magic: 0 from mon.0 v2:10.4.3.10:3300/0
May 23 12:57:26 ceph1 bash[146371]: -10> 2024-05-23T12:57:24.746+0000 73e68f1de640 10 monclient: handle_config config(2 keys)
May 23 12:57:26 ceph1 bash[146371]: -9> 2024-05-23T12:57:24.746+0000 73e68d9db640 4 set_mon_vals callback ignored cluster_network
May 23 12:57:26 ceph1 bash[146371]: -8> 2024-05-23T12:57:24.746+0000 73e68d9db640 4 set_mon_vals callback ignored container_image
May 23 12:57:26 ceph1 bash[146371]: -7> 2024-05-23T12:57:24.746+0000 73e68d9db640 4 nvmeofgw NVMeofGwMonitorClient::init()::<lambda()> nvmeof monc config notify callback
May 23 12:57:26 ceph1 bash[146371]: -6> 2024-05-23T12:57:25.654+0000 73e68d1da640 10 monclient: tick
May 23 12:57:26 ceph1 bash[146371]: -5> 2024-05-23T12:57:25.654+0000 73e68d1da640 10 monclient: _check_auth_tickets
May 23 12:57:26 ceph1 bash[146371]: -4> 2024-05-23T12:57:26.654+0000 73e68d1da640 10 monclient: tick
May 23 12:57:26 ceph1 bash[146371]: -3> 2024-05-23T12:57:26.654+0000 73e68d1da640 10 monclient: _check_auth_tickets
May 23 12:57:26 ceph1 bash[146371]: -2> 2024-05-23T12:57:26.742+0000 73e68b1d6640 0 nvmeofgw void NVMeofGwMonitorClient::tick()
May 23 12:57:26 ceph1 bash[146371]: -1> 2024-05-23T12:57:26.742+0000 73e68b1d6640 4 nvmeofgw void NVMeofGwMonitorClient::disconnect_panic() Triggering a panic upon disconnection from the monitor, elapsed 102, configured disconnect panic duration 100
May 23 12:57:26 ceph1 bash[146371]: 0> 2024-05-23T12:57:26.746+0000 73e68b1d6640 -1 *** Caught signal (Aborted) **
May 23 12:57:26 ceph1 bash[146371]: in thread 73e68b1d6640 thread_name:safe_timer

The cluster has a cluster network configured, and I saw some messages about that option not being changeable at runtime. I did add it to ceph.conf for the target, though, so that should be fine.
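For reference, the fragment I added to ceph.conf on the gateway host looks roughly like this (the subnet shown is illustrative, not my actual value):

[global]
# illustrative subnet, not the real value
cluster_network = 10.4.4.0/24

Any help will be greatly appreciated. Thank you in advance.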

@madkiss

madkiss commented May 23, 2024

Some additional info that I forgot to include in the first message: the target version is 1.2.9, and the Ceph version is ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable).

The setup is brand new with no previous configuration in place, and there is nothing extraordinarily strange in the configuration either. Any help will be greatly appreciated. Thank you very much in advance again.
