deploy fails when public_network contains more than one network #131

Closed
mdsteveb opened this issue Jun 30, 2014 · 30 comments

@mdsteveb

I'm new to Ruby, Chef, and this cookbook, so it's possible my analysis is flawed and I'm unable to suggest a patch, but please bear with me.

I'm having trouble deploying a configuration that lists multiple public networks. The problem seems to be find_node_ip_in_network, which does not check for this possibility and instead passes the full string directly to IPAddr.new, which fails.

To be more specific, in my environment I set a default attribute:

"public network": "192.168.10.0/24,192.168.20.0/24"

i.e., listing two subnets. Ceph supports this, as described here: http://ceph.com/docs/master/rados/configuration/network-config-ref/#network-config-settings

In any case, it doesn't look like either find_node_ip_in_network or its caller mon_addresses makes any effort to handle multiple networks. The result is that when chef-client runs, it dies with a backtrace:

ArgumentError: invalid address
/opt/chef/embedded/lib/ruby/1.9.1/ipaddr.rb:544:in `in6_addr'
/opt/chef/embedded/lib/ruby/1.9.1/ipaddr.rb:481:in `initialize'
/opt/chef/embedded/lib/ruby/1.9.1/ipaddr.rb:401:in `new'
/opt/chef/embedded/lib/ruby/1.9.1/ipaddr.rb:401:in `mask!'
/opt/chef/embedded/lib/ruby/1.9.1/ipaddr.rb:488:in `initialize'
/var/chef/cache/cookbooks/ceph/libraries/default.rb:42:in `new'
/var/chef/cache/cookbooks/ceph/libraries/default.rb:42:in `find_node_ip_in_network'
/var/chef/cache/cookbooks/ceph/libraries/default.rb:87:in `block in mon_addresses'
/var/chef/cache/cookbooks/ceph/libraries/default.rb:87:in `map'
/var/chef/cache/cookbooks/ceph/libraries/default.rb:87:in `mon_addresses'
/var/chef/cache/cookbooks/ceph/recipes/mon.rb:108:in `from_file'
....
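For anyone following along, here is a minimal sketch of how the comma-separated value could be handled before it reaches IPAddr.new. The helper name matches the cookbook, but the signature and the node_ips argument are illustrative assumptions, not the cookbook's real API:

```ruby
require 'ipaddr'

# Hedged sketch: split the "public network" value on commas and test the
# node's addresses against each subnet in turn. node_ips stands in for
# however the cookbook inspects the node's interfaces.
def find_node_ip_in_network(network_spec, node_ips)
  network_spec.split(',').map(&:strip).each do |net|
    subnet = IPAddr.new(net)
    hit = node_ips.find { |ip| subnet.include?(IPAddr.new(ip)) }
    return hit if hit
  end
  nil
end

find_node_ip_in_network('192.168.10.0/24,192.168.20.0/24',
                        ['10.0.0.5', '192.168.20.7'])
# => "192.168.20.7"
```

Splitting first means each subnet string is a valid IPAddr argument on its own, and the first subnet containing one of the node's addresses wins.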
@hufman
Contributor

hufman commented Jul 1, 2014

Good eye! I didn't even realize that ceph.conf could support multiple networks. I'll look into it and figure out the best way to handle it in the code.
Thanks for catching this!

@mdsteveb
Author

mdsteveb commented Jul 9, 2014

Even after applying #132, I'm having problems getting quorum. What seems to be happening is that ceph.conf is being generated with an incomplete set of mon hosts listed. One of the nodes (the first listed in "mon initial members", the lowest in ASCII sort order, and also by coincidence the oddball node on the different subnet) gets only itself listed in "mon host". The other two nodes get that first IP listed and then their own, but none of them have the full set of 3 listed. What I'm getting at is that maybe my proposed fix wasn't quite right, and find_node_ip_in_network does somehow need to deal with multiples? It seems like it's not doing the right thing when the list of mon addresses is built.

@hufman
Contributor

hufman commented Jul 10, 2014

Hmmm can you provide some more details?

  • What are the chef environment setups? The ceph cookbook searches for mons within the current environment, by default. A recent change lets you set node['ceph']['search_environment'] to false to search all environments, or to any other string to search a specific environment.
  • What are the nodes' IPs and the public network?

@mdsteveb
Author

The systems are set up just as described in the ceph.com guide "Deploying Ceph with Chef". They use a chef environment named Ceph.

bull            192.168.159.152/24
superman        192.168.70.100/24
zod             192.168.70.103/24

(Those are not the real IP addresses.)

Here's my Ceph environment:

$ knife environment show Ceph -F json
{
  "name": "Ceph",
  "description": "",
  "cookbook_versions": {
  },
  "json_class": "Chef::Environment",
  "chef_type": "environment",
  "default_attributes": {
    "ceph": {
      "monitor-secret": "munged-for-security",
      "config": {
        "fsid": "munged-for-security",
        "mon_initial_members": "bull.example.com,superman.example.com,zod.example.com",
        "global": {
          "public network": "192.168.70.0/24,192.168.159.0/24",
          "cluster network": "192.168.74.0/24"
        },
        "mon": {
          "debug mon": "5"
        },
        "osd": {
          "osd journal size": "8192"
        }
      }
    }
  },
  "override_attributes": {
  }
}

When run, this is the relevant part of the generated ceph.conf on bull:

  mon initial members = bull.example.com,superman.example.com,zod.example.com
  mon host = 192.168.159.152:6789
  cluster network = 192.168.74.0/24
  public network = 192.168.70.0/24,192.168.159.0/24

This is from superman:

  mon initial members = bull.example.com,superman.example.com,zod.example.com
  mon host = 192.168.159.152:6789, 192.168.70.100:6789
  cluster network = 192.168.74.0/24
  public network = 192.168.70.0/24,192.168.159.0/24

This is from zod:

  mon initial members = bull.example.com,superman.example.com,zod.example.com
  mon host = 192.168.159.152:6789, 192.168.70.103:6789
  cluster network = 192.168.74.0/24
  public network = 192.168.70.0/24,192.168.159.0/24

So, I'm fairly new to doing anything substantial with Ceph and not familiar with how the dynamic deployments work, but isn't mon_host supposed to be a list of all three of the mons from mon_initial_members?

@mdsteveb
Author

Here's mon_status output from bull:

{ "name": "bull",
  "rank": -1,
  "state": "probing",
  "election_epoch": 0,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [
        "0.0.0.0:6789\/0",
        "192.168.70.100:6789\/0",
        "192.168.70.103:6789\/0",
        "192.168.159.152:6789\/0"],
  "sync_provider": [],
  "monmap": { "epoch": 0,
      "fsid": "munged-for-security",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "bull.example.com",
              "addr": "0.0.0.0:0\/1"},
            { "rank": 1,
              "name": "superman.example.com",
              "addr": "0.0.0.0:0\/2"},
            { "rank": 2,
              "name": "zod.example.com",
              "addr": "0.0.0.0:0\/3"}]}}

This is from zod:

{ "name": "zod",
  "rank": -1,
  "state": "probing",
  "election_epoch": 0,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [
        "0.0.0.0:6789\/0",
        "192.168.70.103:6789\/0",
        "192.168.159.152:6789\/0"],
  "sync_provider": [],
  "monmap": { "epoch": 0,
      "fsid": "munged-for-security",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "bull.example.com",
              "addr": "0.0.0.0:0\/1"},
            { "rank": 1,
              "name": "superman.example.com",
              "addr": "0.0.0.0:0\/2"},
            { "rank": 2,
              "name": "zod.example.com",
              "addr": "0.0.0.0:0\/3"}]}}

This is from superman:

{ "name": "superman",
  "rank": -1,
  "state": "probing",
  "election_epoch": 0,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [
        "0.0.0.0:6789\/0",
        "192.168.70.100:6789\/0",
        "192.168.159.152:6789\/0"],
  "sync_provider": [],
  "monmap": { "epoch": 0,
      "fsid": "munged-for-security",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "bull.example.com",
              "addr": "0.0.0.0:0\/1"},
            { "rank": 1,
              "name": "superman.example.com",
              "addr": "0.0.0.0:0\/2"},
            { "rank": 2,
              "name": "zod.example.com",
              "addr": "0.0.0.0:0\/3"}]}}

@hufman
Contributor

hufman commented Jul 10, 2014

The way this cookbook works, mon_initial_members is optional. When a new mon is provisioned, the cookbook does a search for any other mons and runs "add_bootstrap_peer_hint" on the new mon, providing the link to the existing cluster.

In the log output, you should see this happen: several execute[peer 192.168.35.XX:6789] action run lines go through, which then say execute ceph --admin-daemon '/var/run/ceph/ceph-mon.cephmon2.asok' add_bootstrap_peer_hint 192.168.35.XX:6789. Each of these lines comes directly from the library's mon_addresses function. This function calls mon_nodes() to do a search for any mon nodes, and adds itself to the list. It then converts the list of mon nodes to a list of IP addresses, removes any nils (resulting from nodes that don't have any IPs in the public_network), and then removes duplicates.
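The map/compact/uniq flow described above can be sketched in plain Ruby; the per-node address lists stand in for the Chef search results, and the names are illustrative:

```ruby
require 'ipaddr'

# Hedged sketch of the mon_addresses flow: map each mon node to its IP
# inside the public network, drop nils (nodes with no matching IP), and
# de-duplicate. Each element of node_ip_lists is one node's addresses.
def mon_addresses(node_ip_lists, public_network)
  subnet = IPAddr.new(public_network)
  node_ip_lists
    .map { |ips| ips.find { |ip| subnet.include?(IPAddr.new(ip)) } }
    .compact   # nodes without an IP in the public network
    .uniq      # remove duplicates
end

mon_addresses([['192.168.70.100'], ['10.0.0.9'], ['192.168.70.100']],
              '192.168.70.0/24')
# => ["192.168.70.100"]
```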

Are all of the nodes in the same Chef environment? Do you see all 3 nodes when you do a 'knife search node "ceph_is_mon:true AND chef_environment:Chef"' ? What about 'knife search node "(ceph_is_mon:true AND chef_environment:Chef) AND (ceph_bootstrap_osd_key:*)"' ? Do you see 3 sets of add_bootstrap_peer_hint commands on all of the chef runs? The mon_addresses function directly produces the mons config option.

Also, a small trick in the mon_addresses function: if you are running Chef on a node that already has a running ceph-mon, the cookbook will query it for the monmap's list of IPs. Try stopping the ceph-mon on bull, and try running Chef with just a run list of ceph::conf. This will merely create the ceph.conf file by using Chef searches, instead of using the local mon's monmap.

@mdsteveb
Author

Hm... something's amok here:

[2014-07-09T16:33:21-04:00] ERROR: execute[peer 192.168.159.152:6789] (ceph::mon line 109) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '22'
---- Begin output of ceph --admin-daemon '/var/run/ceph/ceph-mon.bull.asok' add_bootstrap_peer_hint 192.168.159.152:6789 ----
STDOUT: 
STDERR: no valid command found; 10 closest matches:
config show
help
log dump
get_command_descriptions
git_version
config set <var> <val> [<val>...]
version
2
config get <var>
0
admin_socket: invalid command
---- End output of ceph --admin-daemon '/var/run/ceph/ceph-mon.bull.asok' add_bootstrap_peer_hint 192.168.159.152:6789 ----
Ran ceph --admin-daemon '/var/run/ceph/ceph-mon.bull.asok' add_bootstrap_peer_hint 192.168.159.152:6789 returned 22; ignore_failure is set, continuing

When I run the same command by hand, it succeeds?

add_bootstrap_peer_hint is only trying to run on bull, it never appears in the logs of the other nodes.

Confirmed all Ceph nodes are using the Ceph environment. Your first search only returns bull. The second search (with Chef->Ceph typo fixed) doesn't return anything.

Thanks for taking the time to walk me through this.

@hufman
Contributor

hufman commented Jul 10, 2014

This cookbook is able to spin up a fresh cluster, and some improvements a few months ago made it much more reliable. The first mon node will do a search, not find any nodes, and create a cluster. The second node will do a search, find the first node, and should connect to it.

As for your strange add_bootstrap_peer_hint error: That is very strange indeed. What happens if you ceph --admin-daemon '/var/run/ceph/ceph-mon.bull.asok' get_command_descriptions? Is there a command add_bootstrap_peer_hint in there?

Since your knife search didn't find the two nodes, this gives me some clues about what is going on. The two-node cluster is successfully talking to each other, and so when Chef calls mon_addresses it uses the monmap to generate the ceph.conf. Is the ceph::mon recipe running on those nodes? This is what sets the ceph['is_mon'] attribute, which is what is searched for by any non-mon Ceph member. After you run ceph::mon on the two nodes, your knife search should find them, and then your third node, by running just ceph::conf without a running monitor, should generate a more-correct config file. Then, remove the /var/lib/ceph/mon/ceph-bull directory, so that the single-node cluster is effectively destroyed and Chef can join it to the existing two-node cluster.

Actually, scratch some of that: Looking at your mon_status from before, I think your mon_initial_members is actually confusing things too. This is what is generating the spurious mons from that output, mons that don't have an IP address.

This is a completely fresh setup, right? No data? If so, I'd recommend stopping all the ceph-mon daemons, removing all of the /var/lib/ceph/mon directories on the systems, removing mon_initial_members from the environment, and trying again. Run ceph::mon on a first system, and it should create a single-mon cluster, with all of the PGs stuck inactive because there's no storage, that's cool. Doing that above knife search should show that single node. Then, run ceph::mon on a second system. It should successfully run a chef search to find the other node and join to it. The ceph -s command should show a two-mon cluster, with data still stuck and so on. Then the third node should work just as well.

@mdsteveb
Author

I finally got a chance to try this. I removed mon_initial_members, removed /var/lib/ceph/mon/ceph-* on the nodes, and hand-ran chef-client on the 3 monitors in succession, and this seems to have got everything going! Thanks!

mon_initial_members should probably be removed from the Chef deployment guide on ceph.com then.

My next issue is that ceph-disk isn't taking /dev/mapper/36* disk device names:

ceph-disk: Error: not a disk or partition: /dev/mapper/3600605b008a76aa01b03af8041d45597

But that's something else so this issue can be closed now.

@mdsteveb
Author

Actually, one more question. So again, I'm a Chef newbie. But when I added the roles to the nodes I used knife node edit, and added role[ceph-mon] to each node's run list. However, now knife node show is weird: for bull it shows ceph-mon in Roles, but it doesn't show that role for any of the other mons even though they're in the run list, chef-client has been run, and all 3 show up in the cluster. knife search node "ceph_is_mon:true" only returns bull.

Node Name:   bull.example.com
Environment: Ceph
FQDN:        bull.example.com
IP:          192.168.159.152
Run List:    role[standalone], role[ceph-mon]
Roles:       standalone, base, ceph-mon
Recipes:     gelf_handler, base, chef-client, chef-mail-handler, ceph::repo, ceph::mon, gelf_handler::default, chef_handler::default, base::default, base::users, base::rsyslog, chef-client::default, chef-client::service, chef-client::init_service, chef-mail-handler::default, ceph::apt, apt::default, ceph::_common, ceph::_common_install, ceph::mon_install, ceph::conf
Platform:    ubuntu 14.04
Tags:
Node Name:   zod.example.com
Environment: Ceph
FQDN:        zod.example.com
IP:          192.168.70.103
Run List:    role[standalone], role[ceph-mon], role[ceph-osd]
Roles:       standalone, base
Recipes:     gelf_handler, base, chef-client, chef-mail-handler, gelf_handler::default, chef_handler::default, base::default, base::users, base::rsyslog, chef-client::default, chef-client::service, chef-client::init_service, chef-mail-handler::default
Platform:    ubuntu 14.04
Tags:
Node Name:   superman.example.com
Environment: Ceph
FQDN:        superman.example.com
IP:          192.168.70.100
Run List:    role[standalone], role[ceph-mon], role[ceph-osd]
Roles:       standalone, base
Recipes:     gelf_handler, base, chef-client, chef-mail-handler, gelf_handler::default, chef_handler::default, base::default, base::users, base::rsyslog, chef-client::default, chef-client::service, chef-client::init_service, chef-mail-handler::default
Platform:    ubuntu 14.04
Tags:

So I'm confused. It seems like it created/added the mons properly, so why isn't chef showing them?

@hufman
Contributor

hufman commented Jul 15, 2014

I'm glad to hear that! I think the mon_initial_members caveat is due to this cookbook's usage of add_bootstrap_peer_hint; the option is mainly for hand-rolled configurations.

About your /dev/mapper/36* named disks... those look like SAN IDs; are you sure it shouldn't be something like /dev/disk/by-id/36*? However, I have not tried symlinked devices, so this would be new for me too.

I do know that the Roles and Recipes lists are updated by the client, not on the server. So, if you have run just the role[standalone], it would set the Roles and Recipes as zod and superman show. And then, when you set their run list to include role[ceph-mon] and role[ceph-osd], it will run most of the way and, as you said above, crash in the role[ceph-osd] section. This will cause the node to not be saved back to the server, including the new Roles and Recipes. This would also prevent the nodes from showing up in the ceph_is_mon search, because that attribute isn't being saved up to the server.
Run chef-client on all the things, with just the role[standalone] and role[ceph-mon] run list. Then, the search will be good and happy. Then, just add the role[ceph-osd], without clearing out the ceph-mon portion.

An alternative explanation: somewhere inside your role[standalone], you are doing a node.save(). This saves the node before the ceph::mon recipe has run, so it doesn't have the ceph_is_mon attribute set, and then when ceph-osd crashes it can't save the completed node with the ceph_is_mon attribute.

@mdsteveb
Author

Ah, ok, that makes sense. Both zod and superman are also osd nodes and until I resolve the disk naming, the osd setup is failing, so that would explain it. I didn't realize that something would only show the Role when it had successfully completed.

Those /dev/mapper names aren't symlinks, they're block devices (created I think by multipathd, or maybe because of the way I have the LSI HBA configured). I was trying to use those names instead of /dev/disk/* or /dev/sd* because it was more obvious looking at them which devices were my SAS OSD devices and which were the internal SATA OS disks. I can use other names easily enough, but it surprised me since they're actual device nodes, not symlinks.

# ls -l /dev/mapper/36* | head -10
brw-rw---- 1 root disk 252,  8 Jul  9 16:32 /dev/mapper/3600605b008a76aa01b03af8041d45597
brw-rw---- 1 root disk 252,  5 Jul  9 16:32 /dev/mapper/3600605b008a76aa01b03af8041d57535
brw-rw---- 1 root disk 252, 17 Jul  9 16:32 /dev/mapper/3600605b008a76aa01b03af8041d68f3d
brw-rw---- 1 root disk 252, 21 Jul  9 16:32 /dev/mapper/3600605b008a76aa01b03af8041d7abc9
brw-rw---- 1 root disk 252, 24 Jul  9 16:32 /dev/mapper/3600605b008a76aa01b03af8041d8ceec
brw-rw---- 1 root disk 252,  4 Jul  9 16:32 /dev/mapper/3600605b008a76aa01b03af8041d9f478
brw-rw---- 1 root disk 252, 20 Jul  9 16:32 /dev/mapper/3600605b008a76aa01b03af8041db1c7f
brw-rw---- 1 root disk 252, 13 Jul  9 16:32 /dev/mapper/3600605b008a76aa01b03af8041dc4b3c
brw-rw---- 1 root disk 252, 29 Jul  9 16:32 /dev/mapper/3600605b008a76aa01b03af8141dd9633
brw-rw---- 1 root disk 252, 28 Jul  9 16:32 /dev/mapper/3600605b008a76aa01b03af8141df117f

Will try again with different device names and let you know what happens.

@mdsteveb
Author

Okay. Changed to /dev/disk/by-id/ which was recognized by ceph-disk. Then I ran into "device busy" problems. I ended up removing multipath-tools (don't need them for now, but could have reconfigured it I assume) and rebooting, at which point I got OSDs building on zod and superman (the two mon nodes which are also osd hosts). knife is showing the output I expect - yay!

Now lex and spiderman are the problem... chef got the disks prepared okay, and they show up as prepared in ceph-disk list output. However chef is unable to start the osds:

[2014-07-16T14:40:20-04:00] ERROR: service[ceph_osd] (ceph::osd line 136) had an error: Chef::Exceptions::Exec: /sbin/start ceph-osd-all-starter returned 1, expected 0
[2014-07-16T14:40:20-04:00] ERROR: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
[2014-07-16T14:40:20-04:00] ERROR: Sleeping for 1800 seconds before trying again

I can't find any other errors in the logs indicating why it's not starting. ceph-osd.0.log shows a lot of lines like this (for all the disks):

2014-07-16 14:35:11.256693 7f90dcb88800  0 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74), process ceph-osd, pid 4174
2014-07-16 14:35:11.258889 7f90dcb88800  1 journal _open /dev/sdab2 fd 4: 8588886016 bytes, block size 4096 bytes, directio = 0, aio = 0

...which doesn't really tell me anything. It's not mounting the filesystems or anything. It feels like it's unable to find the cluster or something, but I'm not sure. The generated ceph.conf does have the mon hosts listed correctly and the cluster network specified correctly, and they can talk to each other just fine.

I do note that ceph -s doesn't work on those hosts and they don't have the admin keyring the way the other (mon) hosts do. Maybe the cookbook needs to install that on the osds too?

@mdsteveb
Author

(Copying the keyring file over solved the ceph -s issue but didn't help with starting the osds.)

@mdsteveb
Author

Ah... running ceph-disk activate-all gives this for every disk on the OSD-only nodes:

root@spiderman:/etc/init# ceph-disk activate-all
INFO:ceph-disk:Activating /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.c06002d6-2f3e-4b67-b6cf-7bb9d04a4fae
INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.c06002d6-2f3e-4b67-b6cf-7bb9d04a4fae
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
INFO:ceph-disk:Running command: /bin/mount -t xfs -o rw,noatime,nobarrier,logbsize=256k,logbufs=8,inode64,allocsize=4M -- /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.c06002d6-2f3e-4b67-b6cf-7bb9d04a4fae /var/lib/ceph/tmp/mnt.gXtTiK
INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd create --concise c06002d6-2f3e-4b67-b6cf-7bb9d04a4fae
2014-07-17 13:31:24.302731 7fbace96e700 -1 monclient(hunting): authenticate NOTE: no keyring found; disabled cephx authentication
2014-07-17 13:31:24.302760 7fbace96e700  0 librados: client.bootstrap-osd authentication error (95) Operation not supported
Error connecting to cluster: Error
ERROR:ceph-disk:Failed to activate
INFO:ceph-disk:Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.gXtTiK
ceph-disk: Error: ceph osd create failed: Command '/usr/bin/ceph' returned non-zero exit status 1: 

@hufman
Contributor

hufman commented Jul 18, 2014

About the disks: You can run the ceph-osd recipe and give it temporary names for the initial setup, and then after that the ceph-osd recipe won't try to initialize any other drives, and in fact won't even check if the original names are still there. So, you can give it the temporary names, and then Chef will think the server is happy and done. This lets you stop ceph-osd and enable multipathd to get the drives happy and secure, and then run ceph-disk-activate to start them up. You might need to tweak the /lib/udev/rules.d/95-ceph-osd.rules file to make sure it ignores the un-multipathed drive during boot and hotplug, but maybe the udev rule order will ensure that the multipath device gets created first, so that the underlying device is locked and ceph-osd will ignore it for being busy. Anyways!

ceph -s requires the admin key, yes. However, the osd setup wants to use a special "osd bootstrap" key that is only allowed to create OSDs. The beginning of the osd recipe should have created a /var/lib/ceph/bootstrap-osd/ceph.keyring file using the osd_secret that it obtains from Chef stored attributes. It picks a random mon from Chef and looks for a node['ceph']['bootstrap_osd_key']. The ceph-mon recipe, on all of your mons, should have run ceph auth get-key client.bootstrap-osd and saved the output as its bootstrap_osd_key. If you run knife search node ceph_bootstrap_osd_key:* -a ceph.bootstrap_osd_key, you should see your mons show up along with their saved bootstrap_osd_key. If not, try running ceph auth get-key client.bootstrap-osd, which should produce something. If that doesn't, that's a different problem that we should look at.

So! You should check on your new OSD server to see if there is a /var/lib/ceph/bootstrap-osd/ceph.keyring file. If not, you should run knife search node ceph_bootstrap_osd_key:* -a ceph.bootstrap_osd_key to see if Chef knows about the bootstrap_osd_key. If not, you should run ceph auth get-key client.bootstrap-osd to make sure Ceph knows about a bootstrap-osd key. The strange part is that, the code that the osd recipe uses to find the mons requires that the mons returned have valid bootstrap_osd_keys, and the same search is used to populate the ceph.conf file with mons.
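To make the last step of that chain concrete, a hedged sketch of rendering the bootstrap keyring from a mon's saved attribute; the function name and key value are illustrative, and the file format is just the standard Ceph keyring layout:

```ruby
# Hedged sketch: given the key string a mon saved into its node
# attributes, produce the keyring content the osd recipe would write
# out to /var/lib/ceph/bootstrap-osd/ceph.keyring.
def bootstrap_keyring(key)
  "[client.bootstrap-osd]\n\tkey = #{key}\n"
end

content = bootstrap_keyring('AQexamplekey==')
# content is:
#   [client.bootstrap-osd]
#       key = AQexamplekey==
```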

Let me know how it goes!

@mdsteveb
Author

Ok. All nodes return the bootstrap-osd key from ceph auth get-key. The knife search returns nothing. There is no bootstrap keyring file on the new osd servers, but it is present on the servers that are also mons.

So, it looks like somehow chef doesn't know the bootstrap key but it did manage to install it on the mons somehow? Not on the osd-only servers though.

$ knife search node 'ceph_bootstrap_osd_key:*' -a ceph.bootstrap_osd_key
0 items found
root@lex:/var/lib/ceph/bootstrap-osd# ls
root@lex:/var/lib/ceph/bootstrap-osd# ceph auth get-key client.bootstrap-osd
BQCCn8VTEBRSIBAApJ72w8cvgUvPfxJBASvABw==root@lex:/var/lib/ceph/bootstrap-osd#
root@zod:/var/lib/ceph/bootstrap-osd# ls
ceph.keyring
root@zod:/var/lib/ceph/bootstrap-osd# cat ceph.keyring
[client.bootstrap-osd]
    key = BQCCn8VTEBRSIBAApJ72w8cvgUvPfxJBASvABw==
root@zod:/var/lib/ceph/bootstrap-osd# ceph auth get-key client.bootstrap-osd
BQCCn8VTEBRSIBAApJ72w8cvgUvPfxJBASvABw==root@zod:/var/lib/ceph/bootstrap-osd#

@guilhem
Contributor

guilhem commented Jul 18, 2014

@hufman sorry I let you manage this problem. TL;DR for me :p

@mdsteveb
Author

...and I'm sorry this went way beyond the initial issue! But I do appreciate the help!

@hufman
Contributor

hufman commented Jul 18, 2014

Hmmm can I have you try something? What happens if you change recipes/mon.rb:125 to use node.set['ceph']['bootstrap_osd_key'] instead of node.override['ceph']['bootstrap_osd_key']? Looking at the docs, I don't think node.override actually saves to the Chef server; node.set will set a normal node attribute, which does get saved.
Another small thing: make sure that the Chef run says something about a ruby_block[get osd-bootstrap keyring], because that's the section that actually runs that code.
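The distinction can be illustrated without Chef: normal attributes are part of the node object that node.save sends back to the server, while override attributes are recomputed from cookbooks and roles on every run. A plain-hash simulation, not the real Chef node API:

```ruby
# With node.override only: the value lives in attributes that Chef
# rebuilds each run, and is NOT part of what node.save persists.
override = { 'bootstrap_osd_key' => 'AQexample==' }
normal   = {}                     # nothing set via node.set
saved    = normal                 # roughly what node.save sends up
saved.key?('bootstrap_osd_key')   # => false: a search can't find it

# With node.set: the normal attribute is persisted and searchable.
normal['bootstrap_osd_key'] = 'AQexample=='
saved.key?('bootstrap_osd_key')   # => true
```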

@mdsteveb
Author

To answer your second question first, I don't see that ruby block running (on lex) based on chef-client output. The closest I get is this:

...
[2014-07-19T10:11:20-04:00] INFO: Processing package[gdisk] action upgrade (ceph::osd line 38)
[2014-07-19T10:11:20-04:00] INFO: Processing package[cryptsetup] action upgrade (ceph::osd line 42)
[2014-07-19T10:11:20-04:00] INFO: Processing directory[/var/lib/ceph/bootstrap-osd] action create (ceph::osd line 49)
[2014-07-19T10:11:20-04:00] INFO: Processing execute[format bootstrap-osd as keyring] action run (ceph::osd line 58)
[2014-07-19T10:11:20-04:00] INFO: Processing service[ceph_osd] action enable (ceph::osd line 136)
[2014-07-19T10:11:20-04:00] INFO: Processing service[ceph_osd] action start (ceph::osd line 136)

================================================================================
Error executing action `start` on resource 'service[ceph_osd]'
================================================================================

Chef::Exceptions::Exec
----------------------
/sbin/start ceph-osd-all-starter returned 1, expected 0

zod shows the same thing except the final start succeeds. Those are the only references to anything bootstrap on either machine. Do I need to redeploy a mon from scratch for a valid test?

@mdsteveb
Author

That code update doesn't seem to have caused any change, even after I stopped the mon on zod, deleted /var/lib/ceph/mon/ceph-zod, and ran chef-client on it hoping that it would update the key. Tried again after also deleting bootstrap-osd/ceph.key, still no luck. Not sure what else I need to do to get that block to run?

@hufman
Contributor

hufman commented Jul 21, 2014

Hmmm that's very odd... that ruby_block should be running on all the mon nodes... Did you leave node['ceph']['encrypted_data_bags'] at the default of false? If you set this to true, then you have to manually store your OSD bootstrap keys in Chef encrypted data bags. Do you have node['ceph']['config']['global']['auth cluster required'] set to anything other than cephx? If this key is present and set to anything but 'cephx', then the cookbook will not attempt any cephx-related features, including the above bootstrap key things. You should be able to leave these settings at their defaults. I haven't actually tested the encrypted_data_bags option, so I can't offer comprehensive guidance on setting up that route!

Bull, being only a ceph_mon node, should definitely be running that ruby_block every time chef-client runs. You saw it doing the add_bootstrap_peer_hint, right? It should immediately thereafter run the ruby_block[get osd-bootstrap keyring] resource. The not_if clause of that resource means that it won't actually run the command if the node already has the bootstrap_osd_key, but I'm pretty sure it would still say that it is parsing the command and that it's up to date. And, if it thinks that the node already has that attribute, a search should find it! If you do a knife node edit bull.example.com, and then go down to the ceph attributes, is there a bootstrap_osd_key in there? And if so, perhaps the search syntax is broken...

Alternatively, to get your cluster up and running... you could just put the osd bootstrap key in place manually. The osd cookbook will use the keyring stored in /var/lib/ceph/bootstrap-osd/ceph.keyring if it exists, so you can copy the appropriate keyring there and the osd cookbook will go on to provision the drives as usual.

@mdsteveb
Author

knife node edit bull.example.com shows only the name, the environment, and the run list:

{
  "name": "bull.example.com",
  "chef_environment": "Ceph",
  "normal": {
    "tags": [

    ]
  },
  "run_list": [
    "role[standalone]",
    "role[ceph-mon]"
  ]
}

For what it's worth, I don't see a bootstrap_osd_key defined in zod's node either, but those osds came up okay.

# ceph --admin-daemon /var/run/ceph/ceph-mon.bull.asok config get auth_cluster_required
{ "auth_cluster_required": "cephx"}

(I couldn't find this value defined in chef anywhere.)

# knife node show bull.el.nist.gov --long | grep bags
  encrypted_data_bags: false

I'm looking at the same if statement in the code that you're looking at and I have no idea why it's not running either... I'm going to comment out the if statement and see what happens...

@mdsteveb
Author

Commenting out the if made it write the keyring file, and the knife search for the bootstrap key works now (showing all 3 mons), but the OSDs still don't start.

ceph-disk activate-all still won't run on the OSD-only nodes:

root@lex:/var/log/ceph# ceph-disk activate-all
INFO:ceph-disk:Activating /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.e708eaae-4bff-4e44-a9d2-0c3cfc10270c
INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.e708eaae-4bff-4e44-a9d2-0c3cfc10270c
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
INFO:ceph-disk:Running command: /bin/mount -t xfs -o rw,noatime,nobarrier,logbsize=256k,logbufs=8,inode64,allocsize=4M -- /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.e708eaae-4bff-4e44-a9d2-0c3cfc10270c /var/lib/ceph/tmp/mnt.j_MsAj
INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
ERROR:ceph-disk:Failed to activate
INFO:ceph-disk:Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.j_MsAj
ceph-disk: Error: No cluster conf found in /etc/ceph with fsid c9f92f84-c94a-4d50-99b4-e5ca3180624b

(repeated for all disks)

@hufman
Contributor

hufman commented Jul 22, 2014

Well huh, that's tricky. I manually set the fsid in my environment, so it gets added to every node. We are planning to store the automatically-generated fsid into node attributes, similar to the mon and osd secrets, but haven't gotten to it yet.
On your ceph mon nodes, run "ceph -s" to find the current fsid, and add it as node['ceph']['config']['fsid'].
Then, just for fun, go through your ceph mons' ceph.conf files and make sure their fsids all match, and make sure that they don't have any stray fsids in their node attributes. Currently, the cookbook automatically generates the fsid and saves it in each node, but it doesn't try to synchronize it across the nodes right now.

Sorry I forgot about that part :( The instructions should've been: after setting up the first ceph-mon node, save the fsid into the environment. You don't need to redeploy anything, just add the fsid to the environment and maybe clean out any saved node attributes about the fsid.
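Concretely, the environment addition would look something like this (a sketch of a knife JSON environment; the fsid shown is the one from the ceph-disk error earlier in this thread, so substitute your cluster's actual value from `ceph -s`):

```json
{
  "name": "Ceph",
  "default_attributes": {
    "ceph": {
      "config": {
        "fsid": "c9f92f84-c94a-4d50-99b4-e5ca3180624b"
      }
    }
  }
}
```

With the fsid pinned at the environment level, every node in that environment renders the same value into ceph.conf instead of each mon saving its own.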

@mdsteveb
Author

I was just following along with the guide on wiki.ceph.com, part of which includes setting a cluster id (fsid) in the environment, which I did. Is that what you're talking about? That value only shows up in {bull,zod,superman}'s ceph.conf file, not in the OSD-only servers. It's the same as the value that shows up running ceph -s on any of the nodes, and it's the value I put in the Ceph environment as the fsid.

None of the nodes have any other fsid field defined that I can see anywhere, not even the ones where the OSDs came up correctly. So if I understand what you're saying, it's already correct and there are no discrepancies to resolve or remove. But I don't see it when I knife node edit any of the nodes. Is this similar to the previous problem where the bootstrap keyring value wasn't getting saved back to the node as expected? (Never did figure out why that was happening; just worked around it.)

@hufman
Contributor

hufman commented Jul 22, 2014

I'm so sorry about that! I think this came from PR #114, which was working towards making it so that you can use the ceph cookbook on nodes that aren't in the ceph cluster, to create clients and so on. The initial commit I created did a check in the ceph.conf template to not add the fsid option if it's not a monitor, because I thought that only the monitor would need it. Later I discovered that OSDs do need it, so I amended the PR with a commit to fix that. However, we mildly refactored the cookbook at that same time, and that commit was lost.

To fix it, look at the last file in hufman@8225ca9 and apply that diff to your ceph.conf.erb template. I'll work on fixing this in the main repo!
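The linked commit isn't reproduced here, but the shape of the fix is to emit the fsid for every node type rather than only for monitors. Roughly, in ceph.conf.erb (a hedged sketch; the guard and attribute names are illustrative, so check the actual diff):

```erb
[global]
<%# Emit fsid unconditionally instead of only when the node is a mon -%>
<% if node['ceph']['config']['fsid'] -%>
  fsid = <%= node['ceph']['config']['fsid'] %>
<% end -%>
```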

@mdsteveb
Author

Ah ha! That did it! Everything came up nicely after that. Woohoo!

Well, it was a whole chain of things and we never did figure out one of them, but assuming you're pushing this fix back up then I think we can call it good for now. Thanks for all your help!

@hufman
Contributor

hufman commented Oct 20, 2014

I believe this has all been resolved and/or moved to other issues.

@hufman hufman closed this as completed Oct 20, 2014