Show nvmeof gateways in "ceph status" #801

Merged: 2 commits merged into ceph:devel from nvmeof-ceph-status on Sep 9, 2024

Conversation

@VallariAg (Member) commented Aug 13, 2024

Register the nvmeof gateway service in the service_map.
This brings up nvmeof in the "ceph status" output.
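
A minimal sketch of what such a registration could look like, assuming the librados Python binding exposes service_daemon_register() and service_daemon_update() (this is not the PR's actual code; the pool, group, and suffix values are illustrative placeholders):

import socket

import rados

# Hypothetical sketch: register an nvmeof gateway in the cluster service map
# so it shows up under "services:" in "ceph -s". Assumes the librados Python
# binding's service_daemon_register()/service_daemon_update(); POOL, GROUP and
# SUFFIX are placeholder values, not the gateway's real configuration.
POOL = "mypool"
GROUP = "mygroup"
SUFFIX = "gaaaet"

with rados.Rados(conffile="/etc/ceph/ceph.conf") as cluster:
    host = socket.gethostname()
    daemon_name = f"{POOL}.{GROUP}.{host}.{SUFFIX}"
    metadata = {
        "id": daemon_name,
        "pool_name": POOL,
        "group": GROUP,
        "hostname": host,
        "daemon_type": "gateway",
    }
    # Service-map metadata values must all be strings.
    cluster.service_daemon_register("nvmeof", daemon_name, metadata)
    # Report liveness; an empty dict means "no extra status fields".
    cluster.service_daemon_update({})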

After gateway deployment, it shows all 4 gateways in the "ceph -s" output:

2024-08-20T11:51:41.617 INFO:tasks.workunit.client.2.smithi067.stdout:  cluster:
2024-08-20T11:51:41.618 INFO:tasks.workunit.client.2.smithi067.stdout:    id:     de9bdec8-5ee8-11ef-bccf-c7b262605968
2024-08-20T11:51:41.618 INFO:tasks.workunit.client.2.smithi067.stdout:    health: HEALTH_WARN
2024-08-20T11:51:41.618 INFO:tasks.workunit.client.2.smithi067.stdout:            Degraded data redundancy: 484/1452 objects degraded (33.333%), 32 pgs degraded, 32 pgs undersized
2024-08-20T11:51:41.618 INFO:tasks.workunit.client.2.smithi067.stdout:
2024-08-20T11:51:41.618 INFO:tasks.workunit.client.2.smithi067.stdout:  services:
2024-08-20T11:51:41.618 INFO:tasks.workunit.client.2.smithi067.stdout:    mon:    3 daemons, quorum a,c,b (age 9m)
2024-08-20T11:51:41.618 INFO:tasks.workunit.client.2.smithi067.stdout:    mgr:    x(active, since 10m)
2024-08-20T11:51:41.618 INFO:tasks.workunit.client.2.smithi067.stdout:    osd:    2 osds: 2 up (since 8m), 2 in (since 8m)
2024-08-20T11:51:41.618 INFO:tasks.workunit.client.2.smithi067.stdout:    nvmeof: 4 gateways active (4 hosts)
2024-08-20T11:51:41.618 INFO:tasks.workunit.client.2.smithi067.stdout:
2024-08-20T11:51:41.618 INFO:tasks.workunit.client.2.smithi067.stdout:  data:
2024-08-20T11:51:41.618 INFO:tasks.workunit.client.2.smithi067.stdout:    pools:   1 pools, 32 pgs
2024-08-20T11:51:41.619 INFO:tasks.workunit.client.2.smithi067.stdout:    objects: 484 objects, 1020 MiB
2024-08-20T11:51:41.619 INFO:tasks.workunit.client.2.smithi067.stdout:    usage:   1.5 GiB used, 177 GiB / 179 GiB avail
2024-08-20T11:51:41.619 INFO:tasks.workunit.client.2.smithi067.stdout:    pgs:     484/1452 objects degraded (33.333%)
2024-08-20T11:51:41.619 INFO:tasks.workunit.client.2.smithi067.stdout:             32 active+undersized+degraded
2024-08-20T11:51:41.619 INFO:tasks.workunit.client.2.smithi067.stdout:
2024-08-20T11:51:41.619 INFO:tasks.workunit.client.2.smithi067.stdout:  io:
2024-08-20T11:51:41.619 INFO:tasks.workunit.client.2.smithi067.stdout:    client:   32 KiB/s rd, 2 op/s rd, 0 op/s wr
2024-08-20T11:51:41.619 INFO:tasks.workunit.client.2.smithi067.stdout:
2024-08-20T11:51:41.619 INFO:tasks.workunit.client.2.smithi067.stdout:  progress:
2024-08-20T11:51:41.619 INFO:tasks.workunit.client.2.smithi067.stdout:    Global Recovery Event (0s)
2024-08-20T11:51:41.619 INFO:tasks.workunit.client.2.smithi067.stdout:      [............................]

After the nvmeof service is removed, nvmeof disappears from the "ceph status" output too:

2024-08-20T11:52:02.450 INFO:tasks.workunit.client.2.smithi067.stderr:+ ceph -s
2024-08-20T11:52:02.987 INFO:tasks.workunit.client.2.smithi067.stdout:  cluster:
2024-08-20T11:52:02.987 INFO:tasks.workunit.client.2.smithi067.stdout:    id:     de9bdec8-5ee8-11ef-bccf-c7b262605968
2024-08-20T11:52:02.987 INFO:tasks.workunit.client.2.smithi067.stdout:    health: HEALTH_WARN
2024-08-20T11:52:02.987 INFO:tasks.workunit.client.2.smithi067.stdout:            Degraded data redundancy: 484/1452 objects degraded (33.333%), 32 pgs degraded, 32 pgs undersized
2024-08-20T11:52:02.987 INFO:tasks.workunit.client.2.smithi067.stdout:
2024-08-20T11:52:02.988 INFO:tasks.workunit.client.2.smithi067.stdout:  services:
2024-08-20T11:52:02.988 INFO:tasks.workunit.client.2.smithi067.stdout:    mon: 3 daemons, quorum a,c,b (age 9m)
2024-08-20T11:52:02.988 INFO:tasks.workunit.client.2.smithi067.stdout:    mgr: x(active, since 10m)
2024-08-20T11:52:02.988 INFO:tasks.workunit.client.2.smithi067.stdout:    osd: 2 osds: 2 up (since 8m), 2 in (since 8m)
2024-08-20T11:52:02.988 INFO:tasks.workunit.client.2.smithi067.stdout:
2024-08-20T11:52:02.988 INFO:tasks.workunit.client.2.smithi067.stdout:  data:
2024-08-20T11:52:02.988 INFO:tasks.workunit.client.2.smithi067.stdout:    pools:   1 pools, 32 pgs
2024-08-20T11:52:02.988 INFO:tasks.workunit.client.2.smithi067.stdout:    objects: 484 objects, 1020 MiB
2024-08-20T11:52:02.988 INFO:tasks.workunit.client.2.smithi067.stdout:    usage:   1.4 GiB used, 177 GiB / 179 GiB avail
2024-08-20T11:52:02.988 INFO:tasks.workunit.client.2.smithi067.stdout:    pgs:     484/1452 objects degraded (33.333%)
2024-08-20T11:52:02.988 INFO:tasks.workunit.client.2.smithi067.stdout:             32 active+undersized+degraded
2024-08-20T11:52:02.988 INFO:tasks.workunit.client.2.smithi067.stdout:
2024-08-20T11:52:02.988 INFO:tasks.workunit.client.2.smithi067.stdout:  io:
2024-08-20T11:52:02.989 INFO:tasks.workunit.client.2.smithi067.stdout:    client:   87 KiB/s rd, 0 B/s wr, 97 op/s rd, 55 op/s wr
2024-08-20T11:52:02.989 INFO:tasks.workunit.client.2.smithi067.stdout:
2024-08-20T11:52:02.989 INFO:tasks.workunit.client.2.smithi067.stdout:  progress:
2024-08-20T11:52:02.989 INFO:tasks.workunit.client.2.smithi067.stdout:    Global Recovery Event (0s)
2024-08-20T11:52:02.989 INFO:tasks.workunit.client.2.smithi067.stdout:      [............................]

https://pulpito.ceph.com/vallariag-2024-08-20_11:27:49-nvmeof-main-distro-default-smithi/

  • Fix the "4 stray daemon(s) not managed by cephadm" warning
  • Ensure all 4 gateways show up in "ceph -s"
  • Run it again with a good build; this build gives "[WRN] Health check failed: Degraded data redundancy: 2/6 objects degraded (33.333%), 2 pgs degraded (PG_DEGRADED)"

@caroav requested review from baum and gbregman on August 14, 2024 04:38
@VallariAg force-pushed the nvmeof-ceph-status branch 2 times, most recently from 0875891 to 4eeb243 on August 19, 2024 16:00
@VallariAg force-pushed the nvmeof-ceph-status branch 2 times, most recently from 91164e7 to c575bca on September 5, 2024 09:09
@baum (Collaborator) left a comment:

lgtm 🖖

@VallariAg (Member, Author) commented Sep 5, 2024

Added the group to the metadata.
In our GitHub CI the gateway is set up with an empty group name "" (ref: start_up.sh), so I tested on a deployment with the group name "mygroup" (a small verification sketch follows the dump below):

[vallariag@smithi049 ~]$ ceph service dump
{
    "epoch": 2517,
    "modified": "2024-09-05T14:55:56.434331+0000",
    "services": {
        "nvmeof": {
            "daemons": {
                "summary": "",
                "mypool.mygroup.smithi049.gaaaet": {
                    "start_epoch": 2517,
                    "start_stamp": "2024-09-05T14:55:55.953098+0000",
                    "gid": 14613,
                    "addr": "172.21.15.49:0/2235415121",
                    "metadata": {
                        "arch": "x86_64",
                        "ceph_release": "squid",
                        "ceph_version": "ceph version 19.3.0-4585-gb59673c4 (b59673c44bd569f9f3db37f87bced695dec5fcbf) squid (dev)",
                        "ceph_version_short": "19.3.0-4585-gb59673c4",
                        "container_hostname": "smithi049",
                        "container_image": "quay.io/vallari/nvmeof:1.3",
                        "cpu": "Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz",
                        "daemon_type": "gateway",
                        "distro": "rhel",
                        "distro_description": "Red Hat Enterprise Linux 9.4 (Plow)",
                        "distro_version": "9.4",
                        "group": "mygroup",
                        "hostname": "smithi049",
                        "id": "mypool.mygroup.smithi049.gaaaet",
                        "kernel_description": "#1 SMP PREEMPT_DYNAMIC Tue Apr 9 12:57:02 UTC 2024",
                        "kernel_version": "5.14.0-437.el9.x86_64",
                        "mem_swap_kb": "0",
                        "mem_total_kb": "32488316",
                        "os": "Linux",
                        "pool_name": "mypool"
                    },
                    "task_status": {}
                }
            }
        }
    }
}
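
To double-check the new field, here is a hypothetical verification sketch (not the PR's test code) that pulls the service map with the "service dump" mon command and prints the "group" metadata of each nvmeof gateway entry:

import json

import rados

# Hypothetical sketch: fetch the service map via the "service dump" mon
# command and print each nvmeof gateway's "group" metadata field.
with rados.Rados(conffile="/etc/ceph/ceph.conf") as cluster:
    ret, out, errs = cluster.mon_command(
        json.dumps({"prefix": "service dump", "format": "json"}), b"")
    assert ret == 0, errs
    dump = json.loads(out)
    daemons = dump["services"]["nvmeof"]["daemons"]
    for name, info in daemons.items():
        if name == "summary":
            continue
        print(name, "group =", info["metadata"].get("group"))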

The two merged commits:

- This brings up nvmeof in "ceph status" output.
  Signed-off-by: Vallari Agrawal <[email protected]>

- Verify nvmeof service in ceph status output.
  Signed-off-by: Vallari Agrawal <[email protected]>
@VallariAg merged commit 63ce22c into ceph:devel on Sep 9, 2024
38 checks passed