Why is Ceph RBD real space usage much larger than disk usage once mounted? #4059
-
I noticed that the output of, for example:

[root@node01 smd]# rbd du deeproute-replica-ssd-pool/mlp-platform-orin-data-2
warning: fast-diff map is not enabled for mlp-platform-orin-data-2. operation may be slow.
NAME                      PROVISIONED  USED
mlp-platform-orin-data-2      300 GiB  300 GiB

does not match what df reports once the image is mounted:

[root@node01 smd]# df -hT -x overlay -x tmpfs
Filesystem  Type  Size  Used  Avail  Use%  Mounted on
.....
/dev/rbd0   ext4  295G   43G   238G   16%  /mnt/rbd

Is there a reason for the two results to be so different? Can this be a problem when Ceph keeps track of the total space used by the cluster to know how much space is available? Thanks in advance.
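For reference, one way to cross-check what rbd du is counting is to sum the extents the image has actually allocated at the RADOS level. This is only a sketch reusing the pool/image name from the output above; rbd diff prints byte offsets and lengths of the allocated extents:

# Sum the allocated extents of the image, in GiB. After deleted blocks are
# discarded (see the reply below), this figure should move towards the
# "Used" column of df.
rbd diff deeproute-replica-ssd-pool/mlp-platform-orin-data-2 \
  | awk '{ sum += $2 } END { printf "%.1f GiB allocated\n", sum / 2^30 }'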
-
When a filesystem writes data to a device, new blocks are allocated. Upon deletion, those blocks are tracked by the filesystem for re-use and usually not de-allocated immediately. This causes the filesystem to grow its usage on the block device where it lives. For thin-allocated storage this is not optimal, as the thin/sparseness is lost over time.
There are multiple ways to de-allocate the free blocks that the filesystem keeps track of. These are the two most common ones:
- fstrim on the mountpoint (manually as admin, or with CSI-Addons' ReclaimSpace)
- the discard option (can be set in the mountOptions parameter in a StorageClass)
Note that there might be a performance hit on the filesystem while the de-allocation is running.
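A minimal sketch of both approaches, assuming a ceph-csi RBD volume mounted at /mnt/rbd and a PVC named data-pvc (both placeholder names; the ReclaimSpaceJob variant assumes the CSI-Addons controller is installed):

# 1) One-off: discard the filesystem's free blocks from the node (or pod)
#    where the ext4 filesystem is mounted; rbd du should shrink afterwards.
fstrim -v /mnt/rbd

# 2) The same thing driven through CSI-Addons, targeting the PVC that is
#    backed by the RBD image (placeholder PVC name):
cat <<'EOF' | kubectl apply -f -
apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  name: reclaim-data-pvc
spec:
  target:
    persistentVolumeClaim: data-pvc
EOF

# 3) Alternatively, let the kernel discard freed blocks automatically by
#    adding "discard" to mountOptions in the StorageClass:
#      mountOptions:
#        - discard

Either way, the performance note above applies while the discard is running.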
-
hi @nixpanic, @Madhu-1, the usage of the CephFS subvolume has reached 172T, but the ceph df output does not match:

[root@prediction-ssd01-node01 ~]# ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
nvme 8.7 TiB 8.7 TiB 8.0 GiB 8.0 GiB 0.09
ssd 880 TiB 550 TiB 330 TiB 330 TiB 37.47
TOTAL 889 TiB 559 TiB 330 TiB 330 TiB 37.11
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
.mgr 1 1 79 MiB 21 238 MiB 0 2.8 TiB
cephfs-metadata 10 256 273 MiB 158 818 MiB 0 2.8 TiB
cephfs-data0 11 4096 110 TiB 40.64M 329 TiB 40.16 163 TiB
object-store2.rgw.control 12 32 0 B 8 0 B 0 2.8 TiB
object-store2.rgw.meta 13 32 2.6 KiB 14 121 KiB 0 2.8 TiB
object-store2.rgw.buckets.non-ec 14 8 0 B 0 0 B 0 2.8 TiB
.rgw.root 15 8 4.9 KiB 17 192 KiB 0 2.8 TiB
object-store2.rgw.otp 16 8 0 B 0 0 B 0 2.8 TiB
object-store2.rgw.buckets.index 17 8 14 KiB 499 41 KiB 0 2.8 TiB
object-store2.rgw.log 18 8 30 KiB 324 1.9 MiB 0 2.8 TiB
object-store2.rgw.buckets.data 19 4096 0 B 2 0 B 0 327 TiB
[root@prediction-ssd01-node01 ~]# ceph fs subvolume info cephfs csi-vol-547639e6-7051-4338-9c4c-f4c8f6ba8619 csi
{
"atime": "2024-08-29 04:03:17",
"bytes_pcent": "21.14",
"bytes_quota": 879609302220800,
"bytes_used": 185987726942208,
"created_at": "2024-08-29 04:03:17",
"ctime": "2024-09-06 06:29:47",
"data_pool": "cephfs-data0",
"features": [
"snapshot-clone",
"snapshot-autoprotect",
"snapshot-retention"
],
"flavor": 2,
"gid": 0,
"mode": 16877,
"mon_addrs": [
"10.3.11.92:6789",
"10.3.11.91:6789",
"10.3.11.88:6789"
],
"mtime": "2024-09-06 06:29:47",
"path": "/volumes/csi/csi-vol-547639e6-7051-4338-9c4c-f4c8f6ba8619/f92f03d5-39ae-4092-b9ce-6f2b2b0384fa",
"pool_namespace": "",
"state": "complete",
"type": "subvolume",
"uid": 0
}

ceph version:
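A way to cross-check where the subvolume's bytes_used figure comes from is CephFS's recursive-statistics xattrs on the subvolume path. This is only a sketch, assuming the filesystem is mounted at /mnt/cephfs (a placeholder mountpoint; the path component is taken from the subvolume info above):

# Recursive byte count the MDS keeps for the directory tree; this should be
# close to the bytes_used reported by "ceph fs subvolume info".
getfattr -n ceph.dir.rbytes \
  /mnt/cephfs/volumes/csi/csi-vol-547639e6-7051-4338-9c4c-f4c8f6ba8619/f92f03d5-39ae-4092-b9ce-6f2b2b0384fa

# Number of files underneath, for a rough sanity check against the object
# count of cephfs-data0 in "ceph df":
getfattr -n ceph.dir.rfiles \
  /mnt/cephfs/volumes/csi/csi-vol-547639e6-7051-4338-9c4c-f4c8f6ba8619/f92f03d5-39ae-4092-b9ce-6f2b2b0384fa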