Why is Ceph RBD real space usage much larger than disk usage once mounted? #4059
-
I noticed that the output of, for example:

[root@node01 smd]# rbd du deeproute-replica-ssd-pool/mlp-platform-orin-data-2
warning: fast-diff map is not enabled for mlp-platform-orin-data-2. operation may be slow.
NAME                      PROVISIONED  USED
mlp-platform-orin-data-2      300 GiB  300 GiB

does not match what df reports once the image is mounted:

[root@node01 smd]# df -hT -x overlay -x tmpfs
Filesystem  Type  Size  Used  Avail  Use%  Mounted on
.....
/dev/rbd0   ext4  295G   43G   238G   16%  /mnt/rbd

Is there a reason for the two results to be so different? Can this be a problem when Ceph keeps track of the total space used by the cluster to know how much space is available? Thanks in advance.
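For reference, one way to cross-check what rbd du is counting is to sum the extents the image has actually allocated at the RADOS level. This is only a sketch reusing the pool/image name from the output above; rbd diff prints byte offsets and lengths of the allocated extents:

# Sum the allocated extents of the image, in GiB. After deleted blocks are
# discarded (see the reply below), this figure should move towards the
# "Used" column of df.
rbd diff deeproute-replica-ssd-pool/mlp-platform-orin-data-2 \
  | awk '{ sum += $2 } END { printf "%.1f GiB allocated\n", sum / 2^30 }'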
-
When a filesystem writes data to a device, new blocks are allocated. Upon deletion, those blocks are tracked by the filesystem for re-use and usually not de-allocated immediately. This causes the filesystem to grow its usage on the block device where it lives. For thin-allocated storage this is not optimal, as the thin/sparseness is lost over time.
There are multiple ways to de-allocate the free blocks that the filesystem keeps track of. These are the two most common ones:
- fstrim on the mountpoint (manually as admin, or with CSI-Addons' ReclaimSpace)
- the discard option (can be set in the mountOptions parameter in a StorageClass)
Note that there might be a performance hit on the filesystem while the de-allocation is running.
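A minimal sketch of both approaches, assuming a ceph-csi RBD volume mounted at /mnt/rbd and a PVC named data-pvc (both placeholder names; the ReclaimSpaceJob variant assumes the CSI-Addons controller is installed):

# 1) One-off: discard the filesystem's free blocks from the node (or pod)
#    where the ext4 filesystem is mounted; rbd du should shrink afterwards.
fstrim -v /mnt/rbd

# 2) The same thing driven through CSI-Addons, targeting the PVC that is
#    backed by the RBD image (placeholder PVC name):
cat <<'EOF' | kubectl apply -f -
apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  name: reclaim-data-pvc
spec:
  target:
    persistentVolumeClaim: data-pvc
EOF

# 3) Alternatively, let the kernel discard freed blocks automatically by
#    adding "discard" to mountOptions in the StorageClass:
#      mountOptions:
#        - discard

Either way, the performance note above applies while the discard is running.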
-
hi @nixpanic, @Madhu-1, the usage of the CephFS subvolume has reached 172T, but the ceph df output does not match:

[root@prediction-ssd01-node01 ~]# ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
nvme 8.7 TiB 8.7 TiB 8.0 GiB 8.0 GiB 0.09
ssd 880 TiB 550 TiB 330 TiB 330 TiB 37.47
TOTAL 889 TiB 559 TiB 330 TiB 330 TiB 37.11
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
.mgr 1 1 79 MiB 21 238 MiB 0 2.8 TiB
cephfs-metadata 10 256 273 MiB 158 818 MiB 0 2.8 TiB
cephfs-data0 11 4096 110 TiB 40.64M 329 TiB 40.16 163 TiB
object-store2.rgw.control 12 32 0 B 8 0 B 0 2.8 TiB
object-store2.rgw.meta 13 32 2.6 KiB 14 121 KiB 0 2.8 TiB
object-store2.rgw.buckets.non-ec 14 8 0 B 0 0 B 0 2.8 TiB
.rgw.root 15 8 4.9 KiB 17 192 KiB 0 2.8 TiB
object-store2.rgw.otp 16 8 0 B 0 0 B 0 2.8 TiB
object-store2.rgw.buckets.index 17 8 14 KiB 499 41 KiB 0 2.8 TiB
object-store2.rgw.log 18 8 30 KiB 324 1.9 MiB 0 2.8 TiB
object-store2.rgw.buckets.data 19 4096 0 B 2 0 B 0 327 TiB
[root@prediction-ssd01-node01 ~]# ceph fs subvolume info cephfs csi-vol-547639e6-7051-4338-9c4c-f4c8f6ba8619 csi
{
"atime": "2024-08-29 04:03:17",
"bytes_pcent": "21.14",
"bytes_quota": 879609302220800,
"bytes_used": 185987726942208,
"created_at": "2024-08-29 04:03:17",
"ctime": "2024-09-06 06:29:47",
"data_pool": "cephfs-data0",
"features": [
"snapshot-clone",
"snapshot-autoprotect",
"snapshot-retention"
],
"flavor": 2,
"gid": 0,
"mode": 16877,
"mon_addrs": [
"10.3.11.92:6789",
"10.3.11.91:6789",
"10.3.11.88:6789"
],
"mtime": "2024-09-06 06:29:47",
"path": "/volumes/csi/csi-vol-547639e6-7051-4338-9c4c-f4c8f6ba8619/f92f03d5-39ae-4092-b9ce-6f2b2b0384fa",
"pool_namespace": "",
"state": "complete",
"type": "subvolume",
"uid": 0
}

ceph version:
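A way to cross-check where the subvolume's bytes_used figure comes from is CephFS's recursive-statistics xattrs on the subvolume path. This is only a sketch, assuming the filesystem is mounted at /mnt/cephfs (a placeholder mountpoint; the path component is taken from the subvolume info above):

# Recursive byte count the MDS keeps for the directory tree; this should be
# close to the bytes_used reported by "ceph fs subvolume info".
getfattr -n ceph.dir.rbytes \
  /mnt/cephfs/volumes/csi/csi-vol-547639e6-7051-4338-9c4c-f4c8f6ba8619/f92f03d5-39ae-4092-b9ce-6f2b2b0384fa

# Number of files underneath, for a rough sanity check against the object
# count of cephfs-data0 in "ceph df":
getfattr -n ceph.dir.rfiles \
  /mnt/cephfs/volumes/csi/csi-vol-547639e6-7051-4338-9c4c-f4c8f6ba8619/f92f03d5-39ae-4092-b9ce-6f2b2b0384fa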