ceph-csi very slow on vm #9754

Open
Tracked by #9825
plano-fwinkler opened this issue Nov 19, 2024 · 4 comments
@plano-fwinkler

Proxmox with Ceph, and Talos running as a VM with ceph-csi, is much slower than openebs-hostpath. Are there any kernel modules missing? (A quick way to check is sketched below the environment details.)

Environment

  • Talos version: 1.8.2
  • Kubernetes version: 1.31.2
  • Platform: proxmox with ceph storage
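
Regarding the kernel-module question: a minimal way to check from a workstation, assuming talosctl access to the node, is to look for the rbd/libceph modules. The node IP below is a placeholder.

```
# Check whether the rbd/libceph kernel modules are loaded on the Talos node.
# 10.0.0.10 is a placeholder node IP; adjust to your environment.
talosctl -n 10.0.0.10 read /proc/modules | grep -E 'rbd|ceph'

# Built-in or currently loaded modules also show up under /sys/module.
talosctl -n 10.0.0.10 ls /sys/module | grep -E 'rbd|ceph'
```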
@smira (Member) commented Nov 19, 2024

The issue you posted doesn't have any relevant details: no performance numbers, no description of how you set things up, etc.

Ceph is a complicated subject, and setting it up properly is not trivial.

@plano-fwinkler (Author)

We have a Proxmox cluster with 5 nodes and a Ceph cluster on Proxmox. The Ceph cluster has a 100G NIC.

If I test with kubestr fio:

With the local-path StorageClass (openebs-hostpath):

```
./kubestr fio -s openebs-hostpath
PVC created kubestr-fio-pvc-qqb7w
Pod created kubestr-fio-pod-4z7zc
Running FIO test (default-fio) on StorageClass (openebs-hostpath) with a PVC of Size (100Gi)
Elapsed time- 28.089900025s
FIO test results:

FIO version - fio-3.36
Global options - ioengine=libaio verify=0 direct=1 gtod_reduce=1

JobName: read_iops
blocksize=4K filesize=2G iodepth=64 rw=randread
read:
IOPS=49767.750000 BW(KiB/s)=199087
iops: min=41961 max=61272 avg=49501.585938
bw(KiB/s): min=167847 max=245088 avg=198006.484375

JobName: write_iops
blocksize=4K filesize=2G iodepth=64 rw=randwrite
write:
IOPS=21245.320312 BW(KiB/s)=84993
iops: min=9028 max=39728 avg=35385.707031
bw(KiB/s): min=36112 max=158912 avg=141543.125000

JobName: read_bw
blocksize=128K filesize=2G iodepth=64 rw=randread
read:
IOPS=36891.605469 BW(KiB/s)=4722663
iops: min=31849 max=45298 avg=36709.964844
bw(KiB/s): min=4076761 max=5798144 avg=4698881.500000

JobName: write_bw
blocksize=128k filesize=2G iodepth=64 rw=randwrite
write:
IOPS=33320.179688 BW(KiB/s)=4265520
iops: min=17652 max=40996 avg=33119.656250
bw(KiB/s): min=2259456 max=5247488 avg=4239321.500000

Disk stats (read/write):
sda: ios=1454972/1046364 merge=0/22 ticks=1907168/1466570 in_queue=3393654, util=29.229431%

  -  OK
```

And with the Ceph block StorageClass (provisioner rbd.csi.ceph.com):

```
./kubestr fio -s ceph-block
PVC created kubestr-fio-pvc-n7m9z
Pod created kubestr-fio-pod-4jnqw
Running FIO test (default-fio) on StorageClass (ceph-block) with a PVC of Size (100Gi)
Elapsed time- 27.566283667s
FIO test results:

FIO version - fio-3.36
Global options - ioengine=libaio verify=0 direct=1 gtod_reduce=1

JobName: read_iops
blocksize=4K filesize=2G iodepth=64 rw=randread
read:
IOPS=242.109741 BW(KiB/s)=983
iops: min=98 max=496 avg=257.322571
bw(KiB/s): min=392 max=1987 avg=1030.129028

JobName: write_iops
blocksize=4K filesize=2G iodepth=64 rw=randwrite
write:
IOPS=224.676819 BW(KiB/s)=914
iops: min=2 max=768 avg=264.464294
bw(KiB/s): min=8 max=3072 avg=1058.357178

JobName: read_bw
blocksize=128K filesize=2G iodepth=64 rw=randread
read:
IOPS=213.964386 BW(KiB/s)=27884
iops: min=90 max=462 avg=223.967743
bw(KiB/s): min=11520 max=59254 avg=28694.708984

JobName: write_bw
blocksize=128k filesize=2G iodepth=64 rw=randwrite
write:
IOPS=219.214661 BW(KiB/s)=28548
iops: min=4 max=704 avg=258.035706
bw(KiB/s): min=512 max=90112 avg=33048.785156

Disk stats (read/write):
rbd2: ios=8696/8655 merge=0/267 ticks=2245425/1975831 in_queue=4221257, util=99.504547%

  -  OK
```
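
For reference, the kubestr default job can be approximated with a plain fio run using the parameters shown in the output above; running it from inside a pod against the same PVC helps rule out kubestr itself. The target path and runtime below are assumptions, not verified kubestr defaults.

```
# Roughly reproduces the kubestr "read_iops" job with the global options shown above
# (ioengine=libaio, direct=1, gtod_reduce=1, verify=0, bs=4K, iodepth=64, size=2G).
# /data/fiotest is a placeholder path on the mounted PVC; the runtime is an assumption.
fio --name=read_iops --filename=/data/fiotest \
    --ioengine=libaio --direct=1 --gtod_reduce=1 --verify=0 \
    --bs=4K --iodepth=64 --rw=randread --size=2G \
    --runtime=30 --time_based
```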

The Talos machine has two NICs; one is used only to communicate with the Ceph monitors.

It's working, but I think it's too slow.
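
With a dedicated NIC for the monitors, it may be worth confirming from the Talos side which link, address and route the monitor traffic actually uses, and what its MTU is; 10.0.0.10 below is a placeholder node IP.

```
# Inspect links, addresses and routes on the Talos node to confirm the Ceph
# monitor subnet really goes out via the dedicated NIC, and check its MTU.
# 10.0.0.10 is a placeholder node IP.
talosctl -n 10.0.0.10 get links
talosctl -n 10.0.0.10 get addresses
talosctl -n 10.0.0.10 get routes
```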

@smira (Member) commented Nov 19, 2024

Then you need to dig further to understand why and what the bottleneck is. Ceph block storage will certainly be slower than local storage, since it goes over the network, does replication, etc.

You can watch resource utilization to understand what is the bottleneck.

We are not aware of anything missing from the Talos side, and we do use Ceph a lot ourselves with Talos.
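
A rough sketch of how one might narrow this down, assuming admin access to the Ceph cluster (e.g. from a Proxmox node) and placeholder pool, image and node names:

```
# Ceph-side baseline, run from a host with Ceph admin access
# ("kube" and "benchimage" are placeholder pool/image names; the image must exist).
rados bench -p kube 30 write -b 4096 -t 16     # raw 4K object writes
ceph osd perf                                  # per-OSD commit/apply latency
rbd bench --io-type write --io-size 4K --io-threads 16 --io-total 1G kube/benchimage

# Client-side utilization while a kubestr run is in progress
# (10.0.0.10 is a placeholder Talos node IP; kubectl top needs metrics-server).
talosctl -n 10.0.0.10 dashboard
kubectl top nodes
```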

@f-wi-plano commented Dec 2, 2024

OK, as a first step we updated from 1.7.5 to 1.8.3:

Talos 1.8.3:

./kubestr fio -s ceph-block
PVC created kubestr-fio-pvc-rr88n
Pod created kubestr-fio-pod-tfhwn
Running FIO test (default-fio) on StorageClass (ceph-block) with a PVC of Size (100Gi)
Elapsed time- 28.61114439s
FIO test results:
  
FIO version - fio-3.36
Global options - ioengine=libaio verify=0 direct=1 gtod_reduce=1

JobName: read_iops
  blocksize=4K filesize=2G iodepth=64 rw=randread
read:
  IOPS=225.275375 BW(KiB/s)=916
  iops: min=58 max=547 avg=245.451614
  bw(KiB/s): min=232 max=2188 avg=982.258057

JobName: write_iops
  blocksize=4K filesize=2G iodepth=64 rw=randwrite
write:
  IOPS=208.858887 BW(KiB/s)=850
  iops: min=118 max=480 avg=251.928574
  bw(KiB/s): min=472 max=1923 avg=1008.285706

JobName: read_bw
  blocksize=128K filesize=2G iodepth=64 rw=randread
read:
  IOPS=171.147690 BW(KiB/s)=22384
  iops: min=32 max=382 avg=186.451614
  bw(KiB/s): min=4096 max=48896 avg=23881.837891

JobName: write_bw
  blocksize=128k filesize=2G iodepth=64 rw=randwrite
write:
  IOPS=210.829285 BW(KiB/s)=27469
  iops: min=18 max=486 avg=251.142853
  bw(KiB/s): min=2304 max=62208 avg=32166.677734

Disk stats (read/write):
  rbd7: ios=7798/8137 merge=0/266 ticks=2268458/2110792 in_queue=4379250, util=99.517471%
  -  OK
  

Talos 1.7.5:

 ./kubestr fio -s ceph-block
PVC created kubestr-fio-pvc-gz78h
Pod created kubestr-fio-pod-w6q9h
Running FIO test (default-fio) on StorageClass (ceph-block) with a PVC of Size (100Gi)
Elapsed time- 25.926723803s
FIO test results:
  
FIO version - fio-3.36
Global options - ioengine=libaio verify=0 direct=1 gtod_reduce=1

JobName: read_iops
  blocksize=4K filesize=2G iodepth=64 rw=randread
read:
  IOPS=3099.707031 BW(KiB/s)=12415
  iops: min=2904 max=3330 avg=3104.266602
  bw(KiB/s): min=11616 max=13322 avg=12417.200195

JobName: write_iops
  blocksize=4K filesize=2G iodepth=64 rw=randwrite
write:
  IOPS=1818.115234 BW(KiB/s)=7289
  iops: min=1597 max=1963 avg=1821.033325
  bw(KiB/s): min=6388 max=7855 avg=7284.466797

JobName: read_bw
  blocksize=128K filesize=2G iodepth=64 rw=randread
read:
  IOPS=3061.892822 BW(KiB/s)=392458
  iops: min=2860 max=3300 avg=3065.199951
  bw(KiB/s): min=366080 max=422400 avg=392351.312500

JobName: write_bw
  blocksize=128k filesize=2G iodepth=64 rw=randwrite
write:
  IOPS=1826.963989 BW(KiB/s)=234388
  iops: min=1712 max=1960 avg=1829.699951
  bw(KiB/s): min=219136 max=250880 avg=234209.000000

Disk stats (read/write):
  rbd3: ios=104828/62036 merge=0/701 ticks=2173229/1309682 in_queue=3482912, util=99.467682%
  -  OK

But I think that's still too slow, and I don't know where to look.
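
One way to decide where to look next is to run the same fio job against an RBD image mapped directly on a Proxmox host, which takes the Talos VM, its virtual NICs and ceph-csi out of the path; pool and image names below are placeholders.

```
# On a Proxmox host with Ceph admin access: create and map a throwaway image
# ("kube" and "talos-bench" are placeholder pool/image names).
# Limiting features to layering keeps the image mappable by the kernel RBD client.
rbd create kube/talos-bench --size 10G --image-feature layering
rbd map kube/talos-bench          # prints the mapped device, e.g. /dev/rbd0

# Same 4K random-read job as the kubestr test, against the mapped device.
fio --name=read_iops --filename=/dev/rbd0 \
    --ioengine=libaio --direct=1 --gtod_reduce=1 \
    --bs=4K --iodepth=64 --rw=randread --size=2G --runtime=30 --time_based

# Clean up.
rbd unmap /dev/rbd0
rbd rm kube/talos-bench
```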
