
off2 is sometimes very slow (processes in IOWAIT) #263

Open
stephanegigandet opened this issue Oct 10, 2023 · 12 comments

Comments

@stephanegigandet
Contributor

Running some scripts on off2 sometimes takes much more time than usual. I have been testing some import scripts, and everything seems sluggish compared to off1.

e.g. starting some Perl processes sometimes takes 15 seconds, other times 2 minutes:

off@off-pro:/srv/off-pro/lib$ time ./startup_apache2.pl 
Can't open /mnt/off-pro/logs/off-pro/log4perl.log (Permission denied) at /usr/share/perl5/Log/Log4perl/Appender/File.pm line 151.

real	2m11.803s
user	0m13.394s
sys	0m1.442s

[5 minutes later]

off@off-pro:/srv/off-pro/lib$ time ./startup_apache2.pl 
Can't open /mnt/off-pro/logs/off-pro/log4perl.log (Permission denied) at /usr/share/perl5/Log/Log4perl/Appender/File.pm line 151.

real	0m14.031s
user	0m12.912s
sys	0m1.117s

I'm guessing that in the first run we were just waiting on the disks for 2 minutes.

The same thing happened with systemctl start apache2@off, which timed out once and then started fine the next time.
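
For what it's worth, a quick way to confirm that the slowness is I/O wait rather than CPU (a sketch, not something we ran here) is to look for processes stuck in uninterruptible sleep and to check the kernel's I/O pressure counters:

# processes in state D (uninterruptible sleep) are usually blocked on I/O
ps -eo state,pid,user,wchan:30,cmd | awk '$1 == "D"'

# pressure stall information: share of time tasks spend stalled on I/O
cat /proc/pressure/io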

@stephanegigandet added the "🐛 bug" and "performance" labels on Oct 10, 2023
@alexgarel
Member

@cquest Are the drives the same? Or maybe one has more constraints?

@john-gom

Could be irrelevant, but is it possible that some slow or inaccessible devices are in the PATH?

@alexgarel
Member

> Could be irrelevant, but is it possible that some slow or inaccessible devices are in the PATH?

I don't see anything special. As root on off2: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
On a VM (off): /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
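
If we wanted to rule that out completely, a quick check (a sketch, assuming no PATH entry contains spaces) is to time a listing of each entry; a hung or slow mount would show up immediately:

# split PATH on ':' and time an ls of each directory
for dir in ${PATH//:/ }; do
  echo "== $dir"
  time ls "$dir" > /dev/null
done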

@alexgarel
Member

I did a small test with dd to compare with my laptop (following this article):

On off2:

dd if=/dev/zero of=/home/alex/test.img bs=1G count=1 oflag=dsync
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 0,995176 s, 1,1 GB/s

dd if=/dev/zero of=/home/alex/test.img bs=512 count=1000 oflag=dsync
512000 bytes (512 kB, 500 KiB) copied, 17,5354 s, 29,2 kB/s

On off1

dd if=/dev/zero of=/home/alex/test.img bs=1G count=1 oflag=dsync
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.60762 s, 668 MB/s

dd if=/dev/zero of=/home/alex/test.img bs=512 count=1000 oflag=dsync
512000 bytes (512 kB, 500 KiB) copied, 8.94754 s, 57.2 kB/s

On my laptop:

dd if=/dev/zero of=/home/alex/test.img bs=1G count=1 oflag=dsync
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 1,44455 s, 743 MB/s

dd if=/dev/zero of=/home/alex/test.img bs=512 count=1000 oflag=dsync
512000 bytes (512 kB, 500 KiB) copied, 4,1427 s, 124 kB/s

So write latency on off2 is about twice that of off1, but not more than that.

This is just a write test, though, so it might not be very telling: the performance problems seem to be more on reads.
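
Since the problems seem to be on reads, a random-read latency test would probably be more representative than dd's single-threaded sequential writes. Something like the following fio run (a sketch; the target path, size, and parameters are only examples) reports per-I/O latency percentiles:

# 4k random reads for 60s; ZFS may ignore or reject O_DIRECT depending on version,
# in which case drop --direct=1 (results will then include ARC cache hits)
fio --name=randread-test --filename=/zfs-hdd/fio-test.img --size=2G \
    --rw=randread --bs=4k --direct=1 --ioengine=libaio --iodepth=16 \
    --runtime=60 --time_based --group_reporting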

@alexgarel
Member

Read test (I can't be sure there is no caching involved, and I don't want to invalidate the whole cache! I also tried with the nocache option):

off1:

dd if=/srv2/off/html/data/openfoodfacts-mongodbdump.gz of=/dev/null bs=8k
...
7989754834 bytes (8.0 GB, 7.4 GiB) copied, 51.0625 s, 156 MB/s

dd if=/srv2/off/html/data/openfoodfacts-mongodbdump.gz iflag=nocache of=/dev/null bs=8k 
7989754834 bytes (8.0 GB, 7.4 GiB) copied, 39.3774 s, 203 MB/s

off2:

dd if=/zfs-hdd/off/html_data/openfoodfacts-mongodbdump.gz of=/dev/null bs=8k
...
7431006564 bytes (7,4 GB, 6,9 GiB) copied, 90,3344 s, 82,3 MB/s

dd if=/zfs-hdd/off/html_data/openfoodfacts-mongodbdump.gz iflag=nocache of=/dev/null bs=8k
...
7431006564 bytes (7,4 GB, 6,9 GiB) copied, 41,3374 s, 180 MB/s
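
Note that iflag=nocache only asks the kernel to drop its page cache for the file; it does not touch the ZFS ARC, so the second off2 run may still be partly served from ARC. For a genuinely uncached read test, one option (a sketch; the dataset name is just an example) is a scratch dataset with data caching disabled:

# throwaway dataset that keeps only metadata in the ARC
zfs create -o primarycache=metadata zfs-hdd/benchtest
cp /zfs-hdd/off/html_data/openfoodfacts-mongodbdump.gz /zfs-hdd/benchtest/
dd if=/zfs-hdd/benchtest/openfoodfacts-mongodbdump.gz of=/dev/null bs=8k
zfs destroy zfs-hdd/benchtest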

@stephanegigandet
Contributor Author

Another example: 6 to 8 seconds to grep two small files:

root@opf:/var/log# ls -lrt syslog messages
-rw-r----- 1 root adm    1702 Oct 18 09:39 messages
-rw-r--r-- 1 root root 267895 Oct 18 09:40 syslog
root@opf:/var/log# time grep oom syslog messages

real	0m6.708s
user	0m0.000s
sys	0m0.002s
root@opf:/var/log# time grep oom syslog messages

real	0m8.337s
user	0m0.002s
sys	0m0.000s
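
With user and sys time near zero, essentially all of those 6–8 seconds are spent waiting, not computing. One way to see which syscalls are stalling (a sketch) is strace with per-call timings:

# -T appends the time spent in each syscall; a slow openat/read will stand out
strace -T -e trace=openat,read,write grep oom syslog messages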

@alexgarel
Member

alexgarel commented Oct 18, 2023

sudo zpool status


  pool: zfs-hdd
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 9 days 04:45:09 with 0 errors on Tue Oct 17 05:10:14 2023
config:

	NAME         STATE     READ WRITE CKSUM
	zfs-hdd      ONLINE       0     0     0
	  raidz1-0   ONLINE       0     0     0
	    sda4     ONLINE       3     0     0
	    sdb4     ONLINE       0     0     0
	    sdc4     ONLINE       0     0     0
	    sdd4     ONLINE       0     0     0
	cache
	  nvme3n1p2  ONLINE       0     0     0
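
The three read errors on sda4 match the ZFS-8000-9P message above. The usual follow-up (a sketch, along the lines of the suggested action) is to check the drive's SMART health and, if it looks sound, clear the counters and watch whether the errors come back:

# look for reallocated/pending sectors and logged errors on the physical drive
smartctl -a /dev/sda

# reset the ZFS error counters for that device, then keep an eye on the pool
zpool clear zfs-hdd sda4
zpool status zfs-hdd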

@stephanegigandet
Contributor Author

The fail2ban nginx-bot and nginx-http jails were doing a lot of reads and writes and @alexgarel disabled them, but we still have very high disk usage:

root@off2:/var/log# iostat -x 5

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.18    0.00    2.02   61.44    0.00   31.37

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme0n1          0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme2n1          0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme3n1         79.80   1526.40     0.00   0.00    0.12    19.13    1.60     31.20     0.00   0.00    0.00    19.50    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.01  17.20
sda             38.20    706.40     0.00   0.00   24.24    18.49  233.80   2914.40     0.00   0.00    2.56    12.47    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.53  59.60
sdb             43.00    803.20     0.00   0.00   25.98    18.68  232.80   2846.40     0.00   0.00    2.38    12.23    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.67  58.72
sdc             44.60    772.80     0.00   0.00   88.86    17.33  108.20   3018.40     0.00   0.00   40.15    27.90    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    8.31  93.12
sdd             42.60    750.40     0.00   0.00  115.08    17.62  146.00   2950.40     0.20   0.14    7.45    20.21    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    5.99  77.44

After removing the fail2ban nginx bot jails, iotop -a -d 5 shows a high current disk write rate, but it doesn't seem to show which processes are doing those writes.

Total DISK READ:         8.74 M/s | Total DISK WRITE:        42.24 K/s
Current DISK READ:       4.13 M/s | Current DISK WRITE:      11.17 M/s
    TID  PRIO  USER     DISK READ DISK WRITE>  SWAPIN      IO    COMMAND                                                                    
 441670 be/4 root       1152.00 K      2.76 M  ?unavailable?  [kworker/u32:4-flush-zfs-107]
3535889 be/4 root          0.00 B   1904.00 K  ?unavailable?  systemd-journald
2358804 be/4 100000        0.00 B   1680.00 K  ?unavailable?  systemd-journald
2618831 be/4 100033        6.50 M   1021.00 K  ?unavailable?  nginx: worker process
3219470 be/4 100033       12.00 K    808.00 K  ?unavailable?  nginx: worker process
3219471 be/4 100033        0.00 B    710.00 K  ?unavailable?  nginx: worker process
 434969 be/4 root        540.00 K    660.00 K  ?unavailable?  [kworker/u32:3-flush-zfs-107]
 640367 be/4 root        272.00 K    628.00 K  ?unavailable?  [kworker/u32:1-flush-zfs-12]
 380477 be/4 www-data    193.78 M    515.00 K  ?unavailable?  nginx: worker process
   4499 be/4 root          5.00 M    445.00 K  ?unavailable?  python3 /usr/bin/fail2ban-server -xf start [f2b/f.sshd]
 380480 be/4 www-data     76.83 M    417.00 K  ?unavailable?  nginx: worker process
 380476 be/4 www-data    156.90 M    398.00 K  ?unavailable?  nginx: worker process
 380478 be/4 www-data    167.61 M    360.00 K  ?unavailable?  nginx: worker process
 380479 be/4 www-data    130.88 M    288.00 K  ?unavailable?  nginx: worker process
2604899 be/4 100033       28.78 M    253.00 K  ?unavailable?  nginx: worker process
 500393 be/4 100000        2.38 M    225.00 K  ?unavailable?  python3 /usr/bin/fail2ban-server -xf start [f2b/f.sshd]
 380481 be/4 www-data     58.06 M    215.00 K  ?unavailable?  nginx: worker process
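
To narrow down which processes are actually issuing the writes (iotop -a shows accumulated totals, which can hide short bursts), something like this could help (a sketch):

# only show tasks currently doing I/O, in batch mode so it can be logged
iotop -o -b -d 5 -n 12

# per-process read/write rates (from the sysstat package)
pidstat -d 5

# per-vdev view of where the load lands inside the pool
zpool iostat -v zfs-hdd 5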


@stephanegigandet
Contributor Author

We set atime=off on all volumes on the HDD pool.
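
For reference, the change amounts to something like this (a sketch; the setting is inherited by child datasets, and exact dataset names may differ):

# stop every read from triggering an access-time metadata write
zfs set atime=off zfs-hdd

# verify the setting is inherited everywhere
zfs get -r atime zfs-hdd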

things look much better:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.93    0.00    0.48    7.86    0.00   88.73

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme0n1          0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme2n1          0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme3n1         48.67    616.00     0.00   0.00    0.10    12.66    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00  12.53
sda             45.00    873.33     0.00   0.00   10.05    19.41   25.33    332.00     0.00   0.00    1.21    13.11    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.48  43.60
sdb             41.00    770.67     0.00   0.00   11.27    18.80   22.00    321.33     0.00   0.00    2.11    14.61    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.51  42.93
sdc             32.67    597.33     0.00   0.00   14.56    18.29   19.33    306.67     0.00   0.00    9.60    15.86    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.66  43.07
sdd             36.33    698.67     0.00   0.00   15.61    19.23   22.00    330.67     0.00   0.00    1.92    15.03    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.61  41.60
Total DISK READ:         9.13 M/s | Total DISK WRITE:       903.05 K/s
Current DISK READ:       4.24 M/s | Current DISK WRITE:       2.96 M/s
    TID  PRIO  USER     DISK READ DISK WRITE>  SWAPIN      IO    COMMAND                                                                                                                     
1643228 be/4 root        204.12 M      4.04 M  ?unavailable?  [nfsd]
1643225 be/4 root          5.63 M   1948.00 K  ?unavailable?  [nfsd]
3756909 be/4 root        352.00 K   1920.00 K  ?unavailable?  [kworker/u32:4-events_unbound]
1643227 be/4 root         56.49 M   1513.00 K  ?unavailable?  [nfsd]
2618831 be/4 100033       12.11 M   1421.00 K  ?unavailable?  nginx: worker process
3219470 be/4 100033        0.00 B   1403.00 K  ?unavailable?  nginx: worker process
2358804 be/4 100000        0.00 B   1192.00 K  ?unavailable?  systemd-journald
 380476 be/4 www-data    525.33 M   1173.00 K  ?unavailable?  nginx: worker process
1643226 be/4 root         14.17 M   1024.00 K  ?unavailable?  [nfsd]

@teolemon added the "off2" label on Nov 17, 2023
@teolemon
Member

cat /proc/pressure/io
some avg10=84.50 avg60=84.73 avg300=81.75 total=269336601043
full avg10=81.85 avg60=79.88 avg300=74.52 total=227694438466
whereas I don't see any correspondingly high read/write.

I don't see any really annoying bots either (apart from maybe an "Amazon CloudFront" agent on images, using many IPs, but it's about 10% of the traffic).

iostat --pretty --human

Linux 5.15.131-2-pve (off2) 23/09/2024 x86_64 (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22,5%    0,4%    3,9%   36,6%    0,0%   36,6%

     tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
   24,32       130,2k       172,6k         0,0k      64,9G      86,1G       0,0k nvme0n1
   24,22       129,4k       172,6k         0,0k      64,5G      86,1G       0,0k nvme1n1
    5,75         0,4k       694,3k         0,0k     185,5M     346,3G       0,0k nvme2n1
  116,68       836,8k         3,4M         0,0k     417,3G       1,7T       0,0k nvme3n1
  234,14         6,2M         3,1M         0,0k       3,1T       1,5T       0,0k sda
  232,14         6,0M         3,0M         0,0k       3,0T       1,5T       0,0k sdb
  169,68         6,2M         3,1M         0,0k       3,1T       1,5T       0,0k sdc
  160,76         6,1M         3,1M         0,0k       3,0T       1,5T       0,0k sdd

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7,8%    0,2%    1,5%   79,5%    0,0%   11,0%

     tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
  126,00         0,0k         1,8M         0,0k       0,0k       1,8M       0,0k nvme0n1
  127,00         0,0k         1,8M         0,0k       0,0k       1,8M       0,0k nvme1n1
    0,00         0,0k         0,0k         0,0k       0,0k       0,0k       0,0k nvme2n1
   74,00       641,5k        19,0k         0,0k     641,5k      19,0k       0,0k nvme3n1
  258,00       848,0k         8,0M         0,0k     848,0k       8,0M       0,0k sda
  251,00       580,0k         7,6M         0,0k     580,0k       7,6M       0,0k sdb
  168,00       904,0k         7,8M         0,0k     904,0k       7,8M       0,0k sdc
  150,00       740,0k        10,3M         0,0k     740,0k      10,3M       0,0k sdd
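
Those PSI numbers (some/full avg10 around 80%) mean tasks are stalled on I/O most of the time. To see whether the pressure is constant or bursty, it can be sampled over time (a sketch):

# log full I/O pressure every 10 seconds with a timestamp
while true; do
  echo "$(date '+%F %T') $(grep ^full /proc/pressure/io)"
  sleep 10
done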

@teolemon removed the "🐛 bug" label on Oct 19, 2024
@cquest
Contributor

cquest commented Oct 26, 2024

Causes of I/O issues on off2 have been identified:

  • Bad firmware on 2 HDDs (sdc and sdd), which needs to be updated (see the smartctl sketch below)
  • Too much load caused by serving images from off2 (solved by serving images from another server, currently ks1)
  • Hourly backups to the HDDs

I propose to close this issue and open more specific new ones if needed.
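
For reference, the firmware revision of the affected drives can be read with smartctl (a sketch; sdc/sdd as in the outputs above):

# print model and firmware revision for the two drives with bad firmware
for dev in /dev/sdc /dev/sdd; do
  smartctl -i "$dev" | grep -Ei 'device model|firmware'
done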

@cquest changed the title from "off2 is sometimes very slow" to "off2 is sometimes very slow (processes in IOWAIT)" on Oct 26, 2024