Too big /var/log/messages on vmhost-x86-copr01 #2799

FrostyX · 2023-07-01T21:04:18Z

<zodbot>  PROBLEM -
    vmhost-x86-copr01.rdu-cc.fedoraproject.org/Disk_Space_/ is
    WARNING: DISK WARNING - free space: / 4326 MiB (14.28% inode=59%):
    (noc01)
<nirik>  so thats because /var/log/messages is really big... it seems
    like lots of copr sessions. I'll rotate the file, but FrostyX or
    other copr folks would need to look into the underlying
    problem. I'll compress the file for now.

The text was updated successfully, but these errors were encountered:

FrostyX · 2023-07-04T19:40:33Z

At the moment the disk space is fine

But I will try to setup some rotation for /var/log/messages that nirik had to do manually.

FrostyX · 2023-07-04T20:08:59Z

But I will try to setup some rotation for /var/log/messages that nirik had to do manually.

This might solve it
https://pagure.io/fedora-infra/ansible/pull-request/1499

nirik · 2023-07-05T18:54:03Z

Note that it's "fine" now because I keep deleting logs so they stop alerting.

I'll just ack them so you can look more closely.

This is not just that virthost, its all of them... it seems like the backend is connecting a LOT so the messages are mostly ssh connections...

FrostyX · 2023-07-06T14:42:15Z

Note that it's "fine" now because I keep deleting logs so they stop alerting.

Thank you @nirik, I didn't know. I thought you only deleted it once.

This is not just that virthost, its all of them... it seems like the backend is connecting a LOT so the messages are mostly ssh connections...

It looks like resalloc went crazy. There are tens of

 INFO 2023-07-06 14:39:02,011 PID:3939926:140290543318720(Allocator-10992094) Allocating copr_hv_x86_64_01_prod_10992094_20230706_143901 finished => ENDED
 INFO 2023-07-06 14:39:02,015 PID:3939926:140289436006080(Allocator-10992093) Allocating copr_hv_x86_64_01_prod_10992093_20230706_143901 finished => ENDED
 INFO 2023-07-06 14:39:02,021 PID:3939926:140291608667840(Allocator-10992095) Allocating copr_hv_x86_64_01_prod_10992095_20230706_143901 finished => ENDED

rows in /var/log/resallocserver/main.log every second. When stopped resalloc for a few seconds, the SSH connections stopped appearing in the hypervisor log.

I am trying to investigate what is wrong with resalloc.

FrostyX · 2023-07-06T14:44:50Z

Ah, that's because all of those allocations fail

INFO:root:Hypervisor ID: 1
INFO:root:Using an IPv6 ending with ':c148'
DEBUG:root:Logging to qemu+ssh://[email protected]/system
DEBUG:root:cmd: mkisofs -o /tmp/tmpdhja99z3/config.iso -V eimg_config -r -J --quiet /tmp/tmpdhja99z3/config
xorriso 1.5.4 : RockRidge filesystem manipulator, libburnia project.

DEBUG:root: -> exit_status=0, time=0.105s
INFO:root:using config image /tmp/tmpdhja99z3/config.iso
DEBUG:root:cmd: virsh -c qemu+ssh://[email protected]/system vol-create-as images copr_hv_x86_64_02_prod_10992112_20230706_143902_config 1M
error: failed to connect to the hypervisor
error: End of file while reading data: Warning: Permanently added 'vmhost-x86-copr02.rdu-cc.fedoraproject.org' (ED25519) to the list of known hosts.^M
virt-ssh-helper: cannot connect to '/var/run/libvirt/libvirt-sock': Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused: Input/output error
DEBUG:root: -> exit_status=1, time=0.608s
DEBUG:root:Cleaning up ...
Traceback (most recent call last):
  File "/var/lib/resallocserver/provision/libvirt-new", line 461, in <module>
    _main()
  File "/var/lib/resallocserver/provision/libvirt-new", line 452, in _main
    spawner.spawn()
  File "/var/lib/resallocserver/provision/libvirt-new", line 342, in spawn
    self.create_volume_from_iso(config_vol_name, '1M', config_iso,
  File "/var/lib/resallocserver/provision/libvirt-new", line 226, in create_volume_from_iso
    self.alloc_disk(name, prealloc_size, pool)
  File "/var/lib/resallocserver/provision/libvirt-new", line 152, in alloc_disk
    raise Exception("can't create '{0}' disk".format(name))
Exception: can't create 'copr_hv_x86_64_02_prod_10992112_20230706_143902_config' disk

nirik · 2023-07-06T15:06:12Z

Strange. It looks like they are in a weird state. :(

If you like I can upgrade/reboot them and we can see if that clears things up? Or do you want to investigate more first?

FrostyX · 2023-07-06T16:11:27Z

The libvirtd service died 4 days ago, and therefore all the virsh commands failed. I don't know the reason why it died though

● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sun 2023-07-02 06:36:27 GMT; 4 days ago
     Docs: man:libvirtd(8)
           https://libvirt.org
  Process: 3321797 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=exited, status=6)
 Main PID: 3321797 (code=exited, status=6)
    Tasks: 2 (limit: 32768)
   Memory: 64.3G
   CGroup: /system.slice/libvirtd.service
           ├─3506 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
           └─3509 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

I would say it was OOM kill but since it died on 3/4 hypervisors at the same time, IMHO something else happened. We discussed with @nirik that it would be a good idea to update and reboot those machines, but I would like to do that once @praiskup is back from vacation since the libvirt setup is a bit tricky for me.

Until then, I did systemctl start libvirtd.service and it seems to work fine. New builders are spawning, no nonsense in /var/log/messages.

praiskup · 2024-04-08T11:17:20Z

It seems we don't experience /var/log/messages message floods nowadays. Closing as WORKSFORME

FrostyX added the fedora-copr-admin Tasks that need to be done by Fedora Copr administrator label Jul 1, 2023

praiskup closed this as completed Apr 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Too big /var/log/messages on vmhost-x86-copr01 #2799

Too big /var/log/messages on vmhost-x86-copr01 #2799

FrostyX commented Jul 1, 2023

FrostyX commented Jul 4, 2023

FrostyX commented Jul 4, 2023

nirik commented Jul 5, 2023

FrostyX commented Jul 6, 2023

FrostyX commented Jul 6, 2023

nirik commented Jul 6, 2023

FrostyX commented Jul 6, 2023

praiskup commented Apr 8, 2024

Too big /var/log/messages on vmhost-x86-copr01 #2799

Too big /var/log/messages on vmhost-x86-copr01 #2799

Comments

FrostyX commented Jul 1, 2023

FrostyX commented Jul 4, 2023

FrostyX commented Jul 4, 2023

nirik commented Jul 5, 2023

FrostyX commented Jul 6, 2023

FrostyX commented Jul 6, 2023

nirik commented Jul 6, 2023

FrostyX commented Jul 6, 2023

praiskup commented Apr 8, 2024