aarch64 multi-arch builds fail due to no disk space left on builder #1554

Closed
marmijo opened this issue Jul 17, 2024 · 3 comments

Comments

@marmijo
Contributor

marmijo commented Jul 17, 2024

We've been hitting storage issues on the aarch64 multi-arch builder lately, and they're causing builds to fail with errors such as the following (among others):

[2024-07-17T16:24:06.840Z] Committing 01fcos: /home/jenkins/agent/workspace/build-arch/src/config/overlay.d/01fcos ... error: Writing content object: min-free-space-percent '3%' would be exceeded, at least 4.1 kB requested

OR

OSError: [Errno 28] No space left on device: 

OR

qemu-img: error while writing at byte 2859466752: No space left on device

I was able to log in to the aarch64 builder today as the builder user, and I found /sysroot at 100% usage.

core@coreos-aarch64-builder:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p4  200G  200G  2.2M 100% /sysroot
devtmpfs        4.0M     0  4.0M   0% /dev
tmpfs           126G  200K  126G   1% /dev/shm
...
...
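To confirm which filesystem is full and whether Podman volumes are the culprit, something like the following could help. It is a minimal sketch, not taken from the builder setup: `usage_pct` is a hypothetical helper name, and it assumes a `df` that supports the POSIX `-P` output format.

```shell
#!/bin/sh
# Hypothetical helper: print the Use% of the filesystem holding PATH as a bare
# integer (e.g. "100" for the full /sysroot above).
# `df -P` forces the POSIX one-line-per-filesystem layout, so the capacity
# column is always field 5 of the second output line.
usage_pct() {
    local path="${1:-/}"
    df -P "$path" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}
```

On the builder, `usage_pct /sysroot` would have printed `100`. Running `podman system df` afterwards breaks usage down into images, containers, and local volumes, including how much of each is reclaimable.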

I freed up some space today by running podman volume prune, after noticing that most of the storage was being used by Podman volumes.

builder@coreos-aarch64-builder:~$ podman volume prune
WARNING! This will remove all volumes not used by at least one container. The following volumes will be removed:
04ca0c2da268f19d45440991aebc0ca9f2518c09f2a0dcdbeae66cccc563a521
11e3d74469587125fd71ce12e2d84cf6210363e1ce50c432e5ac0da098089a2b
164a592f879a706839806895605af1b1e599c82a54d7a7e9cd1b11421f4201bb
f5fa83bd6c333d4e302f180c5aa838217c2cb41e98186b98ddaf2b92d83022bc
Are you sure you want to continue? [y/N] y
04ca0c2da268f19d45440991aebc0ca9f2518c09f2a0dcdbeae66cccc563a521
11e3d74469587125fd71ce12e2d84cf6210363e1ce50c432e5ac0da098089a2b
164a592f879a706839806895605af1b1e599c82a54d7a7e9cd1b11421f4201bb
f5fa83bd6c333d4e302f180c5aa838217c2cb41e98186b98ddaf2b92d83022bc
builder@coreos-aarch64-builder:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p4  200G   92G  109G  46% /sysroot
devtmpfs        4.0M     0  4.0M   0% /dev
tmpfs           126G  400K  126G   1% /dev/shm
efivarfs        512K  4.6K  508K   1% /sys/firmware/efi/efivars
tmpfs            51G  9.9M   51G   1% /run
tmpfs           126G     0  126G   0% /tmp
/dev/nvme0n1p3  350M  265M   62M  82% /boot
tmpfs            26G  452K   26G   1% /run/user/1001
tmpfs            26G   60K   26G   1% /run/user/1002
tmpfs            26G   16K   26G   1% /run/user/1000
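The interactive `[y/N]` prompt above is awkward to use from a script. A minimal non-interactive sketch follows, assuming Podman's `--filter dangling=true` for `podman volume ls` and `--force` for `podman volume prune`. The function name `prune_dangling_volumes` is hypothetical. Like the manual run, it should only remove volumes that no container is using.

```shell
#!/bin/sh
# Hypothetical helper: list volumes not used by any container, then remove
# them without the interactive confirmation prompt.
prune_dangling_volumes() {
    echo "Dangling volumes:"
    podman volume ls --quiet --filter dangling=true
    # --force skips the [y/N] confirmation shown in the manual run.
    podman volume prune --force
}
```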


Hopefully this will be mitigated once we redeploy the multi-arch builders on AWS and increase the disk size from 200GB to at least 600GB. Landing coreos/fedora-coreos-pipeline#986 isn't required to redeploy the builder, but it would make it much easier. Separately, it might be worth exploring whether we can reduce or prevent dangling volumes on the builders.

@jlebon
Member

jlebon commented Jul 17, 2024

The volumes get cleaned up by https://github.com/coreos/fedora-coreos-pipeline/blob/ddadc038aa99692b346b422c21ede0436cd55de3/multi-arch-builders/builder-common.bu#L81, which runs daily. But I think what can happen is that if too many jobs fail in quick succession, we blow through the 200G limit before the next prune runs.
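Because a fixed daily prune can be outrun by a burst of failing jobs, one option is to prune whenever usage crosses a threshold, checked frequently (for example from a systemd timer or cron, not shown). This is a sketch under assumptions: `usage_pct`, `should_prune`, `maybe_prune`, and the 80% default threshold are hypothetical and are not part of the pipeline's actual `builder-common.bu`.

```shell
#!/bin/sh
# Hypothetical threshold-triggered cleanup, meant to complement the daily prune.

# Print the Use% of the filesystem holding PATH as a bare integer.
usage_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Succeed (exit status 0) when USAGE is at or above THRESHOLD, both in percent.
should_prune() {
    local usage="$1" threshold="$2"
    [ "$usage" -ge "$threshold" ]
}

# Prune unused volumes only if the filesystem holding PATH is too full.
# Defaults are assumptions: PATH=/sysroot, THRESHOLD=80.
maybe_prune() {
    local path="${1:-/sysroot}" threshold="${2:-80}" usage
    usage="$(usage_pct "$path")"
    if [ -z "$usage" ]; then
        echo "could not read usage for ${path}" >&2
        return 1
    fi
    if should_prune "$usage" "$threshold"; then
        echo "${path} at ${usage}% (>= ${threshold}%), pruning unused volumes"
        podman volume prune --force
    else
        echo "${path} at ${usage}%, nothing to do"
    fi
}
```

For example, `maybe_prune /sysroot 80` would have triggered a prune on the builder when it was at 100%, but not after the manual cleanup brought it to 46%. A threshold check only helps if it runs often enough to catch the spike; it does not replace a larger disk.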

c4rt0 added a commit to c4rt0/fedora-coreos-pipeline that referenced this issue Sep 10, 2024
The Aarch64 builder consistently complains about a lack of space,
particularly around 10am UTC / 12pm BST (London).
This additional prune job aims to mitigate the space issues.

See: openshift/os#1554
@c4rt0
Contributor

c4rt0 commented Sep 10, 2024

I pushed the above change since the very same thing hit us again earlier today.

c4rt0 added a commit to c4rt0/fedora-coreos-pipeline that referenced this issue Sep 10, 2024
The Aarch64 builder consistently complains about a lack of space.
After a brief discussion we decided to increase its size.

See: openshift/os#1554
Ref: coreos#1031 (comment)
dustymabe pushed a commit to coreos/fedora-coreos-pipeline that referenced this issue Sep 10, 2024
The Aarch64 builder consistently complains about a lack of space.
After a brief discussion we decided to increase its size.

See: openshift/os#1554
Ref: #1031 (comment)
@dustymabe
Member

I redeployed the builder last week with a larger disk, so we should be good now.


4 participants