I have been pondering an issue we currently have. On most cloud environments there is some automagic creation of swap partitions; for example, running swapon on an AWS environment delivers this:
uaa/f5754eca-ca77-4154-ace6-b6e9a8be83fa: stdout | /dev/nvme1n1p1 partition 1.9G 949.4M -2
diego-cell/77b0fac0-bc7a-4d41-a937-1470436a9563: stdout | /dev/nvme1n1p1 partition 30.7G 126.5M -2
cc-worker/cde3bf43-fee6-4fa9-b017-5aa763c4f9fe: stdout | /dev/nvme1n1p1 partition 933M 115.3M -2
nvme1n1 is the disk that also houses /var/vcap/data, so the swap is basically a defined subsize of the ephemeral disk.
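The inspection above can be reproduced on a single VM; a minimal sketch (the device names in the output will of course differ per environment):

```shell
# List active swap areas with size, usage and priority, as in the output above
swapon --show=NAME,TYPE,SIZE,USED,PRIO
# List disks and mount points to confirm that the swap partition and
# /var/vcap/data sit on the same device
lsblk -o NAME,SIZE,TYPE,MOUNTPOINTS
```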
But we do not get a separate ephemeral disk with every CPI. If an OpenStack flavor comes with a swap disk, that disk will always be used. For example, in our OpenStack environment the disks look like this:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
vda 252:0 0 20G 0 disk
├─vda1 252:1 0 4.8G 0 part /home
│ /
├─vda2 252:2 0 1G 0 part [SWAP]
└─vda3 252:3 0 14.2G 0 part /var/tmp
/tmp
/opt
/var/opt
/var/log
/var/vcap/data
This means we have a "fixed" swap here. It comes from https://github.com/cloudfoundry/bosh-openstack-cpi-release/blob/341637140e4d4981fb01e846f3e4fa98e155872b/src/bosh_openstack_cpi/lib/cloud/openstack/cloud.rb#L738-L740 so if the flavor has no swap, there will be no swap and no way to get any. For all other cases there is the agent: https://github.com/cloudfoundry/bosh-agent/blob/a0491e84f1224c5297c3f7179674258e9ce125c6/platform/linux_platform.go#L632
So here is the question I have, which in my opinion is worth debating:
How should we handle swap?
There are two main ways to handle swap in Linux, both of which can technically be mixed:
Swapfile
Swap partition
(and, of course, the YOLO option of not using swap at all)
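For reference, the swapfile route boils down to a few commands; a minimal sketch, with an illustrative path and size (activation needs root, so it is only shown commented out):

```shell
SWAPFILE=/tmp/demo-swapfile   # illustrative path; on a stemcell this would live on the ephemeral disk
# Preallocate the file fully; sparse files with holes are rejected by swapon
dd if=/dev/zero of="$SWAPFILE" bs=1M count=16 status=none
chmod 600 "$SWAPFILE"         # swap files must not be readable by other users
mkswap "$SWAPFILE"            # write the swap signature (works unprivileged on a regular file)
# swapon "$SWAPFILE"          # activation requires root
# The partition route is the same minus the file creation, e.g.:
# mkswap /dev/vda2 && swapon /dev/vda2   (device name from the OpenStack example)
```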
Downsides of swapfiles compared to swap partitions
The first downside of swapfiles compared to a partition is that on magnetic disks a partition has placement benefits, since the outer tracks of the platter are faster. But let's be honest: cloud providers recommend booting from nothing but SSDs (or do not even offer magnetic volumes).
The second downside of swapfiles is that, if not created on a fresh system, the file can be fragmented and therefore slower. I doubt this will be an issue, as the agent sets up swap right after the first boot anyway, so there is not yet much data that could cause fragmentation.
The third downside is that ZFS would not be supported (and neither would NILFS or bcachefs).
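A hypothetical guard before choosing the swapfile route could check the backing filesystem first; a sketch, where the path and the filesystem list are assumptions, not current agent behavior:

```shell
# Check which filesystem backs the intended swapfile location; swapfiles are
# unsupported on ZFS and problematic on other copy-on-write filesystems.
TARGET=/tmp                           # illustrative; on a stemcell this would be /var/vcap/data
fstype=$(stat -f -c %T "$TARGET")     # e.g. ext2/ext3, xfs, btrfs, zfs
case "$fstype" in
  zfs|nilfs*|bcachefs)
    echo "no swapfile: $fstype does not support it" ;;
  *)
    echo "swapfile ok on $fstype" ;;
esac
```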
What would this accomplish?
First, even without moving from a swap partition to a swapfile, it would be nice if the swap size were configurable. Our Diego Cells usually do not dig into swap deeply enough to make a 30 GiB swap partition sensible, and if they did, the customers would not be happy. On the other hand, some other VMs would scale better if more swap could be added.
Second, on the topic of swapfiles: they would decouple swap from the IaaS provider, where "if the flavor has swap, then it has swap" can be a limiting factor, and not everyone can get special flavors just by asking (for example when using a public provider). This would give us a unified way to set up swap, which in turn means that at some point in the future it could become configurable via the BOSH deployment manifest.
What if the flavor delivers a swap drive?
In this case we can simply use the provided swap drive if left on defaults, or add an additional swapfile on top. Multiple swap drives/files can be present on the system at the same time and can be ranked by priority, see: https://wiki.archlinux.org/title/Swap#Priority
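Such a mixed setup could then be expressed via /etc/fstab priorities; a sketch, with the partition name taken from the OpenStack example above and the swapfile path assumed:

```
/dev/vda2               none  swap  sw,pri=10  0 0   # flavor-provided swap partition, used first
/var/vcap/data/swapfile none  swap  sw,pri=5   0 0   # additional swapfile, used once the partition fills
```

The kernel fills higher-priority areas first and stripes across areas of equal priority.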
Why now?
Because adding it to a stemcell that is already GA and in use can cause issues that I would rather avoid.