Noble: SWAP, should the way it is done be changed? #402

Open
schmidtsv opened this issue Dec 13, 2024 · 0 comments

What led to this Issue:

I have been pondering an issue we currently have. On most cloud environments, swap partitions are created automatically, so for example running swapon on an AWS environment delivers this:
uaa/f5754eca-ca77-4154-ace6-b6e9a8be83fa: stdout | /dev/nvme1n1p1 partition 1.9G 949.4M -2
diego-cell/77b0fac0-bc7a-4d41-a937-1470436a9563: stdout | /dev/nvme1n1p1 partition 30.7G 126.5M -2
cc-worker/cde3bf43-fee6-4fa9-b017-5aa763c4f9fe: stdout | /dev/nvme1n1p1 partition 933M 115.3M -2

nvme1n1 is the disk that also houses /var/vcap/data, so the swap is basically a fixed share of the ephemeral disk.

But not every CPI gives us a separate ephemeral disk. If an OpenStack flavor comes with a swap disk, that disk will always be used. For example, in our OpenStack environment the disks look like this:

NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
vda    252:0    0   20G  0 disk 
├─vda1 252:1    0  4.8G  0 part /home
│                               /
├─vda2 252:2    0    1G  0 part [SWAP]
└─vda3 252:3    0 14.2G  0 part /var/tmp
                                /tmp
                                /opt
                                /var/opt
                                /var/log
                                /var/vcap/data

This means that we have a "fixed" swap here. It comes from https://github.com/cloudfoundry/bosh-openstack-cpi-release/blob/341637140e4d4981fb01e846f3e4fa98e155872b/src/bosh_openstack_cpi/lib/cloud/openstack/cloud.rb#L738-L740, so if the flavor has no swap, there will be no swap and no way to get any.
For all other cases there is the agent: https://github.com/cloudfoundry/bosh-agent/blob/a0491e84f1224c5297c3f7179674258e9ce125c6/platform/linux_platform.go#L632

So here is my question, which in my opinion is worth debating:

How should we handle swaps?

There are two main ways to handle swap in Linux, which can technically be combined, plus the option of not using swap at all:

  • Swap file
  • Swap partition
  • Not using swap (the yolo option)

Downsides of swapfiles compared to swap-partitions

The first downside of swap files compared to a partition is that on magnetic disks, where the outer tracks of the platter are faster, a partition can be placed in the fast region. But let's be honest: data centers recommend nothing but SSDs for boot volumes, and often do not even offer magnetic volumes.
The second downside of swap files is that, if not created on a fresh system, the file can be fragmented and therefore slower. I doubt this will be an issue, as the agent sets up swap right after the first boot anyway, so there is not yet much data on the disk that could cause fragmentation.
The third downside is that swap files would not be supported on ZFS (nor on NILFS or bcachefs).

What would this accomplish?

First, even without moving from a swap partition to a swap file, it would be nice if the swap size were configurable. Our Diego Cells usually do not go deep enough into swap to make a 30 GiB swap partition sensible, and if they did, the customers would not be happy. On the other hand, some other VMs would scale better if more swap could be added.
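
To put numbers on this, here is a small sketch that computes swap utilization from the swapon figures quoted at the top of this issue. The unit handling (treating `M` and `G` as MiB and GiB) and the helper names are assumptions made for illustration only.

```python
# Figures copied from the swapon output quoted earlier in this issue:
# VM type -> (swap size, swap used)
SAMPLES = {
    "uaa": ("1.9G", "949.4M"),
    "diego-cell": ("30.7G", "126.5M"),
    "cc-worker": ("933M", "115.3M"),
}

# Multipliers to MiB; assumes swapon's suffixes are binary units.
UNITS = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def to_mib(value: str) -> float:
    """Convert a swapon-style size such as '30.7G' to MiB."""
    return float(value[:-1]) * UNITS[value[-1]]


def utilization(size: str, used: str) -> float:
    """Return used swap as a percentage of the configured swap size."""
    return 100.0 * to_mib(used) / to_mib(size)


for name, (size, used) in SAMPLES.items():
    print(f"{name}: {utilization(size, used):.1f}% of {size} in use")
```

With these samples the diego-cell uses well under 1% of its swap, while uaa uses roughly half of a much smaller one, which is the mismatch a configurable size would address.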

Second, on the topic of swap files: this would decouple swap from the IaaS provider, where "if the flavor has swap, then the VM has swap" can be a limiting factor, and not everyone can get special flavors on request (for example when using a public provider). It would give us a unified way to set up swap, which in turn means that at some point in the future it could be made configurable via the BOSH deployment manifest.
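
For reference, a rough sketch of what setting up a swap file involves on Linux. It only builds and prints the usual command sequence (dd, chmod, mkswap, swapon) and does not run anything. The path `/var/vcap/data/swapfile`, the size, and the function name are hypothetical, and this is not what the agent currently does.

```python
import shlex


def swapfile_commands(path: str, size_mib: int) -> list[list[str]]:
    """Build the command sequence that creates and enables a swap file.

    Nothing is executed here; running the commands requires root.
    """
    if size_mib <= 0:
        raise ValueError("size_mib must be positive")
    return [
        # Allocate the file with real blocks (no holes), as swapon requires.
        ["dd", "if=/dev/zero", f"of={path}", "bs=1M", f"count={size_mib}"],
        # Swap files must not be readable by other users.
        ["chmod", "600", path],
        # Write the swap signature, then activate the file.
        ["mkswap", path],
        ["swapon", path],
    ]


# Hypothetical path and size, chosen only for illustration.
for argv in swapfile_commands("/var/vcap/data/swapfile", 2048):
    print(shlex.join(argv))
```

Because the size is just a parameter here, it could in principle come from a manifest property instead of from the flavor.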

What if the flavor delivers a swap drive?

In this case we can simply use the swap drive provided if the setting is left on default, or add an additional swap file. Multiple swap drives/files can be active on the system at the same time and are used in order of their priority, see: https://wiki.archlinux.org/title/Swap#Priority
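
As an illustration of mixing both, the sketch below generates fstab entries that use the `pri=` mount option, where a higher value means the area is used first. `/dev/vda2` is taken from the lsblk output above; the swap file path, the priority values, and the choice to prefer the flavor's partition are assumptions, not a proposal for the final behavior.

```python
def fstab_swap_line(source: str, priority: int) -> str:
    """Return an fstab entry for a swap area with an explicit priority.

    Explicit swap priorities range from 0 to 32767; higher is used first.
    """
    if not 0 <= priority <= 32767:
        raise ValueError("priority must be between 0 and 32767")
    return f"{source} none swap sw,pri={priority} 0 0"


# Assumed policy: use the flavor-provided partition first,
# then fall back to an additional (hypothetical) swap file.
entries = [
    fstab_swap_line("/dev/vda2", 10),
    fstab_swap_line("/var/vcap/data/swapfile", 5),
]
print("\n".join(entries))
```

The same priorities can be set at runtime with `swapon -p <priority>` instead of going through fstab.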

Why now?

Because adding it to a stemcell that is already GA and in use can cause issues that I would rather avoid.
