Increase HTCondor spool ramdisk partition from 8GB to 12GB #12156

Open
amaltaro opened this issue Oct 22, 2024 · 4 comments

Comments

@amaltaro
Contributor

Impact of the new feature
WMAgent

Is your feature request related to a problem? Please describe.
With the migration to Alma9, we started seeing occasional vm_kill events and condor_schedd restarts. We discussed these with the SI team, and Marco M. suggested increasing the production WMAgent HTCondor spool area, which is currently set to 8GB.

Describe the solution you'd like
Follow up with the VoC and gradually increase the /mnt/ramdisk partition area from 8GB to 12GB. Nodes that are not in use can be modified right away, while those that are active will have to wait until we can stop services.
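
As a rough illustration only (not part of the original request), a node's spool partition could be checked against the new target with a few lines of Python. This is a minimal sketch: the function name `spool_report`, the 5% `tolerance` for filesystem overhead, and the injectable `usage` argument are all hypothetical, and it assumes the spool area is mounted at /mnt/ramdisk as described above.

```python
import shutil

GIB = 1024 ** 3  # bytes in one GiB


def spool_report(path="/mnt/ramdisk", target_gib=12, tolerance=0.05, usage=None):
    """Summarize the partition at `path` and compare it to a target size.

    `usage` is a (total, used, free) tuple in bytes. If it is not given,
    it is read with shutil.disk_usage(path). `tolerance` is an assumed
    allowance for filesystem overhead, not a value from the issue.
    """
    if usage is None:
        usage = shutil.disk_usage(path)
    total, used, _free = usage
    return {
        "total_gib": round(total / GIB, 2),
        "used_pct": round(100 * used / total, 1) if total else 0.0,
        "meets_target": total >= target_gib * GIB * (1 - tolerance),
    }
```

Calling `spool_report()` on an agent node would report the current size and usage of /mnt/ramdisk; passing `usage=` explicitly allows the logic to be exercised without the mount being present.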

Describe alternatives you've considered
None

Additional context
The latest condor_schedd restart and vm_kill date from Oct 22, 2024, on vocms0282.

@amaltaro
Contributor Author

Relevant JIRA ticket: https://its.cern.ch/jira/browse/CMSVOC-598

@amaltaro
Contributor Author

Just a quick update: 6 out of 8 nodes are now set to 12GB of RAM. The other 2 nodes are currently in use, and we cannot make this change until we can drain those agents/nodes. Further details are in the ticket above.

@anpicci
Contributor

anpicci commented Dec 2, 2024

@amaltaro could we close this issue?

@amaltaro
Contributor Author

amaltaro commented Dec 2, 2024

@anpicci not yet, we still have a couple of nodes to take care of (but they are currently running components, so it will take a little longer).

@amaltaro amaltaro moved this from In Progress to Waiting in WMCore quarterly developments Dec 9, 2024
Projects
Status: Waiting