RAM / CPU cores partitioning for multiple agents on the same machine #176

Open
mkurczew opened this issue Nov 20, 2023 · 1 comment

mkurczew commented Nov 20, 2023

Hi,
I have several workstations with many CPU cores and a lot of RAM. All agents run Ubuntu.
I would like to run multiple ClearML agents (barebones, no k8s) on each of the workstations.

Can I somehow prevent one agent running a job from hoarding all of the resources (e.g. CPU cores) and guarantee each agent a minimum quota (preferably dynamic, e.g. at least 8 cores and 1/3 of the RAM, but more if available)?

Or, alternatively, can I prevent the accidental scheduling of another job to the machine which is bogged down by other tasks?

Is there any way to achieve that with ClearML?
The documentation suggests that the only resource agents "manage" is GPUs.

EDIT: I just started to wonder: should I file this here or under the ClearML project?


ainoam commented Nov 23, 2023

This is probably the right place @mkurczew :)

To limit the resources available to an agent, you can use extra_docker_arguments in the agent's configuration (this applies when the agent runs tasks in Docker mode), for example:

extra_docker_arguments=["--memory=1g", "--cpus=2"]

(see https://docs.docker.com/config/containers/resource_constraints/)
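
If several agents share one workstation, the per-agent values can be derived from the machine totals. The following is only a minimal sketch, assuming an even split: the function name `docker_resource_args` and the 24-core / 96 GiB / 3-agent figures are made up for illustration and are not part of ClearML. It only builds the flag list; `--memory` and `--cpus` are the Docker options from the link above.

```python
import json


def docker_resource_args(n_agents, total_cpus, total_mem_gib):
    """Split a machine's CPUs and RAM evenly; return Docker flags for one agent.

    Hypothetical helper for illustration, not a ClearML API.
    """
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    cpus = total_cpus / n_agents                        # fractional CPUs are allowed by --cpus
    mem_mib = int(total_mem_gib * 1024) // n_agents     # whole MiB per agent
    return [f"--memory={mem_mib}m", f"--cpus={cpus:g}"]


if __name__ == "__main__":
    # Assumed workstation: 24 cores, 96 GiB RAM, shared by 3 agents.
    args = docker_resource_args(n_agents=3, total_cpus=24, total_mem_gib=96)
    # Print a line that can be pasted into each agent's configuration.
    print("extra_docker_arguments=" + json.dumps(args))
```

Note that these Docker flags are hard caps per container, so this gives each agent a fixed share rather than the dynamic "at least N, more if available" quota asked about above.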
