-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kit changes for Nvidia settings APIs #48
Kit changes for Nvidia settings APIs #48
Conversation
packages/nvidia-k8s-device-plugin/nvidia-k8s-device-plugin-conf
Outdated
Show resolved
Hide resolved
plugin: | ||
passDeviceSpecs: {{default true settings.kubernetes.nvidia.device-plugin.pass-device-specs}} | ||
deviceListStrategy: "{{default "volume-mounts" settings.kubernetes.nvidia.device-plugin.device-list-strategy}}" | ||
deviceIDStrategy: "{{default "index" settings.kubernetes.nvidia.device-plugin.device-id-strategy}}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this result in e.g. "volume-mounts" being double-quoted if the setting is unset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.
ExecStart=/usr/bin/nvidia-device-plugin --device-list-strategy volume-mounts --device-id-strategy index --pass-device-specs=true | ||
ExecStart=/usr/bin/nvidia-device-plugin --config-file=/etc/nvidia-k8s-device-plugin/settings.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mentioned when chatting that we put a default configuration file in place to keep compatibility with downstream variants that won't add the device plugin settings. That makes sense.
An alternative to overwriting a config tempfile could be to place a systemd dropin that overwrites ExecStart
based on settings changes. So this file would remain the same, but a template could be rendered to /etc/sysetmd/system/nvidia-k8s-device-plugin.service.d/exec-start
adding the alternative line with a reference to the rendered config file.
I know we do something like this for kubelet
, though I'm not sure if there are compelling reasons to prefer one to the other. I suppose anything that gets rendered to /etc
consumes system memory, so perhaps it's slightly nicer to avoid doing that if we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed as suggested
d958a6d
to
076025b
Compare
@@ -1 +1 @@ | |||
C /etc/nvidia-container-runtime/config.toml - - - - /usr/share/factory/nvidia-container-runtime/nvidia-container-toolkit-config-k8s.toml | |||
d /etc/nvidia-container-runtime - - - - - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't need to be created by tmpfiles.d because the template rendering system will make the directory on our behalf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the tmpfiles as suggested.
[Service] | ||
{{#if settings.kubernetes.device-plugins.nvidia}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test that this works correctly on instances without an nvidia device plugin set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I tested it if settings.kubernetes.device-plugins.nvidia
is not defined then this file is rendered as
[Service]
which are kind of ignored as there is no overriding settings. It works as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels weird and to someone just seeing this on the system would appear to be a failure? Can we add a comment in an else statement to say deliberately left this way or something? Is it not better to just remove the [Service]
as well?
@@ -0,0 +1,2 @@ | |||
d /etc/nvidia-k8s-device-plugin - - - - - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you could similarly drop the tmpfiles requirement here since both files are templates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted as suggested.
076025b
to
fcf84f1
Compare
[Service] | ||
{{#if settings.kubernetes.device-plugins.nvidia}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels weird and to someone just seeing this on the system would appear to be a failure? Can we add a comment in an else statement to say deliberately left this way or something? Is it not better to just remove the [Service]
as well?
Signed-off-by: Monirul Islam <[email protected]>
fcf84f1
to
db48620
Compare
Issue number:
Closes #
Description of changes:
This PR contains the changes required to support Nvidia settings APIs,
settings.nvidia-container-runtime
andsettings.kubernetes.nvidia
Testing done:
Yes.
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.