Automatically use Open GPU drivers when desired #114
Conversation
sources/ghostdog/src/main.rs
Outdated
```rust
/// Given a PCI ID, search the Open GPU Supported Devices File to determine if the Open GPU Driver should be used
fn find_preferred_driver(pci_id: String) -> Result<String> {
    // This corresponds to T4 devices which are Turing based and supported, but currently do not show up in the supported-devices file
    if pci_id == "10DE:1EB8" {
```
I noticed that the g5g instances also don't match up on the supported list https://github.com/NVIDIA/open-gpu-kernel-modules?tab=readme-ov-file#compatible-gpus but are the Turing generation of devices that should support it. I'll do some more testing to see if the driver works on these instances too.
I was able to get light confirmation from NVIDIA that these devices should use the proprietary driver so I've removed this if statement.
```ini
[Install]
WantedBy=load-kernel-modules.service
```
This is very racy - systemd will always queue the job immediately, and if it happens to start before ghostdog finishes creating the link, then the condition check won't pass and it will fail to start. It looks like you're working around that by pushing the job after systemd-tmpfiles-setup.service, but that's a hack.
Zooming out a bit, it seems like you've created two synchronization problems for yourself:
- the need for the correct config file to be in place before either `driverdog link` or `driverdog load` can run
- the need to copy the .ko files into place before `driverdog load` runs, in the open GPU case

You can eliminate the second coordination problem here by making it a `driverdog` config problem instead - having a `copy-source` field instead of `link-objects` to tell `driverdog` where to find the `.ko`.
I know I pushed back on more `driverdog` functionality here but I can be wrong sometimes. It would let you delete this unit (and its sync problem) and focus only on the "correct config for driverdog" problem.
I've resolved this by moving away from `udev` entirely. We can determine what decision to make at the time the unit file runs. It costs very little to run `ghostdog match-nvidia-driver` for these invocations, so keeping track of these decisions somewhere else doesn't look to be worth it. Testing on a g3.4xlarge, which has to crawl the entire file without making a match (worst-case runtime), is still insignificant compared to the time it takes to link/load the modules:
```console
bash-5.1# time ghostdog match-nvidia-driver tesla

real	0m0.005s
user	0m0.002s
sys	0m0.003s
bash-5.1# time ghostdog match-nvidia-driver open-gpu
open-gpu is not preferred driver: tesla

real	0m0.005s
user	0m0.005s
sys	0m0.000s
```
I can be more precise in the measurement if needed, but it looks like `ghostdog` is very fast at providing this decision.
Force-pushed from e712775 to 8b4aea4
^ Just pushed a bunch of changes; this should "reset" the PR to the new approach and remove the code already reviewed.
```diff
 After=link-tesla-kernel-modules.service
 Requires=link-tesla-kernel-modules.service
 # Disable manual restarts to prevent loading kernel modules
 # that weren't linked by the running system
 RefuseManualStart=true
 RefuseManualStop=true

 [Service]
 Type=oneshot
-ExecStart=/usr/bin/driverdog load-modules
+ExecCondition=/usr/bin/ghostdog match-nvidia-driver tesla
+ExecStart=/usr/bin/driverdog --modules-set nvidia-tesla load-modules
```
Now that these units are so specific to NVIDIA, they don't really fit with the general purpose code in `os`.

I would either move them to the `kmod-*-nvidia` packages, or else turn these into unit templates and adjust the logic a bit so we can do something like:

```ini
[Unit]
After=link-kernel-modules@%i.service
Requires=link-kernel-modules@%i.service
...

[Service]
Type=oneshot
ExecCondition=/usr/bin/ghostdog match-driver %I
ExecStart=/usr/bin/driverdog --modules-set %I load-modules
```

Then the `kmod-*-nvidia` packages could just add symlinks to instantiate the units they supported.
This makes sense, I'll move them into the kmod packages. I had issues getting templates to block boot, so we'll probably opt for explicit files here to keep things straightforward, even if it's a bit of copying the same things over.
sources/ghostdog/src/main.rs
Outdated
```rust
/// The GPU Device Data contains various features of the device. Only Name, Device ID, and Features are required
/// a particular device
```
nit: missing word, guessing ...?

```diff
 /// The GPU Device Data contains various features of the device. Only Name, Device ID, and Features are required
-/// a particular device
+/// for a particular device
```
sources/ghostdog/src/main.rs
Outdated
```rust
for input_device in present_devices.iter() {
    match open_gpu_devices {
        SupportedDevicesConfiguration::OpenGpu(ref device_list) => {
            let formatted_device_id = format!("0x{}", input_device.device());
            for supported_device in device_list.iter() {
                if supported_device.device_id == formatted_device_id {
                    return Ok("open-gpu".to_string());
                }
            }
        }
    }
}
```
In a system with a large number of GPUs that weren't supported by the open driver, and a JSON file with a large number of different devices, we could end up doing a lot of extra work.
It would be more efficient to transform the open GPU device data into a hash set of device IDs. Then we would guarantee one pass through the list of supported devices, and one lookup per device.
I tried putting these into `HashSet`s to ensure we didn't have negative performance even in cases where there are 16 devices; the runs all took between 5-6ms, both with the Vec implementation as it stands and with `HashSet::contains()`. It stands to reason that HashSets would be better, so I can convert, but I don't think we gain a lot here with the ~700 devices listed in the current file, even in the worst case.
```rust
// Bogus PCI devices but "miss" to ensure there will be no match
let present_devices: Vec<ListDevicesOutput> = vec![ListDevicesOutput {
    pci_slot: "00:00.0".to_string(),
    class: "0600".to_string(),
    vendor: "8086".to_string(),
    device: "1234".to_string(),
    program_interface: Some("00".to_string()),
    subsystem_vendor: Some("1d0f".to_string()),
    subsystem_device: Some("1237".to_string()),
    ..Default::default()
}; 16];

// reduce down to unique ids so we only iterate on uniqueness (down from 16 to 1 here)
let unique_ids: HashSet<String> = present_devices
    .iter()
    .map(|x| format!("0x{}", x.device()))
    .collect();

match open_gpu_devices {
    SupportedDevicesConfiguration::OpenGpu(ref device_list) => {
        let open_devices_list: HashSet<String> =
            device_list.iter().map(|x| x.device_id.clone()).collect();
        for input_device in unique_ids.iter() {
            if open_devices_list.contains(input_device) {
                return Ok("open-gpu".to_string());
            }
        }
    }
}
```
The compiler might be tricking me here and optimizing as well (I did try creating the Vec of unique objects and still saw the same times for input devices), so I can test a bit more. But even so, 5ms runs when parsing the entire file is good, considering the kernel module loads take seconds. We might save more by saving the decision to a file to be checked before rerunning the deserialization and search, since this will be called several times by systemd.
Force-pushed from 8b4aea4 to d2e111a
^ Updated to reflect comments. This now holds back open GPU use in some cases where it is known to cause issues. Also moved the kernel module services to the kmod packages.
Force-pushed from d2e111a to c9d1de9
^ Pushed a new set of commits; this renames the systemd services to be clearer and fixes the unique id bug. Validated on a g5.2xlarge, which used the proprietary driver, and a g4d.24xlarge, which chose the open GPU drivers.
ghostdog can now look up PCI devices and confirm which NVIDIA driver should be used. The match-nvidia-driver subcommand takes one argument, the type of driver such as tesla or open-gpu, and exits 0 if that matches the PCI devices currently present. If those devices do not match the provided driver, it exits 1. This can be used to set ExecCondition for things like linking or loading drivers. Signed-off-by: Matthew Yeazel <[email protected]>
This adds upon the present logic to build the open-gpu driver to provide configuration that driverdog can use to work with the open-gpu drivers. Signed-off-by: Matthew Yeazel <[email protected]>
This adds upon the present logic to build the open-gpu driver to provide configuration that driverdog can use to work with the open-gpu drivers. Signed-off-by: Matthew Yeazel <[email protected]>
The os package should not be concerned with specifics to the NVIDIA kmod module preferences. os will provide driverdog, but the configuration files it reads will be provided by the package providing the modules to keep that specific logic local to their domain. Signed-off-by: Matthew Yeazel <[email protected]>
The os package doesn't need to concern itself with NVIDIA specific loading behavior. It will provide driverdog, but the configurations read by driverdog will be included with the specific kernel modules package that provides the drivers described in the configuration. Signed-off-by: Matthew Yeazel <[email protected]>
The os package doesn't need to concern itself with NVIDIA specific loading behavior. It will provide driverdog, but the configurations read by driverdog will be included with the specific kernel modules package that provides the drivers described in the configuration. This moves the tesla and open-gpu services into the kmod-5.15-nvidia package instead. Signed-off-by: Matthew Yeazel <[email protected]>
The os package doesn't need to concern itself with NVIDIA specific loading behavior. It will provide driverdog, but the configurations read by driverdog will be included with the specific kernel modules package that provides the drivers described in the configuration. This moves the tesla and open-gpu services into the kmod-6.1-nvidia package instead. Signed-off-by: Matthew Yeazel <[email protected]>
c9d1de9
to
b44e4c7
Compare
^ Rebase on develop
```rust
let present_devices =
    pciclient::list_devices(list_input).context(error::ListPciDevicesSnafu)?;

// If there a multiple devices with the same ID, dedup them to minimize iterations
```
nit:

```diff
-// If there a multiple devices with the same ID, dedup them to minimize iterations
+// If there are multiple devices with the same ID, dedup them to minimize iterations
```
Description of changes:
This adds the ability to select the open driver to the kmod packages in addition to the proprietary drivers at runtime. This should enable newer instance types that support the open GPU driver to use it automatically while keeping the older instance types on the proprietary driver.
This now uses `ExecCondition` to call `ghostdog match-nvidia-driver`, which checks which devices are present and then either stops the link/load for the non-desired driver or continues if the present devices match the requested driver.

It is best reviewed in commit order. `ghostdog` can be reviewed separately, the two kmod packages are mirrors of each other, and then the `os` package commit glues it all together.

Additional notes
I added an additional "copy only" configuration to copy the modules but not load them for `nvidia-drm` and `nvidia-peermem`. This is purely optional and we can choose to still not provide them, but since we are building them and they are present, it might be useful for some use cases to make them loadable at runtime. I haven't been able to find a way to exercise them though, so we can remove that configuration file from the PR and they will remain in the staging directory until we decide to pull them in.

Testing done:
Tested on g5.2xlarge and g3.4xlarge, on both k8s 1.30 and 1.25, to confirm that the correct driver is loaded, and ran the smoke tests to ensure the drivers still worked on each node.
g3.4xlarge
g4d.24xlarge
Example output from the gpu test:
Output for the g4d.24xlarge:
Output for the g3.4xlarge:
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.