Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability for driverdog to copy modules #119

Merged
merged 2 commits into from
Sep 12, 2024

Conversation

yeazelm
Copy link
Contributor

@yeazelm yeazelm commented Sep 3, 2024

Issue number:

Related to # bottlerocket-os/bottlerocket#4172

Description of changes:
This PR fixes a bug in driverdog (see the first commit) and adds the ability for driverdog to read a configuration file that doesn't link. With that new logic it also supports a link option to only copy the drivers to the right place since some modules may already be linked, but still need to be loaded by driverdog. See bottlerocket-os/bottlerocket#4172 for more background.

Testing done:
Added unit tests to ensure the new loading of the file works for linking vs copying files. I also built an aws-k8s-1.30-nvidia AMI and ensured things still loaded correctly. More testing will come under the PR that combines all the work together to ensure the two drivers can be chosen at runtime.

bash-5.1# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.183.06  Wed Jun 26 06:46:07 UTC 2024
GCC version:  gcc version 11.3.0 (Buildroot 2022.11.1)
bash-5.1# exit
exit
[root@admin]# sheltie
bash-5.1# systemctl -u link-kernel-modules.service
systemctl: invalid option -- 'u'
bash-5.1# journalctl -u link-kernel-modules.service
Sep 03 03:21:25 localhost systemd[1]: Starting Link additional kernel modules...
Sep 03 03:21:25 localhost driverdog[2219]: 03:21:25 [INFO] Linked object 'nvidia-modeset.o'
Sep 03 03:21:26 localhost driverdog[2219]: 03:21:26 [INFO] Stripped object 'nvidia-modeset.o'
Sep 03 03:21:34 ip-192-168-92-157.us-west-2.compute.internal driverdog[2219]: 03:21:34 [INFO] Linked object 'nvidia.o'
Sep 03 03:21:34 ip-192-168-92-157.us-west-2.compute.internal driverdog[2219]: 03:21:34 [INFO] Stripped object 'nvidia.o'
Sep 03 03:21:34 ip-192-168-92-157.us-west-2.compute.internal driverdog[2219]: 03:21:34 [INFO] Linked nvidia-uvm.ko
Sep 03 03:21:35 ip-192-168-92-157.us-west-2.compute.internal driverdog[2219]: 03:21:35 [INFO] Linked nvidia.ko
Sep 03 03:21:35 ip-192-168-92-157.us-west-2.compute.internal driverdog[2219]: 03:21:35 [INFO] Linked nvidia-modeset.ko
Sep 03 03:21:35 ip-192-168-92-157.us-west-2.compute.internal systemd[1]: Finished Link additional kernel modules.
bash-5.1# journalctl -u load-kernel-modules.service
Sep 03 03:21:35 ip-192-168-92-157.us-west-2.compute.internal systemd[1]: Starting Load additional kernel modules...
Sep 03 03:21:37 ip-192-168-92-157.us-west-2.compute.internal driverdog[2960]: 03:21:37 [INFO] Updated modules dependencies
Sep 03 03:21:37 ip-192-168-92-157.us-west-2.compute.internal driverdog[2960]: 03:21:37 [INFO] Loaded kernel modules
Sep 03 03:21:37 ip-192-168-92-157.us-west-2.compute.internal systemd[1]: Finished Load additional kernel modules.

I also built on top of #118 for including both drivers and provided the following config file to make sure it works:

[nvidia-open-gpu]
lib-modules-path = "kernel/drivers/extra/video/nvidia/open-gpu"

[nvidia-open-gpu.kernel-modules."nvidia.ko"]
copy-source = "/x86_64-bottlerocket-linux-gnu/sys-root/usr/share/nvidia/open-gpu/drivers/"

[nvidia-open-gpu.kernel-modules."nvidia-modeset.ko"]
copy-source = "/x86_64-bottlerocket-linux-gnu/sys-root/usr/share/nvidia/open-gpu/drivers/"

[nvidia-open-gpu.kernel-modules."nvidia-uvm.ko"]
copy-source = "/x86_64-bottlerocket-linux-gnu/sys-root/usr/share/nvidia/open-gpu/drivers/"

Which resulted in:

bash-5.1# mkdir -p /lib/modules/6.1.102/kernel/drivers/extra/video/nvidia/open-gpu/
bash-5.1# driverdog --modules-set nvidia-open-gpu link-modules
23:53:20 [INFO] Copied nvidia.ko
23:53:20 [INFO] Copied nvidia-modeset.ko
23:53:21 [INFO] Copied nvidia-uvm.ko

Note, to make this work we would need nvidia-modeset.ko and nvidia-peermem.ko in the config file for it to fully work as intended.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@yeazelm
Copy link
Contributor Author

yeazelm commented Sep 3, 2024

^ Fix clippy for Enum naming.

sources/driverdog/src/main.rs Outdated Show resolved Hide resolved
sources/driverdog/src/main.rs Outdated Show resolved Hide resolved
sources/driverdog/README.md Outdated Show resolved Hide resolved
sources/driverdog/src/tests/linking.conf Outdated Show resolved Hide resolved
sources/driverdog/src/tests/copying.conf Outdated Show resolved Hide resolved
sources/driverdog/src/main.rs Outdated Show resolved Hide resolved
sources/driverdog/src/main.rs Outdated Show resolved Hide resolved
sources/driverdog/src/main.rs Outdated Show resolved Hide resolved
sources/driverdog/src/main.rs Outdated Show resolved Hide resolved
@yeazelm
Copy link
Contributor Author

yeazelm commented Sep 5, 2024

Updated the commit to reflect the two different configuration types: LinkingDriverConfig and CopyingDriverConfig to make this logic a bit more robust for choosing between link or copy paths.

The expected behavior for link-modules and load-modules is to load all
the modules from all the driver configuration files unless a specific
modules set is specified. In the case that a modules set is specified,
then driverdog should only link or load modules from the specified
target configuration. The link logic worked this way but the load logic
would always run the else clause, therefore always loading all modules
even if only one was specified.

Signed-off-by: Matthew Yeazel <[email protected]>
sources/driverdog/src/main.rs Show resolved Hide resolved
sources/driverdog/src/main.rs Outdated Show resolved Hide resolved
sources/driverdog/src/main.rs Outdated Show resolved Hide resolved
sources/driverdog/src/main.rs Outdated Show resolved Hide resolved
sources/driverdog/src/main.rs Show resolved Hide resolved
@cbgbt
Copy link
Contributor

cbgbt commented Sep 7, 2024

Overall I think the config driven by the enums seems quite clean and easy to follow. Nice work!

@yeazelm
Copy link
Contributor Author

yeazelm commented Sep 10, 2024

^ updated based upon @cbgbt comments

Copy link
Contributor

@bcressey bcressey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

sources/driverdog/src/main.rs Show resolved Hide resolved
sources/driverdog/src/main.rs Outdated Show resolved Hide resolved
There are cases where modules could be included that are already linked
but could be loaded by driverdog. This adds the ability to specify the
command line argument copy-modules and improves the configuration format
to allow configurations to either link or copy modules.

Signed-off-by: Matthew Yeazel <[email protected]>
@yeazelm
Copy link
Contributor Author

yeazelm commented Sep 12, 2024

^ Fix the comments from @arnaldo2792

@yeazelm yeazelm merged commit 10dd6b5 into bottlerocket-os:develop Sep 12, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants