Initial POC for installing nvidia driver on EDPM nodes
TODO: - molecule tests - docs - more checks
Showing 18 changed files with 395 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
--- | ||
# Copyright 2024 Red Hat, Inc. | ||
# All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); you may | ||
# not use this file except in compliance with the License. You may obtain | ||
# a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT | ||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the | ||
# License for the specific language governing permissions and limitations | ||
# under the License. | ||
|
||
- name: Import edpm role | ||
  hosts: "{{ cifmw_target_host | default('localhost') }}" | ||
  tasks: | ||
    - name: Install drivers in phase 1 | ||
      ansible.builtin.include_role: | ||
        name: edpm_nvidia_mdev_prepare | ||
        tasks_from: phase1 | ||
|
||
    - name: Reboot the host | ||
      ansible.builtin.reboot: | ||
|
||
    - name: Run phase 2 | ||
      ansible.builtin.include_role: | ||
        name: edpm_nvidia_mdev_prepare | ||
        tasks_from: phase2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# edpm_nvidia_mdev_prepare | ||
Please explain the role purpose. | ||
|
||
## Privilege escalation | ||
If applicable, please explain the privilege escalation done in this role. | ||
|
||
## Parameters | ||
* `param_1`: this is an example | ||
|
||
## Examples |
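The Examples section is still empty in this commit. A minimal sketch of how the role might be consumed, mirroring the playbook added in the same commit, could look like the following. The driver URL is a placeholder rather than a real location, and the play name is made up for illustration. Note that phase 1, as written, also loads `~/secret.txt`, so that file has to exist on the controller.

```yaml
# Sketch only: the URL below is a placeholder (assumption), not a real driver location.
- name: Prepare NVIDIA mdev support on an EDPM node
  hosts: "{{ cifmw_target_host | default('localhost') }}"
  vars:
    cifmw_edpm_nvidia_mdev_prepare_driver_url: "https://example.com/NVIDIA-vGPU-rhel.rpm"
  tasks:
    # Phase 1 blacklists nouveau, installs the driver RPM and requests a reboot.
    - name: Install drivers in phase 1
      ansible.builtin.include_role:
        name: edpm_nvidia_mdev_prepare
        tasks_from: phase1

    - name: Reboot the host
      ansible.builtin.reboot:

    # Phase 2 installs and starts the nvidia-sriov-manage@ systemd units.
    - name: Run phase 2
      ansible.builtin.include_role:
        name: edpm_nvidia_mdev_prepare
        tasks_from: phase2
```

If the secret file defines `driver_url`, phase 1 overrides the play variable with that value through `set_fact`.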
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
--- | ||
# Copyright Red Hat, Inc. | ||
# All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); you may | ||
# not use this file except in compliance with the License. You may obtain | ||
# a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT | ||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the | ||
# License for the specific language governing permissions and limitations | ||
# under the License. | ||
|
||
# Does the OS need to disable the nouveau driver? | ||
cifmw_edpm_nvidia_mdev_prepare_disable_nouveau: true | ||
|
||
# What is the URL or path for the nvidia driver RPM? | ||
cifmw_edpm_nvidia_mdev_prepare_driver_url: '' | ||
|
||
# What will be the name of the nvidia package? | ||
cifmw_edpm_nvidia_mdev_prepare_package_name: "NVIDIA-vGPU-rhel" | ||
|
||
# Which SR-IOV GPU devices should have VFs created? | ||
cifmw_edpm_nvidia_mdev_prepare_sriov_devices: | ||
  - ALL |
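Each entry of `cifmw_edpm_nvidia_mdev_prepare_sriov_devices` becomes the instance name of a `nvidia-sriov-manage@` unit and is handed to `sriov-manage -e` through `%i`. As a sketch, the list could presumably be narrowed from `ALL` to individual GPUs. The PCI addresses below are hypothetical, and this assumes `sriov-manage` accepts a `domain:bus:slot.function` address in place of `ALL`.

```yaml
# Hypothetical override, for example in a vars file passed to the playbook.
cifmw_edpm_nvidia_mdev_prepare_sriov_devices:
  - "0000:41:00.0"  # example PCI address (assumption)
  - "0000:61:00.0"  # example PCI address (assumption)
```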
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
[Unit] | ||
After = nvidia-vgpu-mgr.service | ||
After = nvidia-vgpud.service | ||
Description = Enable Nvidia GPU virtual functions | ||
|
||
[Service] | ||
Type = oneshot | ||
User = root | ||
Group = root | ||
ExecStart = /usr/lib/nvidia/sriov-manage -e %i | ||
# Give a reasonable amount of time for the server to start up/shut down | ||
TimeoutSec = 120 | ||
# This creates a specific slice which all services will operate from | ||
# The accounting options give us the ability to see resource usage | ||
# through the `systemd-cgtop` command. | ||
Slice = system.slice | ||
# Set Accounting | ||
CPUAccounting = True | ||
BlockIOAccounting = True | ||
MemoryAccounting = True | ||
TasksAccounting = True | ||
RemainAfterExit = True | ||
ExecStartPre = /usr/bin/sleep 30 | ||
|
||
[Install] | ||
WantedBy = multi-user.target |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
--- | ||
# Copyright Red Hat, Inc. | ||
# All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); you may | ||
# not use this file except in compliance with the License. You may obtain | ||
# a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT | ||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the | ||
# License for the specific language governing permissions and limitations | ||
# under the License. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
--- | ||
# Copyright Red Hat, Inc. | ||
# All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); you may | ||
# not use this file except in compliance with the License. You may obtain | ||
# a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT | ||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the | ||
# License for the specific language governing permissions and limitations | ||
# under the License. | ||
|
||
|
||
galaxy_info: | ||
  author: CI Framework | ||
  description: CI Framework Role -- edpm_nvidia_mdev_prepare | ||
  company: Red Hat | ||
  license: Apache-2.0 | ||
  min_ansible_version: "2.14" | ||
  namespace: cifmw | ||
  galaxy_tags: | ||
    - cifmw | ||
|
||
# List your role dependencies here, one per line. Be sure to remove the '[]' below, | ||
# if you add dependencies to this list. | ||
dependencies: [] |
33 changes: 33 additions & 0 deletions
roles/edpm_nvidia_mdev_prepare/molecule/default/converge.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
--- | ||
# Copyright Red Hat, Inc. | ||
# All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); you may | ||
# not use this file except in compliance with the License. You may obtain | ||
# a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT | ||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the | ||
# License for the specific language governing permissions and limitations | ||
# under the License. | ||
|
||
|
||
- name: Converge | ||
  hosts: all | ||
  tasks: | ||
    - name: Run phase1 | ||
      ansible.builtin.import_role: | ||
        name: edpm_nvidia_mdev_prepare | ||
        tasks_from: phase1 | ||
|
||
    # Do we really need to reboot the host this way? | ||
    - name: Reboot the host | ||
      ansible.builtin.reboot: | ||
|
||
    - name: Run phase 2 | ||
      ansible.builtin.import_role: | ||
        name: edpm_nvidia_mdev_prepare | ||
        tasks_from: phase2 |
11 changes: 11 additions & 0 deletions
roles/edpm_nvidia_mdev_prepare/molecule/default/molecule.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
--- | ||
# Mainly used to override the defaults set in .config/molecule/ | ||
# By default, it uses the "config_podman.yml" - in CI, it will use | ||
# "config_local.yml". | ||
log: true | ||
|
||
provisioner: | ||
  name: ansible | ||
  log: true | ||
  env: | ||
    ANSIBLE_STDOUT_CALLBACK: yaml |
21 changes: 21 additions & 0 deletions
roles/edpm_nvidia_mdev_prepare/molecule/default/prepare.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
--- | ||
# Copyright Red Hat, Inc. | ||
# All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); you may | ||
# not use this file except in compliance with the License. You may obtain | ||
# a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT | ||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the | ||
# License for the specific language governing permissions and limitations | ||
# under the License. | ||
|
||
|
||
- name: Prepare | ||
  hosts: all | ||
  roles: | ||
    - role: test_deps |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
--- | ||
# Copyright Red Hat, Inc. | ||
# All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); you may | ||
# not use this file except in compliance with the License. You may obtain | ||
# a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT | ||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the | ||
# License for the specific language governing permissions and limitations | ||
# under the License. | ||
|
||
- name: Cleaning the World | ||
  ansible.builtin.debug: | ||
    msg: "So here edpm_nvidia_mdev_prepare should clean things up!" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
--- | ||
# Copyright Red Hat, Inc. | ||
# All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); you may | ||
# not use this file except in compliance with the License. You may obtain | ||
# a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT | ||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the | ||
# License for the specific language governing permissions and limitations | ||
# under the License. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,89 @@ | ||
--- | ||
# Copyright 2024 Red Hat, Inc. | ||
# All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); you may | ||
# not use this file except in compliance with the License. You may obtain | ||
# a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT | ||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the | ||
# License for the specific language governing permissions and limitations | ||
# under the License. | ||
|
||
- name: Source the secret file | ||
  ansible.builtin.include_vars: | ||
    file: "{{ lookup('env', 'HOME') }}/secret.txt" | ||
    name: cifmw_edpm_nvidia_mdev_prepare_secrets | ||
|
||
- name: Set url value from cifmw_edpm_nvidia_mdev_prepare_secrets | ||
  when: | ||
    - cifmw_edpm_nvidia_mdev_prepare_secrets['driver_url'] is defined | ||
  ansible.builtin.set_fact: | ||
    cifmw_edpm_nvidia_mdev_prepare_driver_url: >- | ||
      {{ | ||
        cifmw_edpm_nvidia_mdev_prepare_secrets['driver_url'] | ||
      }} | ||
- name: Blacklist nouveau | ||
  become: true | ||
  ansible.builtin.copy: | ||
    dest: "/etc/modprobe.d/blacklist-nouveau.conf" | ||
    mode: "0644" | ||
    content: |- | ||
      blacklist nouveau | ||
      options nouveau modeset=0 | ||
    force: false | ||
  when: | ||
    - cifmw_edpm_nvidia_mdev_prepare_disable_nouveau | bool | ||
  register: _blacklist_nouveau | ||
|
||
- name: Make sure that we defined the driver URL | ||
  ansible.builtin.assert: | ||
    that: | ||
      - cifmw_edpm_nvidia_mdev_prepare_driver_url is defined | ||
      - cifmw_edpm_nvidia_mdev_prepare_driver_url | length > 0 | ||
    msg: "You need to set cifmw_edpm_nvidia_mdev_prepare_driver_url" | ||
|
||
- name: Gather the package facts | ||
  ansible.builtin.package_facts: | ||
    manager: auto | ||
|
||
- name: Install nvidia driver RPM either from path or URL | ||
  become: true | ||
  ansible.builtin.dnf: | ||
    name: "{{ cifmw_edpm_nvidia_mdev_prepare_driver_url }}" | ||
    state: present | ||
    disable_gpg_check: true | ||
  when: cifmw_edpm_nvidia_mdev_prepare_package_name not in ansible_facts.packages | ||
  register: _nvidia_driver_install | ||
|
||
- name: Regenerate initramfs | ||
  become: true | ||
  ansible.builtin.command: "{{ item }}" | ||
  loop: | ||
    - 'dracut --force' | ||
    - 'grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg' | ||
  when: _blacklist_nouveau.changed or _nvidia_driver_install.changed | ||
|
||
- name: Enforce a reboot to ensure that we have the driver loaded | ||
  block: | ||
    - name: Create directory required by edpm-reboot role | ||
      become: true | ||
      ansible.builtin.file: | ||
        path: /var/lib/openstack/reboot_required/ | ||
        state: directory | ||
        mode: "0755" | ||
    - name: Create required file to enforce a reboot | ||
      become: true | ||
      ansible.builtin.file: | ||
        path: /var/lib/openstack/reboot_required/nvidia_mdev_reboot | ||
        state: touch | ||
        mode: "0600" | ||
    # Is this the right way to request a reboot? | ||
    - name: Call edpm_reboot role | ||
      ansible.builtin.include_role: | ||
        name: edpm_reboot |
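For reference, `include_vars` parses the file it is given as YAML or JSON, and phase 1 reads only the `driver_url` key from the result. A sketch of what `~/secret.txt` might contain follows; the URL is a placeholder, not a real driver location.

```yaml
# ~/secret.txt: YAML content despite the .txt extension.
# The URL is a placeholder (assumption); a local path to the RPM should also work,
# since the value is passed to the dnf module's "name" parameter.
driver_url: "https://example.com/path/to/NVIDIA-vGPU-rhel.rpm"
```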
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
--- | ||
# Copyright 2024 Red Hat, Inc. | ||
# All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); you may | ||
# not use this file except in compliance with the License. You may obtain | ||
# a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT | ||
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the | ||
# License for the specific language governing permissions and limitations | ||
# under the License. | ||
|
||
- name: Create a systemd unit file that will enable SRIOV VFs | ||
  become: true | ||
  ansible.builtin.copy: | ||
    dest: "/etc/systemd/system/nvidia-sriov-manage@.service" | ||
    mode: "0644" | ||
    src: "nvidia-sriov-manage@.service" | ||
    force: false | ||
|
||
- name: Enable the systemd unit file | ||
  become: true | ||
  ansible.builtin.systemd_service: | ||
    name: "nvidia-sriov-manage@{{ item }}.service" | ||
    enabled: true | ||
    state: started | ||
  loop: "{{ cifmw_edpm_nvidia_mdev_prepare_sriov_devices }}" |
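The commit message lists "more checks" as a TODO. One possible follow-up check, which is not part of this commit, is to confirm that each templated unit ended up active. Because the unit is a `oneshot` with `RemainAfterExit = True`, `systemctl is-active` should succeed once `sriov-manage` has exited cleanly. The task name is made up for illustration.

```yaml
# Sketch of a possible verification task (assumption, not in the commit).
- name: Check that the SR-IOV units are active
  ansible.builtin.command: "systemctl is-active nvidia-sriov-manage@{{ item }}.service"
  loop: "{{ cifmw_edpm_nvidia_mdev_prepare_sriov_devices }}"
  changed_when: false  # read-only check, never reports a change
```

The task fails for any device whose unit is not active, because `systemctl is-active` returns a non-zero exit code in that case.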