new branch with linux header changes #65

Open
wants to merge 1 commit into
base: main

JunAr7112

new branch with linux header changes

Contributor

@cdesiniotis cdesiniotis left a comment

Thanks @JunAr7112 for the progress on this. I left some comments. My main questions are about how to accurately determine whether we need to install these packages. Ideally we would have a robust way to determine this without relying on, or introducing, new flags or environment variables for this feature.

Comment on lines +76 to +79
if [ ! -d "/usr/src/linux-headers-$(uname -r)/" ]; then
echo "Installing Linux kernel headers..."
apt-get -qq install --no-install-recommends linux-headers-${KERNEL_VERSION} > /dev/null
fi

@cdesiniotis cdesiniotis Jul 17, 2024

An observation: if we mount /usr/src from the host and /usr/src/linux-headers-<kernel-version> does not exist (meaning the Linux headers are not available on the host), then the install writes through the mount, so the headers end up both in the container and on the host at /usr/src/linux-headers-<kernel-version>. I am not sure if this is desirable.
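
One possible way to tell these cases apart is sketched below. It is an illustration only: the helper name `headers_install_target` is hypothetical, and it assumes the util-linux `mountpoint` tool is present in the image and recognizes the host mount. It only classifies the situation; whether to install onto a host mount remains the open question.

```shell
# Hypothetical helper (not part of the PR): report where a headers install
# would land. Assumes `mountpoint` from util-linux is available.
headers_install_target() {
    local src_dir="${1:-/usr/src}"
    local kernel_version="${2:-$(uname -r)}"

    if [ -d "${src_dir}/linux-headers-${kernel_version}" ]; then
        # Headers already visible, nothing to install.
        echo "present"
    elif mountpoint -q "${src_dir}" 2> /dev/null; then
        # Directory is a mount: an install here would also write to the host.
        echo "missing-on-mount"
    else
        # Plain container directory: an install stays inside the container.
        echo "missing-in-container"
    fi
}
```

A caller could branch on the printed value, for example skipping or redirecting the install when the result is `missing-on-mount`.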

Comment on lines -79 to -80
apt-get -qq download linux-image-${KERNEL_VERSION} && dpkg -x linux-image*.deb .
{ apt-get -qq download linux-modules-${KERNEL_VERSION} && dpkg -x linux-modules*.deb . || true; } 2> /dev/null

Similar to the linux-headers package, shouldn't we have a conditional here along the lines of "if the linux-image / linux-modules packages are not installed, then install them"? I am unsure of the best way to determine whether we need to install them or not...
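
For reference, a common way to ask dpkg whether a package is installed is sketched below. The helper name `is_pkg_installed` is hypothetical. One caveat that may make this unsuitable here: the lines under review unpack the packages with `dpkg -x`, which extracts files without registering anything in the dpkg database, so this check would only see packages installed through apt or `dpkg -i`.

```shell
# Hypothetical helper (not part of the PR): succeeds only if dpkg reports
# the named package as fully installed.
is_pkg_installed() {
    dpkg-query -W -f='${Status}' "$1" 2> /dev/null | grep -q "install ok installed"
}

# Possible usage, with the package name taken from the script under review:
#   if ! is_pkg_installed "linux-modules-${KERNEL_VERSION}"; then ...; fi
```

For files unpacked with `dpkg -x`, checking for the extracted directory itself is probably the more reliable signal.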

local tmp_dir=$(mktemp -d)

trap "rm -rf ${tmp_dir}" EXIT
cd ${tmp_dir}

if [ "${PERSIST_DRIVER}" = false ]; then

Ideally we should not be relying on the PERSIST_DRIVER environment variable as that feature is not introduced in this PR.

local tmp_dir=$(mktemp -d)

trap "rm -rf ${tmp_dir}" EXIT
cd ${tmp_dir}

if [ "${PERSIST_DRIVER}" = false ]; then
rm -rf /lib/modules/${KERNEL_VERSION}

I am not sure why cleaning up this directory is required on every driver container instantiation. @shivamerla @tariq1890 would you happen to know why? If we can remove this, then we no longer need this conditional.

On container startup, if we do not mount /lib/modules or /usr/src explicitly from the host, both of these directories will be empty inside of the container. As an example:

$ kubectl exec -n gpu-operator -it nvidia-driver-daemonset-7qwn5 -- sh
# ls -ltr /lib/modules/
ls: cannot access '/lib/modules/': No such file or directory
# ls -ltr /usr/src/
total 0

Based on this observation, what are your thoughts on the proposal below?

  1. Remove the following command from our script as it appears to not be needed: rm -rf /lib/modules/${KERNEL_VERSION}
  2. Only install the linux-modules package if /lib/modules/${KERNEL_VERSION} directory does not exist.

Comment on lines +90 to +92
mv lib/modules/${KERNEL_VERSION}/modules.* /lib/modules/${KERNEL_VERSION}
mv lib/modules/${KERNEL_VERSION}/kernel /lib/modules/${KERNEL_VERSION}
depmod ${KERNEL_VERSION}

Note, these commands won't work as they require that the linux-image and linux-modules deb packages were downloaded and extracted locally first.

Comment on lines -87 to -90
ls -1 boot/vmlinuz-* | sed 's/\/boot\/vmlinuz-//g' - > version
if [ -z "$(<version)" ]; then
echo "Could not locate Linux kernel version string" >&2
return 1

@shivamerla after reviewing this again, do we really need to install the linux-image-$KERNEL_VERSION package at all? It seems we only use it to construct the kernel version string. However, that string is already assumed to be set correctly in the KERNEL_VERSION environment variable before we ever reach this point in the script.

Based on my understanding, I think we can remove this code block entirely in all cases and never install the linux-image-$KERNEL_VERSION package.
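
If the block is removed, the existing error path could be kept with a plain check on the environment variable. This is a rough sketch: the helper name `require_kernel_version` is hypothetical, and it assumes nothing later in the script reads the generated `version` file, which this thread does not confirm.

```shell
# Hypothetical helper (not part of the PR): fail early when KERNEL_VERSION
# is unset or empty, instead of deriving the string from boot/vmlinuz-*.
require_kernel_version() {
    if [ -z "${KERNEL_VERSION:-}" ]; then
        echo "Could not locate Linux kernel version string" >&2
        return 1
    fi
    echo "${KERNEL_VERSION}"
}
```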

Comment on lines +89 to +92
if [ "${PERSIST_DRIVER}" = false ]; then
mv lib/modules/${KERNEL_VERSION}/modules.* /lib/modules/${KERNEL_VERSION}
mv lib/modules/${KERNEL_VERSION}/kernel /lib/modules/${KERNEL_VERSION}
depmod ${KERNEL_VERSION}

Based on my comment https://github.com/NVIDIA/gpu-driver-container/pull/65/files#r1681872260 I would recommend the following change:

Suggested change
- if [ "${PERSIST_DRIVER}" = false ]; then
- mv lib/modules/${KERNEL_VERSION}/modules.* /lib/modules/${KERNEL_VERSION}
- mv lib/modules/${KERNEL_VERSION}/kernel /lib/modules/${KERNEL_VERSION}
- depmod ${KERNEL_VERSION}
+ if [ ! -d "/lib/modules/${KERNEL_VERSION}" ]; then
+ { apt-get -qq download linux-modules-${KERNEL_VERSION} && dpkg -x linux-modules*.deb . || true; } 2> /dev/null
+ mv lib/modules/${KERNEL_VERSION}/modules.* /lib/modules/${KERNEL_VERSION}
+ mv lib/modules/${KERNEL_VERSION}/kernel /lib/modules/${KERNEL_VERSION}
+ depmod ${KERNEL_VERSION}
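
A self-contained sketch of the same idea follows, for illustration only. The function name `install_extracted_modules` and its `extracted_dir` and `root` parameters are hypothetical, added so the logic can be exercised outside a driver container. The `mkdir -p` is also an assumption: `mv` with several `modules.*` sources needs the destination directory to exist, and it is not visible in this hunk whether the script creates it elsewhere. The download step and `depmod ${KERNEL_VERSION}` are left out and would run before and after this function.

```shell
# Hypothetical helper (not part of the PR): move already-extracted module
# files into place only when the target directory does not exist yet.
install_extracted_modules() {
    local kernel_version="$1"
    local extracted_dir="$2"   # directory where `dpkg -x` unpacked the .deb
    local root="${3:-}"        # optional prefix; empty means the real /
    local target="${root}/lib/modules/${kernel_version}"

    if [ -d "${target}" ]; then
        # Modules are already available (for example, mounted from the host).
        echo "skip"
        return 0
    fi

    # Assumption: create the destination first so the multi-file mv succeeds.
    mkdir -p "${target}"
    mv "${extracted_dir}/lib/modules/${kernel_version}"/modules.* "${target}/"
    mv "${extracted_dir}/lib/modules/${kernel_version}/kernel" "${target}/"
    echo "installed"
}
```

In the script this could be called as `install_extracted_modules "${KERNEL_VERSION}" "${tmp_dir}"` after the `dpkg -x` step.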
