Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set cos_gpu_installer image reference as part image build steps #518

Open
wants to merge 8 commits into
base: cs_gpu
Choose a base branch
from

Conversation

meetrajvala
Copy link

@meetrajvala meetrajvala commented Dec 23, 2024

Currently cos_gpu_installer image reference is being fetch at runtime using cos-extensions list command. This PR makes required changes to set it as part of build steps and stores the image reference value in file. It also updates the required logger references after latest pull from main branch.

@meetrajvala
Copy link
Author

/gcbrun

@meetrajvala
Copy link
Author

/gcbrun

Comment on lines +23 to +25
set_cos_gpu_installer_image_reference() {
cos-extensions list -- --gpu-installer >> "${CS_PATH}"/"${GPU_INSTALLER_IMAGE_REF}"
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about an alternative approach:
You're basically sharing a config to the launcher process, is it possible to pass it as a kernel cmd arg instead of writing to a file? This also makes gpu installation more measurable in some sense. Awaiting for @alexmwu and @jkl73 's inputs on this.

Copy link
Contributor

@jkl73 jkl73 Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to pass it as a kernel cmd arg instead

This means we determine the version of gpu installer during the image building time. This is my original thought as well, but I remember @meetrajvala said there was a reason to do this during the runtime, because the version may change? @meetrajvala could you elaborate more here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yawangwang Command line will have length constraint of 4096 chars, but currently we are at less than 50% (~1900 chars) so it should not be a big concern. But is it ok to add config required only for userspace program (and not for the customizing the kernel behavior) to kernel command line ? I agree that it would make the gpu driver installation bit measurable in some sense, though current implementation also addresses it by putting file under oem partition which will be sealed in the later step.

@jkl73 earlier we were discussing about hardcoding the installer version, which would not be very appropriate as we need to keep updating installer version when base COS image changes (via image-family flag). But now we are deriving it using the cos-extensions command (which would always return the specific supported installer image reference corresponds to current base COS image), so we are good.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have existing examples of adding userspace program configs to kernel command line (e.g., systemd, launcher), so it should be ok to add gpu installer config as well. Also the main purpose of storing files under oem partition is to encrypt/protect it using dm-crypt from tampering by the operator. IMO it doesn't provide a very explicit way of measurement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole oem partition is sealed and measured, and its hash is part of the cmdline. So I think technically the gpu ref file is still measured and bind to the specific CS image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your info. My original thought was to provide an explicit way of measurement (the presence of gpu installer ref in kernel cmd), but as you pointed out, since the hash of oem partition is also measured into kernel cmd, I'm also okay with the current approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal here is to know exactly what installer we run for a given version of CS. Explicit measurement is not required, since we won't surface what version of the installer we are using to customers. It's mostly nice to have for debugging purposes, but I don't see a huge reason to change this implementation.

Copy link
Contributor

@alexmwu alexmwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also pin the GPU installer image digest per-image (similar to how you pin the ref in the OEM partition), since we need to verify the image hasn't changed since we don't own the installer repo. Then we should check the digest matches when we pull it down at boot time.

launcher/internal/gpu/driverinstaller.go Show resolved Hide resolved
launcher/internal/gpu/driverinstaller.go Show resolved Hide resolved
launcher/internal/gpu/driverinstaller.go Show resolved Hide resolved
launcher/internal/gpu/driverinstaller.go Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants