Amazon linux support #127
@cdesiniotis and @tariq1890 PTAL
amzn2023/Dockerfile
Outdated
# due to cuda repo cache issue , nvidia-fabric-manager refers to 565 version only
# install fabric-manager and nvidia-nscq
RUN if [ "$DRIVER_TYPE" != "vgpu" ] && [ "$TARGETARCH" != "arm64" ]; then \
    dnf install -y nvidia-fabric-manager libnvidia-nscq-${DRIVER_BRANCH}-${DRIVER_VERSION}-1; fi
We can't commit this as-is. I just looked at the packages uploaded here, so the following should work:
- dnf install -y nvidia-fabric-manager libnvidia-nscq-${DRIVER_BRANCH}-${DRIVER_VERSION}-1; fi
+ dnf install -y nvidia-fabricmanager-${DRIVER_BRANCH}-${DRIVER_VERSION}-1 libnvidia-nscq-${DRIVER_BRANCH}-${DRIVER_VERSION}-1; fi
Note that you have nvidia-fabric-manager, when it should be nvidia-fabricmanager*.
Removed 560.35.03; the fabric manager for 560.35.03 is not available.
dnf list *fabric*
24.23 cuda-drivers-fabricmanager.x86_64 565.57.01-1 cuda-amzn2023-x86_64
24.23 cuda-drivers-fabricmanager-555.x86_64 555.42.06-1 cuda-amzn2023-x86_64
24.23 cuda-drivers-fabricmanager-560.x86_64 560.35.03-1 cuda-amzn2023-x86_64
24.23 cuda-drivers-fabricmanager-565.x86_64 565.57.01-1 cuda-amzn2023-x86_64
24.23 libfabric.x86_64 1.14.0-2.amzn2023.0.2 amazonlinux
24.23 libfabric-devel.x86_64 1.14.0-2.amzn2023.0.2 amazonlinux
24.23 nvidia-fabric-manager.x86_64 565.57.01-1 cuda-amzn2023-x86_64
24.23 nvidia-fabric-manager-devel.x86_64 565.57.01-1 cuda-amzn2023-x86_64
Added a conditional check and installation for both packages.
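The listing above can also be checked mechanically. A minimal sketch, assuming `dnf list`-style rows of the form `name.arch version repo` with the leading build-log timestamp already stripped; `has_pkg_version` is a hypothetical helper written for illustration and is not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the listing on stdin contains
# package NAME (any arch) at exactly VERSION.
has_pkg_version() {
    local name="$1" version="$2"
    awk -v n="$name" -v v="$version" '
        # Field 1 is "name.arch"; strip the arch suffix before comparing.
        { pkg = $1; sub(/\.[^.]+$/, "", pkg) }
        pkg == n && $2 == v { found = 1 }
        END { exit !found }
    '
}

# Sample rows copied from the listing above (timestamps removed).
listing='cuda-drivers-fabricmanager-560.x86_64 560.35.03-1 cuda-amzn2023-x86_64
nvidia-fabric-manager.x86_64 565.57.01-1 cuda-amzn2023-x86_64
nvidia-fabric-manager-devel.x86_64 565.57.01-1 cuda-amzn2023-x86_64'

if printf '%s\n' "$listing" | has_pkg_version nvidia-fabric-manager 560.35.03-1; then
    echo "available"
else
    echo "not available"
fi
```

With these sample rows, nvidia-fabric-manager is only present at 565.57.01-1, so the 560.35.03-1 lookup fails, which matches the observation in the comment.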
Removed the if condition and added installation of nvidia-fabric-manager-${DRIVER_VERSION}-1.
PTAL @cdesiniotis @tariq1890
amzn2023/Dockerfile
Outdated
# Initialize the fabric manager package variable
FABRIC_PACKAGE=""; \
if dnf list nvidia-fabric-manager-${DRIVER_VERSION}-1 &>/dev/null; then \
    FABRIC_PACKAGE="nvidia-fabric-manager-${DRIVER_VERSION}-1"; \
Looking at https://developer.download.nvidia.com/compute/cuda/repos/amzn2023/x86_64/, the only fabric manager packages available are named nvidia-fabric-manager-${DRIVER_VERSION}-1. Let's remove the conditional here and always use that package name. If the name ever changes, our builds will break and we will know right away.
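The fail-fast approach described above can be sketched as follows. This is only an illustration under assumptions: `install_fabric_manager` and the `DNF` override are hypothetical names introduced so the sketch can be dry-run without a package manager; they are not part of this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper: install the pinned fabric manager package with no
# availability check, so a renamed or missing package fails the build.
# DNF can be overridden (e.g. DNF=echo) for a dry run; that knob is an
# assumption for illustration only.
install_fabric_manager() {
    local driver_version="$1"
    ${DNF:-dnf} install -y "nvidia-fabric-manager-${driver_version}-1"
}

# Dry run: print the install arguments instead of executing dnf.
DNF=echo install_fabric_manager 565.57.01
```

Because there is no `dnf list` guard, a non-zero exit from the install command propagates directly, which is the "builds will break and we will know right away" behaviour the comment asks for.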
Removed the if condition and added installation of nvidia-fabric-manager-${DRIVER_VERSION}-1.
if [ -f /sys/module/nvidia_fs/refcnt ]; then
    nvidia_fs_refs=$(< /sys/module/nvidia_fs/refcnt)
    rmmod_args+=("nvidia-fs")
    ((++nvidia_deps))
fi
Is this change required? We have a separate sidecar container for loading / unloading nvidia-fs.
Signed-off-by: shiva kumar <[email protected]>
No description provided.