
Regression build failing with modern CUDA #24

Open
tgrogers opened this issue Jan 23, 2025 · 0 comments

tgrogers commented Jan 23, 2025

Now that regressions are set up, we need to get them to pass with the workloads of importance.
I have prioritized the following workloads to start:

all: GPU_Microbenchmark microbench rodinia_2.0-ft mnist_cudnn cutlass mlperf_inference mlperf_training rodinia-3.1 pannotia proxy-apps heterosync ispass-2009 lonestargpu-2.0 polybench custom_apps

Right now, I think all of the following are failing:

mnist_cudnn cutlass mlperf_inference mlperf_training
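
For reference, a minimal sketch of building just those four in isolation. The target names come from the `all:` rule above; the setup-script path and Makefile location are assumptions from the repo layout, not verified against the CI workflow linked below, which remains the authoritative set of commands:

```bash
# Hedged sketch: assumes src/setup_environment is the environment script and
# that the top-level Makefile in src/ exposes each workload as its own target.
source ./src/setup_environment
make -C ./src mnist_cudnn cutlass mlperf_inference mlperf_training
```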

Docker makes it much easier to reproduce the errors; what the regression system does is listed here:

https://github.com/accel-sim/gpu-app-collection/blob/dev/.github/workflows/test-build.yml

It starts with the Docker image

nvidia/cuda:12.6.3-cudnn-devel-ubuntu24.04

You can test this on any machine that has Docker installed. You don't even need a GPU.
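
A minimal reproduction sketch along those lines. The package list and clone/build steps here are my assumptions; the test-build.yml linked above is what CI actually runs:

```bash
# Pull the same image the regression workflow starts from.
docker pull nvidia/cuda:12.6.3-cudnn-devel-ubuntu24.04

# No GPU is needed to reproduce the build failures, so --gpus is omitted.
docker run --rm -it nvidia/cuda:12.6.3-cudnn-devel-ubuntu24.04 bash

# Inside the container (the package list is a guess at the build prerequisites):
apt-get update && apt-get install -y git make g++ python3
git clone https://github.com/accel-sim/gpu-app-collection.git
cd gpu-app-collection
# ...then run the build steps sketched earlier, or copy the exact commands
# from .github/workflows/test-build.yml.
```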
