Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sccache for CTK 11.1 and properly track compilations in stats #2285

Merged
merged 16 commits into from
Dec 3, 2024

Conversation

trxcllnt
Copy link
Contributor

@trxcllnt trxcllnt commented Nov 8, 2024

This PR has some fixes I neglected to add to #2247.

  1. nvcc in CUDA toolkit v11.1 didn't add the -D__CUDA_ARCH_LIST__= definition, so 5271494 expands the list of defines that indicate an nvcc host compiler invocation.
  2. f160a0a and 1591089 report compilation type (local or dist) and duration for forced-no-cache, forced-recache, and compilation failures. It also counts and reports total compilations performed, not just compilations due to cache misses.
  3. ccfc60b ensures compilations with --verbose are never dist-compiled, since the verbose output is parsed by tools like CMake and must reflect the local toolchain.
  4. bdaf35e adds more clang flags so using clang as a CUDA compiler with -Xclang doesn't fail

Question for @sylvestre related to the last point -- do you know which bits of the clang toolchain (or CTK?) sccache should package when using clang as a device compiler? I am seeing errors like the following when attempting to dist-compile with ClangCUDA, but I'm not sure which files define the __nvvm_* symbols:

In file included from build/libcudacxx/test/internal_headers/headers/__barrier_async_contract_fulfillment.h.cu:1:
In file included from <built-in>:1:
In file included from /usr/lib/llvm-18/lib/clang/18/include/__clang_cuda_runtime_wrapper.h:73:
/usr/lib/llvm-18/lib/clang/18/include/__clang_cuda_builtin_vars.h:53:180: error: use of undeclared identifier '__nvvm_read_ptx_sreg_tid_x'
   53 |   __declspec(property(get = __fetch_builtin_x)) unsigned int x; static inline __attribute__((always_inline)) __attribute__((device)) unsigned int __fetch_builtin_x(void) { return __nvvm_read_ptx_sreg_tid_x(); };
...

@sylvestre
Copy link
Collaborator

sorry, i don't know

…rsed by inputs/outputs even if the preprocessor, cicc, and ptxas commands are out of order.
@trxcllnt trxcllnt force-pushed the fix/cuda11.1-and-stats branch from 30a1a7f to b1acd9d Compare November 14, 2024 01:27
@trxcllnt
Copy link
Contributor Author

trxcllnt commented Nov 14, 2024

This doesn't seem to be an issue with sccache. It appears clang can't compile its own preprocessor output:

#!/usr/bin/env bash

# Basic CUDA example from https://godbolt.org/
cat <<EOF >/tmp/test.cu
__global__ void square(int* array, int n) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < n)
        array[tid] = array[tid] * array[tid];
}
EOF

# Preprocess
clang++ -x cuda -E --cuda-gpu-arch=sm_80 --cuda-path=/usr/local/cuda -Wno-unknown-cuda-version /tmp/test.cu > /tmp/test.cui

# Compile (fails)
clang++ -x cuda-cpp-output --cuda-gpu-arch=sm_80 --cuda-path=/usr/local/cuda -Wno-unknown-cuda-version -o /tmp/test.cu.o /tmp/test.cui

@trxcllnt
Copy link
Contributor Author

cc: @robertmaynard for review

@trxcllnt trxcllnt force-pushed the fix/cuda11.1-and-stats branch from 3b0ae6c to 5469482 Compare November 22, 2024 21:40
@trxcllnt trxcllnt changed the title Fix sccache for CTK 11.1 and properly track compilations in stats [do not merge] Fix sccache for CTK 11.1 and properly track compilations in stats Nov 25, 2024
@trxcllnt trxcllnt force-pushed the fix/cuda11.1-and-stats branch from 02f84c0 to ac6372a Compare November 26, 2024 23:10
@trxcllnt trxcllnt changed the title [do not merge] Fix sccache for CTK 11.1 and properly track compilations in stats Fix sccache for CTK 11.1 and properly track compilations in stats Nov 27, 2024
@trxcllnt
Copy link
Contributor Author

This PR is ready to merge.

@sylvestre
Copy link
Collaborator

Given that the initial implementation had issues and you are fixing them here, could you please add tests to cover these missing cases?
thanks

@trxcllnt trxcllnt force-pushed the fix/cuda11.1-and-stats branch from 6b9faab to e7b529e Compare November 29, 2024 22:17
@trxcllnt
Copy link
Contributor Author

@sylvestre I added tests to ensure -v|--verbose and clang-cuda compilations aren't dist-compiled, and added CUDA Toolkit v11.1 to the CI jobs to ensure we accommodate nvcc changes introduced between CTK v11.1 and v11.8.

CTK v11.1 is the oldest version we test in CUDA Core Compute Libraries. I wouldn't mind testing earlier versions, but CTK v10.2 is only available up to ubuntu18.04, GH has removed ubuntu18.04 runners, so we'd have to containerize the test jobs.

@sylvestre sylvestre merged commit a1c33d4 into mozilla:main Dec 3, 2024
59 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants