
CUDA driver device support does not match toolkit #70

Closed
Cody-G opened this issue Mar 10, 2018 · 12 comments
Labels: installation (CUDA is easy to install, right?)


Cody-G commented Mar 10, 2018

Continued from JuliaGPU/CUDAnative.jl#141.

I ran the build script with TRACE=true and I'm getting this:

julia> Pkg.build("CUDAnative")
INFO: Building LLVM
DEBUG: Performing package build for LLVM.jl from /home/cody/.julia/reg_and_seg/v0.6/LLVM/deps
DEBUG: Discovering LLVM libraries in /home/cody/src/julia_06/usr/bin/../lib/julia, and configs in /home/cody/src/julia_06/usr/bin, /home/cody/src/julia_06/usr/bin/../tools
TRACE: Looking for libllvm in /home/cody/src/julia_06/usr/bin/../lib/julia
TRACE: Looking for llvm-config in /home/cody/src/julia_06/usr/bin
TRACE: Looking for llvm-config in /home/cody/src/julia_06/usr/bin/../tools
TRACE: - 3.9.1 at /home/cody/src/julia_06/usr/bin/../tools/llvm-config
TRACE: Looking for libllvm in /home/cody/src/julia_06/usr/lib
TRACE: - v3.9.1 at /home/cody/src/julia_06/usr/lib/libLLVM-3.9.1.so
TRACE: - v3.9.1 at /home/cody/src/julia_06/usr/lib/libLLVM-3.9.so
DEBUG: Discovered LLVM toolchains: 3.9.1 at /home/cody/src/julia_06/usr/lib/libLLVM-3.9.1.so
DEBUG: Discovering LLVM libraries in , and configs in /home/cody/julia/bin, /home/cody/xmrig/build, /home/cody/src/julia/deps/build, /home/cody/src/antsbin/bin, , /home/cody/bin, /home/cody/.local/bin, /home/cody/bin, /home/cody/.local/bin, /usr/local/sbin, /usr/local/bin, /usr/sbin, /usr/bin, /sbin, /bin, /usr/games, /usr/local/games, /snap/bin/usr/local/cuda-7.5/bin
TRACE: Looking for llvm-config in /home/cody/julia/bin
TRACE: Looking for llvm-config in /home/cody/xmrig/build
TRACE: Looking for llvm-config in /home/cody/src/antsbin/bin
TRACE: Looking for llvm-config in /usr/local/sbin
TRACE: Looking for llvm-config in /usr/local/bin
TRACE: Looking for llvm-config in /usr/sbin
TRACE: Looking for llvm-config in /usr/bin
TRACE: - 4.0.1 at /usr/bin/llvm-config-4.0
TRACE: Looking for libllvm in /usr/lib/llvm-4.0/lib
TRACE: - v4.0.1 at /usr/lib/llvm-4.0/lib/libLLVM-4.0.1.so
TRACE: - v4.0.1 at /usr/lib/llvm-4.0/lib/libLLVM-4.0.so
TRACE: Looking for llvm-config in /sbin
TRACE: Looking for llvm-config in /bin
TRACE: Looking for llvm-config in /usr/games
TRACE: Looking for llvm-config in /usr/local/games
DEBUG: Discovered LLVM toolchains: 4.0.1 at /usr/lib/llvm-4.0/lib/libLLVM-4.0.1.so
DEBUG: Selecting LLVM from libraries 3.9.1 at /home/cody/src/julia_06/usr/lib/libLLVM-3.9.1.so (bundled: true), 4.0.1 at /usr/lib/llvm-4.0/lib/libLLVM-4.0.1.so (bundled: false) and wrappers 3.9.0, 4.0.0
DEBUG: Selected LLVM 3.9.1 at /home/cody/src/julia_06/usr/lib/libLLVM-3.9.1.so (bundled: true)
DEBUG: Selecting wrapper for 3.9.1 at /home/cody/src/julia_06/usr/lib/libLLVM-3.9.1.so (bundled: true) out of wrappers 3.9.0, 4.0.0
DEBUG: Selected wrapper 3.9 for LLVM 3.9.1 at /home/cody/src/julia_06/usr/lib/libLLVM-3.9.1.so (bundled: true)
DEBUG: Checking validity of existing ext.jl...
INFO: LLVM.jl has already been built for this toolchain, no need to rebuild
INFO: Building CUDAdrv
INFO: Building CUDAnative
TRACE: LLVM.jl is running in trace mode, this will generate a lot of additional output
DEBUG: Checking validity of bundled library at /home/cody/src/julia_06/usr/lib/libLLVM-3.9.1.so
config[:llvm_version] = LLVM.version() = v"3.9.1"
version = v"3.9.1"
target_support = CUDAapi.devices_for_llvm(version) = Set(VersionNumber[v"3.7.0", v"6.2.0", v"6.0.0", v"5.2.0", v"3.5.0", v"5.0.0", v"3.0.0", v"5.3.0", v"6.1.0", v"2.0.0", v"2.1.0", v"3.2.0"])
DEBUG: Dropping down to post-finalizer I/O

Note that I added a few @show statements at the end to show the version and target support info retrieved by the script.

In case it's useful, here's my ext.jl:

# autogenerated file, do not edit
const ptx_support = VersionNumber[v"3.2.0", v"4.0.0", v"4.1.0", v"4.2.0", v"4.3.0"]
const llvm_version = v"3.9.1"
const cuda_driver_version = v"9.0.0"
const julia_llvm_version = v"3.9.1"
const cuda_toolkit_version = v"7.5.17"
const cuobjdump = "/usr/local/cuda-7.5/bin/cuobjdump"
const julia_version = v"0.6.2"
const target_support = VersionNumber[v"3.0.0", v"3.2.0", v"3.5.0", v"3.7.0", v"5.0.0", v"5.2.0", v"5.3.0"]
const ptxas = "/usr/local/cuda-7.5/bin/ptxas"
const configured = true
const libdevice = Dict(v"3.0.0"=>"/usr/local/cuda-7.5/nvvm/libdevice/libdevice.compute_30.10.bc",v"3.5.0"=>"/usr/local/cuda-7.5/nvvm/libdevice/libdevice.compute_35.10.bc",v"5.0.0"=>"/usr/local/cuda-7.5/nvvm/libdevice/libdevice.compute_50.10.bc")
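
For context, target_support in that file is plausibly the intersection of what the LLVM back-end can target and what the detected CUDA toolkit and driver support. A minimal sketch, assuming the CUDAapi.jl helpers of the time (devices_for_llvm appears in the trace above; devices_for_cuda and the exact intersection logic are assumptions):

using CUDAapi

llvm_support    = CUDAapi.devices_for_llvm(v"3.9.1")   # includes v"2.0.0" and v"2.1.0" (see trace)
toolkit_support = CUDAapi.devices_for_cuda(v"7.5.17")  # CUDA 7.5 targets sm_20 through sm_53
driver_support  = CUDAapi.devices_for_cuda(v"9.0.0")   # CUDA 9.0 dropped sm_2x

# Treating the driver's reported version as if it were a toolkit version
# drops v"2.0.0", which is why the ext.jl above has no 2.0 entry:
target_support = sort(collect(intersect(llvm_support, toolkit_support, driver_support)))
# => [v"3.0.0", v"3.2.0", v"3.5.0", v"3.7.0", v"5.0.0", v"5.2.0", v"5.3.0"]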
@Cody-G Cody-G changed the title Compute cabability 2.0 not supported with CUDA toolkit >= 8.0 Compute cabability 2.0 not supported with CUDA toolkit <= 8.0 Mar 10, 2018
@Cody-G Cody-G changed the title Compute cabability 2.0 not supported with CUDA toolkit <= 8.0 Compute capability 2.0 not supported with CUDA toolkit <= 8.0 Mar 10, 2018
maleadt (Member) commented Mar 10, 2018

const cuda_driver_version = v"9.0.0"

Your CUDA driver is way too recent, and its CUDA compatibility level does not support sm_20.

@maleadt maleadt closed this as completed Mar 10, 2018
maleadt (Member) commented Mar 10, 2018

To elaborate a little more: we use the CUDA driver to compile PTX code to GPU assembly, whereas nvcc does this upfront. So even though your CUDA 7.5 set-up might work fine with CUDA C, it doesn't with CUDAnative, because we JIT-compile code with the driver.
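
As a minimal sketch of that flow, assuming CUDAdrv.jl's API as it appears further down in this thread (the hand-written PTX and kernel name here are hypothetical, modeled on the output posted below):

using CUDAdrv

# Hypothetical PTX for a no-op kernel, targeting sm_20.
ptx = """
.version 3.2
.target sm_20
.address_size 64

.visible .entry noop()
{
    ret;
}
"""

dev = CuDevice(0)
ctx = CuContext(dev)
cumod = CuModule(ptx)            # the driver JIT-compiles the PTX to SASS here
fun = CuFunction(cumod, "noop")  # look up the freshly compiled kernel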

Cody-G (Author) commented Mar 10, 2018

We have the 384.11 driver, installed with Ubuntu's package manager and visible when calling nvidia-smi. The 384 series is still the recommended driver for the card on NVIDIA's website:

http://www.nvidia.com/download/driverResults.aspx/122825/en-us

Moreover, as long as the driver supports the GPU, there's generally no maximum driver version enforced for CUDA. This answer makes that general statement and also a specific one about our card's Fermi architecture.

I'm not sure I understand...the driver can compile without nvcc? In any case, how would I find which driver version should work? Thanks!

maleadt (Member) commented Mar 10, 2018

the driver can compile without nvcc

Yeah, that's how CUDAnative works. Generate PTX code, let the driver JIT-compile it to SASS assembly.

Anyway, I don't have an sm_20 device (or any unsupported one, for that matter), so I can't really test this. Could you run the following code?

using CUDAnative
kernel() = nothing
mod, entry = CUDAnative.irgen(kernel, Tuple{})
entry = CUDAnative.promote_kernel!(mod, entry, Tuple{})
CUDAnative.optimize!(mod, entry, v"2.0")
ptx = CUDAnative.mcgen(mod, entry, v"2.0")

using CUDAdrv
dev = CuDevice(0)
ctx = CuContext(dev)
cumod = CuModule(ptx)

Cody-G (Author) commented Mar 10, 2018

Yeah, that's how CUDAnative works

That's really cool! I'm no GPU expert, so I didn't know that was possible. I should read your paper :-)

Here's the output. It seems to have worked except that promote_kernel! is undefined.

julia> using CUDAnative
TRACE: LLVM.jl is running in trace mode, this will generate a lot of additional output
DEBUG: Checking validity of bundled library at /home/cody/src/julia_06/usr/lib/libLLVM-3.9.1.so

julia> kernel() = nothing
kernel (generic function with 1 method)

julia> mod, entry = CUDAnative.irgen(kernel, Tuple{})
(source_filename = "kernel"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

define void @julia_kernel_63254() #0 !dbg !4 {
top:
  %0 = call i8**** @jl_get_ptls_states()
  %1 = bitcast i8**** %0 to i8***
  %2 = getelementptr i8**, i8*** %1, i64 3
  %3 = bitcast i8*** %2 to i64**
  %4 = load i64*, i64** %3, !tbaa !6
  ret void, !dbg !9
}

declare i8**** @jl_get_ptls_states()

attributes #0 = { "no-frame-pointer-elim"="true" }

!llvm.module.flags = !{!0}
!llvm.dbg.cu = !{!1}

!0 = !{i32 1, !"Debug Info Version", i32 3}
!1 = distinct !DICompileUnit(language: DW_LANG_C89, file: !2, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !3)
!2 = !DIFile(filename: "REPL[2]", directory: ".")
!3 = !{}
!4 = distinct !DISubprogram(name: "kernel", linkageName: "julia_kernel_63254", scope: null, file: !2, type: !5, isLocal: false, isDefinition: true, isOptimized: true, unit: !1, variables: !3)
!5 = !DISubroutineType(types: !3)
!6 = !{!7, !7, i64 0, i64 1}
!7 = !{!"jtbaa_const", !8, i64 0}
!8 = !{!"jtbaa"}
!9 = !DILocation(line: 1, scope: !4)
, 
define void @julia_kernel_63254() #0 !dbg !4 {
top:
  %0 = call i8**** @jl_get_ptls_states()
  %1 = bitcast i8**** %0 to i8***
  %2 = getelementptr i8**, i8*** %1, i64 3
  %3 = bitcast i8*** %2 to i64**
  %4 = load i64*, i64** %3, !tbaa !6
  ret void, !dbg !9
}
)

julia> entry = CUDAnative.promote_kernel!(mod, entry, Tuple{})
ERROR: UndefVarError: promote_kernel! not defined

julia> CUDAnative.optimize!(mod, entry, v"2.0")
true

julia> ptx = CUDAnative.mcgen(mod, entry, v"2.0")
"//\n// Generated by LLVM NVPTX Back-End\n//\n\n.version 3.2\n.target sm_20\n.address_size 64\n\n\t.file\t1 \"./REPL[2]\"\n\t// .globl\tjulia_kernel_63254\n\n.visible .entry julia_kernel_63254()\n{\n\t.reg .s32 \t%r<2>;\n\n\t.loc 1 1 0\n\tret;\n}\n\n\n"

julia> using CUDAdrv

julia> dev = CuDevice(0)
CuDevice(0): Tesla M2090

julia> ctx = CuContext(dev)
CUDAdrv.CuContext(Ptr{Void} @0x0000560940862980, true, true)

julia> cumod = CuModule(ptx)
CUDAdrv.CuModule(Ptr{Void} @0x0000560940af8660, CUDAdrv.CuContext(Ptr{Void} @0x0000560940862980, true, true))

maleadt (Member) commented Mar 10, 2018

Ah, so cuDriverGetVersion returning 9.0 apparently doesn't imply its JIT compiler supports the same devices as CUDA toolkit 9.0... Fun times.
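
In other words, the version the driver reports and the devices its JIT accepts have to be queried separately. A hedged sketch, assuming CUDAdrv.jl's version and capability helpers (wrapping cuDriverGetVersion and the device-attribute queries):

using CUDAdrv

CUDAdrv.version()         # v"9.0.0" here: what cuDriverGetVersion reports
capability(CuDevice(0))   # v"2.0.0": the Tesla M2090's compute capability

# Gating target_support on the reported driver version, as if it were a
# toolkit version, wrongly excludes sm_20 even though this driver's JIT
# still accepts sm_20 PTX (as demonstrated above).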

@maleadt maleadt reopened this Mar 10, 2018
@maleadt maleadt changed the title Compute capability 2.0 not supported with CUDA toolkit <= 8.0 CUDA driver 9.0 _does_ support sm_2.0 Mar 10, 2018
Cody-G (Author) commented Mar 10, 2018

Poor choice of version names by NVIDIA! In case it's useful, here's a snippet from my deviceQuery CUDA sample:

Device 2: "Tesla M2090"
  CUDA Driver Version / Runtime Version          9.0 / 7.5
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 5301 MBytes (5558763520 bytes)
  (16) Multiprocessors, ( 32) CUDA Cores/MP:     512 CUDA Cores
  GPU Max Clock rate:                            1301 MHz (1.30 GHz)
  Memory Clock rate:                             1848 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 129 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from Tesla M2090 (GPU0) -> Tesla M2090 (GPU1) : Yes
> Peer access from Tesla M2090 (GPU0) -> Tesla M2090 (GPU2) : No
> Peer access from Tesla M2090 (GPU1) -> Tesla M2090 (GPU0) : Yes
> Peer access from Tesla M2090 (GPU1) -> Tesla M2090 (GPU2) : No
> Peer access from Tesla M2090 (GPU2) -> Tesla M2090 (GPU0) : No
> Peer access from Tesla M2090 (GPU2) -> Tesla M2090 (GPU1) : No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 7.5, NumDevs = 3, Device0 = Tesla M2090, Device1 = Tesla M2090, Device2 = Tesla M2090
Result = PASS

maleadt (Member) commented Mar 10, 2018

Thanks. I'm not sure I'll be able to fix this quickly though, so you'd better edit your ext.jl to support your device.
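
Concretely, the workaround is plausibly just re-adding the device's capability to target_support in ext.jl; a hedged sketch (whether a matching libdevice entry is also needed is untested here):

# In CUDAnative's deps/ext.jl (path per your install): re-add sm_20 to the
# supported targets alongside the autodetected ones.
const target_support = VersionNumber[v"2.0.0", v"3.0.0", v"3.2.0", v"3.5.0",
                                     v"3.7.0", v"5.0.0", v"5.2.0", v"5.3.0"]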

Cody-G (Author) commented Mar 10, 2018

I'll do that, thanks! Just let me know if you want me to test anything else.

Cody-G (Author) commented Mar 10, 2018

After editing ext.jl, all of the tests pass. There are, however, some early warnings and errors worth mentioning:

julia> Pkg.test("CUDAnative")
INFO: Testing CUDAnative
TRACE: LLVM.jl is running in trace mode, this will generate a lot of additional output
DEBUG: Checking validity of bundled library at /home/cody/src/julia_06/usr/lib/libLLVM-3.9.1.so
WARNING: Encountered incompatible LLVM IR for codegen_ref_nonexisting() at capability 2.0.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_get_binding_or_error",   %2 = tail call i8** @jl_get_binding_or_error(i8** inttoptr (i64 140455442071568 to i8**), i8** inttoptr (i64 140455441112520 to i8**)), !dbg !12))
WARNING: Encountered incompatible LLVM IR for codegen_ref_nonexisting() at capability 2.0.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_undefined_var_error",   tail call void @jl_undefined_var_error(i8** inttoptr (i64 140455441112520 to i8**)), !dbg !12))
WARNING: Encountered incompatible LLVM IR for codegen_call_nonexisting() at capability 2.0.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_get_binding_or_error",   %11 = call i8** @jl_get_binding_or_error(i8** inttoptr (i64 140455442071568 to i8**), i8** inttoptr (i64 140455441112520 to i8**)), !dbg !15))
WARNING: Encountered incompatible LLVM IR for codegen_call_nonexisting() at capability 2.0.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_undefined_var_error",   call void @jl_undefined_var_error(i8** inttoptr (i64 140455441112520 to i8**)), !dbg !15))
WARNING: Encountered incompatible LLVM IR for codegen_call_nonexisting() at capability 2.0.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_apply_generic",   %17 = call i8** @jl_apply_generic(i8*** %1, i32 1), !dbg !15))
INFO: Testing using device Tesla M2090
ERROR: MethodError: no method matching start(::Base.DevNullStream)
Closest candidates are:
  start(::SimpleVector) at essentials.jl:258
  start(::Base.MethodList) at reflection.jl:560
  start(::ExponentialBackOff) at error.jl:107
  ...
ERROR: MethodError: no method matching start(::Base.DevNullStream)
Closest candidates are:
  start(::SimpleVector) at essentials.jl:258
  start(::Base.MethodList) at reflection.jl:560
  start(::ExponentialBackOff) at error.jl:107
  ...
ERROR: MethodError: no method matching start(::Base.DevNullStream)
Closest candidates are:
  start(::SimpleVector) at essentials.jl:258
  start(::Base.MethodList) at reflection.jl:560
  start(::ExponentialBackOff) at error.jl:107
  ...
ERROR: MethodError: no method matching start(::Base.DevNullStream)
Closest candidates are:
  start(::SimpleVector) at essentials.jl:258
  start(::Base.MethodList) at reflection.jl:560
  start(::ExponentialBackOff) at error.jl:107

maleadt (Member) commented Mar 13, 2018

There are however some early warnings and errors worth mentioning:

Fixed in JuliaGPU/CUDAnative.jl@f372132, tagging a release now.

@maleadt maleadt changed the title CUDA driver 9.0 _does_ support sm_2.0 CUDA driver device support does not match toolkit Nov 22, 2018
@maleadt maleadt transferred this issue from JuliaGPU/CUDAnative.jl May 27, 2020
@maleadt maleadt added the installation CUDA is easy to install, right? label May 27, 2020
maleadt (Member) commented Aug 17, 2023

The whole mechanism for loading CUDA toolkits has been significantly updated since this issue, so I think we can close this.

@maleadt maleadt closed this as completed Aug 17, 2023