Implement reverse lookup (Ptr->Tuple) for CUDNN descriptors. #1948
Conversation
Here's a small script that demonstrates the loading and saving:

```julia
import Pkg
Pkg.activate("@CUDAPerfCachingTest")
# Setup:
# I don't think you can link to the cuDNN.jl module within CUDA.jl directly, so
# you'll have to clone github.com/romeov/CUDA.jl and then link
# Pkg.develop(path="<local>/romeov/CUDA.jl/lib/cudnn")
# Also
# Pkg.add("Flux")
# Pkg.add("JLD2")
#
# Execute e.g. with `julia caching_test.jl save` or `julia caching_test.jl load`,
# or just `julia caching_test.jl`.

using Flux, JLD2
import Flux.Zygote: gradient

function load_conv_caches!(; cudnn_mod::Module=Flux.cuDNN, filename="/tmp/conv_cache.jld2")
    @info "Loading conv_cache."
    conv_data_cache = JLD2.load(filename, "conv_data_cache")
    push!(cudnn_mod.cudnnConvolutionBwdDataAlgoPerfCache, conv_data_cache...)
    conv_filter_cache = JLD2.load(filename, "conv_filter_cache")
    push!(cudnn_mod.cudnnConvolutionBwdFilterAlgoPerfCache, conv_filter_cache...)
end

function save_conv_caches(; cudnn_mod::Module=Flux.cuDNN, filename="/tmp/conv_cache.jld2")
    @info "Storing conv_cache."
    JLD2.save(filename,
              "conv_data_cache", cudnn_mod.cudnnConvolutionBwdDataAlgoPerfCache,
              "conv_filter_cache", cudnn_mod.cudnnConvolutionBwdFilterAlgoPerfCache)
end

if "load" in ARGS
    load_conv_caches!()
end

model = Chain(Conv((3, 3), 3 => 64, relu; pad=SamePad()),
              Conv((3, 3), 64 => 32, relu),
              GlobalMeanPool(),
              Flux.flatten,
              Dense(32 => 1))
x = rand(Float32, 32, 32, 3, 7);

let x = gpu(x), model = gpu(model), ps = Flux.params(model)
    t0 = time()
    gradient(ps) do
        model(x) |> sum
    end
    @info "done in $(time() - t0) seconds :)"
end;

("load" in ARGS) && @show length(Flux.cuDNN.cudnnConvolutionBwdDataAlgoPerfCache)

if "save" in ARGS
    save_conv_caches()
end
```
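Run `julia caching_test.jl save` once to populate `/tmp/conv_cache.jld2`, then `julia caching_test.jl load` to seed the caches from disk; the timing printed at the end lets you compare a cold run against a pre-populated one.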
lib/cudnn/src/convolution.jl
```julia
# Helper fct to recover cudnn descriptor tuples from cudnn descriptor pointers
# so that we can cache algorithms based on data descriptors.
# Actually just reverses the cache dict and returns the descriptor as a tuple.
map_cudnn_ptr_to_jl_tuple(cache_dict, desc_ptr) = Dict(zip(values(cache_dict),
                                                           keys(cache_dict)))[desc_ptr]
```
Instead of recreating the cache in reversed form and searching it every time (expensive!), CUDA.jl provides functions for pulling out the info from a descriptor (cheap!). See
CUDA.jl/lib/cudnn/src/tensor.jl, lines 49–72 at commit 864ec5e:
```julia
function cudnnGetTensorDescriptor(d::cudnnTensorDescriptor)
    nbDimsRequested = CUDNN_DIM_MAX
    dataType = Ref{cudnnDataType_t}(CUDNN_DATA_FLOAT)
    nbDims = Ref{Cint}(0)
    dimA = Array{Cint}(undef, CUDNN_DIM_MAX)
    strideA = Array{Cint}(undef, CUDNN_DIM_MAX)
    cudnnGetTensorNdDescriptor(d, nbDimsRequested, dataType, nbDims, dimA, strideA)
    T = juliaDataType(dataType[])
    D = (dimA[nbDims[]:-1:1]...,)
    S = (strideA[nbDims[]:-1:1]...,)
    return T, D, S
end

function cudnnGetFilterDescriptor(d::cudnnFilterDescriptor)
    nbDimsRequested = CUDNN_DIM_MAX
    dataType = Ref{cudnnDataType_t}(CUDNN_DATA_FLOAT)
    format = Ref{cudnnTensorFormat_t}(CUDNN_TENSOR_NCHW)
    nbDims = Ref{Cint}(0)
    dimA = Array{Cint}(undef, CUDNN_DIM_MAX)
    cudnnGetFilterNdDescriptor(d, nbDimsRequested, dataType, format, nbDims, dimA)
    T = juliaDataType(dataType[])
    D = (dimA[nbDims[]:-1:1]...,)
    return T, D, format[]
end
```
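Concretely, applying those getters means building the cache key from the descriptor contents rather than reversing the dict. A minimal sketch of that lookup-key construction (using `cudnnGetConvolutionDescriptor`, the analogous getter that appears in the diff further below):

```julia
# Query each descriptor once and use the resulting plain-Julia tuples as the
# cache key, instead of reversing the cache dict on every lookup.
xDesc_native    = cudnnGetTensorDescriptor(xDesc)
dyDesc_native   = cudnnGetTensorDescriptor(dyDesc)
convDesc_native = cudnnGetConvolutionDescriptor(convDesc)
key = (xDesc_native, dyDesc_native, convDesc_native)
```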
Done.
The descriptors as they are still have some Cenum types in them, which we could convert to Julia Ints or something if we run into serialization trouble.
There are already `cudnnGetTensorDescriptor` and `cudnnGetFilterDescriptor`, so now we have everything we need to cache algorithm performances. However, there are still a few `CUDNN_xyz_t` datatypes, which are Cenums. We could still map those to Julia integers if serialization is difficult otherwise.
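A minimal sketch of that fallback, assuming the cached keys are tuples whose fields may be Cenum values (the helper names here are hypothetical, not part of the PR):

```julia
using CEnum: Cenum

# Hypothetical helper: replace any Cenum (e.g. a CUDNN_xyz_t value) in a
# descriptor tuple with its plain integer code before serializing, so the
# saved file doesn't depend on CUDA.jl's enum definitions.
serializable(t::Tuple) = map(x -> x isa Cenum ? Int(x) : x, t)

# Restoring needs the concrete enum type per field; e.g. the filter
# descriptor tuple (T, dims, format) ends in a cudnnTensorFormat_t, so:
# restore_filter(t) = (t[1], t[2], cudnnTensorFormat_t(t[3]))
```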
Force-pushed from a765151 to 9da9e11.
This is still marked WIP; anything to do here @RomeoV @ToucheSir?
From my end no; I didn't even notice the PR title still had WIP.
```julia
dyDesc_native = cudnnGetTensorDescriptor(dyDesc)
convDesc_native = cudnnGetConvolutionDescriptor(convDesc)

key = (xDesc_native, dyDesc_native, convDesc_native)
val = lock(cudnnConvolutionBwdFilterAlgoPerfCacheLock) do
    get(cudnnConvolutionBwdFilterAlgoPerfCache, (xDesc, dyDesc, convDesc), nothing)
```
@RomeoV whoops, I think I missed this line. It should be `get(cudnnConvolutionBwdFilterAlgoPerfCache, key, nothing)`, right?
Good catch, thanks. Opened another PR with that one-line change.
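For reference, the one-line change makes the lookup use the native-descriptor `key` built just above it:

```julia
# Corrected lookup: key the cache on the native descriptor tuples rather than
# the pointer-backed descriptor objects themselves.
val = lock(cudnnConvolutionBwdFilterAlgoPerfCacheLock) do
    get(cudnnConvolutionBwdFilterAlgoPerfCache, key, nothing)
end
```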
This is a follow-up to #1948.
This fixes #1947.