I successfully trained a NN on my game for 8x8 and 12x12 boards. I am aiming at 16x16, which is the original board dimension.
Inference on an 8x8 board works perfectly; the NN seems to win against any human player, which is fascinating!
Thank you again Jonathan for this generic implementation of AlphaZero.
Now, during inference on a 12x12 board, I run into what looks like a CUDA problem. It is probably not a bug; I likely need to allow an "Entry function" to use more parameter space. NB: neither the GPU memory nor the RAM is fully used when this occurs.
Has anyone encountered this limitation and tried to resolve it?
smart-fr changed the title from "PTX compile error: Entry function uses too much parameter space" to "Inference time 'PTX compile error: Entry function uses too much parameter space'" on Feb 7, 2023
This is more a workaround than a satisfactory solution, so I am not closing the issue yet.
Maybe my hardware is too limited? An RTX 3080 Laptop GPU with 16 GB of memory.
But I got the same issue using a cloud V100.
(By the way, in order to reuse a session for playing with different parameters than the ones originally used for training, I used the trick suggested here: To continue a training #118)
I am really glad to hear that you are starting to see good results on your game!
Your hardware is fine. The RTX3080 is a good GPU and more than I had when I originally developed AlphaZero.jl.
I have never encountered the error you reported, though it is probably not specific to AlphaZero.jl.
I would encourage you to reduce it to a minimal nonworking example and submit an issue to CUDA.jl.
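For reference, a minimal nonworking example for CUDA.jl might look something like the sketch below. This is hypothetical: the kernel name and tuple size are illustrative (not taken from AlphaZero.jl), and the exact parameter-space limit depends on the GPU architecture and toolkit version.

```julia
using CUDA

# Hypothetical sketch: isbits kernel arguments are passed by value through
# CUDA's kernel parameter space, which has a fixed size limit (4 KB on many
# architectures). An argument this large (2048 * 4 bytes = 8 KB) may exceed
# the limit and trigger "Entry function uses too much parameter space"
# when the PTX is compiled.
function big_arg_kernel(xs::NTuple{2048, Float32})
    return nothing
end

@cuda big_arg_kernel(ntuple(i -> Float32(i), 2048))
```

If a reduction along these lines reproduces the error, it would make a good self-contained report for the CUDA.jl issue tracker.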