Add Buffer Device Address Backend #311

Merged: 26 commits merged into JuliaGPU:master on Jun 20, 2025

Conversation

@VarLad (Contributor) commented May 31, 2025

More on the extension here: https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#cl_ext_buffer_device_address

This PR introduces a new backend for the array interface, and bumps the generated wrappers for the OpenCL Headers.

The main purpose of the new backend is to support more devices than is currently possible. It is heavily inspired by the recent merge of BDA support into Mesa's Rusticl, specifically the Zink backend, which means OpenCL.jl can use Vulkan as a backend and should ideally work on a large number of previously unsupported devices.

This is still a work in progress, and at this stage it is mostly meant for review and questions.

Current status: all tests pass. Feel free to try! This is supported both by POCL 7.0 and by mesa-git with the environment variable RUSTICL_ENABLE=zink (which should work with any compliant Vulkan driver).
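
For context, here is a rough sketch of the flow the extension enables, written against the low-level wrapper names that appear in this PR's diff. The property and query constants (CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT, CL_MEM_DEVICE_ADDRESS_EXT) follow the extension spec linked above; treat this as an illustration, not the PR's actual code:

    # Illustration only; assumes `ctx`, `kernel`, and a byte size `sz` exist.
    # Request a device-private address when creating the buffer:
    props = cl_mem_properties[CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT, CL_TRUE, 0]
    err = Ref{Cint}()
    buf = clCreateBufferWithProperties(ctx, props, CL_MEM_READ_WRITE, sz, C_NULL, err)
    # Query the buffer's device-private address:
    addr = Ref{cl_mem_device_address_ext}()
    clGetMemObjectInfo(buf, CL_MEM_DEVICE_ADDRESS_EXT,
                       sizeof(cl_mem_device_address_ext), addr, C_NULL)
    # Pass the raw device address as a kernel argument:
    clSetKernelArgDevicePointerEXT(kernel, 0, addr[])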

codecov bot commented May 31, 2025

Codecov Report

Attention: Patch coverage is 95.31250% with 3 lines in your changes missing coverage. Please review.

Project coverage is 77.07%. Comparing base (d65f049) to head (f9b0d01).
Report is 2 commits behind head on master.

Files with missing lines   Patch %   Lines
src/array.jl               91.30%    2 Missing ⚠️
src/util.jl                96.15%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #311      +/-   ##
==========================================
+ Coverage   75.32%   77.07%   +1.74%     
==========================================
  Files          12       12              
  Lines         616      663      +47     
==========================================
+ Hits          464      511      +47     
  Misses        152      152              

☔ View full report in Codecov by Sentry.

@VarLad marked this pull request as ready for review June 15, 2025 17:03
github-actions bot (Contributor) commented Jun 15, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/lib/cl/libopencl.jl b/lib/cl/libopencl.jl
index 1886fcd..afa8233 100644
--- a/lib/cl/libopencl.jl
+++ b/lib/cl/libopencl.jl
@@ -8,8 +8,9 @@ end
 
 function check(f)
     res = retry_reclaim(err -> err == CL_OUT_OF_RESOURCES ||
-                               err == CL_MEM_OBJECT_ALLOCATION_FAILURE ||
-                               err == CL_OUT_OF_HOST_MEMORY) do
+            err == CL_MEM_OBJECT_ALLOCATION_FAILURE ||
+            err == CL_OUT_OF_HOST_MEMORY
+    ) do
         return f()
     end
 
@@ -1330,13 +1331,17 @@ const clIcdSetPlatformDispatchDataKHR_fn = Ptr{clIcdSetPlatformDispatchDataKHR_t
 end
 
 function clIcdGetFunctionAddressForPlatformKHR(platform, func_name)
-    @ext_ccall libopencl.clIcdGetFunctionAddressForPlatformKHR(platform::cl_platform_id,
-                                                               func_name::Ptr{Cchar})::Ptr{Cvoid}
+    return @ext_ccall libopencl.clIcdGetFunctionAddressForPlatformKHR(
+        platform::cl_platform_id,
+        func_name::Ptr{Cchar}
+    )::Ptr{Cvoid}
 end
 
 @checked function clIcdSetPlatformDispatchDataKHR(platform, dispatch_data)
-    @ext_ccall libopencl.clIcdSetPlatformDispatchDataKHR(platform::cl_platform_id,
-                                                         dispatch_data::Ptr{Cvoid})::cl_int
+    @ext_ccall libopencl.clIcdSetPlatformDispatchDataKHR(
+        platform::cl_platform_id,
+        dispatch_data::Ptr{Cvoid}
+    )::cl_int
 end
 
 # typedef cl_program CL_API_CALL clCreateProgramWithILKHR_t ( cl_context context , const void * il , size_t length , cl_int * errcode_ret )
@@ -2054,7 +2059,8 @@ end
 
 @checked function clGetMemAllocInfoINTEL(context, ptr, param_name, param_value_size,
                                          param_value, param_value_size_ret)
-    @ext_ccall libopencl.clGetMemAllocInfoINTEL(context::cl_context, ptr::PtrOrCLPtr{Cvoid},
+    @ext_ccall libopencl.clGetMemAllocInfoINTEL(
+        context::cl_context, ptr::PtrOrCLPtr{Cvoid},
                                                 param_name::cl_mem_info_intel,
                                                 param_value_size::Csize_t,
                                                 param_value::Ptr{Cvoid},
@@ -2064,15 +2070,16 @@ end
 @checked function clSetKernelArgMemPointerINTEL(kernel, arg_index, arg_value)
     @ext_ccall libopencl.clSetKernelArgMemPointerINTEL(kernel::cl_kernel,
                                                        arg_index::cl_uint,
-                                                       arg_value::PtrOrCLPtr{Cvoid})::cl_int
+        arg_value::PtrOrCLPtr{Cvoid}
+    )::cl_int
 end
 
 @checked function clEnqueueMemFillINTEL(command_queue, dst_ptr, pattern, pattern_size, size,
                                         num_events_in_wait_list, event_wait_list, event)
     @ext_ccall libopencl.clEnqueueMemFillINTEL(command_queue::cl_command_queue,
-                                               dst_ptr::PtrOrCLPtr{Cvoid},
-                                               pattern::Ptr{Cvoid}, pattern_size::Csize_t,
-                                               size::Csize_t,
+        dst_ptr::PtrOrCLPtr{Cvoid},
+        pattern::Ptr{Cvoid}, pattern_size::Csize_t,
+        size::Csize_t,
                                                num_events_in_wait_list::cl_uint,
                                                event_wait_list::Ptr{cl_event},
                                                event::Ptr{cl_event})::cl_int
@@ -2091,7 +2098,7 @@ end
 @checked function clEnqueueMemAdviseINTEL(command_queue, ptr, size, advice,
                                           num_events_in_wait_list, event_wait_list, event)
     @ext_ccall libopencl.clEnqueueMemAdviseINTEL(command_queue::cl_command_queue,
-                                                 ptr::PtrOrCLPtr{Cvoid}, size::Csize_t,
+        ptr::PtrOrCLPtr{Cvoid}, size::Csize_t,
                                                  advice::cl_mem_advice_intel,
                                                  num_events_in_wait_list::cl_uint,
                                                  event_wait_list::Ptr{cl_event},
@@ -2106,7 +2113,7 @@ const clEnqueueMigrateMemINTEL_fn = Ptr{clEnqueueMigrateMemINTEL_t}
 @checked function clEnqueueMigrateMemINTEL(command_queue, ptr, size, flags,
                                            num_events_in_wait_list, event_wait_list, event)
     @ext_ccall libopencl.clEnqueueMigrateMemINTEL(command_queue::cl_command_queue,
-                                                  ptr::PtrOrCLPtr{Cvoid}, size::Csize_t,
+        ptr::PtrOrCLPtr{Cvoid}, size::Csize_t,
                                                   flags::cl_mem_migration_flags,
                                                   num_events_in_wait_list::cl_uint,
                                                   event_wait_list::Ptr{cl_event},
@@ -2250,9 +2257,11 @@ const clSetKernelArgDevicePointerEXT_t = Cvoid
 const clSetKernelArgDevicePointerEXT_fn = Ptr{clSetKernelArgDevicePointerEXT_t}
 
 @checked function clSetKernelArgDevicePointerEXT(kernel, arg_index, arg_value)
-    @ext_ccall libopencl.clSetKernelArgDevicePointerEXT(kernel::cl_kernel,
-                                                        arg_index::cl_uint,
-                                                        arg_value::cl_mem_device_address_ext)::cl_int
+    @ext_ccall libopencl.clSetKernelArgDevicePointerEXT(
+        kernel::cl_kernel,
+        arg_index::cl_uint,
+        arg_value::cl_mem_device_address_ext
+    )::cl_int
 end
 
 # typedef cl_int CL_API_CALL clCancelCommandsIMG_t ( const cl_event * event_list , size_t num_events_in_list )
@@ -2273,8 +2282,10 @@ const clSetPerfHintQCOM_t = Cvoid
 const clSetPerfHintQCOM_fn = Ptr{clSetPerfHintQCOM_t}
 
 @checked function clSetPerfHintQCOM(context, perf_hint)
-    @ext_ccall libopencl.clSetPerfHintQCOM(context::cl_context,
-                                           perf_hint::cl_perf_hint_qcom)::cl_int
+    @ext_ccall libopencl.clSetPerfHintQCOM(
+        context::cl_context,
+        perf_hint::cl_perf_hint_qcom
+    )::cl_int
 end
 
 const CL_NAME_VERSION_MAX_NAME_SIZE = 64
diff --git a/lib/cl/memory/buffer.jl b/lib/cl/memory/buffer.jl
index dfd198d..3d7a392 100644
--- a/lib/cl/memory/buffer.jl
+++ b/lib/cl/memory/buffer.jl
@@ -2,7 +2,7 @@
 
 struct Buffer <: AbstractMemoryObject
     id::cl_mem
-    ptr::Union{Nothing,CLPtr{Cvoid}}
+    ptr::Union{Nothing, CLPtr{Cvoid}}
     bytesize::Int
     context::Context
 end
@@ -17,8 +17,10 @@ context(buf::Buffer) = buf.context
 ## constructors
 
 # for internal use
-function Buffer(sz::Int, flags::Integer, hostbuf=nothing;
-                device=:rw, host=:rw, device_private_address=bda_supported(cl.device()))
+function Buffer(
+        sz::Int, flags::Integer, hostbuf = nothing;
+        device = :rw, host = :rw, device_private_address = bda_supported(cl.device())
+    )
     if device == :rw
         flags |= CL_MEM_READ_WRITE
     elseif device == :r
@@ -68,29 +70,33 @@ function Buffer(sz::Int, flags::Integer, hostbuf=nothing;
 end
 
 # allocated buffer
-function Buffer(sz::Integer; host_accessible=false, kwargs...)
+function Buffer(sz::Integer; host_accessible = false, kwargs...)
     flags = host_accessible ? CL_MEM_ALLOC_HOST_PTR : 0
-    Buffer(sz, flags, nothing; kwargs...)
+    return Buffer(sz, flags, nothing; kwargs...)
 end
 
 # from host memory
-function Buffer(hostbuf::Array; copy::Bool=true, kwargs...)
+function Buffer(hostbuf::Array; copy::Bool = true, kwargs...)
     flags = copy ? CL_MEM_COPY_HOST_PTR : CL_MEM_USE_HOST_PTR
-    Buffer(sizeof(hostbuf), flags, hostbuf; kwargs...)
+    return Buffer(sizeof(hostbuf), flags, hostbuf; kwargs...)
 end
 
 
 ## memory operations
 
 # reading from buffer to host array, return an event
-function enqueue_read(dst::Ptr, src::Buffer, src_off::Int, nbytes::Int;
-                      blocking::Bool=false, wait_for::Vector{Event}=Event[])
-    n_evts  = length(wait_for)
+function enqueue_read(
+        dst::Ptr, src::Buffer, src_off::Int, nbytes::Int;
+        blocking::Bool = false, wait_for::Vector{Event} = Event[]
+    )
+    n_evts = length(wait_for)
     evt_ids = isempty(wait_for) ? C_NULL : [pointer(evt) for evt in wait_for]
-    GC.@preserve wait_for begin
+    return GC.@preserve wait_for begin
         ret_evt = Ref{cl_event}()
-        clEnqueueReadBuffer(queue(), src, blocking, src_off, nbytes, dst,
-                            n_evts, evt_ids, ret_evt)
+        clEnqueueReadBuffer(
+            queue(), src, blocking, src_off, nbytes, dst,
+            n_evts, evt_ids, ret_evt
+        )
         @return_nanny_event(ret_evt[], dst)
     end
 end
@@ -98,14 +104,18 @@ enqueue_read(dst::Ptr, src::Buffer, nbytes; kwargs...) =
     enqueue_read(dst, src, 0, nbytes; kwargs...)
 
 # writing from host array to buffer, return an event
-function enqueue_write(dst::Buffer, dst_off::Int, src::Ptr, nbytes::Int;
-                       blocking::Bool=false, wait_for::Vector{Event}=Event[])
-    n_evts  = length(wait_for)
+function enqueue_write(
+        dst::Buffer, dst_off::Int, src::Ptr, nbytes::Int;
+        blocking::Bool = false, wait_for::Vector{Event} = Event[]
+    )
+    n_evts = length(wait_for)
     evt_ids = isempty(wait_for) ? C_NULL : [pointer(evt) for evt in wait_for]
-    GC.@preserve wait_for begin
+    return GC.@preserve wait_for begin
         ret_evt = Ref{cl_event}()
-        clEnqueueWriteBuffer(queue(), dst, blocking, dst_off, nbytes, src,
-                             n_evts, evt_ids, ret_evt)
+        clEnqueueWriteBuffer(
+            queue(), dst, blocking, dst_off, nbytes, src,
+            n_evts, evt_ids, ret_evt
+        )
         @return_nanny_event(ret_evt[], dst)
     end
 end
@@ -113,15 +123,19 @@ enqueue_write(dst::Buffer, src::Ptr, nbytes; kwargs...) =
     enqueue_write(dst, 0, src, nbytes; kwargs...)
 
 # copying between two buffers, return an event
-function enqueue_copy(dst::Buffer, dst_off::Int, src::Buffer, src_off::Int,
-                      nbytes::Int; blocking::Bool=false,
-                      wait_for::Vector{Event}=Event[])
-    n_evts  = length(wait_for)
+function enqueue_copy(
+        dst::Buffer, dst_off::Int, src::Buffer, src_off::Int,
+        nbytes::Int; blocking::Bool = false,
+        wait_for::Vector{Event} = Event[]
+    )
+    n_evts = length(wait_for)
     evt_ids = isempty(wait_for) ? C_NULL : [pointer(evt) for evt in wait_for]
-    GC.@preserve wait_for begin
+    return GC.@preserve wait_for begin
         ret_evt = Ref{cl_event}()
-        clEnqueueCopyBuffer(queue(), src, dst, src_off, dst_off, nbytes,
-                            n_evts, evt_ids, ret_evt)
+        clEnqueueCopyBuffer(
+            queue(), src, dst, src_off, dst_off, nbytes,
+            n_evts, evt_ids, ret_evt
+        )
         @return_event ret_evt[]
     end
 end
@@ -129,8 +143,10 @@ enqueue_copy(dst::Buffer, src::Buffer, N; kwargs...) =
     enqueue_copy(dst, 0, src, 0, N; kwargs...)
 
 # map a buffer into the host address space, returning a pointer and an event
-function enqueue_map(buf::Buffer, offset::Integer, nbytes::Int, flags=:rw;
-                     blocking::Bool=false, wait_for::Vector{Event}=Event[])
+function enqueue_map(
+        buf::Buffer, offset::Integer, nbytes::Int, flags = :rw;
+        blocking::Bool = false, wait_for::Vector{Event} = Event[]
+    )
     flags = if flags == :rw
         CL_MAP_READ | CL_MAP_WRITE
     elseif flags == :r
@@ -142,12 +158,14 @@ function enqueue_map(buf::Buffer, offset::Integer, nbytes::Int, flags=:rw;
     end
 
     ret_evt = Ref{cl_event}()
-    n_evts  = length(wait_for)
+    n_evts = length(wait_for)
     evt_ids = isempty(wait_for) ? C_NULL : [pointer(evt) for evt in wait_for]
     GC.@preserve wait_for begin
-        status  = Ref{Cint}()
-        ptr = clEnqueueMapBuffer(queue(), buf, blocking, flags, offset, nbytes,
-                                 n_evts, evt_ids, ret_evt, status)
+        status = Ref{Cint}()
+        ptr = clEnqueueMapBuffer(
+            queue(), buf, blocking, flags, offset, nbytes,
+            n_evts, evt_ids, ret_evt, status
+        )
         if status[] != CL_SUCCESS
             throw(CLError(status[]))
         end
@@ -155,12 +173,12 @@ function enqueue_map(buf::Buffer, offset::Integer, nbytes::Int, flags=:rw;
         return ptr, Event(ret_evt[])
     end
 end
-enqueue_map(buf::Buffer, nbytes::Int, flags=:rw; kwargs...) =
+enqueue_map(buf::Buffer, nbytes::Int, flags = :rw; kwargs...) =
     enqueue_map(buf, 0, nbytes, flags; kwargs...)
 
 # unmap a buffer, return an event
-function enqueue_unmap(buf::Buffer, ptr::Ptr; wait_for::Vector{Event}=Event[])
-    n_evts  = length(wait_for)
+function enqueue_unmap(buf::Buffer, ptr::Ptr; wait_for::Vector{Event} = Event[])
+    n_evts = length(wait_for)
     evt_ids = isempty(wait_for) ? C_NULL : [pointer(evt) for evt in wait_for]
     GC.@preserve wait_for begin
         ret_evt = Ref{cl_event}()
@@ -170,18 +188,22 @@ function enqueue_unmap(buf::Buffer, ptr::Ptr; wait_for::Vector{Event}=Event[])
 end
 
 # fill a buffer with a pattern, returning an event
-function enqueue_fill(buf::Buffer, offset::Integer, pattern::T, N::Integer;
-                      wait_for::Vector{Event}=Event[]) where {T}
+function enqueue_fill(
+        buf::Buffer, offset::Integer, pattern::T, N::Integer;
+        wait_for::Vector{Event} = Event[]
+    ) where {T}
     nbytes = N * sizeof(T)
     nbytes_pattern = sizeof(T)
     @assert nbytes_pattern > 0
-    n_evts  = length(wait_for)
+    n_evts = length(wait_for)
     evt_ids = isempty(wait_for) ? C_NULL : [pointer(evt) for evt in wait_for]
-    GC.@preserve begin
+    return GC.@preserve begin
         ret_evt = Ref{cl_event}()
-        clEnqueueFillBuffer(queue(), buf, [pattern],
-                            nbytes_pattern, offset, nbytes,
-                            n_evts, evt_ids, ret_evt)
+        clEnqueueFillBuffer(
+            queue(), buf, [pattern],
+            nbytes_pattern, offset, nbytes,
+            n_evts, evt_ids, ret_evt
+        )
         @return_event ret_evt[]
     end
 end
diff --git a/lib/cl/memory/usm.jl b/lib/cl/memory/usm.jl
index a12bc52..9aec2f2 100644
--- a/lib/cl/memory/usm.jl
+++ b/lib/cl/memory/usm.jl
@@ -1,7 +1,7 @@
 abstract type UnifiedMemory <: AbstractPointerMemory end
 
 function usm_free(mem::UnifiedMemory; blocking::Bool = false)
-    if blocking
+    return if blocking
         clMemBlockingFreeINTEL(context(mem), mem)
     else
         clMemFreeINTEL(context(mem), mem)
diff --git a/lib/cl/state.jl b/lib/cl/state.jl
index de44bd5..46a2943 100644
--- a/lib/cl/state.jl
+++ b/lib/cl/state.jl
@@ -212,7 +212,7 @@ function default_memory_backend(dev::Device)
         error("Unknown memory backend '$backend_str' requested")
     end
     in(backend, supported_backends) ? backend : nothing
-    backend
+    return backend
 end
 
 function memory_backend()
@@ -224,7 +224,7 @@ function memory_backend()
         end
         if backend === BufferBackend() && !bda_supported(dev)
             @warn """Your device $(dev.name) does not support the necessary extensions for OpenCL.jl's memory management (requiring either USM, coarse-grained SVM, or BDA).
-                     Falling back to plain OpenCL buffers, which severely limits compatibility with other OpenCL.jl, only supporting OpenCL C kernels.""" maxlog=1 _id="memory_backend_$(dev.name)"
+            Falling back to plain OpenCL buffers, which severely limits compatibility with other OpenCL.jl, only supporting OpenCL C kernels.""" maxlog = 1 _id = "memory_backend_$(dev.name)"
         end
         backend
     end
diff --git a/src/array.jl b/src/array.jl
index ee5ee63..28655e2 100644
--- a/src/array.jl
+++ b/src/array.jl
@@ -52,7 +52,7 @@ mutable struct CLArray{T, N, M} <: AbstractGPUArray{T, N}
             maxsize
         end
         data = GPUArrays.cached_alloc((CLArray, cl.context(), M, bufsize)) do
-            buf = managed_alloc(M, bufsize; alignment=Base.datatype_alignment(T))
+            buf = managed_alloc(M, bufsize; alignment = Base.datatype_alignment(T))
             DataRef(free, buf)
         end
         obj = new{T, N, M}(data, maxsize, 0, dims)
@@ -392,20 +392,26 @@ for (srcty, dstty) in [(:Array, :CLArray), (:CLArray, :Array), (:CLArray, :CLArr
                     cl.enqueue_usm_copy(pointer(dst, dst_off), pointer(src, src_off), nbytes; blocking)
                 else
                     if src isa CLArray && dst isa CLArray
-                        cl.enqueue_copy(convert(cl.Buffer, dst.data[]),
+                        cl.enqueue_copy(
+                            convert(cl.Buffer, dst.data[]),
                             (dst.offset * Base.elsize(dst)) + (dst_off - 1) * sizeof(T),
                             convert(cl.Buffer, src.data[]),
                             (src.offset * Base.elsize(src)) + (src_off - 1) * sizeof(T),
-                            nbytes; blocking)
+                            nbytes; blocking
+                        )
                     elseif dst isa CLArray
-                        cl.enqueue_write(convert(cl.Buffer, dst.data[]),
+                        cl.enqueue_write(
+                            convert(cl.Buffer, dst.data[]),
                             (dst.offset * Base.elsize(dst)) + (dst_off - 1) * sizeof(T),
-                            pointer(src, src_off), nbytes; blocking)
+                            pointer(src, src_off), nbytes; blocking
+                        )
                     elseif src isa CLArray
-                        cl.enqueue_read(pointer(dst, dst_off),
+                        cl.enqueue_read(
+                            pointer(dst, dst_off),
                             convert(cl.Buffer, src.data[]),
                             (src.offset * Base.elsize(src)) + (src_off - 1) * sizeof(T),
-                            nbytes; blocking)
+                            nbytes; blocking
+                        )
                     end
                 end
             end
@@ -522,7 +528,7 @@ function Base.resize!(a::CLVector{T}, n::Integer) where {T}
     # replace the data with a new CL. this 'unshares' the array.
     # as a result, we can safely support resizing unowned buffers.
     new_data = cl.context!(context(a)) do
-        mem = managed_alloc(memtype(a), bufsize; alignment=Base.datatype_alignment(T))
+        mem = managed_alloc(memtype(a), bufsize; alignment = Base.datatype_alignment(T))
         ptr = convert(CLPtr{T}, mem)
         m = min(length(a), n)
         if m > 0
@@ -532,7 +538,7 @@ function Base.resize!(a::CLVector{T}, n::Integer) where {T}
                 elseif memtype(a) <: cl.UnifiedMemory
                     cl.enqueue_usm_copy(ptr, pointer(a), m*sizeof(T); blocking=false)
                 else
-                    cl.enqueue_copy(convert(cl.Buffer, mem), 0, convert(cl.Buffer, a.data[]), a.offset * Base.elsize(a), m*sizeof(T); blocking=false)
+                    cl.enqueue_copy(convert(cl.Buffer, mem), 0, convert(cl.Buffer, a.data[]), a.offset * Base.elsize(a), m * sizeof(T); blocking = false)
                 end
             end
         end
diff --git a/src/memory.jl b/src/memory.jl
index 94d8e44..e13aed5 100644
--- a/src/memory.jl
+++ b/src/memory.jl
@@ -113,7 +113,7 @@ end
 
 
 ## public interface
-function managed_alloc(t::Type{T}, bytes::Int; kwargs...) where T
+function managed_alloc(t::Type{T}, bytes::Int; kwargs...) where {T}
     if bytes == 0
         return Managed(T())
     else
diff --git a/src/util.jl b/src/util.jl
index 70a002f..98b1530 100644
--- a/src/util.jl
+++ b/src/util.jl
@@ -56,7 +56,7 @@ function versioninfo(io::IO=stdout)
     prefs = [
         "default_memory_backend" => load_preference(OpenCL, "default_memory_backend"),
     ]
-    if any(x->!isnothing(x[2]), prefs)
+    if any(x -> !isnothing(x[2]), prefs)
         println(io, "Preferences:")
         for (key, val) in prefs
             if !isnothing(val)
@@ -90,7 +90,7 @@ function versioninfo(io::IO=stdout)
                 if svm_caps.fine_grain_buffer
                     push!(svm_tags, "f")
                 end
-                push!(tags, "svm:"*join(svm_tags, "+"))
+                push!(tags, "svm:" * join(svm_tags, "+"))
             end
             if cl.usm_supported(device)
                 usm_tags = []
@@ -101,7 +101,7 @@ function versioninfo(io::IO=stdout)
                 if usm_caps.device.access
                     push!(usm_tags, "d")
                 end
-                push!(tags, "usm:"*join(usm_tags, "+"))
+                push!(tags, "usm:" * join(usm_tags, "+"))
             end
             if cl.bda_supported(device)
                 push!(tags, "bda")
diff --git a/test/buffer.jl b/test/buffer.jl
index 34245c9..6eca059 100644
--- a/test/buffer.jl
+++ b/test/buffer.jl
@@ -15,7 +15,7 @@
     end
 
     # host accessible, mapped
-    let buf = cl.Buffer(sizeof(Int); host_accessible=true)
+    let buf = cl.Buffer(sizeof(Int); host_accessible = true)
         src = [42]
         cl.enqueue_write(buf, pointer(src), sizeof(src); blocking=true)
 
@@ -65,7 +65,7 @@
     end
 
     # fill
-    let buf = cl.Buffer(3*sizeof(Int))
+    let buf = cl.Buffer(3 * sizeof(Int))
         cl.enqueue_fill(buf, 42, 3)
 
         arr = Vector{Int}(undef, 3)

@VarLad changed the title from "WIP: Add Buffer Device Address Backend" to "Add Buffer Device Address Backend" on Jun 15, 2025
@VarLad (Contributor, Author) commented Jun 15, 2025

Tested with mesa-git Rusticl (RUSTICL_ENABLE=zink environment variable) on an Intel iGPU.
The results are as follows:

[screenshot: test suite results]

Indexing looks bad.

It would be nice if I could somehow test the GPUArrays suite with POCL's BDA backend.

Just tried it; the indexing issues seem to stem from my implementation in this PR.

@VarLad (Contributor, Author) commented Jun 16, 2025

@maleadt Can you take a look at the PR to see what's missing? Most tests pass now.
A few don't, but I need to find out whether that's Mesa or OpenCL.jl; a few of the errors I think are acceptable, since I'm testing for BDA and Mesa Zink doesn't support SVM or USM.

Regarding BDA-specific tests, I think what we have for the normal Buffer/cl_mem should suffice for BDA as well for now :)

Here's the latest run:

[screenshot: test suite results]

@VarLad (Contributor, Author) commented Jun 16, 2025

Fixes #324 and #281.

@maleadt (Member) left a comment

A couple of initial thoughts; I haven't had the time to test this thoroughly.

@maleadt (Member) commented Jun 17, 2025

If PoCL supports this, we should probably have a way to pick the back-end (e.g. using a preference, asserting at run time that the device supports said back-end) and test this functionality on CI.

The test failures are problematic though. Indexing really shouldn't behave differently on this back-end.
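
A minimal sketch of what that could look like, reusing the "default_memory_backend" preference name and the bda_supported query that already exist in this PR's diff (the check itself is hypothetical, not code from the PR):

    # Hypothetical preference-based backend assertion:
    using OpenCL, Preferences
    backend = load_preference(OpenCL, "default_memory_backend")
    if backend == "bda" && !OpenCL.cl.bda_supported(OpenCL.cl.device())
        error("the selected device does not support cl_ext_buffer_device_address")
    end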

@VarLad (Contributor, Author) commented Jun 17, 2025

Indexing really shouldn't behave differently on this back-end.

I fixed the indexing part. :)

The only remaining errors come from:

  • Empty arrays of size 0.
  • A bug involving views: views don't work when a range is provided.
    For example, @view x[3:4] doesn't work (while @view x[[3,4]] does). It doesn't give an error; it always behaves as if it were @view x[1:2], that is, it always starts from the first element for some reason (see the sketch after this list).
    Would you know what the issue is, or where to look for it?
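
A minimal sketch of the reproducer being described, assuming an OpenCL.jl session with this PR's backend active (the expected/observed values follow the description above):

    # Hypothetical reproducer for the view offset bug:
    using OpenCL
    x = CLArray(collect(Float32, 1:6))
    v = @view x[3:4]       # range-based view
    Array(v)               # expected [3.0f0, 4.0f0]; reportedly returns [1.0f0, 2.0f0]
    w = @view x[[3, 4]]    # index-vector view
    Array(w)               # reportedly correct: [3.0f0, 4.0f0]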

@VarLad (Contributor, Author) commented Jun 17, 2025

If PoCL supports this, we should probably have a way to pick the back-end (e.g. using a preference, asserting at run time that the device supports said back-end) and test this functionality on CI.

PoCL does support this.

@VarLad (Contributor, Author) commented Jun 19, 2025

@maleadt For now, I've introduced an environment-variable-based method to select the backend before loading OpenCL. This should be enough to run the POCL + BDA tests:

    JULIA_OPENCL_BACKEND=svm # or bda or usm

A better way to do this (using Preferences.jl, for example) is best left to another PR, in my opinion.
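
For instance, a session could select the backend like this (a usage sketch, assuming the variable is read at package load time as described above):

    # Select the memory backend before OpenCL.jl is loaded:
    ENV["JULIA_OPENCL_BACKEND"] = "bda"   # or "svm" / "usm"
    using OpenCL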

All tests pass on my side.

The PR should be ready if you don't have any issues. :)

@maleadt (Member) commented Jun 19, 2025

Taking a look...

@maleadt self-assigned this Jun 19, 2025
@maleadt (Member) commented Jun 19, 2025

I have some fixes locally, and I'm trying to unify some more things; I've always been annoyed by how the Buffer abstraction is pretty disjoint from our Memory wrappers.

@maleadt (Member) commented Jun 19, 2025

Pushed the refactor. WIP because there's a remaining issue:

Error in testset gpuarrays/reductions/== isequal:
Error During Test at /home/tim/.julia/packages/GPUArrays/uiVyU/test/testsuite/reductions.jl:203
  Test threw exception
  Expression: compare(((A, B)->begin
            #= /home/tim/.julia/packages/GPUArrays/uiVyU/test/testsuite/reductions.jl:203 =#
            A == B
        end), AT, [missing], [missing])
  Buffer does not have a device private address

IIUC this used to work because the Buffer -> pointer conversion was postponed to launch time, while I'm doing the conversion earlier now. That's not ideal, because it also makes it impossible to pass buffers to kernels accepting pointers directly (as opposed to contained in structs) without the BDA extension.

EDIT: Pushed a quick fix for that. But the question remains if we still want to support passing buffers directly. Maybe we should just ditch that and require one of the back-ends to be supported...
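
To illustrate the trade-off (a hypothetical helper, not the code in this PR): converting at argument-setup time requires a device address up front, whereas deferring to launch time lets non-BDA devices still pass the cl_mem itself to plain OpenCL C kernels.

    # Hypothetical sketch of the two conversion strategies discussed above;
    # assumes the package internals `cl` (= OpenCL.cl) and `CLPtr` are accessible.
    function kernel_arg(buf::cl.Buffer)
        if cl.bda_supported(cl.device())
            # eager: requires the buffer to carry a device-private address
            return convert(CLPtr{Cvoid}, buf)
        else
            # fallback: pass the cl_mem itself, which only plain OpenCL C
            # kernels taking buffer arguments can consume
            return buf
        end
    end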

@maleadt mentioned this pull request Jun 19, 2025
@VarLad (Contributor, Author) commented Jun 19, 2025

Maybe we should just ditch that and require one of the back-ends to be supported...

This makes sense from my POV: at the very least, SVM is mandated by the actual OpenCL spec. We should have the liberty to assume that we're working with a complete implementation of the very spec we're wrapping, or else things would get too complicated.
I get that there are devices that can't support shared memory like SVM, but BDA is an extension for those devices specifically. Plus, my intention for BDA was for it to act as a Vulkan backend for OpenCL.jl (via Mesa Zink), for devices lacking a proper OpenCL implementation in the first place.

What do you think?

@maleadt (Member) commented Jun 19, 2025

at the very least, SVM is mandated by the actual OpenCL spec.

I don't think it is, at least not for OpenCL 3.0+: KhronosGroup/OpenCL-Docs#913

@VarLad (Contributor, Author) commented Jun 19, 2025

Okay, reading about it, it seems that in 3.0 they made a lot of things optional and brought it closer to 1.2...

Well, whichever of the two options is easier and cleaner to maintain, then, I guess. :)

@maleadt (Member) commented Jun 19, 2025

I figured out a way to have the buffer (previously bda) back-end also work with non-BDA targets, obviously only supporting OpenCL C then. Forcing bda_supported to return false in combination with default_memory_backend=buffer:

❯ jl --project examples/vadd.jl
┌ Warning: Your device cpu-haswell-AMD Ryzen 9 5950X 16-Core Processor does not support the necessary extensions for OpenCL.jl's memory management (requiring either USM, coarse-grained SVM, or BDA).
│ Falling back to plain OpenCL buffers, which severely limits compatibility with other OpenCL.jl, only supporting OpenCL C kernels.
└ @ OpenCL.cl ~/Julia/pkg/OpenCL/lib/cl/state.jl:226

❯ jl --project examples/vadd_native.jl 
┌ Warning: Your device cpu-haswell-AMD Ryzen 9 5950X 16-Core Processor does not support the necessary extensions for OpenCL.jl's memory management (requiring either USM, coarse-grained SVM, or BDA).
│ Falling back to plain OpenCL buffers, which severely limits compatibility with other OpenCL.jl, only supporting OpenCL C kernels.
└ @ OpenCL.cl ~/Julia/pkg/OpenCL/lib/cl/state.jl:226
ERROR: LoadError: Conversion of a buffer to a pointer is not supported by this device

@maleadt dismissed their stale review June 19, 2025 18:27

Implemented.

@VarLad (Contributor, Author) commented Jun 19, 2025

obviously only supporting OpenCL C

I believe this will improve in the future if/when #319 works?

@maleadt (Member) commented Jun 19, 2025

I believe this will improve in the future if/when #319 works?

Sadly not: even with spirv2clc we still rely on the ability to pass structs containing pointers (even just to pass a simple CuDeviceArray), so we need the ability to look up the device address of a buffer. The restriction is thus even more stringent, only allowing OpenCL C kernels that do not rely on literal pointers being passed. But that's a bit much to put in an error message 🙂

Besides, only OpenCL 2.0 technically supports passing pointers inside of structures, and despite NVIDIA's and Intel's drivers advertising OpenCL 2.0 compatibility, the copies of clang inside their drivers aren't recent enough to support that (i.e., they don't include llvm/llvm-project@eae70cc, which is part of LLVM 17).

@maleadt removed their assignment Jun 19, 2025
@maleadt (Member) commented Jun 19, 2025

I think this is good to go. @VarLad, any final thoughts on my changes, or anything else you want to add? Otherwise I think I'll merge this once CI agrees.

@VarLad (Contributor, Author) commented Jun 19, 2025

@maleadt I think clCreateBufferWithProperties is provided by OpenCL 3.0, so maybe we can use just that and remove clCreateBuffer? (I think this is the case, but it's confusing to verify...)

Other than this, the code looks good; no obvious issues that I can see.

@maleadt (Member) commented Jun 20, 2025

clCreateBufferWithProperties is provided by OpenCL 3.0 so maybe we can use just that and remove clCreateBuffer?

Doesn't hurt to keep the old clCreateBuffer path, in case somebody wants to reinstate pre-3.0 support.

@VarLad (Contributor, Author) commented Jun 20, 2025

Ok, makes sense. Looks good to me, let's merge this! 👀

@maleadt merged commit 51a39ef into JuliaGPU:master Jun 20, 2025
44 checks passed
@VarLad deleted the bda branch June 21, 2025 03:03

Successfully merging this pull request may close these issues:

  • does not support USM or coarse-grained SVM
  • Support cl_mem based buffers using cl_ext_buffer_device_address extension