Motivation

deck.gl has multiple feature areas that would benefit from a shared & standardized computation module. Example use cases:

- Consuming data input in memory-efficient formats such as Arrow and Parquet
- Attribute animation/transition (currently implemented with ad-hoc transforms)
- Data aggregation (currently implemented with ad-hoc transforms)
- One-time data transforms such as 64-bit splitting and coordinate system conversion (currently implemented on the CPU)
Proposal
Create a new module @luma.gl/gpgpu.
The proposed syntax is strongly inspired by tensorflow.js, especially the functions in Creation, Slicing and Joining, Arithmetic, and Basic Math & Reduction.
Example API
Creation: returns a wrapper of a GPU buffer
```ts
// A column with stride of 1, example: Int32Array [property0, property1, property2, ...]
gpu.array1d(data: TypedArray): GPUAttribute

// A column with stride > 1, example: Float64Array [x0, y0, x1, y1, x2, y2, ...]
gpu.array2d(data: TypedArray, shape: [number, number]): GPUAttribute

// Constant scalar|vec2|vec3|vec4
gpu.constant(value: number | NumericArray): GPUAttribute
```

Reshape: joining, slicing and/or rearranging GPU buffers
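A minimal sketch of one such function, gpu.stack, which appears in the accessor examples further below (the exact signature is an assumption inferred from that usage):

```ts
// Interleave multiple columns into a single buffer,
// e.g. stack([x, y, z]) -> [x0, y0, z0, x1, y1, z1, ...]
gpu.stack(values: GPUAttribute[]): GPUAttribute
```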
Transform: math operations on GPU buffers

```ts
// Element-wise add
gpu.add(value: GPUAttribute, n: number): GPUAttribute

// Min value across dimensions
gpu.min(value: GPUAttribute): number

// Map each 64-bit float element to two 32-bit floats, as in
// highPart = Math.fround(x) and lowPart = x - highPart
gpu.fp64Split(value: GPUAttribute): [highPart: GPUAttribute, lowPart: GPUAttribute]
```
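For illustration, a hedged usage sketch composing these operations (helper names follow the Example API above; whether gpu.add accepts a plain number is an assumption):

```ts
// Shift all values so that the minimum becomes 0, entirely on the GPU
const values = gpu.array1d(new Float32Array([3, 1, 4, 1, 5]));
const minValue = gpu.min(values);           // reduction: returns a number
const shifted = gpu.add(values, -minValue); // element-wise scalar add
```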
Interface with loaders.gl
There is no direct dependency on loaders.gl, but the module can be "loaders friendly" by accepting a Table-shaped input:
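For illustration, a sketch of what "Table-shaped" could mean, assuming column-oriented data keyed by column name (the type below is hypothetical, loosely modeled on loaders.gl's columnar table):

```ts
type TypedArray = Float32Array | Float64Array | Int32Array | Uint32Array;

// Hypothetical columnar table shape
type ColumnarTable = {
  shape: 'columnar-table';
  data: {[columnName: string]: TypedArray};
};

// Creation functions could then accept (table, columnName),
// as in gpu.array1d(data, 'x') in the accessor examples below
```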
Interface with deck.gl

deck.gl could add support for accessors that return a @luma.gl/gpgpu GPUAttribute object. If such an accessor is provided, instead of filling an attribute's value array on the CPU, the underlying GPU buffer is transferred directly.
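A minimal sketch of a sample layer with JSON input and an equivalent layer with Arrow input (option A), assuming standard ScatterplotLayer props and the gpu helpers above; the @luma.gl/gpgpu import and the accessor's second-argument shape are assumptions, and a declarative variant (option B) is omitted:

```ts
import {ScatterplotLayer} from '@deck.gl/layers';
import * as gpu from '@luma.gl/gpgpu'; // hypothetical import

declare const rows: {x: number; y: number}[];
declare const arrowTable: unknown; // Table | ArrowTableBatch

// Sample layer with JSON input: the accessor returns per-object values
// and deck.gl fills the attribute's value array on the CPU
const jsonLayer = new ScatterplotLayer({
  data: rows,
  getPosition: (d: {x: number; y: number}) => [d.x, d.y, 0],
  getRadius: 10
});

// Equivalent layer with Arrow input (option A): the accessor returns a
// GPUAttribute, so the underlying GPU buffer is transferred directly
const arrowLayer = new ScatterplotLayer({
  data: arrowTable,
  getPosition: (_: unknown, {data}: {data: unknown}) =>
    gpu.stack([gpu.array1d(data, 'x'), gpu.array1d(data, 'y'), gpu.constant(0)]),
  getRadius: 10
});
```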
Implementation Considerations

It might be appropriate to move the BufferTransform and TextureTransform classes from the engine module to this new module.
The module will contain multiple "backends" for WebGL2 and WebGPU. Dynamic import can be used to reduce runtime footprint.
Actual GPU resources (shaders/buffers) will need to be lazily allocated and written when the buffer is accessed. This allows: a) the JS wrapper to be created without waiting for an available device; b) calculations to be batched for performance, instead of running one render pass per JS function call; c) the buffer to be created on the same device where it will be used for rendering:

```ts
gpuAttribute.getBuffer(device: Device): Buffer;
```
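To illustrate the intended lazy semantics (a sketch using the names above):

```ts
const position = gpu.stack([x, y, z]);     // no GPU work happens yet
const buffer = position.getBuffer(device); // shaders run and the buffer is written
                                           // on first access, on the rendering device
```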
Releasing resources that are no longer needed. Consider the following case:
```ts
getPosition: (_, {data}: {data: Table | ArrowTableBatch}) => {
  const x = gpu.array1d(data, 'x'); // intermediate buffer, not needed after evaluation
  const y = gpu.array1d(data, 'y'); // intermediate buffer, not needed after evaluation
  const z = gpu.constant(0);
  return gpu.stack([x, y, z]); // output buffer that will be used for render
}
```
We could have something similar to tf.tidy(fn), which cleans up all intermediate tensors allocated by fn except those returned by fn.
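For illustration, a hypothetical gpu.tidy mirroring tf.tidy, applied to the accessor above:

```ts
getPosition: (_, {data}) =>
  gpu.tidy(() => {
    const x = gpu.array1d(data, 'x'); // released when the tidy scope exits
    const y = gpu.array1d(data, 'y'); // released when the tidy scope exits
    return gpu.stack([x, y, gpu.constant(0)]); // returned value is kept alive
  });
```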
Alternatively, we could consider using FinalizationRegistry to clean up intermediate buffers, though the application will have less control over when the cleanup happens (e.g. the standard deck.gl Layer tests would fail due to unreleased WebGL resources).
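A minimal sketch of the FinalizationRegistry approach (the wiring and the destroy() call are assumptions about the eventual API):

```ts
import type {Buffer} from '@luma.gl/core';

// Release the GPU buffer once its JS wrapper is garbage-collected.
// Note: finalization timing is up to the JS engine, hence the loss of control.
const registry = new FinalizationRegistry((buffer: Buffer) => buffer.destroy());

// Called when a GPUAttribute allocates its underlying buffer:
function track(gpuAttribute: object, underlyingBuffer: Buffer): void {
  registry.register(gpuAttribute, underlyingBuffer);
}
```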
Discussion
Do we want to use an existing external library instead of rolling our own?
First of all, I have not conducted an extensive investigation of existing offerings, so additional comments on this are most welcome. Based on my own experience, the main pain point (with a long maintenance tail) is context sharing, which is required for deck.gl to reuse the output GPU buffer without reading it back to the CPU.
tensorflow.js: a proof-of-concept is available here. It is very mature, with a big user base, cross-platform availability, and a variety of backend implementations (WebGL, WebGPU, WebAssembly). The library itself is fairly heavyweight (> 1 MB minified) due to extra machine-learning functionality, though that could likely be reduced if we redistribute a tree-shaken bundle. Forcing it to use an external WebGL context is painful because the context state handoff is not clean.
gpu.js: the ability to write JavaScript functions that get translated to shader code is very appealing. However, the library has not been updated for 2 years, and I doubt it will ever gain WebGPU support.
@ibgreen @felixpalmer @donmccurdy