[Feat] Generic GPU-based data manipulation #2340

Open · Pessimistress (Collaborator) opened this issue Feb 19, 2025 · 0 comments

Motivation

deck.gl has multiple feature areas that would benefit from a shared & standardized computation module. Example use cases:

  • Consuming data input in memory-efficient formats such as Arrow and Parquet
  • Attribute animation/transition (currently implemented with ad-hoc transforms)
  • Data aggregation (currently implemented with ad-hoc transforms)
  • One-time data transforms such as 64-bit splitting and coordinate system conversion (currently performed on the CPU)

Proposal

Create a new module @luma.gl/gpgpu.

The proposed syntax is strongly inspired by tensorflow.js, especially the functions in Creation, Slicing and Joining, Arithmetic, Basic Math, and Reduction.

Example API

Creation: returns a wrapper of a GPU buffer

// A column with stride of 1, example: Int32Array [property0, property1, property2, ...]
gpu.array1d(data: TypedArray): GPUAttribute
// A column with stride > 1, example: Float64Array [x0, y0, x1, y1, x2, y2, ...]
gpu.array2d(data: TypedArray, shape: [number, number]): GPUAttribute
// Constant scalar|vec2|vec3|vec4
gpu.constant(value: number | NumericArray): GPUAttribute
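
As a rough illustration, here is how the creation functions might be used. This is a hedged sketch of the proposed API only; the gpu namespace, the GPUAttribute type, and the assumed [rows, components] shape convention do not exist yet.

import {gpu} from '@luma.gl/gpgpu'; // proposed module, not yet published

// Column with stride 1
const radii = gpu.array1d(new Int32Array([3, 5, 8]));
// Column with stride 2; shape is assumed here to mean [rows, components per row]
const positions = gpu.array2d(new Float64Array([0, 0, 1, 1, 2, 4]), [3, 2]);
// Constant vec4, e.g. a default color
const defaultColor = gpu.constant([255, 140, 0, 255]);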

Reshape: joining, slicing and/or rearranging GPU buffers

// Interleaving multiple columns, example: [x0, x1, x2, ...] + [y0, y1, y2, ...] -> [x0, y0, x1, y1, x2, y2, ...]
gpu.stack(values: GPUAttribute[]): GPUAttribute
// N-D slicing of interleaved buffer, example: [a0, b0, c0, a1, b1, c1, a2, b2, c2, ...] -> [a0, c0, a1, c1, a2, c2, ...]
gpu.slice(value: GPUAttribute, begin: number[], size: number[]): GPUAttribute
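
A sketch of how these could combine separate columns into an interleaved attribute, again using the proposed (not yet existing) API; the begin/size semantics of slice are an assumption borrowed from tf.slice.

// Interleave x/y columns with a constant z into a single position buffer
const x = gpu.array1d(new Float32Array([0, 1, 2]));
const y = gpu.array1d(new Float32Array([4, 5, 6]));
const z = gpu.constant(0);
const xyz = gpu.stack([x, y, z]); // [x0, y0, 0, x1, y1, 0, x2, y2, 0]

// Take only the first component of each row (-1 meaning "all rows" is assumed)
const xOnly = gpu.slice(xyz, [0, 0], [-1, 1]); // [x0, x1, x2]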

Transform: math operations on GPU buffers

// element-wise add
gpu.add(value: GPUAttribute, operand: number | GPUAttribute): GPUAttribute
// min value across dimensions
gpu.min(value: GPUAttribute): number
// map each 64-bit float element to two 32-bit floats, as in highPart = Math.fround(x) and lowPart = x - highPart
gpu.fp64Split(value: GPUAttribute): [highPart: GPUAttribute, lowPart: GPUAttribute]
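
For illustration, a sketch of using these operations together (proposed API, not an existing one):

// Element-wise math and reductions
const values = gpu.array1d(new Float32Array([1.5, 2.5, 3.5]));
const shifted = gpu.add(values, 1);  // [2.5, 3.5, 4.5]
const minValue = gpu.min(values);    // 1.5, read back to the CPU

// Split 64-bit positions into high/low 32-bit parts for precision-safe rendering
const lngLat = gpu.array2d(new Float64Array([-122.45, 37.78]), [1, 2]);
const [high, low] = gpu.fp64Split(lngLat);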

Interface with loaders.gl

There is no direct dependency on loaders.gl, but the module can be made "loaders-friendly" by accepting a Table-shaped input:

gpu.array1d(table: Table, columnNameOrIndex: string | number): GPUAttribute
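
For example, a table parsed by loaders.gl could be passed straight to the creation functions. The load call below is standard loaders.gl; the gpu calls are the proposed API and the column names are placeholders.

import {load} from '@loaders.gl/core';
import {ArrowLoader} from '@loaders.gl/arrow';
import {gpu} from '@luma.gl/gpgpu'; // proposed module

const table = await load('points.arrow', ArrowLoader);

// Reference columns by name or index without materializing row objects in JS
const radius = gpu.array1d(table, 'radius_value');
const colorValue = gpu.array1d(table, 2); // by column index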

Interface with deck.gl

deck.gl could add support for accessors that return a @luma.gl/gpgpu GPUAttribute object. When such an accessor is provided, the underlying GPU buffer is transferred directly instead of the attribute's value array being filled on the CPU.

Sample layer with JSON input:

import {ScatterplotLayer} from '@deck.gl/layers';
import {extent} from 'd3-array';
import {scaleLog} from 'd3-scale';
import memoize from 'memoize';

const getColorScaleMemoized = memoize(
  data => scaleLog()
    .domain(extent(data, d => d.color_value))
    .range([[0, 200, 255, 255], [255, 180, 0, 255]])
);

const layer = new ScatterplotLayer({
  data: 'points.json',

  getPosition: d => [d.x, d.y, d.z],

  getRadius: d => Math.max(Math.min(d.radius_value * 10, 100), 1),

  getFillColor: (d, {data}) => getColorScaleMemoized(data)(d.color_value)
});

Equivalent layer with Arrow input (option A):

import {ScatterplotLayer} from '@deck.gl/layers';
import {gpu} from '@luma.gl/gpgpu';
import {ArrowLoader} from '@loaders.gl/arrow';
import type {Table, ArrowTableBatch} from '@loaders.gl/schema';

const layer = new ScatterplotLayer({
  data: 'points.arrow',
  loaders: [ArrowLoader],

  getPosition: (_, {data}: {data: Table | ArrowTableBatch}) => {
    const x = gpu.array1d(data, 'x');
    const y = gpu.array1d(data, 'y');
    const z = gpu.constant(0);
    return gpu.stack([x, y, z]);
  },

  getRadius: (_, {data}: {data: Table | ArrowTableBatch}) => {
    const value = gpu.array1d(data, 'radius_value');
    return value.mul(10).clipByValue(1, 100);
  },

  getFillColor: (_, {data}: {data: Table | ArrowTableBatch}) => {
    const value = gpu.array1d(data, 'color_value');
    return value.scaleLog([[0, 200, 255, 255], [255, 180, 0, 255]]);
  }
});

Equivalent declarative layer with Arrow input (option B):

{
  "type": "ScatterplotLayer",
  "data": "points.arrow",

  "getPosition": ["x", "y", "z"],

  "getRadius": {
    "source": "radius_value",
    "transform": [
      {"func": "mul", "args": [10]},
      {"func": "clipByValue", "args": [1, 100]}
    ]
  },

  "getFillColor": {
    "source": "color_value",
    "transform": [
      {"func": "scaleLog", "args": [[0, 200, 255, 255], [255, 180, 0, 255]]}
    ]
  }
}
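
One way option B could be resolved internally: each accessor spec maps to a chain of the proposed gpu calls. The sketch below is illustrative only; resolveAccessor, TransformStep and AccessorSpec are hypothetical names, and it assumes transform functions are exposed as methods on GPUAttribute.

import {gpu, GPUAttribute} from '@luma.gl/gpgpu'; // proposed module
import type {Table} from '@loaders.gl/schema';

type TransformStep = {func: string; args: unknown[]};
type AccessorSpec = string | string[] | {source: string; transform?: TransformStep[]};

// Resolve a declarative accessor spec against a table into a GPUAttribute
function resolveAccessor(table: Table, spec: AccessorSpec): GPUAttribute {
  if (Array.isArray(spec)) {
    // ["x", "y", "z"] -> interleave the named columns
    return gpu.stack(spec.map(name => gpu.array1d(table, name)));
  }
  if (typeof spec === 'string') {
    return gpu.array1d(table, spec);
  }
  let value = gpu.array1d(table, spec.source);
  for (const step of spec.transform ?? []) {
    // Assumes each transform function is a method on GPUAttribute
    value = (value as any)[step.func](...step.args);
  }
  return value;
}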

Implementation Considerations

  • It might be appropriate to move the BufferTransform and TextureTransform classes from the engine module to this new module.

  • The module will contain multiple "backends" for WebGL2 and WebGPU. Dynamic import can be used to reduce runtime footprint.

  • Actual GPU resources (shaders/buffers) will need to be lazily allocated and written when the buffer is accessed. This allows a) the JS wrapper to be created without waiting for an available device; b) calculations to be batched for performance instead of running one render pass per JS function call; c) the buffer to be created on the same device where it will be used for rendering:

    gpuAttribute.getBuffer(device: Device): Buffer;
  • Release of no longer needed resources. Consider the following case:

    getPosition: (_, {data}: {data: Table | ArrowTableBatch}) => {
      const x = gpu.array1d(data, 'x'); // intermediate buffer that will not be needed after evaluation
      const y = gpu.array1d(data, 'y'); // intermediate buffer that will not be needed after evaluation
      const z = gpu.constant(0);
      return gpu.stack([x, y, z]); // output buffer that will be used for render
    }

    We could have something similar to tf.tidy(fn), which cleans up all intermediate tensors allocated by fn except those returned by fn (see the sketch after this list).

    Alternatively, we could consider using FinalizationRegistry to clean up intermediate buffers, though the application would have less control over when the cleanup happens (e.g. the standard deck.gl Layer tests would fail due to unreleased WebGL resources).
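
To make the lazy allocation and cleanup ideas above concrete, here is a hedged sketch of what the wrapper and a tidy-style scope could look like. None of these classes or functions exist yet, and the creation functions would need to register new attributes with the tracking set.

import type {Device, Buffer} from '@luma.gl/core';

// Sketch: a GPUAttribute wrapper that defers GPU work until a buffer is requested
class GPUAttribute {
  private buffer: Buffer | null = null;

  constructor(private readonly compute: (device: Device) => Buffer) {}

  // Lazily allocate/write on the device where the result will be rendered
  getBuffer(device: Device): Buffer {
    if (!this.buffer) {
      this.buffer = this.compute(device);
    }
    return this.buffer;
  }

  release(): void {
    this.buffer?.destroy();
    this.buffer = null;
  }
}

// Sketch: tf.tidy-like scope that releases intermediates but keeps what fn returns.
// gpu.array1d/array2d/constant/stack/... would add each new attribute to this set.
const liveAttributes = new Set<GPUAttribute>();

function tidy<T extends GPUAttribute>(fn: () => T): T {
  const before = new Set(liveAttributes);
  const result = fn();
  for (const attribute of liveAttributes) {
    if (!before.has(attribute) && attribute !== result) {
      attribute.release();
      liveAttributes.delete(attribute);
    }
  }
  return result;
}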

Discussion

  • Do we want to use an existing external library instead of rolling our own?

    First of all, I have not conducted an extensive investigation of existing offerings, so additional comments on this are very welcome. Based on my own experience, the main pain point (with a long maintenance tail) is context sharing (required for deck.gl to reuse the output GPU buffer without reading it back to the CPU).

    • tensorflow.js: a proof-of-concept is available here. It is very mature, with a large user base, cross-platform presence, and a variety of backend implementations (WebGL, WebGPU, WebAssembly). The library itself is fairly heavyweight (> 1 MB minified) with extra machine-learning functionality, though the size could likely be reduced if we redistribute a tree-shaken bundle. Forcing it to use an external WebGL context is painful because the context state handoff is not clean.
    • gpu.js: the ability to write JavaScript functions that get translated to shader code is very appealing. However, the library has not been updated for 2 years and I doubt it will gain WebGPU support.
  • TBD

@ibgreen @felixpalmer @donmccurdy
