[Feat] Generic GPU-based data manipulation #2340

Open · Pessimistress (Collaborator) opened this issue Feb 19, 2025 · 0 comments

Motivation

deck.gl has multiple feature areas that would benefit from a shared & standardized computation module. Example use cases:

  • Consuming data input in memory-efficient formats such as Arrow and Parquet
  • Attribute animation/transition (currently implemented with ad-hoc transforms)
  • Data aggregation (currently implemented with ad-hoc transforms)
  • One-time data transforms such as 64-bit splitting and coordinate system conversion (currently performed on the CPU)

Proposal

Create a new module @luma.gl/gpgpu.

The proposed syntax is strongly inspired by tensorflow.js, especially the functions in Creation, Slicing and Joining, Arithmetic, Basic Math, and Reduction.

Example API

Creation: returns a wrapper of a GPU buffer

// A column with stride of 1, example: Int32Array [property0, property1, property2, ...]
gpu.array1d(data: TypedArray): GPUAttribute
// A column with stride > 1, example: Float64Array [x0, y0, x1, y1, x2, y2, ...]
gpu.array2d(data: TypedArray, shape: [number, number]): GPUAttribute
// Constant scalar|vec2|vec3|vec4
gpu.constant(value: number | NumericArray): GPUAttribute
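
As a rough illustration, here is how the creation functions might be used. This is a hedged sketch of the proposed API only; the gpu namespace, the GPUAttribute type, and the assumed [rows, components] shape convention do not exist yet.

import {gpu} from '@luma.gl/gpgpu'; // proposed module, not yet published

// Column with stride 1
const radii = gpu.array1d(new Int32Array([3, 5, 8]));
// Column with stride 2; shape is assumed here to mean [rows, components per row]
const positions = gpu.array2d(new Float64Array([0, 0, 1, 1, 2, 4]), [3, 2]);
// Constant vec4, e.g. a default color
const defaultColor = gpu.constant([255, 140, 0, 255]);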

Reshape: joining, slicing and/or rearranging GPU buffers

// Interleaving multiple columns, example: [x0, x1, x2, ...] + [y0, y1, y2, ...] -> [x0, y0, x1, y1, x2, y2, ...]
gpu.stack(values: GPUAttribute[]): GPUAttribute
// N-D slicing of interleaved buffer, example: [a0, b0, c0, a1, b1, c1, a2, b2, c2, ...] -> [a0, c0, a1, c1, a2, c2, ...]
gpu.slice(value: GPUAttribute, begin: number[], size: number[]): GPUAttribute
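
A sketch of how these could combine separate columns into an interleaved attribute, again using the proposed (not yet existing) API; the begin/size semantics of slice are an assumption borrowed from tf.slice.

// Interleave x/y columns with a constant z into a single position buffer
const x = gpu.array1d(new Float32Array([0, 1, 2]));
const y = gpu.array1d(new Float32Array([4, 5, 6]));
const z = gpu.constant(0);
const xyz = gpu.stack([x, y, z]); // [x0, y0, 0, x1, y1, 0, x2, y2, 0]

// Take only the first component of each row (-1 meaning "all rows" is assumed)
const xOnly = gpu.slice(xyz, [0, 0], [-1, 1]); // [x0, x1, x2]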

Transform: math operations on GPU buffers

// element-wise add
gpu.add(value: GPUAttribute, operand: number | GPUAttribute): GPUAttribute
// min value across dimensions
gpu.min(value: GPUAttribute): number
// map each 64-bit float element to two 32-bit floats, as in highPart = Math.fround(x) and lowPart = x - highPart
gpu.fp64Split(value: GPUAttribute): [highPart: GPUAttribute, lowPart: GPUAttribute]
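
For illustration, a sketch of using these operations together (proposed API, not an existing one):

// Element-wise math and reductions
const values = gpu.array1d(new Float32Array([1.5, 2.5, 3.5]));
const shifted = gpu.add(values, 1);  // [2.5, 3.5, 4.5]
const minValue = gpu.min(values);    // 1.5, read back to the CPU

// Split 64-bit positions into high/low 32-bit parts for precision-safe rendering
const lngLat = gpu.array2d(new Float64Array([-122.45, 37.78]), [1, 2]);
const [high, low] = gpu.fp64Split(lngLat);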

Interface with loaders.gl

There is no direct dependency on loaders.gl, but the module can be made "loaders-friendly" by accepting a Table-shaped input:

gpu.array1d(table: Table, columnNameOrIndex: string | number): GPUAttribute
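
For example, a table parsed by loaders.gl could be passed straight to the creation functions. The load call below is standard loaders.gl; the gpu calls are the proposed API and the column names are placeholders.

import {load} from '@loaders.gl/core';
import {ArrowLoader} from '@loaders.gl/arrow';
import {gpu} from '@luma.gl/gpgpu'; // proposed module

const table = await load('points.arrow', ArrowLoader);

// Reference columns by name or index without materializing row objects in JS
const radius = gpu.array1d(table, 'radius_value');
const colorValue = gpu.array1d(table, 2); // by column index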

Interface with deck.gl

deck.gl could add support for accessors that return a @luma.gl/gpgpu GPUAttribute object. When such an accessor is provided, the underlying GPU buffer is transferred directly instead of the attribute's value array being filled on the CPU.

Sample layer with JSON input:

import {ScatterplotLayer} from '@deck.gl/layers';
import {extent} from 'd3-array';
import {scaleLog} from 'd3-scale';
import memoize from 'memoize';

const getColorScaleMemoized = memoize(
  data => scaleLog()
    .domain(extent(data, d => d.color_value))
    .range([[0, 200, 255, 255], [255, 180, 0, 255]])
);

const layer = new ScatterplotLayer({
  data: 'points.json',

  getPosition: d => [d.x, d.y, d.z],

  getRadius: d => Math.max(Math.min(d.radius_value * 10, 100), 1),

  getFillColor: (d, {data}) => getColorScaleMemoized(data)(d.color_value)
});

Equivalent layer with Arrow input (option A):

import {ScatterplotLayer} from '@deck.gl/layers';
import {gpu} from '@luma.gl/gpgpu';
import {ArrowLoader} from '@loaders.gl/arrow';
import type {Table, ArrowTableBatch} from '@loaders.gl/schema';

const layer = new ScatterplotLayer({
  data: 'points.arrow',
  loaders: [ArrowLoader],

  getPosition: (_, {data}: {data: Table | ArrowTableBatch}) => {
    const x = gpu.array1d(data, 'x');
    const y = gpu.array1d(data, 'y');
    const z = gpu.constant(0);
    return gpu.stack([x, y, z]);
  },

  getRadius: (_, {data}: {data: Table | ArrowTableBatch}) => {
    const value = gpu.array1d(data, 'radius_value');
    return value.mul(10).clipByValue(1, 100);
  },

  getFillColor: (_, {data}: {data: Table | ArrowTableBatch}) => {
    const value = gpu.array1d(data, 'color_value');
    return value.scaleLog([[0, 200, 255, 255], [255, 180, 0, 255]]);
  }
});

Equivalent declarative layer with Arrow input (option B):

{
  "type": "ScatterplotLayer",
  "data": "points.arrow",

  "getPosition": ["x", "y", "z"],

  "getRadius": {
    "source": "radius_value",
    "transform": [
      {"func": "mul", "args": [10]},
      {"func": "clipByValue", "args": [1, 100]}
    ]
  },

  "getFillColor": {
    "source": "color_value",
    "transform": [
      {"func": "scaleLog", "args": [[0, 200, 255, 255], [255, 180, 0, 255]]}
    ]
  }
}
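
One way option B could be resolved internally: each accessor spec maps to a chain of the proposed gpu calls. The sketch below is illustrative only; resolveAccessor, TransformStep and AccessorSpec are hypothetical names, and it assumes transform functions are exposed as methods on GPUAttribute.

import {gpu, GPUAttribute} from '@luma.gl/gpgpu'; // proposed module
import type {Table} from '@loaders.gl/schema';

type TransformStep = {func: string; args: unknown[]};
type AccessorSpec = string | string[] | {source: string; transform?: TransformStep[]};

// Resolve a declarative accessor spec against a table into a GPUAttribute
function resolveAccessor(table: Table, spec: AccessorSpec): GPUAttribute {
  if (Array.isArray(spec)) {
    // ["x", "y", "z"] -> interleave the named columns
    return gpu.stack(spec.map(name => gpu.array1d(table, name)));
  }
  if (typeof spec === 'string') {
    return gpu.array1d(table, spec);
  }
  let value = gpu.array1d(table, spec.source);
  for (const step of spec.transform ?? []) {
    // Assumes each transform function is a method on GPUAttribute
    value = (value as any)[step.func](...step.args);
  }
  return value;
}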

Implementation Considerations

  • It might be appropriate to move the BufferTransform and TextureTransform classes from the engine module to this new module.

  • The module will contain multiple "backends" for WebGL2 and WebGPU. Dynamic import can be used to reduce runtime footprint.

  • Actual GPU resources (shaders/buffers) will need to be lazily allocated and written when the buffer is accessed. This allows a) the JS wrapper to be created without waiting for an available device; b) calculations to be batched for performance instead of running one render pass per JS function call; c) the buffer to be created on the same device where it will be used for rendering:

    gpuAttribute.getBuffer(device: Device): Buffer;
  • Release of no longer needed resources. Consider the following case:

    getPosition: (_, {data}: {data: Table | ArrowTableBatch}) => {
      const x = gpu.array1d(data, 'x'); // intermediate buffer that will not be needed after evaluation
      const y = gpu.array1d(data, 'y'); // intermediate buffer that will not be needed after evaluation
      const z = gpu.constant(0);
      return gpu.stack([x, y, z]); // output buffer that will be used for render
    }

    We could have something similar to tf.tidy(fn), which cleans up all intermediate tensors allocated by fn except those returned by fn (see the sketch after this list).

    Alternatively, we could consider using FinalizationRegistry to clean up intermediate buffers, though the application would have less control over when the cleanup happens (e.g. the standard deck.gl Layer tests would fail due to unreleased WebGL resources).
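
To make the lazy allocation and cleanup ideas above concrete, here is a hedged sketch of what the wrapper and a tidy-style scope could look like. None of these classes or functions exist yet, and the creation functions would need to register new attributes with the tracking set.

import type {Device, Buffer} from '@luma.gl/core';

// Sketch: a GPUAttribute wrapper that defers GPU work until a buffer is requested
class GPUAttribute {
  private buffer: Buffer | null = null;

  constructor(private readonly compute: (device: Device) => Buffer) {}

  // Lazily allocate/write on the device where the result will be rendered
  getBuffer(device: Device): Buffer {
    if (!this.buffer) {
      this.buffer = this.compute(device);
    }
    return this.buffer;
  }

  release(): void {
    this.buffer?.destroy();
    this.buffer = null;
  }
}

// Sketch: tf.tidy-like scope that releases intermediates but keeps what fn returns.
// gpu.array1d/array2d/constant/stack/... would add each new attribute to this set.
const liveAttributes = new Set<GPUAttribute>();

function tidy<T extends GPUAttribute>(fn: () => T): T {
  const before = new Set(liveAttributes);
  const result = fn();
  for (const attribute of liveAttributes) {
    if (!before.has(attribute) && attribute !== result) {
      attribute.release();
      liveAttributes.delete(attribute);
    }
  }
  return result;
}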

Discussion

  • Do we want to use an existing external library instead of rolling our own?

    First of all, I have not conducted an extensive investigation of existing offerings, so additional comments on this are very welcome. Based on my own experience, the main pain point (with a long maintenance tail) is context sharing (required for deck.gl to reuse the output GPU buffer without reading it back to the CPU).

    • tensorflow.js: a proof-of-concept is available here. It is very mature, with a large user base, cross-platform presence, and a variety of backend implementations (WebGL, WebGPU, WebAssembly). The library itself is fairly heavyweight (> 1 MB minified) with extra machine-learning functionality, though the size could likely be reduced if we redistribute a tree-shaken bundle. Forcing it to use an external WebGL context is painful because the context state handoff is not clean.
    • gpu.js: the ability to write JavaScript functions that get translated to shader code is very appealing. However, the library has not been updated for 2 years and I doubt it will gain WebGPU support.
  • TBD

@ibgreen @felixpalmer @donmccurdy
