Description
This issue proposes a simplification to the wasi-nn API: eliminate the `GraphExecutionContext` state object altogether and instead simply pass all tensors to and from a single inference call, `compute(list<tensor>) -> result<list<tensor>, ...>`. This change would make `set_input` and `get_output` unnecessary, so they would be removed as well.
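For concreteness, the proposed shape might look something like the following WIT sketch. This is only an illustration, not a concrete proposal: the `tensor` and `error` definitions are simplified placeholders, and how the graph itself is identified (e.g., an extra handle parameter or a method on `graph`) is left open.

```wit
// Illustrative sketch only, not a concrete proposal; `tensor` and `error`
// are simplified stand-ins for wasi-nn's real definitions.
interface inference {
  record tensor {
    dimensions: list<u32>,
    data: list<u8>
  }

  enum error {
    invalid-argument,
    runtime-error
  }

  // One call carries every input tensor in and every output tensor out;
  // no execution-context state needs to survive between calls.
  compute: func(inputs: list<tensor>) -> result<list<tensor>, error>;
}
```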
As background, the WITX IDL is the reason `GraphExecutionContext` exists. As I understood it when wasi-nn was originally designed, WITX forced us to pass an empty "pointer + length" buffer across to the host so that the host could fill it. This led to `get_output(...)`, which included an index parameter for retrieving tensors from multi-output models (multiple outputs are a possibility that must be handled, though not too common). And because `get_output` was now separate from `compute`, we needed some state to track the inference request: `GraphExecutionContext`.
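Roughly speaking, the shape that fell out of those constraints looks like this. This is a WIT-style paraphrase for illustration only, not the exact current definitions; the `graph` handle, `tensor`, and `error` types are simplified.

```wit
// WIT-style paraphrase of the current stateful shape; names and types are
// simplified for illustration (`tensor` and `error` as in the sketch above).
interface inference {
  // Opaque handle whose only job is to carry state between the calls below.
  type graph-execution-context = u32;

  init-execution-context: func(graph: u32) -> result<graph-execution-context, error>;
  // inputs are attached one at a time, by index...
  set-input: func(ctx: graph-execution-context, index: u32, input: tensor) -> result<_, error>;
  // ...then inference runs...
  compute: func(ctx: graph-execution-context) -> result<_, error>;
  // ...and each output is fetched by index (under WITX, into a caller-provided
  // "pointer + length" buffer that had to be sized up front).
  get-output: func(ctx: graph-execution-context, index: u32) -> result<tensor, error>;
}
```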
Now, with WIT, we can expect the ABI to host-allocate into our WebAssembly linear memory for us. This is better in two ways:
- with the original "pass a buffer to `get_output(...)`" approach, the user had to statically know how large that buffer should be; this led to thoughts about expanding the API with some introspection (feature: describe graph inputs and outputs #37); if we replace `GraphExecutionContext` with a single `compute` call, this is no longer a user-facing paper cut
- the API would now be simpler: if `compute` accepts and returns all the necessary tensors, then `GraphExecutionContext`, `set_input`, and `get_output` can all be removed, making the API easier to explain and use
One consideration here is ML framework compatibility: some frameworks (e.g., OpenVINO) expose an equivalent to `GraphExecutionContext` in their external API that must be called by implementations of wasi-nn. But, because this context object can be created inside the implementation, there is no compatibility issue: implementations of `compute` will simply do a bit more per call than they do today, but no more work overall.
Another consideration is memory-copying overhead: will WIT force us to copy the tensor bytes across the guest-host boundary in both directions? Tensors can be large, and additional copies could be expensive. For output tensors, this may be unavoidable: when a tensor is generated on the host side during inference, it must somehow be made accessible to the Wasm guest, and copying is a simple solution. For input tensors, though, this discussion might suggest that there is no WIT-inherent limitation forcing a copy. If tensor copying becomes a bottleneck, perhaps WIT resources could be the solution.
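For instance, a resource-based variant along these lines could keep tensor bytes on the host side and move only handles across the boundary. This is purely a hypothetical sketch, not something wasi-nn defines today:

```wit
// Hypothetical sketch: tensor data lives host-side behind a resource handle,
// so `compute` passes handles across the boundary rather than raw bytes.
interface inference {
  resource tensor {
    // Build a host-side tensor from guest bytes (one copy in).
    constructor(dimensions: list<u32>, data: list<u8>);
    // Copy bytes back out only when the guest actually needs them.
    data: func() -> list<u8>;
  }

  // `error` as in the sketches above.
  compute: func(inputs: list<borrow<tensor>>) -> result<list<tensor>, error>;
}
```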