Eliminate GraphExecutionContext #43

@abrown

This issue proposes a simplification to the wasi-nn API: eliminate the GraphExecutionContext state object altogether and instead pass all tensors to and from a single inference call, e.g. compute(list<tensor>) -> result<list<tensor>, ...>. This change would make set_input and get_output unnecessary, so they would be removed as well.
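
Roughly, the proposed shape could look like the following WIT sketch; the type definitions here are simplified placeholders, not the actual wasi-nn definitions:

    // Illustrative sketch only; the tensor, error, and graph definitions
    // below are simplified stand-ins for the real wasi-nn types.
    interface inference {
        record tensor {
            dimensions: list<u32>,
            data: list<u8>
        }

        variant error {
            runtime-error(string)
        }

        // Opaque handle to an already-loaded model.
        type graph = u32;

        // A single call performs the whole inference:
        // all input tensors go in, all output tensors come back.
        compute: func(graph: graph, inputs: list<tensor>) -> result<list<tensor>, error>;
    }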

As background, the WITX IDL is the reason GraphExecutionContext exists. As I understood it when wasi-nn was originally designed, WITX forced us to pass an empty "pointer + length" buffer across to the host so that the host could fill it. This led to get_output(...), which included an index parameter for retrieving tensors from multi-output models (multiple outputs are a possibility that must be handled, though not too common). And because get_output was now separate from compute, we needed some state to track the inference request: GraphExecutionContext.
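
For contrast, the current flow looks roughly like this (simplified from memory; the exact signatures differ, and in WITX get_output actually fills a caller-provided buffer rather than returning a tensor):

    // Roughly the current shape, simplified; exact wasi-nn signatures differ.
    init-execution-context: func(graph: graph) -> result<graph-execution-context, error>;
    set-input: func(ctx: graph-execution-context, index: u32, tensor: tensor) -> result<_, error>;
    compute: func(ctx: graph-execution-context) -> result<_, error>;
    get-output: func(ctx: graph-execution-context, index: u32) -> result<tensor, error>;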

Now, with WIT, we can expect the ABI to allow the host to allocate into our WebAssembly linear memory for us. This is better in two ways:

  • with the original "pass a buffer to get_output(...)" approach, the user had to statically know how large that buffer should be; this led to thoughts about expanding the API with some introspection (feature: describe graph inputs and outputs #37); if we replace GraphExecutionContext with a single compute call, this is no longer a user-facing paper cut
  • the API would now be simpler: if compute accepts and returns all the necessary tensors, then GraphExecutionContext, set_input, and get_output can all be removed, making the API easier to explain and use

One consideration here is ML framework compatibility: some frameworks (e.g., OpenVINO) expose an equivalent to GraphExecutionContext in their external API that must be called by implementations of wasi-nn. But because this context object can be created inside the implementation, there is no compatibility issue: implementations of compute will simply do a bit more per call, but no more work overall than they do today.

Another consideration is memory copying overhead: will WIT force us to copy tensor bytes across the guest-host boundary in both directions? Tensors can be large, and additional copies could be expensive. For output tensors, this may be unavoidable: when a tensor is generated on the host side during inference, it must somehow be made accessible to the Wasm guest, and copying is a simple solution. For input tensors, though, this discussion suggests that there may be no WIT-inherent limitation forcing the copy. If tensor copying becomes a bottleneck, perhaps WIT resources could be the solution.
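
For example, one hypothetical direction (illustrative only, not part of this proposal) would be to make tensor a WIT resource, so compute could exchange handles rather than byte buffers and defer any copies until the data is actually read:

    // Hypothetical sketch: tensor as a host-owned resource; names and
    // signatures are illustrative, not the actual wasi-nn definitions.
    resource tensor {
        constructor(dimensions: list<u32>, data: list<u8>);
        dimensions: func() -> list<u32>;
        // The bytes cross the guest-host boundary only when requested.
        data: func() -> list<u8>;
    }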
