Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EncodedChunks confusingly provide access to constructed ".data" -- despite the decoder making an internal copy. #80

Closed
dalecurtis opened this issue Sep 22, 2020 · 9 comments · Fixed by #174
Assignees
Labels
breaking Interface changes that would break current usage (producing errors or undesired behavior).

Comments

@dalecurtis
Copy link
Contributor

As folks are starting to play with the API, at least one person has hit a footgun where they modified the encoded chunk after sending it to the decoder. We should just copy before control returns to JS to avoid this.

Obviously this prevents a zero copy on encoded data path, but generally copying the input data isn't expensive.

@koush
Copy link

koush commented Sep 22, 2020

I am that footgun user.
In my head, EncodedVideoChunk seemed like a simple dictionary object (which is why I modified post creation), but I guess more is happening under the hood there.

This was before I had thought the data had been passed to the decoder (via the decode) call: simply constructing the EncodedVideoChunk object with values has side effects.

@sandersdan
Copy link
Contributor

sandersdan commented Sep 22, 2020

There is potential confusion either way. One solution is to be more verbose; instead of an EncodedChunk constructor we could have a static EncodedChunk::CopyFrom(). We can also remove the .data member and provide a copyInto() method (like Plane has), and that would remove a copy in the Chrome implementation.

It does mean that JS codec implementations would have to copy their chunks at least once, but that is likely worth the cost.

@dalecurtis dalecurtis changed the title Decoder implementations should always copy the encoded chunks before returning control to JS. EncodedChunks confusingly provide access to constructed ".data" -- despite the decoder making an internal copy. Sep 22, 2020
@tidoust
Copy link
Member

tidoust commented Sep 23, 2020

Regarding memory copies, I note that ongoing discussions in the workshop on Machine Learning on the Web suggest that, in the case of ML inference applied to media frames, memory copies throughout the processing pipeline (fetch, decoding, processing, rendering) may take as much time as the ML inference itself. See the Memory copies issue for details.

I'm proposing a dedicated virtual breakout session during TPAC breakout week (26-30 October) on Memory copies & zero-copy operations on the Web, so that we can better understand when copies are needed, expensive/inexpensive, etc.

@dalecurtis
Copy link
Contributor Author

Thanks. The one API that would help us avoid this is an immutable ArrayBuffer.

Note: We're only discussing copies of encoded data here, not the raw frames. So in real-time cases I wouldn't expect these copies to go over ~15MBit/s. The copies may slow down faster-than-realtime encoding, but I would still expect them to be smaller than the raw frames used by ML.

@koush
Copy link

koush commented Sep 23, 2020

This was part of the reason I managed to run into this bug: I assumed the underlying pipeline would be zero copy (as that's something I strive towards for performance reasons). Ie, the ArrayBuffer I pass in belongs to the decoder until the call/promise returns.

Similar to Android's MediaCodec's management of input/output buffers that are queued/dequeued.

@chcunningham chcunningham added this to the Q4 2020 milestone Dec 4, 2020
@chcunningham
Copy link
Collaborator

See related issue #104.

One solution is to be more verbose; instead of an EncodedChunk constructor we could have a static EncodedChunk::CopyFrom()

Right now we have 2 copies.

  1. at creation of EncodedChunk
  2. sometime after decode() is called (when the decode is actually processed)

#104 is about eliminating copy number 2. But what if we eliminate both. Rather than EncodedChunk::CopyFrom(), we could define EncodedChunk::TransferTo() (naming needs work), which would transfer the underlying bytes from input buffer, leaving neutering the callers reference.

@sandersdan
Copy link
Contributor

General note: all of our APIs that take ArrayBuffers should probably have a transfer flag or transferList.

Either way we do it, one of the two operations will be eager:

  • Encoder implementation outputs EncodedChunk: ideally we would not yet create an ArrayBuffer for the data.
  • Demuxer creates EncodedChunk: we should transfer an ArrayBuffer to avoid copying. (Note that there may be cases where copying is actually better, EncodedChunks are often small.)

If we can avoid exposing any implementation details of EncodedChunk then it could be generic over different backings. (That would be ArrayBuffer and DecoderBuffer in Chrome.)

@padenot
Copy link
Collaborator

padenot commented May 5, 2021

General note: all of our APIs that take ArrayBuffers should probably have a transfer flag or transferList.

#104, but also #212.

@chcunningham chcunningham added the breaking Interface changes that would break current usage (producing errors or undesired behavior). label May 12, 2021
@chcunningham
Copy link
Collaborator

Triage note: 'breaking', as we are removing an attribute (replaced by a method).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Interface changes that would break current usage (producing errors or undesired behavior).
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants