Early feedback request for the API shape #2 (Open)

bialpio opened this issue Sep 23, 2020 · 17 comments

@bialpio (Contributor) commented Sep 23, 2020

Hey all,

I'd like to ask people to take a look at the initial version of the explainer and let me know if there are any major problems with the current approach (either as a comment under this issue, or by filing a new issue). I'm looking mostly for feedback on API ergonomics and general usage, and on possible challenges in implementing the API on different kinds of hardware, in different browsers, etc.

+@toji, @thetuvix, @grorg, @cabanier, @mrdoob, @elalish

@cabanier (Member):

A couple of first thoughts:

  • Is there a way to directly push this data to the GPU? Having an intermediate JS buffer will create overhead (and more GC). If the data is going to be used by the GPU, it seems more appropriate to surface it there.
  • How is a single depth buffer going to work with stereo? Will each eye have its own depth buffer?

@bialpio (Contributor, Author) commented Sep 24, 2020

Thanks for taking a look! Responses below.

> Is there a way to directly push this data to the GPU? Having an intermediate JS buffer will create overhead (and more GC). If the data is going to be used by the GPU, it seems more appropriate to surface it there.

We can extend the API to expose this capability via a method on XRWebGLBinding that hands out a WebGLTexture directly - would this work here? This way, implementations would be able to implement the API with less overhead if the underlying APIs allow it.
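
For illustration, a rough sketch of how such a binding-level accessor might be used from the app's side (the method and property names are placeholders, not proposed spec text):

```js
// Hypothetical shape: an XRWebGLBinding method that returns depth data for a
// given view, already backed by a WebGLTexture owned by the UA.
// Assumes `session`, `gl`, and `refSpace` are already set up.
const binding = new XRWebGLBinding(session, gl);

function onXRFrame(time, frame) {
  const pose = frame.getViewerPose(refSpace);
  if (pose) {
    for (const view of pose.views) {
      // Placeholder name - the point is that the UA hands back a texture
      // (or an object wrapping one) instead of a CPU-side ArrayBuffer.
      const depthInfo = binding.getDepthInformation(view);
      if (depthInfo) {
        gl.bindTexture(gl.TEXTURE_2D, depthInfo.texture);
        // ... sample it in the occlusion shader for this view ...
      }
    }
  }
  frame.session.requestAnimationFrame(onXRFrame);
}
```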

Two things worth calling out about the approach with WebGLTexture:

  • What would be the expected lifetime of the texture? In our initial prototypes with the raw camera access API, we ran into issues related to this. We wanted to fully control the texture's lifetime, which is not really possible when exposing a WebGLTexture, and implementations relying on swap chains / buffer reuse might run into this as well.
  • In our implementation, we'll have to push the data to the GPU ourselves since ARCore does not provide a GL texture for us to use. This is fine, we can call glTexImage2D() in the native code, but it does mean that there's still going to be some overhead.

> How is a single depth buffer going to work with stereo? Will each eye have its own depth buffer?

I'd say each eye can have its own depth buffer - the API already accepts an XRView which is associated with a specific eye, so the implementation can use that to decide what needs to be returned. If it is able to provide stereo depth buffers, it could do so.

Let me know if that makes sense!

@cabanier (Member):

> What would be the expected lifetime of the texture?

The layers spec defines a texture that is only valid during the rAF call where it was created: https://immersive-web.github.io/layers/#xropaquetextures
Maybe you could use something similar? I'd love to see other specs use this texture type so we can all work on the proper definition.
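
For illustration, what that lifetime model would mean for an app, reusing the placeholder accessor from the sketch above (not spec text):

```js
// The texture handed out is only usable inside the rAF callback in which it
// was obtained, analogous to the opaque textures in the layers spec.
function onXRFrame(time, frame) {
  const depthInfo = binding.getDepthInformation(view); // valid this frame only
  gl.bindTexture(gl.TEXTURE_2D, depthInfo.texture);    // OK: consume it now
  // stashedTexture = depthInfo.texture;               // not OK: stale next frame
  // gl.deleteTexture(depthInfo.texture);              // not OK: the UA owns it
  frame.session.requestAnimationFrame(onXRFrame);
}
```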

> In our implementation, we'll have to push the data to the GPU ourselves since ARCore does not provide a GL texture for us to use. This is fine, we can call glTexImage2D() in the native code, but it does mean that there's still going to be some overhead.

You could also do it all in the browser process so you can avoid having to send the data to the render process and back.

> I'd say each eye can have its own depth buffer

Does that mean that devices that have a single physical depth sensor won't be able to support this spec? Or would they have to reproject the depth map for each eye?

@bialpio (Contributor, Author) commented Sep 25, 2020

> The layers spec defines a texture that is only valid during the rAF call where it was created: https://immersive-web.github.io/layers/#xropaquetextures
> Maybe you could use something similar? I'd love to see other specs use this texture type so we can all work on the proper definition.

Ooh, thanks for the pointer! I think this is effectively what we currently do in our prototype implementation of the raw camera access API, but I was concerned about exposing the data as a WebGLTexture since that seemed to me like a potential footgun - for example, users could still try and call gl.deleteTexture() on them. Maybe we can try to work around this by providing very detailed diagnostics for the cases where the textures are not used correctly.

> You could also do it all in the browser process so you can avoid having to send the data to the render process and back.

Good point - if we do not care about exposing the data on the CPU at all, this is definitely something we should do. The question then is: should we focus only on the GPU use case? From what I've seen, the data we get from ARCore is detailed enough that leveraging it on the CPU for physics is probably going to be beneficial, but for GPU use cases (occlusion), it won't be good enough.
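
As a rough illustration of that CPU use case - the buffer layout here (row-major Uint16Array of millimeter values plus width/height) is an assumption for the example, not the explainer's normative format:

```js
// Hypothetical CPU-side sampling, e.g. to feed a physics query.
function depthInMetersAt(depthInfo, column, row) {
  const values = new Uint16Array(depthInfo.data);       // assumed layout
  const millimeters = values[row * depthInfo.width + column];
  return millimeters / 1000;
}

// e.g. check whether real geometry at the center of the depth map is closer
// than one meter before dropping a virtual object there.
const centerDepth = depthInMetersAt(depthInfo,
                                    depthInfo.width >> 1,
                                    depthInfo.height >> 1);
const blocked = centerDepth < 1.0;
```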

> Does that mean that devices that have a single physical depth sensor won't be able to support this spec? Or would they have to reproject the depth map for each eye?

Either reproject the depth map, or expose an additional XRView that will only be used to obtain that single depth map. In that case, it may be better to expose the more sensor-oriented features via something other than XRView (the raw camera access API also happens to need such a mechanism, so we can think about the best way to expose those kinds of features).

@cabanier (Member):

> users could still try and call gl.deleteTexture() on them. Maybe we can try to work around this by providing very detailed diagnostics for the cases where the textures are not used correctly.

deleteTexture should indeed not be allowed. I will update the spec.
Can you think of other calls that should be disallowed?

> Does that mean that devices that have a single physical depth sensor won't be able to support this spec? Or would they have to reproject the depth map for each eye?

> Either reproject the depth map, or expose an additional XRView that will only be used to obtain that single depth map. In that case, it may be better to expose the more sensor-oriented features via something other than XRView (the raw camera access API also happens to need such a mechanism, so we can think about the best way to expose those kinds of features).

I suspect the system would have to do the reprojection because it would be a big burden on the author to do this correctly.
Do you know of any stereoscopic systems that expose a depth map?

@bialpio (Contributor, Author) commented Sep 25, 2020

> users could still try and call gl.deleteTexture() on them. Maybe we can try to work around this by providing very detailed diagnostics for the cases where the textures are not used correctly.

> deleteTexture should indeed not be allowed. I will update the spec.
> Can you think of other calls that should be disallowed?

No, gl.deleteTexture() is what I was mostly concerned about.

> Does that mean that devices that have a single physical depth sensor won't be able to support this spec? Or would they have to reproject the depth map for each eye?

> Either reproject the depth map, or expose an additional XRView that will only be used to obtain that single depth map. In that case, it may be better to expose the more sensor-oriented features via something other than XRView (the raw camera access API also happens to need such a mechanism, so we can think about the best way to expose those kinds of features).

> I suspect the system would have to do the reprojection because it would be a big burden on the author to do this correctly.

Won't this depend on the intended usage of the depth map? If the goal is to perform some simple environment reconstruction based on depth map data, I imagine the authors will already need to know what they are doing when trying to leverage the data we return.

> Do you know of any stereoscopic systems that expose a depth map?

Unfortunately, I do not - I've seen articles about HoloLens Research Mode that referred to depth cameras, but I did not see a way to access the data through the API. FWIW, based on the documentation available for ARCore, it seems that an RGB camera is sufficient to provide a (maybe less precise) depth map, so maybe it's something that could be made available on stereoscopic devices with multiple cameras as well.

@bialpio (Contributor, Author) commented Dec 1, 2020

A call to action for potential implementers of the depth-sensing API! Please take a look at the issues linked above, the explainer, and PR #8 with an early draft of the spec (it should hopefully contain most of the relevant details), taking extra care to ensure that the API can be implemented in other browsers!

Other things that will be adjusted that I am tracking:

@bialpio (Contributor, Author) commented Dec 2, 2020

/agenda for visibility & to ensure that the initially proposed API shape is implementable for other form factors

probot-label bot added the agenda label Dec 2, 2020

@nbutko commented Dec 8, 2020

For texture lifetime, consider a flow where the user allocates a texture, and receives data in it through texImage2D.

@nbutko commented Dec 8, 2020

In the explainer, it looks like the proposal is to use texImage2D to upload a buffer that's in CPU memory. The GPU→CPU→GPU round trip could be minimized by creating a new source type for texImage2D, similar to gl.texImage2D(target, level, internalformat, format, type, HTMLVideoElement? pixels); or gl.texImage2D(target, level, internalformat, format, type, HTMLCanvasElement? pixels);.
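
Roughly, that suggestion amounts to an overload along these lines (hypothetical - no such source type exists today), which also fits the app-allocated-texture flow from the previous comment:

```js
// Hypothetical: XRDepthInformation accepted as a texImage2D source, mirroring
// the existing HTMLVideoElement / HTMLCanvasElement overloads. The app
// allocates and owns the texture; the UA fills it without a JS-visible copy.
const depthTexture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, depthTexture);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.R16UI,        // internal format TBD
              gl.RED_INTEGER, gl.UNSIGNED_SHORT,
              depthInfo);                         // new source type
```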

@kdashg commented Dec 8, 2020

We really should have a texImage2D(XRDepthInformation) overload, and make the CPU-data path optional (maybe Promise<Uint16Array> getData()?).

@kdashg commented Dec 8, 2020

In particular, having to go through the CPU data path means a couple of extra copies:

  1. getDepthInformation: 1–2 copies
     • A: (maybe) a readback from GPU to CPU
     • B: a copy from the privileged process to the content process
  2. texImage2D(ArrayBufferView): 2 copies
     • A: a copy from the content process to the privileged process
     • B: an upload to the GPU resource

@kdashg commented Dec 8, 2020

There are some concerns about how to upload this to WebGL, but I think DEPTH_COMPONENT16 would always work and would be great for this, allowing both direct sampling of values (through the R channel) and depth-sampling operations, which I can imagine might be desirable here. ;)
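
For reference, a WebGL2 upload along those lines, assuming the raw values arrive as a Uint16Array (illustration only):

```js
// Upload 16-bit depth values as DEPTH_COMPONENT16. The texture can then be
// sampled directly (depth lands in the R channel) or used with
// TEXTURE_COMPARE_MODE for hardware depth-comparison sampling.
const tex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, tex);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.DEPTH_COMPONENT16,
              width, height, 0,
              gl.DEPTH_COMPONENT, gl.UNSIGNED_SHORT,
              depthValues /* Uint16Array */);
// Depth textures are not filterable, so stick to NEAREST.
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
```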

Yonet removed the agenda label Dec 8, 2020

@Maksims commented Dec 8, 2020

> and make the CPU-data path optional (maybe Promise<Uint16Array> getData()?)

An async API here would definitely not be good, as real-time applications need this data synchronously. There is time in background I/O to transfer the data before it is needed by the JS thread, so a synchronous, blocking call should not be an issue.

@thetuvix commented Dec 9, 2020

On HoloLens 2, serving up depth images to apps will involve rendering the world mesh on the GPU, so it would be wasteful for an app that intends to use the data on the GPU (e.g. for occlusion) to move the images from GPU to CPU and back to GPU again.

Excited to find the right GPU-first path here for apps that end up using GPU-derived images on the GPU!

@bialpio (Contributor, Author) commented Dec 14, 2020

Please take a look at PR #11; hopefully it addresses the issues we chatted about during the call. Most importantly, it allows the user to specify the desired usage of the data, allowing user agents to do the right thing and minimize the number of data round trips. Additionally, it allows UAs to express their preferred usage and data format, with the intent that if the user picks the preferred settings, they will incur the lowest possible cost.
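
For example, the direction in that PR would let a session request look roughly like this (option and enum names follow the PR and may still change):

```js
// Inside an async function: the app states up front whether it wants the data
// CPU- or GPU-side and which formats it can consume, in preference order, so
// the UA can pick the cheapest path it supports.
const session = await navigator.xr.requestSession('immersive-ar', {
  requiredFeatures: ['depth-sensing'],
  depthSensing: {
    usagePreference: ['gpu-optimized', 'cpu-optimized'],
    dataFormatPreference: ['luminance-alpha', 'float32'],
  },
});

// The UA reports what it actually picked, and the app branches on that.
console.log(session.depthUsage, session.depthDataFormat);
```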

@wischi-chr:

I took a look at the current API and couldn't find anything about "raw depth" information - I'm not sure if this is the right place to ask/give feedback.

Currently the reported depth values seem to be smoothed, which might be good in a lot of cases, but in situations where the device (like a smartphone) is used to 3D-scan an environment (just collect data and post-process later), it would be great if there were an option to report raw depth (and confidence, if available).

I haven't thought about what a good API for that would look like, but IMO it would certainly be good to have that option, because currently the only way to do that is to use native ARCore (or ARKit).
