compress vertex color data to [u8;4] #49
base: master
Conversation
I just read your sections on HDR, and to clarify: this only affects the colors you supply to tint sprites. All the texture data and anything in post-processing is the same as before, so scene output should still be in HDR.
Thank you for this! It's a good example of an optimization I probably never would've thought of myself. There is a slight problem with this in that it breaks using tint colors as a sort of HDR emissive boost. Here's a simple example:

```rust
simple_game!("Nice red circle", update);

fn update(_c: &mut EngineContext) {
    game_config_mut().bloom_enabled = true;
    draw_circle(vec2(0.0, 0.0), 0.5, RED * 5.0, 0);
}
```

with the result being a glowing red circle. This is something that is very useful for things like lasers (we use it in nanovoid and a few other places), where you can make things glow just by boosting the color. Interestingly, it seems that different engines do this differently. I haven't double checked, but IIRC in Godot you can specify RGBA as float values, while in Unity there's an "intensity" multiplier and the color remains as 0-255 RGB?

I'm not sure what the best solution here is, honestly. HDR colors are definitely not something used on every sprite, so it would make sense to not have to pay the overhead. On the other hand, I would like to avoid having to jump through hoops whenever the user wants this, as I tend to use it quite a bit in random places. Feels like there are two possible options:
The nice thing about the second solution is that it keeps the renderer simpler at a small cost to performance, but it's not clear to me what the user-facing API should look like. If Comfy had user-defined vertex attributes & shaders this would be a lot simpler, though. I'm not sure how much work it would be to add that; my gut feeling says that maybe it wouldn't be too bad.

I'd be happy to talk about specifics of optimizing the sprite rendering in other ways. If you want, feel free to join the Discord linked in the repo (I'm online basically daily), or we can just discuss it here if you prefer.
Hmm yeah, I've never worked with HDR (don't have the hardware) - I mostly figured it was a matter of precision in the floats and things were still in the 0-1 range, but it sounds like you know more than me there. I think your approach of adding an extra float that controls intensity on top is not bad. I imagine there is some way to approximate an HDR color by deriving an average intensity from the color provided by the user, the only downside being that intensity is shared between all color channels instead of each having its own.

In fact (wgpu specifics barred) it probably won't incur much if any extra cost on the GPU side. At a driver level, my understanding is that the GPU only accepts vertex data in increments of 4 floats (16 bytes), so in a case like this each set of vertex data ends up getting padded with 8 bytes of empty data right now. So there is some cost in line with sending a bit of extra data, but it also saves the GPU some work. Lots of weird tradeoffs!

The REAL path to both optimizing this and supporting the HDR colorspace better would be to avoid sending sprite colors into the vertex buffer at all. Each vertex doesn't really need its own color - since you tint the whole sprite, it's all being duplicated 4x. There are plenty of ways to do this, but I think a simple one is having sprite tints in an SSBO, with only one per sprite, and using the vertex ID in the vertex shader to sample that array. It would be up to you with that SSBO whether you want HDR colors or not, but it would be faster to still skip them there (I could imagine a feature flag switching that). It's a bit more invasive logic (and end users would have to replicate it in their custom shaders when you support that).

Anyways, I'd be glad to join the Discord and chat more this evening - I checked the link last night but was hesitant because it seemed like it wasn't for the engine itself (it's pretty large!)
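To make the per-sprite tint idea concrete, here's a rough Rust-side sketch of the data layout (`SpriteVertex` and the quad-building helper are made up for illustration, not Comfy's actual types): the vertex buffer carries only position and UVs, while tints live in a separate per-sprite array that the vertex shader would index with `vertex_index / 4`.

```rust
use bytemuck::{Pod, Zeroable};

// Hypothetical slimmed-down vertex: no color, just position + UV.
#[repr(C)]
#[derive(Clone, Copy, Pod, Zeroable)]
struct SpriteVertex {
    position: [f32; 3],
    tex_coords: [f32; 2],
}

// One tint per sprite instead of one per vertex. Uploaded as a
// storage buffer (SSBO); the vertex shader would recover the sprite
// index as `vertex_index / 4u` and read `sprite_tints[index]`.
fn build_buffers(sprites: &[([f32; 3], [f32; 4])]) -> (Vec<SpriteVertex>, Vec<[f32; 4]>) {
    let mut vertices = Vec::with_capacity(sprites.len() * 4);
    let mut tints = Vec::with_capacity(sprites.len());
    for (pos, tint) in sprites {
        // Four corner vertices per quad, none of them carrying color.
        for (dx, dy) in [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)] {
            vertices.push(SpriteVertex {
                position: [pos[0] + dx, pos[1] + dy, pos[2]],
                tex_coords: [dx + 0.5, dy + 0.5],
            });
        }
        tints.push(*tint); // 16 bytes per sprite instead of 64 per quad
    }
    (vertices, tints)
}
```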
HDR, at least in Comfy, is not about having HDR output on the monitor though :) It gets picked up by the bloom shader, where higher HDR values result in more glow, and it ends up being tonemapped to 0-1 to be displayed normally. One of the main benefits of working in HDR is that your lighting won't clip and things just kind of naturally add up together, can influence other effects (like bloom), and then get blended back together via tonemapping.
I'm not sure if I understand correctly, but in a sense this is what tonemapping does at the end. The problem with not having HDR in the pipeline is that values end up clipping when they're affected by lighting. This can of course be managed by setting up your lighting in a way where it won't clip, but literally the #1 reason Comfy exists is because macroquad didn't support F16 textures and thus HDR workflows, and would force the user to carefully manage their lighting setup, while in HDR all of this works out naturally.

I may be oversimplifying it, because "just manage your lights" is not even enough. Consider a sprite which has near-white or white colors on it, and then you add a light source. In HDR this is not a problem at all. Without HDR the resulting color is unrepresentable, and you either have to scale down all your sprites with something like an ambient color multiplier, or accept clipping.
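A tiny numeric sketch of the difference (using Reinhard tonemapping as a stand-in; Comfy's actual tonemapper may differ):

```rust
fn main() {
    // A near-white sprite texel boosted by a light source.
    let texel = [0.9f32, 0.9, 0.9];
    let light = 1.5f32;

    // Without HDR the intermediate 1.35 clips to 1.0 in every channel,
    // losing any variation between bright pixels.
    let ldr: Vec<f32> = texel.iter().map(|c| (c * light).min(1.0)).collect();
    assert_eq!(ldr, vec![1.0, 1.0, 1.0]);

    // With HDR the 1.35 survives the pipeline (bloom can pick it up),
    // and only at the end is it tonemapped, e.g. Reinhard x / (1 + x).
    let hdr: Vec<f32> = texel.iter().map(|c| (c * light) / (1.0 + c * light)).collect();
    println!("{hdr:?}"); // ~[0.574, 0.574, 0.574], nothing clipped
}
```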
This makes a lot of sense to me. I think it also ties back into needing more than one vertex attribute layout, because one might want this for, say, particle trails with gradients, where having different vertex colors gives you a gradient for free (see the sketch below) - but of course this is silly overhead for every single sprite.
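For illustration, here's what "gradient for free" means in practice - a hypothetical `ColorVertex` (not Comfy's type) where the GPU's attribute interpolation does all the work:

```rust
// Hypothetical vertex type with a per-vertex color attribute.
struct ColorVertex {
    position: [f32; 2],
    color: [f32; 4],
}

// A trail segment that fades out for free: the GPU interpolates the
// per-vertex colors across the quad, so the tail end is transparent.
fn trail_quad(head: [f32; 2], tail: [f32; 2], half_width: f32) -> [ColorVertex; 4] {
    let opaque = [1.0, 0.2, 0.2, 1.0];
    let faded = [1.0, 0.2, 0.2, 0.0];
    [
        ColorVertex { position: [head[0], head[1] - half_width], color: opaque },
        ColorVertex { position: [head[0], head[1] + half_width], color: opaque },
        ColorVertex { position: [tail[0], tail[1] + half_width], color: faded },
        ColorVertex { position: [tail[0], tail[1] - half_width], color: faded },
    ]
}
```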
This might be a little problematic, because AFAIK SSBOs aren't supported in WebGL. I'm not 100% certain, but I think dynamically sized buffers (basically SSBOs) aren't possible in wgpu when targeting WASM, and the only solution is to use fixed-size uniform buffers (UBOs in GL?). Comfy does this at the moment for point lights, where we have a static 128-light buffer: https://github.com/darthdeus/comfy/blob/master/comfy-wgpu/shaders/sprite.wgsl#L15-L18. It's quite possible there's a better way to do this though; it's not something I looked at too deeply.
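The Rust side of such a fixed-size buffer might look roughly like this (field names are a guess for illustration, not the actual layout from sprite.wgsl):

```rust
use bytemuck::{Pod, Zeroable};

// Uniform buffers must have a size known at pipeline-creation time,
// so instead of a dynamically sized SSBO we reserve a fixed array
// and pass the live count alongside it.
#[repr(C)]
#[derive(Clone, Copy, Pod, Zeroable)]
struct PointLight {
    position: [f32; 2],
    _pad: [f32; 2], // std140-style 16-byte alignment padding
    color: [f32; 4],
}

#[repr(C)]
#[derive(Clone, Copy, Pod, Zeroable)]
struct LightUniform {
    lights: [PointLight; 128], // static upper bound, like Comfy's light buffer
    num_lights: u32,
    _pad: [u32; 3],
}
```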
This is vertex pulling, right? Just checking because I keep hearing about it but have never used it :)
I feel like we'll have to experiment with this quite a bit and maybe consider a few different layers of abstraction. Ideally I'd want the core Comfy renderer to be flexible enough to allow different vertex attribute layouts, especially since the goal is to support simple 3D in the not-so-distant future (realistically after the current roadmap is done). At a very high level I'm imagining something maybe closer to miniquad than what we have now, with the "default sprite rendering logic" implemented on top. That should also allow us to have a much better system for particles, trails, etc.
It's a bit chaotic at the moment because I didn't want to create a second server just for Comfy, but there is a dedicated channel right now :) We'll add some "channel selection roles" so that people who only care about Comfy would only see those channels, and people who care about the games would see those instead.
For this I was specifically referring to including an intensity field in sprite drawing - instead of a user-supplied intensity you could derive an intensity from the color they pass in, basically picking an intensity for them so they don't have to worry about it. I think I have a better picture of how your HDR setup works now though, too haha
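Something along these lines is what I mean - split a possibly-HDR color into a 0-1 base color plus a derived intensity. Just a sketch; the exact split (max channel here) is up for debate:

```rust
/// Splits an HDR color into an LDR base color and a scalar intensity,
/// so the base can still be packed as [u8; 4] while the intensity rides
/// along separately. The intensity is shared across all channels, which
/// is the downside mentioned above.
fn split_hdr(color: [f32; 4]) -> ([u8; 4], f32) {
    let max = color[0].max(color[1]).max(color[2]).max(1.0);
    let quantize = |c: f32| (c / max * 255.0).round() as u8;
    (
        [
            quantize(color[0]),
            quantize(color[1]),
            quantize(color[2]),
            (color[3].clamp(0.0, 1.0) * 255.0).round() as u8,
        ],
        max, // the shader multiplies the unpacked color by this
    )
}

fn main() {
    // RED * 5.0 from the earlier example: the base stays pure red,
    // the intensity carries the 5x emissive boost.
    let (base, intensity) = split_hdr([5.0, 0.0, 0.0, 1.0]);
    assert_eq!(base, [255, 0, 0, 255]);
    assert_eq!(intensity, 5.0);
}
```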
https://webgl2fundamentals.org/webgl/lessons/webgl-pulling-vertices.html

Vertex pulling is basically doing what instancing does, just manually in your vertex shader, so it would also be possible to fall back to instancing, which is supported in WebGL2. At first I avoided instancing after hearing it's less efficient for small meshes like quads (there is a minimum number of compute units the GPU dispatches per mesh, which isn't ideal for 4 vertices), but I've found in practice that isn't a limiting factor.

I've found that once you get rid of the CPU bottleneck, the next biggest thing to improve on the GPU is simply the number of pixels painted by the fragment shader, i.e. the physical on-screen size of the geometry. Once you reach that point, the only real thing you can do to improve performance is scale sprites down (not textures, just the size you show them on screen), which probably isn't very useful to do hahaha

For reference, in my own testing the approaches form a rough performance hierarchy. I think it depends on your overall goal, but I could see a situation where you pick whichever of the 3 suits the project best and simply fall back to plain meshes when platform compatibility is an issue. Given that you roll your own world/physics system and a number of other things, that will probably supersede the difference between vertex pulling and instancing (and geometry shaders aren't supported in wgpu, if I recall?), so you could really go with whatever is ergonomic. I simply suggested pulling because it would require the smallest changes to your current setup.
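As a sketch of what the instancing fallback looks like in wgpu (illustrative, not Comfy's actual pipeline code): the per-sprite color moves into a second vertex buffer that advances once per instance instead of once per vertex.

```rust
// Two vertex buffer layouts: buffer 0 steps per vertex (the shared
// quad geometry), buffer 1 steps per instance (one color per sprite).
fn buffer_layouts() -> [wgpu::VertexBufferLayout<'static>; 2] {
    [
        wgpu::VertexBufferLayout {
            array_stride: 20, // pos [f32; 3] + uv [f32; 2]
            step_mode: wgpu::VertexStepMode::Vertex,
            attributes: &[
                wgpu::VertexAttribute {
                    format: wgpu::VertexFormat::Float32x3,
                    offset: 0,
                    shader_location: 0,
                },
                wgpu::VertexAttribute {
                    format: wgpu::VertexFormat::Float32x2,
                    offset: 12,
                    shader_location: 1,
                },
            ],
        },
        wgpu::VertexBufferLayout {
            array_stride: 4, // one packed [u8; 4] color per sprite
            step_mode: wgpu::VertexStepMode::Instance,
            attributes: &[wgpu::VertexAttribute {
                // Unorm8x4: unpacked to a 0-1 vec4<f32> in the shader.
                format: wgpu::VertexFormat::Unorm8x4,
                offset: 0,
                shader_location: 2,
            }],
        },
    ]
}
```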
Hi, I saw your chat on Reddit when you first released Comfy, talking about bunnymark performance, and now that I've seen your official benchmark I thought I would try to contribute an improvement.
This changes vertices to use a [u8; 4] with RGBA ranging from 0-255, rather than a [f32; 4], to represent color data - the GPU automatically reconstructs it as a vec4 in the shader, so no shader changes are needed. The main benefit of doing this is decreasing the bandwidth for transferring vertex data from system memory to GPU memory: since color data has to be repeated for each vertex, it adds a lot of bulk to the vertex buffer.
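In wgpu terms the change amounts to swapping the color attribute's vertex format (the field names and offsets here are illustrative, not necessarily Comfy's exact layout): `Unorm8x4` tells the GPU to unpack the four bytes into a normalized 0-1 `vec4<f32>`, so the shader sees the same type as before. Per vertex the color shrinks from 16 bytes to 4, saving 48 bytes on every 4-vertex quad.

```rust
use bytemuck::{Pod, Zeroable};

#[repr(C)]
#[derive(Clone, Copy, Pod, Zeroable)]
struct SpriteVertex {
    position: [f32; 3],   // 12 bytes
    tex_coords: [f32; 2], // 8 bytes
    color: [u8; 4],       // 4 bytes, was [f32; 4] = 16 bytes
}

// The matching attribute: Unorm8x4 instead of Float32x4. The shader
// still receives a vec4<f32> in 0.0..=1.0, so no WGSL changes needed.
const COLOR_ATTRIBUTE: wgpu::VertexAttribute = wgpu::VertexAttribute {
    format: wgpu::VertexFormat::Unorm8x4,
    offset: 20, // right after position + tex_coords
    shader_location: 2,
};
```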
For 2D tests like a bunnymark, the vertex buffer size is the biggest bottleneck for most frameworks (even before game/engine logic), so cutting it down helps a lot. Most bunnymarks out there are CPU-bottlenecked because uploading all that data is a blocking operation. Based on your comfymark target of 50 FPS, on my system (Linux, R9 5900X, RTX 2070) this change brought the comfymark from being capable of ~45,000 up to ~65,000.
I am not super familiar with the inner workings of wgpu (I'm most familiar with pure OpenGL) or anything specific you do in Comfy. It does seem like your framework's entity/world concept probably creates some overhead, which is to be expected and will probably limit a benchmark like this in the end - but there are some other changes to the pipeline that could reduce the bandwidth issue further, though they'd require more complex/invasive changes than this PR.
If you'd like, I'd be happy to talk more specifics about optimizing the sprite rendering - I've put a good amount of time into figuring out the bunnymark specifically, so it'd be nice to be able to share some of that ahaha