Rewrite FlatMesh #47

FlorentRevest · 2023-05-21T14:21:09Z

Problem 1 (Aesthetic):
The current FlatMesh works by:

- generating an orthonormal grid of base points
- leaving a square of free movement to every point

This leads to a "stiff" square feeling that clashes with round screens.

Solution:
The new FlatMesh is based on a circle packing layout instead: each point has a circle in which it can freely move. Base points are not laid out in any obviously predictable pattern, which leads to a more chaotic layout. In addition, instead of animating quadrilaterals, the new FlatMesh animates triangles, which also contributes to a less "square" feeling.
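To make the "circle of free movement" idea concrete, here is a minimal sketch of how a shift could be drawn uniformly inside a point's circle. It is not taken from generate_flatmeshgeometry.py: the function name, the radius of 0.05 and the seed are all assumptions made for illustration.

```python
import math
import random

def random_shift_in_circle(radius, rng):
    """Return a uniformly distributed (dx, dy) offset inside a circle."""
    # Taking the square root of the random fraction keeps the samples
    # uniform over the disc's area instead of clustering at its center.
    r = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return (r * math.cos(theta), r * math.sin(theta))

rng = random.Random(47)  # fixed seed, only to make the sketch reproducible
# Hypothetical radius: in a real packing each point would use its own circle.
shifts = [random_shift_in_circle(0.05, rng) for _ in range(8)]
```

Whatever the sampling scheme, the property that matters is that a base point plus any of its shifts stays inside that point's packing circle.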

Problem 2 (CPU use):
The current FlatMesh does all computations on the watch CPU, which wastes CPU cycles and makes it difficult to evolve this code towards more complicated use-cases. Some floating point operations, like coordinate interpolation, are unnecessarily expensive on the CPU's floating point unit when the GPU sits next to us and eats these for breakfast.

Solution:
The new code is essentially split in 4 different levels:

- Very expensive operations that only need to be run _once_ (ever). These operations are done in the generate_flatmeshgeometry.py script. The idea is to have complete freedom of visualization/experimentation at development time and not introduce any complexity on the watch. One can implement fairly advanced optimizations leveraging modules like Pandas to parse pre-calculated circle-packing results or PyVista to generate Delaunay triangulations and triangle strips. The output of this script is cached into packed buffers in a flatmeshgeometry.h header.
- Somewhat expensive operations that run very seldom: these stay on the watch CPU. For example, center/outer color mixes. Most FlatMeshes never change their colors so we only need to calculate this once. Still, some aspects of this are pre-calculated in Python, like the mix ratio, but the actual mixing is done on the watch, when the colors are available. (Side note: the current FlatMesh only lets QML update the center and outer colors one after the other, leading to two updateColors() runs, which is unnecessarily expensive, especially when animating colors, for example in the default applauncher. This implementation exposes a setColors() function that lets animations set both colors at once and avoid running unnecessary updates.)
- Expensive operations that run very often are off-loaded to the GPU in a vertex shader. Vertex shaders run for every frame and every vertex but they benefit from specialized HW units. For example, interpolating vectors (to mix shifts) is a routine GPU operation and much cheaper than on a CPU.
- Per-pixel operations are reduced to a bare minimum with the simplest possible fragment shader that just forwards the color of a provoking vertex as the color of each pixel in a triangle. This is called flat shading and lets us skip the fragment interpolation unit of the GPU. Depending on the GPU implementation, this may or may not save cycles (this has not actually been measured).
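As an illustration of the split between the first two levels, the sketch below pre-computes a per-vertex mix ratio offline and only blends the colors once they are known. Only the exponent 1.7 comes from this description; the assumption that the ratio is a normalized distance to the center, and the names `mix_ratio` and `mix_colors`, are mine.

```python
import math

MIX_EXPONENT = 1.7  # value quoted in this description

def mix_ratio(x, y, max_radius=1.0):
    """Offline step: 0.0 at the center of the mesh, 1.0 at max_radius."""
    # Assumption: the ratio is derived from the distance to the center.
    distance = min(math.hypot(x, y) / max_radius, 1.0)
    return distance ** MIX_EXPONENT

def mix_colors(center, outer, ratio):
    """On-watch step: blend two RGB tuples once the colors are known."""
    return tuple(c * (1.0 - ratio) + o * ratio for c, o in zip(center, outer))

# Ratios can be baked into a header; colors are only mixed when they change.
ratios = [mix_ratio(x, 0.0) for x in (0.0, 0.5, 1.0)]
colors = [mix_colors((1.0, 0.5, 0.0), (0.0, 0.0, 1.0), r) for r in ratios]
```

With an exponent above 1, a vertex halfway to the border gets a ratio below 0.5, so the center color dominates for longer.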

Problem 3 (GPU use):
The current FlatMesh geometry makes inefficient use of the Scene Graph and GPU bus. It creates one QSGGeometryNode per triangle so each triangle has a different VAO and shader runs are unnecessarily serialized. Also, the scene graph is unnecessarily deep: most nodes evolve in the same way anyway and don't need separate handling (they are all marked dirty at the same time, for example).

Solution:
By representing the mesh as one big QSGGeometryNode, one can run more vertex shaders in parallel and save significant GPU cycles. Also, by using an appropriate triangle strip EBO, one can massively reduce the number of vertices transmitted to the GPU and the number of vertex shader runs that are even needed.
The flat-shading model maximizes the number of vertices that can be re-used: only the last vertex of each triangle (the provoking vertex) provides the color of the triangle, so the two other vertices can be re-used from other triangles, even if they hold the color of another triangle. This leads to significant vertex re-use compared to the current QSGGeometryNodes.
Finally, the geometry is never invalidated because we expose shifts in a large pre-computed uniform array of vectors and the shader indexes shifts from a hash of the coordinates and a global loop iteration count. This reduces the number of exchanges on the GPU bus since the vertex shader can pretty much operate independently from the CPU side (only one uniform between 0. and 1. needs to be updated to move the shift mix forward).
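The shift lookup described above can be emulated on the CPU to see why a single uniform is enough. This is only a sketch: the shift table, the hash function and all names are made up for illustration, and the real shader may index and blend differently.

```python
# Made-up stand-in for the pre-computed uniform array of shift vectors.
SHIFTS = [(0.02, 0.00), (-0.01, 0.01), (0.00, -0.02), (0.01, 0.01)]

def coord_hash(x, y):
    """Hypothetical hash of a vertex's base coordinates into the table."""
    return (int(round(x * 100)) * 73 + int(round(y * 100)) * 179) % len(SHIFTS)

def vertex_position(base, loop_count, shift_mix):
    """Mimic the per-vertex work: pick two shifts and blend them linearly."""
    i = (coord_hash(*base) + loop_count) % len(SHIFTS)
    j = (i + 1) % len(SHIFTS)
    sx = SHIFTS[i][0] * (1.0 - shift_mix) + SHIFTS[j][0] * shift_mix
    sy = SHIFTS[i][1] * (1.0 - shift_mix) + SHIFTS[j][1] * shift_mix
    return (base[0] + sx, base[1] + sy)

base = (0.25, -0.5)
end_of_loop = vertex_position(base, loop_count=0, shift_mix=1.0)
start_of_next = vertex_position(base, loop_count=1, shift_mix=0.0)
```

In this sketch the position at the end of one loop equals the position at the start of the next, so the animation stays continuous while the CPU only advances `shift_mix` and, once per loop, `loop_count`.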

Problem 4 (Maintainability)
Contributors in the past have complained that the FlatMesh was a black-box they wouldn't understand. The code was poorly documented and obscure at first read.

Solution:
One needs a bit of OpenGL background to follow along but this code tries to extensively comment every operation and decision. Hopefully the code is structured in a way that makes it easy to follow for someone willing to learn OpenGL first. The architecture of the code should also make it simple to concentrate on subproblems: for example, if one wants to experiment with replacing the circle packing coordinates with a hexagonal packing, they only need to change lines 16 and 113-114 of generate_flatmeshgeometry.py. The rest of the code will naturally adapt.

Some questions are still left up in the air though:

On aesthetic:

- What should be our round/square screen strategy? The current FlatMesh renders the same square content on both screen types and just relies on asteroid-launcher to clip a circle on round screens. This means that the outer colors (in the corners) are never shown on a round watch. This also means that we animate vertices that are off-screen. We have an opportunity to rethink this here. We could imagine having different base points on both screen types (www.packomania.com also has circles-in-squares packings, although they feel a lot more regular). We could also clip a square out of the circle, which is actually what we do here. This is convenient since it means that the outer colors will show on both screen types, but it also changes the look and feel a little bit and means that we now have fewer triangles on a square screen than on a round screen (the opposite of the current situation).
- How should we tweak the macro-parameters exactly? I have spent most of my time optimizing the code but not so much time optimizing the look and feel. There are a few parameters that are easy to tweak, namely: the number of points (first line of generate_flatmeshgeometry.py), which makes things look more or less "low-poly"; the color mix ratio exponent (currently 1.7 in generate_flatmeshgeometry.py), which changes how quickly the color gradient changes (it makes the screen overall a little bit brighter by keeping the center color a bit longer); and the shift mix animation easing curve (currently InOutQuad), which changes how the FlatMesh moves. I also had in mind that we could "wrap" the vertices a little bit such that center triangles end up being a bit larger and outer triangles a bit more squished against the screen borders. This could be done with a `pos *= .2*cos(3.14*length(pos))` for example, but I did not achieve a satisfying result and left this idea out for now.

On CPU use:

- The new FlatMesh hooks into the Qt animation framework to interpolate the "shift mix". This has a few pros: 1- the animation is butter smooth since it syncs with the screen refresh rate; 2- the animation clock is shared with other animations (potentially saving cycles when using multiple animations); 3- this keeps the code very neat and tidy; 4- this trivially lets us use different easing curves like InOutQuad, which makes the animation feel more organic than a linear interpolation. However, this also has a drawback: since the update rate is higher than that of our current manually-tuned timer, updates run more often and this leads to an overall higher CPU usage than the old FlatMesh! With the current FlatMesh, asteroid-flashlight idles at ~13/14% CPU on my bass whereas the new FlatMesh idles at ~16/17%. Either we accept this price and get all the above benefits, or we fall back to using a custom timer and save some CPU time but lose the butter-smoothness/shared clock/code simplicity/InOutQuad...
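For reference, the sketch below shows the standard quadratic ease-in/ease-out formula, which as far as I know is the curve shape Qt uses for InOutQuad. It is written in Python only to make the difference with a linear interpolation visible; the function name is mine.

```python
def in_out_quad(t):
    """Quadratic ease-in/ease-out for t in [0, 1]."""
    if t < 0.5:
        # Accelerate from rest during the first half.
        return 2.0 * t * t
    # Decelerate back to rest during the second half.
    return 1.0 - 2.0 * (1.0 - t) ** 2

# Eased progress at t = 0, 0.25, 0.5, 0.75 and 1.
samples = [round(in_out_quad(i / 4), 4) for i in range(5)]
# samples == [0.0, 0.125, 0.5, 0.875, 1.0]
```

Compared to a linear progress of 0.25 and 0.75 at the quarter points, the eased values of 0.125 and 0.875 show the vertices lingering near each end of a shift.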

On GPU use:

- A lot of the GPU optimizations we leverage here depend on the availability of the "flat" GLSL keyword, which is only available starting from GLES 3.0. This effectively bumps AsteroidOS's minimum requirements. This works on my oldest watch that still runs (my bass, RIP my dory :() so I don't expect it to be an issue, but it's good to keep in mind and we should properly test this before rolling it out. If this turns out to be an issue, we could implement a non-optimized version that does not benefit from the flat shading and vertex re-use optimizations.

On maintainability:

- It looks like Qt6 changes the SceneGraph API such that we can no longer call OpenGL functions directly (like glEnable()). This could mean that we'll no longer be able to use the fixed index primitive restart extension. Instead of using 0xFF in the indices table to jump from one triangle strip to another, we may have to generate empty triangles (by reusing the last index of the previous strip and the first index of the next strip). I expect this shouldn't cost very much on the GPU side since vertex shaders would run just as often and no fragment shaders should run for the empty triangles.
- It also looks like they changed the shader format to "Rhi", which means that we may have to do some minor cosmetic changes, but overall this should stay very close to the current GLSL. It's not a big deal but definitely an inconvenience on our eventual migration path.
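The empty-triangle fallback mentioned for Qt6 could look roughly like the sketch below, which rewrites an index list that uses 0xFF restarts into a single strip. The function names are hypothetical, and the extra duplicated index used to keep the winding order intact is my own addition, not something this PR implements.

```python
RESTART = 0xFF  # restart marker used in the indices table

def split_strips(indices):
    """Split an index list on the restart marker."""
    strips, current = [], []
    for index in indices:
        if index == RESTART:
            if current:
                strips.append(current)
            current = []
        else:
            current.append(index)
    if current:
        strips.append(current)
    return strips

def stitch_strips(strips):
    """Join strips with degenerate (zero-area) triangles instead of restarts."""
    out = []
    for strip in strips:
        if out:
            out.append(out[-1])    # repeat last index of the previous strip
            out.append(strip[0])   # repeat first index of the next strip
            if len(out) % 2 == 1:  # keep the next strip on an even offset
                out.append(strip[0])
        out.extend(strip)
    return out

def triangles(strip):
    """List the non-degenerate triangles of a strip, provoking vertex last."""
    result = []
    for k in range(len(strip) - 2):
        a, b, c = strip[k:k + 3]
        if a == b or b == c or a == c:
            continue  # zero-area triangle, nothing gets rasterized
        result.append((a, b, c) if k % 2 == 0 else (b, a, c))
    return result

stitched = stitch_strips(split_strips([0, 1, 2, 3, RESTART, 4, 5, 6]))
```

In this sketch the last vertex of every real triangle is unchanged by the stitching, so the flat-shaded colors should be unaffected. The cost is a few extra indices per strip boundary.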
