Rewrite FlatMesh #47

Draft
Conversation

FlorentRevest

Problem 1 (Aesthetic):
The current FlatMesh works by:

  • generating an orthonormal grid of base points
  • leaving a square of free movement to every point

This leads to a "stiff" square feeling that clashes with round screens.

Solution:
The new FlatMesh is based on a circle packing layout instead: each point has a circle in which it can move freely. Base points are not laid out in any obviously predictable pattern, which leads to a more chaotic layout. In addition, instead of animating quadrilateral shapes, the new FlatMesh animates triangles, which also contributes to a less "square" feeling.
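To illustrate "a circle in which it can move freely", here is a minimal sketch of how a shift could be drawn uniformly inside a disk around a base point. The function name and the radius are hypothetical, and this is not necessarily how generate_flatmeshgeometry.py does it.

```python
import math
import random


def random_shift_in_circle(radius, rng):
    """Return an (dx, dy) offset drawn uniformly from a disk of the given radius."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # The square root compensates for area growing with the square of the
    # distance; without it, shifts would cluster around the base point.
    distance = radius * math.sqrt(rng.random())
    return (distance * math.cos(angle), distance * math.sin(angle))


rng = random.Random(42)  # fixed seed so the generated geometry is reproducible
shifts = [random_shift_in_circle(0.05, rng) for _ in range(1000)]
```

Unlike a square of free movement, the maximum displacement is the same in every direction, so no axis is visually privileged.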


Problem 2 (CPU use):
The current FlatMesh does all computations on the watch CPU, which wastes CPU cycles and makes it difficult to evolve this code towards more complicated use-cases. Some of the floating point operations, like coordinate interpolation, are unnecessarily expensive on the floating point unit when the GPU sits next to us and eats these for breakfast.

Solution:
The new code is essentially split into 4 different levels:

  • Very expensive operations that only need to be run once (ever). These operations are done in the generate_flatmeshgeometry.py script. The idea is to have complete freedom of visualization/experimentation at development time and not introduce any complexity on the watch. One can implement fairly advanced optimizations leveraging modules like Pandas to parse pre-calculated circle-packing results or like PyVista to generate Delaunay triangulations and triangle strips. The output of this script is cached into packed buffers in a flatmeshgeometry.h header.
  • Somewhat expensive operations that run very seldom: These stay on the watch CPU. For example, center/outer color mixes. Most FlatMeshes never change their colors so we only need to calculate this once. Still, some aspects of this are pre-calculated in Python, like the mix ratio, but the actual mixing is done on the watch, when the colors are available. (Side note: the current FlatMesh only lets QML update the center and outer colors one after the other, leading to two updateColors() runs, which is unnecessarily expensive, especially when animating colors, for example in the default applauncher. This implementation exposes a setColors() function that lets animations set both colors at once and avoid running unnecessary updates.)
  • Expensive operations that run very often are off-loaded to the GPU in a vertex shader. Vertex shaders run for every frame and every vertex but they benefit from specialized HW units. For example, interpolating vectors (to mix shifts) is a routine GPU operation and much cheaper than on a CPU.
  • Per-pixel operations are reduced to a bare minimum with the simplest possible fragment shader that just forwards the color of a provoking vertex as the color of each pixel in a triangle. This is called flat-shading and lets us skip the fragment interpolation unit of the GPU. Depending on the GPU implementation, this may or may not save cycles (this has not actually been measured).
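As a rough illustration of the first level (run once at development time, cache the result in a header), a generation script could serialize its buffers along these lines. The helper name, the array name and the layout are assumptions for the sake of the example; the actual flatmeshgeometry.h may look different.

```python
def format_c_array(c_type, name, values, per_line=8):
    """Format a list of numbers as a static const C array definition."""
    lines = []
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        # A trailing comma after the last element is valid in a C initializer.
        lines.append("    " + ", ".join(str(v) for v in chunk) + ",")
    body = "\n".join(lines)
    return f"static const {c_type} {name}[{len(values)}] = {{\n{body}\n}};\n"


# Hypothetical buffer: three indices of a triangle strip.
header_text = format_c_array("unsigned char", "strip_indices", [0, 1, 2])
```

The point of this split is that everything expensive (packing, triangulation, strip generation) happens on the development machine, and the watch only ever sees constant arrays.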

Problem 3 (GPU use):
The current FlatMesh geometry makes inefficient use of the Scene Graph and GPU bus. It creates one QSGGeometryNode per triangle, so each triangle has a different VAO and shader runs are unnecessarily serialized. Also, the scene graph is unnecessarily deep: most nodes evolve in the same way anyway and don't need separate handling (they are all marked dirty at the same time, for example).

Solution:
By representing the mesh as one big QSGGeometryNode, one can run more vertex shaders in parallel and save significant GPU cycles. Also, by using an appropriate triangle strip EBO, one can massively reduce the number of vertices transmitted to the GPU and the number of vertex shader invocations that even need to run.
The flat-shading model maximizes the number of vertices that can be re-used since only the last vertex of each triangle (the provoking vertex) provides the color of the triangle so the two other vertices can be re-used from other triangles, even if they hold the color of another triangle. This leads to significant vertex re-use compared to the current QSGGeometryNodes.
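The vertex re-use argument can be checked with a small sketch that enumerates the triangles of a strip and the provoking (last) vertex of each one. The function names are illustrative only.

```python
def strip_triangles(indices):
    """Return the index triples that a triangle strip describes, in order."""
    return [tuple(indices[i:i + 3]) for i in range(len(indices) - 2)]


def provoking_vertices(indices):
    """With the last-vertex convention, each triangle takes its flat color
    from its third index."""
    return [triangle[2] for triangle in strip_triangles(indices)]


strip = [0, 1, 2, 3, 4, 5]
triangles = strip_triangles(strip)      # 4 triangles from 6 indices
colors_from = provoking_vertices(strip)  # vertices 2, 3, 4 and 5
vertices_if_separate = 3 * len(triangles)  # 12 vertices without re-use
```

Every triangle has a distinct provoking vertex, so each one can carry its own color, while the first two vertices of each triangle are shared with its neighbours in the strip.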
Finally, the geometry is never invalidated because we expose shifts in a large pre-computed uniform array of vectors, and the shader indexes shifts from a hash of the coordinates and a global loop iteration count. This reduces the number of exchanges on the GPU bus since the vertex shader can pretty much operate independently from the CPU side (only one uniform between 0. and 1. needs to be updated to move the shift mix forward).
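The indexing scheme can be emulated on the CPU to see why only the mix value needs to change between frames. This is a sketch under assumptions: the hash function, the table contents and all names below are made up for illustration, and the real logic lives in the vertex shader.

```python
def coord_hash(x, y, table_size):
    """Hypothetical hash mapping a base point to a slot in the shift table."""
    return (int(x * 73) * 31 + int(y * 91) * 17) % table_size


def shifted_position(base, shifts, loop, mix_t):
    """Interpolate between two consecutive shifts picked from the table.

    `loop` is the global iteration count and `mix_t` goes from 0. to 1.
    within one iteration.
    """
    n = len(shifts)
    h = coord_hash(base[0], base[1], n)
    ax, ay = shifts[(h + loop) % n]
    bx, by = shifts[(h + loop + 1) % n]
    return (base[0] + ax + (bx - ax) * mix_t,
            base[1] + ay + (by - ay) * mix_t)


# Hypothetical pre-computed shift table.
shift_table = [(0.01, 0.0), (0.0, 0.02), (-0.015, 0.005), (0.005, -0.01)]
end_of_loop = shifted_position((0.3, -0.2), shift_table, loop=3, mix_t=1.0)
start_of_next = shifted_position((0.3, -0.2), shift_table, loop=4, mix_t=0.0)
```

Because the target shift of one iteration is the source shift of the next, the motion stays continuous when the loop count increments and the mix value wraps back to 0.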


Problem 4 (Maintainability):
Contributors in the past have complained that the FlatMesh was a black box they couldn't understand. The code was poorly documented and obscure at first read.

Solution:
One needs a bit of OpenGL background to follow along, but this code tries to extensively comment every operation and decision. Hopefully the code is structured in a way that makes it easy to follow for someone willing to learn OpenGL first. The architecture of the code should also make it simple to concentrate on subproblems: for example, if one wants to experiment with replacing the circle packing coordinates with a hexagonal packing, they only need to change lines 16 and 113-114 of generate_flatmeshgeometry.py. The rest of the code will naturally adapt.


Some questions are still left up in the air though:

On aesthetic:

  • What should be our round/square screen strategy? The current FlatMesh renders the same square content on both screen types and just relies on asteroid-launcher to clip a circle on round screens. This means that the outer colors (in the corners) are never shown on a round watch. This also means that we animate vertices that are off-screen. We have an opportunity to rethink this here. We could imagine having different base points on each screen type (www.packomania.com also has circles-in-squares packings, although they feel a lot more regular). We could also clip a square out of the circle, which is actually what we do here. This is convenient since it means that the outer colors show on both screen types, but it also changes the look and feel a little bit and means that we now have fewer triangles on a square screen than on a round screen (the opposite of the current situation).
  • How should we tweak the macro-parameters exactly? I have spent most of my time optimizing the code but not so much time optimizing the look and feel. There are a few parameters that are easy to tweak, namely: the number of points (first line of generate_flatmeshgeometry.py), which makes things look more or less "low-poly"; the color mix ratio exponent (currently 1.7 in generate_flatmeshgeometry.py), which changes how quickly the color gradient changes (it makes the screen overall a little bit brighter by keeping the center color a bit longer); and the shift mix animation easing curve (currently InOutQuad), which changes how the FlatMesh moves. I also had in mind that we could "wrap" the vertices a little bit such that center triangles end up being a bit larger and outer triangles a bit more squished against the screen borders. This could be done with a pos *= .2*cos(3.14*length(pos)) for example, but I did not achieve a satisfying result and left this idea out for now.
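To get a feel for the color mix ratio exponent, here is a minimal sketch. It assumes the ratio is the normalized distance to the center raised to the exponent, which matches the described effect (the center color is kept a bit longer) but is not confirmed to be the exact formula used by the script. Function names are hypothetical.

```python
def mix_ratio(distance, exponent=1.7):
    """Map a normalized distance to the center (0..1) to a color mix ratio."""
    clamped = min(max(distance, 0.0), 1.0)
    return clamped ** exponent


def mix_colors(center, outer, ratio):
    """Linearly interpolate between two RGB tuples."""
    return tuple(c + (o - c) * ratio for c, o in zip(center, outer))


halfway_linear = mix_ratio(0.5, exponent=1.0)  # 0.5
halfway_curved = mix_ratio(0.5)                # smaller than 0.5
color = mix_colors((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 0.25)
```

With an exponent above 1, a vertex halfway to the border is still closer to the center color than to the outer color; raising the exponent pushes the gradient further towards the edges.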

On CPU use:

  • The new FlatMesh hooks into the Qt animation framework to interpolate the "shift mix". This has a few pros: 1- the animation is butter smooth since it syncs with the screen refresh rate 2- the animation clock is shared with other animations (potentially saving cycles when using multiple animations) 3- this keeps the code very neat and tidy 4- this trivially lets us use different easing curves like InOutQuad, which makes the animation feel more organic than a linear interpolation. However, this also has a drawback: since the update frequency is higher than that of our current manually-tuned timer, updates run more often and this leads to an overall higher CPU usage than the old FlatMesh! With the current FlatMesh, asteroid-flashlight idles at ~13/14% CPU on my bass whereas with the new FlatMesh it idles at ~16/17%. Either we accept this price and get all the above benefits, or we fall back to using a custom timer and save some CPU time but lose the butter-smoothness/shared clock/code simplicity/InOutQuad...
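For reference, the InOutQuad curve is the standard quadratic ease-in/ease-out. If we ever fall back to a custom timer, it could be re-implemented in a few lines; the sketch below uses the usual textbook form and is not copied from Qt's sources.

```python
def in_out_quad(t):
    """Quadratic ease-in/ease-out for t between 0. and 1."""
    if t < 0.5:
        return 2.0 * t * t                 # accelerate during the first half
    return 1.0 - 2.0 * (1.0 - t) ** 2      # decelerate during the second half


samples = [in_out_quad(i / 4) for i in range(5)]
```

The curve starts and ends with zero velocity, which is what avoids the abrupt direction changes of a linear interpolation at each end of the shift mix.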

On GPU use:

  • A lot of the GPU optimizations we leverage here depend on the availability of the "flat" GLSL keyword, which is only available starting from GLES 3.0. This effectively bumps AsteroidOS's minimum requirements. This works on my oldest watch that still runs (my bass, RIP my dory :() so I don't expect it to be an issue, but it's good to keep in mind and we should properly test this before rolling it out. If this turns out to be an issue, we could implement a non-optimized version of this that does not benefit from the flat shading and vertex re-use optimizations.

On maintainability:

  • It looks like Qt6 changes the SceneGraph API such that we can no longer call OpenGL functions directly (like glEnable()). This could mean that we'll no longer be able to use the fixed index primitive restart extension: instead of using 0xFF in the indices table to jump from one triangle strip to another, we may have to generate empty triangles (by reusing the last index of the previous strip and the first index of the next strip). I expect this shouldn't cost very much on the GPU side since vertex shaders would run just as often and no fragment shaders should run for the empty triangles.
  • It also looks like they changed the shader format to "Rhi" which means that we may have to do some minor cosmetic changes but overall this should stay very close to the current GLSL. It's not a big deal but definitely an inconvenience on our eventual migration path.
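The empty-triangle fallback mentioned above could be done at generation time. Here is a minimal sketch; the function names are hypothetical, and the extra parity step is an addition of mine to keep the winding order of the following strip unchanged, which may or may not matter depending on whether face culling is enabled.

```python
def join_strips(strips):
    """Concatenate triangle strips into one, linked by degenerate triangles."""
    joined = []
    for strip in strips:
        if joined:
            last = joined[-1]
            joined.append(last)           # repeat last index of the previous strip
            if len(joined) % 2 == 0:      # previous length was odd: pad once more
                joined.append(last)       # so the next strip starts on an even slot
            joined.append(strip[0])       # repeat first index of the next strip
        joined.extend(strip)
    return joined


def visible_triangles(indices):
    """List the non-degenerate triangles of a strip with consistent winding."""
    result = []
    for i in range(len(indices) - 2):
        a, b, c = indices[i:i + 3]
        if a == b or b == c or a == c:
            continue                      # zero-area triangle, never rasterized
        result.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return result


single_strip = join_strips([[0, 1, 2], [4, 5, 6, 7]])
```

The joined strip draws exactly the same visible triangles as the separate strips, at the cost of a few extra indices per junction.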
