Fast sphere renderer…for certain values of sphere.
demo.mp4
240 px diameter sphere running on one core of RP2040 overclocked to 252 MHz (while other core is generating DVI video).
TL;DR not ray tracing. The math started there, but after eliminating most generalizations and reducing to a single special case, the result is visually and numerically comparable, with an inner loop that’s 100% integer math. There are pre-compiled, drag-and-drop .UF2 files for a few boards in the uf2 folder.
Likely portable to other 32-bit MCUs (ESP32, SAMD51, etc.), sans DVI video.
Every musical act with a breakout hit is cursed to perform that song at every concert forever, else fans will riot. Devo’s Whip It, Jean-Michel Jarre’s Oxygene IV.
My personal Whip It is animated eyeballs. It’s been an evolutionary series of projects…first in the early Arduino Uno days, then through Adafruit’s HalloWing and Monster M4SK dev boards, and Raspberry Pi. I’d been tasked with another iteration, this time for the Adafruit Feather RP2040 DVI, but encountered turbulence: while prior projects relied on progressively larger tables in both flash and RAM, RP2040’s off-chip flash presents a bottleneck, and then the PicoDVI library demands a majority of on-chip resources. On a good day with a tailwind there’s perhaps 100K of RAM left over.
Prior eyeballs on Adafruit Monster M4SK (SAMD51 MCU). Image credit: Adafruit
Excepting Raspberry Pi (using OpenGL), all the prior microcontroller-based iterations really just created the illusion of a rotating eyeball using 2D blitting or bitmap displacement tricks. This limited the range of motion, but revolving in 3D was surely beyond these chips’ capacity. Additionally, each new chip, each new and higher-resolution screen involved a different approach, new tables, new incompatible code. The peculiar mix of strengths and constraints of the RP2040 had me re-considering what’s possible. “Solving the sphere problem” has been on my code bucket list for a while, and I was in the mood for a demoscene-style flex.
What makes this project work is that it doesn’t actually require a fully generalized ray tracer. It’s just a single object, pivoting in place but otherwise remaining in the same location on screen. A series of assumptions can be made, each stripping away elements of the ray tracing equation, until eventually there’s no ray tracing left at all.
There’s a whole genre of engineering jokes with the punchline, “First, we assume each (cow, racehorse, etc.) is a perfect sphere in a vacuum…”
First, we assume eyes are perfect spheres.
Further, there’s only ever this single sphere. It’s always centered on the origin (0,0,0), and it’s a unit sphere: the radius is 1.0. The latter means that in the line-sphere intersection calculation, any r or r² can be replaced with 1, and the former can simplify or eliminate other terms.
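As a quick reminder of why that helps (this is just the textbook line-sphere quadratic, not code from this repo): a ray o + t·d with normalized direction d hits a sphere of center c and radius r where

```latex
t^{2} + 2\,\mathbf{d}\cdot(\mathbf{o}-\mathbf{c})\,t + \lVert\mathbf{o}-\mathbf{c}\rVert^{2} - r^{2} = 0
```

With c = (0, 0, 0) and r = 1, the c terms drop out and r² becomes 1, leaving t² + 2(d·o)t + ‖o‖² − 1 = 0. The assumptions below strip it down further still.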
Next, perspective is dropped from the equation. This might be controversial, but in practice these little graphics projects tend toward displays less than 2 inches across, and if it were a physical thing the distortion due to perspective would be nearly imperceptible. With perspective gone, there’s no longer a viewing frustum (a pyramid-shaped volume of viewable space), just a rectangular tube. The camera is essentially at infinity and every ray is parallel, and the image plane can be as far back or forward as we’d like…even centered on the origin (0,0,0), like the sphere.
With that change, the table of square roots (distances from camera to points on the sphere) can be flipped around: now it’s distances from the image plane to the sphere…
Height fields, basically. Each point in the table is the elevation of a hemisphere above a plane.
So now, looking straight down the Z axis, the X and Y coordinates of each sphere intersection can be inferred directly from screen pixel coordinates, and Z comes from the table. At this point it’s not really ray tracing anymore, but a single “solved” image. In hindsight it’s a pretty obvious and minimalist solution. Ray tracing was helpful for conceptualizing the problem; we are grateful but do not actually require its services. With enough constraints and simplifications, the same set of points can be computed by trivial means.
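Here’s a minimal sketch of how such a table might be filled, in plain C. Names like SPHERE_DIAMETER and zmap[] are made up for illustration; the real code’s table layout and scaling may differ, and symmetry could shrink the table to a single quadrant.

```c
#include <math.h>
#include <stdint.h>

#define SPHERE_DIAMETER 240  // drawn sphere size in pixels (illustrative)

// Height field: Z elevation of the unit hemisphere over each on-screen pixel,
// scaled to 0..32767; negative marks pixels outside the sphere's silhouette.
static int16_t zmap[SPHERE_DIAMETER][SPHERE_DIAMETER];

void build_zmap(void) {
  const float r = SPHERE_DIAMETER * 0.5f;
  for (int py = 0; py < SPHERE_DIAMETER; py++) {
    for (int px = 0; px < SPHERE_DIAMETER; px++) {
      // Orthographic view straight down Z: X and Y come from pixel position...
      float x = (px + 0.5f - r) / r;  // -1..+1 across the sphere
      float y = (py + 0.5f - r) / r;
      float d = 1.0f - x * x - y * y;
      // ...and Z is just the height of the hemisphere above the image plane.
      zmap[py][px] = (d >= 0.0f) ? (int16_t)(sqrtf(d) * 32767.0f) : -1;
    }
  }
}
```

Since the sphere only pivots in place, this table never changes; only the rotation applied to each (X, Y, Z) does.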
To rotate the sphere, each intersection point (X, Y, Z) is rotated around the origin to yield a new (X', Y', Z'), from which texture coordinates are derived.
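In other words, it’s just the standard rotate-about-the-origin step. Nothing here is project-specific; R is whatever 3×3 rotation the animation wants on a given frame, built however you like:

```latex
\begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix}
  = R \begin{pmatrix} X \\ Y \\ Z \end{pmatrix},
\qquad R^{\top} R = I,\ \det R = 1
```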
Dropping lighting from the equation — just using texture map colors directly — might also be controversial, but is part of the Last Shenanigan that makes this work.
Somewhere up there it was mentioned that the sphere was specifically made a unit sphere — a radius of 1. In doing so, every coordinate of the aforementioned intersection points is within the range (-1.0, +1.0). Rather than handling this all with floating-point math, we can speed things along by working with these coordinates in a signed 16-bit integer space (-32768 to +32767). Most operations on these numbers execute in a single CPU cycle, and any small rounding errors are inconsequential once filtered down to pixel space (240 px across/down in the RP2040 example). The scaling operation on each axis of each vector is:
result = input * scale / 65536;
Where input is a 16-bit value (signed or unsigned), and scale is 0 to 65536. The interim result of the multiplication expands into 32-bit space, and the division brings it back down to 16-bit fixed-point units. This only works reliably with scaling down, not up. It is vitally important to keep that divide intact and NOT be L33T with a right-shift operation! The compiler, even at a vanilla no-optimization setting, will handle this with an arithmetic shift right instruction that preserves the sign of the input, typically 1 cycle. Division by an arbitrary integer can be an expensive multi-cycle operation, but division by powers of two is recognized and automatically handled efficiently.
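As a concrete sketch of that operation (scale16 is an illustrative name, not necessarily a function in the repo):

```c
#include <stdint.h>

// Scale a signed 16-bit value by a factor in [0, 65536] (i.e. 0.0 to 1.0).
// The product is formed in 32-bit space; keeping the written-out divide lets
// the compiler emit a sign-correct shift rather than us hand-rolling >> 16.
// e.g. scale16(-32768, 49152) == -24576 (scaling by 0.75)
static inline int16_t scale16(int16_t input, int32_t scale) {
  return (int16_t)((int32_t)input * scale / 65536);
}
```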
The default example of the tumbling globe relies on two further table lookups: an arctangent table converts X/Y coordinates to longitudes (texture X coords), and an arcsine table converts Z to latitudes (texture Y coords).
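A rough sketch of the mapping those tables implement, shown here with libm calls for clarity (TEX_W, TEX_H and sphere_to_texture are illustrative names; the actual code replaces atan2/asin with integer lookup tables so the inner loop stays all-integer):

```c
#include <math.h>

#define TEX_W 800  // texture width  (longitude), per the example texture
#define TEX_H 400  // texture height (latitude)
#define PI_F  3.14159265f

// Map a rotated point (x, y, z) on the unit sphere to texture coordinates.
static void sphere_to_texture(float x, float y, float z, int *u, int *v) {
  float lon = atan2f(y, x);  // longitude, -pi .. +pi
  float lat = asinf(z);      // latitude,  -pi/2 .. +pi/2
  *u = (int)((lon / (2.0f * PI_F) + 0.5f) * (TEX_W - 1));  // texture X
  *v = (int)((0.5f - lat / PI_F) * (TEX_H - 1));           // texture Y
}
```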
The drawn sphere is 240 pixels across. In order to wrap around the globe without conspicuous texture artifacts, the texture width should be at least 240·π, or about 754 pixels. The example texture is 800x400 to round up to a simple figure, and to demonstrate that power-of-two sizes aren’t necessary. If you do want to impose a power-of-two constraint, the code could be modified to replace a divide with a shift and save a few cycles per pixel; I just chose not to, because not everything has to be a power of two.
The arcsine thing is to make sure there’s adequate texture resolution at the poles and also to “wrap” the texture intuitively — it works about the way one would expect if you’ve done any 3D rendering. Cartographically it’s a plate carrée projection. If one only needs to revolve this in such a way that the poles aren’t viewed directly but only edge-on, some cycles and texture space could be freed up by eliminating the arcsine lookup and using Z directly (scaled or shifted to the texture resolution). It’s less intuitive to work with, but with benefits in some situations. Cartographically this one’s a Lambert cylindrical projection. It’s not shown in the code; it’s left as an exercise for the reader.