Skip to content

Latest commit

 

History

History
174 lines (126 loc) · 14 KB

README.md

File metadata and controls

174 lines (126 loc) · 14 KB

intrusion

What is it?

A linux 4k intro and my first demo production.

It's best viewed in realtime but needs a linux-PC with a recent graphics card and the ability to run i386-code. If you don't have access to one, here's a capture of the output on YouTube: http://www.youtube.com/watch?v=kzC-IOtw_cQ

Basically, it's a camera movement through a changing mandelbox fractal, raytraced in realtime on the GPU, with some sound. See section Anatomy of intrusion for further technical description.

Running

To make running intrusion easy, the final less-than-4k-binary is included in the git-repository. It's named intrusion.release. Assuming the dependencies are fullfilled, just execute and enjoy it.

If they're not, you won't get just an error about unlocatable libraries or unresolved symbols but a SIGSEGV or a window not displaying anything meaningful. This is due to some very dirty hackery to keep the binary small. Details on this can be found later in this document.

So, if it doesn't work, you're most likely missing one of these:

  • glibc (or binary compatible, version 6) (Homepage)
  • SDL (version 1.2) (Homepage)
  • OpenGL (version 3.3.0)
  • XZ Utils (Homepage)
  • a POSIX conform implementation of sh

Since intrusion is build as 32-bit binary, remember you need the i386 versions of the libraries, if running on an amd64 system.

Originally, the goal was not to use anything a freshly installed ubuntu lacks. Except for the most recent graphics drivers added. This was achieved and missed at the same time -- while libsdl is present in the base-system of older Ubuntu-versions, it isn't anymore in the most recent, at the time it was finished.

Building

Building intrusion should be possible and is tested on both i386 and amd64 systems. On both i386 code is emitted.

A list with build-tools/-dependencies follows. The version numbers are the ones intrusion.release was built with, on an amd64-system. Others should work, too, but could produce binaries > 4k.

When OUTPUT_INFO is defined while compiling main.c, the framerate and time is written to the terminal while running, on cost of a bigger binary and slower execution.

Anatomy of intrusion

Overview

The core of intrusion is a GLSL-shader, raymarching a mandelbox fractal in realtime on the GPU. While there was some code for dynamic lighting in the beginning of the development, another approach was chosen later. Instead of calculate lighting, the number of distance estimations used to find if the fragment's ray intercepts the fractal is used to colourize the mandelbox. This saves a lot of code and, more important, makes execution way faster -- with real lighting, normals of the surfaces need to be known. To get them, a lot of additional rays would need to be cast.

Since I had some space after the initial size optimizations, I added some text rendered to the framebuffer, telling a simple story. This is mostly done in CPU-code, by rendering to a texture, that is just copied to the frame buffer by the shader.

The sound was sequenced in Reaper and is rendered by 4klang, a synthesizer designed for use in 4k intros. The intrusion-specific code to output it is quite boring. It spawns a thread to render the samples and passes a callback to SDL to copy the data.

More interesting are the various tricks to keep the binary small. First of all, less code usually means -- surprise -- a smaller binary. While I'm pretty sure there's some potential for shortening things, I think I found a good compromise between optimal and readable code.

All C code is implemented to compile to a single object file. More specific, everything, except for main.c, is implemented in header-files. Doing this allows the compiler to get way more information about the code and perform much better optimizations.

These optimizations are that good, that implementing simple libc-functions within the source of intrusion leads to a smaller binary than calling them in libc. Therefore, clib.h contains implementations of some syscalls and utility-functions.

More code is saved by stripping away unneeded stuff from the binary. While the normal strip from binutils does a good job removing unneeded sections, sstrip performs even better by removing everything that's not loaded to memory when executing the binary.

Last but not least some compilerflags are set. Optimization is set to size of course while, surprisingly, using clangs -Os gave smaller binaries than -Oz. Allowing non-IEEE-conform math-code to be emitted also saves some bytes. So does setting the target-achitecture. I experimented with various flags to get an optimal result, with both clang and gcc, and had to realize clang always emitted smaller binaries than gcc.

Other optimizations are done at linker level. While no real link-time optimizations are done, the order of the sections in the binary has a big impact on the compression described in the next section of this document. To get an optimal result, a linker script is used in conjunction with a small python script, trying all possible orders.

Some other, smaller optimizations are done. Have a look at the source code if you're interested.

Compression

Important to keep the binary small is compression. LZMA, using xz, is utilized for this. The Makefile adds a line of shell script in front of the compressed binary to extract and execute it when it is ran.

This compression had quite interesting consequences. Several times in development, less code meant the binary was bigger. Especially with the shaders there was a lot of try and error involved. Even adding some absolutely useless code caused the binary to compress better in some cases.

Runtime linking

Dynamic linking requires a bunch of space. The libraries linked, including the versions, and the names of the symbols used have to be stored in the binary.

Using libdl to load them at runtime saves some bytes, since the version information can be omitted. But with some dirty hackery, much more can be saved. libdl is just a wrapper around some internal functions of libc. Calling them -- _libc_dlopen_mode() to get a library mapped to the processes memory -- directly, allows to omit linking to libdl.

By doing this, we also get the loading of libgl for free. We just initialize libsdl with GL support enabled. The SDL loads libgl automatically in this case. So we get the dependencies -- with the exception of libc -- loaded with a single function call. Since we need libc to load libraries, there's no way to avoid traditional linking to it.

I was thinking about implementing library loading directly in intrusion. But all the relocation stuff would need a whole bunch of code, so I expect linking libc is the lesser evil in terms of size.

Having the libraries loaded is the first step. Functions from it have to be called. Directly calling them isn't a good option. It causes the linker to add the function names and a reference to the library they're in.

To avoid this overhead, not the function names are saved in the binary but just a hash. Knowing that a pointer to the _link_map_, the datastructure used to store information about loaded libraries, is always the second entry of the Global Offset Table, we can use the linker script to get a symbol to it. By traversing the _link_map_, we can locate the tables of the libraries.

Originally, I wrote code to use the gnu hash table of the library to look up symbols. But I had to realize that some libraries still use the old SYSV-style hash tables. In this case, the NVIDIA OpenGL implementation. It would have been possible to implement algorithms to process both of them. But this would be more code in the binary. I chose a different approach instead: When looking up a symbol, the symbol table is iterated, the symbol name is hashed and compared with the stored hash. This is slow and leads to segfaults, if the symbol isn't present, since there is no simple way to get the symbol tables length but, well, it is small.

This is done for every external function used -- with one exception. One call, using the usual mechanisms to libc, is required to prevent ld from assuming no shared libraries are used at all and to remove the dynamic sections of the binary.

Abusing ELF

There're some fields in the ELF header the Linux-kernel doesn't care about. Filling these with zeros violates the ELF standard, but doesn't prevent the binary from being executed and allows the compression to do a better job. Therefore, a script is used to set these zeros in the binary after building it.

Inspired by an Article by Brian Raiter, I also moved some data -- more concrete, a string used by the binary -- to the ELF header. This way, it doesn't need to be stored in the binaries data-section.

Text rendering

To implement the text-rendering, information how to render the glyphs is needed. To keep the binary small, I decided four bytes to save the face of a letter are enough. This leads to having 4x8 pixels available per letter. I drew the letters in GIMP and added a small script reading the bitmaps and generating a header file with the letters encoded as integers. While doing this, the scripts parses the source code for letters really used. Unneeded letters are just set to zero in the generated header to allow better compression.

The rendering itself is done by just writing the pixels of the letters to a texture, that is copied to the framebuffer by the fragment shader.

Shaders

While the shaders are the core of intrusion, keeping them small is quite important. Since they have to be stored as text in the binary -- the graphics driver compiles them at runtime -- they can consume a lot of space. Shader Minifier is used to shorten them. But since it cannot modify names of the externals without breaking the code, I had to set the names of the externels to single letters manually. To keep the code readable, the C-preprocessor is used to do this.

Smaller floats

The camera movement is realized by having a linear transition between predefined points. These points occupy a lot of space in the binary. To keep them as small as possible, I experimented with the fp.h I found on the internet. But, well, writing a simple script, similar to the one described in the article, turned out to save much more space.

What didn't work

While developing intrusion, I had some ideas to make the binary smaller, which didn't turn out to work.

String compression

I tried to get the large amount of strings, both the displayed ones and the shaders, smaller.

I experimented with Huffmann coding. I expected it to compress the strings more effective than the LZMA compression applied to the whole binary. I assumed it to provide size gain, even with LZMA unable to compress these parts of the program effectively afterwards. I was right in terms of the better compression of the strings but the gain was smaller than the amount of code needed to decode the strings. Therefore, no Huffman.

Another idea was grounded on the observation I didn't need the whole range of ASCII characters. In fact, 6 bit are enough to cover the whole range of used characters. This means, three bytes would be enough to save four characters. But, again, there was an impact on the LZMA compression, causing the size gain to be so small, the binary had the original size again, after adding the code needed to decode the strings.

Fixed point arithmetic

I tried to replace all floating point arithmetic with fixed point, hoping to get a smaller binary. This also didn't work out. The problem was, that I needed floats in the shader, so fixed point numbers had to be converted to floats again. Independently of the place where to do this -- on the CPU or GPU -- it caused too much code to be generated.

What's next?

Several times while developing, I was really wishing for a Crinkler-like linker for Linux, thinking about writing one. While I decided against it, to finish intrusion first, I may feel like doing it in future.

With a custom linker, all the linker.h stuff could be done transparently, without the problem of ld omitting the dynamic sections, when nothing is linked dynamically the traditional way.

Also, it could rearrange the code more effectively than optimize_linkerscript.py, do nasty stuff to the ELF header automatically, and so on.

Resources used

  • The project contains code from 4klang, a great 4k intro software synthesizer
  • I've leared almost everything I know and needed for this project about fractals from Fractal Forums
  • Reaper was used to sequence the sound
  • GIMP was used to draw the font
  • ... and I visited a lot of OpenGL and demo related websites, I didn't keep track of.