Working with binary file formats

Read and write bitmap files using only a few dozen lines of code

Even if we rarely give them much thought, binary file formats are everywhere. Ranging from images to audio files to nearly every other sort of media you can imagine, binary files are used because they are an efficient way of storing information in a ready-to-process format.

Despite their usefulness, binary files are cryptic and appear difficult to understand on the surface. Unlike a text-based data format, simply looking at a binary file won't give you any hints about what its contents are. To even begin to understand a binary encoded file, you need to read its format specification. These specifications tend to include lots of details about obscure edge cases, and that makes for challenging reading unless you have already spent a fair amount of time working in the realm of bits and bytes. For these reasons, it's probably better to learn by example rather than taking a more formal approach.

In this article, I will show you how to encode and decode the bitmap image format. Bitmap images have a simple structure, and the format is well documented. Despite the fact that you'll probably never need to work with bitmap images at all in your day-to-day work, the concepts involved in both reading and writing a BMP file are pretty much the same as any other file format you'll encounter.

The anatomy of a bitmap

A bitmap file consists of several sections of metadata followed by a pixel array that represents the color and position of every pixel in the image. The example below demonstrates that even if you break the sequence up into its different parts, it would still be a real challenge to understand without any documentation handy:

# coding: binary

hex_data = %w[
  42 4D
  46 00 00 00
  00 00
  00 00
  36 00 00 00

  28 00 00 00
  02 00 00 00
  02 00 00 00
  01 00
  18 00
  00 00 00 00
  10 00 00 00
  13 0B 00 00
  13 0B 00 00
  00 00 00 00
  00 00 00 00

  00 00 FF
  FF FF FF
  00 00
  FF 00 00
  00 FF 00
  00 00
]

out = hex_data.each_with_object("") { |e,s| s << Integer("0x#{e}") }

File.binwrite("example1.bmp", out)

Once you learn what each section represents, you can start to interpret the data. For example, if you know that this is a 24-bit per pixel image that is two pixels wide, and two pixels high, you might be able to make sense of the pixel array data shown below:

00 00 FF
FF FF FF
00 00
FF 00 00
00 FF 00
00 00

If you run this example script and open the image file it produces, you'll see something similar to what is shown below once you zoom in close enough to see its pixels:

[Image: Pixels]

By experimenting with changing some of the values in the pixel array by hand, you will fairly quickly discover the overall structure of the array and the way pixels are represented. After figuring this out, you might also be able to look back on the rest of the file and determine what a few of the fields in the headers are without looking at the documentation.
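As a starting point for that kind of experimentation, here is a minimal sketch that slices the example's pixel array into rows and pixels. It assumes each row holds 6 bytes of pixel data followed by 2 bytes of padding (the padding rule is explained later in the article), and the variable names are only illustrative:

```ruby
# Pixel array from the example above, as hex byte strings.
pixel_array = %w[00 00 FF  FF FF FF  00 00
                 FF 00 00  00 FF 00  00 00]

width           = 2 # pixels per row
bytes_per_pixel = 3 # 24 bits per pixel
row_size        = 8 # 6 bytes of pixel data + 2 bytes of padding (assumed)

rows = pixel_array.each_slice(row_size).map do |row|
  # Drop the padding bytes, then group what is left into 3-byte pixels.
  row.first(width * bytes_per_pixel).each_slice(bytes_per_pixel).map(&:join)
end

p rows #=> [["0000FF", "FFFFFF"], ["FF0000", "00FF00"]]
```

Changing a byte in pixel_array and rerunning the script shows which pixel it belongs to.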

After exploring a bit on your own, you should check out the field-by-field walkthrough of a 2x2 bitmap file that this example was based on. The information in that table is pretty much all you'll need to know in order to make sense of the bitmap reader and writer implementations I've built for this article.

Encoding a bitmap image

Now that you've seen what a bitmap looks like in its raw form, I can demonstrate how to build a simple encoder object that allows you to generate bitmap images in a much more convenient way. In particular, I'm going to show what I did to get the following code to output the same image that we rendered via a raw sequence of bytes earlier:

bmp = BMP::Writer.new(2,2)

# NOTE: Bitmap encodes pixels in BGR format, not RGB!
bmp[0,0] = "ff0000"
bmp[1,0] = "00ff00"
bmp[0,1] = "0000ff"
bmp[1,1] = "ffffff"

bmp.save_as("example_generated.bmp")

Like most binary formats, the bitmap format has a tremendous number of options, which makes building a complete implementation a whole lot more complicated than building a tool that only generates a single type of image. I realized shortly after skimming the format description that you can skip a lot of the boilerplate information if you stick to 24bit-per-pixel images, so I decided to do exactly that.

Looking at the implementation from the outside-in, you can see the general structure of the BMP::Writer class. Pixels are stored in a two-dimensional array, and all the interesting things happen at the time you write the image out to file:

class BMP
  class Writer
    def initialize(width, height)
      @width, @height = width, height

      @pixels = Array.new(@height) { Array.new(@width) { "000000" } }
    end

    def []=(x,y,value)
      @pixels[y][x] = value
    end

    def save_as(filename)
      File.open(filename, "wb") do |file|
        write_bmp_file_header(file)
        write_dib_header(file)
        write_pixel_array(file)
      end
    end

    # ... rest of implementation details omitted for now ...
  end
end

All bitmap files start out with the bitmap file header, which consists of the following things:

  • A two character signature to indicate the file is a bitmap file (typically "BM").
  • A 32bit unsigned little-endian integer representing the size of the file itself.
  • A pair of 16bit unsigned little-endian integers reserved for application specific uses.
  • A 32bit unsigned little-endian integer representing the offset to where the pixel array starts in the file.

The following code shows how BMP::Writer builds up this header and writes it to file:

class BMP
  class Writer
    PIXEL_ARRAY_OFFSET = 54
    BITS_PER_PIXEL     = 24

    # ... rest of code as before ...

    def write_bmp_file_header(file)
      file << ["BM", file_size, 0, 0, PIXEL_ARRAY_OFFSET].pack("A2Vv2V")
    end

    def file_size
      PIXEL_ARRAY_OFFSET + pixel_array_size
    end

    def pixel_array_size
      ((BITS_PER_PIXEL*@width)/32.0).ceil*4*@height
    end
  end
end

Out of the five fields in this header, only the file size ended up being dynamic. I was able to treat the pixel array offset as a constant because the headers for 24 bit color images take up a fixed amount of space. The file size computations (see footnote 1) will make sense later once we examine the way that the pixel array gets encoded.
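To make the arithmetic concrete, the sketch below repeats the pixel_array_size computation as a standalone method. The method signature and the default argument are my own choices for illustration and are not part of BMP::Writer:

```ruby
# Same arithmetic as BMP::Writer#pixel_array_size, but taking the
# dimensions as arguments (an illustrative signature, not the original).
def pixel_array_size(width, height, bits_per_pixel = 24)
  # Each row is rounded up to a whole number of 32-bit (4-byte) units.
  row_size = ((bits_per_pixel * width) / 32.0).ceil * 4
  row_size * height
end

pixel_array_offset = 54 # size of the two headers for this image type

p pixel_array_size(2, 2)                      #=> 16
p pixel_array_offset + pixel_array_size(2, 2) #=> 70
p pixel_array_size(5, 3)                      #=> 48
```

The 70-byte result matches the file size field (46 00 00 00) in the raw example at the start of the article.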

The tool that makes it possible for us to convert these various field values into binary sequences is Array#pack. If you note that the file size of our reference 2x2 bitmap is 70 bytes, it becomes clear what pack is actually doing for us when we examine the byte by byte values in the following example:

header = ["BM", 70, 0, 0, 54].pack("A2Vv2V")
p header.bytes.map { |e| "%.2x" % e }

=begin expected output (NOTE: reformatted below for easier reading)
  ["42", "4d",
   "46", "00", "00", "00",
   "00", "00",
   "00", "00",
   "36", "00", "00", "00"]
=end

The byte sequence for the file header exactly matches that of our reference image, which indicates that the proper bitmap file header is being generated. Below I've listed out how each field in the header is encoded:

  "A2" -> arbitrary binary string of width 2 (packs "BM" as: 42 4d)
  "V"  -> a 32bit unsigned little endian int (packs 70 as: 46 00 00 00)
  "v2" -> two 16bit unsigned little endian ints (packs 0, 0 as: 00 00 00 00)
  "V"  -> a 32bit unsigned little endian int (packs 54 as: 36 00 00 00)

While I went to the effort of expanding out the byte sequences to make it easier to see what is going on, you don't typically need to do this at all while working with Array#pack as long as you craft your template strings carefully. But like anything else in Ruby, it's nice to be able to write little scripts or hack around a bit in irb whenever you're trying to figure out how your code is actually working.

After figuring out how to encode the file header, the next step was to work on the DIB header, which includes some metadata about the image and how it should be displayed on the screen:

class BMP
  class Writer
    DIB_HEADER_SIZE    = 40
    PIXELS_PER_METER   = 2835 # 2835 pixels per meter is basically 72dpi

    # ... other code as before ...

    def write_dib_header(file)
      file << [DIB_HEADER_SIZE, @width, @height, 1, BITS_PER_PIXEL,
               0, pixel_array_size, PIXELS_PER_METER, PIXELS_PER_METER,
               0, 0].pack("Vl<2v2V2l<2V2")
    end
  end
end

Because we are only working on a very limited subset of BMP features, it's possible to construct the DIB header mostly from preset constants combined with a few values that we already computed for the BMP file header.

The pack statement in the above code works much like the code that writes out the BMP file header, with one exception: it needs to handle signed 32-bit little endian integers. This data type does not have a pattern of its own, but instead is a composite pattern made up of two characters: l<. The first character (l) instructs Ruby to pack a 32-bit signed integer, and the second character (<) tells it to use little-endian byte order.
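A quick way to see what the l< pattern does is to pack a negative number and inspect the bytes. The following is only a small irb-style sketch; the variable name is illustrative:

```ruby
negative = [-2].pack("l<")

# Signed 32-bit, little-endian: the least significant byte comes first.
p negative.bytes        #=> [254, 255, 255, 255]
p [2].pack("l<").bytes  #=> [2, 0, 0, 0]

# The > modifier selects big-endian byte order instead.
p [2].pack("l>").bytes  #=> [0, 0, 0, 2]

# The same four bytes read back as signed and as unsigned integers.
p negative.unpack("l<") #=> [-2]
p negative.unpack("V")  #=> [4294967294]
```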

It isn't clear to me at all why a bitmap image could contain negative values for its width, height, and pixel density -- this is just how the format is specified. Because our goal is to learn about binary file processing and not image format esoterica, it's fine to treat that design decision as a black box for now and move on to looking at how the pixel array is processed.

class BMP
  class Writer
    # .. other code as before ...

    def write_pixel_array(file)
      @pixels.reverse_each do |row|
        row.each do |color|
          file << pixel_binstring(color)
        end

        file << row_padding
      end
    end

    def pixel_binstring(rgb_string)
      raise ArgumentError unless rgb_string =~ /\A\h{6}\z/
      [rgb_string].pack("H6")
    end

    def row_padding
      "\x0" * (@width % 4)
    end
  end
end

The most interesting thing to note about this code is that each row of pixels ends up getting padded with some null characters. This is to ensure that each row of pixels is aligned on DWORD boundaries (4 byte sequences). This is a semi-arbitrary limitation that has to do with file storage constraints, but things like this are common in binary files.

The calculations below show how much padding is needed to bring rows of various widths up to a multiple of 4, and explain how I derived the computation for the row_padding method:

Width 2 : 2 * 3 Bytes per pixel = 6 bytes  + 2 padding  = 8
Width 3 : 3 * 3 Bytes per pixel = 9 bytes  + 3 padding  = 12
Width 4 : 4 * 3 Bytes per pixel = 12 bytes + 0 padding  = 12
Width 5 : 5 * 3 Bytes per pixel = 15 bytes + 1 padding  = 16
Width 6 : 6 * 3 Bytes per pixel = 18 bytes + 2 padding  = 20
Width 7 : 7 * 3 Bytes per pixel = 21 bytes + 3 padding  = 24
...

Sometimes calculations like this are provided for you in format specifications; other times you need to derive them yourself. Choosing to work with only 24bit per pixel images allowed me to skirt the question of how to generalize this computation to an arbitrary number of bits per pixel.
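For the curious, one possible generalization is sketched below. It is not part of BMP::Writer: the helper names are hypothetical, and it simply applies the same "round each row up to a multiple of 4 bytes" rule to other bit depths using integer arithmetic:

```ruby
# Hypothetical helper: bytes per row once padded to a multiple of 4.
def row_size(width, bits_per_pixel)
  ((bits_per_pixel * width + 31) / 32) * 4
end

# Hypothetical helper: padding bytes needed at the end of each row.
def row_padding_size(width, bits_per_pixel)
  data_bytes = (bits_per_pixel * width + 7) / 8 # round up to whole bytes
  row_size(width, bits_per_pixel) - data_bytes
end

p row_padding_size(2, 24) #=> 2
p row_padding_size(4, 24) #=> 0
p row_padding_size(3, 8)  #=> 1
p row_padding_size(5, 32) #=> 0
```

For 24bit images this agrees with the @width % 4 shortcut used in row_padding.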

While the padding code is definitely the most interesting aspect of the pixel array, there are a couple other details about this implementation worth discussing. In particular, we should take a closer look at the pixel_binstring method:

def pixel_binstring(rgb_string)
  raise ArgumentError unless rgb_string =~ /\A\h{6}\z/
  [rgb_string].pack("H6")
end

This is the method that converts the values we set in the pixel array via lines like bmp[0,0] = "ff0000" into actual binary sequences. It starts by matching the string with a regex to ensure that the input string is a valid sequence of 6 hexadecimal digits. If the validation succeeds, it then packs those values into a binary sequence, creating a string with three bytes in it. The example below should make it clear what is going on here:

>> ["ffa0ff"].pack("H6").bytes.to_a
=> [255, 160, 255]

This pattern makes it possible for us to specify color values directly in hexadecimal strings and then convert them to their numeric value just before they get written to the file.
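Because the bytes are written in the order given, colors must be supplied in BGR order. If you would rather think in RGB, a small conversion helper could sit in front of pixel_binstring. The rgb_to_bgr method below is a hypothetical addition, not something BMP::Writer provides:

```ruby
# Hypothetical helper: reorder an "rrggbb" string into "bbggrr".
def rgb_to_bgr(rgb_string)
  raise ArgumentError unless rgb_string =~ /\A\h{6}\z/

  # Split into three two-digit components and reverse their order.
  rgb_string.scan(/\h{2}/).reverse.join
end

p rgb_to_bgr("ff0000") #=> "0000ff"
p rgb_to_bgr("336699") #=> "996633"

# The result can be packed exactly like any other color string.
p [rgb_to_bgr("ff0000")].pack("H6").bytes #=> [0, 0, 255]
```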

With this last detail explained, you should now understand how to build a functional bitmap encoder for writing 24bit color images. If seeing things broken out step by step caused you to lose a sense of the big picture, you can check out the source code for BMP::Writer. Feel free to play around with it a bit before moving on to the next section: the best way to learn is to actually run these code samples and try to extend them and/or break them in various ways.

Decoding a bitmap image

As you might expect, there is a nice symmetry between encoding and decoding binary files. To show just to what extent this is the case, I will walk you through the code which makes the following example run:

bmp = BMP::Reader.new("example1.bmp")
p bmp.width  #=> 2
p bmp.height #=> 2

p bmp[0,0] #=> "ff0000"   
p bmp[1,0] #=> "00ff00"
p bmp[0,1] #=> "0000ff"
p bmp[1,1] #=> "ffffff"

The general structure of BMP::Reader ended up being quite similar to what I did for BMP::Writer. The code below shows the methods which define the public interface:

class BMP
  class Reader
    def initialize(bmp_filename)
      File.open(bmp_filename, "rb") do |file|
        read_bmp_header(file) # does some validations
        read_dib_header(file) # sets @width, @height
        read_pixels(file)     # populates the @pixels array
      end
    end

    attr_reader :width, :height

    def [](x,y)
      @pixels[y][x]
    end
  end
end

This time, we still are working with an ordinary array of arrays to store the pixel data, and most of the work gets done as soon as the file is read in the constructor. Because I decided to support only a single image type, most of the work of reading the headers is just for validation purposes. In fact, the read_bmp_header method does nothing more than some basic sanity checking, as shown below:

class BMP
  class Reader
    PIXEL_ARRAY_OFFSET = 54

    # ...other code as before ...

    def read_bmp_header(file)
      header = file.read(14)
      magic_number, file_size, reserved1,
      reserved2, array_location = header.unpack("A2Vv2V")

      fail "Not a bitmap file!" unless magic_number == "BM"

      unless file.size == file_size
        fail "Corrupted bitmap: File size is not as expected"
      end

      unless array_location == PIXEL_ARRAY_OFFSET
        fail "Unsupported bitmap: pixel array does not start where expected"
      end
    end
  end
end

The key thing to notice about this code is that it reads from the file just the bytes it needs in order to parse the header. This makes it possible to validate a very large file without loading much data into memory. Reading entire files into memory is rarely a good idea, and this is especially true when it comes to binary data because doing so will actually make your job harder rather than easier.
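The sketch below illustrates the idea in isolation. It uses Tempfile to stand in for a large file on disk; the contents are illustrative only and do not form a valid bitmap:

```ruby
require "tempfile"

magic = position = total_size = nil

# Build a roughly 1 MB stand-in file: a 14-byte header plus filler bytes.
Tempfile.create("large") do |tmp|
  tmp.binmode
  tmp.write(["BM", 1_000_014, 0, 0, 54].pack("A2Vv2V"))
  tmp.write("\x00" * 1_000_000)
  tmp.flush

  File.open(tmp.path, "rb") do |file|
    # Only the 14 header bytes are read from the file.
    magic      = file.read(14).unpack("A2").first
    position   = file.pos
    total_size = file.size
  end
end

p magic      #=> "BM"
p position   #=> 14
p total_size #=> 1000014
```

The file position shows that the read stopped after the header, even though the file is about a megabyte in size.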

Once the header data is loaded into a string, the String#unpack method is used to extract some values from it. Notice here how String#unpack uses the same template syntax as Array#pack and simply provides the inverse operation. While the pack operation converts an array of values into a string of binary data, the unpack operation converts a binary string into an array of processed values. This allows us to recover the information packed into the bitmap file header as Ruby strings and integers.
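A short round trip makes the symmetry easy to verify. The values are the ones from the reference image's file header; the variable names are just for illustration:

```ruby
fields   = ["BM", 70, 0, 0, 54]
template = "A2Vv2V"

# pack: array of Ruby values -> 14-byte binary string
binary = fields.pack(template)
p binary.bytesize #=> 14

# unpack: binary string -> array of Ruby values, using the same template
p binary.unpack(template) #=> ["BM", 70, 0, 0, 54]
```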

Once these values have been converted into Ruby objects, it's easy to do some ordinary comparisons to check to see if they're what we'd expect them to be. Because they help detect corrupted files, clearly defined validations are an important part of writing any decoder for binary file formats. If you do not do this sort of sanity checking, you will inevitably run into subtle processing errors later on that will be much harder to debug.

As you might expect, the implementation of read_dib_header involves more of the same sort of extractions and validations. It also sets the @width and @height variables, which we use later to determine how to traverse the encoded pixel array.

class BMP
  class Reader
    # ... other code as before ...

    BITS_PER_PIXEL     = 24
    DIB_HEADER_SIZE    = 40

    def read_dib_header(file)
      header = file.read(40)

      header_size, width, height, planes, bits_per_pixel,
      compression_method, image_size, hres,
      vres, n_colors, i_colors = header.unpack("Vl<2v2V2l<2V2")

      unless header_size == DIB_HEADER_SIZE
        fail "Corrupted bitmap: DIB header does not match expected size"
      end

      unless planes == 1
        fail "Corrupted bitmap: Expected 1 plane, got #{planes}"
      end

      unless bits_per_pixel == BITS_PER_PIXEL
        fail "#{bits_per_pixel} bits per pixel bitmaps are not supported"
      end

      unless compression_method == 0
        fail "Bitmap compression not supported"
      end

      unless image_size + PIXEL_ARRAY_OFFSET == file.size
        fail "Corrupted bitmap: pixel array size isn't as expected"
      end

      @width, @height = width, height
    end
  end
end

Beyond what has already been said about this example and the DIB header itself, there isn't much more to discuss about this particular method. That means we can finally take a look at how BMP::Reader converts the encoded pixel array into a nested Ruby array structure.

class BMP
  class Reader
    def read_pixels(file)
      @pixels = Array.new(@height) { Array.new(@width) }

      (@height-1).downto(0) do |y|
        0.upto(@width - 1) do |x|
          @pixels[y][x] = file.read(3).unpack("H6").first
        end
        advance_to_next_row(file)
      end
    end

    def advance_to_next_row(file)
      padding_bytes = @width % 4
      return if padding_bytes == 0

      file.pos += padding_bytes
    end
  end
end

One interesting aspect of this code is that it uses explicit numerical iterators. These are relatively rare in idiomatic Ruby, but I did not see a better way to approach this particular problem. Rows are listed in the pixel array from the bottom up, while the image itself still gets indexed from the top down (with 0 at the top). This makes it necessary to iterate over the row numbers in reverse order, and the use of downto is the best way I could find to do that.

The other thing worth noticing about this code is that in the advance_to_next_row method, we actually move the pointer ahead in the file rather than reading the padding bytes between each row. This makes little difference when you're dealing with a maximum of three bytes of padding per row (two in this case), but is a good practice for writing more efficient code that consumes less memory.
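The same skipping technique can be tried without a file on disk. In this sketch a StringIO stands in for the open file (an assumption made so the example is self-contained), holding the two padded rows from the reference image:

```ruby
require "stringio"

# Two rows of a 2-pixel-wide image: 6 data bytes + 2 padding bytes each.
io = StringIO.new(["0000ffffffff0000", "ff000000ff000000"].pack("H16H16"))

width = 2

rows = 2.times.map do
  pixels = width.times.map { io.read(3).unpack("H6").first }
  io.pos += width % 4 # skip the padding instead of reading it
  pixels
end

p rows    #=> [["0000ff", "ffffff"], ["ff0000", "00ff00"]]
p io.eof? #=> true
```

Note that the rows come out in file order (bottom row first), which is why read_pixels fills @pixels from the last row index down to zero.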

When you take all these code examples and glue them together into a single class definition, you'll end up with a BMP::Reader object that is capable of giving you the width and height of a 24bit BMP image as well as the color of each and every pixel in the image. For those who'd like to experiment further, the source code for BMP::Reader is available.

Reflections

The thing that makes me appreciate binary file formats is that if you just learn a few basic computing concepts, there are few things that could be more fundamentally simple to work with. But simple does not necessarily mean easy, and in the process of writing this article I realized that some aspects of binary file processing are not quite as trivial or intuitive as I originally thought they were.

What I can say is that this kind of work gets a whole lot easier with practice. Due to my work on Prawn I have written implementations for various binary formats including PDF, PNG, JPG, and TTF. These formats each have their differences, but my experience tells me that if you fully understand the examples in this article, then you are already well on your way to tackling pretty much any binary file format.

Footnotes

  1. To determine the storage space needed for the pixel array in BMP images, I used the computations described in the Wikipedia article on bitmap images.