Embedded Programming with the GNU Toolchain


The GNU toolchain is increasingly being used for deeply embedded software development. This type of software development is also called standalone C programming and bare metal C programming. Standalone C programming brings along with it new problems, and dealing with them requires a deeper understanding of the GNU toolchain. The GNU toolchain’s manuals provide excellent information on the toolchain, but from the perspective of the toolchain, rather than the perspective of the problem. Well, that is how manuals are supposed to be written anyway. The result is that the answers to common problems are scattered all over, and new users of the GNU toolchain are left baffled.

This tutorial attempts to bridge the gap by explaining the tools from the perspective of the problem. Hopefully, this should enable more people to use the GNU toolchain for their embedded projects.

For the purpose of this tutorial, an ARM based embedded system is emulated using Qemu. With this you can learn the GNU toolchain from the comfort of your desktop, without having to invest in hardware. This tutorial itself does not teach the ARM instruction set. It is meant to be used alongside other books and on-line tutorials on the ARM architecture.

But for the convenience of the reader, frequently used ARM instructions are listed in the appendix.

Setting up the ARM Lab

This section shows how to set up a simple ARM development and testing environment on your PC, using Qemu and the GNU toolchain. Qemu is a machine emulator capable of emulating various machines, including ARM based machines. You can write ARM assembly programs, build them using the GNU toolchain, and execute and test them in Qemu.

Qemu ARM

Qemu will be used to emulate a PXA255 based connex board from Gumstix. You should have at least version 0.9.1 of Qemu to work with this tutorial.

The PXA255 has an ARM core with an ARMv5TE compliant instruction set. The PXA255 also has several on-chip peripherals. Some peripherals will be introduced in the course of the tutorial.

Installing Qemu in Debian

This tutorial requires qemu version 0.9.1 or above. The qemu package available in Debian Squeeze/Wheezy meets this requirement. Install qemu using apt-get.

$ apt-get install qemu

Installing GNU Toolchain for ARM

  1. Folks at CodeSourcery (part of Mentor Graphics) have been kind enough to make GNU toolchains available for various architectures. Download their GNU toolchain for ARM.

  2. Extract the tar archive to ~/toolchains.

    $ mkdir ~/toolchains
    $ cd ~/toolchains
    $ tar -jxf ~/downloads/arm-2008q1-126-arm-none-eabi-i686-pc-linux-gnu.tar.bz2
  3. Add the toolchain to your PATH.

    $ PATH=$HOME/toolchains/arm-2008q1/bin:$PATH
  4. You might want to add the previous line to your .bashrc.

Hello ARM

In this section, you will learn to assemble a simple ARM program, and test it on a bare metal connex board emulated by Qemu.

The assembly program source file consists of a sequence of statements, one per line. Each statement has the following format.

label:    instruction         @ comment

Each of the components is optional.


The label is a convenient way to refer to the location of the instruction in memory. The label can be used wherever an address can appear, for example as an operand of the branch instruction. The label name should consist of letters, digits, _ and $.


A comment starts with an @, and the characters that appear after an @ are ignored.


The instruction could be an ARM instruction or an assembler directive. Assembler directives are commands to the assembler. Assembler directives always start with a . (period).

Here is a very simple ARM assembly program to add two numbers.

Adding Two Numbers
	.text
start:                       @ Label, not really required
	mov   r0, #5	     @ Load register r0 with the value 5
	mov   r1, #4	     @ Load register r1 with the value 4
	add   r2, r1, r0     @ Add r0 and r1 and store in r2

stop:	b stop               @ Infinite loop to stop execution

The .text is an assembler directive, which says that the following instructions have to be assembled into the code section, rather than the .data section. Sections will be covered in detail, later in the tutorial.

Building the Binary

Save the program in a file, say add.s. To assemble the file, invoke the GNU Toolchain’s assembler, as, using the following command.

$ arm-none-eabi-as -o add.o add.s

The -o option specifies the output filename.

The tools of a cross toolchain are prefixed with the target for which they are built, to avoid name conflicts with the host toolchain. For the sake of readability, tools will be referred to without the prefix in the text.

To generate the executable file, invoke the GNU Toolchain’s linker ld, as shown in the following command.

$ arm-none-eabi-ld -Ttext=0x0 -o add.elf add.o

Here again, the -o option specifies the output filename. The -Ttext=0x0 option specifies that addresses should be assigned to the labels as if the instructions start at address 0x0. To view the address assignment for various labels, the nm command can be used as shown below.

$ arm-none-eabi-nm add.elf
... clip ...
00000000 t start
0000000c t stop

Note the address assignment for the labels start and stop. The address assigned to start is 0x0, since it is the label of the first instruction. The label stop comes after 3 instructions, and each instruction is 4 bytes. Hence stop is assigned the address 12 (0xC).

Linking with a different base address for the instructions will result in a different set of addresses being assigned to the labels.

$ arm-none-eabi-ld -Ttext=0x20000000 -o add.elf add.o
$ arm-none-eabi-nm add.elf
... clip ...
20000000 t start
2000000c t stop
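
The arithmetic behind both nm listings can be sketched in C. This is only an illustration: the helper name insn_address and the constant ARM_INSN_SIZE are made up here, and the sketch assumes every instruction is a 4-byte ARM instruction, as in the add program.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: plain ARM (not Thumb) code, so every instruction is 4 bytes. */
#define ARM_INSN_SIZE 4u

/* Address of the instruction at 0-based position `index`, when the code
 * is linked to start at `text_base` (the value given to -Ttext). */
uint32_t insn_address(uint32_t text_base, uint32_t index)
{
    assert(text_base % ARM_INSN_SIZE == 0);  /* instructions are word aligned */
    return text_base + index * ARM_INSN_SIZE;
}
```

The label stop marks the fourth instruction (index 3), so insn_address(0x0, 3) gives 0xc and insn_address(0x20000000, 3) gives 0x2000000c, matching the two listings above.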

The output file created by ld is in a format called ELF. Various file formats are available for storing executable code. The ELF format works fine when you have an OS around, but since we are going to run the program on bare metal, we will have to convert it to a simpler file format called the binary format.

A file in binary format contains consecutive bytes from a specific memory address. No other additional information is stored in the file. This is convenient for Flash programming tools, since all that has to be done when programming is to copy each byte in the file to consecutive addresses, starting from a specified base address in memory.
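
What a Flash programming tool does with such a file can be sketched in a few lines of C. The function name program_flash and the idea of modelling the Flash as a byte array are assumptions made for this illustration; a real tool also has to erase and unlock the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a raw binary image into a simulated Flash, byte for byte.
 * The byte at offset i of the image lands at Flash offset base + i. */
void program_flash(uint8_t *flash, size_t flash_size, size_t base,
                   const uint8_t *image, size_t image_size)
{
    /* The image must fit between `base` and the end of the Flash. */
    assert(base <= flash_size && image_size <= flash_size - base);
    memcpy(flash + base, image, image_size);
}
```

Because the binary format carries no headers or addresses, the base address has to be supplied by whoever does the programming.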

The GNU toolchain’s objcopy command can be used to convert between different object file formats. A common usage of the command is given below.

objcopy -O <output-format> <in-file> <out-file>

To convert add.elf to binary format the following command can be used.

$ arm-none-eabi-objcopy -O binary add.elf add.bin

Check the size of the file. The file will be exactly 16 bytes, since there are 4 instructions and each instruction occupies 4 bytes.

$ ls -al add.bin
-rw-r--r-- 1 vijaykumar vijaykumar 16 2008-10-03 23:56 add.bin

Executing in Qemu

When the ARM processor is reset, it starts executing from address 0x0. On the connex board a 16MB Flash is located at address 0x0. The instructions present in the beginning of the Flash will be executed.

When qemu emulates the connex board, a file has to be specified which will be treated as the Flash memory. The Flash file format is very simple. To get the byte from address X in the Flash, qemu reads the byte from offset X in the file. In fact, this is the same as the binary file format.

To test the program, on the emulated Gumstix connex board, we first create a 16MB file representing the Flash. We use the dd command to copy 16MB of zeroes from /dev/zero to the file flash.bin. The data is copied in 4K blocks.

$ dd if=/dev/zero of=flash.bin bs=4096 count=4096

The add.bin file is then copied into the beginning of the Flash, using the following command.

$ dd if=add.bin of=flash.bin bs=4096 conv=notrunc

This is the equivalent of programming the bin file on to the Flash memory.

After reset, the processor will start executing from address 0x0, and the instructions from the program will get executed. The command to invoke qemu is given below.

$ qemu-system-arm -M connex -pflash flash.bin -nographic -serial /dev/null

The -M connex option specifies that the machine connex is to be emulated. The -pflash option specifies that the flash.bin file represents the Flash memory. The -nographic option specifies that simulation of a graphical display is not required. The -serial /dev/null option specifies that the serial port of the connex board is to be connected to /dev/null, so that the serial port data is discarded.

The system executes the instructions and after completion, keeps looping infinitely in the stop: b stop instruction. To view the contents of the registers, the monitor interface of qemu can be used. The monitor interface is a command line interface, through which the emulated system can be controlled and the status of the system can be viewed. When qemu is started with the above mentioned command, the monitor interface is provided in the standard I/O of qemu.

To view the contents of the registers the info registers monitor command can be used.

(qemu) info registers
R00=00000005 R01=00000004 R02=00000009 R03=00000000
R04=00000000 R05=00000000 R06=00000000 R07=00000000
R08=00000000 R09=00000000 R10=00000000 R11=00000000
R12=00000000 R13=00000000 R14=00000000 R15=0000000c
PSR=400001d3 -Z-- A svc32

Note the value in register R02. The register contains the result of the addition and should match with the expected value of 9.

More Monitor Commands

Some useful qemu monitor commands are listed in the following table.

Command         Purpose
help            List available commands
quit            Quits the emulator
xp /fmt addr    Physical memory dump from addr
system_reset    Reset the system

The xp command deserves more explanation. The fmt argument specifies how the memory contents are to be displayed. The syntax of fmt is <count><format><size>.

count specifies the number of data items to be dumped.

size specifies the size of each data item. b for 8 bits, h for 16 bits, w for 32 bits and g for 64 bits.

format specifies the display format. x for hex, d for signed decimal, u for unsigned decimal, o for octal, c for char and i for asm instructions.

The xp command with the i format can be used to disassemble the instructions present in memory. To disassemble the instructions located at 0x0, the xp command with the fmt specified as 4iw can be used. The 4 specifies that 4 items are to be displayed, i specifies that the items are to be printed as instructions (yes, a built in disassembler!), and w specifies that the items are 32 bits in size. The output of the command is shown below.

(qemu) xp /4iw 0x0
0x00000000:  mov        r0, #5  ; 0x5
0x00000004:  mov        r1, #4  ; 0x4
0x00000008:  add        r2, r1, r0
0x0000000c:  b  0xc
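
To make the <count><format><size> syntax concrete, here is a small C sketch that splits a fmt string such as 4iw into its three parts. It follows the syntax exactly as described above; the names xp_fmt and parse_xp_fmt are invented for this example, and the real monitor may well be more lenient about what it accepts.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

struct xp_fmt {
    unsigned count;      /* number of items to dump */
    char     format;     /* one of x d u o c i */
    unsigned size_bits;  /* 8, 16, 32 or 64 */
};

/* Parse "<count><format><size>", e.g. "4iw".
 * Returns 0 on success and -1 if the string does not fit the syntax. */
int parse_xp_fmt(const char *s, struct xp_fmt *out)
{
    assert(s != NULL && out != NULL);

    if (!isdigit((unsigned char)s[0]))
        return -1;                       /* count must come first */

    char *end;
    unsigned long count = strtoul(s, &end, 10);
    if (count == 0 || count > UINT_MAX)
        return -1;

    if (*end == '\0' || strchr("xduoci", *end) == NULL)
        return -1;                       /* unknown display format */
    char format = *end++;

    unsigned size_bits;
    switch (*end) {
    case 'b': size_bits = 8;  break;
    case 'h': size_bits = 16; break;
    case 'w': size_bits = 32; break;
    case 'g': size_bits = 64; break;
    default:  return -1;                 /* unknown or missing size */
    }
    if (end[1] != '\0')
        return -1;                       /* trailing characters */

    out->count = (unsigned)count;
    out->format = format;
    out->size_bits = size_bits;
    return 0;
}
```

For 4iw this yields a count of 4, the format i and a size of 32 bits.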

More Assembler Directives

In this section, we will describe some commonly used assembler directives, using two example programs.

  1. A program to sum an array

  2. A program to calculate the length of a string

Sum an Array

The following code sums an array of bytes and stores the result in r3.

Sum an Array

The code introduces two new assembler directives — .byte and .align. These assembler directives are described below.

.byte Directive

The byte sized arguments of .byte are assembled into consecutive bytes in memory. There are similar directives .2byte and .4byte for storing 16 bit values and 32 bit values, respectively. The general syntax is given below.

.byte   exp1, exp2, ...
.2byte  exp1, exp2, ...
.4byte  exp1, exp2, ...

The arguments could be simple integer literals, represented as binary (prefixed by 0b or 0B), octal (prefixed by 0), decimal or hexadecimal (prefixed by 0x or 0X). The integers could also be represented as character constants (a character surrounded by single quotes), in which case the ASCII value of the character will be used.

The arguments could also be C expressions constructed out of literals and other symbols. Examples are shown below.

pattern:  .byte 0b01010101, 0b00110011, 0b00001111
npattern: .byte npattern - pattern
halpha:   .byte 'A', 'B', 'C', 'D', 'E', 'F'
dummy:    .4byte 0xDEADBEEF
nalpha:   .byte 'Z' - 'A' + 1

.align Directive

ARM requires that instructions be present in 32-bit aligned memory locations. The address of the first byte, of the 4 bytes in an instruction, should be a multiple of 4. To adhere to this, the .align directive can be used to insert padding bytes until the next byte address is a multiple of 4. This is required only when data bytes or half words are inserted within code.
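
The padding rule can be written down as a one-line computation. The C sketch below is only a model of what the assembler does with the location it is currently at; the function name align_up is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Round `addr` up to the next multiple of `alignment`.
 * `alignment` must be a power of two, e.g. 4 for ARM instructions. */
uint32_t align_up(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

If three data bytes follow a 4-byte instruction, the next free address is 7. align_up(7, 4) is 8, so one padding byte is inserted. An address that is already a multiple of 4 is left unchanged.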

String Length

The following code calculates the length of a string and stores the length in register r1.

String Length

The code introduces two new assembler directives - .asciz and .equ. The assembler directives are described below.

.asciz Directive

The .asciz directive accepts string literals as arguments. String literals are sequences of characters in double quotes. The string literals are assembled into consecutive memory locations. The assembler automatically inserts a nul character (\0 character) after each string.

The .ascii directive is the same as .asciz, but the assembler does not insert a nul character after each string.
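
C has a close analogue of the two directives, which may help to see why the nul character matters for a string length routine. The sketch below is not the tutorial's assembly listing; the array names and the function string_length are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Like .asciz "Hello": five characters plus the nul, 6 bytes in all. */
const char with_nul[] = "Hello";

/* Like .ascii "Hello": exactly five characters, no nul. */
const char without_nul[5] = { 'H', 'e', 'l', 'l', 'o' };

/* Count the bytes before the terminating nul. This only works on data
 * that ends in a nul, such as strings emitted with .asciz. */
size_t string_length(const char *s)
{
    assert(s != NULL);
    size_t len = 0;
    while (s[len] != '\0')
        len++;
    return len;
}
```

Calling string_length on without_nul would run past the end of the array, which is exactly the situation .asciz avoids.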

.equ Directive

The assembler maintains something called a symbol table. The symbol table maps label names to addresses. Whenever the assembler encounters a label definition, the assembler makes an entry in the symbol table. And whenever the assembler encounters a label reference, it replaces the label by the corresponding address from the symbol table.

Using the assembler directive .equ, it is also possible to manually insert entries in the symbol table, to map names to values, which are not necessarily addresses. Whenever the assembler encounters these names, it replaces them by their corresponding values. These names and label names are together called symbol names.

The general syntax of the directive is given below.

.equ name, expression

The name is a symbol name, and has the same restrictions as a label name. The expression could be a simple literal, or an expression as explained for the .byte directive.

Unlike the .byte directive, the .equ directive itself does not allocate any memory. It just creates an entry in the symbol table.

Using RAM

The Flash memory, in which the previous example programs were stored, is a kind of EEPROM. It is useful as secondary storage, like a hard disk, but it is not convenient to store variables in Flash. Variables should be stored in RAM, so that they can be easily modified.

The connex board has 64 MB of RAM starting at address 0xA0000000, in which variables can be stored. The memory map of the connex board can be pictured as shown in the following diagram.

Memory Map
Figure 1. Memory Map

Necessary setup has to be done to place the variables at this address. To understand what has to be done, the role of assembler and linker has to be understood.


While writing a multi-file program, each file is assembled individually into object files. The linker combines these object files to form the final executable.

Role of the Linker
Figure 2. Role of the Linker

While combining the object files together, the linker performs the following operations.

  1. Symbol Resolution

  2. Relocation

We will look into these operations, in detail, in this section.

Symbol Resolution

In a single file program, while producing the object file, all references to labels are replaced by their corresponding addresses by the assembler. But in a multi-file program, if there are any references to labels defined in another file, the assembler marks these references as "unresolved". When these object files are passed to the linker, the linker determines the values for these references from the other object files, and patches the code with the correct values.

The sum of array example is split into two files, to demonstrate the symbol resolution performed by the linker. The two files will be assembled and their symbol tables examined to show the presence of unresolved references.

The file sum-sub.s contains the sum subroutine, and the file main.s invokes the subroutine with the required arguments. The source of the files is shown below.

main.s - Subroutine Invocation
sum-sub.s - Subroutine Definition

A word on the .global directive is in order. In C, all variables declared outside functions are visible to other files, unless explicitly declared static. In assembly, all labels are static, that is local to the file, unless they are explicitly made visible to other files using the .global directive.

The files are assembled, and the symbol tables are dumped using the nm command.

$ arm-none-eabi-as -o main.o main.s
$ arm-none-eabi-as -o sum-sub.o sum-sub.s
$ arm-none-eabi-nm main.o
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
         U sum
$ arm-none-eabi-nm sum-sub.o
00000004 t loop
00000000 T sum

For now, focus on the letter in the second column, which specifies the symbol type. A t indicates that the symbol is defined in the text section. A U indicates that the symbol is undefined. For defined symbols, a letter in uppercase indicates that the symbol is .global.

It is evident that the symbol sum is defined in sum-sub.o and is not resolved yet in main.o. When the linker is invoked the symbol references will be resolved, and the executable will be produced.


Relocation

Relocation is the process of changing addresses already assigned to labels. This will also involve patching up all label references to reflect the newly assigned address. Primarily, relocation is performed for the following two reasons:

  1. Section Merging

  2. Section Placement

To understand the process of relocation, an understanding of the concept of sections is essential.

Code and data have different run time requirements. For example, code can be placed in read-only memory, and data might require read-write memory. It would be convenient if code and data are not interleaved. For this purpose, programs are divided into sections. Most programs have at least two sections, .text for code and .data for data. The assembler directives .text and .data are used to switch back and forth between the two sections.

It helps to imagine each section as a bucket. When the assembler hits a section directive, it puts the code/data following the directive in the selected bucket. Thus the code/data that belong to particular section appear in contiguous locations. The following figures show how the assembler re-arranges data into sections.

Figure 3. Sections

Now that we have an understanding of sections, let us look into the primary reasons for which relocation is performed.

Section Merging

When dealing with multi-file programs, sections with the same name (for example .text) might appear in each file. The linker is responsible for merging sections from the input files into sections of the output file. By default, the sections with the same name from each file are placed contiguously, and the label references are patched to reflect the new addresses.

The effects of section merging can be seen by looking at the symbol table of the object files and the corresponding executable file. The multi-file sum of array program can be used to illustrate section merging. The symbol tables of the object files main.o and sum-sub.o, and the symbol table of the executable file sum.elf, are shown below.

$ arm-none-eabi-nm main.o
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
         U sum
$ arm-none-eabi-nm sum-sub.o
00000004 t loop (1)
00000000 T sum
$ arm-none-eabi-ld -Ttext=0x0 -o sum.elf main.o sum-sub.o
$ arm-none-eabi-nm sum.elf
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
00000028 t loop (1)
00000024 T sum
  1. The loop symbol has address 0x4 in sum-sub.o, and 0x28 in sum.elf, since the .text section of sum-sub.o is placed right after the .text section of main.o.

Section Placement

When a program is assembled, each section is assumed to start from address 0. And thus labels are assigned values relative to start of the section. When the final executable is created, the section is placed at some address X. And all references to the labels defined within the section, are incremented by X, so that they point to the new location.

The placement of each section at a particular location in memory and the patching of all references to the labels in the section, is done by the linker.

The effects of section placement can be seen by looking at the symbol table of the object file and the corresponding executable file. The single file sum of array program can be used to illustrate section placement. To make things clearer, we will place the .text section at address 0x100.

$ arm-none-eabi-as -o sum.o sum.s
$ arm-none-eabi-nm -n sum.o
00000000 t entry (1)
00000004 t arr
00000007 t eoa
00000008 t start
00000014 t loop
00000024 t stop
$ arm-none-eabi-ld -Ttext=0x100 -o sum.elf sum.o (2)
$ arm-none-eabi-nm -n sum.elf
00000100 t entry (3)
00000104 t arr
00000107 t eoa
00000108 t start
00000114 t loop
00000124 t stop
  1. Addresses for labels are assigned starting from 0 within a section.

  2. When the executable is created, the linker is instructed to place the .text section at address 0x100.

  3. The addresses for labels in the .text section are re-assigned starting from 0x100, and all label references will be patched to reflect this.

The process of section merging and placement is shown in the following figure.

Section Merging and Placement
Figure 4. Section Merging and Placement
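
Both effects boil down to one addition, which the following C sketch spells out. It is a simplified model, not how ld is implemented; the function name relocate and its parameter names are assumptions of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Final address of a label after linking.
 *   section_base - where the output section is placed (section placement)
 *   input_offset - where this file's section starts within the merged
 *                  output section (section merging)
 *   label_offset - the label's address in the object file, which is
 *                  relative to the start of that file's section */
uint32_t relocate(uint32_t section_base, uint32_t input_offset,
                  uint32_t label_offset)
{
    uint64_t addr = (uint64_t)section_base + input_offset + label_offset;
    assert(addr <= UINT32_MAX);  /* must still fit the 32-bit address space */
    return (uint32_t)addr;
}
```

In the merging example, the .text section of sum-sub.o starts 0x24 bytes into the output section (the address of sum in sum.elf), so loop moves from 0x4 to relocate(0x0, 0x24, 0x4), which is 0x28. In the placement example, loop moves from 0x14 to relocate(0x100, 0x0, 0x14), which is 0x114.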

Linker Script File

As mentioned in the previous section, section merging and placement is done by the linker. The programmer can control how the sections are merged, and at what locations they are placed in memory, through a linker script file. A very simple linker script file is shown below.

Basic linker script
SECTIONS { (1)
	. = 0x00000000; (2)
	.text : { (3)
		abc.o (.text);
		def.o (.text);
	} (3)
}
  1. The SECTIONS command is the most important linker command. It specifies how the sections are to be merged and at what location they are to be placed.

  2. Within the block following the SECTIONS command, the . (period) represents the location counter. The location counter is always initialised to 0x0. It can be modified by assigning a new value to it. Setting the value to 0x0 at the beginning is superfluous.

  3. This part of the script specifies that, the .text section from the input files abc.o and def.o should go to the .text section of the output file.

The linker script can be further simplified and generalised by using the wild card character * instead of individually specifying the file names.

Wildcard in linker scripts
SECTIONS {
	. = 0x00000000;
	.text : { * (.text); }
}

If the program contains both .text and .data sections, the .data section merging and location can be specified as shown below.

Multiple sections in linker scripts
SECTIONS {
	. = 0x00000000;
	.text : { * (.text); }

	. = 0x00000400;
	.data : { * (.data); }
}

Here, the .text section is located at 0x0 and .data is located at 0x400. Note that, if the location counter is not assigned a different value, the .text and .data sections will be located at adjacent memory locations.

Linker Script Example

To demonstrate the use of linker scripts, we will use the linker script shown in Multiple sections in linker scripts to control the placement of a program’s .text and .data section. We will use a slightly modified version of the sum of array program for this purpose. The code is shown below.


The only change here is that the array is now in the .data section. Note that the nasty branch instruction to skip over the data is no longer required, since the linker script will place the .text section and .data section appropriately. As a result, statements can be placed in the program in any convenient way, and the linker script will take care of placing the sections correctly in memory.

When the program is linked, the linker script is passed as an input to the linker, as shown in the following command.

$ arm-none-eabi-as -o sum-data.o sum-data.s
$ arm-none-eabi-ld -T <linker-script> -o sum-data.elf sum-data.o

The option -T specifies that <linker-script> is to be used as the linker script. Dumping the symbol table will provide an insight into how the sections are placed in memory.

$ arm-none-eabi-nm -n sum-data.elf
00000000 t start
0000000c t loop
0000001c t stop
00000400 d arr
00000403 d eoa

From the symbol table it is obvious that the .text is placed starting from address 0x0 and .data section is placed starting from address 0x400.

Data in RAM, Example

Now that we know how to write linker scripts, we will attempt to write a program and place the .data section in RAM.

The add program is modified to load two values from RAM, add them and store the result back to RAM. The two values and the space for the result are placed in the .data section.

Add Data in RAM

When the program is linked, the linker script shown below is used.


The dump of the symbol table of add-mem.elf is shown below.

$ arm-none-eabi-nm -n add-mem.elf
00000000 t start
0000001c t stop
a0000000 d val1
a0000001 d val2
a0000002 d result

The linker script seems to have solved the problem of placing the .data section in RAM. But wait, the solution is not complete yet!

RAM is Volatile!

RAM is volatile memory, and hence it is not possible to directly make the data available in RAM, on power up.

All code and data should be stored in Flash before power-up. On power-up, a startup code is supposed to copy the data from Flash to RAM, and then proceed with the execution of the program. So the program’s .data section has two addresses, a load address in Flash and a run-time address in RAM.

In ld parlance, the load address is called LMA (Load Memory Address), and the run-time address is called VMA (Virtual Memory Address).

The following two modifications have to be done, to make the program work correctly.

  1. The linker script has to be modified to specify both the load address and the run-time address, for the .data section.

  2. A small piece of code should copy the .data section from Flash (load address) to RAM (run-time address).

Specifying Load Address

The run-time address is what should be used for determining the address of labels. In the previous linker script, we specified the run-time address for the .data section. The load address was not explicitly specified, and defaults to the run-time address. This is OK for the previous examples, since the programs were executed directly from Flash. But if data is to be placed in RAM during execution, the load address should correspond to Flash and the run-time address should correspond to RAM.

A load address different from the run-time address can be specified using the AT keyword. The modified linker script is shown below.

SECTIONS {
	. = 0x00000000;
	.text : { * (.text); }
	etext = .; (1)

	. = 0xA0000000;
	.data : AT (etext) { * (.data); } (2)
}
  1. Symbols can be created on the fly within the SECTIONS command by assigning values to them. Here etext is assigned the value of the location counter at that position. etext contains the address of the next free location in Flash right after all the code. This will be used later on to specify where the .data section is to be placed in Flash. Note that etext itself will not be allocated any memory, it is just an entry in the symbol table.

  2. The AT keyword specifies the load address of the .data section. An address or symbol (whose value is a valid address) could be passed as argument to AT. Here the load address of .data is specified as the location right after all the code in Flash.

Copying .data to RAM

To copy the data from Flash to RAM, the following information is required.

  1. Address of data in Flash (flash_sdata)

  2. Address of data in RAM (ram_sdata)

  3. Size of the .data section. (data_size)

With this information the data can be copied from Flash to RAM using the following code snippet.

	ldr   r0, =flash_sdata
	ldr   r1, =ram_sdata
	ldr   r2, =data_size

copy:
	ldrb  r4, [r0], #1
	strb  r4, [r1], #1
	subs  r2, r2, #1
	bne   copy
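
For readers more at home in C, the same copy can be sketched as follows. The function name copy_data is made up; the parameters mirror the three values listed above. Unlike the assembly loop, which always copies at least one byte, this sketch checks the count first, so an empty .data section is handled as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `data_size` bytes from the load address in Flash to the
 * run-time address in RAM, one byte at a time. */
void copy_data(const uint8_t *flash_sdata, uint8_t *ram_sdata,
               size_t data_size)
{
    assert(flash_sdata != NULL && ram_sdata != NULL);
    while (data_size-- > 0)
        *ram_sdata++ = *flash_sdata++;
}
```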

The linker script can be slightly modified to provide this information.

Linker Script with Section Copy Symbols
SECTIONS {
	. = 0x00000000;
	.text : {
	      * (.text);
	}
	flash_sdata = .; (1)

	. = 0xA0000000;
	ram_sdata = .; (2)
	.data : AT (flash_sdata) {
	      * (.data);
	}
	ram_edata = .; (3)
	data_size = ram_edata - ram_sdata; (3)
}
  1. Start of data in Flash is right after all the code in Flash.

  2. Start of data in RAM is at the base address of RAM.

  3. Obtaining the size of data is not straightforward. The data size is calculated as the difference between the end of data in RAM and the start of data in RAM. Yes, simple expressions are allowed within the linker script.

The add program with data copied to RAM from Flash is listed below.

Add Data in RAM (with copy)

The program is assembled and linked using the linker script listed in Linker Script with Section Copy Symbols. The program is executed and tested within Qemu.

$ qemu-system-arm -M connex -pflash flash.bin -nographic -serial /dev/null
(qemu) xp /4dw 0xA0000000
a0000000:         10         30         40          0

In a real system with an SDRAM, the memory should not be accessed right away. The memory controller will have to be initialised before performing a memory access. Our code works because the simulated memory does not require the memory controller to be initialised.

Exception Handling

The examples given so far have a major bug. The first 8 words in the memory map are reserved for the exception vectors. When an exception occurs, control is transferred to one of these 8 locations. The exceptions and their exception vector addresses are shown in the following table.

Table 1. Exception Vector Addresses
Exception                   Address
Reset                       0x00
Undefined Instruction       0x04
Software Interrupt (SWI)    0x08
Prefetch Abort              0x0C
Data Abort                  0x10
Reserved, not used          0x14
IRQ                         0x18
FIQ                         0x1C

These locations are supposed to contain a branch that will transfer control to the appropriate exception handler. In the examples we have seen so far, we haven’t inserted branch instructions at the exception vector addresses. We got away without issues since these exceptions did not occur. All the above programs can be fixed by linking them with the following assembly code.

	.section "vectors"
reset:	b     start
undef:  b     undef
swi:	b     swi
pabt:	b     pabt
dabt:	b     dabt
	nop                  @ Reserved vector, keeps irq and fiq at 0x18 and 0x1C
irq:	b     irq
fiq:	b     fiq

Only the reset exception is vectored to a different address, start. Every other exception is vectored back to its own vector address. So if any exception other than reset occurs, the processor will keep spinning at that location. The exception can then be identified by looking at the value of pc through a debugger (the monitor interface in our case).

To ensure that these instructions are placed at the exception vector addresses, the linker script should look something like the one below.

SECTIONS {
	. = 0x00000000;
	.text : {
		* (vectors);
		* (.text);
	}
}

Notice how the vectors section is placed before all other code, ensuring that the vectors are located starting from address 0x0.
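
The relationship between an exception and its vector address is regular enough to be captured in a short C sketch, which can also be used to read off the exception from a pc value seen in the monitor. The enum and function names are assumptions of this example; the addresses are the standard ARM vector addresses, each 4 bytes apart starting at 0x0.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_SIZE 4u  /* each vector holds one 4-byte instruction */

/* Exceptions in vector table order. */
enum arm_exception {
    EXC_RESET, EXC_UNDEF, EXC_SWI, EXC_PABT,
    EXC_DABT, EXC_RESERVED, EXC_IRQ, EXC_FIQ
};

/* Vector address of an exception, for vectors located at 0x0. */
uint32_t vector_address(enum arm_exception e)
{
    assert((unsigned)e <= (unsigned)EXC_FIQ);
    return (uint32_t)e * VECTOR_SIZE;
}

/* Exception whose vector is at `pc`, or -1 if `pc` is not a vector
 * address. Useful when the processor spins on a "b ." style vector. */
int exception_from_pc(uint32_t pc)
{
    if (pc % VECTOR_SIZE != 0 || pc / VECTOR_SIZE > (uint32_t)EXC_FIQ)
        return -1;
    return (int)(pc / VECTOR_SIZE);
}
```

For instance, a pc stuck at 0x10 points to the data abort vector.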

C Startup

It is not possible to directly execute C code when the processor comes out of reset, since, unlike assembly language, C programs need some basic pre-requisites to be satisfied. This section describes the pre-requisites and how to meet them.

We will take a C program that calculates the sum of an array as an example. By the end of this section, we will be able to perform the necessary setup, transfer control to the C code and execute it.

Sum of Array in C

Before transferring control to C code, the following have to be set up correctly.

  1. Stack

  2. Global variables

    1. Initialized

    2. Uninitialized

  3. Read-only data


Stack

C uses the stack for storing local (auto) variables, passing function arguments, storing return addresses, etc. So it is essential that the stack be set up correctly before transferring control to C code.

Stacks are highly flexible in the ARM architecture, since the implementation is completely left to the software. For people not familiar with the ARM architecture, an overview is provided in Appendix D: ARM Stacks.

To make sure that code generated by different compilers is interoperable, ARM has created the ARM Architecture Procedure Call Standard (AAPCS). The register to be used as the stack pointer and the direction in which the stack grows are both dictated by the AAPCS. According to the AAPCS, register r13 is to be used as the stack pointer. Also, the stack should be full-descending.

One way of placing global variables and the stack is shown in the following diagram.

Figure 5. Stack Placement

So all that has to be done in the startup code is to point r13 at the top of RAM, so that the stack can grow downwards (towards lower addresses). For the connex board this can be achieved using the following ARM instruction.

	ldr sp, =0xA4000000

Note that the assembler provides an alias sp for the r13 register.

The address 0xA4000000 itself does not correspond to RAM; the RAM ends at 0xA3FFFFFF. But that is OK: since the stack is full-descending, the first push decrements the stack pointer before storing the value.
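The arithmetic behind this can be checked with a small C sketch. The macro and function names are hypothetical; the addresses are the RAM start, RAM end and initial stack pointer values mentioned in this tutorial, and a 32-bit (4-byte) push is assumed.

```c
#include <stdint.h>

#define RAM_BASE  0xA0000000u   /* first RAM address on the connex board */
#define RAM_LAST  0xA3FFFFFFu   /* last RAM address                      */
#define STACK_TOP 0xA4000000u   /* initial sp: one past the end of RAM   */

/* Address written by a 32-bit push on a full-descending stack:
 * the stack pointer is decremented first, then the value is stored there. */
uint32_t push_address(uint32_t sp)
{
    return sp - 4u;
}
```

With sp starting at STACK_TOP, the first push writes to 0xA3FFFFFC, so all four bytes of the stored word fall inside RAM.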

Global Variables

When C code is compiled, the compiler places initialized global variables in the .data section. So, just as with the assembly programs, the .data section has to be copied from Flash to RAM.

The C language guarantees that all uninitialized global variables will be initialized to zero. When C programs are compiled, a separate section called .bss is used for uninitialized variables. Since the values of these variables are all zero to start with, they do not have to be stored in Flash. Before transferring control to C code, the memory locations corresponding to these variables have to be initialized to zero.
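The two preparation steps can be sketched in C to make the logic concrete. This is an illustration only: in this tutorial the work is done in assembly before any C code runs, and the function names are hypothetical. The parameter names mirror the linker symbols (flash_sdata, ram_sdata, data_size, sbss, bss_size) that appear in the symbol table later in this section, and sizes are assumed to be in bytes and a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the initial values of .data from their load address (Flash)
 * to their run-time address (RAM), one 32-bit word at a time. */
void copy_data(const uint32_t *flash_sdata, uint32_t *ram_sdata, size_t data_size)
{
    size_t words = data_size / sizeof(uint32_t);

    for (size_t i = 0; i < words; i++)
        ram_sdata[i] = flash_sdata[i];
}

/* Fill the .bss area with zeroes, as the C language requires. */
void zero_bss(uint32_t *sbss, size_t bss_size)
{
    size_t words = bss_size / sizeof(uint32_t);

    for (size_t i = 0; i < words; i++)
        sbss[i] = 0;
}
```

On a desktop, ordinary arrays can stand in for Flash and RAM to try the two functions out.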

Read-only Data

GCC places global variables marked as const in a separate section, called .rodata. The .rodata is also used for storing string constants.

Since the contents of the .rodata section will not be modified, they can be placed in Flash. The linker script has to be modified to accommodate this.

Startup Code

Now that we know the pre-requisites, we can create the linker script and the startup code. The linker script from Linker Script with Section Copy Symbols is modified to accommodate the following.

  1. .bss section placement

  2. vectors section placement

  3. .rodata section placement

The .bss is placed right after the .data section in RAM. Symbols to locate the start and end of .bss are also created in the linker script. The .rodata is placed right after the .text section in Flash. The following diagram shows the placement of the various sections.

Figure 6. Section Placement
Linker Script for C code

The startup code has the following parts

  1. exception vectors

  2. code to copy the .data from Flash to RAM

  3. code to zero out the .bss

  4. code to setup the stack pointer

  5. branch to main

C Startup Assembly

To compile the code, it is not necessary to invoke the assembler, compiler and linker individually. gcc is intelligent enough to do that for us.

As promised before, we will compile and execute the C code shown in Sum of Array in C.

$ arm-none-eabi-gcc -nostdlib -o csum.elf -T csum.lds csum.c startup.s

The -nostdlib option is used to specify that the standard C library should not be linked in. A little extra care has to be taken when the C library is linked in. This is discussed in Using the C Library.

A dump of the symbol table will give a better picture of how things have been placed in memory.

$ arm-none-eabi-nm -n csum.elf
00000000 t reset	(1)
00000004 A bss_size
00000004 t undef
00000008 t swi
0000000c t pabt
00000010 t dabt
00000018 A data_size
00000018 t irq
0000001c t fiq
00000020 T main
00000090 t start	(2)
000000a0 t copy
000000b0 t init_bss
000000c4 t zero
000000d0 t init_stack
000000d8 t stop
000000f4 r n		(3)
000000f8 A flash_sdata
a0000000 d arr		(4)
a0000000 A ram_sdata
a0000018 A ram_edata
a0000018 A sbss
a0000018 b sum		(5)
a000001c A ebss
  1. reset and the rest of the exception vectors are placed starting from 0x0.

  2. The startup assembly code, beginning at start, is placed in Flash after the 8 exception vector slots (8 * 4 = 32 = 0x20) and main.

  3. The read-only data, n, is placed in Flash after the code.

  4. The initialized data arr, an array of 6 integers, is placed at the start of RAM 0xA0000000.

  5. The uninitialized data sum is placed after the array of 6 integers. (6 * 4 = 24 = 0x18)
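The address arithmetic in the notes above can be verified with a short C sketch. The macro and function names are hypothetical; the constants (8 vector slots, 4-byte words, RAM starting at 0xA0000000, an array of 6 integers) are the ones given in the notes.

```c
#include <stdint.h>

#define VECTOR_SLOTS 8u            /* exception vector slots       */
#define WORD_SIZE    4u            /* bytes per ARM word / int     */
#define RAM_START    0xA0000000u   /* start of RAM on the board    */

/* First address after the exception vector table. */
uint32_t first_code_address(void)
{
    return VECTOR_SLOTS * WORD_SIZE;
}

/* Address immediately after an array of `elems` words placed at `base`. */
uint32_t address_after_array(uint32_t base, uint32_t elems)
{
    return base + elems * WORD_SIZE;
}
```

The first function yields 0x20, and the second, called with RAM_START and 6, yields 0xA0000018, the address at which sum is found in the symbol table.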

To execute the program, convert the program to .bin format, execute in Qemu, and dump the sum variable located at 0xA0000018.

$ arm-none-eabi-objcopy -O binary csum.elf csum.bin
$ dd if=csum.bin of=flash.bin bs=4096 conv=notrunc
$ qemu-system-arm -M connex -pflash flash.bin -nographic -serial /dev/null
(qemu) xp /6dw 0xa0000000
a0000000:          1         10          4          5
a0000010:          6          7
(qemu) xp /1dw 0xa0000018
a0000018:         33

Using the C Library

FIXME: This section is yet to be written.

Inline Assembly

FIXME: This section is yet to be written.


Like every other open source project, we gladly accept contributions. Sections that need help have been marked with FIXMEs. All contributions will be duly credited in the credits page.

This document’s source is maintained in a public git repo located at To contribute to the project, fork the project on github and send in a pull request.

The document is written in asciidoc, and converted to HTML using the docbook-xsl stylesheets.

Required software install

sudo apt install asciidoc docbook imgsizer dia
sudo apt install libsaxon-java libxslthl-java



  • The original tutorial was written by Vijay Kumar B., <[email protected]>

  • Jim Huang, Jesus Vicenti, Goodwealth Chu, Jeffrey Antony, Jonathan Grant, David LeBlanc, reported typos and suggested fixes in the code and text.

  • Dmitry Ponyatov added some info on build automation using make tool.


The following great free software tools were used for the construction of the tutorial.

  1. asciidoc for lightweight markup

  2. xsltproc for HTML transformation

  3. docbook-xsl for the stylesheets

  4. highlight.js for syntax highlighting

  5. dia for diagram creation

  6. GoSquared Arrow Icons for the navigation icons

  7. mercurial for version control

  8. emacs …​

"Embedded Programming with the GNU Toolchain" is Copyright © 2009, 2010, 2011 Vijay Kumar B. <[email protected]>

Appendix A: Using make

If you are tired of typing the long commands every time you compile the samples in this tutorial, you can use the make tool. make is a utility that executes commands to build targets, as described in a Makefile.

Writing makefiles is a must-have skill for programmers, especially on large projects containing dozens of files, which must be assembled, compiled and translated into many different formats.

Every dependency between two or more files must be expressed as a rule. A rule has the following basic syntax.

<target> : <source>
<tab><compiling command1>
<tab><compiling command2>

<target>

is one or more filenames, delimited by spaces. These files will be created or updated by the rule.

<source>

is zero or more filenames, delimited by spaces. These files are checked for modification using their last-modified timestamps.

<tab>

is the TAB character with ASCII code 0x09. You must use an editor that preserves tabs and does not replace them with sequences of spaces. Every command in a rule must be indented by a single TAB character.

compiling command

is any command, such as an assembler or linker command, that creates or updates the target.

The main principle of every Makefile rule: if any source file is newer than the target file, or the target file does not exist, the rule body is executed to update the target.
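This decision can be sketched in C as a pure function over modification times. It is a simplified model, not make's actual implementation: the name needs_rebuild and its parameters are hypothetical, and features such as phony targets are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Decide whether a rule's commands must run.
 * target_exists  - whether the target file is present
 * target_mtime   - modification time of the target (ignored if absent)
 * source_mtimes  - modification times of the rule's source files */
bool needs_rebuild(bool target_exists, time_t target_mtime,
                   const time_t *source_mtimes, size_t nsources)
{
    if (!target_exists)
        return true;            /* nothing built yet */

    for (size_t i = 0; i < nsources; i++)
        if (source_mtimes[i] > target_mtime)
            return true;        /* a source is newer than the target */

    return false;               /* target is up to date */
}
```

A target whose sources are all older than it is left alone, which is what produces the "is up to date" message from make.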

Let’s write a simple Makefile for a tiny program, described in Hello ARM section:

emulation: add.flash
	qemu-system-arm -M connex -pflash add.flash \
		-nographic -serial /dev/null

add.flash: add.bin
	dd if=/dev/zero of=add.flash bs=4K count=4K
	dd if=add.bin of=add.flash bs=4K conv=notrunc

add.bin: add.elf
	arm-none-eabi-objcopy -O binary add.elf add.bin

add.elf: add.o
	arm-none-eabi-ld -o add.elf add.o

add.o: add.s
	arm-none-eabi-as -o add.o add.s

Note the backslash and the continuation line in the first rule: you can split a long command across multiple lines by ending each line but the last with a backslash, and indenting the continuation lines with a tab as well.

Run make without any arguments in the directory containing the Makefile and source files, and the binary files will be built and executed in Qemu automatically.

$ make
QEMU 2.1.2 monitor - type 'help' for more information
If you run make without parameters, the first rule in the Makefile is treated as the main target, and make walks over the dependencies of that rule.

Selecting a Specific Target

If you need to update only a specific target, rather than the first rule, specify the required target after the make command:

$ make add.o
make: 'add.o' is up to date.

This command builds only the add.o object file, and only if add.s was modified after add.o was last built. If you see a message like make: 'add.o' is up to date., the source file has not changed, and make will not execute the commands in the rule.

This is very useful if you have a lot of source files (thousands of files, as in the Linux kernel for example) and fix a bug in only one of them. Without make (using a simple shell script or .bat file), every tiny change in a source file would require recompiling all files in the project, which can last several hours. Using make, only the few required compiler and linker calls are made, which is much faster.

Returning to our add.o, you can force the assembler to run without changing the contents of the add.s source file, by touching it:

$ touch add.s
$ make add.o
arm-none-eabi-as -o add.o add.s

The touch command changes only the modification time of the source file add.s, not its content. make therefore thinks that the file has changed, and runs the assembler for the selected target add.o.

By default, make prints every command before executing it, along with the command's output. If you have some reason to avoid echoing a command, you can prefix the command in the make rule with the "@" sign. (A "-" prefix, by contrast, tells make to ignore errors from the command.)


Variables

Variables are storage locations that can contain some text. For example, we can define some variables in a Makefile, and use them in all rules. Variables are defined using the following syntax.

myvar = Hello World

Here myvar is the variable name, and "Hello World" is the text stored in myvar. The value of a variable can be retrieved, using the notation $(myvar).

Very useful make tip: you can use two special variables: $@ and $<.


$@

represents the left side of a Makefile rule, typically a single target file name

$<

represents the first file in the source list

While running make, you can override the value of any variable from the command line. This lets you change compiler options, tune rules, and even use one universal Makefile for all of your projects.

We now change our Makefile to use variables, which makes it easier to adapt and reduces repetition.

# APPlication name, you can change it in make command line parameters
# to compile another one file program with same Makefile
APP = add

# some std.variables widely used in Linux source builds:
## architecture
ARCH = arm
## target system triplet
TARGET = $(ARCH)-none-eabi-
## std. toolchain commands
AS = $(TARGET)as
LD = $(TARGET)ld
CC = $(TARGET)gcc
CXX = $(TARGET)g++
OBJDUMP = $(TARGET)objdump
OBJCOPY = $(TARGET)objcopy

# FlashROM size of target system, in 4K blocks
FLASHBLOCKS = 4K

emu: $(APP).flash
	qemu-system-$(ARCH) -M connex -pflash $< \
		-nographic -serial /dev/null

$(APP).flash: $(APP).bin
	dd if=/dev/zero of=$@ bs=4K count=$(FLASHBLOCKS)
	dd if=$< of=$@ bs=4K conv=notrunc

$(APP).bin: $(APP).elf
	$(OBJCOPY) -O binary $< $@

$(APP).elf: $(APP).o
	$(LD) -o $@ $<

$(APP).o: $(APP).s
	$(AS) -o $@ $<

First of all, we replaced all file names with $(APP).<extension>.

Note the use of variables in $(...) brackets, and the special variables $@ and $<.

Then, we replaced all file parameters in the assembler/linker calls with $@ and $<, which makes our rules more generic.

We also used some standard variables to select the target system architecture, and prefixed all toolchain calls with the $(TARGET) variable.

Now we can compile another source file, arrsum.s, by invoking make as shown below. We override the value of the APP variable on the command line:

$ make APP=arrsum

arm-none-eabi-as -o arrsum.o arrsum.s
arm-none-eabi-ld -o arrsum.elf arrsum.o
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
arm-none-eabi-objcopy -O binary arrsum.elf arrsum.bin
dd if=/dev/zero of=arrsum.flash bs=4K count=4K
4096+0 records in
4096+0 records out
16777216 bytes (17 MB) copied, 0.0184385 s, 910 MB/s
dd if=arrsum.bin of=arrsum.flash bs=4K conv=notrunc
0+1 records in
0+1 records out
48 bytes (48 B) copied, 5.9686e-05 s, 804 kB/s
qemu-system-arm -M connex -pflash arrsum.flash \
        -nographic -serial /dev/null
QEMU 2.1.2 monitor - type 'help' for more information
(qemu) q

We just compiled another program without any change to the Makefile: all file names were adjusted automatically, and we got a separate flash image file running in Qemu, all with one command.

Standard Variables

Some variables are widely used in Makefiles of open source programs. These Makefiles have many tricks, not covered in this tutorial.


ARCH

target system architecture: arm, mips, i386, x86_64, avr, etc.

TARGET

target system triplet: arm-none-eabi, i486-none-elf, i686-linux-uclibc. It selects the cross-compiler, and greatly impacts the generated binary code: CPU instruction set, memory layout, etc.


HOST

triplet for your development computer, something like x86_64-linux-gnu. This variable can be used if you build some helper programs that must run on your workstation, for example to autogenerate some code for the TARGET.


AS

assembler command name: as for the host system, and something like super-puper-elf-as for a cross-compiler toolchain.

LD

linker command name.

CC

pure C compiler command name.

CXX

C++ compiler command name.

NM

nm command name.

OBJCOPY

objcopy command name.

ASFLAGS

assembler flags.

LDFLAGS

linker flags.

CFLAGS

flags for C compiler.

CXXFLAGS

flags for C++ compiler.

Standard targets

Open source developers have some widely used standard targets:


all

builds all targets of the project; it must be the first rule in the Makefile.


doc

builds the documentation, using asciidoc, DocBook, LaTeX, etc. These are markup translators, which translate documentation sources into widely used document formats like .html and .pdf files.


clean

removes all intermediate files (.o, .elf, .bin) and the program binary. A typical use of clean is to rebuild the whole program from scratch:

make clean

distclean

in addition to what clean removes, removes backup files and temporary files that were not generated by make itself. This is generally used to prepare the source tree for distribution.

Recommended Makefile rules
FLASH = add.flash arrsum.flash strlen.flash

.PHONY: all
all: $(FLASH)

.PHONY: clean
clean:
	rm -rf *.o *.elf *.bin *.flash

.PHONY: distclean
distclean: clean
	rm -rf doc/manual.pdf

Phony targets

In the previous Makefile sample, you can see .PHONY directives. This directive marks a target as one that is not really the name of a file; rather, it is just a name for a recipe to be executed when the rule is invoked.

If you write a rule whose recipe will not create the target file, mark it with .PHONY, and the recipe will be executed every time the target comes up for "re-making". In our example clean is a phony target, because the rm command in the clean rule does not create a file named "clean"; probably no such file will ever exist. Therefore, the rm command will be executed every time you say make clean.

You can think of phony targets in a Makefile as a command line menu: you select the required action to run using make action.

Run qemu from Makefile

A special emu rule, defined as shown below, lets you build the flash image and launch Qemu by running make emu.

.PHONY: emu
emu: $(APP).flash
	qemu-system-$(ARCH) -M connex -pflash $< \
		-nographic -serial /dev/null

make Tips

You can set APP from the current directory name, as shown below.

APP = $(notdir $(CURDIR))

$(CURDIR)

gives you the current directory, where you run make

$(notdir <path>)

notdir is a built-in make function that removes the leading directory components of path.

Thus you can automagically set the APP name to the current directory name, for example when you start a new project. This allows you to use one universal Makefile, ~/universal.makefile, for all of your projects.
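What notdir does to a single path can be sketched in C. The function name notdir_sketch is hypothetical, and the sketch handles only one file name, whereas make's notdir accepts a list of names.

```c
#include <string.h>

/* Return the part of `path` after the last '/',
 * or the whole string if it contains no '/'.
 * notdir_sketch is a hypothetical stand-in for make's notdir function. */
const char *notdir_sketch(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash ? slash + 1 : path;
}
```

Given a path such as /home/user/superjob, the sketch returns superjob, which is the value APP would receive.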

$ cd ~
$ mkdir superjob && cd superjob
$ git init
... write some code ...
$ make -f ~/universal.makefile
The -f make option lets you specify a file other than the default Makefile.

Using Pattern Rules

Finally, let’s define universal pattern rules, which will assemble, compile and link any file with specific extensions.

Makefile pattern rules
%.o: %.s
	$(AS) $(ASFLAGS) -o $@ $<

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<

%.o: %.cpp
	$(CXX) $(CXXFLAGS) -c -o $@ $<

%.elf: %.o
	$(LD) $(LDFLAGS) -o $@ $<

%.bin: %.elf
	$(OBJCOPY) -O binary $< $@

%.flash: %.bin
	dd if=/dev/zero of=$@ bs=4K count=$(FLASHBLOCKS)
	dd if=$< of=$@ bs=4K conv=notrunc

With this in place, you don’t need to define rules for every file like:

file1.o : file1.s
	$(AS) -o $@ $<

file2.o : file2.s
	$(AS) -o $@ $<

... 100 rules ...

file100.o : file100.s
	$(AS) -o $@ $<

Just define one rule, and the set of pattern rules will automagically build your huge project.

.PHONY: all
all: file1.flash file2.flash ... file100.flash
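How a pattern rule turns a target name into a source name can be sketched in C. This is a simplified model under stated assumptions: each pattern contains exactly one %, the % must match a non-empty stem, and the function name apply_pattern_rule is hypothetical. Real make also handles directories and chains of rules, which are ignored here.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Match `target` against `target_pattern` (e.g. "%.o"). On success, write
 * the source name obtained by substituting the stem into `prereq_pattern`
 * (e.g. "%.s") to `out` and return true. */
bool apply_pattern_rule(const char *target_pattern, const char *prereq_pattern,
                        const char *target, char *out, size_t out_size)
{
    const char *tp = strchr(target_pattern, '%');
    const char *pp = strchr(prereq_pattern, '%');

    if (tp == NULL || pp == NULL)
        return false;

    size_t prefix_len = (size_t)(tp - target_pattern);
    size_t suffix_len = strlen(tp + 1);
    size_t target_len = strlen(target);

    /* The stem must be non-empty. */
    if (target_len <= prefix_len + suffix_len)
        return false;
    if (strncmp(target, target_pattern, prefix_len) != 0)
        return false;
    if (strcmp(target + target_len - suffix_len, tp + 1) != 0)
        return false;

    size_t stem_len = target_len - prefix_len - suffix_len;

    /* prerequisite = text before % + stem + text after % */
    int written = snprintf(out, out_size, "%.*s%.*s%s",
                           (int)(pp - prereq_pattern), prereq_pattern,
                           (int)stem_len, target + prefix_len,
                           pp + 1);

    return written >= 0 && (size_t)written < out_size;
}
```

With the patterns %.o and %.s, the target file1.o yields the source file1.s, so a single rule covers every object file in the project.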

Appendix B: ARM Programmer’s Model

A simplified ARM programmer’s model is provided in this section.

Register File

In the ARM processor, 16 general purpose registers are available at any time. Each register is 32 bits in size. The registers are referred to as rn, where n represents the register index. All instructions treat registers r0 to r13 equally. Any operation that can be performed on r0 can be performed equally well on registers r1 to r13. But r14 and r15 are assigned special functions by the processor. r15 is the program counter, and contains the address of the next instruction to be fetched. r14 is the link register, and is used to store the return address when a subroutine is invoked.

Though the processor assigns no special function to register r13, operating systems conventionally use it as the stack pointer, so that it points to the top of the stack.
Current Program Status Register

The Current Program Status Register (cpsr) is a dedicated 32-bit register, that contains the following fields.

  1. Condition Flags

  2. Interrupt Masks

  3. Processor Mode

  4. Processor State

Only the condition flags field will be used in the examples provided in this tutorial. And hence only the condition flags will be elaborated here.

The condition flags indicate the various conditions that occur while performing arithmetic and logical operations. The various condition flags and their meanings are given in the following table.

Table 2. Condition Flags
Flag Meaning

Carry C

Operation caused a carry.

Overflow V

Operation caused an overflow.

Zero Z

Operation resulted in 0.

Negative N

Operation resulted in a negative value.
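How the four flags follow from a 32-bit addition can be sketched in C. The struct and function names are this sketch's own (the fields n, z, c and v stand for negative, zero, carry and overflow), and the sketch models only addition, not every flag-setting instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four condition flags. */
struct flags {
    bool n;   /* negative */
    bool z;   /* zero     */
    bool c;   /* carry    */
    bool v;   /* overflow */
};

/* Add two 32-bit values and report the resulting condition flags. */
struct flags add_flags(uint32_t a, uint32_t b, uint32_t *result)
{
    uint32_t r = a + b;
    struct flags f;

    f.n = (r >> 31) != 0;                       /* bit 31 of the result       */
    f.z = (r == 0);                             /* result is zero             */
    f.c = (r < a);                              /* unsigned carry out         */
    f.v = ((~(a ^ b) & (a ^ r)) >> 31) != 0;    /* signed overflow: operands
                                                   share a sign, result differs */
    *result = r;
    return f;
}
```

Adding 1 to 0xFFFFFFFF sets the zero and carry flags, while adding 1 to 0x7FFFFFFF sets the negative and overflow flags.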

Appendix C: ARM Instruction Set

The ARM processor has a powerful instruction set. But only a subset required to understand the examples in this tutorial will be discussed here.

The ARM has a load store architecture, meaning that all arithmetic and logical instructions take only register operands. They cannot directly operate on operands in memory. Separate load and store instructions are used for moving data between registers and memory.

In this section, the following classes of instructions will be elaborated.

  1. Data Processing Instructions

  2. Branch Instructions

  3. Load Store Instructions

Data Processing Instructions

The most common data processing instructions are listed in the following table.

Table 3. Data Processing Instructions
Instruction Operation Example

mov rd, n

rd = n

mov r7, r5 ; r7 = r5

add rd, rn, n

rd = rn + n

add r0, r0, #1 ; r0 = r0 + 1

sub rd, rn, n

rd = rn - n

sub r0, r2, r1 ; r0 = r2 - r1

cmp rn, n

rn - n

cmp r1, r2 ; r1 - r2

By default, data processing instructions do not update the condition flags. An instruction updates the condition flags only if it is suffixed with an s. For example, the following instruction adds two registers and updates the condition flags.

adds r0, r1, r2

One exception to this rule is the cmp instruction. Since the only purpose of the cmp instruction is to set condition flags, it does not require the s suffix.

Branch Instructions

The branch instructions cause the processor to execute instructions from a different address. Two branch instructions are available: b and bl. The bl instruction, in addition to branching, also stores the return address in the lr register, and hence can be used for sub-routine invocation. The instruction syntax is given below.

b label        ; pc = label
bl label       ; pc = label, lr = addr of next instruction

To return from the subroutine, the mov instruction can be used as shown below.

mov pc, lr
Conditional Execution

Most other instruction sets allow conditional execution only of branch instructions, based on the state of the condition flags. In ARM, almost all instructions can be conditionally executed.

If the corresponding condition is true, the instruction is executed. If the condition is false, the instruction is turned into a nop. The condition is specified by suffixing the instruction with a condition code mnemonic.

Mnemonic Condition

EQ

Equal

NE

Not Equal

CS

Carry Set

CC

Carry Clear

VC

Overflow Clear

VS

Overflow Set

PL

Positive

MI

Minus

HI

Higher Than

HS

Higher or Same

LO

Lower Than

LS

Lower or Same

GT

Greater Than

GE

Greater Than or Equal

LT

Less Than

LE

Less Than or Equal

In the following example, the instruction moves r1 to r0 only if carry is set.

MOVCS r0, r1
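The way a condition code is evaluated against the flags can be sketched in C. The struct and function names are hypothetical, only a subset of the standard ARM condition codes is covered, and the flag fields n, z, c and v stand for negative, zero, carry and overflow.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the four condition flags. */
struct flags {
    bool n, z, c, v;
};

/* Return true if the condition named by `cond` holds for flags `f`. */
bool condition_holds(const char *cond, struct flags f)
{
    if (strcmp(cond, "eq") == 0) return f.z;
    if (strcmp(cond, "ne") == 0) return !f.z;
    if (strcmp(cond, "cs") == 0) return f.c;
    if (strcmp(cond, "cc") == 0) return !f.c;
    if (strcmp(cond, "hi") == 0) return f.c && !f.z;         /* unsigned >  */
    if (strcmp(cond, "ls") == 0) return !f.c || f.z;         /* unsigned <= */
    if (strcmp(cond, "ge") == 0) return f.n == f.v;          /* signed >=   */
    if (strcmp(cond, "lt") == 0) return f.n != f.v;          /* signed <    */
    if (strcmp(cond, "gt") == 0) return !f.z && f.n == f.v;  /* signed >    */
    if (strcmp(cond, "le") == 0) return f.z || f.n != f.v;   /* signed <=   */
    return false;   /* condition not covered by this sketch */
}

/* Model of "movcs r0, r1": returns the new value of r0. */
uint32_t movcs(uint32_t r0, uint32_t r1, struct flags f)
{
    return condition_holds("cs", f) ? r1 : r0;   /* otherwise acts as a nop */
}
```

When the carry flag is clear, movcs leaves r0 unchanged, which is the nop behaviour described above.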
Load Store Instructions

The load and store instructions can be used to move a single data item between a register and memory. The instruction syntax is given below.

ldr   rd, addressing    ; rd = mem32[addr]
str   rd, addressing    ; mem32[addr] = rd
ldrb  rd, addressing    ; rd = mem8[addr]
strb  rd, addressing    ; mem8[addr] = rd

The addressing is formed from two parts

  • base register

  • offset

The base register can be any general purpose register. The offset and base register can interact in 3 different ways.


Offset

The offset is added to or subtracted from the base register to form the address. ldr syntax: ldr rd, [rm, offset]

Pre-indexed

The offset is added to or subtracted from the base register to form the address, and the address is written back to the base register. ldr syntax: ldr rd, [rm, offset]!

Post-indexed

The base register contains the address to be accessed; after the access, the offset is added to or subtracted from the address and the result is stored in the base register. ldr syntax: ldr rd, [rm], offset

The offset can be in the following formats


Immediate

Offset is an unsigned number that can be added to or subtracted from the base register. Useful for accessing structure members and local variables in the stack. Immediate values start with a #.

Register

Offset is an unsigned value in a general purpose register that can be added to or subtracted from the base register. Useful for accessing array elements.

Some examples of load store instructions are given below.

ldr  r1, [r0]              ; same as ldr r1, [r0, #0], r1 = mem32[r0]
ldr  r8, [r3, #4]          ; r8 = mem32[r3 + 4]
ldr  r12, [r13, #-4]       ; r12 = mem32[r13 - 4]
strb r10, [r7, -r4]        ; mem8[r7 - r4] = r10
strb r7, [r6, #-1]!        ; mem8[r6 - 1] = r7, r6 = r6 - 1
str  r2, [r5], #8          ; mem32[r5] = r2, r5 = r5 + 8
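The difference between the three ways of combining base register and offset can be sketched in C. This is a model for illustration only: memory is a byte array, the base register is passed by pointer so that write-back is visible, and all function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit word from the modelled memory at byte address `addr`. */
static uint32_t load32(const uint8_t *mem, uint32_t addr)
{
    uint32_t v;

    memcpy(&v, mem + addr, sizeof v);
    return v;
}

/* ldr rd, [rm, #offset] : base register is left unchanged */
uint32_t ldr_offset(const uint8_t *mem, uint32_t *rm, int32_t offset)
{
    return load32(mem, *rm + (uint32_t)offset);
}

/* ldr rd, [rm, #offset]! : base register is updated, then used */
uint32_t ldr_pre(const uint8_t *mem, uint32_t *rm, int32_t offset)
{
    *rm += (uint32_t)offset;
    return load32(mem, *rm);
}

/* ldr rd, [rm], #offset : base register is used, then updated */
uint32_t ldr_post(const uint8_t *mem, uint32_t *rm, int32_t offset)
{
    uint32_t v = load32(mem, *rm);

    *rm += (uint32_t)offset;
    return v;
}
```

All three functions can read the same word; they differ only in what the base register holds afterwards.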

Appendix D: ARM Stacks

Stacks are highly flexible in the ARM architecture, since the implementation is completely left to the software.

Stack Instructions

The ARM instruction set does not contain any stack specific instructions like push and pop. The instruction set also does not enforce in any way the use of a stack. Push and pop operations are performed by memory access instructions, with auto-increment addressing modes.

Stack Pointer

The stack pointer is a register that points to the top of the stack. In the ARM processor, there are no dedicated stack pointer registers, and any one of the general purpose registers can be used as the stack pointer.

Stack Types

Since it is left to the software to implement a stack, different implementation choices result in different types of stacks. There are two types of stack depending on how the stack grows.

Ascending stack

In a push the stack pointer is incremented, i.e. the stack grows towards higher addresses.

Descending stack

In a push the stack pointer is decremented, i.e. the stack grows towards lower addresses.

There are two types of stack depending on what the stack pointer points to.

Empty stack

Stack pointer points to the location in which the next item will be stored. A push will store the value, and then increment the stack pointer (or decrement it, for a descending stack).

Full stack

Stack pointer points to the location in which the last item was stored. A push will first increment the stack pointer (or decrement it, for a descending stack), and then store the value.

Four different stacks are possible - full-ascending, full-descending, empty-ascending, empty-descending. All 4 can be implemented using the register load store instructions.
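The four combinations can be modelled in C with two flags. This is a sketch under stated assumptions: the struct stack type and the push and pop functions are hypothetical, the stack pointer is an index into an array of words rather than an address, and no bounds checking is performed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a software-defined stack. */
struct stack {
    uint32_t *mem;     /* backing storage                            */
    size_t sp;         /* stack pointer, as an index into mem        */
    bool full;         /* true: sp points at the last item stored    */
    bool ascending;    /* true: stack grows towards higher indices   */
};

void push(struct stack *s, uint32_t value)
{
    if (s->full) {
        /* full stack: move the pointer first, then store */
        s->sp = s->ascending ? s->sp + 1 : s->sp - 1;
        s->mem[s->sp] = value;
    } else {
        /* empty stack: store first, then move the pointer */
        s->mem[s->sp] = value;
        s->sp = s->ascending ? s->sp + 1 : s->sp - 1;
    }
}

uint32_t pop(struct stack *s)
{
    uint32_t value;

    if (s->full) {
        /* full stack: read first, then move the pointer back */
        value = s->mem[s->sp];
        s->sp = s->ascending ? s->sp - 1 : s->sp + 1;
    } else {
        /* empty stack: move the pointer back first, then read */
        s->sp = s->ascending ? s->sp - 1 : s->sp + 1;
        value = s->mem[s->sp];
    }
    return value;
}
```

A full-descending stack over a 4-word array starts with sp equal to 4, one past the last element, mirroring the way the startup code points sp just past the end of RAM.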