The GNU toolchain is increasingly being used for deeply embedded software development. This type of software development is also called standalone C programming and bare metal C programming. Standalone C programming brings along with it new problems, and dealing with them requires a deeper understanding of the GNU toolchain. The GNU toolchain’s manuals provide excellent information on the toolchain, but from the perspective of the toolchain, rather than the perspective of the problem. Well, that is how manuals are supposed to be written anyway. The result is that the answers to common problems are scattered all over, and new users of the GNU toolchain are left baffled.
This tutorial attempts to bridge the gap by explaining the tools from the perspective of the problem. Hopefully, this should enable more people to use the GNU toolchain for their embedded projects.
For the purpose of this tutorial, an ARM based embedded system is emulated using Qemu. With this you can learn the GNU toolchain from the comfort of your desktop, without having to invest in hardware. This tutorial itself does not teach the ARM instruction set. It is meant to be used with other books and on-line tutorials like:
-
ARM Assembler - http://www.heyrick.co.uk/assembler/
-
ARM Assembly Language Programming - http://www.arm.com/miscPDFs/9658.pdf
But for the convenience of the reader, frequently used ARM instructions are listed in the appendix.
This section shows how to set up a simple ARM development and testing environment on your PC, using Qemu and the GNU toolchain. Qemu is a machine emulator capable of emulating various machines including ARM based machines. You can write ARM assembly programs, compile them using the GNU toolchain and execute and test them in Qemu.
Qemu will be used to emulate a PXA255 based connex board from Gumstix. You should have at least version 0.9.1 of Qemu to work with this tutorial.
The PXA255 has an ARM core with an ARMv5TE compliant instruction set. The PXA255 also has several on-chip peripherals. Some peripherals will be introduced in the course of the tutorial.
This tutorial requires qemu version 0.9.1 or above. The qemu package available in Debian Squeeze/Wheezy meets this requirement. Install qemu using apt-get.
$ apt-get install qemu
-
Folks at CodeSourcery (part of Mentor Graphics) have been kind enough to make GNU toolchains available for various architectures. Download the GNU toolchain for ARM, available from http://www.mentor.com/embedded-software/sourcery-tools/sourcery-codebench/editions/lite-edition/
-
Extract the tar archive to ~/toolchains.
$ mkdir ~/toolchains
$ cd ~/toolchains
$ tar -jxf ~/downloads/arm-2008q1-126-arm-none-eabi-i686-pc-linux-gnu.tar.bz2
-
Add the toolchain to your PATH.
$ PATH=$HOME/toolchains/arm-2008q1/bin:$PATH
-
You might want to add the previous line to your .bashrc.
In this section, you will learn to assemble a simple ARM program, and test it on a bare metal connex board emulated by Qemu.
The assembly program source file consists of a sequence of statements, one per line. Each statement has the following format.
label: instruction @ comment
Each of the components is optional.
label
-
The label is a convenient way to refer to the location of the instruction in memory. The label can be used wherever an address can appear, for example as an operand of the branch instruction. The label name should consist of letters, digits, _ and $.
comment
-
A comment starts with an @, and the characters that appear after the @ are ignored.
instruction
-
The instruction could be an ARM instruction or an assembler directive. Assembler directives are commands to the assembler. Assembler directives always start with a . (period).
Here is a very simple ARM assembly program to add two numbers.
.text
start:                       @ Label, not really required
        mov   r0, #5         @ Load register r0 with the value 5
        mov   r1, #4         @ Load register r1 with the value 4
        add   r2, r1, r0     @ Add r0 and r1 and store in r2
stop:   b stop               @ Infinite loop to stop execution
The .text is an assembler directive, which says that the following instructions have to be assembled into the code section, rather than the .data section. Sections will be covered in detail, later in the tutorial.
Save the program in a file, say add.s. To assemble the file, invoke the GNU Toolchain’s assembler as, as shown in the following command.
$ arm-none-eabi-as -o add.o add.s
The -o option specifies the output filename.
Note
|
Cross toolchains are always prefixed with the target architecture for which they are built, to avoid name conflicts with the host toolchain. For the sake of readability, tools will be referred to without the prefix, in the text. |
To generate the executable file, invoke the GNU Toolchain’s linker ld, as shown in the following command.
$ arm-none-eabi-ld -Ttext=0x0 -o add.elf add.o
Here again, the -o option specifies the output filename. The -Ttext=0x0 option specifies that addresses should be assigned to the labels as if the instructions started at address 0x0. To view the address assignment for various labels, the nm command can be used as shown below.
$ arm-none-eabi-nm add.elf
... clip ...
00000000 t start
0000000c t stop
Note the address assignment for the labels start and stop. The address assigned for start is 0x0, since it is the label of the first instruction. The label stop is after 3 instructions. Each instruction is 4 bytes. Hence stop is assigned the address 12 (0xC).
Linking with a different base address for the instructions will result in a different set of addresses being assigned to the labels.
$ arm-none-eabi-ld -Ttext=0x20000000 -o add.elf add.o
$ arm-none-eabi-nm add.elf
... clip ...
20000000 t start
2000000c t stop
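The address arithmetic behind these two listings can be sketched in a few lines of Python. This is only a model of the calculation, not something the toolchain provides: the helper name label_address and its parameters are made up for illustration, and it assumes that every instruction is 4 bytes long, as in the add program.

```python
INSTRUCTION_SIZE = 4  # bytes per ARM instruction, as stated in the text


def label_address(base, instructions_before):
    """Address of a label that follows `instructions_before` instructions
    when the code is linked to start at `base` (hypothetical helper)."""
    return base + instructions_before * INSTRUCTION_SIZE


# start labels the first instruction, stop follows three instructions.
for base in (0x0, 0x20000000):
    print("%08x t start" % label_address(base, 0))
    print("%08x t stop" % label_address(base, 3))
```

Changing only the base address shifts every label by the same amount, which is what the two nm listings show.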
The output file created by ld is in a format called ELF. Various file formats are available for storing executable code. The ELF format works fine when you have an OS around, but since we are going to run the program on bare metal, we will have to convert it to a simpler file format called the binary format.
A file in binary format contains consecutive bytes from a specific memory address. No other additional information is stored in the file. This is convenient for Flash programming tools, since all that has to be done when programming is to copy each byte in the file to consecutive addresses, starting from a specified base address in memory.
The GNU toolchain’s objcopy command can be used to convert between different object file formats. A common usage of the command is given below.
objcopy -O <output-format> <in-file> <out-file>
To convert add.elf to binary format the following command can be used.
$ arm-none-eabi-objcopy -O binary add.elf add.bin
Check the size of the file. The file will be exactly 16 bytes, since there are 4 instructions and each instruction occupies 4 bytes.
$ ls -al add.bin
-rw-r--r-- 1 vijaykumar vijaykumar 16 2008-10-03 23:56 add.bin
When the ARM processor is reset, it starts executing from address 0x0. On the connex board a 16MB Flash is located at address 0x0. The instructions present in the beginning of the Flash will be executed.
When qemu emulates the connex board, a file has to be specified which will be treated as Flash memory. The Flash file format is very simple. To get the byte from address X in the Flash, qemu reads the byte from offset X in the file. In fact, this is the same as the binary file format.
To test the program on the emulated Gumstix connex board, we first create a 16MB file representing the Flash. We use the dd command to copy 16MB of zeroes from /dev/zero to the file flash.bin. The data is copied in 4K blocks.
$ dd if=/dev/zero of=flash.bin bs=4096 count=4096
The add.bin file is then copied into the beginning of the Flash, using the following command.
$ dd if=add.bin of=flash.bin bs=4096 conv=notrunc
This is the equivalent of programming the bin file on to the Flash memory.
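What the two dd commands achieve can also be modelled in Python. The following is a sketch, not a replacement for dd: make_flash_image is a hypothetical helper, and the 16 placeholder bytes merely stand in for the real contents of add.bin.

```python
FLASH_SIZE = 16 * 1024 * 1024  # 16MB Flash of the connex board


def make_flash_image(program, size=FLASH_SIZE):
    """Return a zero-filled Flash image with `program` copied to offset 0."""
    if len(program) > size:
        raise ValueError("program does not fit in Flash")
    image = bytearray(size)          # like dd if=/dev/zero ...
    image[:len(program)] = program   # like dd if=add.bin ... conv=notrunc
    return image


# Placeholder for the contents of add.bin; any 16 bytes will do here.
program = bytes(range(16))
image = make_flash_image(program)
print(len(image))  # size of the Flash image in bytes
```

Because the byte at offset X of the image is the byte at Flash address X, writing the image to a file gives exactly the layout that qemu expects.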
After reset, the processor will start executing from address 0x0, and the instructions from the program will get executed. The command to invoke qemu is given below.
$ qemu-system-arm -M connex -pflash flash.bin -nographic -serial /dev/null
The -M connex option specifies that the machine connex is to be emulated. The -pflash option specifies that the flash.bin file represents the Flash memory. The -nographic option specifies that simulation of a graphical display is not required. The -serial /dev/null option specifies that the serial port of the connex board is to be connected to /dev/null, so that the serial port data is discarded.
The system executes the instructions and after completion, keeps looping infinitely in the stop: b stop instruction. To view the contents of the registers, the monitor interface of qemu can be used. The monitor interface is a command line interface, through which the emulated system can be controlled and the status of the system can be viewed. When qemu is started with the above mentioned command, the monitor interface is provided in the standard I/O of qemu.
To view the contents of the registers the info registers monitor command can be used.
(qemu) info registers
R00=00000005 R01=00000004 R02=00000009 R03=00000000
R04=00000000 R05=00000000 R06=00000000 R07=00000000
R08=00000000 R09=00000000 R10=00000000 R11=00000000
R12=00000000 R13=00000000 R14=00000000 R15=0000000c
PSR=400001d3 -Z-- A svc32
Note the value in register R02. The register contains the result of the addition and should match the expected value of 9.
Some useful qemu monitor commands are listed in the following table.
Command | Purpose |
---|---|
help | List available commands |
quit | Quits the emulator |
xp /fmt addr | Physical memory dump from addr |
system_reset | Reset the system. |
The xp command deserves more explanation. The fmt argument specifies how the memory contents are to be displayed. The syntax of fmt is <count><format><size>.
count
-
specifies the number of data items to be dumped.
size
-
specifies the size of each data item: b for 8 bits, h for 16 bits, w for 32 bits and g for 64 bits.
format
-
specifies the display format: x for hex, d for signed decimal, u for unsigned decimal, o for octal, c for char and i for asm instructions.
The xp command with the i format can be used to disassemble the instructions present in memory. To disassemble the instructions located at 0x0, the xp command with the fmt specified as 4iw can be used. The 4 specifies that 4 items are to be displayed, i specifies that the items are to be printed as instructions (yes, a built in disassembler!), and w specifies that the items are 32 bits in size. The output of the command is shown below.
(qemu) xp /4iw 0x0
0x00000000: mov r0, #5 ; 0x5
0x00000004: mov r1, #4 ; 0x4
0x00000008: add r2, r1, r0
0x0000000c: b 0xc
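To make the structure of the fmt argument concrete, here is a small Python sketch that splits a string such as 4iw into its three parts. It handles only the <count><format><size> form described above and is not how qemu itself parses the argument; the name parse_xp_fmt is made up for this illustration.

```python
import re

SIZES = {"b": 8, "h": 16, "w": 32, "g": 64}
FORMATS = {"x": "hex", "d": "signed decimal", "u": "unsigned decimal",
           "o": "octal", "c": "char", "i": "asm instructions"}


def parse_xp_fmt(fmt):
    """Split '<count><format><size>' into (count, format name, size in bits).

    Only the form described in the text is handled (hypothetical helper).
    """
    match = re.fullmatch(r"(\d+)([xduoci])([bhwg])", fmt)
    if match is None:
        raise ValueError("unsupported fmt: %r" % fmt)
    count, format_letter, size_letter = match.groups()
    return int(count), FORMATS[format_letter], SIZES[size_letter]


print(parse_xp_fmt("4iw"))
```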
In this section, we will describe some commonly used assembler directives, using two example programs.
-
A program to sum an array
-
A program to calculate the length of a string
The following code sums an array of bytes and stores the result in r3.
link:code/sum.s[role=include]
The code introduces two new assembler directives: .byte and .align. These assembler directives are described below.
The byte sized arguments of .byte are assembled into consecutive bytes in memory. There are similar directives .2byte and .4byte for storing 16 bit values and 32 bit values, respectively. The general syntax is given below.
.byte exp1, exp2, ...
.2byte exp1, exp2, ...
.4byte exp1, exp2, ...
The arguments could be simple integer literals, represented as binary (prefixed by 0b or 0B), octal (prefixed by 0), decimal or hexadecimal (prefixed by 0x or 0X). The integers could also be represented as character constants (a character surrounded by single quotes), in which case the ASCII value of the character will be used.
The arguments could also be C expressions constructed out of literals and other symbols. Examples are shown below.
pattern:  .byte 0b01010101, 0b00110011, 0b00001111
npattern: .byte npattern - pattern
halpha:   .byte 'A', 'B', 'C', 'D', 'E', 'F'
dummy:    .4byte 0xDEADBEEF
nalpha:   .byte 'Z' - 'A' + 1
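The values these directives evaluate to can be checked with a short Python sketch. It mirrors the examples above one by one; the little-endian byte order used for .4byte is an assumption about the target configuration, not something stated in the text.

```python
import struct

# pattern holds three bytes, so npattern - pattern (the distance between
# the two labels) is 3.
pattern = bytes([0b01010101, 0b00110011, 0b00001111])
npattern = len(pattern)

# Character constants contribute their ASCII values.
halpha = bytes(ord(c) for c in "ABCDEF")
nalpha = ord("Z") - ord("A") + 1

# .4byte stores a 32-bit value; little-endian byte order is assumed here.
dummy = struct.pack("<I", 0xDEADBEEF)

print(pattern.hex(), npattern, halpha.hex(), nalpha, dummy.hex())
```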
ARM requires that the instructions be present in 32-bit aligned memory locations. The address of the first byte, of the 4 bytes in an instruction, should be a multiple of 4. To adhere to this, the .align directive can be used to insert padding bytes until the next byte address is a multiple of 4. This is required only when data bytes or half words are inserted within code.
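The amount of padding needed is a simple modulo calculation, sketched below in Python. The helper name align_padding is hypothetical, and the layout in the example (a 3-byte array at address 0x4) is an assumption chosen to match the symbol listings shown later in this tutorial.

```python
def align_padding(address, boundary=4):
    """Number of padding bytes needed to reach the next multiple of `boundary`."""
    return (-address) % boundary


# Assumed layout: a branch at 0x0 (4 bytes) followed by a 3-byte array at 0x4.
end_of_array = 0x4 + 3  # 0x7
first_instruction = end_of_array + align_padding(end_of_array)
print(hex(first_instruction))
```

An address that is already a multiple of 4 needs no padding, so using the directive at such a location costs nothing.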
The following code calculates the length of a string and stores the length in register r1.
link:code/strlen.s[role=include]
The code introduces two new assembler directives: .asciz and .equ. The assembler directives are described below.
The .asciz directive accepts string literals as arguments. String literals are a sequence of characters in double quotes. The string literals are assembled into consecutive memory locations. The assembler automatically inserts a nul character (\0 character) after each string.
The .ascii directive is the same as .asciz, but the assembler does not insert a nul character after each string.
The assembler maintains something called a symbol table. The symbol table maps label names to addresses. Whenever the assembler encounters a label definition, the assembler makes an entry in the symbol table. And whenever the assembler encounters a label reference, it replaces the label by the corresponding address from the symbol table.
Using the assembler directive .equ, it is also possible to manually insert entries in the symbol table, to map names to values, which are not necessarily addresses. Whenever the assembler encounters these names, it replaces them by their corresponding values. These names and label names are together called symbol names.
The general syntax of the directive is given below.
.equ name, expression
The name is a symbol name, and has the same restrictions as that of the label name. The expression could be a simple literal, or an expression as explained for the .byte directive.
Note
|
Unlike the .byte directive, the .equ directive itself does not allocate any memory. It just creates an entry in the symbol table.
|
The Flash memory, in which the previous example programs were stored, is a kind of EEPROM. It is a useful secondary storage, like a hard disk, but it is not convenient to store variables in Flash. The variables should be stored in RAM, so that they can be easily modified.
The connex board has 64 MB of RAM starting at address 0xA0000000, in which variables can be stored. The memory map of the connex board can be pictured as shown in the following diagram.
Necessary setup has to be done to place the variables at this address. To understand what has to be done, the role of assembler and linker has to be understood.
While writing a multi-file program, each file is assembled individually into object files. The linker combines these object files to form the final executable.
While combining the object files together, the linker performs the following operations.
-
Symbol Resolution
-
Relocation
We will look into these operations, in detail, in this section.
In a single file program, while producing the object file, all references to labels are replaced by their corresponding addresses by the assembler. But in a multi-file program, if there are any references to labels defined in another file, the assembler marks these references as "unresolved". When these object files are passed to the linker, the linker determines the values for these references from the other object files, and patches the code with the correct values.
The sum of array example is split into two files, to demonstrate the symbol resolution performed by the linker. The two files will be assembled and their symbol tables examined to show the presence of unresolved references.
The file sum-sub.s contains the sum subroutine, and the file main.s invokes the subroutine with the required arguments. The source of the files is shown below.
main.s - Subroutine Invocation
link:code/main.s[role=include]
sum-sub.s - Subroutine Definition
link:code/sum-sub.s[role=include]
A word on the .global directive is in order. In C, all variables declared outside functions are visible to other files, unless explicitly declared static. In assembly, all labels are static AKA local (to the file), unless explicitly stated that they should be visible to other files, using the .global directive.
The files are assembled, and the symbol tables are dumped using the nm command.
$ arm-none-eabi-as -o main.o main.s
$ arm-none-eabi-as -o sum-sub.o sum-sub.s
$ arm-none-eabi-nm main.o
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
         U sum
$ arm-none-eabi-nm sum-sub.o
00000004 t loop
00000000 T sum
For now, focus on the letter in the second column, which specifies the symbol type. A t indicates that the symbol is defined, in the text section. A U indicates that the symbol is undefined. For a defined symbol, a letter in uppercase indicates that the symbol is .global.
It is evident that the symbol sum is defined in sum-sub.o and is not resolved yet in main.o. When the linker is invoked the symbol references will be resolved, and the executable will be produced.
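The lookup the linker performs for each undefined symbol can be modelled roughly as follows. This Python sketch works on the nm listings above, represented as dictionaries; it is a simplification for illustration (the function names are made up) and ignores everything a real linker does beyond matching names.

```python
# Symbol tables as printed by nm: name -> (type letter, address or None).
main_o = {"arr": ("t", 0x4), "eoa": ("t", 0x7), "start": ("t", 0x8),
          "stop": ("t", 0x18), "sum": ("U", None)}
sum_sub_o = {"loop": ("t", 0x4), "sum": ("T", 0x0)}


def undefined_symbols(table):
    """Names that the object file references but does not define."""
    return {name for name, (kind, _) in table.items() if kind == "U"}


def find_definitions(undefined, others):
    """Map each undefined name to the index of the object exporting it.

    Only uppercase (global) definitions are visible to other files.
    """
    found = {}
    for name in undefined:
        for index, table in enumerate(others):
            kind, _ = table.get(name, ("U", None))
            if kind != "U" and kind.isupper():
                found[name] = index
                break
        else:
            raise LookupError("undefined reference to %r" % name)
    return found


print(find_definitions(undefined_symbols(main_o), [sum_sub_o]))
```

In this model a reference to loop from another file would fail, since loop is a local (lowercase t) symbol in sum-sub.o.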
Relocation is the process of changing addresses already assigned to labels. This will also involve patching up all label references to reflect the newly assigned address. Primarily, relocation is performed for the following two reasons:
-
Section Merging
-
Section Placement
To understand the process of relocation, an understanding of the concept of sections is essential.
Code and data have different run time requirements. For example code can be placed in read-only memory, and data might require read-write memory. It would be convenient if code and data are not interleaved. For this purpose, programs are divided into sections. Most programs have at least two sections, .text for code and .data for data. Assembler directives .text and .data are used to switch back and forth between the two sections.
It helps to imagine each section as a bucket. When the assembler hits a section directive, it puts the code/data following the directive in the selected bucket. Thus the code/data that belong to particular section appear in contiguous locations. The following figures show how the assembler re-arranges data into sections.
Now that we have an understanding of sections, let us look into the primary reasons for which relocation is performed.
When dealing with multi-file programs, sections with the same name (for example .text) might appear in each file. The linker is responsible for merging sections from the input files into sections of the output file. By default, the sections with the same name from each file are placed contiguously and the label references are patched to reflect the new address.
The effects of section merging can be seen by looking at the symbol table of the object files and the corresponding executable file. The multi-file sum of array program can be used to illustrate section merging. The symbol tables of the object files main.o and sum-sub.o and the symbol table of the executable file sum.elf are shown below.
$ arm-none-eabi-nm main.o
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
         U sum
$ arm-none-eabi-nm sum-sub.o
00000004 t loop (1)
00000000 T sum
$ arm-none-eabi-ld -Ttext=0x0 -o sum.elf main.o sum-sub.o
$ arm-none-eabi-nm sum.elf
...
00000004 t arr
00000007 t eoa
00000008 t start
00000018 t stop
00000028 t loop (1)
00000024 T sum
-
The loop symbol has address 0x4 in sum-sub.o, and 0x28 in sum.elf, since the .text section of sum-sub.o is placed right after the .text section of main.o.
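Section merging amounts to adding the size of everything placed earlier to each label offset. The Python sketch below reproduces the addresses in the sum.elf listing. The label offsets come from the nm listings; the section sizes are assumptions: 0x24 for main.o is inferred from where sum ends up, and 0x20 for sum-sub.o is made up, since it does not affect the result.

```python
def merge_text(objects):
    """Place each object's .text after the previous one and rebase its labels.

    `objects` is a list of (size of .text in bytes, {label: offset}) pairs.
    """
    merged = {}
    offset = 0
    for size, labels in objects:
        for name, address in labels.items():
            merged[name] = offset + address
        offset += size
    return merged


# Label offsets from the nm listings; the sizes are assumptions (see above).
main_o = (0x24, {"arr": 0x4, "eoa": 0x7, "start": 0x8, "stop": 0x18})
sum_sub_o = (0x20, {"loop": 0x4, "sum": 0x0})

merged = merge_text([main_o, sum_sub_o])
for name, address in sorted(merged.items(), key=lambda item: item[1]):
    print("%08x %s" % (address, name))
```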
When a program is assembled, each section is assumed to start from address 0. And thus labels are assigned values relative to start of the section. When the final executable is created, the section is placed at some address X. And all references to the labels defined within the section, are incremented by X, so that they point to the new location.
The placement of each section at a particular location in memory and the patching of all references to the labels in the section, is done by the linker.
The effects of section placement can be seen by looking at the symbol table of the object file and the corresponding executable file. The single file sum of array program can be used to illustrate section placement. To make things clearer, we will place the .text section at address 0x100.
$ arm-none-eabi-as -o sum.o sum.s
$ arm-none-eabi-nm -n sum.o
00000000 t entry (1)
00000004 t arr
00000007 t eoa
00000008 t start
00000014 t loop
00000024 t stop
$ arm-none-eabi-ld -Ttext=0x100 -o sum.elf sum.o (2)
$ arm-none-eabi-nm -n sum.elf
00000100 t entry (3)
00000104 t arr
00000107 t eoa
00000108 t start
00000114 t loop
00000124 t stop
...
-
The addresses for labels are assigned starting from 0 within a section.
-
When the executable is created the linker is instructed to place the text section at address 0x100.
-
The addresses for labels in the .text section are re-assigned starting from 0x100, and all label references will be patched to reflect this.
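Section placement can be modelled the same way: every section-relative label offset is incremented by the address X at which the section is placed. A minimal Python sketch, using the offsets listed for sum.o and a made-up helper name:

```python
def place_section(labels, base):
    """Add the section's final address to every section-relative label."""
    return {name: base + offset for name, offset in labels.items()}


# Section-relative offsets as listed for sum.o.
sum_o_text = {"entry": 0x00, "arr": 0x04, "eoa": 0x07,
              "start": 0x08, "loop": 0x14, "stop": 0x24}

placed = place_section(sum_o_text, 0x100)
for name, address in placed.items():
    print("%08x t %s" % (address, name))
```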
The process of section merging and placement is shown in the following figure.
As mentioned in the previous section, section merging and placement is done by the linker. The programmer can control how the sections are merged, and at what locations they are placed in memory through a linker script file. A very simple linker script file, is shown below.
SECTIONS { (1)
        . = 0x00000000; (2)
        .text : { (3)
                abc.o (.text);
                def.o (.text);
        } (3)
}
-
The SECTIONS command is the most important linker command. It specifies how the sections are to be merged and at what location they are to be placed.
-
Within the block following the SECTIONS command, the . (period) represents the location counter. The location counter is always initialised to 0x0. It can be modified by assigning a new value to it. Setting the value to 0x0 at the beginning is superfluous.
-
This part of the script specifies that the .text section from the input files abc.o and def.o should go to the .text section of the output file.
The linker script can be further simplified and generalised by using the wild card character * instead of individually specifying the file names.
SECTIONS {
        . = 0x00000000;
        .text : { * (.text); }
}
If the program contains both .text and .data sections, the .data section merging and location can be specified as shown below.
link:code/sum-data.lds[role=include]
Here, the .text section is located at 0x0 and .data is located at 0x400. Note that, if the location counter is not assigned a different value, the .text and .data sections will be located at adjacent memory locations.
To demonstrate the use of linker scripts, we will use the linker script shown in Multiple sections in linker scripts to control the placement of a program’s .text and .data section. We will use a slightly modified version of the sum of array program for this purpose. The code is shown below.
link:code/sum-data.s[role=include]
The only change here is that the array is now in the .data section. Also note that the nasty branch instruction to skip over the data is no longer required, since the linker script will place the .text section and .data section appropriately. As a result, statements can be placed in the program in any convenient way, and the linker script will take care of placing the sections correctly in memory.
When the program is linked, the linker script is passed as an input to the linker, as shown in the following command.
$ arm-none-eabi-as -o sum-data.o sum-data.s
$ arm-none-eabi-ld -T sum-data.lds -o sum-data.elf sum-data.o
The option -T sum-data.lds specifies that sum-data.lds is to be used as the linker script. Dumping the symbol table will provide an insight into how the sections are placed in memory.
$ arm-none-eabi-nm -n sum-data.elf
00000000 t start
0000000c t loop
0000001c t stop
00000400 d arr
00000403 d eoa
From the symbol table it is obvious that the .text section is placed starting from address 0x0 and the .data section is placed starting from address 0x400.
Now that we know how to write linker scripts, we will attempt to write a program, and place the .data section in RAM.
The add program is modified to load two values from RAM, add them and store the result back to RAM. The two values and the space for the result are placed in the .data section.
link:code/add-mem.s[role=include]
When the program is linked, the linker script shown below is used.
link:code/flash-ram.lds[role=include]
The dump of the symbol table of add-mem.elf is shown below.
$ arm-none-eabi-nm -n add-mem.elf
00000000 t start
0000001c t stop
a0000000 d val1
a0000001 d val2
a0000002 d result
The linker script seems to have solved the problem of placing the .data section in RAM. But wait, the solution is not complete yet!
RAM is volatile memory, and hence it is not possible to directly make the data available in RAM, on power up.
All code and data should be stored in Flash before power-up. On power-up, a startup code is supposed to copy the data from Flash to RAM, and then proceed with the execution of the program. So the program’s .data section has two addresses, a load address in Flash and a run-time address in RAM.
Tip
|
In ld parlance, the load address is called LMA (Load Memory Address), and the run-time address is called VMA (Virtual Memory Address).
|
The following two modifications have to be done, to make the program work correctly.
-
The linker script has to be modified to specify both the load address and the run-time address, for the .data section.
-
A small piece of code should copy the .data section from Flash (load address) to RAM (run-time address).
The run-time address is what should be used for determining the address of labels. In the previous linker script, we have specified the run-time address for the .data section. The load address is not explicitly specified, and defaults to the run-time address. This is OK, with the previous examples, since the programs were executed directly from Flash. But, if data is to be placed in RAM during execution, the load address should correspond to Flash and the run-time address should correspond to RAM.
A load address different from the run-time address can be specified using the AT keyword. The modified linker script is shown below.
SECTIONS {
        . = 0x00000000;
        .text : { * (.text); }
        etext = .; (1)
        . = 0xA0000000;
        .data : AT (etext) { * (.data); } (2)
}
-
Symbols can be created on the fly within the SECTIONS command by assigning values to them. Here etext is assigned the value of the location counter at that position. etext contains the address of the next free location in Flash right after all the code. This will be used later on to specify where the .data section is to be placed in Flash. Note that etext itself will not be allocated any memory, it is just an entry in the symbol table.
-
The AT keyword specifies the load address of the .data section. An address or symbol (whose value is a valid address) could be passed as argument to AT. Here the load address of .data is specified as the location right after all the code in Flash.
To copy the data from Flash to RAM, the following information is required.
-
Address of data in Flash (flash_sdata)
-
Address of data in RAM (ram_sdata)
-
Size of the .data section (data_size)
With this information the data can be copied from Flash to RAM using the following code snippet.
        ldr   r0, =flash_sdata
        ldr   r1, =ram_sdata
        ldr   r2, =data_size
copy:
        ldrb  r4, [r0], #1
        strb  r4, [r1], #1
        subs  r2, r2, #1
        bne   copy
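The behaviour of this loop can be traced with a small Python model, in which memory is a dictionary from addresses to byte values. This is only a sketch of what the instructions do; the Flash address 0x40 and the data values are hypothetical.

```python
def copy_data(memory, flash_sdata, ram_sdata, data_size):
    """Byte-by-byte copy, mirroring the ldrb/strb/subs/bne loop.

    `memory` maps addresses to byte values. Like the assembly loop, this
    assumes data_size is at least 1.
    """
    r0, r1, r2 = flash_sdata, ram_sdata, data_size
    while True:
        memory[r1] = memory[r0]  # ldrb r4, [r0], #1 and strb r4, [r1], #1
        r0 += 1
        r1 += 1
        r2 -= 1                  # subs r2, r2, #1
        if r2 == 0:              # bne copy falls through when r2 reaches 0
            break


# Hypothetical layout: three data bytes stored in Flash at 0x40.
memory = {0x40: 10, 0x41: 30, 0x42: 0}
copy_data(memory, flash_sdata=0x40, ram_sdata=0xA0000000, data_size=3)
print([memory[0xA0000000 + i] for i in range(3)])
```

Note that the loop copies a byte before it tests the counter, so both the assembly code and this model assume a non-empty .data section.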
The linker script can be slightly modified to provide this information.
SECTIONS {
        . = 0x00000000;
        .text : { * (.text); }
        flash_sdata = .; (1)
        . = 0xA0000000;
        ram_sdata = .; (2)
        .data : AT (flash_sdata) { * (.data); };
        ram_edata = .; (3)
        data_size = ram_edata - ram_sdata; (3)
}
-
Start of data in Flash is right after all the code in Flash.
-
Start of data in RAM is at the base address of RAM.
-
Obtaining the size of data is not straightforward. The data size is calculated from the difference between the start of data in RAM and the end of data in RAM. Yes, simple expressions are allowed within the linker script.
The add program with data copied to RAM from Flash is listed below.
link:code/add-ram.s[role=include]
The program is assembled and linked using the linker script listed in Linker Script with Section Copy Symbols. The program is executed and tested within Qemu.
qemu-system-arm -M connex -pflash flash.bin -nographic -serial /dev/null
(qemu) xp /4dw 0xA0000000
a0000000: 10 30 40 0
Note
|
In a real system with an SDRAM, the memory should not be accessed right-away. The memory controller will have to be initialised before performing a memory access. Our code works because the simulated memory does not require the memory controller to be initialised. |
The examples given so far have a major bug. The first 8 words in the memory map are reserved for the exception vectors. When an exception occurs the control is transferred to one of these 8 locations. The exceptions and their exception vector addresses are shown in the following table.
Exception | Address |
---|---|
Reset | 0x00 |
Undefined Instruction | 0x04 |
Software Interrupt (SWI) | 0x08 |
Prefetch Abort | 0x0C |
Data Abort | 0x10 |
Reserved, not used | 0x14 |
IRQ | 0x18 |
FIQ | 0x1C |
These locations are supposed to contain a branch that will transfer control to the appropriate exception handler. In the examples we have seen so far, we haven’t inserted branch instructions at the exception vector addresses. We got away without issues since these exceptions did not occur. All the above programs can be fixed by linking them with the following assembly code.
        .section "vectors"
reset:  b     start
undef:  b     undef
swi:    b     swi
pabt:   b     pabt
dabt:   b     dabt
        nop
irq:    b     irq
fiq:    b     fiq
Only the reset exception is vectored to a different address, start. Every other exception vector branches to itself. So if any exception other than reset occurs, the processor will be spinning at the corresponding vector location. The exception can then be identified by looking at the value of pc through a debugger (the monitor interface in our case).
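Identifying the exception from the value of pc is then a table lookup. The following Python sketch encodes the vector table shown earlier; the function name is made up, and it assumes that pc is read while the processor is spinning at one of the vector addresses.

```python
VECTORS = {
    0x00: "Reset",
    0x04: "Undefined Instruction",
    0x08: "Software Interrupt (SWI)",
    0x0C: "Prefetch Abort",
    0x10: "Data Abort",
    0x14: "Reserved, not used",
    0x18: "IRQ",
    0x1C: "FIQ",
}


def exception_for_pc(pc):
    """Name of the exception whose vector the processor is spinning at."""
    return VECTORS.get(pc, "not an exception vector")


print(exception_for_pc(0x10))
```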
To ensure that these instructions are placed at the exception vector addresses, the linker script should look something like below.
SECTIONS {
        . = 0x00000000;
        .text : {
                * (vectors);
                * (.text);
                ...
        }
        ...
}
Notice how the vectors section is placed before all other code, ensuring that the vectors are located starting from address 0x0.
It is not possible to directly execute C code when the processor comes out of reset, since, unlike assembly language, C programs need some basic pre-requisites to be satisfied. This section will describe the pre-requisites and how to meet them.
We will take a C program that calculates the sum of an array as an example. By the end of this section, we will be able to perform the necessary setup, transfer control to the C code and execute it.
link:code/csum.c[role=include]
Before transferring control to C code, the following have to be set up correctly.
-
Stack
-
Global variables
  -
  Initialized
  -
  Uninitialized
-
Read-only data
C uses the stack for storing local (auto) variables, passing function arguments, storing return addresses, etc. So it is essential that the stack be set up correctly, before transferring control to C code.
Stacks are highly flexible in the ARM architecture, since the implementation is completely left to the software. For people not familiar with the ARM architecture an overview is provided in Appendix D: ARM Stacks.
To make sure that code generated by different compilers is interoperable, ARM has created the ARM Architecture Procedure Call Standard (AAPCS). The register to be used as the stack pointer and the direction in which the stack grows are both dictated by the AAPCS. According to the AAPCS, register r13 is to be used as the stack pointer. Also the stack should be full-descending.
One way of placing global variables and the stack is shown in the following diagram.
So all that has to be done in the startup code is to point r13 at the highest RAM address, so that the stack can grow downwards (towards lower addresses). For the connex board this can be achieved using the following ARM instruction.
ldr sp, =0xA4000000
Note that the assembler provides an alias sp for the r13 register.
Note
|
The address 0xA4000000 itself does not correspond to RAM. The RAM ends at 0xA3FFFFFF. But that is OK: since the stack is full-descending, during the first push the stack pointer is decremented first, and only then is the value stored.
|
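The arithmetic in the note above can be checked with a few lines of C. This is only a model of the address calculation, not code that runs on the target. The two constants come from the text, while the helper name first_push_address is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses taken from the text: connex RAM ends at 0xA3FFFFFF. */
#define RAM_END     0xA3FFFFFFu
#define INITIAL_SP  0xA4000000u   /* one byte past the end of RAM */

/*
 * Hypothetical helper: models a full-descending push of one 32-bit
 * word. The stack pointer is decremented first, and the word is then
 * stored at the new address, which is what this function returns.
 */
uint32_t first_push_address(uint32_t sp)
{
    assert(sp % 4 == 0);    /* the stack pointer is kept word-aligned */
    sp -= 4;                /* decrement before the store */
    return sp;              /* address that receives the pushed word */
}
```

Calling first_push_address(INITIAL_SP) gives 0xA3FFFFFC, so the four bytes of the first pushed word occupy 0xA3FFFFFC to 0xA3FFFFFF, which is the last word of RAM.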
When C code is compiled, the compiler places initialized global variables in the .data section. So just as with the assembly, the .data section has to be copied from Flash to RAM.
The C language guarantees that all uninitialized global variables will be initialized to zero. When C programs are compiled, a separate section called .bss is used for uninitialized variables. Since the values of these variables are all zeroes to start with, they do not have to be stored in Flash. Before transferring control to C code, the memory locations corresponding to these variables have to be initialized to zero.
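The two preparation steps described above, copying .data and zeroing .bss, can be modelled in C as follows. This is a sketch under assumptions: the tutorial performs these steps in assembly in the startup code, and the function names copy_data and zero_bss, along with their parameters, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy nbytes of initialized data from its load address (Flash) to
 * its run address (RAM), one 32-bit word at a time.
 */
void copy_data(uint32_t *ram, const uint32_t *flash, size_t nbytes)
{
    size_t i;

    assert(nbytes % 4 == 0);    /* this sketch copies whole words only */
    for (i = 0; i < nbytes / 4; i++)
        ram[i] = flash[i];
}

/* Fill the region [start, end) with zeroes, one word at a time. */
void zero_bss(uint32_t *start, uint32_t *end)
{
    while (start < end)
        *start++ = 0;
}
```

On the target, the arguments would come from symbols defined in the linker script. In a hosted test, ordinary arrays can stand in for Flash and RAM.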
GCC places global variables marked as const in a separate section called .rodata. The .rodata section is also used for storing string constants.
Since the contents of the .rodata section will not be modified, they can be placed in Flash. The linker script has to be modified to accommodate this.
Now that we know the prerequisites, we can create the linker script and the startup code. The linker script from Linker Script with Section Copy Symbols is modified to accommodate the following.
-
.bss section placement
-
vectors section placement
-
.rodata section placement
The .bss section is placed right after the .data section in RAM. Symbols to locate the start and the end of .bss are also created in the linker script. The .rodata section is placed right after the .text section in Flash. The following diagram shows the placement of the various sections.
link:code/csum.lds[role=include]
The startup code has the following parts.
-
exception vectors
-
code to copy the .data section from Flash to RAM
-
code to zero out the .bss section
-
code to set up the stack pointer
-
branch to main
link:code/startup.s[role=include]
To compile the code, it is not necessary to invoke the assembler, compiler and linker individually. gcc is intelligent enough to do that for us.
As promised before, we will compile and execute the C code shown in Sum of Array in C.
$ arm-none-eabi-gcc -nostdlib -o csum.elf -T csum.lds csum.c startup.s
The -nostdlib option is used to specify that the standard C library should not be linked in. A little extra care has to be taken when the C library is linked in. This is discussed in Using the C Library.
A dump of the symbol table will give a better picture of how things have been placed in memory.
$ arm-none-eabi-nm -n csum.elf
00000000 t reset        (1)
00000004 A bss_size
00000004 t undef
00000008 t swi
0000000c t pabt
00000010 t dabt
00000018 A data_size
00000018 t irq
0000001c t fiq
00000020 T main
00000090 t start        (2)
000000a0 t copy
000000b0 t init_bss
000000c4 t zero
000000d0 t init_stack
000000d8 t stop
000000f4 r n            (3)
000000f8 A flash_sdata
a0000000 d arr          (4)
a0000000 A ram_sdata
a0000018 A ram_edata
a0000018 A sbss
a0000018 b sum          (5)
a000001c A ebss
(1) reset and the rest of the exception vectors are placed starting from 0x0.
(2) The code is placed right after the 8 exception vectors (8 * 4 = 32 = 0x20). In this dump main comes first, at 0x20, and the assembly startup code begins at start (0x90).
(3) The read-only data n is placed in Flash after the code.
(4) The initialized data arr, an array of 6 integers, is placed at the start of RAM, 0xA0000000.
(5) The uninitialized data sum is placed after the array of 6 integers (6 * 4 = 24 = 0x18).
To execute the program, convert the program to .bin format, execute it in Qemu, and dump the sum variable located at 0xA0000018.
$ arm-none-eabi-objcopy -O binary csum.elf csum.bin
$ dd if=csum.bin of=flash.bin bs=4096 conv=notrunc
$ qemu-system-arm -M connex -pflash flash.bin -nographic -serial /dev/null
(qemu) xp /6dw 0xa0000000
a0000000: 1 10 4 5
a0000010: 6 7
(qemu) xp /1dw 0xa0000018
a0000018: 33
Like every other open source project, we gladly accept contributions. Sections that need help have been marked with FIXMEs. All contributions will be duly credited in the credits page.
This document’s source is maintained in a public git repo located at https://github.com/bravegnu/gnu-eprog. To contribute to the project, fork the project on github and send in a pull request.
The document is written in asciidoc, and converted to HTML using the docbook-xsl stylesheets.
-
The original tutorial was written by Vijay Kumar B., <[email protected]>
-
Jim Huang, Jesus Vicenti, Goodwealth Chu, Jeffrey Antony, Jonathan Grant, David LeBlanc, reported typos and suggested fixes in the code and text.
-
Dmitry Ponyatov added some info on build automation using the make tool.
The following great free software tools were used for the construction of the tutorial.
-
asciidoc for lightweight markup
-
xsltproc for HTML transformation
-
docbook-xsl for the stylesheets
-
highlight.js for syntax highlighting
-
dia for diagram creation
-
GoSquared Arrow Icons for the navigation icons
-
mercurial for version control
-
emacs …
"Embedded Programming with the GNU Toolchain" is Copyright © 2009, 2010, 2011 Vijay Kumar B. <[email protected]>
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
If you are bothered by the long commands you have to enter every time you compile the samples in this tutorial, you can use the make tool. make is a utility which executes commands to build targets, as described in a Makefile.
Writing makefiles is a must-have skill for programmers, especially on large multi-file projects containing dozens of files, which must be assembled, compiled and translated into many different formats.
Every dependency between two or more files must be expressed as a rule. A rule has the following basic syntax.
<target> : <source>
<tab><compiling command1>
<tab><compiling command2>
...
target
-
one or more filenames, delimited by spaces. These files will be created or updated by this make rule.
source
-
zero or more filenames, delimited by spaces. These files will be checked for modification, using their last-modified timestamps.
tab
-
the TAB character, with ASCII code 0x09. You must use an editor that preserves tabs and does not replace them with sequences of spaces. Each command in a make rule must be indented by a single TAB character.
compiling command
-
any command, such as an assembler or linker command, which creates or updates the targets.
Note
|
The main principle of every Makefile rule: if any of the source files is newer than the target file, or the target file does not exist yet, the rule body is executed to update the target.
|
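The principle in the note can be written down as a small decision function. The sketch below is a simplified model and not how make is actually implemented: timestamps are plain integers, and the name needs_rebuild and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether a rule body has to run.
 * target_mtime is ignored when target_exists is false.
 * Larger timestamps mean "modified more recently".
 */
bool needs_rebuild(bool target_exists, long target_mtime,
                   const long *source_mtimes, size_t nsources)
{
    size_t i;

    assert(source_mtimes != NULL || nsources == 0);

    if (!target_exists)
        return true;            /* nothing built yet */

    for (i = 0; i < nsources; i++)
        if (source_mtimes[i] > target_mtime)
            return true;        /* a source is newer than the target */

    return false;               /* target is up to date */
}
```

For example, a target built at time 200 whose sources were last modified at times 100 and 250 needs a rebuild, because the second source is newer than the target.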
Let’s write a simple Makefile for the tiny program described in the Hello ARM section:
emulation: add.flash
	qemu-system-arm -M connex -pflash add.flash \
		-nographic -serial /dev/null

add.flash: add.bin
	dd if=/dev/zero of=add.flash bs=4K count=4K
	dd if=add.bin of=add.flash bs=4K conv=notrunc

add.bin: add.elf
	arm-none-eabi-objcopy -O binary add.elf add.bin

add.elf: add.o
	arm-none-eabi-ld -o add.elf add.o

add.o: add.s
	arm-none-eabi-as -o add.o add.s
Note the backslash and the tab-indented continuation line in the first rule. A long command can be split into multiple lines by ending every line except the last with a backslash; by convention the continuation lines are indented with one or more tabs.
Enter the make command without any arguments in the directory containing the Makefile and source files, and the binary files will be automatically built and executed in Qemu.
$ make
...
QEMU 2.1.2 monitor - type 'help' for more information
(qemu)
Note
|
If you run make without parameters, the first rule in the Makefile is treated as the main target, and make walks over its dependencies through all the rules.
|
If you need to update only a specific target, rather than the first rule, specify the required target after the make command:
$ make add.o
make: 'add.o' is up to date.
This command builds only the add.o object file, and only if add.s has been modified since add.o was last built. If you see a message like make: 'add.o' is up to date., the source file has not changed, and make will not execute the commands in the rule.
This is very useful if you have a lot of source files (thousands of files, like the Linux kernel for example), and fix a bug in only one source file. Without make (using a simple shell script or .bat file), every tiny change in a source file would require recompilation of all files in the project, which can last for several hours. Using make, only the few required compiler and linker calls are made, which is much faster.
Returning to our add.o, you can force the assembler to run without changing the add.s source file, by touching it:
$ touch add.s
$ make add.o
arm-none-eabi-as -o add.o add.s
The touch command changes only the modification time of the source file add.s, not its content, so make will think that the file has changed and will run the assembler for the selected target add.o.
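By default, make prints every command before running it, along with its output. If you have some reason to avoid echoing a command, you can prefix the command in the make rule with the "@" sign. (A "-" prefix does something different: it tells make to ignore errors from that command.)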
Variables are storage locations, which can contain some text. For example, we can define some variables in a Makefile, and use them in all rules. Variables are defined using the following syntax.
myvar = Hello World
Here myvar is the variable name, and "Hello World" is the text stored in myvar. The value of a variable can be retrieved using the notation $(myvar).
A very useful make tip: you can use two special variables, $@ and $<.
$@
-
represents the left side of a Makefile rule, typically a single target file name
$<
-
represents the first file in the source list
Note
|
When running make, you can override the value of any variable on the command line. This lets you change compiler options, tune rules, and even use one universal Makefile for all of your projects.
|
We change our Makefile to use variables, so that it is easy to adapt and has less repetition.
# Application name; you can change it on the make command line
# to build another single-file program with the same Makefile
APP = add

# some standard variables widely used in Linux source builds:
## architecture
ARCH = arm
## target system triplet
TARGET = $(ARCH)-none-eabi-
## standard toolchain commands
AS = $(TARGET)as
LD = $(TARGET)ld
CC = $(TARGET)gcc
CXX = $(TARGET)g++
OBJDUMP = $(TARGET)objdump
OBJCOPY = $(TARGET)objcopy

# Flash ROM size of the target system, in 4K blocks
FLASHBLOCKS = 4K

emu: $(APP).flash
	qemu-system-$(ARCH) -M connex -pflash $< \
		-nographic -serial /dev/null

$(APP).flash: $(APP).bin
	dd if=/dev/zero of=$@ bs=4K count=$(FLASHBLOCKS)
	dd if=$< of=$@ bs=4K conv=notrunc

$(APP).bin: $(APP).elf
	$(OBJCOPY) -O binary $< $@

$(APP).elf: $(APP).o
	$(LD) -o $@ $<

$(APP).o: $(APP).s
	$(AS) -o $@ $<
First of all, we replaced all file names with $(APP).<extension>.
Note
|
Note the use of variables in $(...) brackets, and the special variables $@ and $<.
|
Then, we replaced all file parameters in the assembler and linker calls with $@ and $<, which makes our rules more generic.
Also we used some standard variables to select target system architecture, and prefixed all toolchain calls with $(TARGET) variable.
Now we can compile another source file, arrsum.s, by invoking make as shown below. We override the APP variable value on the command line.
$ make APP=arrsum
arm-none-eabi-as -o arrsum.o arrsum.s
arm-none-eabi-ld -o arrsum.elf arrsum.o
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
arm-none-eabi-objcopy -O binary arrsum.elf arrsum.bin
dd if=/dev/zero of=arrsum.flash bs=4K count=4K
4096+0 records in
4096+0 records out
16777216 bytes (17 MB) copied, 0.0184385 s, 910 MB/s
dd if=arrsum.bin of=arrsum.flash bs=4K conv=notrunc
0+1 records in
0+1 records out
48 bytes (48 B) copied, 5.9686e-05 s, 804 kB/s
qemu-system-arm -M connex -pflash arrsum.flash \
-nographic -serial /dev/null
QEMU 2.1.2 monitor - type 'help' for more information
(qemu) q
We just compiled another program without any change to the Makefile. All file names were derived automatically, and with one command we got a separate flash image file running in Qemu.
Some variables are widely used in Makefiles of open source programs. These Makefiles have many tricks, not covered in this tutorial.
ARCH
-
target system architecture: arm, mips, i386, x86_64, avr, etc.
TARGET
-
target system triplet, such as arm-none-eabi, i486-none-elf or i686-linux-uclibc. It selects the cross-compiler, and greatly impacts the generated binary code: CPU instruction set, memory layout, etc.
BUILD
-
triplet for your development computer, something like x86_64-linux-gnu. This variable can be used if you build helper programs that must run on your workstation, for example to autogenerate some code for the TARGET.
AS
-
assembler command name: as for the host system, and something like super-puper-elf-as for a cross-compiler toolchain.
LD
-
linker command name.
CC
-
C compiler command name.
CXX
-
C++ compiler command name.
NM
-
nm command name.
OBJCOPY
-
objcopy command name.
ASFLAGS
-
assembler flags.
LDFLAGS
-
linker flags.
CFLAGS
-
flags for the C compiler.
CXXFLAGS
-
flags for the C++ compiler.
Open source developers have some widely used standard targets:
all
-
build all targets of the project; it must be the first rule in the Makefile.
doc
-
build documentation, using asciidoc, DocBook, LaTeX, etc. These are markup translators, which translate documentation from sources into widely used document formats like .html and .pdf files.
clean
-
remove all intermediate files (.o, .elf, .bin) and the program binary. A typical use of clean is a total rebuild of the program from scratch:
make clean
make
distclean
-
in addition to clean, removes backup files and temporary files that were not generated by make itself. This is generally used to prepare the source tree for distribution.
FLASH = add.flash arrsum.flash strlen.flash

.PHONY: all
all: $(FLASH)

.PHONY: clean
clean:
	rm -rf *.o *.elf *.bin *.flash

.PHONY: distclean
distclean: clean
	rm -rf doc/manual.pdf
In the previous Makefile sample, you can see .PHONY directives. This directive marks a target as one that is not really the name of a file; rather it is just the name for a recipe to be executed when the rule is invoked.
If you write a rule whose recipe will not create the target file, mark it with .PHONY, and the recipe will be executed every time the target comes up for "re-making". In our example clean is a .PHONY target, because the rm command in the clean rule does not create a file named "clean"; probably no such file will ever exist. Therefore, the rm command will be executed every time you say make clean.
Note
|
You can think of phony targets in a Makefile as a command line menu: you select the required action to run using make action.
|
For example, you can define a special emu rule as a phony target:
.PHONY: emu
emu: $(APP).flash
	qemu-system-$(ARCH) -M connex -pflash $< \
		-nographic -serial /dev/null
You can set APP from the current directory name, as shown below.
APP = $(notdir $(CURDIR))
$(CURDIR)
-
gives you the current directory, where you run make
$(notdir <path>)
-
notdir is a built-in make function that removes the leading directory components of a path.
Thus you can automagically set the APP name to the current directory name, for example when you start a new project. This allows you to use one universal Makefile, ~/universal.makefile, for all of your projects.
$ cd ~
$ mkdir superjob && cd superjob
$ git init
... write some code ...
$ make -f ~/universal.makefile
Note
|
The -f make option lets you specify a file other than the Makefile.
|
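What $(notdir $(CURDIR)) computes for the APP variable can be modelled with a short C function. This is a sketch of the behaviour for a single path, not make's own code; the example path /home/user/superjob used below is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Model of make's notdir for one path: return the part of the path
 * after the last '/'. A path without any '/' is returned unchanged.
 */
const char *notdir(const char *path)
{
    const char *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}
```

With a current directory of /home/user/superjob, notdir yields superjob, so the universal Makefile would build superjob.flash from superjob.s.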
Finally, let’s define universal pattern rules, which will assemble, compile and link any file with specific extensions.
%.o: %.s
	$(AS) $(ASFLAGS) -o $@ $<

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<

%.o: %.cpp
	$(CXX) $(CXXFLAGS) -c -o $@ $<

%.elf: %.o
	$(LD) $(LDFLAGS) -o $@ $<

%.bin: %.elf
	$(OBJCOPY) -O binary $< $@

%.flash: %.bin
	dd if=/dev/zero of=$@ bs=4K count=$(FLASHBLOCKS)
	dd if=$< of=$@ bs=4K conv=notrunc
With this in place, you don’t need to define rules for every file like:
file1.o : file1.s
	$(AS) -o $@ $<
file2.o : file2.s
	$(AS) -o $@ $<
... 100 rules ...
file100.o : file100.s
	$(AS) -o $@ $<
Just define one set of pattern rules, and it will automagically build your huge project.
.PHONY: all
all: file1.flash file2.flash ... file10050.flash
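To see what a pattern rule such as %.o: %.s does with a concrete target name, the matching step can be sketched in C. The % matches a non-empty stem, and the same stem is substituted into the source pattern. This models only simple suffix patterns; the function name pattern_source and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * For a rule "%<target_ext>: %<source_ext>", derive the source name
 * for a given target. Returns 1 and writes the name to out on a
 * match, returns 0 if the target does not match or out is too small.
 */
int pattern_source(const char *target, const char *target_ext,
                   const char *source_ext, char *out, size_t outsize)
{
    size_t tlen = strlen(target);
    size_t elen = strlen(target_ext);
    size_t stem;

    assert(out != NULL && outsize > 0);

    /* the stem matched by % must not be empty */
    if (tlen <= elen || strcmp(target + tlen - elen, target_ext) != 0)
        return 0;

    stem = tlen - elen;
    if (stem + strlen(source_ext) + 1 > outsize)
        return 0;

    memcpy(out, target, stem);          /* copy the stem */
    strcpy(out + stem, source_ext);     /* append the source suffix */
    return 1;
}
```

For the target file1.o and the rule %.o: %.s, the stem is file1 and the derived source is file1.s, which is what make then passes to the recipe as $<.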
A simplified ARM programmer’s model is provided in this section.
In the ARM processor, 16 general purpose registers are available at any time. Each register is 32 bits in size. The registers are referred to as rn, where n represents the register index. All instructions treat registers r0 to r13 equally. Any operation that can be performed on r0 can be performed equally well on registers r1 to r13. But r14 and r15 are assigned special functions by the processor. r15 is the program counter, and contains the address of the next instruction to be fetched. r14 is the link register, and is used to store the return address when a subroutine is invoked.
Tip
|
Though register r13 has no special function assigned to it by the processor, conventionally operating systems use it as the stack pointer, and thus it points to the top of the stack.
|
The Current Program Status Register (cpsr) is a dedicated 32-bit register that contains the following fields.
-
Condition Flags
-
Interrupt Masks
-
Processor Mode
-
Processor State
Only the condition flags field is used in the examples provided in this tutorial, and hence only the condition flags are described here.
The condition flags indicate the various conditions that occur while performing arithmetic and logical operations. The condition flags and their meanings are given in the following table.
Flag | Meaning |
---|---|
Carry | Operation caused a carry. |
Overflow | Operation caused an overflow. |
Zero | Operation resulted in 0. |
Negative | Operation resulted in a negative value. |
The ARM processor has a powerful instruction set. But only a subset required to understand the examples in this tutorial will be discussed here.
The ARM has a load store architecture, meaning that all arithmetic and logical instructions take only register operands. They cannot directly operate on operands in memory. Separate load and store instructions are used for moving data between registers and memory.
In this section, the following classes of instructions will be described.
-
Data Processing Instructions
-
Branch Instructions
-
Load Store Instructions
The most common data processing instructions are listed in the following table.
Instruction | Operation | Example |
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
By default, data processing instructions do not update the condition flags. An instruction updates the condition flags only if it is suffixed with an S. For example, the following instruction adds two registers and updates the condition flags.
adds r0, r1, r2
One exception to this rule is the cmp instruction. Since the only purpose of the cmp instruction is to set condition flags, it does not require the S suffix for setting flags.
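How a flag-setting addition affects the four condition flags can be modelled in C. The sketch below follows the usual definitions for 32-bit addition; the struct and the function name adds are illustrative and are not part of the tutorial's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct flags {
    bool n;     /* Negative: bit 31 of the result is set */
    bool z;     /* Zero: the result is 0 */
    bool c;     /* Carry: unsigned carry out of bit 31 */
    bool v;     /* Overflow: signed overflow */
};

/* Models "adds rd, rn, rm": returns rd and fills in the flags. */
uint32_t adds(uint32_t rn, uint32_t rm, struct flags *f)
{
    uint32_t rd = rn + rm;      /* wraps modulo 2^32 */

    assert(f != NULL);
    f->n = (rd >> 31) != 0;
    f->z = (rd == 0);
    f->c = rd < rn;             /* wrapped around, so a carry occurred */
    /* overflow: both operands have the same sign, the result differs */
    f->v = ((~(rn ^ rm) & (rn ^ rd)) >> 31) != 0;
    return rd;
}
```

Adding 1 to 0xFFFFFFFF gives 0 with the Zero and Carry flags set, while adding 1 to 0x7FFFFFFF gives 0x80000000 with the Negative and Overflow flags set.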
The branch instructions cause the processor to execute instructions from a different address. Two branch instructions are available: b and bl. The bl instruction, in addition to branching, also stores the return address in the lr register, and hence can be used for subroutine invocation. The instruction syntax is given below.
b label ; pc = label
bl label ; pc = label, lr = addr of next instruction
To return from the subroutine, the mov instruction can be used as shown below.
mov pc, lr
Most other instruction sets allow conditional execution only of branch instructions, based on the state of the condition flags. In ARM, almost all instructions can be conditionally executed.
If the corresponding condition is true, the instruction is executed. If the condition is false, the instruction is turned into a nop. The condition is specified by suffixing the instruction with a condition code mnemonic.
Mnemonic | Condition |
---|---|
EQ | Equal |
NE | Not Equal |
CS | Carry Set |
CC | Carry Clear |
VC | Overflow Clear |
VS | Overflow Set |
PL | Positive |
MI | Minus |
HI | Higher Than |
HS | Higher or Same |
LO | Lower Than |
LS | Lower or Same |
GT | Greater Than |
GE | Greater Than or Equal |
LT | Less Than |
LE | Less Than or Equal |
In the following example, the instruction moves r1 to r0 only if carry is set.
MOVCS r0, r1
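The way a condition code mnemonic selects on the flags can be modelled in C. The flag tests below are the standard ARM definitions of each condition, which the table above names but does not spell out; the type and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* HS is another name for CS, and LO is another name for CC. */
enum cond { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE };

struct flags { bool n, z, c, v; };

/* Returns true if an instruction with this condition would execute. */
bool cond_passed(enum cond cc, struct flags f)
{
    switch (cc) {
    case EQ: return f.z;
    case NE: return !f.z;
    case CS: return f.c;
    case CC: return !f.c;
    case MI: return f.n;
    case PL: return !f.n;
    case VS: return f.v;
    case VC: return !f.v;
    case HI: return f.c && !f.z;            /* unsigned higher */
    case LS: return !f.c || f.z;            /* unsigned lower or same */
    case GE: return f.n == f.v;             /* signed comparisons */
    case LT: return f.n != f.v;
    case GT: return !f.z && f.n == f.v;
    case LE: return f.z || f.n != f.v;
    }
    assert(0 && "unknown condition code");
    return false;
}

/* Models "movcs r0, r1": returns the new value of r0. */
unsigned movcs(unsigned r0, unsigned r1, struct flags f)
{
    return cond_passed(CS, f) ? r1 : r0;    /* otherwise acts as a nop */
}
```

With the carry flag set, movcs returns the value of r1; with it clear, r0 keeps its old value.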
The load and store instructions can be used to move a single data item between a register and memory. The instruction syntax is given below.
ldr rd, addressing ; rd = mem32[addr]
str rd, addressing ; mem32[addr] = rd
ldrb rd, addressing ; rd = mem8[addr]
strb rd, addressing ; mem8[addr] = rd
The addressing is formed from two parts
-
base register
-
offset
The base register can be any general purpose register. The offset and base register can interact in 3 different ways.
- Offset
-
The offset is added to or subtracted from the base register to form the address. ldr syntax: ldr rd, [rm, offset]
- Pre-indexed
-
The offset is added to or subtracted from the base register to form the address, and the address is written back to the base register. ldr syntax: ldr rd, [rm, offset]!
- Post-indexed
-
The base register contains the address to be accessed, and the offset is then added to or subtracted from the address and stored in the base register. ldr syntax: ldr rd, [rm], offset
The offset can be in the following formats
- Immediate
-
Offset is an unsigned number that can be added to or subtracted from the base register. Useful for accessing structure members and local variables in the stack. Immediate values start with a #.
- Register
-
Offset is an unsigned value in a general purpose register that can be added to or subtracted from the base register. Useful for accessing array elements.
Some examples of load store instructions are given below.
ldr r1, [r0] ; same as ldr r1, [r0, #0], r1 = mem32[r0]
ldr r8, [r3, #4] ; r8 = mem32[r3 + 4]
ldr r12, [r13, #-4] ; r12 = mem32[r13 - 4]
strb r10, [r7, -r4] ; mem8[r7 - r4] = r10
strb r7, [r6, #-1]! ; mem8[r6 - 1] = r7, r6 = r6 - 1
str r2, [r5], #8 ; mem32[r5] = r2, r5 = r5 + 8
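The difference between the three addressing modes lies in which address is accessed and whether the base register is updated. The following C model makes that explicit for ldr with an immediate offset. Memory is a plain byte array in host byte order, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit word from the simulated memory at byte address addr. */
static uint32_t load32(const uint8_t *mem, uint32_t addr)
{
    uint32_t v;

    memcpy(&v, mem + addr, sizeof v);
    return v;
}

/* ldr rd, [rm, #offset] : the base register is left unchanged */
uint32_t ldr_offset(const uint8_t *mem, uint32_t *rm, int32_t offset)
{
    assert(rm != NULL);
    return load32(mem, *rm + (uint32_t)offset);
}

/* ldr rd, [rm, #offset]! : update the base first, then load from it */
uint32_t ldr_preindexed(const uint8_t *mem, uint32_t *rm, int32_t offset)
{
    assert(rm != NULL);
    *rm += (uint32_t)offset;
    return load32(mem, *rm);
}

/* ldr rd, [rm], #offset : load from the base, then update the base */
uint32_t ldr_postindexed(const uint8_t *mem, uint32_t *rm, int32_t offset)
{
    uint32_t v;

    assert(rm != NULL);
    v = load32(mem, *rm);
    *rm += (uint32_t)offset;
    return v;
}
```

Negative offsets work because unsigned arithmetic wraps around: adding (uint32_t)-4 to the base register is the same as subtracting 4.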
Stacks are highly flexible in the ARM architecture, since the implementation is completely left to the software.
The ARM instruction set does not contain any stack specific instructions like push and pop. The instruction set also does not enforce in any way the use of a stack. Push and pop operations are performed by memory access instructions, with auto-increment addressing modes.
The stack pointer is a register that points to the top of the stack. In the ARM processor, there are no dedicated stack pointer registers, and any one of the general purpose registers can be used as the stack pointer.
Since it is left to the software to implement a stack, different implementation choices result in different types of stacks. There are two types of stack depending on how the stack grows.
- Ascending stack
-
In a push the stack pointer is incremented, i.e. the stack grows towards higher addresses.
- Descending stack
-
In a push the stack pointer is decremented, i.e. the stack grows towards lower addresses.
There are two types of stack depending on what the stack pointer points to.
- Empty stack
-
Stack pointer points to the location in which the next item will be stored. A push stores the value first, and then updates (increments or decrements) the stack pointer.
- Full stack
-
Stack pointer points to the location in which the last item was stored. A push updates (increments or decrements) the stack pointer first, and then stores the value.
Four different stacks are possible - full-ascending, full-descending, empty-ascending, empty-descending. All 4 can be implemented using the register load store instructions.
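The four combinations differ only in the direction of the update and in whether the update happens before or after the store. A minimal C sketch, with the stack pointer modelled as an index into an array of words, is shown below; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Each function pushes one word and returns the updated stack
 * pointer. The stack pointer is an index into mem, not an address.
 */

/* Full-descending: decrement, then store. */
uint32_t push_fd(uint32_t *mem, uint32_t sp, uint32_t value)
{
    assert(sp > 0);     /* there must be room below the stack pointer */
    sp -= 1;
    mem[sp] = value;
    return sp;
}

/* Empty-descending: store, then decrement. */
uint32_t push_ed(uint32_t *mem, uint32_t sp, uint32_t value)
{
    assert(sp > 0);
    mem[sp] = value;
    return sp - 1;
}

/* Full-ascending: increment, then store. */
uint32_t push_fa(uint32_t *mem, uint32_t sp, uint32_t value)
{
    sp += 1;
    mem[sp] = value;
    return sp;
}

/* Empty-ascending: store, then increment. */
uint32_t push_ea(uint32_t *mem, uint32_t sp, uint32_t value)
{
    mem[sp] = value;
    return sp + 1;
}
```

Note that a full-descending stack over an 8-word array starts with the stack pointer at index 8, one past the end, in the same way that the startup code points sp just past the end of RAM.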