- In embedded systems, the C programming language is most often the language of choice. For more intensive
- elements in the system, assembly can be used. Embedded C is distinct from typical C programming in its
- requirements for efficiency, its limited resources, and its unique hardware problems which are much less common in
- the majority of C programs. Even still, the language itself is the same, so check out K&R's The C Programming
- Language and other reference books.
- Some of the problems central to embedded systems programming:
-
- Memory management
-
- Register access and manipulation
-
- Peripheral access
-
- Communication over serial and USB lines
-
- Handling of interrupts
-
- Sharing of data with programs and ISRs, typically requiring multithreading-like techniques like semaphores
-
- Managing compilation to the assembly level
-
- Cross-platform compilation
-
- Startup scripts
-
- Linker scripts
-
- Avoiding compiler optimization pitfalls
- And these apply only to 'bare-metal' programs - programs utilizing real-time OSs have other common issues to
- consider. */
/***** BOOK RECOMMENDATIONS ******/ // A first reference for all the topics talked about here is: // * Valano, "Embedded Microcomputer Systems - Real Time Interfacing" 3rd ed. (2011) // This should be the go-to reference for microcontroller programming and interfacing. It's one of the best books // I've found for these topics. // * Deitel, "Operating Systems" // Embedded systems just about -all- use operating-system principles for scheduling tasks, managing memory, // handling mutual exclusion, creating unified hardware interfaces, and so on. This is a great book to have. // Another good reference: // * Douglass, "Design Patterns for Embedded Systems in C" (2011) // Featuring Group-Of-Four style design patterns applied to microcontroller systems, this is a neat book. // Next suggestion: // * White, "Making Embedded Systems" // Good book for similar topics to Valano, and other essential topics like fixed-point arithmetic and bootloaders etc // Some more: // * Warren, "Hacker's Delight" // Basically -the- handbook of calculation and computing close to the metal, everything from bitwise operations // to optimized algorithms, a giant collection of low-level tricks and tips, a must-have. // * Clements, "Microprocessor Systems Design: 68000 Hardware, Software and Interfacing // A book for the M68k architecture, which itself is interesting (if dated), which introduces a lot of concepts // and is one I reach for to get a slightly different view of various topics like memory management.
/********* PROJECTS OF NOTE *********/ // It's hard to find good projects using a microprocessor with open source firmware to demonstrate these principles. // There are tons of projects out there, but finding high quality ones which aren't one-off tutorials or examples // is tricky. Here's what I've found. /
- Dangerous Prototypes (http://dangerousprototypes.com/) A few different open source hardware projects, need more research
- HackRF (https://github.com/mossmann/hackrf/wiki) HackRF One uses an LPC43xx ARM microcontroller -> Uses OpenCM3 (https://github.com/libopencm3/libopencm3), an alternative microcontroller library
- https://github.com/blacksphere/blackmagic Black Magic Probe (BMP), JTAG interface and debugger
- -> Also uses OpenCM3
- (https://hforsten.com/cheap-homemade-30-mhz-6-ghz-vector-network-analyzer.html) This is a VNA (!), uses HackRF firmware
- OpenMV (https://github.com/openmv/openmv) Machine vision, supports STM32 devices
- Klipper (https://github.com/KevinOConnor/klipper) 3D printer firmware, compatible across many devices
- Marlin (https://github.com/MarlinFirmware/Marlin) Another 3D printer firmware project, broad support
- NumWorks Epsilon (https://github.com/numworks/epsilon) NumWorks is an open source graphing calculator with amazing
-
documentation on everything from hardware to mechanical design to software. Runs bare metal ARM. See also the
-
website, here: https://www.numworks.com/resources/engineering/software/
*/
/********* COMPILERS AND PREPROCESSOR STATEMENTS *********/ // Know your compiler. Assuming GCC, we can look at common preprocessor statements. These can really throw you // for a loop when you see them in code unexpectedly, especially in manufacturer code, hardware drivers, and the // CMSIS library. But don't worry, they're only here to help.
// Some common reasons for needing preprocessor statements: // - Defining macros and constants // - Adding extra warnings // - Forcing the compiler to optimize or not optimize a section // - Forcing the compiler to make code inline // - Forcing the compiler to set a variable to a particular location in memory // - Inserting assembly directly into a C file // Many statements seen frequently, such as #pragma once, are specific to a compiler (in that case, visual // studio's C compiler). These may be shorthands for common statements, such as include guards. Others can be // used to pass argument-like commands to the compiler. Some common and useful preprocessor statements follow. // Note that when a compiler-specific feature is used, it's common practice to alias it. For example, the asm // keyword, a C extension available to GCC, might not be available in other compilers. To increase portability, // then, it's good practice to have something like this: #ifdef GNUC #define asm asm #endif #ifdef __MSC_VER #define asm #endif // etc
// Note how GNUC is assumed to be defined whenever we use gcc. There are loads of these that can be used to // check system and compiler information. For example, // Architecture: // i386 // x86_64 // arm (and ARM_ARCH_5T or ARM_ARCH_7A) // powerpc64 // aarch64 // Compiler: // _MSC_VER // GNUC // clang // MINGW32 // MINGW64
// Common preprocessor statements:
// 1. attribute // This statement covers a whole lot of different options for "attributes" of functions, variables, types, // labels, enums, and statements. The syntax is: // attribute (()) // Note the double parentheses. The attribute list is a comma-separated sequence of attributes, which can be // empty, an attribute name, an attribute name followed by a parenthesized list of parameters for the attribute. // You're also allowed to put double underscores around an attribute name to avoid conflicts. // // Let's look at a few attributes to get an idea of how they work. Attributes for variables, functions, etc can // be organized into 'common' attributes and architecture-specific attributes. As an example, take the // warn_if_not_aligned variable attribute. Given a struct, we may want to ensure data alignment to some number of // byte boundaries. The attribute section goes after the variable definition in this case. struct reg_struct { uint32_t section1; uint32_t section2 attribute(( warn_if_not_aligned(16) )); // Issue a warning if this section is not // aligned to a 16 byte boundary }; // Variable attributes include: // aligned // aligned (alignment) // warn_if_not_aligned(alignment) // alloc_size(position) // alloc_size(position1,position2) // cleanup (cleanup_function) // section ("section name") Set the section (e.g. .bss, .data, specialized sections) // Function attributes include: // aligned // aligned (alignment) // always_inline // constructor // constructor (priority) // destructor // destructor (priority) // noreturn // section ("section name") // ARM-specific function attributes: // general-regs-only Indicate no specialized registers (floating-point or Advanced SIMD) should be used // interrupt Indicate function is an interrupt handler. Can take a string argument "IRQ", "FIQ", "SWI", // "ABORT", or "UNDEF" to specify the type of interrupt handler. The type is ignored for // ARMv7-M. // isr Alias for interrupt // target Specify instruction set, architecture, floating-point unit for function // For example: void f () attribute ((interrupt));
// 2. asm (and asm) // Assembly can be inserted inline with the following syntax: // asm [qualifiers] ( "assembly instructions" ) // You can also use extended asm, which lets you read and write C variables and to jump to C labels, when you're // within a C function. You cannot use extended asm at file scope. // // The qualifiers are either volatile or inline. However, volatile has no effect, as all asm blocks are volatile. // The assembly instructions section is a literal string. For multi-line assembly, you can insert a newline (and // preferaby a tab as well), '\n\t', then continue. // Example: asm("MOV r3, 0x80\n"); // If you're inside a C function, you can use extended asm. It has a different format: // asm [qualifiers] ( "template" // : // [: ] // [: ] ) // Or: // asm [qualifiers] ( "template" // : // : // : // : ) // // In extended asm, qualifiers available are volatile, inline, and goto. In this case, volatile is -not- assumed // by default, so if the statements can change variable values, you should use volatile. The goto qualifier // indicates that the statement may jump to one of the labels listed in . // // Extended asm is more like a printf statement for assembly. The best source for this stuff is the gcc docs: // https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html // So there.
// 3. register // The 'register' keyword is used to indicate that a particular variable is going to be used very often for a // short time, so it should be kept in a register as opposed to in a stack frame.
/********* TYPES **********/ // We'll assume a 32-bit word size for the target platform.
// Endianness refers to the stored order of bits or bytes for data. If the type is big-endian, that means the
// big-end (MSB) comes first (bit/byte 0) while the little-end (LSB) is last (last bit/byte). Little-endian is
// the opposite, the little end (LSB) comes first.
//
// For example, in 8-bit binary, the number 122d is represented 0111 1010b. In little-endian binary, this is
// stored as:
// 7 6 5 4 3 2 1 0 BITS
// 0 1 1 1 1 0 1 0 VALUES
// While in big-endian binary, it would be flipped:
// 7 6 5 4 3 2 1 0 BITS
// 0 1 0 1 1 1 1 0 VALUES
//
// It's more common for endianness to refer to bytes. In this case, the endianness refers to the
// byte-order. A byte can be stored in hex with two digits, e.g. FF or 32 or 1A. If we have the uint_16t type and
// we want to store the hex 0x1B0E, in little-endian we would store it as:
// 1 0 BYTES
// 1B 0E VALUES
// Where the 'little end' is stored in the lowest location in memory. In big-endian, we would have:
// 1 0 BYTES
// 0E 1B VALUES
// Where the 'big end' is stored in the lowest location in memory.
//
// For whatever number of bytes is considered for one piece of stored data (2 bytes, 4 bytes, 8 bytes) the
// bytes obey the ordering: e.g. a 32-bit type stored little endian with data 0xAB CD EF 01 is stored in memory
// as 01 EF CD AB.
//
// Often in computing memory addresses are written from left-to-right with the lowest address value on the left.
// In this convention, 0xABCD would be written in little-endian as CD AB (little-end first) and in big-endian as
// AB CD (big-end first). In hardware, the convention tends to be flipped for writing bits (e.g. in a register).
// In communication, the convention is that reading from left-to-right has the left side being the first
// received/first transmitted. So sending the data 12 34 means sending the byte 12 first, and the byte 34 last.
// If we have hex 0x1234, big-endian data transmission would be 12 34, while little-endian would be 34 12.
// For int types, prefer types with explicit sizes ("fixed-width") // Unsigned ints uint8_t var1; uint16_t var2; uint32_t var3;
// Signed ints int8_t var5; int16_t var6; int32_t var7;
// For floating-point arithmetic we have float and double. However, many embedded systems don't support // floating-point arithmetic in hardware, so the compiler can be told to use software-based floating-point // (soft-fp). The software implementation can take up a fair amount of memory, so hardware FP is much better. float var9; // 1 word (32-bit), called "single precision" double var10; // 2 words (64-bit), called "double precision"
/* Single- and double-precision floating point formats are standardized in IEEE 754. A float ('binary32' in the
- standard) is defined as 1 sign bit, 8 exponent bits, and 23 significand bits/fraction bits. The
- float fraction, or significand, is the non-zero numerical part, which always appears after the decimal. The
- expnent is the 10^x exponent. Thus all floats are stored as numbers between 0 and 1, with an exponent. For
- example, the number 10.5 is + 0.105 x 10^2. The sign bit is the MSB (bit 31), the exponent is the next 8 bits
- (bits 30 to 23 inclusive), and the fraction is the 23 LSB bits (bits 22 to 0).
- In double-precision types ('binary64', or 'double'), the sign is 1 bit, the exponent 11 bits, and the fraction
- 52 bits
*/
/* EXAMPLE: Converting between floats and ints */ uint8_t num_uint = 128; // Stored as 1000 0000 int8_t num_signed = (int8_t)num_uint; // Now we have the special value -128 (1000 0000) which is // the "lowest negative number" whos 2's comp is itself
float num_f = (float)num_uint; // Now stored form changes a fair bit // Let's investigate by converting the float to bytes using a number of different techniques. // Version 1: union union{ float f; uint8_t bytes[4]; } u; u.f = num_f; // Now bytes is (from byte 0 to byte 3): 00 00 00 43
// Version 2: cast to byte array uint8_t num_fbytes[4] = (uint8_t*) &num_f; // Bytes is same as with union
// Both of the above depend on the endian-ness of the system. We can remove this dependency with the following.
// Version 3: cast to int and use bit-shifting and bitwise AND uint32_t float_as_int = (uint32_t) &num_f; uint8_t num_fbytes2[4]; for(int i = 0; i < 4; i++) num_fbytes2[i] = (float_as_int >> 8*i) & 0xFF;
// This takes the value byte-by-byte starting with the LSB and shifting to the MSB, it doesn't // depend on endian-ness. Output is still (byte 0 to byte 3): 00 00 00 43
// Now let's try interpreting the result. If our system were big-endian, then 00 00 00 43 would // indicate a positive sign, exponent of 0 (2^0) and a fractional part of 67. This doesn't make // much sense, because we expect 128. If it's little-endian, then the 43 (=0100 0011) is first, // so the sign is 0, and the next 8 bits (which includes the first bit of the next byte) is // 43 << 1 (left-shift by 1) = 10000110 = 134. This is the exponent value. But to use it we must // first subtract by 127 (why?) and we get 134-127 = 7, thus the result is 2^7 with a fractional // part of 0 (-> 1.0), 2^7 = 128. // This shows that this x86 system is little-endian, which is expected.
// FLOATING-POINT PROCESSING UNIT (FPU) // An embedded FPU can be included in microcontroller architectures. These include instructions for // floating-point arithmetic (+-*/), square roots, and more. Check your processors's technical reference manual as // well as the processor architecture's architecture reference manual for information about your deivice.
// FIXED-POINT ARITHMETIC // Implementing floating-point arithmetic often involves a floating-point unit, either a hard FPU (like a // secondary CPU) or a soft FPU (a software library). With a hard FPU, we can get away with floating-point at // little computational cost. If on the other hand we only have a soft FPU, the overhead can be considerable. The // fact is, the processor is performing integer arithmetic, and floating-point operations require many // instructions. // // To help the situation, one can use fixed-point arithmetic. With fixed-point, integers are used which can be // separated into an integer part and a fractional part, by breaking e.g. 32 bits in two pieces. // // To successfully use fixed-point, the precision (number of fractional bits) and data width (total number of // bits) must be specified. When an operation is performed, one must know the precision and width of the // two operands before calculating. Overflow and underflow errors can also crop up, and these should be checked.
// Example implementation (from here: https://www.embedded.com/fixed-point-math-in-c/): // A 12.20 fixed-point number has a range of [0.0 - 4095.1048575], or [0 - (2^12 -1).(2^20 -1) ] typedef union { int32_t full_number; struct { int32_t integer : 12; // 12 bit integer part. The colon (:) indicates a bit-field, here 12 bits int32_t frac : 20; // 20 bit fractional part } part; } Fixed12_20; // Fixed point, 12 bit integer part, 20 bit fractional part.
// This can be printed to std out: void printfx(Fixed12_20 num) { printf("%04d.%06d\n",num.part.integer,num.part.frac); // 4 digits (to 4096) and 6 digits (to 1048576) with // zero padding }
// This representation does not lend itself easily to generalization, meaning it's hard to support arbitrary // widths of integer and fractional parts. // Instead, we can use bit shifting. This is equivalent to multiplying by powers of two, which some people call // the multiplying factor. But most prefer the concept of bit-shifting. // Here's another example implementation, slightly different: #define FixedT int32_t #define FIXED_PREC 20 #define FIXED_INTPART(A) A>>FIXED_PREC #define FIXED_FRACPART(A) A & ((1<<FIXED_PREC)-1) // 0xFFF...F const uint8_t N_DEC_DIGITS[] = {0,1,1,1,2,2,2,3,3,3,4,4,4,4,5,5,5,6,6,6,7,7,7,7,8,8,8,9,9,9,10,10,10};
FixedT FIXED_CREATE(int16_t integer, uint16_t frac) { return ((FixedT)(integer)<<FIXED_PREC)+(FixedT)(frac); }
void FIXED_PRINT(int32_t* Ap) { printf("%0d.%0d",N_DEC_DIGITS[32-FIXED_PREC],FIXED_INTPART(*Ap),N_DEC_DIGITS[FIXED_PREC],FIXED_FRACPART(*Ap)); }
int main() { FixedT a,b; a = FIXED_CREATE(12,34); b = FIXED_CREATE(45,67); FIXED_PRINT(&a); printf("\n"); FIXED_PRINT(&b); printf("\n"); }
// CASTING AND COPYING MEMORY // It is often the case that we receive data e.g. as a byte array, but we want to cast that same // byte data into a data type like int or float or even a string. We've seen how to convert data // into bytes using e.g. unions, and the same can be done again. We can also use bit arithmetic // with shifting, AND, OR, XOR, etc. // // For arrays, the C library provides memcpy (declared in string.h). The memcpy function takes a // pointer to a destination array (see next section) and a source array and a length argument and // copies the bytes from the source array into the destination. // The template is: // memcpy(void* to, const void* from, size_t num_bytes)
uint8_t byte_array[5] = {0xFF,0x00,0x12,0xA2,0x09}; uint8_t blank_array[5]; memcpy(blank_array, byte_array,5);
// Note that a void* pointer in C is a pointer without a specified type, so it can be cast to any // other pointer type to allow quick conversion of data. We can send any type of array to memcpy // as long as the number of bytes (not the number of elements) matches up. Memcpy is actually a // very simple function, which can be implemented without using the whole C standard library, // as follows (from the C library):
void* memcpy(void* dst, const void* src, size_t len) { size_t i;
/* * memcpy does not support overlapping buffers, so always do it * forwards. (Don't change this without adjusting memmove.) * * For speedy copying, optimize the common case where both pointers * and the length are word-aligned, and copy word-at-a-time instead * of byte-at-a-time. Otherwise, copy by bytes. * * The alignment logic below should be portable. We rely on * the compiler to be reasonably intelligent about optimizing * the divides and modulos out. Fortunately, it is. */
if((uintptr_t)dst % sizeof(long) == 0 &&
(uintptr_t)src % sizeof(long) == 0 &&
len % sizeof(long) == 0) {
long* d = dst;
const long* s = src;
for (i = 0; i<len/sizeof(long);i++){
d[i] = s[i];
}
}
else {
char* d = dst;
const char* s = src;
for(i=0; i<len; i++) {
d[i] = s[i];
}
}
return dst;
}
// This is essentially just copying bytes in a loop. It has the added bonus of also checking for word alignment // to speed up the copying.
/******** MATH ********/ // It is often the case that we must perform math on integer or float or fixed-point data. What options do we // have in an embedded system?
// There must be a distinction between integer, fixed-point, and floating-point math. For the basic operations: // + Addition, built-in // - Subtraction, built-in // * Multiplication, built-in // / Division, built-in; for integers the fractional part is discarded (sometimes called round-towards-zero) // % Modulus, built-in for integers // pow Exponentiation, pow() in math.h for double, float, and long double. For integer and fixed-point // exponentiation, without casting, not built-in; see below. // sqrt Square root, sqrt() in math.h for double, float, and long double. For integer and fixed-point, not // built-in; see below. // // If you're using floating-point math (float, double) you can use the math.h header file, part of the C standard // library. Many other functions are available, including trig, powers, exponentials, logs, and rounding // functions. Optimized routines are provided by Arm for Arm-core devices, found here: // https://github.com/ARM-software/optimized-routines // // For integer and floating-point math, when floats and doubles are to be avoided, there are a few options. // First, you can provide your own implementations. This is essentially algorithm design, so if you're going for // highly optimal code (optimal size, or memory usage, or speed, or code maintainability) best not to trust // yourself to have the best algorithm. Libraries do exist, for example liquid-fpm (floating-point math library, // merged to liquid-dsp) for floating-point math is quite complete. // // Some algorithms and topics to look at when rolling your own math functions: // Integer square root // Digit-by-digit algorithm // Approximation methods // Exponentiation // Exponentiation by squaring // Addition-chain exponentiation // Logarithms // CORDIC // Taylor approximation // Table lookup and interpolation // Turner's algorithm (IEEE 2011) // Trig and hyperbolic trig // CORDIC // An approach you will often find is to define convenient exponential and log functions (such as e^x and log2) // and to use exponent and log rules to calculate with other bases. Iterative numerical methods (e.g. // Newton-Raphson) are common, albeit with variable-time.
/********* ACCESSING MEMORY *********/ // Registers, SRAM, Flash, we often need to access memory. How can we do it? It's straightforward in most cases. // We have addresses in the form of pointers, whose addresses are set manually (as opposed to specifying the // address of a variable).
uint32_t var1 = 10; // Typical way of setting pointer addresses uint32_t* var_ptr = &var1; //
uint32_t* reg_ptr = (uint32_t* )0xE000E000; // Setting a pointer to a specific address in memory (e.g. a register) // Note that we must cast the address to a pointer type
// Many times, we need to access multiple words, so arrays are used. A quick refresher about the finer points of // C arrays is in order. uint32_t x[10]; // Declare an array of 10 words uint32_t* y[10]; // Declare an array of pointers to words
x[0]; // Access first value &x[0]; // Get memory address of first value &x; // Same as &x[0]
uint32_t* z = &x; // Declare a pointer to the first element of x
// Now we can access the pointer z as an array z[0]; // First element of x
// But note that this is a convenience, z is still a pointer. The pointer uses indrect addressing, while the // array uses direct addressing, meaning z[0] accesses the address of x[0] and returns its istored value, while // x[0] returns the value at that location directly.
// Arrays cannot be assigned addresses like pointers can. To get a location in memory, we need to assign a // pointer to that address, and an array can only be used through a pointer. But instead, we can use pointer // arithmetic and access brackets, avoiding arrays for the most part. uint32_t* var1_ptr = &var1; var1_ptr[0]; // Equivalent to *( var1_ptr + 0) var1_ptr[12]; // Equivalent to *( var1_ptr + 12) // Note there's no bound checking here, so we can access whatever memory starting from var1_ptr. Also, the square // brackets indicate dereferencing.
// When memory is divided into logical sections, we can use structs of arrays to reserve the memory. This can be // more convenient than using pointer arithmetic directly. Recall that in C structs must be prefaced with // "struct" before its type name in all occurrences, so it's often typedef'd. typedef struct{ uint32_t section1[4]; uint32_t reserved[10]; uint32_t section2[4]; } MEMORY_type; // This has two parts: the typedef, of the form "typedef " and the struct definition, // "struct { }". Now to use this at a particular memory location, we need a pointer that accepts an // address, and we can use the MEMORY_type as the pointer type. MEMORY_type* memory_accessor = (MEMORY_type*)0xE000E000; // Same as before, set an address for the pointer // Alternatively, we can use a uint32_t pointer, call it memory_ptr, and then remember how to access the data, // with the different offsets (possibly using #defines) but this is more work.
// Dynamic Memory // // In a simple world, memory would be static, with local variables stored on the stack and global variables // stored in .data. But we also have dynamically allocated data, stored on the heap, and sometimes this can be // tricky. // // Dynamic memory itself is simple enough, and I assume we're all on the same page as to what it means. In C, we // use malloc() to allocate dynamic memory, and free() to free that memory. Under the hood, malloc() uses a // function called sbrk() to resize the heap data segment. (brk = break value, sbrk = space? brk, probably). // // Some references: // https://web.archive.org/web/20190214041636/http://fun-tech.se/stm32/linker/index.php // http://e2e.ti.com/support/archive/stellaris_arm/f/471/t/44452 // https://stackoverflow.com/questions/10467244/using-newlibs-malloc-in-an-arm-cortex-m3 // // Most of the trouble with dynamic memory comes from issues with the linker script. We won't cover that here.
// Accessing Registers // // In assembly, we can access registers by writing them in commands, like mov. In C, we can only directly access // "memory-mapped", meaning they have a memory address associated with them. If that's the case, we can access // the register by setting a pointer to it, specifying the address directly. // // Other registers must be accessed indirectly. These are accessed by "helper registers" called indirect // registers; these include set registers and clear registers. If a register is not writable directly, // you can use a set register to set bits within that register, and a clear register to clear bits within that // register. The indirect registers are mapped bit-by-bit, so you set bits in the set register to set bits in the // target register, and you set bits in the clear register to clear bits in the target register.
// The manufacturer should provide a header file, e.g. for ARM devices, in a CMSIS Device driver folder. These // tend to be large files with memory mappings for all peripherals.
// When defining registers on a bit-by-bit basis, you don't have to use bitwise operators all the time. You can // use bit fields in structs: typedef union{ uint32_t full_register; struct{ uint32_t RWn : 1; // Bit 0 uint32_t BufEmpty : 1; // Bit 1 uint32_t Data : 8; // Bits 10 : 2 uint32_t Addr : 8; // Bits 18 : 11 uint32_t Reserved : 14; // Bits 32 : 19 } fields; } StatusRegister;
// The bit fields specify the number of bits used for the given label. The bits are occupied in order. All the // struct data members have the same type, and they can be accessed: StatusRegister sr; sr.fields.RWn = 1; sr.fields.BufEmpty = 0; sr.fields.Data = 0b11001100; sr.fields.Addr = 0b11111110; // Result is: sr.full_register => 00000000 00000011 11111011 00110001 (260913 in decimal)
/********* GLOBAL (EXTERNAL) VARIABLES *********/ // External (global) variables are defined at file scope uint8_t glob;
// Now we can use it. If the definition is in the same file, we can use it directly: void glob_init() { glob = 0; } // If it's in another file, we need to redeclare it with 'extern' void glob_inc() { extern uint8_t glob; glob++; }
// A variable that is global might need to be accessed in another stack (notably during interrupts). If this is // the case, it is very important that you use the -volatile- keyword. Volatile tells the compiler not to // optimize out the variable, because it can change at any time, from anywhere.
volatile uint8_t interrupt_ctr; void handle_init_interrupt() // An ISR { interrupt_ctr = 0; } void ctr_inc() { interrupt_ctr++; }
// In general, volatile should be used when the compiler might think a variable isn't doing anything. This // includes for loops with no body, temporary variables used for debugging values, and interrupt routines, as // well as for peripheral registers, whose values can change without the main program being aware.
// * Shared Memory, Mutex, Locks and Semaphores * // In multithreading and multi-stack applications, we have a shared resource problem. Memory is shared between // processes (such as the main loop and various interrupts) and during various operations (termed 'critical // sections') only one process should have control of the data at a time. We don't want to have data changing // while we're transmitting the contents of an array. // // To solve this problem requires -mutual exclusion-, meaning only one process is in its critical section at a // time. We also require that the system won't -deadlock- e.g. when two processes try to take control at the same // time. Mutual exclusion is abbreviated -mutex-. This can be achieved a few ways. // // Option 1. Disable interrupts // If we have a single processor core, and interrupts which can access data, then the simplest solution is for // critical sections to disable interrupts. An operation that cannot be interrupted is called -atomic-. // Here's an example, for global data tx_data which has a length LEN_DATA, and disable_interrupts() and // enable_interrupts() assumed to do what they say.
void send_data() { disable_interrupts(); for(int i = 0; i < LEN_DATA; i++) { send(tx_data[i]); } enable_interrupts(); }
// This is good. It has the following issues. If the critical section is long, we can have issues with watchdogs, // and clock synchronization, which depend on regular interrupts. If the program halts during the critical // section, then the system is stuck, and must be forcefully reset.
// Option 2. Busy waiting // In busy-waiting, a process repeatedly checks to see whether a variable is locked. There are instructions which // work with this system, although in general this sort of 'polling' is frowned upon. The instructions include: // - test-and-set // - compare-and-swap // - fetch-and-add // - load-link/store-conditional // These operations are all atomic, so they cannot be interrupted.
// Option 3. Software solutions // We can avoid changing data using software alone. A simple example would be to store the current value of some // variable, and then to check that the two values are equal.
volatile uint8_t timer_ctr;
void interrupt_handler() { timer_ctr++; }
void send_timer_count() { uint8_t temp_timer_ctr = timer_ctr; while(temp_timer_ctr != timer_ctr) temp_timer_ctr = timer_ctr;
send(timer_ctr);
}
// The above uses a loop to find when the two values are the same. This has obvious issues, but it can be useful. // Another option is to use a lock variable, often called a mutex (mutual exclusion). volatile uint8_t timer_ctr; volatile bool timer_ctr_lock;
void interrupt_handler() { if(!timer_ctr_lock) timer_ctr++; }
void send_timer_count() { if(!timer_ctr_lock) timer_ctr_lock = true; send(timer_ctr);
timer_ctr_lock = false;
}
// This uses a volatile bool to check when the variable timer_ctr is available. This has its drawbacks, namely // that counter increments are simply skipped, instead of waiting until it's available. Counts are lost. (Note // that the bool type may not be defined.)
// This brings up the problem of ensuring that every process will, eventually, run. Another technique is to have // two variables, and a lock, and to have the interrupt set one variable or the other depending on the lock. volatile uint8_t timer_ctr volatile uint8_t timer_ctr_2; volatile bool timer_ctr_lock;
void interrupt_handler() { if(!timer_ctr_lock) timer_ctr++; // Increment if unlocked else { timer_ctr_2 = timer_ctr; // Otherwise increment the other counter timer_ctr_2++; } } void send_timer_count() { if(!timer_ctr_lock) timer_ctr_lock = true; // Confirm the lock
send(timer_ctr); // Critical section
timer_ctr_lock = false;
if(timer_ctr != timer_ctr_2) // Update in case the other variable changed
timer_ctr = timer_ctr_2;
}
// STATIC VARIABLES - Better Global and Local Variables // There are two types of variables that are closely related to globals. Both are declared "static", but in // different contexts. // // The first is a variable at file scope declared static. static uint8_t my_var; // This variable is global within the file, but ONLY within the file it's declared in. It cannot be used outside // of THIS file. // // The second is a local variable in a function declared static. void func() { static uint8_t ctr = 0; ctr++; } // This variable will only be initialized the FIRST time the function runs. After that, the initialization (ctr = // 0) is skipped, as the static local variable keeps its value on each run. In this example, the first time // func() is called, ctr is initialized to 0, then gets incremented to 1. The second time we call func(), the // initialization is skipped, ctr is still 1, and we increment ctr again to 2. The local variable keeps its value // between function calls, unlike normal local variables. // // Static global variables give us a way to have variables shared between functions without having those // variables be truly global. Static local variables allow us to have a function maintain its own set of // variables as if it were a data structure. These are powerful tools that allow us to control variable scope and // promote good encapsulation. // // NOTE: In both cases, static variables have limited scope. They cannot be referenced outside of their // respective scopes. But something that's rather nice is, -pointers- to static variables are safe. This means // you can declare a global static variable, and provide a pointer to that variable, and the pointer can be used // anywhere in the program.
/*********** SHARING DATA WITH INTERRUPTS ***********/ // Most programmers don't like the idea of using globals. It's been instilled in us that globals are evil, they // lead to messy and fragile code, they pollute the namespace, they're evil. // // Put these fears aside. Keep an eye on your globals, but for small embedded programs, it's not a concern. Even // for larger programs, there are tons of reasons to use global variables. Just ask, is avoiding a global worth // the added overhead in code and maintenance? // // That being said, consider the case where an interrupt will generate a lot of data, and our main program needs // to detect the availability of this new data. What's the best way to do this? The first thing that comes to // mind is flags. // // The only way for the main loop to "detect" the calling of another function like an ISR while looping is // (a) if a volatile variable has changed value and (b) the main loop checks if that variable has changed. This // is the purpose of a flag variable, a bool or int value that the main loop checks repeatedly. // // Here are some methods of achieving this.
// 1. Global variables, main loop checks flag and resets it // This option is suitable for smaller programs with only a few interrupts, or for really common and widely // shared data
volatile uint8_t data_avail = 0; // Flag for available data volatile uint8_t b_data[255]; // Byte array that will hold the data int main() { while(1) { if(data_avail) { /* Do something with b_data */
// Reset
data_avail = 0;
}
}
return 0;
}
void isr() // Called automatically when new data comes in { if(!data_avail) { b_data = sys_get_data(b_data,255); // sys_get_data would be a function that loads b_data with the data // that caused the interrupt
// Set the flag
data_avail = 1;
}
else // data_avail == 1
{
// The main loop hasn't read the old data yet, handle this case somehow
}
}
// 2. "Driver" functions and data in a separate file, using static globals, and a function interface // This is suitable for larger programs, where we trade a little extra overhead for more robust and maintainable // software
/* it_driver.h / // Prototypes for functions, defining data structures, typedefs, #defines, etc // e.g. uint8_t data_avail(); // Return 1 if new data is available, 0 otherwise void get_byte_data(uint8_t bytes, uint32_t len); // Load 'len' bytes of data into 'bytes', reset new_data // etc
/* it_driver.c */ // Declaring static globals, providing implementations for functions static volatile uint8_t new_data = 0; // Flag for whether there is new data static volatile uint8_t b_data[255]; // Byte array with max size of 255 bytes void isr() // Called automatically/asynchronously when new data arrives { if(!new_data) { b_data = sys_get_data(b_data,255); new_data = 1; } else { // Main loop has read old data yet, ... } } // Other functions defined too
/* main.c */ #include "it_driver.h" int main() { uint8_t b_data[255];
while(1)
{
if(data_avail()) // Some better naming is of course in order
{
get_byte_data(b_data,255);
}
}
}
// 3. Same as 2, but using handler variables // We often have multiple peripherals which have the same methods available to them, e.g. multiple I2C lines. We // don't want to have to give each I2C line its own driver when the drivers will all be identical. To get around // this, we define the interface as in method 2, but each function will take a "handler" parameter referring to a // particular instance of the peripheral, and it will operate on that handler instead of global static data. // The handler variables themselves will be declared in the driver source file, so any time we want to add or // remove an instance we'll need to change them in that source file. // // The handler variable is a data structure (a struct) which stores data identifying a particular instance of // the peripheral. This is often a true global variable, or a pointer to a global variable, defined in the driver // source (.c) file.
/* it_driver.h */ typedef struct{ uint8_t b_data[255]; // etc } it_handler_struct; // Create a struct, call it it_handler_struct
/* it_driver.c */ it_handler_struct it_handler; // Global handler variable, used by main program
// If we want to instead provide a pointer to the data handler, we may want to have the user pass that pointer // around, but -not- dereference it. They can have it_handler_struct* it_handler_ptr, but they should not be // using it like it_handler_ptr->b_data, or *it_handler_ptr. This is cleaner. To promote this sort of behavior // and interaction, we can typedef the pointer to the struct to its own handler type, making the struct pointer // not look like a pointer, as a reminder of the design intent.
/* it_driver.h */ typedef struct{ uint8_t b_data[255]; // etc } it_handler_struct;
typedef (it_handler_struct*) it_handler_type;
// Now the variable it_handler can be a type it_handler_type, which is actually a pointer, but the fact that it's // a "type" expresses our design intent.
// All the functions previously defined now need to be changed to accept a handler, and we're done. The main loop // can specify which peripheral it's looking at by simply using the appropriate handler variable.
/*********** DATA STRUCTURES ***********/ // Common data structures include: // - Abstract data types // - Containers // - Lists // - Tuples // - Multimaps // - Sets // - Multisets // - Stacks // - Queues // - Graphs // - Linear data structures // - Arrays // - Linked lists // - Double-linked lists // - Circular buffers (aka circular queue, ring buffer, cyclic buffer, FIFO) // - Trees // - Binary trees // - B-trees // - Heaps // Of these, particularly important in embedded systems are circular buffers, queues, stacks, and of course // arrays.
// ARRAYS // Arrays are used to store contiguous blocks of data in memory. They are best used for fixed-length data, such // as read-only data, data packets, and registers.
// To declare an array: uint8_t rx_msg[8]; // Declare an 8 byte array by size uint8_t tx_msg[] = { 0x10, 0xff, 0x00, 0x10, 0xff }; // Declare an array by initializing elements uint8_t tx_rx_array[8] = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00} ; // Both size and initializing elems
// The length must be const data. You cannot use a non-const variable to specify array length. Arrays cannot // directly be resized. Copying of data from one array to the other is done with e.g. memcpy (or a simple // for-loop) as mentioned above.
// CIRCULAR BUFFERS // Circular buffers are first-in-first-out (FIFO) data structures. The buffer is fixed-length, and you always // read data from the oldest going to the newest. When the end of the buffer is reached, we start overwriting the // beginning, hence the name "circular buffer." When this happens, the old data is lost, unless it has already // been read. // // Instead of specifying a 'start' and 'end' for the buffer, we use a 'head' and a 'tail', where the head is // the most recent element and the tail is the next element to be read. If the head overtakes the tail, we have // overwriting, which is typically not allowed, and may call for reallocation. The data itself is stored in // memory as an array, which does have a start and end, and this must be handled in the implementation, but the // user only knows the head, tail, and size.
// Here's an example implementation. The header file: #ifndef CIRCBUF_H #define CIRCBUF_H
#include <stdint.h>
// Header for circular buffer data structure
typedef struct{ uint32_t head; // Next free space uint32_t tail; // Current element, to be popped uint32_t capacity; // Max buffer size uint8_t full; // Whether or not the buffer is full (0 or 1) uint8_t* data; // Actual buffer } circbuf;
// Make a typedef for a handler to a circbuf pointer, so users don't think they're supposed to dereference it typedef circbuf* circbuf_handler; typedef uint8_t circbuf_status;
#define CIRCBUF_STATUS_OK 0 #define CIRCBUF_STATUS_FULL 1 #define CIRCBUF_STATUS_EMPTY 2 #define CIRCBUF_STATUS_WARN 3 // Warn = last byte written, now buffer is full
// Implementation functions circbuf_handler init_circbuf(uint8_t* buf, uint32_t size); // Initialize circular buffer with buffer and size void circbuf_reset(); // Clear buffer, set head = tail circbuf_status circbuf_push(uint8_t* wbyte); // Push len bytes to buffer, return status circbuf_status circbuf_pop(uint8_t* rbyte); // Pop len bytes from buffer to rbyte, return status circbuf_status circbuf_get_status(); // Return a status value uint32_t circbuf_get_capacity(); // Return capacity (full size) of buffer uint32_t circbuf_get_size(); // Return number of elements in buffer uint32_t circbuf_get_free_space(); // Return number of free spaces
void circbuf_print(); // Print circular buffer contents
#endif /* CIRCBUF_H */
// The source file: Note how we've hidden some of the implementation details by typedef-ing a pointer to the // buffer object, and we also instantiate a buffer object in the source file. This could have been static, so // that the user can't access it, but this is fine too.
#include "circbuf.h" #include <stdio.h>
circbuf circular_buffer; // Initialize circular buffer
circbuf_handler init_circbuf(uint8_t* buf, uint32_t size) { circular_buffer.data = buf; circular_buffer.capacity = size; return &circular_buffer; }
void circbuf_reset() { uint32_t i; for(i=0;i<circular_buffer.capacity;i++) circular_buffer.data[i] = 0; circular_buffer.head = 0; circular_buffer.tail = 0; }
circbuf_status circbuf_push(uint8_t* wbyte) { // Need to determine whether or not the next space is available, // updating if we go over the capacity uint32_t next_head = (circular_buffer.head+1) % circular_buffer.capacity; if(circular_buffer.full) { return CIRCBUF_STATUS_FULL; } else if(next_head == circular_buffer.tail) { // If we're going to overlap the tail next step (overflow) circular_buffer.full = 1; circular_buffer.data[circular_buffer.head] = *wbyte; circular_buffer.head = next_head; return CIRCBUF_STATUS_WARN; } else { // We aren't overlapping the tail (no overflow condition) circular_buffer.full = 0; circular_buffer.data[circular_buffer.head] = *wbyte; circular_buffer.head = next_head; return CIRCBUF_STATUS_OK; } }
circbuf_status circbuf_pop(uint8_t* rbyte) { uint32_t next_tail = (circular_buffer.tail+1) % circular_buffer.capacity; if(circular_buffer.tail == circular_buffer.head && !circular_buffer.full) { // Tail is caught up to head (empty condition) return CIRCBUF_STATUS_EMPTY; } else { *rbyte = circular_buffer.data[circular_buffer.tail]; circular_buffer.tail = next_tail; if(circular_buffer.full) { circular_buffer.full = 0; } return CIRCBUF_STATUS_OK; } }
circbuf_status circbuf_get_status() { if(circular_buffer.tail == circular_buffer.head) { return CIRCBUF_STATUS_FULL; } else if( (circular_buffer.tail+1)%circular_buffer.capacity == circular_buffer.head ) { return CIRCBUF_STATUS_EMPTY; } else { return CIRCBUF_STATUS_OK; } }
uint32_t circbuf_get_capacity() { return circular_buffer.capacity; }
uint32_t circbuf_get_size() { return 0; // TODO }
uint32_t circbuf_get_free_space() { return circular_buffer.capacity - circbuf_get_size(); }
void circbuf_print() { uint32_t i; printf("CIRCULAR BUFFER CONTENTS\n"); for(i=0; i<circular_buffer.capacity; i++) { printf("0x%.2x\t",circular_buffer.data[i]); } printf("\n"); for(i=0; i<circular_buffer.capacity; i++) { if(i == circular_buffer.head && i != circular_buffer.tail) printf("H\t"); else if(i == circular_buffer.tail && i != circular_buffer.head) printf("T\t"); else if(i == circular_buffer.tail && i == circular_buffer.head) printf("T/H\t"); else printf("\t"); } printf("\n");
}
// This sort of implementation should be enough to help implementing other data structures, where the general // principles remain mostly the same.
/*********** COMMUNICATION ************/ // The topic of "communication" is big, but there are many things shared across most common systems that // can be discussed.
// Device Communication: // - SPI // - UART // - I2C // - USB // - RS-232 // - Microwire // - JTAG (not a communication protocol, but very common and somewhat related)
// Now we can apply the topics covered so far to the task of receiving and transmitting data over // one of these communication links. Some things to look for: // - Data, status, control, and other registers // - Interrupts // - Master/slave roles and responsibilities // - Speed // - Addressing scheme
// In embedded systems, a protocol like I2C can be implemented a few different ways. The communication is // always the same, but how we are required to interact with it varies. For example, if no I2C peripheral exists, // we can program the protocol directly into a set of IO pins, a process called 'bit banging'. (Specifically, // bit banging refers to any method of transferring data using software instead of dedicated hardware.) // // One level up from this, our microprocessor might have dedicated hardware for handling communications. Our role // as the programmer is then to use the associated registers and interrupts to control when and how communication // proceeds. To transmit data, we load it into a data register; to receive data, we read from a data register. // // At a higher level of abstraction, we have high-level libraries such as HAL for STM32 devices which can provide // functions for handling all of the common tasks required.
// For any protocol (I2C, SPI, etc) that has associated peripherals, look for the following: // Control registers, for configuration of the peripheral // Status register, used for flags such as busy, overrun, buffer empty, buffer not empty, etc // Data register, data received or transmitted // The protocols should support both software polling as well as interrupt modes. Software polling means you // check the status register regularly to see if any new flags are set. Interrupt mode means an interrupt will be // automatically called when a flag is set. Polling mode is also called blocking mode. Interrupt mode can use the // interrupts to trigger DMA transfers instead of triggering an interrupt handler.
// TRANSMITTING // For transmitting, check the status register for a 'transmit buffer empty' flag. If this flag is a 0, there is // still data being transmitted, so don't transmit. If it's a 1, begin a transmit sequence. This can be as simple // as loading a byte into the transmit buffer. For managing transmitted data, you can either manually write // words, or you can use a circular buffer to transmit its contents over a number of steps.
// RECEIVING IN POLLING MODE (BLOCKING MODE) // It is the user's responsibility to check the status register for changes. This can be simple, reading one // word at a time: uint16_t rx_word = 0; while(1) { if(SR->RXNE) // Status Register RX buffer Not Empty { rx_word = *DR; // Set rx_word to Data Register // Do something with rx_word } } // This has the obvious drawback that you can only read 16 bits at a time, and you need to use those bits right // away. An alternative is to load the data register into a circular buffer, and handle the data as needed, // processing in bulk or as fast as possible.
// RECEIVING IN INTERRUPT MODE (NON-BLOCKING MODE) // Create an interrupt handler (the symbol name is probably defined in the startup file). In the handler, first // (as ALWAYS with interrupts) clear the interrupt. Then store the data any way that's convenient, using the // methods described above (sharing data with interrupts, using global variables, etc). For example, load the // data register into a circular buffer, and when the receiving operation is complete (e.g. after a string has // terminated) use a flag variable to signal to the main() function that the data is ready. You could also handle // the communication without using the main() loop at all, instead doing everything in an event-driven way.
/******* PARALLELISM AND CONCURRENCY *******/ // You have an embedded system. There are 17 sensors, 25 status LEDs, a half dozen peripherals communicating // over three different communication protocols, a TFT display to update, and a USB connection to a host // computer. Where do you start? Maybe if it was 2 sensors, 5 LEDs, and a VCP connection you could do something // like this: int main() { init(); // Initialize system while(1) { data1 = check_sensor_1(); if(data1 > LIMIT) { set_pin(OVERLIMIT_PIN, 1); } serial_transmit(data1);
data2 = check_sensor_2();
if(data2 < UNDERFLOW)
{
set_pin(UNDERFLOW_PIN, 1);
}
serial_transmit(data2);
}
} // Here, in a function remniscent of Arduino programs, we have a main loop which runs through a 'checklist' of // items, polling for updates and sending data out over serial. Obviously, this is not going to scale well. // Instead, we want to handle multiple tasks -simultaneously-.
// Operations can run nearly simultaneously in the following ways: // Parallel - Operations performed in true parallel, instructions executed at the same time // Concurrent - Operations performed in a round-robin fashion, allocating time to processes, processes take // turns executing instructions // Managing tasks with finite resources, that's the problem we face. Sometimes we have four or eight CPU // cores that we can use to achieve true parallelism. Other times, we need to prioritize certain tasks, allowing // some tasks to run uninterrupted, while others are run when nothing else is going on.
// Often, managing tasks is the objective of an -operating system-. If you need to guarantee that certain tasks // will complete in a bounded amount of time, you would probably look for a real-time operating system (RTOS). If // that's too much overhead, you can implement simple concurrency without much effort at all, like the above main // loop example.
// Some specific examples of parallel and concurrent processing: // Parallel // Multithreading, multiprocessing // Hardware (logic gates, FPGA) // GPU shaders // Concurrent // Schedule-driven processing // Event-driven processing // You'll often find the following terms used in these contexts (though note that the meanings of these words // change with different contexts): // Event // Task // Job // Process // Thread // Worker // Coroutine // Interrupt // Try looking these terms up and seeing what differences there are between them in different contexts. For // example, Linux uses processes, jobs, and threads; Windows has a task manager which handles programs and // processes.
// INTERRUPTS // The first example of a concurrent system is already familiar: interrupts. Interrupts use priority, they can be // nested, they sometimes use polling and otherwise are vectored (using ISRs). Some practical bits of advice // about using interrupts that relate to concurrent systems: // ISRs should be as short as possible to avoid conflicts with other interrupts. Come in clean, manage the // interrupt, and return. // Avoid using iterative loops, busy-waits, and other time-consuming or indeterminate-time flow control // methods // The idea with interrupt service routines is to try, when possible, to minimize the possibility of ISR nesting, // and lost execution time in the main thread. Interrupts can have their priority superseded, but they cannot // execute at the same time (e.g. with time-sharing) meaning they're not really a concurrent programming // technique. // // REENTRANCY // Is it safe for a function or program section to be called by both the main excecution and an interrupt? Is it // safe for it to call the same function in diferent threads at the same time? If the answer is yes, that program // segment is called -reentrant-. // // Functions that are not reentrant can typically be broken up into critical sections (where only one thread can // access that section at a time) and reentrant sections (which are safe). If a function is running in a critical // section in one thread, and another thread tries to call the function, an error occurs. // // We already covered how to deal with reentrancy issues when talking about sharing memory, mutexs, and // semaphores. Thread scheduling can also work to solve this issue. For concreteness, we can categorize critical // sections as follows: // 1. Read-modify-write sequences (reading data, modifying it, and writing it back) // 2. Write-read sequence (writing data, then later reading it) // 3. Nonatomic multistep write (writing data over multiple instructions) // Clearly, atomicity is key to avoiding critical sections and ensuring reentrancy.
// SEMAPHORES // The different types of semaphores: // Spin-lock semaphore // Blocking semaphore