End-End Compiler for a subset of the C language
To build all layers, run make
from the root directory.
To build a specific layer, for example, IR: cd IR; make
Make sure all layers below IR, i.e. L3,L2,L1 are built before invoking the IR compiler on .IR programs if you want to generate x86 binary. This is required because the output of the IR compiler is a .L3 file which is the input to L3 compiler and so on.
Invoke your compiler, for example, IR compiler to compile a .IR program:
cd IR; ./IRc IR/tests/test0.IR
This script works in the following way:
-
Invokes your IR compiler (
IR/bin/IR
) to generateIR/prog.L3
-
Invokes your L3 compiler (
L3/bin/L3
) onIR/prog.L3
to generateL3/prog.L2
-
Invokes your L2 compiler (
L2/bin/L2
) onL3/prog.L2
to generateL2/prog.L1
-
Invokes your L1 compiler (
L1/bin/L1
) onL2/prog.L1
to generateL1/prog.S
which is x86_64 assembly and invokes gcc to generate x86_64 binaryL1/a.out
-
Moves a.out back to
IR/
directory.
To compile the sample tests provided along with a compiler layer and compare generated output with oracle output for correctness: cd IR; make test
To compile the provided performance test LB/test/competition2019.b
and measure execution-time: cd LB; make performance
Here's a brief description of each compiler layer. Please refer to slides provided at the course's homepage, https://users.cs.northwestern.edu/~simonec/CC.html for more information.
Uses PEGTL, https://github.com/taocpp/PEGTL, for parsing .L1 code and generates x86_64 assembly. Deals with stack pointer movement for function calls and handles the x86_64 calling convention.
Computes IN and OUT sets for liveness analysis and performs graph-coloring based register allocation to minimize spilling variables to memory.
Performs instruction selection by tiling and pattern-matching: Generates and merges tiles (patterns) of instructions and uses the Maximal Munch pattern-matching algorithm for optimized code generation.
Handles explicit control flows and explicit data types. Linearizes multi-dimension arrays.
Performs value-encoding, generates code to check array accesses, and creates Basic Blocks.
Uses name binding to deal with scoped variables and handles 'If' and 'While' control structures.