This repository contains the toy-compiler created as part of the course CS F363: Compiler Construction at BITS Pilani, Pilani Campus in Spring 2023 under Dr. Vandana Agarwal.
Click here for complete language specifications.
- The language
ERPLAG
is a strongly typed language with primitive data types as integer and floating point. - It also supports two other data types: boolean and arrays.
- The language supports arithmetic and boolean expressions.
- The language supports assignment statements, input/output statements, declarative, conditional, iterative, and function call statements.
- The language supports the modular implementation of the functionalities.
- Functions can return multiple values. The function may or may not return a value as well.
- The scope of the variables is static and the variable is visible only in the block where it is declared.
- Ensure you have
gcc
, version 5.0 or greater, installed. This is essential to build the source code locally.
gcc
version can be checked viagcc --version
command on terminal.
- Ensure you have
nasm
, version 2.14.02 or greater, installed. NASM is needed for code generation, without NASM the compiler would not be able to execute the final x86 assembly code generated by our compiler . You can still analyse the lexical, syntax or semantic errors in your source code, as these layers do not require nasm.
nasm
version can be checked vianasm --version
command on terminal. You can installnasm
viasudo apt install nasm
- Lexer :
- Lexer reads the source code in ERPLAG as a stream of charaters and converts sequence of characters to meaningful tokens.
- Twin buffers are used to store the stream of characters to reduce IO operations and improve efficiency.
- The tokens are passed to the parser and contains information about the line of occurance, name of the token, token id etc.
- Parser :
- Parser takes in the sequence of tokens and verifies the syntactical correctness of the source code by constructing the parse tree.
- The Parser also has error recovery mechanism to parse the complete sequence of tokens even if a syntactical error exists.
- AST Construction :
- This module takes in the parse tree and deletes all the irrelevant tokens like keywords, changes the structure of and removes empty non-terminals.
- The goal of this module is to reduce the size of the parse tree (preserve only essential parts of the tree), so as to allow type checking and semantic analysis to be more efficient.
- Symbol Table and Scoping :
- This module constructs the symbol-table for the corresponding AST. The symbol table has a tree structure that depicts the scoping of the blocks of code.
- Each node of the tree has a linked list of its locally defined variables. It also reports the errors related to redundant declaration, no declaration of variables and modules.
- Type Checker and Semantic Analyser :
- This module performs semantic checks on the AST for verifiying semantic correctness of the source code.
- It also determines the correctness of the arithmetic and boolean expressions.
- Intermediate Code Generation :
- Generates the intermediate 3 address codes, so as to enable easy mapping of variables and temporaries to registers in the codegen phase.
- Code Generation : Generates the NASM-assembly code of the source code.
-
Create a local copy:
Clone this repository to your local machine using the following command or download the zip file.$ git clone https://github.com/aaryaattrey/Compiler-Construction.git
-
Building the compiler:
-
Make sure you are in the root directory.
-
You can build the compiler through the
makefile
provided or manually compile the code.Using make:
$ make
Alternatively, you can also compile manually:
$ gcc -o compiler driver.c
-
-
Compiling ERPLAG code:
Once all the files have been compiled correctly, you can compile an ERPLAG program by running the following:$ ./compiler <testcase.txt> <outputFile.asm>
Here, you can provide the path to your testcase (ERPLAG program file) and the name of your output asm file. For your reference, some example testcase files have been included here. By running this command, you will get to see a menu like this on the terminal:
$ Enter one option out of below: *********** 0. Exit 1. Print token list on terminal: 2. Print Parse tree on terminal: 3. Print AST on terminal: 4. Print Size of AST/Parse Tree: 5. Print Symbol Table : 6. Activation record size : 7. Print Static Dynamic Arrays : 8. Error Analysis(Lexical, Syntax, Semantic) : 9. Codegen : ******
0 - Exit: To exit the process.
1 - Prints the list of tokens on the terminal, after invoking the lexer. Lexical errors ,if any, are displayed with appropriate line numbers.
2 - If there are no lexical errors, invokes the parser and generates the parse tree on the terminal.
3 - Abstract Syntax Tree is generated from the parse tree for the source code. You can find the rules for AST conversion here.
4 - Prints the size of AST and parse tree, thereby showing the memory saved by creation of AST.
5 - Prints all the symbol tables for the given ERPLAG program, showing the scope and type information for each variable.
6 - Prints the activation record size for different scopes.
7 - Prints the list of static and dynamic arrays, with their low and high ranges, and line number of declaration.
8 - Error analysis for the source program with appropriate line numbers.
9 - Generates x86 executable for the given source program which can be run using the following command. This feature is unstable for now.To run the
.asm
file, use this command$ nasm -f elf64 <outputFile.asm> -o code.o && gcc -no-pie code.o -o code $ ./code
NOTE: You need to have the appropriate version of NASM installed for the .asm file to run. You can still analyse the intermediate compilation results without NASM installed.
<<<driver program>>>
start
declare x, y, z:integer;
declare a, b, c:integer;
a:= 5;
b:= 9;
get_value(x);
get_value(y);
z:= x + y*b +(a-b)*y+ a*2 - b*x;
print(z);
end
Input: Enter an integer value
2
Input: Enter an integer value
3
Output: 9