ERPLAG Compiler

This repository contains the toy-compiler created as part of the course CS F363: Compiler Construction at BITS Pilani, Pilani Campus in Spring 2023 under Dr. Vandana Agarwal.

Features supported by the language

Click here for complete language specifications.

The language ERPLAG is a strongly typed language with primitive data types as integer and floating point.
It also supports two other data types: boolean and arrays.
The language supports arithmetic and boolean expressions.
The language supports assignment statements, input/output statements, declarative, conditional, iterative, and function call statements.
The language supports the modular implementation of the functionalities.
Functions can return multiple values. The function may or may not return a value as well.
The scope of the variables is static and the variable is visible only in the block where it is declared.

Dependencies

Ensure you have gcc, version 5.0 or greater, installed. This is essential to build the source code locally.

gcc version can be checked via gcc --version command on terminal.

Ensure you have nasm, version 2.14.02 or greater, installed. NASM is needed for code generation, without NASM the compiler would not be able to execute the final x86 assembly code generated by our compiler . You can still analyse the lexical, syntax or semantic errors in your source code, as these layers do not require nasm.

nasm version can be checked via nasm --version command on terminal. You can install nasm via sudo apt install nasm

Modules

Lexer :
- Lexer reads the source code in ERPLAG as a stream of charaters and converts sequence of characters to meaningful tokens.
- Twin buffers are used to store the stream of characters to reduce IO operations and improve efficiency.
- The tokens are passed to the parser and contains information about the line of occurance, name of the token, token id etc.
Parser :
- Parser takes in the sequence of tokens and verifies the syntactical correctness of the source code by constructing the parse tree.
- The Parser also has error recovery mechanism to parse the complete sequence of tokens even if a syntactical error exists.
AST Construction :
- This module takes in the parse tree and deletes all the irrelevant tokens like keywords, changes the structure of and removes empty non-terminals.
- The goal of this module is to reduce the size of the parse tree (preserve only essential parts of the tree), so as to allow type checking and semantic analysis to be more efficient.
Symbol Table and Scoping :
- This module constructs the symbol-table for the corresponding AST. The symbol table has a tree structure that depicts the scoping of the blocks of code.
- Each node of the tree has a linked list of its locally defined variables. It also reports the errors related to redundant declaration, no declaration of variables and modules.
Type Checker and Semantic Analyser :
- This module performs semantic checks on the AST for verifiying semantic correctness of the source code.
- It also determines the correctness of the arithmetic and boolean expressions.
Intermediate Code Generation :
- Generates the intermediate 3 address codes, so as to enable easy mapping of variables and temporaries to registers in the codegen phase.
Code Generation : Generates the NASM-assembly code of the source code.

Instructions to run

Create a local copy:
Clone this repository to your local machine using the following command or download the zip file.
```
$ git clone https://github.com/aaryaattrey/Compiler-Construction.git
```
Building the compiler:
- Make sure you are in the root directory.
- You can build the compiler through the makefile provided or manually compile the code.
  
  Using make:
```
$ make
```
  Alternatively, you can also compile manually:
```
$ gcc -o compiler driver.c
```
Compiling ERPLAG code:
Once all the files have been compiled correctly, you can compile an ERPLAG program by running the following:
```
$ ./compiler <testcase.txt> <outputFile.asm>
```
Here, you can provide the path to your testcase (ERPLAG program file) and the name of your output asm file. For your reference, some example testcase files have been included here. By running this command, you will get to see a menu like this on the terminal:
```
 $ Enter one option out of below: 
     ***********
     0. Exit
     1. Print token list on terminal:
     2. Print Parse tree on terminal:
     3. Print AST on terminal:
     4. Print Size of AST/Parse Tree:
     5. Print Symbol Table :
     6. Activation record size :
     7. Print Static Dynamic Arrays :
     8. Error Analysis(Lexical, Syntax, Semantic) :
     9. Codegen :
    ******
```
What each option means:

0 - Exit: To exit the process.
1 - Prints the list of tokens on the terminal, after invoking the lexer. Lexical errors ,if any, are displayed with appropriate line numbers.
2 - If there are no lexical errors, invokes the parser and generates the parse tree on the terminal.
3 - Abstract Syntax Tree is generated from the parse tree for the source code. You can find the rules for AST conversion here.
4 - Prints the size of AST and parse tree, thereby showing the memory saved by creation of AST.
5 - Prints all the symbol tables for the given ERPLAG program, showing the scope and type information for each variable.
6 - Prints the activation record size for different scopes.
7 - Prints the list of static and dynamic arrays, with their low and high ranges, and line number of declaration.
8 - Error analysis for the source program with appropriate line numbers.
9 - Generates x86 executable for the given source program which can be run using the following command. This feature is unstable for now.

To run the .asm file, use this command
```
$ nasm -f elf64 <outputFile.asm> -o code.o && gcc -no-pie code.o -o code
$ ./code
```
NOTE: You need to have the appropriate version of NASM installed for the .asm file to run. You can still analyse the intermediate compilation results without NASM installed.

Sample `ERPLAG` Code

<<<driver program>>>
start
	declare x, y, z:integer;
	declare a, b, c:integer;
	a:= 5;
	b:= 9;
	get_value(x);
	get_value(y);
	z:= x + y*b +(a-b)*y+ a*2 - b*x;	
        print(z);
end

Sample Output

Input: Enter an integer value 
2
Input: Enter an integer value 
3
Output: 9

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
testcases		testcases
AST_SEMANTIC_RULES.pdf		AST_SEMANTIC_RULES.pdf
IRCodeGen.c		IRCodeGen.c
IRCodeGen.h		IRCodeGen.h
IRCodeGenDef.h		IRCodeGenDef.h
README.md		README.md
ast.c		ast.c
ast.h		ast.h
astDef.h		astDef.h
codegen.c		codegen.c
driver.c		driver.c
grammar.txt		grammar.txt
language-features.pdf		language-features.pdf
lexer.c		lexer.c
lexer.h		lexer.h
lexerDef.h		lexerDef.h
makefile		makefile
parser.c		parser.c
parser.h		parser.h
parserDef.h		parserDef.h
semanticAnalysis.c		semanticAnalysis.c
stackAdt.c		stackAdt.c
stackAdt.h		stackAdt.h
stackAdtDef.h		stackAdtDef.h
symbolTable.c		symbolTable.c
symbolTable.h		symbolTable.h
tree.c		tree.c
tree.h		tree.h
treeDef.h		treeDef.h

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ERPLAG Compiler

Features supported by the language

Dependencies

Modules

Instructions to run

What each option means:

Sample `ERPLAG` Code

Sample Output

About

Releases

Packages

Languages

aaryaattrey/Compiler-Construction

Folders and files

Latest commit

History

Repository files navigation

ERPLAG Compiler

Features supported by the language

Dependencies

Modules

Instructions to run

What each option means:

Sample ERPLAG Code

Sample Output

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Sample `ERPLAG` Code

Packages