This project is a university project for Language and Compilers Course at Faculty of Engineering Cairo University. The main output of this project is designing a simple programming language and implementing its compiler, to achieve this goal the following points were to be done:
- Designing and implementing the language and its rules
- Designing the lexical analyzer
- Designing the parser rules
- Designing a symbol table
- Implementing actions to execute upon successful reduction of parser rules
- Detecting syntax and sematic errors
- Translating input written in our language to assembly-like quadruples
The used tools and technologies for this purpose were
-
Bison and Flex for the lexical analyzer and the parser rules
-
C language for designing the actions executed upon parser rules reduction
-
Python for the GUI
So please make sure you have all these dependencies installed on your machine to be able to run our code
Then to run our code, execute the following command from your terminal
python .\compiler_gui.py
In this phase we designed a simple language and implemented its lexical analyzer and parser rules
-
Declaring and initializing a variable
-
Notes:
- A variable name should start with a letter, then it can include other alphanumeric characters but not special characters
- The variable datatype is determined according to the first value assigned to it
- Datatypes can be either integer, float, string or boolean. This datatype cannot be changed for the variable after declaration
- Variables can be declared as constant or normal variables
-
Example:
w = 10; x = 1.5; y = "test"; z = true;
-
-
Declaring and calling methods
-
Notes:
- Defining a method is done by preceding the method name by the word
def
- Method names start with a 'dollar sign'
$
, then a letter, then it can include other alphanumeric characters but not special characters - Method arguments are named exactly like variables but with an 'at sign'
@
at the beginning - If more than one method is to be declared in the same program, the arguments of each method should have unique names. No two methods are allowed to have the same argument names. This is our design to the language
- To pass a value calculated inside a method, this is done by passing a variable to the function call and this variable will be treated as a variable that is passed by reference
- Defining a method is done by preceding the method name by the word
-
Example:
def $sum(@x, @y, @z) { @z = @x + @y; } a = 1; b = 2; totalSum = 0; $sum(a, b, totalSum);
-
-
Mathematical expressions
-
Notes:
- The available mathematical operations are the simple
+
,-
,*
,/
- The left and right hand sides of each operator should be of the same datatype, so for example if a programmer wants to use a number that might be a fraction, then floats should be used on both sides of an operator
- The available mathematical operations are the simple
-
Example:
x = 0; y = 10; z = x / y - x;
-
-
Logic expressions
-
Notes:
- The available logic expressions are
<
,>
,==
,<=
,>=
,!=
,&&
,||
- Just like the mathematical expressions types of the left and right hand sides of each operator should match
- Can be used directly inside loops and if conditions.
- Their output cannot be assigned into a variable, but an example below will show how we designed the language to achieve the same result
- The available logic expressions are
-
Example:
a = true; b = false; if(a && b) { c = true; } else { c = false; }
This is equivalent to a=true; b=false; c=a&&b; but it is a rare case so this how we designed the language to handle it
-
-
If-else statements
-
Notes:
- This is written like any other normal if else statement
- Else if can be implemented by nesting an if statement inside the else of another if
-
Example:
a = 1; b = 2; c = 0; if(a >= b) { c = a; } else { if(a < b) { c = b; } }
-
-
For loops
-
Notes:
- For loops are written as follows
for(variable assignment; logic expression; variable assignment){statements}
- There's no
i++
like statements, so this can be done simply asi=i+1
break;
can be used inside a for loop (only allowed inside loops and switch-cases)continue;
can be used inside a for loop (only allowed inside loops)
- For loops are written as follows
-
Example:
for (i = 0; i < 10; i=i+1) { b = 10; }
-
-
While loops
-
Notes:
break;
can be used inside a for loop (only allowed inside loops and switch-cases)continue;
can be used inside a for loop (only allowed inside loops)
-
Example:
while (x < 20) { x = x + 1; }
-
-
Repeat-Until loops
-
Notes:
- The code in the
repeat
block will be executed till the condition inside theuntil
is true
- The code in the
-
Example:
x=1; repeat { x = x + 1; } until (x < 20);
-
-
Switch-Case
-
Notes:
- Switch takes a variable and case checks the equality of this variable to a certain value
break;
should be used after each case- No default cases
-
Example:
x=2; switch (x) { case 1: x = 10; break; case 2: x = 12; break; }
-
For this purpose we used the flex tool
-
Importance of this stage
-
Defining the acceptable tokens for our language
-
Matching these tokens and returning them for the syntax analyzer to perform parsing
-
Return matched values such as variable names, method names, method argument names, etc. to the parser to pass them to actions executed when parser rules are reduced
-
For this purpose we used the bison tool
- Importance of this stage
- Defining the parser rules for our language
- Parsing input and detecting syntax errors
- Executing actions when parser rules are successfully reduced
The main purpose for a symbol table is to hold information about declared variables to be able to check upon the correctness of the usage of these variables after declaration. Also to have a list of all the declared variables to check if an undeclared variable is being used
- Data structure
- It is a simple array of struct pointers
- These pointers are to point to a structs containing information about declared variables such as variable name, datatype and a flag to check if it is a constant
- We keep track of the variables count with every declaration
- Data structure
- It is a simple array of struct pointers
- These pointers are to point to structs containing information about defined functions such as function name, and number of arguments
Actions are implemented using c language, and the following are the signatures of each function and its purpose
-
Functions that create structs for constant values
nodeType *conInt(int value) {/*returns a pointer to created struct for int value*/} nodeType *conFloat(float value) {/*returns a pointer to created struct for float value*/} nodeType *conStr(char *value) {/*returns a pointer to created struct for string (ie. char[]) value*/} nodeType *conBool(bool value) {/*returns a pointer to created struct for boolean value*/}
-
Function called when a variable is declared or set
nodeType *setAndDeclare(char *variableName, nodeType* rhs, bool isConst) {/*declares a new variable, gives it a type and adds it to the symbol table, or checks the correctness of a variable assignment if it was already declared*/}
-
Function called to check if a variable is already declared (utility function)
int isVariableDeclared(char *variableName) {/*Returns variable index if the variable is found in the symbol table and returns -1 otherwise*/}
-
Function called when a variable is used
nodeType *id(char *s) {/*calls isVariable declared before returning a pointer for this variable*/}
-
Function called when an operation is used
nodeType *opr(int oper, int nops, ...) {/*returns a pointer to operation struct containing the operator number of operands and these operands. This struct is then used for code generation as it contains all the required info to be able to generate code*/}
-
Function called when a method is defined
nodeType *defineMethod(char *methodName, nodeType* statements) {/*defines a new function, counts its arguements and add it to the methods symbol table, then calls opr() to generate the struct that will be used for code generation, of course all of this logic is done after checking if it wasn't already defined, so it calls isMethodDeclared()*/}
-
Function called for defining method arguments
void addOperand(char *variableName) {/*Adds method arguments to the symbol table*/}
-
Function called to check if a method is already declared (utility function)
int isMethodDeclared(char *methodleName) {/*Returns variable index if the variable is found in the symbol table and returns -1 otherwise*/}
-
Function called to generate quadruples
int ex(nodeType *p) {/*This function convers the struct generated at the opr function and pushed to the stack into an assembly-like code called quadruples*/}
When errors are detected, error messages are displayed to the programmer
-
Example of syntax error
2s = 10;
This gives a syntax error since variables are not allowed to begin with numbers
-
Semantic errors such as using undeclared variable, type mismatch, etc. are detected and errors are displayed for the programmer. For example:
x = 1; a = "test"; x = a;
This gives a semantic error since x is of type integer and a is of type string and you are trying to assign the value of a to x
As mentioned above the functionex(nodeType* p)
is used to convert code written in this language to quadruples that are assembly-like language, the following table shows the used quadruples and their meaning:
Quadruple | Meaning |
---|---|
Push | Push the value into stack |
Pop | Pop value out of the Stack |
ret | It is called to return from method declaration |
call | It is called to call a predefined function |
Lxxx | It is label added in the code to indicate where to jump to in loops and conditional statements (x: represents a placeholder for a number) |
jnz | To jump if last operation is not equals 0 |
jz | To jump if last operation equals 0 (for example, if comparison result is true) |
jmp | Unconditional jump |
Print variable | |
neg | To invert, (multiply by negative 1) |
add | Add 2 operands |
sub | Subtract 2 operands |
mul | Multiply 2 operands |
div | Divide 2 operands |
compEQ | Compare two operands and check if equal (==) |
compLT | Compare two operands and check if first operand less than l the second operand (<) |
compGT | Compare two operands and check if first operand greater than the second operand (>) |
compAND | Compare two boolean operands and check they are both true |
compOR | Check if one of 2 boolean operands is true |
compGE | Compare two operands and check if first operand greater than or equal the second operand (>=) |
compLE | Compare two operands and check if first operand less than or equal the second operand (<=) |
compNE | Compare two operands and check if first operand not equal the second operand (!=) |
The GUI consists of the following:
- Text editor
- Symbol table viewer
- Quadruples viewer
- Errors Viewer (the rectangle at the bottom, this example has no errors so nothing is displayed)
- A button
run
to compile and convert your code to quadruples - A button
import
to import a file of code from your file system - An exit button