This is a simple compiler written for the University of Helsinki Compilers course. The compiler is written in Java 23.
A simple program written in this language reads an integer and prints its Collatz sequence:
    var n: Int = read_int();
    print_int(n);
    while n > 1 do {
        if n % 2 == 0 then {
            n = n / 2;
        } else {
            n = 3*n + 1;
        }
        print_int(n);
    }
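When testing a compiled version of this program, it helps to know the expected output in advance. The Java class below is a hypothetical reference implementation (not part of the compiler) that records every value the program would pass to print_int. It uses long because the width of the language's Int type is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference implementation of the example program above,
// used only to predict what the compiled program should print.
class Collatz {
    // Returns every value the example program passes to print_int, in order.
    static List<Long> sequence(long n) {
        List<Long> printed = new ArrayList<>();
        printed.add(n);                 // print_int(n) before the loop
        while (n > 1) {
            if (n % 2 == 0) {
                n = n / 2;
            } else {
                n = 3 * n + 1;
            }
            printed.add(n);             // print_int(n) at the end of each iteration
        }
        return printed;
    }

    public static void main(String[] args) {
        System.out.println(sequence(6)); // [6, 3, 10, 5, 16, 8, 4, 2, 1]
    }
}
```

Feeding 6 to the compiled program should therefore print 6, 3, 10, 5, 16, 8, 4, 2 and 1, one value per print_int call.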
Note: The generated assembly code targets the x86-64 family of processors.

The following tools are required:
- Java 23
- as (GNU assembler)
- cc (C compiler)
- ld (linker on Unix-like OSes)
Navigate to the project directory and run ./gradlew clean build, which generates the compiler as a JAR file in build/libs/.
At the moment, the compiler can be used either as a one-time compilation tool or as a language server.

To compile a single file:

    java -jar uh-compiler-0.0.1.jar compile --output=path/to/output.out path/to/input.in

To start the server:

    java -jar uh-compiler-0.0.1.jar serve --host=localhost --port=3000
- The server responds to TCP requests of the form {"command": "ping"} or {"command": "compile", "code": "print_int(2);"}. The response is either empty or {"program": "base64-encoded statically linked x86-64 program"}.
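A minimal Java client for this protocol might look like the sketch below. The framing is an assumption, not something stated above: it supposes the server handles one JSON request per connection and closes the socket after replying, so the client half-closes its side and reads until end of stream. The class CompileClient and its methods are hypothetical, and JSON is built and parsed by hand only to keep the example dependency-free.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client for the compile server. Assumed framing: one request per
// connection, and the server closes the socket after sending its reply.
class CompileClient {
    // Escapes a string so it can be embedded in a JSON string literal.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // Builds a {"command":"compile","code":...} request.
    static String compileRequest(String code) {
        return "{\"command\":\"compile\",\"code\":" + jsonString(code) + "}";
    }

    // Extracts and decodes the "program" field (assumed to be plain base64
    // without line breaks); returns null when the field is absent, e.g. empty reply.
    static byte[] decodeProgram(String response) {
        Matcher m = Pattern.compile("\"program\"\\s*:\\s*\"([A-Za-z0-9+/=]*)\"").matcher(response);
        return m.find() ? Base64.getDecoder().decode(m.group(1)) : null;
    }

    // Sends one request and reads the reply until the server closes the connection.
    static String send(String host, int port, String json) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            socket.setSoTimeout(10_000); // fail instead of hanging if the framing assumption is wrong
            OutputStream out = socket.getOutputStream();
            out.write((json + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
            socket.shutdownOutput();     // signal the end of the request
            InputStream in = socket.getInputStream();
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String reply = send("localhost", 3000, compileRequest("print_int(2);"));
        byte[] program = decodeProgram(reply);
        System.out.println(program == null ? "empty response" : program.length + " bytes");
    }
}
```

The decoded bytes would be the statically linked executable, which could then be written to a file and marked executable.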
The compiler can also be run as a language server in Docker. Build the image with

    docker build -t yourname/image_name:version .

and start the container with

    docker run -p 3000:3000 yourname/image_name:version

The container responds to TCP requests as described above.
Note: The language specification below is copied from the course page.
An expression is defined recursively as follows, where E, E1, E2, ..., En represent some other arbitrary expressions.
- Integer literal: a positive whole number.
  - Negative numbers should be composed of the token - followed by an integer literal token.
- Boolean literal: either true or false.
- Identifier: a word consisting of letters, underscores or digits, but the first character must not be a digit.
- Unary operator: either -E or not E.
- Binary operator: E1 op E2 where op is one of the following: +, -, *, /, %, ==, !=, <, <=, >, >=, and, or, =.
  - Operator = is right-associative.
  - All other operators are left-associative.
  - Precedences are defined below.
- Parentheses: (E), used to override precedence.
- Block: { E1; E2; ...; En } or { E1; E2; ...; En; } (may be empty, last semicolon optional).
  - Semicolons after subexpressions that end in } are optional.
- Untyped variable declaration: var ID = E where ID is an identifier.
- Typed variable declaration: var ID: T = E where ID is an identifier and T is Int, Bool or Unit.
- If-then conditional: if E1 then E2
- If-then-else conditional: if E1 then E2 else E3
- While-loop: while E1 do E2
- Function call: ID(E1, E2, ..., En) where ID is an identifier
Variable declarations (var ...) are allowed only directly inside blocks ({ ... }) and in top-level expressions.
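One way to represent this recursive definition in Java is a sealed interface with one record per expression form. The sketch below is illustrative only: the type and field names (Expr, IntLit, Binary, AstPrinter and so on) are hypothetical and not taken from the compiler's source. The small printer shows how each grammar rule maps onto a record.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical AST for the expression grammar above; one record per expression form.
sealed interface Expr {}

record IntLit(long value) implements Expr {}
record BoolLit(boolean value) implements Expr {}
record Ident(String name) implements Expr {}
record Unary(String op, Expr operand) implements Expr {}                 // op is "-" or "not"
record Binary(Expr left, String op, Expr right) implements Expr {}
record Block(List<Expr> exprs, boolean endsWithSemicolon) implements Expr {} // true if the last expr had ';'
record VarDecl(String name, String type, Expr init) implements Expr {}   // type is null when untyped
record If(Expr cond, Expr then, Expr otherwise) implements Expr {}       // otherwise is null for if-then
record While(Expr cond, Expr body) implements Expr {}
record Call(String function, List<Expr> args) implements Expr {}

final class AstPrinter {
    // Renders an expression back to source-like text, fully parenthesizing operators.
    static String print(Expr e) {
        if (e instanceof IntLit i) return Long.toString(i.value());
        if (e instanceof BoolLit bl) return Boolean.toString(bl.value());
        if (e instanceof Ident id) return id.name();
        if (e instanceof Unary u) return "(" + u.op() + " " + print(u.operand()) + ")";
        if (e instanceof Binary bin) return "(" + print(bin.left()) + " " + bin.op() + " " + print(bin.right()) + ")";
        if (e instanceof Block blk) {
            if (blk.exprs().isEmpty()) return "{}";
            String body = blk.exprs().stream().map(AstPrinter::print).collect(Collectors.joining("; "));
            return "{ " + body + (blk.endsWithSemicolon() ? "; }" : " }");
        }
        if (e instanceof VarDecl v) {
            return "var " + v.name() + (v.type() == null ? "" : ": " + v.type()) + " = " + print(v.init());
        }
        if (e instanceof If f) {
            return "if " + print(f.cond()) + " then " + print(f.then())
                + (f.otherwise() == null ? "" : " else " + print(f.otherwise()));
        }
        if (e instanceof While w) return "while " + print(w.cond()) + " do " + print(w.body());
        if (e instanceof Call c) {
            return c.function() + "(" + c.args().stream().map(AstPrinter::print).collect(Collectors.joining(", ")) + ")";
        }
        throw new IllegalArgumentException("unknown node: " + e);
    }
}
```

A sealed hierarchy lets later stages (type checker, IR generator) handle every node kind explicitly, and records give structural equality, which is convenient when comparing parser output against expected trees in tests.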
Precedences, from lowest to highest:
- =
- or
- and
- ==, !=
- <, <=, >, >=
- +, -
- *, /, %
- Unary - and not
- All other constructs: literals, identifiers, if, while, var, blocks, parentheses, function calls.
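These precedence levels and associativity rules are exactly what a precedence-climbing parser encodes. The sketch below is not the compiler's parser: the class PrecedenceDemo is hypothetical, it only handles binary operators, unary - and not, parentheses, and single-token operands, it works on an already tokenized list of strings, and it returns a fully parenthesized string so the resulting grouping is visible.

```java
import java.util.List;
import java.util.Map;

// Hypothetical precedence-climbing parser for the operator subset of the grammar.
// Higher number = binds tighter; the table mirrors the precedence list above.
final class PrecedenceDemo {
    private static final Map<String, Integer> PRECEDENCE = Map.ofEntries(
        Map.entry("=", 1),
        Map.entry("or", 2),
        Map.entry("and", 3),
        Map.entry("==", 4), Map.entry("!=", 4),
        Map.entry("<", 5), Map.entry("<=", 5), Map.entry(">", 5), Map.entry(">=", 5),
        Map.entry("+", 6), Map.entry("-", 6),
        Map.entry("*", 7), Map.entry("/", 7), Map.entry("%", 7));

    private final List<String> tokens;
    private int pos = 0;

    private PrecedenceDemo(List<String> tokens) { this.tokens = tokens; }

    static String parse(List<String> tokens) {
        PrecedenceDemo p = new PrecedenceDemo(tokens);
        String result = p.parseExpr(1);
        if (p.pos != tokens.size()) throw new IllegalStateException("unexpected token: " + tokens.get(p.pos));
        return result;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    // Parses a chain of binary operators whose precedence is at least minPrec.
    private String parseExpr(int minPrec) {
        String left = parseUnary();
        while (true) {
            String op = peek();
            Integer prec = op == null ? null : PRECEDENCE.get(op);
            if (prec == null || prec < minPrec) return left;
            pos++;
            // Left-associative operators require the right operand to bind strictly tighter;
            // "=" is right-associative, so its right operand may contain another "=".
            int nextMin = op.equals("=") ? prec : prec + 1;
            String right = parseExpr(nextMin);
            left = "(" + left + " " + op + " " + right + ")";
        }
    }

    // Unary "-" and "not" bind tighter than every binary operator.
    private String parseUnary() {
        String t = peek();
        if ("-".equals(t) || "not".equals(t)) {
            pos++;
            return "(" + t + " " + parseUnary() + ")";
        }
        return parsePrimary();
    }

    // Parenthesized expression or a single operand token (literal or identifier).
    private String parsePrimary() {
        String t = peek();
        if (t == null) throw new IllegalStateException("unexpected end of input");
        if (t.equals(")") || PRECEDENCE.containsKey(t)) throw new IllegalStateException("unexpected token: " + t);
        pos++;
        if (t.equals("(")) {
            String inner = parseExpr(1);
            if (!")".equals(peek())) throw new IllegalStateException("expected )");
            pos++;
            return inner;
        }
        return t;
    }
}
```

For example, a - b - c groups as ((a - b) - c) while a = b = c groups as (a = (b = c)). In a full parser, parsePrimary would also dispatch to if, while, var, blocks and function calls.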
The program consists of a single top-level expression. If the program text has multiple expressions separated by semicolons, they are treated like the contents of a block, and that block becomes the top-level expression. The last expression may be optionally followed by a semicolon.
Arbitrary amounts of whitespace are allowed between tokens. One-line comments starting with # or // are supported.
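The lexical rules above (identifiers, unsigned integer literals, operators, punctuation, whitespace, and # or // comments) fit into a single regular expression. The sketch below uses a hypothetical class TokenizerDemo and is not the compiler's tokenizer: it returns plain strings, does not track source locations, and treats keywords such as var and while as ordinary identifier-shaped tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based tokenizer for the language described above.
final class TokenizerDemo {
    // Alternatives are tried left to right: comments come before the "/" operator,
    // and two-character operators come before their one-character prefixes.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<skip>\\s+|#[^\\n]*|//[^\\n]*)"        // whitespace and one-line comments
        + "|(?<ident>[A-Za-z_][A-Za-z0-9_]*)"     // identifiers and keywords
        + "|(?<int>[0-9]+)"                       // integer literals (no sign)
        + "|(?<op>==|!=|<=|>=|[-+*/%=<>])"        // operators
        + "|(?<punct>[(){},;:])");                // punctuation

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        int pos = 0;
        while (pos < source.length()) {
            // region + lookingAt anchors the match at the current position
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at " + pos + ": " + source.charAt(pos));
            }
            if (m.group("skip") == null) tokens.add(m.group()); // drop whitespace and comments
            pos = m.end();
        }
        return tokens;
    }
}
```

Because every alternative consumes at least one character, the loop always makes progress; a negative literal such as -5 comes out as the two tokens - and 5, matching the rule that negative numbers are composed from a - token and an integer literal.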
The main implementation stages of the compiler and their progress:
- Tokenizer
  - Basic tokenization
  - Basic test cases
  - Edge test cases
  - Negative test cases
- Parser
  - Integer literal
  - Identifiers
  - Boolean literal
  - If then else blocks
  - Comparison operators (=, ==, !=, <=, >=, >, <, and, or)
  - While block
  - Function call
  - Type declaration
  - Break, Continue
  - Function support
- Interpreter (optional; implemented as a learning exercise)
  - Basic recursion
  - Symbol table
  - All operators
  - Function call
  - Conditional block
  - While block
  - Break, Continue
  - Function support
- Type Checker
  - Positive test cases
  - Negative test cases
- IR Generator
  - Integer literal
  - Identifiers
  - Boolean literal
  - If then else blocks
  - Comparison operators (=, ==, !=, <=, >=, >, <, and, or)
  - While block
  - Function call
  - Type declaration
  - Break, Continue
  - Function support
- Assembly Generator
  - Operators
  - Assemble
  - Generate native executable
- Analysis & Optimization
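Both the interpreter's symbol table and the type checker listed above have to resolve identifiers through nested blocks. A common approach, sketched here under assumptions, is a chain of scopes in which each block gets a child table that falls back to its parent on lookup. The class SymTab and its methods are hypothetical, not this compiler's API, and rejecting a second declaration of the same name in one scope is a choice of this sketch, not a rule stated in the specification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scoped symbol table. Each block creates a child scope; lookups walk
// outward through parent scopes. V can be a runtime value (interpreter) or a type
// (type checker).
final class SymTab<V> {
    private final Map<String, V> locals = new HashMap<>();
    private final SymTab<V> parent;

    SymTab(SymTab<V> parent) { this.parent = parent; }

    // "var x = ..." declares in the current scope, shadowing any outer x.
    // This sketch rejects redeclaring a name within the same scope.
    void declare(String name, V value) {
        if (locals.containsKey(name)) {
            throw new IllegalStateException("'" + name + "' already declared in this scope");
        }
        locals.put(name, value);
    }

    // Returns the value from the nearest enclosing scope that defines the name.
    V lookup(String name) {
        for (SymTab<V> s = this; s != null; s = s.parent) {
            if (s.locals.containsKey(name)) return s.locals.get(name);
        }
        throw new IllegalStateException("undefined variable '" + name + "'");
    }

    // "x = ..." updates the nearest enclosing definition instead of creating a new one.
    void assign(String name, V value) {
        for (SymTab<V> s = this; s != null; s = s.parent) {
            if (s.locals.containsKey(name)) {
                s.locals.put(name, value);
                return;
            }
        }
        throw new IllegalStateException("undefined variable '" + name + "'");
    }
}
```

Entering a block creates new SymTab<>(current) and leaving it simply drops that child, so inner declarations disappear while assignments to outer variables persist.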