Consider the OBJEling class defined by the following code
class point {
attribute int x;
attribute int y;
method void constructor(int a, int b) {
this.x = a;
this.y = b;
}
method int sum(int z) {
return z + this.x + this.y;
}
}
function void main() {
var point p;
var int c;
p = new point(2, 1);
c = p.sum(48);
putchar(c);
}
An instance of the class point
is represented by a pointer to a heap block with three fields:
- a pointer to the class descriptor of
point
(offset 0) - the value of
x
(offset 4) - the value of
y
(offset 8)
address offset | data |
---|---|
0 | descriptor |
4 | x |
8 | y |
The class descriptor is defined by a heap block with three fields:
- a
null
pointer, since the class has no parent (offset 0) - a code pointer to a function
point_constructor
(offset 4) - a code pointer to a function
point_sum
(offset 8)
address offset | data |
---|---|
0 | null |
4 | constructor_ptr |
8 | sum_ptr |
Note that each method is mapped to a function whose name combines the
name of the class and the name of the method (hence point_sum
).
We also define a global variable point_descr
containing a pointer
to the class descriptor of the class point
.
Consider the OBJEling instruction
p.x = 3*p.x;
The fragment p.x
is an access to a field.
objlang AST for this fragment is written as follows in caml (not showing type
annotations):
Atr(Var "p", "x")
This fragment denotes a memory location. It can be used for reading from or
writing to memory, using expression constructor Read
or instruction
constructor Write
. objlang AST for the instruction
p.x = 3 * p.x;
is then (omitting types):
let px = Atr(Var "p", "x") in
Write(
px,
Binop(
Mul,
Cst 3,
Read px
)
)
The address of the memory location denoted by p.x
can be obtained by
adding to the pointer p
an offset corresponding to the attribute x
,
that is 4.
The fragment is translated into IMP as (concrete IMP syntax):
p + 4
or (abstract IMP syntax)
Binop(Add, Var "p", Cst 4)
abbreviated ma' below
For the remainder of the instruction: the Read
access is translated using
a Deref
expression, and the Write
access is translated using a Write
instruction.
Thus, objlang instruction
p.x = 3 * p.x
translates to IMP (concrete IMP syntax):
*(p+4) = 3 * (*(p+4));
or (abstract IMP syntax)
Write(ma', Binop(Mul, Cst 3, Unop(Read, ma')))
Consider the objlang expression
p.sum(48)
objlang AST is (omitting types):
MCall(Var "p", "sum", [Cst 48])
It comprises:
- an access to the objet
p
; - a method name;
- a list of 1 argument;
The method call translates to IMP with a dynamic function call (function called
through a pointer rather than a name), with the two parameters p
and Cst 48
.
If we write f
the function pointer, this gives:
*f(p, 48) // concrete IMP syntax
or
DCall(f, [Var "p", Cst 48]) // abstract IMP syntax
Now, focus on this f
pointer: the code pointer of sum
is stored in the
class descriptor of point
, with an offset of 8 bytes. We get the pointer
this way:
- Address of the object:
p
- Contents of the first field (header):
*p
, this contents is the address of the class descriptor - Address where
sum
is stored:*p+8
- The address of
sum
itself:*(*p+8)
(what we are looking for)
In IMP abstract syntax:
Deref(Binop(Add, Deref(Var "p"), Cst 8))
Full methode call in IMP:
*(*p+8)(p, 48) // concrete IMP syntax
or (abstract IMP syntax)
DCall(
Deref(
Binop(
Add,
Deref(Var "p"),
Cst 8
)
),
[Var "p", Cst 48]
)
Consider the objlang instruction
p = new point(2, 1);
In objlang abstract syntax (omitting types), we provide the class name and a list of arguments:
Set("p", New("point", [Cst 2; Cst 1]))
Object creation is in three steps:
- first allocating the block;
- initializing its header;
- finally calling the
constructor
method on the created object.
Using the variable p
to store the created object, we would have the following three IMP instructions:
p = alloc(12);
*p = point_descr;
point_constructor(p, 2, 1);
In IMP abstract syntax, ommiting types:
Seq [
Set("p", Alloc(Cst 12));
Write(Var "p", Var "point_descr");
Expr(
Call(
"point_constructor",
[Var "p"; Cst 2; Cst 1]
)
)
]
The class descriptor is created before the execution of the actual main code of the program. It is not described by any specific objlang instruction: it is just triggerer by the existence of a class.
Creating the class descriptor requires creating a block, initializing the
global variable point_descr
with the address of this block, and storing
in the second and third fields the code pointers for the methods. We also store 0 in the first field to indicate the absence of a parent class.
In IMP syntax, we would have:
point_descr = alloc(12);
*point_descr = 0;
*(point_descr+4) = &point_constructor;
*(point_descr+8) = &point_sum;
In IMP abstract syntax (omitting types):
[
Set(Var "point_descr", Alloc(Cst 12));
Write(Var "point_descr", Cst 0);
Write(
Binop(Add, Var "point_descr", Cst 4),
Addr "point_constructor"
);
Write(
Binop(Add, Var "point_descr", Cst 8),
Addr "point_sum"
)
]
The proposed translation requires extending IMP abstract syntax with 3 new elements.
type expression =
| Addr of string
| DCall of expression * expression list
| ...
type instruction =
| Seq of sequence
| ...
Addr of string
It applies to the name of a statically allocated element (e.g. the name of a function), and returns its address.
DCall of expression * expression list
It represents a "dynamic" call, that is a call made using a function pointer.
The first argument is an expression that computes this pointer, and the list contains the arguments (as it is the case for Call
).
Seq of sequence
This constructor is convenient when a single objlang instruction translates to (a sequence of) more than one Imp instructions.
Do not forget to add these in imp.ml
, and to extends imp2mips.ml
and imppp.ml
accordingly. This folder however already contains the updated abstract syntax and parser for objlang.
Your first goal is to compile a version of objlang that obeys to the following restrictions:
- no inheritance
- method calls, functions calls, and object creation only allowed as "top-level" expressions The latter means that
p = new point(2, 1);
s = p.sum(48);
are allowed, but that
x = (new point(2, 1)).x;
y = 1 + p.sum(48);
are not since the "new" and the call are subexpressions of the main expression.
Note that this restricted version is enough for dealing with the class point
.
Without inheritance, each class is defined only by the attributes and methods explicitely declared in its definition. Then:
-
the number of fields of the memory block that represents an object
obj
of some classC
can be computed straightforwardly as(1 + number)
of attributs in the definition ofC
(and the same for the class descriptor with respect to the number of methods) -
the offset for accessing an attribute
x
is computed using the position ofx
in the list that enumerates the attributes of the class (and the same for the offset of a method in a class descriptor) do not forget to skip the header when computing the offset. -
a class descriptor contains only the methods explicitely defined in the corresponding class.
With complex expressions such as calls restricted to toplevel, they can be treated by the function translating instructions, thus keeping the nice types
val tr_instr: Objlang.instruction -> Imp.instruction
val tr_expr: Objlang.expression -> Imp.expression
Thus you can ignore the cases New
, Call
and MCall
in tr_expr
and consider cases such as Set(x, New(...))
in tr_instr
instead.
Remark that you can implement this base version using the full AST: just
ignore the field parent
of class definitions, and fail if a call or a new appears at the wrong place.
Note on typing: the typing module objlangtyper
is similar to the one built for SIMP, with one novelty: whenever typing a method, the environment should contain a local variable _this
, whose type is the current class. You can add it to the local environment, or make its type an explicit additional parameter of the typing functions. To keep the same code for typing methods
where _this
exists and functions where _this
does not exist, you may use a dummy type in the latter case.
If you have a working version of this base compiler, you can consider a few extensions:
- A. inheritance
- B. preprocessing calls
- C. advanced typing (relies partly on A)
Note: this is a list of suggestions. It is not intended to be restrictive. The suggested extensions may vary in difficulty.
The goal of this extension is to allow the use of extends
in a class definition of an objlang program, for defining a child class that inherits
attributes and methods from a parent class. The child class may also:
- add new attributes
- add new methods
- overwrite methods of the parent class
If you consider implementing this extension, I suggest you proceed again following an incremental approach
- add new attributes
- add new method
- and overwrite method
Novelty with respect to the base version: one has to take into account the number of attributes of the parent class for deducing the total number of attributes of the child class and their offsets.
address offset | data |
---|---|
0 | descriptor |
4 | inherited attributes |
? | new attributes |
However, since there is no new method, you could imagine sharing the class descriptor with the parent class.
Note that in practice, you probably would like to add a dedicated constructor.
However, in simple cases this particular method is only called statically and thus is never used through the descriptor, you can indeed at first avoid the creation of a new descriptor just for that.
As in the previous extension, one has to take into account the number of methods of the parent class to compute the right sizes and offsets.
Do not forget to also build a new descriptor that contains the methods of both the parent and the child class.
When a child class C'
redefines a method m
already defined by its base class C
, the new definition C'_m
should override the old definition C_m
in the class descriptor of C'
. Thus:
- method
C'_m
should have the same offset asC_m
- the class descriptor for
C'
should contain a pointer to the right version ofm
.
When methods can be overriden, the keyword super
can become useful for accessing the parent version of some method.
Restricting calls and object creations to toplevel expressions makes the
translation cleaner, but restricts the allowed input programs. However, it is
possible however to enable again the full syntax of expressions, while keeping
the current state of objlang2imp
. For this, you can add a preprocessing pass
applied between objlangtyper
and objlang2imp
, which takes as input an
arbitrary objlang (typed) AST, and returns an equivalent program where calls
and object creation are at toplevel.
For this, it is enough to translate each call that is not already at toplevel by an assignment to some new variable, and then refer to the variable. For instance,
x = p.sum(24) + (new point(2, 1)).sum(12);
could be "flattened" as
a = p.sum(24);
b = new point(2, 1);
c = b.sum(12);
x = a+c;
This translation decomposes complex expressions, at the cost of introducing a series of short-lived additional variables.
However, with a proper register allocation this additional cost will reduce to zero.
Here are a few suggestions of extensions related to typing. The first ones can be implemented on the base version, some others make sense only when inheritance is already present.
Produce a helpful message when the typer uncovers a typing error.
For instance: in case of an attempt to access an attribute that does not exist in the considered class, produce a message that names the class and the missing attribute
Add an expression obj instanceof C
, that tests whether the object obj
is from a class C'
that is a subclass of C
(or C
itself)
Allow casting an object obj
of static type C
to another static type C'
.
Reminder: casting toward a parent class is always possible, casting toward a child class requires a runtime check, and casting toward an unrelated class is not allowed
Allow static overloading of methods. For each method, build a new name based on the method name and the types of its argument.
For each call, use the types computed for the arguments to select the right method.
Warning: static overloading + inheritance and implicit up-cast produce a few corner cases where a call might become ambiguous; you may ignore this at first.
Allow the definition of abstract classes, which has abstract methods.
- one cannot create an instance of such a class with
new
. - Child classes should eventually provide definitions for all abstract methods
- the offset of each method is fixed by the abstract class that first introduces it.
- the offset is the same in any child class that defines the method