In total the documentation burden is (at most) 1000 words plus one diagram. Assesment of the documentation is not relative to compiler functionality, it just requires a description of the compiler as-is, and a realistic assessment of the compiler's strengths and weaknesses.
Add a diagram of your AST, which is designed to usefully communicate the important properties of the AST.
Describe the structure and organisation of your AST in 200 words or fewer.
-
Feel free to refer to your diagram.
-
Try to capture the most important properties.
-
You can use code, but it is included in your budget.
Whilst constructing the AST I tried to make everything seem logical and adhering to the C89 spec. Everything inherits from Node, an abstract base class. The main offshoots as seen in the diagram are Lists, DeclarBase, Expressions, and Statements. Lists are simply wrappers around a vector<const Node *>
and are necessary when a Node can contain any number of children. DeclarBase is a base class for Declarations and Declarators, which declare things. Function inherits directly from Node, but on reflection is more of a Declaration or Declarator, since it declares a function implementation; since it is quite a unique construct it does not matter much. Expression and Statement are base classes for the number of different sub-expression and sub-statement classes that inherit from them. The Type primitive is a class which can encapsulte the various type specifiers of a declaration. I've kept the number of classes limited; initially I started with a class for each binary operator, but realised it was too difficult to manage so many classes. The AST I've ended up with is a blend of pure OOP and generic programming, with OOP principles implemented up until a certain level, and generic programming from that point on.
Give two strengths or capabilites of your AST, using 50 words or less for each one.
Metadata is extremely easy to add to each of the nodes in the tree. This is because of the vector<const Node *> getChildren()
function, which adds some of the power of generic programming to my AST. Only need to write a couple functions to populate the whole tree.
48 words
A strength of the combined OOP and generic programming methods is that the OOP helps with debugging, and knowing inside which class bugs were occurring, whilst generic programming for example with BinaryExpression
allowed me to very easily add new operators without the overhead of an entirely new class for each.
50 words
Give two limitations of your AST, using 50 words or less for each one.
Biggest limitation: my AST is too generic. I was unable to refer to specific classes and had to use const Node *s for most class constructors or arguments, which meant later on either casting or defining methods in Node was required in order to create the functionality I needed.
49 words
The class hierarchy could be better thought out. Specifically, the list classes were intended to hold specific class pointers, but ended up all containing generic node pointers, and therefore were not used at all. For example, DeclarationList should also have inherited from DeclarBase and hold vector<const Declaration *>
.
48 words
Describe your overall approach to mapping variable, parameters, etc. into registers or memory locations at exection time, using 200 words or less.
- how did you manage registers?
- did you use the stack?
- is there a function or API for managing and looking up bindings?
For most functionalities I only used registers $2 and $3, and for example in a nested addition where $3 would get overwritten, I push and pop $3 onto the stack. In functions, $3 would get overwritten by recursive function calls, so I saved it in $16 ($s0), a Saved Register. Finally, for array and pointer access, since it involves a number of additional registers in order to load the correct address, I used some of the unused Temporary Registers $8-$15 since they are not used elsewhere and there would be no risk of overwriting other important registers.
I created a class called Context
to keep track of the offset from the $fp of each variable, which is passed through to all of the print_asm
functions by value, and only returned if it is necessary to pass Context
laterally e.g. all children of a CompoundStatement share Context
. This automatically implements variable shadowing since once the scope pops out of its Context
, the previous Context
is restored. Context
contains a Var
struct which has contains the offset, type, etc. of the variable. The class has an API for managing the bindings, and helper functions to help manage how the stack is used.
200 words
Give two strengths or capabilites of your binding approach, using 50 words or less for each one.
It is extremely simple, I only keep track of offsets from the frame pointer and do not need to worry about any kind of register management since all variables will be stored on the stack and loaded from the stack when needed. Particularly useful for recursive function calls, albeit ineffecient.
50 words
It is easy to add or remove additional features to suit future needs. Additional member data can be added to Var, and new functions built into the API for Context
, without having to change anything else except where those features will be used.
43 words
Give two limitations of your binding approach, using 50 words or less for each one.
Every use of any variable involves loading them from the stack into a register, and then storing them back on the stack. This is probably the slowest and least memory efficient way of managing variable binding in terms of runtime speed and memory usage, but was very easy to implement.
50 words
The strategy of using unused temporary registers when new, more complicated functionality requires more register usage will eventually lead to registers being quite randomly 'reserved' for different uses, and eventually all used up. It would be better to plan out register usage from the beginning.
45 words
What two aspects of your compiler do you think work well (beyond those identified in the AST and binding parts)?
All the I/O can be easily customised. Support for input from filenames is already built in, as well as the ability to change the outwards destination by changing the initialization of Context
to a different ostream pointer. Hence it's just a couple of steps away from a real compiler's functionality.
50 words
The metadata built up whilst creating the tree, such as each node class having a getNodeType() function with its name, and a getDeets() function with more details allows for very useful information when debugging and pretty printing a representation of the AST that has been built.
46 words
What parts of your compiler do you think could be improved?
- This is not looking for things like "It should support more C constructs". What would you need to do or change in order to support those things?
Too much 'brute force' implementation. It would be nice to be more elegant. An example of this is the creation of excess space on the stack for most functions in order to have one-size-fits-all code generation. Could check whether function is a leaf function etc. and adapt stack allocation more.
50 words
Writing more helper functions for the different instructions or repeated actions that are used, to compartmentalise their complexity. For example, an add(ctxt,left,right,dest)
function would help alleviate the issues I was having trying to figure out how to implement the different types of add (signed, unsigned, floating point).
47 words
Which of these features does your compiler support (insert
an x
to check a box):
- Local variables
- Integer arithmetic
- While
- IfElse
- For
- Function calls
- Arrays
- Pointers
- Strings
- Structures
- Floating-point
Note that all features will be tested, regardless of what appears here. This is documentation of what you expect to work, versus what doesn't work.
What aspects of your compiler would you like feedback on. Be specific, as "what could I have done differently" is too general to answer.
How can I balance incurring too much hacky implementation vs. not implementing anything in fear of not being future proof?
20 words or fewer
Is it bad practice to have protected
class data? Is there a workaround to get the advantages without breaking encapsulation?
20 words or fewer