About
The LC-3 ISA was chosen because of the wide availability of docs
and resources in the open source community, together with the relative simplicity
of the overall architecture. Both properties made it ideal for the author's
purpose: gaining a deeper understanding of how a virtual machine is actually
built on the host system, in all its flavours, from a basic interpreter such as
the Java Virtual Machine to more complex ones such as a QEMU/KVM instance.
In the project repository, you can find a .pdf document describing the whole
LC-3 ISA. It is stored inside the resources folder.
Here is a sample description of an ADD operation in the LC-3 ISA, in both Register mode and Immediate mode:
Register mode (Mode bit 0):
15 Dest Src1 Mode Src2 0
|-------------------------------------------|
| 0 0 0 1 | D D D | A A A | 0 | 0 0 | B B B |
|-------------------------------------------|
D D D = 3-bit Destination Register
A A A = 3-bit Source 1 Register
B B B = 3-bit Source 2 Register
Immediate mode (Mode bit 1):
15 Dest Src1 Mode Immediate 0
|-------------------------------------------|
| 0 0 0 1 | D D D | A A A | 1 | I I I I I |
|-------------------------------------------|
D D D = 3-bit Destination Register
A A A = 3-bit Source 1 Register
I I I I I = 5-bit Immediate Value Two's Complement Integer
NOTE: The immediate value must be sign extended
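The sign extension mentioned in the note can be sketched as follows. The helper name `sign_extend` and its signature are assumptions made for illustration, not necessarily what the project uses.

```c
#include <stdint.h>

/* Hypothetical helper (name and signature are assumptions):
 * widens the low `bit_count` bits of `value` (1 <= bit_count <= 15)
 * to a 16-bit two's complement value. */
uint16_t sign_extend(uint16_t value, unsigned bit_count)
{
    /* Keep only the bits that belong to the field. */
    value &= (uint16_t)((1u << bit_count) - 1u);

    /* If the top bit of the field is set, the number is negative:
     * fill every bit above the field with ones. */
    if ((value >> (bit_count - 1u)) & 1u) {
        value |= (uint16_t)(0xFFFFu << bit_count);
    }
    return value;
}
```

For the 5-bit immediate of ADD, `sign_extend(imm5, 5)` maps the bit pattern 11011 to 0xFFFB, which is -5 when read as a 16-bit two's complement integer.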
Every instruction is encoded as a "uint16_t", since that type is guaranteed to be exactly 16 bits wide, regardless of the host machine on which the VM runs. Similarly, the instruction fields are parsed with bit-shifting operators, and the result of each shift is stored in a "uint8_t" field, which is the smallest data type guaranteed to represent a byte-long unsigned integer value. The 3-bit REGISTER, the 4-bit OPCODE and the 5-bit "immediate value two's complement integer" are all zero-extended when stored in the wider uint8_t type, which is why the immediate value still has to be sign-extended explicitly before it is used.
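As a minimal sketch of this parsing step, the ADD fields shown in the diagrams can be extracted with shifts and masks. The struct `add_fields` and the function `decode_add` are hypothetical names introduced here for illustration.

```c
#include <stdint.h>

/* Hypothetical container for the fields of an ADD instruction
 * (the type and field names are assumptions). */
typedef struct {
    uint8_t opcode; /* bits 15..12 */
    uint8_t dest;   /* bits 11..9  */
    uint8_t src1;   /* bits 8..6   */
    uint8_t mode;   /* bit 5: 0 = register, 1 = immediate */
    uint8_t src2;   /* bits 2..0, meaningful in register mode */
    uint8_t imm5;   /* bits 4..0, meaningful in immediate mode,
                       zero-extended here, NOT yet sign-extended */
} add_fields;

/* Shift each field down to bit 0, then mask off the bits above it. */
add_fields decode_add(uint16_t instr)
{
    add_fields f;
    f.opcode = (uint8_t)((instr >> 12) & 0xF);
    f.dest   = (uint8_t)((instr >> 9) & 0x7);
    f.src1   = (uint8_t)((instr >> 6) & 0x7);
    f.mode   = (uint8_t)((instr >> 5) & 0x1);
    f.src2   = (uint8_t)(instr & 0x7);
    f.imm5   = (uint8_t)(instr & 0x1F);
    return f;
}
```

For example, the word 0x147B (binary 0001 010 001 1 11011) decodes to opcode 1, destination R2, source R1, immediate mode, with the raw immediate bits 11011.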
The main process of the program is a basic OS, which gives you a user interface with all the information needed to
use the program efficiently. At startup, some basic system details are shown, such as the location of the virtualized
"system volume" from which LC3 programs (object files) can be picked up for execution.
The OS shows the user all the relevant information about the system, for example the number
of processes launched since system boot, the location of the system volume, and the total amount of hard-drive space used by
the virtual machine. The amount of virtual memory granted to the program is about 8.0 GB.
The LC-3 hardware has been emulated through a user-defined
struct. This struct contains an array of uint16_t type, whose length
is defined through the macro CPU_REG_NUM and corresponds to the actual
number of registers in a real LC-3 system. A register number
can be seen as an offset into this array. The RAM is also emulated
through an array of uint16_t type, whose length is given by the macro
MEM_16BIT, set to the maximum size of the LC-3 memory:
65536 (2^16) addressable 16-bit words, or 128 KiB.
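A possible layout for such a struct is sketched below. Only the macro names CPU_REG_NUM and MEM_16BIT come from the description above. The value given to CPU_REG_NUM, the struct name `lc3_hw`, the choice to keep both arrays in one struct, and the two accessor functions are assumptions made for illustration.

```c
#include <stdint.h>

#define CPU_REG_NUM 10         /* assumed: R0..R7 plus PC and condition flags */
#define MEM_16BIT   (1u << 16) /* 65536 addressable 16-bit words */

/* Hypothetical emulated hardware: register file plus RAM. */
typedef struct {
    uint16_t registers[CPU_REG_NUM];
    uint16_t memory[MEM_16BIT];
} lc3_hw;

/* A uint16_t address can never fall outside the 65536-word array,
 * so no bounds check is needed on the memory accessors. */
uint16_t mem_read(const lc3_hw *hw, uint16_t address)
{
    return hw->memory[address];
}

void mem_write(lc3_hw *hw, uint16_t address, uint16_t value)
{
    hw->memory[address] = value;
}
```

Since the memory array alone takes 128 KiB, an instance of this struct is better given static storage or allocated on the heap than placed on a small stack.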
CPU registers have been defined through an enumeration whose constants
have numerical values equal to the binary addresses
of the registers in a real LC-3 system. Those enum values can be used as offsets
into an array of "registers" type, in order to simulate a register access
from an instruction.
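The idea can be illustrated with the sketch below. The constant names, the positions chosen for the program counter and the condition flags, and the function `add_registers` are assumptions. Only the values 0 to 7 for the general-purpose registers follow directly from the 3-bit register fields of the instruction format.

```c
#include <stdint.h>

/* Hypothetical register enumeration: R_R0..R_R7 match the 3-bit
 * register numbers found in an instruction; R_PC and R_COND are
 * assumed extra slots. */
enum lc3_register {
    R_R0 = 0, R_R1, R_R2, R_R3, R_R4, R_R5, R_R6, R_R7,
    R_PC,
    R_COND,
    R_COUNT /* number of entries, usable as the array length */
};

/* Register-mode ADD: the enum values (or the raw 3-bit fields)
 * are used directly as offsets into the register array. */
void add_registers(uint16_t regs[R_COUNT],
                   enum lc3_register dest,
                   enum lc3_register src1,
                   enum lc3_register src2)
{
    /* The cast makes the 16-bit wrap-around explicit. */
    regs[dest] = (uint16_t)(regs[src1] + regs[src2]);
}
```

Because a register field parsed from an instruction already holds a value between 0 and 7, it can index the array without any lookup table.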
The system is implemented on top of a virtual memory hardware, which is shared among all the programs that are launched. Only one process at a time is granted access to the system memory, so if more programs are launched, a waiting queue is created. This mechanism makes use of the POSIX semaphore API to guarantee correct access to the critical section.
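A minimal sketch of this kind of mutual exclusion with a POSIX semaphore is shown below. The wrapper type `mem_guard` and its functions are hypothetical. The sketch also assumes an unnamed semaphore used within a single process (`pshared` set to 0). Sharing it between separate processes would instead require placing it in shared memory with `pshared` set to 1, or using a named semaphore opened with `sem_open`.

```c
#include <errno.h>
#include <semaphore.h>

/* Hypothetical wrapper around the semaphore guarding the VM memory. */
typedef struct {
    sem_t sem;
} mem_guard;

/* Initial value 1: at most one holder at a time (binary semaphore).
 * Returns 0 on success, -1 on error, like sem_init itself. */
int mem_guard_init(mem_guard *g)
{
    return sem_init(&g->sem, 0, 1);
}

/* Blocks until the memory is free; callers that arrive while it is
 * held wait here. Retries if interrupted by a signal. */
void mem_guard_enter(mem_guard *g)
{
    while (sem_wait(&g->sem) == -1 && errno == EINTR) {
        /* interrupted, try again */
    }
}

/* Non-blocking variant: returns 1 if access was granted, 0 otherwise. */
int mem_guard_try_enter(mem_guard *g)
{
    return sem_trywait(&g->sem) == 0;
}

/* Leaves the critical section and wakes up one waiter, if any. */
void mem_guard_leave(mem_guard *g)
{
    sem_post(&g->sem);
}
```

A program would call `mem_guard_enter` before touching the emulated memory and `mem_guard_leave` once it is done, so that the code in between forms the critical section.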
The system volume is a folder created by the program itself in its current execution path, and it is labeled utils.
LC3 programs written and compiled through the text editor are put inside this folder, allowing the Virtual Machine to index them
and the user to pick them up for execution. In the command line version of the project, a program can be chosen simply by typing the index associated with its object file, which is shown on screen. Once the program has been chosen, a separate shell is launched, which runs in the background. The shell shows the program execution, while the main process (the OS) continues its execution and the user can keep interacting with it.
The lc3compiler has been implemented through Flex/Bison tools.
The file lc3.l contains the sequence of regular definitions used to create tokens to be handled by the parser:
### --- LEX SOURCE --- ###
/* Trap interrupts */
HALT HALT
TRAPINT GETC|IN|OUT|PUTS[P]?
BR BR[N|Z|P]
/* Opcode types */
ADD ADD
AND AND
OPCODE3I (LD|ST)R
OPCODE2R NOT
OPCODE2I ST[I]?|L(EA|D[I]?)
OPCODE1R J(MP|SRR)
OPCODE1I BR|JSR|TRAP
/* Operand types */
REGISTER [rR][0-7]
IMMED ([xX][-]?[0-9a-fA-F]+)|([#]?[-]?[0-9]+)
LABEL [A-Za-z][A-Za-z_0-9]*
STRING \"([^\"]*|(\\\"))*\"
UTSTRING \"[^\n\r]*
/* Program directives */
START \.ORIG
FINISH \.END
INTASSIGN .FILL
STRASSIGN .STRINGZ
/* Operand and white space spec */
SPACE [ \t]
OP_SEP {SPACE}*,{SPACE}*
COMMENT [;][^\n\r]*
EMPTYLINE {SPACE}*{COMMENT}?
ENDLINE {EMPTYLINE}\r?\n\r?
The LALR grammar for the parser is as follows:
### --- LALR GRAMMAR --- ###
0 $accept: program $end
1 program: fstprog sep decl_seq FINISH
2 fstprog: sep begin instr_seq HALT
3 begin: START SPACE IMMED ENDLINE
4 sep: sep ENDLINE
5 | ε
6 instr_seq: instr_seq instr sep
7 | ε
8 decl_seq: decl_seq decl sep
9 | ε
10 instr: op
11 | op3rlog
12 | op3ilog
13 | TRAPINT
14 | LABEL
15 decl: LABEL SPACE INTASSIGN SPACE IMMED
16 | LABEL SPACE STRASSIGN SPACE STRING
17 op3rlog: ADD SPACE REGISTER OP_SEP REGISTER OP_SEP REGISTER
18 | AND SPACE REGISTER OP_SEP REGISTER OP_SEP REGISTER
19 op3ilog: ADD SPACE REGISTER OP_SEP REGISTER OP_SEP IMMED
20 | AND SPACE REGISTER OP_SEP REGISTER OP_SEP IMMED
21 op: OPCODE3I SPACE REGISTER OP_SEP REGISTER OP_SEP IMMED
22 | OPCODE2R SPACE REGISTER OP_SEP REGISTER
23 | OPCODE2I SPACE REGISTER OP_SEP offset
24 | OPCODE1R SPACE REGISTER
25 | OPCODE1I SPACE offset
26 | BR SPACE offset
27 offset: LABEL
28 | IMMED
The associated LALR parser is as follows: