Samuele95 edited this page Jan 20, 2024 · 17 revisions

The LC-3 ISA

The LC-3 ISA was chosen because of the wide availability of documentation and resources in the open source community, together with the relative simplicity of the overall architecture. Both properties made it ideal for the author's purpose: gaining a deeper understanding of how a virtual machine is actually built on a host system, in all its flavours, from a basic interpreter such as the Java Virtual Machine to more complex ones such as a QEMU/KVM instance. The project repository contains a .pdf document describing the whole LC-3 ISA; it is stored inside the resources folder.

Here is a sample description of an ADD operation in the LC-3 ISA, in both Register mode and Immediate mode:

Register mode (Mode bit 0):

15          Dest    Src1   Mode       Src2  0
|-------------------------------------------|
| 0 0 0 1 | D D D | A A A | 0 | 0 0 | B B B |
|-------------------------------------------|

D D D = 3-bit Destination Register
A A A = 3-bit Source 1 Register
B B B = 3-bit Source 2 Register

Immediate mode (Mode bit 1):

15          Dest    Src1  Mode  Immediate   0
|-------------------------------------------|
| 0 0 0 1 | D D D | A A A | 1 | I I I I I   |
|-------------------------------------------|

D D D = 3-bit Destination Register
A A A = 3-bit Source 1 Register
I I I I I = 5-bit Immediate Value Two's Complement Integer

NOTE: The 5-bit immediate value must be sign-extended to 16 bits before it is used as an operand
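
The sign extension mentioned in the note can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `sign_extend` and `add_immediate_operand` are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the low `bit_count` bits of `value` to a full 16-bit word.
 * If the top bit of the field is set, the upper bits are filled with ones.
 * (Hypothetical helper, assumed for illustration; 1 <= bit_count <= 15.) */
uint16_t sign_extend(uint16_t value, unsigned bit_count)
{
    if ((value >> (bit_count - 1)) & 1u) {
        value |= (uint16_t)(0xFFFFu << bit_count);
    }
    return value;
}

/* Hypothetical helper: take the imm5 field (bits 4..0) of an ADD
 * instruction in immediate mode and widen it to 16 bits. */
uint16_t add_immediate_operand(uint16_t instr)
{
    return sign_extend((uint16_t)(instr & 0x1Fu), 5);
}
```

For instance, the word 0x107F encodes ADD R0, R1, #-1: its immediate field is 11111, which becomes 0xFFFF once sign-extended, while the field 00101 of 0x1065 stays 5.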

Instructions virtualization and implementation

Every instruction is encoded as a "uint16_t", since that type is guaranteed to be exactly 16 bits wide, regardless of the host machine on which the VM runs. Similarly, the instruction fields are extracted with bit shifting operators, and each result is stored in a "uint8_t" field, the smallest data type guaranteed to represent a byte-long unsigned integer value. Storing the 3-bit REGISTER field, the 4-bit OPCODE field or the 5-bit "immediate value two's complement integer" field in the wider uint8_t type zero-extends it; the immediate value must then be sign-extended separately, as noted above.
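
As a rough sketch of the field extraction described above, the ADD layout can be decoded as follows. The struct `add_fields` and the function `decode_add` are hypothetical names, and the use of masks next to the shifts is an assumption about how the fields are isolated.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields of an ADD instruction. */
typedef struct {
    uint8_t opcode; /* bits 15..12 */
    uint8_t dest;   /* bits 11..9 */
    uint8_t src1;   /* bits 8..6 */
    uint8_t mode;   /* bit 5: 0 = register mode, 1 = immediate mode */
    uint8_t src2;   /* bits 2..0, meaningful when mode == 0 */
    uint8_t imm5;   /* bits 4..0, raw and not yet sign-extended */
} add_fields;

/* Shift each field down to bit 0, mask off the neighbours, and store the
 * zero-extended result in a uint8_t. */
add_fields decode_add(uint16_t instr)
{
    add_fields f;
    f.opcode = (uint8_t)((instr >> 12) & 0xFu);
    f.dest   = (uint8_t)((instr >> 9) & 0x7u);
    f.src1   = (uint8_t)((instr >> 6) & 0x7u);
    f.mode   = (uint8_t)((instr >> 5) & 0x1u);
    f.src2   = (uint8_t)(instr & 0x7u);
    f.imm5   = (uint8_t)(instr & 0x1Fu);
    return f;
}
```

Decoding 0x1042 (ADD R0, R1, R2) this way yields opcode 1, dest 0, src1 1, mode 0 and src2 2.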

The LC3-VM Operating System

The main process of the program is a basic OS, which provides a user interface with all the information needed to use the program efficiently. At startup, it prints some basic system details, such as the location of the virtualized "system volume" from which LC3 programs (object files) can be picked up for execution.
The OS shows the user all the relevant information about the system, for example the number of processes launched since system boot, the location of the system volume and the total amount of hard-drive space used by the virtual machine. The amount of virtual memory granted to the program is 128 KiB (65,536 16-bit words).

Memory virtualization and implementation

The LC-3 system hardware is emulated through a user-defined struct. This struct contains an array of uint16_t whose length is defined by the macro CPU_REG_NUM and corresponds to the number of registers in a real LC-3 system. A register number can be seen as an offset into this array. The RAM is also emulated through an array of uint16_t, whose length is given by the macro MEM_16BIT, set to the maximum size of the LC-3 memory: 65,536 16-bit words, or 128 KiB.

CPU registers are defined through an enumeration whose constants have numerical values equal to the binary addresses of the registers in a real LC-3 system. These enum values can be used as offsets into an array of "registers" type, in order to simulate a register access from an instruction.
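
A minimal sketch of such a layout is shown below. Only the macro names CPU_REG_NUM and MEM_16BIT come from the description above; the value 10 for CPU_REG_NUM (R0 to R7 plus a program counter and a condition register), the type names `lc3_register` and `lc3_hardware`, and the accessor functions are assumptions made for illustration.

```c
#include <stdint.h>

#define CPU_REG_NUM 10u        /* assumed: R0..R7, PC, COND */
#define MEM_16BIT   (1u << 16) /* 65,536 addressable 16-bit words */

/* Register names; each value doubles as an offset into the register array. */
typedef enum {
    R_R0 = 0, R_R1, R_R2, R_R3, R_R4, R_R5, R_R6, R_R7,
    R_PC,   /* program counter (assumed slot) */
    R_COND  /* condition flags (assumed slot) */
} lc3_register;

/* Hypothetical struct holding the emulated register file and RAM. */
typedef struct {
    uint16_t reg[CPU_REG_NUM];
    uint16_t ram[MEM_16BIT];
} lc3_hardware;

/* A register access is a plain array access at the enum's offset. */
uint16_t reg_read(const lc3_hardware *hw, lc3_register r)
{
    return hw->reg[r];
}

void reg_write(lc3_hardware *hw, lc3_register r, uint16_t value)
{
    hw->reg[r] = value;
}

/* A uint16_t address can never index outside the 65,536-word array. */
uint16_t mem_read(const lc3_hardware *hw, uint16_t address)
{
    return hw->ram[address];
}

void mem_write(lc3_hardware *hw, uint16_t address, uint16_t value)
{
    hw->ram[address] = value;
}
```

Because addresses are themselves uint16_t values, no bounds check is needed on the RAM array; the whole struct occupies a little over 128 KiB, so it is better allocated statically or on the heap than on the stack.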

Shared memory

The system is implemented on top of virtual memory hardware that is shared among all launched programs. Only one process at a time is granted access to the system memory; if more programs are launched, they are placed in a waiting queue. This mechanism uses the POSIX semaphore API to guarantee correct access to the critical section.
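
The locking pattern can be sketched with a binary POSIX semaphore as below. This is an assumption-laden illustration rather than the project's code: `guarded_memory`, `guarded_memory_init`, `guarded_memory_run` and `demo_program` are hypothetical names, and the semaphore is created with `pshared = 0`, which only covers threads of one process. Sharing it between separate processes would require a named semaphore or one placed in shared memory.

```c
#include <errno.h>
#include <semaphore.h>
#include <stdint.h>

/* Hypothetical wrapper: the shared LC-3 memory plus the semaphore guarding it. */
typedef struct {
    sem_t lock;      /* binary semaphore: 1 = free, 0 = in use */
    uint16_t *words; /* the shared memory words */
} guarded_memory;

/* Initialise the semaphore to 1 so that the first caller gets in at once. */
int guarded_memory_init(guarded_memory *gm, uint16_t *words)
{
    gm->words = words;
    return sem_init(&gm->lock, 0, 1);
}

/* Wait for exclusive access, run `program` inside the critical section,
 * then release the memory for the next waiter. */
int guarded_memory_run(guarded_memory *gm, void (*program)(uint16_t *words))
{
    while (sem_wait(&gm->lock) != 0) {
        if (errno != EINTR) {
            return -1; /* a real error, not just an interrupted wait */
        }
    }
    program(gm->words);
    return sem_post(&gm->lock);
}

/* Stand-in for an LC-3 program: it only writes one word. */
void demo_program(uint16_t *words)
{
    words[0] = 0x3000;
}
```

Callers that arrive while the semaphore is 0 block inside `sem_wait`, which gives the waiting behaviour described above without any explicit queue in user code.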

File system

The system volume is a folder, named utils, that the program itself creates in its current execution path. LC3 programs written and compiled through the text editor are placed inside this folder, allowing the Virtual Machine to index them and the user to pick them for execution. In the command line version of the project, a program is chosen by typing the index associated with its object file, which is shown on screen. Once a program has been chosen, a separate shell is launched in the background. The shell shows the program execution, while the main process (the OS) continues its execution and the user can keep interacting with it.
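
One way to index the object files in such a folder is sketched below with the POSIX directory API. The helper names `has_suffix` and `list_programs` are hypothetical, and the file suffix used to recognise object files (for example ".obj") is an assumption, since the page does not state it.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return 1 when `name` ends with `suffix` and has at least one more character. */
int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name);
    size_t s = strlen(suffix);
    return n > s && strcmp(name + (n - s), suffix) == 0;
}

/* Print every matching file in `volume` next to a running index.
 * Returns the number of files listed, or -1 if the folder cannot be opened. */
int list_programs(const char *volume, const char *suffix)
{
    DIR *dir = opendir(volume);
    if (dir == NULL) {
        return -1;
    }
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (has_suffix(entry->d_name, suffix)) {
            printf("[%d] %s\n", count, entry->d_name);
            count++;
        }
    }
    closedir(dir);
    return count;
}
```

Since `readdir` returns entries in no particular order, a real index would probably sort the names first so that the numbers stay stable between listings.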

The LC3Compiler

The lc3compiler has been implemented through Flex/Bison tools.

Lexical analysis

LC3LEX

The file lc3.l contains the sequence of regular definitions used to create the tokens handled by the parser:

### --- LEX SOURCE --- ###

/* Trap interrupts */
HALT       HALT
TRAPINT    GETC|IN|OUT|PUTS[P]?
BR         BR[NZP]

/* Opcode types */
ADD        ADD
AND        AND
OPCODE3I   (LD|ST)R
OPCODE2R   NOT
OPCODE2I   ST[I]?|L(EA|D[I]?)
OPCODE1R   J(MP|SRR)
OPCODE1I   BR|JSR|TRAP

/* Operand types */
REGISTER   [rR][0-7]
IMMED      ([xX][-]?[0-9a-fA-F]+)|([#]?[-]?[0-9]+)  
LABEL      [A-Za-z][A-Za-z_0-9]*
STRING     \"([^\"]*|(\\\"))*\"
UTSTRING   \"[^\n\r]*


/* Program directives */
START      \.ORIG
FINISH     \.END
INTASSIGN  \.FILL
STRASSIGN  \.STRINGZ


/* Operand and white space spec */
SPACE      [ \t]
OP_SEP     {SPACE}*,{SPACE}*
COMMENT    [;][^\n\r]*
EMPTYLINE  {SPACE}*{COMMENT}?
ENDLINE    {EMPTYLINE}\r?\n\r?
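
To illustrate what the IMMED definition accepts, the sketch below converts a matched token into a number. The function name `parse_immed` is hypothetical and this is not taken from the project's scanner actions; it assumes the text has already been matched by IMMED, so it does no validation of its own.

```c
#include <stdlib.h>

/* Convert text matched by IMMED into its integer value.
 * Hex form:     x or X, an optional '-', then hex digits (x3000, X-1f).
 * Decimal form: an optional '#', an optional '-', then digits (#-5, 10). */
long parse_immed(const char *text)
{
    if (text[0] == 'x' || text[0] == 'X') {
        /* strtol handles the optional minus sign itself */
        return strtol(text + 1, NULL, 16);
    }
    if (text[0] == '#') {
        text++; /* skip the decimal marker */
    }
    return strtol(text, NULL, 10);
}
```

For example, `parse_immed("x3000")` gives 12288 and `parse_immed("#-5")` gives -5. The result still has to be range-checked against the width of the instruction field it is destined for.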

Syntax analysis

The LALR grammar for the parser is as follows:

### --- LALR GRAMMAR --- ###

0 $accept: program $end

1 program: fstprog sep decl_seq FINISH

2 fstprog: sep begin instr_seq HALT

3 begin: START SPACE IMMED ENDLINE

4 sep: sep ENDLINE
5    | ε

6 instr_seq: instr_seq instr sep
7          | ε

8 decl_seq: decl_seq decl sep
9         | ε

10 instr: op
11      | op3rlog
12      | op3ilog
13      | TRAPINT
14      | LABEL

15 decl: LABEL SPACE INTASSIGN SPACE IMMED
16     | LABEL SPACE STRASSIGN SPACE STRING

17 op3rlog: ADD SPACE REGISTER OP_SEP REGISTER OP_SEP REGISTER
18        | AND SPACE REGISTER OP_SEP REGISTER OP_SEP REGISTER

19 op3ilog: ADD SPACE REGISTER OP_SEP REGISTER OP_SEP IMMED
20        | AND SPACE REGISTER OP_SEP REGISTER OP_SEP IMMED

21 op: OPCODE3I SPACE REGISTER OP_SEP REGISTER OP_SEP IMMED
22   | OPCODE2R SPACE REGISTER OP_SEP REGISTER
23   | OPCODE2I SPACE REGISTER OP_SEP offset
24   | OPCODE1R SPACE REGISTER
25   | OPCODE1I SPACE offset
26   | BR SPACE offset

27 offset: LABEL
28       | IMMED

The associated LALR parser is as follows:

lc3parser