- Format 1 instructions
- Format 2 instructions
- Format 3 instructions
- Format 4 instructions
- Base Directive
- Comments and Whitespaces
- Program Relocation
- Literals
- LTORG directive
- EQU directive
- ORG directive
- Expressions
- Program Blocks
To run the assembler, simply run ./Run.sh from your terminal after cloning the repository. The assembler will ask for the path to your file. If the program has no errors, it will generate the corresponding HEADER RECORD in the Output directory.
Here is an example of execution of a correct program :
The implementation has a testing framework and build workflow integrated into it. If something needs to be modified, running the tests ensures that the change does not break other functions. So if you find something that should be changed, or you see an optimisation, make the required changes and add a test for it in the tests directory. I am currently using gtest for the testing. Each test compares the expected header with the one generated by the assembler. Something like:
TEST({TEST_NAME}, CompareFiles) {
  string filePath1 = "{GENERATED HEADER FILE PATH}";
  string filePath2 = "{EXPECTED HEADER FILE PATH}";
  ASSERT_TRUE(compareFiles(filePath1, filePath2));
}
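The test above relies on a compareFiles helper. A minimal sketch of what such a helper could look like is given here, assuming a plain byte-for-byte comparison; the actual helper in the repository may differ (for example, it might ignore trailing whitespace).

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical sketch of a file-comparison helper: returns true only when
// both files can be opened and their contents are byte-identical.
bool compareFiles(const std::string& path1, const std::string& path2) {
    std::ifstream f1(path1, std::ios::binary);
    std::ifstream f2(path2, std::ios::binary);
    if (!f1 || !f2) {
        return false;  // a missing file never compares equal
    }
    const std::string a((std::istreambuf_iterator<char>(f1)),
                        std::istreambuf_iterator<char>());
    const std::string b((std::istreambuf_iterator<char>(f2)),
                        std::istreambuf_iterator<char>());
    return a == b;
}
```

Reading both files fully into memory keeps the sketch short, which is fine for object programs of this size.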
To ensure that your changes pass the formatting check of the CI/CD pipeline, run ./format.sh, which runs cmake-format -i over the files listed in the script. If you add another file, include it in the script.
In case you get the following error :
Invalid opcode: {Your_Opcode}
It simply means that the opcode is not included in the opcode.info file. To add it, use the following format:
MNEMONIC | FORMAT | OPCODE
{Name} | {1,2,3/4} | {Code}
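To make the format concrete, here is a rough sketch of how one line of opcode.info could be parsed. The names OpcodeEntry and parseOpcodeLine are hypothetical, and the sketch assumes that the code column is written in hexadecimal; neither is confirmed by the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical record for one opcode.info line.
struct OpcodeEntry {
    std::string mnemonic;
    std::string format;  // "1", "2" or "3/4"
    int code = 0;
};

// Removes leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Parses "NAME | FORMAT | CODE". Returns false for malformed lines,
// including the "MNEMONIC | FORMAT | OPCODE" title line.
bool parseOpcodeLine(const std::string& line, OpcodeEntry& out) {
    std::istringstream in(line);
    std::string name, format, code;
    if (!std::getline(in, name, '|') || !std::getline(in, format, '|') ||
        !std::getline(in, code)) {
        return false;
    }
    name = trim(name);
    format = trim(format);
    code = trim(code);
    if (name.empty() || code.empty()) return false;
    if (format != "1" && format != "2" && format != "3/4") return false;
    try {
        std::size_t used = 0;
        const int value = std::stoi(code, &used, 16);  // assumption: hex code
        if (used != code.size()) return false;
        out.mnemonic = name;
        out.format = format;
        out.code = value;
        return true;
    } catch (const std::exception&) {
        return false;
    }
}
```

For example, a line such as `CLEAR | 2 | B4` would yield the mnemonic CLEAR, format "2" and code 0xB4 under these assumptions.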
To ensure the correct execution of the assembler, please write statements like
ADDR X , A
ADD TABLE2, X
EQU BUFEND - BUFFER
as
ADDR X,A
ADD TABLE2,X
EQU BUFEND-BUFFER
Note the spacing in the operand field after the opcode mnemonic: there must be no spaces around commas or operators.
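If you generate source files with a script, a small preprocessing step can enforce this rule. The helper below is only an illustration and is not part of the assembler; compactOperands is a hypothetical name. It removes whitespace from an operand field while leaving quoted literals untouched.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: strips whitespace from an operand field, except
// inside single-quoted literals such as C'E F'.
std::string compactOperands(const std::string& operands) {
    std::string out;
    bool inQuote = false;
    for (char c : operands) {
        if (c == '\'') {
            inQuote = !inQuote;  // toggle on every quote character
        }
        if (!inQuote && std::isspace(static_cast<unsigned char>(c))) {
            continue;  // drop whitespace outside of literals
        }
        out.push_back(c);
    }
    return out;
}
```

With this sketch, `X , A` becomes `X,A` and `BUFEND - BUFFER` becomes `BUFEND-BUFFER`.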
To explain the design of the assembler, I will take the following code as reference:
COPY START 0
FIRST STL RETADR
CLOOP JSUB RDREC
LDA LENGTH
COMP #0
JEQ ENDFIL
JSUB WRREC
J CLOOP
ENDFIL LDA =C'EOF'
STA BUFFER
LDA #3
STA LENGTH
JSUB WRREC
J @RETADR
USE CDATA
RETADR RESW 1
LENGTH RESW 1
USE CBLKS
BUFFER RESB 4096
BUFEND EQU *
MAXLEN EQU BUFEND-BUFFER
USE
RDREC CLEAR X
CLEAR A
CLEAR S
+LDT #MAXLEN
RLOOP TD INPUT
JEQ RLOOP
RD INPUT
COMPR A,S
JEQ EXIT
STCH BUFFER,X
TIXR T
JLT RLOOP
EXIT STX LENGTH
RSUB
USE CDATA
INPUT BYTE X'F1'
USE
WRREC CLEAR X
LDT LENGTH
WLOOP TD =X'05'
JEQ WLOOP
LDCH BUFFER,X
WD =X'05'
TIXR T
JLT WLOOP
RSUB
USE CDATA
LTORG
END FIRST
The design follows the two-pass assembler format and strictly follows the specifications given in the book System Software by L.L. Beck. The following variables are used across the 2 passes of the assembler:
map<string, Opcode> OPTAB;
vector<Instruction> INSTRUCTIONS;
map<string, int> SYMBOL_TABLE; // Name of the symbol -> Address
map<string, int> SYMBOL_BLOCK; // Name of the symbol -> Block Number
map<string, bool>
SYMBOL_FLAG; // Name of the symbol -> Need to be modified or not
vector<pair<string, int>> LIT_INTERMEDIATE; // {Literal, Length}
map<string, int> LITTAB; // Literal -> Address
map<string, int> LIT_BLOCK; // Literal -> Block Number
map<string, pair<int, int>> BLOCK_TABLE; // {Block Name, {Block Number, Length}}
map<int, string> BLOCK_NAMES; // {Block Number, Block Name}
map<int, int> BLOCK_LOCCTR; // {Block Number, LOCCTR}
vector<VariantType> OBJCODE; // UTILITY TO STORE THE OBJECT CODE
vector<pair<string, int>> RECORDS; // UTILITY TO STORE THE RECORDS
vector<string> MRECORDS; // UTILITY TO STORE THE MODIFICATION RECORDS
// BLOCK RELATED VARS
int BLOCK_NUMBER = 0;
int TOTAL_BLOCKS = 0;
string CURR_BLOCK_NAME = "DEFAULT";
// PROGRAM RELATED GLOBAL VARS
string NAME;
int START_ADDRESS;
int LOCCTR;
int PROGRAM_LENGTH;
bool NOBASE = false;
bool ORG = false;
int prevLOCCTR;
string BASE;
Multiple things happen in pass1 of the assembler:
- The assembler first checks for a USE directive in the code. If one is present, it adds a reference for the block in BLOCK_TABLE and keeps track of the length of each block.
- It then scans for an LTORG statement. If there is one, it creates a corresponding literal pool at that location.
- Then it checks for the RSUB instruction. I have added a separate check for this instruction alone.
- Then there is a check for SYMBOLS in the program. If a symbol is already present in the SYMBOL TABLE, we move on; otherwise we add it.
- After that, there is a check for expressions. The logic for expressions was quite complex to figure out, since it involved identifying legal expressions and deciding whether a modification record was required for them. I have achieved this by identifying whether the generated expression is relative or absolute. There is also a check for *, in which case I simply assign it the current location counter.
- Then there is a check for the ORG statement. It mainly involves manipulating the Location Counter.
- After that it checks for the BASE directive and for the presence of directives like WORD, RESW, RESB, BASE. This involved a lot of repetitive code, so I created a multi-line define statement for it:
#define data_directive() \
{ \
instruction.address = LOCCTR; \
instruction.data = tokens[2]; \
instruction.format = Format::DATA; \
instruction.opcode.code = 0; \
instruction.opcode.format = Format::DATA; \
instruction.new_block = false; \
}
- Then it identifies the format of the instruction and looks for literals in the program. This involved a tricky part where I had to ensure that multiple literals denoting the same value are assigned the same address, for which I made a utility function.
- Finally, it updates the Location Counter according to the number of bytes that the instruction occupies and processes the next instruction.
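The relative/absolute check for expressions mentioned in the list above can be sketched as follows. This is a simplified illustration, not the repository's code: it handles only + and - with decimal constants, assumes the operand contains no spaces, and treats the relativeFlag map (analogous to SYMBOL_FLAG) as marking relative symbols. The names ExprKind, ExprResult and classifyExpression are hypothetical. It applies the usual rule from Beck's text: relative terms must cancel in pairs (absolute), or leave exactly one positive relative term (relative).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

enum class ExprKind { Absolute, Relative, Invalid };

struct ExprResult {
    ExprKind kind;
    int value;
};

// Evaluates an expression such as "BUFEND-BUFFER" and classifies it.
// Each relative symbol contributes +1 or -1 to relativeTerms depending
// on its sign; a net count of 0 is absolute, 1 is relative.
ExprResult classifyExpression(const std::string& expr,
                              const std::map<std::string, int>& symbols,
                              const std::map<std::string, bool>& relativeFlag) {
    int value = 0;
    int relativeTerms = 0;
    int sign = +1;
    std::size_t i = 0;
    while (i < expr.size()) {
        if (expr[i] == '+' || expr[i] == '-') {
            sign = (expr[i] == '-') ? -1 : +1;
            ++i;
            continue;
        }
        std::size_t j = i;
        while (j < expr.size() && expr[j] != '+' && expr[j] != '-') ++j;
        const std::string term = expr.substr(i, j - i);
        const bool isNumber =
            std::all_of(term.begin(), term.end(),
                        [](unsigned char c) { return std::isdigit(c) != 0; });
        if (isNumber) {
            value += sign * std::stoi(term);
        } else {
            const auto sym = symbols.find(term);
            if (sym == symbols.end()) return {ExprKind::Invalid, 0};
            value += sign * sym->second;
            const auto flag = relativeFlag.find(term);
            if (flag != relativeFlag.end() && flag->second) relativeTerms += sign;
        }
        sign = +1;
        i = j;
    }
    if (relativeTerms == 0) return {ExprKind::Absolute, value};
    if (relativeTerms == 1) return {ExprKind::Relative, value};
    return {ExprKind::Invalid, 0};
}
```

Under this sketch, `BUFEND-BUFFER` is absolute and needs no modification record, while `BUFFER+3` is relative. Once program blocks are involved, a fuller version would also need to check that paired terms belong to the same block.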
That concludes pass1, after which the SYMBOL_TABLE and the literal table (LITTAB) are updated according to the program blocks in which the entries were present. Then the PROGRAM_LENGTH is calculated, the BASE address is accounted for, and the required data is sent to pass2.
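The block adjustment at the end of pass1 can be pictured with the sketch below. It assumes the table shapes declared earlier (BLOCK_TABLE as name -> {number, length}, SYMBOL_BLOCK as name -> block number) and that blocks are laid out in order of block number; the function names are hypothetical and the block lengths used in the usage note are illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Computes the starting address of every block, laying the blocks out
// one after another in order of block number.
std::map<int, int> blockStartAddresses(
    const std::map<std::string, std::pair<int, int>>& blockTable,
    int startAddress) {
    std::map<int, int> lengthByNumber;
    for (const auto& entry : blockTable) {
        lengthByNumber[entry.second.first] = entry.second.second;
    }
    std::map<int, int> start;
    int next = startAddress;
    for (const auto& entry : lengthByNumber) {
        start[entry.first] = next;
        next += entry.second;
    }
    return start;
}

// Converts block-relative symbol addresses into program-relative ones.
// Assumption: absolute symbols (e.g. MAXLEN) have no entry in symbolBlock
// and are therefore left unchanged.
void relocateSymbols(std::map<std::string, int>& symbolTable,
                     const std::map<std::string, int>& symbolBlock,
                     const std::map<int, int>& blockStart) {
    for (auto& symbol : symbolTable) {
        const auto block = symbolBlock.find(symbol.first);
        if (block == symbolBlock.end()) continue;
        const auto start = blockStart.find(block->second);
        if (start != blockStart.end()) symbol.second += start->second;
    }
}
```

With block lengths 0x66 (default), 0x0B (CDATA) and 0x1000 (CBLKS) and a start address of 0, the blocks begin at 0x0000, 0x0066 and 0x0071, so a symbol at offset 3 in CDATA ends up at 0x0069.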
- The major task in pass2 was accounting for new blocks and ensuring that a new text record is generated every time one is encountered. I did this by tricking the program into reserving 0 bytes when a USE statement is encountered. This ensures the correctness of the program and realizes the desired behaviour of starting a new text record every time.
- Afterwards, I used the format{X} structs defined in def.h to create object code for every instruction. This uses a lot of utility functions, which are defined in the utils.h file.
- In format three, checks are in place to account for immediate addressing, simple addressing, etc. There is also a check for indexed addressing. These are done by simply checking for the presence of the characters '#', '@' and 'X' in the instruction.
- Then it checks for the presence of SYMBOLS and tries to find them in the SYMBOL/LITERAL table. The addressing modes are tried in the order PC-relative, then BASE-relative, with the corresponding range checks.
- Format four was implemented in a similar manner to format three, except that there are fewer checks. One tricky part was the implementation of the modification record. As I had already determined whether the generated symbol was absolute or relative, I simply used the SYMBOL_FLAG table to get the corresponding flag and generate the MODIFICATION record accordingly.
- Finally, to generate the object code for data-type instructions, I check for WORD and simply convert the string to int. For BYTE, further checks recognise the X and C prefixes in the string. The Location Counter is then incremented correspondingly. For the RESB and RESW instructions, I push a SKIP instruction into my objcode to help the record-generating function recognise that a specific number of bytes is to be skipped.
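The PC-then-BASE order and the range checks for format three can be illustrated with the following sketch. It is not the code from def.h or utils.h; the names AddrMode, Displacement, chooseDisplacement and encodeFormat3 are hypothetical. The ranges are the standard SIC/XE ones: -2048..2047 for PC-relative and 0..4095 for base-relative.

```cpp
#include <cassert>
#include <cstdint>

enum class AddrMode { PCRelative, BaseRelative, OutOfRange };

struct Displacement {
    AddrMode mode;
    int disp12;  // 12-bit displacement field, already masked
};

// Tries PC-relative addressing first, then base-relative.
// pc is the address of the NEXT instruction.
Displacement chooseDisplacement(int target, int pc, bool baseAvailable, int base) {
    int d = target - pc;
    if (d >= -2048 && d <= 2047) {
        // Negative values keep only the low 12 bits (last 3 hex digits).
        return {AddrMode::PCRelative, d & 0xFFF};
    }
    if (baseAvailable) {
        d = target - base;
        if (d >= 0 && d <= 4095) {
            return {AddrMode::BaseRelative, d};
        }
    }
    return {AddrMode::OutOfRange, 0};
}

// Packs opcode (upper 6 bits), the n i x b p flags, e = 0 and the 12-bit
// displacement into a 24-bit format 3 instruction.
std::uint32_t encodeFormat3(int opcode, bool n, bool i, bool x, bool b, bool p,
                            int disp12) {
    std::uint32_t word = static_cast<std::uint32_t>(opcode & 0xFC) << 16;
    word |= (n ? 1u : 0u) << 17;
    word |= (i ? 1u : 0u) << 16;
    word |= (x ? 1u : 0u) << 15;
    word |= (b ? 1u : 0u) << 14;
    word |= (p ? 1u : 0u) << 13;
    word |= static_cast<std::uint32_t>(disp12) & 0xFFFu;
    return word;
}
```

As a worked case, a `J` (opcode 0x3C) to address 0x0006 with the next instruction at 0x001A gives a displacement of -20, which is masked to 0xFEC and encoded as 3F2FEC.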
That concludes pass2 and leaves only the HEADER RECORD generation part of the assembler:
- This was again a little tricky, and I divided it into two parts: one for generating the actual objcode from the objcode struct produced in pass2, and the other for generating the string that will be rendered as the HEADER RECORD.
- I have used a vector of variant to store the objcode of different formats.
- The main part here was to take care of the size of every part of the record and to generate the hex form from the corresponding int form.
- One particular thing to take care of was negative displacement, for which only the last 3 hex characters had to be taken.
- The GETRECORDS function mainly takes care of breaking the record once it reaches a size of 0x1E bytes and starting a new record when a SKIP statement is hit.
- After the record is generated, it is saved in the Output directory with the name {PROG_NAME}_generated.txt
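The record-breaking behaviour described above can be sketched like this. It is a simplified stand-in for the GETRECORDS logic, not a copy of it: ObjItem and buildTextRecords are hypothetical names, object code is passed in as ready-made hex strings, and the output uses the plain Beck layout (T, 6-digit start address, 2-digit length, object code) without any separator characters.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical flattened view of one pass2 result.
struct ObjItem {
    int address;      // address of the first byte
    std::string hex;  // object code as hex digits (2 per byte)
    bool skip;        // true for a SKIP marker (RESB/RESW/USE)
};

// Builds text records, starting a new one when the current record would
// exceed 0x1E bytes or when a SKIP marker is encountered.
std::vector<std::string> buildTextRecords(const std::vector<ObjItem>& items) {
    const std::size_t kMaxBytes = 0x1E;
    std::vector<std::string> records;
    std::string body;
    int start = 0;

    auto flush = [&]() {
        if (body.empty()) return;
        std::ostringstream rec;
        rec << 'T' << std::hex << std::uppercase << std::setfill('0')
            << std::setw(6) << start << std::setw(2) << body.size() / 2 << body;
        records.push_back(rec.str());
        body.clear();
    };

    for (const auto& item : items) {
        if (item.skip) {
            flush();  // reserved storage ends the current record
            continue;
        }
        if (body.size() / 2 + item.hex.size() / 2 > kMaxBytes) {
            flush();  // record is full
        }
        if (body.empty()) start = item.address;
        body += item.hex;
    }
    flush();
    return records;
}
```

Because a SKIP marker always closes the current record, a zero-byte reservation emitted for a USE statement also forces a fresh text record, which matches the trick described for pass2.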
That concludes the design of the Assembler.
PRs are welcome. Don't forget to follow the format specified in the How to run section.
Made with ❤️ By : shogo