Lexical Analyser
The design of the lexer is based on J’s sequential machine verb (dyadic `;:`), a general table-driven DFA function. It accepts a table of DFA transitions with their corresponding actions, the input mapping (which assigns each character a type), and the string to be tokenized. Its output is a list of tokens.
The columns of the transition table correspond to character types, and its rows correspond to states. The next state and action are selected by indexing the table with the current state and the mapped type of the current character (explained below). The action is then applied, and the new state, together with the type of the next character, selects the following transition. If an invalid cell of the table is selected, an error is signaled.
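A minimal Python sketch of the loop described above follows. The function `run_dfa`, the per-action callbacks and the two-state toy table (which simply splits text on spaces) are hypothetical illustrations, not the interpreter's code; an invalid cell is represented as `None`.

```python
from typing import Callable, Optional

# A cell holds (next_state, action_code), or None for an invalid transition.
Cell = Optional[tuple[int, str]]


def run_dfa(table: list[list[Cell]],
            classify: Callable[[str], int],
            actions: dict[str, Callable[[int], None]],
            text: str,
            start_state: int = 0) -> int:
    """Drive a table-driven DFA over text and return the final state."""
    state = start_state
    for pos, ch in enumerate(text):
        cell = table[state][classify(ch)]  # row = current state, column = character type
        if cell is None:
            raise ValueError(f"invalid transition from state {state} at position {pos}")
        state, action = cell
        actions[action](pos)               # apply the action chosen by the cell
    return state


# Hypothetical toy table: states and character types for splitting on spaces.
SPACE, WORD = 0, 1
C_SPACE, C_OTHER = 0, 1
TOY_TABLE: list[list[Cell]] = [
    #  C_SPACE        C_OTHER
    [(SPACE, "O"), (WORD, "N")],   # SPACE state
    [(SPACE, "W"), (WORD, "O")],   # WORD state
]


def tokenize_words(text: str) -> list[str]:
    tokens: list[str] = []
    start = 0

    def new_word(pos: int) -> None:      # "N": a word starts here
        nonlocal start
        start = pos

    def emit_word(pos: int) -> None:     # "W": the word ended just before pos
        tokens.append(text[start:pos])

    actions = {"O": lambda pos: None, "N": new_word, "W": emit_word}
    final = run_dfa(TOY_TABLE, lambda ch: C_SPACE if ch == " " else C_OTHER,
                    actions, text)
    if final == WORD:                    # flush a word that runs to the end of input
        tokens.append(text[start:])
    return tokens
```

For example, `tokenize_words("  ab cd e")` returns `['ab', 'cd', 'e']`. The final check flushes a word that runs to the end of the input, since the table itself only reacts to characters that are present.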
Since the transition table cannot be indexed directly by a character, each character is first mapped to a character type: a number within the range of the table's column indices, so that indexing selects the correct column. Each character type is assigned its own unique number, and every character belonging to that type maps to it.
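The mapping can be sketched as a lookup table built once and indexed by character code. The column order follows the DFA table in this section; which characters belong to each class (for example, that only the space character counts as Space, and that underscores, tabs and non-ASCII characters are Unknown) is an assumption of this sketch, not taken from the interpreter.

```python
import string

# Character-type codes; each value is the column index used in the DFA table.
CX, CS, CA, CN, CB, C9, CD, CC, CQ = range(9)


def build_char_map() -> list[int]:
    """Build a 128-entry lookup table from ASCII code to character type."""
    char_map = [CX] * 128              # anything not listed below is Unknown
    for ch in string.ascii_letters:
        char_map[ord(ch)] = CA         # ordinary letters
    char_map[ord("N")] = CN            # N and B have their own columns so that
    char_map[ord("B")] = CB            # the NB. comment prefix can be recognized
    for ch in string.digits:
        char_map[ord(ch)] = C9
    char_map[ord(" ")] = CS
    char_map[ord(".")] = CD
    char_map[ord(":")] = CC
    char_map[ord("'")] = CQ
    return char_map


CHAR_MAP = build_char_map()


def char_type(ch: str) -> int:
    """Return the column index for a single character."""
    code = ord(ch)
    # Assumption: characters outside ASCII are treated as Unknown.
    return CHAR_MAP[code] if code < len(CHAR_MAP) else CX
```

Mapping many characters onto a few type numbers keeps the table narrow: nine columns cover the whole input alphabet.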
The lexer is limited to 7 actions:
Code | Name | Action |
---|---|---|
EO | Nothing | Do nothing. |
EN | Word Start | Set the word start index to the current position in the string. |
EW | Emit Word | Emit the word start and length. |
EY | Emit Word Error | Emit the word start and length, then signal an error. |
EV | Emit Vector | On the first call, output the word start; on each subsequent call, add the length of the previous word to the total word length; on the final call, output the total length of the words (the vector) collected across all Emit Vector calls. |
EZ | Emit Vector Error | Same as Emit Vector, except that an error is signaled after the action. |
ES | Stop | Stops the lexer immediately. |
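The word-level actions can be sketched as methods that record tokens as (start, length) pairs. This is a hypothetical illustration: `LexError` and `StopLexing` are invented names that model "signal an error" and "stop" as Python exceptions, and EV and EZ are left out because this section does not specify how the final Emit Vector call is recognized.

```python
class LexError(Exception):
    """Hypothetical exception used here to model "signal an error"."""


class StopLexing(Exception):
    """Hypothetical exception used here to model the Stop action."""


class WordActions:
    """Word-level actions; tokens are recorded as (start, length) pairs."""

    def __init__(self) -> None:
        self.start = 0
        self.tokens: list[tuple[int, int]] = []

    def eo(self, pos: int) -> None:
        """EO: do nothing."""

    def en(self, pos: int) -> None:
        """EN: the current position becomes the word start."""
        self.start = pos

    def ew(self, pos: int) -> None:
        """EW: emit the word that ends just before pos."""
        self.tokens.append((self.start, pos - self.start))
        # Assumption: every EW cell leads to a non-space state, so the
        # current character is taken as the start of the next word.
        self.start = pos

    def ey(self, pos: int) -> None:
        """EY: emit the word, then signal an error."""
        self.ew(pos)
        raise LexError(f"error signaled at position {pos}")

    def es(self, pos: int) -> None:
        """ES: stop the lexer immediately."""
        raise StopLexing(pos)
```

Reading the DFA table, the input `ab+` triggers EN at position 0, EO at 1 and EW at 2 (the `+` moves the lexer from SA to SX). This sketch records that as the token (0, 2), and the `+` then starts the next word.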
The DFA table as implemented in the interpreter is shown below. Each cell gives the next state followed by the action to apply.
State (row) / Character Type (column) | Unknown [CX] | Space [CS] | Alpha [CA] | Letter N [CN] | Letter B [CB] | Numeric [C9] | Dot [CD] | Colon [CC] | Quote [CQ] |
---|---|---|---|---|---|---|---|---|---|
Space [SS] | SX, EN | SS, EO | SA, EN | SN, EN | SA, EN | S9, EN | SX, EN | SX, EN | SQ, EN |
Unknown [SX] | SX, EW | SS, EY | SA, EW | SN, EW | SA, EW | S9, EW | SX, EO | SX, EO | SQ, EW |
Alpha [SA] | SX, EW | SS, EY | SA, EO | SA, EO | SA, EO | SA, EO | SX, EO | SX, EO | SQ, EW |
Start of Comment (N) [SN] | SX, EW | SS, EY | SA, EO | SA, EO | SM, EO | SA, EO | SX, EO | SX, EO | SQ, EW |
Inside Comment (NB) [SM] | SX, EW | SS, EY | SA, EO | SA, EO | SA, EO | SA, EO | SO, EO | SX, EO | SQ, EW |
End Comment (NB.) [SO] | SZ, EO | SZ, EO | SZ, EO | SZ, EO | SZ, EO | SZ, EO | SX, EO | SX, EO | SZ, EO |
Numeric [S9] | SX, EV | SS, EZ | S9, EO | S9, EO | S9, EO | S9, EO | S9, EO | SX, EO | SQ, EV |
Quote [SQ] | SQ, EO | SQ, EO | SQ, EO | SQ, EO | SQ, EO | SQ, EO | SQ, EO | SQ, EO | SC, EO |
Colon [SC] | SX, EW | SS, EY | SA, EW | SN, EW | SA, EW | S9, EW | SX, EW | SX, EW | SQ, EO |
Comment [SZ] | SZ, EO | SZ, EO | SZ, EO | SZ, EO | SZ, EO | SZ, EO | SZ, EO | SZ, EO | SZ, EO |
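To check the table by hand, the sketch below transcribes it into Python and reports the state and action selected for each character. It only traces transitions and does not execute the actions. Starting in SS and the ASCII character classes in `char_type` are assumptions of this sketch.

```python
import string

# Column indices (character types), in the order of the table's columns.
CX, CS, CA, CN, CB, C9, CD, CC, CQ = range(9)

# Rows transcribed from the table above; each cell is "NEXT_STATE ACTION".
TABLE = {
    "SS": "SX EN|SS EO|SA EN|SN EN|SA EN|S9 EN|SX EN|SX EN|SQ EN",
    "SX": "SX EW|SS EY|SA EW|SN EW|SA EW|S9 EW|SX EO|SX EO|SQ EW",
    "SA": "SX EW|SS EY|SA EO|SA EO|SA EO|SA EO|SX EO|SX EO|SQ EW",
    "SN": "SX EW|SS EY|SA EO|SA EO|SM EO|SA EO|SX EO|SX EO|SQ EW",
    "SM": "SX EW|SS EY|SA EO|SA EO|SA EO|SA EO|SO EO|SX EO|SQ EW",
    "SO": "SZ EO|SZ EO|SZ EO|SZ EO|SZ EO|SZ EO|SX EO|SX EO|SZ EO",
    "S9": "SX EV|SS EZ|S9 EO|S9 EO|S9 EO|S9 EO|S9 EO|SX EO|SQ EV",
    "SQ": "SQ EO|SQ EO|SQ EO|SQ EO|SQ EO|SQ EO|SQ EO|SQ EO|SC EO",
    "SC": "SX EW|SS EY|SA EW|SN EW|SA EW|S9 EW|SX EW|SX EW|SQ EO",
    "SZ": "SZ EO|SZ EO|SZ EO|SZ EO|SZ EO|SZ EO|SZ EO|SZ EO|SZ EO",
}
TRANSITIONS = {state: [tuple(cell.split()) for cell in row.split("|")]
               for state, row in TABLE.items()}

# Assumed character classes (single characters with their own column first).
SPECIAL = {" ": CS, "N": CN, "B": CB, ".": CD, ":": CC, "'": CQ}


def char_type(ch: str) -> int:
    if ch in SPECIAL:
        return SPECIAL[ch]
    if ch in string.ascii_letters:
        return CA
    if ch in string.digits:
        return C9
    return CX


def trace(text: str, state: str = "SS") -> list[tuple[str, str, str]]:
    """Return (character, next state, action) for each character of text."""
    steps = []
    for ch in text:
        state, action = TRANSITIONS[state][char_type(ch)]
        steps.append((ch, state, action))
    return steps
```

For example, `trace("NB. x")` passes through SN, SM and SO and stays in SZ for the rest of the line, while `trace("NBx")` ends in SA, so `NB` without a following dot is read as an ordinary name. In `trace("'a''b'")` the doubled quote moves SQ to SC and straight back to SQ, keeping the quote inside the string.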