zeotrope edited this page Aug 13, 2010 · 8 revisions

The lexer's design is based on J's sequential machine verb, a general table-driven DFA function. It accepts a table of DFA transitions and their corresponding actions, the mapped input, and the string to be tokenized. Its output is a list of tokens.

Transition Table

The transition table's columns correspond to character types and its rows to the current state. The next state and action are selected by indexing the table with the current state and the mapped character type (explained below). The action is applied, and the new state is then used with the next character to select the following cell. If an invalid cell in the table is selected, an error is signaled.
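The lookup described above can be sketched as follows. This is a minimal illustration rather than the interpreter's code: the two-state toy table, the names `step` and `LexError`, and the use of `None` to mark an invalid cell are all assumptions made for the example.

```python
class LexError(Exception):
    """Raised when an invalid table cell is selected (hypothetical name)."""


# Toy table (hypothetical): rows are states, columns are character types.
# Each valid cell is a (next_state, action) pair; None marks an invalid cell.
TOY_TABLE = [
    # type 0      type 1
    [(0, "EO"), (1, "EN")],  # state 0
    [(0, "EW"), None],       # state 1
]


def step(table, state, char_type):
    """Index the table by current state (row) and character type (column)."""
    cell = table[state][char_type]
    if cell is None:
        raise LexError(f"invalid cell: state={state}, type={char_type}")
    return cell  # (next_state, action)
```

Calling `step` repeatedly, feeding each returned state back in with the next character type, gives the loop the paragraph describes.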

Mapped Input

Since the transition table cannot be indexed directly by the current character, each character must first be mapped to a number within the range of the table's column indices. This is done by assigning a unique number to each character type, so that every character selects exactly one column.
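A possible mapping is sketched below. The column names and their order are taken from the DFA table on this page; the classification rules themselves are assumptions (ASCII letters count as alpha, only uppercase `N` and `B` get their own types, space and tab count as space, and the apostrophe is the quote character), and `char_type` and `map_input` are hypothetical names.

```python
# Column indices, in the order the DFA table lists its character types.
CX, CS, CA, CN, CB, C9, CD, CC, CQ = range(9)


def char_type(ch):
    """Map one character to a column index (classification rules are assumed)."""
    if ch == "N":
        return CN
    if ch == "B":
        return CB
    if ch.isascii() and ch.isalpha():
        return CA
    if ch.isascii() and ch.isdigit():
        return C9
    if ch in " \t":
        return CS
    if ch == ".":
        return CD
    if ch == ":":
        return CC
    if ch == "'":
        return CQ
    return CX  # anything else is "unknown"


def map_input(text):
    """Map every character of the input string to its column index."""
    return [char_type(ch) for ch in text]
```

With these rules, `map_input("NB. x")` gives `[3, 4, 6, 1, 2]`, that is CN, CB, CD, CS, CA.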

Actions

The lexer is limited to 7 actions:

| Code | Name | Action |
|------|------|--------|
| EO | Nothing | Do nothing. |
| EN | Word Start | Set the word start index to the current position in the string. |
| EW | Emit Word | Emit the word start and length. |
| EY | Emit Word Error | Emit the word start and length, then signal an error. |
| EV | Emit Vector | The first call outputs the word start. Subsequent calls add the length of the previous words to the total word length. On the final call, the total length of the words (the vector) collected across the calls is output. |
| EZ | Emit Vector Error | Same as Emit Vector, except that an error is signaled after the action. |
| ES | Stop | Stop the lexer immediately. |
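The three simplest actions can be sketched as below. This is not the interpreter's code. The numeric values simply follow the order in which the codes are listed, tokens are assumed to be `(start, length)` pairs, and the step that moves the word start to the current position after an emit is an assumption borrowed from J's sequential machine. `Action` and `WordState` are hypothetical names, and the remaining codes are deliberately left unimplemented.

```python
from enum import IntEnum


class Action(IntEnum):
    """The seven action codes, numbered in listing order (numbering assumed)."""
    EO = 0  # nothing
    EN = 1  # word start
    EW = 2  # emit word
    EY = 3  # emit word error
    EV = 4  # emit vector
    EZ = 5  # emit vector error
    ES = 6  # stop


class WordState:
    """Tracks the word start index and the emitted (start, length) pairs."""

    def __init__(self):
        self.start = None
        self.words = []

    def apply(self, action, pos):
        """Apply EO, EN or EW at input position pos; other codes are not sketched."""
        if action == Action.EO:
            return
        if action == Action.EN:
            self.start = pos
        elif action == Action.EW:
            self.words.append((self.start, pos - self.start))
            self.start = pos  # assumption: the next word begins here
        else:
            raise NotImplementedError(action.name)
```

For instance, applying EN at position 2 and EW at position 5 records the pair `(2, 3)`.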

DFA

The DFA table, as implemented in the interpreter, is shown below. Each cell gives the next state followed by the action.

| Character Type / States | Unknown [CX] | Space [CS] | Alpha [CA] | Letter N [CN] | Letter B [CB] | Numeric [C9] | Dot [CD] | Colon [CC] | Quote [CQ] |
|---|---|---|---|---|---|---|---|---|---|
| Space [SS] | SX, EN | SS, EO | SA, EN | SN, EN | SA, EN | S9, EN | SX, EN | SX, EN | SQ, EN |
| Unknown [SX] | SX, EW | SS, EY | SA, EW | SN, EW | SA, EW | S9, EW | SX, EO | SX, EO | SQ, EW |
| Alpha [SA] | SX, EW | SS, EY | SA, EO | SA, EO | SA, EO | SA, EO | SX, EO | SX, EO | SQ, EW |
| Start of Comment (N) [SN] | SX, EW | SS, EY | SA, EO | SA, EO | SM, EO | SA, EO | SX, EO | SX, EO | SQ, EW |
| Inside Comment (NB) [SM] | SX, EW | SS, EY | SA, EO | SA, EO | SA, EO | SA, EO | SO, EO | SX, EO | SQ, EW |
| End Comment (NB.) [SO] | SZ, EO | SZ, EO | SZ, EO | SZ, EO | SZ, EO | SZ, EO | SX, EO | SX, EO | SZ, EO |
| Numeric [S9] | SX, EV | SS, EZ | S9, EO | S9, EO | S9, EO | S9, EO | S9, EO | SX, EO | SQ, EV |
| Quote [SQ] | SQ, EO | SQ, EO | SQ, EO | SQ, EO | SQ, EO | SQ, EO | SQ, EO | SQ, EO | SC, EO |
| Colon [SC] | SX, EW | SS, EY | SA, EW | SN, EW | SA, EW | S9, EW | SX, EW | SX, EW | SQ, EO |
| Comment [SZ] | SZ, EO | SZ, EO | SZ, EO | SZ, EO | SZ, EO | SZ, EO | SZ, EO | SZ, EO | SZ, EO |
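One way to hold this table in code is sketched below. The cell values are transcribed from the table above. The string layout, the names `build_table` and `trace_states`, and the choice of SS as the start state are assumptions for illustration. The sketch follows next-state entries only and ignores actions.

```python
COLUMNS = ["CX", "CS", "CA", "CN", "CB", "C9", "CD", "CC", "CQ"]

# One line per state: the state code, then nine "next/action" cells
# in the same column order as COLUMNS.
ROWS = """
SS SX/EN SS/EO SA/EN SN/EN SA/EN S9/EN SX/EN SX/EN SQ/EN
SX SX/EW SS/EY SA/EW SN/EW SA/EW S9/EW SX/EO SX/EO SQ/EW
SA SX/EW SS/EY SA/EO SA/EO SA/EO SA/EO SX/EO SX/EO SQ/EW
SN SX/EW SS/EY SA/EO SA/EO SM/EO SA/EO SX/EO SX/EO SQ/EW
SM SX/EW SS/EY SA/EO SA/EO SA/EO SA/EO SO/EO SX/EO SQ/EW
SO SZ/EO SZ/EO SZ/EO SZ/EO SZ/EO SZ/EO SX/EO SX/EO SZ/EO
S9 SX/EV SS/EZ S9/EO S9/EO S9/EO S9/EO S9/EO SX/EO SQ/EV
SQ SQ/EO SQ/EO SQ/EO SQ/EO SQ/EO SQ/EO SQ/EO SQ/EO SC/EO
SC SX/EW SS/EY SA/EW SN/EW SA/EW S9/EW SX/EW SX/EW SQ/EO
SZ SZ/EO SZ/EO SZ/EO SZ/EO SZ/EO SZ/EO SZ/EO SZ/EO SZ/EO
"""


def build_table(rows):
    """Parse ROWS into {state: {column: (next_state, action)}}."""
    table = {}
    for line in rows.split("\n"):
        if not line.strip():
            continue
        state, *cells = line.split()
        table[state] = {
            col: tuple(cell.split("/")) for col, cell in zip(COLUMNS, cells)
        }
    return table


TABLE = build_table(ROWS)


def trace_states(table, columns, start="SS"):
    """Follow next-state entries only (actions ignored); start state is assumed."""
    state, visited = start, []
    for col in columns:
        state, _action = table[state][col]
        visited.append(state)
    return visited
```

Starting from SS, the column sequence CN, CB, CD, CA (the characters `NB.` followed by a letter) visits SN, SM, SO and then SZ, which is how the table recognizes the start of a comment.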