-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathsect13.html
169 lines (137 loc) · 5.77 KB
/
sect13.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
<!DOCTYPE html>
<html>
<head>
<title>The Z-Machine Standards Document</title>
<link rel="stylesheet" type="text/css" href="zspec.css">
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
</head>
<body>
<img class="icon" src="images/icon13.gif" alt="">
<h1>13. The dictionary and lexical analysis</h1>
<hr>
<p>13.1 <a href="#13.1">Storage</a> /
13.2 <a href="#13.2">Header</a> /
13.3 <a href="#13.3">Entries (V1 to V3)</a> /
13.4 <a href="#13.4">Entries (later versions)</a> /
13.5 <a href="#13.5">Ordering</a> /
13.6 <a href="#13.6">Lexical analysis</a>
</p>
<hr>
<h2 id="13.1">13.1</h2>
<p>The dictionary table is held in static memory and its byte
address is stored in the word at <strong>$08</strong> in the <a href="sect11.html">header</a>.
</p>
<h2 id="13.2">13.2</h2>
<p>The table begins with a short header:</p>
<pre>
n list of keyboard input codes entry-length number-of-entries
byte ------n bytes----------------- byte 2-byte word
</pre>
<p>The keyboard input codes are "word-separators": typically (and under
Inform mandatorily) these are the ZSCII codes for full stop, comma and
double-quote. Note that a space character (32) should never be a
word-separator. The "entry length" is the length of each word's
entry in the dictionary table. (It must be at least 4 in Versions
1 to 3, and at least 6 in later Versions.)
</p>
<h2 id="13.2.1">13.2.1</h2>
<p>Note that the word-separators table can only contain codes
which are defined in ZSCII for both input and output.
</p>
<h2 id="13.3">3</h2>
<p>In Versions 1 to 3, each word has an entry in the form</p>
<pre>
encoded text of word bytes of data
------- 4 bytes ------ (entry length-4) bytes
</pre>
<p>The interpreter ignores the bytes of data (presumably the game's parser will
use them). The encoded text contains 6 Z-characters (it is always padded
out with Z-character 5's to make up 4 bytes: see <strong>§</strong> 3).
The text may include spaces or other word-separators
(though, if so, the interpreter will never match any text to the
dictionary word in question: surprisingly, this can be useful and is
a trick used in the Inform library).
</p>
<h2 id="13.4">13.4</h2>
<p>In Versions 4 and later, the encoded text has 6 bytes and
always contains 9 Z-characters.
</p>
<h2 id="13.5">13.5</h2>
<p>The word entries follow immediately after the dictionary header
and must be given in numerical order of the encoded text (when the encoded
text is regarded as a 32 or 48-bit binary number with most-significant
byte first). It must not contain two entries with the same encoded text.
</p>
<h2 id="13.6">13.6</h2>
<p>Lexical analysis takes place in two circumstances: on request
of a <a href="sect15.html#tokenise"><strong>tokenise</strong></a> opcode (in which case it can use any dictionary table
it likes, in the format above) and during acceptance of a game command
(in which case the standard dictionary is used).
</p>
<h2 id="13.6.1">13.6.1</h2>
<p>First, the text is broken up into words. Spaces divide up
words and are otherwise ignored. Word separators also divide words,
but each one of them is considered a word in its own right. Thus,
the erratically-spaced text "fred,go fishing" is divided into four
words:
</p>
<pre>
fred / , / go / fishing
</pre>
<h2 id="13.6.2">13.6.2</h2>
<p>Each word is then encoded as a Z-machine string in
dictionary form, and searched for in the dictionary.
</p>
<h2 id="13.6.3">13.6.3</h2>
<p>A "parse table" is then written, recording the number of
words, the length and position of each word and the dictionary
address of each word which is recognised. For the format, see the
<a href="sect15.html#read"><strong>read</strong></a> opcode.
</p>
<hr>
<h2 id="remarks">Remarks</h2>
<p>Usually (under Inform, mandatorily) there are three bytes of data
in the word entries, so that dictionary entry lengths are 7 and 9
in the early and late Z-machine, respectively.
</p>
<p>It is essential that dictionary entries are in numerical order of the
bytes of encrypted text so that interpreters can search the dictionary
efficiently (e.g. by a binary-chop algorithm). Because the letters in
<a href="sect03.html#3.5.3">A0</a> are in alphabetical order, because the bits are ordered in the right
way and because the pad character 5 is less than the values for the
letters, the numerical ordering corresponds to normal English alphabetical
order for ordinary words. (For instance "an" comes before
"anaconda".)
</p>
<p>Both Infocom and Inform-compiled games contain words whose initial
character is not a letter (for instance, "#record").
</p>
<p>Linards Ticmanis reports that some of Infocom's interpreters convert
question marks to spaces before lexical analysis. This is <em>not</em>
Standard behaviour. (Thus, typing "What is a grue?" into 'Zork I'
no longer works: the player must type "What is a grue" instead.)
</p>
<hr>
<p>
<a href="index.html">Contents</a> /
<a href="preface.html">Preface</a> /
<a href="overview.html">Overview</a>
</p>
<p>Section
<a href="sect01.html">1</a> / <a href="sect02.html">2</a> /
<a href="sect03.html">3</a> / <a href="sect04.html">4</a> /
<a href="sect05.html">5</a> / <a href="sect06.html">6</a> /
<a href="sect07.html">7</a> / <a href="sect08.html">8</a> /
<a href="sect09.html">9</a> / <a href="sect10.html">10</a> /
<a href="sect11.html">11</a> / <a href="sect12.html">12</a> /
<a href="sect13.html">13</a> / <a href="sect14.html">14</a> /
<a href="sect15.html">15</a> / <a href="sect16.html">16</a>
</p>
<p>Appendix
<a href="appa.html">A</a> / <a href="appb.html">B</a> /
<a href="appc.html">C</a> / <a href="appd.html">D</a> /
<a href="appe.html">E</a> / <a href="appf.html">F</a>
</p>
<hr>
</body>
</html>