-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathreffmt.html
324 lines (305 loc) · 16.8 KB
/
reffmt.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
<!Doctype HTML>
<!-- This is Part II: Reference Format of the Assembly Language Reference. -->
<!-- Author: Menachem Salomon <[email protected]> -->
<html>
<head>
<title>
Assembly Language Reference - Reference Format
</title>
<style>
.center { text-align: center; }
.caps { font-variant: small-caps; }
.sublist { margin: 1em auto; }
</style>
</head>
<body>
<header class="center">
<h3 class="caps">Menachem Salomon's</h3>
<h2>ASSEMBLY LANGUAGE REFERENCE</h2>
<h4>For the 80x86 Series of Intel Microchips</h4>
<h2>Part II: Explanation of Reference Format</h2>
</header>
<nav>
<h3><a href="index.html">Complete Table of Contents</a></h3>
<ol type="I">
<li><a class="caps" href="instrfmt.html">Instruction Format</a></li>
<li><a class="caps" href="reffmt.html">Explanation of Reference Format</a>
<ol type="A">
<li><a href="#Intro">Introduction</a></li>
<li><a href="#Flags">Flag Information</a></li>
<li><a href="#Opcodes">Explanation of Opcodes</a></li>
<li><a href="#Operands">Explanation of Operands</a></li>
</ol>
</li>
<li><a class="caps" href="instrref.html">Instruction Reference</a></li>
</ol>
</nav>
<h3 name="PartII">PART II: EXPLANATION OF REFERENCE FORMAT:</h3>
<ol type="A"><li value="1">
<h4 id="Intro">Introduction</h4>
</li></ol>
<pre>
A) Each entry in this reference contains several bits of information,
describing a single assembly language instruction. An instruction
may take one of several forms, depending on the location and type of
the operands, each with its own opcode. For each instruction, the
various opcodes and their corresponding forms are given, as well as
information about which processor flags are affected, what the
instruction does, where the operands are (memory, register, etc.),
and a detailed explanation. In addition, the phrase of which the
instruction name is a mnemonic is given, as well as aliases (other
mnemonics for the same instruction). As is normal for Intel, the
instruction mnemonics are given in AT&T assembly format.
This section contains an description of the format used for the
reference entries.
</pre>
<ol type="A"><li value="2">
<h4 id="Flags">Flag Information</h4>
</li></ol>
<pre>
B) Flag Information:
The FLAGS (or EFLAGS) register contains several 1-bit flags that may
be set or cleared based on the results of an instruction. The 8086
to 80286 processors contain the following flags, listed here with
the corresponding 1-letter tag uses in this reference:
O - Overflow Flag - set what a number wraps around or goes beyond a
limit, such as when two positive integers are added and the
result, if taken as signed, is negative.
D - Direction Flag - directs whether string instructions will
increment (clear) or decrement (set) the eSI/eDI registers.
This flag is set and cleared directly with dedicated
instructions, not as a side effect of other instructions (with
the exception of instructions that affect the entire FLAGS
register, such as POPF or IRET).
I - Interrupt Flag - directs whether external hardware interrupts
are recognized. If cleared, external interrupts are ignored
(except for NMI, #2). This flag is not set or cleared as a side
effect of other instructions, but rather with its own dedicated
instructions. This flag is usually set during normal operation,
and is only cleared for sensitive operations.
T - Trap Flag - used to control single-step mode, for use with a
debugger. If set, an interrupt #1 is executed after each
instruction. This flag is can only be set or cleared via an
IRET instuction.
S - Sign Flag - set to the high-order bit of the result of an ALU
instructions. If set, indicates that the result was negative.
Z - Zero Flag - set to indicate that the result of an ALU operation
was zero, cleared otherwise. Also set by some instructions that
test protected-mode permissions (LSL, LAR, VERR/VERW, etc.).
A - Auxiliary Carry Flag - used to support BCD arithmetic. Set if
there was a carry into bit 3 of the result, cleared otherwise.
P - Parity Flag - indicates whether the low-order byte of the result
had an even (set) or odd (clear) number of bits set. Often used
with port I/O.
C - Carry Flag - indicates whether an ALU operation generated a
carry out of the high-order bit. Also used by shifts, bit-tests,
and error checking.
The following symbols are used to indicate how each flag changes as
a result of an instruction. Each instruction explains how and when
it changes the flags. The symbols are:
1 - The instruction always sets this flag (sets to 1).
0 - The instruction always clears this flag (sets to 0).
* - The instruction sets this flag to reflect the result.
? - The instruction leaves this flag in an undefined state.
The 16-bit FLAGS register is laid out as follows. The bits marked
"x" are not used; for their values, see the PUSHF/POPF instructions.
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| x | x | x | x | O | D | I | T | S | Z | x | A | x | P | x | C |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Bit: 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
The 32-bit EFLAGS register on the 80386 and later processors include
the 16-bit FLAGS register as the lower 16 bits, and extend the width
with another 16 bits. A description of the extended flags is beyond
the scope of this reference.
</pre>
<ol type="A"><li value="3">
<h4 id="Opcodes">Explanation of Opcodes</h4>
</li></ol>
<pre>
C) Explanation of Opcodes:
The opcode section contains the machine code for each version of an
instruction. These are the actual bytecodes understood directly by the
processor, expressed in hexadecimal (though they are not prefixed with 0x).
Some instruction opcodes are more than one byte long; in such cases, the
byte lower in memory is listed first. Most opcodes are followed by one or
more symbols, which generally supply the types (sizes) and/or the locations
of the operands. The symbols used are:
/digit (digit is a number from 0 to 7): the value of the reg field of the
ModR/M byte. The instruction has a single register-or-memory operand,
addressed by the mod and r/m fields. (There may be another, immediate,
operand.) The digit specifies 3 more bits of opcode, an extension to
the instruction's opcode.
/r: The instruction has two operands, specified in the ModR/M byte. They
are a register, encoded in the reg field, and a register/memory operand,
encoded in the combined mod and r/m fields. Exactly how the two
operands are interpreted (e.g. source and destination) is specific to
the instruction.
ib, iw, id: a 1, 2, or 4 byte immediate operand, respectively. The value
to be used is read from the instruction stream, directly following the
instruction opcode (and ModR/M byte, if any). (Note: id, a 4-byte
operand, is only available on the 386 and later processors.)
cb, cw, cd, cp: a 1, 2, 4, or 6 byte immediate operand, where the value is
used to calculate a pointer to executable code, as the target of a CALL
or JUMP instructions. The value may be offset to the current IP
(instruction pointer), or it may be absolute. The absolute forms may
contain a segment address in addition to the segment offset. (On the
386 and later processors, cd may be a 32-bit offset; in protected-mode
only, cp is a 16-bit segment selector and a 32-bit segment offset.)
+rb, +rw, +rd: The instruction has a register operand encoded directly in
the opcode. The register code is added to the base opcode to form the
actual opcode. (Note: +rd is only available on the 386 and later
processors.) The register codes are as follows:
+------+-----+-----+-----+-----+-----+-----+-----+-----+
| code | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+------+-----+-----+-----+-----+-----+-----+-----+-----+
| rb | AL | CL | DL | BL | AH | CH | DH | BH |
| rw | AX | CX | DX | BX | SP | BP | SI | DI |
| rd | EAX | ECX | EDX | EBX | ESP | EBP | ESI | EDI |
+------+-----+-----+-----+-----+-----+-----+-----+-----+
The explanation section of each instruction where this is used includes
the full list of opcodes and their corresponding operands.
</pre>
<ol type="A"><li value="4">
<h4 id="Operands">Explanation of Operands</h4>
</li></ol>
<pre>
D) Explanation of Operands:
In the instruction syntax, several symbols are use to represent operands,
the symbols describing what and where the operands are. In the standard
Intel syntax, the first operand is the DEST (destination) operand, and the
second operand is the SRC (source) operand. In the standard AT&T syntax,
the order of the operands is reversed: the first operand is SRC, the second
is DEST. For example, the Intel "MOV AX, BX" means "Move (copy) to AX the
value of BX" (in pseudo-code, this would be "AX = BX"). In AT&T syntax,
this is expressed as "movw %bx, %ax", meaning "Move (copy) the value of BX
to AX."
In addition, AT&T syntax requires that all register names be prefixed with
a percent sign ("%"), as in above "%bx" and "%ax", and that all constants,
static, or immediate operands be prefixed with an ampersand ("&"). Also,
the instruction mnemonic must have its size (Byte, Word, Long) affixed. In
the Intel syntax, this information is understood from the types or the
operands - for memory references, the data definition directive (DB, DW, or
DD) is used as a guide, but where ambiguous, BYTE PTR, WORD PTR, or DWORD
PTR can be used.
Another difference is how numbers are expressed: while both syntaxes
understand numbers to be decimal (base 10) by default, in the Intel syntax,
hexadecimal numbers are expressed by supplying an "H" or "h" suffix and
binary numbers by a "B" or "b" suffix; the AT&T syntax expects hexadecimal
numbers to be prefixed with "0x" or "0X". (To avoid being mistaken for
labels, number is the Intel syntax must begin with a digit; if the first
digit of a hexadecimal number is a letter, the number must begin with a 0,
such as 0ABh instead of ABh.) The effect of this is that the AT&T syntax
is much simpler to parse, but perhaps less easy to read and write.
In this reference, the Intel syntax is given. The following symbols are
used to indicate the instruction operands:
r8 - an 8-bit (byte) register; one of AL, CL, DL, BL, AH, CH, DH, or BH.
r16 - a 16-bit (word) register; one of AX, CX, DX, BX, SP, BP, SI, or DI.
r32 - a 32-bit (double-word/long) register; one of EAX, ECX, EDX, EBX, ESP,
EBP, ESI, or EDI (only on the 386 and later processors).
imm8 - an immediate 8-bit (byte) value, 0x00 to 0xFF. imm8 is a signed
number in the range {-128 to +127}. This value may be sign-extended to
a larger size (16 or 32 bits) - the upper byte(s) filled with a copy of
the sign-bit (bit 7) - to match the size of the other operand in some
arithmetic operations.
imm16 - an immediate 16-bit (2 bytes, or a word) value, 0x0000 to 0xFFFF.
imm16 is a signed integer in the range {-32,768 to +32,767}.
imm32 - an immediate 32-bit (4 bytes, or a double-word or long) value,
0x0000'0000 to 0xFFFF'FFFF. imm32 is a signed integer in the range
{-2,147,483,648 to +2,147,483,647}. (386 an later processors only.)
r/m8 - an 8-bit (byte) value, addressed by the mod and r/m fields of the
ModR/M byte: either the contents of a byte register (see r8), or a byte
from memory.
r/m16 - a 16-bit (word) value, either in a word register (see r16) or in
two consecutive bytes in memory. The address of the low-order byte is
given by the mod and r/m fields of the ModR/M byte, and the high-order
byte is one byte higher in memory (little-endian).
r/m32 - a 32-bit (double-word/long) value, either in a double-word register
(see r32) or in 4 consecutive bytes in memory, the low-order byte of
which is addressed by the mod and r/m fields of the ModR/M byte.
rel8 - an 8-bit signed immediate value, in the range {-128 to +127} (0x00
to 0xFF), specifying a relative address in the same code segment,
offset from the first byte in the following instruction. The value is
sign-extended and added to the eIP register, to effect a short jump.
rel16 - a 16-bit signed value, in the range {-32,768 to +32,767} (0x0000 to
0xFFFF) specifying a near (same code segment) relative address, offset
from the first byte of the following instruction. The word value is
added to the eIP register, to effect a near jump.
rel32 - a 32-bit signed immediate value, in the range {-2,147,483,648 to
+2,147,483,647} (0x0000'0000 to 0xFFFF'FFFF), specifying a near (same
code segment) relative address, offset from the first byte of the
following instruction. The double-word value is added to the EIP
register, to effect a near jump (on the 386 and later only).
ptr16:16 - an immediate far pointer, consisting of two parts: a 16-bit
segment selector (or base address, in real mode), and a 16-bit offset
within that segment. Read as a double-word in little-endian form, the
low-order word (lowest in memory, earlier in the instruction stream) is
the offset within the segment, and the high-order word (highest in
memory, later in the instruction stream) is the segment selector value.
Generally, the segment selector indicates a code segment, intended for
the CS register.
ptr16:32 - an immediate far pointer (on the 386 and later processors only),
consisting of a 16-bit segment selector or value, generally for the CS
register, and a 32-bit offset within that segment. The offset is lower
in memory (earlier in the instruction stream), low-order byte first.
m16:16 - a far pointer stored in memory (addressed by the ModR/M byte),
consisting of a 16-bit segement selector (stored lower in memory) and a
16-bit offset (or address) within that segment.
m16:32 - a 6-byte far pointer, stored in memory and addressed by the ModR/M
byte, consisting of (in order) a 16-bit segment selector and a 32-bit
offset within that segment.
m8 - an 8-bit (byte) operand of a string instruction.
m16 - a 16-bit (word) operand of a string instruction.
m32 - a 32-bit (long) operand of a string instruction.
Note: the m8, m16, and m32 operands are addressed by the register pairs
DS:eSI (use of the DS segement register may be overridden) for reading,
and ES:eDI (ES cannot be overridden if there are two operands) for
writing.
Sreg - a (16-bit) segement selector register. (See tables 1 and 2 for the
register codes of the segment selector registers.)
m16&16 - a pair of (16-bit) words, stored in memory and addressed by the
ModR/M byte. This operand form is used by the BOUND instruction to
give the lower bound (the word lower in memory) and the upper bound
(the word higher in memory) of an array or memory block.
m32&32 - a pair of (32-bit) double-words, stored in memory and addressed by
the ModR/M byte. This operand form is used by the BOUND instruction on
the 386 and later processors, to give the lower and upper bounds of an
array or memory block.
m16&32 - a 6-byte block consisting of a word followed by a double-word,
stored at the memory location addressed by the ModR/M byte. This
operand form is used by the LIDT/LGDT instructions to provide a word
for the LIMIT field and a double-word for the BASE field of the Global
and Interrupt Descriptor Table Registers.
moffs8 - a byte in memory, addressed by an immediate near pointer (16- or
32-bit), which is an offset into the current data (or overridden)
segment; used by some forms of the MOV instruction.
moffs16 - a word in memory, addressed by an immediate near (16 or 32-bit)
pointer, an offset into the current data (or overridden) segment; used
by some forms of the MOV instruction.
moffs32 - a double-word (long) in memory, addressed by an immediate near
pointer, an offset into the current data (or overridden) segment; used
by some forms of the MOV instruction.
Note: The operand size attribute of the instruction, whether specified by the
Code Segment status or by an override, indicates whether the operand is
16-bits long (r16, imm16, r/m16, rel16, ptr16:16, m16:16, m16, m16&16, and
moffs16) or 32-bits long (r32, imm32, r/m32, rel32, ptr16:32, m16:32, m32,
m32&32, and moffs32). The address-size attribute, whether specified by the
Code Segment status or by an override, indicates whether the near pointer
used by moffs16 or moffs32 is 16 bits or 32 bits (2 bytes or 4 bytes) long.
The operand of the LGDT and LIDT instructions is m16&32 regardless of the
operand-size attribute, although the high-order byte (bits 40 - 47) are not
used if the operand-size is 16 bits. (In effect, it is read as m16&24.)
</pre>
PART III: INSTRUCTION REFERENCE:
(PartIII-instrref.txt)
<footer>
<h4>Return to <a href="index.html">Table of Contents</a></h4>
<ol type="I">
<li><a href="instrfmt.html#PartI">PART I - INSTRUCTION FORMAT</a></li>
<li><a href="reffmt.html#PartII">PART II: EXPLANATION OF REFERENCE FORMAT
</a></li>
<li><a href="instrref.html#PartIII">PART III: INSTRUCTION REFERENCE</a></li>
</ol>
<small>Copyright © 2017 by Menachem A. Salomon</small>
</footer>
</body>
</html>