Vertex Shader ISA
The Tegra vertex shader ISA is a relatively straightforward implementation of the Shader Model 2 instruction set. The instruction set seems to be a strict subset of the NV30 vertex shader ISA.
Each instruction contains up to two operations: one 4-component vector ALU operation and one scalar SFU (special function unit) operation. The result of each unit is a 4-component vector, limited by a write-mask.
There are five operands: one destination register per unit (referred to as rD) and three source operands (referred to as rA, rB and rC). The ALU can use up to all three source operands, while the SFU only operates on rC (if at all).
The vertex processor has 32 local vec4 registers, 16 input vec4 attribute registers, 256 input vec4 constant registers, 16 export vec4 registers, 2 condition registers and 4 address registers. The maximum size of a vertex program is 256 VLIW instructions.
http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nv30/nv30_vertprog.h
http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nv30/nv30_vertprog.c
http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nv30/nvfx_vertprog.c
https://www.opengl.org/registry/specs/NV/vertex_program.txt
https://www.opengl.org/registry/specs/NV/vertex_program2.txt
https://www.opengl.org/registry/specs/NV/vertex_program3.txt
https://www.google.com/patents/US7755634
Bits | Meaning |
---|---|
127 | dummy / unused |
126 | export vector write enable |
125 | condition flags write enable |
124 | export relative addressing enable |
123 | attribute relative addressing enable |
122 | saturate result |
121 | condition register index |
120 | force fetched address register to zero (see the note on bit 120) |
119 | rC absolute value |
118 | rB absolute value |
117 | rA absolute value |
111..116 | vector destination register |
110 | condition set |
109 | condition check |
108 | predicate - greater than 0 |
107 | predicate - equal to 0 |
106 | predicate - less than 0 |
98..105 | predicate swizzle |
96..97 | address register select |
91..95 | scalar opcode |
86..90 | vector opcode |
76..85 | constant fetch index |
72..75 | attribute fetch index |
71 | rA negate |
63..70 | rA swizzle |
57..62 | rA register |
55..56 | rA type |
54 | rB negate |
46..53 | rB swizzle |
40..45 | rB register |
38..39 | rB type |
37 | rC negate |
29..36 | rC swizzle / IADDR |
23..28 | rC register |
21..22 | rC type |
17..20 | scalar op write-mask |
13..16 | vector op write-mask |
7..12 | scalar destination register |
2..6 | export write index |
1 | constant relative addressing enable |
0 | end of program |
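The bit layout above can be turned into a small field extractor. The following is a minimal sketch, not a verified decoder: it assumes the instruction is held as one 128-bit integer with bit 0 as the least significant bit (the ordering of the four 32-bit words in memory is not stated here), and the names `FIELDS`, `extract` and `decode` are hypothetical. Only a subset of the fields is listed.

```python
# Inclusive (low, high) bit ranges copied from the encoding table.
# Assumption: the instruction is one 128-bit integer, bit 0 = LSB.
FIELDS = {
    "end_of_program": (0, 0),
    "const_rel_addr_enable": (1, 1),
    "export_write_index": (2, 6),
    "scalar_rd": (7, 12),
    "vector_write_mask": (13, 16),
    "scalar_write_mask": (17, 20),
    "rc_type": (21, 22),
    "rc_register": (23, 28),
    "rc_swizzle": (29, 36),
    "vector_opcode": (86, 90),
    "scalar_opcode": (91, 95),
    "vector_rd": (111, 116),
}


def extract(word: int, lo: int, hi: int) -> int:
    """Return the unsigned value stored in bits lo..hi (inclusive)."""
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)


def decode(word: int) -> dict:
    """Decode the listed fields of a 128-bit instruction word."""
    return {name: extract(word, lo, hi) for name, (lo, hi) in FIELDS.items()}
```

For example, `decode(word)["vector_opcode"]` yields the value to look up in the vector opcode table.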
Vector opcode | Mnemonic | Meaning |
---|---|---|
0 | NOP | |
1 | MOV | rD = rA |
2 | MUL | rD = rA * rB |
3 | ADD | rD = rA + rC |
4 | MAD | rD = rA * rB + rC |
5 | DP3 | rD = dot(rA.xyz, rB.xyz) |
6 | DPH | rD = dot(rA, vec4(rB.xyz, 1.0)) |
7 | DP4 | rD = dot(rA, rB) |
8 | DST | rD = vec4(1.0, rA.y * rB.y, rB.z, rA.w) |
9 | MIN | rD = min(rA, rB) |
10 | MAX | rD = max(rA, rB) |
11 | SLT | rD = lessThan(rA, rB) |
12 | SGE | rD = greaterThanEqual(rA, rB) |
13 | ARL | A0 = floor(rA) |
14 | FRC | rD = fract(rA) |
15 | FLR | rD = floor(rA) |
16 | SEQ | rD = equal(rA, rB) |
17 | SFL | rD = bvec4(false, false, false, false) |
18 | SGT | rD = greaterThan(rA, rB) |
19 | SLE | rD = lessThanEqual(rA, rB) |
20 | SNE | rD = notEqual(rA, rB) |
21 | STR | rD = bvec4(true, true, true, true) |
22 | SSG | rD = sign(rA) |
23 | ARR | A0 = round(rA) |
24 | MVA | A0.x = A0.z = A0.x + A0.z; A0.y = A0.w = A0.y + A0.w |
25 | TXL | ??? |
26 | PSH | push(A0) |
27 | POP | A0 = pop() |
28..31 | ??? | ??? |
Scalar opcode | Mnemonic | Meaning |
---|---|---|
0 | NOP | |
1 | MOV | rD = rC |
2 | RCP | rD = 1.0 / rC |
3 | RCC | rD = clamp(1.0 / abs(rC), pow(2.0, -64.0), pow(2.0, 64.0)) * sign(rC) |
4 | RSQ | rD = 1.0 / sqrt(rC) |
5 | EXP | rD = vec4(pow(2.0, floor(rC.x)), fract(rC.x), pow(2.0, rC.x), 1.0) |
6 | LOG | rD = vec4(floor(log2(abs(rC.x))), abs(rC.x) / pow(2.0, floor(log2(abs(rC.x)))), log2(abs(rC.x)), 1.0) |
7 | LIT | rD = vec4(1.0, max(rC.x, 0.0), rC.x > 0.0 ? pow(max(rC.y, 0.0), clamp(rC.w, -128.0, 128.0)) : 0.0, 1.0) |
8 | BRA | ??? |
9 | BRI | jump(IADDR) |
10 | CLA | ??? |
11 | CLI | push(IP); jump(IADDR) |
12 | RET | jump( pop() ) |
13 | LG2 | rD = log2(rC) |
14 | EX2 | rD = exp2(rC) |
15 | SIN | rD = sin(rC) |
16 | COS | rD = cos(rC) |
17..18 | ??? | ??? |
19 | PUSHA | push(A0) |
20 | POPA | A0 = pop() |
Swizzle bits | Meaning |
---|---|
6..7 | select x-component |
4..5 | select y-component |
2..3 | select z-component |
0..1 | select w-component |
Swizzle selector value | Meaning |
---|---|
0 | source x-component |
1 | source y-component |
2 | source z-component |
3 | source w-component |
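The two swizzle tables combine as follows. This is a minimal sketch with hypothetical helper names (`decode_swizzle`, `apply_swizzle`); it only restates the bit positions and selector values given in the tables.

```python
COMPONENTS = "xyzw"


def decode_swizzle(swizzle: int) -> str:
    """Turn an 8-bit swizzle field into a string such as 'xyzw'.

    Bits 6..7 select the x-component, 4..5 the y-component,
    2..3 the z-component and 0..1 the w-component.
    """
    return "".join(COMPONENTS[(swizzle >> shift) & 3] for shift in (6, 4, 2, 0))


def apply_swizzle(source, swizzle: int):
    """Return the 4-component source vector rearranged by the swizzle."""
    return [source[(swizzle >> shift) & 3] for shift in (6, 4, 2, 0)]
```

With this layout the identity swizzle `.xyzw` encodes as `0x1B`, and `0x00` replicates the x-component into all four lanes.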
Write-mask bits | Meaning |
---|---|
3 | write x-component |
2 | write y-component |
1 | write z-component |
0 | write w-component |
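A write-mask can be applied to a result as sketched below. The function name `apply_write_mask` is hypothetical; the bit assignment (bit 3 for x down to bit 0 for w) is taken from the table above.

```python
def apply_write_mask(dest, result, mask: int):
    """Return dest with the components enabled in the 4-bit mask
    replaced by the matching components of result.

    Bit 3 enables x, bit 2 enables y, bit 1 enables z, bit 0 enables w.
    """
    return [
        result[i] if (mask >> (3 - i)) & 1 else dest[i]
        for i in range(4)
    ]
```

A mask of `0b1111` writes the whole vector, while `0b0000` leaves the destination untouched.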
Source register type | Meaning |
---|---|
0 | invalid/attribute |
1 | temporary |
2 | attribute |
3 | constant |
There are 2 condition registers; the "condition register index" bit selects which one to use. The condition register state is stored per .xyzw component. Every component is set to "equal to 0" at the start of vertex program execution and is then updated, per component, from the rD.xyzw result of the executed instruction to one of the following states:
- rD.c less than 0.0
- rD.c equal to 0.0
- rD.c greater than 0.0
To update the contents of the condition register, both the "condition set" and "condition flags write enable" bits must be set, and the resulting register component must be enabled in the op write-mask. The result of the vector operation takes precedence over that of the scalar operation.
To execute an instruction conditionally, the "condition check" bit must be set together with the "predicate - *" bits. A component of the destination register is updated only if the state of the corresponding condition register component, selected by the predicate swizzle, satisfies the tested predicate.
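The predicate test can be sketched as below. Several points are assumptions rather than documented behaviour: that multiple "predicate - *" bits combine as a logical OR (so "greater than" plus "equal" acts as >=), that the predicate swizzle uses the same bit layout as the source swizzle, and that NaN handling can be ignored. The names `classify` and `predicate_mask` are hypothetical.

```python
def classify(value: float) -> str:
    """Map a result component to its condition state: 'lt', 'eq' or 'gt'.

    NaN handling is not described in the text; this sketch reports 'eq'.
    """
    if value < 0.0:
        return "lt"
    if value > 0.0:
        return "gt"
    return "eq"


def predicate_mask(cond, swizzle: int, lt: bool, eq: bool, gt: bool):
    """Return four booleans telling which destination components may be written.

    cond    -- condition register state per component, e.g. ['lt', 'eq', 'gt', 'eq']
    swizzle -- 8-bit predicate swizzle (assumed same layout as a source swizzle)
    lt/eq/gt -- the three "predicate - *" bits (assumed to be OR-ed together)
    """
    allowed = {name for name, on in (("lt", lt), ("eq", eq), ("gt", gt)) if on}
    return [cond[(swizzle >> shift) & 3] in allowed for shift in (6, 4, 2, 0)]
```

The returned list can be combined with the op write-mask to decide which components of rD are actually updated.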
A predicate (comparison) vector instruction sets the corresponding components of the destination register to 1.0f (true) or 0.0f (false).
If the vector opcode is not NOP and the vector rD is the same as the scalar rD, the vector result takes precedence.
The first component of the swizzled source register rC is the actual source of the scalar operation. The result of that operation is replicated to all .xyzw components of the destination register rD, subject to the write-mask.
The scalar MOV behaves like the vector MOV, i.e. it fetches and writes all .xyzw components.
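The scalar broadcast rule can be illustrated with a short sketch. `run_scalar` is a hypothetical helper that takes the operation as a Python callable; it covers the replicating scalar operations only, not MOV, which copies all four components.

```python
def run_scalar(func, rc, swizzle: int, dest, write_mask: int):
    """Apply a one-input scalar operation and replicate its result.

    The input is the first (x) component of the swizzled rC, i.e. the
    source component selected by swizzle bits 6..7. The single result is
    written to every destination component enabled in the write-mask
    (bit 3 = x ... bit 0 = w).
    """
    first = rc[(swizzle >> 6) & 3]
    value = func(first)
    return [
        value if (write_mask >> (3 - i)) & 1 else dest[i]
        for i in range(4)
    ]
```

For instance, passing `lambda x: 1.0 / x` as `func` models RCP without any of its special-case handling.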
Program execution aborts if:
- rD, rA, rB or rC is set to an invalid value, even if it is not used by the particular instruction. For rA, rB and rC the invalid range is 32-63; for rD it is 32-62.
- The export register number is an invalid value (16-30).
- The stack overflows or underflows.
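The register and export checks in the list above can be expressed as a single predicate. This is a sketch under assumptions: it treats the vector and scalar destination registers alike, it applies the source check regardless of the register type, and it leaves the stack conditions out. The function name `instruction_aborts` is hypothetical.

```python
def instruction_aborts(vector_rd: int, scalar_rd: int,
                       ra: int, rb: int, rc: int,
                       export_index: int) -> bool:
    """Return True if the register numbers alone would abort execution.

    Source registers are invalid in the range 32..63, destination
    registers in the range 32..62 (63 is the dummy destination), and
    export indices in the range 16..30 (31 disables the export).
    """
    if any(32 <= reg <= 63 for reg in (ra, rb, rc)):
        return True
    if any(32 <= reg <= 62 for reg in (vector_rd, scalar_rd)):
        return True
    return 16 <= export_index <= 30
```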
To use the result of a scalar or vector operation further down the shader pipeline, it must be stored in an export register.
To write to an export register:
- Set the "export write index" bitfield to the required valid export register number (0-15, 31).
- To write the result of the vector instruction, set the "export vector write enable" bit. rD must be either a valid destination register (0-31) or the dummy register 63, which is used when only the export write is wanted, without clobbering any local register.
- To write the result of the scalar instruction, clear the "export vector write enable" bit.
When export register number is 31, export is disabled.
The respective components of the resultant vector of the executed instruction, enabled by the vector/scalar write-mask, will be written to the export register, so consecutively executed instructions may alter only required export register components.
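The masked export write described above can be sketched as follows, without relative addressing. The name `export_write` and the use of a Python exception to stand in for a program abort are assumptions made for illustration.

```python
EXPORT_DISABLED = 31


def export_write(exports, index: int, result, write_mask: int) -> None:
    """Write the mask-enabled components of result into exports[index].

    exports is a list of 16 mutable 4-component lists. Index 31 disables
    the export; indices 16..30 are invalid and are modelled here by
    raising an exception in place of the hardware abort.
    """
    if index == EXPORT_DISABLED:
        return
    if not 0 <= index <= 15:
        raise ValueError("invalid export register number")
    for i in range(4):
        if (write_mask >> (3 - i)) & 1:  # bit 3 = x ... bit 0 = w
            exports[index][i] = result[i]
```

Two consecutive instructions can therefore fill different components of the same export register, for example one writing `.x` and the next writing `.w`.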
To use relative addressing:
- The "export relative addressing enable" bit must be set.
- The "address register select" bitfield must select the required address register.
export write index = A0.c + export write index
The vertex export index 0 seems to be dedicated solely to gl_Position.
There are 4 relative base address registers (A0.xyzw). The ARL (address register load, rA floored) and ARR (address register load, rA rounded) vector operations modify the address registers: each component of the source register rA.xyzw is written to the corresponding address register component. The ARA (address register addition) operation, whose semantics match the entry listed as MVA in the vector opcode table, adds pairs of address register components together, so that A0 = (A0.x + A0.z, A0.y + A0.w, A0.x + A0.z, A0.y + A0.w).
The destination vector register write-mask enables writes to the individual address register components; the destination vector register itself is not affected (as on NV30). The address register value can be negative.
Note on the ARR/ARL/ARA instructions: the destination vector register number (rD) should be even, otherwise the address register is not updated.
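A minimal model of ARL and ARA, including the write-mask and the even-rD note, might look like this. The function names are hypothetical, ARR is left out because its exact rounding mode is not stated, and applying the write-mask to ARA is an assumption.

```python
import math


def _masked(a0, new, write_mask: int, rd: int):
    """Shared update rule: an odd rD leaves A0 unchanged, otherwise only
    the components enabled in the write-mask are replaced."""
    if rd % 2 != 0:
        return list(a0)
    return [new[i] if (write_mask >> (3 - i)) & 1 else a0[i] for i in range(4)]


def arl(a0, ra, write_mask: int, rd: int = 0):
    """ARL: load floor(rA) into the mask-enabled components of A0."""
    return _masked(a0, [math.floor(c) for c in ra], write_mask, rd)


def ara(a0, write_mask: int = 0b1111, rd: int = 0):
    """ARA: A0 = (x + z, y + w, x + z, y + w).
    Assumption: the write-mask applies here as it does for ARL."""
    xz, yw = a0[0] + a0[2], a0[1] + a0[3]
    return _masked(a0, [xz, yw, xz, yw], write_mask, rd)
```

Note that `math.floor` rounds toward negative infinity, so a source value of `-0.5` loads `-1`, consistent with address registers being allowed to hold negative values.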
Note on bit 120: when it is set, the fetched address register is overridden with A0.xyzw = (0.0f, 0.0f, 0.0f, 0.0f).
Address register select | Component |
---|---|
0 | A0.x |
1 | A0.y |
2 | A0.z |
3 | A0.w |
The address register A0.xyzw can be pushed to and popped from the stack with the scalar PUSHA/POPA operations or the vector PSH/POP operations. If one instruction contains identical scalar and vector push/pop operations, only one push/pop is actually performed; if the two operations differ, the instruction acts as a no-op.
To source rA, rB or rC from a constant, set its type to "constant" and point the "constant fetch index" bitfield (range 0..1023) at the required constant.
To use relative addressing:
- The "constant relative addressing enable" bit must be set.
- The "address register select" bitfield must select the required address register.
Since the range of "constant fetch index" is 0..1023, the valid address register range is -1023..1023.
When the fetched constant index is out of range, the fetched value is vec4(0.0f, 0.0f, 0.0f, 0.0f) if the index is greater than 1023, and constant[1] if the index is less than 0.
fetched constant index = A0.c + constant fetch index
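The constant fetch rules, including the out-of-range behaviour, can be summarised in a short sketch. `fetch_constant` and its parameters are hypothetical names; `constants` is assumed to be a list of 1024 vec4 values.

```python
def fetch_constant(constants, fetch_index: int,
                   a0=None, select: int = 0, relative: bool = False):
    """Return the vec4 constant read by a source operand.

    With relative addressing enabled, the address register component
    chosen by 'select' (0 = A0.x ... 3 = A0.w) is added to the index.
    An index above 1023 reads as zero; a negative index reads constant[1].
    """
    index = fetch_index
    if relative:
        index = a0[select] + fetch_index
    if index > 1023:
        return [0.0, 0.0, 0.0, 0.0]
    if index < 0:
        return constants[1]
    return constants[index]
```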
To source rA, rB or rC from a vertex attribute, set its type to "attribute" and point the "attribute fetch index" bitfield at the required vertex attribute.
To use relative addressing:
- The "attribute relative addressing enable" bit must be set.
- The "address register select" bitfield must select the required address register.
fetched attribute index = A0.c + attribute fetch index
The scalar BRA operation is used to jump to an arbitrary instruction. Following nouveau terminology, the argument of the BRA operation is the destination instruction ID (IADDR), which is embedded in the rC swizzle bitfield.
A branch is taken only if the instruction's predicate test passes, so the "predicate - *" bits must be used for a branching instruction.
The scalar instructions CAL and RET are used for calling and returning from functions. A function call is essentially a branch with the instruction pointer pushed to the stack. A return is a branch to the address popped from the stack. Both CAL and RET require the predicate in the same way as a branch, and the IADDR of CAL is likewise embedded in the rC swizzle bitfield.
The stack is shared by the PUSHA/POPA and CAL/RET operations. The maximum depth of the stack is 8.
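The shared stack and the call/return mechanism can be modelled as below. This is a sketch with hypothetical names (`ShaderStack`, `call`, `ret`); a Python exception stands in for the program abort on overflow or underflow, and whether the pushed instruction pointer refers to the calling instruction or to the one after it is left to the caller because it is not stated here.

```python
class ShaderStack:
    """Stack shared by address-register pushes and function calls."""

    MAX_DEPTH = 8

    def __init__(self):
        self._items = []

    def push(self, value) -> None:
        if len(self._items) >= self.MAX_DEPTH:
            raise RuntimeError("stack overflow aborts the program")
        self._items.append(value)

    def pop(self):
        if not self._items:
            raise RuntimeError("stack underflow aborts the program")
        return self._items.pop()


def call(stack: ShaderStack, return_ip: int, iaddr: int) -> int:
    """Function call: push the return address, then jump to IADDR."""
    stack.push(return_ip)
    return iaddr


def ret(stack: ShaderStack) -> int:
    """Function return: jump to the address popped from the stack."""
    return stack.pop()
```

Because the stack is shared, a pushed A0 value that is not popped before the return would be consumed as the return address in this model.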
If the "saturate result" bit is set, both the vector and scalar operation results are clamped to the range [0.0f, 1.0f].
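Saturation amounts to a per-component clamp, as in this minimal sketch (the name `saturate` is hypothetical and NaN inputs are not considered):

```python
def saturate(vec):
    """Clamp every component of a result vector to [0.0, 1.0]."""
    return [min(max(c, 0.0), 1.0) for c in vec]
```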