1 files changed, 1474 insertions, 0 deletions
diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
new file mode 100644
index 0000000000..ecab7cb809
--- /dev/null
+++ b/src/gallium/docs/source/tgsi.rst
@@ -0,0 +1,1474 @@
+TGSI
+====
+
+TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
+for describing shaders. Since Gallium is inherently shaderful, shaders are
+an important part of the API. TGSI is the only intermediate representation
+used by all drivers.
+
+Basics
+------
+
+All TGSI instructions, known as *opcodes*, operate on arbitrary-precision
+floating-point four-component vectors. An opcode may have up to one
+destination register, known as *dst*, and between zero and three source
+registers, called *src0* through *src2*, or simply *src* if there is only
+one.
+
+Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
+components as integers. Other instructions permit using registers as
+two-component vectors with double precision; see :ref:`Double Opcodes`.
+
+When an instruction has a scalar result, the result is usually copied into
+each of the components of *dst*. When this happens, the result is said to be
+*replicated* to *dst*. :opcode:`RCP` is one such instruction.
+
+Instruction Set
+---------------
+
+Core ISA
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These opcodes are guaranteed to be available regardless of the driver being
+used.
+
+.. opcode:: ARL - Address Register Load
+
+.. math::
+
+  dst.x = \lfloor src.x\rfloor
+
+  dst.y = \lfloor src.y\rfloor
+
+  dst.z = \lfloor src.z\rfloor
+
+  dst.w = \lfloor src.w\rfloor
+
+
+.. opcode:: MOV - Move
+
+.. math::
+
+  dst.x = src.x
+
+  dst.y = src.y
+
+  dst.z = src.z
+
+  dst.w = src.w
+
+
+.. opcode:: LIT - Light Coefficients
+
+.. math::
+
+  dst.x = 1
+
+  dst.y = max(src.x, 0)
+
+  dst.z = (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128))} : 0
+
+  dst.w = 1
+
+
+.. opcode:: RCP - Reciprocal
+
+This instruction replicates its result.
+
+.. math::
+
+  dst = \frac{1}{src.x}
+
+
+.. opcode:: RSQ - Reciprocal Square Root
+
+This instruction replicates its result.
+
+.. math::
+
+  dst = \frac{1}{\sqrt{|src.x|}}
+
+
+.. opcode:: EXP - Approximate Exponential Base 2
+
+.. math::
+
+  dst.x = 2^{\lfloor src.x\rfloor}
+
+  dst.y = src.x - \lfloor src.x\rfloor
+
+  dst.z = 2^{src.x}
+
+  dst.w = 1
+
+
+.. opcode:: LOG - Approximate Logarithm Base 2
+
+.. math::
+
+  dst.x = \lfloor\log_2{|src.x|}\rfloor
+
+  dst.y = \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}}
+
+  dst.z = \log_2{|src.x|}
+
+  dst.w = 1
+
+
+.. opcode:: MUL - Multiply
+
+.. math::
+
+  dst.x = src0.x \times src1.x
+
+  dst.y = src0.y \times src1.y
+
+  dst.z = src0.z \times src1.z
+
+  dst.w = src0.w \times src1.w
+
+
+.. opcode:: ADD - Add
+
+.. math::
+
+  dst.x = src0.x + src1.x
+
+  dst.y = src0.y + src1.y
+
+  dst.z = src0.z + src1.z
+
+  dst.w = src0.w + src1.w
+
+
+.. opcode:: DP3 - 3-component Dot Product
+
+This instruction replicates its result.
+
+.. math::
+
+  dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
+
+
+.. opcode:: DP4 - 4-component Dot Product
+
+This instruction replicates its result.
+
+.. math::
+
+  dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
+
+
+.. opcode:: DST - Distance Vector
+
+.. math::
+
+  dst.x = 1
+
+  dst.y = src0.y \times src1.y
+
+  dst.z = src0.z
+
+  dst.w = src1.w
+
+
+.. opcode:: MIN - Minimum
+
+.. math::
+
+  dst.x = min(src0.x, src1.x)
+
+  dst.y = min(src0.y, src1.y)
+
+  dst.z = min(src0.z, src1.z)
+
+  dst.w = min(src0.w, src1.w)
+
+
+.. opcode:: MAX - Maximum
+
+.. math::
+
+  dst.x = max(src0.x, src1.x)
+
+  dst.y = max(src0.y, src1.y)
+
+  dst.z = max(src0.z, src1.z)
+
+  dst.w = max(src0.w, src1.w)
+
+
+.. opcode:: SLT - Set On Less Than
+
+.. math::
+
+  dst.x = (src0.x < src1.x) ? 1 : 0
+
+  dst.y = (src0.y < src1.y) ? 1 : 0
+
+  dst.z = (src0.z < src1.z) ? 1 : 0
+
+  dst.w = (src0.w < src1.w) ? 1 : 0
+
+
+.. opcode:: SGE - Set On Greater Equal Than
+
+.. math::
+
+  dst.x = (src0.x >= src1.x) ? 1 : 0
+
+  dst.y = (src0.y >= src1.y) ? 1 : 0
+
+  dst.z = (src0.z >= src1.z) ? 1 : 0
+
+  dst.w = (src0.w >= src1.w) ? 1 : 0
+
+
+.. opcode:: MAD - Multiply And Add
+
+.. math::
+
+  dst.x = src0.x \times src1.x + src2.x
+
+  dst.y = src0.y \times src1.y + src2.y
+
+  dst.z = src0.z \times src1.z + src2.z
+
+  dst.w = src0.w \times src1.w + src2.w
+
+
+.. opcode:: SUB - Subtract
+
+.. math::
+
+  dst.x = src0.x - src1.x
+
+  dst.y = src0.y - src1.y
+
+  dst.z = src0.z - src1.z
+
+  dst.w = src0.w - src1.w
+
+
+.. opcode:: LRP - Linear Interpolate
+
+.. math::
+
+  dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
+
+  dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
+
+  dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
+
+  dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
+
+
+.. opcode:: CND - Condition
+
+.. math::
+
+  dst.x = (src2.x > 0.5) ? src0.x : src1.x
+
+  dst.y = (src2.y > 0.5) ? src0.y : src1.y
+
+  dst.z = (src2.z > 0.5) ? src0.z : src1.z
+
+  dst.w = (src2.w > 0.5) ? src0.w : src1.w
+
+
+.. opcode:: DP2A - 2-component Dot Product And Add
+
+.. math::
+
+  dst.x = src0.x \times src1.x + src0.y \times src1.y + src2.x
+
+  dst.y = src0.x \times src1.x + src0.y \times src1.y + src2.x
+
+  dst.z = src0.x \times src1.x + src0.y \times src1.y + src2.x
+
+  dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x
+
+
+.. opcode:: FRC - Fraction
+
+.. math::
+
+  dst.x = src.x - \lfloor src.x\rfloor
+
+  dst.y = src.y - \lfloor src.y\rfloor
+
+  dst.z = src.z - \lfloor src.z\rfloor
+
+  dst.w = src.w - \lfloor src.w\rfloor
+
+
+.. opcode:: CLAMP - Clamp
+
+.. math::
+
+  dst.x = clamp(src0.x, src1.x, src2.x)
+
+  dst.y = clamp(src0.y, src1.y, src2.y)
+
+  dst.z = clamp(src0.z, src1.z, src2.z)
+
+  dst.w = clamp(src0.w, src1.w, src2.w)
+
+
+.. opcode:: FLR - Floor
+
+This is identical to :opcode:`ARL`.
+
+.. math::
+
+  dst.x = \lfloor src.x\rfloor
+
+  dst.y = \lfloor src.y\rfloor
+
+  dst.z = \lfloor src.z\rfloor
+
+  dst.w = \lfloor src.w\rfloor
+
+
+.. opcode:: ROUND - Round
+
+.. math::
+
+  dst.x = round(src.x)
+
+  dst.y = round(src.y)
+
+  dst.z = round(src.z)
+
+  dst.w = round(src.w)
+
+
+.. opcode:: EX2 - Exponential Base 2
+
+This instruction replicates its result.
+
+.. math::
+
+  dst = 2^{src.x}
+
+
+.. opcode:: LG2 - Logarithm Base 2
+
+This instruction replicates its result.
+
+.. math::
+
+  dst = \log_2{src.x}
+
+
+.. opcode:: POW - Power
+
+This instruction replicates its result.
+
+.. math::
+
+  dst = src0.x^{src1.x}
+
+.. opcode:: XPD - Cross Product
+
+.. math::
+
+  dst.x = src0.y \times src1.z - src1.y \times src0.z
+
+  dst.y = src0.z \times src1.x - src1.z \times src0.x
+
+  dst.z = src0.x \times src1.y - src1.x \times src0.y
+
+  dst.w = 1
+
+
+.. opcode:: ABS - Absolute
+
+.. math::
+
+  dst.x = |src.x|
+
+  dst.y = |src.y|
+
+  dst.z = |src.z|
+
+  dst.w = |src.w|
+
+
+.. opcode:: RCC - Reciprocal Clamped
+
+This instruction replicates its result.
+
+XXX cleanup on aisle three
+
+.. math::
+
+  dst = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)
+
+
+.. opcode:: DPH - Homogeneous Dot Product
+
+This instruction replicates its result.
+
+.. math::
+
+  dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
+
+
+.. opcode:: COS - Cosine
+
+This instruction replicates its result.
+
+.. math::
+
+  dst = \cos{src.x}
+
+
+.. opcode:: DDX - Derivative Relative To X
+
+.. math::
+
+  dst.x = partialx(src.x)
+
+  dst.y = partialx(src.y)
+
+  dst.z = partialx(src.z)
+
+  dst.w = partialx(src.w)
+
+
+.. opcode:: DDY - Derivative Relative To Y
+
+.. math::
+
+  dst.x = partialy(src.x)
+
+  dst.y = partialy(src.y)
+
+  dst.z = partialy(src.z)
+
+  dst.w = partialy(src.w)
+
+
+.. opcode:: KILP - Predicated Discard
+
+  discard
+
+
+.. opcode:: PK2H - Pack Two 16-bit Floats
+
+  TBD
+
+
+.. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
+
+  TBD
+
+
+.. opcode:: PK4B - Pack Four Signed 8-bit Scalars
+
+  TBD
+
+
+.. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars
+
+  TBD
+
+
+.. opcode:: RFL - Reflection Vector
+
+.. math::
+
+  dst.x = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.x - src1.x
+
+  dst.y = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.y - src1.y
+
+  dst.z = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.z - src1.z
+
+  dst.w = 1
+
+.. note::
+
+   Considered for removal.
+
+
+.. opcode:: SEQ - Set On Equal
+
+.. math::
+
+  dst.x = (src0.x == src1.x) ? 1 : 0
+
+  dst.y = (src0.y == src1.y) ? 1 : 0
+
+  dst.z = (src0.z == src1.z) ? 1 : 0
+
+  dst.w = (src0.w == src1.w) ? 1 : 0
+
+
+.. opcode:: SFL - Set On False
+
+This instruction replicates its result.
+
+.. math::
+
+  dst = 0
+
+.. note::
+
+   Considered for removal.
+
+
+.. opcode:: SGT - Set On Greater Than
+
+.. math::
+
+  dst.x = (src0.x > src1.x) ? 1 : 0
+
+  dst.y = (src0.y > src1.y) ? 1 : 0
+
+  dst.z = (src0.z > src1.z) ? 1 : 0
+
+  dst.w = (src0.w > src1.w) ? 1 : 0
+
+
+.. opcode:: SIN - Sine
+
+This instruction replicates its result.
+
+.. math::
+
+  dst = \sin{src.x}
+
+
+.. opcode:: SLE - Set On Less Equal Than
+
+.. math::
+
+  dst.x = (src0.x <= src1.x) ? 1 : 0
+
+  dst.y = (src0.y <= src1.y) ? 1 : 0
+
+  dst.z = (src0.z <= src1.z) ? 1 : 0
+
+  dst.w = (src0.w <= src1.w) ? 1 : 0
+
+
+.. opcode:: SNE - Set On Not Equal
+
+.. math::
+
+  dst.x = (src0.x != src1.x) ? 1 : 0
+
+  dst.y = (src0.y != src1.y) ? 1 : 0
+
+  dst.z = (src0.z != src1.z) ? 1 : 0
+
+  dst.w = (src0.w != src1.w) ? 1 : 0
+
+
+.. opcode:: STR - Set On True
+
+This instruction replicates its result.
+
+.. math::
+
+  dst = 1
+
+
+.. opcode:: TEX - Texture Lookup
+
+  TBD
+
+
+.. opcode:: TXD - Texture Lookup with Derivatives
+
+  TBD
+
+
+.. opcode:: TXP - Projective Texture Lookup
+
+  TBD
+
+
+.. opcode:: UP2H - Unpack Two 16-Bit Floats
+
+  TBD
+
+.. note::
+
+   Considered for removal.
+
+.. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars
+
+  TBD
+
+.. note::
+
+   Considered for removal.
+
+.. opcode:: UP4B - Unpack Four Signed 8-Bit Values
+
+  TBD
+
+.. note::
+
+   Considered for removal.
+
+.. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars
+
+  TBD
+
+.. note::
+
+   Considered for removal.
+
+.. opcode:: X2D - 2D Coordinate Transformation
+
+.. math::
+
+  dst.x = src0.x + src1.x \times src2.x + src1.y \times src2.y
+
+  dst.y = src0.y + src1.x \times src2.z + src1.y \times src2.w
+
+  dst.z = src0.x + src1.x \times src2.x + src1.y \times src2.y
+
+  dst.w = src0.y + src1.x \times src2.z + src1.y \times src2.w
+
+.. note::
+
+   Considered for removal.
+
+
+.. opcode:: ARA - Address Register Add
+
+  TBD
+
+.. note::
+
+   Considered for removal.
+
+.. opcode:: ARR - Address Register Load With Round
+
+.. math::
+
+  dst.x = round(src.x)
+
+  dst.y = round(src.y)
+
+  dst.z = round(src.z)
+
+  dst.w = round(src.w)
+
+
+.. opcode:: BRA - Branch
+
+  pc = target
+
+.. note::
+
+   Considered for removal.
+
+.. opcode:: CAL - Subroutine Call
+
+  push(pc)
+  pc = target
+
+
+.. opcode:: RET - Subroutine Call Return
+
+  pc = pop()
+
+  Potential restrictions:  
+  * Only occurs at end of function.
+
+.. opcode:: SSG - Set Sign
+
+.. math::
+
+  dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
+
+  dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
+
+  dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
+
+  dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
+
+
+.. opcode:: CMP - Compare
+
+.. math::
+
+  dst.x = (src0.x < 0) ? src1.x : src2.x
+
+  dst.y = (src0.y < 0) ? src1.y : src2.y
+
+  dst.z = (src0.z < 0) ? src1.z : src2.z
+
+  dst.w = (src0.w < 0) ? src1.w : src2.w
+
+
+.. opcode:: KIL - Conditional Discard
+
+.. math::
+
+  if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
+    discard
+  endif
+
+
+.. opcode:: SCS - Sine Cosine
+
+.. math::
+
+  dst.x = \cos{src.x}
+
+  dst.y = \sin{src.x}
+
+  dst.z = 0
+
+  dst.y = 1
+
+
+.. opcode:: TXB - Texture Lookup With Bias
+
+  TBD
+
+
+.. opcode:: NRM - 3-component Vector Normalise
+
+.. math::
+
+  dst.x = src.x / (src.x \times src.x + src.y \times src.y + src.z \times src.z)
+
+  dst.y = src.y / (src.x \times src.x + src.y \times src.y + src.z \times src.z)
+
+  dst.z = src.z / (src.x \times src.x + src.y \times src.y + src.z \times src.z)
+
+  dst.w = 1
+
+
+.. opcode:: DIV - Divide
+
+.. math::
+
+  dst.x = \frac{src0.x}{src1.x}
+
+  dst.y = \frac{src0.y}{src1.y}
+
+  dst.z = \frac{src0.z}{src1.z}
+
+  dst.w = \frac{src0.w}{src1.w}
+
+
+.. opcode:: DP2 - 2-component Dot Product
+
+This instruction replicates its result.
+
+.. math::
+
+  dst = src0.x \times src1.x + src0.y \times src1.y
+
+
+.. opcode:: TXL - Texture Lookup With LOD
+
+  TBD
+
+
+.. opcode:: BRK - Break
+
+  TBD
+
+
+.. opcode:: IF - If
+
+  TBD
+
+
+.. opcode:: ELSE - Else
+
+  TBD
+
+
+.. opcode:: ENDIF - End If
+
+  TBD
+
+
+.. opcode:: PUSHA - Push Address Register On Stack
+
+  push(src.x)
+  push(src.y)
+  push(src.z)
+  push(src.w)
+
+.. note::
+
+   Considered for cleanup.
+
+.. note::
+
+   Considered for removal.
+
+.. opcode:: POPA - Pop Address Register From Stack
+
+  dst.w = pop()
+  dst.z = pop()
+  dst.y = pop()
+  dst.x = pop()
+
+.. note::
+
+   Considered for cleanup.
+
+.. note::
+
+   Considered for removal.
+
+
+Compute ISA
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+These opcodes are primarily provided for special-use computational shaders.
+Support for these opcodes indicated by a special pipe capability bit (TBD).
+
+XXX so let's discuss it, yeah?
+
+.. opcode:: CEIL - Ceiling
+
+.. math::
+
+  dst.x = \lceil src.x\rceil
+
+  dst.y = \lceil src.y\rceil
+
+  dst.z = \lceil src.z\rceil
+
+  dst.w = \lceil src.w\rceil
+
+
+.. opcode:: I2F - Integer To Float
+
+.. math::
+
+  dst.x = (float) src.x
+
+  dst.y = (float) src.y
+
+  dst.z = (float) src.z
+
+  dst.w = (float) src.w
+
+
+.. opcode:: NOT - Bitwise Not
+
+.. math::
+
+  dst.x = ~src.x
+
+  dst.y = ~src.y
+
+  dst.z = ~src.z
+
+  dst.w = ~src.w
+
+
+.. opcode:: TRUNC - Truncate
+
+.. math::
+
+  dst.x = trunc(src.x)
+
+  dst.y = trunc(src.y)
+
+  dst.z = trunc(src.z)
+
+  dst.w = trunc(src.w)
+
+
+.. opcode:: SHL - Shift Left
+
+.. math::
+
+  dst.x = src0.x << src1.x
+
+  dst.y = src0.y << src1.x
+
+  dst.z = src0.z << src1.x
+
+  dst.w = src0.w << src1.x
+
+
+.. opcode:: SHR - Shift Right
+
+.. math::
+
+  dst.x = src0.x >> src1.x
+
+  dst.y = src0.y >> src1.x
+
+  dst.z = src0.z >> src1.x
+
+  dst.w = src0.w >> src1.x
+
+
+.. opcode:: AND - Bitwise And
+
+.. math::
+
+  dst.x = src0.x & src1.x
+
+  dst.y = src0.y & src1.y
+
+  dst.z = src0.z & src1.z
+
+  dst.w = src0.w & src1.w
+
+
+.. opcode:: OR - Bitwise Or
+
+.. math::
+
+  dst.x = src0.x | src1.x
+
+  dst.y = src0.y | src1.y
+
+  dst.z = src0.z | src1.z
+
+  dst.w = src0.w | src1.w
+
+
+.. opcode:: MOD - Modulus
+
+.. math::
+
+  dst.x = src0.x \bmod src1.x
+
+  dst.y = src0.y \bmod src1.y
+
+  dst.z = src0.z \bmod src1.z
+
+  dst.w = src0.w \bmod src1.w
+
+
+.. opcode:: XOR - Bitwise Xor
+
+.. math::
+
+  dst.x = src0.x \oplus src1.x
+
+  dst.y = src0.y \oplus src1.y
+
+  dst.z = src0.z \oplus src1.z
+
+  dst.w = src0.w \oplus src1.w
+
+
+.. opcode:: SAD - Sum Of Absolute Differences
+
+.. math::
+
+  dst.x = |src0.x - src1.x| + src2.x
+
+  dst.y = |src0.y - src1.y| + src2.y
+
+  dst.z = |src0.z - src1.z| + src2.z
+
+  dst.w = |src0.w - src1.w| + src2.w
+
+
+.. opcode:: TXF - Texel Fetch
+
+  TBD
+
+
+.. opcode:: TXQ - Texture Size Query
+
+  TBD
+
+
+.. opcode:: CONT - Continue
+
+  TBD
+
+.. note::
+
+   Support for CONT is determined by a special capability bit,
+   ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
+
+
+Geometry ISA
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These opcodes are only supported in geometry shaders; they have no meaning
+in any other type of shader.
+
+.. opcode:: EMIT - Emit
+
+  TBD
+
+
+.. opcode:: ENDPRIM - End Primitive
+
+  TBD
+
+
+GLSL ISA
+^^^^^^^^^^
+
+These opcodes are part of :term:`GLSL`'s opcode set. Support for these
+opcodes is determined by a special capability bit, ``GLSL``.
+
+.. opcode:: BGNLOOP - Begin a Loop
+
+  TBD
+
+
+.. opcode:: BGNSUB - Begin Subroutine
+
+  TBD
+
+
+.. opcode:: ENDLOOP - End a Loop
+
+  TBD
+
+
+.. opcode:: ENDSUB - End Subroutine
+
+  TBD
+
+
+.. opcode:: NOP - No Operation
+
+  Do nothing.
+
+
+.. opcode:: NRM4 - 4-component Vector Normalise
+
+This instruction replicates its result.
+
+.. math::
+
+  dst = \frac{src.x}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}
+
+
+ps_2_x
+^^^^^^^^^^^^
+
+XXX wait what
+
+.. opcode:: CALLNZ - Subroutine Call If Not Zero
+
+  TBD
+
+
+.. opcode:: IFC - If
+
+  TBD
+
+
+.. opcode:: BREAKC - Break Conditional
+
+  TBD
+
+.. _doubleopcodes:
+
+Double ISA
+^^^^^^^^^^^^^^^
+
+The double-precision opcodes reinterpret four-component vectors into
+two-component vectors with doubled precision in each component.
+
+Support for these opcodes is XXX undecided. :T
+
+.. opcode:: DADD - Add
+
+.. math::
+
+  dst.xy = src0.xy + src1.xy
+
+  dst.zw = src0.zw + src1.zw
+
+
+.. opcode:: DDIV - Divide
+
+.. math::
+
+  dst.xy = src0.xy / src1.xy
+
+  dst.zw = src0.zw / src1.zw
+
+.. opcode:: DSEQ - Set on Equal
+
+.. math::
+
+  dst.xy = src0.xy == src1.xy ? 1.0F : 0.0F
+
+  dst.zw = src0.zw == src1.zw ? 1.0F : 0.0F
+
+.. opcode:: DSLT - Set on Less than
+
+.. math::
+
+  dst.xy = src0.xy < src1.xy ? 1.0F : 0.0F
+
+  dst.zw = src0.zw < src1.zw ? 1.0F : 0.0F
+
+.. opcode:: DFRAC - Fraction
+
+.. math::
+
+  dst.xy = src.xy - \lfloor src.xy\rfloor
+
+  dst.zw = src.zw - \lfloor src.zw\rfloor
+
+
+.. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components
+
+Like the ``frexp()`` routine in many math libraries, this opcode stores the
+exponent of its source to ``dst0``, and the significand to ``dst1``, such that
+:math:`dst1 \times 2^{dst0} = src` .
+
+.. math::
+
+  dst0.xy = exp(src.xy)
+
+  dst1.xy = frac(src.xy)
+
+  dst0.zw = exp(src.zw)
+
+  dst1.zw = frac(src.zw)
+
+.. opcode:: DLDEXP - Multiply Number by Integral Power of 2
+
+This opcode is the inverse of :opcode:`DFRACEXP`.
+
+.. math::
+
+  dst.xy = src0.xy \times 2^{src1.xy}
+
+  dst.zw = src0.zw \times 2^{src1.zw}
+
+.. opcode:: DMIN - Minimum
+
+.. math::
+
+  dst.xy = min(src0.xy, src1.xy)
+
+  dst.zw = min(src0.zw, src1.zw)
+
+.. opcode:: DMAX - Maximum
+
+.. math::
+
+  dst.xy = max(src0.xy, src1.xy)
+
+  dst.zw = max(src0.zw, src1.zw)
+
+.. opcode:: DMUL - Multiply
+
+.. math::
+
+  dst.xy = src0.xy \times src1.xy
+
+  dst.zw = src0.zw \times src1.zw
+
+
+.. opcode:: DMAD - Multiply And Add
+
+.. math::
+
+  dst.xy = src0.xy \times src1.xy + src2.xy
+
+  dst.zw = src0.zw \times src1.zw + src2.zw
+
+
+.. opcode:: DRCP - Reciprocal
+
+.. math::
+
+   dst.xy = \frac{1}{src.xy}
+
+   dst.zw = \frac{1}{src.zw}
+
+.. opcode:: DSQRT - Square Root
+
+.. math::
+
+   dst.xy = \sqrt{src.xy}
+
+   dst.zw = \sqrt{src.zw}
+
+
+Explanation of symbols used
+------------------------------
+
+
+Functions
+^^^^^^^^^^^^^^
+
+
+  :math:`|x|`       Absolute value of `x`.
+
+  :math:`\lceil x \rceil` Ceiling of `x`.
+
+  clamp(x,y,z)      Clamp x between y and z.
+                    (x < y) ? y : (x > z) ? z : x
+
+  :math:`\lfloor x\rfloor` Floor of `x`.
+
+  :math:`\log_2{x}` Logarithm of `x`, base 2.
+
+  max(x,y)          Maximum of x and y.
+                    (x > y) ? x : y
+
+  min(x,y)          Minimum of x and y.
+                    (x < y) ? x : y
+
+  partialx(x)       Derivative of x relative to fragment's X.
+
+  partialy(x)       Derivative of x relative to fragment's Y.
+
+  pop()             Pop from stack.
+
+  :math:`x^y`       `x` to the power `y`.
+
+  push(x)           Push x on stack.
+
+  round(x)          Round x.
+
+  trunc(x)          Truncate x, i.e. drop the fraction bits.
+
+
+Keywords
+^^^^^^^^^^^^^
+
+
+  discard           Discard fragment.
+
+  pc                Program counter.
+
+  target            Label of target instruction.
+
+
+Other tokens
+---------------
+
+
+Declaration
+^^^^^^^^^^^
+
+
+Declares a register that is will be referenced as an operand in Instruction
+tokens.
+
+File field contains register file that is being declared and is one
+of TGSI_FILE.
+
+UsageMask field specifies which of the register components can be accessed
+and is one of TGSI_WRITEMASK.
+
+Interpolate field is only valid for fragment shader INPUT register files.
+It specifes the way input is being interpolated by the rasteriser and is one
+of TGSI_INTERPOLATE.
+
+If Dimension flag is set to 1, a Declaration Dimension token follows.
+
+If Semantic flag is set to 1, a Declaration Semantic token follows.
+
+CylindricalWrap bitfield is only valid for fragment shader INPUT register
+files. It specifies which register components should be subject to cylindrical
+wrapping when interpolating by the rasteriser. If TGSI_CYLINDRICAL_WRAP_X
+is set to 1, the X component should be interpolated according to cylindrical
+wrapping rules.
+
+
+Declaration Semantic
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+
+  Follows Declaration token if Semantic bit is set.
+
+  Since its purpose is to link a shader with other stages of the pipeline,
+  it is valid to follow only those Declaration tokens that declare a register
+  either in INPUT or OUTPUT file.
+
+  SemanticName field contains the semantic name of the register being declared.
+  There is no default value.
+
+  SemanticIndex is an optional subscript that can be used to distinguish
+  different register declarations with the same semantic name. The default value
+  is 0.
+
+  The meanings of the individual semantic names are explained in the following
+  sections.
+
+TGSI_SEMANTIC_POSITION
+""""""""""""""""""""""
+
+Position, sometimes known as HPOS or WPOS for historical reasons, is the
+location of the vertex in space, in ``(x, y, z, w)`` format. ``x``, ``y``, and ``z``
+are the Cartesian coordinates, and ``w`` is the homogenous coordinate and used
+for the perspective divide, if enabled.
+
+As a vertex shader output, position should be scaled to the viewport. When
+used in fragment shaders, position will be in window coordinates. The convention
+used depends on the FS_COORD_ORIGIN and FS_COORD_PIXEL_CENTER properties.
+
+XXX additionally, is there a way to configure the perspective divide? it's
+accelerated on most chipsets AFAIK...
+
+Position, if not specified, usually defaults to ``(0, 0, 0, 1)``, and can
+be partially specified as ``(x, y, 0, 1)`` or ``(x, y, z, 1)``.
+
+XXX usually? can we solidify that?
+
+TGSI_SEMANTIC_COLOR
+"""""""""""""""""""
+
+Colors are used to, well, color the primitives. Colors are always in
+``(r, g, b, a)`` format.
+
+If alpha is not specified, it defaults to 1.
+
+TGSI_SEMANTIC_BCOLOR
+""""""""""""""""""""
+
+Back-facing colors are only used for back-facing polygons, and are only valid
+in vertex shader outputs. After rasterization, all polygons are front-facing
+and COLOR and BCOLOR end up occupying the same slots in the fragment, so
+all BCOLORs effectively become regular COLORs in the fragment shader.
+
+TGSI_SEMANTIC_FOG
+"""""""""""""""""
+
+The fog coordinate historically has been used to replace the depth coordinate
+for generation of fog in dedicated fog blocks. Gallium, however, does not use
+dedicated fog acceleration, placing it entirely in the fragment shader
+instead.
+
+The fog coordinate should be written in ``(f, 0, 0, 1)`` format. Only the first
+component matters when writing from the vertex shader; the driver will ensure
+that the coordinate is in this format when used as a fragment shader input.
+
+TGSI_SEMANTIC_PSIZE
+"""""""""""""""""""
+
+PSIZE, or point size, is used to specify point sizes per-vertex. It should
+be in ``(s, 0, 0, 1)`` format, where ``s`` is the (possibly clamped) point size.
+Only the first component matters when writing from the vertex shader.
+
+When using this semantic, be sure to set the appropriate state in the
+:ref:`rasterizer` first.
+
+TGSI_SEMANTIC_GENERIC
+"""""""""""""""""""""
+
+Generic semantics are nearly always used for texture coordinate attributes,
+in ``(s, t, r, q)`` format. ``t`` and ``r`` may be unused for certain kinds
+of lookups, and ``q`` is the level-of-detail bias for biased sampling.
+
+These attributes are called "generic" because they may be used for anything
+else, including parameters, texture generation information, or anything that
+can be stored inside a four-component vector.
+
+TGSI_SEMANTIC_NORMAL
+""""""""""""""""""""
+
+Vertex normal; could be used to implement per-pixel lighting for legacy APIs
+that allow mixing fixed-function and programmable stages.
+
+TGSI_SEMANTIC_FACE
+""""""""""""""""""
+
+FACE is the facing bit, to store the facing information for the fragment
+shader. ``(f, 0, 0, 1)`` is the format. The first component will be positive
+when the fragment is front-facing, and negative when the component is
+back-facing.
+
+TGSI_SEMANTIC_EDGEFLAG
+""""""""""""""""""""""
+
+XXX no clue
+
+
+Properties
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+
+  Properties are general directives that apply to the whole TGSI program.
+
+FS_COORD_ORIGIN
+"""""""""""""""
+
+Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
+The default value is UPPER_LEFT.
+
+If UPPER_LEFT, the position will be (0,0) at the upper left corner and
+increase downward and rightward.
+If LOWER_LEFT, the position will be (0,0) at the lower left corner and
+increase upward and rightward.
+
+OpenGL defaults to LOWER_LEFT, and is configurable with the
+GL_ARB_fragment_coord_conventions extension.
+
+DirectX 9/10 use UPPER_LEFT.
+
+FS_COORD_PIXEL_CENTER
+"""""""""""""""""""""
+
+Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
+The default value is HALF_INTEGER.
+
+If HALF_INTEGER, the fractionary part of the position will be 0.5
+If INTEGER, the fractionary part of the position will be 0.0
+
+Note that this does not affect the set of fragments generated by
+rasterization, which is instead controlled by gl_rasterization_rules in the
+rasterizer.
+
+OpenGL defaults to HALF_INTEGER, and is configurable with the
+GL_ARB_fragment_coord_conventions extension.
+
+DirectX 9 uses INTEGER.
+DirectX 10 uses HALF_INTEGER.
+
+
+
+Texture Sampling and Texture Formats
+------------------------------------
+
+This table shows how texture image components are returned as (x,y,z,w) tuples
+by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
+:opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
+well.
+
++--------------------+--------------+--------------------+--------------+
+| Texture Components | Gallium      | OpenGL             | Direct3D 9   |
++====================+==============+====================+==============+
+| R                  | (r, 0, 0, 1) | (r, 0, 0, 1)       | (r, 1, 1, 1) |
++--------------------+--------------+--------------------+--------------+
+| RG                 | (r, g, 0, 1) | (r, g, 0, 1)       | (r, g, 1, 1) |
++--------------------+--------------+--------------------+--------------+
+| RGB                | (r, g, b, 1) | (r, g, b, 1)       | (r, g, b, 1) |
++--------------------+--------------+--------------------+--------------+
+| RGBA               | (r, g, b, a) | (r, g, b, a)       | (r, g, b, a) |
++--------------------+--------------+--------------------+--------------+
+| A                  | (0, 0, 0, a) | (0, 0, 0, a)       | (0, 0, 0, a) |
++--------------------+--------------+--------------------+--------------+
+| L                  | (l, l, l, 1) | (l, l, l, 1)       | (l, l, l, 1) |
++--------------------+--------------+--------------------+--------------+
+| LA                 | (l, l, l, a) | (l, l, l, a)       | (l, l, l, a) |
++--------------------+--------------+--------------------+--------------+
+| I                  | (i, i, i, i) | (i, i, i, i)       | N/A          |
++--------------------+--------------+--------------------+--------------+
+| UV                 | XXX TBD      | (0, 0, 0, 1)       | (u, v, 1, 1) |
+|                    |              | [#envmap-bumpmap]_ |              |
++--------------------+--------------+--------------------+--------------+
+| Z                  | XXX TBD      | (z, z, z, 1)       | (0, z, 0, 1) |
+|                    |              | [#depth-tex-mode]_ |              |
++--------------------+--------------+--------------------+--------------+
+
+.. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt
+.. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
+   or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.