Age | Commit message | Author |
|
This set of changes optimizes the memory use of fragment
operation programs (by using and transmitting only as much memory as the
programs need, instead of maximal sizes) and eliminates the
dependency on hard-coded maximal program sizes. State
that does not depend on fragment facing (i.e. that isn't using
two-sided stenciling) now saves and transmits only a single
fragment operation program, instead of two identical programs.
- Added the ability to emit a LNOP (No Operation (Load)) instruction.
This is used to pad the generated fragment operations programs to
a multiple of 8 bytes, which is necessary for proper operation of
the dual instruction pipeline, and also required for proper SPU-side
decoding.
- Added the ability to allocate and manage a variable-length
struct cell_command_fragment_ops. This structure now puts the
generated function field at the end, where it can be as large
as necessary.
- On the PPU side, we now combine the generated front-facing and
back-facing code into a single variable-length buffer (storing only one
copy if the two sets of code are identical) for transmission to the SPU.
- On the SPU side, we pull the correct sizes out of the buffer,
allocate a new code buffer if the one we have isn't large enough,
and save the code to that buffer. The buffer is deallocated when
the SPU exits.
- Commented out the emit_fetch() static function, which was not being used.
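
The padding and buffer-combining steps above can be sketched roughly as
follows. This is only an illustration under stated assumptions: struct
frag_ops_cmd and build_cmd() are hypothetical stand-ins (the real
struct cell_command_fragment_ops has other fields), and the lnop word
0x00200000 is taken from the SPU instruction encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SPU_LNOP 0x00200000u   /* lnop instruction word (SPU ISA encoding) */

/* Hypothetical stand-in for struct cell_command_fragment_ops. */
struct frag_ops_cmd {
   uint32_t total_code_size;   /* bytes stored in code[] */
   uint32_t front_code_index;  /* byte offset of the front-facing program */
   uint32_t back_code_index;   /* byte offset of the back-facing program */
   uint32_t pad;               /* keeps code[] at an 8-byte offset */
   uint32_t code[];            /* variable-length tail (C99 flexible array) */
};

/* Combine front/back programs (given as 4-byte instruction words) into one
 * command.  Each program is padded with lnop to a multiple of 8 bytes, and
 * identical programs are stored only once. */
static struct frag_ops_cmd *
build_cmd(const uint32_t *front, size_t n_front,
          const uint32_t *back, size_t n_back)
{
   assert(front && back);

   /* An even number of 4-byte instructions is a multiple of 8 bytes. */
   size_t front_words = (n_front + 1) & ~(size_t)1;
   size_t back_words = (n_back + 1) & ~(size_t)1;
   int same = n_front == n_back &&
              memcmp(front, back, n_front * sizeof(uint32_t)) == 0;
   size_t total_words = front_words + (same ? 0 : back_words);

   struct frag_ops_cmd *cmd =
      malloc(sizeof *cmd + total_words * sizeof(uint32_t));
   if (!cmd)
      return NULL;

   /* Pre-fill with lnop so any padding slot is a valid instruction. */
   for (size_t i = 0; i < total_words; i++)
      cmd->code[i] = SPU_LNOP;

   memcpy(cmd->code, front, n_front * sizeof(uint32_t));
   cmd->front_code_index = 0;
   if (same) {
      cmd->back_code_index = 0;   /* both facings share one program */
   } else {
      memcpy(cmd->code + front_words, back, n_back * sizeof(uint32_t));
      cmd->back_code_index = (uint32_t)(front_words * sizeof(uint32_t));
   }
   cmd->pad = 0;
   cmd->total_code_size = (uint32_t)(total_words * sizeof(uint32_t));
   return cmd;
}
```

A receiver can size its own code buffer from total_code_size and locate
each program through the two index fields, which is the role the sizes
play on the SPU side in the description above.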
|
|
Many stencil tests were failing because the stencil buffer could not be
read: "twiddling" (or "untwiddling") reported "an unsupported
texture format". This is fixed for the case of a stencil/Z S8Z24 format
(which twiddles just like the 32-bit color formats).
tests/stencilwrap.c was failing on the GL_INVERT test, because
the emitted code for "spe_xori" turned out not to be an actual
"xori" instruction, but rather a "stqd" instruction, because
of a typo in the rtasm code. This is now fixed, and
tests/stencil_wrap now works.
|
immediate field
This type of checking should be expanded to cover more instructions...
|
|
Used for SIN, COS, EXP2, LOG2, POW instructions. TEX next.
Fixed some bugs in MIN, MAX, DP3, DP4, DPH instructions.
In rtasm code:
Special-case the spe_lqd() and spe_stqd() functions so they take byte offsets,
with the low-order 4 bits shifted out. This makes things consistent with SPU
assembly language conventions.
Added spe_get_registers_used() function.
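
As an illustration of the byte-offset convention for spe_lqd() and
spe_stqd(), here is a minimal sketch of turning a byte offset into the
signed 10-bit quadword field of a d-form load/store. The helper names
are hypothetical, and unlike the description above (which simply shifts
the low 4 bits out) this sketch asserts that they are zero.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a byte offset into the quadword index stored in the 10-bit
 * signed immediate of lqd/stqd.  The hardware appends four zero bits to
 * the field, so only 16-byte-aligned offsets are representable. */
static int32_t quad_offset_field(int32_t byte_offset)
{
   assert((byte_offset & 0xf) == 0);                     /* assumption: aligned */
   assert(byte_offset >= -8192 && byte_offset <= 8176);  /* fits in 10 bits */
   return byte_offset / 16;   /* exact, since the offset is aligned */
}

/* The raw 10-bit pattern that would be placed in the instruction word. */
static uint32_t i10_bits(int32_t byte_offset)
{
   return (uint32_t)quad_offset_field(byte_offset) & 0x3ffu;
}
```

With this convention a caller writes offsets the same way SPU assembly
does, e.g. an offset of 32 bytes selects the third quadword.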
|
|
This set of code changes adds stencil code generation
support. Both one-sided and two-sided stenciling are supported.
In addition to the raw code generation changes, these changes had
to be made elsewhere in the system:
- Added new "register set" feature to the SPE assembly generation.
A "register set" is a way to allocate multiple registers and free
them all at the same time, delegating register allocation management
to the spe_function unit. It's quite useful in complex register
allocation schemes (like stenciling).
- Added and improved SPE macro calculations.
These are operations between registers and unsigned integer
immediates. In many cases, the calculation can be performed
with a single instruction; the macros will generate the
single instruction if possible, or generate a register load
and register-to-register operation if not. These macro
functions are: spe_load_uint() (which has new ways to
load a value in a single instruction), spe_and_uint(),
spe_xor_uint(), spe_compare_equal_uint(), and spe_compare_greater_uint().
- Added facing to fragment generation. While rendering, the rasterizer
needs to be able to determine front- and back-facing fragments, in order
to correctly apply two-sided stencil. That requires these changes:
- Added front_winding field to the cell_command_render block, so that
the state tracker could communicate to the rasterizer what it
considered to be the front-facing direction.
- Added fragment facing as an input to the fragment function.
- Calculated facing is passed during emit_quad().
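
The "register set" idea above can be sketched as follows. This is a
simplified model, not the spe_function implementation: struct reg_alloc
and the regset_*() names are hypothetical, and the only outside fact
assumed is that r0 and r1 are reserved (return address and stack
pointer in the SPU ABI).

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 128   /* the SPU has 128 registers */

/* Hypothetical allocator state: owner[r] is 0 when r is free, otherwise
 * the nesting depth of the register set that allocated it. */
struct reg_alloc {
   uint8_t owner[NUM_REGS];
   int depth;
};

/* Open a new register set; sets may nest. */
static void regset_begin(struct reg_alloc *ra)
{
   assert(ra->depth < 255);
   ra->depth++;
}

/* Allocate one register inside the innermost open set, or -1 if none. */
static int regset_alloc(struct reg_alloc *ra)
{
   assert(ra->depth > 0);
   for (int r = 2; r < NUM_REGS; r++) {   /* skip r0 (lr) and r1 (sp) */
      if (ra->owner[r] == 0) {
         ra->owner[r] = (uint8_t)ra->depth;
         return r;
      }
   }
   return -1;
}

/* Close the innermost set, freeing every register it allocated at once. */
static void regset_end(struct reg_alloc *ra)
{
   assert(ra->depth > 0);
   for (int r = 2; r < NUM_REGS; r++) {
      if (ra->owner[r] == ra->depth)
         ra->owner[r] = 0;
   }
   ra->depth--;
}
```

The point of the design is that code with many temporaries, such as the
stencil paths, needs a single release call instead of tracking and
freeing each register individually.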
|
- rtasm_ppc_spe.c, rtasm_ppc_spe.h: added a new macro function
"spe_load_uint" for loading and splatting unsigned integers
in a register; it will use "ila" for values 18 bits or less,
"ilh" for word values that are symmetric across halfwords,
"ilhu" for values that have zeroes in their bottom halfwords,
or "ilhu" followed by "iohl" for general 32-bit values.
Of the 15 color masks of interest, 4 are 18 bits or less,
2 are symmetric across halfwords, 3 are zero in the bottom
halfword, and 6 require two instructions to load.
- cell_gen_fragment.c: added full codegen for logic op and
color mask.
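
The instruction selection described for "spe_load_uint" can be modeled
as below. This is a sketch of the decision order only, with
hypothetical names (classify_load, expand_channel_mask); it does not
emit instructions. The mask enumeration assumes that the 15 color
masks of interest are the per-byte channel masks other than "all four
channels enabled", which is an interpretation, not something the
message states.

```c
#include <assert.h>
#include <stdint.h>

enum load_kind { LOAD_ILA, LOAD_ILH, LOAD_ILHU, LOAD_ILHU_IOHL };

/* Pick the cheapest load sequence for a 32-bit value, in the order the
 * commit message lists them. */
static enum load_kind classify_load(uint32_t v)
{
   if ((v & 0x0003ffffu) == v)
      return LOAD_ILA;          /* fits in 18 bits */
   if ((v >> 16) == (v & 0xffffu))
      return LOAD_ILH;          /* both halfwords identical */
   if ((v & 0xffffu) == 0)
      return LOAD_ILHU;         /* bottom halfword is zero */
   return LOAD_ILHU_IOHL;       /* general case: two instructions */
}

/* Expand a 4-bit channel mask into a per-byte 0x00/0xff mask. */
static uint32_t expand_channel_mask(unsigned m)
{
   assert(m < 16);
   uint32_t v = 0;
   for (unsigned i = 0; i < 4; i++) {
      if (m & (1u << i))
         v |= 0xffu << (8 * i);
   }
   return v;
}

/* Count how many of the masks 0..14 fall into each load kind. */
static void count_mask_loads(unsigned counts[4])
{
   for (unsigned k = 0; k < 4; k++)
      counts[k] = 0;
   for (unsigned m = 0; m < 15; m++)
      counts[classify_load(expand_channel_mask(m))]++;
}
```

Under that assumption the tally comes out as 4, 2, 3 and 6, matching
the figures quoted above.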
|
|
- Added new "macro" functions spe_float_min() and spe_float_max()
to rtasm_ppc_spe.{ch}. These emit instructions that cause
the minimum or maximum of each element in a vector of floats
to be saved in the destination register.
- Major changes to cell_gen_fragment.c to implement all the blending
modes (except for the mysterious D3D-based PIPE_BLENDFACTOR_SRC1_COLOR,
PIPE_BLENDFACTOR_SRC1_ALPHA, PIPE_BLENDFACTOR_INV_SRC1_COLOR, and
PIPE_BLENDFACTOR_INV_SRC1_ALPHA).
- Some revamping of code in cell_gen_fragment.c: use the new spe_float_min()
and spe_float_max() functions (instead of expanding these calculations
inline via macros); create and use an inline utility function for handling
"optional" register allocation (for the {1,1,1,1} vector, and the
blend color vectors) instead of expanding with macros; use the Floating
Negative Multiply and Subtract (fnms) instruction to simplify and optimize many
blending calculations.
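
To show why fnms helps with blending, here is a scalar C model. It
assumes the SPU semantics fnms(a, b, c) = c - a*b and uses one blend
term, dst * (1 - src_alpha), as an example; the function names are
hypothetical and the real code operates on 4-float vectors.

```c
#include <assert.h>

/* Scalar model of the SPU fnms instruction: c - a * b. */
static float fnms(float a, float b, float c)
{
   return c - a * b;
}

/* Straightforward form: needs a {1,1,1,1} constant, a subtract and a
 * multiply. */
static float inv_src_alpha_term_naive(float dst, float src_alpha)
{
   float one_minus_alpha = 1.0f - src_alpha;
   return dst * one_minus_alpha;
}

/* Same term as a single fnms: dst - dst * src_alpha, no constant needed. */
static float inv_src_alpha_term_fnms(float dst, float src_alpha)
{
   return fnms(dst, src_alpha, dst);
}

/* Small self-check on values that are exact in binary floating point. */
static void check_blend_equivalence(void)
{
   assert(inv_src_alpha_term_naive(0.5f, 0.25f) == 0.375f);
   assert(inv_src_alpha_term_fnms(0.5f, 0.25f) == 0.375f);
}
```

The two forms are algebraically equal; for inputs that are not exactly
representable they may differ in the last bit because of rounding.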
|
spe_splat()
|
|
Fix incorrect opcode for fsmbi.
Added "macro" functions for loading floats/ints, register complement, zero, move.
Added #defines for return address and stack pointer registers.
Added assertions to check that the instruction buffer doesn't overflow.
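
A minimal sketch of the kind of overflow assertion mentioned above,
assuming a fixed-capacity instruction store. struct code_buf and emit()
are hypothetical and do not reflect the actual layout of the rtasm
structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical instruction buffer with a fixed capacity. */
struct code_buf {
   uint32_t *store;     /* caller-provided storage */
   unsigned num_inst;   /* instructions emitted so far */
   unsigned max_inst;   /* capacity of store, in instructions */
};

/* Append one instruction word, asserting that there is room for it. */
static void emit(struct code_buf *p, uint32_t inst)
{
   assert(p->num_inst < p->max_inst);
   p->store[p->num_inst++] = inst;
}
```

Placing the check in the single emit path means every instruction
helper built on top of it is covered without further changes.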
|
Move the register allocator to a common location. There is more code
on the way that will make use of this interface.
|
|
Moving files since these are not being used outside gallium.
|