Age | Commit message (Collapse) | Author |
|
This is a set of changes that optimizes the memory use of fragment
operation programs (by using and transmitting only as much memory as is
needed for the fragment ops programs, instead of maximal sizes), as well
as eliminate the dependency on hard-coded maximal program sizes. State
that is not dependent on fragment facing (i.e. that isn't using
two-sided stenciling) will only save and transmit a single
fragment operation program, instead of two identical programs.
- Added the ability to emit a LNOP (No Operation (Load)) instruction.
This is used to pad the generated fragment operations programs to
a multiple of 8 bytes, which is necessary for proper operation of
the dual instruction pipeline, and also required for proper SPU-side
decoding.
- Added the ability to allocate and manage a variant-length
struct cell_command_fragment_ops. This structure now puts the
generated function field at the end, where it can be as large
as necessary.
- On the PPU side, we now combine the generated front-facing and
back-facing code into a single variant-length buffer (and only use one
if the two sets of code are identical) for transmission to the SPU.
- On the SPU side, we pull the correct sizes out of the buffer,
allocate a new code buffer if the one we have isn't large enough,
and save the code to that buffer. The buffer is deallocated when
the SPU exits.
- Commented out the emit_fetch() static function, which was not being used.
|
|
With these changes, the tests/stencil_twoside test now works.
- Eliminate blending from the stencil_twoside test, as it produces an
unneeded dependency on having blending working
- The spe_splat() function will now work if the register being splatted
and the destination register are the same
- Separate fragment code generated for front-facing and back-facing
fragments. Often these are the same; if two-sided stenciling is on,
they can be different. This is easier and faster than generating
code that does both tests and merges the results.
- Fixed a cut/paste bug where if the back Z-pass stencil operation
were different from all the other operations, the back Z-fail
results were incorrect.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Will be used for instructions like SIN/COS/POW/TEX/etc. The PPU needs to
know the address of some functions in the SPU address space. Send that
info to the PPU/main memory rather than patch up shaders on the SPU side.
Not finished/tested yet...
|
|
32-byte boundary
To make sure even/odd instructions hit the right pipes.
|
|
|
|
cmd_state_fragment_ops() was inverted
|
|
- Added two new debug flags (to be used with the CELL_DEBUG environment
variable). The first, "CELL_DEBUG=fragops", activates SPE fragment
ops debug messages. The second, "CELL_DEBUG=fragopfallback", will
eventually be used to disable the use of generated SPE code for
fragment ops in favor of the default fallback reference routine.
(During development, though, the parity of this flag is reversed:
all users will get the reference code *unless* CELL_DEBUG=fragopfallback
is set. This will prevent hiccups in code generation from affecting
the other developers.)
- Formalized debug message usage and macros in spu/spu_main.c.
- Added lots of new code to ppu/cell_gen_fragment.c to extend the
number of supported source RGB factors from 4 to 15, and to
complete the list of supported blend equations.
More coming, to complete the source and destination RGB and alpha
factors, and to complete the rest of the fragment operations...
|
|
TGSI shaders are translated into SPE instructions which are then sent to
the SPEs for execution. Only a few opcodes work, no swizzling yet, no
support for constants/immediates, etc.
|
|
|
|
|
|
|
|
Do code generation for alpha test, z test, stencil, blend, colormask
and framebuffer/tile read/write as a single code block.
Ian's previous blend/z/stencil test code is still there but mostly disabled
and will be removed soon.
|
|
|
|
|
|
Options so far:
"checker" module tile clear color by SPU ID to see where the tiles are
"sync" to do synchronous DMA (only partially implemented)
|
|
Precompute tiles_per_row. Use ushort multiplies in a few places. New comments.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The later follows the naming scheme of other limits.
Keep the old definition until all possible usage is updated.
|
|
This also implements code-gen for the float-to-packed color
conversion. It's currently hardcoded for A8R8G8B8, but that can
easily be fixed as soon as other color depths are supported by the
Cell driver.
|
|
I suspect that there was some other bug in the blend code-gen that
made this work-around necessary.
|
|
So far this is only tested when GL_BLEND is disabled.
|
|
Stencil is still broken.
|
|
I'm not sure these are quite correct. The reflect demo doesn't assert
anymore, but it doesn't produce correct results either. SPE-based
vertex shader code needs to be disabled for relfect to run.
|
|
|
|
Alpha test is currently broken because all per-fragment testing occurs
before alpha is calculated.
Stencil test is currently broken because the Z-clear code asserts if
there is a stencil buffer.
|
|
|
|
|
|
|
|
|
|
Doubles are still unsupported.
|
|
Update the Makefiles and includes for the new paths.
Note that there hasn't been no separation of the Makefiles yet, and make is
jumping all over the place. That will be taken care shortly. But for now, make
should work. It was tested with linux and linux-dri. Linux-cell and linux-llvm
might require some minor tweaks.
|
|
This is in a separate commit to ensure renames are properly preserved.
|