Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
|
|
The new instruction order is 10 cycles faster.
|
|
|
|
|
|
|
|
|
|
This is a set of changes that optimizes the memory use of fragment
operation programs (by using and transmitting only as much memory as is
needed for the fragment ops programs, instead of maximal sizes), as well
as eliminate the dependency on hard-coded maximal program sizes. State
that is not dependent on fragment facing (i.e. that isn't using
two-sided stenciling) will only save and transmit a single
fragment operation program, instead of two identical programs.
- Added the ability to emit a LNOP (No Operation (Load)) instruction.
This is used to pad the generated fragment operations programs to
a multiple of 8 bytes, which is necessary for proper operation of
the dual instruction pipeline, and also required for proper SPU-side
decoding.
- Added the ability to allocate and manage a variant-length
struct cell_command_fragment_ops. This structure now puts the
generated function field at the end, where it can be as large
as necessary.
- On the PPU side, we now combine the generated front-facing and
back-facing code into a single variant-length buffer (and only use one
if the two sets of code are identical) for transmission to the SPU.
- On the SPU side, we pull the correct sizes out of the buffer,
allocate a new code buffer if the one we have isn't large enough,
and save the code to that buffer. The buffer is deallocated when
the SPU exits.
- Commented out the emit_fetch() static function, which was not being used.
|
|
Fixed a boneheaded error in the generation of SPU code that calculates
the results of the stencil test. Basically, all the greater than/less than
calculations were exactly inverted: they were coded as though the
given comparison took the stencil value as a left-hand operand and the
reference value as a right-hand operand, but the actual semantics always
put the reference as the left-hand operand and the stencil as the right-hand
operand.
With this fix, tests/dinoshade runs, as do all the other Mesa tests
and samples that use stencil (and that don't use texture formats
unsupported by Cell).
|
|
With these changes, the tests/stencil_twoside test now works.
- Eliminate blending from the stencil_twoside test, as it produces an
unneeded dependency on having blending working
- The spe_splat() function will now work if the register being splatted
and the destination register are the same
- Separate fragment code generated for front-facing and back-facing
fragments. Often these are the same; if two-sided stenciling is on,
they can be different. This is easier and faster than generating
code that does both tests and merges the results.
- Fixed a cut/paste bug where if the back Z-pass stencil operation
were different from all the other operations, the back Z-fail
results were incorrect.
|
|
This small set of changes repairs several different stenciling problems;
now redbook/stencil also runs correctly (and maybe others - I haven't
checked everything yet).
- The number of instructions that had been allocated for fragment ops
used to be 64 (in cell/common.h). With complicated stencil use, we
managed to get up to 93, which caused a segfault before we noticed
we'd overran our memory buffer. It's now been bumped to 128,
which should be enough for even complicated stencil and fragment op
usage.
- The status of cell surfaces never changed beyond the initial
PIPE_SURFACE_STATUS_UNDEFINED. When a user called glClear()
to clear just the Z buffer (but not the stencil buffer), this caused
the check_clear_depth_with_quad() function to return false (because
the surface status was believed to be undefined), and so the device
was instructed to clear the whole buffer (including the stencil buffer),
instead of correctly using a quad to clear just the depth, leaving the
stencil alone.
This has been fixed similarly to the way the i915 driver handles
the surface status: during cell_clear_surface(), the status is
set to PIPE_SURFACE_STATUS_DEFINED. Then a partial buffer clear is
handled with a quad, as expected. Note that we are *not* using
PIPE_SURFACE_STATUS_CLEAR (also similar to the i915); technically,
we should be setting the surface status to CLEAR on a clear, and
to DEFINED when we actually draw something (say on cell_vbuf_draw()),
but it's difficult to figure out exactly which surfaces are affected
by a cell_vbuf_draw(), so for now we're doing the easy thing.
- The fragment ops handling was very clever about only pulling out the
parts of the Z/stencil buffer that it needed for calculations;
but this failed when only part of the buffer was written, because
the part that was never pulled out was inadvertently cleared.
Now all the data from the combined Z/stencil buffer is pulled out,
just so the proper values can be recombined later and written back
to the buffer correctly. As a bonus, the fragment op code generation
is simplified.
|
|
The Cell stencil tests were completely ignoring the stencil value mask.
Now the original code paths are still used if the stencil value mask
is all 1s; but code to use the mask for the stencil value and reference
value comparisons is now emitted if the mask is not all 1s.
|
|
Two definitive bugs in stenciling were fixed.
The first, reversed registers in the generated Select Bytes (selb)
instruction, caused the stenciling INCR and DECR operations to
fail dramatically, putting new values in where old values were
supposed to be and vice versa.
The second caused stencil tiles to not be read and written from
main memory by the SPUs. A per-spu flag, spu.read_depth, was used
to indicate whether the SPU should be reading depth tiles, and was set
only when depth was enabled. A second flag, spu.read_stencil, was
set when stenciling was enabled, but never referenced.
As stenciling and depth are in the same tiles on the Cell, and there
is no corresponding TAG_WRITE_TILE_STENCIL to complement
TAG_WRITE_TILE_COLOR and TAG_WRITE_TILE_Z, I fixed this by
eliminating the unused "spu.read_stencil", renaming "spu.read_depth"
to "spu.read_depth_stencil", and setting it if either stenciling or
depth is enabled.
I also added an optimization to the fragment ops generation code,
that avoids calculating stencil values and/or stencil writemask
when the stencil operations are all KEEP.
|
|
These are the defects found and fixed so far. Several more have
been observed; I'm working on them.
- Fixed an error in spe_load_uint() that caused incorrect values to be
loaded if the given unsigned value had the low 18 bits as 0,
and that caused inefficient code to be emitted if the given value
had the high 14 bits as 0.
- Fixed a problem in stencil code generation where optional registers
weren't tracked correctly.
- Fixed a problem that the stencil function NEVER was acting as ALWAYS.
- Fixed several problems that could occur if stenciling were enabled but
depth was disabled.
- Fixed a problem with two-sided stencil writemask handling that could
cause a stencil writemask to not be applied.
- Fixed several state permutations that were incorrectly flagged as
not requiring stencil values to be calculated.
|
|
|
|
This set of code changes are for stencil code generation
support. Both one-sided and two-sided stenciling are supported.
In addition to the raw code generation changes, these changes had
to be made elsewhere in the system:
- Added new "register set" feature to the SPE assembly generation.
A "register set" is a way to allocate multiple registers and free
them all at the same time, delegating register allocation management
to the spe_function unit. It's quite useful in complex register
allocation schemes (like stenciling).
- Added and improved SPE macro calculations.
These are operations between registers and unsigned integer
immediates. In many cases, the calculation can be performed
with a single instruction; the macros will generate the
single instruction if possible, or generate a register load
and register-to-register operation if not. These macro
functions are: spe_load_uint() (which has new ways to
load a value in a single instruction), spe_and_uint(),
spe_xor_uint(), spe_compare_equal_uint(), and spe_compare_greater_uint().
- Added facing to fragment generation. While rendering, the rasterizer
needs to be able to determine front- and back-facing fragments, in order
to correctly apply two-sided stencil. That requires these changes:
- Added front_winding field to the cell_command_render block, so that
the state tracker could communicate to the rasterizer what it
considered to be the front-facing direction.
- Added fragment facing as an input to the fragment function.
- Calculated facing is passed during emit_quad().
|
|
|
|
|
|
The colormask code generation had assumed that its input packed pixels were
in RGBA format. In fact, the format they're in is dependent on the
pipe color format.
Now the color format is passed in to gen_colormask(), and proper
color format-dependent SPU code is generated.
|
|
Also: improve float/int Z conversion.
Use clgt instead of cgt in depth test since we're comparing unsigned values.
|
|
|
|
|
|
- rtasm_ppc_spe.c, rtasm_ppc_spe.h: added a new macro function
"spe_load_uint" for loading and splatting unsigned integers
in a register; it will use "ila" for values 18 bits or less,
"ilh" for word values that are symmetric across halfwords,
"ilhu" for values that have zeroes in their bottom halfwords,
or "ilhu" followed by "iohl" for general 32-bit values.
Of the 15 color masks of interest, 4 are 18 bits or less,
2 are symmetric across halfwords, 3 are zero in the bottom
halfword, and 6 require two instructions to load.
- cell_gen_fragment.c: added full codegen for logic op and
color mask.
|
|
- Added new "macro" functions spe_float_min() and spe_float_max()
to rtasm_ppc_spe.{ch}. These emit instructions that cause
the minimum or maximum of each element in a vector of floats
to be saved in the destination register.
- Major changes to cell_gen_fragment.c to implement all the blending
modes (except for the mysterious D3D-based PIPE_BLENDFACTOR_SRC1_COLOR,
PIPE_BLENDFACTOR_SRC1_ALPHA, PIPE_BLENDFACTOR_INV_SRC1_COLOR, and
PIPE_BLENDFACTOR_INV_SRC1_ALPHA).
- Some revamping of code in cell_gen_fragment.c: use the new spe_float_min()
and spe_float_max() functions (instead of expanding these calculations
inline via macros); create and use an inline utility function for handling
"optional" register allocation (for the {1,1,1,1} vector, and the
blend color vectors) instead of expanding with macros; use the Float
Multiply and Subtract (fnms) instruction to simplify and optimize many
blending calculations.
|
|
|
|
|
|
- Added two new debug flags (to be used with the CELL_DEBUG environment
variable). The first, "CELL_DEBUG=fragops", activates SPE fragment
ops debug messages. The second, "CELL_DEBUG=fragopfallback", will
eventually be used to disable the use of generated SPE code for
fragment ops in favor of the default fallback reference routine.
(During development, though, the parity of this flag is reversed:
all users will get the reference code *unless* CELL_DEBUG=fragopfallback
is set. This will prevent hiccups in code generation from affecting
the other developers.)
- Formalized debug message usage and macros in spu/spu_main.c.
- Added lots of new code to ppu/cell_gen_fragment.c to extend the
number of supported source RGB factors from 4 to 15, and to
complete the list of supported blend equations.
More coming, to complete the source and destination RGB and alpha
factors, and to complete the rest of the fragment operations...
|
|
|
|
|
|
|
|
|
|
Do code generation for alpha test, z test, stencil, blend, colormask
and framebuffer/tile read/write as a single code block.
Ian's previous blend/z/stencil test code is still there but mostly disabled
and will be removed soon.
|