path: root/src/gallium/drivers/cell/spu/spu_tri.c
Age | Commit message | Author
2010-05-21 | gallium: remnants of old ccw state | Keith Whitwell
2010-02-19 | Replace the _mesa_*printf() wrappers with the plain libc versions | Kristian Høgsberg
2009-05-21 | cell: perform triangle cull a little earlier | Jonathan Adamczewski
In spu_tri.c:setup_sort_vertices() triangles are culled after the vertices are sorted. This patch moves the check a little earlier and performs the actual check a little faster through intrinsics and a little trickery. This reduces code size, and less work is done before a triangle is deemed OK to skip.
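The idea of rejecting a triangle before sorting its vertices can be sketched in portable C. This is only an illustration under stated assumptions: the names `vec2`, `cull_mode`, `signed_area2()` and `cull_before_sort()` are hypothetical, the real code works on SPU vector types with intrinsics, and the actual test used in the patch may differ.

```c
/* Hypothetical types and names; not taken from spu_tri.c. */
struct vec2 { float x, y; };

enum cull_mode { CULL_NONE, CULL_POSITIVE_AREA, CULL_NEGATIVE_AREA };

/* Twice the signed area of the triangle, computed from the unsorted vertices. */
static float signed_area2(struct vec2 v0, struct vec2 v1, struct vec2 v2)
{
   return (v1.x - v0.x) * (v2.y - v0.y) - (v2.x - v0.x) * (v1.y - v0.y);
}

/* Returns 1 if the triangle can be skipped before any sorting work is done. */
static int cull_before_sort(struct vec2 v0, struct vec2 v1, struct vec2 v2,
                            enum cull_mode mode)
{
   float area = signed_area2(v0, v1, v2);

   if (area == 0.0f)
      return 1;                                   /* degenerate triangle */
   if (mode == CULL_POSITIVE_AREA && area > 0.0f)
      return 1;
   if (mode == CULL_NEGATIVE_AREA && area < 0.0f)
      return 1;
   return 0;                                      /* keep: go on to sort vertices */
}
```

A caller would run `cull_before_sort()` first and only sort the vertices by y when it returns 0.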
2009-05-21 | cell: unroll inner loop of spu_render.c:cmd_render() | Jonathan Adamczewski
It was taking approximately 50 cycles to extract the vertex indices, calculate the vertex_header pointers and call tri_draw() for each set of three vertices. Unrolled, it takes less than 100 cycles to extract, unpack, calculate pointers and call tri_draw() eight times. It does have a nasty jump-tabled switch. I'm sure that there's a better way... Code size of spu_render.o gets larger due to the extra constants and work in the inner loop; there are extra stack saves and loads because more registers are in use, plus an assert. spu_tri.o gets a little smaller.
2009-02-16 | cell: use some SPU intrinsics to get slightly better code in eval_inputs() | Brian Paul
Suggested by Jonathan Adamczewski. There may be more places to do this...
2009-02-15 | cell: new/tighter code for computing fragment program inputs | Brian Paul
2009-02-15 | cell: combine eval_z(), eval_w() functions | Brian Paul
2009-01-14 | cell: Specify constant as float for CEILF(). | Jonathan Adamczewski
Without the f, the constant is treated as a double, resulting in slower arithmetic and libgcc conversion calls each time CEILF() is used.
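The effect of the missing `f` suffix can be checked in plain C. The macros below are hypothetical stand-ins, not the definition from spu_tri.c; they only show that adding a double constant promotes the whole sum to double, while a float constant keeps the arithmetic in single precision.

```c
#include <stddef.h>

/* Hypothetical ceiling macros for illustration only. */
#define CEILF_DOUBLE(x) ((float) (int) ((x) + 0.99999))   /* constant is a double */
#define CEILF_FLOAT(x)  ((float) (int) ((x) + 0.99999f))  /* constant is a float  */

/* Size of the intermediate sum: float + double is evaluated as a double. */
static size_t sum_size_double_const(float x)
{
   return sizeof(x + 0.99999);
}

/* Size of the intermediate sum: float + float stays a float. */
static size_t sum_size_float_const(float x)
{
   return sizeof(x + 0.99999f);
}
```

On a target without fast double-precision hardware, the first form implies float-to-double and double-to-int conversions on every use, which is the cost the commit message describes.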
2009-01-05 | cell: SIMDize sorting in setup_sort_vertices() | Jonathan Adamczewski
Put setup.v{min,mid,max,provoke} into a union with qword vertex_headers. Rewrite vertex sorting to more efficiently handle the packed data items. Reduces spu_tri.o by ~128 bytes.
2009-01-05 | cell: SIMDize some subtractions | Jonathan Adamczewski
Put edge.{dx,dy} into a union with a vector and perform subtractions in setup_sort_vertices() on vectors. Reduces spu_tri.o by ~300 bytes.
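A rough sketch of the union trick, written with the GCC/Clang `vector_size` extension rather than SPU intrinsics so that it builds on an ordinary compiler. The type names `vec4f` and `edge_sketch`, the padding fields and the helper `edge_between()` are assumptions made for illustration; the real struct layout in spu_tri.c is not reproduced here.

```c
/* Assumption: GCC/Clang vector extension stands in for the SPU vector type. */
typedef float vec4f __attribute__((vector_size(16)));

/* Hypothetical edge record: the scalar fields alias the first two vector lanes. */
union edge_sketch {
   struct { float dx, dy, pad0, pad1; } s;
   vec4f v;
};

/* One vector subtraction produces dx and dy together. */
static union edge_sketch edge_between(vec4f a, vec4f b)
{
   union edge_sketch e;
   e.v = a - b;
   return e;
}
```

After the call, `e.s.dx` and `e.s.dy` can be read as ordinary scalars, so existing scalar code that consumes the edge does not need to change.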
2009-01-04 | cell: improvements to spu_tri.c | Jonathan Adamczewski
Replace int setup.span{left,right}[2] with vec_uint4 setup.span.quad. SIMDize calculate_mask() and inline it into flush_spans(). Set setup.span.quad members using spu_shuffle() or spu_sel(). Reduces spu_tri.o by ~116 bytes.
2008-11-11 | CELL: two-sided stencil fixes | Robert Ellison
With these changes, the tests/stencil_twoside test now works.
- Eliminate blending from the stencil_twoside test, as it produces an unneeded dependency on having blending working
- The spe_splat() function will now work if the register being splatted and the destination register are the same
- Separate fragment code generated for front-facing and back-facing fragments. Often these are the same; if two-sided stenciling is on, they can be different. This is easier and faster than generating code that does both tests and merges the results.
- Fixed a cut/paste bug where if the back Z-pass stencil operation were different from all the other operations, the back Z-fail results were incorrect.
2008-10-30 | CELL: stencil bug fixes | Robert Ellison
Two definitive bugs in stenciling were fixed.
The first, reversed registers in the generated Select Bytes (selb) instruction, caused the stenciling INCR and DECR operations to fail dramatically, putting new values in where old values were supposed to be and vice versa.
The second caused stencil tiles to not be read and written from main memory by the SPUs. A per-spu flag, spu.read_depth, was used to indicate whether the SPU should be reading depth tiles, and was set only when depth was enabled. A second flag, spu.read_stencil, was set when stenciling was enabled, but never referenced. As stenciling and depth are in the same tiles on the Cell, and there is no corresponding TAG_WRITE_TILE_STENCIL to complement TAG_WRITE_TILE_COLOR and TAG_WRITE_TILE_Z, I fixed this by eliminating the unused "spu.read_stencil", renaming "spu.read_depth" to "spu.read_depth_stencil", and setting it if either stenciling or depth is enabled.
I also added an optimization to the fragment ops generation code, that avoids calculating stencil values and/or stencil writemask when the stencil operations are all KEEP.
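The combined flag and the all-KEEP shortcut described in this message can be sketched as follows. The struct, enum and function names are hypothetical; only the field name read_depth_stencil comes from the commit message, and the real driver state carries far more than this.

```c
#include <stdbool.h>

/* Hypothetical cut-down state; only read_depth_stencil is named in the commit. */
struct tile_read_state {
   bool read_depth_stencil;
};

/* Hypothetical stencil operations, just enough to express "all KEEP". */
enum stencil_op_sketch { OP_KEEP, OP_REPLACE, OP_INCR, OP_DECR };

/* Depth and stencil share a tile, so either feature requires reading it. */
static void update_tile_read_state(struct tile_read_state *state,
                                   bool depth_enabled, bool stencil_enabled)
{
   state->read_depth_stencil = depth_enabled || stencil_enabled;
}

/* When every operation is KEEP, no new stencil values need to be computed. */
static bool stencil_ops_all_keep(enum stencil_op_sketch fail,
                                 enum stencil_op_sketch zfail,
                                 enum stencil_op_sketch zpass)
{
   return fail == OP_KEEP && zfail == OP_KEEP && zpass == OP_KEEP;
}
```

A code generator could consult `stencil_ops_all_keep()` once per state change and skip emitting the stencil value and writemask calculations when it returns true.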
2008-10-16 | cell: implement KIL instruction | Brian Paul
2008-10-15 | cell: get rid of last usage of float4 union/typedef | Brian Paul
Results in slightly tighter code.
2008-10-15 | cell: simplify triangle front/back face determination | Brian Paul
2008-10-15 | cell: send rasterizer state to SPUs in proper way, remove front_winding hack | Brian Paul
2008-10-15 | cell: updated vertex dump/debug code | Brian Paul
2008-10-13 | cell: more clean-up in spu_tri.c | Brian Paul
2008-10-13 | cell: remove dead code, clean-up, reformatting | Brian Paul
2008-10-13 | cell: finish-up perspective-corrected interpolation | Brian Paul
2008-10-13 | cell: remove old texture code | Brian Paul
2008-10-10 | cell: updates in response to draw's struct vertex_info changes | Brian Paul
2008-10-09 | cell: implement basic TXP instruction in fragment shaders | Brian Paul
Lots of restrictions for now (one 2D texture, no mipmaps, etc.), but basic texture demos work. TEX, TXD, TXP do the same thing for the time being.
2008-10-03 | CELL: changes to generate SPU code for stenciling | Robert Ellison
This set of code changes is for stencil code generation support. Both one-sided and two-sided stenciling are supported.
In addition to the raw code generation changes, these changes had to be made elsewhere in the system:
- Added new "register set" feature to the SPE assembly generation. A "register set" is a way to allocate multiple registers and free them all at the same time, delegating register allocation management to the spe_function unit. It's quite useful in complex register allocation schemes (like stenciling).
- Added and improved SPE macro calculations. These are operations between registers and unsigned integer immediates. In many cases, the calculation can be performed with a single instruction; the macros will generate the single instruction if possible, or generate a register load and register-to-register operation if not. These macro functions are: spe_load_uint() (which has new ways to load a value in a single instruction), spe_and_uint(), spe_xor_uint(), spe_compare_equal_uint(), and spe_compare_greater_uint().
- Added facing to fragment generation. While rendering, the rasterizer needs to be able to determine front- and back-facing fragments, in order to correctly apply two-sided stencil. That requires these changes:
  - Added front_winding field to the cell_command_render block, so that the state tracker could communicate to the rasterizer what it considered to be the front-facing direction.
  - Added fragment facing as an input to the fragment function.
  - Calculated facing is passed during emit_quad().
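The "single instruction if possible, otherwise load plus register-to-register operation" decision made by the macro functions can be sketched like this. Everything here is an assumption for illustration: the names `emit_plan`, `fits_signed_10bit()` and `plan_and_uint()` are hypothetical, the immediate is assumed to be a 10-bit signed field that is sign-extended to 32 bits, and the real spe_and_uint() may use additional encodings.

```c
#include <stdint.h>

/* Hypothetical description of how an AND with an immediate would be emitted. */
enum emit_plan {
   EMIT_IMMEDIATE_FORM = 1,           /* one instruction with an inline immediate */
   EMIT_LOAD_THEN_REGISTER_FORM = 2   /* load constant, then register-register op */
};

/* Assumption: a 10-bit signed immediate, sign-extended to 32 bits. */
static int fits_signed_10bit(uint32_t value)
{
   return value <= 0x1FFu || value >= 0xFFFFFE00u;
}

/* Choose the cheaper form when the unsigned constant is representable. */
static enum emit_plan plan_and_uint(uint32_t value)
{
   return fits_signed_10bit(value) ? EMIT_IMMEDIATE_FORM
                                   : EMIT_LOAD_THEN_REGISTER_FORM;
}
```

Under these assumptions a mask such as 0xFF or 0xFFFFFFFF takes the immediate form, while an arbitrary 32-bit constant falls back to the load-then-operate path.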
2008-09-12 | cell: evaluate multiple fragment inputs | Brian Paul
2008-09-12 | cell: setup fragment program inputs in SOA format | Brian Paul
Also remove old code, etc.
2008-09-11 | cell: initial support for fragment shader code generation. | Brian Paul
TGSI shaders are translated into SPE instructions which are then sent to the SPEs for execution. Only a few opcodes work, no swizzling yet, no support for constants/immediates, etc.
2008-09-11 | cell: asst. clean-up | Brian Paul
2008-09-11 | cell: remove old per-fragment code, replace with all new code | Brian Paul
2008-09-11 | cell: checkpoint commit of new per-fragment processing | Brian Paul
Do code generation for alpha test, z test, stencil, blend, colormask and framebuffer/tile read/write as a single code block. Ian's previous blend/z/stencil test code is still there but mostly disabled and will be removed soon.
2008-09-11 | cell: comments | Brian Paul
2008-08-25 | cell: asst fixes to get driver building/running again. | Brian
Note that SPU vertex transformation is disabled at this time.
2008-08-24 | gallium: refactor/replace p_util.h with util/u_memory.h and util/u_math.h | Brian Paul
Also, rename p_tile.[ch] to u_tile.[ch]
2008-04-01 | cell: more multi-texture fixes (mostly working now) | Brian
2008-04-01 | cell: checkpoint: more multi-texture work | Brian
2008-03-31 | cell: more work for multi-texture support | Brian
2008-03-31 | cell: initial work to support multi-texture | Brian
2008-03-26 | cell: Implement code-gen for logic op | Ian Romanick
This also implements code-gen for the float-to-packed color conversion. It's currently hardcoded for A8R8G8B8, but that can easily be fixed as soon as other color depths are supported by the Cell driver.
2008-03-21 | cell: Change code-gen for CONST_COLOR blend factor | Ian Romanick
Previously the constant color blend factor was compiled into the generated code. This meant that the code had to be regenerated each time the constant color was changed. This doesn't fit with the model used in Gallium. As-is, the code could be better. The constant color is loaded for every quad processed, even if it is not used. Also, if a lot of (1-x) blend factors are used, 1.0 will be loaded and reloaded into registers many times.
2008-03-20 | cell: Fix bus error when there is no depth buffer | Ian Romanick
2008-03-20 | cell: Use code-gen for alpha blend | Ian Romanick
So far this is only tested when GL_BLEND is disabled.
2008-03-17 | cell: Initial code-gen for alpha / stencil / depth testing | Ian Romanick
Alpha test is currently broken because all per-fragment testing occurs before alpha is calculated. Stencil test is currently broken because the Z-clear code asserts if there is a stencil buffer.
2008-02-15 | Code reorganization: move files into their places. | José Fonseca
This is in a separate commit to ensure renames are properly preserved.