Age | Commit message (Collapse) | Author |
|
The LOOP/ENDLOOP pair is renamed to BGNFOR/ENDFOR as its behaviour
is similar to a C language for-loop.
The BGNLOOP2/ENDLOOP2 pair is renamed to BGNLOOP/ENDLOOP as now
there is no name collision.
|
|
Various opcodes which can be implemented trivially with other TGSI opcodes,
such as matrix multiplication and negation. These were not used by any
state tracker or implemented by any of the drivers.
|
|
|
|
Remove the need to have a pointer in this struct by just including
the immediate data inline. Having a pointer in the struct introduces
complications like needing to alloc/free the data pointed to, uncertainty
about who owns the data, etc. There doesn't seem to be a need for it,
and it is unlikely to make much difference plus or minus to performance.
Added some asserts as we now will trip up on immediates with more
than four elements. There were actually already quite a few such asserts,
but the >4 case could be used in the future to specify indexable immediate
ranges, such as lookup tables.
|
|
Conflicts:
Makefile
src/gallium/drivers/softpipe/sp_screen.c
src/mesa/main/version.h
|
|
|
|
In spu_tri.c:setup_sort_vertices() triangles are culled after the
vertices are sorted. This patch moves the check a little earlier
and performs the actual check a little faster through intrinsics and
a little trickery.
Reduced code size and less work is done before a triangle is deemed
OK to skip.
|
|
It was taking approximately 50 cycles to extract the vertex indices,
calculate the vertex_header pointers and call tri_draw() for each
three vertices - .
Unrolled, it takes less than 100 cycles to extract, unpack,
calculate pointers and call tri_draw() eight times. It does have a
nasty jump-tabled switch. I'm sure that there's a better way...
Code size of spu_render.o gets larger due to the extra constants and
work in the inner loop, there are extra stack saves and loads
because there are more registers in use, and an assert. spu_tri.o
gets a little smaller.
|
|
Also implement context member functions to optimize away those
flushes whenever possible.
Signed-off-by: Thomas Hellstrom <thellstrom-at-vmware-dot-com>
|
|
|
|
|
|
I should have gotten most uses and implementation
correctly fixed, but things might break.
Feel free to blame me.
|
|
|
|
The core reference counting code is centralized in p_refcnt.h.
This has some consequences related to struct pipe_buffer:
* The screen member of struct pipe_buffer must be initialized, or
pipe_buffer_reference() will crash trying to destroy a buffer with reference
count 0. u_simple_screen takes care of this, but I may have missed some of
the drivers not using it.
* Except for rare exceptions deep in winsys code, buffers must always be
allocated via pipe_buffer_create() or via screen->*buffer_create() rather
than via winsys->*buffer_create().
|
|
Updated to use the new pipe_transfer functions, etc.
Texturing is working again. Though there's some bugs in mipmap texturing
but I believe those predate the pipe_transfer changes.
|
|
|
|
Start adding some new pipe_transfer code.
Texturing is totally broken at this point but non-texture programs
seem to run OK.
|
|
Update framebuffer color/z/stencil mapping/unmapping.
|
|
|
|
|
|
|
|
|
|
The debug functions depend on several util function for os abstractions, and
these depend on debug functions, so a seperate module is not possible.
|
|
Suggested by Jonathan Adamczewski. There may be more places to do this...
|
|
|
|
|
|
|
|
|
|
The Cell texture code really needs a thorough inspection and clean-up someday...
|
|
|
|
|
|
|
|
|
|
move it to pipe/internal/p_winsys_screen.h and start converting
the state trackers to the screen usage
|
|
allows the driver to overwrite buffer allocation, first step on the way
to making winsys interface internal to the drivers. state trackers and
the code above it will go through the screen
|
|
|
|
|
|
|
|
Without the f, the constant is treated as a double, resulting in
slower arithmetic and libgcc conversion calls each time CEILF()
is used.
|
|
|
|
Replace cell_batch{align,alloc)*() with cell_batch_alloc16(), allocating
multiples of 16 bytes that are 16 byte aligned.
Opcodes are stored in preferred slot of SPU machine word.
Various structures are explicitly padded to 16 byte multiples.
Added STATIC_ASSERT().
|
|
|
|
|
|
The new instruction order is 10 cycles faster.
|
|
|
|
|
|
|
|
|
|
|
|
|