Age | Commit message (Collapse) | Author |
|
|
|
Fixes glsl-fs-texturecube-2-*
|
|
Combined with the previous pass, this lets other optimization passes
do their work thanks to ir_tree_grafting. Still have regression in
instruction count with INTEL_NEW_FS, but register count is even
better.
|
|
This is a step towards implementing a GLSL IR backend for the 965
fragment shader. Because it has downsides with the current codegen,
it is hidden under the environment variable INTEL_NEW_FS.
This results in an increase in instruction count at the moment (1444
-> 1752 for glsl-fs-raytrace, 345 -> 359 on my demo), because dot
products are turned into a series of multiplies and adds instead of a
custom expansion of MULs and MACs, and by not splitting the variable
types up we don't get tree grafting and thus there are extra moves of
temporary storage. However, register count drops for the non-GLSL
path (64 -> 56 on my demo shader) because the register allocator sees
all the sub-operations.
|
|
|
|
|
|
The cache lookup of these two little floats was .12% of total CPU time
on firefox-talos-gfx because we did it any time commonly-changed state
changed. On the other hand, updating the CC VP bo immediately whenver
CC VP state changes is a .07% overhead due to putting a driver hoook
in glEnable().
|
|
|
|
This is recommended by the B-Spec. I wasn't able to measure any
difference in ETQW.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
even with vs disabled, still doesn't work.
|
|
|
|
|
|
|
|
|
|
|
|
This adds missing pkg-config lookup for intel and moves the radeon
lookup into a case...esac so it's only looked up when one or more of
the radeon drivers are enabled.
|
|
Driver Makefiles can still add symlink dependencies/rules if needed.
|
|
|
|
We currently weasel out of supporting the timeout parameter, but otherwise
this extension looks ready, and should make the common case happy.
|
|
I was getting tired of doing the dance of INTEL_DEBUG=batch, copying it out,
and running intel-gen4disasm on it.
|
|
|
|
In addition to being HW accelerated, it avoids the incorrect
(black) rendering of the mipmaps that SW was doing in fbo-generatemipmap.
Improves the performance of the mipmap generation and drawing in
fbo-generatemipmap by 30%.
|
|
Also, only create VS surface state if there's a VS constant buffer to be
uploaded, and set the contents of the buffer at the same time as creation.
|
|
|
|
|
|
|
|
intel_swapbuffers.c
|
|
We only allow combined depth+stencil renderbuffers so the complicated code
for splitting and combining separate depth and stencil buffers is no longer
needed.
|
|
|
|
|
|
|
|
|
|
|
|
This reverts commit 7c81124d7c4a4d1da9f48cbf7e82ab1a3a970a7a.
|
|
This reverts commit 53675e5c05c0598b7ea206d5c27dbcae786a2c03.
Conflicts:
src/mesa/drivers/dri/i965/brw_wm_surface_state.c
|
|
|
|
Both had some useful bits for the other.
|
|
|
|
The GEM flags are much more descriptive for what we need. Since this makes
bufmgr_fake rather device-specific, move it to the intel common directory.
We've wanted to do device-specific stuff to it before.
|
|
|
|
These don't appear to have ever been used.
|
|
To do so, merge the remainnig necessary code from the buffers, blit, span, and
screen code to shared, and replace it with those.
|
|
This removes the delayed texture upload optimization from 965, in exchange for
bringing us closer to PBO support. It also disables SGIS_generate_mipmap,
which didn't seem to be working before anyway, according to the lodbias demo.
|
|
The user-space suballocator that was used avoided relocation computations by
using the general and surface state base registers and allocating those types
of buffers out of pools built on top of single buffer objects. It also
avoided calls into the buffer manager for these small state allocations, since
only one buffer object was being used.
However, the buffer allocation cost appears to be low, and with relocation
caching, computing relocations for buffers is essentially free. Additionally,
implementing the suballocator required a don't-fence-subdata flag to disable
waiting on buffer maps so that writing new data didn't block on rendering using
old data, and careful handling when mapping to update old data (which we need
to do for unavoidable relocations with FBOs). More importantly, when the
suballocator filled, it had no replacement algorithm and just threw out all
of the contents and forced them to be recomputed, which is a significant cost.
This is the first step, which just changes the buffer type, but doesn't yet
improve the hash table to not result in full recompute on overflow. Because
the buffers are all allocated out of the general buffer allocator, we can
no longer use the general/surface state bases to avoid relocations, and they
are set to 0 instead.
|