Age | Commit message (Collapse) | Author |
|
We had to fill out all that junk when using the cache, but no more.
|
|
|
|
This makes the binding table code simpler, and is required for gen6,
which requires binding table addresses to be under 64k offset from the
surface state base addr.
No significant change in performance on firefox-talos-gfx.
|
|
Now that the binding table is streamed indirect state, they were
always NULL/0.
|
|
|
|
It turns out that computing a 56 byte key to look up a 20-byte object
out of a hash table was some sort of a bad idea. Whoops.
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
after:
[ 0] gl firefox-talos-gfx 34.761 34.784 0.17% 5/6
|
|
This slightly reduces reduces cairo-gl firefox-talos-gfx runtime on my
Ironlake:
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 38.236 38.383 0.43% 5/6
after:
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
It turns out the cost of caching these objects and looking them up in
the cache again is greater than the cost of just computing the object
again, particularly when the overhead of having a separate BO to pin
is removed.
(Those that are paying close attention will note that this is a
reversal of the path I was moving the driver in a couple of years ago.
The major thing that has changed is that back then all state was
recomputed when we wrapped the streaming state buffer, including
recompiling our precious programs. Now, we're uncaching just the
objects that are cheap to compute, and retaining caching of expensive
objects)
|
|
This was bothering me when redoing the binding tables.
|
|
|
|
The cache lookup of these two little floats was .12% of total CPU time
on firefox-talos-gfx because we did it any time commonly-changed state
changed. On the other hand, updating the CC VP bo immediately whenver
CC VP state changes is a .07% overhead due to putting a driver hoook
in glEnable().
|
|
|
|
It's more likely that we wrap badly in state setup than in the little
primitive packet.
|
|
It just duplicated the default/core Mesa behaviour.
|
|
|
|
It just duplicated the default/core Mesa behaviour.
|
|
|
|
In exchange we end up with an extra memcpy, but that seems better than
calloc/free. Each buffer is 4k maximum, and on the i965-streaming
branch this allocation was showing up as the top entry in
brw_validate_state profiling for cairo-gl.
|
|
The new API makes so much more sense, I'd like to forget how the old
one worked.
|
|
The slightly less mechanical change of converting the emit_reloc calls
will follow.
|
|
|
|
This will help in bufmgr debugging and aub dumping.
|
|
We could potentially do this on G45 as well, though the units are
different. On 965, the timestamp is tied to hclk, which would make
supporting it harder.
|
|
|
|
|
|
We should be able to do 16, but are limited by Mesa's static buffer
allocations.
|
|
If you used all 4 color targets we currently support, we would see 0
and end up just writing the first output. Give enough bits that we
can do the maximum of 16.
Fixes piglit fbo-drawbuffers-maxtargets.
|
|
The idea would be that you could have multiple send messages going on
if nothing depended on the previous message's results and you used a
different send message. The problem is that the later send requires
the VUE handle returned by the first send's allocate anyway.
|
|
|
|
|
|
This basically restores the previous state, where a vertex result slot
is set up for the texcoord to be replaced with point coord. Fixes
piglit point-sprite test.
Bug #27625
|
|
|
|
This is trying to follow the spirit of the invariance rules, though
they're not specific on this point. Fixes quad-invariance piglit test
while retaining the 22s -> 18s win on glean blendFunc.
This was a regression in c67d9d84f501f145f841c0b981caff6f4dfd936f.
|
|
GL doesn't actually let you begin an OQ while one is active, so the
extra work was pointless.
|
|
|
|
Bug #24470: glean clipFlat test.
|
|
|
|
|
|
The RGBX version isn't supported as a vertex input type, but since we
force the last channel's value anyway, this should be fine. The only
potential risk I see is in the limiter on VBO reads past the end of
the buffer forcing the whole vertex to 0 when the A channel lands past
the end.
Fixes piglit draw-vertices-half-float.
|
|
|
|
|
|
This is similar to the GL_QUAD_STRIP -> TRIANGLE_STRIP optimization --
the GS usage to split the quads into tris is a huge bottleneck, so a
quick check improves glean blendFunc time massively (width * height of
the window of single-pixel GL_QUADS, many many times). This may also
end up helping with cairo performance, which sometimes ends up drawing
a single quad.
|
|
|
|
Conflicts:
src/mesa/drivers/dri/common/dri_util.h
|
|
|
|
|
|
Rename old IGDNG to Ironlake, and set 'gen' number for
Ironlake as 5, so tracking the features with generation num
instead of special is_ironlake flag.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
Most of the failure from using uninlined function calls ends up being
just bad rendering, but nested function calls in the VS currently hang
the GPU, so reject them and explain why.
|
|
We were doubling up the offsets for the mipmap levels for CPU access.
Instead of reimplementing i945_miptree_layout_2d with 6 cube images
separated by qpitch, share that function and provide the level offsets
later.
Fixes piglit cubemap and fbo-cubemap.
|
|
This should be functionally equivalent, with the possible exception of
NaN handling.
|
|
We could use this to reduce constant register pressure, but for now it
makes the resulting program assembly much more readable.
|