Age | Commit message (Collapse) | Author |
|
|
|
When an EU executes a 'wait' instruction, it stalls and sets the
notification register state. The host can issue an MMIO write to clear
the notification register state, allowing the EU to continue executing.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
|
|
|
|
|
|
1. Move all GL entrypoint functions and files into src/mesa/main/
This includes the ARB vp/fp, NV vp/fp, ATI fragshader and GLSL bits
that were in src/mesa/shader/
2. Move src/mesa/shader/slang/ to src/mesa/slang/ to reduce the tree depth
3. Rename src/mesa/shader/ to src/mesa/program/ since all the
remaining files are concerned with GPU programs.
4. Misc code refactoring. In particular, I got rid of most of the
GLSL-related ctx->Driver hook functions. None of the drivers used
them.
Conflicts:
src/mesa/drivers/dri/i965/brw_context.c
|
|
|
|
|
|
I broke this with the state streaming changes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 31.791 32.287 1.11% 6/6
after:
[ 0] gl firefox-talos-gfx 31.198 31.675 0.96% 6/6
|
|
|
|
We had to fill out all that junk when using the cache, but no more.
|
|
|
|
This makes the binding table code simpler, and is required for gen6,
which requires binding table addresses to be under 64k offset from the
surface state base addr.
No significant change in performance on firefox-talos-gfx.
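
For illustration, the 64k constraint mentioned above can be sketched as a
range check. This is not the driver's code: BINDING_TABLE_LIMIT,
binding_table_offset_ok() and binding_table_offset() are hypothetical
names, and the sketch assumes plain 32-bit addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the "under 64k" limit described in the message. */
#define BINDING_TABLE_LIMIT (64u * 1024u)

/* True if bind_addr lies at or above the surface state base address and
 * less than 64k past it. */
static bool binding_table_offset_ok(uint32_t surface_state_base,
                                    uint32_t bind_addr)
{
    if (bind_addr < surface_state_base)
        return false;
    return (bind_addr - surface_state_base) < BINDING_TABLE_LIMIT;
}

/* Offset of the binding table relative to the base; asserts the
 * constraint rather than silently producing an out-of-range value. */
static uint32_t binding_table_offset(uint32_t surface_state_base,
                                     uint32_t bind_addr)
{
    assert(binding_table_offset_ok(surface_state_base, bind_addr));
    return bind_addr - surface_state_base;
}
```

Allocating the binding table from the same streamed buffer as the surface
state is one way to keep such an offset small by construction.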
|
|
Now that the binding table is streamed indirect state, they were
always NULL/0.
|
|
|
|
It turns out that computing a 56 byte key to look up a 20-byte object
out of a hash table was some sort of a bad idea. Whoops.
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
after:
[ 0] gl firefox-talos-gfx 34.761 34.784 0.17% 5/6
|
|
This slightly reduces cairo-gl firefox-talos-gfx runtime on my
Ironlake:
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 38.236 38.383 0.43% 5/6
after:
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
It turns out the cost of caching these objects and looking them up in
the cache again is greater than the cost of just computing the object
again, particularly when the overhead of having a separate BO to pin
is removed.
(Those that are paying close attention will note that this is a
reversal of the path I was moving the driver in a couple of years ago.
The major thing that has changed is that back then all state was
recomputed when we wrapped the streaming state buffer, including
recompiling our precious programs. Now, we're uncaching just the
objects that are cheap to compute, and retaining caching of expensive
objects)
|
|
This was bothering me when redoing the binding tables.
|
|
|
|
The cache lookup of these two little floats was .12% of total CPU time
on firefox-talos-gfx because we did it any time commonly-changed state
changed. On the other hand, updating the CC VP bo immediately whenever
CC VP state changes is a .07% overhead due to putting a driver hook
in glEnable().
|
|
|
|
It's more likely that we wrap badly in state setup than in the little
primitive packet.
|
|
It just duplicated the default/core Mesa behaviour.
|
|
|
|
It just duplicated the default/core Mesa behaviour.
|
|
|
|
In exchange we end up with an extra memcpy, but that seems better than
calloc/free. Each buffer is 4k maximum, and on the i965-streaming
branch this allocation was showing up as the top entry in
brw_validate_state profiling for cairo-gl.
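
A minimal sketch of the trade described above, assuming a reusable 4k
scratch area that lives in a context structure. The names upload_ctx,
SCRATCH_MAX and upload_params() are hypothetical and are not taken from
the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SCRATCH_MAX 4096  /* "4k maximum" from the message above */

/* Hypothetical context holding one reusable scratch buffer, so no
 * calloc/free pair is needed per upload. */
struct upload_ctx {
    unsigned char scratch[SCRATCH_MAX];
};

/* Stage the values in the scratch area, then copy them to dst.
 * Returns the number of bytes copied. */
static size_t upload_params(struct upload_ctx *ctx, const float *params,
                            size_t count, void *dst)
{
    size_t size = count * sizeof(float);

    assert(size <= SCRATCH_MAX);
    /* Stands in for computing the values in place. */
    memcpy(ctx->scratch, params, size);
    /* The extra copy that is traded for the heap allocation. */
    memcpy(dst, ctx->scratch, size);
    return size;
}
```

The second memcpy is the cost; the benefit is that nothing is allocated
or freed on the state-validation path.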
|
|
The new API makes so much more sense, I'd like to forget how the old
one worked.
|
|
The slightly less mechanical change of converting the emit_reloc calls
will follow.
|
|
|
|
This will help in bufmgr debugging and aub dumping.
|
|
We could potentially do this on G45 as well, though the units are
different. On 965, the timestamp is tied to hclk, which would make
supporting it harder.
|
|
|
|
|
|
We should be able to do 16, but are limited by Mesa's static buffer
allocations.
|
|
If you used all 4 color targets we currently support, we would see 0
and end up just writing the first output. Give enough bits that we
can do the maximum of 16.
Fixes piglit fbo-drawbuffers-maxtargets.
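
The failure mode can be reproduced in isolation with a too-narrow
bit-field. This is only a sketch: the struct and field names are
hypothetical, and the 2-bit and 5-bit widths are assumptions chosen to
match "we would see 0" for a count of 4 and "the maximum of 16".

```c
#include <assert.h>

/* Hypothetical key layouts; the real field names and widths may differ. */
struct narrow_key {
    unsigned nr_color_regions : 2;  /* can hold 0..3 only */
};

struct wide_key {
    unsigned nr_color_regions : 5;  /* can hold 0..31, enough for 16 */
};

/* Store a render-target count in the 2-bit field and read it back. */
static unsigned store_narrow(unsigned count)
{
    struct narrow_key key = { 0 };

    key.nr_color_regions = count;   /* value is reduced modulo 4 */
    return key.nr_color_regions;
}

/* Same round trip through the 5-bit field. */
static unsigned store_wide(unsigned count)
{
    struct wide_key key = { 0 };

    assert(count <= 31);            /* widest value the field can hold */
    key.nr_color_regions = count;
    return key.nr_color_regions;
}
```

With these assumed widths, store_narrow(4) reads back 0, while
store_wide(4) and store_wide(16) read back unchanged.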
|
|
The idea would be that you could have multiple send messages going on
if nothing depended on the previous message's results and you used a
different send message. The problem is that the later send requires
the VUE handle returned by the first send's allocate anyway.
|
|
|
|
|
|
This basically restores the previous state, where a vertex result slot
is set up for the texcoord to be replaced with point coord. Fixes
piglit point-sprite test.
Bug #27625
|
|
|
|
This is trying to follow the spirit of the invariance rules, though
they're not specific on this point. Fixes quad-invariance piglit test
while retaining the 22s -> 18s win on glean blendFunc.
This was a regression in c67d9d84f501f145f841c0b981caff6f4dfd936f.
|
|
GL doesn't actually let you begin an OQ while one is active, so the
extra work was pointless.
|