Age | Commit message (Collapse) | Author |
|
Clarifies program assembly, and with a little tweak to always use
constant_map, we could cut down on constant buffer payload.
|
|
This should be more useful for developers and for bug triaging than
just generating wrong code.
|
|
Fixes glsl-vs-arrays. Bug #27388.
|
|
Fixes glsl-vs-point-size.
|
|
This has confused me twice now. It's a fixed width of 4 (usually a
region description of <4,4,1>), not 1. If it was 1, we'd have been
skipping all over register space.
|
|
|
|
|
|
The ARL value is increments of vec4 in the register file. But
PROGRAM_TEMPORARY or PROGRAM_INPUT are stored as vec4s interleaved
between the two verts being executed (thus a vec8 each), compared to
PROGRAM_STATE_VAR being packed vec4s.
Fixes:
glsl-vs-arrays-2
glsl-vs-mov-after-deref
(without regressing glsl-vs-arrays-3)
|
|
|
|
The previous support was overly complicated by trying to use the same
1-OWORD message for both offsets.
|
|
|
|
|
|
Otherwise, the second half isn't written, and we end up reading back
black.
Fixes the remaining junk drawn in glsl-max-varyings, and will likely
help with a number of large real-world shaders.
|
|
They go into the render cache, so while we don't care about their
contents after execution, failing to note them could cause the writes
to be flushed over important buffer contents later.
|
|
|
|
Otherwise, the subsequent read may not get the written value.
|
|
|
|
To quiet a compiler warning.
|
|
There was confusion on both the size of message we can send, and on
what the URB destination offset means.
The remaining problems appear to be due to spilling of regs in the
fragment shader being broken.
|
|
|
|
This cleans up some chipset dependency sprinkled around, and fixes a
potential overflow of the attribute offset array for many vertex
results.
|
|
|
|
|
|
|
|
When EU executes 'wait' instruction, it stalls and sets notification
register state. Host can issue MMIO write to clear notification
register state to allow EU continue on executing again.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
|
|
|
|
|
|
The original glsl compiler would generate a.x * b.x + a.y * b.y, which
we would do mul+mul+add for instead of this mul+mac.
Fixes glsl-fs-dot-vec2.
|
|
The old compiler didn't use SSG, and instead emitted SGT/SGT/SUB. We
can do a little better for SSG than we do for the SGT series.
|
|
1. Move all GL entrypoint functions and files into src/mesa/main/
This includes the ARB vp/vp, NV vp/fp, ATI fragshader and GLSL bits
that were in src/mesa/shader/
2. Move src/mesa/shader/slang/ to src/mesa/slang/ to reduce the tree depth
3. Rename src/mesa/shader/ to src/mesa/program/ since all the
remaining files are concerned with GPU programs.
4. Misc code refactoring. In particular, I got rid of most of the
GLSL-related ctx->Driver hook functions. None of the drivers used
them.
Conflicts:
src/mesa/drivers/dri/i965/brw_context.c
|
|
|
|
|
|
I broke this with the state streaming changes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 31.791 32.287 1.11% 6/6
after:
[ 0] gl firefox-talos-gfx 31.198 31.675 0.96% 6/6
|
|
|
|
We had to fill out all that junk when using the cache, but no more.
|
|
|
|
This makes the binding table code simpler, and is required for gen6,
which requires binding table addresses to be under 64k offset from the
surface state base addr.
No significant change in performance on firefox-talos-gfx.
|
|
Now that the binding table is streamed indirect state, they were
always NULL/0.
|
|
|
|
It turns out that computing a 56 byte key to look up a 20-byte object
out of a hash table was some sort of a bad idea. Whoops.
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
after:
[ 0] gl firefox-talos-gfx 34.761 34.784 0.17% 5/6
|
|
This slightly reduces reduces cairo-gl firefox-talos-gfx runtime on my
Ironlake:
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 38.236 38.383 0.43% 5/6
after:
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
It turns out the cost of caching these objects and looking them up in
the cache again is greater than the cost of just computing the object
again, particularly when the overhead of having a separate BO to pin
is removed.
(Those that are paying close attention will note that this is a
reversal of the path I was moving the driver in a couple of years ago.
The major thing that has changed is that back then all state was
recomputed when we wrapped the streaming state buffer, including
recompiling our precious programs. Now, we're uncaching just the
objects that are cheap to compute, and retaining caching of expensive
objects)
|
|
This was bothering me when redoing the binding tables.
|