Age | Commit message (Collapse) | Author |
|
Fixes: glsl-vs-varying-array
|
|
Shader inputs appear in source registers, not dst registers. Catches
unsupported shaders in glsl-fs-varying-array and Humus
RaytracedShadows.
|
|
Texcoords in AmbientApertureLighting were getting trashed since the
move of math arguments to implied moves, due to the logic for
detecting ALU message reg writes overriding the logic for SEND
implicit message reg writes.
|
|
|
|
|
|
My merge of Zhenyu's patch on top of my previous patches broke it by
my code expecting simd16 single write and Zhenyu's simd8 path being
disabled by mine. Merge the two for success.
|
|
The docs claim two conflicting things: One, that a scalar source is
supported. Two, source hstride must be 1 and width must be exec size.
So splat a constant argument out into a full reg to operate on, since
violating the second set of constraints is clearly failing.
The alternative here might be to do a 1-wide exec on a constant
argument for math1. It would probably save cycles too. But I'll
leave that for the glsl2-965 branch.
Fixes glsl-algebraic-div-one-2.shader_test.
|
|
Fixes glsl-algebraic-add-add-1.
|
|
|
|
Only 8 out of the up to 13 regs are for source/dest depth, so the name
wasn't particularly appropriate. Note that this doesn't count the
constant or URB payload regs. Also, don't pre-divide by 2, so it's
actually a number of registers.
|
|
|
|
|
|
|
|
|
|
Whenever the accumulator results are needed, this bit must be set.
|
|
|
|
|
|
This makes reading the code easier when matching up to the specs,
which also use this format.
|
|
The SIMD16 message no longer has the goofy interleaved format that
made Compr4 compression necessary before.
|
|
format ‘%d’ expects type ‘int’, but argument 2 has type ‘long int’
|
|
Otherwise, we might end up with the if stack pointing at the wrong
place. Fixes GPU hang with glsl-vs-if-loop.
|
|
Fixes glsl-vs-if-nested (70.0 is not <= 70.000648 thanks to the
swizzle bits getting set). Some safety checks are added to make sure
this doesn't happen again as we increase the usage of immediate values
in program generation.
|
|
We'll need to use the HALT instruction to do this right, like returns
from other functions.
|
|
Fixes glsl-vs-dot-vec2.
|
|
This saves an extra message reg move in the program, though I'm not
clear on whether it will have any performance impact other than cache
footprint. It will also fix those math calls on Sandybridge, where
the brw_eu_emit.c brw_math() support relies on the implied move being
used.
|
|
|
|
|
|
|
|
|
|
Mixing stderr (_mesa_print_program, _mesa_print_instruction,
_mesa_print_alu) with stdout means that when writing both to a file,
there isn't a consistent ordering between the two.
|
|
|
|
mtypes.h does not use any symbols from compiler.h.
Also add the required headers for files that depended on symbols from
compiler.h but were indirectly including compiler.h through mtypes.h.
|
|
|
|
|
|
|
|
This pulls in multiple i965 driver fixes which will help ensure better
testing coverage during development, and also gets past the conflicts
of the src/mesa/shader -> src/mesa/program move.
Conflicts:
src/mesa/Makefile
src/mesa/main/shaderapi.c
src/mesa/main/shaderobj.h
|
|
Also fix up comments, so that the difference between the two passes is
clarified.
|
|
|
|
|
|
|
|
|
|
|
|
Several routines directly analyze the grf-to-mrf moves from the Gen
binary code. When it is possible, the mov is removed and the message
register is directly written in the arithmetic instruction
Also redundant mrf-to-grf moves are removed (frequently for example,
when sampling many textures with the same uv)
Code was tested with piglit, warsow and nexuiz on an Ironlake
machine. No regression was found there
Note that the optimizations are *deactivated* on Gen4 and Gen6 since I
did test them properly yet. No reason there are bugs but who knows
The optimizations are currently done in branch free programs *only*.
Considering branches is more complicated and there are actually two
paths: one for branch free programs and one for programs with branches
Also some other optimizations should be done during the emission
itself but considering that some code is shader between vertex shaders
(AOS) and pixel shaders (SOA) and that we may have branches or not, it
is pretty hard to both factorize the code and have one good set of
strategies
|
|
Clarifies program assembly, and with a little tweak to always use
constant_map, we could cut down on constant buffer payload.
|
|
This should be more useful for developers and for bug triaging than
just generating wrong code.
|
|
Fixes glsl-vs-arrays. Bug #27388.
|
|
Fixes glsl-vs-point-size.
|
|
This has confused me twice now. It's a fixed width of 4 (usually a
region description of <4,4,1>), not 1. If it was 1, we'd have been
skipping all over register space.
|
|
|
|
|