Age | Commit message (Collapse) | Author |
|
|
|
XOR makes much more sense. Note that the previous code would have
failed for not(not(x)), but that gets optimized out.
|
|
|
|
Otherwise, it would try to handle arrays as structures, use
uninitialized memory, and crash.
|
|
We often use reg_null as the destination when setting up the flag
regs. However, on gen6 there aren't general implicit conversions to
destination types from src types, so the comparison to produce the
flag regs would be done on the integer result interpreted as a float.
Hilarity ensued.
Fixes 20 piglit cases.
|
|
|
|
Previously _LinkedShaders was a compact array of the linked shaders
for each shader stage. Now it is arranged such that each slot,
indexed by the MESA_SHADER_* defines, refers to a specific shader
stage. As a result, some slots will be NULL. This makes things a
little more complex in the linker, but it simplifies things in other
places.
As a side effect _NumLinkedShaders is removed.
NOTE: This may be a candidate for the 7.9 branch. If there are other
patches that get backported to 7.9 that use _LinkedShader, this patch
should be cherry picked also.
|
|
I broke it in 06fd639c519214b6ebcbf29127b6d9ed429f8641 by only testing
2 generations of hardware :(
|
|
|
|
It is now to the point where we have no regressing piglit tests. It
also fixes Yo Frankie! and Humus DynamicBranching, probably due to the
piglit bias tests that work that didn't on the Mesa IR backend.
As a downside, performance takes about a 5-10% performance hit at the
moment (e.g. nexuiz 19.8fps -> 18.8fps), which I plan to resolve by
reintroducing 16-wide fragment shaders where possible. It is a win,
though, for fragment shaders using flow control.
|
|
The existing code used RNDD, which rounds down, rather than toward zero.
|
|
Fixes piglit test glsl-fs-ceil.
|
|
This cuts usually 2 out of 3 instructions for flag reg generation (if
statements, conditional assignment) by producing the conditional mod
in the expression representing the boolean value.
Fixes glsl-fs-vec4-indexing-temp-dst-in-nested-loop-combined (register
allocation no longer fails for the conditional generation
proliferation)
|
|
This will be a place to peephole comparisions directly to the flag
regs, and for now avoids using MOV with conditional mod on gen6, which
is now illegal.
|
|
Improves nexuiz performance 0.91% (+/- 0.54%, n=8)
|
|
|
|
So far, I've only seen this be a valgrind warning and not a real failure.
|
|
|
|
|
|
Fixes glsl-fs-i2b.
|
|
It's now much more correct for gen6 than the old backend, with just 2
regressions I've found (one of which is common with pre-gen6 and will
be fixed by an array splitting IR pass).
This does leave the old Mesa IR backend getting used still when we
don't have GLSL IR, but the plan is to get GLSL IR input to the driver
for the ARB programs and fixed function by the next release.
|
|
Pre-gen6, you could mix int and float just fine. Now, you get goofy
results.
Fixes:
glsl-arb-fragment-coord-conventions
glsl-fs-fragcoord
glsl-fs-if-greater
glsl-fs-if-greater-equal
glsl-fs-if-less
glsl-fs-if-less-equal
|
|
This is a hw requirement in math args. This also is inefficient, as
we're calculating the same result 8 times, but then we've been doing
that on pre-gen6 as well. If we're doing math on uniforms, though,
we'd probably be better served by having some sort of mechanism for
precalculating those results into another uniform value to use.
Fixes 7 piglit math tests.
|
|
|
|
This was leftover from the pre-gen6 cleanups. One tests regresses
where compute-to-MRF now occurs.
|
|
This didn't produce a statistically significant performance difference
in my demo (n=4) or nexuiz (n=3), but it still seems like a good idea
and is recommended by the HW team.
|
|
|
|
This is progress towards enabling a compute-to-MRF pass.
|
|
It's time to start splitting some of this up.
|
|
While I don't know of any performance changes from this (once extra
reg available out of 128), it makes the generated asm a lot cleaner
looking.
|
|
Having the single opcode write then read the reg meant that single
instruction opcodes had to consider their source regs to interfere
with their dest regs.
|
|
Improves performance of my GLSL demo 14.3% (+/- 4%, n=4) by
eliminating the moves used in ir_assignment and ir_swizzle handling.
Still 16.5% to go to catch up to the Mesa IR backend, presumably
because instructions are almost perfectly mis-scheduled now.
|
|
We'd overwrite the same element twice.
|
|
Fixes glsl-fs-texturecube-2-*
|
|
|
|
We sensibly only provide it if the FS asks for it. We could actually
skip WPOS unless the FS needed WPOS.zw, but that's something for
later.
Fixes: glsl-texture2d and probably many others.
|
|
Fixes glsl1-gl_FrontFacing var (2) with new FS.
|
|
|
|
This should fix texturing in the new FS backend.
|
|
Not needed now that we're doing barycentric.
|
|
I only set it on the color_regions == 0 case, missing the important
case, causing GPU hangs on pre-gen6.
|
|
It's not that hard to detect when we need the header.
|
|
We could do the first operand as well by flipping the comparison, but
this covered several CMPs in code I was looking at.
|
|
A debug disable had slipped in.
|
|
This uses message headers for now, since we'll need it for MRT. We
can cut out the header later.
|
|
We could try to detect this in expression handling and do it
proactively there, but it seems like less logic to do it in one
optional pass at the end.
|
|
The glsl core should be handling most dead code issues for us, but we
generate some things in codegen that may not get used, like the 1/w
value or pixel deltas. It seems a lot easier this way than trying to
work out up front whether we're going to use those values or not.
|
|
This also means that our intervals now highlight dead code.
|
|
|
|
|