Age | Commit message (Collapse) | Author |
|
This can significantly ease thinking about the asm.
|
|
Previously the code only handled scalars and vectors. This new code
is modeled somewhat after similar code in ir_to_mesa.
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
gen6 builtin RSQ apparently clamps negative values to 0 instead of
returning the RSQ of the absolute value like ARB_fragment_program
desires and pre-gen6 apparently does.
Fixes:
glean/fp1-RSQ test 2 (reciprocal square root of negative value)
glean/vp1-RSQ test 2 (reciprocal square root of negative value)
|
|
We get saturate as an argument to brw_math() instead of as compile
state, since that's how the pre-gen6 send instructions work. Fixes
fp-ex2-sat.
|
|
This is the push constant buffer, not the pull constants.
|
|
Be sure polygon stipple mode is updated. This fixes 'gamma' demo.
|
|
It is used for point width on vertex. This fixes mesa demo spriteblast and pointblast.
|
|
It was only used for gen6 fragment programs (not GLSL shaders) at this
point, and it was clearly unsuited to the task -- missing opcodes,
corrupted texturing, and assertion failures hit various applications
of all sorts. It was easier to patch up the non-glsl for remaining
gen6 changes than to make brw_wm_glsl.c complete.
Bug #30530
|
|
Since the 8-wide first-quarter and 16-wide first-half have the same
bit encoding, we now need to track "do you want instruction
compression" in the compile state.
|
|
The FS backend is fine with register level granularity. But for the
brw_wm_emit.c backend, it expects pairs of regs to be used for the
constants, because the whole world is pairs of regs. If an odd number
got used, we went looking for interpolation in the wrong place.
|
|
We were accidentally doing a float-to-uint conversion.
|
|
We were trying to do the implied move even when we'd already manually
moved the real header in place.
|
|
In the SF and brw_fs.cpp fixes to set up interpolation sanely on gen6,
the setup for 16-wide interpolation was left behind. This brings
relative sanity to that path too.
|
|
|
|
|
|
Fixes many assertion failures in that path.
|
|
Payload reg setup on gen6 depends more on the dispatch width as well
as the uses_depth, computes_depth, and other flags. That's something
we want to decide at compile time, not at cache lookup. As a bonus,
the fragment shader program cache lookup should be cheaper now that
there's less to compute for the hash key.
|
|
Need to check the required primitive type for GS on Sandybridge,
and when GS is disabled, the new state has to be issued too, instead
of only updating URB state with no GS entry, that caused hang on Sandybridge.
This fixes hang issue during conformance suite testing.
|
|
use constant interpolation instead of linear interpolation for
attributes COL0,COL1 if GL_FLAT is used. This fixes mesa demo bounce.
|
|
|
|
SF state depends on what inputs there are to the fragment program, not
just the outputs of the VS.
|
|
|
|
This fixes some mesa demos such as fslight/engine in wireframe mode.
|
|
This follows the changes done for the FS alongside the EU emit code.
|
|
While the actual IF instructions were fixed by Zhenyu, we were still
flattening them to conditional moves.
|
|
At this point, piglit tests for fragment shader loops are working.
|
|
There are now two targets: the hop-to-end-of-block target, and the
target for where to resume execution for active channels.
|
|
There's no more DO since there's no more mask stack, and WHILE has
been shuffled like IF was.
|
|
|
|
Like Eric's workaround patch of commit 490c23ee6be2e8531b5a14d42f808de83d401130.
This forces to align1 mode for math2 too.
|
|
Fixes glsl-fs-fragdata-1, and hopefully Eve Online where I noticed
this bug in the generated shader. Bug #31952.
|
|
https://bugs.freedesktop.org/show_bug.cgi?id=31894
|
|
Cuts the extra CMP instruction that used to precede SEL.
|
|
|
|
|
|
|
|
|
|
This is quite common for multitexture sampling, and not only cuts down
on the second and later set of MOVs, but typically also allows
compute-to-MRF on the first set.
No statistically siginficant performance difference in nexuiz (n=3),
but it reduces instruction count in one of its shaders and seems like
a good idea.
|
|
We were skipping it if the instruction producing the value we were
going to compute-to-mrf used its result reg as a source reg. This
meant that the typical "write interpolated color to fragment color" or
"texture from interpolated texcoord" shader didn't compute-to-MRF.
Just don't check for the interference cases until after we've checked
if this is the instruction we wanted to compute-to-MRF.
Improves nexuiz high-settings performance on my laptop 0.48% +- 0.08%
(n=3).
|
|
On pre-gen6, this turns 4 instructions into 1. We could still do
better by folding the saturate into the instruction generating the
value if nobody else uses it, but that should be a separate pass.
|
|
This hits a common case with min/max operations.
|
|
|
|
This should make it a lot harder to forget to zero things.
|
|
Fixes glsl-fs-copy-propagation-texcoords-1.
|
|
This should save on the overhead of tree-walking and provide a
convenient place to add more instruction lowering in the future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
The vector operator collects 2, 3, or 4 scalar components into a
vector. Doing this has several advantages. First, it will make
ud-chain tracking for components of vectors much easier. Second, a
later optimization pass could collect scalars into vectors to allow
generation of SWZ instructions (or similar as operands to other
instructions on R200 and i915). It also enables an easy way to
generate IR for SWZ instructions in the ARB_vertex_program assembler.
|
|
This may grow in the near future.
|
|
The operate just like ir_unop_sin and ir_unop_cos except that they
expect their inputs to be limited to the range [-pi, pi]. Several
GPUs require this limited range for their sine and cosine
instructions, so having these as operations (along with a to-be-written
lowering pass) helps this architectures.
These new operations also matche the semantics of the
GL_ARB_fragment_program SCS instruction. Having these as operations
helps in generating GLSL IR directly from assembly fragment programs.
|
|
If an instruction writes reg but nothing later uses it, then we don't
need to bother doing it. Before, we were just killing code that was
never read after it was ever written.
This removes many interpolation instructions for attributes with only
a few comopnents used. Improves nexuiz high-settings performance .46%
+/- .12% (n=3) on my Ironlake.
|
|
|