Age | Commit message (Collapse) | Author |
|
Remove incorrect headless comment for gen6 fb write message.
Note current SIMD16 mode has already done right for control message.
|
|
Determine header present for fb write by msg length is not right
for SIMD16 dispatch, and if there're more output attributes, header
present is not easy to tell from msg length. This explicitly adds
new param for fb write to say header present or not.
Fixes many cases' hang and failure in GL conformance test.
|
|
The SNB alt-mode math does the denorm and inf reduction even for a
"raw MOV" like we do for g0 message header setup, where we are moving
values that aren't actually floats. Just use UD type, where raw MOVs
really are raw MOVs.
Fixes glxgears since c52adfc2e1d130effea940e75690897eb5d3ceaa, but no
piglit tests had regressed(!)
|
|
This is still awful, but my ability to care about reworking the old
backend so we can just get a temporary value into a POW is awfully low
since the new backend does this all sensibly.
Fixes:
fp1-LIT test 1
fp1-LIT test 3 (case x < 0)
fp1-POW test (exponentiation)
fp-lit-mask
|
|
Fixes fp-arb-fragment-coord-conventions-none
|
|
|
|
Fixes:
fp-kil
fp-generic/kil-swizzle.
|
|
Fixes:
fbo-drawbuffers2-blend
fbo-drawbuffers2-colormask
|
|
With the change of extended math from having the arguments moved into
mrfs and handed off through message passing to being directly hooked
up to the EU, it looks like the piece for doing source modifiers
(negate and abs) was left out.
Fixes:
fog-modes
glean/fp1-ARB_fog_exp test
glean/fp1-ARB_fog_exp2 test
glean/fp1-Computed fog exp test
glean/fp1-Computed fog exp2 test
ext_fog_coord-modes
|
|
|
|
Payload reg setup on gen6 depends more on the dispatch width as well
as the uses_depth, computes_depth, and other flags. That's something
we want to decide at compile time, not at cache lookup. As a bonus,
the fragment shader program cache lookup should be cheaper now that
there's less to compute for the hash key.
|
|
Fixes 10 piglit cases that were assertion failing.
|
|
Fixes assertion failure with texture swizzling in the GLSL path when
it's triggered (such as gen6 FF or ARB_fp shadow comparisons).
Fixes:
texdepth
texSwizzle
fp1-DST test
fp1-LIT test 3
|
|
This provides the optimizer with hints about code hotness, which we're
quite certain about for debug printouts (or, rather, while we
developers often hit the checks for debug printouts, we don't care
about performance while doing so).
|
|
Fixes glsl-fs-uniform-array-5, but not 6 which fails in ir_to_mesa.
|
|
It can be tested with if (0) replaced with if (1) to force spilling for all
virtual GRFs. Some simple tests work, but large texturing tests fail.
|
|
Hopefully this code can just go away soon.
|
|
|
|
Sandybridge doesn't have Xstart/Ystart in payload header.
|
|
Sandybridge has not much change on texture sampler with Ironlake.
|
|
Accumulator update flag must be set for implicit update on sandybridge.
|
|
Fixes glsl-algebraic-pow-2 in brw_wm_glsl.c mode.
|
|
Things are simpler these days thanks to barycentric interpolation
parameters being handed in in the payload.
|
|
|
|
My merge of Zhenyu's patch on top of my previous patches broke it by
my code expecting simd16 single write and Zhenyu's simd8 path being
disabled by mine. Merge the two for success.
|
|
The docs claim two conflicting things: One, that a scalar source is
supported. Two, source hstride must be 1 and width must be exec size.
So splat a constant argument out into a full reg to operate on, since
violating the second set of constraints is clearly failing.
The alternative here might be to do a 1-wide exec on a constant
argument for math1. It would probably save cycles too. But I'll
leave that for the glsl2-965 branch.
Fixes glsl-algebraic-div-one-2.shader_test.
|
|
|
|
|
|
The SIMD16 message no longer has the goofy interleaved format that
made Compr4 compression necessary before.
|
|
This saves an extra message reg move in the program, though I'm not
clear on whether it will have any performance impact other than cache
footprint. It will also fix those math calls on Sandybridge, where
the brw_eu_emit.c brw_math() support relies on the implied move being
used.
|
|
|
|
This pulls in multiple i965 driver fixes which will help ensure better
testing coverage during development, and also gets past the conflicts
of the src/mesa/shader -> src/mesa/program move.
Conflicts:
src/mesa/Makefile
src/mesa/main/shaderapi.c
src/mesa/main/shaderobj.h
|
|
Also fix up comments, so that the difference between the two passes is
clarified.
|
|
|
|
|
|
Several routines directly analyze the grf-to-mrf moves from the Gen
binary code. When it is possible, the mov is removed and the message
register is directly written in the arithmetic instruction
Also redundant mrf-to-grf moves are removed (frequently for example,
when sampling many textures with the same uv)
Code was tested with piglit, warsow and nexuiz on an Ironlake
machine. No regression was found there
Note that the optimizations are *deactivated* on Gen4 and Gen6 since I
did test them properly yet. No reason there are bugs but who knows
The optimizations are currently done in branch free programs *only*.
Considering branches is more complicated and there are actually two
paths: one for branch free programs and one for programs with branches
Also some other optimizations should be done during the emission
itself but considering that some code is shader between vertex shaders
(AOS) and pixel shaders (SOA) and that we may have branches or not, it
is pretty hard to both factorize the code and have one good set of
strategies
|
|
The original glsl compiler would generate a.x * b.x + a.y * b.y, which
we would do mul+mul+add for instead of this mul+mac.
Fixes glsl-fs-dot-vec2.
|
|
The old compiler didn't use SSG, and instead emitted SGT/SGT/SUB. We
can do a little better for SSG than we do for the SGT series.
|
|
|
|
Rename old IGDNG to Ironlake, and set 'gen' number for
Ironlake as 5, so tracking the features with generation num
instead of special is_ironlake flag.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
The hope is to later take advantage of the reduced constant usage to
free up regs. This only covers the GLSL path at the moment, because
the brw_wm_emit path doesn't get the information as to whether a float
value is a constant or a uniform.
|
|
Tested with piglit glsl-fs-sqrt-branch, fp-cmp.vpfp.
|
|
MOV, MOV."
This reverts commit 46450c1f3f93bf4dc96696fc7e0f0eb808d9c08a. I was
wrong about null reg behavior -- it reads undefined, not 0. And
they're not kidding.
|
|
|
|
|
|
|
|
This was obvious when looking at the compiled output of ETQW's
shaders.
|
|
Saves an instruction over doing conditional moves.
|
|
Saves an instruction in PINTERP, LINTERP, and PIXEL_W from
brw_wm_glsl.c For non-GLSL it isn't used yet because the deltas have
to be laid out differently.
|
|
This would be triggered by use of sqrt() along with control flow.
Fixes piglit-fs-sqrt-branch and a bug in Yo Frankie!.
|