path: root/src/mesa/drivers/dri/i965
Age | Commit message | Author
2010-12-07 | i965: Work around gen6 ignoring source modifiers on math instructions. | Eric Anholt
With the change of extended math from having the arguments moved into mrfs and handed off through message passing to being directly hooked up to the EU, it looks like the piece for doing source modifiers (negate and abs) was left out.
Fixes:
fog-modes
glean/fp1-ARB_fog_exp test
glean/fp1-ARB_fog_exp2 test
glean/fp1-Computed fog exp test
glean/fp1-Computed fog exp2 test
ext_fog_coord-modes
2010-12-07 | i965: Add disabled debug code for dumping out the WM constant payload. | Eric Anholt
This can significantly ease thinking about the asm.
2010-12-07 | i965: Correctly emit constants for aggregate types (array, matrix, struct) | Ian Romanick
Previously the code only handled scalars and vectors. This new code is modeled somewhat after similar code in ir_to_mesa.
Reviewed-by: Eric Anholt <eric@anholt.net>
2010-12-07 | i965: Always hand the absolute value to RSQ. | Eric Anholt
gen6 builtin RSQ apparently clamps negative values to 0 instead of returning the RSQ of the absolute value like ARB_fragment_program desires and pre-gen6 apparently does.
Fixes:
glean/fp1-RSQ test 2 (reciprocal square root of negative value)
glean/vp1-RSQ test 2 (reciprocal square root of negative value)
2010-12-07 | i965: Handle saturates on gen6 math instructions. | Eric Anholt
We get saturate as an argument to brw_math() instead of as compile state, since that's how the pre-gen6 send instructions work. Fixes fp-ex2-sat.
2010-12-07 | i965: Fix comment about gen6_wm_constants. | Eric Anholt
This is the push constant buffer, not the pull constants.
2010-12-07 | i965: upload WM state for _NEW_POLYGON on sandybridge | Zhenyu Wang
Be sure polygon stipple mode is updated. This fixes the 'gamma' demo.
2010-12-07 | i965: set minimum/maximum Point Width on Sandybridge | Xiang, Haihao
These limits are used when the point width comes from the vertex. This fixes the Mesa demos spriteblast and pointblast.
2010-12-06 | i965: Nuke brw_wm_glsl.c. | Eric Anholt
It was only used for gen6 fragment programs (not GLSL shaders) at this point, and it was clearly unsuited to the task: missing opcodes, corrupted texturing, and assertion failures hit various applications of all sorts. It was easier to patch up the non-glsl for remaining gen6 changes than to make brw_wm_glsl.c complete.
Bug #30530
2010-12-06 | i965: Add support for the instruction compression bits on gen6. | Eric Anholt
Since the 8-wide first-quarter and 16-wide first-half have the same bit encoding, we now need to track "do you want instruction compression" in the compile state.
2010-12-06 | i965: Align gen6 push constant size to dispatch width. | Eric Anholt
The FS backend is fine with register-level granularity. The brw_wm_emit.c backend, however, expects pairs of regs to be used for the constants, because the whole world is pairs of regs. If an odd number got used, we went looking for interpolation in the wrong place.
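The rounding this commit describes can be sketched as follows. The helper names are hypothetical, and the assumption that a dispatch width of 16 means constants occupy groups of 16 / 8 = 2 registers is inferred from the "pairs of regs" wording above rather than taken from the driver source.

```c
#include <assert.h>

/* Round value up to the next multiple of alignment (a power of two). */
static unsigned align_up(unsigned value, unsigned alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: number of push constant registers to reserve so that
 * whatever follows the constants starts on a boundary the backend expects.
 * dispatch_width is assumed to be 8 or 16. */
static unsigned push_constant_regs(unsigned nr_regs, unsigned dispatch_width)
{
    return align_up(nr_regs, dispatch_width / 8);
}
```

With this sketch, three registers of constants in 16-wide dispatch are padded to four, while in 8-wide dispatch the count is left unchanged.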
2010-12-06 | i965: Make the sampler's implied move on gen6 be a raw move. | Eric Anholt
We were accidentally doing a float-to-uint conversion.
2010-12-06 | i965: Fix up gen6 samplers for their usage by brw_wm_emit.c | Eric Anholt
We were trying to do the implied move even when we'd already manually moved the real header in place.
2010-12-06 | i965: Fix gen6 interpolation setup for 16-wide. | Eric Anholt
In the SF and brw_fs.cpp fixes to set up interpolation sanely on gen6, the setup for 16-wide interpolation was left behind. This brings relative sanity to that path too.
2010-12-06 | i965: Don't smash a group of coordinates doing gen6 16-wide sampler headers. | Eric Anholt
2010-12-06 | i965: Fix up 16-wide gen6 FB writes after various refactoring. | Eric Anholt
2010-12-06 | i965: Provide delta_xy reg to gen6 non-GLSL path PINTERP. | Eric Anholt
Fixes many assertion failures in that path.
2010-12-06 | i965: Move payload reg setup to compile, not lookup time. | Eric Anholt
Payload reg setup on gen6 depends more on the dispatch width as well as the uses_depth, computes_depth, and other flags. That's something we want to decide at compile time, not at cache lookup. As a bonus, the fragment shader program cache lookup should be cheaper now that there's less to compute for the hash key.
2010-12-06 | i965: Fix GS state uploading on Sandybridge | Zhenyu Wang
The required primitive type for GS needs to be checked on Sandybridge. When GS is disabled, the new state also has to be issued; only updating the URB state with no GS entry caused a hang on Sandybridge. This fixes the hang seen during conformance suite testing.
2010-12-06 | i965: fix for flat shading on Sandybridge | Xiang, Haihao
Use constant interpolation instead of linear interpolation for the attributes COL0 and COL1 if GL_FLAT is used. This fixes the Mesa demo bounce.
2010-12-04 | i965: Fix compile warning about missing opcodes. | Eric Anholt
2010-12-04 | i965: Update gen6 SF state on fragment program change too. | Eric Anholt
SF state depends on what inputs there are to the fragment program, not just the outputs of the VS.
2010-12-04 | i965: Update gen6 WM state on compiled program change, not just FP change. | Eric Anholt
2010-12-02 | i965: add support for polygon mode on Sandybridge. | Xiang, Haihao
This fixes some Mesa demos such as fslight/engine in wireframe mode.
2010-12-01 | i965: Add support for loops in the VS. | Eric Anholt
This follows the changes done for the FS alongside the EU emit code.
2010-12-01 | i965: Enable IF statements in the VS. | Eric Anholt
While the actual IF instructions were fixed by Zhenyu, we were still flattening them to conditional moves.
2010-12-01 | i965: Add support for gen6 CONTINUE instruction emit. | Eric Anholt
At this point, piglit tests for fragment shader loops are working.
2010-12-01 | i965: Add support for gen6 BREAK ISA emit. | Eric Anholt
There are now two targets: the hop-to-end-of-block target, and the target for where to resume execution for active channels.
2010-12-01 | i965: Add support for gen6 DO/WHILE ISA emit. | Eric Anholt
There's no more DO since there's no more mask stack, and WHILE has been shuffled like IF was.
2010-12-01 | i965: Dump the WHILE jump distance on gen6. | Eric Anholt
2010-12-01 | i965: also using align1 mode for math2 on sandybridge | Zhenyu Wang
Like Eric's workaround patch in commit 490c23ee6be2e8531b5a14d42f808de83d401130, this forces align1 mode for math2 too.
2010-11-29 | i965: Fix type of gl_FragData[] dereference for FB write. | Eric Anholt
Fixes glsl-fs-fragdata-1, and hopefully Eve Online where I noticed this bug in the generated shader.
Bug #31952.
2010-11-24 | i965: Don't write mrf assignment for pointsize output | Kristian Høgsberg
https://bugs.freedesktop.org/show_bug.cgi?id=31894
2010-11-23 | i965: Use the new embedded compare in SEL on gen6 for VS MIN and MAX opcodes. | Eric Anholt
Cuts the extra CMP instruction that used to precede SEL.
2010-11-23 | i965: Don't upload line smooth params unless we're line smoothing. | Eric Anholt
2010-11-23 | i965: Don't upload line stipple pattern unless we're stippling. | Eric Anholt
2010-11-23 | i965: Don't upload polygon stipple unless required. | Eric Anholt
2010-11-23 | i965: Move gen4 blend constant color to the gen4 blending file. | Eric Anholt
2010-11-19 | i965: Remove duplicate MRF writes in the FS backend. | Eric Anholt
This is quite common for multitexture sampling, and not only cuts down on the second and later set of MOVs, but typically also allows compute-to-MRF on the first set. No statistically significant performance difference in nexuiz (n=3), but it reduces instruction count in one of its shaders and seems like a good idea.
2010-11-19 | i965: Improve compute-to-mrf. | Eric Anholt
We were skipping it if the instruction producing the value we were going to compute-to-mrf used its result reg as a source reg. This meant that the typical "write interpolated color to fragment color" or "texture from interpolated texcoord" shader didn't compute-to-MRF. Just don't check for the interference cases until after we've checked if this is the instruction we wanted to compute-to-MRF. Improves nexuiz high-settings performance on my laptop 0.48% +- 0.08% (n=3).
2010-11-19 | i965: Recognize saturates and turn them into a saturated mov. | Eric Anholt
On pre-gen6, this turns 4 instructions into 1. We could still do better by folding the saturate into the instruction generating the value if nobody else uses it, but that should be a separate pass.
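As a rough illustration of what "recognizing a saturate" can mean, the C sketch below matches clamp-to-[0, 1] patterns in a tiny expression tree. The `expr` type, the opcode names, and the exact set of patterns matched are all assumptions made for this example; the commit text does not say which forms the real pass recognizes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature IR, not the driver's data structures. */
enum op { OP_VAR, OP_CONST, OP_MIN, OP_MAX };

struct expr {
    enum op op;
    float value;          /* only meaningful for OP_CONST */
    struct expr *a, *b;   /* operands of OP_MIN / OP_MAX */
};

static bool is_const(const struct expr *e, float v)
{
    return e != NULL && e->op == OP_CONST && e->value == v;
}

/* If e is "op(x, c)" or "op(c, x)", return x; otherwise return NULL. */
static struct expr *other_if_const(struct expr *e, enum op op, float c)
{
    if (e == NULL || e->op != op)
        return NULL;
    if (is_const(e->b, c))
        return e->a;
    if (is_const(e->a, c))
        return e->b;
    return NULL;
}

/* Return x when e is min(max(x, 0), 1) or max(min(x, 1), 0), else NULL.
 * A caller could then emit a single saturated move of x. */
static struct expr *match_saturate(struct expr *e)
{
    struct expr *inner = other_if_const(e, OP_MIN, 1.0f);
    if (inner != NULL) {
        struct expr *x = other_if_const(inner, OP_MAX, 0.0f);
        if (x != NULL)
            return x;
    }
    inner = other_if_const(e, OP_MAX, 0.0f);
    if (inner != NULL)
        return other_if_const(inner, OP_MIN, 1.0f);
    return NULL;
}
```

Matching on the tree shape, rather than on the emitted instructions, is one plausible way to collapse the min/max sequence into one instruction; folding the saturate into the producing instruction, as the commit notes, would be a separate step.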
2010-11-19 | i965: Fold constants into the second arg of BRW_SEL as well. | Eric Anholt
This hits a common case with min/max operations.
2010-11-19 | i965: Remove extra \n at the end of every instruction in INTEL_DEBUG=wm. | Eric Anholt
2010-11-19 | i965: Just use memset() to clear most members in FS constructors. | Eric Anholt
This should make it a lot harder to forget to zero things.
2010-11-19 | i965: Fix compute_to_mrf to not move a MRF write up into another live range. | Eric Anholt
Fixes glsl-fs-copy-propagation-texcoords-1.
2010-11-19 | glsl: Combine many instruction lowering passes into one. | Kenneth Graunke
This should save on the overhead of tree-walking and provide a convenient place to add more instruction lowering in the future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2010-11-19 | glsl: Add ir_quadop_vector expression | Ian Romanick
The vector operator collects 2, 3, or 4 scalar components into a vector. Doing this has several advantages. First, it will make ud-chain tracking for components of vectors much easier. Second, a later optimization pass could collect scalars into vectors to allow generation of SWZ instructions (or similar as operands to other instructions on R200 and i915). It also enables an easy way to generate IR for SWZ instructions in the ARB_vertex_program assembler.
2010-11-19 | glsl: Eliminate assumptions about size of ir_expression::operands | Ian Romanick
This may grow in the near future.
2010-11-19 | glsl: Add ir_unop_sin_reduced and ir_unop_cos_reduced | Ian Romanick
They operate just like ir_unop_sin and ir_unop_cos except that they expect their inputs to be limited to the range [-pi, pi]. Several GPUs require this limited range for their sine and cosine instructions, so having these as operations (along with a to-be-written lowering pass) helps these architectures. These new operations also match the semantics of the GL_ARB_fragment_program SCS instruction. Having these as operations helps in generating GLSL IR directly from assembly fragment programs.
2010-11-18 | i965: Eliminate dead code more aggressively. | Eric Anholt
If an instruction writes reg but nothing later uses it, then we don't need to bother doing it. Before, we were just killing code that was never read after it was ever written. This removes many interpolation instructions for attributes with only a few components used. Improves nexuiz high-settings performance .46% +/- .12% (n=3) on my Ironlake.
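The idea of removing a write that nothing later reads can be sketched as a backward walk over a straight-line instruction list. Everything here is hypothetical: the `inst` layout, the register limit, and the restriction to code without control flow are simplifications for the example, and the actual backend pass may be structured quite differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGS 64
#define NO_REG   (-1)

/* Hypothetical instruction record, not the driver's fs_inst. */
struct inst {
    int dst;          /* register written, or NO_REG */
    int src[2];       /* registers read, NO_REG if unused */
    bool side_effect; /* e.g. a framebuffer write; never removed */
    bool dead;        /* set by the pass */
};

/* Walk backwards tracking which registers are still needed. A write to a
 * register that is not needed at that point is marked dead, even if the
 * same register is written and read again elsewhere in the program. */
static size_t eliminate_dead_writes(struct inst *insts, size_t count)
{
    bool live[MAX_REGS] = { false };
    size_t removed = 0;

    for (size_t i = count; i-- > 0; ) {
        struct inst *inst = &insts[i];
        assert(inst->dst < MAX_REGS);

        if (!inst->side_effect && inst->dst != NO_REG && !live[inst->dst]) {
            inst->dead = true;   /* its sources are not made live */
            removed++;
            continue;
        }

        inst->dead = false;
        if (inst->dst != NO_REG)
            live[inst->dst] = false;   /* earlier values are overwritten */
        for (int s = 0; s < 2; s++) {
            if (inst->src[s] != NO_REG) {
                assert(inst->src[s] < MAX_REGS);
                live[inst->src[s]] = true;
            }
        }
    }
    return removed;
}
```

In this sketch a first write to a register that is overwritten before being read is removed, which a check of the form "is this register ever read after any write" would miss.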