path: root/src/mesa/drivers/dri/i965/brw_fs.cpp
Age | Commit message | Author
2010-11-19 | i965: Fix compute_to_mrf to not move a MRF write up into another live range. | Eric Anholt
Fixes glsl-fs-copy-propagation-texcoords-1.
2010-11-19 | glsl: Combine many instruction lowering passes into one. | Kenneth Graunke
This should save on the overhead of tree-walking and provide a convenient place to add more instruction lowering in the future. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2010-11-19 | glsl: Add ir_quadop_vector expression | Ian Romanick
The vector operator collects 2, 3, or 4 scalar components into a vector. Doing this has several advantages. First, it will make ud-chain tracking for components of vectors much easier. Second, a later optimization pass could collect scalars into vectors to allow generation of SWZ instructions (or similar as operands to other instructions on R200 and i915). It also enables an easy way to generate IR for SWZ instructions in the ARB_vertex_program assembler.
2010-11-19 | glsl: Eliminate assumptions about size of ir_expression::operands | Ian Romanick
This may grow in the near future.
2010-11-19 | glsl: Add ir_unop_sin_reduced and ir_unop_cos_reduced | Ian Romanick
They operate just like ir_unop_sin and ir_unop_cos except that they expect their inputs to be limited to the range [-pi, pi]. Several GPUs require this limited range for their sine and cosine instructions, so having these as operations (along with a to-be-written lowering pass) helps these architectures. These new operations also match the semantics of the GL_ARB_fragment_program SCS instruction. Having these as operations helps in generating GLSL IR directly from assembly fragment programs.
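The lowering pass mentioned above is described as not yet written, so the following is only a sketch of the kind of range reduction it would need, not the Mesa implementation. The names `reduce_angle` and `sin_via_reduced` are hypothetical, and `std::sin` stands in for a hardware sine that only accepts inputs in [-pi, pi].

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: wrap any finite angle into [-pi, pi) so that a
// range-limited sine or cosine instruction could consume it.
float reduce_angle(float x)
{
    const float pi = 3.14159265358979323846f;
    const float two_pi = 2.0f * pi;
    assert(std::isfinite(x));
    // Shift by pi, keep only the fractional number of full turns,
    // then shift back so the result is centred on zero.
    float turns = (x + pi) / two_pi;
    return (turns - std::floor(turns)) * two_pi - pi;
}

// Stand-in for "reduce, then call the range-limited operation".
float sin_via_reduced(float x)
{
    return std::sin(reduce_angle(x));
}
```

Because sine and cosine are periodic in 2*pi, the reduced angle yields the same result as the original one, up to floating-point error that grows with the magnitude of the input.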
2010-11-18 | i965: Eliminate dead code more aggressively. | Eric Anholt
If an instruction writes a reg but nothing later uses it, then we don't need to bother doing it. Before, we were just killing code that wrote a reg that was never read at any point after it was first written. This removes many interpolation instructions for attributes with only a few components used. Improves nexuiz high-settings performance .46% +/- .12% (n=3) on my Ironlake.
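The idea can be illustrated on a straight-line instruction list with a backward walk that tracks which registers are still needed. This is a simplified sketch under stated assumptions, not the driver's pass: the `Inst` struct and `eliminate_dead_writes` are hypothetical, registers are plain integers, `dst == -1` means "no destination", and control flow is ignored entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, minimal instruction record for illustration only.
struct Inst {
    int dst;                 // register written, or -1 for none
    std::vector<int> srcs;   // registers read
    bool has_side_effect;    // e.g. a framebuffer write; never removable
};

// Walk backwards, keeping an instruction only if it has a side effect
// or its destination is read by a later kept instruction.
std::vector<Inst> eliminate_dead_writes(const std::vector<Inst>& insts)
{
    std::set<int> live;      // registers read later and not yet rewritten
    std::vector<Inst> kept;
    for (auto it = insts.rbegin(); it != insts.rend(); ++it) {
        bool needed = it->has_side_effect || live.count(it->dst) > 0;
        if (!needed)
            continue;        // the written value is never used: drop it
        live.erase(it->dst); // this write satisfies the later reads
        for (int s : it->srcs)
            live.insert(s);
        kept.push_back(*it);
    }
    std::reverse(kept.begin(), kept.end());
    return kept;
}
```

Note that a write which is overwritten before any read is dropped even though the register itself is read later, which is the more aggressive behaviour the message describes.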
2010-11-18 | i965: Fail on loops on gen6 for now until we write the EU emit code for it. | Eric Anholt
2010-11-18 | i965: Shut up spurious gcc warning about GLSL_TYPE enums. | Eric Anholt
2010-11-17 | glsl: Remove the ir_binop_cross opcode. | Kenneth Graunke
2010-11-14 | i965: Fix gl_FragCoord inversion when drawing to an FBO. | Eric Anholt
This showed up as cairo-gl gradients being inverted on everyone but Intel, where I'd apparently tweaked the transformation to work around the bug. Fixes piglit fbo-fragcoord.
2010-11-13 | i965: Silence uninitialized variable warning. | Vinson Lee
Silences this GCC warning. brw_fs.cpp: In member function 'void fs_visitor::split_virtual_grfs()': brw_fs.cpp:2516: warning: unused variable 'reg'
2010-11-10 | i965: re-enable gen6 IF statements in the fragment shader. | Eric Anholt
IF statements were getting flattened while they were broken. With Zhenyu's last fix for ENDIF's type, everything appears to have lined up to actually work.
This regresses two tests:
    glsl1-! (not) operator (1, fail)
    glsl1-! (not) operator (1, pass)
but fixes tests that couldn't work before because the IFs couldn't be flattened:
    glsl-fs-discard-01
    occlusion-query-discard
(and, naturally, this should be a performance improvement for apps that actually use IF statements to avoid executing a bunch of code).
2010-11-03 | intel: Annotate debug printout checks with unlikely(). | Eric Anholt
This provides the optimizer with hints about code hotness, which we're quite certain about for debug printouts (or, rather, while we developers often hit the checks for debug printouts, we don't care about performance while doing so).
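A branch hint of this kind is commonly built on GCC's `__builtin_expect`. The sketch below shows the pattern under assumed names: `SKETCH_UNLIKELY`, `debug_flags`, `DEBUG_FLAG_WM` and `emit_instruction` are hypothetical and are not the driver's actual macro or variables.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical macro: tell the compiler the condition is expected to be
// false, so the debug path is laid out away from the hot path.
#if defined(__GNUC__)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_UNLIKELY(x) (!!(x))
#endif

static unsigned debug_flags = 0;               // hypothetical debug mask
static const unsigned DEBUG_FLAG_WM = 1u << 0; // hypothetical flag bit

int emit_instruction(int opcode)
{
    assert(opcode >= 0);
    // The printout is rarely enabled, so the check is marked unlikely.
    if (SKETCH_UNLIKELY(debug_flags & DEBUG_FLAG_WM))
        std::printf("emit opcode %d\n", opcode);
    return opcode;
}
```

The hint changes only code layout and branch prediction defaults; the condition still evaluates to the same truth value either way.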
2010-10-27 | i965: Add bit operation support to the fragment shader backend. | Kenneth Graunke
2010-10-27 | i965: Make FS uniforms be the actual type of the uniform at upload time. | Eric Anholt
This fixes some insanity that would otherwise be required for GLSL 1.30 bit ops or gen6 integer uniform operations in general, at the cost of upload-time pain. Given that we only have that pain because Mesa's mangling our integer uniforms to be floats, this is something that should be fixed outside of the shader codegen.
2010-10-27 | Track separate programs for each stage | Ian Romanick
The assumption is that all stages are the same program or that varyings are passed between stages using built-in varyings.
2010-10-26 | i965: Add support for discard instructions on gen6. | Eric Anholt
It's a little more painful than before because we don't have the handy mask register any more, and have to make do with cooking up a value out of the flag register.
2010-10-26 | i965: Clear some undefined fields of g0 when using them for gen6 FB writes. | Eric Anholt
This doesn't appear to help any testcases I'm looking at, but it looks like it's required.
2010-10-22 | i965: Add support for pull constants to the new FS backend. | Eric Anholt
Fixes glsl-fs-uniform-array-5, but not 6 which fails in ir_to_mesa.
2010-10-22 | i965: Move the FS disasm/annotation printout to codegen time. | Eric Anholt
This makes it a lot easier to track down where we failed when some code emit triggers an assert. Plus, less memory allocation for codegen.
2010-10-21 | i965: Be more aggressive in tracking live/dead intervals within loops. | Eric Anholt
Fixes glsl-fs-convolution-2, which was blowing up due to the array access insanity getting at the uniform values within the loop. Each temporary was considered live across the whole loop.
2010-10-21 | i965: Correct scratch space allocation. | Eric Anholt
One, it was allocating increments of 1kb, but per thread scratch space is a power of two. Two, the new FS wasn't getting total_scratch set at all, so everyone thought they had 1kb and writes beyond 1kb would go stomping on a neighbor thread. With this plus the previous register spilling for the new FS, glsl-fs-convolution-1 passes.
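The first point, sizing per-thread scratch as a power of two rather than in 1 KB increments, can be sketched as follows. The function name `scratch_size_per_thread` is hypothetical, and the 1 KB minimum is an assumption taken from the message's wording, not a value checked against the hardware documentation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round the bytes a thread needs up to the next
// power of two, starting from an assumed 1 KB minimum.
uint32_t scratch_size_per_thread(uint32_t bytes_needed)
{
    assert(bytes_needed <= (1u << 31)); // keeps the doubling from overflowing
    uint32_t size = 1024;
    while (size < bytes_needed)
        size *= 2;
    return size;
}
```

For example, a shader needing 3000 bytes would get 4096 bytes here, where rounding to 1 KB increments would have produced 3072, which is not a power of two.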
2010-10-21 | i965: Add support for register spilling. | Eric Anholt
It can be tested with if (0) replaced with if (1) to force spilling for all virtual GRFs. Some simple tests work, but large texturing tests fail.
2010-10-21 | i965: Fix gl_FrontFacing emit on pre-gen6. | Eric Anholt
It's amazing this code worked. Basically, we would get lucky in register allocation: the tests using frontfacing would happen to allocate both the gl_FrontFacing storage and the (different) register written by the instructions generating gl_FrontFacing to the same hardware register. Noticed during register spilling debug, when suddenly they didn't get allocated the same storage.
2010-10-21 | i965: Split register allocation out of the ever-growing brw_fs.cpp. | Eric Anholt
2010-10-19 | i965: Use the new style of IF statement with embedded comparison on gen6. | Eric Anholt
"Everyone else" does it this way, so follow suit. It's fewer instructions, anyway.
2010-10-18 | i965: Remove unused variable. | Kenneth Graunke
2010-10-18 | i965: Fix a weirdness in NOT handling. | Eric Anholt
XOR makes much more sense. Note that the previous code would have failed for not(not(x)), but that gets optimized out.
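Why XOR is a natural fit for NOT can be shown in a few lines, assuming booleans are stored as the integers 0 and 1. That representation, and the name `bool_not`, are assumptions for illustration; the message does not say what the previous code did.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: logical NOT of a 0/1 boolean via XOR with 1.
// XOR flips only the low bit, so applying it twice returns the input.
uint32_t bool_not(uint32_t b)
{
    assert(b == 0 || b == 1);
    return b ^ 1u;
}
```

Because the operation is its own inverse, a nested `not(not(x))` evaluates correctly even when it is not optimized away.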
2010-10-18 | i965: Disable the debug printf I added for FS disasm. | Eric Anholt
2010-10-18 | i965: Add missing "break" statement. | Kenneth Graunke
Otherwise, it would try to handle arrays as structures, use uninitialized memory, and crash.
2010-10-15 | i965: Set the type of the null register to fix gen6 FS comparisons. | Eric Anholt
We often use reg_null as the destination when setting up the flag regs. However, on gen6 there aren't general implicit conversions to destination types from src types, so the comparison to produce the flag regs would be done on the integer result interpreted as a float. Hilarity ensued. Fixes 20 piglit cases.
2010-10-15 | i965: Fix indentation after commit 3322fbaf | Ian Romanick
2010-10-14 | glsl: Slightly change the semantic of _LinkedShaders | Ian Romanick
Previously _LinkedShaders was a compact array of the linked shaders for each shader stage. Now it is arranged such that each slot, indexed by the MESA_SHADER_* defines, refers to a specific shader stage. As a result, some slots will be NULL. This makes things a little more complex in the linker, but it simplifies things in other places. As a side effect _NumLinkedShaders is removed. NOTE: This may be a candidate for the 7.9 branch. If there are other patches that get backported to 7.9 that use _LinkedShaders, this patch should be cherry picked also.
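The difference between a compact array and a stage-indexed array with empty slots can be sketched as below. All names here (`ShaderStage`, `Shader`, `Program`, `count_linked`, `shader_for`) are hypothetical stand-ins and do not mirror the actual Mesa structures.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stage enum standing in for the MESA_SHADER_* defines.
enum ShaderStage { STAGE_VERTEX, STAGE_GEOMETRY, STAGE_FRAGMENT, STAGE_COUNT };

struct Shader { const char* name; };

// One slot per stage; a null slot means no shader is linked for it.
struct Program {
    Shader* linked[STAGE_COUNT];
};

// Direct lookup by stage: no search needed, but the result may be null.
Shader* shader_for(const Program& prog, ShaderStage stage)
{
    assert(stage < STAGE_COUNT);
    return prog.linked[stage];
}

// With no separate count field, walking the array must skip null slots.
std::size_t count_linked(const Program& prog)
{
    std::size_t n = 0;
    for (int i = 0; i < STAGE_COUNT; ++i)
        if (prog.linked[i] != nullptr)
            ++n;
    return n;
}
```

The trade-off matches the message: code that iterates over all linked shaders now needs a null check, while code that wants one particular stage gets a constant-time lookup.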
2010-10-14 | i965: Fix texturing on pre-gen5. | Eric Anholt
I broke it in 06fd639c519214b6ebcbf29127b6d9ed429f8641 by only testing 2 generations of hardware :(
2010-10-14 | i965: Add support for ir_unop_round_even via the RNDE instruction. | Kenneth Graunke
2010-10-14 | i965: Enable the new FS backend on pre-gen6 as well. | Eric Anholt
It is now to the point where we have no regressing piglit tests. It also fixes Yo Frankie! and Humus DynamicBranching, probably due to the piglit bias tests that work that didn't on the Mesa IR backend. As a downside, performance takes about a 5-10% hit at the moment (e.g. nexuiz 19.8fps -> 18.8fps), which I plan to resolve by reintroducing 16-wide fragment shaders where possible. It is a win, though, for fragment shaders using flow control.
2010-10-14 | i965: Use RNDZ for ir_unop_trunc in the new FS. | Kenneth Graunke
The existing code used RNDD, which rounds down, rather than toward zero.
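The two rounding modes agree for non-negative inputs and differ for negative ones, which is why the bug would only show up on negative values. In this sketch `std::floor` and `std::trunc` stand in for the RNDD and RNDZ instructions; the wrapper names are hypothetical.

```cpp
#include <cmath>

// Stand-in for RNDD: round toward negative infinity.
float round_down(float x)
{
    return std::floor(x);
}

// Stand-in for RNDZ: round toward zero, which is what trunc() requires.
float round_toward_zero(float x)
{
    return std::trunc(x);
}
```

For an input of -1.5, rounding down gives -2.0 while rounding toward zero gives -1.0; for 1.5 both give 1.0.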
2010-10-14 | i965: Use logical-not when emitting ir_unop_ceil. | Kenneth Graunke
Fixes piglit test glsl-fs-ceil.
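The message does not spell out the emitted code. One standard way to build ceil from a round-down operation is the identity ceil(x) = -floor(-x), which means toggling a source-negate flag. Assuming that is the context here, the sketch below shows why the toggle should use logical not: in C++, applying bitwise `~` to a `bool` promotes it to `int`, so both `~false` (-1) and `~true` (-2) convert back to `true`. The `Operand` struct and function names are hypothetical.

```cpp
#include <cmath>

// Hypothetical operand with a source-negate modifier.
struct Operand {
    float value;
    bool negate;
};

// Read the operand with its negate modifier applied.
float read_operand(const Operand& o)
{
    return o.negate ? -o.value : o.value;
}

// ceil(x) computed as -floor(-x).
float ceil_via_round_down(Operand src)
{
    // Logical not toggles the flag; bitwise ~ would always yield true.
    src.negate = !src.negate;
    float rounded = std::floor(read_operand(src));
    return -rounded;
}
```

The toggle matters when the operand already carries a negate modifier: the flag must flip back to false rather than stay set.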
2010-10-14 | i965: Add peepholing of conditional mod generation from expressions. | Eric Anholt
This cuts usually 2 out of 3 instructions for flag reg generation (if statements, conditional assignment) by producing the conditional mod in the expression representing the boolean value. Fixes glsl-fs-vec4-indexing-temp-dst-in-nested-loop-combined (register allocation no longer fails for the conditional generation proliferation)
2010-10-14 | i965: Add a function for handling the move of boolean values to flag regs. | Eric Anholt
This will be a place to peephole comparisons directly to the flag regs, and for now avoids using MOV with conditional mod on gen6, which is now illegal.
2010-10-14 | i965: Add a pass to the FS to split virtual GRFs to float channels. | Eric Anholt
Improves nexuiz performance 0.91% (+/- 0.54%, n=8)
2010-10-14 | i965: Update the live interval when coalescing regs. | Eric Anholt
2010-10-14 | i965: Set class_sizes[] for the aligned reg pair class. | Eric Anholt
So far, I've only seen this be a valgrind warning and not a real failure.
2010-10-13 | i965: Add support for rescaling GL_TEXTURE_RECTANGLE coords to new FS. | Eric Anholt
2010-10-13 | Drop GLcontext typedef and use struct gl_context instead | Kristian Høgsberg
2010-10-12 | i965: Fix missing "break;" in i2b/f2b, and missing AND of CMP result. | Eric Anholt
Fixes glsl-fs-i2b.
2010-10-11 | i965: Always use the new FS backend on gen6. | Eric Anholt
It's now much more correct for gen6 than the old backend, with just 2 regressions I've found (one of which is common with pre-gen6 and will be fixed by an array splitting IR pass). This does leave the old Mesa IR backend getting used still when we don't have GLSL IR, but the plan is to get GLSL IR input to the driver for the ARB programs and fixed function by the next release.
2010-10-11 | i965: Fix gen6 pixel_[xy] setup to avoid mixing int and float src operands. | Eric Anholt
Pre-gen6, you could mix int and float just fine. Now, you get goofy results. Fixes:
    glsl-arb-fragment-coord-conventions
    glsl-fs-fragcoord
    glsl-fs-if-greater
    glsl-fs-if-greater-equal
    glsl-fs-if-less
    glsl-fs-if-less-equal
2010-10-11 | i965: Expand uniform args to gen6 math to full registers to get hstride == 1. | Eric Anholt
This is a hw requirement in math args. This also is inefficient, as we're calculating the same result 8 times, but then we've been doing that on pre-gen6 as well. If we're doing math on uniforms, though, we'd probably be better served by having some sort of mechanism for precalculating those results into another uniform value to use. Fixes 7 piglit math tests.
2010-10-11 | i965: Don't compute-to-MRF in gen6 math instructions. | Eric Anholt