path: root/src/mesa/drivers/dri/i965/brw_wm.h
Age | Commit message | Author
2010-09-21 | i965: Track the windowizer's dispatch for kill pixel, promoted, and OQ | Eric Anholt
Looks like the problem was we weren't passing the depth to the render target as expected, so the chip would wedge. Fixes GPU hang in occlusion-query-discard. Bug #30097
2010-09-21 | i965: Share the KIL_NV implementation between glsl and non-glsl. | Eric Anholt
2010-08-26 | i965: Start building direct GLSL2 IR to 965 assembly codegen. | Eric Anholt
Our channel-expressions and vector-splitting changes now happen into a private copy of the IR that we maintain for ourselves. Uniform assignment still happens by the core, so we continue using Mesa IR generation not just for swrast fallbacks but also for uniform values (since there's no storage for their contents other than shader_program->FragmentProgram->Parameters->ParameterValues). And most importantly, at the moment no actual codegen is hooked up other than emitting our favorite color to the framebuffer.
2010-08-26 | i965: Add new pass to split vectors into scalar variables | Eric Anholt
Combined with the previous pass, this lets other optimization passes do their work thanks to ir_tree_grafting. Still have regression in instruction count with INTEL_NEW_FS, but register count is even better.
2010-08-26 | i965: Add a pass for the FS to reduce vector expressions down to scalar. | Eric Anholt
This is a step towards implementing a GLSL IR backend for the 965 fragment shader. Because it has downsides with the current codegen, it is hidden under the environment variable INTEL_NEW_FS. This results in an increase in instruction count at the moment (1444 -> 1752 for glsl-fs-raytrace, 345 -> 359 on my demo), because dot products are turned into a series of multiplies and adds instead of a custom expansion of MULs and MACs, and by not splitting the variable types up we don't get tree grafting and thus there are extra moves of temporary storage. However, register count drops for the non-GLSL path (64 -> 56 on my demo shader) because the register allocator sees all the sub-operations.
2010-08-26 | i965: Start building 965 FS backend. | Eric Anholt
2010-08-20 | i965: Rename nr_depth_regs to nr_payload_regs. | Eric Anholt
Only 8 out of the up to 13 regs are for source/dest depth, so the name wasn't particularly appropriate. Note that this doesn't count the constant or URB payload regs. Also, don't pre-divide by 2, so it's actually a number of registers.
2010-07-26 | Merge remote branch 'origin/master' into glsl2 | Eric Anholt
This pulls in multiple i965 driver fixes which will help ensure better testing coverage during development, and also gets past the conflicts of the src/mesa/shader -> src/mesa/program move.
Conflicts:
src/mesa/Makefile
src/mesa/main/shaderapi.c
src/mesa/main/shaderobj.h
2010-07-02 | i965: Add support for the DP2 opcode, which we use for dot(vec2, vec2). | Eric Anholt
The original glsl compiler would generate a.x * b.x + a.y * b.y, which we would do mul+mul+add for instead of this mul+mac. Fixes glsl-fs-dot-vec2.
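As a rough illustration of the two instruction sequences mentioned here, the following plain C sketch computes a two-component dot product both ways. It is not the driver's codegen; `dp2_mul_mul_add` and `dp2_mul_mac` are hypothetical names used only for this example.

```c
/* Hypothetical helper: naive expansion, three operations (MUL, MUL, ADD). */
static float dp2_mul_mul_add(const float a[2], const float b[2])
{
    float t0 = a[0] * b[0];   /* MUL */
    float t1 = a[1] * b[1];   /* MUL */
    return t0 + t1;           /* ADD */
}

/* Hypothetical helper: two operations (MUL, then multiply-accumulate). */
static float dp2_mul_mac(const float a[2], const float b[2])
{
    float acc = a[0] * b[0];  /* MUL */
    acc += a[1] * b[1];       /* MAC into the accumulator */
    return acc;
}
```

Both functions return the same value; the second simply models the shorter sequence that a dedicated DP2 opcode makes possible.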
2010-06-30 | i965: Add support for OPCODE_SSG. | Eric Anholt
The old compiler didn't use SSG, and instead emitted SGT/SGT/SUB. We can do a little better for SSG than we do for the SGT series.
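For readers unfamiliar with the SGT/SGT/SUB sequence, here is a minimal C sketch of how a sign function can be built from two "set on greater than" comparisons and a subtract. The helper names `sgt` and `ssg_via_sgt` are hypothetical; this models the arithmetic only, not the emitted 965 instructions.

```c
/* "Set on greater than": 1.0 if a > b, else 0.0. */
static float sgt(float a, float b)
{
    return a > b ? 1.0f : 0.0f;
}

/* Sign of x via SGT, SGT, SUB: yields 1.0, 0.0, or -1.0. */
static float ssg_via_sgt(float x)
{
    float pos = sgt(x, 0.0f);   /* 1.0 when x is positive */
    float neg = sgt(0.0f, x);   /* 1.0 when x is negative */
    return pos - neg;
}
```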
2010-06-10 | mesa: rename src/mesa/shader/ to src/mesa/program/ | Brian Paul
2010-05-23 | i965: Fix bit allocation for number of color regions for ARB_draw_buffers. | Eric Anholt
If you used all 4 color targets we currently support, we would see 0 and end up just writing the first output. Give enough bits that we can do the maximum of 16. Fixes piglit fbo-drawbuffers-maxtargets.
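The failure mode described here is the usual bitfield truncation. The sketch below assumes, purely for illustration, that the old field was 2 bits wide and the new one 5 bits wide; the struct and function names are hypothetical and the real widths in brw_wm_prog_key may differ.

```c
/* Hypothetical key layouts; field widths are assumptions for illustration. */
struct key_narrow { unsigned nr_color_regions:2; };  /* can hold 0..3  */
struct key_wide   { unsigned nr_color_regions:5; };  /* can hold 0..31 */

/* Store n in the narrow field and read it back (value wraps modulo 4). */
static unsigned store_narrow(unsigned n)
{
    struct key_narrow k;
    k.nr_color_regions = n;
    return k.nr_color_regions;
}

/* Store n in the wide field and read it back (value wraps modulo 32). */
static unsigned store_wide(unsigned n)
{
    struct key_wide k;
    k.nr_color_regions = n;
    return k.nr_color_regions;
}
```

With these assumed widths, storing 4 in the narrow field reads back as 0, while the wide field preserves both 4 and 16.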
2010-03-22 | i965: Allow FS constants to be used as immediates instead of push/pull. | Eric Anholt
The hope is to later take advantage of the reduced constant usage to free up regs. This only covers the GLSL path at the moment, because the brw_wm_emit path doesn't get the information as to whether a float value is a constant or a uniform.
2010-03-10 | i965: Add support for the CMP opcode in the GLSL path. | Eric Anholt
This would be triggered by use of sqrt() along with control flow. Fixes piglit-fs-sqrt-branch and a bug in Yo Frankie!.
2010-01-26 | i965: Fix fp fragment.position handling and enable HW part of ARB_fcc. | Eric Anholt
As with swrast, this fixes the default pixel center behavior which was broken, and implements the previous behavior for integer. Fixes piglit fp-arb-fragment-coord-conventions-none. The extension won't be exposed until we get the GLSL part implemented. The DRI1 origin_x/y parts are dropped since they're no longer relevant.
2009-11-19 | i965: Pack the brw_wm_prog_key better. | Eric Anholt
2009-11-17 | Merge branch 'outputswritten64' | Ian Romanick
Add a GLbitfield64 type and several macros to operate on 64-bit fields. The OutputsWritten field of gl_program is changed to use that type. This results in a fair amount of fallout in drivers that use programs. No changes are strictly necessary at this point as all bits used are below the 32-bit boundary. Fairly soon several bits will be added for clip distances written by a vertex shader. This will cause several bits used for varyings to be pushed above the 32-bit boundary. This will affect any drivers that support GLSL. At this point, only the i965 driver has been modified to support this eventuality. I did this as a "squash" merge. There were several places through the outputswritten64 branch where things were broken. I foresee this causing difficulties later for bisecting. The history is still available in the branch.
Conflicts:
src/mesa/drivers/dri/i965/brw_wm.h
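To show why a 64-bit type and matching macros matter once bits cross the 32-bit boundary, here is a small self-contained sketch. `bitfield64` stands in for GLbitfield64, and `BIT64` and `output_written` are hypothetical names for this example, not necessarily the macros the merge added.

```c
#include <stdint.h>

typedef uint64_t bitfield64;                /* stand-in for GLbitfield64 */

/* Hypothetical macro: the cast keeps the shift in 64-bit arithmetic,
 * so slots 32..63 are representable. */
#define BIT64(b) ((bitfield64)1 << (b))

/* Returns nonzero if the given output slot is set in the mask. */
static int output_written(bitfield64 outputs_written, unsigned slot)
{
    return (outputs_written & BIT64(slot)) != 0;
}
```

Without the cast, `1 << 40` would be evaluated in `int`, which is undefined behaviour on typical platforms; that is the kind of fallout drivers have to audit for.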
2009-11-13 | i965: Share OPCODE_TXB between brw_wm_emit.c and brw_wm_glsl.c | Eric Anholt
This should fix TXB on G45 and older in the GLSL case.
2009-11-13 | i965: Share OPCODE_TEX between brw_wm_emit.c and brw_wm_glsl.c. | Eric Anholt
New comments should explain some of the confusion about how this message works.
2009-11-10 | i965: avoid memsetting all the BRW_WM_MAX_INSN arrays for every compile. | Eric Anholt
For an app that's blowing out the state cache, like sauerbraten, the memset of the giant arrays ended up taking 11% of the CPU even when only a "few" of the entries got used. With this, the WM program compile drops back down to 1% of CPU time. Bug #24981 (bisected to BRW_WM_MAX_INSN increase).
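One common way to avoid clearing a large fixed-size array on every compile is to reset only a counter and zero each entry as it is handed out. The sketch below illustrates that general pattern; it is not taken from the commit, and `MAX_INSN`, `struct insn`, `struct compile`, `compile_reset`, and `compile_get_insn` are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_INSN 2048   /* hypothetical size, not the driver's value */

struct insn { int opcode; int dst; int src[3]; };

struct compile {
    struct insn insns[MAX_INSN];
    unsigned nr_insns;
};

/* Start a new compile without touching the whole array. */
static void compile_reset(struct compile *c)
{
    c->nr_insns = 0;
}

/* Hand out the next entry, clearing only that one entry. */
static struct insn *compile_get_insn(struct compile *c)
{
    assert(c->nr_insns < MAX_INSN);
    struct insn *i = &c->insns[c->nr_insns++];
    memset(i, 0, sizeof(*i));
    return i;
}
```

The cost of clearing then scales with the number of instructions actually used rather than with the array capacity.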
2009-11-06 | i965: Share min/max between brw_wm_emit.c and brw_wm_glsl.c | Eric Anholt
2009-11-06 | i965: Share emit_fb_write() between brw_wm_emit.c and brw_wm_glsl.c | Eric Anholt
This should fix issues with antialiased lines in GLSL.
2009-11-06 | i965: Share most of the WM functions between brw_wm_glsl.c and brw_wm_emit.c | Eric Anholt
The PINTERP code should be faster for brw_wm_glsl.c now since brw_wm_emit.c's had been improved, and pixel_w should no longer stomp on a neighbor to dst.
2009-11-06 | i965: Share math functions between brw_wm_glsl.c and brw_wm_emit.c. | Eric Anholt
2009-11-06 | i965: Share the sop opcodes between brw_wm_glsl.c and brw_wm_emit.c. | Eric Anholt
2009-11-06 | i965: Share OPCODE_MAD between brw_wm_glsl.c and brw_wm_emit.c | Eric Anholt
2009-11-06 | i965: Share the DP3, DP4, and DPH between brw_wm_glsl.c and brw_wm_emit.c | Eric Anholt
2009-11-06 | i965: Add generic GLSL code for unaliasing a 3-arg opcode, and share LRP code. | Eric Anholt
2009-11-06 | i965: Share basic ALU ops between brw_wm_glsl and brw_wm_emit.c | Eric Anholt
This drops support for get_src_reg_imm in these, but the prospect of getting brw_wm_pass*.c onto our GLSL path is well worth some temporary pain.
2009-11-06 | i965: Collect GLSL src/dst regs up in generic code. | Eric Anholt
This matches brw_wm_emit.c, which we'll be using shortly. There's a possible penalty here in that we'll allocate registers for unused channels, since we aren't doing ref tracking like brw_wm_pass*.c does. However, my measurements on GM965 don't show any for either OA or UT2004 with the GLSL path forced.
2009-10-30 | i965: Fix BRW_WM_MAX_INSN to reflect current limits. | Eric Anholt
Part of fixing bug #24355.
2009-10-29 | i965: make brw_wm_prog_key a little smaller | Brian Paul
GLushort is big enough for the swizzle and origin fields. The key could probably be made smaller still by re-ordering things. I'll hold off on that until after the outputswritten64 branch is merged. The key will get a little larger again with the GLbitfield64 fields.
2009-10-29 | i965: don't use context state in emit_fb_write() | Brian Paul
Put the state that we care about in the hash key. Issue spotted by Keith Whitwell.
2009-10-29 | i965: use macros to get/set prog_instruction::Aux field | Brian Paul
This makes things a bit easier to remember/understand.
2009-09-11 | i965: Move OPCODE_DDX/DDY to brw_wm_emit.c and make it actually work. | Eric Anholt
Previously, it was trying to mess around with the varying's WM setup data to produce a result. Along with not actually working when passed a varying, this wouldn't work if you did dFd[xy]() on a temporary. Instead, just calculate the derivative using the neighbors in the subspan.
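The neighbor-difference approach can be sketched in plain C as follows. The 2x2 layout used here (index 0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right) and the sign convention for the y derivative are assumptions for illustration, and `subspan_ddx` and `subspan_ddy` are hypothetical names rather than driver functions.

```c
/* v[] holds one quantity at the four pixels of a 2x2 subspan.
 * Assumed layout: [0]=top-left, [1]=top-right,
 *                 [2]=bottom-left, [3]=bottom-right. */

/* Horizontal derivative: right minus left, per row. */
static void subspan_ddx(const float v[4], float out[4])
{
    out[0] = out[1] = v[1] - v[0];   /* top row    */
    out[2] = out[3] = v[3] - v[2];   /* bottom row */
}

/* Vertical derivative: bottom minus top, per column. */
static void subspan_ddy(const float v[4], float out[4])
{
    out[0] = out[2] = v[2] - v[0];   /* left column  */
    out[1] = out[3] = v[3] - v[1];   /* right column */
}
```

Because the inputs are just per-pixel values, this works equally for varyings and for temporaries computed in the shader, which is the limitation the commit message says the earlier approach had.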
2009-08-12 | i965: drop dead scalar handling in GLSL. | Eric Anholt
2009-08-12 | i965: Store the dispatch width in the WM compile struct. | Eric Anholt
I'll be using this in merging brw_wm_emit.c and brw_wm_glsl.c
2009-08-12 | i965: Handle scalar result swizzling in shared GLSL/non-GLSL code. | Eric Anholt
This is preparation for merging of brw_wm_glsl.c and brw_wm_emit.c, and glsl.c doesn't swizzle channel results around.
2009-08-05 | i965: Fix source depth reg setting for FSes reading and writing to depth. | Eric Anholt
For some IZ setups, we'd forget to account for the source depth register being present, so we'd both read the wrong reg, and write output depth to the wrong reg. Bug #22603.
2009-06-16 | Merge branch 'mesa_7_5_branch' | Brian Paul
Conflicts:
src/mesa/main/api_validate.c
2009-06-16 | i965: fix bugs in projective texture coordinates | Brian Paul
For the TXP instruction we check if the texcoord is really a 4-component attribute which requires the divide by W step. This check involved the projtex_mask field. However, the projtex_mask field was being miscalculated because of some confusion between vertex program outputs and fragment program inputs.
1. Rework the size_masks calculation so we correctly set bits corresponding to fragment program input attributes.
2. Rename projtex_mask to proj_attrib_mask since we're interested in more than just texcoords (generic varying vars too).
3. Simplify the indexing of the size_masks and proj_attrib_mask fields.
4. The tracker::active[] array was mis-dimensioned. Use MAX_PROGRAM_TEMPS instead of a magic number.
5. Update comments, add new assertions.
With these changes the Lightsmark demo/benchmark renders correctly, until we eventually hit a GPU lockup...
2009-06-12 | i965: interpolate colors with perspective correction by default | Brian Paul
...rather than with linear interpolation. Modern hardware should use perspective-corrected interpolation for colors (as for texcoords). glHint(GL_PERSPECTIVE_CORRECTION_HINT, mode) can be used to get linear interpolation if mode = GL_FASTEST.
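To make the difference between the two interpolation modes concrete, here is a small C sketch of the standard formulas along an edge between two vertices. It is a numerical illustration only, not the hardware or driver path; `interp_linear` and `interp_perspective` are hypothetical names, and `w0`/`w1` are the clip-space W values at the two endpoints.

```c
#include <assert.h>

/* Screen-space linear interpolation between a0 and a1 at parameter t. */
static float interp_linear(float a0, float a1, float t)
{
    return (1.0f - t) * a0 + t * a1;
}

/* Perspective-correct interpolation: interpolate a/w and 1/w linearly,
 * then divide. */
static float interp_perspective(float a0, float w0,
                                float a1, float w1, float t)
{
    assert(w0 > 0.0f && w1 > 0.0f);
    float num = (1.0f - t) * (a0 / w0) + t * (a1 / w1);
    float den = (1.0f - t) * (1.0f / w0) + t * (1.0f / w1);
    return num / den;
}
```

When both endpoints share the same W the two functions agree; when the W values differ, the perspective-correct result is weighted toward the nearer vertex.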
2009-05-14 | i965: Fix register allocation of GLSL fp inputs. | Eric Anholt
Before, if the VP output something that is in the attributes coming into the WM but which isn't used by the WM, then WM would end up reading subsequent varyings from the wrong places. This was visible with a GLSL demo using gl_PointSize in the VS and a varying in the WM, as point size is in the VUE but not used by the WM. There is now a regression test in piglit, glsl-unused-varying.
2009-05-08 | i965: don't use GRF regs 126,127 for WM programs | Brian Paul
They seem to be used for something else and using them for shader temps seems to lead to GPU lock-ups. Call _mesa_warning() when we run out of temps. Also, clean up some debug code.
2009-04-27 | i965: only upload constant buffer data when we actually need the const buffer | Brian Paul
Make the use_const_buffer field per-program and only call the code which updates the constant buffer's data if the flag is set. This should undo the perf regression from 20f3497e4b6756e330f7b3f54e8acaa1d6c92052 (cherry picked from master, commit dc9705d12d162ba6d087eb762e315de9f97bc456)
2009-04-27 | i965: only upload constant buffer data when we actually need the const buffer | Brian Paul
Make the use_const_buffer field per-program and only call the code which updates the constant buffer's data if the flag is set. This should undo the perf regression from 20f3497e4b6756e330f7b3f54e8acaa1d6c92052
2009-04-24 | i965: rework GLSL/WM register allocation | Brian Paul
Use a bitvector of used/free flags. If we run out of temps, examine the live intervals of the temp regs in the program and free those which are no longer alive. Also, enable the new WM const buffer code.
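The scheme described here can be sketched as a bitmask allocator with a fallback that frees temps whose live interval has ended. This is a simplified model under stated assumptions: 32 temps in a single word, and a caller-supplied `last_use` table standing in for the live-interval analysis. `NUM_TEMPS`, `struct temp_alloc`, `alloc_temp`, and `release_dead_temps` are hypothetical names.

```c
#include <stdint.h>

#define NUM_TEMPS 32   /* hypothetical; fits one 32-bit word */

struct temp_alloc {
    uint32_t used;     /* bit i set means temp i is in use */
};

/* Return the index of a free temp and mark it used, or -1 if none. */
static int alloc_temp(struct temp_alloc *a)
{
    for (int i = 0; i < NUM_TEMPS; i++) {
        uint32_t bit = UINT32_C(1) << i;
        if (!(a->used & bit)) {
            a->used |= bit;
            return i;
        }
    }
    return -1;
}

/* Free every used temp whose last use precedes the current instruction. */
static void release_dead_temps(struct temp_alloc *a,
                               const int last_use[NUM_TEMPS],
                               int current_insn)
{
    for (int i = 0; i < NUM_TEMPS; i++) {
        uint32_t bit = UINT32_C(1) << i;
        if ((a->used & bit) && last_use[i] < current_insn)
            a->used &= ~bit;
    }
}
```

A caller would try `alloc_temp` first and, on failure, call `release_dead_temps` with the current instruction index before retrying.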
2009-04-03 | i965: another checkpoint commit of new constant buffer support | Brian Paul
Everything is in place now for using a true constant buffer for GLSL fragment shaders. Still some bugs to find though.
2009-04-03 | i965: fix indentation | Brian Paul
2009-03-23 | i965: Fix glFrontFacing in twoside GLSL demo. | Eric Anholt
This also cuts instructions by just using the existing bit in the payload rather than computing it from the determinant in the SF unit and passing it as a varying down to the WM. Something still goes wrong with getting the backface color right, but a simpler shader appears to get the right result.