summaryrefslogtreecommitdiff
path: root/src/mesa/drivers/dri/i965
AgeCommit message (Collapse)Author
2009-08-07i965: Replace the subroutine-skipping jump in VS with a NOP if it's a NOP.Eric Anholt
This showed a 1.9% (+/-.3%, n=3) improvement in OA performance with high geometry settings.
2009-08-07i965: minor context commentsBrian Paul
2009-08-05i965: Fix source depth reg setting for FSes reading and writing to depth.Eric Anholt
For some IZ setups, we'd forget to account for the source depth register being present, so we'd both read the wrong reg, and write output depth to the wrong reg. Bug #22603.
2009-08-04i965: Fix dangerous warning I let slip in.Eric Anholt
2009-08-04i965: Respect CondSwizzle in OPCODE_IF.Eric Anholt
Fixes piglit glsl-vs-if-bool and progs/glsl/twoside, and will likely be useful for the looping code. Bug #18992
2009-08-04i965: Emit conditional code updates as required for GLSL VS if statements.Eric Anholt
Previously, we'd be branching based on whatever condition code happened to be laying around.
2009-08-04i965: Don't set pop_count in the reserved MBZ area of IF statements.Eric Anholt
2009-08-04i965: Print out ELSE and ENDIF src1 arguments like IF does.Eric Anholt
2009-08-04intel: Add support for EXT_provoking_vertex.Eric Anholt
2009-08-04i965: Spell "conditional" correctly.Eric Anholt
2009-08-04i965: Hook up the disassembler for INTEL_DEBUG={wm,vs}.Eric Anholt
I was getting tired of doing the dance of INTEL_DEBUG=batch, copying it out, and running intel-gen4disasm on it.
2009-08-04i965: Initial import of disasm code from intel-gen4asm.Eric Anholt
There's a bunch of stuff from gen4asm and gpu-tools that we probably want to make into a library instead of cargo-culting it around.
2009-08-04i965: warning fixEric Anholt
2009-08-04i965: Fix RECT shadow sampling by not losing the other texcoords.Eric Anholt
Bug #20821
2009-08-03i965: Assert that the offset in the VBO is below the VBO size.Eric Anholt
This avoids sending a bad buffer address to the GPU due to programmer error, and is permitted by the ARB_vbo spec. Note that we still have the opportunity to dereference past the end of the GPU, because we aren't clipping to a correct _MaxElement, but that appears to be harder than it should be. This gets us the 90% solution. Bug #19911.
2009-08-03i965: Even if no VS inputs are set, still load some amount of URB as required.Eric Anholt
See comment on Vertex URB Entry Read Length for VS_STATE. This, combined with the previous three commits, fixes #22945.
2009-08-03i965: Make sure the VS URB size is big enough to fit a VF VUE.Eric Anholt
This fix is just from code and docs inspection, but it may fix hangs on some applications.
2009-08-03i965: Don't emit bad packets when no VBs are referenced.Eric Anholt
It appears that sometimes Mesa (and I suppose a VS could as well) emits a program which references no vertex data, and thus we end up with nr_enabled == 0 even though some VBs are enabled. We'd end up emitting VB/VE packet headers of 0xffffffff in that case, leading to GPU hangs. Bug #22945 (wine with an uncompiled VS)
2009-08-03i965: Calculate enabled[] and nr_enabled once and re-use the values.Eric Anholt
The code duplication bothered me.
2009-07-31Rename TGSI LOOP instruction to better match theri usage.Michal Krol
The LOOP/ENDLOOP pair is renamed to BGNFOR/ENDFOR as its behaviour is similar to a C language for-loop. The BGNLOOP2/ENDLOOP2 pair is renamed to BGNLOOP/ENDLOOP as now there is no name collision.
2009-07-30i965: Postpone ff_sync message in CLIP kernel on IGDNGXiang, Haihao
In addition, it guarantees ff_sync message is issued
2009-07-20i965: Don't clip everything if FRONT_AND_BACK culling while culling disabled.Eric Anholt
Fixes everything-black with meta_clear_tris on quake4-mpdemo and doom3-demo. Bug #18844, 22077.
2009-07-16i965: Add missing state dependency of sf_unit on _NEW_BUFFERS.Eric Anholt
2009-07-15i965: the offset of any branch/jump instruction is in unit of 64bits on IGDNGXiang, Haihao
2009-07-13i965: add support for new chipsetsXiang, Haihao
1. new PCI ids 2. fix some 3D commands on new chipset 3. fix send instruction on new chipset 4. new VUE vertex header 5. ff_sync message (added by Zou Nan Hai <nanhai.zou@intel.com>) 6. the offset in JMPI is in unit of 64bits on new chipset 7. new cube map layout
2009-07-07i965: Remove BRW_NEW_INPUT_VARYINGEric Anholt
This state flag has been unused since the ffvertex_prog move to core.
2009-07-02i965: fixes for JMPIXiang, Haihao
1. the data type of <src1> (JMPI offset) must be D 2. execution size must be 1 3. NoMask 4. instruction compression isn't allowed.
2009-06-30i965: Increase G4X default VS URB allocation to actually allow 32 threads.Eric Anholt
This improves the performance of my GLSL demo by 30%. It also fixes the VS deadlock that ut2004 had, for reasons I can't explain. Bug #21330.
2009-06-30i965: first attempt at handling URB overflow when there's too many vs outputsBrian Paul
If we can't fit all the VS outputs into the MRF, we need to overflow into temporary GRF registers, then use some MOVs and a second brw_urb_WRITE() instruction to place the overflow vertex results into the URB. This is hit when a vertex/fragment shader pair has a large number of varying variables (12 or more). There's still something broken here, but it seems close...
2009-06-30i965: use BRW_MAX_MRFBrian Paul
2009-06-30i965: use BRW_MAX_GRF, BRW_MAX_MRFBrian Paul
2009-06-30i965: move BRW_MAX_GRF, define BRW_MAX_MRFBrian Paul
2009-06-30i965: defined BRW_MAX_MRFBrian Paul
2009-06-30i965: comments and a new assertionBrian Paul
2009-06-29intel: Move note_unlock() implementation to the one place it's needed.Eric Anholt
2009-06-26i965: fix fetching constants from constant buffer in glsl pathRoland Scheidegger
the driver used to overwrite grf0 then use implicit move by send instruction to move contents of grf0 to mrf1. However, we must not overwrite grf0 since it's still used later for fb write. Instead, do the move directly do mrf1 (we could use implicit move from another grf reg to mrf1 but since we need a mov to encode the data anyway it doesn't seem to make sense). I think the dp_READ/WRITE_16 functions may suffer from the same issue. While here also remove unnecessary msg_reg_nr parameter from the dataport functions since always message register 1 is used.
2009-06-23i965: Set the max index buffer address correctly according to the docs.Eric Anholt
It's the last addressable byte, not the byte after the end of the buffer.
2009-06-23i965: Don't set a reserved bit in MI_FLUSH.Eric Anholt
I noticed this when this MI_FLUSH showed up in IPEHR for the ut2004 hang. Not setting the reserved bit didn't help, though.
2009-06-23i965: Fix packed depth/stencil textures to be Y-tiled as well.Eric Anholt
Fixes shadowtex.c. And an assert is added to catch this sooner next time.
2009-06-19intel: Also get the DRI2 front buffer when doing front buffer reading.Eric Anholt
2009-06-19intel: Update Mesa state before span setup in glReadPixels.Eric Anholt
We could have mapped the wrong set of draw buffers. Noticed while looking into a DRI2 glean ReadPixels issue.
2009-06-19i965: initial code for loops in vertex programsBrian Paul
2009-06-19i965: asst clean-ups, etc in brw_vs_emit()Brian Paul
2009-06-19i965: asst clean-ups, var renaming in brw_wm_emit_glsl()Brian Paul
2009-06-17i965: Add decode for the G4X x,y offset in surface state.Eric Anholt
2009-06-17i965: Fix up texture layout for small things with wide pitches (tiled)Eric Anholt
We were packing according to the pitch, while the hardware appears to base it on the base level width. With this and the previous commit, fbo-cubemap now matches untiled behavior.
2009-06-17i965: Fall back or appropriately adjust offsets of drawing to tiled regions.Eric Anholt
3D rendering to tiled textures was being done with non-tile-aligned offsets. The G4X hardware has fields to let us support it easily and correctly, while the pre-G4X hardware requires a path full of suffering, so we just fall back.
2009-06-16Merge branch 'mesa_7_5_branch'Brian Paul
Conflicts: src/mesa/main/api_validate.c
2009-06-16i965: fix bugs in projective texture coordinatesBrian Paul
For the TXP instruction we check if the texcoord is really a 4-component atttibute which requires the divide by W step. This check involved the projtex_mask field. However, the projtex_mask field was being miscalculated because of some confusion between vertex program outputs and fragment program inputs. 1. Rework the size_masks calculation so we correctly set bits corresponding to fragment program input attributes. 2. Rename projtex_mask to proj_attrib_mask since we're interested in more than just texcoords (generic varying vars too). 3. Simply the indexing of the size_masks and proj_attrib_mask fields. 4. The tracker::active[] array was mis-dimensioned. Use MAX_PROGRAM_TEMPS instead of a magic number. 5. Update comments, add new assertions. With these changes the Lightsmark demo/benchmark renders correctly, until we eventually hit a GPU lockup...
2009-06-16i965: handle OPCODE_SWZ in the glsl pathRoland Scheidegger
glsl compiler will not generate OPCODE_SWZ, and as a first step it would be translated away to a MOV anyway (why?), but later internally this opcode is generated (for EXT_texture_swizzling).