Age | Commit message | Author |
|
|
|
|
|
The ARL value is in increments of vec4 in the register file, but
PROGRAM_TEMPORARY and PROGRAM_INPUT are stored as vec4s interleaved
between the two verts being executed (thus a vec8 each), whereas
PROGRAM_STATE_VAR is stored as packed vec4s.
Fixes:
glsl-vs-arrays-2
glsl-vs-mov-after-deref
(without regressing glsl-vs-arrays-3)
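The stride difference described above can be sketched as follows. This is an illustration only, not the driver's code: it assumes a vec4 is four 32-bit floats (16 bytes), and the enum and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical layout kinds, named after the register files in the message. */
enum reg_layout_kind {
    LAYOUT_PACKED_VEC4,      /* e.g. PROGRAM_STATE_VAR: consecutive vec4s */
    LAYOUT_INTERLEAVED_VEC4  /* e.g. PROGRAM_TEMPORARY / PROGRAM_INPUT: a vec8 per register */
};

#define VEC4_BYTES 16u  /* assumption: four 32-bit floats */

/* Byte offset of the register selected by an ARL value counted in vec4s. */
static size_t arl_byte_offset(enum reg_layout_kind kind, unsigned arl_value)
{
    /* Interleaved files hold both vertices' vec4s side by side, so each
     * logical register occupies two vec4s instead of one. */
    size_t stride = (kind == LAYOUT_INTERLEAVED_VEC4) ? 2u * VEC4_BYTES
                                                      : VEC4_BYTES;
    return (size_t)arl_value * stride;
}
```

Under these assumptions the same ARL value lands twice as far into an interleaved file as into a packed one, which is why a single scale factor cannot serve both cases.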
|
|
|
|
The previous support was overly complicated by trying to use the same
1-OWORD message for both offsets.
|
|
|
|
To quiet a compiler warning.
|
|
There was confusion about both the size of message we can send and
what the URB destination offset means.
The remaining problems appear to be due to spilling of regs in the
fragment shader being broken.
|
|
|
|
|
|
|
|
|
|
This should be functionally equivalent, with the possible exception of
NaN handling.
|
|
We could use this to reduce constant register pressure, but for now it
makes the resulting program assembly much more readable.
|
|
Rename the old IGDNG to Ironlake, and set the 'gen' number for
Ironlake to 5, so that features are tracked by generation number
instead of a special is_ironlake flag.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
|
|
Just emit the URB write at END time. Subroutine code that sits after
OPCODE_END won't be executed since we've ended the thread at the point
that the URB write is done.
|
|
MOV, MOV."
This reverts commit 8ef3b1834a896927bdd4f2aea552cdb732849da9. Fixes
piglit glsl-vs-if.
|
|
|
|
This is recommended by the B-Spec. I wasn't able to measure any
difference in ETQW.
|
|
We were patching up all the break and continues between the start of
our loop and the end of our loop, even if they were breaks/continues
for an inner loop. Avoiding patching already patched breaks/continues
fixes piglit glsl-vs-loop-nested.
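A toy model of the fix, for illustration only. The instruction struct, the opcodes, and the convention that a jump count of 0 means "not patched yet" are assumptions of this sketch, not the driver's actual data structures.

```c
#include <stddef.h>

enum toy_opcode { TOY_OTHER, TOY_DO, TOY_BREAK, TOY_CONTINUE, TOY_WHILE };

struct toy_inst {
    enum toy_opcode op;
    int jump_count;  /* 0 means "not patched yet" in this toy model */
};

/* Patch the breaks/continues of the loop whose DO is at do_idx and whose
 * WHILE is at while_idx. Inner loops are assumed to be closed first. */
static void toy_patch_loop(struct toy_inst *insts, size_t do_idx, size_t while_idx)
{
    for (size_t i = do_idx + 1; i < while_idx; i++) {
        struct toy_inst *inst = &insts[i];

        if (inst->jump_count != 0)
            continue;  /* already patched when an inner loop was closed */

        if (inst->op == TOY_BREAK)
            inst->jump_count = (int)(while_idx - i + 1);  /* land just past WHILE */
        else if (inst->op == TOY_CONTINUE)
            inst->jump_count = (int)(while_idx - i);      /* land on WHILE */
    }
}
```

Because the inner loop is closed before the outer one, its breaks and continues already carry a nonzero jump count by the time the outer loop is patched, so the outer pass leaves them alone.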
|
|
|
|
|
|
This gets the VS to the point of accepting vertices. \o/
|
|
This is untested at this point.
|
|
|
|
|
|
The pull constants require sending out to an overworked shared unit
and waiting for a response, while push constants are nicely loaded in
for us at thread dispatch time. By putting things we access in every
VS invocation there, ETQW performance improved by 2.5% +/- 1.6% (n=6).
|
|
The codepaths in the function were almost entirely different.
|
|
|
|
Fixes piglit vp-arl-constant-array-huge-overwritten.
|
|
Saves another 600 bytes or so of code.
|
|
Saves ~480 bytes of code.
|
|
Bug #25628. Fixes piglit case glsl-vs-sqrt-zero.
|
|
Add a GLbitfield64 type and several macros to operate on 64-bit
fields. The OutputsWritten field of gl_program is changed to use that
type. This results in a fair amount of fallout in drivers that use
programs.
No changes are strictly necessary at this point as all bits used are
below the 32-bit boundary. Fairly soon several bits will be added for
clip distances written by a vertex shader. This will cause several
bits used for varyings to be pushed above the 32-bit boundary. This
will affect any drivers that support GLSL.
At this point, only the i965 driver has been modified to support this
eventuality.
I did this as a "squash" merge. There were several places through the
outputswritten64 branch where things were broken. I foresee this
causing difficulties later for bisecting. The history is still
available in the branch.
Conflicts:
src/mesa/drivers/dri/i965/brw_wm.h
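As a rough illustration of why a 64-bit type and matching macros are needed once bits move above the 32-bit boundary, here is a minimal sketch. The typedef, macro, and function names are stand-ins invented for this example, not the names added by the commit.

```c
#include <stdint.h>

/* Stand-in for a 64-bit bitfield type (hypothetical name). */
typedef uint64_t sketch_bitfield64;

/* Widen the constant before shifting; a plain (1 << b) would be computed
 * in int and is not valid for b >= 32 on typical platforms. */
#define SKETCH_BIT64(b) ((sketch_bitfield64)1 << (b))

/* Test whether output slot `index` is marked as written. */
static int sketch_output_is_written(sketch_bitfield64 outputs_written,
                                    unsigned index)
{
    return (outputs_written & SKETCH_BIT64(index)) != 0;
}
```

Driver code that tests or builds such masks with 32-bit constants keeps working only while every bit in use stays below bit 32, which matches the note that no changes are strictly necessary yet.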
|
|
This is a 2.9% (+/-.3%) performance win for my GL demo, which hits MAD
sequences for matrix transforms.
|
|
Fixes piglit vp-sge-alias test, and the googleearth ground shader. \o/
Bug #22228
|
|
Fixes piglit arl.vp.
|
|
Passes piglit glsl-vs-loop testcase.
Bug #20171
|
|
This should help with things like lightsmark, but I don't have a testcase
for this commit.
|
|
|
|
This showed a 1.9% (+/-.3%, n=3) improvement in OA performance with high
geometry settings.
|
|
|
|
Fixes piglit glsl-vs-if-bool and progs/glsl/twoside, and will likely be
useful for the looping code.
Bug #18992
|
|
Previously, we'd be branching based on whatever condition code happened to be
lying around.
|
|
I was getting tired of doing the dance of INTEL_DEBUG=batch, copying it out,
and running intel-gen4disasm on it.
|
|
See comment on Vertex URB Entry Read Length for VS_STATE.
This, combined with the previous three commits, fixes #22945.
|
|
This fix is just from code and docs inspection, but it may fix hangs on
some applications.
|
|
1. new PCI ids
2. fix some 3D commands on new chipset
3. fix send instruction on new chipset
4. new VUE vertex header
5. ff_sync message (added by Zou Nan Hai <nanhai.zou@intel.com>)
6. the offset in JMPI is in units of 64 bits on new chipset
7. new cube map layout
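Item 6 can be sketched as a small unit conversion. This assumes native instructions are 128 bits wide and that the older chipsets count JMPI offsets in whole instructions; both the assumption and the function name are mine, not taken from the commit.

```c
/* Convert a jump distance measured in instructions into the value a JMPI
 * would carry. With 128-bit instructions, counting in 64-bit units doubles
 * the number. The flag is a hypothetical stand-in for a chipset check. */
static int sketch_jmpi_offset(int instruction_distance, int counts_64bit_units)
{
    int units_per_instruction = counts_64bit_units ? 2 : 1;
    return instruction_distance * units_per_instruction;
}
```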
|
|
If we can't fit all the VS outputs into the MRF, we need to overflow into
temporary GRF registers, then use some MOVs and a second brw_urb_WRITE()
instruction to place the overflow vertex results into the URB.
This is hit when a vertex/fragment shader pair has a large number of varying
variables (12 or more).
There's still something broken here, but it seems close...
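The split between the two URB writes can be sketched as below. This only models the bookkeeping; the struct, the function, and the capacity parameter are hypothetical, and the real MRF limit is not stated here.

```c
/* How many VS outputs go out with each of the two URB writes. */
struct sketch_urb_plan {
    unsigned first_write_count;   /* outputs placed directly in the MRF */
    unsigned second_write_count;  /* outputs staged in GRF temporaries, then MOVed */
};

/* mrf_capacity is an assumed parameter: the number of outputs that fit in
 * the MRF for the first write. */
static struct sketch_urb_plan sketch_plan_urb_writes(unsigned num_outputs,
                                                     unsigned mrf_capacity)
{
    struct sketch_urb_plan plan;

    plan.first_write_count =
        num_outputs < mrf_capacity ? num_outputs : mrf_capacity;
    plan.second_write_count = num_outputs - plan.first_write_count;
    return plan;
}
```

When the second count is zero, a single write suffices and no GRF staging is needed.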
|