Age | Commit message (Collapse) | Author |
|
Looks like the problem was we weren't passing the depth to the render
target as expected, so the chip would wedge. Fixes GPU hang in
occlusion-query-discard.
Bug #30097
|
|
|
|
Our channel-expressions and vector-splitting changes now happen into a
private copy of the IR that we maintain for ourselves. Uniform
assignment still happens by the core, so we continue using Mesa IR
generation not just for swrast fallbacks but also for uniform values
(since there's no storage for their contents other than
shader_program->FragmentProgram->Parameters->ParameterValues). And
most importantly, at the moment no actual codegen is hooked up other
than emitting our favorite color to the framebuffer.
|
|
Combined with the previous pass, this lets other optimization passes
do their work thanks to ir_tree_grafting. Still have regression in
instruction count with INTEL_NEW_FS, but register count is even
better.
|
|
This is a step towards implementing a GLSL IR backend for the 965
fragment shader. Because it has downsides with the current codegen,
it is hidden under the environment variable INTEL_NEW_FS.
This results in an increase in instruction count at the moment (1444
-> 1752 for glsl-fs-raytrace, 345 -> 359 on my demo), because dot
products are turned into a series of multiplies and adds instead of a
custom expansion of MULs and MACs, and by not splitting the variable
types up we don't get tree grafting and thus there are extra moves of
temporary storage. However, register count drops for the non-GLSL
path (64 -> 56 on my demo shader) because the register allocator sees
all the sub-operations.
|
|
|
|
Only 8 out of the up to 13 regs are for source/dest depth, so the name
wasn't particularly appropriate. Note that this doesn't count the
constant or URB payload regs. Also, don't pre-divide by 2, so it's
actually a number of registers.
|
|
This pulls in multiple i965 driver fixes which will help ensure better
testing coverage during development, and also gets past the conflicts
of the src/mesa/shader -> src/mesa/program move.
Conflicts:
src/mesa/Makefile
src/mesa/main/shaderapi.c
src/mesa/main/shaderobj.h
|
|
The original glsl compiler would generate a.x * b.x + a.y * b.y, which
we would do mul+mul+add for instead of this mul+mac.
Fixes glsl-fs-dot-vec2.
|
|
The old compiler didn't use SSG, and instead emitted SGT/SGT/SUB. We
can do a little better for SSG than we do for the SGT series.
|
|
|
|
If you used all 4 color targets we currently support, we would see 0
and end up just writing the first output. Give enough bits that we
can do the maximum of 16.
Fixes piglit fbo-drawbuffers-maxtargets.
|
|
The hope is to later take advantage of the reduced constant usage to
free up regs. This only covers the GLSL path at the moment, because
the brw_wm_emit path doesn't get the information as to whether a float
value is a constant or a uniform.
|
|
This would be triggered by use of sqrt() along with control flow.
Fixes piglit-fs-sqrt-branch and a bug in Yo Frankie!.
|
|
As with swrast, this fixes the default pixel center behavior which was
broken, and implements the previous behavior for integer. Fixes
piglit fp-arb-fragment-coord-conventions-none. The extension won't be
exposed until we get the GLSL part implemented.
The DRI1 origin_x/y parts are dropped since they're no longer relevant.
|
|
|
|
Add a GLbitfield64 type and several macros to operate on 64-bit
fields. The OutputsWritten field of gl_program is changed to use that
type. This results in a fair amount of fallout in drivers that use
programs.
No changes are strictly necessary at this point as all bits used are
below the 32-bit boundary. Fairly soon several bits will be added for
clip distances written by a vertex shader. This will cause several
bits used for varyings to be pushed above the 32-bit boundary. This
will affect any drivers that support GLSL.
At this point, only the i965 driver has been modified to support this
eventuality.
I did this as a "squash" merge. There were several places through the
outputswritten64 branch where things were broken. I foresee this
causing difficulties later for bisecting. The history is still
available in the branch.
Conflicts:
src/mesa/drivers/dri/i965/brw_wm.h
|
|
This should fix TXB on G45 and older in the GLSL case.
|
|
New comments should explain some of the confusion about how this message
works.
|
|
For an app that's blowing out the state cache, like sauerbraten, the
memset of the giant arrays ended up taking 11% of the CPU even when only a
"few" of the entries got used. With this, the WM program compile drops back
down to 1% of CPU time.
Bug #24981 (bisected to BRW_WM_MAX_INSN increase).
|
|
|
|
This should fix issues with antialiased lines in GLSL.
|
|
The PINTERP code should be faster for brw_wm_glsl.c now since brw_wm_emit.c's
had been improved, and pixel_w should no longer stomp on a neighbor to dst.
|
|
|
|
|
|
|
|
|
|
|
|
This drops support for get_src_reg_imm in these, but the prospect of getting
brw_wm_pass*.c onto our GLSL path is well worth some temporary pain.
|
|
This matches brw_wm_emit.c, which we'll be using shortly. There's a
possible penalty here in that we'll allocate registers for unused channels,
since we aren't doing ref tracking like brw_wm_pass*.c does. However, my
measurements on GM965 don't show any for either OA or UT2004 with the GLSL
path forced.
|
|
Part of fixing bug #24355.
|
|
GLushort is big enough for the swizzle and origin fields.
The key could probably be made smaller still by re-ordering things.
I'll hold off on that until after the outputswritten64 branch is merged.
The key will get a little larger again with the GLbitfield64 fields.
|
|
Put the state that we care about in the hash key.
Issue spotted by Keith Whitwell.
|
|
This makes things a bit easier to remember/understand.
|
|
Previously, it was trying to mess around with the varying's
WM setup data to produce a result. Along with not actually working when
passed a varying, this wouldn't work if you did dFd[xy]() on a temporary.
Instead, just calculate the derivative using the neighbors in the subspan.
|
|
|
|
I'll be using this in merging brw_wm_emit.c and brw_wm_glsl.c
|
|
This is preparation for merging of brw_wm_glsl.c and
brw_wm_emit.c, and glsl.c doesn't swizzle channel results around.
|
|
For some IZ setups, we'd forget to account for the source depth register
being present, so we'd both read the wrong reg, and write output depth to
the wrong reg.
Bug #22603.
|
|
Conflicts:
src/mesa/main/api_validate.c
|
|
For the TXP instruction we check if the texcoord is really a 4-component
atttibute which requires the divide by W step. This check involved the
projtex_mask field. However, the projtex_mask field was being miscalculated
because of some confusion between vertex program outputs and fragment
program inputs.
1. Rework the size_masks calculation so we correctly set bits corresponding
to fragment program input attributes.
2. Rename projtex_mask to proj_attrib_mask since we're interested in more
than just texcoords (generic varying vars too).
3. Simply the indexing of the size_masks and proj_attrib_mask fields.
4. The tracker::active[] array was mis-dimensioned. Use MAX_PROGRAM_TEMPS
instead of a magic number.
5. Update comments, add new assertions.
With these changes the Lightsmark demo/benchmark renders correctly, until
we eventually hit a GPU lockup...
|
|
...rather than with linear interpolation. Modern hardware should use
perspective-corrected interpolation for colors (as for texcoords).
glHint(GL_PERSPECTIVE_CORRECTION_HINT, mode) can be used to get
linear interpolation if mode = GL_FASTEST.
|
|
Before, if the VP output something that is in the attributes coming into
the WM but which isn't used by the WM, then WM would end up reading subsequent
varyings from the wrong places. This was visible with a GLSL demo
using gl_PointSize in the VS and a varying in the WM, as point size is in
the VUE but not used by the WM. There is now a regression test in piglit,
glsl-unused-varying.
|
|
They seem to be used for something else and using them for shader temps
seems to lead to GPU lock-ups.
Call _mesa_warning() when we run out of temps.
Also, clean up some debug code.
|
|
Make the use_const_buffer field per-program and only call the code which
updates the constant buffer's data if the flag is set.
This should undo the perf regression from 20f3497e4b6756e330f7b3f54e8acaa1d6c92052
(cherry picked from master, commit dc9705d12d162ba6d087eb762e315de9f97bc456)
|
|
Make the use_const_buffer field per-program and only call the code which
updates the constant buffer's data if the flag is set.
This should undo the perf regression from 20f3497e4b6756e330f7b3f54e8acaa1d6c92052
|
|
Use a bitvector of used/free flags.
If we run out of temps, examine the live intervals of the temp regs in
the program and free those which are no longer alive.
Also, enable the new WM const buffer code.
|
|
Everything is in place now for using a true constant buffer for GLSL fragment
shaders. Still some bugs to find though.
|
|
|
|
This also cuts instructions by just using the existing bit in the payload
rather than computing it from the determinant in the SF unit and passing it
as a varying down to the WM. Something still goes wrong with getting the
backface color right, but a simpler shader appears to get the right result.
|