Age | Commit message (Collapse) | Author |
|
|
|
|
|
Fixes glsl-algebraic-add-add-1.
|
|
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 31.791 32.287 1.11% 6/6
after:
[ 0] gl firefox-talos-gfx 31.198 31.675 0.96% 6/6
|
|
It turns out that computing a 56 byte key to look up a 20-byte object
out of a hash table was some sort of a bad idea. Whoops.
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
after:
[ 0] gl firefox-talos-gfx 34.761 34.784 0.17% 5/6
|
|
This slightly reduces reduces cairo-gl firefox-talos-gfx runtime on my
Ironlake:
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 38.236 38.383 0.43% 5/6
after:
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
It turns out the cost of caching these objects and looking them up in
the cache again is greater than the cost of just computing the object
again, particularly when the overhead of having a separate BO to pin
is removed.
(Those that are paying close attention will note that this is a
reversal of the path I was moving the driver in a couple of years ago.
The major thing that has changed is that back then all state was
recomputed when we wrapped the streaming state buffer, including
recompiling our precious programs. Now, we're uncaching just the
objects that are cheap to compute, and retaining caching of expensive
objects)
|
|
This was bothering me when redoing the binding tables.
|
|
The cache lookup of these two little floats was .12% of total CPU time
on firefox-talos-gfx because we did it any time commonly-changed state
changed. On the other hand, updating the CC VP bo immediately whenver
CC VP state changes is a .07% overhead due to putting a driver hoook
in glEnable().
|
|
In exchange we end up with an extra memcpy, but that seems better than
calloc/free. Each buffer is 4k maximum, and on the i965-streaming
branch this allocation was showing up as the top entry in
brw_validate_state profiling for cairo-gl.
|
|
The slightly less mechanical change of converting the emit_reloc calls
will follow.
|
|
We should be able to do 16, but are limited by Mesa's static buffer
allocations.
|
|
GL doesn't actually let you begin an OQ while one is active, so the
extra work was pointless.
|
|
|
|
|
|
|
|
Saves an instruction in PINTERP, LINTERP, and PIXEL_W from
brw_wm_glsl.c For non-GLSL it isn't used yet because the deltas have
to be laid out differently.
|
|
|
|
even with vs disabled, still doesn't work.
|
|
|
|
For the tiny bis of data we generally upload through the CURBEs, the
overhead of the kernel's pagetable trickery is actually rather high.
This improves cairo-gl gnome-terminal-vim performance by 3.8%.
|
|
|
|
The pull constants require sending out to an overworked shared unit
and waiting for a response, while push constants are nicely loaded in
for us at thread dispatch time. By putting things we access in every
VS invocation there, ETQW performance improved by 2.5% +/- 1.6% (n=6).
|
|
Everything has been constant-sized until now, but constant buffer
handling changes will make us want some additional variable sized
array.
|
|
As part of the DRI driver interface rewrite I merged __DRIscreenPrivate
and __DRIscreen, and likewise for __DRIdrawablePrivate and
__DRIcontextPrivate. I left typedefs in place though, to avoid renaming
all the *Private use internal to the driver. That was probably a
mistake, and it turns out a one-line find+sed combo can do the mass
rename. Better late than never.
|
|
Saves another 600 bytes or so of code.
|
|
Saves ~2KB of code.
|
|
|
|
|
|
Add a GLbitfield64 type and several macros to operate on 64-bit
fields. The OutputsWritten field of gl_program is changed to use that
type. This results in a fair amount of fallout in drivers that use
programs.
No changes are strictly necessary at this point as all bits used are
below the 32-bit boundary. Fairly soon several bits will be added for
clip distances written by a vertex shader. This will cause several
bits used for varyings to be pushed above the 32-bit boundary. This
will affect any drivers that support GLSL.
At this point, only the i965 driver has been modified to support this
eventuality.
I did this as a "squash" merge. There were several places through the
outputswritten64 branch where things were broken. I foresee this
causing difficulties later for bisecting. The history is still
available in the branch.
Conflicts:
src/mesa/drivers/dri/i965/brw_wm.h
|
|
|
|
|
|
This keeps the individual state files from having to export their
structures for brw_state_cache initialization.
|
|
i965 might support more than 4 color draw buffers. But if not, this protects
from breakage if the Mesa limit is raised.
|
|
This reverts commit 8810b8f67135185d1044746bb861fe2ff997626c.
It turns out the i965 driver uses the intel->Fallback field as a boolean,
not as a bitmask. The intelFallback() function is a no-op in the i965
driver. It would have been nice if there were some comments about this.
I'll fix that next...
|
|
|
|
Setting intel->Fallback = 1 clobbered any fallback state that was already
set. Not sure where this hack originated (the git history is a little
convoluted). Define and use a new BRW_FALLBACK_DRAW bit instead. This
shouldn't break anything and could potentially fix some bugs (but no
specific ones are known).
|
|
|
|
The value was probably wrong too.
It was the same as INTEL_FALLBACK_DRAW_BUFFER.
|
|
|
|
Conflicts:
Makefile
configs/default
progs/glsl/Makefile
src/gallium/auxiliary/util/u_simple_shaders.c
src/gallium/state_trackers/glx/xlib/xm_api.c
src/mesa/drivers/dri/i965/brw_draw_upload.c
src/mesa/drivers/dri/i965/brw_vs_emit.c
src/mesa/drivers/dri/intel/intel_context.h
src/mesa/drivers/dri/intel/intel_pixel.c
src/mesa/drivers/dri/intel/intel_pixel_read.c
src/mesa/main/texenvprogram.c
src/mesa/main/version.h
|
|
|
|
The code duplication bothered me.
(cherry picked from commit 9b9cb30d128fc5f1ba77287696ecd508e640efde)
|
|
We'll use this for debug/sanity checking.
|
|
No performance difference proven at 95% confidence with my GLSL demo (n=10).
|
|
|
|
I was getting tired of doing the dance of INTEL_DEBUG=batch, copying it out,
and running intel-gen4disasm on it.
|
|
The code duplication bothered me.
|
|
This state flag has been unused since the ffvertex_prog move to core.
|
|
|
|
|