Age | Commit message (Collapse) | Author |
|
Improves performance of a hacked-up scissor-many (to reuse a small set
of scissors instead of blowing out the cache, and then to run 100x
more iterations so it actually took some time) by 3.6% +/- 1.2% (n=10)
|
|
Replace the intermediate tests due to the logical or with the bitwise
or.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Move the tracking of the last emitted instructions into the core
batchbuffer routines and take advantage of the shadow batch copy to
avoid extra memory allocations and copies.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
It's faster. Not only is the memcpy more efficiently performed in the
kernel (making up for the system call overhead), but by not using mmap
we remove the greater overhead of tracking the vma of every batch.
And it means we can read back from the batch buffer without incurring
the cost of a uncached read through the GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
These are only used for debugging, but should be there.
Found by inspection.
|
|
|
|
This reverts commit de6fd527a545f8344e074312544517d05573fb72.
Revert this workaround as it seems the real trouble is caused by
lineloop, which doesn't require GS convert on sandybridge actually.
|
|
This makes conformance tests stable on sandybridge D0 to track
multisample state before SF/WM state.
|
|
This is the push constant buffer, not the pull constants.
|
|
This provides the optimizer with hints about code hotness, which we're
quite certain about for debug printouts (or, rather, while we
developers often hit the checks for debug printouts, we don't care
about performance while doing so).
|
|
This was slightly confused because gen6_wm_constants does the push
constant buffer, while brw_wm_constants does pull constants.
|
|
|
|
Fix incorrect scissor rect struct and missed scissor state pointer
setting for sandybridge.
|
|
Another optional ARB_imaging subset extension.
|
|
|
|
Fixes glsl-algebraic-add-add-1.
|
|
|
|
|
|
It turns out that computing a 56 byte key to look up a 20-byte object
out of a hash table was some sort of a bad idea. Whoops.
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
after:
[ 0] gl firefox-talos-gfx 34.761 34.784 0.17% 5/6
|
|
This slightly reduces reduces cairo-gl firefox-talos-gfx runtime on my
Ironlake:
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 38.236 38.383 0.43% 5/6
after:
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
It turns out the cost of caching these objects and looking them up in
the cache again is greater than the cost of just computing the object
again, particularly when the overhead of having a separate BO to pin
is removed.
(Those that are paying close attention will note that this is a
reversal of the path I was moving the driver in a couple of years ago.
The major thing that has changed is that back then all state was
recomputed when we wrapped the streaming state buffer, including
recompiling our precious programs. Now, we're uncaching just the
objects that are cheap to compute, and retaining caching of expensive
objects)
|
|
This was bothering me when redoing the binding tables.
|
|
The cache lookup of these two little floats was .12% of total CPU time
on firefox-talos-gfx because we did it any time commonly-changed state
changed. On the other hand, updating the CC VP bo immediately whenver
CC VP state changes is a .07% overhead due to putting a driver hoook
in glEnable().
|
|
The slightly less mechanical change of converting the emit_reloc calls
will follow.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
even with vs disabled, still doesn't work.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This may break the SUNOS4 build, but it's no longer relevant.
|
|
|
|
|
|
Bug #25194.
|
|
Before, if we just called glXMakeCurrent() and didn't render anything we'd
still trigger a flushFrontBuffer() call.
Now only set the intel->front_buffer_dirty field at state validation time
just before we draw something.
NOTE: additional calls to intel_check_front_buffer_rendering() might be
needed if I missed some rendering paths.
|
|
|
|
Its flagging of extra state that's already flagged by the vtbl new_batch
when appropriate was confusing my tracking down of the OA clear bug.
|
|
Check that all the textures needed by the current fragment program
actually exist and are valid.
|