Age | Commit message (Collapse) | Author |
|
|
|
It turns out that computing a 56 byte key to look up a 20-byte object
out of a hash table was some sort of a bad idea. Whoops.
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
after:
[ 0] gl firefox-talos-gfx 34.761 34.784 0.17% 5/6
|
|
This slightly reduces reduces cairo-gl firefox-talos-gfx runtime on my
Ironlake:
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 38.236 38.383 0.43% 5/6
after:
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
It turns out the cost of caching these objects and looking them up in
the cache again is greater than the cost of just computing the object
again, particularly when the overhead of having a separate BO to pin
is removed.
(Those that are paying close attention will note that this is a
reversal of the path I was moving the driver in a couple of years ago.
The major thing that has changed is that back then all state was
recomputed when we wrapped the streaming state buffer, including
recompiling our precious programs. Now, we're uncaching just the
objects that are cheap to compute, and retaining caching of expensive
objects)
|
|
This was bothering me when redoing the binding tables.
|
|
The cache lookup of these two little floats was .12% of total CPU time
on firefox-talos-gfx because we did it any time commonly-changed state
changed. On the other hand, updating the CC VP bo immediately whenver
CC VP state changes is a .07% overhead due to putting a driver hoook
in glEnable().
|
|
The slightly less mechanical change of converting the emit_reloc calls
will follow.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
even with vs disabled, still doesn't work.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This may break the SUNOS4 build, but it's no longer relevant.
|
|
|
|
|
|
Bug #25194.
|
|
Before, if we just called glXMakeCurrent() and didn't render anything we'd
still trigger a flushFrontBuffer() call.
Now only set the intel->front_buffer_dirty field at state validation time
just before we draw something.
NOTE: additional calls to intel_check_front_buffer_rendering() might be
needed if I missed some rendering paths.
|
|
|
|
Its flagging of extra state that's already flagged by the vtbl new_batch
when appropriate was confusing my tracking down of the OA clear bug.
|
|
Check that all the textures needed by the current fragment program
actually exist and are valid.
|
|
No performance difference proven at 95% confidence with my GLSL demo (n=10).
|
|
This state flag has been unused since the ffvertex_prog move to core.
|
|
This can avoid re-uploading constant data when it isn't necessary, and is
a step towards not updating other surfaces just because constants change.
It also brings the upload of the constant buffer next to the creation.
This brings openarena performance up another 4%, to 91% of the Mesa 7.4 branch.
|
|
Also, only create VS surface state if there's a VS constant buffer to be
uploaded, and set the contents of the buffer at the same time as creation.
|
|
The new, second cache will only be used for surface-related items.
Since we can create many surfaces the original, single cache could get
filled quickly. When we cleared it, we had to regenerate shaders, etc.
With two caches, we can avoid doing that.
|
|
|
|
No more dynamic atoms so we can simplify the state validation code a little.
|
|
|
|
|
|
|
|
Everything now depends on either BRW_NEW_FRAGMENT_PROGRAM or
BRW_NEW_VERTEX_PROGRAM.
|
|
This was causing a prepare of wm state at every primitive emit.
|
|
|
|
Now i965 also uses the vertex program created by Mesa Core, but this vertex program
is not only depend on mesa state _NEW_PROGRAM, so always check the current vertex
program is updated or not. This fixes broken demo cubemap.
|
|
Previously, since my check_aperture API change, we would check each piece of
state against the batchbuffer individually, but not all the state against the
batchbuffer at once. In addition to not being terribly useful in assuring
success, it probably also increased CPU load by calling check_aperture many
times per primitive.
|
|
This avoids issues with dereferencing stale cliprects around intel_draw_buffer
time. Additionally, take advantage of cliprects staying constant for FBOs and
DRI2, and emit cliprects in the batchbuffer instead of having to flush batch
each time they change.
|
|
|