Age | Commit message (Collapse) | Author |
|
Rather than waiting on the first batch after the last swapbuffers to be
retired, call into the kernel to wait upon the retirement of any request
less than 20ms old. This has the twofold advantage of (a) not blocking
any other clients from utilizing the device whilst we wait and (b) we
attain higher throughput without overloading the system.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Move the tracking of the last emitted instructions into the core
batchbuffer routines and take advantage of the shadow batch copy to
avoid extra memory allocations and copies.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
It's faster. Not only is the memcpy more efficiently performed in the
kernel (making up for the system call overhead), but by not using mmap
we remove the greater overhead of tracking the vma of every batch.
And it means we can read back from the batch buffer without incurring
the cost of a uncached read through the GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Rather than performing lots of little writes to update the common bo
upon each update, write those into a static buffer and flush that when
full (or at the end of the batch). Doing so gives a dramatic performance
improvement over and above using mmaped access.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Dynamic arrays have the tendency to be small and so allocating a bo for
each one is overkill and we can exploit many efficiency gains by packing
them together.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Old MI_FLUSH command is deprecated on sandybridge blt.
|
|
|
|
Sometimes I'm on the train and want to just read what's generated
under INTEL_DEBUG=vs,wm for some code on another generation. Or, for
the next gen enablement we'll want to dump aub files before we have
the actual hardware. This will let us do that.
|
|
This provides the optimizer with hints about code hotness, which we're
quite certain about for debug printouts (or, rather, while we
developers often hit the checks for debug printouts, we don't care
about performance while doing so).
|
|
No measurable performance difference on cairo-perf-trace, but
simplifies the code and should have cache benefit in general.
|
|
Must issue a pipe control with any non-zero post sync op before
write cache flush = 1 pipe control.
|
|
This came from commit cf255e382d147fe3ca450f0dcec3525190e7dcbc
|
|
|
|
The new API makes so much more sense, I'd like to forget how the old
one worked.
|
|
The slightly less mechanical change of converting the emit_reloc calls
will follow.
|
|
|
|
|
|
Fixes the assert (and buffer overrun):
glknots: intel_batchbuffer.c:164: _intel_batchbuffer_flush: Assertion
'used >= batch->buf->size' failed.
Reported in bug:
Bug 28274 - xscreensaver's glknots hangs GPU (945GME/Pineview)
https://bugs.freedesktop.org/show_bug.cgi?id=28274
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
This is a workaround for Ironlake errata. The emit_mi_flush is used
for a few purposes:
1) Flushing write caches for RTT (including blit to texture)
2) Pipe fencing for sync objects
3) Spamming cache flushes to track down cache flush bugs
Spamming cache flushes seems less important than following the docs,
and we should probably do that with a different mechanism than the one
for render cache flushes.
|
|
Before we would throttle in the flush callback prior to round-tripping
to the server to do copyregion or swapbuffer. Now, instead just note
that we need to throttle and do it in intel_prepare_render(), which
will be called after receiving the response from the server but before
we start rendering the next frame. Even if the server also throttles
us in swapbuffer, this just makes the throttling a no-op when we hit
intel_prepare_render(). With that we can drop the
using_dri2_swapbuffers hack and just always throttle.
|
|
|
|
|
|
Cuts another 1800 bytes from the driver.
|
|
This improves tiled texture performance of OA on my 945 from 25.3fps
to 29.0fps, whereas untiled is 28.2fps, by avoiding stalls for fence
register changes.
|
|
This was accidentally (it seems) deleted in
5203b7227ccb6b618fa42f08434d4a3cf123dca2
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This really isn't supported at this point. GEM's been in the kernel for
a year, and the fake bufmgr never really worked.
|
|
|
|
This should do all the things that MI_FLUSH did, but it can be pipelined
so that further rendering isn't blocked on the flush completion unless
necessary.
|
|
|
|
Conflicts:
src/mesa/main/state.c
|
|
This fixes jerkiness in doom3 and other apps since the kernel change to
throttle less absurdly, which led to a thundering herd of frames.
Because this is a rather minimal fix, there is at least one downside: If
the whole scene completes in one batchbuffer, we'll end up stalling the GPU.
Thanks to Michel Dänzer for suggesting using glFlush to signal frame end
instead of going to all the effort of adding a new DRI2 extension.
(cherry picked from master, commit 0828579a658af01a64b5e699175dc9bbbedcd685)
|
|
This fixes jerkiness in doom3 and other apps since the kernel change to
throttle less absurdly, which led to a thundering herd of frames.
Because this is a rather minimal fix, there is at least one downside: If
the whole scene completes in one batchbuffer, we'll end up stalling the GPU.
Thanks to Michel Dänzer for suggesting using glFlush to signal frame end
instead of going to all the effort of adding a new DRI2 extension.
|
|
|
|
I keep wanting to hack this knob in as a one-time thing, so it seemed useful
to have all the time.
|
|
This ensures all batchbuffers have a same cliprect mode after calling
_intel_batchbuffer_flush even if there aren't invalid commands in the
current batch buffer. (fix bug#18362).
|
|
This avoids issues with dereferencing stale cliprects around intel_draw_buffer
time. Additionally, take advantage of cliprects staying constant for FBOs and
DRI2, and emit cliprects in the batchbuffer instead of having to flush batch
each time they change.
|
|
|
|
|
|
Mesa requires that we be able to share objects between contexts, which means
that the objects need to be created by the same bufmgr, and the bufmgr
internally requires pthread protection for thread safety.
Rely on the bufmgr having appropriate locking.
|
|
This reverts commit 7c81124d7c4a4d1da9f48cbf7e82ab1a3a970a7a.
|
|
This reverts commit 53675e5c05c0598b7ea206d5c27dbcae786a2c03.
Conflicts:
src/mesa/drivers/dri/i965/brw_wm_surface_state.c
|
|
To do this, I had to clean up some of 965 state upload stuff. We may end
up over-emitting state in the aperture overflow case, but that should be rare,
and I'd rather have the simplification of state management.
|