Age | Commit message (Collapse) | Author |
|
batchbuffer > aperture size.
So with compiz on Intel hw with fake bufmgr, opening 4 firefox windows at 1680x1050 and hitting alt-tab, could cause the batchbuffer to try and reference more than the 32MB of RAM allocated.
Fix 1:
Fix 1 is to pre-verify the list of buffers against the current batchbuffer and if it can't possibly fit in the aperture to flush the batchbuffer to the hardware
and try again. If the buffers still can't fit well then you are hosed as I'm not sure there is a nice way to tell anyone.
Fix 2:
Next problem was that even with a simple check for total < aperture, we ran
into fragmentation issues, this meant that half way down a set of buffers,
we would fail as no blocks were available. Fix this by nuking the memory
manager from orbit and letting it start again and relayout the blocks in a
manner that fits.
Fix 3:
Finally the initial problem we were seeing was a memcpy to a NULL backing store.
We seem to end up with a texture at some point that never gets mapped but ends up with data in it. compiz al-tab icons have this property. So I created a card dirty bit that memcpy's any buffer that is !static and is written to back to memory. This probably is wrong but it makes compiz work for now.
Caveats:
965 support is still fail.
|
|
It was harmless, as the only time we need to clear PRESUMED_OFFSET, the
variable had been initialized already.
|
|
This is defaulted off as it has potentially large memory costs for a modest
performance gain. Ideally we will improve DRM performance to the point where
this optimization is not worth the memory cost in any case, or find some
middle ground in caching only limited numbers of certain buffers. For now,
this provides a modest 4% improvement in openarena on GM965 and 10% in openarena
on GM945.
|
|
|
|
It's not worth the extra effort to avoid a free/malloc, and we'd rather
auto-size the reloc data buffer at some point so we don't need to have
max_relocs.
|
|
Harmless since we don't yet have any bits above 31 for flags.
|
|
|
|
|
|
The failure mode that was a available was:
reloc 1 -> target_buf
exec: PRESUMED_OFFSET wrong, buffer migrates, r1 entry updated.
reloc 2 -> target_buf
exec: suppose buffer migrates again. PRESUMED_OFFSET wrong. r2 entry updated.
reloc 1 -> target_buf
exec: suppose buffer doesn't migrate. PRESUMED_OFFSET right. no relocations
performed. r1 has stale pointer at original location.
Failures were reported with OGLconform's VBO test and SPECviewperf90, though
I haven't confirmed that this fixes it.
|
|
This reverts commit e2cb905bc6e23eaafaeeb2abdc9480e70959ee3f.
It was a reversion of an optimization hidden as otherwise.
pre_target_buf_handle was always NULL, so the optimization was never enabled,
rather than fixing the important optimization (resulting in 25-50% performance
loss).
|
|
buffer is used for a relocatee in the former relocation process
then another target buffer is used for this relocatee at the same
offset in the current relocation process.
|
|
|
|
|
|
|
|
Around 10% of a CPU was being wasted to create the linked list which we
threw out immediately after passing it to the kernel.
|
|
|
|
Good for a 10-15% improvement to OA.
|
|
|
|
|
|
|
|
By avoiding the repeated relocation buffer creation/map/unmap/destroy for each
new batch buffer, this improves OpenArena framerates by 30%. Caching batch
buffers themselves doesn't appear to be a significant performance win over
this change.
|
|
We have two consumers of relocations. One is static state buffers, which
want the same relocation every time. The other is the batchbuffer, which gets
thrown out immediately after submit. This lets us reduce repeated computation
for static state buffers, and clean up the code by moving relocations nearer
to where the state buffer is computed.
|
|
This reverts commit 8bb9ae3693362a302206255c61f512d942df9bbf.
Validating our kernel buffers with the caching off in flags but on in mask
means that the kernel migrates the buffer to be uncached, which is undesired.
|
|
|
|
The 'mask' value used in the validation operation specifies which of the
'flags' bits are being modified. Buffer validation wants to pass the memory
type and access mode (rwx) to the kernel so that the buffer will be placed
correctly, and so that the right kind of fence will be created (read vs
write). That means we actually want a constant mask for these operations,
and not something computed from the bits coming in. The constant we want is
DRM_BO_MASK_MEM | DRM_BO_FLAG_READ | DRM_BO_FLAG_WRITE | DRM_BO_FLAG_EXE.
|
|
|
|
|
|
|
|
|
|
Each buffer object now has a relocation buffer pointer, which contains the
relocations for the buffer if there are any. At the point where we have to
create a new type of relocation entry, we can change the code over to allowing
multiple relocation lists, but trying to anticipate what that'll look like
now just increases complexity.
This is a 30% performance improvement on 965.
|
|
Now that the dri_bufmgr is stored in the context rather than the screen, all
access to one is single-threaded anyway.
|
|
|
|
Putting the bufmgr in the screen is not thread-safe since the emit_reloc
changes. It also led to a significant performance hit from pthread usage
for the attempted thread-safety (up to 12% of a cpu spent on refcounting
protection in single-threaded 965). The motivation had been to allow
multi-context bufmgr sharing in classic mode, but it wasn't worth the cost.
|
|
|
|
This takes advantage of the DRM_BO_HINT_PRESUMED_OFFSET change and allows
the kernel to avoid mapping and re-writing buffers when relocations occur.
|
|
|
|
|
|
|
|
The uint64_t flags (as defined by drm.h) were being used as unsigned ints in
many places.
|
|
|
|
|