Age | Commit message (Collapse) | Author |
|
This is a multi-threading optimization which hides the kernel overhead
behind a thread. It improves performance in CPU-limited apps by 2-15%.
Of course you must have at least 2 cores for it to make any difference.
It can be disabled with:
export RADEON_THREAD=0
|
|
Ooops.
|
|
The VBO module uses both, but they are somewhat opposite to each other.
In this case, we pick UNSYNCHRONIZED and ignore DONTBLOCK.
|
|
This is the last one I think.
|
|
Because an app may do something like this:
while (!(ptr = bo_map(..., DONT_BLOCK))) {
/* Do some other work. */
}
And it would be looping endlessly if we didn't flush.
|
|
|
|
Accidentally negated in 685c3262b945a7f0e9f1f3a9409a12fdda08c828.
|
|
|
|
This should prevent calling into radeon_get_reloc when there's
only one context.
|
|
|
|
|
|
We don't need the read/write flags.
|
|
Based on Dave's branch.
The majority of this commit is a cleanup, mainly renaming things.
There wasn't much code to import, just ioctl calls.
Also done:
- implemented unsynchronized bo_map (important optimization!)
- radeon_bo_is_referenced_by_cs is no longer a refcount hack
- dropped the libdrm_radeon dependency
I'm surprised that this has resulted in less code in the end.
|
|
Print warnings and continue build.
|
|
|
|
|
|
|
|
The motivation behind this rework is to get some speed by reducing
CPU overhead. The performance increase depends on many factors,
but it's measurable (I think it's about 10% increase in Torcs).
This commit replaces libdrm's radeon_cs_gem with our own implemention.
It's optimized specifically for r300g, but r600g could use it as well.
Reloc writes and space checking are faster and simpler than their
counterparts in libdrm (the time complexity of all the functions
is O(1) in nearly all scenarios, thanks to hashing).
(libdrm's radeon_bo_gem is still being used in the driver.)
It works like this:
cs_add_reloc(cs, buf, read_domain, write_domain) adds a new relocation and
also adds the size of 'buf' to the used_gart and used_vram winsys variables
based on the domains, which are simply or'd for the accounting purposes.
The adding is skipped if the reloc is already present in the list, but it
accounts any newly-referenced domains.
cs_validate is then called, which just checks:
used_vram/gart < vram/gart_size * 0.8
The 0.8 number allows for some memory fragmentation. If the validation
fails, the pipe driver flushes CS and tries do the validation again,
i.e. it validates only that one operation. If it fails again, it drops
the operation on the floor and prints some nasty message to stderr.
cs_write_reloc(cs, buf) just writes a reloc that has been added using
cs_add_reloc. The read_domain and write_domain parameters have been removed,
because we already specify them in cs_add_reloc.
The space checking has been tested by putting small values in vram/gart_size
variables.
|
|
|
|
|
|
|
|
Small perf improvement in ipers.
radeon_drm_get_cs_handle is exactly what this commit tries to avoid
in every write_reloc.
|
|
On lightsmark on my r500 this drop the bufmgr allocations of the sysprof.
|
|
Should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=31841
|
|
|
|
NOTE: This is a candidate for the 7.9 branch.
|
|
Based on commit 3ddc714b20ac4e28b80c6f88d1993445fff2262c by Dave Airlie.
NOTE: This is a candidate for the 7.9 branch.
|
|
caused by 0b9eb5c9bb03e5134d9a41786178100109e80c5a
test run glxgears, resize.
|
|
This fixes a DRM deadlock in the cubestorm xscreensaver, because somehow
there must not be 2 different BOs relocated in one CS if both BOs back
the same handle. I was told it is impossible to happen, but apparently
it is not, or there is something else wrong.
|
|
https://bugs.freedesktop.org/show_bug.cgi?id=30145
|
|
If the buffer we are attempting to map is referenced by the unsubmitted
command stream for this context, we need to flush the command stream,
however to do that we need to be able to access the context at the lowest
level map function, currently we set the buffer in the toplevel map, but this
racy between context. (we probably have a lot more issues than that.)
I'll look into a proper solution as suggested by jrfonseca when I get some time.
|
|
|
|
This makes it compatible with the modified DRM interface in drm-radeon-testing.
Also, now you need to set RADEON_HYPERZ=1 to be able to use hyperz.
It's not bug-free yet.
|
|
|
|
|
|
This implements fast Z clear, Z compression, and HiZ support for r300->r500
GPUs.
It also allows cbzb clears when fast Z clears are being used for the ZB.
It requires a kernel with hyper-z support.
Thanks to Marek Olšák <maraeo@gmail.com>, who started this off, and Alex Deucher at AMD for providing lots of hints.
v2:
squashed zmask ram size fix]
squashed r300g/blitter: fix Z readback when compressed]
v3:
rebase around texture changes in master - .1 fix more bits
v4:
migrated to using u_mm in r300_texture to manage hiz/zmask rams consistently
disabled HiZ when using OQ
flush z-cache before turning hyper-z off
update hyper-z state on dsa state change
store depthclearvalue across cbzb clears and replace it afterwards.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
The driver gets a buffer and its size in resource_from_handle.
It computes the required minimum buffer size from given texture
properties, and compares the two sizes.
This is to early detect DDX bugs.
|
|
|
|
|
|
Also print relocation failures on non-debug builds too.
|
|
|
|
This flush happens when changing the tiling flags, and it should really be
done in the context.
I hope this fixes FDO bug #28630.
|
|
Conflicts:
src/gallium/state_trackers/egl/x11/native_dri2.c
src/gallium/state_trackers/egl/x11/native_x11.c
src/gallium/state_trackers/egl/x11/native_x11.h
src/gallium/state_trackers/xorg/xorg_driver.c
src/gallium/winsys/radeon/drm/radeon_drm.c
|
|
|
|
|
|
Currently unconditional and causes segfaults.
|
|
|
|
|
|
I have had a look at the libdrm sources and they just contain more or less
the same checking we do in macros, and begin_cs may realloc the CS buffer
if we overflow it, which never happens with r300g. So these are pretty
much useless.
There is a small but measurable performance increase by dropping the two
functions.
|
|
|