Age | Commit message (Collapse) | Author |
|
|
|
This is a multi-threading optimization which hides the kernel overhead
behind a thread. It improves performance in CPU-limited apps by 2-15%.
Of course you must have at least 2 cores for it to make any difference.
It can be disabled with:
    export RADEON_THREAD=0
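A minimal sketch of the decision described above, assuming the driver treats only the literal value "0" as "disabled" and skips the thread on single-core machines. The helper name `sketch_use_submit_thread` and its parameters are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: decide whether command submission should be
 * offloaded to a worker thread.
 * env_value is the value of RADEON_THREAD, or NULL when it is unset.
 * Assumption: only the exact string "0" disables the thread. */
static bool sketch_use_submit_thread(const char *env_value, long num_cpus)
{
    assert(num_cpus >= 1);

    /* With a single core there is nothing to hide the overhead behind. */
    if (num_cpus < 2)
        return false;

    /* export RADEON_THREAD=0 turns the optimization off. */
    if (env_value != NULL && strcmp(env_value, "0") == 0)
        return false;

    return true;
}
```

A caller would typically pass `getenv("RADEON_THREAD")` and the online CPU count (for example from `sysconf(_SC_NPROCESSORS_ONLN)` on POSIX systems).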
|
|
Oops.
|
|
The VBO module uses both, but they are somewhat opposite to each other.
In this case, we pick UNSYNCHRONIZED and ignore DONTBLOCK.
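The flag resolution described above can be sketched as follows. The flag names and bit values are hypothetical stand-ins for the real transfer flags, chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical flag bits for this sketch only. */
enum {
    SKETCH_XFER_WRITE          = 1u << 0,
    SKETCH_XFER_DONTBLOCK      = 1u << 1,
    SKETCH_XFER_UNSYNCHRONIZED = 1u << 2,
    SKETCH_XFER_ALL            = (1u << 3) - 1
};

/* When both UNSYNCHRONIZED and DONTBLOCK are requested, keep
 * UNSYNCHRONIZED and drop DONTBLOCK; otherwise leave flags untouched. */
static unsigned sketch_resolve_map_flags(unsigned flags)
{
    /* Precondition of this sketch: no bits outside the known set. */
    assert((flags & ~(unsigned)SKETCH_XFER_ALL) == 0);

    if ((flags & SKETCH_XFER_UNSYNCHRONIZED) &&
        (flags & SKETCH_XFER_DONTBLOCK))
        flags &= ~(unsigned)SKETCH_XFER_DONTBLOCK;

    return flags;
}
```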
|
|
This is the last one I think.
|
|
Because an app may do something like this:
    while (!(ptr = bo_map(..., DONT_BLOCK))) {
        /* Do some other work. */
    }
And it would be looping endlessly if we didn't flush.
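To illustrate why the flush matters, here is a toy model, not the driver's code. All `sketch_` types and fields are hypothetical: a buffer is either referenced by commands that have not been submitted yet, busy on the GPU, or idle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_bo {
    void *cpu_ptr;
    bool  in_unflushed_cs; /* referenced by commands not yet submitted */
    bool  gpu_busy;        /* submitted commands are still using it */
};

struct sketch_cs {
    unsigned num_flushes;
};

/* Submitting the command stream turns "referenced" into "busy". */
static void sketch_cs_flush(struct sketch_cs *cs, struct sketch_bo *bo)
{
    cs->num_flushes++;
    if (bo->in_unflushed_cs) {
        bo->in_unflushed_cs = false;
        bo->gpu_busy = true;
    }
}

/* Non-blocking map: returns NULL instead of waiting. */
static void *sketch_bo_map_dontblock(struct sketch_cs *cs, struct sketch_bo *bo)
{
    assert(cs != NULL && bo != NULL);

    if (bo->in_unflushed_cs) {
        /* Without this flush the commands would never reach the GPU,
         * so the caller's retry loop could never succeed. */
        sketch_cs_flush(cs, bo);
        return NULL;
    }
    if (bo->gpu_busy)
        return NULL;

    return bo->cpu_ptr;
}
```

In this model the first failed map submits the pending work, so a later retry can succeed once the GPU has finished with the buffer.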
|
|
|
|
Accidentally negated in 685c3262b945a7f0e9f1f3a9409a12fdda08c828.
|
|
|
|
This should prevent calling into radeon_get_reloc when there's
only one context.
|
|
|
|
|
|
We don't need the read/write flags.
|
|
Based on Dave's branch.
The majority of this commit is a cleanup, mainly renaming things.
There wasn't much code to import, just ioctl calls.
Also done:
- implemented unsynchronized bo_map (important optimization!)
- radeon_bo_is_referenced_by_cs is no longer a refcount hack
- dropped the libdrm_radeon dependency
I'm surprised that this has resulted in less code in the end.
|
|
|
|
|
|
Exactly one half would be ideal, but this is a soft limit, and going even
one byte over it forces synchronous behavior.
Flushing when the referenced GMR size exceeds one third of the aperture
gives us statistically better performance.
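The threshold check can be sketched as below. The function and parameter names are hypothetical, and integer division is an assumption about how the one-third limit is rounded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the GMR memory referenced by the current command
 * buffer exceeds one third of the aperture, i.e. when it is time to flush. */
static bool sketch_should_flush(uint64_t referenced_gmr_bytes,
                                uint64_t aperture_bytes)
{
    assert(aperture_bytes > 0);
    return referenced_gmr_bytes > aperture_bytes / 3;
}
```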
|
|
|
|
If we see a MACRO bit on r600g, it's 2D tiled;
if we don't see a MACRO bit but do see a MICRO bit, then it's 1D tiled.
Signed-off-by: Dave Airlie <airlied@redhat.com>
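A small sketch of the decoding rule from the message above. The bit values and names are placeholders for this example (the real flags come from the kernel's radeon DRM header), and treating "neither bit set" as linear is an assumption.

```c
#include <assert.h>

/* Placeholder bit values for this sketch only. */
#define SKETCH_TILING_MACRO 0x1u
#define SKETCH_TILING_MICRO 0x2u

enum sketch_tile_mode {
    SKETCH_LINEAR,   /* assumption: neither bit set */
    SKETCH_TILED_1D, /* MICRO without MACRO */
    SKETCH_TILED_2D  /* MACRO set, regardless of MICRO */
};

static enum sketch_tile_mode sketch_tile_mode_from_flags(unsigned flags)
{
    if (flags & SKETCH_TILING_MACRO)
        return SKETCH_TILED_2D;
    if (flags & SKETCH_TILING_MICRO)
        return SKETCH_TILED_1D;
    return SKETCH_LINEAR;
}

/* Usage example covering the cases named in the commit message. */
static void sketch_tile_mode_example(void)
{
    assert(sketch_tile_mode_from_flags(SKETCH_TILING_MACRO) == SKETCH_TILED_2D);
    assert(sketch_tile_mode_from_flags(SKETCH_TILING_MICRO) == SKETCH_TILED_1D);
}
```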
|
|
This just adds the ioctl interface and sets the tile type
and array mode in the correct place.
This seems to bring evergreen (eg) 1D tiling to the same level, with the
same issues, as on r600. No idea how to address 2D yet.
|
|
Print warnings and continue build.
|
|
the context init is separate for these gpus.
|
|
6xx/7xx have a max of 4 DBs; evergreen has a max of 8.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
|
|
Like on some r5xx, there are multiple DB backends on the r600,
we need to add up the query results from each of these to get the
final correct value.
So far I'm not 100% sure how to calculate the num_db value;
setting it to 4 should be harmless enough until we do.
This fixes the occlusion_query piglit test on my rv740.
Signed-off-by: Dave Airlie <airlied@redhat.com>
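The accumulation over DB backends might look roughly like this. The begin/end counter pair per backend and all names are assumptions for illustration; the message only says that per-backend results must be added up. The sketch also assumes that slots for backends that do not exist keep both counters at zero, which would make an overestimated num_db harmless.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: each DB backend writes a counter value at the
 * beginning and at the end of the query. */
struct sketch_db_sample {
    uint64_t start;
    uint64_t end;
};

/* Sum the per-backend deltas to get the final query result. */
static uint64_t sketch_sum_query_results(const struct sketch_db_sample *samples,
                                         unsigned num_db)
{
    uint64_t total = 0;

    assert(samples != 0 && num_db > 0);

    for (unsigned i = 0; i < num_db; i++)
        total += samples[i].end - samples[i].start;

    return total;
}
```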
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
==5547== Conditional jump or move depends on uninitialised value(s)
==5547== at 0x8FE745D: r600_drm_winsys_create (r600_drm.c:86)
|
|
|
|
|
|
This avoids any issue when the context is freed and we still try to
access the fence through the radeon structure.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
|
|
still a bit of work to do, the winsys gen setting is a bit of a hack.
|
|
The motivation behind this rework is to get some speed by reducing
CPU overhead. The performance increase depends on many factors,
but it's measurable (I think it's about 10% increase in Torcs).
This commit replaces libdrm's radeon_cs_gem with our own implementation.
It's optimized specifically for r300g, but r600g could use it as well.
Reloc writes and space checking are faster and simpler than their
counterparts in libdrm (the time complexity of all the functions
is O(1) in nearly all scenarios, thanks to hashing).
(libdrm's radeon_bo_gem is still being used in the driver.)
It works like this:
cs_add_reloc(cs, buf, read_domain, write_domain) adds a new relocation and
also adds the size of 'buf' to the used_gart and used_vram winsys variables
based on the domains, which are simply or'd for the accounting purposes.
The adding is skipped if the reloc is already present in the list, but
any newly referenced domains are still accounted for.
cs_validate is then called, which just checks:
    used_vram/gart < vram/gart_size * 0.8
The 0.8 number allows for some memory fragmentation. If the validation
fails, the pipe driver flushes the CS and tries to do the validation again,
i.e. it validates only that one operation. If it fails again, it drops
the operation on the floor and prints some nasty message to stderr.
cs_write_reloc(cs, buf) just writes a reloc that has been added using
cs_add_reloc. The read_domain and write_domain parameters have been removed,
because we already specify them in cs_add_reloc.
The space checking has been tested by putting small values in vram/gart_size
variables.
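The accounting scheme described above can be sketched as follows. This is a simplified model, not the commit's code: all `sketch_` names are hypothetical, the reloc list has a fixed capacity, and a linear search stands in for the hashing the message mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { SKETCH_DOMAIN_GTT = 1u << 0, SKETCH_DOMAIN_VRAM = 1u << 1 };
#define SKETCH_MAX_RELOCS 64

struct sketch_buf { uint64_t size; };
struct sketch_reloc { const struct sketch_buf *buf; unsigned domains; };

struct sketch_cs {
    struct sketch_reloc relocs[SKETCH_MAX_RELOCS];
    unsigned num_relocs;
    uint64_t used_gart, used_vram; /* bytes referenced so far */
    uint64_t gart_size, vram_size; /* total bytes available */
};

/* Charge the buffer size to every domain in 'domains'. */
static void sketch_account(struct sketch_cs *cs, const struct sketch_buf *buf,
                           unsigned domains)
{
    if (domains & SKETCH_DOMAIN_GTT)
        cs->used_gart += buf->size;
    if (domains & SKETCH_DOMAIN_VRAM)
        cs->used_vram += buf->size;
}

static void sketch_cs_add_reloc(struct sketch_cs *cs, const struct sketch_buf *buf,
                                unsigned read_domain, unsigned write_domain)
{
    unsigned domains = read_domain | write_domain; /* or'd for accounting */

    /* Linear search here; the real code is said to use hashing. */
    for (unsigned i = 0; i < cs->num_relocs; i++) {
        if (cs->relocs[i].buf == buf) {
            /* Already present: account only newly referenced domains. */
            unsigned added = domains & ~cs->relocs[i].domains;
            sketch_account(cs, buf, added);
            cs->relocs[i].domains |= added;
            return;
        }
    }

    assert(cs->num_relocs < SKETCH_MAX_RELOCS);
    cs->relocs[cs->num_relocs].buf = buf;
    cs->relocs[cs->num_relocs].domains = domains;
    cs->num_relocs++;
    sketch_account(cs, buf, domains);
}

/* used_vram/gart < vram/gart_size * 0.8 */
static bool sketch_cs_validate(const struct sketch_cs *cs)
{
    return (double)cs->used_vram < (double)cs->vram_size * 0.8 &&
           (double)cs->used_gart < (double)cs->gart_size * 0.8;
}
```

As in the message, small values for the size fields make it easy to exercise the failing path of the validation.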
|
|
This adds support for Barts, Turks, and Caicos asics.
|
|
|
|
|
|
|
|
For drivers that do DMA transfers instead of mapping directly
|
|
|
|
|
|
|
|
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
|
|
This code was pretty much duplicated; thanks to Henri Verbeet on IRC for
pointing it out.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
|
|
Spotted by Alex Diomin
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
|
|
|