path: root/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
Age | Commit message | Author
2011-03-08 | r300g: decide whether a flush should be asynchronous when calling it | Marek Olšák
Thread offloading is sometimes not desirable, e.g. when mapping a buffer.
2011-03-02 | r300g: do not use ioctl thread offloading on single-core machines | Marek Olšák
2011-02-19 | r300g: fix invalid dereference in winsys | Marek Olšák
radeon_bo_unref may destroy the buffer, so call it after p_atomic_dec, not before.
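The ordering rule in this entry can be illustrated with a minimal C11 sketch. The struct bo type, its fields and the helpers bo_create, bo_ref, bo_unref, cs_add_bo and cs_release_bo are hypothetical stand-ins rather than the winsys code, and p_atomic_dec is replaced by C11 atomic_fetch_sub. The only point shown is that the counter must be decremented while the buffer is still guaranteed to be alive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical buffer object: 'refcount' controls its lifetime and
 * 'num_cs_references' counts the command streams that use it. */
struct bo {
    atomic_int refcount;
    atomic_int num_cs_references;
};

/* Demo-only counter so callers can observe when a buffer is freed. */
static int bo_destroyed_count = 0;

static struct bo *bo_create(void)
{
    struct bo *bo = malloc(sizeof(*bo));
    if (!bo)
        return NULL;
    atomic_init(&bo->refcount, 1);
    atomic_init(&bo->num_cs_references, 0);
    return bo;
}

static void bo_ref(struct bo *bo)
{
    atomic_fetch_add(&bo->refcount, 1);
}

/* Drops one reference and frees the buffer when the last one goes away. */
static void bo_unref(struct bo *bo)
{
    int old = atomic_fetch_sub(&bo->refcount, 1);
    assert(old > 0);
    if (old == 1) {
        free(bo);
        bo_destroyed_count++;
    }
}

/* A command stream starts using the buffer: it takes its own reference. */
static void cs_add_bo(struct bo *bo)
{
    bo_ref(bo);
    atomic_fetch_add(&bo->num_cs_references, 1);
}

/* A command stream stops using the buffer. bo_unref() may free 'bo',
 * so the decrement has to come first; the reverse order would touch
 * freed memory whenever the CS holds the last reference. */
static void cs_release_bo(struct bo *bo)
{
    atomic_fetch_sub(&bo->num_cs_references, 1);
    bo_unref(bo);
}
```

If the application drops its reference before the command stream does, the buffer is freed inside cs_release_bo, which is exactly the case where calling bo_unref first would be a use-after-free.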
2011-02-16 | r300g: fix a race between CS and SET_TILING ioctls | Marek Olšák
2011-02-15 | r300g: offload the CS ioctl to another thread | Marek Olšák
This is a multi-threading optimization which hides the kernel overhead behind a thread. It improves performance in CPU-limited apps by 2-15%. Of course you must have at least 2 cores for it to make any difference. It can be disabled with: export RADEON_THREAD=0
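A rough sketch of offloading submission to a worker thread with POSIX threads. Everything here is an assumption for illustration: submit_ioctl stands in for the CS ioctl, struct cs_thread and its functions are hypothetical, and at most one CS is in flight at a time. want_thread mimics the two rules from these entries (RADEON_THREAD=0 disables the thread, and single-core machines do not use it); its exact parsing of the variable is assumed, and it relies on sysconf(_SC_NPROCESSORS_ONLN), which Linux, the BSDs and macOS provide but POSIX does not require. The async argument of cs_flush follows the later entry at the top of this log, where the caller decides per flush whether to offload.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the kernel CS ioctl; counts submissions. */
static atomic_int submitted_count;
static void submit_ioctl(int cs_id) { (void)cs_id; atomic_fetch_add(&submitted_count, 1); }

struct cs_thread {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool enabled;   /* false: every flush is synchronous */
    bool has_job;   /* a CS is queued or being submitted */
    bool quit;
    int job;        /* identifies the queued CS */
};

/* RADEON_THREAD=0 disables offloading; a single core gains nothing. */
static bool want_thread(void)
{
    const char *env = getenv("RADEON_THREAD");
    if (env && strcmp(env, "0") == 0)
        return false;
    return sysconf(_SC_NPROCESSORS_ONLN) > 1;
}

static void *cs_worker(void *arg)
{
    struct cs_thread *t = arg;
    pthread_mutex_lock(&t->lock);
    for (;;) {
        while (!t->has_job && !t->quit)
            pthread_cond_wait(&t->cond, &t->lock);
        if (!t->has_job)
            break;                      /* quit requested, nothing pending */
        int job = t->job;
        pthread_mutex_unlock(&t->lock);
        submit_ioctl(job);              /* kernel overhead is paid here */
        pthread_mutex_lock(&t->lock);
        t->has_job = false;
        pthread_cond_broadcast(&t->cond);
    }
    pthread_mutex_unlock(&t->lock);
    return NULL;
}

static void cs_thread_init(struct cs_thread *t, bool enabled)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->cond, NULL);
    t->has_job = false;
    t->quit = false;
    t->job = 0;
    t->enabled = enabled &&
                 pthread_create(&t->thread, NULL, cs_worker, t) == 0;
}

/* Blocks until the queued CS, if any, has been submitted. */
static void cs_sync(struct cs_thread *t)
{
    pthread_mutex_lock(&t->lock);
    while (t->has_job)
        pthread_cond_wait(&t->cond, &t->lock);
    pthread_mutex_unlock(&t->lock);
}

/* async=true hands the CS to the worker; async=false submits it here. */
static void cs_flush(struct cs_thread *t, int cs_id, bool async)
{
    cs_sync(t);                         /* keep at most one CS in flight */
    if (!t->enabled || !async) {
        submit_ioctl(cs_id);
        return;
    }
    pthread_mutex_lock(&t->lock);
    t->job = cs_id;
    t->has_job = true;
    pthread_cond_broadcast(&t->cond);
    pthread_mutex_unlock(&t->lock);
}

static void cs_thread_destroy(struct cs_thread *t)
{
    if (t->enabled) {
        pthread_mutex_lock(&t->lock);
        t->quit = true;
        pthread_cond_broadcast(&t->cond);
        pthread_mutex_unlock(&t->lock);
        pthread_join(t->thread, NULL);  /* worker drains any pending CS */
    }
    pthread_cond_destroy(&t->cond);
    pthread_mutex_destroy(&t->lock);
}
```

A caller would create the thread with cs_thread_init(&t, want_thread()), pass async = true for ordinary flushes, and pass async = false in cases such as mapping a buffer, where per the 2011-03-08 entry offloading is not desirable.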
2011-02-12 | r300g: improve function radeon_bo_is_referenced_by_cs | Marek Olšák
This should prevent calling into radeon_get_reloc when there's only one context.
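One way to get the shortcut described in this entry, sketched in C under stated assumptions: each buffer counts how many command streams reference it, the winsys knows how many command streams exist, and a command stream lists a buffer at most once. A zero count means no CS uses the buffer; a count equal to the number of command streams (always true with a single context once the count is nonzero) means this CS must use it. Only otherwise is the per-CS list searched. All names are hypothetical, and cs_find_reloc is a linear stand-in for radeon_get_reloc.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RELOCS 64

struct bo {
    unsigned num_cs_references;  /* how many command streams list this bo */
};

struct winsys {
    unsigned num_cs;             /* how many command streams exist */
};

struct cs {
    struct winsys *ws;
    const struct bo *relocs[MAX_RELOCS];
    unsigned num_relocs;
    unsigned lookups;            /* demo-only: counts slow-path searches */
};

/* Adds 'bo' to the CS; assumes the caller never adds the same bo twice. */
static void cs_add_bo(struct cs *cs, struct bo *bo)
{
    assert(cs->num_relocs < MAX_RELOCS);
    cs->relocs[cs->num_relocs++] = bo;
    bo->num_cs_references++;
}

/* Linear stand-in for the per-CS reloc lookup; returns the index or -1. */
static int cs_find_reloc(struct cs *cs, const struct bo *bo)
{
    cs->lookups++;
    for (unsigned i = 0; i < cs->num_relocs; i++)
        if (cs->relocs[i] == bo)
            return (int)i;
    return -1;
}

static bool bo_is_referenced_by_cs(struct cs *cs, const struct bo *bo)
{
    if (bo->num_cs_references == 0)
        return false;                   /* no CS uses it */
    if (bo->num_cs_references == cs->ws->num_cs)
        return true;                    /* every CS uses it, this one included */
    return cs_find_reloc(cs, bo) != -1; /* ambiguous: search this CS */
}
```

With one context, both answers come from the counter alone, so the lookup is never reached; with two or more contexts it is needed only when some, but not all, command streams reference the buffer.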
2011-02-11 | r300g: import the last bits of libdrm and cleanup the whole thing | Marek Olšák
Based on Dave's branch. The majority of this commit is a cleanup, mainly renaming things. There wasn't much code to import, just ioctl calls.

Also done:
- implemented unsynchronized bo_map (important optimization!)
- radeon_bo_is_referenced_by_cs is no longer a refcount hack
- dropped the libdrm_radeon dependency

I'm surprised that this has resulted in less code in the end.
2011-01-08 | r300g: rework command submission and resource space checking | Marek Olšák
The motivation behind this rework is to get some speed by reducing CPU overhead. The performance increase depends on many factors, but it's measurable (I think it's about a 10% increase in Torcs).

This commit replaces libdrm's radeon_cs_gem with our own implementation. It's optimized specifically for r300g, but r600g could use it as well. Reloc writes and space checking are faster and simpler than their counterparts in libdrm (the time complexity of all the functions is O(1) in nearly all scenarios, thanks to hashing). (libdrm's radeon_bo_gem is still being used in the driver.)

It works like this:

cs_add_reloc(cs, buf, read_domain, write_domain) adds a new relocation and also adds the size of 'buf' to the used_gart and used_vram winsys variables based on the domains, which are simply or'd for accounting purposes. The adding is skipped if the reloc is already present in the list, but any newly referenced domains are still accounted.

cs_validate is then called, which just checks:
  used_vram < vram_size * 0.8
  used_gart < gart_size * 0.8
The 0.8 factor allows for some memory fragmentation. If the validation fails, the pipe driver flushes the CS and tries to do the validation again, i.e. it validates only that one operation. If it fails again, it drops the operation on the floor and prints some nasty message to stderr.

cs_write_reloc(cs, buf) just writes a reloc that has been added using cs_add_reloc. The read_domain and write_domain parameters have been removed, because we already specify them in cs_add_reloc.

The space checking has been tested by putting small values in the vram/gart_size variables.
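The accounting and validation scheme described in the 2011-01-08 entry can be sketched in C as follows. This is an illustrative reconstruction, not the winsys code: the domain bits, struct layouts, hash size, the placement of the counters on the CS, and the emit_op driver helper are all assumptions. The hash table caches the last reloc index per bucket and falls back to a linear search on a collision, which is one way to get O(1) lookups in nearly all cases. cs_flush here only resets the list and the counters; it does not submit anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { DOMAIN_GTT = 1, DOMAIN_VRAM = 2 }; /* hypothetical domain bits */

#define MAX_RELOCS 256
#define HASH_SIZE 64                      /* must be a power of two */

struct buf { uint64_t size; unsigned handle; };
struct reloc { struct buf *buf; unsigned domains; };

struct cs {
    struct reloc relocs[MAX_RELOCS];
    unsigned num_relocs;
    int hash[HASH_SIZE];                  /* bucket -> reloc index, -1 = empty */
    uint64_t used_vram, used_gart;
    uint64_t vram_size, gart_size;
};

static unsigned flush_count;              /* demo-only */

static void cs_reset(struct cs *cs)
{
    cs->num_relocs = 0;
    cs->used_vram = cs->used_gart = 0;
    memset(cs->hash, -1, sizeof(cs->hash)); /* all bytes 0xff: every int is -1 */
}

/* O(1) when the bucket caches this buffer; linear search on a collision. */
static int cs_lookup(struct cs *cs, const struct buf *buf)
{
    unsigned h = buf->handle & (HASH_SIZE - 1);
    int i = cs->hash[h];
    if (i >= 0 && cs->relocs[i].buf == buf)
        return i;
    for (unsigned j = 0; j < cs->num_relocs; j++) {
        if (cs->relocs[j].buf == buf) {
            cs->hash[h] = (int)j;
            return (int)j;
        }
    }
    return -1;
}

static void cs_account(struct cs *cs, uint64_t size, unsigned new_domains)
{
    if (new_domains & DOMAIN_VRAM) cs->used_vram += size;
    if (new_domains & DOMAIN_GTT)  cs->used_gart += size;
}

static void cs_add_reloc(struct cs *cs, struct buf *buf,
                         unsigned read_domain, unsigned write_domain)
{
    unsigned domains = read_domain | write_domain; /* or'd for accounting */
    int i = cs_lookup(cs, buf);
    if (i >= 0) {                         /* already listed: account new domains only */
        unsigned added = domains & ~cs->relocs[i].domains;
        cs->relocs[i].domains |= domains;
        cs_account(cs, buf->size, added);
        return;
    }
    assert(cs->num_relocs < MAX_RELOCS);
    i = (int)cs->num_relocs++;
    cs->relocs[i] = (struct reloc){ buf, domains };
    cs->hash[buf->handle & (HASH_SIZE - 1)] = i;
    cs_account(cs, buf->size, domains);
}

/* The 0.8 factor leaves room for memory fragmentation. */
static bool cs_validate(const struct cs *cs)
{
    return cs->used_vram < cs->vram_size * 0.8 &&
           cs->used_gart < cs->gart_size * 0.8;
}

/* Hypothetical flush: a real one would submit the CS before resetting. */
static void cs_flush(struct cs *cs) { flush_count++; cs_reset(cs); }

/* Driver side: add, validate; on failure flush and validate this op alone. */
static bool emit_op(struct cs *cs, struct buf *buf, unsigned rd, unsigned wd)
{
    cs_add_reloc(cs, buf, rd, wd);
    if (cs_validate(cs))
        return true;
    cs_flush(cs);
    cs_add_reloc(cs, buf, rd, wd);
    if (cs_validate(cs))
        return true;
    fprintf(stderr, "emit_op: not enough memory, dropping the operation\n");
    cs_reset(cs);                         /* only this op was in the list */
    return false;
}
```

Shrinking vram_size and gart_size, as the entry describes for testing, is enough to exercise both the flush-and-retry path and the drop path of emit_op.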