Age | Commit message | Author |
|
We can't safely use fixed size arrays since Gen6+ supports unlimited
nesting of control flow.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
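
As an illustration of the change described above, a growable stack for open control-flow blocks might look like the sketch below. The names `cf_stack`, `cf_stack_push` and `cf_stack_pop` are hypothetical and are not taken from the driver; the sketch only shows the general idea of replacing a fixed-size array with storage that grows on demand.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stack of open control-flow blocks; grows on demand
 * instead of relying on a fixed maximum nesting depth. */
struct cf_stack {
    int *items;      /* e.g. instruction indices of open IF/DO blocks */
    size_t depth;    /* number of entries in use */
    size_t capacity; /* number of entries allocated */
};

/* Push one entry, doubling the allocation when full.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int cf_stack_push(struct cf_stack *s, int item)
{
    if (s->depth == s->capacity) {
        size_t new_cap = s->capacity ? s->capacity * 2 : 16;
        int *p = realloc(s->items, new_cap * sizeof(*p));
        if (!p)
            return -1;
        s->items = p;
        s->capacity = new_cap;
    }
    s->items[s->depth++] = item;
    return 0;
}

/* Pop the innermost entry; the stack must not be empty. */
static int cf_stack_pop(struct cf_stack *s)
{
    assert(s->depth > 0);
    return s->items[--s->depth];
}
```

A zero-initialized `struct cf_stack` is a valid empty stack; the caller frees `items` when done.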
|
|
The code that generates MATH instructions attempts to work around
the hardware ignoring source modifiers (abs and negate) by emitting
moves into temporaries. Unfortunately, this pass coalesced those
registers, restoring the original problem. Avoid doing that.
Fixes several OpenGL ES2 conformance failures on Sandybridge.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
Single-operand math already had these workarounds, but POW (the only
two-operand function) did not. It needs them too; otherwise we can hit
assertion failures in brw_eu_emit.c when code is actually generated.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
gl_PointSize (VERT_RESULT_PSIZ) doesn't take up a message register,
as it's part of the header. Without this fix, writing to gl_PointSize
would cause the SF to read and use the wrong attributes, leading to all
kinds of random-looking failures.
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
... should have no impact on a properly formatted draw operation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Don't trust the applications not to reference beyond the end of the
vertex buffers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Fixes regression from 559435d9152acc7162e4e60aae6591c7c6c8274b.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Fixes regression in scissor-stencil-clear and 5 other tests.
|
|
|
|
I was being overly miserly and gave the offset of the buffer into the bo
insufficient bits, distracted by the adjacency of the buffer[4096].
Ref: https://bugs.freedesktop.org/show_bug.cgi?id=34541
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
... a leftover from a bad merge.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Reducing the number of relocations has lots of nice knock-on effects,
not least reducing batch buffer size, auxiliary array sizes
(vmalloced and copied into the kernel), processing of uncached
relocations etc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Use the bitwise or in place of the logical or, removing the intermediate
tests that the logical or introduces.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
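
A minimal sketch of the idea, assuming a set of dirty-flag words that are checked against a mask. The struct `dirty_flags` and both function names are hypothetical and only illustrate the technique. The logical form may short-circuit after each operand, which can cost a test and branch per term; the bitwise form combines all terms and tests once. A compiler may well generate the same code for both, so treat this as a statement of intent rather than a guaranteed gain.

```c
#include <stdbool.h>

/* Hypothetical dirty-flag words; not the driver's actual layout. */
struct dirty_flags {
    unsigned mesa;
    unsigned brw;
    unsigned cache;
};

/* Logical or: each term may be tested separately (short-circuit). */
static bool check_logical(const struct dirty_flags *a,
                          const struct dirty_flags *b)
{
    return (a->mesa & b->mesa) || (a->brw & b->brw) ||
           (a->cache & b->cache);
}

/* Bitwise or: all terms are combined, then tested once. */
static bool check_bitwise(const struct dirty_flags *a,
                          const struct dirty_flags *b)
{
    return ((a->mesa & b->mesa) | (a->brw & b->brw) |
            (a->cache & b->cache)) != 0;
}
```

The two forms return the same result because none of the operands has side effects.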
|
|
Rather than waiting on the first batch after the last swapbuffers to be
retired, call into the kernel to wait upon the retirement of any request
less than 20ms old. This has the twofold advantage of (a) not blocking
any other clients from utilizing the device whilst we wait and (b) we
attain higher throughput without overloading the system.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As we will flush when reading the return values of the blit, we can forgo
the earlier flush.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If the next vertex arrays are a (discontiguous) continuation of the
current arrays, such that the new vertices are simply offset from the
start of the current vertex buffer definitions, we can reuse those
definitions and avoid the overhead of relocations and invalidations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Use a temporary glarray variable to replace the numerous input->glarray.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Using a temporary buffer for large discontiguous uploads into the common
buffer and a single buffered upload is faster than performing the
discontiguous copies through a mapping into the GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Upload the non-vbo arrays into a single interleaved buffer object, so
that we need to emit only a single vertex buffer relocation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If the user passed in several arrays interleaved in the same vbo, only
emit a single vertex buffer and relocation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Track reuse of the vertex buffer objects and so minimise the number of
vertex buffers used by the hardware (and their relocations).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As we now pack the indices into a common upload buffer, we can reuse a
single CMD_INDEX_BUFFER packet and translate each invocation with a
start vertex offset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Move the tracking of the last emitted instructions into the core
batchbuffer routines and take advantage of the shadow batch copy to
avoid extra memory allocations and copies.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
It's faster. Not only is the memcpy more efficiently performed in the
kernel (making up for the system call overhead), but by not using mmap
we remove the greater overhead of tracking the vma of every batch.
And it means we can read back from the batch buffer without incurring
the cost of an uncached read through the GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As we use state relocations and we know that all the state belongs to
the same bo, we can drop the multiple references to the same bo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As we write directly into the batch in system memory, we do not need to
write first to the stack (as was done to avoid reading back through the
GTT).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As we write directly into the batch in system memory, we do not need to
write first to the stack (as was done to avoid reading back through the
GTT).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
In preparation for a greater change, use the color_calc_state_bo already
provisioned for this purpose.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Rather than performing lots of little writes to update the common bo
upon each update, write those into a static buffer and flush that when
full (or at the end of the batch). Doing so gives a dramatic performance
improvement over and above using mmaped access.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
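
The buffering scheme described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the `upload` struct, `STAGE_SIZE`, and the plain `memcpy` into `dst`, which stands in for whatever call actually writes to the buffer object. The `flushes` counter exists only to show how many writes reach the destination.

```c
#include <stddef.h>
#include <string.h>

#define STAGE_SIZE 4096 /* hypothetical staging buffer size */

struct upload {
    unsigned char stage[STAGE_SIZE]; /* static staging buffer */
    size_t used;                     /* bytes pending in stage */
    unsigned char *dst;              /* stand-in for the common bo */
    size_t dst_offset;               /* next write position in dst */
    unsigned flushes;                /* number of writes made to dst */
};

/* Write all pending bytes to the destination in one go. */
static void upload_flush(struct upload *u)
{
    if (u->used == 0)
        return;
    memcpy(u->dst + u->dst_offset, u->stage, u->used);
    u->dst_offset += u->used;
    u->used = 0;
    u->flushes++;
}

/* Queue a small write; flush first if it would not fit.
 * The caller must ensure dst has room for everything written. */
static void upload_write(struct upload *u, const void *data, size_t size)
{
    if (size > STAGE_SIZE) {
        /* Too large to stage: flush pending data, then write directly. */
        upload_flush(u);
        memcpy(u->dst + u->dst_offset, data, size);
        u->dst_offset += size;
        u->flushes++;
        return;
    }
    if (u->used + size > STAGE_SIZE)
        upload_flush(u);
    memcpy(u->stage + u->used, data, size);
    u->used += size;
}
```

A final `upload_flush` at the end of the batch pushes out whatever is still pending, so many small updates collapse into a handful of larger writes.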
|
|
Rather than performing a blit to completely overwrite a busy bo, simply
discard it and create a new one with the fresh data.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Reuse the new common upload buffer for uploading temporary indices and
rebuilt vertex arrays.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Dynamic arrays have the tendency to be small and so allocating a bo for
each one is overkill and we can exploit many efficiency gains by packing
them together.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Dynamic draw buffers are used by clients for temporary arrays and for
uploading normal vertex arrays. By keeping the data in memory, we can
avoid reusing active buffer objects and reallocate them as they are
changed. This is important for Sandybridge, which cannot issue blits
within a batch and so ends up flushing the batch upon every update; that
is, each batch only contains a single draw operation (if using dynamic
arrays or regular arrays from system memory).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Following a GPU hang, or other error, the render target is not likely to
have an allocated BO and so we must fall back to avoid using it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32534
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
|
|
|
|
|
|
https://bugs.freedesktop.org/show_bug.cgi?id=34030
NOTE: This is a candidate for the 7.10 branch.
|
|
This adds an --enable-shared-dricore option to configure. When enabled,
DRI modules will link against a shared copy of the common mesa routines
rather than statically linking these.
This saves about 30MB on disc with a full complement of classic DRI
drivers.
v2: Only enable with a gcc-compatible compiler that handles rpath
Handle DRI_CFLAGS without filter-out magic
Build shared libraries with the full mklib voodoo
Fix typos
v3: Resolve conflicts with talloc removal patches
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
|
|
NOTE: This is a candidate for the 7.9 and 7.10 branches.
|