Use a single swizzled tile per colorbuf (and per thread) to avoid
accumulating large amounts of cached swizzled data.
Now that the SSE3 code has been merged to master, the performance delta
of this change is minimal; the main benefit is reduced memory usage,
since swizzled copies of render targets are no longer kept.
It's clear from the performance of the in-place version of this code
that quite a bit of time is still being spent swizzling and
unswizzling, but it's not clear exactly how to reduce that.
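To make the swizzling and unswizzling cost concrete, here is a minimal sketch that converts a linear RGBA8 tile into a blocked, channel-separated layout and back. The 64x64 tile size, the 4x4 blocking, and the names `swizzled_offset`, `swizzle_tile` and `unswizzle_tile` are assumptions for illustration, not necessarily llvmpipe's actual swizzled format or code.

```c
#include <stdint.h>

/* Hypothetical layout parameters, chosen for illustration only. */
#define TILE_SIZE 64                            /* tile is TILE_SIZE x TILE_SIZE pixels */
#define BLOCK 4                                 /* pixels grouped in BLOCK x BLOCK blocks */
#define BLOCK_PIXELS (BLOCK * BLOCK)
#define TILE_BYTES (TILE_SIZE * TILE_SIZE * 4)  /* RGBA8: 4 bytes per pixel */

/* Byte offset of channel c of pixel (x, y) in the swizzled tile:
 * blocks are stored row by row, and inside each block all R bytes come
 * first, then all G, B and A bytes. */
static unsigned swizzled_offset(unsigned x, unsigned y, unsigned c)
{
   unsigned block_index = (y / BLOCK) * (TILE_SIZE / BLOCK) + x / BLOCK;
   unsigned pixel_in_block = (y % BLOCK) * BLOCK + x % BLOCK;
   return block_index * BLOCK_PIXELS * 4 + c * BLOCK_PIXELS + pixel_in_block;
}

/* Linear (row-major RGBA8) -> swizzled, e.g. before shading a tile. */
static void swizzle_tile(const uint8_t *linear, uint8_t *swizzled)
{
   for (unsigned y = 0; y < TILE_SIZE; y++)
      for (unsigned x = 0; x < TILE_SIZE; x++)
         for (unsigned c = 0; c < 4; c++)
            swizzled[swizzled_offset(x, y, c)] = linear[(y * TILE_SIZE + x) * 4 + c];
}

/* Swizzled -> linear, e.g. after the tile has been rendered. */
static void unswizzle_tile(const uint8_t *swizzled, uint8_t *linear)
{
   for (unsigned y = 0; y < TILE_SIZE; y++)
      for (unsigned x = 0; x < TILE_SIZE; x++)
         for (unsigned c = 0; c < 4; c++)
            linear[(y * TILE_SIZE + x) * 4 + c] = swizzled[swizzled_offset(x, y, c)];
}
```

Keeping one such scratch tile per color buffer and thread bounds the extra memory to a handful of tiles, at the price of a swizzle and an unswizzle each time a tile is rendered.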
|
-mssse3 is not supported/enabled.
|
And express all the other drawing functions in terms of
llvmpipe_draw_range_elements_instanced().
|
Thanks to Vinson for spotting this.
|
The lp_rast_shader_inputs' alignment is irrelevant now that it contains
pointers instead of actual data.
Likewise, lp_rast_triangle's size alignment is meaningless.
|
There's no point in having this per context, so move it to the screen
(and protect it with a mutex).
|
Just put a pointer to the state in the tri->inputs struct. This removes
some complex logic for eliminating unused state changes in bins, at the
expense of a slightly larger triangle struct.
|
Move this code back out to C for now; it will be generated separately.
The shader now takes a mask parameter instead of C0/C1/C2/etc.
The shader does not currently use that parameter and always rasterizes
whole pixel stamps.
|
Rather than inserting an lp_rast_fence command at the end of each
bin, have each rasterizer thread call this function directly once
it has run out of work to do on a particular scene.
This results in fewer calls to the mutex and related functions but, more
importantly, it makes it easier to recognize empty bins.
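A minimal sketch of the idea, assuming a POSIX-threads fence with a per-scene counter. `struct fence`, the `fence_*` functions and `rasterizer_thread` are hypothetical names, not llvmpipe's lp_fence interface. Each thread takes the mutex once per scene rather than once per bin, and since no fence command lives inside the bins, a bin with no commands really is empty.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical fence, signalled once per scene by each rasterizer thread. */
struct fence {
   pthread_mutex_t mutex;
   pthread_cond_t cond;
   unsigned rank;    /* number of threads that must signal */
   unsigned count;   /* number of threads that have signalled so far */
};

static void fence_init(struct fence *f, unsigned num_threads)
{
   pthread_mutex_init(&f->mutex, NULL);
   pthread_cond_init(&f->cond, NULL);
   f->rank = num_threads;
   f->count = 0;
}

/* Called exactly once per scene by each thread, after it has run out of
 * work: one lock/unlock per thread, independent of the number of bins. */
static void fence_signal(struct fence *f)
{
   pthread_mutex_lock(&f->mutex);
   if (++f->count == f->rank)
      pthread_cond_broadcast(&f->cond);
   pthread_mutex_unlock(&f->mutex);
}

/* Non-blocking check: have all threads signalled? */
static int fence_signalled(struct fence *f)
{
   pthread_mutex_lock(&f->mutex);
   int done = f->count >= f->rank;
   pthread_mutex_unlock(&f->mutex);
   return done;
}

/* Block until all threads have signalled. */
static void fence_wait(struct fence *f)
{
   pthread_mutex_lock(&f->mutex);
   while (f->count < f->rank)
      pthread_cond_wait(&f->cond, &f->mutex);
   pthread_mutex_unlock(&f->mutex);
}

static void fence_destroy(struct fence *f)
{
   pthread_cond_destroy(&f->cond);
   pthread_mutex_destroy(&f->mutex);
}

/* Example worker: a real rasterizer thread would process bins first. */
static void *rasterizer_thread(void *arg)
{
   struct fence *f = arg;
   /* ... rasterize bins until none are left for this scene ... */
   fence_signal(f);
   return NULL;
}
```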
|
This was already the case, but the generated (un)swizzling code was not
benefiting from that knowledge.
|
lp_test_round uses the math functions round and trunc, which aren't
available with MSVC.
Fixes the MSVC build for now.
|
It was wrong to put this in the fragment shader (fs) paths, but it was
easier to just stuff it along the fragment texture sampling paths. The patch
disconnects vertex texture sampling and just maps the textures
before the draw itself and unmaps them after.
|
This fixes bug #28757, though does not yet address the issue that fences aren't
always emitted.
|
This allows formats that fit in 4 x unorm8 to be unpacked in
parallel, 4 pixels at a time.
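As a rough illustration, a format whose pixels fit in 4 x unorm8 can be described by where each channel sits in a 32-bit word, and then the same shift-and-mask is applied to 4 pixels at once per channel. `struct unorm8x4_format` and `unpack_4_pixels` are hypothetical, and the scalar inner loop stands in for what would be a single 4-wide SIMD operation in generated code.

```c
#include <stdint.h>

/* Hypothetical description of a format whose pixels fit in one 32-bit
 * word as four 8-bit unsigned-normalized channels. shift[c] is the bit
 * position of channel c (0=R, 1=G, 2=B, 3=A) within the word. */
struct unorm8x4_format {
   unsigned shift[4];
};

/* Unpack 4 pixels into structure-of-arrays form: out[c][i] is channel c
 * of pixel i. For each channel, the same shift-and-mask is applied to
 * all 4 pixels; the loop over i is what one 4-wide vector op would do. */
static void unpack_4_pixels(const struct unorm8x4_format *fmt,
                            const uint32_t px[4], uint8_t out[4][4])
{
   for (int c = 0; c < 4; c++)
      for (int i = 0; i < 4; i++)
         out[c][i] = (uint8_t)((px[i] >> fmt->shift[c]) & 0xff);
}
```

Supporting another such format only requires a different shift table, which is why grouping these formats under one fast path is attractive.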
|
Also fix the test.
|
Uses code and ideas from Brian Paul.
|
This allows, for example, converting from 4 x float32 to 4 x unorm8 and vice versa.
Uses code and ideas from Brian Paul.
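A scalar reference sketch of the conversion in both directions. The function names are hypothetical, and the clamp-then-round-to-nearest convention is an assumption (the usual definition of unorm8); vectorized code would perform the same arithmetic on all 4 lanes at once.

```c
#include <stdint.h>

/* float32 -> unorm8 for 4 channels: clamp to [0,1], scale by 255 and
 * round to nearest. */
static void float4_to_unorm8x4(const float in[4], uint8_t out[4])
{
   for (int i = 0; i < 4; i++) {
      float v = in[i];
      if (!(v > 0.0f))        /* negative values and NaN become 0 */
         v = 0.0f;
      else if (v > 1.0f)
         v = 1.0f;
      out[i] = (uint8_t)(v * 255.0f + 0.5f);
   }
}

/* unorm8 -> float32 for 4 channels: divide by 255. */
static void unorm8x4_to_float4(const uint8_t in[4], float out[4])
{
   for (int i = 0; i < 4; i++)
      out[i] = (float)in[i] / 255.0f;
}
```

With round-to-nearest, converting any unorm8 value to float and back yields the original value.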
|
Unnecessary special case.
|
Functions for using dummy tiles when we detect OOM conditions.
|
changes.
It's a rare condition, but it may happen if all primitives are
clipped/culled.
For now we just do a no-op rasterization, but we could bypass it.
|
The previous rendering may have secondary effects on the zsbuf.
Fixes the missing tiles on gearbox.
|
Check for null pointers and return early, etc.
|
This undoes part of commit 8be645d53a0d5d0ca50e4e9597043225e2231b6d
and fixes fd.o bug 28822 as well as other regressions.
The 'draw' module may issue additional state-change commands while
we're inside the draw_arrays/elements() call so it's important to
check for updated state at this point.
|
Lay down the foundation for everything and implement most of the
functionality.
Linking, gl_VerticesIn, and multidimensional inputs are still left to do.
|
The cpu_access is redundant in a software rasterizer.
|
lp_setup_bind_framebuffer().
We were starting a scene whenever lp_setup_get_vertex_info() was called by
the draw module. So when all primitives were culled/clipped, not only
did we create a new scene for nothing, but we also ended up using the old
scene with the old framebuffer state instead of a new one.
The fix consists of:
- not calling lp_setup_update_state() in lp_setup_get_vertex_info(), as it
  is no longer necessary
- always setting the scene state before binning a command (query
  commands were bypassing it)
- asserting that no old scene is reused in lp_setup_bind_framebuffer()
|
llvmpipe can create a large number of shader variants for a single shader
(and the variants are quite big), and they were only ever deleted if the
shader itself was deleted. This is especially apparent in things like glean
blendFunc, where a new variant is created for every different subtest,
chewing up all memory.
This change limits the number of fragment shader variants (across all
shaders) that are kept around to a fixed number. If that number would be
exceeded, a fixed portion of the cached variants is deleted (since, without
tracking which variants are in use, deleting involves a flush, we don't want
to delete only one).
The deleted variants are always the least recently used ones, counted
across all shaders together.
For now this is all per-context.
Both the number of cached variants (1024) and the number deleted at once
(1/4 of the cache size) are just rough guesses and subject to further
optimization.
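The eviction policy described above can be sketched as a single LRU list shared by all shaders, with a whole batch freed whenever the cache is full. This is a minimal sketch: `struct variant`, `struct variant_cache` and the functions below are hypothetical, only the LRU bookkeeping is modelled, and the flush a driver would need before freeing variants is represented by a counter, not implemented.

```c
#include <stdlib.h>

#define MAX_VARIANTS 1024                     /* cache size from the commit message */
#define VARIANTS_TO_EVICT (MAX_VARIANTS / 4)  /* batch size: 1/4 of the cache */

/* Hypothetical variant record; real variants also hold generated code. */
struct variant {
   int key;
   struct variant *prev, *next;   /* links in the shared LRU list */
};

/* One LRU list for all shaders: head = most recently used. */
struct variant_cache {
   struct variant *head, *tail;
   unsigned count;
   unsigned flushes;   /* how many times a batch eviction happened */
};

static void lru_unlink(struct variant_cache *c, struct variant *v)
{
   if (v->prev) v->prev->next = v->next; else c->head = v->next;
   if (v->next) v->next->prev = v->prev; else c->tail = v->prev;
   v->prev = v->next = NULL;
}

static void lru_push_front(struct variant_cache *c, struct variant *v)
{
   v->prev = NULL;
   v->next = c->head;
   if (c->head) c->head->prev = v; else c->tail = v;
   c->head = v;
}

/* Mark a variant as just used (e.g. when it is bound for a draw). */
static void variant_touch(struct variant_cache *c, struct variant *v)
{
   lru_unlink(c, v);
   lru_push_front(c, v);
}

/* Add a new variant, first evicting a batch of the least recently used
 * variants if the cache is full. Freeing requires a flush, so a whole
 * batch is evicted at once rather than a single variant. */
static struct variant *variant_add(struct variant_cache *c, int key)
{
   if (c->count >= MAX_VARIANTS) {
      c->flushes++;   /* a real driver would flush pending rendering here */
      for (unsigned i = 0; i < VARIANTS_TO_EVICT && c->tail; i++) {
         struct variant *old = c->tail;
         lru_unlink(c, old);
         free(old);
         c->count--;
      }
   }
   struct variant *v = calloc(1, sizeof *v);
   if (!v)
      return NULL;
   v->key = key;
   lru_push_front(c, v);
   c->count++;
   return v;
}
```

Evicting in batches trades some cache hit rate for far fewer flushes: with these numbers, at most one flush happens per 256 new variants.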
|
fixes bug 28450.
|
We need to change it to support composite types.
|