Age | Commit message (Collapse) | Author |
|
There was actually a large quantity of scalar code in these functions
previously. This tries to move more into intrinsics.
Introduce an sse2 mm_mullo_epi32 replacement to avoid sse4 dependency
in the new rasterization code.
|
|
The engine is a global owned by gallivm module.
|
|
Useful to amortize the command submission/reloc overhead (e.g. etracer
goes from 72 to 109 FPS on nv4b).
|
|
fixes https://bugs.freedesktop.org/show_bug.cgi?id=30771
Reported-by: Kevin DeKorte
|
|
|
|
|
|
This could probably be done much nicer, I've spent a day chasing
a coherency problem in the kernel, that turned out to be incorrect
scissor setup.
|
|
fixes glsl1-2D Texture lookup with explicit lod (Vertex shader)
|
|
We need to move the texture sampler resources out of the range of the vertex attribs.
We could probably improve this using an allocator but this is the simple answer for now.
makes mesa-demos/src/glsl/vert-tex work.
|
|
|
|
|
|
|
|
Simply rely on mem2reg pass. It's easier and more reliable.
|
|
|
|
|
|
We've been using these in the linear path for a while now. Based on
Chris's SSSE3 code, but using only sse2 opcodes. Speed seems to be
identical, but code is simpler & removes dependency on SSE3.
Should be easier to extend to other rgba8 formats.
|
|
Specifically, can do early-depth-test even when alpahtest or
kill-pixel are active, providing we defer the actual z write until the
final mask is avaialable.
Improves demos/fire.c especially in the case where you get close to
the trees.
|
|
Don't branch more than once in quick succession. Don't branch at the
end of the shader.
|
|
Avoid unnecessary masking of non-existant stencil component.
|
|
Better than GALLIVM_DEBUG if you're only interested in fragment shaders.
|
|
Don't try to emit our own phi's, let llvm mem2reg do it for us.
|
|
Don't calculate 1/w for quads which aren't visible...
|
|
The current interpolation schemes causes precision loss.
Changing the operation order helps, but does not completely avoid the
problem.
The only short term solution is to clamp z to 1.0.
This is unfortunate, but probably unavoidable until interpolation is
improved.
|
|
|
|
|
|
|
|
|
|
|
|
Avoid multiplying fixed-point values. Calculate triangle area in
floating point use that for culling.
Lift area calculations up a level as we are already doing this in the
triangle_both() case.
Would like to share the calculated area with attribute interpolation,
but the way the code is structured makes this difficult.
|
|
|
|
these aren't used anywhere, so just waste memory.
|
|
|
|
we should be checking output array not input to decide.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
we want to use the format from the sampler view which isn't always the
same as the texture format when creating sampler views.
|
|
interp data is stored in gpr0 so first interp overwrote it
and subsequent ones got wrong values
reserve register 0 so it's not used for attribs.
alternative is to interpolate attrib0 last (reverse, as r600c does)
|
|
Only cosmetic changes. No actual practical difference.
|
|
|
|
Q coordinate's coefficients also need to be multiplied by w, otherwise
it will have 1/w, causing problems with TXP.
|
|
Once a fragment is generated with LP_INTERP_PERSPECTIVE set for an input,
it will do a divide by w for that input. Therefore it's not OK to treat LP_INTERP_PERSPECTIVE as
LP_INTERP_LINEAR or vice-versa, even if the attribute is known to not
vary.
A better strategy would be to take the primitive in consideration when
generating the fragment shader key, and therefore avoid the per-fragment
perspective divide.
|
|
|
|
|
|
this sets the stencil up for evergreen properly.
|
|
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
|
|
Since flush rework there could be only one relocation per
register in a block.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
|
|
Got a speed up by tracking the dirty blocks in a seperate list instead of looping through all blocks. This version should work with block that get their dirty state disabled again and I added a dirty check during the flush as some blocks were already dirty.
|
|
|
|
Flush read cache before writting register. Track flushing inside
of a same cs and avoid reflushing same bo if not necessary. Allmost
properly force flush if bo rendered too and then use as a texture
in same cs (missing pipeline flush dunno if it's needed or not).
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
|
|
since we plan on using dx10 constant buffers everywhere.
|
|
These texture formats (like R16G16B16A16_UNORM) were untested until now
because st/mesa doesn't use them. I am testing this with a hacked st/mesa
here.
|
|
Add bo offset everywhere needed if r600_bo is ever a sub bo
of a bigger bo.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
|