Age | Commit message | Author |
|
|
|
Apply Jose's suggestions for a small but measurable improvement in
isosurf.
|
|
This reverts commit 9773722c2b09d5f0615a47cecf4347859474dc56.
Looks like there are some floor/rounding issues here that need
to be better understood.
|
|
|
|
|
|
MSVC doesn't accept more than 3 __m128i arguments.
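
The commit message does not say how the limitation was worked around. One common approach, sketched here under that assumption, is to pass the vectors by pointer instead of by value. The names `sum4` and `sum4_lane0` are hypothetical and do not come from the commit.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/*
 * The by-value form that the commit says MSVC rejects would look like:
 *    __m128i sum4(__m128i a, __m128i b, __m128i c, __m128i d);
 * Passing pointers avoids putting 16-byte-aligned values on the stack.
 */
static inline __m128i
sum4(const __m128i *a, const __m128i *b, const __m128i *c, const __m128i *d)
{
   return _mm_add_epi32(_mm_add_epi32(*a, *b), _mm_add_epi32(*c, *d));
}

/* Small scalar wrapper so the helper can be exercised without vector types. */
static int
sum4_lane0(int a, int b, int c, int d)
{
   __m128i va = _mm_set1_epi32(a);
   __m128i vb = _mm_set1_epi32(b);
   __m128i vc = _mm_set1_epi32(c);
   __m128i vd = _mm_set1_epi32(d);
   __m128i r = sum4(&va, &vb, &vc, &vd);

   return _mm_cvtsi128_si32(r);   /* extract lane 0 */
}
```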
|
|
|
|
Avoid accumulating more and more fixed point bits.
|
|
|
|
There was actually a large quantity of scalar code in these functions
previously. This tries to move more into intrinsics.
Introduce an sse2 mm_mullo_epi32 replacement to avoid sse4 dependency
in the new rasterization code.
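
A minimal sketch of how a 32-bit low multiply can be built from SSE2 opcodes only. This is the widely used even/odd-lane technique, not necessarily the code from the commit, and the names `mullo_epi32_sse2` and `mullo4` are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* SSE2-only stand-in for the SSE4.1 _mm_mullo_epi32 (hypothetical name). */
static inline __m128i
mullo_epi32_sse2(__m128i a, __m128i b)
{
   /* _mm_mul_epu32 multiplies lanes 0 and 2, giving two 64-bit products. */
   __m128i even = _mm_mul_epu32(a, b);

   /* Shift lanes 1 and 3 down to positions 0 and 2, then multiply those. */
   __m128i odd = _mm_mul_epu32(_mm_srli_si128(a, 4), _mm_srli_si128(b, 4));

   /* Move the low 32 bits of each 64-bit product into lanes 0 and 1. */
   even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
   odd = _mm_shuffle_epi32(odd, _MM_SHUFFLE(0, 0, 2, 0));

   /* Interleave to restore the original lane order: 0, 1, 2, 3. */
   return _mm_unpacklo_epi32(even, odd);
}

/* Convenience wrapper operating on plain arrays. */
static void
mullo4(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
   __m128i va = _mm_loadu_si128((const __m128i *)a);
   __m128i vb = _mm_loadu_si128((const __m128i *)b);

   _mm_storeu_si128((__m128i *)out, mullo_epi32_sse2(va, vb));
}
```

The low 32 bits of a product are the same for signed and unsigned operands, which is why the unsigned `_mm_mul_epu32` is sufficient here.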
|
|
The engine is a global owned by gallivm module.
|
|
Useful to amortize the command submission/reloc overhead (e.g. etracer
goes from 72 to 109 FPS on nv4b).
|
|
fixes https://bugs.freedesktop.org/show_bug.cgi?id=30771
Reported-by: Kevin DeKorte
|
|
|
|
|
|
This could probably be done much more nicely. I spent a day chasing
what looked like a coherency problem in the kernel, which turned out
to be incorrect scissor setup.
|
|
fixes glsl1-2D Texture lookup with explicit lod (Vertex shader)
|
|
We need to move the texture sampler resources out of the range of the vertex attribs.
We could probably improve this using an allocator but this is the simple answer for now.
makes mesa-demos/src/glsl/vert-tex work.
|
|
|
|
|
|
|
|
Simply rely on mem2reg pass. It's easier and more reliable.
|
|
|
|
|
|
We've been using these in the linear path for a while now. Based on
Chris's SSSE3 code, but using only sse2 opcodes. Speed seems to be
identical, but code is simpler & removes dependency on SSE3.
Should be easier to extend to other rgba8 formats.
|
|
Specifically, can do early-depth-test even when alphatest or
kill-pixel are active, provided we defer the actual z write until the
final mask is available.
Improves demos/fire.c especially in the case where you get close to
the trees.
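
The idea can be sketched for a single pixel as follows. This is a scalar model under assumed names (`process_fragment`, `shader_keeps`, `shader_runs`), not the generated llvmpipe code, and it assumes a "less than" depth function.

```c
#include <stdbool.h>

/*
 * Hypothetical one-pixel model of early depth test with a deferred write.
 * shader_keeps stands in for the outcome of alpha test / kill in the shader.
 * shader_runs counts how often the (expensive) shader would have executed.
 * Returns true if the fragment ends up written.
 */
static bool
process_fragment(float *zbuf, float frag_z, bool shader_keeps, int *shader_runs)
{
   /* Early test: compare only, the depth buffer is not written yet. */
   bool mask = frag_z < *zbuf;
   if (!mask)
      return false;   /* occluded, so the shader is skipped entirely */

   /* The shader runs; alpha test or kill may still clear the mask. */
   (*shader_runs)++;
   mask = mask && shader_keeps;

   /* Deferred write: z is stored only once the final mask is known. */
   if (mask)
      *zbuf = frag_z;

   return mask;
}
```

Occluded fragments never reach the shader, while killed fragments leave the depth buffer untouched, which is the property the deferred write is meant to preserve.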
|
|
Don't branch more than once in quick succession. Don't branch at the
end of the shader.
|
|
Avoid unnecessary masking of nonexistent stencil component.
|
|
Better than GALLIVM_DEBUG if you're only interested in fragment shaders.
|
|
Don't try to emit our own phis; let LLVM's mem2reg do it for us.
|
|
Don't calculate 1/w for quads which aren't visible...
|
|
The current interpolation scheme causes precision loss.
Changing the operation order helps, but does not completely avoid the
problem.
The only short term solution is to clamp z to 1.0.
This is unfortunate, but probably unavoidable until interpolation is
improved.
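
As a rough illustration of the short-term fix, the clamp amounts to the following. The helper name `clamp_depth` is hypothetical, and the sketch assumes only the upper bound needs clamping, as the message states.

```c
/* Clamp an interpolated depth value so rounding error cannot exceed 1.0. */
static float
clamp_depth(float z)
{
   return z > 1.0f ? 1.0f : z;
}
```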
|
|
|
|
|
|
|
|
|
|
|
|
Avoid multiplying fixed-point values. Calculate triangle area in
floating point and use that for culling.
Lift area calculations up a level as we are already doing this in the
triangle_both() case.
Would like to share the calculated area with attribute interpolation,
but the way the code is structured makes this difficult.
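
A minimal sketch of area-based culling in floating point. The types and names (`struct vec2`, `triangle_area2`, `triangle_is_culled`, `enum cull_mode`) are hypothetical, and the sign convention (positive area means counter-clockwise with y pointing up) is an assumption rather than the driver's.

```c
#include <stdbool.h>

struct vec2 {
   float x, y;
};

enum cull_mode {
   CULL_NONE,
   CULL_CW,
   CULL_CCW
};

/* Twice the signed area of the triangle, computed from two edge vectors. */
static float
triangle_area2(struct vec2 v0, struct vec2 v1, struct vec2 v2)
{
   return (v1.x - v0.x) * (v2.y - v0.y) - (v2.x - v0.x) * (v1.y - v0.y);
}

/* Decide culling from the sign of the area; zero-area triangles are dropped. */
static bool
triangle_is_culled(struct vec2 v0, struct vec2 v1, struct vec2 v2,
                   enum cull_mode mode)
{
   float area = triangle_area2(v0, v1, v2);

   if (area == 0.0f)
      return true;            /* degenerate: nothing to rasterize */

   switch (mode) {
   case CULL_CCW:
      return area > 0.0f;     /* assumed: positive area is counter-clockwise */
   case CULL_CW:
      return area < 0.0f;
   default:
      return false;
   }
}
```

Computing the area once and reusing it for both the winding decision and the degenerate check mirrors the "lift area calculations up a level" remark above.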
|
|
|
|
these aren't used anywhere, so just waste memory.
|
|
|
|
we should be checking output array not input to decide.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
we want to use the format from the sampler view which isn't always the
same as the texture format when creating sampler views.
|
|
Interp data is stored in gpr0, so the first interp overwrote it
and subsequent ones got wrong values.
Reserve register 0 so it's not used for attribs.
The alternative is to interpolate attrib0 last (in reverse, as r600c does).
|
|
Only cosmetic changes. No actual practical difference.
|
|
|
|
Q coordinate's coefficients also need to be multiplied by w, otherwise
it will have 1/w, causing problems with TXP.
|
|
Once a fragment is generated with LP_INTERP_PERSPECTIVE set for an input,
it will do a divide by w for that input. Therefore it's not OK to treat LP_INTERP_PERSPECTIVE as
LP_INTERP_LINEAR or vice-versa, even if the attribute is known to not
vary.
A better strategy would be to take the primitive into consideration when
generating the fragment shader key, and therefore avoid the per-fragment
perspective divide.
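
To illustrate why the two modes are not interchangeable, here is a small sketch of linear versus perspective-correct interpolation along an edge. The function names are hypothetical and the code is a textbook formulation, not llvmpipe's setup or shader code.

```c
/* Plain screen-space interpolation at parameter t in [0, 1]. */
static float
interp_linear(float a0, float a1, float t)
{
   return a0 + t * (a1 - a0);
}

/*
 * Perspective-correct interpolation: a/w and 1/w are interpolated linearly
 * in screen space, and the per-fragment divide recovers the attribute.
 */
static float
interp_perspective(float a0, float w0, float a1, float w1, float t)
{
   float a_over_w = interp_linear(a0 / w0, a1 / w1, t);
   float one_over_w = interp_linear(1.0f / w0, 1.0f / w1, t);

   return a_over_w / one_over_w;
}
```

With `a0 = 0, w0 = 1, a1 = 1, w1 = 3` the midpoint is 0.5 linearly but about 0.25 with perspective correction. Because the divide happens per fragment, the coefficients fed to it must already be in the a/w form, even for an attribute that does not vary.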
|
|
|
|
|