Age | Commit message | Author |
|
|
|
This is a step towards moving this code into the rasterizer.
|
|
|
|
Further reduce the size of a binned triangle.
|
|
|
|
|
|
But bin lazily only into bins which are receiving geometry.
|
|
|
|
|
|
Together with the previous commit, this generalizes the benefits of
d2cf757f44f4ee5554243f3279483a25886d9927 to all depth formats, in
particular:
- simpler float -> 24unorm conversion
- avoid unsigned comparisons (not directly supported on SSE) by aligning
  to the least significant bit
- avoid unnecessary/repeated mask ANDing
Verified with trivial/tri-z that the exact same assembly is produced for
X8Z24.
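A scalar sketch of the two ideas in the list above may help, assuming the depth value is first clamped to [0, 1]. This is not the generated LLVM IR; the helper names (`z_float_to_unorm24`, `z24_align_to_lsb`, `z24_test_less`) and the `shift` parameter are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Convert a depth value to a 24-bit unsigned normalized integer held in the
 * low 24 bits of the result. The input is clamped to [0, 1] first; NaN is
 * not handled in this sketch. */
static uint32_t
z_float_to_unorm24(float z)
{
   if (z < 0.0f) z = 0.0f;
   if (z > 1.0f) z = 1.0f;
   /* Scale in double so that 1.0 maps exactly to 0xffffff. */
   return (uint32_t)((double)z * 16777215.0 + 0.5);
}

/* Hypothetical helper: extract the depth bits of a packed depth/stencil
 * word and align them to bit 0. 'shift' is 8 when depth sits in the high
 * 24 bits and 0 when it sits in the low 24 bits. */
static int32_t
z24_align_to_lsb(uint32_t packed, unsigned shift)
{
   return (int32_t)((packed >> shift) & 0xffffffu);
}

/* Depth "less" test. Both operands are below 2^24, so a signed compare
 * gives the same answer as an unsigned one would. */
static int
z24_test_less(uint32_t src_z24, uint32_t dst_packed, unsigned shift)
{
   return (int32_t)src_z24 < z24_align_to_lsb(dst_packed, shift);
}
```

Once the depth bits are aligned to bit 0, the top bit of every operand is zero, which is why a signed comparison is safe.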
|
|
Z32_FLOAT uses <4 x float> as intermediate/destination type,
instead of <4 x i32>.
The necessary bitcasts got removed with commit
5b7eb868fde98388d80601d8dea39e679828f42f
Also use depth/stencil type and build contexts consistently, and
make the depth pointer argument an ordinary <i8 *>, to catch this
sort of issue in the future (and also to pave the way for Z16 and
Z32_FLOAT_S8_X24 support).
|
|
There's no apparent reason for the former to exist. And they didn't
even have the same value.
|
|
|
|
Apply Jose's suggestions for a small but measurable improvement in
isosurf.
|
|
This reverts commit 9773722c2b09d5f0615a47cecf4347859474dc56.
Looks like there are some floor/rounding issues here that need
to be better understood.
|
|
|
|
MSVC doesn't accept more than 3 __m128i arguments.
|
|
|
|
Avoid accumulating more and more fixed point bits.
|
|
|
|
There was actually a large quantity of scalar code in these functions
previously. This tries to move more into intrinsics.
Introduce an sse2 mm_mullo_epi32 replacement to avoid sse4 dependency
in the new rasterization code.
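One common way to build such a replacement is sketched below. It is not necessarily the code this commit added: the names `mullo_epi32_sse2` and `mul4_i32` are hypothetical, and the scalar fallback is there only so the sketch compiles on targets without SSE2.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>

/* Low 32 bits of each 32x32 product, using only SSE2 instructions. */
static __m128i
mullo_epi32_sse2(__m128i a, __m128i b)
{
   /* Move lanes 1 and 3 down into lanes 0 and 2. */
   __m128i a13 = _mm_srli_epi64(a, 32);
   __m128i b13 = _mm_srli_epi64(b, 32);
   /* _mm_mul_epu32 multiplies lanes 0 and 2, giving two 64-bit products. */
   __m128i p02 = _mm_mul_epu32(a, b);
   __m128i p13 = _mm_mul_epu32(a13, b13);
   /* Keep the low 32 bits of each product and interleave them again. */
   __m128i lo_mask = _mm_set_epi32(0, -1, 0, -1);
   return _mm_or_si128(_mm_and_si128(p02, lo_mask),
                       _mm_slli_epi64(p13, 32));
}
#endif

/* Multiply four 32-bit integers lane by lane, keeping the low 32 bits. */
static void
mul4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
#if defined(__SSE2__) || defined(_M_X64)
   __m128i va = _mm_loadu_si128((const __m128i *)a);
   __m128i vb = _mm_loadu_si128((const __m128i *)b);
   _mm_storeu_si128((__m128i *)out, mullo_epi32_sse2(va, vb));
#else
   int i;
   for (i = 0; i < 4; i++)
      out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
#endif
}
```

The low 32 bits of a product are the same for signed and unsigned operands, so the unsigned `_mm_mul_epu32` is sufficient here.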
|
|
The engine is a global owned by gallivm module.
|
|
|
|
|
|
|
|
Simply rely on mem2reg pass. It's easier and more reliable.
|
|
|
|
|
|
We've been using these in the linear path for a while now. Based on
Chris's SSSE3 code, but using only sse2 opcodes. Speed seems to be
identical, but the code is simpler and removes the dependency on SSSE3.
Should be easier to extend to other rgba8 formats.
|
|
Specifically, we can do the early depth test even when alphatest or
kill-pixel are active, provided we defer the actual z write until the
final mask is available.
Improves demos/fire.c especially in the case where you get close to
the trees.
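The ordering can be sketched in scalar C for one quad of four fragments. This is an illustration only, not the generated shader: `shade_quad_early_z` is a hypothetical name, and the `alpha` array stands in for whatever the fragment shader would produce.

```c
#include <stdint.h>

/* Process one 2x2 quad. Bit i of 'mask' is set while fragment i is alive.
 * Returns the final mask; dst_z is updated only for surviving fragments. */
static uint32_t
shade_quad_early_z(const float src_z[4], float dst_z[4],
                   const float alpha[4], float alpha_ref, uint32_t mask)
{
   int i;

   /* 1. Early depth test: narrows the mask but writes nothing yet. */
   for (i = 0; i < 4; i++)
      if (!(src_z[i] < dst_z[i]))
         mask &= ~(1u << i);

   /* Nothing visible: skip the (expensive) shader entirely. */
   if (mask == 0)
      return 0;

   /* 2. Shader runs here; the alpha test or a kill narrows the mask more. */
   for (i = 0; i < 4; i++)
      if (!(alpha[i] > alpha_ref))
         mask &= ~(1u << i);

   /* 3. Deferred z write, using the final mask. */
   for (i = 0; i < 4; i++)
      if (mask & (1u << i))
         dst_z[i] = src_z[i];

   return mask;
}
```

Writing z before step 2 would be wrong, because a fragment later discarded by the alpha test would still have updated the depth buffer.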
|
|
Don't branch more than once in quick succession. Don't branch at the
end of the shader.
|
|
Avoid unnecessary masking of nonexistent stencil component.
|
|
Better than GALLIVM_DEBUG if you're only interested in fragment shaders.
|
|
Don't try to emit our own phis; let LLVM's mem2reg do it for us.
|
|
Don't calculate 1/w for quads which aren't visible...
|
|
The current interpolation scheme causes precision loss.
Changing the operation order helps, but does not completely avoid the
problem.
The only short term solution is to clamp z to 1.0.
This is unfortunate, but probably unavoidable until interpolation is
improved.
|
|
|
|
|
|
|
|
|
|
|
|
Avoid multiplying fixed-point values. Calculate triangle area in
floating point and use that for culling.
Lift area calculations up a level as we are already doing this in the
triangle_both() case.
Would like to share the calculated area with attribute interpolation,
but the way the code is structured makes this difficult.
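A minimal sketch of area-based culling in floating point follows. The function names and the `cull_mode` enum are hypothetical, and which sign counts as front-facing depends on the winding and coordinate conventions, which this sketch does not try to reproduce.

```c
/* Twice the signed area of the triangle (v0, v1, v2), from the 2D cross
 * product of two edge vectors. Each vertex is an {x, y} pair. */
static float
tri_signed_area2(const float v0[2], const float v1[2], const float v2[2])
{
   float e01x = v1[0] - v0[0];
   float e01y = v1[1] - v0[1];
   float e02x = v2[0] - v0[0];
   float e02y = v2[1] - v0[1];
   return e01x * e02y - e02x * e01y;
}

/* Hypothetical cull modes, named after the sign of the area they reject. */
enum cull_mode { CULL_NONE, CULL_POSITIVE, CULL_NEGATIVE };

/* Returns 1 if the triangle should be discarded. */
static int
tri_is_culled(float area2, enum cull_mode mode)
{
   if (area2 == 0.0f)
      return 1;               /* degenerate: covers no pixels */
   if (mode == CULL_POSITIVE)
      return area2 > 0.0f;
   if (mode == CULL_NEGATIVE)
      return area2 < 0.0f;
   return 0;
}
```

Because the area is computed once, a caller that handles both orientations can test its sign and dispatch without redoing the multiplication.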
|
|
|
|
Only cosmetic changes. No actual practical difference.
|
|
|
|
Q coordinate's coefficients also need to be multiplied by w, otherwise
it will have 1/w, causing problems with TXP.
|
|
Once a fragment shader is generated with LP_INTERP_PERSPECTIVE set for an
input, it will do a divide by w for that input. Therefore it's not OK to
treat LP_INTERP_PERSPECTIVE as LP_INTERP_LINEAR or vice-versa, even if the
attribute is known to not vary.
A better strategy would be to take the primitive into consideration when
generating the fragment shader key, and therefore avoid the per-fragment
perspective divide.
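The mismatch can be shown with a small scalar sketch, interpolating along one edge with a screen-space parameter t. The function names are hypothetical and this is not the driver's setup code; it only illustrates why the setup side and the shader side must agree on the interpolation mode.

```c
/* Plain screen-space linear interpolation. */
static float
interp_linear(float a0, float a1, float t)
{
   return a0 + t * (a1 - a0);
}

/* Perspective-correct interpolation: setup pre-multiplies the attribute by
 * 1/w, and the shader divides by the interpolated 1/w. */
static float
interp_perspective(float a0, float w0, float a1, float w1, float t)
{
   float oow0 = 1.0f / w0;
   float oow1 = 1.0f / w1;
   float num = interp_linear(a0 * oow0, a1 * oow1, t);
   float den = interp_linear(oow0, oow1, t);
   return num / den;
}

/* Mismatch: setup treats the attribute as linear (no 1/w multiply), but the
 * shader still performs the perspective divide. */
static float
interp_mismatched(float a0, float w0, float a1, float w1, float t)
{
   float num = interp_linear(a0, a1, t);
   float den = interp_linear(1.0f / w0, 1.0f / w1, t);
   return num / den;
}
```

With a constant attribute of 2.0, w values of 1 and 4, and t = 0.5, the matched path returns 2.0 while the mismatched path returns roughly 3.2, so even an attribute that does not vary is corrupted.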
|
|
|
|
|
|
Remove duplicated include.
Signed-off-by: Brian Paul <brianp@vmware.com>
|