Age | Commit message (Collapse) | Author |
|
Operate simultanouesly on <width, height, depth> vector as much as possible,
instead of doing the operations on vectors with broadcasted scalars.
Also do the 24.8 fixed point scalar with integer shift of the texture size,
for unnormalized coordinates.
AoS path only for now -- the same thing can be done for SoA.
|
|
Only requires sse2 now.
|
|
Clamp against 0 instead of -0.5, which simplifies things.
The former version would have resulted in both int coords being zero
(in case of coord being smaller than 0) and some "unused" weight value,
whereas now the int coords will be 0 and 1, but weight will be 0, hence the
lerp should produce the same value.
Still not happy about differences between normalized and non-normalized...
|
|
Haven't looked at what code this exactly generates but URem can't be fast.
Instead of using two URem only use one and replace the second one with
select/add (this is what the corresponding aos code already does).
|
|
Rearrange order of operations a bit to make some clamps easier.
All calculations should be equivalent.
Note there seems to be some inconsistency in the clamp to edge case
wrt normalized/non-normalized coords, could potentially simplify this too.
|
|
Sometimes coords are clamped to positive numbers before doing conversion
to int, or clamped to 0 afterwards, in this case can use itrunc
instead of ifloor which is easier. This is only the case for nearest
calculations unfortunately, except linear MIRROR_CLAMP_TO_EDGE which
for the same reason can use a unsigned float build context so the
ifloor_fract helper can reduce this to itrunc in the ifloor helper itself.
|
|
|
|
sse2 supports round to nearest directly (or rather, assuming default nearest
rounding mode in MXCSR). Use intrinsic to use this rather than round (sse41)
or bit manipulation whenever possible.
|
|
trunc of -1.5 is -1.0 not 1.0...
|
|
|
|
Doesn't change generated code quality, but saves some typing.
|
|
|
|
Also, pass more stuff trhough the sample build context, instead of
arguments.
|
|
|
|
|
|
Fixes assertion failures on LLVM 2.6.
|
|
Nice reduction in the number of operations required for final color
output in many shaders.
|
|
|
|
|
|
|
|
Forgot this one before.
|
|
|
|
|
|
If lod < 0, then invariably follows that ilevel0 == ilevel1 == 0.
|
|
|
|
More accurate/faster results for PIPE_TEX_MIPFILTER_NEAREST. Less
FP <-> SI conversion overall.
|
|
|
|
|
|
The only way to ensure we don't do redundant FP <-> SI conversions.
|
|
Not tested yet, but should be correct.
|
|
|
|
|
|
|
|
This lets us avoid the shift and max() operations.
|
|
|
|
|
|
The pipe_sampler_view's swizzle terms also apply to the texture border
color. Simply move the apply_sampler_swizzle() call after we fetch
the border color.
Fixes many piglit texwrap failures.
|
|
The trick of casting the coord to an unsigned value only works for POT
textures. Add a bias instead. This fixes a few piglit texwrap failures.
|
|
|
|
|
|
this is more a proof to show vector shifts on x86 with per-element shift count
are evil. Since we can avoid the shift with a single compare/select, use that
instead. Replaces more than 20 instructions (and slow ones at that) with about 3,
and cuts compiled shader size with mesa's yuvsqure demo by over 10%
(no performance measurements done - but selection is blazing fast).
Might want to revisit that for future cpus - unfortunately AVX won't have vector
shifts neither, but AMD's XOP will, but even in that case using selection here
is probably not slower.
|
|
While it's true that llvm can and will indeed replace this with bit
arithmetic (since block height/width is POT), it does so (llvm 2.7) by element
and hence extracts/shifts/reinserts each element individually.
This costs about 16 instructions (and extract is not really fast) vs. 1...
|
|
looks like pot_depth should be used, not pot_height
(found by accident, not verified)
|
|
|
|
Before, changing any of these sampler values triggered generation
of new JIT code. Added a new flag for the special case of
min_lod == max_lod which is hit during auto mipmap generation.
|
|
Unfortunately this can cause segfault with LLVM 2.6, if x is a constant.
|
|
The old code didn't really make sense. We only need to compare the
X channel of the texture (depth) against the texcoord.
For (bi)linear sampling we should move the calls to this function
and compute the final result as (s1+s2+s3+s4) * 0.25. Someday.
This fixes the glean glsl1 shadow2D() tests. See fd.o bug 29307.
|
|
|
|
Fixes fd.o bug 30245
|
|
|