Age | Commit message (Collapse) | Author |
|
Derivatives are now scalar.
Broken since 17dbd41cf23e7e7de2f27e5e9252d7f792d932f3.
|
|
|
|
There's no LLVM C LLVMBuildLoadVolatile() function so roll our own.
Not used anywhere at this time but can come in handy during debugging.
|
|
|
|
Instead of having a NAME_SOFTWARE check, just use GALLIUM_DRIVER,
but set the default to native, which is the same as not wrapped.
|
|
|
|
|
|
|
|
|
|
Disabling address printing is helpful for diffing.
|
|
|
|
The bug only happens on the AOS / fixed-pt path.
|
|
|
|
This is relying on lp_build_pack2 using the sse2 pack intrinsics which
handle clamping.
(Alternatively could have made it use lp_build_packs2 but it might
not even produce more efficient code than not using the fastpath
in the first place.)
|
|
There's no apparent reason for the former to exist. And they didn't
even have the same value.
|
|
|
|
There seems to be no reason for it, so do the same math for both
(except the scale mul, of course).
|
|
|
|
Has similar use cases to the S8X24 and X24S8 formats.
|
|
these formats are needed for hw that can sample and write stencil values.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
this adds the capability + a stencil semantic id, + tgsi scan support.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
|
|
|
|
|
|
To allow more optimizations, in particular for direct textures.
|
|
Useful to give human legible names in other cases.
|
|
SSE support for 32bit and 16bit unsigned arithmetic is not complete, and
can easily result in inefficient code.
In most cases signed/unsigned doesn't make a difference, such as for
integer texture coordinates.
So remove uint_coord_type and uint_coord_bld to keep inefficient
operations from sneaking in in the future.
|
|
We end up treating them as scalars in the end, and it saves some
instructions.
|
|
With this commit all explicit Phi emission is now gone.
|
|
GALLIVM_DEBUG=no_brilinear runtime option
|
|
We can't patch true-block at end-if time, as there is no guarantee that
the block at the beginning of the true stanza is the same at the end of
the true stanza: other control flow elements may have been emitted
halfway through the true stanza.
Although this bug only surfaced recently, with the commit to skip mip
filtering when lod is an integer, it was always there; it had probably
just been avoided until now. E.g., cubemap selection nests if-then-else
in the else stanza, which does not suffer from the same problem.
|
|
|
|
No need for a flow stack anymore.
|
|
|
|
Simply rely on mem2reg pass. It's easier and more reliable.
|
|
|
|
|
|
|
|
|
|
|
|
Stop disassembling on unconditional backwards jumps.
|
|
Don't branch more than once in quick succession. Don't branch at the
end of the shader.
|
|
LLVM seems to find it easier to reason about these than our
mantissa-manipulation code.
|
|
|
|
Fixes slowdown in isosurf with earlier versions of llvm.
|
|
Operate simultaneously on <width, height, depth> vector as much as possible,
instead of doing the operations on vectors with broadcasted scalars.
Also do the 24.8 fixed point scalar with integer shift of the texture size,
for unnormalized coordinates.
AoS path only for now -- the same thing can be done for SoA.
|
|
Only requires sse2 now.
|
|
Clamp against 0 instead of -0.5, which simplifies things.
The former version would have resulted in both int coords being zero
(in case of coord being smaller than 0) and some "unused" weight value,
whereas now the int coords will be 0 and 1, but weight will be 0, hence the
lerp should produce the same value.
Still not happy about differences between normalized and non-normalized...
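
The equivalence argued above can be checked with a small scalar model in
plain C. This is only a sketch, not the gallivm code: the struct and
function names are hypothetical, and the "former" variant is an assumed
reconstruction in which the out-of-range integer coords are both clamped
to 0 while the weight keeps a leftover fractional value.

```c
#include <assert.h>

/* Hypothetical result type: two texel indices plus the lerp weight. */
struct linear_coords {
    int i0, i1;
    float weight;
};

/* Floor without math.h, so the sketch needs no extra libraries. */
static int floor_to_int(float x)
{
    int i = (int)x;
    return ((float)i > x) ? i - 1 : i;
}

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Assumed model of the former behaviour: floor first, clamp the integer
 * coords afterwards. Below 0 both coords collapse to 0 and the weight is
 * a leftover value that no longer matters. */
static struct linear_coords clamp_edge_old(float coord, int size)
{
    struct linear_coords r;
    float c = coord - 0.5f;
    int fl = floor_to_int(c);

    assert(size > 0);
    r.weight = c - (float)fl;
    r.i0 = clamp_int(fl, 0, size - 1);
    r.i1 = clamp_int(fl + 1, 0, size - 1);
    return r;
}

/* Model of the new behaviour: clamp the float coord against 0 first.
 * Below 0 the coords become 0 and 1 (for size > 1), with weight 0. */
static struct linear_coords clamp_edge_new(float coord, int size)
{
    struct linear_coords r;
    float c = coord - 0.5f;
    int fl;

    assert(size > 0);
    if (c < 0.0f)
        c = 0.0f;
    fl = floor_to_int(c);
    r.weight = c - (float)fl;
    r.i0 = clamp_int(fl, 0, size - 1);
    r.i1 = clamp_int(fl + 1, 0, size - 1);
    return r;
}

/* Linear interpolation between the two selected texels. */
static float sample(const float *texels, struct linear_coords c)
{
    return texels[c.i0] + c.weight * (texels[c.i1] - texels[c.i0]);
}
```

In this model, a coordinate such as 0.2 selects texels (0, 0) with a
nonzero weight in the old variant and texels (0, 1) with weight 0 in the
new one, so both return the first texel.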
|
|
Haven't looked at exactly what code this generates but URem can't be fast.
Instead of using two URem only use one and replace the second one with
select/add (this is what the corresponding aos code already does).
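
The idea of trading the second remainder for a select/add can be sketched
in scalar C. The function names below are hypothetical and this is not
the generated IR; it only illustrates that, once the first coord is
already wrapped into [0, size), the neighbouring coord needs at most one
wrap back to 0.

```c
#include <assert.h>

/* Reference: two remainders, one per texel coord. */
static void wrap_two_urem(unsigned icoord, unsigned size,
                          unsigned *coord0, unsigned *coord1)
{
    assert(size > 0);
    *coord0 = icoord % size;
    *coord1 = (icoord + 1) % size;
}

/* Single remainder: coord0 is already in [0, size), so coord0 + 1 is
 * either still in range or exactly equal to size, in which case it
 * wraps to 0. An add plus a select replaces the second remainder. */
static void wrap_one_urem(unsigned icoord, unsigned size,
                          unsigned *coord0, unsigned *coord1)
{
    unsigned c0, c1;

    assert(size > 0);
    c0 = icoord % size;               /* the only remainder */
    c1 = c0 + 1;                      /* add */
    *coord0 = c0;
    *coord1 = (c1 == size) ? 0 : c1;  /* select */
}
```

Both helpers agree for any icoord where icoord + 1 does not overflow,
for example icoord = 7 with size = 8 gives coords (7, 0) in either form.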
|
|
Rearrange order of operations a bit to make some clamps easier.
All calculations should be equivalent.
Note there seems to be some inconsistency in the clamp to edge case
wrt normalized/non-normalized coords, could potentially simplify this too.
|