Age | Commit message (Collapse) | Author |
|
|
|
|
|
this is more a proof to show vector shifts on x86 with per-element shift count
are evil. Since we can avoid the shift with a single compare/select, use that
instead. Replaces more than 20 instructions (and slow ones at that) with about 3,
and cuts compiled shader size with mesa's yuvsqure demo by over 10%
(no performance measurements done - but selection is blazing fast).
Might want to revisit that for future cpus - unfortunately AVX won't have vector
shifts neither, but AMD's XOP will, but even in that case using selection here
is probably not slower.
|
|
While it's true that llvm can and will indeed replace this with bit
arithmetic (since block height/width is POT), it does so (llvm 2.7) by element
and hence extracts/shifts/reinserts each element individually.
This costs about 16 instructions (and extract is not really fast) vs. 1...
|
|
looks like pot_depth should be used, not pot_height
(found by accident, not verified)
|
|
NOTE: This is a candidate for the 7.9 branch.
|
|
|
|
Before, changing any of these sampler values triggered generation
of new JIT code. Added a new flag for the special case of
min_lod == max_lod which is hit during auto mipmap generation.
|
|
This commit fixes an infinite loop in foreach_s if remove_from_list is used
more than once on the same item with other list operations in between.
NOTE: This is a candidate for the 7.9 branch because the commit
"r300g: fixup long-lived BO maps being incorrectly unmapped when flushing"
depends on it.
|
|
Some pathological triangles cause a theoritically impossible number of
clipped vertices.
The clipper will still assert, but at least release builds will not
crash, while this problem is further investigated.
|
|
If a triangle was completely culled by clipping, we would still try to
fix up its provoking vertex.
|
|
reimplement the flush stage added for r300 to allow a custom DSA stage
to be used in the pipeline, this allows for r600 hw DB->CB flushes.
|
|
As introduced with commit d21301675c249602e19310d5b62fad424f2f2ac2
NOTE: This is a candidate for the 7.9 branch.
|
|
Unfortunately this can cause segfault with LLVM 2.6, if x is a constant.
|
|
The old code didn't really make sense. We only need to compare the
X channel of the texture (depth) against the texcoord.
For (bi)linear sampling we should move the calls to this function
and compute the final result as (s1+s2+s3+s4) * 0.25. Someday.
This fixes the glean glsl1 shadow2D() tests. See fd.o bug 29307.
|
|
|
|
|
|
|
|
Thanks to José for the more complete list of supported opcodes.
NOTE: This is a candidate for the 7.9 branch.
|
|
NOTE: This is a candidate for the 7.9 branch.
|
|
|
|
As it was, we weren't obeying the draw->pipeline.point_sprite state.
Fixes point sprites in llvmpipe driver.
|
|
|
|
|
|
These can be used by other drivers, like r600g.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
|
|
|
|
Depth-only and stencil-only clears should mask out depth/stencil from the
output, mask out stencil/input from input, and OR or ADD them together.
However, due to a typo they were being ANDed, resulting in zeroing the buffer.
|
|
Implement the pipe_rasterizer_state::sprite_coord_enable field
in the draw module (and softpipe) according to what's specified
in the documentation.
The draw module can now add any number of extra vertex attributes
to a post-transformed vertex and generate texcoords for those
attributes per sprite_coord_enable. Auto-generated texcoords
for sprites only worked for one texcoord unit before.
The frag shader gl_PointCoord input is now implemented like any
other generic/texcoord attribute.
The draw module now needs to be informed about fragment shaders
since we need to look at the fragment shader's inputs to know
which ones need auto-generated texcoords.
Only softpipe has been updated so far.
|
|
Fixes glean texture_srgb test.
|
|
Fixes fd.o bug 30245
|
|
Basically, change the loop from:
do {...} while (--num_inputs != 0)
into:
while (num_inputs != 0) { ... --num_inputs; }
Fixes fd.o bug 29987.
|
|
|
|
|
|
Prevents crashes with bogus data, or bad shader translation.
|
|
|
|
We're actually doing a double swizzling:
indirect_reg->Swizzle[indirect_reg->SwizzleX]
instead of simply
indirect_reg->SwizzleX
|
|
|
|
|
|
The permutation vector must always be a vector of int32 values.
|
|
|
|
|
|
...and all texture targets (1D/2D/3D/CUBE).
|
|
The shadow versions of the texture targets use an extra component
(Z) to express distance from light source to the fragment.
Fixes the shadowtex demo with llvmpipe.
|
|
|
|
|
|
Changes in v3:
- Also change trace, which I forgot about
Changes in v2:
- No longer adds tessellation shaders
Currently each shader cap has FS and VS versions.
However, we want a version of them for geometry, tessellation control,
and tessellation evaluation shaders, and want to be able to easily
query a given cap type for a given shader stage.
Since having 5 duplicates of each shader cap is unmanageable, add
a new get_shader_param function that takes both a shader cap from a
new enum and a shader stage.
Drivers with non-unified shaders will first switch on the shader
and, within each case, switch on the cap.
Drivers with unified shaders instead first check whether the shader
is supported, and then switch on the cap.
MAX_CONST_BUFFERS is now per-stage.
The geometry shader cap is removed in favor of checking whether the
limit of geometry shader instructions is greater than 0, which is also
used for tessellation shaders.
WARNING: all drivers changed and compiled but only nvfx tested
|
|
|
|
If the buffer we are attempting to map is referenced by the unsubmitted
command stream for this context, we need to flush the command stream,
however to do that we need to be able to access the context at the lowest
level map function, currently we set the buffer in the toplevel map, but this
racy between context. (we probably have a lot more issues than that.)
I'll look into a proper solution as suggested by jrfonseca when I get some time.
|
|
This is erroneously throwing non plain formats out of the faster
AoS sampling path.
Doing 8bit interpolation for single channels such as L8 should be no
worse than with floating point. But this may need more investigation.
|