Age | Commit message (Collapse) | Author |
|
Src/Dst aliasing (aka SOA dependencies) requires some care to ensure
intermediate results do not overwrite yet-to-be read source registers.
This change ensures that MOV/SWZ handle this correctly, which is poor but
no worse than the current tgsi_exec.c path. Remove the fallback as there
is nothing to be gained correctness-wise between the two implementations now.
Fixing this properly looks like a bit of work in this code, but might be
easily achieved by sending destination writes to temporary storage.
|
|
Fix recent performance regression.
|
|
|
|
Previously ureg would always call the driver's create-shader function. This
allows the caller the opportunity to hold onto the tokens if it needs to
reuse them, eg. to create an internal draw shader.
|
|
Couldn't previously emit these except by calling the opcode-specific helper.
|
|
Avoid the need to emit all constant declarations in order. Makes
referring to a specific constant in the constant buffer much easier.
|
|
Fix ureg_DECL_vs_input to reflect this and fix up all callers.
|
|
|
|
|
|
|
|
This fixes some issues when "return"ing from nested loops/conditionals.
|
|
|
|
Manual merge of ureg changes on the branch. Too much unrelated stuff
for a proper merge.
|
|
Can be implemented with CMP src2, src1, src0
|
|
Shorthand.
(cherry picked from commit de911220bbbe74cff0c79b260456ff36122b7b5b)
|
|
Simplifies migration to tgsi_ureg.
(cherry picked from commit f574398c07c41cb8d31249a7186fc178ef7d552a)
|
|
When translating an incoming shader (rather than building one from scratch)
it's preferable to be able to call a single, generic instruction emitter
rather than figuring out which of the opcode-specific functions to call.
|
|
|
|
Fall back to interpreter for now. This doesn't happen very often.
|
|
SOA dependencies can happen when a register is used both as a source and
destination and the source is swizzled. For example:
MOV T, T.yxwz; would expand into:
MOV t0, t1;
MOV t1, t0;
MOV t2, t3;
MOV t3, t2;
The second instruction will produce the wrong result since we wrote to t0
in the first instruction. We need to use an intermediate temporary to fix
this.
This will take more work to fix for all TGSI instructions. This seems to
happen with MOV instructions more than anything else so fix that case now
and warn on others.
Fixes piglit glsl-vs-loop test (when not using SSE). See bug 23317.
|
|
Users of the parser can make use of this.
|
|
(cherry picked from commit d2787c02c130b1fe20d0c032d468622f2fdaef79)
|
|
(cherry picked from commit aa40c9abc7787fdf46cb661a4d0bb8bec513fc63)
|
|
|
|
|
|
Could previously emit opcodes with label arguments, but was no way to
patch them with the actual destinations of those labels.
Adds two functions:
ureg_get_instruction_number - to get the id of the next instruction
to be emitted
ureg_fixup_label - to patch an emitted label to point to a given
instruction number.
Need some more complex examples than u_simple_shader, so far this has
only been compile-tested.
|
|
|
|
|
|
Fixes piglit fp-generic tests/shaders/generic/lrp_sat.fp, bug 23316.
|
|
This fixes invalid values for CondStackTop, LoopStackTop, etc.
|
|
|
|
Also fix a typo in ureg_src().
|
|
|
|
|
|
This is modelled on the nice & easy-to-use facilities we had
for building shaders in mesa, eg. in texenvprogram.c and friends.
Key points include pass-by-value register structs that can be manipulated
in a functional style, eg:
negate(swizzle(reg, X,X,X,X))
and per-opcode instruction functions, eg:
emit_MOV( p, writemask(dst, 0x1), negate(src));
and similar.
Additionally, the interface allows mixed emit of instructions and decls,
which are sorted out internally to obey TGSI ordering.
Immediates may be emitted at any time and are scanned against existing
immediates to try and reduce redundancy.
Not all TGSI functionality is accessible through this interface, but
most or all of what mesa uses should be.
|
|
|
|
|
|
|
|
The LOOP/ENDLOOP pair is renamed to BGNFOR/ENDFOR as its behaviour
is similar to a C language for-loop.
The BGNLOOP2/ENDLOOP2 pair is renamed to BGNLOOP/ENDLOOP as now
there is no name collision.
|
|
|
|
The only valid usage for LOOP/ENDLOOP instructions
is LOOP[0] as a destination register.
The only valid usage for the remaining instructions
is LOOP[0].x as an indirect register.
|
|
|
|
|
|
When sampling a 2D shadow map we need 3 texcoord components, not 2.
The third component (distance from light source) is compared against
the texture sample to return the result (visible vs. occluded).
Also, enable proper handling of TGSI_TEXTURE_SHADOW targets in Mesa->TGSI
translation. There's a possibility for breakage in gallium drivers if
they fail to handle the TGSI_TEXTURE_SHADOW1D / TGSI_TEXTURE_SHADOW2D /
TGSI_TEXTURE_SHADOWRECT texture targets for TGSI_OPCODE_TEX/TXP instructions,
but that should be easy to fix.
With these changes, progs/demos/shadowtex.c renders properly again with
softpipe.
|
|
Various opcodes which can be implemented trivially with other TGSI opcodes,
such as matrix multiplication and negation. These were not used by any
state tracker or implemented by any of the drivers.
|
|
|
|
This is a source of ongoing confusion. TGSI has multiple names for
opcodes where the same semantics originate in multiple shader APIs.
For instance, TGSI includes both Mesa/GLSL and DX/SM30 names for
opcodes with the same semantics, but aliases those names to the same
underlying opcode number.
This makes it very difficult to visually inspect two sets of opcodes
(eg in state tracker & driver) and check if they implement the same
functionality.
This patch arbitarily rips out the versions of the opcodes not currently
favoured by the mesa state tracker and leaves us with a single name
for each distinct operation.
|
|
Remove the need to have a pointer in this struct by just including
the immediate data inline. Having a pointer in the struct introduces
complications like needing to alloc/free the data pointed to, uncertainty
about who owns the data, etc. There doesn't seem to be a need for it,
and it is unlikely to make much difference plus or minus to performance.
Added some asserts as we now will trip up on immediates with more
than four elements. There were actually already quite a few such asserts,
but the >4 case could be used in the future to specify indexable immediate
ranges, such as lookup tables.
|
|
|
|
This function was calling get_input_base() and get_output_base() to
get the names of a couple of register to use as temps. Those
functions no longer return registers, so adjust it to get the
registers elsewhere.
This change doesn't address the issue that it's a fairly poor way to
grab a register name by calling a function with an apparently
unrelated meaning.
|