Age | Commit message | Author |
|
These haven't been used by the mesa state tracker since the
conversion to tgsi_ureg, and it seems that none of the
other state trackers are using them either.
This helps simplify one of the biggest surprises when starting off with
TGSI shaders.
|
|
Allow indirect uniform access and increase the
limit on parameters from 128 to 512.
|
|
|
|
|
|
We only have a per-nv50_reg negation flag, so if an
nv50_reg is used more than once in a TGSI op with
different sign modes, we'd generate wrong code.
We probably can't do much better without more
invasive changes.
|
|
|
|
If you e.g. only need alpha, it ends up in the first reg,
not the last, as it would when reading rgb too.
|
|
|
|
Separated the integer rounding mode flag for cvt.
|
|
There's a good chance a loop won't execute correctly,
though, since our TEMP allocation assumes programs are
executed linearly. Will fix later.
|
|
|
|
When swapping sources 0 and 1, EQ of course does *not*
become NE, etc.
Introduced in 2b963f5c723401aa2646bd48eefe065cd335e280.
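
As an illustration of the rule above, here is a minimal sketch of how a
condition code changes when the two compared sources are exchanged. The
enum and the function name are hypothetical and do not reflect the
driver's real encoding; only the ordering comparisons flip, while EQ and
NE are symmetric and stay as they are.

```c
#include <assert.h>

/* Hypothetical condition codes, for illustration only. */
enum cc { CC_LT, CC_LE, CC_GT, CC_GE, CC_EQ, CC_NE };

/* Return the condition c' such that (a c b) == (b c' a). */
enum cc cc_swap_operands(enum cc c)
{
    switch (c) {
    case CC_LT: return CC_GT;
    case CC_LE: return CC_GE;
    case CC_GT: return CC_LT;
    case CC_GE: return CC_LE;
    case CC_EQ: return CC_EQ; /* symmetric: unchanged by the swap */
    case CC_NE: return CC_NE; /* symmetric: unchanged by the swap */
    }
    assert(!"unknown condition code");
    return c;
}
```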
|
|
Allocation is unnecessary since all uniforms are
uploaded on every constant buffer change anyway.
|
|
|
|
|
|
|
|
This moves construction of the mapping between VP outputs
and FP inputs into validation.
The map also contains slots for special outputs like clip
distance and point size, so we need to at least merge the
VP related and FP related parts on validation if we want
to support those.
Now we match every single FP input component with results
from the VP and leave those not read out of the map, or
replace those not written by 0 (xyz) or 1 (w).
The bitmap indicating linear interpolants is also filled,
and flat FP inputs are mapped in only after non-flat ones,
as is required.
Furthermore, we can save some space by only fetching VP
attrs we actually use, and avoid wasting any output regs
because of TGSI using less than 4 components.
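
The per-component matching described above can be sketched roughly as
follows. This is not the driver's code: the enum, the function name and
the 4-bit mask layout (bit 0 = x, bit 3 = w) are assumptions made for the
example. Components the FP does not read are left out of the map, and
components read but not written by the VP get 0 (xyz) or 1 (w).

```c
#include <assert.h>

/* Hypothetical per-component sources, for illustration only. */
enum comp_src { SRC_UNMAPPED, SRC_VP_RESULT, SRC_CONST_0, SRC_CONST_1 };

/* fp_read / vp_written: 4-bit masks, bit 0 = x ... bit 3 = w (assumed). */
void match_components(unsigned fp_read, unsigned vp_written,
                      enum comp_src out[4])
{
    assert(fp_read <= 0xf && vp_written <= 0xf);

    for (int c = 0; c < 4; c++) {
        unsigned bit = 1u << c;

        if (!(fp_read & bit))
            out[c] = SRC_UNMAPPED;   /* not read: left out of the map */
        else if (vp_written & bit)
            out[c] = SRC_VP_RESULT;  /* read and written: use VP result */
        else
            out[c] = (c == 3) ? SRC_CONST_1 : SRC_CONST_0;
    }
}
```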
|
|
Make use of tgsi_shader_info to determine how many nv50_regs we
need to allocate, and whether the program uses KIL or writes DEPR.
|
|
|
|
|
|
|
|
|
|
Makes some opcode cases nicer and might reduce the total
number of TEMPs required, or save some MOVs.
|
|
|
|
We're going to try to reorder the scalar ops of a vector instr
to accommodate swizzles that would otherwise require us to emit
to an additional TEMP first (like MOV R0.xy, R0.zx).
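
One way such a reordering could work is sketched below, assuming the
source and destination are the same register and each written component
d reads component swz[d]. The function name and data layout are
hypothetical. An op is emitted only once no other pending op still needs
the old value of the component it overwrites; if no such op exists (for
example MOV R0.xy, R0.yx), the sketch reports that a TEMP is needed.

```c
#include <assert.h>

/*
 * mask:  4-bit write mask, bit 0 = x ... bit 3 = w.
 * swz:   swz[d] is the source component read by the op writing d.
 * order: receives the component indices in a safe emission order.
 * Returns the number of ops ordered, or -1 if no safe order exists.
 */
int order_scalar_ops(unsigned mask, const int swz[4], int order[4])
{
    unsigned pending = mask & 0xf;
    int n = 0;

    for (int d = 0; d < 4; d++)
        assert(swz[d] >= 0 && swz[d] < 4);

    while (pending) {
        int picked = -1;

        for (int d = 0; d < 4 && picked < 0; d++) {
            int blocked = 0;

            if (!(pending & (1u << d)))
                continue;
            /* Writing d is unsafe while another pending op reads d. */
            for (int e = 0; e < 4; e++)
                if (e != d && (pending & (1u << e)) && swz[e] == d)
                    blocked = 1;
            if (!blocked)
                picked = d;
        }
        if (picked < 0)
            return -1; /* cyclic dependency: an extra TEMP is needed */

        order[n++] = picked;
        pending &= ~(1u << picked);
    }
    return n;
}
```

For MOV R0.xy, R0.zx this writes y first (it reads the old x), then x.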
|
|
Extend its usage to avoiding e.g. emission of negation
instructions in tx_insn for sources we don't need.
|
|
Before this, just the perspective divide bit was moved in
convert_to_long of the load interpolant instruction.
|
|
|
|
|
|
The TEX instruction is passed the first index of a contiguous
range of 4 TEMP registers that contain coordinates / LOD and,
after execution, the texel values.
It seems the first index is required to be a multiple of 4 on
some (older?) cards.
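
A minimal sketch of the alignment this implies: round a candidate TEMP
index up to the next multiple of 4 before using it as the base of the
4-register range. The function name is hypothetical.

```c
#include <assert.h>

/* Round a TEMP index up to the next multiple of 4. */
unsigned tex_base_align(unsigned first_free)
{
    unsigned aligned = (first_free + 3u) & ~3u;

    assert((aligned & 3u) == 0);
    return aligned;
}
```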
|
|
|
|
Remove the need to have a pointer in this struct by just including
the immediate data inline. Having a pointer in the struct introduces
complications like needing to alloc/free the data pointed to, uncertainty
about who owns the data, etc. There doesn't seem to be a need for it,
and it is unlikely to make much difference plus or minus to performance.
Added some asserts as we now will trip up on immediates with more
than four elements. There were actually already quite a few such asserts,
but the >4 case could be used in the future to specify indexable immediate
ranges, such as lookup tables.
|
|
|
|
|
|
libdrm_nouveau is linked with the winsys, there's no good reason to do all
this through yet another layer.
|
|
|
|
|
|
This makes some code cleaner, and we can now easily
do CEIL and TRUNC.
|
|
For TXP we need to divide texture coords by their w component, or
use the coords' 1/w in the perspective interpolation instruction.
This also tries to support 1D, 3D and CUBE textures, and lets the
instruction only load the components that are used.
|
|
Use different buffers for immds, FP params, and VP params.
One has to map constant buffer indices in shader code to buffers
defined via CB_DEF. In principle, we could use more buffers so
we'd have to change the shader code less frequently.
|
|
Since we stopped using alloc_temp to get hw indices for FP attrs
there shouldn't be any non-deallocated temps left.
|
|
Since we know when we don't use a TEMP or FP ATTR register anymore,
we can release their hw resources early.
|
|
Immediates are inlined now where possible, so we need to set
pc->allow32 to FALSE in LIT where we have the conditional MOV,
since immediates swallow the predicate bits.
|
|
|
|
I chose to just convert unpaired 32 bit length instructions
after parsing all instructions, although it might be possible
to determine beforehand whether there would be any lone ones,
and then even do some swapping to bring them together ...
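
The post-parse pass could look roughly like this sketch. It assumes
that a 32 bit instruction counts as paired only when it is immediately
followed by another 32 bit instruction; that pairing rule, the flag
array and the function name are assumptions for illustration, not the
driver's actual data structures.

```c
#include <assert.h>

/*
 * is_long[i] is nonzero for 64 bit instructions, zero for 32 bit ones.
 * Any 32 bit instruction without a 32 bit partner right after it is
 * marked for conversion to the long form.
 */
void convert_lone_shorts(int is_long[], int n)
{
    int i = 0;

    assert(n >= 0);
    while (i < n) {
        if (is_long[i]) {
            i++;                 /* already long, nothing to do */
        } else if (i + 1 < n && !is_long[i + 1]) {
            i += 2;              /* two shorts in a row form a pair */
        } else {
            is_long[i] = 1;      /* lone short: convert to long */
            i++;
        }
    }
}
```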
|
|
|
|
This would have happened in e.g. ADD TEMP[0], TEMP[0].xyxy, TEMP[1]
or RCP/RSQ TEMP[i], TEMP[i].
|
|
Depth output in fragment programs should end up in the first
register after the color outputs.
|
|
VP outputs that should be loadable in the FP are mapped to
interpolant indices by HPOS, COL0 etc.; of course HPOS is
always written, so the highest byte of 1988 is a bitmask that
selects which components of HPOS are used for interpolants,
i.e. the FP inputs in COL0 start at index POPCNT(1988[24:28]).
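
As a small worked example of the index computation, the sketch below
counts the set bits of the HPOS component mask. It reads 1988[24:28] as
the four bits 24 through 27, which is an interpretation of the notation
above rather than a documented fact; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* First interpolant index used by COL0, given the value of 1988. */
unsigned col0_first_interp(uint32_t reg1988)
{
    unsigned hpos_mask = (reg1988 >> 24) & 0xf; /* assumed bits 24..27 */
    unsigned n = 0;

    while (hpos_mask) {          /* population count of the mask */
        n += hpos_mask & 1u;
        hpos_mask >>= 1;
    }
    assert(n <= 4);
    return n;
}
```

With all four HPOS components selected, COL0 would start at index 4;
with none selected it would start at index 0.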
|
|
Record interpolation mode for attributes while parsing declarations,
and also remember the indices of FP color inputs and FP depth output,
which has to end up in the highest output register.
|