Age | Commit message (Collapse) | Author |
|
This is the new register generation toolkit in use by nouveau.
As far as I know, this is the best register description toolkit in
existence, and you should use it too for your hardware :)
Thanks to Marcin Kościelnicki for inventing it and performing
invaluable reverse engineering work of nVidia chips.
|
|
|
|
Thanks for Dave Airlie and Jerome Glisse for their code which made
me realize I need this too.
|
|
Hardware sets it to 0, so we add an ADD to put an 1 there if the
application really wants the alpha channel.
|
|
Completely untested, since Mesa apparently never uses this currently.
In particular, it might not work with scalar slot op.
|
|
The old swtnl code was broken by the new shader linkage support for
GLSL.
This is a rewrite of swtnl support, which should instead work properly,
be faster and more closer to the much more tested hardware pipeline.
|
|
|
|
|
|
|
|
Intuition != mathematics, so this time I actually worked out the right
formula for first order approximation of perspective interpolation.
Ironically, per quad divide actually makes things slower when compared
with per pixel divide -- probably because the divide hardware unit is
rarely used, whereas the multiply unit is typically already saturated
and the first order approximation imply more multiplications.
|
|
These are the non-trivial conversions that this function recognizes,
which was produced by u_format_compatible_test.c:
b8g8r8a8_unorm -> b8g8r8x8_unorm
a8r8g8b8_unorm -> x8r8g8b8_unorm
b5g5r5a1_unorm -> b5g5r5x1_unorm
b4g4r4a4_unorm -> b4g4r4x4_unorm
l8_unorm -> r8_unorm
i8_unorm -> l8_unorm
i8_unorm -> a8_unorm
i8_unorm -> r8_unorm
l16_unorm -> r16_unorm
z24_unorm_s8_uscaled -> z24x8_unorm
s8_uscaled_z24_unorm -> x8z24_unorm
r8g8b8a8_unorm -> r8g8b8x8_unorm
a8b8g8r8_srgb -> x8b8g8r8_srgb
b8g8r8a8_srgb -> b8g8r8x8_srgb
a8r8g8b8_srgb -> x8r8g8b8_srgb
a8b8g8r8_unorm -> x8b8g8r8_unorm
r10g10b10a2_uscaled -> r10g10b10x2_uscaled
r10sg10sb10sa2u_norm -> r10g10b10x2_snorm
State trackers and pipe drivers should be updated to take advantage of
this knowledge, e.g., in surface_copy.
|
|
Also, include the color buffer in the key. Not having it there
causes a tight knots in the logic to determine when it is OK or not
to discard previous color buffer contents.
|
|
color format.
|
|
It seems to be working correctly with gcc 4.4, and enabling it allows to
test some of the llvmpipe instrinsics on Windows.
|
|
Much more convenient than boolean arrays.
|
|
|
|
This extra validation is very useful when working on the built-ins, but
in general overkill - the results should stay the same unless the
built-ins or ir_validate have changed.
Also, validating all the built-in functions in every test case makes
piglit run unacceptably slow.
|
|
This should fix bogus reports "Too many temporaries." and maybe some others.
|
|
This fixes glsl-fs-loop-nested.
|
|
|
|
We might want to copy them as color ones though.
Also works around crash in Unigine Heaven due to failing to allocate
a 64 MB temporary in GART for a CPU copy.
Unigine Heaven now works on nv40, albeit with very heavy glitches (with
the floating branch with render_hdr 0).
|
|
Broken with commit d774b0c710bb7d833d17bd12f5151a0176baad96.
Reported by Chris Rankin.
|
|
|
|
|
|
|
|
Actually, we may want to get rid of the x/y coordinates for linear
surfaces, and realign the origin from scratch if necessary, instead
of doing this "on-demand realignment".
|
|
|
|
|
|
|
|
a temp."
This reverts commit 5ad74779cea07cc6a19a52874cdaef8b018e2f1b.
Sorry, but I had to revert this.
Any commit which needlessly increases the number of temporaries is wrong.
More temporaries mean less shader performance because of reduced parallelism
and therefore less efficient latency hiding. In this case, there is possible
performance degradation of every shader which uses GL state variables.
I cannot accept this.
|
|
This reverts commit 5cdedaaf295acae13ac10feeb3143d83bc53d314.
https://bugs.freedesktop.org/show_bug.cgi?id=30002
Conflicts:
src/gallium/drivers/r300/r300_texture.c
|
|
|
|
|
|
|
|
|
|
Register allocation can now reallocate temporaries right after the last indexed
source operand, instead of being disabled for the whole shader.
|
|
|
|
|
|
|
|
Those are:
- dead-code elimination
- constant folding
- peephole (mainly copy propagation)
- register allocation
There are some bugs which I need to track down.
Also fix up the descriptions of all the debug options.
|
|
And not during the register allocation, which may be skipped for debugging
purposes. Also the predicate register is now added to the number of temps.
|
|
|
|
|
|
We use RC_OPCODE_REPL_ALPHA instead.
|
|
This cleans up the mess in r3xx_compile_fragment_program.
|
|
|
|
First list compiler passes in an array, then run the new function rc_run_compiler.
Every backend may need a different set of passes.
This cleans up the mess in r3xx_compile_vertex_program.
|
|
|
|
&c->Base == c.
|
|
I need to reduce the number of parameters of each compiler pass function.
This is part of a larger cleanup.
|