Age | Commit message (Collapse) | Author |
|
This patch cleans up many of the extra copies in GLSL IR introduced by
i965's scalarizing passes. It doesn't result in a statistically
significant performance difference on nexuiz high settings (n=3) or my
demo (n=10), due to brw_fs.cpp's register coalescing covering most of
those extra moves anyway. However, it does make the debug of wine's
GLSL shaders much more tractable, and reduces instruction count of
glsl-fs-convolution-2 from 376 to 288.
|
|
|
|
|
|
|
|
|
|
Python is already necessary for other parts of Mesa, so there's no
reason we can't just generate it. This patch updates both make and
SCons to do so.
|
|
We always want to use '.' as the decimal point.
See http://bugs.freedesktop.org/show_bug.cgi?id=24531
NOTE: this is a candidate for the 7.10 branch.
|
|
This should allow lower_if_to_cond_assign to work in the presence of
discards, fixing bug #31690 and likely #31983.
NOTE: This is a candidate for the 7.9 branch.
|
|
NOTE: This is a candidate for the 7.9 branch.
|
|
This should save on the overhead of tree-walking and provide a
convenient place to add more instruction lowering in the future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
The vector operator collects 2, 3, or 4 scalar components into a
vector. Doing this has several advantages. First, it will make
ud-chain tracking for components of vectors much easier. Second, a
later optimization pass could collect scalars into vectors to allow
generation of SWZ instructions (or similar as operands to other
instructions on R200 and i915). It also enables an easy way to
generate IR for SWZ instructions in the ARB_vertex_program assembler.
|
|
This helps distinguish between lowering passes, optimization passes, and
other compiler code.
|
|
First, it changes autoconf to use a "python2" binary when available,
rather than plain "python" (which is ambiguous). Secondly, it changes
the Makefiles to use $(PYTHON) $(PYTHON_FLAGS) rather than calling
python directly.
Signed-off-by: Xavier Chantry <chantry.xavier@gmail.com>
Signed-off-by: Matthew William Cox <matt@mattcox.ca>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
Currenly GLSL happily generates indirect addressing of any kind of
arrays.
Unfortunately DirectX 9 GPUs are not guaranteed to support any of them in
general.
This pass fixes that by lowering such constructs to a binary search on the
values, followed at the end by vectorized generation of equality masks, and
4 conditional assignments for each mask generation.
Note that this requires the ir_binop_equal change so that we can emit SEQ
to generate the boolean masks.
Unfortunately, ir_structure_splitting is too dumb to turn the resulting
constant array references to individual variables, so this will need to
be added too before this pass can actually be effective for temps.
Several patches in the glsl2-lower-variable-indexing were squashed
into this commit. These patches fix bugs in Luca's original
implementation, and the individual patches can be seen in that branch.
This was done to aid bisecting in the future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
|
|
Changes in v2:
- Base class renamed to ir_control_flow_visitor
- Tried to comply with coding style
This is a new pass that supersedes ir_if_return and "lowers" jumps
to if/else structures.
Currently it causes no regressions on softpipe and nv40, but I'm not sure
whether the piglit glsl tests are thorough enough, so consider this
experimental.
It can be asked to:
1. Pull jumps out of ifs where possible
2. Remove all "continue"s, replacing them with an "execute flag"
3. Replace all "break" with a single conditional one at the end of the loop
4. Replace all "return"s with a single return at the end of the function,
for the main function and/or other functions
This gives several great benefits:
1. All functions can be inlined after this pass
2. nv40 and other pre-DX10 chips without "continue" can be supported
3. nv30 and other pre-DX10 chips with no control flow at all are better supported
Note that for full effect we should also teach the unroller to unroll
loops with a fixed maximum number of iterations but with the canonical
conditional "break" that this pass will insert if asked to.
Continues are lowered by adding a per-loop "execute flag", initialized to
TRUE, that when cleared inhibits all execution until the end of the loop.
Breaks are lowered to continues, plus setting a "break flag" that is checked
at the end of the loop, and trigger the unique "break".
Returns are lowered to breaks/continues, plus adding a "return flag" that
causes loops to break again out of their enclosing loops until all the
loops are exited: then the "execute flag" logic will ignore everything
until the end of the function.
Note that "continue" and "return" can also be implemented by adding
a dummy loop and using break.
However, this is bad for hardware with limited nesting depth, and
prevents further optimization, and thus is not currently performed.
|
|
|
|
|
|
This is the next step on the road to loop unrolling
|
|
This is the first step eventually leading to loop unrolling.
|
|
As of 1.20, variable names, function names, and structure type names all
share a single namespace, and should conflict with one another in the
same scope, or hide each other in nested scopes.
However, in 1.10, variables and functions can share the same name in the
same scope. Structure types, however, conflict with/hide both.
Fixes piglit tests redeclaration-06.vert, redeclaration-11.vert,
redeclaration-19.vert, and struct-05.vert.
|
|
this has make clean remove all the objects.
|
|
I had used pkg-config from the Makefile because I didn't want to screw
around with the non-autoconf build, but that doesn't work because the
PKG_CONFIG_PATH or TALLOC_LIBS/TALLOC_CFLAGS that people set at
configure time needs to be respected and may not be present at build
time.
Bug #29585
|
|
This copies over a dummy builtin_functions.cpp and rebuilds a
bootstrapped version of the compiler, then uses that to generate the
proper list of builtins. Finally, it rebuilds the compiler with the new
list.
Unfortunately, it's no longer automatic, but at least it works.
|
|
Each language version/extension and target now has a "profile" containing
all of the available builtin function prototypes. These are written in
GLSL, and come directly out of the GLSL spec (except for expanding genType).
A new builtins/ir/ folder contains the hand-written IR for each builtin,
regardless of what version includes it. Only those definitions that have
prototypes in the profile will be included.
The autogenerated IR for texture builtins is no longer written to disk,
so there's no longer any confusion as to what's hand-written or
generated.
All scripts are now in python instead of perl.
|
|
With the glsl2-965 branch, the optimization of glsl-algebraic-rcp-rcp
regressed due to noop swizzles hiding information from ir_algebraic.
This cleans up those noop swizzles for us.
|
|
I keep copy and pasting this code all over, so consolidate it in one
place.
|
|
Also remove the --never-interactive command line option for the
preprocessor lexer. This was already done for main compiler lexer.
|
|
Remove --never-interactive because it is already specified in the
source using %option. Use -o instead of --outfile. Some of the
%option commands may also need to be removed for compatibility with
older versions (e.g., 2.5.4) of flex.
This should fix bugzilla #29209.
|
|
Bison version 2.3 doesn't seem to support %name-prefix in the source.
This should fix bugzilla #29207.
|
|
All the current HW backends transform subtract to adding the negation,
so I haven't bothered peepholing it back out in Mesa IR. This allows
some subtract of subtract to get removed in ir_algebraic.
|
|
Whereas constant folding evaluates constant expressions at rvalue
nodes, constant propagation tracks constant components of vectors
across execution to replace (possibly swizzled) variable dereferences
with constant values, triggering possible constant folding or reduced
variable liveness.
|
|
This lets us handle arrays much better than trying to work backwards
from assembly.
Fixes fbo-drawbuffers-maxtargets on swrast (i965 needs loop unrolling)
|
|
Fixes ir_to_mesa handling of unop_log, which used the weird ARB_vp LOG
opcode that doesn't do what we want. This also lets the multiplication
coefficients in there get constant-folded, possibly.
Fixes:
glsl-fs-log
|
|
This doesn't do anything if your structure goes through an uninlined
function call or if whole-structure assignment occurs. As such, the
impact is limited, at least until we do some global copy propagation
to reduce whole-structure assignment.
|
|
For a shader involving many small functions, this avoids running
optimization across all of them after they've been inlined
post-linking.
Reduces the runtime of linking and running a fragment shader from Yo
Frankie from 1.6 seconds to 0.9 seconds (-44.9%, +/- 3.3%).
|
|
Calling exit() on a memory failure probably made sense for the
standalone preprocessor, but doesn't seem too appealing as part of
the GL library. Also, we don't use it in the main compiler.
|
|
|
|
|
|
|
|
Otherwise, we lose DEBUG, which causes mtypes.h to set NDEBUG, which
causes assertions to not happen, which is no fun for anyone.
|
|
This cleans up the assembly output of almost all the non-logic tests
glsl-algebraic-*. glsl-algebraic-pow-two needs love (basically,
flattening to a temporary and squaring it).
|
|
This pulls in multiple i965 driver fixes which will help ensure better
testing coverage during development, and also gets past the conflicts
of the src/mesa/shader -> src/mesa/program move.
Conflicts:
src/mesa/Makefile
src/mesa/main/shaderapi.c
src/mesa/main/shaderobj.h
|
|
Fixes:
draw_buffers-08.frag
draw_buffers-09.frag
glsl-vs-texturematrix-2
|
|
|
|
Coming changes to the handling of built-in functions necessitate this.
|
|
This is convenient for testing the preprocessor independent of the rest of
mesa, (just run glcpp-test in the src/glsl/glcpp/tests).
|
|
This handles the easy case of linking a function in a different
compilation unit that doesn't call any functions or reference any
global variables.
|
|
This will be used on 915 and similar hardware of that generation.
|