Age | Commit message (Collapse) | Author |
|
we used to create and cache unltimited number of variant, this
change limits the number of variants kept around to a fixed number.
the change is based on a similar patch by Roland for llvmpipe fragment
shaders.
|
|
|
|
|
|
|
|
|
|
|
|
Note that this also requires X86 for llvm, if llvmpipe/draw_llvm works
on PPC then the condition should be extended to include && x86.
Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
|
|
This gives a ~30% shader optimization time improvement on blender.
Tested by comparing the dumped LLVM modules.
Current ordering:
time ~/llvm-git/obj/Release-Asserts/bin/opt l.bc -constprop -instcombine
-mem2reg -gvn -simplifycfg
real 0m1.126s
user 0m1.108s
sys 0m0.012s
With this patch:
time ~/llvm-git/obj/Release-Asserts/bin/opt l.bc -mem2reg -constprop -instcombine -gvn -simplifycfg
real 0m0.885s
user 0m0.880s
sys 0m0.000s
The overall improvement in blender is ~15%.
Blender without the patch takes 1m13s:
edwin 5934 87.6 11.5 729440 458296 pts/5 SLl+ 17:35 1:13 blender
Blender with the patch takes 1m3s:
edwin 5726 94.2 11.2 716424 446168 pts/5 SLl+ 17:32 1:03 blender
It is still slow with the patch, but better (most of the optimization time is
taken up by GVN, see LLVM PR7023).
Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
|
|
|
|
if fetch_count % 4 != 0 then on the last iteration we fetch garbage.
this patch makes sure we stay within bounds
|
|
we were only running the llvm paths when the input elts were linear,
now we can handle abritrary fetch elts arrays. we do this by generating
two paths - linear and fetch_elts one and just selecting the right one
at run time.
|
|
Everybody should respect max_index, specially llvm generated code, which
likes to eat vertices 4 at a time, so it may end up chew a bit a bit more
than actually exists.
|
|
|
|
a bit more involved than indirect addressing over consts, but still
fairly reasonable. we allocate an array instead of individual alloca's,
and we do it only if the shader does indirect addressing.
|
|
|
|
use just one constructor to figure out whether to use llvm.
|
|
|
|
|
|
|
|
|
|
our code resets pipe_vertex_buffer's with different offsets when rendering
vbo, meaning that we kept creating insane number of shaders even for simple
apps e.g. geartrain had 54 shaders and it was taking almost 27 seconds just to
compile them. this patch passes pipe_vertex_buffer's to the jit function and lets
it to the stride/buffer_offset computation at run time. the slowdown at runtime
is largely unnoticable but the we go from 54 shaders to 3, and from 27 seconds to less
than 1.
|
|
|
|
|
|
|
|
|
|
we were trying to store the outputs starting at the same offset we
were using for the input arrays, which was writing beyond the end of
the output array.
|
|
it's whatever the var step is (4 usually) not an unconditional 1
|
|
we don't index within the outputs but only within the inputs
|
|
the loop was doing a NE comparison which we could have skipped if the prim
was triangles (3 verts) and our step was 4 verts. also fix offsets in conversion
to aos.
|
|
change the name to not clash and accuretly represent the number of inputs
we store in the data member
|
|
there's no good way of aligning the output's, and since the vertex_header
is variable sized in the first place we need to extract elements from a vector
and store them individually into an array. this gets the basic examples working
again
|
|
|
|
fetching was converting garbage
|
|
|
|
|
|
the vs_type selection isn't ideal, but for now both llvmpipe's fs and vs
do the same thing which is operate on 4xfloat vector as the base type
|
|
the from translation isn't quite right yet
|
|
the values passed are still not right, but the general scheme
is looking good.
|
|
code generate big chunks of the vertex pipeline in order to speed up
software vertex processing.
|