Age | Commit message (Collapse) | Author |
|
We were previously calculating a value which was either the geometry
shader output primitive or the application's input primitive, and
passing that to the various front/middle/back components for use as
the ultimate rendering primtive.
Unfortunately, this was not correct -- if the vcache decomposition
path is active and geometry shaders are *not* active, we can end up
with a third primitive -- specifically the decomposed version of the
input primitive.
Rather than trying to precalculate this, just let the individual
components inform their successors about which primitive type they are
recieving.
|
|
register masks, multiple output buffers, multiple primitives,
non-linear vertices (elts) and stride semantics.
|
|
Keith came up with a new way of running the pipeline which involves passing
a few info structs around (for fetch, vertices and prims) and allows us
to correctly handle cases where we endup with multiple primitives generated
by the pipeline itself.
|
|
of vertices
lots and lots of fixes for geometry shaders. in particular now we work when the gs
emits a different primitive than the one the pipeline was started with and also
we work when gs emits more vertices than would fit in the original buffer.
|
|
interface wise we have everything needed by d3d10 and gl transform feedback.
the draw module misses implementation of some corner cases (e.g. when stream
output wants different number of components per output than normal rendering
paths)
|
|
|
|
Unneeded since we code generate the vertex fecthes.
|
|
|
|
we were only running the llvm paths when the input elts were linear,
now we can handle abritrary fetch elts arrays. we do this by generating
two paths - linear and fetch_elts one and just selecting the right one
at run time.
|
|
we were only using the jited function in the linear case, now drawelts
correctly uses the same path. it results in a significant gain in
real world apps (openarena went from 23fps to 29fps)
|
|
our code resets pipe_vertex_buffer's with different offsets when rendering
vbo, meaning that we kept creating insane number of shaders even for simple
apps e.g. geartrain had 54 shaders and it was taking almost 27 seconds just to
compile them. this patch passes pipe_vertex_buffer's to the jit function and lets
it to the stride/buffer_offset computation at run time. the slowdown at runtime
is largely unnoticable but the we go from 54 shaders to 3, and from 27 seconds to less
than 1.
|
|
|
|
|
|
|
|
we don't index within the outputs but only within the inputs
|
|
the loop was doing a NE comparison which we could have skipped if the prim
was triangles (3 verts) and our step was 4 verts. also fix offsets in conversion
to aos.
|
|
there's no good way of aligning the output's, and since the vertex_header
is variable sized in the first place we need to extract elements from a vector
and store them individually into an array. this gets the basic examples working
again
|
|
|
|
|
|
the values passed are still not right, but the general scheme
is looking good.
|
|
code generate big chunks of the vertex pipeline in order to speed up
software vertex processing.
|