Age | Commit message | Author |
|
The original GLSL compiler would generate a.x * b.x + a.y * b.y, for
which we would emit mul+mul+add instead of this mul+mac.
Fixes glsl-fs-dot-vec2.
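As a rough illustration of the mul+mac sequence described above, here is a minimal C sketch. The function name dot2_mul_mac is hypothetical and is not part of the driver; it only mirrors the arithmetic, one multiply followed by one multiply-accumulate.

```c
/* Hypothetical helper, not driver code: a two-component dot product
 * computed as one multiply followed by one multiply-accumulate. */
static float dot2_mul_mac(float ax, float ay, float bx, float by)
{
    float acc = ax * bx;   /* MUL: acc  = a.x * b.x */
    acc += ay * by;        /* MAC: acc += a.y * b.y */
    return acc;
}
```

The mul+mul+add form needs a third instruction and a second temporary for the same result.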
|
|
The old compiler didn't use SSG, and instead emitted SGT/SGT/SUB. We
can do a little better for SSG than we do for the SGT series.
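For reference, the SGT/SGT/SUB expansion mentioned above can be sketched in C as follows. The helper names sgt and ssg_via_sgt are hypothetical; the sketch assumes SGT yields 1.0 when its first operand is greater than the second and 0.0 otherwise, and it shows only the arithmetic, not the instructions the driver emits.

```c
/* Hypothetical model of SGT: 1.0 if a > b, else 0.0. */
static float sgt(float a, float b)
{
    return a > b ? 1.0f : 0.0f;
}

/* Sign of x built from two SGTs and a SUB: (x > 0) - (0 > x). */
static float ssg_via_sgt(float x)
{
    float pos = sgt(x, 0.0f);   /* SGT: 1.0 when x is positive */
    float neg = sgt(0.0f, x);   /* SGT: 1.0 when x is negative */
    return pos - neg;           /* SUB: yields -1.0, 0.0, or 1.0 */
}
```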
|
|
|
|
This reverts commit a9ee95651131e27d5acf3d10909b5b7e5c8d3e92.
It was based on a failure to understand how the driver allocates
memory, and causes a regression with Celestia.
Set MaxLevel to dstLevel before allocating new mipmap level.
The radeon driver will fail to allocate space for a new level that
is outside of BaseLevel..MaxLevel. Set MaxLevel before allocating.
Signed-off-by: Maciej Cencora <m.cencora@gmail.com>
|
|
Fixes segfault in mipmap_view.c demo. Bug #27212.
|
|
|
|
|
|
|
|
|
|
When building OSMesa and xlib GL, the resulting OSMesa would be linked
against libGL instead of the internal mesa libraries. However, when
building with -fvisibility=hidden, some of the internal functions used
in OSMesa could not be resolved through libGL.
Instead, always build OSMesa standalone without linking against libGL.
This has the advantage that OSMesa is always built the same way, but it
means that disk space is wasted when libGL is installed since both
libraries will contain the internal objects.
Signed-off-by: Dan Nicholson <dbn.lists@gmail.com>
Tested-by: Tom Fogal <tfogal@alumni.unh.edu>
|
|
|
|
I broke this with the state streaming changes.
|
|
|
|
BOs are stored in the bufmgr, which is freed as part of the screen
structure.
|
|
|
|
|
|
|
|
|
|
|
|
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 31.791 32.287 1.11% 6/6
after:
[ 0] gl firefox-talos-gfx 31.198 31.675 0.96% 6/6
|
|
|
|
|
|
|
|
|
|
|
|
|
|
e.g. for(i=10; i>0; i--)
|
|
This only works with for loops that increment the counter.
e.g. for(i=0; i<10; i++)
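A minimal sketch of how the iteration count of such an incrementing loop might be derived, assuming constant bounds and a positive constant step. The function trip_count_incrementing is a hypothetical illustration and is not taken from the driver.

```c
/* Hypothetical helper: number of iterations of
 *   for (i = init; i < end; i += step)
 * Returns 0 when the loop body would never run or step is not positive. */
static int trip_count_incrementing(int init, int end, int step)
{
    if (step <= 0 || init >= end)
        return 0;
    /* Round up so a partial final stride still counts as an iteration. */
    return (end - init + step - 1) / step;
}
```

With the example above, trip_count_incrementing(0, 10, 1) gives 10 iterations.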
|
|
The loop emulation unrolls loops as many times as possible while still
keeping the shader program below the maximum instruction limit. At this
point, there are no checks for constant conditionals. This is only enabled
for fragment shaders.
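One way to picture the unroll budget described above is the following sketch. It assumes, purely for illustration, that the cost of a loop is its body size times the unroll count plus the instructions outside the loop; the function max_unroll_count and its parameters are hypothetical and do not reflect the actual compiler code.

```c
/* Hypothetical helper: the largest number of copies of a loop body that
 * still fits under the instruction limit.
 *   max_insts     - maximum instructions the hardware accepts
 *   insts_outside - instructions in the shader outside the loop
 *   body_insts    - instructions in one copy of the loop body */
static unsigned max_unroll_count(unsigned max_insts,
                                 unsigned insts_outside,
                                 unsigned body_insts)
{
    if (body_insts == 0 || insts_outside >= max_insts)
        return 0;   /* nothing to unroll, or no room left */
    return (max_insts - insts_outside) / body_insts;
}
```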
|
|
We had to fill out all that junk when using the cache, but no more.
|
|
|
|
This makes the binding table code simpler, and is required for gen6,
which requires binding table addresses to be under 64k offset from the
surface state base addr.
No significant change in performance on firefox-talos-gfx.
|
|
Now that the binding table is streamed indirect state, they were
always NULL/0.
|
|
|
|
It turns out that computing a 56 byte key to look up a 20-byte object
out of a hash table was some sort of a bad idea. Whoops.
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
after:
[ 0] gl firefox-talos-gfx 34.761 34.784 0.17% 5/6
|
|
This slightly reduces cairo-gl firefox-talos-gfx runtime on my
Ironlake:
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 38.236 38.383 0.43% 5/6
after:
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
It turns out the cost of caching these objects and looking them up in
the cache again is greater than the cost of just computing the object
again, particularly when the overhead of having a separate BO to pin
is removed.
(Those that are paying close attention will note that this is a
reversal of the path I was moving the driver in a couple of years ago.
The major thing that has changed is that back then all state was
recomputed when we wrapped the streaming state buffer, including
recompiling our precious programs. Now, we're uncaching just the
objects that are cheap to compute, and retaining caching of expensive
objects)
|
|
This was bothering me when redoing the binding tables.
|
|
|
|
The cache lookup of these two little floats was .12% of total CPU time
on firefox-talos-gfx because we did it any time commonly-changed state
changed. On the other hand, updating the CC VP bo immediately whenever
CC VP state changes is a .07% overhead due to putting a driver hook
in glEnable().
|
|
|
|
It's more likely that we wrap badly in state setup than in the little
primitive packet.
|
|
It just duplicated the default/core Mesa behaviour.
|
|
|
|
|
|
This avoids many pipeline stalls in cairo-gl.
[ # ] backend test min(s) median(s) stddev. count
Before:
[ 0] gl firefox-talos-gfx 36.799 36.851 2.34% 3/3
[ 0] gl firefox-talos-svg 33.429 35.360 3.46% 3/3
After:
[ 0] gl firefox-talos-gfx 35.895 36.250 0.48% 3/3
[ 0] gl firefox-talos-svg 26.669 29.888 5.34% 3/3
This doesn't avoid all the pipeline stalls because the kernel reports
!busy for buffers on the flushing list. That should be fixed in .36.
|
|
In exchange we end up with an extra memcpy, but that seems better than
calloc/free. Each buffer is 4k maximum, and on the i965-streaming
branch this allocation was showing up as the top entry in
brw_validate_state profiling for cairo-gl.
|
|
There were entries to this function (most importantly, prepare_render
-> update_renderbuffers) that wouldn't have had NEW_BUFFERS set, but
brw_wm_surface_state (the i965 state tracking the drawing regions)
expected this to change.
|
|
The new API makes so much more sense, I'd like to forget how the old
one worked.
|
|
The slightly less mechanical change of converting the emit_reloc calls
will follow.
|
|
|