Age | Commit message (Collapse) | Author |
|
Move the tracking of the last emitted instructions into the core
batchbuffer routines and take advantage of the shadow batch copy to
avoid extra memory allocations and copies.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
It's faster. Not only is the memcpy more efficiently performed in the
kernel (making up for the system call overhead), but by not using mmap
we remove the greater overhead of tracking the vma of every batch.
And it means we can read back from the batch buffer without incurring
the cost of a uncached read through the GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
As we use state relocations and we know that all the state belongs to
the same bo, we can drop the multiple references to the same bo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
|
|
Again, this makes it match the documentation.
|
|
This should make it easier to cross-reference the code and hardware
documentation, as well as clear up any confusion on whether constants
like CMD_3D_WM_STATE mean WM_STATE (pre-gen6) or 3DSTATE_WM (gen6+).
This does not rename any pre-gen6 defines.
|
|
This reverts commit de6fd527a545f8344e074312544517d05573fb72.
Revert this workaround as it seems the real trouble is caused by
lineloop, which doesn't require GS convert on sandybridge actually.
|
|
This makes conformance tests stable on sandybridge D0 to track
multisample state before SF/WM state.
|
|
|
|
|
|
|
|
|
|
This provides the optimizer with hints about code hotness, which we're
quite certain about for debug printouts (or, rather, while we
developers often hit the checks for debug printouts, we don't care
about performance while doing so).
|
|
These were for debugging in bringup. Now that relatively complicated
apps are working, they haven't helped debug anything in quite a while.
|
|
|
|
This reverts commit 0a1910c26760762eb8d67f68dfd87494ab479e38.
oops, shouldn't apply tiling depth buffer for other chips as well.
|
|
Sandybridge only support tiling depth buffer, always set tiling bit.
Fix 'fbo_firecube' demo.
|
|
This includes several corrections for fixing depth test on sandybridge.
Fix wrong bits definition in depth stencil state. Fix wrong order of
state buffer offset in 3DSTATE_CC_STATE_POINTERS command. Correctly use
buffer width parameter in depth buffer setting.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
|
|
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 31.791 32.287 1.11% 6/6
after:
[ 0] gl firefox-talos-gfx 31.198 31.675 0.96% 6/6
|
|
This makes the binding table code simpler, and is required for gen6,
which requires binding table addresses to be under 64k offset from the
surface state base addr.
No significant change in performance on firefox-talos-gfx.
|
|
This slightly reduces reduces cairo-gl firefox-talos-gfx runtime on my
Ironlake:
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 38.236 38.383 0.43% 5/6
after:
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
It turns out the cost of caching these objects and looking them up in
the cache again is greater than the cost of just computing the object
again, particularly when the overhead of having a separate BO to pin
is removed.
(Those that are paying close attention will note that this is a
reversal of the path I was moving the driver in a couple of years ago.
The major thing that has changed is that back then all state was
recomputed when we wrapped the streaming state buffer, including
recompiling our precious programs. Now, we're uncaching just the
objects that are cheap to compute, and retaining caching of expensive
objects)
|
|
|
|
|
|
Rename old IGDNG to Ironlake, and set 'gen' number for
Ironlake as 5, so tracking the features with generation num
instead of special is_ironlake flag.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
|
|
|
|
|
|
|
It hangs the GPU at the clipper stage, presumably because we're lacking
other setup.
|
|
|
|
|
|
|
|
|
|
As part of the DRI driver interface rewrite I merged __DRIscreenPrivate
and __DRIscreen, and likewise for __DRIdrawablePrivate and
__DRIcontextPrivate. I left typedefs in place though, to avoid renaming
all the *Private use internal to the driver. That was probably a
mistake, and it turns out a one-line find+sed combo can do the mass
rename. Better late than never.
|
|
Saves another 600 bytes or so of code.
|
|
Saves ~2KB of code.
|
|
Saves ~480 bytes of code.
|
|
Fixing this is a prereq for avoiding flagging all state at new
batch time. Eliminating that still causes problems, though (notably
glean logicOp fails on my GM965).
|
|
|
|
1. new PCI ids
2. fix some 3D commands on new chipset
3. fix send instruction on new chipset
4. new VUE vertex header
5. ff_sync message (added by Zou Nan Hai <nanhai.zou@intel.com>)
6. the offset in JMPI is in unit of 64bits on new chipset
7. new cube map layout
|
|
Fixes shadowtex.c. And an assert is added to catch this sooner next time.
|
|
Also, only create VS surface state if there's a VS constant buffer to be
uploaded, and set the contents of the buffer at the same time as creation.
|
|
Hook up a constant buffer, binding table, etc for the VS unit.
This will allow using large constant buffers with vertex shaders.
The new code is disabled at this time (use_const_buffer=FALSE).
|
|
The polygon stipple pattern, like the viewport and the
polygon face orientation, must be inverted on the i965
when rendering to a FBO (which itself has an inverted pixel
coordinate system compared to raw Mesa).
In addition, the polygon stipple offset, which orients
the stipple to the window system, disappears when rendering
to an FBO (because the window system offset doesn't apply,
and there's no associated FBO offset).
With these fixes, the conform triangle and polygon stipple
tests pass when rendering to texture.
|
|
|
|
|
|
|
|
The mobile and desktop chipsets are the same, and having them separate is
more typing and more chances to screw up.
|
|
Previously, since my check_aperture API change, we would check each piece of
state against the batchbuffer individually, but not all the state against the
batchbuffer at once. In addition to not being terribly useful in assuring
success, it probably also increased CPU load by calling check_aperture many
times per primitive.
|
|
This avoids issues with dereferencing stale cliprects around intel_draw_buffer
time. Additionally, take advantage of cliprects staying constant for FBOs and
DRI2, and emit cliprects in the batchbuffer instead of having to flush batch
each time they change.
|