Age | Commit message (Collapse) | Author |
|
Michel Hermier reported libdrm segfault (and kernel crash) on nv40 using
gallium :
http://www.mail-archive.com/nouveau@lists.freedesktop.org/msg06563.html
It turns out these were caused by some missing WAIT_RING (or wrong
computation of the WAIT_RING sizes). Unlike all other libdrm_nouveau users,
nvfx gallium tried to use a mininum calls of WAIT_RING, one WAIT_RING could
apply to many methods for different code paths and spread across several
functions. This made it too tricky to find out what the missing or wrong
WAIT_RING was.
By restoring BEGIN_RING, we force one WAIT_RING per method, and it's much
easier to check if the free size required in the pushbuffer is correct. As
curro said, "let's keep it simple for the maintainers until the big
bottlenecks are gone"
Benchmarked on nv35 with openarena, nexuiz and ut2004 and no performance
regression.
The core of this patch was made with Coccinelle, with minor manual fixes
made on top.
Tested-by: Michel Hermier <hermier@frugalware.org>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
|
|
This fixes piglit/glsl-vs-main-return and glsl-fs-main-return for the drivers
which don't support RET (i915g, r300g, r600g, svga).
ir_to_mesa does not currently generate subroutines, but it's a matter of time
till it's added. It would then break all the drivers which don't implement
them, so this CAP makes sense.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
|
|
We do not know how to use more, GL_ARB_draw_buffers is not exposed on blob.
|
|
To match shader model 2.0.
|
|
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
|
|
|
|
Changes in v3:
- Also change trace, which I forgot about
Changes in v2:
- No longer adds tessellation shaders
Currently each shader cap has FS and VS versions.
However, we want a version of them for geometry, tessellation control,
and tessellation evaluation shaders, and want to be able to easily
query a given cap type for a given shader stage.
Since having 5 duplicates of each shader cap is unmanageable, add
a new get_shader_param function that takes both a shader cap from a
new enum and a shader stage.
Drivers with non-unified shaders will first switch on the shader
and, within each case, switch on the cap.
Drivers with unified shaders instead first check whether the shader
is supported, and then switch on the cap.
MAX_CONST_BUFFERS is now per-stage.
The geometry shader cap is removed in favor of checking whether the
limit of geometry shader instructions is greater than 0, which is also
used for tessellation shaders.
WARNING: all drivers changed and compiled but only nvfx tested
|
|
|
|
This is the new register generation toolkit in use by nouveau.
As far as I know, this is the best register description toolkit in
existence, and you should use it too for your hardware :)
Thanks to Marcin Kościelnicki for inventing it and performing
invaluable reverse engineering work of nVidia chips.
|
|
The old swtnl code was broken by the new shader linkage support for
GLSL.
This is a rewrite of swtnl support, which should instead work properly,
be faster and more closer to the much more tested hardware pipeline.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
We were incorrectly setting a register that limited the range of
constants accessible via indirect addressing.
Setting it correctly, we can address all the constants the GPU
supports.
|
|
Does any API even use rounding-up?
|
|
Fixes piglit lodbias
|
|
|
|
Negative or huge offsets not yet supported.
|
|
|
|
Signed-off-by: Patrice Mandin <patmandin@gmail.com>
|
|
Should improve performance, possibly significantly.
|
|
Still no control flow support, but basic stuff works.
|
|
This is a full rewrite of the drawing and buffer management logic.
It offers a lot of improvements:
1. A copy of buffers is now always kept in system memory. This is
necessary to allow software processing of them, which is necessary
or improves performance in many cases.
2. Support for pushing vertices on the FIFO, with index lookup if necessary.
3. "Smart" draw code that tries to intelligently choose the cheapest
way to draw something: whether to use inline vertices or hardware
vertex buffer, and whether to use hardware index buffers
4. Support for all vertex formats supported by the hardware
5. Usage of translate to push vertices, supporting all formats that are
sensible to use as vertex formats
6. Support for base vertex
7. Usage of Ben Skeggs' primitive splitter originally for nv50, allowing
correct splitting of line loops, triangle fans, etc.
8. Support for instancing
9. Precomputation using the vertex elements CSO
Thanks to Ben Skeggs for his primitive splitter originally for nv50.
Thanks to Christoph Bumiller for his nv50 push code, that was the basis
of this work, even though I changed his code dramatically, in particular
to replace his ad-hoc vertex data emitter with translate.
The changes could also go into nv50 too, but there are substantial
differences due to the additional nv50 hardware features.
|
|
This is a significant refactoring of the sampling code that:
- Moves all generic functions in nvfx_fragtex.c
- Adds a driver-specific sampler view structure and uses it to
precompute texture setup as it should be done
- Unifies a bit more of code between nv30 and nv40
- Adds support for sampler view swizzles
- Support for specifying as sampler view format different from the
resource one (only trivially)
- Support for sampler view specification of first and last level
- Support for depth textures on nv30, both for reading depth and
for compare
- Support for sRGB textures
- Unifies the format table between nv30 and nv40
- Expands the format table to include essentially all supportable formats
except mixed sign and "autonormal" formats
- Fixes the "is format supported" logic, which was quite broken, and
makes it use the format table
Only tested on nv30 currently.
|
|
This patch implements nv04_surface_copy/fill using the new 2D engine module.
It supports falling back to the 3D engine using the u_blitter module, which will be
added in a later patch.
Also adds support for using the 3D engine, reusing the u_blitter module
created for r300.
This is used for unswizzling and copies between swizzled surfaces.
|
|
A source line was put in the wrong place.
|
|
|
|
Signed-off-by: Patrice Mandin <patmandin@gmail.com>
|
|
|
|
Conflicts:
src/mesa/state_tracker/st_gen_mipmap.c
src/mesa/state_tracker/st_texture.c
|
|
Signed-off-by: Patrice Mandin <patmandin@gmail.com>
|
|
this probably needs further cleanup (just getting a surface for the resource
seems quite nonoptimal and potentially cause unnecessary copies I think)
|
|
Signed-off-by: Corbin Simpson <MostAwesomeDude@gmail.com>
|
|
Signed-off-by: Patrice Mandin <patmandin@gmail.com>
|
|
|
|
|
|
|
|
uncached reads on nv3x
Faster, simpler and more flexible.
Also, we set those flags properly on nv3x so that we don't allocate buffers in GART.
Since on AGP GART is uncached, OpenGL doesn't distinguish between vertex and index buffers, and we don't support hardware index buffers for now, this caused uncached reads.
Also check bind and not usage for PIPE_BIND_* flags, got broken in the gallium-resources transition.
|
|
This patch allocates a bigger chunk of memory to store queries in,
increasing the (hidden) outstanding query limit.
|
|
Currently on nv30/nv40 an assert will be triggered once 32 queries are
outstanding.
This violates the OpenGL/Gallium interface, which requires support for
an unlimited number of fences.
This patch fixes the problem by putting queries in a linked list and
waiting on the oldest one if allocation fails.
nVidia seems to use a similar strategy, but with 1024 instead of 32 fences.
The next patch will improve this.
|
|
|
|
They only apparently work on nv40 grclass cards, and this was the
previous behavior of the driver.
This really needs to be investigated more.
|
|
This is implemented in nvfx_state_fb and fragtex but was missing
in nvfx_screen.
This allows to avoid glCopyTexSubImage CPU fallbacks and makes Doom 3
much faster as a result.
|
|
Should improve performance and fix serious regressions on AGP cards.
|
|
|
|
This makes the code faster due to the lack of indirect calls and also
makes it much easier to understand what is actually going on.
|
|
|