From e5cf3aadb8d57dcc70b597092ecac276042f73cb Mon Sep 17 00:00:00 2001 From: Eric Anholt Date: Thu, 24 Jun 2010 15:41:40 -0700 Subject: glsl2: Add a README file for the new compiler. --- src/glsl/README | 153 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 153 insertions(+) create mode 100644 src/glsl/README (limited to 'src/glsl/README') diff --git a/src/glsl/README b/src/glsl/README new file mode 100644 index 0000000000..8b6162ab67 --- /dev/null +++ b/src/glsl/README @@ -0,0 +1,153 @@ +Welcome to Mesa's GLSL compiler. A brief overview of how things flow: + +1) lex and yacc-based preprocessor takes the incoming shader string +and produces a new string containing the preprocessed shader. This +takes care of things like #if, #ifdef, #define, and preprocessor macro +invocations. Note that #version, #extension, and some others are +passed straight through. See glcpp/* + +2) lex and yacc-based parser takes the preprocessed string and +generates the AST (abstract syntax tree). Almost no checking is +performed in this stage. See glsl_lexer.lpp and glsl_parser.ypp. + +3) The AST is converted to "HIR". This is the intermediate +representation of the compiler. Constructors are generated, function +calls are resolved to particular function signatures, and all the +semantic checking is performed. See ast_*.cpp for the conversion, and +ir.h for the IR structures. + +4) The driver (Mesa, or main.cpp for the standalone binary) performs +optimizations. These include copy propagation, dead code elimination, +constant folding, and others. Generally the driver will call +optimizations in a loop, as each may open up opportunities for other +optimizations to do additional work. See most files called ir_*.cpp + +5) linking is performed. This does checking to ensure that the +outputs of the vertex shader match the inputs of the fragment shader, +and assigns locations to uniforms, attributes, and varyings. See +linker.cpp. + +6) The driver may perform additional optimization at this point, as +for example dead code elimination previously couldn't remove functions +or global variable usage when we didn't know what other code would be +linked in. + +7) The driver performs code generation out of the IR, taking a linked +shader program and producing a compiled program for each stage. See +ir_to_mesa.cpp for Mesa IR code generation. + +FAQ: + +Q: What is HIR versus IR versus LIR? + +A: The idea behind the naming was that ast_to_hir would produce a +high-level IR ("HIR"), with things like matrix operations, structure +assignments, etc., present. A series of lowering passes would occur +that do things like break matrix multiplication into a series of dot +products/MADs, make structure assignment be a series of assignment of +components, flatten if statements into conditional moves, and such, +producing a low level IR ("LIR"). + +However, it now appears that each driver will have different +requirements from a LIR. A 915-generation chipset wants all functions +inlined, all loops unrolled, all ifs flattened, no variable array +accesses, and matrix multiplication broken down. The Mesa IR backend +for swrast would like matrices and structure assignment broken down, +but it can support function calls and dynamic branching. A 965 vertex +shader IR backend could potentially even handle some matrix operations +without breaking them down, but the 965 fragment shader IR backend +would want to break to have (almost) all operations down channel-wise +and perform optimization on that. As a result, there's no single +low-level IR that will make everyone happy. So that usage has fallen +out of favor, and each driver will perform a series of lowering passes +to take the HIR down to whatever restrictions it wants to impose +before doing codegen. + +Q: How is the IR structured? + +A: The best way to get started seeing it would be to run the +standalone compiler against a shader: + +./glsl --dump-lir ~/src/piglit/tests/shaders/glsl-orangebook-ch06-bump.frag + +So for example one of the ir_instructions in main() contains: + +(assign (constant bool (1)) (var_ref litColor) (expression vec3 * (var_ref Surf +aceColor) (var_ref __retval) ) ) + +Or more visually: + (assign) + / | \ + (var_ref) (expression *) (constant bool 1) + / / \ +(litColor) (var_ref) (var_ref) + / \ + (SurfaceColor) (__retval) + +which came from: + +litColor = SurfaceColor * max(dot(normDelta, LightDir), 0.0); + +(the max call is not represented in this expression tree, as it was a +function call that got inlined but not brought into this expression +tree) + +Each of those nodes is a subclass of ir_instruction. A particular +ir_instruction instance may only appear once in the whole IR tree with +the exception of ir_variables, which appear once as variable +declarations: + +(declare () vec3 normDelta) + +and multiple times as the targets of variable dereferences: +... +(assign (constant bool (1)) (var_ref __retval) (expression float dot + (var_ref normDelta) (var_ref LightDir) ) ) +... +(assign (constant bool (1)) (var_ref __retval) (expression vec3 - + (var_ref LightDir) (expression vec3 * (constant float (2.000000)) + (expression vec3 * (expression float dot (var_ref normDelta) (var_ref + LightDir) ) (var_ref normDelta) ) ) ) ) +... + +Each node has a type. Expressions may involve several different types: +(declare (uniform ) mat4 gl_ModelViewMatrix) +((assign (constant bool (1)) (var_ref constructor_tmp) (expression + vec4 * (var_ref gl_ModelViewMatrix) (var_ref gl_Vertex) ) ) + +An expression tree can be arbitrarily deep, and the compiler tries to +keep them structured like that so that things like algebraic +optimizations ((color * 1.0 == color) and ((mat1 * mat2) * vec == mat1 +* (mat2 * vec))) or recognizing operation patterns for code generation +(vec1 * vec2 + vec3 == mad(vec1, vec2, vec3)) are easier. This comes +at the expense of additional trickery in implementing some +optimizations like CSE where one must navigate an expression tree. + +Q: Why no SSA representation? + +A: Converting an IR tree to SSA form makes dead code elmimination, +common subexpression elimination, and many other optimizations much +easier. However, in our primarily vector-based language, there's some +major questions as to how it would work. Do we do SSA on the scalar +or vector level? If we do it at the vector level, we're going to end +up with many different versions of the variable when encountering code +like: + +(assign (constant bool (1)) (swiz x (var_ref __retval) ) (var_ref a) ) +(assign (constant bool (1)) (swiz y (var_ref __retval) ) (var_ref b) ) +(assign (constant bool (1)) (swiz z (var_ref __retval) ) (var_ref c) ) + +If every masked update of a component relies on the previous value of +the variable, then we're probably going to be quite limited in our +dead code elimination wins, and recognizing common expressions may +just not happen. On the other hand, if we operate channel-wise, then +we'll be prone to optimizing the operation on one of the channels at +the expense of making its instruction flow different from the other +channels, and a vector-based GPU would end up with worse code than if +we didn't optimize operations on that channel! + +Once again, it appears that our optimization requirements are driven +significantly by the target architecture. For now, targeting the Mesa +IR backend, SSA does not appear to be that important to producing +excellent code, but we do expect to do some SSA-based optimizations +for the 965 fragment shader backend when that is developed. -- cgit v1.2.3