path: root/glcpp-lex.l
Age  Commit message  Author
2010-06-01  Implement comment handling in the lexer (with test).  Carl Worth
We support both single-line (//) and multi-line (/* ... */) comments and add a test for this, (trying to stress the rules just a bit by embedding one comment delimiter into a comment delimited with the other style, etc.). To keep the test suite passing we do now discard any output lines from glcpp that consist only of spacing, (in addition to blank lines as previously). We also discard any initial whitespace from gcc output. In neither case should the absence or presence of this whitespace affect correctness.
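As a rough illustration of the two comment styles described above, here is a minimal sketch in plain C rather than flex rules. The function name strip_comments is hypothetical, and the choice to replace a multi-line comment with a single space follows the usual C convention, not necessarily what glcpp does. String literals are not handled.

```c
/* Copy src to dst with comments removed (hypothetical helper, not glcpp code).
 * A multi-line comment becomes one space; a single-line comment is dropped
 * up to, but not including, the newline. dst needs strlen(src) + 1 bytes. */
void
strip_comments(const char *src, char *dst)
{
    while (*src) {
        if (src[0] == '/' && src[1] == '/') {
            /* Single-line comment: skip to end of line, keep the newline.
             * A nested opening delimiter in here is just comment text. */
            while (*src && *src != '\n')
                src++;
        } else if (src[0] == '/' && src[1] == '*') {
            /* Multi-line comment: skip to the closing delimiter.
             * A nested "//" in here is just comment text. */
            src += 2;
            while (*src && !(src[0] == '*' && src[1] == '/'))
                src++;
            if (*src)
                src += 2;
            *dst++ = ' ';
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Embedding one delimiter style inside the other, as the test does, exercises exactly the two inner loops here.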
2010-06-01  Fix #if-skipping to *really* skip the skipped group.  Carl Worth
Previously we were avoiding printing within a skipped group, but we were still evaluating directives such as #define and #undef and still emitting diagnostics for things such as macro calls with the wrong number of arguments. Add a test for this and fix it with a high-priority rule in the lexer that consumes the skipped content.
2010-05-29  Fix pass-through of '=' and add a test for it.  Carl Worth
Previously '=' was not included in our PUNCTUATION regular expression, but it *was* excluded from our OTHER regular expression, so we were getting the default (and harmful) lex action of just printing it. The test we add here is named "punctuator" with the idea that we can extend it as needed for other punctuator testing.
2010-05-27  Implement token pasting of integers.  Carl Worth
To do this correctly, we change the lexer to lex integers as string values, (new token type of INTEGER_STRING), and only convert to integer values when evaluating an expression value. Add a new test case for this, (which does pass now).
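To see why keeping integers as strings matters for pasting, consider this small sketch. The helper name paste_integer_strings is hypothetical and is not taken from the glcpp sources. It concatenates the spellings of two integer tokens and leaves conversion to a value for later.

```c
#include <stdio.h>
#include <stdlib.h>

/* Paste two integer tokens that are still held as strings, for example
 * "1" ## "2" gives "12". Returns 0 on success, or -1 if the result does
 * not fit in out. (Hypothetical helper for illustration only.) */
int
paste_integer_strings(const char *left, const char *right,
                      char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s%s", left, right);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The pasted string is only converted when an expression is evaluated, for instance with strtoll(out, NULL, 0). Had the operands been converted to integers first, the spelling (such as a leading zero) would already be lost before the paste.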
2010-05-26  Fix lexing of "defined" as an operator, not an identifier.  Carl Worth
Simply need to move the rule for IDENTIFIER to be after "defined" and everything is happy. With this change, tests 50 through 53 all pass now.
2010-05-26  stash  Carl Worth
2010-05-25  Collapse multiple spaces in input down to a single space.  Carl Worth
This is what gcc does, and it's actually less work to do this. Previously we were having to save the contents of space tokens as a string, but we don't need to do that now. We extend test #0 to exercise this feature here.
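The collapsing behavior can be sketched in a few lines of C. This is an illustration only: the function name collapse_spaces is hypothetical, and treating tabs the same as spaces is an assumption. The lexer itself does this with a flex rule rather than a helper like this.

```c
/* Copy src to dst, replacing each run of spaces and tabs with one space.
 * Newlines are passed through unchanged. dst needs strlen(src) + 1 bytes.
 * (Hypothetical helper, not glcpp code.) */
void
collapse_spaces(const char *src, char *dst)
{
    int in_space = 0;   /* nonzero while inside a run of whitespace */

    for (; *src; src++) {
        if (*src == ' ' || *src == '\t') {
            if (!in_space)
                *dst++ = ' ';
            in_space = 1;
        } else {
            *dst++ = *src;
            in_space = 0;
        }
    }
    *dst = '\0';
}
```

Because every run becomes the same single space, nothing about the original whitespace needs to be stored, which is the saving the commit message refers to.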
2010-05-25  Pass through literal space values from replacement lists.  Carl Worth
This makes test 15 pass and also dramatically simplifies the lexer. We were previously using a CONTROL state in the lexer to only emit SPACE tokens when on text lines. But that's not actually what we want. We need SPACE tokens in the replacement lists as well. Instead of a lexer state for this, we now simply set a "space_tokens" flag whenever we start constructing a pp_tokens list and clear the flag whenever we see a '#' introducing a directive. Much cleaner this way.
2010-05-25  Implement simplified substitution for function-like macro invocation.  Carl Worth
This supports function-like macro invocation but without any argument substitution. This now makes tests 11 through 14 pass.
2010-05-25  Make the lexer pass whitespace through (as OTHER tokens) for text lines.  Carl Worth
With this change, we can recreate the original text-line input exactly. Previously we were inserting a space between every pair of tokens so our output had a lot more whitespace than our input. With this change, we can drop the "-b" option to diff and match the input exactly.
2010-05-25  Starting over with the C99 grammar for the preprocessor.  Carl Worth
This is a fresh start with a much simpler approach for the flex/bison portions of the preprocessor. This isn't functional yet, (produces no output), but can at least read all of our test cases without any parse errors. The grammar here is based on the grammar provided for the preprocessor in the C99 specification.
2010-05-24  Add support for octal and hexadecimal integer literals.  Carl Worth
In addition to the decimal literals which we already support. Note that we use strtoll here to get the large-width integers demanded by the specification.
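For reference, strtoll can infer the radix from the literal's prefix when called with base 0. The wrapper name parse_integer_literal below is hypothetical, and whether glcpp passes base 0 or an explicit base per rule is not stated here.

```c
#include <stdlib.h>

/* Convert the text of an integer literal to a value.
 * With base 0, strtoll infers the radix from the prefix:
 * "0x" or "0X" means hexadecimal, a leading "0" means octal,
 * and anything else is decimal. (Hypothetical wrapper.) */
long long
parse_integer_literal(const char *text)
{
    return strtoll(text, NULL, 0);
}
```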
2010-05-24  Add the '~' operator to the lexer.  Carl Worth
This was simply missing before, (and unnoticed since we had no test of the '~' operator).
2010-05-24  Implement all operators specified for GLSL #if expressions (with tests).  Carl Worth
The operator coverage here is quite complete. The one big thing missing is that we are not yet doing macro expansion in #if lines. This makes the whole support fairly useless, so we plan to fix that shortcoming right away.
2010-05-20  Implement #if, #else, #elif, and #endif with tests.  Carl Worth
So far the only expression implemented is a single integer literal, but obviously that's easy to extend. Various things including nesting are tested here.
2010-05-20  Pre-expand macro arguments at time of invocation.  Carl Worth
Previously, we were using the same lexing stack as we use for macro expansion to also expand macro arguments. Instead, we now do this earlier by simply recursing over the macro invocation's replacement list and constructing a new expanded list, (and pushing only *that* onto the stack). This is simpler, and also allows us to more easily implement token pasting in the future.
2010-05-20  Finish cleaning up whitespace differences.  Carl Worth
The last remaining thing here was that when a line ended with a macro, and the parser looked ahead to the newline token, the lexer was printing that newline before the parser printed the expansion of the macro. The fix is simple, just make the lexer tell the parser that a newline is needed, and the parser can wait until reducing a production to print that newline. With this, we now pass the entire test suite with simply "diff -u", so we no longer have any diff options hiding whitespace bugs from us. Hurrah!
2010-05-20  Avoid printing a space at the beginning of lines in the output.  Carl Worth
This fixes more differences compared to "gcc -E" so removes several cases of erroneously failing test cases. The implementation isn't very elegant, but it is functional.
2010-05-20  Avoid re-expanding a macro name that has once been rejected from expansion.  Carl Worth
The specification of the preprocessor in C99 says that when we see a macro name that we are already expanding that we refuse to expand it now, (which we've done for a while), but also that we refuse to ever expand it later if seen in other contexts at which it would be legitimate to expand. We add a test case for that here, and fix it to work. The fix takes advantage of a new token_t value for tokens and argument words along with the recently added IDENTIFIER_FINALIZED token type which instructs the parser to not even look for another expansion.
2010-05-19  Perform "re lexing" on string list values rather than on text.  Carl Worth
Previously, we would pass original strings back to the original lexer whenever we needed to re-lex something, (such as an expanded macro or a macro argument). Now, we instead parse the macro or argument originally to a string list, and then re-lex by simply returning each string from this list in turn. We do this in the recently added glcpp_parser_lex function that sits on top of the lower-level glcpp_lex that only deals with text. This doesn't change any behavior (at least according to the existing test suite which all still passes) but it brings us much closer to being able to "finalize" an unexpanded macro as required by the specification.
2010-05-18  Rewrite macro handling to support function-like macro invocation in macro values  Carl Worth
The rewrite here discards the functions that did direct, recursive expansion of macro values. Instead, the parser now pushes the macro definition string over to a stack of buffers for the lexer. This way, macro expansion gets access to all parsing machinery. This isn't a small change, but the result is simpler than before (I think). It passes the entire test suite, including the four tests added with the previous commit that were failing before.
2010-05-17  Fix (and add test for) function-like macro invocation with newlines.  Carl Worth
The test has a newline before the left parenthesis, and newlines to separate the parentheses from the argument. The fix involves more state in the lexer to only return a NEWLINE token when terminating a directive. This is very similar to our previous fix with extra lexer state to only return the SPACE token when it would be significant for the parser. With this change, the exact number and positioning of newlines in the output is now different compared to "gcc -E" so we add a -B option to diff when testing to ignore that.
2010-05-14  Fix two whitespace bugs in the lexer.  Carl Worth
The first bug was not allowing whitespace between '#' and the directive name. The second bug was swallowing a terminating newline along with any trailing whitespace on a line. With these two fixes, and the previous commit to stop emitting SPACE tokens, the recently added extra-whitespace test now passes.
2010-05-14  Don't return SPACE tokens unless strictly needed.  Carl Worth
This reverts the unconditional return of SPACE tokens from the lexer from commit 48b94da0994b44e41324a2419117dcd81facce8b. That commit seemed useful because it kept the lexer simpler, but the presence of SPACE tokens is causing lots of extra complication for the parser itself, (redundant productions other than whitespace differences, several productions buggy in the case of extra whitespace, etc.) Of course, we'd prefer to never have any whitespace token, but that's not possible with the need to distinguish between "#define foo()" and "#define foo ()". So we'll accept a little bit of pain in the lexer, (enough state to support this special-case token), in exchange for keeping most of the parser blissfully ignorant of whether tokens are separated by whitespace or not. This change does mean that our output now differs from that of "gcc -E", but only in whitespace. So we test with "diff -w" now to ignore those differences.
2010-05-14  Make the lexer return SPACE tokens unconditionally.  Carl Worth
It seems strange to always be returning SPACE tokens, but since we were already needing to return a SPACE token in some cases, this actually simplifies our lexer. This also allows us to fix two whitespace-handling differences compared to "gcc -E" so that now the recent modification to the test suite passes once again.
2010-05-14  Fix parsing of object-like macro with a definition that begins with '('.  Carl Worth
Previously our parser was incorrectly treating this case as a function-like macro. We fix this by conditionally passing a SPACE token from the lexer, (but only immediately after the identifier immediately after #define).
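The distinction at stake can be sketched as a standalone check. The function name is_function_like is hypothetical and this is not how the flex/bison code is structured. A macro is function-like only when '(' follows the name with no intervening whitespace.

```c
#include <ctype.h>

/* Given the text that follows "#define ", report whether it starts a
 * function-like macro: a '(' directly after the name, with no space.
 * Returns 1 for "foo()" and 0 for "foo ()". (Hypothetical helper.) */
int
is_function_like(const char *after_define)
{
    const char *p = after_define;

    /* A macro name must start with a letter or underscore. */
    if (!(isalpha((unsigned char)*p) || *p == '_'))
        return 0;

    /* Skip the rest of the name: letters, digits, underscores. */
    while (isalnum((unsigned char)*p) || *p == '_')
        p++;

    return *p == '(';
}
```

In the lexer, the same information reaches the parser as a SPACE token emitted only in that one position after the macro name.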
2010-05-13  Add support for the structure of function-like macros.  Carl Worth
We accept the structure of arguments in both macro definition and macro invocation, but we don't yet expand those arguments. This is just enough code to pass the recently-added tests, but does not yet provide any sort of useful function-like macro.
2010-05-13  Make the lexer distinguish between identifiers and defined macros.  Carl Worth
This is just a minor style improvement for now. But the same mechanism, (having the lexer peek into the table of defined macros), will be essential when we add function-like macros in addition to the current object-like macros.
2010-05-12  Simplify lexer significantly (remove all stateful lexing).  Carl Worth
We are able to remove all state by simply passing NEWLINE through as a token unconditionally (as opposed to only passing newline when on a directive line as we did previously).
2010-05-12  Add support for the #undef macro.  Carl Worth
This isn't ideal for two reasons: 1. There's a bunch of stateful redundancy in the lexer that should be cleaned up. 2. The hash table does not provide a mechanism to delete an entry, so we waste memory to add a new NULL entry in front of the existing entry with the same key. But this does at least work, (it passes the recently added undef test case).
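The shadowing workaround in point 2 can be illustrated with a single chained bucket. This is a sketch under assumptions: the entry_t type and the table_insert and table_find names are hypothetical stand-ins, not the imported hash_table API.

```c
#include <stdlib.h>
#include <string.h>

/* One chain of a hash table. Newer entries sit in front of older ones. */
typedef struct entry {
    const char *key;
    const char *value;      /* NULL marks the key as undefined */
    struct entry *next;
} entry_t;

/* Push a new entry at the head of the chain, so that it shadows any
 * older entry with the same key. Returns the new head. */
entry_t *
table_insert(entry_t *head, const char *key, const char *value)
{
    entry_t *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    e->key = key;
    e->value = value;
    e->next = head;
    return e;
}

/* Return the value of the first (newest) entry matching key, or NULL. */
const char *
table_find(const entry_t *head, const char *key)
{
    for (; head != NULL; head = head->next)
        if (strcmp(head->key, key) == 0)
            return head->value;
    return NULL;
}
```

Here "#undef foo" amounts to table_insert(head, "foo", NULL). The old definition stays allocated behind the new entry, which is the wasted memory the commit message mentions.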
2010-05-12  Convert lexer to talloc and add xtalloc wrappers.  Carl Worth
The lexer was previously using strdup (expecting the parser to free), but is now more consistent, easier to use, and slightly more efficient by using talloc along with the parser. Also, we add xtalloc and xtalloc_strdup wrappers around talloc and talloc_strdup to put all of the out-of-memory-checking code in one place.
2010-05-12  Fix defines involving both literals and other defined macros.  Carl Worth
We now store a list of tokens in our hash-table rather than a single string. This lets us replace each macro in the value as necessary. This code adds a link dependency on talloc which does exactly what we want in terms of memory management for a parser. The 3 tests added in the previous commit now pass.
2010-05-10  Implement #define  Carl Worth
By using the recently-imported hash_table implementation.
2010-05-10  Add some compiler warnings and corresponding fixes.  Carl Worth
Most of the current problems were (mostly) harmless things like missing declarations, but there was at least one real error, (reversed argument order for yyerror).
2010-05-10  Make the lexer reentrant (to avoid "still reachable" memory).  Carl Worth
This allows the final program to be 100% "valgrind clean", (freeing all memory that it allocates). This will make it much easier to ensure that any allocation that parser actions perform are also cleaned up.
2010-05-10  Add the tiniest shell of a flex/bison-based parser.  Carl Worth
It doesn't really *do* anything yet---merely parsing a stream of whitespace-separated tokens, (and not interpreting them at all).