Traditional (pre-standard) C preprocessing is rather different from the preprocessing specified by the standard. When GCC is given the `-traditional-cpp' option, it attempts to emulate a traditional preprocessor.
GCC versions 3.2 and later support traditional mode semantics only in the preprocessor, and not in the compiler front ends. This chapter outlines the traditional preprocessor semantics we implemented.
The implementation does not correspond precisely to the behavior of earlier versions of GCC, nor to any true traditional preprocessor. After all, inconsistencies among traditional implementations were a major motivation for C standardization. However, we intend that it should be compatible with true traditional preprocessors in all ways that actually matter.
10.1 Traditional lexical analysis
10.2 Traditional macros
10.3 Traditional miscellany
10.4 Traditional warnings
The traditional preprocessor does not decompose its input into tokens the same way a standards-conforming preprocessor does. The input is simply treated as a stream of text with minimal internal form.
This implementation does not treat trigraphs (see trigraphs) specially since they were an invention of the standards committee. It handles arbitrarily-positioned escaped newlines properly and splices the lines as you would expect; many traditional preprocessors did not do this.
The form of horizontal whitespace in the input file is preserved in the output. In particular, hard tabs remain hard tabs. This can be useful if, for example, you are preprocessing a Makefile.
Traditional CPP only recognizes C-style block comments, and treats the `/*' sequence as introducing a comment only if it lies outside quoted text. Quoted text is introduced by the usual single and double quotes, and also by an initial `<' in a #include directive.
Traditionally, comments are completely removed and are not replaced with a space. Since a traditional compiler does its own tokenization of the output of the preprocessor, this means that comments can effectively be used as token paste operators. However, comments behave like separators for text handled by the preprocessor itself, since it doesn't re-lex its input. For example, in
#if foo/**/bar
`foo' and `bar' are distinct identifiers and expanded separately if they happen to be macros. In other words, this directive is equivalent to
#if foo bar
rather than
#if foobar
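The comment rules above can be modeled in a few lines of C. The following is a minimal sketch, not GCC's implementation: the hypothetical function `strip_comments_traditional' deletes block comments that begin outside single- or double-quoted text and writes nothing in their place. It models only the text handed on to the compiler, not the separator behavior inside directives such as #if, and it omits the `<' quoting used in #include directives.

```c
/* Toy model, not GCC's code: copy `in' to `out', deleting C block comments
   that start outside quoted text.  Nothing is written in place of a comment.
   `out' must have room for strlen(in) + 1 bytes.  */
void strip_comments_traditional(const char *in, char *out)
{
    char quote = 0;               /* active quote character, or 0 if none */

    while (*in != '\0') {
        if (quote != 0) {
            /* Inside quoted text: a backslash escapes the next character.  */
            if (*in == '\\' && in[1] != '\0') {
                *out++ = *in++;
            } else if (*in == quote) {
                quote = 0;
            }
            *out++ = *in++;
        } else if (in[0] == '/' && in[1] == '*') {
            /* Skip to just past the closing delimiter, or to end of input.  */
            in += 2;
            while (*in != '\0' && !(in[0] == '*' && in[1] == '/'))
                in++;
            if (*in != '\0')
                in += 2;
        } else {
            if (*in == '"' || *in == '\'')
                quote = *in;
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

In this model the text `foo/**/bar' comes out as `foobar', which is how a traditional compiler ends up seeing a single token, while a `/*' inside a string literal is copied through unchanged.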
Generally speaking, in traditional mode an opening quote need not have a matching closing quote. In particular, a macro may be defined with replacement text that contains an unmatched quote. Of course, if you attempt to compile preprocessed output containing an unmatched quote you will get a syntax error.
However, all preprocessing directives other than #define require matching quotes. For example:
#define m This macro's fine and has an unmatched quote
"/* This is not a comment.  */
/* This is a comment.  The following #include directive
   is ill-formed.  */
#include <stdio.h
Just as for the ISO preprocessor, what would be a closing quote can be escaped with a backslash to prevent the quoted text from closing.
The major difference between traditional and ISO macros is that the former expand to text rather than to a token sequence. CPP removes all leading and trailing horizontal whitespace from a macro's replacement text before storing it, but preserves the form of internal whitespace.
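As a rough illustration of the storage rule just described, the sketch below trims a replacement text the same way. The function name `store_replacement_text' is hypothetical, and the sketch assumes that horizontal whitespace means spaces and tabs only; it is not taken from GCC.

```c
#include <string.h>

/* Toy model: store macro replacement text as the paragraph describes.
   Leading and trailing spaces and tabs are dropped; internal whitespace
   is copied byte for byte.  `out' needs room for strlen(text) + 1 bytes.  */
void store_replacement_text(const char *text, char *out)
{
    size_t len;

    while (*text == ' ' || *text == '\t')
        text++;                                   /* drop leading blanks */
    len = strlen(text);
    while (len > 0 && (text[len - 1] == ' ' || text[len - 1] == '\t'))
        len--;                                    /* drop trailing blanks */
    memcpy(out, text, len);
    out[len] = '\0';
}
```

Note that a tab between two words of the replacement text survives as a tab; only the ends are trimmed.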
One consequence is that it is legitimate for the replacement text to contain an unmatched quote (see section 10.1 Traditional lexical analysis). An unclosed string or character constant continues into the text following the macro call. Similarly, the text at the end of a macro's expansion can run together with the text after the macro invocation to produce a single token.
Normally comments are removed from the replacement text after the macro is expanded, but if the `-CC' option is passed on the command line comments are preserved. (In fact, the current implementation removes comments even before saving the macro replacement text, but it is careful to do it in such a way that the observed effect is identical even in the function-like macro case.)
The ISO stringification operator `#' and token paste operator `##' have no special meaning. As explained later, an effect similar to these operators can be obtained in a different way. Macro names that are embedded in quotes, either from the main file or after macro replacement, do not expand.
CPP replaces an unquoted object-like macro name with its replacement text, and then rescans it for further macros to replace. Unlike standard macro expansion, traditional macro expansion has no provision to prevent recursion. If an object-like macro appears unquoted in its replacement text, it will be replaced again during the rescan pass, and so on ad infinitum. GCC detects when it is expanding recursive macros, emits an error message, and continues after the offending macro invocation.
#define PLUS +
#define INC(x) PLUS+x
INC(foo);
     ==> ++foo;
Function-like macros are similar in form but quite different in behavior to their ISO counterparts. Their arguments are contained within parentheses, are comma-separated, and can cross physical lines. Commas within nested parentheses are not treated as argument separators. Similarly, a quote in an argument cannot be left unclosed; a following comma or parenthesis that comes before the closing quote is treated like any other character. There is no facility for handling variadic macros.
This implementation removes all comments from macro arguments, unless the `-C' option is given. The form of all other horizontal whitespace in arguments is preserved, including leading and trailing whitespace. In particular
f( )
is treated as an invocation of the macro `f' with a single argument consisting of a single space. If you want to invoke a function-like macro that takes no arguments, you must not leave any whitespace between the parentheses.
If a macro argument crosses a new line, the new line is replaced with a space when forming the argument. If the previous line contained an unterminated quote, the following line inherits the quoted state.
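The argument rules above can be sketched as a small C routine. This is a toy model under stated assumptions, not GCC's code: the name `split_traditional_args' and the fixed limits `MAX_ARGS' and `MAX_ARG_LEN' are invented for the example, and comment removal from arguments is not modeled.

```c
#include <stddef.h>

#define MAX_ARGS 8        /* hypothetical limits for this sketch */
#define MAX_ARG_LEN 64

/* `p' points just past the opening parenthesis of a macro invocation.
   Arguments are copied into `args'.  Returns the argument count, or -1 if
   the call is unterminated or exceeds the fixed limits above.  */
int split_traditional_args(const char *p, char args[MAX_ARGS][MAX_ARG_LEN])
{
    int count = 0, depth = 0;
    size_t len = 0;
    char quote = 0;

    if (*p == ')')
        return 0;                           /* "f()" passes no arguments */

    for (; *p != '\0'; p++) {
        char c = (*p == '\n') ? ' ' : *p;   /* a newline becomes a space */

        if (quote != 0) {
            if (c == '\\' && p[1] != '\0') {    /* keep an escaped character */
                if (len + 1 >= MAX_ARG_LEN)
                    return -1;
                args[count][len++] = c;
                c = *++p;
            } else if (c == quote) {
                quote = 0;
            }
        } else if (c == '"' || c == '\'') {
            quote = c;                      /* quoted state may span lines */
        } else if (c == '(') {
            depth++;
        } else if (depth > 0 && c == ')') {
            depth--;
        } else if (depth == 0 && (c == ',' || c == ')')) {
            /* Top level, outside quotes: this ends the current argument.  */
            args[count++][len] = '\0';
            len = 0;
            if (c == ')')
                return count;
            if (count == MAX_ARGS)
                return -1;
            continue;
        }
        if (len + 1 >= MAX_ARG_LEN)
            return -1;
        args[count][len++] = c;             /* whitespace is kept as is */
    }
    return -1;                              /* input ended before ')' */
}
```

Given the text ` )', that is, what follows the parenthesis in `f( )', the sketch reports one argument consisting of a single space; given `)' it reports none. A comma inside nested parentheses or inside quotes does not split the argument.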
Traditional preprocessors replace parameters in the replacement text with their arguments regardless of whether the parameters are within quotes or not. This provides a way to stringize arguments. For example
#define str(x) "x"
str(/* A comment */some text )
     ==> "some text "
Note that the comment is removed, but that the trailing space is preserved. Here is an example of using a comment to effect token pasting.
#define suffix(x) foo_/**/x
suffix(bar)
     ==> foo_bar
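Both effects, stringizing and pasting, follow from purely textual substitution. The sketch below is a toy model of that idea for a macro with one parameter; the function name `expand_traditional' is hypothetical and the code is not GCC's. It substitutes first and deletes comments afterwards, which matches the observable behavior described earlier. As a simplification, its comment pass ignores quoting, and the argument is assumed to have had its comments removed already.

```c
#include <ctype.h>
#include <string.h>

/* Toy model: expand a one-parameter macro body textually.  Every identifier
   equal to `param' is replaced by `arg', even inside quotes; block comments
   are then deleted from the result without leaving a space.  `out' must be
   large enough for the expanded text.  */
void expand_traditional(const char *body, const char *param,
                        const char *arg, char *out)
{
    size_t plen = strlen(param);
    char *start = out, *src, *dst;

    /* Pass 1: substitute the parameter wherever it appears as an identifier. */
    while (*body != '\0') {
        if (isalpha((unsigned char)*body) || *body == '_') {
            size_t n = 1;
            while (isalnum((unsigned char)body[n]) || body[n] == '_')
                n++;
            if (n == plen && strncmp(body, param, n) == 0) {
                strcpy(out, arg);
                out += strlen(arg);
            } else {
                memcpy(out, body, n);
                out += n;
            }
            body += n;
        } else {
            *out++ = *body++;
        }
    }
    *out = '\0';

    /* Pass 2: delete comments in place (simplified: quoting is ignored). */
    for (src = dst = start; *src != '\0'; ) {
        if (src[0] == '/' && src[1] == '*') {
            src += 2;
            while (*src != '\0' && !(src[0] == '*' && src[1] == '/'))
                src++;
            if (*src != '\0')
                src += 2;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

With the body `"x"' and the argument `some text ', the model yields `"some text "'; with the body `foo_/**/x' and the argument `bar', it yields `foo_bar'.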
Here are some things to be aware of when using the traditional preprocessor.
Presently `-Wtraditional' warns about:
UINT_MAX may well be defined as 4294967295U, but you will not be warned if you use UINT_MAX.
You can usually avoid the warning, and the related warning about constants which are so large that they are unsigned, by writing the integer constant in question in hexadecimal, with no U suffix. Take care, though, because this gives the wrong result in exotic cases.
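As a small illustration of the two spellings, consider the fragment below. The macro names are hypothetical. The two constants have the same value, but ISO C does not guarantee that they have the same type on every platform, so the substitution is worth checking wherever the type matters.

```c
/* Hypothetical macro names, for illustration only.  */
#define ALL_ONES_SUFFIXED 4294967295U   /* decimal, with the U suffix */
#define ALL_ONES_HEX      0xffffffff    /* hexadecimal, with no suffix */
```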