Here we document details of how the preprocessor's implementation affects its user-visible behavior. You should try to avoid undue reliance on behavior described here, as it is possible that it will change subtly in future implementations.
Also documented here are obsolete features and changes from previous versions of CPP.
11.1 Implementation-defined behavior
11.2 Implementation limits
11.3 Obsolete Features
11.4 Differences from previous versions
This is how CPP behaves in all the cases which the C standard describes as implementation-defined. This term means that the implementation is free to do what it likes, but must document its choice and stick to it.
Currently, GNU cpp only supports character sets that are strict supersets of ASCII, and performs no translation of characters.
In textual output, each whitespace sequence is collapsed to a single space. For aesthetic reasons, the first token on each non-directive line of output is preceded with sufficient spaces that it appears in the same column as it did in the original source file.
The preprocessor and compiler interpret character constants in the same way; i.e. escape sequences such as `\a' are given the values they would have on the target machine.
The compiler values a multi-character character constant a character at a time, shifting the previous value left by the number of bits per target character, and then or-ing in the bit-pattern of the new character truncated to the width of a target character. The final bit-pattern is given type int, and is therefore signed, regardless of whether single characters are signed or not (a slight change from versions 3.1 and earlier of GCC). If there are more characters in the constant than would fit in the target int, the compiler issues a warning, and the excess leading characters are ignored.
For example, 'ab' for a target with an 8-bit char would be interpreted as (int) ((unsigned char) 'a' * 256 + (unsigned char) 'b'), and '\234a' as (int) ((unsigned char) '\234' * 256 + (unsigned char) 'a').
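The valuation rule above can be modeled in ordinary C. The following is only a sketch under stated assumptions: it fixes the target at an 8-bit char and a 32-bit int, and multichar_value is a hypothetical helper written for illustration, not a function that exists in GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of GCC): models how a multi-character
   constant is valued on an assumed target with 8-bit char, 32-bit int. */
static int32_t multichar_value(const char *chars, size_t count)
{
    uint32_t bits = 0;

    assert(chars != NULL);
    for (size_t i = 0; i < count; i++) {
        /* Shift the previous value left by one target character (8 bits),
           then OR in the new character truncated to 8 bits.  Characters
           beyond the last four fall off the top of the 32-bit accumulator,
           which models "excess leading characters are ignored". */
        bits = (bits << 8) | (uint32_t)((unsigned char)chars[i] & 0xFFu);
    }

    /* Reinterpret the final bit-pattern as a signed 32-bit value. */
    int32_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

With this model, multichar_value("ab", 2) yields 'a' * 256 + 'b' on an ASCII host, and a five-character input gives the same result as its last four characters.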
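For a discussion of how the preprocessor locates header files, see section 2.2 Include Operation.
For how the file name resulting from a macro-expanded `#include' directive is interpreted, see section 2.5 Computed Includes.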
No macro expansion occurs on any `#pragma' directive line, so the question of how to treat a `#pragma' that becomes a standard pragma only after macro expansion does not arise.
Note that GCC does not yet implement any of the standard pragmas.
CPP has a small number of internal limits. This section lists the limits which the C standard requires to be no lower than some minimum, and all the others we are aware of. We intend there to be as few limits as possible. If you encounter an undocumented or inconvenient limit, please report that to us as a bug. (See the section on reporting bugs in the GCC manual.)
Where we say something is limited only by available memory, that means that internal data structures impose no intrinsic limit, and space is allocated with malloc or equivalent. The actual limit will therefore depend on many things, such as the size of other things allocated by the compiler at the same time, the amount of memory consumed by other processes on the same computer, etc.
We impose an arbitrary limit of 200 nesting levels of `#include' files, to avoid runaway recursion. The standard requires at least 15 levels.
The C standard mandates at least 63 nesting levels of conditional inclusion. CPP is limited only by available memory.
The C standard requires at least 63 levels of parenthesized expressions within a full expression. In preprocessor conditional expressions, the nesting depth is limited only by available memory.
The preprocessor treats all characters of an identifier or macro name as significant. The C standard requires only that the first 63 be significant.
The standard requires that at least 4095 macros can be defined simultaneously in a single translation unit. CPP is limited only by available memory.
For the number of parameters in a macro definition and of arguments in a macro call, we allow USHRT_MAX, which is no smaller than 65,535. The minimum required by the standard is 127.
The C standard requires that logical source lines of at least 4095 characters be permitted. CPP places no limit on line length, but you may get incorrect column numbers reported in diagnostics for lines longer than 65,535 characters.
The standard does not specify any lower limit on the maximum size of a source file. GNU cpp maps files into memory, so it is limited by the available address space. This is generally at least two gigabytes. Depending on the operating system, the size of physical memory may or may not be a limitation.
CPP has a number of features which are present mainly for compatibility with older programs. We discourage their use in new code. In some cases, we plan to remove the feature in a future version of GCC.
11.3.1 Assertions
11.3.2 Obsolete once-only headers
Assertions are a deprecated alternative to macros in writing conditionals to test what sort of computer or system the compiled program will run on. Assertions are usually predefined, but you can define them with preprocessing directives or command-line options.
Assertions were intended to provide a more systematic way to describe the compiler's target system. However, in practice they are just as unpredictable as the system-specific predefined macros. In addition, they are not part of any standard, and only a few compilers support them. Therefore, the use of assertions is less portable than the use of system-specific predefined macros. We recommend you do not use them at all.
An assertion looks like this:
#predicate (answer)
predicate must be a single identifier. answer can be any sequence of tokens; all characters are significant except for leading and trailing whitespace, and differences in internal whitespace sequences are ignored. (This is similar to the rules governing macro redefinition.) Thus, (x + y) is different from (x+y) but equivalent to ( x + y ). Parentheses do not nest inside an answer.
To test an assertion, you write it in an `#if'. For example, this conditional succeeds if either vax or ns16000 has been asserted as an answer for machine.
#if #machine (vax) || #machine (ns16000)
You can test whether any answer is asserted for a predicate by omitting the answer in the conditional:
#if #machine
Assertions are made with the `#assert' directive. Its sole argument is the assertion to make, without the leading `#' that identifies assertions in conditionals.
#assert predicate (answer)
You may make several assertions with the same predicate and different answers. Subsequent assertions do not override previous ones for the same predicate. All the answers for any given predicate are simultaneously true.
Assertions can be canceled with the `#unassert' directive. It has the same syntax as `#assert'. In that form it cancels only the answer which was specified on the `#unassert' line; other answers for that predicate remain true. You can cancel an entire predicate by leaving out the answer:
#unassert predicate
In either form, if no such assertion has been made, `#unassert' has no effect.
You can also make or cancel assertions using command-line options. See section 12. Invocation.
CPP supports two more ways of indicating that a header file should be read only once. Neither one is as portable as a wrapper `#ifndef', and we recommend you do not use them in new programs.
In the Objective-C language, there is a variant of `#include' called `#import' which includes a file, but does so at most once. If you use `#import' instead of `#include', then you don't need the conditionals inside the header file to prevent multiple inclusion of the contents. GCC permits the use of `#import' in C and C++ as well as Objective-C. However, it is not in standard C or C++ and should therefore not be used by portable programs.
`#import' is not a well designed feature. It requires the users of a header file to know that it should only be included once. It is much better for the header file's implementor to write the file so that users don't need to know this. Using a wrapper `#ifndef' accomplishes this goal.
In the present implementation, a single use of `#import' will prevent the file from ever being read again, by either `#import' or `#include'. You should not rely on this; do not use both `#import' and `#include' to refer to the same header file.
Another way to prevent a header file from being included more than once is with the `#pragma once' directive. If `#pragma once' is seen when scanning a header file, that file will never be read again, no matter what.
`#pragma once' does not have the problems that `#import' does, but it is not recognized by all preprocessors, so you cannot rely on it in a portable program.
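The wrapper `#ifndef' recommended above can be sketched as follows. To keep the example in a single file, the guarded body is written out twice to stand in for two inclusions of the same header; the guard name POINT_H_EXAMPLE, struct point and point_sum are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* First "inclusion": POINT_H_EXAMPLE is not yet defined, so the body is
   read and the guard macro becomes defined. */
#ifndef POINT_H_EXAMPLE
#define POINT_H_EXAMPLE
struct point { int x; int y; };
#endif /* POINT_H_EXAMPLE */

/* Second "inclusion": the guard macro is now defined, so the body is
   skipped and struct point is not defined a second time. */
#ifndef POINT_H_EXAMPLE
#define POINT_H_EXAMPLE
struct point { int x; int y; };
#endif /* POINT_H_EXAMPLE */

/* Uses the type to show that exactly one definition is visible. */
static int point_sum(struct point p)
{
    return p.x + p.y;
}
```

Without the guard, the second copy of the body would be a redefinition of struct point and the file would not compile. Because the guard lives inside the header, users of the header need no special knowledge of it; for a point p initialized with { 3, 4 }, assert(point_sum(p) == 7) holds.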
This section details behavior which has changed from previous versions of CPP. We do not plan to change it again in the near future, but we do not promise not to, either.
The "previous versions" discussed here are 2.95 and before. The behavior of GCC 3.0 is mostly the same as the behavior of the widely used 2.96 and 2.97 development snapshots. Where there are differences, they generally represent bugs in the snapshots.
The standard does not specify the order of evaluation of a chain of `##' operators, nor whether `#' is evaluated before, after, or at the same time as `##'. You should therefore not write any code which depends on any specific ordering. It is possible to guarantee an ordering, if you need one, by suitable use of nested macros.
An example of where this might matter is pasting the arguments `1', `e' and `-2'. This would be fine for left-to-right pasting, but right-to-left pasting would produce an invalid token `e-2'.
GCC 3.0 evaluates `#' and `##' at the same time and strictly left to right. Older versions evaluated all `#' operators first, then all `##' operators, in an unreliable order.
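One common way to guarantee an ordering with nested macros is sketched below. It relies only on the standard rule that a macro argument is fully expanded before substitution unless it is an operand of `#' or `##'. All macro and variable names here are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <string.h>

/* PASTE_RAW pastes its operands as written; PASTE adds one level of
   indirection so that its arguments are macro-expanded first. */
#define PASTE_RAW(a, b) a##b
#define PASTE(a, b) PASTE_RAW(a, b)

/* The nesting, not the preprocessor, decides which paste happens first. */
#define PASTE3_LEFT_FIRST(a, b, c)  PASTE(PASTE(a, b), c)
#define PASTE3_RIGHT_FIRST(a, b, c) PASTE(a, PASTE(b, c))

/* Same indirection for `#': the argument is expanded, then stringified. */
#define STRINGIFY_RAW(x) #x
#define STRINGIFY(x) STRINGIFY_RAW(x)

static int PASTE3_LEFT_FIRST(count, er, _1) = 10;   /* declares counter_1 */
static int PASTE3_RIGHT_FIRST(count, er, _2) = 20;  /* declares counter_2 */

/* All pasting finishes before stringification takes place. */
static const char *left_first_name(void)
{
    return STRINGIFY(PASTE3_LEFT_FIRST(count, er, _1));
}
```

Here both orders happen to produce valid tokens at every step. When an intermediate result would be invalid in one order, choose the nesting that avoids it.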
See section 9. Preprocessor Output, for the current textual format. This is also the format used by stringification. Normally, the preprocessor communicates tokens directly to the compiler's parser, and whitespace does not come up at all.
Older versions of GCC preserved all whitespace provided by the user and inserted lots more whitespace of their own, because they could not accurately predict when extra spaces were needed to prevent accidental token pasting.
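The effect on stringification can be observed from within a program. The sketch below relies only on the standard behavior of `#' (leading and trailing whitespace is dropped, and each whitespace sequence between tokens becomes one space); the names STRINGIFY, spaced and unspaced are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define STRINGIFY(x) #x

/* Runs of whitespace between tokens collapse to one space; leading and
   trailing whitespace in the argument is discarded. */
static const char *spaced(void)
{
    return STRINGIFY(   a    +    b   );
}

/* No whitespace is invented where the source had none. */
static const char *unspaced(void)
{
    return STRINGIFY(a+b);
}
```

The first function returns the string a + b with single spaces, and the second returns a+b unchanged.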
As an extension, GCC permits you to omit the variable arguments entirely when you use a variable argument macro. This is forbidden by the 1999 C standard, and will provoke a pedantic warning with GCC 3.0. Previous versions accepted it silently.
In a macro expansion, if `##' appeared before a variable arguments parameter, and the set of tokens specified for that argument in the macro invocation was empty, previous versions of CPP would back up and remove the preceding sequence of non-whitespace characters (not the preceding token). This extension is in direct conflict with the 1999 C standard and has been drastically pared back.
In the current version of the preprocessor, if `##' appears between a comma and a variable arguments parameter, and the variable argument is omitted entirely, the comma will be removed from the expansion. If the variable argument is empty, or the token before `##' is not a comma, then `##' behaves as a normal token paste.
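A minimal sketch of the comma-removal case follows. It assumes a compiler that implements this GNU extension and is not running in a mode that rejects it; FORMAT_INTO and the two helper functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* `##' sits between a comma and the variable arguments.  When the
   variable argument is omitted entirely, the comma is removed too. */
#define FORMAT_INTO(buf, fmt, ...) \
    snprintf((buf), sizeof(buf), fmt, ##__VA_ARGS__)

/* Variable argument omitted: expands to snprintf((buf), sizeof(buf), "hello"). */
static const char *greeting_plain(void)
{
    static char buf[32];
    FORMAT_INTO(buf, "hello");
    return buf;
}

/* Variable argument present: the comma stays and name is passed through. */
static const char *greeting_named(const char *name)
{
    static char buf[32];
    FORMAT_INTO(buf, "hello %s", name);
    return buf;
}
```

Without the `##', the first call would expand to a snprintf call with a trailing comma before the closing parenthesis, which is a syntax error.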
The `#line' directive used to change GCC's notion of the "directory containing the current file," used by `#include' with a double-quoted header file name. In 3.0 and later, it does not. See section 6. Line Control, for further explanation.
In GCC 2.95 and earlier, the string constant argument to `#line' was treated the same way as the argument to `#include': backslash escapes were not honored, and the string ended at the second `"'. This is not compliant with the C standard. In GCC 3.0, an attempt was made to correct the behavior, so that the string was treated as a real string constant, but it turned out to be buggy. In 3.1, the bugs have been fixed. (We are not fixing the bugs in 3.0 because they affect relatively few people and the fix is quite invasive.)
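For orientation, the basic form of `#line' with a string constant argument is sketched below. The file name virtual_name.c and the two functions are hypothetical; the example deliberately avoids backslash escapes, whose handling is the version-dependent part described above.

```c
#include <assert.h>
#include <string.h>

/* The directive sets the line number of the NEXT source line to 100 and
   the presumed file name to the given string constant. */
#line 100 "virtual_name.c"
static int reported_line(void) { return __LINE__; }          /* line 100 */
static const char *reported_file(void) { return __FILE__; }  /* line 101 */
```

After the directive, reported_line() returns 100 and reported_file() returns the name given to `#line', regardless of the real name of the source file.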