-
Notifications
You must be signed in to change notification settings - Fork 0
/
lexical.texi
516 lines (391 loc) · 12.7 KB
/
lexical.texi
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
@c This is part of The GNU C Reference Manual
@c Copyright (C) 2004-2009 Free Software Foundation, Inc.
@c See the file gnu-c-manual.texi for copying conditions.
@node Lexical Elements
@chapter Lexical Elements
@cindex lexical elements
This chapter describes the lexical elements that make up C source code
after preprocessing. These elements are called @dfn{tokens}. There are
five types of tokens: keywords, identifiers, constants, operators, and
separators. White space, sometimes required to separate tokens, is also
described in this chapter.
@menu
* Identifiers::
* Keywords::
* Constants::
* Operators::
* Separators::
* White Space::
@end menu
@node Identifiers
@section Identifiers
@cindex identifiers
Identifiers are sequences of characters used for naming variables,
functions, new data types, and preprocessor macros. The characters
can be letters, decimal digits, the underscore character @samp{_}.
The first character of an identifier cannot be a digit.
Lowercase letters and uppercase letters are distinct, such that
@code{foo} and @code{FOO} are two different identifiers.
When using GNU extensions, you can also include the dollar sign
character @samp{$} in identifiers.
@node Keywords
@section Keywords
@cindex keywords
Keywords are special identifiers reserved for use as part of the
programming language itself. You cannot use them for any other purpose.
Here is a list of keywords recognized by ANSI C89:
@example
auto break case char const continue default do double else enum extern
float for goto if int long register return short signed sizeof static
struct switch typedef union unsigned void volatile while
@end example
ISO C99 adds the following keywords:
@example
inline _Bool _Complex _Imaginary
@end example
@noindent
and GNU extensions add these keywords:
@example
__FUNCTION__ __PRETTY_FUNCTION__ __alignof __alignof__ __asm
__asm__ __attribute __attribute__ __builtin_offsetof __builtin_va_arg
__complex __complex__ __const __extension__ __func__ __imag __imag__
__inline __inline__ __label__ __null __real __real__
__restrict __restrict__ __signed __signed__ __thread __typeof
__volatile __volatile__
@end example
In both ISO C99 and C89 with GNU extensions, the following is recogized
as a keyboard:
@example
restrict
@end example
@node Constants
@section Constants
@cindex constants
A constant is a literal numeric or character value, such as 5 or `m'. All
constants are of a particular data type; you can use type casting to explicitly
specify the type of a constant, or let the compiler use the default type based
on the value of the constant.
@menu
* Integer Constants::
* Character Constants::
* Real Number Constants::
* String Constants::
@end menu
@node Integer Constants
@subsection Integer Constants
@cindex integer constants
@cindex constants, integer
An integer constant is a sequence of digits.
If the sequence of digits is preceded by @code{0x} or @code{0X} (zero x or
zero X), then the constant is considered to be hexadecimal (base 16). Hexadecimal
values may use the digits from 0 to 9, as well as the letters @code{a} to
@code{f} and @code{A} to @code{F}. Here are some examples:
@example
@group
0x2f
0x88
0xAB43
0xAbCd
0x1
@end group
@end example
If the first digit is 0 (zero), and the next character is not @code{x} or
@code{X}, then the constant is considered to be octal (base 8).
Octal values may only use the digits from 0 to 7; 8 and 9 are not allowed.
Here are some examples:
@example
@group
057
012
03
0241
@end group
@end example
In all other cases, the sequence of digits is assumed to be
decimal (base 10). Decimal values may use the digits from
0 to 9. Here are some examples:
@example
@group
459
23901
8
12
@end group
@end example
There are various integer data types, for short integers, long integers,
signed integers, and unsigned integers. You can force an integer constant
to be of a long and/or unsigned integer type by appending a sequence
of one or more letters to the end of the constant:
@table @code
@item u
@itemx U
Unsigned integer type.
@item l
@itemx L
Long integer type.
@end table
For example, @code{45U} is an @code{unsigned int} constant. You can
also combine letters: @code{45UL} is an @code{unsigned long int} constant.
(The letters may be used in any order.)
Both ISO C99 and GNU C extensions add the integer types @code{long long int}
and @code{unsigned long long int}. You can use two @code{L}'s to get a
@code{long long int} constant; add a @code{U} to that and you have an
@code{unsigned long long int} constant. For example: @code{45ULL}.
@node Character Constants
@subsection Character Constants
@cindex character constants
@cindex constants, character
A character constant is usually a single character enclosed within single
quotation marks, such as @code{'Q'}. A character constant is of type
@code{int} by default.
Some characters, such as the single quotation mark character @code{'} itself,
cannot be represented using only one character. To represent such characters,
there are several ``escape sequences'' that you can use:
@table @code
@item \\
Backslash character.
@item \?
Question mark character.
@item \'
Single quotation mark.
@item \"
Double quotation mark.
@item \a
Audible alert.
@item \b
Backspace character.
@item \e
<ESC> character. (This is a GNU extension.)
@item \f
Form feed.
@item \n
Newline character.
@item \r
Carriage return.
@item \t
Horizontal tab.
@item \v
Vertical tab.
@item \o, \oo, \ooo
Octal number.
@item \xh, \xhh, \xhhh, @dots{}
Hexadecimal number.
@end table
To use any of these escape sequences, enclose the sequence in single
quotes, and treat it as if it were any other character. For example,
the letter m is @code{'m'} and the newline character is @code{'\n'}.
The octal number escape sequence is the backslash character followed by
one, two, or three octal digits (0 to 7). For example, 101 is the
octal equivalent of 65, which is the ASCII character @code{'A'}. Thus,
the character constant @code{'\101'} is the same as the character
constant @code{'A'}.
The hexadecimal escape sequence is the backslash character, followed
by @code{x} and an unlimited number of hexadecimal digits (0 to 9, and
@code{a} to @code{f} or @code{A} to @code{F}).
While the length of possible hexadecimal digit strings is unlimited, the number
of character constants in any given character set is not. (The much-used
extended ASCII character set, for example, has only 256 characters in it.)
If you try to use a hexadecimal value that is outside the range of characters,
you will get a compile-time error.
@node Real Number Constants
@subsection Real Number Constants
@cindex floating point constants
@cindex constants, floating point
@cindex real number constants
@cindex constants, real number
A real number constant is a value that represents a fractional (floating
point) number. It consists of a sequence of digits which represents the
integer (or ``whole'') part of the number, a decimal point, and
a sequence of digits which represents the fractional part.
Either the integer part or the fractional part may be omitted, but not
both. Here are some examples:
@example
@group
double a, b, c, d, e, f;
a = 4.7;
b = 4.;
c = 4;
d = .7;
e = 0.7;
@end group
@end example
@noindent
(In the third assignment statement, the integer constant 4 is automatically
converted from an integer value to a double value.)
Real number constants can also be followed by @code{e} or
@code{E}, and an integer exponent. The exponent can be either positive
or negative.
@example
@group
double x, y;
x = 5e2; /* @r{@code{x} is 5 * 100, or 500.0.} */
y = 5e-2; /* @r{@code{y} is 5 * (1/100, or 0.05.} */
@end group
@end example
You can append a letter to the end of a real number constant
to cause it to be of a particular type. If you append the letter
F (or f) to a real number constant, then its type is @code{float}.
If you append the letter L (or l), then its type is @code{long double}.
If you do not append any letters, then its type is @code{double}.
@node String Constants
@subsection String Constants
@cindex string constants
@cindex string literals
A string constant is a sequence of characters, digits, and/or escape
sequences enclosed within double quotation marks. A string constant
is of type ``array of characters''. All string constants contain a
null termination character (@code{\0}) as their last character. Strings
are stored as arrays of characters, with no inherent size attribute.
The null termination character lets string-processing functions know
where the string ends.
Adjacent string constants are concatenated (combined) into
one string, with the null termination character added to the end of
the final concatenated string.
A string cannot contain double quotation marks, as double
quotation marks are used to enclose the string. To include
the double quotation mark character in a string, use the @code{\"}
escape sequence. You can use any of the escape sequences that can
be used as character constants in strings. Here are some example
of string constants:
@example
@group
/* @r{This is a single string constant.} */
"tutti frutti ice cream"
/* @r{These string constants will be concatenated, same as above.} */
"tutti " "frutti" " ice " "cream"
/* @r{This one uses two escape sequences.} */
"\"hello, world!\""
@end group
@end example
@noindent
If a string is too long to fit on one line, you can use a backslash @code{\}
to break it up onto separate lines.
@example
@group
"Today's special is a pastrami sandwich on rye bread with \
a potato knish and a cherry soda."
@end group
@end example
@noindent
Adjacent strings are automatically concatenated, so you can also have string
constants span multiple lines by writing them as separate, adjacent, strings.
For example:
@example
@group
"Tomorrow's special is a corned beef sandwich on "
"pumpernickel bread with a kasha knish and seltzer water."
@end group
@end example
@noindent
is the same as
@example
@group
"Tomorrow's special is a corned beef sandwich on \
pumpernickel bread with a kasha knish and seltzer water."
@end group
@end example
To insert a newline character into the string, so that when the string
is printed it will be printed on two different lines, you can use the newline
escape sequence @samp{\n}.
@example
printf ("potato\nknish");
@end example
@noindent
prints
@example
@group
potato
knish
@end group
@end example
@c TJR: Removing this paragraph until we can determine for sure what
@c limit (including ``no limit'') GCC imposes on string constant length.
@c
@c All ANSI C89 systems must support string constants of at least 509
@c characters after concatenation of adjacent literals. GCC appears not
@c to have a fixed limit (string constants of hundreds of Megabytes will
@c work).
@c ANSI C89, sec 5.2.4.1, ``Translation Limits''.
@node Operators
@section Operators
@cindex operators as lexical elements
An operator is a special token that performs an operation, such as
addition or subtraction, on either one, two, or three operands.
Full coverage of operators can be found in a later
chapter. @xref{Expressions and Operators}.
@node Separators
@section Separators
@cindex separators
A separator separates tokens. White space (see next section) is a
separator, but it is not a token. The other separators are all
single-character tokens themselves:
@example
( ) [ ] @{ @} ; , . :
@end example
@node White Space
@section White Space
@cindex white space
White space is the collective term used for several characters:
the space character, the tab character, the newline character, the
vertical tab character, and the form-feed character.
White space is ignored (outside of string
and character constants), and is therefore optional, except when it is
used to separate tokens. This means that
@example
@group
#include <stdio.h>
int
main()
@{
printf( "hello, world\n" );
return 0;
@}
@end group
@end example
@noindent
and
@example
@group
#include <stdio.h> int main()@{printf("hello, world\n");
return 0;@}
@end group
@end example
@noindent
are functionally the same program.
Although you must use white space to separate many tokens, no
white space is required between operators and operands, nor is
it required between other separators and that which they separate.
@example
@group
/* @r{All of these are valid.} */
x++;
x ++ ;
x=y+z;
x = y + z ;
x=array[2];
x = array [ 2 ] ;
fraction=numerator / *denominator_ptr;
fraction = numerator / * denominator_ptr ;
@end group
@end example
@noindent
Furthermore, wherever one space is allowed, any amount of white space
is allowed.
@example
@group
/* @r{These two statements are functionally identical.} */
x++;
x
++ ;
@end group
@end example
In string constants, spaces and tabs are not ignored; rather, they are
part of the string. Therefore,
@example
"potato knish"
@end example
@noindent
is not the same as
@example
"potato knish"
@end example