| author | Richard M. Stallman | 2006-09-14 01:43:18 +0000 |
|---|---|---|
| committer | Richard M. Stallman | 2006-09-14 01:43:18 +0000 |
| commit | 4c71c1062a32fb31ae1e33932f026fd0deda0df5 (patch) | |
| tree | 959c99bfa5f73db8ddf5abd1bafdd22907fdddfc | |
| parent | 87bbe2fd4c37e49455933db810104b7eafc6b25a (diff) | |
| download | emacs-4c71c1062a32fb31ae1e33932f026fd0deda0df5.tar.gz emacs-4c71c1062a32fb31ae1e33932f026fd0deda0df5.zip | |
(Character Type): Node split.
Add xref to Describing Characters.
(Basic Char Syntax, General Escape Syntax)
(Ctl-Char Syntax, Meta-Char Syntax): New subnodes.
| -rw-r--r-- | lispref/objects.texi | 167 |
1 files changed, 99 insertions, 68 deletions
diff --git a/lispref/objects.texi b/lispref/objects.texi
index cfb3864e9c9..519e93f2eb3 100644
--- a/lispref/objects.texi
+++ b/lispref/objects.texi
| @@ -227,9 +227,9 @@ number whose value is 1500. They are all equivalent. | |||
| 227 | other words, characters are represented by their character codes. For | 227 | other words, characters are represented by their character codes. For |
| 228 | example, the character @kbd{A} is represented as the @w{integer 65}. | 228 | example, the character @kbd{A} is represented as the @w{integer 65}. |
| 229 | 229 | ||
| 230 | Individual characters are not often used in programs. It is far more | 230 | Individual characters are used occasionally in programs, but it is |
| 231 | common to work with @emph{strings}, which are sequences composed of | 231 | more common to work with @emph{strings}, which are sequences composed |
| 232 | characters. @xref{String Type}. | 232 | of characters. @xref{String Type}. |
| 233 | 233 | ||
| 234 | Characters in strings, buffers, and files are currently limited to | 234 | Characters in strings, buffers, and files are currently limited to |
| 235 | the range of 0 to 524287---nineteen bits. But not all values in that | 235 | the range of 0 to 524287---nineteen bits. But not all values in that |
| @@ -239,17 +239,32 @@ range are valid character codes. Codes 0 through 127 are | |||
| 239 | input have a much wider range, to encode modifier keys such as | 239 | input have a much wider range, to encode modifier keys such as |
| 240 | Control, Meta and Shift. | 240 | Control, Meta and Shift. |
| 241 | 241 | ||
| 242 | There are special functions for producing a human-readable textual | ||
| 243 | description of a character for the sake of messages. @xref{Describing | ||
| 244 | Characters}. | ||
| 245 | |||
| 246 | @menu | ||
| 247 | * Basic Char Syntax:: | ||
| 248 | * General Escape Syntax:: | ||
| 249 | * Ctl-Char Syntax:: | ||
| 250 | * Meta-Char Syntax:: | ||
| 251 | * Other Char Bits:: | ||
| 252 | @end menu | ||
| 253 | |||
| 254 | @node Basic Char Syntax | ||
| 255 | @subsubsection Basic Char Syntax | ||
| 242 | @cindex read syntax for characters | 256 | @cindex read syntax for characters |
| 243 | @cindex printed representation for characters | 257 | @cindex printed representation for characters |
| 244 | @cindex syntax for characters | 258 | @cindex syntax for characters |
| 245 | @cindex @samp{?} in character constant | 259 | @cindex @samp{?} in character constant |
| 246 | @cindex question mark in character constant | 260 | @cindex question mark in character constant |
| 247 | Since characters are really integers, the printed representation of a | 261 | |
| 248 | character is a decimal number. This is also a possible read syntax for | 262 | Since characters are really integers, the printed representation of |
| 249 | a character, but writing characters that way in Lisp programs is a very | 263 | a character is a decimal number. This is also a possible read syntax |
| 250 | bad idea. You should @emph{always} use the special read syntax formats | 264 | for a character, but writing characters that way in Lisp programs is |
| 251 | that Emacs Lisp provides for characters. These syntax formats start | 265 | not clear programming. You should @emph{always} use the special read |
| 252 | with a question mark. | 266 | syntax formats that Emacs Lisp provides for characters. These syntax |
| 267 | formats start with a question mark. | ||
| 253 | 268 | ||
| 254 | The usual read syntax for alphanumeric characters is a question mark | 269 | The usual read syntax for alphanumeric characters is a question mark |
| 255 | followed by the character; thus, @samp{?A} for the character | 270 | followed by the character; thus, @samp{?A} for the character |
| @@ -315,8 +330,76 @@ the ``super'' modifier to the following character.) Thus, | |||
| 315 | character @key{ESC}. @samp{\s} is meant for use in character | 330 | character @key{ESC}. @samp{\s} is meant for use in character |
| 316 | constants; in string constants, just write the space. | 331 | constants; in string constants, just write the space. |
| 317 | 332 | ||
| 333 | A backslash is allowed, and harmless, preceding any character without | ||
| 334 | a special escape meaning; thus, @samp{?\+} is equivalent to @samp{?+}. | ||
| 335 | There is no reason to add a backslash before most characters. However, | ||
| 336 | you should add a backslash before any of the characters | ||
| 337 | @samp{()\|;'`"#.,} to avoid confusing the Emacs commands for editing | ||
| 338 | Lisp code. You can also add a backslash before whitespace characters such as | ||
| 339 | space, tab, newline and formfeed. However, it is cleaner to use one of | ||
| 340 | the easily readable escape sequences, such as @samp{\t} or @samp{\s}, | ||
| 341 | instead of an actual whitespace character such as a tab or a space. | ||
| 342 | (If you do write backslash followed by a space, you should write | ||
| 343 | an extra space after the character constant to separate it from the | ||
| 344 | following text.) | ||
| 345 | |||
| 346 | @node General Escape Syntax | ||
| 347 | @subsubsection General Escape Syntax | ||
| 348 | |||
| 349 | In addition to the specific escape sequences for special important | ||
| 350 | control characters, Emacs provides general categories of escape syntax | ||
| 351 | that you can use to specify non-ASCII text characters. | ||
| 352 | |||
| 353 | @cindex unicode character escape | ||
| 354 | For instance, you can specify characters by their Unicode values. | ||
| 355 | @code{?\u@var{nnnn}} represents a character that maps to the Unicode | ||
| 356 | code point @samp{U+@var{nnnn}}. There is a slightly different syntax | ||
| 357 | for specifying characters with code points above @code{#xFFFF}; | ||
| 358 | @code{\U00@var{nnnnnn}} represents the character whose Unicode code | ||
| 359 | point is @samp{U+@var{nnnnnn}}, if such a character is supported by | ||
| 360 | Emacs. If the corresponding character is not supported, Emacs signals | ||
| 361 | an error. | ||
| 362 | |||
| 363 | This peculiar and inconvenient syntax was adopted for compatibility | ||
| 364 | with other programming languages. Unlike some other languages, Emacs | ||
| 365 | Lisp supports this syntax in only character literals and strings. | ||
| 366 | |||
| 367 | @cindex @samp{\} in character constant | ||
| 368 | @cindex backslash in character constant | ||
| 369 | @cindex octal character code | ||
| 370 | The most general read syntax for a character represents the | ||
| 371 | character code in either octal or hex. To use octal, write a question | ||
| 372 | mark followed by a backslash and the octal character code (up to three | ||
| 373 | octal digits); thus, @samp{?\101} for the character @kbd{A}, | ||
| 374 | @samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the | ||
| 375 | character @kbd{C-b}. Although this syntax can represent any | ||
| 376 | @acronym{ASCII} character, it is preferred only when the precise octal | ||
| 377 | value is more important than the @acronym{ASCII} representation. | ||
| 378 | |||
| 379 | @example | ||
| 380 | @group | ||
| 381 | ?\012 @result{} 10 ?\n @result{} 10 ?\C-j @result{} 10 | ||
| 382 | ?\101 @result{} 65 ?A @result{} 65 | ||
| 383 | @end group | ||
| 384 | @end example | ||
| 385 | |||
| 386 | To use hex, write a question mark followed by a backslash, @samp{x}, | ||
| 387 | and the hexadecimal character code. You can use any number of hex | ||
| 388 | digits, so you can represent any character code in this way. | ||
| 389 | Thus, @samp{?\x41} for the character @kbd{A}, @samp{?\x1} for the | ||
| 390 | character @kbd{C-a}, and @code{?\x8e0} for the Latin-1 character | ||
| 391 | @iftex | ||
| 392 | @samp{@`a}. | ||
| 393 | @end iftex | ||
| 394 | @ifnottex | ||
| 395 | @samp{a} with grave accent. | ||
| 396 | @end ifnottex | ||
| 397 | |||
| 398 | @node Ctl-Char Syntax | ||
| 399 | @subsubsection Control-Character Syntax | ||
| 400 | |||
| 318 | @cindex control characters | 401 | @cindex control characters |
| 319 | Control characters may be represented using yet another read syntax. | 402 | Control characters can be represented using yet another read syntax. |
| 320 | This consists of a question mark followed by a backslash, caret, and the | 403 | This consists of a question mark followed by a backslash, caret, and the |
| 321 | corresponding non-control character, in either upper or lower case. For | 404 | corresponding non-control character, in either upper or lower case. For |
| 322 | example, both @samp{?\^I} and @samp{?\^i} are valid read syntax for the | 405 | example, both @samp{?\^I} and @samp{?\^i} are valid read syntax for the |
| @@ -363,6 +446,9 @@ input, we prefer the @samp{C-} syntax. Which one you use does not | |||
| 363 | affect the meaning of the program, but may guide the understanding of | 446 | affect the meaning of the program, but may guide the understanding of |
| 364 | people who read it. | 447 | people who read it. |
| 365 | 448 | ||
| 449 | @node Meta-Char Syntax | ||
| 450 | @subsubsection Meta-Character Syntax | ||
| 451 | |||
| 366 | @cindex meta characters | 452 | @cindex meta characters |
| 367 | A @dfn{meta character} is a character typed with the @key{META} | 453 | A @dfn{meta character} is a character typed with the @key{META} |
| 368 | modifier key. The integer that represents such a character has the | 454 | modifier key. The integer that represents such a character has the |
| @@ -395,6 +481,9 @@ syntax for a character. Thus, you can write @kbd{M-A} as @samp{?\M-A}, | |||
| 395 | or as @samp{?\M-\101}. Likewise, you can write @kbd{C-M-b} as | 481 | or as @samp{?\M-\101}. Likewise, you can write @kbd{C-M-b} as |
| 396 | @samp{?\M-\C-b}, @samp{?\C-\M-b}, or @samp{?\M-\002}. | 482 | @samp{?\M-\C-b}, @samp{?\C-\M-b}, or @samp{?\M-\002}. |
| 397 | 483 | ||
| 484 | @node Other Char Bits | ||
| 485 | @subsubsection Other Character Modifier Bits | ||
| 486 | |||
| 398 | The case of a graphic character is indicated by its character code; | 487 | The case of a graphic character is indicated by its character code; |
| 399 | for example, @acronym{ASCII} distinguishes between the characters @samp{a} | 488 | for example, @acronym{ASCII} distinguishes between the characters @samp{a} |
| 400 | and @samp{A}. But @acronym{ASCII} has no way to represent whether a control | 489 | and @samp{A}. But @acronym{ASCII} has no way to represent whether a control |
| @@ -431,64 +520,6 @@ Numerically, the | |||
| 431 | bit values are 2**22 for alt, 2**23 for super and 2**24 for hyper. | 520 | bit values are 2**22 for alt, 2**23 for super and 2**24 for hyper. |
| 432 | @end ifnottex | 521 | @end ifnottex |
| 433 | 522 | ||
| 434 | @cindex unicode character escape | ||
| 435 | Emacs provides a syntax for specifying characters by their Unicode | ||
| 436 | code points. @code{?\u@var{nnnn}} represents a character that maps to | ||
| 437 | the Unicode code point @samp{U+@var{nnnn}}. There is a slightly | ||
| 438 | different syntax for specifying characters with code points above | ||
| 439 | @code{#xFFFF}; @code{\U00@var{nnnnnn}} represents the character whose | ||
| 440 | Unicode code point is @samp{U+@var{nnnnnn}}, if such a character | ||
| 441 | is supported by Emacs. If the corresponding character is not | ||
| 442 | supported, Emacs signals an error. | ||
| 443 | |||
| 444 | This peculiar and inconvenient syntax was adopted for compatibility | ||
| 445 | with other programming languages. Unlike some other languages, Emacs | ||
| 446 | Lisp supports this syntax in only character literals and strings. | ||
| 447 | |||
| 448 | @cindex @samp{\} in character constant | ||
| 449 | @cindex backslash in character constant | ||
| 450 | @cindex octal character code | ||
| 451 | Finally, the most general read syntax for a character represents the | ||
| 452 | character code in either octal or hex. To use octal, write a question | ||
| 453 | mark followed by a backslash and the octal character code (up to three | ||
| 454 | octal digits); thus, @samp{?\101} for the character @kbd{A}, | ||
| 455 | @samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the | ||
| 456 | character @kbd{C-b}. Although this syntax can represent any @acronym{ASCII} | ||
| 457 | character, it is preferred only when the precise octal value is more | ||
| 458 | important than the @acronym{ASCII} representation. | ||
| 459 | |||
| 460 | @example | ||
| 461 | @group | ||
| 462 | ?\012 @result{} 10 ?\n @result{} 10 ?\C-j @result{} 10 | ||
| 463 | ?\101 @result{} 65 ?A @result{} 65 | ||
| 464 | @end group | ||
| 465 | @end example | ||
| 466 | |||
| 467 | To use hex, write a question mark followed by a backslash, @samp{x}, | ||
| 468 | and the hexadecimal character code. You can use any number of hex | ||
| 469 | digits, so you can represent any character code in this way. | ||
| 470 | Thus, @samp{?\x41} for the character @kbd{A}, @samp{?\x1} for the | ||
| 471 | character @kbd{C-a}, and @code{?\x8e0} for the Latin-1 character | ||
| 472 | @iftex | ||
| 473 | @samp{@`a}. | ||
| 474 | @end iftex | ||
| 475 | @ifnottex | ||
| 476 | @samp{a} with grave accent. | ||
| 477 | @end ifnottex | ||
| 478 | |||
| 479 | A backslash is allowed, and harmless, preceding any character without | ||
| 480 | a special escape meaning; thus, @samp{?\+} is equivalent to @samp{?+}. | ||
| 481 | There is no reason to add a backslash before most characters. However, | ||
| 482 | you should add a backslash before any of the characters | ||
| 483 | @samp{()\|;'`"#.,} to avoid confusing the Emacs commands for editing | ||
| 484 | Lisp code. You can also add a backslash before whitespace characters such as | ||
| 485 | space, tab, newline and formfeed. However, it is cleaner to use one of | ||
| 486 | the easily readable escape sequences, such as @samp{\t} or @samp{\s}, | ||
| 487 | instead of an actual whitespace character such as a tab or a space. | ||
| 488 | (If you do write backslash followed by a space, you should write | ||
| 489 | an extra space after the character constant to separate it from the | ||
| 490 | following text.) | ||
| 491 | |||
| 492 | @node Symbol Type | 523 | @node Symbol Type |
| 493 | @subsection Symbol Type | 524 | @subsection Symbol Type |
| 494 | 525 | ||