author     Richard M. Stallman  2006-09-14 01:43:18 +0000
committer  Richard M. Stallman  2006-09-14 01:43:18 +0000
commit     4c71c1062a32fb31ae1e33932f026fd0deda0df5
tree       959c99bfa5f73db8ddf5abd1bafdd22907fdddfc
parent     87bbe2fd4c37e49455933db810104b7eafc6b25a
(Character Type): Node split.
Add xref to Describing Characters.
(Basic Char Syntax, General Escape Syntax)
(Ctl-Char Syntax, Meta-Char Syntax): New subnodes.
-rw-r--r-- lispref/objects.texi 167
1 files changed, 99 insertions, 68 deletions
diff --git a/lispref/objects.texi b/lispref/objects.texi
index cfb3864e9c9..519e93f2eb3 100644
--- a/lispref/objects.texi
+++ b/lispref/objects.texi
@@ -227,9 +227,9 @@ number whose value is 1500. They are all equivalent.
 other words, characters are represented by their character codes. For
 example, the character @kbd{A} is represented as the @w{integer 65}.
 
- Individual characters are not often used in programs. It is far more
-common to work with @emph{strings}, which are sequences composed of
-characters. @xref{String Type}.
+ Individual characters are used occasionally in programs, but it is
+more common to work with @emph{strings}, which are sequences composed
+of characters. @xref{String Type}.
 
  Characters in strings, buffers, and files are currently limited to
 the range of 0 to 524287---nineteen bits. But not all values in that
@@ -239,17 +239,32 @@ range are valid character codes. Codes 0 through 127 are
 input have a much wider range, to encode modifier keys such as
 Control, Meta and Shift.
 
+ There are special functions for producing a human-readable textual
+description of a character for the sake of messages. @xref{Describing
+Characters}.
+
+@menu
+* Basic Char Syntax::
+* General Escape Syntax::
+* Ctl-Char Syntax::
+* Meta-Char Syntax::
+* Other Char Bits::
+@end menu
+
+@node Basic Char Syntax
+@subsubsection Basic Char Syntax
 @cindex read syntax for characters
 @cindex printed representation for characters
 @cindex syntax for characters
 @cindex @samp{?} in character constant
 @cindex question mark in character constant
- Since characters are really integers, the printed representation of a
-character is a decimal number. This is also a possible read syntax for
-a character, but writing characters that way in Lisp programs is a very
-bad idea. You should @emph{always} use the special read syntax formats
-that Emacs Lisp provides for characters. These syntax formats start
-with a question mark.
+
+ Since characters are really integers, the printed representation of
+a character is a decimal number. This is also a possible read syntax
+for a character, but writing characters that way in Lisp programs is
+not clear programming. You should @emph{always} use the special read
+syntax formats that Emacs Lisp provides for characters. These syntax
+formats start with a question mark.
 
  The usual read syntax for alphanumeric characters is a question mark
 followed by the character; thus, @samp{?A} for the character
@@ -315,8 +330,76 @@ the ``super'' modifier to the following character.) Thus,
 character @key{ESC}. @samp{\s} is meant for use in character
 constants; in string constants, just write the space.
 
+ A backslash is allowed, and harmless, preceding any character without
+a special escape meaning; thus, @samp{?\+} is equivalent to @samp{?+}.
+There is no reason to add a backslash before most characters. However,
+you should add a backslash before any of the characters
+@samp{()\|;'`"#.,} to avoid confusing the Emacs commands for editing
+Lisp code. You can also add a backslash before whitespace characters such as
+space, tab, newline and formfeed. However, it is cleaner to use one of
+the easily readable escape sequences, such as @samp{\t} or @samp{\s},
+instead of an actual whitespace character such as a tab or a space.
+(If you do write backslash followed by a space, you should write
+an extra space after the character constant to separate it from the
+following text.)
+
+@node General Escape Syntax
+@subsubsection General Escape Syntax
+
+ In addition to the specific excape sequences for special important
+control characters, Emacs provides general categories of escape syntax
+that you can use to specify non-ASCII text characters.
+
+@cindex unicode character escape
+ For instance, you can specify characters by their Unicode values.
+@code{?\u@var{nnnn}} represents a character that maps to the Unicode
+code point @samp{U+@var{nnnn}}. There is a slightly different syntax
+for specifying characters with code points above @code{#xFFFF};
+@code{\U00@var{nnnnnn}} represents the character whose Unicode code
+point is @samp{U+@var{nnnnnn}}, if such a character is supported by
+Emacs. If the corresponding character is not supported, Emacs signals
+an error.
+
+ This peculiar and inconvenient syntax was adopted for compatibility
+with other programming languages. Unlike some other languages, Emacs
+Lisp supports this syntax in only character literals and strings.
+
+@cindex @samp{\} in character constant
+@cindex backslash in character constant
+@cindex octal character code
+ The most general read syntax for a character represents the
+character code in either octal or hex. To use octal, write a question
+mark followed by a backslash and the octal character code (up to three
+octal digits); thus, @samp{?\101} for the character @kbd{A},
+@samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the
+character @kbd{C-b}. Although this syntax can represent any
+@acronym{ASCII} character, it is preferred only when the precise octal
+value is more important than the @acronym{ASCII} representation.
+
+@example
+@group
+?\012 @result{} 10 ?\n @result{} 10 ?\C-j @result{} 10
+?\101 @result{} 65 ?A @result{} 65
+@end group
+@end example
+
+ To use hex, write a question mark followed by a backslash, @samp{x},
+and the hexadecimal character code. You can use any number of hex
+digits, so you can represent any character code in this way.
+Thus, @samp{?\x41} for the character @kbd{A}, @samp{?\x1} for the
+character @kbd{C-a}, and @code{?\x8e0} for the Latin-1 character
+@iftex
+@samp{@`a}.
+@end iftex
+@ifnottex
+@samp{a} with grave accent.
+@end ifnottex
+
+@node Ctl-Char Syntax
+@subsubsection Control-Character Syntax
+
 @cindex control characters
- Control characters may be represented using yet another read syntax.
+ Control characters can be represented using yet another read syntax.
 This consists of a question mark followed by a backslash, caret, and the
 corresponding non-control character, in either upper or lower case. For
 example, both @samp{?\^I} and @samp{?\^i} are valid read syntax for the
@@ -363,6 +446,9 @@ input, we prefer the @samp{C-} syntax. Which one you use does not
 affect the meaning of the program, but may guide the understanding of
 people who read it.
 
+@node Meta-Char Syntax
+@subsubsection Meta-Character Syntax
+
 @cindex meta characters
  A @dfn{meta character} is a character typed with the @key{META}
 modifier key. The integer that represents such a character has the
@@ -395,6 +481,9 @@ syntax for a character. Thus, you can write @kbd{M-A} as @samp{?\M-A},
 or as @samp{?\M-\101}. Likewise, you can write @kbd{C-M-b} as
 @samp{?\M-\C-b}, @samp{?\C-\M-b}, or @samp{?\M-\002}.
 
+@node Other Char Bits
+@subsubsection Other Character Modifier Bits
+
  The case of a graphic character is indicated by its character code;
 for example, @acronym{ASCII} distinguishes between the characters @samp{a}
 and @samp{A}. But @acronym{ASCII} has no way to represent whether a control
@@ -431,64 +520,6 @@ Numerically, the
 bit values are 2**22 for alt, 2**23 for super and 2**24 for hyper.
 @end ifnottex
 
-@cindex unicode character escape
- Emacs provides a syntax for specifying characters by their Unicode
-code points. @code{?\u@var{nnnn}} represents a character that maps to
-the Unicode code point @samp{U+@var{nnnn}}. There is a slightly
-different syntax for specifying characters with code points above
-@code{#xFFFF}; @code{\U00@var{nnnnnn}} represents the character whose
-Unicode code point is @samp{U+@var{nnnnnn}}, if such a character
-is supported by Emacs. If the corresponding character is not
-supported, Emacs signals an error.
-
- This peculiar and inconvenient syntax was adopted for compatibility
-with other programming languages. Unlike some other languages, Emacs
-Lisp supports this syntax in only character literals and strings.
-
-@cindex @samp{\} in character constant
-@cindex backslash in character constant
-@cindex octal character code
- Finally, the most general read syntax for a character represents the
-character code in either octal or hex. To use octal, write a question
-mark followed by a backslash and the octal character code (up to three
-octal digits); thus, @samp{?\101} for the character @kbd{A},
-@samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the
-character @kbd{C-b}. Although this syntax can represent any @acronym{ASCII}
-character, it is preferred only when the precise octal value is more
-important than the @acronym{ASCII} representation.
-
-@example
-@group
-?\012 @result{} 10 ?\n @result{} 10 ?\C-j @result{} 10
-?\101 @result{} 65 ?A @result{} 65
-@end group
-@end example
-
- To use hex, write a question mark followed by a backslash, @samp{x},
-and the hexadecimal character code. You can use any number of hex
-digits, so you can represent any character code in this way.
-Thus, @samp{?\x41} for the character @kbd{A}, @samp{?\x1} for the
-character @kbd{C-a}, and @code{?\x8e0} for the Latin-1 character
-@iftex
-@samp{@`a}.
-@end iftex
-@ifnottex
-@samp{a} with grave accent.
-@end ifnottex
-
- A backslash is allowed, and harmless, preceding any character without
-a special escape meaning; thus, @samp{?\+} is equivalent to @samp{?+}.
-There is no reason to add a backslash before most characters. However,
-you should add a backslash before any of the characters
-@samp{()\|;'`"#.,} to avoid confusing the Emacs commands for editing
-Lisp code. You can also add a backslash before whitespace characters such as
-space, tab, newline and formfeed. However, it is cleaner to use one of
-the easily readable escape sequences, such as @samp{\t} or @samp{\s},
-instead of an actual whitespace character such as a tab or a space.
-(If you do write backslash followed by a space, you should write
-an extra space after the character constant to separate it from the
-following text.)
-
 @node Symbol Type
 @subsection Symbol Type
 
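All of the escape forms that this patch regroups into subnodes read as plain integers. The patch quotes several of those values: ?\101, ?\x41 and ?A are all 65, ?\012 is 10, and ?\M-\002 is one spelling of C-M-b. To cross-check them, here is a minimal Python sketch. read_char is a hypothetical helper, not part of Emacs or its reader. It models only octal escapes, hex escapes, plain ASCII characters and the \M- prefix. It assumes that the meta bit is 2**27, as the Emacs Lisp manual states; that value does not appear in the lines shown above. The sketch deliberately leaves out control syntax, \u and \U escapes, and non-ASCII characters.

```python
# Hypothetical helper: a tiny model of a few Emacs Lisp character
# read-syntax forms. It is NOT Emacs's reader; it only mirrors the
# numeric rules described in the patch above.

META_BIT = 2**27           # assumed meta modifier bit (per the Emacs Lisp manual)
MAX_TEXT_CHAR = 2**19 - 1  # 524287, the largest code allowed in text


def read_char(syntax: str) -> int:
    """Return the integer code for a small subset of ?-syntax."""
    if not syntax.startswith("?"):
        raise ValueError("character syntax must start with '?'")
    body = syntax[1:]
    modifiers = 0
    # Each leading \M- prefix sets the meta bit (only meta is modeled).
    while body.startswith("\\M-"):
        modifiers |= META_BIT
        body = body[3:]
    if body.startswith("\\x"):
        # Hex escape: any number of hex digits.
        code = int(body[2:], 16)
    elif body.startswith("\\") and body[1:].isdigit():
        # Octal escape: at most three octal digits.
        digits = body[1:]
        if len(digits) > 3:
            raise ValueError("octal escapes take at most three digits")
        code = int(digits, 8)
    elif len(body) == 1 and body.isascii():
        # Plain ASCII character such as ?A.
        code = ord(body)
    else:
        raise ValueError(f"unsupported syntax: {syntax!r}")
    return code | modifiers


print(read_char(r"?\101"), read_char(r"?\x41"), read_char("?A"))  # 65 65 65
print(read_char(r"?\M-\002"))  # 134217730, i.e. 2 plus the meta bit
```

Modeling meta as a bit OR-ed onto the base code reflects how the patch describes the modifiers. In this sketch, ?\M-A and ?\M-\101 yield the same integer, just as the Meta-Char Syntax text says they denote the same character.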