(Character Type): Node split.

Add xref to Describing Characters.
(Basic Char Syntax, General Escape Syntax)
(Ctl-Char Syntax, Meta-Char Syntax): New subnodes.
author: Richard M. Stallman 2006-09-14 01:43:18 +0000
committer: Richard M. Stallman 2006-09-14 01:43:18 +0000
commit: 4c71c1062a32fb31ae1e33932f026fd0deda0df5
tree: 959c99bfa5f73db8ddf5abd1bafdd22907fdddfc
parent: 87bbe2fd4c37e49455933db810104b7eafc6b25a
1 files changed, 99 insertions, 68 deletions
diff --git a/lispref/objects.texi b/lispref/objects.texi
index cfb3864e9c9..519e93f2eb3 100644
--- a/lispref/objects.texi
+++ b/lispref/objects.texi
@@ -227,9 +227,9 @@ number whose value is 1500.  They are all equivalent.
 other words, characters are represented by their character codes.  For
 example, the character @kbd{A} is represented as the @w{integer 65}.
-  Individual characters are not often used in programs.  It is far more
-common to work with @emph{strings}, which are sequences composed of
-characters.  @xref{String Type}.
+  Individual characters are used occasionally in programs, but it is
+more common to work with @emph{strings}, which are sequences composed
+of characters.  @xref{String Type}.
  Characters in strings, buffers, and files are currently limited to
 the range of 0 to 524287---nineteen bits.  But not all values in that
@@ -239,17 +239,32 @@ range are valid character codes.  Codes 0 through 127 are
 input have a much wider range, to encode modifier keys such as
 Control, Meta and Shift.
+  There are special functions for producing a human-readable textual
+description of a character for the sake of messages.  @xref{Describing
+Characters}.
+@menu
+* Basic Char Syntax::
+* General Escape Syntax::
+* Ctl-Char Syntax::
+* Meta-Char Syntax::
+* Other Char Bits::
+@end menu
+@node Basic Char Syntax
+@subsubsection Basic Char Syntax
 @cindex read syntax for characters
 @cindex printed representation for characters
 @cindex syntax for characters
 @cindex @samp{?} in character constant
 @cindex question mark in character constant
-  Since characters are really integers, the printed representation of a
-character is a decimal number.  This is also a possible read syntax for
-a character, but writing characters that way in Lisp programs is a very
-bad idea.  You should @emph{always} use the special read syntax formats
-that Emacs Lisp provides for characters.  These syntax formats start
-with a question mark.
+  Since characters are really integers, the printed representation of
+a character is a decimal number.  This is also a possible read syntax
+for a character, but writing characters that way in Lisp programs is
+not clear programming.  You should @emph{always} use the special read
+syntax formats that Emacs Lisp provides for characters.  These syntax
+formats start with a question mark.
  The usual read syntax for alphanumeric characters is a question mark
 followed by the character; thus, @samp{?A} for the character
@@ -315,8 +330,76 @@ the ``super'' modifier to the following character.)  Thus,
 character @key{ESC}.  @samp{\s} is meant for use in character
 constants; in string constants, just write the space.
+  A backslash is allowed, and harmless, preceding any character without
+a special escape meaning; thus, @samp{?\+} is equivalent to @samp{?+}.
+There is no reason to add a backslash before most characters.  However,
+you should add a backslash before any of the characters
+@samp{()\|;'`"#.,} to avoid confusing the Emacs commands for editing
+Lisp code.  You can also add a backslash before whitespace characters such as
+space, tab, newline and formfeed.  However, it is cleaner to use one of
+the easily readable escape sequences, such as @samp{\t} or @samp{\s},
+instead of an actual whitespace character such as a tab or a space.
+(If you do write backslash followed by a space, you should write
+an extra space after the character constant to separate it from the
+following text.)
+@node General Escape Syntax
+@subsubsection General Escape Syntax
+  In addition to the specific excape sequences for special important
+control characters, Emacs provides general categories of escape syntax
+that you can use to specify non-ASCII text characters.
+@cindex unicode character escape
+  For instance, you can specify characters by their Unicode values.
+@code{?\u@var{nnnn}} represents a character that maps to the Unicode
+code point @samp{U+@var{nnnn}}.  There is a slightly different syntax
+for specifying characters with code points above @code{#xFFFF};
+@code{\U00@var{nnnnnn}} represents the character whose Unicode code
+point is @samp{U+@var{nnnnnn}}, if such a character is supported by
+Emacs.  If the corresponding character is not supported, Emacs signals
+an error.
+  This peculiar and inconvenient syntax was adopted for compatibility
+with other programming languages.  Unlike some other languages, Emacs
+Lisp supports this syntax in only character literals and strings.
+@cindex @samp{\} in character constant
+@cindex backslash in character constant
+@cindex octal character code
+  The most general read syntax for a character represents the
+character code in either octal or hex.  To use octal, write a question
+mark followed by a backslash and the octal character code (up to three
+octal digits); thus, @samp{?\101} for the character @kbd{A},
+@samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the
+character @kbd{C-b}.  Although this syntax can represent any
+@acronym{ASCII} character, it is preferred only when the precise octal
+value is more important than the @acronym{ASCII} representation.
+@example
+@group
+?\012 @result{} 10         ?\n @result{} 10         ?\C-j @result{} 10
+?\101 @result{} 65         ?A @result{} 65
+@end group
+@end example
+  To use hex, write a question mark followed by a backslash, @samp{x},
+and the hexadecimal character code.  You can use any number of hex
+digits, so you can represent any character code in this way.
+Thus, @samp{?\x41} for the character @kbd{A}, @samp{?\x1} for the
+character @kbd{C-a}, and @code{?\x8e0} for the Latin-1 character
+@iftex
+@samp{@`a}.
+@end iftex
+@ifnottex
+@samp{a} with grave accent.
+@end ifnottex
+@node Ctl-Char Syntax
+@subsubsection Control-Character Syntax
 @cindex control characters
-  Control characters may be represented using yet another read syntax.
+  Control characters can be represented using yet another read syntax.
 This consists of a question mark followed by a backslash, caret, and the
 corresponding non-control character, in either upper or lower case.  For
 example, both @samp{?\^I} and @samp{?\^i} are valid read syntax for the
@@ -363,6 +446,9 @@ input, we prefer the @samp{C-} syntax.  Which one you use does not
 affect the meaning of the program, but may guide the understanding of
 people who read it.
+@node Meta-Char Syntax
+@subsubsection Meta-Character Syntax
 @cindex meta characters
  A @dfn{meta character} is a character typed with the @key{META}
 modifier key.  The integer that represents such a character has the
@@ -395,6 +481,9 @@ syntax for a character.  Thus, you can write @kbd{M-A} as @samp{?\M-A},
 or as @samp{?\M-\101}.  Likewise, you can write @kbd{C-M-b} as
 @samp{?\M-\C-b}, @samp{?\C-\M-b}, or @samp{?\M-\002}.
+@node Other Char Bits
+@subsubsection Other Character Modifier Bits
  The case of a graphic character is indicated by its character code;
 for example, @acronym{ASCII} distinguishes between the characters @samp{a}
 and @samp{A}.  But @acronym{ASCII} has no way to represent whether a control
@@ -431,64 +520,6 @@ Numerically, the
 bit values are 2**22 for alt, 2**23 for super and 2**24 for hyper.
 @end ifnottex
-@cindex unicode character escape
-  Emacs provides a syntax for specifying characters by their Unicode
-code points.  @code{?\u@var{nnnn}} represents a character that maps to
-the Unicode code point @samp{U+@var{nnnn}}.  There is a slightly
-different syntax for specifying characters with code points above
-@code{#xFFFF}; @code{\U00@var{nnnnnn}} represents the character whose
-Unicode code point is @samp{U+@var{nnnnnn}}, if such a character
-is supported by Emacs.  If the corresponding character is not
-supported, Emacs signals an error.
-  This peculiar and inconvenient syntax was adopted for compatibility
-with other programming languages.  Unlike some other languages, Emacs
-Lisp supports this syntax in only character literals and strings.
-@cindex @samp{\} in character constant
-@cindex backslash in character constant
-@cindex octal character code
-  Finally, the most general read syntax for a character represents the
-character code in either octal or hex.  To use octal, write a question
-mark followed by a backslash and the octal character code (up to three
-octal digits); thus, @samp{?\101} for the character @kbd{A},
-@samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the
-character @kbd{C-b}.  Although this syntax can represent any @acronym{ASCII}
-character, it is preferred only when the precise octal value is more
-important than the @acronym{ASCII} representation.
-@example
-@group
-?\012 @result{} 10         ?\n @result{} 10         ?\C-j @result{} 10
-?\101 @result{} 65         ?A @result{} 65
-@end group
-@end example
-  To use hex, write a question mark followed by a backslash, @samp{x},
-and the hexadecimal character code.  You can use any number of hex
-digits, so you can represent any character code in this way.
-Thus, @samp{?\x41} for the character @kbd{A}, @samp{?\x1} for the
-character @kbd{C-a}, and @code{?\x8e0} for the Latin-1 character
-@iftex
-@samp{@`a}.
-@end iftex
-@ifnottex
-@samp{a} with grave accent.
-@end ifnottex
-  A backslash is allowed, and harmless, preceding any character without
-a special escape meaning; thus, @samp{?\+} is equivalent to @samp{?+}.
-There is no reason to add a backslash before most characters.  However,
-you should add a backslash before any of the characters
-@samp{()\|;'`"#.,} to avoid confusing the Emacs commands for editing
-Lisp code.  You can also add a backslash before whitespace characters such as
-space, tab, newline and formfeed.  However, it is cleaner to use one of
-the easily readable escape sequences, such as @samp{\t} or @samp{\s},
-instead of an actual whitespace character such as a tab or a space.
-(If you do write backslash followed by a space, you should write
-an extra space after the character constant to separate it from the
-following text.)
 @node Symbol Type
 @subsection Symbol Type