author Luc Teirlinck 2003-12-01 03:57:00 +0000
committer Luc Teirlinck 2003-12-01 03:57:00 +0000
commit d4241ae4cb9018ba2ad40a852dba2c0b95dc30ab (patch)
tree e25c30ba0a00b52773b32f8237c27ea6662499d6
parent d18473b95695efe4e942defafe88a85963f9925f (diff)
(Non-ASCII in Strings): Clarify description of when a string is
unibyte or multibyte.
(Bool-Vector Type): Update examples.
(Equality Predicates): Correctly describe when two strings are `equal'.
-rw-r--r-- lispref/objects.texi 72
1 files changed, 44 insertions, 28 deletions
diff --git a/lispref/objects.texi b/lispref/objects.texi
index bee2db2974c..4c905cb969e 100644
--- a/lispref/objects.texi
+++ b/lispref/objects.texi
@@ -226,11 +226,12 @@ example, the character @kbd{A} is represented as the @w{integer 65}.
 common to work with @emph{strings}, which are sequences composed of
 characters. @xref{String Type}.
 
- Characters in strings, buffers, and files are currently limited to the
-range of 0 to 524287---nineteen bits. But not all values in that range
-are valid character codes. Codes 0 through 127 are @acronym{ASCII} codes; the
-rest are non-@acronym{ASCII} (@pxref{Non-ASCII Characters}). Characters that represent
-keyboard input have a much wider range, to encode modifier keys such as
+ Characters in strings, buffers, and files are currently limited to
+the range of 0 to 524287---nineteen bits. But not all values in that
+range are valid character codes. Codes 0 through 127 are
+@acronym{ASCII} codes; the rest are non-@acronym{ASCII}
+(@pxref{Non-ASCII Characters}). Characters that represent keyboard
+input have a much wider range, to encode modifier keys such as
 Control, Meta and Shift.
 
 @cindex read syntax for characters
@@ -375,11 +376,11 @@ possible a wide range of basic character codes.
 @ifnottex
 2**7
 @end ifnottex
-bit attached to an @acronym{ASCII} character indicates a meta character; thus, the
-meta characters that can fit in a string have codes in the range from
-128 to 255, and are the meta versions of the ordinary @acronym{ASCII}
-characters. (In Emacs versions 18 and older, this convention was used
-for characters outside of strings as well.)
+bit attached to an @acronym{ASCII} character indicates a meta
+character; thus, the meta characters that can fit in a string have
+codes in the range from 128 to 255, and are the meta versions of the
+ordinary @acronym{ASCII} characters. (In Emacs versions 18 and older,
+this convention was used for characters outside of strings as well.)
 
  The read syntax for meta characters uses @samp{\M-}. For example,
 @samp{?\M-A} stands for @kbd{M-A}. You can use @samp{\M-} together with
@@ -416,8 +417,8 @@ significant in these prefixes.) Thus, @samp{?\H-\M-\A-x} represents
 @kbd{Alt-Hyper-Meta-x}. (Note that @samp{\s} with no following @samp{-}
 represents the space character.)
 @tex
-Numerically, the
-bit values are @math{2^{22}} for alt, @math{2^{23}} for super and @math{2^{24}} for hyper.
+Numerically, the bit values are @math{2^{22}} for alt, @math{2^{23}}
+for super and @math{2^{24}} for hyper.
 @end tex
 @ifnottex
 Numerically, the
@@ -938,10 +939,13 @@ one character, @samp{a} with grave accent. @w{@samp{\ }} in a string
 constant is just like backslash-newline; it does not contribute any
 character to the string, but it does terminate the preceding hex escape.
 
- Using a multibyte hex escape forces the string to multibyte. You can
-represent a unibyte non-@acronym{ASCII} character with its character code,
-which must be in the range from 128 (0200 octal) to 255 (0377 octal).
-This forces a unibyte string.
+ You can represent a unibyte non-@acronym{ASCII} character with its
+character code, which must be in the range from 128 (0200 octal) to
+255 (0377 octal). If you write all such character codes in octal and
+the string contains no other characters forcing it to be multibyte,
+this produces a unibyte string. However, using any hex escape in a
+string (even for an @acronym{ASCII} character) forces the string to be
+multibyte.
 
  @xref{Text Representations}, for more information about the two
 text representations.
@@ -963,9 +967,9 @@ distinguish case in @acronym{ASCII} control characters.
 
  Properly speaking, strings cannot hold meta characters; but when a
 string is to be used as a key sequence, there is a special convention
-that provides a way to represent meta versions of @acronym{ASCII} characters in a
-string. If you use the @samp{\M-} syntax to indicate a meta character
-in a string constant, this sets the
+that provides a way to represent meta versions of @acronym{ASCII}
+characters in a string. If you use the @samp{\M-} syntax to indicate
+a meta character in a string constant, this sets the
 @tex
 @math{2^{7}}
 @end tex
@@ -1082,16 +1086,25 @@ constant that follows actually specifies the contents of the bool-vector
 as a bitmap---each ``character'' in the string contains 8 bits, which
 specify the next 8 elements of the bool-vector (1 stands for @code{t},
 and 0 for @code{nil}). The least significant bits of the character
-correspond to the lowest indices in the bool-vector. If the length is not a
-multiple of 8, the printed representation shows extra elements, but
-these extras really make no difference.
+correspond to the lowest indices in the bool-vector.
 
 @example
 (make-bool-vector 3 t)
- @result{} #&3"\007"
+ @result{} #&3"^G"
 (make-bool-vector 3 nil)
- @result{} #&3"\0"
-;; @r{These are equal since only the first 3 bits are used.}
+ @result{} #&3"^@@"
+@end example
+
+@noindent
+These results make sense, because the binary code for @samp{C-g} is
+111 and @samp{C-@@} is the character with code 0.
+
+ If the length is not a multiple of 8, the printed representation
+shows extra elements, but these extras really make no difference. For
+instance, in the next example, the two bool-vectors are equal, because
+only the first 3 bits are used:
+
+@example
 (equal #&3"\377" #&3"\007")
  @result{} t
 @end example
@@ -1875,9 +1888,12 @@ always true.
 @end example
 
 Comparison of strings is case-sensitive, but does not take account of
-text properties---it compares only the characters in the strings.
-A unibyte string never equals a multibyte string unless the
-contents are entirely @acronym{ASCII} (@pxref{Text Representations}).
+text properties---it compares only the characters in the strings. For
+technical reasons, a unibyte string and a multibyte string are
+@code{equal} if and only if they contain the same sequence of
+character codes and all these codes are either in the range 0 through
+127 (@acronym{ASCII}) or 160 through 255 (@code{eight-bit-graphic}).
+(@pxref{Text Representations}).
 
 @example
 @group