| author | Luc Teirlinck | 2003-12-01 03:57:00 +0000 |
|---|---|---|
| committer | Luc Teirlinck | 2003-12-01 03:57:00 +0000 |
| commit | d4241ae4cb9018ba2ad40a852dba2c0b95dc30ab (patch) | |
| tree | e25c30ba0a00b52773b32f8237c27ea6662499d6 | |
| parent | d18473b95695efe4e942defafe88a85963f9925f (diff) | |
(Non-ASCII in Strings): Clarify description of when a string is
unibyte or multibyte.
(Bool-Vector Type): Update examples.
(Equality Predicates): Correctly describe when two strings are `equal'.
| -rw-r--r-- | lispref/objects.texi | 72 |
1 files changed, 44 insertions, 28 deletions
diff --git a/lispref/objects.texi b/lispref/objects.texi
index bee2db2974c..4c905cb969e 100644
--- a/lispref/objects.texi
+++ b/lispref/objects.texi
| @@ -226,11 +226,12 @@ example, the character @kbd{A} is represented as the @w{integer 65}. | |||
| 226 | common to work with @emph{strings}, which are sequences composed of | 226 | common to work with @emph{strings}, which are sequences composed of |
| 227 | characters. @xref{String Type}. | 227 | characters. @xref{String Type}. |
| 228 | 228 | ||
| 229 | Characters in strings, buffers, and files are currently limited to the | 229 | Characters in strings, buffers, and files are currently limited to |
| 230 | range of 0 to 524287---nineteen bits. But not all values in that range | 230 | the range of 0 to 524287---nineteen bits. But not all values in that |
| 231 | are valid character codes. Codes 0 through 127 are @acronym{ASCII} codes; the | 231 | range are valid character codes. Codes 0 through 127 are |
| 232 | rest are non-@acronym{ASCII} (@pxref{Non-ASCII Characters}). Characters that represent | 232 | @acronym{ASCII} codes; the rest are non-@acronym{ASCII} |
| 233 | keyboard input have a much wider range, to encode modifier keys such as | 233 | (@pxref{Non-ASCII Characters}). Characters that represent keyboard |
| 234 | input have a much wider range, to encode modifier keys such as | ||
| 234 | Control, Meta and Shift. | 235 | Control, Meta and Shift. |
| 235 | 236 | ||
| 236 | @cindex read syntax for characters | 237 | @cindex read syntax for characters |
| @@ -375,11 +376,11 @@ possible a wide range of basic character codes. | |||
| 375 | @ifnottex | 376 | @ifnottex |
| 376 | 2**7 | 377 | 2**7 |
| 377 | @end ifnottex | 378 | @end ifnottex |
| 378 | bit attached to an @acronym{ASCII} character indicates a meta character; thus, the | 379 | bit attached to an @acronym{ASCII} character indicates a meta |
| 379 | meta characters that can fit in a string have codes in the range from | 380 | character; thus, the meta characters that can fit in a string have |
| 380 | 128 to 255, and are the meta versions of the ordinary @acronym{ASCII} | 381 | codes in the range from 128 to 255, and are the meta versions of the |
| 381 | characters. (In Emacs versions 18 and older, this convention was used | 382 | ordinary @acronym{ASCII} characters. (In Emacs versions 18 and older, |
| 382 | for characters outside of strings as well.) | 383 | this convention was used for characters outside of strings as well.) |
| 383 | 384 | ||
| 384 | The read syntax for meta characters uses @samp{\M-}. For example, | 385 | The read syntax for meta characters uses @samp{\M-}. For example, |
| 385 | @samp{?\M-A} stands for @kbd{M-A}. You can use @samp{\M-} together with | 386 | @samp{?\M-A} stands for @kbd{M-A}. You can use @samp{\M-} together with |
| @@ -416,8 +417,8 @@ significant in these prefixes.) Thus, @samp{?\H-\M-\A-x} represents | |||
| 416 | @kbd{Alt-Hyper-Meta-x}. (Note that @samp{\s} with no following @samp{-} | 417 | @kbd{Alt-Hyper-Meta-x}. (Note that @samp{\s} with no following @samp{-} |
| 417 | represents the space character.) | 418 | represents the space character.) |
| 418 | @tex | 419 | @tex |
| 419 | Numerically, the | 420 | Numerically, the bit values are @math{2^{22}} for alt, @math{2^{23}} |
| 420 | bit values are @math{2^{22}} for alt, @math{2^{23}} for super and @math{2^{24}} for hyper. | 421 | for super and @math{2^{24}} for hyper. |
| 421 | @end tex | 422 | @end tex |
| 422 | @ifnottex | 423 | @ifnottex |
| 423 | Numerically, the | 424 | Numerically, the |
| @@ -938,10 +939,13 @@ one character, @samp{a} with grave accent. @w{@samp{\ }} in a string | |||
| 938 | constant is just like backslash-newline; it does not contribute any | 939 | constant is just like backslash-newline; it does not contribute any |
| 939 | character to the string, but it does terminate the preceding hex escape. | 940 | character to the string, but it does terminate the preceding hex escape. |
| 940 | 941 | ||
| 941 | Using a multibyte hex escape forces the string to multibyte. You can | 942 | You can represent a unibyte non-@acronym{ASCII} character with its |
| 942 | represent a unibyte non-@acronym{ASCII} character with its character code, | 943 | character code, which must be in the range from 128 (0200 octal) to |
| 943 | which must be in the range from 128 (0200 octal) to 255 (0377 octal). | 944 | 255 (0377 octal). If you write all such character codes in octal and |
| 944 | This forces a unibyte string. | 945 | the string contains no other characters forcing it to be multibyte, |
| 946 | this produces a unibyte string. However, using any hex escape in a | ||
| 947 | string (even for an @acronym{ASCII} character) forces the string to be | ||
| 948 | multibyte. | ||
| 945 | 949 | ||
| 946 | @xref{Text Representations}, for more information about the two | 950 | @xref{Text Representations}, for more information about the two |
| 947 | text representations. | 951 | text representations. |
| @@ -963,9 +967,9 @@ distinguish case in @acronym{ASCII} control characters. | |||
| 963 | 967 | ||
| 964 | Properly speaking, strings cannot hold meta characters; but when a | 968 | Properly speaking, strings cannot hold meta characters; but when a |
| 965 | string is to be used as a key sequence, there is a special convention | 969 | string is to be used as a key sequence, there is a special convention |
| 966 | that provides a way to represent meta versions of @acronym{ASCII} characters in a | 970 | that provides a way to represent meta versions of @acronym{ASCII} |
| 967 | string. If you use the @samp{\M-} syntax to indicate a meta character | 971 | characters in a string. If you use the @samp{\M-} syntax to indicate |
| 968 | in a string constant, this sets the | 972 | a meta character in a string constant, this sets the |
| 969 | @tex | 973 | @tex |
| 970 | @math{2^{7}} | 974 | @math{2^{7}} |
| 971 | @end tex | 975 | @end tex |
| @@ -1082,16 +1086,25 @@ constant that follows actually specifies the contents of the bool-vector | |||
| 1082 | as a bitmap---each ``character'' in the string contains 8 bits, which | 1086 | as a bitmap---each ``character'' in the string contains 8 bits, which |
| 1083 | specify the next 8 elements of the bool-vector (1 stands for @code{t}, | 1087 | specify the next 8 elements of the bool-vector (1 stands for @code{t}, |
| 1084 | and 0 for @code{nil}). The least significant bits of the character | 1088 | and 0 for @code{nil}). The least significant bits of the character |
| 1085 | correspond to the lowest indices in the bool-vector. If the length is not a | 1089 | correspond to the lowest indices in the bool-vector. |
| 1086 | multiple of 8, the printed representation shows extra elements, but | ||
| 1087 | these extras really make no difference. | ||
| 1088 | 1090 | ||
| 1089 | @example | 1091 | @example |
| 1090 | (make-bool-vector 3 t) | 1092 | (make-bool-vector 3 t) |
| 1091 | @result{} #&3"\007" | 1093 | @result{} #&3"^G" |
| 1092 | (make-bool-vector 3 nil) | 1094 | (make-bool-vector 3 nil) |
| 1093 | @result{} #&3"\0" | 1095 | @result{} #&3"^@@" |
| 1094 | ;; @r{These are equal since only the first 3 bits are used.} | 1096 | @end example |
| 1097 | |||
| 1098 | @noindent | ||
| 1099 | These results make sense, because the binary code for @samp{C-g} is | ||
| 1100 | 111 and @samp{C-@@} is the character with code 0. | ||
| 1101 | |||
| 1102 | If the length is not a multiple of 8, the printed representation | ||
| 1103 | shows extra elements, but these extras really make no difference. For | ||
| 1104 | instance, in the next example, the two bool-vectors are equal, because | ||
| 1105 | only the first 3 bits are used: | ||
| 1106 | |||
| 1107 | @example | ||
| 1095 | (equal #&3"\377" #&3"\007") | 1108 | (equal #&3"\377" #&3"\007") |
| 1096 | @result{} t | 1109 | @result{} t |
| 1097 | @end example | 1110 | @end example |
| @@ -1875,9 +1888,12 @@ always true. | |||
| 1875 | @end example | 1888 | @end example |
| 1876 | 1889 | ||
| 1877 | Comparison of strings is case-sensitive, but does not take account of | 1890 | Comparison of strings is case-sensitive, but does not take account of |
| 1878 | text properties---it compares only the characters in the strings. | 1891 | text properties---it compares only the characters in the strings. For |
| 1879 | A unibyte string never equals a multibyte string unless the | 1892 | technical reasons, a unibyte string and a multibyte string are |
| 1880 | contents are entirely @acronym{ASCII} (@pxref{Text Representations}). | 1893 | @code{equal} if and only if they contain the same sequence of |
| 1894 | character codes and all these codes are either in the range 0 through | ||
| 1895 | 127 (@acronym{ASCII}) or 160 through 255 (@code{eight-bit-graphic}). | ||
| 1896 | (@pxref{Text Representations}). | ||
| 1881 | 1897 | ||
| 1882 | @example | 1898 | @example |
| 1883 | @group | 1899 | @group |