| author | Chong Yidong | 2009-02-22 00:22:46 +0000 |
|---|---|---|
| committer | Chong Yidong | 2009-02-22 00:22:46 +0000 |
| commit | 8f88eb24b234cb3ae794a4780e76f99498d0a154 | |
| tree | b352d6c3da7d7596eacdd01b37f58507607e3d8e | |
| parent | 5fbf8b28ee3bc4c1921eeaf2a33d64bd1888f024 | |
(Creating Strings): Copyedits. Remove obsolete Emacs 20 usage of `concat'.
(Case Conversion): Copyedits.
| -rw-r--r-- | doc/lispref/strings.texi | 145 |
1 files changed, 67 insertions, 78 deletions
diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
index 0b53a7ae593..5dd5e802b89 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
| @@ -61,15 +61,13 @@ concerned with these two representations. | |||
| 61 | Sometimes key sequences are represented as unibyte strings. When a | 61 | Sometimes key sequences are represented as unibyte strings. When a |
| 62 | unibyte string is a key sequence, string elements in the range 128 to | 62 | unibyte string is a key sequence, string elements in the range 128 to |
| 63 | 255 represent meta characters (which are large integers) rather than | 63 | 255 represent meta characters (which are large integers) rather than |
| 64 | character codes in the range 128 to 255. | 64 | character codes in the range 128 to 255. Strings cannot hold |
| 65 | | 65 | characters that have the hyper, super or alt modifiers; they can hold |
| 66 | Strings cannot hold characters that have the hyper, super or alt | 66 | @acronym{ASCII} control characters, but no other control characters. |
| 67 | modifiers; they can hold @acronym{ASCII} control characters, but no other | 67 | They do not distinguish case in @acronym{ASCII} control characters. |
| 68 | control characters. They do not distinguish case in @acronym{ASCII} control | 68 | If you want to store such characters in a sequence, such as a key |
| 69 | characters. If you want to store such characters in a sequence, such as | 69 | sequence, you must use a vector instead of a string. @xref{Character |
| 70 | a key sequence, you must use a vector instead of a string. | 70 | Type}, for more information about keyboard input characters. |
| 71 | @xref{Character Type}, for more information about the representation of meta | ||
| 72 | and other modifiers for keyboard input characters. | ||
| 73 | 71 | ||
| 74 | Strings are useful for holding regular expressions. You can also | 72 | Strings are useful for holding regular expressions. You can also |
| 75 | match regular expressions against strings with @code{string-match} | 73 | match regular expressions against strings with @code{string-match} |
| @@ -155,11 +153,11 @@ index @var{start} up to (but excluding) the character at the index | |||
| 155 | @end example | 153 | @end example |
| 156 | 154 | ||
| 157 | @noindent | 155 | @noindent |
| 158 | Here the index for @samp{a} is 0, the index for @samp{b} is 1, and the | 156 | In the above example, the index for @samp{a} is 0, the index for |
| 159 | index for @samp{c} is 2. Thus, three letters, @samp{abc}, are copied | 157 | @samp{b} is 1, and the index for @samp{c} is 2. The index 3---which |
| 160 | from the string @code{"abcdefg"}. The index 3 marks the character | 158 | is the the fourth character in the string---marks the character |
| 161 | position up to which the substring is copied. The character whose index | 159 | position up to which the substring is copied. Thus, @samp{abc} is |
| 162 | is 3 is actually the fourth character in the string. | 160 | copied from the string @code{"abcdefg"}. |
| 163 | 161 | ||
| 164 | A negative number counts from the end of the string, so that @minus{}1 | 162 | A negative number counts from the end of the string, so that @minus{}1 |
| 165 | signifies the index of the last character of the string. For example: | 163 | signifies the index of the last character of the string. For example: |
| @@ -256,16 +254,9 @@ returns an empty string. | |||
| 256 | @end example | 254 | @end example |
| 257 | 255 | ||
| 258 | @noindent | 256 | @noindent |
| 259 | The @code{concat} function always constructs a new string that is | 257 | This function always constructs a new string that is not @code{eq} to |
| 260 | not @code{eq} to any existing string, except when the result is empty | 258 | any existing string, except when the result is the empty string (to |
| 261 | (since empty strings are canonicalized to save space). | 259 | save space, Emacs makes only one empty multibyte string). |
| 262 | |||
| 263 | In Emacs versions before 21, when an argument was an integer (not a | ||
| 264 | sequence of integers), it was converted to a string of digits making up | ||
| 265 | the decimal printed representation of the integer. This obsolete usage | ||
| 266 | no longer works. The proper way to convert an integer to its decimal | ||
| 267 | printed form is with @code{format} (@pxref{Formatting Strings}) or | ||
| 268 | @code{number-to-string} (@pxref{String Conversion}). | ||
| 269 | 260 | ||
| 270 | For information about other concatenation functions, see the | 261 | For information about other concatenation functions, see the |
| 271 | description of @code{mapconcat} in @ref{Mapping Functions}, | 262 | description of @code{mapconcat} in @ref{Mapping Functions}, |
| @@ -276,20 +267,19 @@ combine-and-quote-strings}. | |||
| 276 | @end defun | 267 | @end defun |
| 277 | 268 | ||
| 278 | @defun split-string string &optional separators omit-nulls | 269 | @defun split-string string &optional separators omit-nulls |
| 279 | This function splits @var{string} into substrings at matches for the | 270 | This function splits @var{string} into substrings based on the regular |
| 280 | regular expression @var{separators}. Each match for @var{separators} | 271 | expression @var{separators} (@pxref{Regular Expressions}). Each match |
| 281 | defines a splitting point; the substrings between the splitting points | 272 | for @var{separators} defines a splitting point; the substrings between |
| 282 | are made into a list, which is the value returned by | 273 | splitting points are made into a list, which is returned. |
| 283 | @code{split-string}. | ||
| 284 | 274 | ||
| 285 | If @var{omit-nulls} is @code{nil}, the result contains null strings | 275 | If @var{omit-nulls} is @code{nil} (or omitted), the result contains |
| 286 | whenever there are two consecutive matches for @var{separators}, or a | 276 | null strings whenever there are two consecutive matches for |
| 287 | match is adjacent to the beginning or end of @var{string}. If | 277 | @var{separators}, or a match is adjacent to the beginning or end of |
| 288 | @var{omit-nulls} is @code{t}, these null strings are omitted from the | 278 | @var{string}. If @var{omit-nulls} is @code{t}, these null strings are |
| 289 | result. | 279 | omitted from the result. |
| 290 | 280 | ||
| 291 | If @var{separators} is @code{nil} (or omitted), | 281 | If @var{separators} is @code{nil} (or omitted), the default is the |
| 292 | the default is the value of @code{split-string-default-separators}. | 282 | value of @code{split-string-default-separators}. |
| 293 | 283 | ||
| 294 | As a special case, when @var{separators} is @code{nil} (or omitted), | 284 | As a special case, when @var{separators} is @code{nil} (or omitted), |
| 295 | null strings are always omitted from the result. Thus: | 285 | null strings are always omitted from the result. Thus: |
| @@ -441,9 +431,9 @@ For technical reasons, a unibyte and a multibyte string are | |||
| 441 | @code{equal} if and only if they contain the same sequence of | 431 | @code{equal} if and only if they contain the same sequence of |
| 442 | character codes and all these codes are either in the range 0 through | 432 | character codes and all these codes are either in the range 0 through |
| 443 | 127 (@acronym{ASCII}) or 160 through 255 (@code{eight-bit-graphic}). | 433 | 127 (@acronym{ASCII}) or 160 through 255 (@code{eight-bit-graphic}). |
| 444 | However, when a unibyte string gets converted to a multibyte string, | 434 | However, when a unibyte string is converted to a multibyte string, all |
| 445 | all characters with codes in the range 160 through 255 get converted | 435 | characters with codes in the range 160 through 255 are converted to |
| 446 | to characters with higher codes, whereas @acronym{ASCII} characters | 436 | characters with higher codes, whereas @acronym{ASCII} characters |
| 447 | remain unchanged. Thus, a unibyte string and its conversion to | 437 | remain unchanged. Thus, a unibyte string and its conversion to |
| 448 | multibyte are only @code{equal} if the string is all @acronym{ASCII}. | 438 | multibyte are only @code{equal} if the string is all @acronym{ASCII}. |
| 449 | Character codes 160 through 255 are not entirely proper in multibyte | 439 | Character codes 160 through 255 are not entirely proper in multibyte |
| @@ -549,7 +539,7 @@ be a list of strings or symbols rather than an actual alist. | |||
| 549 | @xref{Association Lists}. | 539 | @xref{Association Lists}. |
| 550 | @end defun | 540 | @end defun |
| 551 | 541 | ||
| 552 | See also the @code{compare-buffer-substrings} function in | 542 | See also the function @code{compare-buffer-substrings} in |
| 553 | @ref{Comparing Text}, for a way to compare text in buffers. The | 543 | @ref{Comparing Text}, for a way to compare text in buffers. The |
| 554 | function @code{string-match}, which matches a regular expression | 544 | function @code{string-match}, which matches a regular expression |
| 555 | against a string, can be used for a kind of string comparison; see | 545 | against a string, can be used for a kind of string comparison; see |
| @@ -560,14 +550,14 @@ against a string, can be used for a kind of string comparison; see | |||
| 560 | @section Conversion of Characters and Strings | 550 | @section Conversion of Characters and Strings |
| 561 | @cindex conversion of strings | 551 | @cindex conversion of strings |
| 562 | 552 | ||
| 563 | This section describes functions for conversions between characters, | 553 | This section describes functions for converting between characters, |
| 564 | strings and integers. @code{format} (@pxref{Formatting Strings}) | 554 | strings and integers. @code{format} (@pxref{Formatting Strings}) and |
| 565 | and @code{prin1-to-string} | 555 | @code{prin1-to-string} (@pxref{Output Functions}) can also convert |
| 566 | (@pxref{Output Functions}) can also convert Lisp objects into strings. | 556 | Lisp objects into strings. @code{read-from-string} (@pxref{Input |
| 567 | @code{read-from-string} (@pxref{Input Functions}) can ``convert'' a | 557 | Functions}) can ``convert'' a string representation of a Lisp object |
| 568 | string representation of a Lisp object into an object. The functions | 558 | into an object. The functions @code{string-make-multibyte} and |
| 569 | @code{string-make-multibyte} and @code{string-make-unibyte} convert the | 559 | @code{string-make-unibyte} convert the text representation of a string |
| 570 | text representation of a string (@pxref{Converting Representations}). | 560 | (@pxref{Converting Representations}). |
| 571 | 561 | ||
| 572 | @xref{Documentation}, for functions that produce textual descriptions | 562 | @xref{Documentation}, for functions that produce textual descriptions |
| 573 | of text characters and general input events | 563 | of text characters and general input events |
| @@ -689,10 +679,10 @@ Functions}. | |||
| 689 | @cindex formatting strings | 679 | @cindex formatting strings |
| 690 | @cindex strings, formatting them | 680 | @cindex strings, formatting them |
| 691 | 681 | ||
| 692 | @dfn{Formatting} means constructing a string by substitution of | 682 | @dfn{Formatting} means constructing a string by substituting |
| 693 | computed values at various places in a constant string. This constant string | 683 | computed values at various places in a constant string. This constant |
| 694 | controls how the other values are printed, as well as where they appear; | 684 | string controls how the other values are printed, as well as where |
| 695 | it is called a @dfn{format string}. | 685 | they appear; it is called a @dfn{format string}. |
| 696 | 686 | ||
| 697 | Formatting is often useful for computing messages to be displayed. In | 687 | Formatting is often useful for computing messages to be displayed. In |
| 698 | fact, the functions @code{message} and @code{error} provide the same | 688 | fact, the functions @code{message} and @code{error} provide the same |
| @@ -936,15 +926,15 @@ arguments. | |||
| 936 | @acronym{ASCII} codes 88 and 120 respectively. | 926 | @acronym{ASCII} codes 88 and 120 respectively. |
| 937 | 927 | ||
| 938 | @defun downcase string-or-char | 928 | @defun downcase string-or-char |
| 939 | This function converts a character or a string to lower case. | 929 | This function converts @var{string-or-char}, which should be either a |
| | | 930 | character or a string, to lower case. |
| 940 | 931 | ||
| 941 | When the argument to @code{downcase} is a string, the function creates | 932 | When @var{string-or-char} is a string, this function returns a new |
| 942 | and returns a new string in which each letter in the argument that is | 933 | string in which each letter in the argument that is upper case is |
| 943 | upper case is converted to lower case. When the argument to | 934 | converted to lower case. When @var{string-or-char} is a character, |
| 944 | @code{downcase} is a character, @code{downcase} returns the | 935 | this function returns the corresponding lower case character (an |
| 945 | corresponding lower case character. This value is an integer. If the | 936 | integer); if the original character is lower case, or is not a letter, |
| 946 | original character is lower case, or is not a letter, then the value | 937 | the return value is equal to the original character. |
| 947 | equals the original character. | ||
| 948 | 938 | ||
| 949 | @example | 939 | @example |
| 950 | (downcase "The cat in the hat") | 940 | (downcase "The cat in the hat") |
| @@ -956,16 +946,15 @@ equals the original character. | |||
| 956 | @end defun | 946 | @end defun |
| 957 | 947 | ||
| 958 | @defun upcase string-or-char | 948 | @defun upcase string-or-char |
| 959 | This function converts a character or a string to upper case. | 949 | This function converts @var{string-or-char}, which should be either a |
| 960 | | 950 | character or a string, to upper case. |
| 961 | When the argument to @code{upcase} is a string, the function creates | ||
| 962 | and returns a new string in which each letter in the argument that is | ||
| 963 | lower case is converted to upper case. | ||
| 964 | 951 | ||
| 965 | When the argument to @code{upcase} is a character, @code{upcase} | 952 | When @var{string-or-char} is a string, this function returns a new |
| 966 | returns the corresponding upper case character. This value is an integer. | 953 | string in which each letter in the argument that is lower case is |
| 967 | If the original character is upper case, or is not a letter, then the | 954 | converted to upper case. When @var{string-or-char} is a character, |
| 968 | value returned equals the original character. | 955 | this function returns the corresponding upper case character (an an |
| | | 956 | integer); if the original character is upper case, or is not a letter, |
| | | 957 | the return value is equal to the original character. |
| 969 | 958 | ||
| 970 | @example | 959 | @example |
| 971 | (upcase "The cat in the hat") | 960 | (upcase "The cat in the hat") |
| @@ -979,9 +968,9 @@ value returned equals the original character. | |||
| 979 | @defun capitalize string-or-char | 968 | @defun capitalize string-or-char |
| 980 | @cindex capitalization | 969 | @cindex capitalization |
| 981 | This function capitalizes strings or characters. If | 970 | This function capitalizes strings or characters. If |
| 982 | @var{string-or-char} is a string, the function creates and returns a new | 971 | @var{string-or-char} is a string, the function returns a new string |
| 983 | string, whose contents are a copy of @var{string-or-char} in which each | 972 | whose contents are a copy of @var{string-or-char} in which each word |
| 984 | word has been capitalized. This means that the first character of each | 973 | has been capitalized. This means that the first character of each |
| 985 | word is converted to upper case, and the rest are converted to lower | 974 | word is converted to upper case, and the rest are converted to lower |
| 986 | case. | 975 | case. |
| 987 | 976 | ||
| @@ -989,8 +978,8 @@ The definition of a word is any sequence of consecutive characters that | |||
| 989 | are assigned to the word constituent syntax class in the current syntax | 978 | are assigned to the word constituent syntax class in the current syntax |
| 990 | table (@pxref{Syntax Class Table}). | 979 | table (@pxref{Syntax Class Table}). |
| 991 | 980 | ||
| 992 | When the argument to @code{capitalize} is a character, @code{capitalize} | 981 | When @var{string-or-char} is a character, this function does the same |
| 993 | has the same result as @code{upcase}. | 982 | thing as @code{upcase}. |
| 994 | 983 | ||
| 995 | @example | 984 | @example |
| 996 | @group | 985 | @group |
| @@ -1084,13 +1073,13 @@ equivalent). (For ordinary @acronym{ASCII}, this would map @samp{a} into | |||
| 1084 | @samp{A} and @samp{A} into @samp{a}, and likewise for each set of | 1073 | @samp{A} and @samp{A} into @samp{a}, and likewise for each set of |
| 1085 | equivalent characters.) | 1074 | equivalent characters.) |
| 1086 | 1075 | ||
| 1087 | When you construct a case table, you can provide @code{nil} for | 1076 | When constructing a case table, you can provide @code{nil} for |
| 1088 | @var{canonicalize}; then Emacs fills in this slot from the lower case | 1077 | @var{canonicalize}; then Emacs fills in this slot from the lower case |
| 1089 | and upper case mappings. You can also provide @code{nil} for | 1078 | and upper case mappings. You can also provide @code{nil} for |
| 1090 | @var{equivalences}; then Emacs fills in this slot from | 1079 | @var{equivalences}; then Emacs fills in this slot from |
| 1091 | @var{canonicalize}. In a case table that is actually in use, those | 1080 | @var{canonicalize}. In a case table that is actually in use, those |
| 1092 | components are non-@code{nil}. Do not try to specify @var{equivalences} | 1081 | components are non-@code{nil}. Do not try to specify |
| 1093 | without also specifying @var{canonicalize}. | 1082 | @var{equivalences} without also specifying @var{canonicalize}. |
| 1094 | 1083 | ||
| 1095 | Here are the functions for working with case tables: | 1084 | Here are the functions for working with case tables: |
| 1096 | 1085 | ||
| @@ -1125,7 +1114,7 @@ of an abnormal exit via @code{throw} or error (@pxref{Nonlocal | |||
| 1125 | Exits}). | 1114 | Exits}). |
| 1126 | @end defmac | 1115 | @end defmac |
| 1127 | 1116 | ||
| 1128 | Some language environments may modify the case conversions of | 1117 | Some language environments modify the case conversions of |
| 1129 | @acronym{ASCII} characters; for example, in the Turkish language | 1118 | @acronym{ASCII} characters; for example, in the Turkish language |
| 1130 | environment, the @acronym{ASCII} character @samp{I} is downcased into | 1119 | environment, the @acronym{ASCII} character @samp{I} is downcased into |
| 1131 | a Turkish ``dotless i''. This can interfere with code that requires | 1120 | a Turkish ``dotless i''. This can interfere with code that requires |