| -rw-r--r-- | doc/lispref/searching.texi | 52 |
1 files changed, 31 insertions, 21 deletions
diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 748ab586af9..72ee9233a3c 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
| @@ -398,17 +398,11 @@ range should not be the starting point of another one; for example, | |||
| 398 | The usual regexp special characters are not special inside a | 398 | The usual regexp special characters are not special inside a |
| 399 | character alternative. A completely different set of characters is | 399 | character alternative. A completely different set of characters is |
| 400 | special inside character alternatives: @samp{]}, @samp{-} and @samp{^}. | 400 | special inside character alternatives: @samp{]}, @samp{-} and @samp{^}. |
| 401 | 401 | To include @samp{]} in a character alternative, put it at the | |
| 402 | To include a @samp{]} in a character alternative, you must make it the first | 402 | beginning. To include @samp{^}, put it anywhere but at the beginning. |
| 403 | character. For example, @samp{[]a]} matches @samp{]} or @samp{a}. To include | 403 | To include @samp{-}, put it at the end. Thus, @samp{[]^-]} matches |
| 404 | a @samp{-}, write @samp{-} as the last character of the character alternative, | 404 | all three of these special characters. You cannot use @samp{\} to |
| 405 | tho you can also put it first or after a range. Thus, @samp{[]-]} matches both | 405 | escape these three characters, since @samp{\} is not special here. |
| 406 | @samp{]} and @samp{-}. (As explained below, you cannot use @samp{\]} to | ||
| 407 | include a @samp{]} inside a character alternative, since @samp{\} is not | ||
| 408 | special there.) | ||
| 409 | |||
| 410 | To include @samp{^} in a character alternative, put it anywhere but at | ||
| 411 | the beginning. | ||
| 412 | 406 | ||
| 413 | The following aspects of ranges are specific to Emacs, in that POSIX | 407 | The following aspects of ranges are specific to Emacs, in that POSIX |
| 414 | allows but does not require this behavior and programs other than | 408 | allows but does not require this behavior and programs other than |
| @@ -426,17 +420,33 @@ of its bounds, so that @samp{[a-z]} matches only ASCII letters, even | |||
| 426 | outside the C or POSIX locale. | 420 | outside the C or POSIX locale. |
| 427 | 421 | ||
| 428 | @item | 422 | @item |
| 429 | As a special case, if either bound of a range is a raw 8-bit byte, the | 423 | If the lower bound of a range is greater than its upper bound, the |
| 430 | other bound should be a unibyte character, and the range matches only | 424 | range is empty and represents no characters. Thus, @samp{[z-a]} |
| 431 | unibyte characters. | 425 | always fails to match, and @samp{[^z-a]} matches any character, |
| 426 | including newline. However, a reversed range should always be from | ||
| 427 | the letter @samp{z} to the letter @samp{a} to make it clear that it is | ||
| 428 | not a typo; for example, @samp{[+-*/]} should be avoided, because it | ||
| 429 | matches only @samp{/} rather than the likely-intended four characters. | ||
| 430 | @end enumerate | ||
| 431 | |||
| 432 | Some kinds of character alternatives are not the best style even | ||
| 433 | though they are standardized by POSIX and are portable. They include: | ||
| 432 | 434 | ||
| 435 | @enumerate | ||
| 433 | @item | 436 | @item |
| 434 | If the lower bound of a range is greater than its upper bound, the | 437 | A character alternative can include duplicates. For example, |
| 435 | range is empty and represents no characters. Thus, @samp{[b-a]} | 438 | @samp{[XYa-yYb-zX]} is less clear than @samp{[XYa-z]}. |
| 436 | always fails to match, and @samp{[^b-a]} matches any character, | 439 | |
| 437 | including newline. However, the lower bound should be at most one | 440 | @item |
| 438 | greater than the upper bound; for example, @samp{[c-a]} should be | 441 | A range can denote just one, two, or three characters. For example, |
| 439 | avoided. | 442 | @samp{[(-(]} is less clear than @samp{[(]}, @samp{[*-+]} is less clear |
| 443 | than @samp{[*+]}, and @samp{[*-,]} is less clear than @samp{[*+,]}. | ||
| 444 | |||
| 445 | @item | ||
| 446 | A @samp{-} can also appear at the beginning of a character alternative, or | ||
| 447 | as the upper bound of a range. For example, although @samp{[-a-z]} is | ||
| 448 | valid, @samp{[a-z-]} is better style; and although @samp{[!--/]} is | ||
| 449 | valid, @samp{[!-,/-]} is clearer. | ||
| 440 | @end enumerate | 450 | @end enumerate |
| 441 | 451 | ||
| 442 | A character alternative can also specify named character classes | 452 | A character alternative can also specify named character classes |
| @@ -452,7 +462,7 @@ of a range. | |||
| 452 | @cindex @samp{^} in regexp | 462 | @cindex @samp{^} in regexp |
| 453 | @samp{[^} begins a @dfn{complemented character alternative}. This | 463 | @samp{[^} begins a @dfn{complemented character alternative}. This |
| 454 | matches any character except the ones specified. Thus, | 464 | matches any character except the ones specified. Thus, |
| 455 | @samp{[^a-z0-9A-Z]} matches all characters @emph{except} letters and | 465 | @samp{[^a-z0-9A-Z]} matches all characters @emph{except} ASCII letters and |
| 456 | digits. | 466 | digits. |
| 457 | 467 | ||
| 458 | @samp{^} is not special in a character alternative unless it is the first | 468 | @samp{^} is not special in a character alternative unless it is the first |
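The rules in the new text can be checked against a small model. The following is a minimal Python sketch, not Emacs's implementation: `expand_alternative` and `ASCII` are hypothetical names introduced here, the function takes the text between the brackets, and it deliberately ignores named character classes such as `[:alpha:]`. It assumes only what the revised paragraphs state: `^` complements when first, `-` is literal unless it sits between two characters, and a reversed range is empty.

```python
# Toy model of the character-alternative rules described in the diff.
# Assumption: this mirrors the documented rules only; it is not Emacs code.
ASCII = [chr(c) for c in range(128)]


def expand_alternative(body, alphabet):
    """Return the set of characters in `alphabet` matched by [body]."""
    # A leading '^' complements the alternative.
    negate = body.startswith("^")
    if negate:
        body = body[1:]
    matched = set()
    i = 0
    while i < len(body):
        lo = body[i]
        # A '-' with a character on both sides forms the range lo-hi.
        if i + 2 < len(body) and body[i + 1] == "-":
            hi = body[i + 2]
            # range() is empty when lo > hi, so a reversed range adds nothing.
            matched.update(chr(c) for c in range(ord(lo), ord(hi) + 1))
            i += 3
        else:
            # Anything else, including ']' first or '-' last, is literal.
            matched.add(lo)
            i += 1
    universe = set(alphabet)
    return universe - matched if negate else matched & universe


print(sorted(expand_alternative("]^-", ASCII)))
print(sorted(expand_alternative("+-*/", ASCII)))
```

Under this model, `[!--/]` and `[!-,/-]` expand to the same set, which is why the new text can recommend the second spelling purely on style grounds.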