| author | Chong Yidong | 2010-06-02 13:26:31 -0400 |
|---|---|---|
| committer | Chong Yidong | 2010-06-02 13:26:31 -0400 |
| commit | ba3bf1d9519b4f5aafba9a87109ef1e7a29f7fcc | |
| tree | c5fd5f0eceba7b0d82e511d0cac4823458d86047 | |
| parent | 2c3a3c1d035e06250db54fb17ee7aec6b7c2c70a | |
Better doc fix for Bug#6283.
searching.texi (Regexp Special): Remove obsolete information
about matching non-ASCII characters, and suggest using char
classes (Bug#6283).
| -rw-r--r-- | doc/lispref/ChangeLog | 5 | ||||
| -rw-r--r-- | doc/lispref/searching.texi | 26 |
2 files changed, 13 insertions, 18 deletions
diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog
index b871c442804..281f3e9ad7c 100644
--- a/doc/lispref/ChangeLog
+++ b/doc/lispref/ChangeLog
| @@ -1,7 +1,8 @@ | |||
| 1 | 2010-06-02 Chong Yidong <cyd@stupidchicken.com> | 1 | 2010-06-02 Chong Yidong <cyd@stupidchicken.com> |
| 2 | 2 | ||
| 3 | * searching.texi (Regexp Special): Replace "octal 377" | 3 | * searching.texi (Regexp Special): Remove obsolete information |
| 4 | with "#o377" (Bug#6283). | 4 | about matching non-ASCII characters, and suggest using char |
| 5 | classes (Bug#6283). | ||
| 5 | 6 | ||
| 6 | 2010-05-30 Juanma Barranquero <lekktu@gmail.com> | 7 | 2010-05-30 Juanma Barranquero <lekktu@gmail.com> |
| 7 | 8 | ||
diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index d1e8c549679..722f76cdd7f 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
| @@ -362,7 +362,7 @@ the two brackets are what this character alternative can match. | |||
| 362 | 362 | ||
| 363 | Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and | 363 | Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and |
| 364 | @samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s | 364 | @samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s |
| 365 | (including the empty string), from which it follows that @samp{c[ad]*r} | 365 | (including the empty string). It follows that @samp{c[ad]*r} |
| 366 | matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc. | 366 | matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc. |
| 367 | 367 | ||
| 368 | You can also include character ranges in a character alternative, by | 368 | You can also include character ranges in a character alternative, by |
| @@ -400,21 +400,11 @@ is @samp{@var{c}..?\377}, the other is @samp{@var{c1}..@var{c2}}, where | |||
| 400 | @var{c1} is the first character of the charset to which @var{c2} | 400 | @var{c1} is the first character of the charset to which @var{c2} |
| 401 | belongs. | 401 | belongs. |
| 402 | 402 | ||
| 403 | You cannot always match all non-@acronym{ASCII} characters with the | 403 | A character alternative can also specify named character classes |
| 404 | regular expression @code{"[\200-\377]"}. This works when searching a | 404 | (@pxref{Char Classes}). This is a POSIX feature whose syntax is |
| 405 | unibyte buffer or string (@pxref{Text Representations}), but not in a | 405 | @samp{[:@var{class}:]}. Using a character class is equivalent to |
| 406 | multibyte buffer or string, because many non-@acronym{ASCII} | 406 | mentioning each of the characters in that class; but the latter is not |
| 407 | characters have codes above @code{#o377}. However, the regular | 407 | feasible in practice, since some classes include thousands of |
| 408 | expression @code{"[^\000-\177]"} does match all non-@acronym{ASCII} | ||
| 409 | characters (see below regarding @samp{^}), in both multibyte and | ||
| 410 | unibyte representations, because only the @acronym{ASCII} characters | ||
| 411 | are excluded. | ||
| 412 | |||
| 413 | A character alternative can also specify named | ||
| 414 | character classes (@pxref{Char Classes}). This is a POSIX feature whose | ||
| 415 | syntax is @samp{[:@var{class}:]}. Using a character class is equivalent | ||
| 416 | to mentioning each of the characters in that class; but the latter is | ||
| 417 | not feasible in practice, since some classes include thousands of | ||
| 418 | different characters. | 408 | different characters. |
| 419 | 409 | ||
| 420 | @item @samp{[^ @dots{} ]} | 410 | @item @samp{[^ @dots{} ]} |
| @@ -432,6 +422,10 @@ A complemented character alternative can match a newline, unless newline is | |||
| 432 | mentioned as one of the characters not to match. This is in contrast to | 422 | mentioned as one of the characters not to match. This is in contrast to |
| 433 | the handling of regexps in programs such as @code{grep}. | 423 | the handling of regexps in programs such as @code{grep}. |
| 434 | 424 | ||
| 425 | You can specify named character classes, just like in character | ||
| 426 | alternatives. For instance, @samp{[^[:ascii:]]} matches any | ||
| 427 | non-@acronym{ASCII} character. @xref{Char Classes}. | ||
| 428 | |||
| 435 | @item @samp{^} | 429 | @item @samp{^} |
| 436 | @cindex beginning of line in regexp | 430 | @cindex beginning of line in regexp |
| 437 | When matching a buffer, @samp{^} matches the empty string, but only at the | 431 | When matching a buffer, @samp{^} matches the empty string, but only at the |
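
The text added in the last hunk recommends the complemented character class `[^[:ascii:]]` for matching non-ASCII characters. A minimal Emacs Lisp sketch of that usage follows. The helper name `my-first-non-ascii` is hypothetical and is not part of Emacs or of this commit; only `string-match` and the `[:ascii:]` class come from Emacs itself.

```elisp
;; Sketch only: `my-first-non-ascii' is a hypothetical helper name,
;; introduced here purely for illustration.
(defun my-first-non-ascii (string)
  "Return the index of the first non-ASCII character in STRING, or nil."
  ;; `[^[:ascii:]]' is a complemented character alternative containing
  ;; the named class `[:ascii:]', so it matches any one character that
  ;; is not ASCII.
  (string-match "[^[:ascii:]]" string))

(my-first-non-ascii "plain text")   ; => nil (every character is ASCII)
(my-first-non-ascii "caf\u00e9")    ; => 3   (U+00E9 is at index 3)
```

Because the pattern excludes only the ASCII characters, it does not rely on any upper bound for character codes, which is the limitation of `[\200-\377]` that the removed paragraph described.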