author Chong Yidong 2010-06-02 13:26:31 -0400
committer Chong Yidong 2010-06-02 13:26:31 -0400
commit ba3bf1d9519b4f5aafba9a87109ef1e7a29f7fcc
tree c5fd5f0eceba7b0d82e511d0cac4823458d86047
parent 2c3a3c1d035e06250db54fb17ee7aec6b7c2c70a
Better doc fix for Bug#6283.
searching.texi (Regexp Special): Remove obsolete information about matching non-ASCII characters, and suggest using char classes (Bug#6283).
-rw-r--r-- doc/lispref/ChangeLog 5
-rw-r--r-- doc/lispref/searching.texi 26
2 files changed, 13 insertions, 18 deletions
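
The pitfall described by the paragraph this commit removes can be sketched outside Emacs. The snippet below is only an illustration under stated assumptions: Python's `re` module stands in for Emacs regexps, it has no `[:ascii:]` class, so the complemented range `[^\x00-\x7f]` plays the role of `[^[:ascii:]]`, and the names `BYTE_RANGE`, `NON_ASCII`, and `find_all` are hypothetical helpers, not part of the commit.

```python
import re

# Assumption: Python's `re` is a stand-in for Emacs regexps. It does not
# support POSIX classes such as [:ascii:], so a complemented ASCII range
# is used in place of "[^[:ascii:]]".
BYTE_RANGE = re.compile(r"[\x80-\xff]")   # analogous to "[\200-\377]"
NON_ASCII = re.compile(r"[^\x00-\x7f]")   # analogous to "[^[:ascii:]]"


def find_all(pattern, text):
    """Return every single character in `text` matched by `pattern`."""
    return pattern.findall(text)


# 'a' is ASCII, U+00E9 lies inside the byte range, U+4E2D lies above 0xFF.
sample = "a\u00e9\u4e2d"

byte_range_hits = find_all(BYTE_RANGE, sample)   # misses U+4E2D
non_ascii_hits = find_all(NON_ASCII, sample)     # catches both non-ASCII chars
```

The fixed upper bound is what makes the byte range incomplete for multibyte text, which is why the revised documentation points to a class-based pattern instead of an explicit range.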
diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog
index b871c442804..281f3e9ad7c 100644
--- a/doc/lispref/ChangeLog
+++ b/doc/lispref/ChangeLog
@@ -1,7 +1,8 @@
 2010-06-02 Chong Yidong <cyd@stupidchicken.com>
 
-* searching.texi (Regexp Special): Replace "octal 377"
-with "#o377" (Bug#6283).
+* searching.texi (Regexp Special): Remove obsolete information
+about matching non-ASCII characters, and suggest using char
+classes (Bug#6283).
 
 2010-05-30 Juanma Barranquero <lekktu@gmail.com>
 
diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index d1e8c549679..722f76cdd7f 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -362,7 +362,7 @@ the two brackets are what this character alternative can match.
 
 Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and
 @samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s
-(including the empty string), from which it follows that @samp{c[ad]*r}
+(including the empty string). It follows that @samp{c[ad]*r}
 matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc.
 
 You can also include character ranges in a character alternative, by
@@ -400,21 +400,11 @@ is @samp{@var{c}..?\377}, the other is @samp{@var{c1}..@var{c2}}, where
 @var{c1} is the first character of the charset to which @var{c2}
 belongs.
 
-You cannot always match all non-@acronym{ASCII} characters with the
-regular expression @code{"[\200-\377]"}. This works when searching a
-unibyte buffer or string (@pxref{Text Representations}), but not in a
-multibyte buffer or string, because many non-@acronym{ASCII}
-characters have codes above @code{#o377}. However, the regular
-expression @code{"[^\000-\177]"} does match all non-@acronym{ASCII}
-characters (see below regarding @samp{^}), in both multibyte and
-unibyte representations, because only the @acronym{ASCII} characters
-are excluded.
-
-A character alternative can also specify named
-character classes (@pxref{Char Classes}). This is a POSIX feature whose
-syntax is @samp{[:@var{class}:]}. Using a character class is equivalent
-to mentioning each of the characters in that class; but the latter is
-not feasible in practice, since some classes include thousands of
+A character alternative can also specify named character classes
+(@pxref{Char Classes}). This is a POSIX feature whose syntax is
+@samp{[:@var{class}:]}. Using a character class is equivalent to
+mentioning each of the characters in that class; but the latter is not
+feasible in practice, since some classes include thousands of
 different characters.
 
 @item @samp{[^ @dots{} ]}
@@ -432,6 +422,10 @@ A complemented character alternative can match a newline, unless newline is
 mentioned as one of the characters not to match. This is in contrast to
 the handling of regexps in programs such as @code{grep}.
 
+You can specify named character classes, just like in character
+alternatives. For instance, @samp{[^[:ascii:]]} matches any
+non-@acronym{ASCII} character. @xref{Char Classes}.
+
 @item @samp{^}
 @cindex beginning of line in regexp
 When matching a buffer, @samp{^} matches the empty string, but only at the