author Luc Teirlinck 2006-03-07 23:28:33 +0000
committer Luc Teirlinck 2006-03-07 23:28:33 +0000
commit 179a6f216dc7a2f4dc7f490ea5c84953201d43d8 (patch)
tree 03e8a38c2a1f4e5a7926c63815f83898a635c96c
parent 7b2c2ca9de950309ecf11d23cd859a39821c0a20 (diff)
(Syntax of Regexps): More accurately describe which characters are
special in which situations.
(Regexp Special): Recommend _not_ to quote `]' or `-' when they are
not special. Describe in detail when `[' and `]' are special.
(Regexp Backslash): Plenty of regexps with unbalanced square brackets
are valid, so reword that statement.
-rw-r--r-- lispref/searching.texi 47
1 files changed, 39 insertions, 8 deletions
diff --git a/lispref/searching.texi b/lispref/searching.texi
index 7c10ed6881b..b45467fbf83 100644
--- a/lispref/searching.texi
+++ b/lispref/searching.texi
@@ -235,12 +235,15 @@ it easier to verify even very complex regexps.
 
  Regular expressions have a syntax in which a few characters are
 special constructs and the rest are @dfn{ordinary}. An ordinary
-character is a simple regular expression that matches that character and
-nothing else. The special characters are @samp{.}, @samp{*}, @samp{+},
-@samp{?}, @samp{[}, @samp{]}, @samp{^}, @samp{$}, and @samp{\}; no new
-special characters will be defined in the future. Any other character
-appearing in a regular expression is ordinary, unless a @samp{\}
-precedes it.
+character is a simple regular expression that matches that character
+and nothing else. The special characters are @samp{.}, @samp{*},
+@samp{+}, @samp{?}, @samp{[}, @samp{^}, @samp{$}, and @samp{\}; no new
+special characters will be defined in the future. The character
+@samp{]} is special if it ends a character alternative (see later).
+The character @samp{-} is special inside a character alternative. A
+@samp{[:} and balancing @samp{:]} enclose a character class inside a
+character alternative. Any other character appearing in a regular
+expression is ordinary, unless a @samp{\} precedes it.
 
  For example, @samp{f} is not a special character, so it is ordinary, and
 therefore @samp{f} is a regular expression that matches the string
@@ -468,6 +471,34 @@ ordinary since there is no preceding expression on which the @samp{*}
 can act. It is poor practice to depend on this behavior; quote the
 special character anyway, regardless of where it appears.@refill
 
+As a @samp{\} is not special inside a character alternative, it can
+never remove the special meaning of @samp{-} or @samp{]}. So you
+should not quote these characters when they have no special meaning
+either. This would not clarify anything, since backslashes can
+legitimately precede these characters where they @emph{have} special
+meaning, as in @code{[^\]} (@code{"[^\\]"} for Lisp string syntax),
+which matches any single character except a backslash.
+
+In practice, most @samp{]} that occur in regular expressions close a
+character alternative and hence are special. However, occasionally a
+regular expression may try to match a complex pattern of literal
+@samp{[} and @samp{]}. In such situations, it sometimes may be
+necessary to carefully parse the regexp from the start to determine
+which square brackets enclose a character alternative. For example,
+@code{[^][]]}, consists of the complemented character alternative
+@code{[^][]}, which matches any single character that is not a square
+bracket, followed by a literal @samp{]}.
+
+The exact rules are that at the beginning of a regexp, @samp{[} is
+special and @samp{]} not. This lasts until the first unquoted
+@samp{[}, after which we are in a character alternative; @samp{[} is
+no longer special (except when it starts a character class) but @samp{]}
+is special, unless it immediately follows the special @samp{[} or that
+@samp{[} followed by a @samp{^}. This lasts until the next special
+@samp{]} that does not end a character class. This ends the character
+alternative and restores the ordinary syntax of regular expressions;
+an unquoted @samp{[} is special again and a @samp{]} not.
+
 @node Char Classes
 @subsubsection Character Classes
 @cindex character classes in regexp
@@ -740,8 +771,8 @@ with a symbol-constituent character.
 
 @kindex invalid-regexp
  Not every string is a valid regular expression. For example, a string
-with unbalanced square brackets is invalid (with a few exceptions, such
-as @samp{[]]}), and so is a string that ends with a single @samp{\}. If
+that ends inside a character alternative without terminating @samp{]}
+is invalid, and so is a string that ends with a single @samp{\}. If
 an invalid regular expression is passed to any of the search functions,
 an @code{invalid-regexp} error is signaled.
 
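
The bracket rules this patch adds under (Regexp Special) can be traced by hand with a small scanner. The following is a minimal sketch in Python, not Emacs's regexp compiler: the function name `classify_brackets` and its role labels are hypothetical, it looks only at square brackets and backslashes, and it assumes a character class is `[:` plus an alphabetic name plus `:]`. It walks the regexp from the start, as the patch text recommends, and reports which brackets delimit a character alternative and which are literal.

```python
def classify_brackets(regexp):
    """Return (index, char, role) for every square bracket in regexp.

    Roles (hypothetical labels): "open"/"close" delimit a character
    alternative, "class-open"/"class-close" delimit a [:name:] class,
    and "literal" marks a bracket that only matches itself.
    Raises ValueError if the string ends inside a character alternative
    or with a single backslash.
    """
    roles = []
    i, n = 0, len(regexp)
    while i < n:
        c = regexp[i]
        if c == "\\":
            # Outside an alternative, a backslash quotes the next character.
            if i + 1 >= n:
                raise ValueError("regexp ends with a single backslash")
            if regexp[i + 1] in "[]":
                roles.append((i + 1, regexp[i + 1], "literal"))
            i += 2
        elif c == "]":
            # Outside an alternative, "]" is ordinary.
            roles.append((i, c, "literal"))
            i += 1
        elif c == "[":
            # An unquoted "[" starts a character alternative.
            roles.append((i, c, "open"))
            i += 1
            if i < n and regexp[i] == "^":
                i += 1
            # A "]" right after "[" or "[^" is a literal member.
            if i < n and regexp[i] == "]":
                roles.append((i, "]", "literal"))
                i += 1
            while True:
                if i >= n:
                    raise ValueError("unterminated character alternative")
                c = regexp[i]
                if c == "]":
                    # The next special "]" ends the alternative.
                    roles.append((i, c, "close"))
                    i += 1
                    break
                if c == "[":
                    end = regexp.find(":]", i + 2)
                    is_class = (regexp.startswith("[:", i) and end != -1
                                and regexp[i + 2:end].isalpha())
                    if is_class:
                        roles.append((i, "[", "class-open"))
                        roles.append((end + 1, "]", "class-close"))
                        i = end + 2
                        continue
                    roles.append((i, c, "literal"))
                # Any other character, including "\", is just a member here.
                i += 1
        else:
            i += 1
    return roles


if __name__ == "__main__":
    for index, char, role in classify_brackets("[^][]]"):
        print(index, char, role)
```

For the patch's own example `[^][]]`, the scanner marks position 0 as the opening bracket, positions 2 and 3 as literal members, position 4 as the closing bracket, and position 5 as a literal `]` outside the alternative. For `[^\]` the backslash is skipped as an ordinary member and the final `]` closes the alternative, and a string such as `[a` raises an error because it ends inside a character alternative.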