| author | Luc Teirlinck | 2006-03-07 23:28:33 +0000 |
|---|---|---|
| committer | Luc Teirlinck | 2006-03-07 23:28:33 +0000 |
| commit | 179a6f216dc7a2f4dc7f490ea5c84953201d43d8 | |
| tree | 03e8a38c2a1f4e5a7926c63815f83898a635c96c | |
| parent | 7b2c2ca9de950309ecf11d23cd859a39821c0a20 | |
(Syntax of Regexps): More accurately describe
which characters are special in which situations.
(Regexp Special): Recommend _not_ to quote `]' or `-' when they
are not special. Describe in detail when `[' and `]' are special.
(Regexp Backslash): Plenty of regexps with unbalanced square
brackets are valid, so reword that statement.
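Here is a minimal Python sketch of the quoting advice. `quote_literal` and `SPECIAL_CHARS` are hypothetical names, not Emacs functions. The helper builds a regexp that matches a string literally by backslash-quoting only the eight characters the new text lists as special (`.`, `*`, `+`, `?`, `[`, `^`, `$`, `\`). It leaves `]` and `-` alone because they are ordinary outside a character alternative. It quotes every listed character regardless of position, which is simpler than tracking context. The result is the regexp itself, not its Lisp string syntax. The sketch models only the syntax described here and is not Emacs's own `regexp-quote`.

```python
# Characters the manual lists as special outside a character alternative.
# ']' and '-' are deliberately absent: outside an alternative they are
# ordinary, and the manual recommends not quoting them.
SPECIAL_CHARS = frozenset(".*+?[^$\\")


def quote_literal(text):
    """Return an Emacs-style regexp (the regexp itself, not its Lisp
    string syntax) that matches TEXT literally, by putting a backslash
    before each special character.  Hypothetical helper, not an Emacs
    function."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)
```

For example, `quote_literal("a[1]-b")` yields the regexp `a\[1]-b`. The `[` must be quoted, but the `]` after it is already ordinary because no character alternative is open.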
| -rw-r--r-- | lispref/searching.texi | 47 |
1 files changed, 39 insertions, 8 deletions
diff --git a/lispref/searching.texi b/lispref/searching.texi
index 7c10ed6881b..b45467fbf83 100644
--- a/lispref/searching.texi
+++ b/lispref/searching.texi
| @@ -235,12 +235,15 @@ it easier to verify even very complex regexps. | |||
| 235 | 235 | ||
| 236 | Regular expressions have a syntax in which a few characters are | 236 | Regular expressions have a syntax in which a few characters are |
| 237 | special constructs and the rest are @dfn{ordinary}. An ordinary | 237 | special constructs and the rest are @dfn{ordinary}. An ordinary |
| 238 | character is a simple regular expression that matches that character and | 238 | character is a simple regular expression that matches that character |
| 239 | nothing else. The special characters are @samp{.}, @samp{*}, @samp{+}, | 239 | and nothing else. The special characters are @samp{.}, @samp{*}, |
| 240 | @samp{?}, @samp{[}, @samp{]}, @samp{^}, @samp{$}, and @samp{\}; no new | 240 | @samp{+}, @samp{?}, @samp{[}, @samp{^}, @samp{$}, and @samp{\}; no new |
| 241 | special characters will be defined in the future. Any other character | 241 | special characters will be defined in the future. The character |
| 242 | appearing in a regular expression is ordinary, unless a @samp{\} | 242 | @samp{]} is special if it ends a character alternative (see later). |
| 243 | precedes it. | 243 | The character @samp{-} is special inside a character alternative. A |
| 244 | @samp{[:} and balancing @samp{:]} enclose a character class inside a | ||
| 245 | character alternative. Any other character appearing in a regular | ||
| 246 | expression is ordinary, unless a @samp{\} precedes it. | ||
| 244 | 247 | ||
| 245 | For example, @samp{f} is not a special character, so it is ordinary, and | 248 | For example, @samp{f} is not a special character, so it is ordinary, and |
| 246 | therefore @samp{f} is a regular expression that matches the string | 249 | therefore @samp{f} is a regular expression that matches the string |
| @@ -468,6 +471,34 @@ ordinary since there is no preceding expression on which the @samp{*} | |||
| 468 | can act. It is poor practice to depend on this behavior; quote the | 471 | can act. It is poor practice to depend on this behavior; quote the |
| 469 | special character anyway, regardless of where it appears.@refill | 472 | special character anyway, regardless of where it appears.@refill |
| 470 | 473 | ||
| 474 | Since @samp{\} is not special inside a character alternative, it can | ||
| 475 | never remove the special meaning of @samp{-} or @samp{]}. So you | ||
| 476 | should not quote these characters when they have no special meaning | ||
| 477 | either. This would not clarify anything, since backslashes can | ||
| 478 | legitimately precede these characters where they @emph{have} special | ||
| 479 | meaning, as in @code{[^\]} (@code{"[^\\]"} for Lisp string syntax), | ||
| 480 | which matches any single character except a backslash. | ||
| 481 | |||
| 482 | In practice, most @samp{]} that occur in regular expressions close a | ||
| 483 | character alternative and hence are special. However, occasionally a | ||
| 484 | regular expression may try to match a complex pattern of literal | ||
| 485 | @samp{[} and @samp{]}. In such situations, it sometimes may be | ||
| 486 | necessary to carefully parse the regexp from the start to determine | ||
| 487 | which square brackets enclose a character alternative. For example, | ||
| 488 | @code{[^][]]} consists of the complemented character alternative | ||
| 489 | @code{[^][]}, which matches any single character that is not a square | ||
| 490 | bracket, followed by a literal @samp{]}. | ||
| 491 | |||
| 492 | The exact rules are that at the beginning of a regexp, @samp{[} is | ||
| 493 | special and @samp{]} not. This lasts until the first unquoted | ||
| 494 | @samp{[}, after which we are in a character alternative; @samp{[} is | ||
| 495 | no longer special (except when it starts a character class) but @samp{]} | ||
| 496 | is special, unless it immediately follows the special @samp{[} or that | ||
| 497 | @samp{[} followed by a @samp{^}. This lasts until the next special | ||
| 498 | @samp{]} that does not end a character class. This ends the character | ||
| 499 | alternative and restores the ordinary syntax of regular expressions; | ||
| 500 | an unquoted @samp{[} is special again and a @samp{]} not. | ||
| 501 | |||
| 471 | @node Char Classes | 502 | @node Char Classes |
| 472 | @subsubsection Character Classes | 503 | @subsubsection Character Classes |
| 473 | @cindex character classes in regexp | 504 | @cindex character classes in regexp |
| @@ -740,8 +771,8 @@ with a symbol-constituent character. | |||
| 740 | 771 | ||
| 741 | @kindex invalid-regexp | 772 | @kindex invalid-regexp |
| 742 | Not every string is a valid regular expression. For example, a string | 773 | Not every string is a valid regular expression. For example, a string |
| 743 | with unbalanced square brackets is invalid (with a few exceptions, such | 774 | that ends inside a character alternative without terminating @samp{]} |
| 744 | as @samp{[]]}), and so is a string that ends with a single @samp{\}. If | 775 | is invalid, and so is a string that ends with a single @samp{\}. If |
| 745 | an invalid regular expression is passed to any of the search functions, | 776 | an invalid regular expression is passed to any of the search functions, |
| 746 | an @code{invalid-regexp} error is signaled. | 777 | an @code{invalid-regexp} error is signaled. |
| 747 | 778 | ||
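The exact rules for `[` and `]`, along with the two invalid cases in the last hunk, can be modelled as a small scanner. The Python sketch below is illustrative only, and `find_alternatives` is a hypothetical name. It reports the index span of each character alternative. It treats `\` as quoting the next character only outside an alternative, and it skips `[:` ... `:]` character classes without checking the class name. For an unterminated alternative or a trailing single backslash it raises `ValueError`, which stands in for Emacs's `invalid-regexp` error. It does not model the rest of Emacs regexp syntax.

```python
def find_alternatives(regexp):
    """Return (start, end) pairs, one per character alternative in REGEXP,
    where start indexes the opening '[' and end the closing ']'.

    Hypothetical helper, not part of Emacs.  It follows the bracket rules
    described in the manual text and raises ValueError (a stand-in for
    Emacs's invalid-regexp error) for the two invalid cases it names.
    """
    spans = []
    i, n = 0, len(regexp)
    while i < n:
        c = regexp[i]
        if c == "\\":
            # Outside an alternative, '\' quotes the next character.
            if i + 1 == n:
                raise ValueError("regexp ends with a single backslash")
            i += 2
            continue
        if c != "[":
            # Any other character, including ']' and '-', is ordinary
            # for bracket tracking here.
            i += 1
            continue
        # An unquoted '[' opens a character alternative.
        start = i
        i += 1
        if i < n and regexp[i] == "^":
            i += 1  # complemented alternative
        if i < n and regexp[i] == "]":
            i += 1  # ']' right after '[' or '[^' is a member, not the end
        while True:
            if i >= n:
                raise ValueError(
                    "unterminated character alternative at index %d" % start)
            if regexp.startswith("[:", i):
                close = regexp.find(":]", i + 2)
                if close != -1:
                    # Skip the whole character class; its ']' does not
                    # end the alternative.  Class names are not checked.
                    i = close + 2
                    continue
            if regexp[i] == "]":
                break  # the special ']' that ends the alternative
            i += 1  # '[', '\', '-' and anything else continue the alternative
        spans.append((start, i))
        i += 1
    return spans
```

For instance, `find_alternatives("[^][]]")` returns `[(0, 4)]`. The alternative is `[^][]` and the last `]` is an ordinary character, as the new text describes. The Python literal `"[^\\]"` is the four-character regexp `[^\]`, just as in Lisp string syntax, and it yields `[(0, 3)]` because the backslash does not quote the `]`.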