| author | Chong Yidong | 2011-01-28 13:03:30 -0500 |
|---|---|---|
| committer | Chong Yidong | 2011-01-28 13:03:30 -0500 |
| commit | 65401ee3fefe38cb3a8a350a17f8b0a3a4ccb579 | |
| tree | 8a98ca7c39fa21497d2052428deb0d3fbb634aa8 /doc | |
| parent | 7427eb9754e8d22568b99621b5e8117dc2bde802 | |
* search.texi (Regexps): Copyedits. Mention character classes (Bug#7809).
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/emacs/ChangeLog | 3 |
| -rw-r--r-- | doc/emacs/search.texi | 113 |
2 files changed, 60 insertions, 56 deletions
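The regexp operators whose documentation this commit copyedits can be exercised with a short script. The sketch below is an illustration only. It uses Python's `re` module rather than Emacs (in Emacs Lisp the faithful tool would be `string-match`), on the assumption that the two engines agree for these simple cases. Two differences are worth knowing. Python writes intervals as `{n,m}` where Emacs writes `\{n,m\}`. Python's `re` also does not support POSIX classes such as `[[:alnum:]]`, so the character-class example the commit adds is not reproduced here. The helper `matches` is hypothetical.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if `pattern` matches the whole of `text`.

    re.fullmatch mirrors the manual's phrasing "matches the string X".
    """
    return re.fullmatch(pattern, text) is not None


# '.' matches any single character except a newline.
assert matches(r"a.b", "axb")
assert not matches(r"a.b", "a\nb")

# '*' applies to the smallest preceding expression: fo* repeats only the o.
assert matches(r"fo*", "f") and matches(r"fo*", "fooo")
assert not matches(r"fo*", "fofo")

# '+' requires at least one occurrence; '?' allows zero or one.
assert matches(r"ca+r", "car") and matches(r"ca+r", "caaaar")
assert not matches(r"ca+r", "cr") and matches(r"ca*r", "cr")
assert matches(r"ca?r", "car") and matches(r"ca?r", "cr")
assert not matches(r"ca?r", "caar")

# Greedy vs non-greedy: re.match anchors only at the start, so the
# matched text shows how much each operator consumed.
greedy = re.match(r"ab*", "abbb").group()   # longest valid match
lazy = re.match(r"ab*?", "abbb").group()    # shortest valid match

# Intervals: Emacs writes x\{4\}; Python writes x{4}.
assert matches(r"x{4}", "xxxx") and not matches(r"x{4}", "xxx")

# Character sets and ranges.
for word in ["cr", "car", "cdr", "caddaar"]:
    assert matches(r"c[ad]*r", word)
assert matches(r"[a-z$%.]", "%")

print(greedy, lazy)
```

Running it prints the greedy and non-greedy matches side by side. Comparing the two shows the trailing `?` switching `*` from the longest to the shortest valid match.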
diff --git a/doc/emacs/ChangeLog b/doc/emacs/ChangeLog
index 4d4d38c2c5c..f3c6afc3fa5 100644
--- a/doc/emacs/ChangeLog
+++ b/doc/emacs/ChangeLog
| @@ -1,5 +1,8 @@ | |||
| 1 | 2011-01-28 Chong Yidong <cyd@stupidchicken.com> | 1 | 2011-01-28 Chong Yidong <cyd@stupidchicken.com> |
| 2 | 2 | ||
| 3 | * search.texi (Regexps): Copyedits. Mention character classes | ||
| 4 | (Bug#7809). | ||
| 5 | |||
| 3 | * files.texi (File Aliases): Restore explanatory text from Eli | 6 | * files.texi (File Aliases): Restore explanatory text from Eli |
| 4 | Zaretskii, accidentally removed in 2011-01-08 commit. | 7 | Zaretskii, accidentally removed in 2011-01-08 commit. |
| 5 | 8 | ||
diff --git a/doc/emacs/search.texi b/doc/emacs/search.texi
index cd63a562d66..e2ecb4a2385 100644
--- a/doc/emacs/search.texi
+++ b/doc/emacs/search.texi
| @@ -546,21 +546,20 @@ Search}. | |||
| 546 | @cindex syntax of regexps | 546 | @cindex syntax of regexps |
| 547 | 547 | ||
| 548 | This manual describes regular expression features that users | 548 | This manual describes regular expression features that users |
| 549 | typically want to use. There are additional features that are | 549 | typically use. @xref{Regular Expressions,,, elisp, The Emacs Lisp |
| 550 | mainly used in Lisp programs; see @ref{Regular Expressions,,, | 550 | Reference Manual}, for additional features used mainly in Lisp |
| 551 | elisp, The Emacs Lisp Reference Manual}. | 551 | programs. |
| 552 | 552 | ||
| 553 | Regular expressions have a syntax in which a few characters are | 553 | Regular expressions have a syntax in which a few characters are |
| 554 | special constructs and the rest are @dfn{ordinary}. An ordinary | 554 | special constructs and the rest are @dfn{ordinary}. An ordinary |
| 555 | character is a simple regular expression which matches that same | 555 | character matches that same character and nothing else. The special |
| 556 | character and nothing else. The special characters are @samp{$}, | 556 | characters are @samp{$^.*+?[\}. The character @samp{]} is special if |
| 557 | @samp{^}, @samp{.}, @samp{*}, @samp{+}, @samp{?}, @samp{[}, and | 557 | it ends a character alternative (see later). The character @samp{-} |
| 558 | @samp{\}. The character @samp{]} is special if it ends a character | 558 | is special inside a character alternative. Any other character |
| 559 | alternative (see later). The character @samp{-} is special inside a | 559 | appearing in a regular expression is ordinary, unless a @samp{\} |
| 560 | character alternative. Any other character appearing in a regular | 560 | precedes it. (When you use regular expressions in a Lisp program, |
| 561 | expression is ordinary, unless a @samp{\} precedes it. (When you use | 561 | each @samp{\} must be doubled, see the example near the end of this |
| 562 | regular expressions in a Lisp program, each @samp{\} must be doubled, | 562 | section.) |
| 563 | see the example near the end of this section.) | ||
| 564 | 563 | ||
| 565 | For example, @samp{f} is not a special character, so it is ordinary, and | 564 | For example, @samp{f} is not a special character, so it is ordinary, and |
| 566 | therefore @samp{f} is a regular expression that matches the string | 565 | therefore @samp{f} is a regular expression that matches the string |
| @@ -570,28 +569,27 @@ only @samp{o}. (When case distinctions are being ignored, these regexps | |||
| 570 | also match @samp{F} and @samp{O}, but we consider this a generalization | 569 | also match @samp{F} and @samp{O}, but we consider this a generalization |
| 571 | of ``the same string,'' rather than an exception.) | 570 | of ``the same string,'' rather than an exception.) |
| 572 | 571 | ||
| 573 | Any two regular expressions @var{a} and @var{b} can be concatenated. The | 572 | Any two regular expressions @var{a} and @var{b} can be concatenated. |
| 574 | result is a regular expression which matches a string if @var{a} matches | 573 | The result is a regular expression which matches a string if @var{a} |
| 575 | some amount of the beginning of that string and @var{b} matches the rest of | 574 | matches some amount of the beginning of that string and @var{b} |
| 576 | the string.@refill | 575 | matches the rest of the string. For example, concatenating the |
| 577 | 576 | regular expressions @samp{f} and @samp{o} gives the regular expression | |
| 578 | As a simple example, we can concatenate the regular expressions @samp{f} | 577 | @samp{fo}, which matches only the string @samp{fo}. Still trivial. |
| 579 | and @samp{o} to get the regular expression @samp{fo}, which matches only | 578 | To do something nontrivial, you need to use one of the special |
| 580 | the string @samp{fo}. Still trivial. To do something nontrivial, you | 579 | characters. Here is a list of them. |
| 581 | need to use one of the special characters. Here is a list of them. | ||
| 582 | 580 | ||
| 583 | @table @asis | 581 | @table @asis |
| 584 | @item @kbd{.}@: @r{(Period)} | 582 | @item @kbd{.}@: @r{(Period)} |
| 585 | is a special character that matches any single character except a newline. | 583 | is a special character that matches any single character except a |
| 586 | Using concatenation, we can make regular expressions like @samp{a.b}, which | 584 | newline. For example, the regular expressions @samp{a.b} matches any |
| 587 | matches any three-character string that begins with @samp{a} and ends with | 585 | three-character string that begins with @samp{a} and ends with |
| 588 | @samp{b}.@refill | 586 | @samp{b}. |
| 589 | 587 | ||
| 590 | @item @kbd{*} | 588 | @item @kbd{*} |
| 591 | is not a construct by itself; it is a postfix operator that means to | 589 | is not a construct by itself; it is a postfix operator that means to |
| 592 | match the preceding regular expression repetitively as many times as | 590 | match the preceding regular expression repetitively any number of |
| 593 | possible. Thus, @samp{o*} matches any number of @samp{o}s (including no | 591 | times, as many times as possible. Thus, @samp{o*} matches any number |
| 594 | @samp{o}s). | 592 | of @samp{o}s, including no @samp{o}s. |
| 595 | 593 | ||
| 596 | @samp{*} always applies to the @emph{smallest} possible preceding | 594 | @samp{*} always applies to the @emph{smallest} possible preceding |
| 597 | expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating | 595 | expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating |
| @@ -610,22 +608,21 @@ With this choice, the rest of the regexp matches successfully.@refill | |||
| 610 | 608 | ||
| 611 | @item @kbd{+} | 609 | @item @kbd{+} |
| 612 | is a postfix operator, similar to @samp{*} except that it must match | 610 | is a postfix operator, similar to @samp{*} except that it must match |
| 613 | the preceding expression at least once. So, for example, @samp{ca+r} | 611 | the preceding expression at least once. Thus, @samp{ca+r} matches the |
| 614 | matches the strings @samp{car} and @samp{caaaar} but not the string | 612 | strings @samp{car} and @samp{caaaar} but not the string @samp{cr}, |
| 615 | @samp{cr}, whereas @samp{ca*r} matches all three strings. | 613 | whereas @samp{ca*r} matches all three strings. |
| 616 | 614 | ||
| 617 | @item @kbd{?} | 615 | @item @kbd{?} |
| 618 | is a postfix operator, similar to @samp{*} except that it can match the | 616 | is a postfix operator, similar to @samp{*} except that it can match |
| 619 | preceding expression either once or not at all. For example, | 617 | the preceding expression either once or not at all. Thus, @samp{ca?r} |
| 620 | @samp{ca?r} matches @samp{car} or @samp{cr}; nothing else. | 618 | matches @samp{car} or @samp{cr}, and nothing else. |
| 621 | 619 | ||
| 622 | @item @kbd{*?}, @kbd{+?}, @kbd{??} | 620 | @item @kbd{*?}, @kbd{+?}, @kbd{??} |
| 623 | @cindex non-greedy regexp matching | 621 | @cindex non-greedy regexp matching |
| 624 | are non-greedy variants of the operators above. The normal operators | 622 | are non-@dfn{greedy} variants of the operators above. The normal |
| 625 | @samp{*}, @samp{+}, @samp{?} are @dfn{greedy} in that they match as | 623 | operators @samp{*}, @samp{+}, @samp{?} match as much as they can, as |
| 626 | much as they can, as long as the overall regexp can still match. With | 624 | long as the overall regexp can still match. With a following |
| 627 | a following @samp{?}, they are non-greedy: they will match as little | 625 | @samp{?}, they will match as little as possible. |
| 628 | as possible. | ||
| 629 | 626 | ||
| 630 | Thus, both @samp{ab*} and @samp{ab*?} can match the string @samp{a} | 627 | Thus, both @samp{ab*} and @samp{ab*?} can match the string @samp{a} |
| 631 | and the string @samp{abbbb}; but if you try to match them both against | 628 | and the string @samp{abbbb}; but if you try to match them both against |
| @@ -641,29 +638,30 @@ a newline, it matches the whole string. Since it @emph{can} match | |||
| 641 | starting at the first @samp{a}, it does. | 638 | starting at the first @samp{a}, it does. |
| 642 | 639 | ||
| 643 | @item @kbd{\@{@var{n}\@}} | 640 | @item @kbd{\@{@var{n}\@}} |
| 644 | is a postfix operator that specifies repetition @var{n} times---that | 641 | is a postfix operator specifying @var{n} repetitions---that is, the |
| 645 | is, the preceding regular expression must match exactly @var{n} times | 642 | preceding regular expression must match exactly @var{n} times in a |
| 646 | in a row. For example, @samp{x\@{4\@}} matches the string @samp{xxxx} | 643 | row. For example, @samp{x\@{4\@}} matches the string @samp{xxxx} and |
| 647 | and nothing else. | 644 | nothing else. |
| 648 | 645 | ||
| 649 | @item @kbd{\@{@var{n},@var{m}\@}} | 646 | @item @kbd{\@{@var{n},@var{m}\@}} |
| 650 | is a postfix operator that specifies repetition between @var{n} and | 647 | is a postfix operator specifying between @var{n} and @var{m} |
| 651 | @var{m} times---that is, the preceding regular expression must match | 648 | repetitions---that is, the preceding regular expression must match at |
| 652 | at least @var{n} times, but no more than @var{m} times. If @var{m} is | 649 | least @var{n} times, but no more than @var{m} times. If @var{m} is |
| 653 | omitted, then there is no upper limit, but the preceding regular | 650 | omitted, then there is no upper limit, but the preceding regular |
| 654 | expression must match at least @var{n} times.@* @samp{\@{0,1\@}} is | 651 | expression must match at least @var{n} times.@* @samp{\@{0,1\@}} is |
| 655 | equivalent to @samp{?}. @* @samp{\@{0,\@}} is equivalent to | 652 | equivalent to @samp{?}. @* @samp{\@{0,\@}} is equivalent to |
| 656 | @samp{*}. @* @samp{\@{1,\@}} is equivalent to @samp{+}. | 653 | @samp{*}. @* @samp{\@{1,\@}} is equivalent to @samp{+}. |
| 657 | 654 | ||
| 658 | @item @kbd{[ @dots{} ]} | 655 | @item @kbd{[ @dots{} ]} |
| 659 | is a @dfn{character set}, which begins with @samp{[} and is terminated | 656 | is a @dfn{character set}, beginning with @samp{[} and terminated by |
| 660 | by @samp{]}. In the simplest case, the characters between the two | 657 | @samp{]}. |
| 661 | brackets are what this set can match. | ||
| 662 | 658 | ||
| 663 | Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and | 659 | In the simplest case, the characters between the two brackets are what |
| 664 | @samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s | 660 | this set can match. Thus, @samp{[ad]} matches either one @samp{a} or |
| 665 | (including the empty string), from which it follows that @samp{c[ad]*r} | 661 | one @samp{d}, and @samp{[ad]*} matches any string composed of just |
| 666 | matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc. | 662 | @samp{a}s and @samp{d}s (including the empty string). It follows that |
| 663 | @samp{c[ad]*r} matches @samp{cr}, @samp{car}, @samp{cdr}, | ||
| 664 | @samp{caddaar}, etc. | ||
| 667 | 665 | ||
| 668 | You can also include character ranges in a character set, by writing the | 666 | You can also include character ranges in a character set, by writing the |
| 669 | starting and ending characters with a @samp{-} between them. Thus, | 667 | starting and ending characters with a @samp{-} between them. Thus, |
| @@ -672,9 +670,12 @@ intermixed freely with individual characters, as in @samp{[a-z$%.]}, | |||
| 672 | which matches any lower-case @acronym{ASCII} letter or @samp{$}, @samp{%} or | 670 | which matches any lower-case @acronym{ASCII} letter or @samp{$}, @samp{%} or |
| 673 | period. | 671 | period. |
| 674 | 672 | ||
| 675 | Note that the usual regexp special characters are not special inside a | 673 | You can also include certain special @dfn{character classes} in a |
| 676 | character set. A completely different set of special characters exists | 674 | character set. A @samp{[:} and balancing @samp{:]} enclose a |
| 677 | inside character sets: @samp{]}, @samp{-} and @samp{^}. | 675 | character class inside a character alternative. For instance, |
| 676 | @samp{[[:alnum:]]} matches any letter or digit. @xref{Char Classes,,, | ||
| 677 | elisp, The Emacs Lisp Reference Manual}, for a list of character | ||
| 678 | classes. | ||
| 678 | 679 | ||
| 679 | To include a @samp{]} in a character set, you must make it the first | 680 | To include a @samp{]} in a character set, you must make it the first |
| 680 | character. For example, @samp{[]a]} matches @samp{]} or @samp{a}. To | 681 | character. For example, @samp{[]a]} matches @samp{]} or @samp{a}. To |