| author | Glenn Morris | 2012-03-28 00:57:42 -0700 |
|---|---|---|
| committer | Glenn Morris | 2012-03-28 00:57:42 -0700 |
| commit | d14daa28e401f6079d9a656a942e4db01112d69f (patch) | |
| tree | fa22ca7c22c81ffc6ae3d02e0190c1f636499caf | |
| parent | 425df10c7bab7333905424e2012b1af7c7496026 (diff) | |
lispref/searching.texi small edits
* doc/lispref/searching.texi (Regular Expressions, Regexp Special):
(Regexp Backslash, Regexp Example): Copyedits.
(Regexp Special): Mention collation.
Clarify char classes with an example.
| -rw-r--r-- | doc/lispref/ChangeLog | 7 |
| -rw-r--r-- | doc/lispref/searching.texi | 48 |
2 files changed, 33 insertions, 22 deletions
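
The character-alternative behavior that this commit documents can be checked with a few forms. This is a minimal sketch rather than part of the commit: it assumes an Emacs recent enough to provide `string-match-p`, and the sample strings are made up for illustration.

```elisp
;; Sketch only; evaluate in *scratch* or with M-: (not part of the commit).
;; `string-match-p' returns the position of the match, or nil if none.

;; A range such as [a-z] also matches upper-case letters when
;; `case-fold-search' is non-nil.
(let ((case-fold-search t))
  (string-match-p "[a-z]" "ABC"))          ; => 0
(let ((case-fold-search nil))
  (string-match-p "[a-z]" "ABC"))          ; => nil

;; "]" placed first and "-" placed last are literal members of the set.
(string-match-p "\\`[]-]+\\'" "]-]")       ; => 0

;; A named class is written inside a character alternative.
(string-match-p "[[:ascii:]]" "e")         ; => 0
(string-match-p "[[:ascii:]]" "é")         ; => nil
```

The `let` bindings make the first two results independent of the current buffer's `case-fold-search` setting.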
diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog
index 494e3416d80..ca3b61d897e 100644
--- a/doc/lispref/ChangeLog
+++ b/doc/lispref/ChangeLog
| @@ -1,3 +1,10 @@ | |||
| 1 | 2012-03-28 Glenn Morris <rgm@gnu.org> | ||
| 2 | |||
| 3 | * searching.texi (Regular Expressions, Regexp Special): | ||
| 4 | (Regexp Backslash, Regexp Example): Copyedits. | ||
| 5 | (Regexp Special): Mention collation. | ||
| 6 | Clarify char classes with an example. | ||
| 7 | |||
| 1 | 2012-03-27 Martin Rudalics <rudalics@gmx.at> | 8 | 2012-03-27 Martin Rudalics <rudalics@gmx.at> |
| 2 | 9 | ||
| 3 | * windows.texi (Window History): Describe new option | 10 | * windows.texi (Window History): Describe new option |
diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 9a508d37340..16eea349d7f 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
| @@ -241,7 +241,7 @@ regexps; the following section says how to search for them. | |||
| 241 | 241 | ||
| 242 | @findex re-builder | 242 | @findex re-builder |
| 243 | @cindex regular expressions, developing | 243 | @cindex regular expressions, developing |
| 244 | For convenient interactive development of regular expressions, you | 244 | For interactive development of regular expressions, you |
| 245 | can use the @kbd{M-x re-builder} command. It provides a convenient | 245 | can use the @kbd{M-x re-builder} command. It provides a convenient |
| 246 | interface for creating regular expressions, by giving immediate visual | 246 | interface for creating regular expressions, by giving immediate visual |
| 247 | feedback in a separate buffer. As you edit the regexp, all its | 247 | feedback in a separate buffer. As you edit the regexp, all its |
| @@ -318,6 +318,7 @@ possible. Thus, @samp{o*} matches any number of @samp{o}s (including no | |||
| 318 | expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating | 318 | expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating |
| 319 | @samp{fo}. It matches @samp{f}, @samp{fo}, @samp{foo}, and so on. | 319 | @samp{fo}. It matches @samp{f}, @samp{fo}, @samp{foo}, and so on. |
| 320 | 320 | ||
| 321 | @cindex backtracking and regular expressions | ||
| 321 | The matcher processes a @samp{*} construct by matching, immediately, as | 322 | The matcher processes a @samp{*} construct by matching, immediately, as |
| 322 | many repetitions as can be found. Then it continues with the rest of | 323 | many repetitions as can be found. Then it continues with the rest of |
| 323 | the pattern. If that fails, backtracking occurs, discarding some of the | 324 | the pattern. If that fails, backtracking occurs, discarding some of the |
| @@ -387,7 +388,12 @@ Ranges may be intermixed freely with individual characters, as in | |||
| 387 | @samp{[a-z$%.]}, which matches any lower case @acronym{ASCII} letter | 388 | @samp{[a-z$%.]}, which matches any lower case @acronym{ASCII} letter |
| 388 | or @samp{$}, @samp{%} or period. | 389 | or @samp{$}, @samp{%} or period. |
| 389 | 390 | ||
| 390 | Note that the usual regexp special characters are not special inside a | 391 | If @code{case-fold-search} is non-@code{nil}, @samp{[a-z]} also |
| 392 | matches upper-case letters. Note that a range like @samp{[a-z]} is | ||
| 393 | not affected by the locale's collation sequence, it always represents | ||
| 394 | a sequence in @acronym{ASCII} order. | ||
| 395 | |||
| 396 | Note also that the usual regexp special characters are not special inside a | ||
| 391 | character alternative. A completely different set of characters is | 397 | character alternative. A completely different set of characters is |
| 392 | special inside character alternatives: @samp{]}, @samp{-} and @samp{^}. | 398 | special inside character alternatives: @samp{]}, @samp{-} and @samp{^}. |
| 393 | 399 | ||
| @@ -395,23 +401,27 @@ To include a @samp{]} in a character alternative, you must make it the | |||
| 395 | first character. For example, @samp{[]a]} matches @samp{]} or @samp{a}. | 401 | first character. For example, @samp{[]a]} matches @samp{]} or @samp{a}. |
| 396 | To include a @samp{-}, write @samp{-} as the first or last character of | 402 | To include a @samp{-}, write @samp{-} as the first or last character of |
| 397 | the character alternative, or put it after a range. Thus, @samp{[]-]} | 403 | the character alternative, or put it after a range. Thus, @samp{[]-]} |
| 398 | matches both @samp{]} and @samp{-}. | 404 | matches both @samp{]} and @samp{-}. (As explained below, you cannot |
| 405 | use @samp{\]} to include a @samp{]} inside a character alternative, | ||
| 406 | since @samp{\} is not special there.) | ||
| 399 | 407 | ||
| 400 | To include @samp{^} in a character alternative, put it anywhere but at | 408 | To include @samp{^} in a character alternative, put it anywhere but at |
| 401 | the beginning. | 409 | the beginning. |
| 402 | 410 | ||
| 411 | @c What if it starts with a multibyte and ends with a unibyte? | ||
| 412 | @c That doesn't seem to match anything...? | ||
| 403 | If a range starts with a unibyte character @var{c} and ends with a | 413 | If a range starts with a unibyte character @var{c} and ends with a |
| 404 | multibyte character @var{c2}, the range is divided into two parts: one | 414 | multibyte character @var{c2}, the range is divided into two parts: one |
| 405 | is @samp{@var{c}..?\377}, the other is @samp{@var{c1}..@var{c2}}, where | 415 | spans the unibyte characters @samp{@var{c}..?\377}, the other the |
| 406 | @var{c1} is the first character of the charset to which @var{c2} | 416 | multibyte characters @samp{@var{c1}..@var{c2}}, where @var{c1} is the |
| 407 | belongs. | 417 | first character of the charset to which @var{c2} belongs. |
| 408 | 418 | ||
| 409 | A character alternative can also specify named character classes | 419 | A character alternative can also specify named character classes |
| 410 | (@pxref{Char Classes}). This is a POSIX feature whose syntax is | 420 | (@pxref{Char Classes}). This is a POSIX feature. For example, |
| 411 | @samp{[:@var{class}:]}. Using a character class is equivalent to | 421 | @samp{[[:ascii:]]} matches any @acronym{ASCII} character. |
| 412 | mentioning each of the characters in that class; but the latter is not | 422 | Using a character class is equivalent to mentioning each of the |
| 413 | feasible in practice, since some classes include thousands of | 423 | characters in that class; but the latter is not feasible in practice, |
| 414 | different characters. | 424 | since some classes include thousands of different characters. |
| 415 | 425 | ||
| 416 | @item @samp{[^ @dots{} ]} | 426 | @item @samp{[^ @dots{} ]} |
| 417 | @cindex @samp{^} in regexp | 427 | @cindex @samp{^} in regexp |
| @@ -812,7 +822,7 @@ with a symbol-constituent character. | |||
| 812 | 822 | ||
| 813 | @kindex invalid-regexp | 823 | @kindex invalid-regexp |
| 814 | Not every string is a valid regular expression. For example, a string | 824 | Not every string is a valid regular expression. For example, a string |
| 815 | that ends inside a character alternative without terminating @samp{]} | 825 | that ends inside a character alternative without a terminating @samp{]} |
| 816 | is invalid, and so is a string that ends with a single @samp{\}. If | 826 | is invalid, and so is a string that ends with a single @samp{\}. If |
| 817 | an invalid regular expression is passed to any of the search functions, | 827 | an invalid regular expression is passed to any of the search functions, |
| 818 | an @code{invalid-regexp} error is signaled. | 828 | an @code{invalid-regexp} error is signaled. |
| @@ -827,20 +837,14 @@ follows. (Nowadays Emacs uses a similar but more complex default | |||
| 827 | regexp constructed by the function @code{sentence-end}. | 837 | regexp constructed by the function @code{sentence-end}. |
| 828 | @xref{Standard Regexps}.) | 838 | @xref{Standard Regexps}.) |
| 829 | 839 | ||
| 830 | First, we show the regexp as a string in Lisp syntax to distinguish | 840 | Below, we show first the regexp as a string in Lisp syntax (to |
| 831 | spaces from tab characters. The string constant begins and ends with a | 841 | distinguish spaces from tab characters), and then the result of |
| 842 | evaluating it. The string constant begins and ends with a | ||
| 832 | double-quote. @samp{\"} stands for a double-quote as part of the | 843 | double-quote. @samp{\"} stands for a double-quote as part of the |
| 833 | string, @samp{\\} for a backslash as part of the string, @samp{\t} for a | 844 | string, @samp{\\} for a backslash as part of the string, @samp{\t} for a |
| 834 | tab and @samp{\n} for a newline. | 845 | tab and @samp{\n} for a newline. |
| 835 | 846 | ||
| 836 | @example | 847 | @example |
| 837 | "[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*" | ||
| 838 | @end example | ||
| 839 | |||
| 840 | @noindent | ||
| 841 | In contrast, if you evaluate this string, you will see the following: | ||
| 842 | |||
| 843 | @example | ||
| 844 | @group | 848 | @group |
| 845 | "[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*" | 849 | "[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*" |
| 846 | @result{} "[.?!][]\"')@}]*\\($\\| $\\| \\|@ @ \\)[ | 850 | @result{} "[.?!][]\"')@}]*\\($\\| $\\| \\|@ @ \\)[ |
| @@ -849,7 +853,7 @@ In contrast, if you evaluate this string, you will see the following: | |||
| 849 | @end example | 853 | @end example |
| 850 | 854 | ||
| 851 | @noindent | 855 | @noindent |
| 852 | In this output, tab and newline appear as themselves. | 856 | In the output, tab and newline appear as themselves. |
| 853 | 857 | ||
| 854 | This regular expression contains four parts in succession and can be | 858 | This regular expression contains four parts in succession and can be |
| 855 | deciphered as follows: | 859 | deciphered as follows: |