author Glenn Morris 2012-03-28 00:57:42 -0700
committer Glenn Morris 2012-03-28 00:57:42 -0700
commit d14daa28e401f6079d9a656a942e4db01112d69f
tree fa22ca7c22c81ffc6ae3d02e0190c1f636499caf
parent 425df10c7bab7333905424e2012b1af7c7496026
lispref/searching.tex small edits
* doc/lispref/searching.texi (Regular Expressions, Regexp Special):
(Regexp Backslash, Regexp Example): Copyedits.
(Regexp Special): Mention collation.
Clarify char classes with an example.
-rw-r--r-- doc/lispref/ChangeLog 7
-rw-r--r-- doc/lispref/searching.texi 48
2 files changed, 33 insertions, 22 deletions
diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog
index 494e3416d80..ca3b61d897e 100644
--- a/doc/lispref/ChangeLog
+++ b/doc/lispref/ChangeLog
@@ -1,3 +1,10 @@
+2012-03-28 Glenn Morris <rgm@gnu.org>
+
+ * searching.texi (Regular Expressions, Regexp Special):
+ (Regexp Backslash, Regexp Example): Copyedits.
+ (Regexp Special): Mention collation.
+ Clarify char classes with an example.
+
 2012-03-27 Martin Rudalics <rudalics@gmx.at>
 
  * windows.texi (Window History): Describe new option
diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 9a508d37340..16eea349d7f 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -241,7 +241,7 @@ regexps; the following section says how to search for them.
 
 @findex re-builder
 @cindex regular expressions, developing
- For convenient interactive development of regular expressions, you
+ For interactive development of regular expressions, you
 can use the @kbd{M-x re-builder} command. It provides a convenient
 interface for creating regular expressions, by giving immediate visual
 feedback in a separate buffer. As you edit the regexp, all its
@@ -318,6 +318,7 @@ possible. Thus, @samp{o*} matches any number of @samp{o}s (including no
 expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating
 @samp{fo}. It matches @samp{f}, @samp{fo}, @samp{foo}, and so on.
 
+@cindex backtracking and regular expressions
 The matcher processes a @samp{*} construct by matching, immediately, as
 many repetitions as can be found. Then it continues with the rest of
 the pattern. If that fails, backtracking occurs, discarding some of the
@@ -387,7 +388,12 @@ Ranges may be intermixed freely with individual characters, as in
 @samp{[a-z$%.]}, which matches any lower case @acronym{ASCII} letter
 or @samp{$}, @samp{%} or period.
 
-Note that the usual regexp special characters are not special inside a
+If @code{case-fold-search} is non-@code{nil}, @samp{[a-z]} also
+matches upper-case letters. Note that a range like @samp{[a-z]} is
+not affected by the locale's collation sequence, it always represents
+a sequence in @acronym{ASCII} order.
+
+Note also that the usual regexp special characters are not special inside a
 character alternative. A completely different set of characters is
 special inside character alternatives: @samp{]}, @samp{-} and @samp{^}.
 
@@ -395,23 +401,27 @@ To include a @samp{]} in a character alternative, you must make it the
 first character. For example, @samp{[]a]} matches @samp{]} or @samp{a}.
 To include a @samp{-}, write @samp{-} as the first or last character of
 the character alternative, or put it after a range. Thus, @samp{[]-]}
-matches both @samp{]} and @samp{-}.
+matches both @samp{]} and @samp{-}. (As explained below, you cannot
+use @samp{\]} to include a @samp{]} inside a character alternative,
+since @samp{\} is not special there.)
 
 To include @samp{^} in a character alternative, put it anywhere but at
 the beginning.
 
+@c What if it starts with a multibyte and ends with a unibyte?
+@c That doesn't seem to match anything...?
 If a range starts with a unibyte character @var{c} and ends with a
 multibyte character @var{c2}, the range is divided into two parts: one
-is @samp{@var{c}..?\377}, the other is @samp{@var{c1}..@var{c2}}, where
-@var{c1} is the first character of the charset to which @var{c2}
-belongs.
+spans the unibyte characters @samp{@var{c}..?\377}, the other the
+multibyte characters @samp{@var{c1}..@var{c2}}, where @var{c1} is the
+first character of the charset to which @var{c2} belongs.
 
 A character alternative can also specify named character classes
-(@pxref{Char Classes}). This is a POSIX feature whose syntax is
-@samp{[:@var{class}:]}. Using a character class is equivalent to
-mentioning each of the characters in that class; but the latter is not
-feasible in practice, since some classes include thousands of
-different characters.
+(@pxref{Char Classes}). This is a POSIX feature. For example,
+@samp{[[:ascii:]]} matches any @acronym{ASCII} character.
+Using a character class is equivalent to mentioning each of the
+characters in that class; but the latter is not feasible in practice,
+since some classes include thousands of different characters.
 
 @item @samp{[^ @dots{} ]}
 @cindex @samp{^} in regexp
@@ -812,7 +822,7 @@ with a symbol-constituent character.
 
 @kindex invalid-regexp
  Not every string is a valid regular expression. For example, a string
-that ends inside a character alternative without terminating @samp{]}
+that ends inside a character alternative without a terminating @samp{]}
 is invalid, and so is a string that ends with a single @samp{\}. If
 an invalid regular expression is passed to any of the search functions,
 an @code{invalid-regexp} error is signaled.
@@ -827,20 +837,14 @@ follows. (Nowadays Emacs uses a similar but more complex default
 regexp constructed by the function @code{sentence-end}.
 @xref{Standard Regexps}.)
 
- First, we show the regexp as a string in Lisp syntax to distinguish
-spaces from tab characters. The string constant begins and ends with a
+ Below, we show first the regexp as a string in Lisp syntax (to
+distinguish spaces from tab characters), and then the result of
+evaluating it. The string constant begins and ends with a
 double-quote. @samp{\"} stands for a double-quote as part of the
 string, @samp{\\} for a backslash as part of the string, @samp{\t} for a
 tab and @samp{\n} for a newline.
 
 @example
-"[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*"
-@end example
-
-@noindent
-In contrast, if you evaluate this string, you will see the following:
-
-@example
 @group
 "[.?!][]\"')@}]*\\($\\| $\\|\t\\|@ @ \\)[ \t\n]*"
  @result{} "[.?!][]\"')@}]*\\($\\| $\\| \\|@ @ \\)[
@@ -849,7 +853,7 @@ In contrast, if you evaluate this string, you will see the following:
 @end example
 
 @noindent
-In this output, tab and newline appear as themselves.
+In the output, tab and newline appear as themselves.
 
  This regular expression contains four parts in succession and can be
 deciphered as follows: