author Richard M. Stallman 1997-05-19 06:29:13 +0000
committer Richard M. Stallman 1997-05-19 06:29:13 +0000
commit 1cd71ce02b2e1740e4b7cc2a7afdd2f8f526a7cd
tree 94b50ed50e8ac0070a1bd7f8bb45a31ed4edaaaa
parent d987e6cbf7c9217fbe3b599a0425fd2d227a77b7
Update regexp syntax from Emacs manual.
-rw-r--r-- lispref/searching.texi | 116
1 files changed, 62 insertions, 54 deletions
diff --git a/lispref/searching.texi b/lispref/searching.texi
index a9e45998926..80c10e94d9a 100644
--- a/lispref/searching.texi
+++ b/lispref/searching.texi
@@ -205,15 +205,14 @@ matches any three-character string that begins with @samp{a} and ends with
 
 @item *
 @cindex @samp{*} in regexp
-is not a construct by itself; it is a suffix operator that means to
-repeat the preceding regular expression as many times as possible. In
-@samp{fo*}, the @samp{*} applies to the @samp{o}, so @samp{fo*} matches
-one @samp{f} followed by any number of @samp{o}s. The case of zero
-@samp{o}s is allowed: @samp{fo*} does match @samp{f}.@refill
+is not a construct by itself; it is a postfix operator that means to
+match the preceding regular expression repetitively as many times as
+possible. Thus, @samp{o*} matches any number of @samp{o}s (including no
+@samp{o}s).
 
 @samp{*} always applies to the @emph{smallest} possible preceding
-expression. Thus, @samp{fo*} has a repeating @samp{o}, not a
-repeating @samp{fo}.@refill
+expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating
+@samp{fo}. It matches @samp{f}, @samp{fo}, @samp{foo}, and so on.
 
 The matcher processes a @samp{*} construct by matching, immediately,
 as many repetitions as can be found. Then it continues with the rest
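As a rough illustration of the `*` semantics in this hunk, here is a sketch that uses Python's `re` module as a stand-in for the Emacs matcher. Python's regexp syntax differs from Emacs's in places, but for a plain postfix `*` the behaviour is the same: zero or more repetitions of the smallest preceding item, taken greedily. The helper name `matches` is hypothetical.

```python
import re

# Python's re module stands in for the Emacs matcher here; for a bare
# postfix `*` both mean "zero or more of the preceding item".
pattern = re.compile(r"fo*")


def matches(s: str) -> bool:
    """Return True if `fo*` matches the whole of s."""
    return pattern.fullmatch(s) is not None


# `*` applies to the smallest preceding expression, the `o`, so "fofo"
# (a repeated `fo`) is not matched, while "f" (zero `o`s) is.
for s in ["f", "fo", "foo", "fofo", "o"]:
    print(f"{s!r}: {matches(s)}")

# Greedy matching: the `*` consumes as many `o`s as it can find.
print(pattern.match("foooba").group())  # fooo
```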
@@ -236,63 +235,63 @@ expressions run fast, check nested repetitions carefully.
 
 @item +
 @cindex @samp{+} in regexp
-is a suffix operator similar to @samp{*} except that the preceding
-expression must match at least once. So, for example, @samp{ca+r}
+is a postfix operator, similar to @samp{*} except that it must match
+the preceding expression at least once. So, for example, @samp{ca+r}
 matches the strings @samp{car} and @samp{caaaar} but not the string
 @samp{cr}, whereas @samp{ca*r} matches all three strings.
 
 @item ?
 @cindex @samp{?} in regexp
-is a suffix operator similar to @samp{*} except that the preceding
-expression can match either once or not at all. For example,
-@samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anyhing
-else.
+is a postfix operator, similar to @samp{*} except that it can match the
+preceding expression either once or not at all. For example,
+@samp{ca?r} matches @samp{car} or @samp{cr}; nothing else.
 
 @item [ @dots{} ]
 @cindex character set (in regexp)
 @cindex @samp{[} in regexp
 @cindex @samp{]} in regexp
-@samp{[} begins a @dfn{character set}, which is terminated by a
-@samp{]}. In the simplest case, the characters between the two brackets
-form the set. Thus, @samp{[ad]} matches either one @samp{a} or one
-@samp{d}, and @samp{[ad]*} matches any string composed of just @samp{a}s
-and @samp{d}s (including the empty string), from which it follows that
-@samp{c[ad]*r} matches @samp{cr}, @samp{car}, @samp{cdr},
-@samp{caddaar}, etc.@refill
-
-The usual regular expression special characters are not special inside a
+is a @dfn{character set}, which begins with @samp{[} and is terminated
+by @samp{]}. In the simplest case, the characters between the two
+brackets are what this set can match.
+
+Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and
+@samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s
+(including the empty string), from which it follows that @samp{c[ad]*r}
+matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc.
+
+You can also include character ranges in a character set, by writing the
+startong and ending characters with a @samp{-} between them. Thus,
+@samp{[a-z]} matches any lower-case ASCII letter. Ranges may be
+intermixed freely with individual characters, as in @samp{[a-z$%.]},
+which matches any lower case ASCII letter or @samp{$}, @samp{%} or
+period.
+
+Note that the usual regexp special characters are not special inside a
 character set. A completely different set of special characters exists
-inside character sets: @samp{]}, @samp{-} and @samp{^}.@refill
+inside character sets: @samp{]}, @samp{-} and @samp{^}.
 
-@samp{-} is used for ranges of characters. To write a range, write two
-characters with a @samp{-} between them. Thus, @samp{[a-z]} matches any
-lower case letter. Ranges may be intermixed freely with individual
-characters, as in @samp{[a-z$%.]}, which matches any lower case letter
-or @samp{$}, @samp{%}, or a period.@refill
-
-To include a @samp{]} in a character set, make it the first character.
-For example, @samp{[]a]} matches @samp{]} or @samp{a}. To include a
-@samp{-}, write @samp{-} as the first character in the set, or put it
-immediately after a range. (You can replace one individual character
-@var{c} with the range @samp{@var{c}-@var{c}} to make a place to put the
-@samp{-}.) There is no way to write a set containing just @samp{-} and
-@samp{]}.
+To include a @samp{]} in a character set, you must make it the first
+character. For example, @samp{[]a]} matches @samp{]} or @samp{a}. To
+include a @samp{-}, write @samp{-} as the first or last character of the
+set, or put it after a range. Thus, @samp{[]-]} matches both @samp{]}
+and @samp{-}.
 
 To include @samp{^} in a set, put it anywhere but at the beginning of
 the set.
 
 @item [^ @dots{} ]
 @cindex @samp{^} in regexp
-@samp{[^} begins a @dfn{complement character set}, which matches any
-character except the ones specified. Thus, @samp{[^a-z0-9A-Z]}
-matches all characters @emph{except} letters and digits.@refill
+@samp{[^} begins a @dfn{complemented character set}, which matches any
+character except the ones specified. Thus, @samp{[^a-z0-9A-Z]} matches
+all characters @emph{except} letters and digits.
 
 @samp{^} is not special in a character set unless it is the first
 character. The character following the @samp{^} is treated as if it
-were first (thus, @samp{-} and @samp{]} are not special there).
+were first (in other words, @samp{-} and @samp{]} are not special there).
 
-Note that a complement character set can match a newline, unless
-newline is mentioned as one of the characters not to match.
+A complemented character set can match a newline, unless newline is
+mentioned as one of the characters not to match. This is in contrast to
+the handling of regexps in programs such as @code{grep}.
 
 @item ^
 @cindex @samp{^} in regexp
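The `+`, `?` and character-set rules in this hunk can be checked the same way. This is again a sketch with Python's `re` standing in for the Emacs matcher, not the matcher itself; the test strings are the ones the new text mentions, and the rules they exercise (a leading `]` is literal, a `-` placed last is literal, a complemented set can match a newline) also hold in Python. The helper `full` and the `cases` table are hypothetical.

```python
import re


def full(pattern: str, s: str) -> bool:
    """Return True if `pattern` matches all of s (Python re as a stand-in)."""
    return re.fullmatch(pattern, s) is not None


cases = [
    ("ca+r", ["car", "caaaar", "cr"]),                    # `+`: at least one `a`
    ("ca*r", ["car", "caaaar", "cr"]),                    # `*`: zero or more
    ("ca?r", ["car", "cr", "caar"]),                      # `?`: zero or one
    ("c[ad]*r", ["cr", "car", "cdr", "caddaar", "cxr"]),  # repeated set
    ("[a-z$%.]", ["q", "$", ".", "A"]),                   # range mixed with single chars
    ("[]a]", ["]", "a", "b"]),                            # leading `]` is literal
    ("[]-]", ["]", "-", "a"]),                            # `]` first, `-` last
    ("[^a-z0-9A-Z]", ["\n", "!", "x"]),                   # complemented set matches newline
]

for pattern, strings in cases:
    print(pattern, {s: full(pattern, s) for s in strings})
```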
@@ -339,10 +338,10 @@ can act. It is poor practice to depend on this behavior; quote the
 special character anyway, regardless of where it appears.@refill
 
 For the most part, @samp{\} followed by any character matches only
-that character. However, there are several exceptions: characters
-that, when preceded by @samp{\}, are special constructs. Such
-characters are always ordinary when encountered on their own. Here
-is a table of @samp{\} constructs:
+that character. However, there are several exceptions: two-character
+sequences starting with @samp{\} which have special meanings. The
+second character in the sequence is always an ordinary character on
+their own. Here is a table of @samp{\} constructs.
 
 @table @kbd
 @item \|
@@ -375,9 +374,10 @@ the regular expression @samp{\(foo\|bar\)x} matches either @samp{foox}
 or @samp{barx}.
 
 @item
-To enclose an expression for a suffix operator such as @samp{*} to act
-on. Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any
-(zero or more) number of @samp{na} strings.@refill
+To enclose a complicated expression for the postfix operators @samp{*},
+@samp{+} and @samp{?} to operate on. Thus, @samp{ba\(na\)*} matches
+@samp{bananana}, etc., with any (zero or more) number of @samp{na}
+strings.@refill
 
 @item
 To record a matched substring for future reference.
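A sketch of why grouping matters for the postfix operators, again with Python's `re` standing in for Emacs. Python writes the group as `(na)` where Emacs writes `\(na\)`; the effect on `*`, `+` and `?` is the same. The names `grouped` and `ungrouped` are hypothetical.

```python
import re

# Python writes a group as (na) where Emacs writes \(na\); in both, the
# group gives the postfix operators *, + and ? a multi-character operand.
grouped = re.compile(r"ba(na)*")
ungrouped = re.compile(r"bana*")  # without a group, * repeats only the last `a`

for s in ["ba", "bana", "banana", "bananana", "banaa"]:
    print(f"{s!r}: grouped={bool(grouped.fullmatch(s))}, "
          f"ungrouped={bool(ungrouped.fullmatch(s))}")

print(bool(re.fullmatch(r"ba(na)+", "ba")))      # False: + needs at least one `na`
print(bool(re.fullmatch(r"ba(na)?", "banana")))  # False: ? allows at most one `na`
```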
@@ -393,7 +393,7 @@ Here is an explanation of this feature:
 matches the same text that matched the @var{digit}th occurrence of a
 @samp{\( @dots{} \)} construct.
 
-In other words, after the end of a @samp{\( @dots{} \)} construct. the
+In other words, after the end of a @samp{\( @dots{} \)} construct, the
 matcher remembers the beginning and end of the text matched by that
 construct. Then, later on in the regular expression, you can use
 @samp{\} followed by @var{digit} to match that same text, whatever it
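The back-reference rule above (a later `\DIGIT` matches the same text the group matched, not the group's pattern again) can be sketched with Python's `re`, which spells the reference `\1` as Emacs does but writes the group itself as `(...)` rather than `\( ... \)`. The pattern name `doubled_word` is hypothetical.

```python
import re

# (\w+) records the text it matched as group 1; the later \1 must match
# exactly that text again. \b keeps the match aligned to whole words.
doubled_word = re.compile(r"\b(\w+) \1\b")

m = doubled_word.search("this is the the end")
print(m.group(0), "|", m.group(1))  # the the | the

# The back reference matches the recorded text, not the pattern again:
print(bool(re.fullmatch(r"(a+)b\1", "aabaa")))  # True
print(bool(re.fullmatch(r"(a+)b\1", "aaba")))   # False
```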
@@ -424,8 +424,9 @@ matches any character that is not a word constituent.
 matches any character whose syntax is @var{code}. Here @var{code} is a
 character that represents a syntax code: thus, @samp{w} for word
 constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
-etc. @xref{Syntax Tables}, for a list of syntax codes and the
-characters that stand for them.
+etc. Represent a character of whitespace (which can be a newline) by
+either @samp{-} or a space character. @xref{Syntax Tables}, for a list
+of syntax codes and the characters that stand for them.
 
 @item \S@var{code}
 @cindex @samp{\S} in regexp
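Python's `re` has no syntax tables (its `\s` simply means whitespace), so the `\s@var{code}` test is easier to illustrate with a small model. The sketch below assumes a hypothetical toy syntax table `TOY_SYNTAX`; in Emacs the real table belongs to the buffer and depends on the major mode. It shows the rule added in this hunk: a space is accepted as another name for the whitespace code `-`, and a newline can have whitespace syntax.

```python
# Hypothetical toy syntax table: in Emacs the real table is per buffer and
# depends on the major mode, so these entries are illustrative only.
TOY_SYNTAX = {" ": "-", "\t": "-", "\n": "-", "(": "(", ")": ")"}


def syntax_of(ch: str) -> str:
    """Toy syntax code of ch: letters and digits are word constituents ("w"),
    listed characters use TOY_SYNTAX, everything else is punctuation (".")."""
    if ch.isalnum():
        return "w"
    return TOY_SYNTAX.get(ch, ".")


def matches_syntax(ch: str, code: str) -> bool:
    """Model of the \\sCODE test: True if ch has syntax class `code`.
    A space character is accepted as another name for "-" (whitespace)."""
    if code == " ":
        code = "-"
    return syntax_of(ch) == code


print(matches_syntax("\n", "-"))  # True: newline has whitespace syntax here
print(matches_syntax("\n", " "))  # True: space is an alias for "-"
print(matches_syntax("x", "w"))   # True
print(matches_syntax("(", "("))   # True
print(matches_syntax("x", "-"))   # False
```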
@@ -459,6 +460,9 @@ end of a word. Thus, @samp{\bfoo\b} matches any occurrence of
 @samp{foo} as a separate word. @samp{\bballs?\b} matches
 @samp{ball} or @samp{balls} as a separate word.@refill
 
+@samp{\b} matches at the beginning or end of the buffer
+regardless of what text appears next to it.
+
 @item \B
 @cindex @samp{\B} in regexp
 matches the empty string, but @emph{not} at the beginning or
@@ -467,10 +471,14 @@ end of a word.
 @item \<
 @cindex @samp{\<} in regexp
 matches the empty string, but only at the beginning of a word.
+@samp{\<} matches at the beginning of the buffer only if a
+word-constituent character follows.
 
 @item \>
 @cindex @samp{\>} in regexp
-matches the empty string, but only at the end of a word.
+matches the empty string, but only at the end of a word. @samp{\>}
+matches at the end of the buffer only if the contents end with a
+word-constituent character.
 @end table
 
 @kindex invalid-regexp
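The buffer-edge rules added for `\b`, `\<` and `\>` in the last two hunks can be modelled as predicates on a position in a string. This is a sketch under a stated assumption: letters and digits stand in for word constituents, whereas Emacs consults the buffer's syntax table. The function names are hypothetical. The last line contrasts Python's own `\b`, which does not match at a string edge unless a word character is adjacent.

```python
import re


def is_word(ch: str) -> bool:
    # Assumption: letters and digits stand in for "word constituents";
    # in Emacs this is decided by the buffer's syntax table.
    return ch.isalnum()


def at_b(text: str, pos: int) -> bool:
    """Model of \\b: always true at either end of the text; elsewhere true
    where word-ness changes between the neighbouring characters."""
    if pos == 0 or pos == len(text):
        return True
    return is_word(text[pos - 1]) != is_word(text[pos])


def at_word_start(text: str, pos: int) -> bool:
    """Model of \\<: a word constituent follows pos and none precedes it."""
    follows = pos < len(text) and is_word(text[pos])
    precedes = pos > 0 and is_word(text[pos - 1])
    return follows and not precedes


def at_word_end(text: str, pos: int) -> bool:
    """Model of \\>: a word constituent precedes pos and none follows it."""
    precedes = pos > 0 and is_word(text[pos - 1])
    follows = pos < len(text) and is_word(text[pos])
    return precedes and not follows


text = " foo."
print([p for p in range(len(text) + 1) if at_b(text, p)])           # [0, 1, 4, 5]
print([p for p in range(len(text) + 1) if at_word_start(text, p)])  # [1]
print([p for p in range(len(text) + 1) if at_word_end(text, p)])    # [4]

# Python's own \b differs at the edges: it needs a \w character next to the
# string boundary, so its first match in " foo." is at position 1, not 0.
print(re.search(r"\b", text).start())  # 1
```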