Update regexp syntax from Emacs manual.

author: Richard M. Stallman 1997-05-19 06:29:13 +0000
committer: Richard M. Stallman 1997-05-19 06:29:13 +0000
commit: 1cd71ce02b2e1740e4b7cc2a7afdd2f8f526a7cd (patch)
tree: 94b50ed50e8ac0070a1bd7f8bb45a31ed4edaaaa
parent: d987e6cbf7c9217fbe3b599a0425fd2d227a77b7 (diff)
1 files changed, 62 insertions, 54 deletions
diff --git a/lispref/searching.texi b/lispref/searching.texi
index a9e45998926..80c10e94d9a 100644
--- a/lispref/searching.texi
+++ b/lispref/searching.texi
@@ -205,15 +205,14 @@ matches any three-character string that begins with @samp{a} and ends with
 @item *
 @cindex @samp{*} in regexp
-is not a construct by itself; it is a suffix operator that means to
+is not a construct by itself; it is a postfix operator that means to
-repeat the preceding regular expression as many times as possible.  In
+match the preceding regular expression repetitively as many times as
-@samp{fo*}, the @samp{*} applies to the @samp{o}, so @samp{fo*} matches
+possible.  Thus, @samp{o*} matches any number of @samp{o}s (including no
-one @samp{f} followed by any number of @samp{o}s.  The case of zero
+@samp{o}s).
-@samp{o}s is allowed: @samp{fo*} does match @samp{f}.@refill
 @samp{*} always applies to the @emph{smallest} possible preceding
-expression.  Thus, @samp{fo*} has a repeating @samp{o}, not a
+expression.  Thus, @samp{fo*} has a repeating @samp{o}, not a repeating
-repeating @samp{fo}.@refill
+@samp{fo}.  It matches @samp{f}, @samp{fo}, @samp{foo}, and so on.
 The matcher processes a @samp{*} construct by matching, immediately,
 as many repetitions as can be found.  Then it continues with the rest
@@ -236,63 +235,63 @@ expressions run fast, check nested repetitions carefully.
 @item +
 @cindex @samp{+} in regexp
-is a suffix operator similar to @samp{*} except that the preceding
+is a postfix operator, similar to @samp{*} except that it must match
-expression must match at least once.  So, for example, @samp{ca+r}
+the preceding expression at least once.  So, for example, @samp{ca+r}
 matches the strings @samp{car} and @samp{caaaar} but not the string
 @samp{cr}, whereas @samp{ca*r} matches all three strings.
 @item ?
 @cindex @samp{?} in regexp
-is a suffix operator similar to @samp{*} except that the preceding
+is a postfix operator, similar to @samp{*} except that it can match the
-expression can match either once or not at all.  For example,
+preceding expression either once or not at all.  For example,
-@samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anyhing
+@samp{ca?r} matches @samp{car} or @samp{cr}; nothing else.
-else.
 @item [ @dots{} ]
 @cindex character set (in regexp)
 @cindex @samp{[} in regexp
 @cindex @samp{]} in regexp
-@samp{[} begins a @dfn{character set}, which is terminated by a
+is a @dfn{character set}, which begins with @samp{[} and is terminated
-@samp{]}.  In the simplest case, the characters between the two brackets
+by @samp{]}.  In the simplest case, the characters between the two
-form the set.  Thus, @samp{[ad]} matches either one @samp{a} or one
+brackets are what this set can match.
-@samp{d}, and @samp{[ad]*} matches any string composed of just @samp{a}s
-and @samp{d}s (including the empty string), from which it follows that
+Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and
-@samp{c[ad]*r} matches @samp{cr}, @samp{car}, @samp{cdr},
+@samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s
-@samp{caddaar}, etc.@refill
+(including the empty string), from which it follows that @samp{c[ad]*r}
+matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc.
-The usual regular expression special characters are not special inside a
+You can also include character ranges in a character set, by writing the
+startong and ending characters with a @samp{-} between them.  Thus,
+@samp{[a-z]} matches any lower-case ASCII letter.  Ranges may be
+intermixed freely with individual characters, as in @samp{[a-z$%.]},
+which matches any lower case ASCII letter or @samp{$}, @samp{%} or
+period.
+Note that the usual regexp special characters are not special inside a
 character set.  A completely different set of special characters exists
-inside character sets: @samp{]}, @samp{-} and @samp{^}.@refill
+inside character sets: @samp{]}, @samp{-} and @samp{^}.
-@samp{-} is used for ranges of characters.  To write a range, write two
+To include a @samp{]} in a character set, you must make it the first
-characters with a @samp{-} between them.  Thus, @samp{[a-z]} matches any
+character.  For example, @samp{[]a]} matches @samp{]} or @samp{a}.  To
-lower case letter.  Ranges may be intermixed freely with individual
+include a @samp{-}, write @samp{-} as the first or last character of the
-characters, as in @samp{[a-z$%.]}, which matches any lower case letter
+set, or put it after a range.  Thus, @samp{[]-]} matches both @samp{]}
-or @samp{$}, @samp{%}, or a period.@refill
+and @samp{-}.
-To include a @samp{]} in a character set, make it the first character.
-For example, @samp{[]a]} matches @samp{]} or @samp{a}.  To include a
-@samp{-}, write @samp{-} as the first character in the set, or put it
-immediately after a range.  (You can replace one individual character
-@var{c} with the range @samp{@var{c}-@var{c}} to make a place to put the
-@samp{-}.)  There is no way to write a set containing just @samp{-} and
-@samp{]}.
 To include @samp{^} in a set, put it anywhere but at the beginning of
 the set.
 @item [^ @dots{} ]
 @cindex @samp{^} in regexp
-@samp{[^} begins a @dfn{complement character set}, which matches any
+@samp{[^} begins a @dfn{complemented character set}, which matches any
-character except the ones specified.  Thus, @samp{[^a-z0-9A-Z]}
+character except the ones specified.  Thus, @samp{[^a-z0-9A-Z]} matches
-matches all characters @emph{except} letters and digits.@refill
+all characters @emph{except} letters and digits.
 @samp{^} is not special in a character set unless it is the first
 character.  The character following the @samp{^} is treated as if it
-were first (thus, @samp{-} and @samp{]} are not special there).
+were first (in other words, @samp{-} and @samp{]} are not special there).
-Note that a complement character set can match a newline, unless
+A complemented character set can match a newline, unless newline is
-newline is mentioned as one of the characters not to match.
+mentioned as one of the characters not to match.  This is in contrast to
+the handling of regexps in programs such as @code{grep}.
 @item ^
 @cindex @samp{^} in regexp
@@ -339,10 +338,10 @@ can act.  It is poor practice to depend on this behavior; quote the
 special character anyway, regardless of where it appears.@refill
 For the most part, @samp{\} followed by any character matches only
-that character.  However, there are several exceptions: characters
+that character.  However, there are several exceptions: two-character
-that, when preceded by @samp{\}, are special constructs.  Such
+sequences starting with @samp{\} which have special meanings.  The
-characters are always ordinary when encountered on their own.  Here
+second character in the sequence is always an ordinary character on
-is a table of @samp{\} constructs:
+their own.  Here is a table of @samp{\} constructs.
 @table @kbd
 @item \|
@@ -375,9 +374,10 @@ the regular expression @samp{\(foo\|bar\)x} matches either @samp{foox}
 or @samp{barx}.
 @item
-To enclose an expression for a suffix operator such as @samp{*} to act
+To enclose a complicated expression for the postfix operators @samp{*},
-on.  Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any
+@samp{+} and @samp{?} to operate on.  Thus, @samp{ba\(na\)*} matches
-(zero or more) number of @samp{na} strings.@refill
+@samp{bananana}, etc., with any (zero or more) number of @samp{na}
+strings.@refill
 @item
 To record a matched substring for future reference.
@@ -393,7 +393,7 @@ Here is an explanation of this feature:
 matches the same text that matched the @var{digit}th occurrence of a
 @samp{\( @dots{} \)} construct.
-In other words, after the end of a @samp{\( @dots{} \)} construct.  the
+In other words, after the end of a @samp{\( @dots{} \)} construct, the
 matcher remembers the beginning and end of the text matched by that
 construct.  Then, later on in the regular expression, you can use
 @samp{\} followed by @var{digit} to match that same text, whatever it
@@ -424,8 +424,9 @@ matches any character that is not a word constituent.
 matches any character whose syntax is @var{code}.  Here @var{code} is a
 character that represents a syntax code: thus, @samp{w} for word
 constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
-etc.  @xref{Syntax Tables}, for a list of syntax codes and the
+etc.  Represent a character of whitespace (which can be a newline) by
-characters that stand for them.
+either @samp{-} or a space character.  @xref{Syntax Tables}, for a list
+of syntax codes and the characters that stand for them.
 @item \S@var{code}
 @cindex @samp{\S} in regexp
@@ -459,6 +460,9 @@ end of a word.  Thus, @samp{\bfoo\b} matches any occurrence of
 @samp{foo} as a separate word.  @samp{\bballs?\b} matches
 @samp{ball} or @samp{balls} as a separate word.@refill
+@samp{\b} matches at the beginning or end of the buffer
+regardless of what text appears next to it.
 @item \B
 @cindex @samp{\B} in regexp
 matches the empty string, but @emph{not} at the beginning or
@@ -467,10 +471,14 @@ end of a word.
 @item \<
 @cindex @samp{\<} in regexp
 matches the empty string, but only at the beginning of a word.
+@samp{\<} matches at the beginning of the buffer only if a
+word-constituent character follows.
 @item \>
 @cindex @samp{\>} in regexp
-matches the empty string, but only at the end of a word.
+matches the empty string, but only at the end of a word.  @samp{\>}
+matches at the end of the buffer only if the contents end with a
+word-constituent character.
 @end table
 @kindex invalid-regexp
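
The examples in the revised text can be spot-checked outside Emacs. The sketch below is not part of the commit. It uses Python's `re` module as a stand-in, which is an assumption: Emacs writes grouping as `\(` ... `\)`, so `ba\(na\)*` is translated to `ba(na)*`, and the helper name `full` is hypothetical. Only constructs that behave the same way in both syntaxes are covered; the `\b`, `\<` and `\>` rules for buffer boundaries are specific to Emacs and are left out.

```python
import re


def full(pattern, text):
    """Return True if pattern matches the whole of text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# `*` applies to the smallest preceding expression: fo* repeats only the o.
star = [full(r"fo*", s) for s in ("f", "fo", "foo", "fofo")]

# `+` needs at least one repetition; `?` allows zero or one.
plus = [full(r"ca+r", s) for s in ("car", "caaaar", "cr")]
opt = [full(r"ca?r", s) for s in ("car", "cr", "caar")]

# Character sets: c[ad]*r, and []-], which holds just `]` and `-`.
cadr = [full(r"c[ad]*r", s) for s in ("cr", "car", "cdr", "caddaar")]
bracket_dash = [full(r"[]-]", s) for s in ("]", "-", "a")]

# A complemented set matches a newline unless newline is listed in it.
newline = full(r"[^a-z0-9A-Z]", "\n")

# Grouping for a postfix operator: Emacs ba\(na\)* becomes ba(na)* here.
banana = [full(r"ba(na)*", s) for s in ("ba", "bananana", "banan")]

print(star, plus, opt, cadr, bracket_dash, newline, banana)
```

The `[]-]` case is the one the patch changes: the old text said no set could contain just `-` and `]`, while the new text allows `-` as the last character of the set.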