| author | Richard M. Stallman | 1997-05-19 06:29:13 +0000 |
|---|---|---|
| committer | Richard M. Stallman | 1997-05-19 06:29:13 +0000 |
| commit | 1cd71ce02b2e1740e4b7cc2a7afdd2f8f526a7cd | |
| tree | 94b50ed50e8ac0070a1bd7f8bb45a31ed4edaaaa | |
| parent | d987e6cbf7c9217fbe3b599a0425fd2d227a77b7 | |
Update regexp syntax from Emacs manual.
| -rw-r--r-- | lispref/searching.texi | 116 |
1 files changed, 62 insertions, 54 deletions
diff --git a/lispref/searching.texi b/lispref/searching.texi
index a9e45998926..80c10e94d9a 100644
--- a/lispref/searching.texi
+++ b/lispref/searching.texi
| @@ -205,15 +205,14 @@ matches any three-character string that begins with @samp{a} and ends with | |||
| 205 | 205 | ||
| 206 | @item * | 206 | @item * |
| 207 | @cindex @samp{*} in regexp | 207 | @cindex @samp{*} in regexp |
| 208 | is not a construct by itself; it is a suffix operator that means to | 208 | is not a construct by itself; it is a postfix operator that means to |
| 209 | repeat the preceding regular expression as many times as possible. In | 209 | match the preceding regular expression repetitively as many times as |
| 210 | @samp{fo*}, the @samp{*} applies to the @samp{o}, so @samp{fo*} matches | 210 | possible. Thus, @samp{o*} matches any number of @samp{o}s (including no |
| 211 | one @samp{f} followed by any number of @samp{o}s. The case of zero | 211 | @samp{o}s). |
| 212 | @samp{o}s is allowed: @samp{fo*} does match @samp{f}.@refill | ||
| 213 | 212 | ||
| 214 | @samp{*} always applies to the @emph{smallest} possible preceding | 213 | @samp{*} always applies to the @emph{smallest} possible preceding |
| 215 | expression. Thus, @samp{fo*} has a repeating @samp{o}, not a | 214 | expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating |
| 216 | repeating @samp{fo}.@refill | 215 | @samp{fo}. It matches @samp{f}, @samp{fo}, @samp{foo}, and so on. |
| 217 | 216 | ||
| 218 | The matcher processes a @samp{*} construct by matching, immediately, | 217 | The matcher processes a @samp{*} construct by matching, immediately, |
| 219 | as many repetitions as can be found. Then it continues with the rest | 218 | as many repetitions as can be found. Then it continues with the rest |
| @@ -236,63 +235,63 @@ expressions run fast, check nested repetitions carefully. | |||
| 236 | 235 | ||
| 237 | @item + | 236 | @item + |
| 238 | @cindex @samp{+} in regexp | 237 | @cindex @samp{+} in regexp |
| 239 | is a suffix operator similar to @samp{*} except that the preceding | 238 | is a postfix operator, similar to @samp{*} except that it must match |
| 240 | expression must match at least once. So, for example, @samp{ca+r} | 239 | the preceding expression at least once. So, for example, @samp{ca+r} |
| 241 | matches the strings @samp{car} and @samp{caaaar} but not the string | 240 | matches the strings @samp{car} and @samp{caaaar} but not the string |
| 242 | @samp{cr}, whereas @samp{ca*r} matches all three strings. | 241 | @samp{cr}, whereas @samp{ca*r} matches all three strings. |
| 243 | 242 | ||
| 244 | @item ? | 243 | @item ? |
| 245 | @cindex @samp{?} in regexp | 244 | @cindex @samp{?} in regexp |
| 246 | is a suffix operator similar to @samp{*} except that the preceding | 245 | is a postfix operator, similar to @samp{*} except that it can match the |
| 247 | expression can match either once or not at all. For example, | 246 | preceding expression either once or not at all. For example, |
| 248 | @samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anyhing | 247 | @samp{ca?r} matches @samp{car} or @samp{cr}; nothing else. |
| 249 | else. | ||
| 250 | 248 | ||
| 251 | @item [ @dots{} ] | 249 | @item [ @dots{} ] |
| 252 | @cindex character set (in regexp) | 250 | @cindex character set (in regexp) |
| 253 | @cindex @samp{[} in regexp | 251 | @cindex @samp{[} in regexp |
| 254 | @cindex @samp{]} in regexp | 252 | @cindex @samp{]} in regexp |
| 255 | @samp{[} begins a @dfn{character set}, which is terminated by a | 253 | is a @dfn{character set}, which begins with @samp{[} and is terminated |
| 256 | @samp{]}. In the simplest case, the characters between the two brackets | 254 | by @samp{]}. In the simplest case, the characters between the two |
| 257 | form the set. Thus, @samp{[ad]} matches either one @samp{a} or one | 255 | brackets are what this set can match. |
| 258 | @samp{d}, and @samp{[ad]*} matches any string composed of just @samp{a}s | 256 | |
| 259 | and @samp{d}s (including the empty string), from which it follows that | 257 | Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and |
| 260 | @samp{c[ad]*r} matches @samp{cr}, @samp{car}, @samp{cdr}, | 258 | @samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s |
| 261 | @samp{caddaar}, etc.@refill | 259 | (including the empty string), from which it follows that @samp{c[ad]*r} |
| 262 | 260 | matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc. | |
| 263 | The usual regular expression special characters are not special inside a | 261 | |
| 262 | You can also include character ranges in a character set, by writing the | ||
| 263 | startong and ending characters with a @samp{-} between them. Thus, | ||
| 264 | @samp{[a-z]} matches any lower-case ASCII letter. Ranges may be | ||
| 265 | intermixed freely with individual characters, as in @samp{[a-z$%.]}, | ||
| 266 | which matches any lower case ASCII letter or @samp{$}, @samp{%} or | ||
| 267 | period. | ||
| 268 | |||
| 269 | Note that the usual regexp special characters are not special inside a | ||
| 264 | character set. A completely different set of special characters exists | 270 | character set. A completely different set of special characters exists |
| 265 | inside character sets: @samp{]}, @samp{-} and @samp{^}.@refill | 271 | inside character sets: @samp{]}, @samp{-} and @samp{^}. |
| 266 | 272 | ||
| 267 | @samp{-} is used for ranges of characters. To write a range, write two | 273 | To include a @samp{]} in a character set, you must make it the first |
| 268 | characters with a @samp{-} between them. Thus, @samp{[a-z]} matches any | 274 | character. For example, @samp{[]a]} matches @samp{]} or @samp{a}. To |
| 269 | lower case letter. Ranges may be intermixed freely with individual | 275 | include a @samp{-}, write @samp{-} as the first or last character of the |
| 270 | characters, as in @samp{[a-z$%.]}, which matches any lower case letter | 276 | set, or put it after a range. Thus, @samp{[]-]} matches both @samp{]} |
| 271 | or @samp{$}, @samp{%}, or a period.@refill | 277 | and @samp{-}. |
| 272 | |||
| 273 | To include a @samp{]} in a character set, make it the first character. | ||
| 274 | For example, @samp{[]a]} matches @samp{]} or @samp{a}. To include a | ||
| 275 | @samp{-}, write @samp{-} as the first character in the set, or put it | ||
| 276 | immediately after a range. (You can replace one individual character | ||
| 277 | @var{c} with the range @samp{@var{c}-@var{c}} to make a place to put the | ||
| 278 | @samp{-}.) There is no way to write a set containing just @samp{-} and | ||
| 279 | @samp{]}. | ||
| 280 | 278 | ||
| 281 | To include @samp{^} in a set, put it anywhere but at the beginning of | 279 | To include @samp{^} in a set, put it anywhere but at the beginning of |
| 282 | the set. | 280 | the set. |
| 283 | 281 | ||
| 284 | @item [^ @dots{} ] | 282 | @item [^ @dots{} ] |
| 285 | @cindex @samp{^} in regexp | 283 | @cindex @samp{^} in regexp |
| 286 | @samp{[^} begins a @dfn{complement character set}, which matches any | 284 | @samp{[^} begins a @dfn{complemented character set}, which matches any |
| 287 | character except the ones specified. Thus, @samp{[^a-z0-9A-Z]} | 285 | character except the ones specified. Thus, @samp{[^a-z0-9A-Z]} matches |
| 288 | matches all characters @emph{except} letters and digits.@refill | 286 | all characters @emph{except} letters and digits. |
| 289 | 287 | ||
| 290 | @samp{^} is not special in a character set unless it is the first | 288 | @samp{^} is not special in a character set unless it is the first |
| 291 | character. The character following the @samp{^} is treated as if it | 289 | character. The character following the @samp{^} is treated as if it |
| 292 | were first (thus, @samp{-} and @samp{]} are not special there). | 290 | were first (in other words, @samp{-} and @samp{]} are not special there). |
| 293 | 291 | ||
| 294 | Note that a complement character set can match a newline, unless | 292 | A complemented character set can match a newline, unless newline is |
| 295 | newline is mentioned as one of the characters not to match. | 293 | mentioned as one of the characters not to match. This is in contrast to |
| 294 | the handling of regexps in programs such as @code{grep}. | ||
| 296 | 295 | ||
| 297 | @item ^ | 296 | @item ^ |
| 298 | @cindex @samp{^} in regexp | 297 | @cindex @samp{^} in regexp |
| @@ -339,10 +338,10 @@ can act. It is poor practice to depend on this behavior; quote the | |||
| 339 | special character anyway, regardless of where it appears.@refill | 338 | special character anyway, regardless of where it appears.@refill |
| 340 | 339 | ||
| 341 | For the most part, @samp{\} followed by any character matches only | 340 | For the most part, @samp{\} followed by any character matches only |
| 342 | that character. However, there are several exceptions: characters | 341 | that character. However, there are several exceptions: two-character |
| 343 | that, when preceded by @samp{\}, are special constructs. Such | 342 | sequences starting with @samp{\} which have special meanings. The |
| 344 | characters are always ordinary when encountered on their own. Here | 343 | second character in the sequence is always an ordinary character on |
| 345 | is a table of @samp{\} constructs: | 344 | their own. Here is a table of @samp{\} constructs. |
| 346 | 345 | ||
| 347 | @table @kbd | 346 | @table @kbd |
| 348 | @item \| | 347 | @item \| |
| @@ -375,9 +374,10 @@ the regular expression @samp{\(foo\|bar\)x} matches either @samp{foox} | |||
| 375 | or @samp{barx}. | 374 | or @samp{barx}. |
| 376 | 375 | ||
| 377 | @item | 376 | @item |
| 378 | To enclose an expression for a suffix operator such as @samp{*} to act | 377 | To enclose a complicated expression for the postfix operators @samp{*}, |
| 379 | on. Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any | 378 | @samp{+} and @samp{?} to operate on. Thus, @samp{ba\(na\)*} matches |
| 380 | (zero or more) number of @samp{na} strings.@refill | 379 | @samp{bananana}, etc., with any (zero or more) number of @samp{na} |
| 380 | strings.@refill | ||
| 381 | 381 | ||
| 382 | @item | 382 | @item |
| 383 | To record a matched substring for future reference. | 383 | To record a matched substring for future reference. |
| @@ -393,7 +393,7 @@ Here is an explanation of this feature: | |||
| 393 | matches the same text that matched the @var{digit}th occurrence of a | 393 | matches the same text that matched the @var{digit}th occurrence of a |
| 394 | @samp{\( @dots{} \)} construct. | 394 | @samp{\( @dots{} \)} construct. |
| 395 | 395 | ||
| 396 | In other words, after the end of a @samp{\( @dots{} \)} construct. the | 396 | In other words, after the end of a @samp{\( @dots{} \)} construct, the |
| 397 | matcher remembers the beginning and end of the text matched by that | 397 | matcher remembers the beginning and end of the text matched by that |
| 398 | construct. Then, later on in the regular expression, you can use | 398 | construct. Then, later on in the regular expression, you can use |
| 399 | @samp{\} followed by @var{digit} to match that same text, whatever it | 399 | @samp{\} followed by @var{digit} to match that same text, whatever it |
| @@ -424,8 +424,9 @@ matches any character that is not a word constituent. | |||
| 424 | matches any character whose syntax is @var{code}. Here @var{code} is a | 424 | matches any character whose syntax is @var{code}. Here @var{code} is a |
| 425 | character that represents a syntax code: thus, @samp{w} for word | 425 | character that represents a syntax code: thus, @samp{w} for word |
| 426 | constituent, @samp{-} for whitespace, @samp{(} for open parenthesis, | 426 | constituent, @samp{-} for whitespace, @samp{(} for open parenthesis, |
| 427 | etc. @xref{Syntax Tables}, for a list of syntax codes and the | 427 | etc. Represent a character of whitespace (which can be a newline) by |
| 428 | characters that stand for them. | 428 | either @samp{-} or a space character. @xref{Syntax Tables}, for a list |
| 429 | of syntax codes and the characters that stand for them. | ||
| 429 | 430 | ||
| 430 | @item \S@var{code} | 431 | @item \S@var{code} |
| 431 | @cindex @samp{\S} in regexp | 432 | @cindex @samp{\S} in regexp |
| @@ -459,6 +460,9 @@ end of a word. Thus, @samp{\bfoo\b} matches any occurrence of | |||
| 459 | @samp{foo} as a separate word. @samp{\bballs?\b} matches | 460 | @samp{foo} as a separate word. @samp{\bballs?\b} matches |
| 460 | @samp{ball} or @samp{balls} as a separate word.@refill | 461 | @samp{ball} or @samp{balls} as a separate word.@refill |
| 461 | 462 | ||
| 463 | @samp{\b} matches at the beginning or end of the buffer | ||
| 464 | regardless of what text appears next to it. | ||
| 465 | |||
| 462 | @item \B | 466 | @item \B |
| 463 | @cindex @samp{\B} in regexp | 467 | @cindex @samp{\B} in regexp |
| 464 | matches the empty string, but @emph{not} at the beginning or | 468 | matches the empty string, but @emph{not} at the beginning or |
| @@ -467,10 +471,14 @@ end of a word. | |||
| 467 | @item \< | 471 | @item \< |
| 468 | @cindex @samp{\<} in regexp | 472 | @cindex @samp{\<} in regexp |
| 469 | matches the empty string, but only at the beginning of a word. | 473 | matches the empty string, but only at the beginning of a word. |
| 474 | @samp{\<} matches at the beginning of the buffer only if a | ||
| 475 | word-constituent character follows. | ||
| 470 | 476 | ||
| 471 | @item \> | 477 | @item \> |
| 472 | @cindex @samp{\>} in regexp | 478 | @cindex @samp{\>} in regexp |
| 473 | matches the empty string, but only at the end of a word. | 479 | matches the empty string, but only at the end of a word. @samp{\>} |
| 480 | matches at the end of the buffer only if the contents end with a | ||
| 481 | word-constituent character. | ||
| 474 | @end table | 482 | @end table |
| 475 | 483 | ||
| 476 | @kindex invalid-regexp | 484 | @kindex invalid-regexp |
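
The constructs described in the revised text can be tried out with a few forms in an Emacs Lisp buffer such as `*scratch*`. This is only a sketch and is not part of the commit. It assumes the standard `string-match` function, and it uses the `\`` and `\'` anchors (beginning and end of the string), which this diff does not touch, to force whole-string matches. Backslashes are doubled because the regexps are written as Lisp strings.

```elisp
;; Illustrative only: not part of the patch.  Evaluate each form
;; separately, for example with C-x C-e.  `string-match' returns the
;; index where the match starts, or nil if there is no match.

;; Postfix operators: `*' (zero or more), `+' (one or more), `?' (zero or one).
(string-match "\\`fo*\\'" "f")            ; => 0, zero `o's are allowed
(string-match "\\`ca+r\\'" "caaaar")      ; => 0
(string-match "\\`ca+r\\'" "cr")          ; => nil, `+' needs at least one `a'
(string-match "\\`ca?r\\'" "caar")        ; => nil, `?' allows at most one `a'

;; Character sets.
(string-match "\\`c[ad]*r\\'" "caddaar")  ; => 0
(string-match "\\`[]-]+\\'" "]-]")        ; => 0, `]' first and `-' last are literal
(string-match "[^a-z]" "abc\ndef")        ; => 3, a complemented set matches newline

;; Grouping with \( ... \) so that `*' applies to the whole group.
(string-match "\\`ba\\(na\\)*\\'" "bananana") ; => 0

;; \b matches the empty string at a word boundary.
(string-match "\\bballs?\\b" "two balls")     ; => 4
```

When matching against a buffer rather than a string, `looking-at` or `re-search-forward` would be the usual entry points; the regexp syntax is the same.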