author Paul Eggert 2019-04-02 00:17:37 -0700
committer Paul Eggert 2019-04-02 00:18:28 -0700
commit 076ed98ff6d7debff3929beab048c8a90e48dbb8
tree 6e971b25a08ce58ca47a7b20676e33f13b1dddd5
parent f81ec28f4fc122658e59c0ec99ca4d92a1fe439f
More regexp advice and clarifications

* doc/lispref/searching.texi (Regexp Special): Simplify style
advice for order of ], ^, and - in character alternatives.
Stick with saying that it’s not a good idea to put ‘-’ after a
range. Remove the special case about raw 8-bit bytes and unibyte
characters, as this documentation is confusing and seems to be
incorrect in some cases. Say that z-a is the preferred style for
reversed ranges, since it’s clearer and is typically what’s used
in practice. Mention some bad styles: duplicates in character
alternatives, ranges that denote <=3 characters, and ‘-’ as the
first character.
-rw-r--r-- doc/lispref/searching.texi 52
1 file changed, 31 insertions, 21 deletions
diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index 748ab586af9..72ee9233a3c 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -398,17 +398,11 @@ range should not be the starting point of another one; for example,
 The usual regexp special characters are not special inside a
 character alternative. A completely different set of characters is
 special inside character alternatives: @samp{]}, @samp{-} and @samp{^}.
-
-To include a @samp{]} in a character alternative, you must make it the first
-character. For example, @samp{[]a]} matches @samp{]} or @samp{a}. To include
-a @samp{-}, write @samp{-} as the last character of the character alternative,
-tho you can also put it first or after a range. Thus, @samp{[]-]} matches both
-@samp{]} and @samp{-}. (As explained below, you cannot use @samp{\]} to
-include a @samp{]} inside a character alternative, since @samp{\} is not
-special there.)
-
-To include @samp{^} in a character alternative, put it anywhere but at
-the beginning.
+To include @samp{]} in a character alternative, put it at the
+beginning. To include @samp{^}, put it anywhere but at the beginning.
+To include @samp{-}, put it at the end. Thus, @samp{[]^-]} matches
+all three of these special characters. You cannot use @samp{\} to
+escape these three characters, since @samp{\} is not special here.
 
 The following aspects of ranges are specific to Emacs, in that POSIX
 allows but does not require this behavior and programs other than
@@ -426,17 +420,33 @@ of its bounds, so that @samp{[a-z]} matches only ASCII letters, even
 outside the C or POSIX locale.
 
 @item
-As a special case, if either bound of a range is a raw 8-bit byte, the
-other bound should be a unibyte character, and the range matches only
-unibyte characters.
+If the lower bound of a range is greater than its upper bound, the
+range is empty and represents no characters. Thus, @samp{[z-a]}
+always fails to match, and @samp{[^z-a]} matches any character,
+including newline. However, a reversed range should always be from
+the letter @samp{z} to the letter @samp{a} to make it clear that it is
+not a typo; for example, @samp{[+-*/]} should be avoided, because it
+matches only @samp{/} rather than the likely-intended four characters.
+@end enumerate
+
+Some kinds of character alternatives are not the best style even
+though they are standardized by POSIX and are portable. They include:
 
+@enumerate
 @item
-If the lower bound of a range is greater than its upper bound, the
-range is empty and represents no characters. Thus, @samp{[b-a]}
-always fails to match, and @samp{[^b-a]} matches any character,
-including newline. However, the lower bound should be at most one
-greater than the upper bound; for example, @samp{[c-a]} should be
-avoided.
+A character alternative can include duplicates. For example,
+@samp{[XYa-yYb-zX]} is less clear than @samp{[XYa-z]}.
+
+@item
+A range can denote just one, two, or three characters. For example,
+@samp{[(-(]} is less clear than @samp{[(]}, @samp{[*-+]} is less clear
+than @samp{[*+]}, and @samp{[*-,]} is less clear than @samp{[*+,]}.
+
+@item
+A @samp{-} also appear at the beginning of a character alternative, or
+as the upper bound of a range. For example, although @samp{[-a-z]} is
+valid, @samp{[a-z-]} is better style; and although @samp{[!--/]} is
+valid, @samp{[!-,/-]} is clearer.
 @end enumerate
 
 A character alternative can also specify named character classes
@@ -452,7 +462,7 @@ of a range.
 @cindex @samp{^} in regexp
 @samp{[^} begins a @dfn{complemented character alternative}. This
 matches any character except the ones specified. Thus,
-@samp{[^a-z0-9A-Z]} matches all characters @emph{except} letters and
+@samp{[^a-z0-9A-Z]} matches all characters @emph{except} ASCII letters and
 digits.
 
 @samp{^} is not special in a character alternative unless it is the first
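
The placement and range rules described in this patch can be checked with a small model. The Python sketch below is not Emacs code and does not use any Emacs API: `matched_ascii` is a hypothetical helper that expands a bracketed character alternative into the set of ASCII characters it would match under the rules stated in the new text. It assumes ASCII input only and ignores named character classes such as `[:alpha:]`.

```python
def matched_ascii(alternative: str) -> set[str]:
    """Model an Emacs-style character alternative such as "[]^-]".

    Hypothetical helper for illustration: ASCII only, and named
    classes like [:alpha:] are out of scope.
    """
    if len(alternative) < 3 or alternative[0] != "[" or alternative[-1] != "]":
        raise ValueError("expected a bracketed character alternative")
    body = alternative[1:-1]

    # A leading '^' complements the alternative; elsewhere it is literal.
    negated = body.startswith("^")
    if negated:
        body = body[1:]
    if not body:
        raise ValueError("empty character alternative")

    # ']' is literal only at the beginning; '\' is never special here.
    if "]" in body[1:]:
        raise ValueError("']' is literal only at the beginning")

    chars: set[str] = set()
    i = 0
    while i < len(body):
        # '-' forms a range only when another character follows it
        # inside the brackets; a trailing '-' is literal.
        if i + 2 < len(body) and body[i + 1] == "-":
            lo, hi = ord(body[i]), ord(body[i + 2])
            # A reversed range (lo > hi) is empty: range() yields nothing.
            chars.update(chr(c) for c in range(lo, hi + 1))
            i += 3
        else:
            chars.add(body[i])
            i += 1

    ascii_chars = {chr(c) for c in range(128)}
    return ascii_chars - chars if negated else chars & ascii_chars


if __name__ == "__main__":
    print(sorted(matched_ascii("[]^-]")))   # the three special characters
    print(sorted(matched_ascii("[+-*/]")))  # reversed range, then '/'
    print(matched_ascii("[!--/]") == matched_ascii("[!-,/-]"))
```

Under this model, `[z-a]` expands to the empty set and `[^z-a]` to every ASCII character including newline, which matches the wording added by the patch. The style pairs listed in the patch, such as `[XYa-yYb-zX]` and `[XYa-z]`, expand to identical sets, which is why the patch calls the longer forms a matter of clarity only.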