author Paul Eggert 2019-04-01 23:43:57 -0700
committer Paul Eggert 2019-04-01 23:43:57 -0700
commit f81ec28f4fc122658e59c0ec99ca4d92a1fe439f
tree d5864857bbb8dcf48481673a74dc50c8ebddfa94
parent f5d34496123ce6df53d50082159280da54f052c4
parent 0924b27bca40d219e34529144ea04a581428f1f7
Merge from origin/emacs-26
0924b27bca Say which regexp ranges should be avoided

# Conflicts:
#	doc/lispref/searching.texi
-rw-r--r-- doc/lispref/searching.texi 52
1 file changed, 34 insertions, 18 deletions
diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index e3f31fdf836..748ab586af9 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -391,18 +391,11 @@ writing the starting and ending characters with a @samp{-} between them.
 Thus, @samp{[a-z]} matches any lower-case @acronym{ASCII} letter.
 Ranges may be intermixed freely with individual characters, as in
 @samp{[a-z$%.]}, which matches any lower case @acronym{ASCII} letter
-or @samp{$}, @samp{%} or period.
+or @samp{$}, @samp{%} or period. However, the ending character of one
+range should not be the starting point of another one; for example,
+@samp{[a-m-z]} should be avoided.
 
-If @code{case-fold-search} is non-@code{nil}, @samp{[a-z]} also
-matches upper-case letters. Note that a range like @samp{[a-z]} is
-not affected by the locale's collation sequence, it always represents
-a sequence in @acronym{ASCII} order.
-@c This wasn't obvious to me, since, e.g., the grep manual "Character
-@c Classes and Bracket Expressions" specifically notes the opposite
-@c behavior. But by experiment Emacs seems unaffected by LC_COLLATE
-@c in this regard.
-
-Note also that the usual regexp special characters are not special inside a
+The usual regexp special characters are not special inside a
 character alternative. A completely different set of characters is
 special inside character alternatives: @samp{]}, @samp{-} and @samp{^}.
 
@@ -417,13 +410,34 @@ special there.)
 To include @samp{^} in a character alternative, put it anywhere but at
 the beginning.
 
-@c What if it starts with a multibyte and ends with a unibyte?
-@c That doesn't seem to match anything...?
-If a range starts with a unibyte character @var{c} and ends with a
-multibyte character @var{c2}, the range is divided into two parts: one
-spans the unibyte characters @samp{@var{c}..?\377}, the other the
-multibyte characters @samp{@var{c1}..@var{c2}}, where @var{c1} is the
-first character of the charset to which @var{c2} belongs.
+The following aspects of ranges are specific to Emacs, in that POSIX
+allows but does not require this behavior and programs other than
+Emacs may behave differently:
+
+@enumerate
+@item
+If @code{case-fold-search} is non-@code{nil}, @samp{[a-z]} also
+matches upper-case letters.
+
+@item
+A range is not affected by the locale's collation sequence: it always
+represents the set of characters with codepoints ranging between those
+of its bounds, so that @samp{[a-z]} matches only ASCII letters, even
+outside the C or POSIX locale.
+
+@item
+As a special case, if either bound of a range is a raw 8-bit byte, the
+other bound should be a unibyte character, and the range matches only
+unibyte characters.
+
+@item
+If the lower bound of a range is greater than its upper bound, the
+range is empty and represents no characters. Thus, @samp{[b-a]}
+always fails to match, and @samp{[^b-a]} matches any character,
+including newline. However, the lower bound should be at most one
+greater than the upper bound; for example, @samp{[c-a]} should be
+avoided.
+@end enumerate
 
 A character alternative can also specify named character classes
 (@pxref{Char Classes}). This is a POSIX feature. For example,
@@ -431,6 +445,8 @@ A character alternative can also specify named character classes
 Using a character class is equivalent to mentioning each of the
 characters in that class; but the latter is not feasible in practice,
 since some classes include thousands of different characters.
+A character class should not appear as the lower or upper bound
+of a range.
 
 @item @samp{[^ @dots{} ]}
 @cindex @samp{^} in regexp