;Improve documentation of locale-specific string comparison

* doc/lispref/strings.texi (Text Comparison): Mention the Unicode collation rules and buffer-local case-tables.
author: Eli Zaretskii 2022-07-21 09:53:45 +0300
committer: Eli Zaretskii 2022-07-21 09:54:46 +0300
commit: 2b31e667be95731d7e9ee328c8331eecf69b3831
tree: 1a61d5dabb96876c0bc025b17f8c29e4efd70308
parent: ea44d7ddfc9fe07fbdffd8e02db2ef6bab1f8b5c
1 file changed, 15 insertions, 6 deletions
diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
index c9612e598a3..89120575f52 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -564,11 +564,19 @@ equal with respect to collation rules.  A collation rule is not only
 determined by the lexicographic order of the characters contained in
 @var{string1} and @var{string2}, but also further rules about
 relations between these characters.  Usually, it is defined by the
-@var{locale} environment Emacs is running with.
+@var{locale} environment Emacs is running with and by the Standard C
+library against which Emacs was linked@footnote{
-For example, characters with different coding points but
+For more information about collation rules and their locale
-the same meaning might be considered as equal, like different grave
+dependencies, see @uref{https://unicode.org/reports/tr10/, The Unicode
-accent Unicode characters:
+Collation Algorithm}.  Some Standard C libraries, such as the
+@acronym{GNU} C Library (a.k.a.@: @dfn{glibc}) implement large
+portions of the Unicode Collation Algorithm and use the associated
+locale data, Common Locale Data Repository, or @acronym{CLDR}.
+}.
+For example, characters with different code points but the same
+meaning, like different grave accent Unicode characters, might, in
+some locales, be considered as equal:
 @example
 @group
@@ -756,7 +764,8 @@ The strings are compared by the numeric values of their characters.
 For instance, @var{str1} is considered less than @var{str2} if
 its first differing character has a smaller numeric value.  If
 @var{ignore-case} is non-@code{nil}, characters are converted to
-upper-case before comparing them.  Unibyte strings are converted to
+upper-case, using the current buffer's case-table (@pxref{Case
+Tables}), before comparing them.  Unibyte strings are converted to
 multibyte for comparison (@pxref{Text Representations}), so that a
 unibyte string and its conversion to multibyte are always regarded as
 equal.
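The second hunk makes the result of a case-insensitive comparison depend on the current buffer's case table. That dependence can be illustrated with a simplified Python model, which is not Emacs's code. It makes several assumptions for illustration only:

- A plain dict stands in for an Emacs case table, and characters missing from it fall back to `str.upper()`.
- The function names `upcase_char` and `compare` are hypothetical.
- The strcmp-style -1/0/1 result is a simplification, not the Emacs function's return convention.
- The Turkish-style mapping in the usage lines is a hypothetical table.

```python
# Simplified model (NOT Emacs's implementation) of the rule in the hunk:
# compare strings by the code points of their characters, optionally
# upcasing each character first through a case table.  In Emacs that
# table is the current buffer's case table; here it is a hypothetical dict.
from typing import Dict, Optional

CaseTable = Dict[str, str]  # hypothetical stand-in for an Emacs case table


def upcase_char(ch: str, table: CaseTable) -> str:
    """Upcase CH via TABLE, falling back to str.upper() when CH is absent."""
    if ch in table:
        return table[ch]
    up = ch.upper()
    # str.upper() can expand one character into several (e.g. 'ß' -> 'SS');
    # keep this model one-to-one by leaving such characters unchanged.
    return up if len(up) == 1 else ch


def compare(str1: str, str2: str, ignore_case: bool = False,
            table: Optional[CaseTable] = None) -> int:
    """Return -1, 0 or 1 (strcmp-style), comparing characters by code point."""
    table = table if table is not None else {}
    for a, b in zip(str1, str2):
        if ignore_case:
            a, b = upcase_char(a, table), upcase_char(b, table)
        if a != b:
            # The first differing character decides, by numeric value.
            return -1 if ord(a) < ord(b) else 1
    # All compared characters agree: the shorter string is the lesser one.
    return (len(str1) > len(str2)) - (len(str1) < len(str2))


if __name__ == "__main__":
    # With default upcasing, "i" and "I" are equal when case is ignored.
    print(compare("i", "I", ignore_case=True))  # 0
    # A hypothetical Turkish-style table upcases "i" to U+0130 instead,
    # so the same call now reports "i" as greater than "I".
    turkish = {"i": "\u0130"}
    print(compare("i", "I", ignore_case=True, table=turkish))  # 1
```

The design point the model mirrors is that the case table is an input to the comparison. Two buffers with different case tables can therefore disagree about whether two strings are equal when case is ignored.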