| author | Eli Zaretskii | 2022-07-21 09:53:45 +0300 |
|---|---|---|
| committer | Eli Zaretskii | 2022-07-21 09:54:46 +0300 |
| commit | 2b31e667be95731d7e9ee328c8331eecf69b3831 (patch) | |
| tree | 1a61d5dabb96876c0bc025b17f8c29e4efd70308 | |
| parent | ea44d7ddfc9fe07fbdffd8e02db2ef6bab1f8b5c (diff) | |
; Improve documentation of locale-specific string comparison
* doc/lispref/strings.texi (Text Comparison): Mention the Unicode
collation rules and buffer-local case-tables.
| -rw-r--r-- | doc/lispref/strings.texi | 21 |
1 files changed, 15 insertions, 6 deletions
diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
index c9612e598a3..89120575f52 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
| @@ -564,11 +564,19 @@ equal with respect to collation rules. A collation rule is not only | |||
| 564 | determined by the lexicographic order of the characters contained in | 564 | determined by the lexicographic order of the characters contained in |
| 565 | @var{string1} and @var{string2}, but also further rules about | 565 | @var{string1} and @var{string2}, but also further rules about |
| 566 | relations between these characters. Usually, it is defined by the | 566 | relations between these characters. Usually, it is defined by the |
| 567 | @var{locale} environment Emacs is running with. | 567 | @var{locale} environment Emacs is running with and by the Standard C |
| 568 | 568 | library against which Emacs was linked@footnote{ | |
| 569 | For example, characters with different coding points but | 569 | For more information about collation rules and their locale |
| 570 | the same meaning might be considered as equal, like different grave | 570 | dependencies, see @uref{https://unicode.org/reports/tr10/, The Unicode |
| 571 | accent Unicode characters: | 571 | Collation Algorithm}. Some Standard C libraries, such as the |
| 572 | @acronym{GNU} C Library (a.k.a.@: @dfn{glibc}) implement large | ||
| 573 | portions of the Unicode Collation Algorithm and use the associated | ||
| 574 | locale data, Common Locale Data Repository, or @acronym{CLDR}. | ||
| 575 | }. | ||
| 576 | |||
| 577 | For example, characters with different code points but the same | ||
| 578 | meaning, like different grave accent Unicode characters, might, in | ||
| 579 | some locales, be considered as equal: | ||
| 572 | 580 | ||
| 573 | @example | 581 | @example |
| 574 | @group | 582 | @group |
| @@ -756,7 +764,8 @@ The strings are compared by the numeric values of their characters. | |||
| 756 | For instance, @var{str1} is considered less than @var{str2} if | 764 | For instance, @var{str1} is considered less than @var{str2} if |
| 757 | its first differing character has a smaller numeric value. If | 765 | its first differing character has a smaller numeric value. If |
| 758 | @var{ignore-case} is non-@code{nil}, characters are converted to | 766 | @var{ignore-case} is non-@code{nil}, characters are converted to |
| 759 | upper-case before comparing them. Unibyte strings are converted to | 767 | upper-case, using the current buffer's case-table (@pxref{Case |
| 768 | Tables}), before comparing them. Unibyte strings are converted to | ||
| 760 | multibyte for comparison (@pxref{Text Representations}), so that a | 769 | multibyte for comparison (@pxref{Text Representations}), so that a |
| 761 | unibyte string and its conversion to multibyte are always regarded as | 770 | unibyte string and its conversion to multibyte are always regarded as |
| 762 | equal. | 771 | equal. |
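
As a rough illustration of the `compare-strings` behavior documented in the second hunk, the Emacs Lisp sketch below can be evaluated in a buffer that uses the standard case table. That is an assumption: a buffer-local case table could change the `ignore-case` result. The values shown follow the documented return convention of `compare-strings`: `t` for equal strings, otherwise a signed number whose absolute value is one more than the number of leading characters that matched.

```elisp
;; Sketch only: assumes the current buffer uses the standard case table.

;; Without IGNORE-CASE, characters are compared by numeric value.
;; ?a (97) is greater than ?A (65), and no leading characters matched.
(compare-strings "abc" nil nil "ABC" nil nil)    ; ⇒ 1

;; With IGNORE-CASE non-nil, characters are converted to upper case
;; through the current buffer's case table before being compared.
(compare-strings "abc" nil nil "ABC" nil nil t)  ; ⇒ t

;; The first differing character decides the order: ?c is less than ?d,
;; and two leading characters matched, hence -3.
(compare-strings "abc" nil nil "abd" nil nil)    ; ⇒ -3
```

Locale-dependent functions such as `string-collate-equalp` are deliberately left out of the sketch, since their results depend on the locale and on the C library Emacs was linked against.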