| author | Eli Zaretskii | 2019-03-09 12:41:48 +0200 |
|---|---|---|
| committer | Eli Zaretskii | 2019-03-09 12:41:48 +0200 |
| commit | fddb915d234515af81dce30982a8dd22568b4e84 (patch) | |
| tree | 7fc4d497bd317df930e6f492a6bddbf0ba1e5b96 /admin/notes | |
| parent | 4e082ce3941a9c1fcaae509897761d3e24e08625 (diff) | |
Import Unicode 12.0 data files
* admin/unidata/copyright.html:
* admin/unidata/UnicodeData.txt:
* admin/unidata/SpecialCasing.txt:
* admin/unidata/NormalizationTest.txt:
* admin/unidata/Blocks.txt:
* admin/unidata/BidiMirroring.txt:
* admin/unidata/BidiBrackets.txt: New versions from Unicode 12.0.
* admin/unidata/unidata-gen.el (unidata-gen-file):
* admin/unidata/blocks.awk (name2alias): Adapt to changes in
new data files.
* admin/notes/unicode: Update and improve instructions for
importing a new Unicode Standard.
* lisp/international/characters.el (char-width-table): Update
lists of characters according to Unicode 12.0.
* lisp/international/fontset.el (script-representative-chars):
Add characters from new scripts to 'script-representative-chars'.
(otf-script-alist): Update according to data on the MS site.
* lisp/international/mule-cmds.el (ucs-names): Update unused
ranges of codepoints according to Unicode 12.0.
* test/lisp/international/ucs-normalize-tests.el
(ucs-normalize-tests--failing-lines-part1)
(ucs-normalize-tests--failing-lines-part2): Update for the new
NormalizationTest.txt file.
* test/manual/BidiCharacterTest.txt: Update with the new
version from Unicode 12.0.
Diffstat (limited to 'admin/notes')
| -rw-r--r-- | admin/notes/unicode | 23 |
1 files changed, 18 insertions, 5 deletions
diff --git a/admin/notes/unicode b/admin/notes/unicode
index bbee3e9de7f..4d6aa6e9a9e 100644
--- a/admin/notes/unicode
+++ b/admin/notes/unicode
| @@ -11,15 +11,20 @@ Emacs uses the following files from the Unicode Character Database | |||
| 11 | 11 | ||
| 12 | . UnicodeData.txt | 12 | . UnicodeData.txt |
| 13 | . Blocks.txt | 13 | . Blocks.txt |
| 14 | . BidiMirroring.txt | ||
| 15 | . BidiBrackets.txt | 14 | . BidiBrackets.txt |
| 15 | . BidiCharacterTest.txt | ||
| 16 | . BidiMirroring.txt | ||
| 16 | . IVD_Sequences.txt | 17 | . IVD_Sequences.txt |
| 17 | . NormalizationTest.txt | 18 | . NormalizationTest.txt |
| 18 | . SpecialCasing.txt | 19 | . SpecialCasing.txt |
| 19 | . BidiCharacterTest.txt | ||
| 20 | 20 | ||
| 21 | First, the first 7 files need to be copied into admin/unidata/, and | 21 | First, the first 7 files need to be copied into admin/unidata/, and |
| 22 | then Emacs should be rebuilt for them to take effect. Rebuilding | 22 | the file https://www.unicode.org/copyright.html should be copied over |
| 23 | copyright.html in admin/unidata (that file might need trailing | ||
| 24 | whitespace removed before it can be committed to the Emacs | ||
| 25 | repository). | ||
| 26 | |||
| 27 | Then Emacs should be rebuilt for them to take effect. Rebuilding | ||
| 23 | Emacs updates several derived files elsewhere in the Emacs source | 28 | Emacs updates several derived files elsewhere in the Emacs source |
| 24 | tree, mainly in lisp/international/. | 29 | tree, mainly in lisp/international/. |
| 25 | 30 | ||
| @@ -28,7 +33,10 @@ files, pay attention to any warning or error messages. In particular, | |||
| 28 | admin/unidata/unidata-gen.el will complain if UnicodeData.txt defines | 33 | admin/unidata/unidata-gen.el will complain if UnicodeData.txt defines |
| 29 | new bidirectional attributes of characters, because unidata-gen.el, | 34 | new bidirectional attributes of characters, because unidata-gen.el, |
| 30 | bidi.c and dispextern.h need to be updated in that case; failure to do | 35 | bidi.c and dispextern.h need to be updated in that case; failure to do |
| 31 | so will cause aborts in redisplay. | 36 | so will cause aborts in redisplay. unidata-gen.el will also complain |
| 37 | if the format of the Unicode Copyright notice in copyright.html | ||
| 38 | changed in significant ways; in that case, update the regular | ||
| 39 | expression in unidata-gen-file used to extract the copyright string. | ||
| 32 | 40 | ||
| 33 | Next, review the changes in UnicodeData.txt vs the previous version | 41 | Next, review the changes in UnicodeData.txt vs the previous version |
| 34 | used by Emacs. Any changes, be it introduction of new scripts or | 42 | used by Emacs. Any changes, be it introduction of new scripts or |
| @@ -40,7 +48,12 @@ and see if any changes in admin/unidata/blocks.awk are required. | |||
| 40 | 48 | ||
| 41 | The setting of char-width-table around line 1200 of characters.el | 49 | The setting of char-width-table around line 1200 of characters.el |
| 42 | should be checked against the latest version of the Unicode file | 50 | should be checked against the latest version of the Unicode file |
| 43 | EastAsianWidth.txt, and any discrepancies fixed. | 51 | EastAsianWidth.txt, and any discrepancies fixed: double-width |
| 52 | characters are those marked with W or F in that file. Zero-width | ||
| 53 | characters are not taken from EastAsianWidth.txt, they are those whose | ||
| 54 | Unicode General Category property is one of Mn, Me, or Cf, and also | ||
| 55 | Hangul jungseong and jongseong characters (a.k.a. "Jamo medial vowels" | ||
| 56 | and "Jamo final consonants"). | ||
| 44 | 57 | ||
| 45 | Any new scripts added by UnicodeData.txt will also need updates to | 58 | Any new scripts added by UnicodeData.txt will also need updates to |
| 46 | script-representative-chars defined in fontset.el, and also the list | 59 | script-representative-chars defined in fontset.el, and also the list |