author: YAMAMOTO Mitsuharu, 2019-04-27 18:33:39 +0900
committer: YAMAMOTO Mitsuharu, 2019-04-27 18:33:39 +0900
commit: 886bedb36c7b959b7e6fc8ce8e0c04e144b0ae28
tree: b5770d9fc10a704ad8aeb3474c6940121252c770 /admin/notes/unicode
parent: 015a6e1df2772bd43680df5cbeaffccf98a881da
parent: 8dc00b2f1e6523c634df3e24379afbe712a32b27
Merge branch 'master' into harfbuzz
Diffstat (limited to 'admin/notes/unicode')
-rw-r--r-- admin/notes/unicode 55
1 files changed, 26 insertions, 29 deletions
diff --git a/admin/notes/unicode b/admin/notes/unicode
index 40f93fc216f..4d6aa6e9a9e 100644
--- a/admin/notes/unicode
+++ b/admin/notes/unicode
@@ -1,6 +1,6 @@
 -*-mode: text; coding: utf-8;-*-
 
-Copyright (C) 2002-2018 Free Software Foundation, Inc.
+Copyright (C) 2002-2019 Free Software Foundation, Inc.
 See the end of the file for license conditions.
 
 Importing a new Unicode Standard version into Emacs
@@ -11,15 +11,20 @@ Emacs uses the following files from the Unicode Character Database
 
 . UnicodeData.txt
 . Blocks.txt
-. BidiMirroring.txt
 . BidiBrackets.txt
+. BidiCharacterTest.txt
+. BidiMirroring.txt
 . IVD_Sequences.txt
 . NormalizationTest.txt
 . SpecialCasing.txt
-. BidiCharacterTest.txt
 
 First, the first 7 files need to be copied into admin/unidata/, and
-then Emacs should be rebuilt for them to take effect. Rebuilding
+the file https://www.unicode.org/copyright.html should be copied over
+copyright.html in admin/unidata (that file might need trailing
+whitespace removed before it can be committed to the Emacs
+repository).
+
+Then Emacs should be rebuilt for them to take effect. Rebuilding
 Emacs updates several derived files elsewhere in the Emacs source
 tree, mainly in lisp/international/.
 
@@ -28,7 +33,10 @@ files, pay attention to any warning or error messages. In particular,
 admin/unidata/unidata-gen.el will complain if UnicodeData.txt defines
 new bidirectional attributes of characters, because unidata-gen.el,
 bidi.c and dispextern.h need to be updated in that case; failure to do
-so will cause aborts in redisplay.
+so will cause aborts in redisplay. unidata-gen.el will also complain
+if the format of the Unicode Copyright notice in copyright.html
+changed in significant ways; in that case, update the regular
+expression in unidata-gen-file used to extract the copyright string.
 
 Next, review the changes in UnicodeData.txt vs the previous version
 used by Emacs. Any changes, be it introduction of new scripts or
@@ -40,7 +48,12 @@ and see if any changes in admin/unidata/blocks.awk are required.
 
 The setting of char-width-table around line 1200 of characters.el
 should be checked against the latest version of the Unicode file
-EastAsianWidth.txt, and any discrepancies fixed.
+EastAsianWidth.txt, and any discrepancies fixed: double-width
+characters are those marked with W or F in that file. Zero-width
+characters are not taken from EastAsianWidth.txt, they are those whose
+Unicode General Category property is one of Mn, Me, or Cf, and also
+Hangul jungseong and jongseong characters (a.k.a. "Jamo medial vowels"
+and "Jamo final consonants").
 
 Any new scripts added by UnicodeData.txt will also need updates to
 script-representative-chars defined in fontset.el, and also the list
@@ -230,37 +243,21 @@ nontrivial changes to the build process.
 
 admin/charsets/mapfiles/cns2ucsdkw.txt
 
-* iso-2022-7bit
+* iso-2022-jp
 
-Each of these files contains just one CJK charset, but Emacs
-currently has no easy way to specify set-charset-priority on a
-per-file basis, so converting any of these files to UTF-8 might
-change the file's appearance when viewed by an Emacs that is
-operating in some other language environment.
+This contains just one CJK charset, but Emacs currently has no
+easy way to specify set-charset-priority on a per-file basis, so
+converting this file to UTF-8 might change the file's appearance
+when viewed by an Emacs that is operating in some other language
+environment.
 
 etc/tutorials/TUTORIAL.ja
-lisp/international/ja-dic-cnv.el
-lisp/international/ja-dic-utl.el
-lisp/international/kinsoku.el
-lisp/international/kkc.el
-lisp/international/titdic-cnv.el
-lisp/language/japan-util.el
-lisp/language/japanese.el
-lisp/leim/quail/cyril-jis.el
-lisp/leim/quail/hanja-jis.el
-lisp/leim/quail/japanese.el
-lisp/leim/quail/py-punct.el
-lisp/leim/quail/pypunct-b5.el
-
-This file contains just Chinese characters, and has same problem.
-Also, it contains characters that cannot be encoded in UTF-8.
-
-lisp/international/titdic-cnv.el
 
 * utf-8-emacs
 
 These files contain characters that cannot be encoded in UTF-8.
 
+lisp/international/titdic-cnv.el
 lisp/language/ethio-util.el
 lisp/language/ethiopic.el
 lisp/language/ind-util.el
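
Illustration (not part of the commit): the char-width-table hunk above
states which characters get width 2 and which get width 0. The rule can
be sketched in Python roughly as follows. This is a minimal sketch under
stated assumptions: width_class is a hypothetical helper, not how
characters.el builds the table; the Hangul ranges U+1160..U+11A7
(jungseong) and U+11A8..U+11FF (jongseong) are taken from the Hangul
Jamo block and are not spelled out in the note; and Python's unicodedata
module reflects whichever Unicode version the interpreter ships with,
which may differ from the one being imported.

```python
import unicodedata

# General Category values the note lists as zero-width.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cf"}

# Assumption: jungseong and jongseong ranges from the Hangul Jamo block.
HANGUL_JUNGSEONG = range(0x1160, 0x11A8)  # Jamo medial vowels
HANGUL_JONGSEONG = range(0x11A8, 0x1200)  # Jamo final consonants


def width_class(ch):
    """Return 0, 1 or 2 for a single character, following the note's rule."""
    cp = ord(ch)
    # Zero width comes from General Category, not from EastAsianWidth.txt.
    if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES:
        return 0
    if cp in HANGUL_JUNGSEONG or cp in HANGUL_JONGSEONG:
        return 0
    # Double width: East_Asian_Width property W (wide) or F (fullwidth).
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


if __name__ == "__main__":
    for sample in ("a", "\u0301", "\u1161", "\u4e2d", "\uff21"):
        print(f"U+{ord(sample):04X} -> {width_class(sample)}")
```

Comparing the output of such a helper over a code point range with
(char-width CHAR) in a freshly built Emacs is one possible way to spot
discrepancies; any difference still has to be judged by hand, since
characters.el may contain deliberate exceptions.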