Diffstat (limited to 'admin/notes/unicode')
| -rw-r--r-- | admin/notes/unicode | 112 |
1 files changed, 75 insertions, 37 deletions
diff --git a/admin/notes/unicode b/admin/notes/unicode
index 6db5bb7d05c..3901f60954f 100644
--- a/admin/notes/unicode
+++ b/admin/notes/unicode
| @@ -1,12 +1,46 @@ | |||
| 1 | -*-mode: text; coding: utf-8;-*- | 1 | -*-mode: text; coding: utf-8;-*- |
| 2 | 2 | ||
| 3 | Copyright (C) 2002-2013 Free Software Foundation, Inc. | 3 | Copyright (C) 2002-2015 Free Software Foundation, Inc. |
| 4 | See the end of the file for license conditions. | 4 | See the end of the file for license conditions. |
| 5 | 5 | ||
| 6 | Importing a new Unicode Standard version into Emacs | ||
| 7 | ------------------------------------------------------------- | ||
| 8 | |||
| 9 | Emacs uses the following files from the Unicode Character Database | ||
| 10 | (a.k.a. "UCD"): | ||
| 11 | |||
| 12 | . UnicodeData.txt | ||
| 13 | . BidiMirroring.txt | ||
| 14 | . IVD_Sequences.txt | ||
| 15 | |||
| 16 | First, these files need to be copied into admin/unidata/, and then | ||
| 17 | Emacs should be rebuilt for them to take effect. Rebuilding Emacs | ||
| 18 | updates several derived files elsewhere in the Emacs source tree, | ||
| 19 | mainly in lisp/international/. | ||
| 20 | |||
| 21 | When Emacs is rebuilt for the first time after importing the new | ||
| 22 | files, pay attention to any warning or error messages. In particular, | ||
| 23 | admin/unidata/unidata-gen.el will complain if UnicodeData.txt defines | ||
| 24 | new bidirectional attributes of characters, because unidata-gen.el, | ||
| 25 | bidi.c and dispextern.h need to be updated in that case; failure to do | ||
| 26 | so will cause aborts in redisplay. | ||
| 27 | |||
| 28 | Next, review the changes in UnicodeData.txt vs the previous version | ||
| 29 | used by Emacs. Any changes, be it introduction of new scripts or | ||
| 30 | addition of codepoints to existing scripts, might need corresponding | ||
| 31 | changes in the data used for filling the category-table, case-table, | ||
| 32 | and char-width-table. The additional scripts should cause automatic | ||
| 33 | updates in charscript.el, but it is a good idea to look at the results | ||
| 34 | and see if any changes in admin/unidata/blocks.awk are required. | ||
| 35 | |||
| 36 | Any new scripts added by UnicodeData.txt will also need updates to | ||
| 37 | script-representative-chars defined in fontset.el. Other databases in | ||
| 38 | fontset.el might also need to be updated as needed. | ||
| 39 | |||
| 6 | Problems, fixmes and other unicode-related issues | 40 | Problems, fixmes and other unicode-related issues |
| 7 | ------------------------------------------------------------- | 41 | ------------------------------------------------------------- |
| 8 | 42 | ||
| 9 | Notes by fx to record various things of variable importance. handa | 43 | Notes by fx to record various things of variable importance. Handa |
| 10 | needs to check them -- don't take too seriously, especially with | 44 | needs to check them -- don't take too seriously, especially with |
| 11 | regard to completeness. | 45 | regard to completeness. |
| 12 | 46 | ||
| @@ -64,11 +98,11 @@ regard to completeness. | |||
| 64 | 98 | ||
| 65 | * iso-2022 charsets get unified on i/o. | 99 | * iso-2022 charsets get unified on i/o. |
| 66 | 100 | ||
| 67 | With the change on 2003-01-06, decoding routines put `charset' | 101 | With the change on 2003-01-06, decoding routines put the 'charset' |
| 68 | property to decoded text, and iso-2022 encoder pay attention | 102 | property onto decoded text, and iso-2022 encoder pay attention |
| 69 | to it. Thus, for instance, reading and writing by | 103 | to it. Thus, for instance, reading and writing by |
| 70 | iso-2022-7bit preserve the original designation sequences. | 104 | iso-2022-7bit preserve the original designation sequences. |
| 71 | The property name `preferred-charset' may be better? | 105 | The property name 'preferred-charset' may be better? |
| 72 | 106 | ||
| 73 | We may have to utilize this property to decide a font. | 107 | We may have to utilize this property to decide a font. |
| 74 | 108 | ||
| @@ -134,8 +168,8 @@ nontrivial changes to the build process. | |||
| 134 | leim/CXTERM-DIC/QJ.tit | 168 | leim/CXTERM-DIC/QJ.tit |
| 135 | leim/CXTERM-DIC/SW.tit | 169 | leim/CXTERM-DIC/SW.tit |
| 136 | leim/CXTERM-DIC/TONEPY.tit | 170 | leim/CXTERM-DIC/TONEPY.tit |
| 137 | leim/MISC-DIC/pinyin.map | ||
| 138 | leim/MISC-DIC/CTLau.html | 171 | leim/MISC-DIC/CTLau.html |
| 172 | leim/MISC-DIC/pinyin.map | ||
| 139 | leim/MISC-DIC/ziranma.cin | 173 | leim/MISC-DIC/ziranma.cin |
| 140 | 174 | ||
| 141 | * cp850 | 175 | * cp850 |
| @@ -154,19 +188,6 @@ nontrivial changes to the build process. | |||
| 154 | 188 | ||
| 155 | leim/MISC-DIC/cangjie-table.cns | 189 | leim/MISC-DIC/cangjie-table.cns |
| 156 | 190 | ||
| 157 | * iso-latin-2 | ||
| 158 | |||
| 159 | These files are processed by csplain, a program that requires | ||
| 160 | Latin-2 input. In 2012 the csplain maintainers started | ||
| 161 | recommending UTF-8, but these files haven't been converted yet. | ||
| 162 | |||
| 163 | etc/refcards/cs-dired-ref.tex | ||
| 164 | etc/refcards/cs-refcard.tex | ||
| 165 | etc/refcards/cs-survival.tex | ||
| 166 | etc/refcards/sk-dired-ref.tex | ||
| 167 | etc/refcards/sk-refcard.tex | ||
| 168 | etc/refcards/sk-survival.tex | ||
| 169 | |||
| 170 | * japanese-iso-8bit | 191 | * japanese-iso-8bit |
| 171 | 192 | ||
| 172 | SKK-JISYO.L is a verbatim copy of a file taken from an external source. | 193 | SKK-JISYO.L is a verbatim copy of a file taken from an external source. |
| @@ -181,13 +202,6 @@ nontrivial changes to the build process. | |||
| 181 | 202 | ||
| 182 | admin/charsets/mapfiles/cns2ucsdkw.txt | 203 | admin/charsets/mapfiles/cns2ucsdkw.txt |
| 183 | 204 | ||
| 184 | * no-conversion | ||
| 185 | |||
| 186 | This file purposely contains arbitrary bytes interspersed within text, | ||
| 187 | to test whether the Emacs distribution is corrupted. | ||
| 188 | |||
| 189 | lib-src/testfile | ||
| 190 | |||
| 191 | * iso-2022-7bit | 205 | * iso-2022-7bit |
| 192 | 206 | ||
| 193 | This file switches between CJK charsets, which is not encoded in UTF-8. | 207 | This file switches between CJK charsets, which is not encoded in UTF-8. |
| @@ -201,11 +215,6 @@ nontrivial changes to the build process. | |||
| 201 | operating in some other language environment. | 215 | operating in some other language environment. |
| 202 | 216 | ||
| 203 | etc/tutorials/TUTORIAL.ja | 217 | etc/tutorials/TUTORIAL.ja |
| 204 | leim/quail/cyril-jis.el | ||
| 205 | leim/quail/hanja-jis.el | ||
| 206 | leim/quail/japanese.el | ||
| 207 | leim/quail/py-punct.el | ||
| 208 | leim/quail/pypunct-b5.el | ||
| 209 | lisp/international/ja-dic-cnv.el | 218 | lisp/international/ja-dic-cnv.el |
| 210 | lisp/international/ja-dic-utl.el | 219 | lisp/international/ja-dic-utl.el |
| 211 | lisp/international/kinsoku.el | 220 | lisp/international/kinsoku.el |
| @@ -213,18 +222,47 @@ nontrivial changes to the build process. | |||
| 213 | lisp/international/titdic-cnv.el | 222 | lisp/international/titdic-cnv.el |
| 214 | lisp/language/japan-util.el | 223 | lisp/language/japan-util.el |
| 215 | lisp/language/japanese.el | 224 | lisp/language/japanese.el |
| 216 | lisp/term/x-win.el | 225 | lisp/leim/quail/cyril-jis.el |
| 226 | lisp/leim/quail/hanja-jis.el | ||
| 227 | lisp/leim/quail/japanese.el | ||
| 228 | lisp/leim/quail/py-punct.el | ||
| 229 | lisp/leim/quail/pypunct-b5.el | ||
| 230 | |||
| 231 | This file contains just Chinese characters, and has the same problem. | ||
| 232 | Also, it contains characters that cannot be encoded in UTF-8. | ||
| 233 | |||
| 234 | lisp/international/titdic-cnv.el | ||
| 217 | 235 | ||
| 218 | * utf-8-emacs | 236 | * utf-8-emacs |
| 219 | 237 | ||
| 220 | These files contain characters that cannot be encoded in UTF-8. | 238 | These files contain characters that cannot be encoded in UTF-8. |
| 221 | 239 | ||
| 222 | leim/quail/tibetan.el | 240 | lisp/language/ethio-util.el |
| 223 | leim/quail/ethiopic.el | 241 | lisp/language/ethiopic.el |
| 224 | lisp/international/titdic-cnv.el | ||
| 225 | lisp/language/tibetan.el | ||
| 226 | lisp/language/tibet-util.el | ||
| 227 | lisp/language/ind-util.el | 242 | lisp/language/ind-util.el |
| 243 | lisp/language/tibet-util.el | ||
| 244 | lisp/language/tibetan.el | ||
| 245 | lisp/leim/quail/ethiopic.el | ||
| 246 | lisp/leim/quail/tibetan.el | ||
| 247 | |||
| 248 | * binary files | ||
| 249 | |||
| 250 | These files contain binary data, and are not text files. | ||
| 251 | Some of the entries in this list are patterns, and stand for any | ||
| 252 | files with the listed extension. | ||
| 253 | |||
| 254 | *.gz | ||
| 255 | *.icns | ||
| 256 | *.ico | ||
| 257 | *.pbm | ||
| 258 | |||
| 259 | *.png | ||
| 260 | *.sig | ||
| 261 | etc/e/eterm-color | ||
| 262 | etc/package-keyring.gpg | ||
| 263 | msdos/emacs.pif | ||
| 264 | nextstep/GNUstep/Emacs.base/Resources/emacs.tiff | ||
| 265 | nt/icons/hand.cur | ||
| 228 | 266 | ||
| 229 | 267 | ||
| 230 | This file is part of GNU Emacs. | 268 | This file is part of GNU Emacs. |