Diffstat (limited to 'admin/notes/unicode'):
 -rw-r--r-- admin/notes/unicode | 112
 1 file changed, 75 insertions, 37 deletions
diff --git a/admin/notes/unicode b/admin/notes/unicode
index 6db5bb7d05c..3901f60954f 100644
--- a/admin/notes/unicode
+++ b/admin/notes/unicode
@@ -1,12 +1,46 @@
  -*-mode: text; coding: utf-8;-*-
 
-Copyright (C) 2002-2013 Free Software Foundation, Inc.
+Copyright (C) 2002-2015 Free Software Foundation, Inc.
 See the end of the file for license conditions.
 
+Importing a new Unicode Standard version into Emacs
+-------------------------------------------------------------
+
+Emacs uses the following files from the Unicode Character Database
+(a.k.a. "UCD):
+
+ . UnicodeData.txt
+ . BidiMirroring.txt
+ . IVD_Sequences.txt
+
+First, these files need to be copied into admin/unidata/, and then
+Emacs should be rebuilt for them to take effect. Rebuilding Emacs
+updates several derived files elsewhere in the Emacs source tree,
+mainly in lisp/international/.
+
+When Emacs is rebuilt for the first time after importing the new
+files, pay attention to any warning or error messages. In particular,
+admin/unidata/unidata-gen.el will complain if UnicodeData.txt defines
+new bidirectional attributes of characters, because unidata-gen.el,
+bidi.c and dispextern.h need to be updated in that case; failure to do
+so will cause aborts in redisplay.
+
+Next, review the changes in UnicodeData.txt vs the previous version
+used by Emacs. Any changes, be it introduction of new scripts or
+addition of codepoints to existing scripts, might need corresponding
+changes in the data used for filling the category-table, case-table,
+and char-width-table. The additional scripts should cause automatic
+updates in charscript.el, but it is a good idea to look at the results
+and see if any changes in admin/unidata/blocks.awk are required.
+
+Any new scripts added by UnicodeData.txt will also need updates to
+script-representative-chars defined in fontset.el. Other databases in
+fontset.el might also need to be updated as needed.
+
 Problems, fixmes and other unicode-related issues
 -------------------------------------------------------------
 
-Notes by fx to record various things of variable importance. handa
+Notes by fx to record various things of variable importance. Handa
 needs to check them -- don't take too seriously, especially with
 regard to completeness.
 
@@ -64,11 +98,11 @@ regard to completeness.
 
  * iso-2022 charsets get unified on i/o.
 
- With the change on 2003-01-06, decoding routines put `charset'
- property to decoded text, and iso-2022 encoder pay attention
+ With the change on 2003-01-06, decoding routines put the 'charset'
+ property onto decoded text, and iso-2022 encoder pay attention
  to it. Thus, for instance, reading and writing by
  iso-2022-7bit preserve the original designation sequences.
- The property name `preferred-charset' may be better?
+ The property name 'preferred-charset' may be better?
 
  We may have to utilize this property to decide a font.
 
@@ -134,8 +168,8 @@ nontrivial changes to the build process.
  leim/CXTERM-DIC/QJ.tit
  leim/CXTERM-DIC/SW.tit
  leim/CXTERM-DIC/TONEPY.tit
- leim/MISC-DIC/pinyin.map
  leim/MISC-DIC/CTLau.html
+ leim/MISC-DIC/pinyin.map
  leim/MISC-DIC/ziranma.cin
 
  * cp850
@@ -154,19 +188,6 @@ nontrivial changes to the build process.
 
  leim/MISC-DIC/cangjie-table.cns
 
- * iso-latin-2
-
- These files are processed by csplain, a program that requires
- Latin-2 input. In 2012 the csplain maintainers started
- recommending UTF-8, but these files haven't been converted yet.
-
- etc/refcards/cs-dired-ref.tex
- etc/refcards/cs-refcard.tex
- etc/refcards/cs-survival.tex
- etc/refcards/sk-dired-ref.tex
- etc/refcards/sk-refcard.tex
- etc/refcards/sk-survival.tex
-
  * japanese-iso-8bit
 
  SKK-JISYO.L is a verbatim copy of a file taken from an external source.
@@ -181,13 +202,6 @@ nontrivial changes to the build process.
 
  admin/charsets/mapfiles/cns2ucsdkw.txt
 
- * no-conversion
-
- This file purposely contains arbitrary bytes interspersed within text,
- to test whether the Emacs distribution is corrupted.
-
- lib-src/testfile
-
  * iso-2022-7bit
 
  This file switches between CJK charsets, which is not encoded in UTF-8.
@@ -201,11 +215,6 @@ nontrivial changes to the build process.
  operating in some other language environment.
 
  etc/tutorials/TUTORIAL.ja
- leim/quail/cyril-jis.el
- leim/quail/hanja-jis.el
- leim/quail/japanese.el
- leim/quail/py-punct.el
- leim/quail/pypunct-b5.el
  lisp/international/ja-dic-cnv.el
  lisp/international/ja-dic-utl.el
  lisp/international/kinsoku.el
@@ -213,18 +222,47 @@ nontrivial changes to the build process.
  lisp/international/titdic-cnv.el
  lisp/language/japan-util.el
  lisp/language/japanese.el
- lisp/term/x-win.el
+ lisp/leim/quail/cyril-jis.el
+ lisp/leim/quail/hanja-jis.el
+ lisp/leim/quail/japanese.el
+ lisp/leim/quail/py-punct.el
+ lisp/leim/quail/pypunct-b5.el
+
+ This file contains just Chinese characters, and has same problem.
+ Also, it contains characters that cannot be encoded in UTF-8.
+
+ lisp/international/titdic-cnv.el
 
  * utf-8-emacs
 
  These files contain characters that cannot be encoded in UTF-8.
 
- leim/quail/tibetan.el
- leim/quail/ethiopic.el
- lisp/international/titdic-cnv.el
- lisp/language/tibetan.el
- lisp/language/tibet-util.el
+ lisp/language/ethio-util.el
+ lisp/language/ethiopic.el
  lisp/language/ind-util.el
+ lisp/language/tibet-util.el
+ lisp/language/tibetan.el
+ lisp/leim/quail/ethiopic.el
+ lisp/leim/quail/tibetan.el
+
+ * binary files
+
+ These files contain binary data, and are not text files.
+ Some of the entries in this list are patterns, and stand for any
+ files with the listed extension.
+
+ *.gz
+ *.icns
+ *.ico
+ *.pbm
+ *.pdf
+ *.png
+ *.sig
+ etc/e/eterm-color
+ etc/package-keyring.gpg
+ msdos/emacs.pif
+ nextstep/GNUstep/Emacs.base/Resources/emacs.tiff
+ nt/icons/hand.cur
 
 
 This file is part of GNU Emacs.
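The import procedure added by this patch notes that admin/unidata/unidata-gen.el complains when UnicodeData.txt defines new bidirectional attributes, and that unidata-gen.el, bidi.c and dispextern.h must then be updated. As a rough pre-check before rebuilding, a hypothetical standalone script like the one below could list the Bidi_Class values (field 4 of each semicolon-separated UnicodeData.txt record) that fall outside an assumed set of known classes. This is a sketch, not part of Emacs: KNOWN_BIDI_CLASSES is assumed to be the 23 classes defined as of Unicode 6.3 and should be checked against what unidata-gen.el and bidi.c actually handle. It does not replace the build-time warnings from unidata-gen.el.

```python
"""Hypothetical pre-check (not part of Emacs): report Bidi_Class values
in UnicodeData.txt that are not in an assumed set of known classes."""

import sys
from typing import Iterable, Set

# Assumption: the bidirectional classes defined as of Unicode 6.3.
# Verify against what unidata-gen.el, bidi.c and dispextern.h handle.
KNOWN_BIDI_CLASSES: Set[str] = {
    "L", "R", "AL", "EN", "ES", "ET", "AN", "CS", "NSM", "BN",
    "B", "S", "WS", "ON", "LRE", "LRO", "RLE", "RLO", "PDF",
    "LRI", "RLI", "FSI", "PDI",
}


def bidi_classes(lines: Iterable[str]) -> Set[str]:
    """Collect the Bidi_Class field from UnicodeData.txt records."""
    found: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        # Fields: 0 code point, 1 name, 2 general category,
        # 3 canonical combining class, 4 Bidi_Class, ...
        if len(fields) > 4:
            found.add(fields[4])
    return found


def new_bidi_classes(lines: Iterable[str]) -> Set[str]:
    """Return the classes that are not in KNOWN_BIDI_CLASSES."""
    return bidi_classes(lines) - KNOWN_BIDI_CLASSES


# Only run the file check when a path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        unknown = new_bidi_classes(f)
    if unknown:
        print("New Bidi_Class values:", ", ".join(sorted(unknown)))
        sys.exit(1)
    print("No new Bidi_Class values found.")
```

For example, running it as `python3 check_bidi.py admin/unidata/UnicodeData.txt` (the script name is hypothetical) after copying in the new file prints any unexpected classes and exits with status 1 if there are any. Using a set difference keeps the check independent of record order and of how many characters share each class.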