| author | Stephen Berman | 2013-06-14 22:07:55 +0200 |
|---|---|---|
| committer | Stephen Berman | 2013-06-14 22:07:55 +0200 |
| commit | bd358779861f265a7acff31ead40172735af693e (patch) | |
| tree | 345217a9889dbd29b09bdc80a94265c17719d41f /admin/notes/unicode | |
| parent | 2a97b47f0878cbda86cb6ba0e7e744924810b70e (diff) | |
| parent | f7394b12358ae453a0c8b85fc307afc1b740010d (diff) | |
Merge from trunk.
Diffstat (limited to 'admin/notes/unicode')
| -rw-r--r-- | admin/notes/unicode | 140 |
1 file changed, 134 insertions, 6 deletions
diff --git a/admin/notes/unicode b/admin/notes/unicode
index dda6ec4cc93..6db5bb7d05c 100644
--- a/admin/notes/unicode
+++ b/admin/notes/unicode
| @@ -1,6 +1,6 @@ | |||
| 1 | -*-mode: text; coding: latin-1;-*- | 1 | -*-mode: text; coding: utf-8;-*- |
| 2 | 2 | ||
| 3 | Copyright (C) 2002-2012 Free Software Foundation, Inc. | 3 | Copyright (C) 2002-2013 Free Software Foundation, Inc. |
| 4 | See the end of the file for license conditions. | 4 | See the end of the file for license conditions. |
| 5 | 5 | ||
| 6 | Problems, fixmes and other unicode-related issues | 6 | Problems, fixmes and other unicode-related issues |
| @@ -12,9 +12,9 @@ regard to completeness. | |||
| 12 | 12 | ||
| 13 | * SINGLE_BYTE_CHAR_P returns true for Latin-1 characters, which has | 13 | * SINGLE_BYTE_CHAR_P returns true for Latin-1 characters, which has |
| 14 | undesirable effects. E.g.: | 14 | undesirable effects. E.g.: |
| 15 | (multibyte-string-p (let ((s "x")) (aset s 0 ?£) s)) => nil | 15 | (multibyte-string-p (let ((s "x")) (aset s 0 ?£) s)) => nil |
| 16 | (multibyte-string-p (concat [?£])) => nil | 16 | (multibyte-string-p (concat [?£])) => nil |
| 17 | (text-char-description ?£) => "M-#" | 17 | (text-char-description ?£) => "M-#" |
| 18 | 18 | ||
| 19 | These examples are all fixed by the change of 2002-10-14, but | 19 | These examples are all fixed by the change of 2002-10-14, but |
| 20 | there still exist questionable SINGLE_BYTE_CHAR_P in the | 20 | there still exist questionable SINGLE_BYTE_CHAR_P in the |
| @@ -77,7 +77,7 @@ regard to completeness. | |||
| 77 | spelling and calendar, but that's not a Unicode issue.) | 77 | spelling and calendar, but that's not a Unicode issue.) |
| 78 | 78 | ||
| 79 | * Handle Unicode combining characters usefully, e.g. diacritics, and | 79 | * Handle Unicode combining characters usefully, e.g. diacritics, and |
| 80 | handle more scripts specifically (à la Devanagari). There are | 80 | handle more scripts specifically (à la Devanagari). There are |
| 81 | issues with canonicalization. | 81 | issues with canonicalization. |
| 82 | 82 | ||
| 83 | * We need tabular input methods, e.g. for maths symbols. (Not | 83 | * We need tabular input methods, e.g. for maths symbols. (Not |
| @@ -98,6 +98,134 @@ regard to completeness. | |||
| 98 | * Old auto-save files, and similar files, such as Gnus drafts, | 98 | * Old auto-save files, and similar files, such as Gnus drafts, |
| 99 | containing non-ASCII characters probably won't be re-read correctly. | 99 | containing non-ASCII characters probably won't be re-read correctly. |
| 100 | 100 | ||
| 101 | |||
| 102 | Source file encoding | ||
| 103 | -------------------- | ||
| 104 | |||
| 105 | Most Emacs source files are encoded in UTF-8 (or in ASCII, which is a | ||
| 106 | subset), but there are a few exceptions, listed below. Perhaps | ||
| 107 | someday many of these files will be converted to UTF-8, for | ||
| 108 | convenience when using tools like 'grep -r', but this might need | ||
| 109 | nontrivial changes to the build process. | ||
| 110 | |||
| 111 | * chinese-big5 | ||
| 112 | |||
| 113 | These are verbatim copies of files taken from external sources. | ||
| 114 | They haven't been converted to UTF-8. | ||
| 115 | |||
| 116 | leim/CXTERM-DIC/4Corner.tit | ||
| 117 | leim/CXTERM-DIC/ARRAY30.tit | ||
| 118 | leim/CXTERM-DIC/ECDICT.tit | ||
| 119 | leim/CXTERM-DIC/ETZY.tit | ||
| 120 | leim/CXTERM-DIC/PY-b5.tit | ||
| 121 | leim/CXTERM-DIC/Punct-b5.tit | ||
| 122 | leim/CXTERM-DIC/QJ-b5.tit | ||
| 123 | leim/CXTERM-DIC/ZOZY.tit | ||
| 124 | leim/MISC-DIC/CTLau-b5.html | ||
| 125 | leim/MISC-DIC/cangjie-table.b5 | ||
| 126 | |||
| 127 | * chinese-iso-8bit | ||
| 128 | |||
| 129 | These are verbatim copies of files taken from external sources. | ||
| 130 | They haven't been converted to UTF-8. | ||
| 131 | |||
| 132 | leim/CXTERM-DIC/CCDOSPY.tit | ||
| 133 | leim/CXTERM-DIC/Punct.tit | ||
| 134 | leim/CXTERM-DIC/QJ.tit | ||
| 135 | leim/CXTERM-DIC/SW.tit | ||
| 136 | leim/CXTERM-DIC/TONEPY.tit | ||
| 137 | leim/MISC-DIC/pinyin.map | ||
| 138 | leim/MISC-DIC/CTLau.html | ||
| 139 | leim/MISC-DIC/ziranma.cin | ||
| 140 | |||
| 141 | * cp850 | ||
| 142 | |||
| 143 | This file contains non-ASCII characters in unibyte strings. When | ||
| 144 | editing a keyboard layout it's more convenient to see 'é' than | ||
| 145 | '\202', and the MS-DOS compiler requires the single byte if a | ||
| 146 | backslash escape is not being used. | ||
| 147 | |||
| 148 | src/msdos.c | ||
| 149 | |||
| 150 | * iso-2022-cn-ext | ||
| 151 | |||
| 152 | This file is externally generated from leim/MISC-DIC/cangjie-table.b5 | ||
| 153 | by Big5->CNS converter. It hasn't been converted to UTF-8. | ||
| 154 | |||
| 155 | leim/MISC-DIC/cangjie-table.cns | ||
| 156 | |||
| 157 | * iso-latin-2 | ||
| 158 | |||
| 159 | These files are processed by csplain, a program that requires | ||
| 160 | Latin-2 input. In 2012 the csplain maintainers started | ||
| 161 | recommending UTF-8, but these files haven't been converted yet. | ||
| 162 | |||
| 163 | etc/refcards/cs-dired-ref.tex | ||
| 164 | etc/refcards/cs-refcard.tex | ||
| 165 | etc/refcards/cs-survival.tex | ||
| 166 | etc/refcards/sk-dired-ref.tex | ||
| 167 | etc/refcards/sk-refcard.tex | ||
| 168 | etc/refcards/sk-survival.tex | ||
| 169 | |||
| 170 | * japanese-iso-8bit | ||
| 171 | |||
| 172 | SKK-JISYO.L is a verbatim copy of a file taken from an external source. | ||
| 173 | It hasn't been converted to UTF-8. | ||
| 174 | |||
| 175 | leim/SKK-DIC/SKK-JISYO.L | ||
| 176 | |||
| 177 | * japanese-shift-jis | ||
| 178 | |||
| 179 | This is a verbatim copy of a file taken from an external source. | ||
| 180 | It hasn't been converted to UTF-8. | ||
| 181 | |||
| 182 | admin/charsets/mapfiles/cns2ucsdkw.txt | ||
| 183 | |||
| 184 | * no-conversion | ||
| 185 | |||
| 186 | This file purposely contains arbitrary bytes interspersed within text, | ||
| 187 | to test whether the Emacs distribution is corrupted. | ||
| 188 | |||
| 189 | lib-src/testfile | ||
| 190 | |||
| 191 | * iso-2022-7bit | ||
| 192 | |||
| 193 | This file switches between CJK charsets, which is not encoded in UTF-8. | ||
| 194 | |||
| 195 | etc/HELLO | ||
| 196 | |||
| 197 | Each of these files contains just one CJK charset, but Emacs | ||
| 198 | currently has no easy way to specify set-charset-priority on a | ||
| 199 | per-file basis, so converting any of these files to UTF-8 might | ||
| 200 | change the file's appearance when viewed by an Emacs that is | ||
| 201 | operating in some other language environment. | ||
| 202 | |||
| 203 | etc/tutorials/TUTORIAL.ja | ||
| 204 | leim/quail/cyril-jis.el | ||
| 205 | leim/quail/hanja-jis.el | ||
| 206 | leim/quail/japanese.el | ||
| 207 | leim/quail/py-punct.el | ||
| 208 | leim/quail/pypunct-b5.el | ||
| 209 | lisp/international/ja-dic-cnv.el | ||
| 210 | lisp/international/ja-dic-utl.el | ||
| 211 | lisp/international/kinsoku.el | ||
| 212 | lisp/international/kkc.el | ||
| 213 | lisp/international/titdic-cnv.el | ||
| 214 | lisp/language/japan-util.el | ||
| 215 | lisp/language/japanese.el | ||
| 216 | lisp/term/x-win.el | ||
| 217 | |||
| 218 | * utf-8-emacs | ||
| 219 | |||
| 220 | These files contain characters that cannot be encoded in UTF-8. | ||
| 221 | |||
| 222 | leim/quail/tibetan.el | ||
| 223 | leim/quail/ethiopic.el | ||
| 224 | lisp/international/titdic-cnv.el | ||
| 225 | lisp/language/tibetan.el | ||
| 226 | lisp/language/tibet-util.el | ||
| 227 | lisp/language/ind-util.el | ||
| 228 | |||
| 101 | 229 | ||
| 102 | This file is part of GNU Emacs. | 230 | This file is part of GNU Emacs. |
| 103 | 231 | ||