author     Paul Eggert  2013-03-11 15:32:07 -0700
committer  Paul Eggert  2013-03-11 15:32:07 -0700
commit     1b610f514360dc54d34facf98f1072efba436ca6
tree       f94bf542d1912cf7ada9a2ad2aa59d107a9a563a /admin/notes/unicode
parent     e56221d55000f52ca15c75d772db1ddf150de016
* notes/unicode: Improve notes about Emacs source file encoding.
Diffstat (limited to 'admin/notes/unicode')
-rw-r--r-- admin/notes/unicode 61
1 files changed, 56 insertions, 5 deletions
diff --git a/admin/notes/unicode b/admin/notes/unicode
index 0654036d364..68a6a67a93c 100644
--- a/admin/notes/unicode
+++ b/admin/notes/unicode
@@ -104,12 +104,15 @@ Source file encoding
 
 Most Emacs source files are encoded in UTF-8 (or in ASCII, which is a
 subset), but there are a few exceptions, listed below. Perhaps
-someday these files will be converted to UTF-8, for convenience when
-using tools like 'grep -r', but this might need nontrivial changes to
-the build process.
+someday many of the these files will be converted to UTF-8, for
+convenience when using tools like 'grep -r', but this might need
+nontrivial changes to the build process.
 
  * chinese-big5
 
+   These are verbatim copies of files taken from external sources.
+   They haven't been converted to UTF-8.
+
         leim/CXTERM-DIC/4Corner.tit
         leim/CXTERM-DIC/ARRAY30.tit
         leim/CXTERM-DIC/ECDICT.tit
@@ -123,6 +126,9 @@ the build process.
 
  * chinese-iso-8bit
 
+   These are verbatim copies of files taken from external sources.
+   They haven't been converted to UTF-8.
+
         leim/CXTERM-DIC/CCDOSPY.tit
         leim/CXTERM-DIC/Punct.tit
         leim/CXTERM-DIC/QJ.tit
@@ -132,28 +138,73 @@ the build process.
         leim/MISC-DIC/CTLau.html
         leim/MISC-DIC/ziranma.cin
 
+ * cp850
+
+   This file contains non-ASCII characters in unibyte strings. When
+   editing a keyboard layout it's more convenient to see 'é' than
+   '\202', and the MS-DOS compiler requires the single byte if a
+   backslash escape is not being used.
+
+        src/msdos.c
+
+ * iso-2022-cn-ext
+
+   This file is externally generated from leim/MISC-DIC/cangjie-table.b5
+   by Big5->CNS converter. It hasn't been converted to UTF-8.
+
+        leim/MISC-DIC/cangjie-table.cns
+
  * iso-latin-2
 
+   These files are processed by csplain, a program that requires
+   Latin-2 input. In 2012 the csplain maintainers started
+   recommending UTF-8, but these files haven't been converted yet.
+
+        etc/refcards/cs-dired-ref.tex
         etc/refcards/cs-refcard.tex
-        etc/refcards/sk-survival.tex
         etc/refcards/cs-survival.tex
-        etc/refcards/cs-dired-ref.tex
         etc/refcards/sk-dired-ref.tex
         etc/refcards/sk-refcard.tex
+        etc/refcards/sk-survival.tex
 
  * japanese-iso-8bit
 
+   SKK-JISYO.L is a verbatim copy of a file taken from an external source.
+   ja-dic.el is generated automatically by skkdic-convert; this process
+   hasn't been converted to use UTF-8.
+
         leim/SKK-DIC/SKK-JISYO.L
         leim/ja-dic/ja-dic.el
 
  * japanese-shift-jis
 
+   This is a verbatim copy of a file taken from an external source.
+   It hasn't been converted to UTF-8.
+
         admin/charsets/mapfiles/cns2ucsdkw.txt
 
  * no-conversion
 
+   This file purposely contains arbitrary bytes interspersed within text,
+   to test whether the Emacs distribution is corrupted.
+
         lib-src/testfile
 
+ * iso-2022-7bit
+
+   These files contain characters that cannot be encoded in UTF-8.
+
+        leim/quail/tibetan.el
+        leim/quail/ethiopic.el
+        lisp/international/titdic-cnv.el
+        lisp/language/tibetan.el
+        lisp/language/tibet-util.el
+        lisp/language/ind-util.el
+
+   Converting this file to UTF-8 loses non-character information.
+
+        leim/quail/hanja3.el
+
 
 This file is part of GNU Emacs.
 
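The note explains which files are not UTF-8 and why that is inconvenient for tools like 'grep -r'. As an illustration only, here is a minimal Python sketch, not part of Emacs, that lists files whose bytes fail strict UTF-8 decoding. The helper names is_utf8 and non_utf8_files are hypothetical. It is a heuristic under stated assumptions: files in 7-bit encodings such as iso-2022-7bit contain only ASCII bytes and therefore pass the check, and this is not how Emacs itself chooses a file's coding system.

```python
from pathlib import Path


def is_utf8(data: bytes) -> bool:
    """Return True if DATA decodes as strict UTF-8 (pure ASCII always does)."""
    try:
        data.decode("utf-8")  # strict by default: invalid byte sequences raise
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_files(root: Path) -> list[Path]:
    """Return, sorted, the regular files under ROOT that are not valid UTF-8.

    Heuristic only: 7-bit encodings such as ISO-2022 use only ASCII
    bytes, so files in those encodings are NOT reported.
    """
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and not is_utf8(path.read_bytes())
    )
```

For example, 'é' is the single byte 0x82 (octal \202) in cp850, as the cp850 entry mentions, and that byte on its own is not valid UTF-8: is_utf8(b"\x82") returns False, while is_utf8("é".encode("utf-8")) returns True. Treat the output of non_utf8_files as a list of candidates for manual review, not as an authoritative inventory of the exceptions above.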