Split off from README.unicode

author: Glenn Morris 2008-02-21 04:00:22 +0000
committer: Glenn Morris 2008-02-21 04:00:22 +0000
commit: e88a2ed3719bc74be5f13d297d9370dde777c6f5 (patch)
tree: 21f01cdc74e5dbb44829b007ad2296e7d5afc322 /admin/notes/unicode
parent: 2b7a2553a5fda159241852db13d3709ef6cd200f (diff)
1 files changed, 146 insertions, 0 deletions
diff --git a/admin/notes/unicode b/admin/notes/unicode
new file mode 100644
index 00000000000..0f76a3d36f9
--- /dev/null
+++ b/admin/notes/unicode
@@ -0,0 +1,146 @@
+                                            -*-mode: text; coding: latin-1;-*-
+Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008
+  Free Software Foundation, Inc.
+See the end of the file for license conditions.
+Problems, fixmes and other unicode-related issues
+-------------------------------------------------------------
+Notes by fx to record various things of variable importance.  handa
+needs to check them -- don't take too seriously, especially with
+regard to completeness.
+ * SINGLE_BYTE_CHAR_P returns true for Latin-1 characters, which has
+   undesirable effects.  E.g.:
+   (multibyte-string-p (let ((s "x")) (aset s 0 ?£) s)) => nil
+   (multibyte-string-p (concat [?£])) => nil
+   (text-char-description ?£) => "M-#"
+        These examples are all fixed by the change of 2002-10-14, but
+        there still exist questionable SINGLE_BYTE_CHAR_P in the
+        code (keymap.c and print.c).
+ * Rationalize character syntax and its relationship to the Unicode
+   database.  (Applies mainly to symbol and punctuation syntax.)
+ * Fontset handling and customization needs work.  We want to relate
+   fonts to scripts, probably based on the Unicode blocks.  The
+   presence of small-repertoire 10646-encoded fonts in XFree 4 is a
+   pain, not currently worked round.
+        With the change on 2002-07-26, multiple fonts can be
+        specified in a fontset for a specific range of characters.
+        Each range can also be specified by script.  Before using
+        ISO10646 fonts, Emacs checks their repertoires to avoid
+        fonts that lack a glyph for a specific character.
+        fx has worked on fontset customization, but was stymied by
+        basic problems with the way the default face is dealt with
+        (and something else, I think).  This needs revisiting.
+ * Work is also needed on charset and coding system priorities.
+ * The relevant bits of latin1-disp.el need porting (and probably
+   re-naming/updating).  See also cyril-util.el.
+ * Quail files need more work now the encoding is largely irrelevant.
+ * What to do with the old coding categories stuff?
+ * The preferred-coding-system property of charsets should probably be
+   junked unless it can be made more useful now.
+ * find-multibyte-characters needs looking at.
+ * Implement Korean cp949/UHC, BIG5-HKSCS and any other important missing
+   charsets.
+ * Lazy-load tables for unify-charset somehow?
+        Actually, Emacs clears out all charset maps and unify-map just
+        before dumping, and they are loaded again on demand by the
+        dumped emacs.  But, those maps (char tables) generated while
+        temacs is running can't be removed from the dumped emacs.
+ * Translation tables for {en,de}code currently aren't supported.
+        This should be fixed by the changes of 2002-10-14.
+ * Defining CCL coding systems currently doesn't work.
+        This should be fixed by the changes of 2003-01-30.
+ * iso-2022 charsets get unified on i/o.
+        With the change on 2003-01-06, decoding routines put a `charset'
+        property on decoded text, and the iso-2022 encoder pays attention
+        to it.  Thus, for instance, reading and writing with
+        iso-2022-7bit preserves the original designation sequences.
+        The property name `preferred-charset' may be better?
+        We may have to utilize this property to decide a font.
+ * Revisit locale processing: look at treating the language and
+   charset parts separately.  (Language should affect things like
+   spelling and calendar, but that's not a Unicode issue.)
+ * Handle Unicode combining characters usefully, e.g. diacritics, and
+   handle more scripts specifically (à la Devanagari).  There are
+   issues with canonicalization.
+ * Bidi is a separate issue with no support currently.
+ * We need tabular input methods, e.g. for maths symbols.  (Not
+   specific to Unicode.)
+ * Need multibyte text in menus, e.g. for the above.  (Not specific to
+   Unicode -- see Emacs etc/TODO, but now mostly works with gtk.)
+ * There's currently no support for Unicode normalization.
+ * Populate char-width-table correctly for Unicode characters and
+   worry about what happens when double-width charsets covering
+   non-CJK characters are unified.
+ * Emacs 20/21 .elc files are currently not loadable.  It may or may
+   not be possible to do this properly.
+        With the change on 2002-07-24, elc files generated by Emacs
+        20.3 and later are correctly loaded (including those
+        containing multibyte characters and compressed).  But, elc
+        files generated by 20.2 and earlier are still not loadable.
+        Is it really worth working on it?
+ * Rmail won't work with non-ASCII text.  Encoding issues for Babyl
+   files need sorting out, but rms says Babyl will go before this is
+   released.
+ * Gnus still needs some attention, and we need to get changes
+   accepted by Gnus maintainers...
+ * There are type errors lurking, e.g. in
+   Fcheck_coding_systems_region.  Define ENABLE_CHECKING to find them.
+ * You can grep the code for lots of fixmes.
+ * Old auto-save files, and similar files, such as Gnus drafts,
+   containing non-ASCII characters probably won't be re-read correctly.
+This file is part of GNU Emacs.
+GNU Emacs is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+You should have received a copy of the GNU General Public License
+along with GNU Emacs; see the file COPYING.  If not, write to the
+Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.
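
The "M-#" result quoted in the SINGLE_BYTE_CHAR_P item is consistent with plain byte arithmetic: a code in the range 0x80..0xFF that passes a "fits in one byte" test can be read as Meta plus its low seven bits, and 0xA3 with the high bit cleared is 0x23, which is `#`. The following is a minimal C sketch of that effect under stated assumptions. The names `fits_in_byte_p`, `ascii_p` and `describe_as_byte` are hypothetical helpers written for illustration; they are not the Emacs macros or functions, and the real code paths in keymap.c and print.c may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical predicates for this sketch; they are not the Emacs macros. */
static int fits_in_byte_p(unsigned c) { return c < 0x100; } /* ASCII and Latin-1 range */
static int ascii_p(unsigned c)        { return c < 0x80; }  /* ASCII only */

/* Describe C the way a byte-oriented routine might: a code in
   0x80..0xFF is read as Meta plus the low seven bits.  Control
   characters get no special treatment in this sketch. */
static void describe_as_byte(unsigned c, char *buf, size_t size)
{
    assert(buf != NULL && size >= 4);
    if (ascii_p(c))
        snprintf(buf, size, "%c", (int) c);
    else if (fits_in_byte_p(c))
        snprintf(buf, size, "M-%c", (int) (c & 0x7F));
    else
        snprintf(buf, size, "U+%04X", c);
}
```

Calling `describe_as_byte(0xA3, buf, sizeof buf)` on a 16-byte buffer leaves "M-#" in `buf`. Testing with the narrower `ascii_p` before taking the byte-oriented path is one way such a routine could avoid treating a Latin-1 character as a Meta-modified ASCII character.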