Merged in CHARACTERS

author: Eric S. Raymond 1993-03-22 03:00:23 +0000
committer: Eric S. Raymond 1993-03-22 03:00:23 +0000
commit: 33d92c1f9de704cda9309731b4d6add46178aafc (patch)
tree: 036d61fa604499e80225e246a6ef8af890cc5c1b
parent: 462e90c9f9aad73961d3d32dcfb062169318bf5e (diff)
1 files changed, 57 insertions, 0 deletions
diff --git a/etc/TO-DO b/etc/TO-DO
index cc5b398eeca..e5b9a49599b 100644
--- a/etc/TO-DO
+++ b/etc/TO-DO
@@ -24,3 +24,60 @@ Things useful to do for GNU Emacs:
 inverse video.
 
 * VMS code to list a file directory.  Make dired work.
+
+Long range:
+
+   Ideas for extending GNU Emacs to deal with arbitrary character sets.
+
+I would like GNU Emacs to be extended to handle all the world's alphabets
+and word signs.  I don't expect to have time to do such a thing in the next
+few years, so here are my ideas on the best way to do it.
+
+* Each graphic is represented by a sequence of ordinary 8-bit characters.
+
+* All the characters that make up such a sequence have codes >= 0200.
+
+* The first character of such a sequence is between 0200 and 0237.
+
+* The remaining characters of such a sequence are all 0240 or higher.
+
+* The first character of the sequence determines the number of characters
+in the sequence.  Thus, 0200...0207 could start two-character sequences,
+0210...0227 could start three-character sequences, and 0230 could start
+four-character sequences.  (Codes 0231...0237 would be reserved.)
+
+*  Several common  alphabets,  and  some mathematical   symbols,  would get
+two-character sequences.  (Probably Greek,  Russian,  Hebrew(?), Arabic(?),
+Korean, and Japanese kana).  The remaining alphabets, and  some versions of
+Chinese,  would   get  three-character sequences.    Other  sets of Chinese
+characters would get four-character sequences.
+
+Each country that uses Chinese characters has its own standard character
+set, and it is not easy to correlate them to avoid overlap.  So there may
+need to be several sets of Chinese characters.  That is why they need so
+much code space.
+
+True support for Hebrew and Arabic requires dealing with the problem of
+writing direction for mixed text; I don't know what to do for that.
+
+* The functions that use syntax table would determine the
+syntax of a sequence from its first character.
+
+* Functions in indent.c for computing widths and columns would
+determine the width of a sequence from its first character.
+So would display routines.
+
+* Only a few other editing routines would need any change.  In
+particular, searching and regexp matching might not need any change.
+
+* Most of the work required would be in redisplay.  The only case that
+needs to be supported is with X windows, since ordinary terminals
+can't display all these characters anyway.
+
+* There might need to be code to translate files from this format
+to whatever format is typically stored on disk.
+
+
+I would be very unhappy with half-measures, such as support for
+Japanese only.
+
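
The byte layout proposed in the patch above can be illustrated with a short sketch. This is not Emacs code and not the author's implementation: it is a Python illustration that assumes the example ranges given in the text (0200...0207 for two-character sequences, 0210...0227 for three, 0230 for four, 0231...0237 reserved), and the names `sequence_length` and `split_graphics` are hypothetical.

```python
from typing import List


def sequence_length(lead: int) -> int:
    """Return how many bytes the sequence starting with `lead` occupies.

    The ranges are the illustrative ones from the TO-DO text, not a
    finalized specification.
    """
    if lead < 0o200:
        return 1  # ordinary character below 0200, stands alone
    if 0o200 <= lead <= 0o207:
        return 2
    if 0o210 <= lead <= 0o227:
        return 3
    if lead == 0o230:
        return 4
    if 0o231 <= lead <= 0o237:
        raise ValueError("reserved lead byte %#o" % lead)
    # 0240 and above may only appear after a lead byte.
    raise ValueError("trailing byte %#o where a lead byte was expected" % lead)


def split_graphics(data: bytes) -> List[bytes]:
    """Split `data` into graphics, one bytes object per graphic."""
    graphics = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        seq = data[i:i + n]
        # Every byte after the lead byte must be 0240 or higher.
        if len(seq) != n or any(b < 0o240 for b in seq[1:]):
            raise ValueError("malformed sequence at offset %d" % i)
        graphics.append(seq)
        i += n
    return graphics


# Hypothetical sample: 'a', a two-byte graphic, a three-byte graphic, 'z'.
sample = b"a" + bytes([0o200, 0o241]) + bytes([0o210, 0o240, 0o377]) + b"z"
graphics = split_graphics(sample)
```

One possible reading of the remark that searching might not need any change: lead bytes, trailing bytes, and ordinary characters occupy disjoint ranges, so a plain byte search such as `sample.find(bytes([0o210, 0o240, 0o377]))` for a pattern that begins with a lead byte can only match at the start of a sequence. That is an inference from the layout, not a claim made in the patch.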