| author | Eric S. Raymond | 1993-03-22 03:00:23 +0000 |
|---|---|---|
| committer | Eric S. Raymond | 1993-03-22 03:00:23 +0000 |
| commit | 33d92c1f9de704cda9309731b4d6add46178aafc (patch) | |
| tree | 036d61fa604499e80225e246a6ef8af890cc5c1b | |
| parent | 462e90c9f9aad73961d3d32dcfb062169318bf5e (diff) | |
Merged in CHARACTERS
| -rw-r--r-- | etc/TO-DO | 57 |
1 files changed, 57 insertions, 0 deletions
| @@ -24,3 +24,60 @@ Things useful to do for GNU Emacs: | |||
| 24 | inverse video. | 24 | inverse video. |
| 25 | 25 | ||
| 26 | * VMS code to list a file directory. Make dired work. | 26 | * VMS code to list a file directory. Make dired work. |
| 27 | |||
| 28 | Long range: | ||
| 29 | |||
| 30 | Ideas for extending GNU Emacs to deal with arbitrary character sets. | ||
| 31 | |||
| 32 | I would like GNU Emacs to be extended to handle all the world's alphabets | ||
| 33 | and word signs. I don't expect to have time to do such a thing in the next | ||
| 34 | few years, so here are my ideas on the best way to do it. | ||
| 35 | |||
| 36 | * Each graphic is represented by a sequence of ordinary 8-bit characters. | ||
| 37 | |||
| 38 | * All the characters that make up such a sequence have codes >= 0200. | ||
| 39 | |||
| 40 | * The first character of such a sequence is between 0200 and 0237. | ||
| 41 | |||
| 42 | * The remaining characters of such a sequence are all 0240 or higher. | ||
| 43 | |||
| 44 | * The first character of the sequence determines the number of characters | ||
| 45 | in the sequence. Thus, 0200...0207 could start two-character sequences, | ||
| 46 | 0210...0227 could start three-character sequences, and 0230 could start | ||
| 47 | four-character sequences. (Codes 0231...0237 would be reserved.) | ||
| 48 | |||
| 49 | * Several common alphabets, and some mathematical symbols, would get | ||
| 50 | two-character sequences. (Probably Greek, Russian, Hebrew(?), Arabic(?), | ||
| 51 | Korean, and Japanese kana). The remaining alphabets, and some versions of | ||
| 52 | Chinese, would get three-character sequences. Other sets of Chinese | ||
| 53 | characters would get four-character sequences. | ||
| 54 | |||
| 55 | Each country that uses Chinese characters has its own standard character | ||
| 56 | set, and it is not easy to correlate them to avoid overlap. So there may | ||
| 57 | need to be several sets of Chinese characters. That is why they need so | ||
| 58 | much code space. | ||
| 59 | |||
| 60 | True support for Hebrew and Arabic requires dealing with the problem of | ||
| 61 | writing direction for mixed text; I don't know what to do for that. | ||
| 62 | |||
| 63 | * The functions that use syntax table would determine the | ||
| 64 | syntax of a sequence from its first character. | ||
| 65 | |||
| 66 | * Functions in indent.c for computing widths and columns would | ||
| 67 | determine the width of a sequence from its first character. | ||
| 68 | So would display routines. | ||
| 69 | |||
| 70 | * Only a few other editing routines would need any change. In | ||
| 71 | particular, searching and regexp matching might not need any change. | ||
| 72 | |||
| 73 | * Most of the work required would be in redisplay. The only case that | ||
| 74 | needs to be supported is with X windows, since ordinary terminals | ||
| 75 | can't display all these characters anyway. | ||
| 76 | |||
| 77 | * There might need to be code to translate files from this format | ||
| 78 | to whatever format is typically stored on disk. | ||
| 79 | |||
| 80 | |||
| 81 | I would be very unhappy with half-measures, such as support for | ||
| 82 | Japanese only. | ||
| 83 | |||
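
The byte-sequence scheme proposed in the added text (new lines 36 to 47) can be illustrated with a small sketch. It is not code from Emacs: the function names `sequence_length`, `split_graphics`, and `graphic_start` are hypothetical, and the lead-byte ranges are the ones the proposal offers only as an example ("0200...0207 could start two-character sequences", and so on).

```python
from typing import List

def sequence_length(lead: int) -> int:
    """Number of bytes in the sequence that begins with byte `lead`.

    Ranges follow the example allocation in the proposal (an assumption,
    not a fixed specification).
    """
    if lead < 0o200:
        return 1                      # ordinary single-byte character
    if lead <= 0o207:
        return 2                      # 0200...0207
    if lead <= 0o227:
        return 3                      # 0210...0227
    if lead == 0o230:
        return 4                      # 0230
    if lead <= 0o237:
        raise ValueError("reserved lead byte: %#o" % lead)
    raise ValueError("trailing byte found where a lead byte was expected")

def split_graphics(data: bytes) -> List[bytes]:
    """Split a byte string into graphics (single bytes or multi-byte sequences)."""
    graphics = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        seq = data[i:i + n]
        # Every byte after the first must be 0240 or higher.
        if len(seq) != n or any(b < 0o240 for b in seq[1:]):
            raise ValueError("malformed sequence at offset %d" % i)
        graphics.append(seq)
        i += n
    return graphics

def graphic_start(data: bytes, pos: int) -> int:
    """Back up from `pos` to the first byte of the graphic containing it.

    Assumes `data` is well formed, so trailing bytes (>= 0240) are always
    preceded by a lead byte.
    """
    while pos > 0 and data[pos] >= 0o240:
        pos -= 1
    return pos
```

Because lead bytes (0200 to 0237) and trailing bytes (0240 and up) occupy disjoint ranges, a byte-oriented search for a complete sequence cannot match starting in the middle of another graphic. That is plausibly why the proposal expects that searching and regexp matching might not need any change.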