path: root/src/coding.c
author: Dave Love 2002-06-16 19:57:54 +0000
committer: Dave Love 2002-06-16 19:57:54 +0000
commit: 5a936b4698228cb5c8c86da284a7075a7a34d0c3
tree: 3e4cd74238e5c0c0b4a020471f1759952918a30e /src/coding.c
parent: dc8533549ecc3ac1b08dd5fb8f052fcff961ef0e
Diffstat (limited to 'src/coding.c')
-rw-r--r-- src/coding.c 106
1 files changed, 24 insertions, 82 deletions
diff --git a/src/coding.c b/src/coding.c
index abc11ea5eb7..78ab0e0db03 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -94,7 +94,7 @@ CODING SYSTEM
94 o BIG5 94 o BIG5
95 95
96 A coding system to encode character sets: ASCII and Big5. Widely 96 A coding system to encode character sets: ASCII and Big5. Widely
97 used by Chinese (mainly in Taiwan and Hong Kong). Details are 97 used for Chinese (mainly in Taiwan and Hong Kong). Details are
98 described in section 8. In this file, when we write "big5" (all 98 described in section 8. In this file, when we write "big5" (all
99 lowercase), we mean the coding system, and when we write "Big5" 99 lowercase), we mean the coding system, and when we write "Big5"
100 (capitalized), we mean the character set. 100 (capitalized), we mean the character set.
@@ -108,7 +108,7 @@ CODING SYSTEM
108 108
109 o Raw-text 109 o Raw-text
110 110
111 A coding system for a text containing raw eight-bit data. Emacs 111 A coding system for text containing raw eight-bit data. Emacs
112 treats each byte of source text as a character (except for 112 treats each byte of source text as a character (except for
113 end-of-line conversion). 113 end-of-line conversion).
114 114
@@ -587,7 +587,7 @@ enum iso_code_class_type
587 (XSTRING (AREF (CODING_ID_ATTRS ((coding)->id), coding_attr_ccl_valids)) \ 587 (XSTRING (AREF (CODING_ID_ATTRS ((coding)->id), coding_attr_ccl_valids)) \
588 ->data) 588 ->data)
589 589
590/* Index for each coding category in `coding_category_table' */ 590/* Index for each coding category in `coding_categories' */
591 591
592enum coding_category 592enum coding_category
593 { 593 {
@@ -2049,21 +2049,23 @@ encode_coding_emacs_mule (coding)
2049 2049
2050/* The following note describes the coding system ISO2022 briefly. 2050/* The following note describes the coding system ISO2022 briefly.
2051 Since the intention of this note is to help understand the 2051 Since the intention of this note is to help understand the
2052 functions in this file, some parts are NOT ACCURATE or OVERLY 2052 functions in this file, some parts are NOT ACCURATE or are OVERLY
2053 SIMPLIFIED. For thorough understanding, please refer to the 2053 SIMPLIFIED. For thorough understanding, please refer to the
2054 original document of ISO2022. 2054 original document of ISO2022. This is equivalent to the standard
2055 ECMA-35, obtainable from <URL:http://www.ecma.ch/> (*).
2055 2056
2056 ISO2022 provides many mechanisms to encode several character sets 2057 ISO2022 provides many mechanisms to encode several character sets
2057 in 7-bit and 8-bit environments. For 7-bite environments, all text 2058 in 7-bit and 8-bit environments. For 7-bit environments, all text
2058 is encoded using bytes less than 128. This may make the encoded 2059 is encoded using bytes less than 128. This may make the encoded
2059 text a little bit longer, but the text passes more easily through 2060 text a little bit longer, but the text passes more easily through
2060 several gateways, some of which strip off MSB (Most Signigant Bit). 2061 several types of gateway, some of which strip off the MSB (Most
2062 Significant Bit).
2061 2063
2062 There are two kinds of character sets: control character set and 2064 There are two kinds of character sets: control character sets and
2063 graphic character set. The former contains control characters such 2065 graphic character sets. The former contain control characters such
2064 as `newline' and `escape' to provide control functions (control 2066 as `newline' and `escape' to provide control functions (control
2065 functions are also provided by escape sequences). The latter 2067 functions are also provided by escape sequences). The latter
2066 contains graphic characters such as 'A' and '-'. Emacs recognizes 2068 contain graphic characters such as 'A' and '-'. Emacs recognizes
2067 two control character sets and many graphic character sets. 2069 two control character sets and many graphic character sets.
2068 2070
2069 Graphic character sets are classified into one of the following 2071 Graphic character sets are classified into one of the following
@@ -2075,14 +2077,14 @@ encode_coding_emacs_mule (coding)
2075 - DIMENSION2_CHARS96 2077 - DIMENSION2_CHARS96
2076 2078
2077 In addition, each character set is assigned an identification tag, 2079 In addition, each character set is assigned an identification tag,
2078 unique for each set, called "final character" (denoted as <F> 2080 unique for each set, called the "final character" (denoted as <F>
2079 hereafter). The <F> of each character set is decided by ECMA(*) 2081 hereafter). The <F> of each character set is decided by ECMA(*)
2080 when it is registered in ISO. The code range of <F> is 0x30..0x7F 2082 when it is registered in ISO. The code range of <F> is 0x30..0x7F
2081 (0x30..0x3F are for private use only). 2083 (0x30..0x3F are for private use only).
2082 2084
2083 Note (*): ECMA = European Computer Manufacturers Association 2085 Note (*): ECMA = European Computer Manufacturers Association
2084 2086
2085 Here are examples of graphic character set [NAME(<F>)]: 2087 Here are examples of graphic character sets [NAME(<F>)]:
2086 o DIMENSION1_CHARS94 -- ASCII('B'), right-half-of-JISX0201('I'), ... 2088 o DIMENSION1_CHARS94 -- ASCII('B'), right-half-of-JISX0201('I'), ...
2087 o DIMENSION1_CHARS96 -- right-half-of-ISO8859-1('A'), ... 2089 o DIMENSION1_CHARS96 -- right-half-of-ISO8859-1('A'), ...
2088 o DIMENSION2_CHARS94 -- GB2312('A'), JISX0208('B'), ... 2090 o DIMENSION2_CHARS94 -- GB2312('A'), JISX0208('B'), ...
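The code range given for <F> in the note above can be sketched as a small classifier. This is only an illustration: the enum and function names are hypothetical and do not appear in coding.c, and the bounds are copied from the note as written (0x30..0x7F, with 0x30..0x3F private), so the standard itself should be consulted for the exact upper limit.

```c
/* Hypothetical names for illustration; not taken from coding.c.  */
enum final_byte_kind
  {
    FINAL_INVALID,		/* outside the range given in the note */
    FINAL_PRIVATE,		/* 0x30..0x3F: private use only */
    FINAL_REGISTERED		/* 0x40..0x7F: assigned at registration */
  };

/* Classify F using the range exactly as the note states it.  */
enum final_byte_kind
classify_final_byte (unsigned char f)
{
  if (f < 0x30 || f > 0x7F)
    return FINAL_INVALID;
  return f <= 0x3F ? FINAL_PRIVATE : FINAL_REGISTERED;
}
```

With the examples listed above, the final characters 'B' (ASCII, JISX0208), 'I' and 'A' all fall in the registered part of the range.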
@@ -2175,11 +2177,11 @@ encode_coding_emacs_mule (coding)
2175 Note (**): If <F> is '@', 'A', or 'B', the intermediate character 2177 Note (**): If <F> is '@', 'A', or 'B', the intermediate character
2176 '(' must be omitted. We refer to this as "short-form" hereafter. 2178 '(' must be omitted. We refer to this as "short-form" hereafter.
2177 2179
2178 Now you may notice that there are a lot of ways for encoding the 2180 Now you may notice that there are a lot of ways of encoding the
2179 same multilingual text in ISO2022. Actually, there exist many 2181 same multilingual text in ISO2022. Actually, there exist many
2180 coding systems such as Compound Text (used in X11's inter client 2182 coding systems such as Compound Text (used in X11's inter client
2181 communication, ISO-2022-JP (used in Japanese internet), ISO-2022-KR 2183 communication, ISO-2022-JP (used in Japanese Internet), ISO-2022-KR
2182 (used in Korean internet), EUC (Extended UNIX Code, used in Asian 2184 (used in Korean Internet), EUC (Extended UNIX Code, used in Asian
2183 localized platforms), and all of these are variants of ISO2022. 2185 localized platforms), and all of these are variants of ISO2022.
2184 2186
2185 In addition to the above, Emacs handles two more kinds of escape 2187 In addition to the above, Emacs handles two more kinds of escape
@@ -2201,19 +2203,19 @@ encode_coding_emacs_mule (coding)
2201 o ESC '3' -- start relative composition with alternate chars (**) 2203 o ESC '3' -- start relative composition with alternate chars (**)
2202 o ESC '4' -- start rule-base composition with alternate chars (**) 2204 o ESC '4' -- start rule-base composition with alternate chars (**)
2203 Since these are not standard escape sequences of any ISO standard, 2205 Since these are not standard escape sequences of any ISO standard,
2204 the use of them for these meaning is restricted to Emacs only. 2206 the use of them with these meanings is restricted to Emacs only.
2205 2207
2206 (*) This form is used only in Emacs 20.5 and the older versions, 2208 (*) This form is used only in Emacs 20.7 and older versions,
2207 but the newer versions can safely decode it. 2209 but newer versions can safely decode it.
2208 (**) This form is used only in Emacs 21.1 and the newer versions, 2210 (**) This form is used only in Emacs 21.1 and newer versions,
2209 and the older versions can't decode it. 2211 and older versions can't decode it.
2210 2212
2211 Here's a list of examples usages of these composition escape 2213 Here's a list of example usages of these composition escape
2212 sequences (categorized by `enum composition_method'). 2214 sequences (categorized by `enum composition_method').
2213 2215
2214 COMPOSITION_RELATIVE: 2216 COMPOSITION_RELATIVE:
2215 ESC 0 CHAR [ CHAR ] ESC 1 2217 ESC 0 CHAR [ CHAR ] ESC 1
2216 COMPOSITOIN_WITH_RULE: 2218 COMPOSITION_WITH_RULE:
2217 ESC 2 CHAR [ RULE CHAR ] ESC 1 2219 ESC 2 CHAR [ RULE CHAR ] ESC 1
2218 COMPOSITION_WITH_ALTCHARS: 2220 COMPOSITION_WITH_ALTCHARS:
2219 ESC 3 ALTCHAR [ ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1 2221 ESC 3 ALTCHAR [ ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1
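As a rough illustration of the COMPOSITION_RELATIVE pattern listed above (ESC 0 CHAR [ CHAR ] ESC 1), the following sketch wraps a run of component bytes in the two escape sequences. The helper name and its buffer interface are hypothetical and are not how coding.c produces these sequences; it also assumes the components have already been encoded to bytes by the caller.

```c
#include <stddef.h>

#define ESC 0x1B

/* Hypothetical helper, not part of coding.c.  Write
   ESC '0' <components> ESC '1' into OUT, where COMPONENTS holds N
   already-encoded bytes.  Return the number of bytes written, or 0
   if N is 0 or OUT (of capacity CAP) is too small.  */
size_t
wrap_relative_composition (const unsigned char *components, size_t n,
			   unsigned char *out, size_t cap)
{
  size_t i = 0;
  size_t k;

  /* Four extra bytes are needed for the two escape sequences.  */
  if (n == 0 || cap < 4 || n > cap - 4)
    return 0;

  out[i++] = ESC;
  out[i++] = '0';		/* start relative composition */
  for (k = 0; k < n; k++)
    out[i++] = components[k];
  out[i++] = ESC;
  out[i++] = '1';		/* end composition */
  return i;
}
```

The other methods in the list differ only in the opening sequence and in what is interleaved between the components (RULE bytes, or a leading ALTCHAR run), so the same framing idea would apply to them.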
@@ -4535,66 +4537,6 @@ encode_coding_charset (coding)
4535 4537
4536/*** 7. C library functions ***/ 4538/*** 7. C library functions ***/
4537 4539
4538/* In Emacs Lisp, coding system is represented by a Lisp symbol which
4539 has a property `coding-system'. The value of this property is a
4540 vector of length 5 (called as coding-vector). Among elements of
4541 this vector, the first (element[0]) and the fifth (element[4])
4542 carry important information for decoding/encoding. Before
4543 decoding/encoding, this information should be set in fields of a
4544 structure of type `coding_system'.
4545
4546 A value of property `coding-system' can be a symbol of another
4547 subsidiary coding-system. In that case, Emacs gets coding-vector
4548 from that symbol.
4549
4550 `element[0]' contains information to be set in `coding->type'. The
4551 value and its meaning is as follows:
4552
4553 0 -- coding_type_emacs_mule
4554 1 -- coding_type_sjis
4555 2 -- coding_type_iso_2022
4556 3 -- coding_type_big5
4557 4 -- coding_type_ccl encoder/decoder written in CCL
4558 nil -- coding_type_no_conversion
4559 t -- coding_type_undecided (automatic conversion on decoding,
4560 no-conversion on encoding)
4561
4562 `element[4]' contains information to be set in `coding->flags' and
4563 `coding->spec'. The meaning varies by `coding->type'.
4564
4565 If `coding->type' is `coding_type_iso_2022', element[4] is a vector
4566 of length 32 (of which the first 13 sub-elements are used now).
4567 Meanings of these sub-elements are:
4568
4569 sub-element[N] where N is 0 through 3: to be set in `coding->spec.iso_2022'
4570 If the value is an integer of valid charset, the charset is
4571 assumed to be designated to graphic register N initially.
4572
4573 If the value is minus, it is a minus value of charset which
4574 reserves graphic register N, which means that the charset is
4575 not designated initially but should be designated to graphic
4576 register N just before encoding a character in that charset.
4577
4578 If the value is nil, graphic register N is never used on
4579 encoding.
4580
4581 sub-element[N] where N is 4 through 11: to be set in `coding->flags'
4582 Each value takes t or nil. See the section ISO2022 of
4583 `coding.h' for more information.
4584
4585 If `coding->type' is `coding_type_big5', element[4] is t to denote
4586 BIG5-ETen or nil to denote BIG5-HKU.
4587
4588 If `coding->type' takes the other value, element[4] is ignored.
4589
4590 Emacs Lisp's coding system also carries information about format of
4591 end-of-line in a value of property `eol-type'. If the value is
4592 integer, 0 means eol_lf, 1 means eol_crlf, and 2 means eol_cr. If
4593 it is not integer, it should be a vector of subsidiary coding
4594 systems of which property `eol-type' has one of above values.
4595
4596*/
4597
4598/* Setup coding context CODING from information about CODING_SYSTEM. 4540/* Setup coding context CODING from information about CODING_SYSTEM.
4599 If CODING_SYSTEM is nil, `no-conversion' is assumed. If 4541 If CODING_SYSTEM is nil, `no-conversion' is assumed. If
4600 CODING_SYSTEM is invalid, signal an error. */ 4542 CODING_SYSTEM is invalid, signal an error. */