| author | Kenichi Handa | 1998-01-22 01:26:45 +0000 |
|---|---|---|
| committer | Kenichi Handa | 1998-01-22 01:26:45 +0000 |
| commit | d46c5b12512d9b56b35b1e9bafd6e51535270f77 (patch) | |
| tree | 2fcf7b6e32fb52071ccfb90b0fdd04a09a09eb16 /src/coding.c | |
| parent | 658cc2522eef712ab87432953aa385a62de1de63 (diff) | |
(Vselect_safe_coding_system_function): New variable.
(coding_category_table): This variable deleted.
(Vcoding_category_table): New variable.
(coding_category_name): Add "coding-category-iso-7-tight".
(detect_coding_iso2022): Check the mask
CODING_FLAG_ISO_DESIGNATION in CODING->FLAGS. Check a new coding
category coding-category-iso-7-tight.
(DECODE_DESIGNATION): Decode only such designations that CODING
can handle.
(check_composing_code): New function.
(decode_coding_iso2022): Decode only such characters that CODING
can handle.
(encode_coding_iso2022): Before and after encoding composite
characters, reset designation and invocation status.
(detect_coding_sjis): Delete unnecessary check.
(detect_coding_big5): Likewise.
(encode_designation_at_bol): Check the validity of requested
designation register.
(setup_coding_system): Set requested designation registers for
non-supported charsets to
CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION. Set mask
CODING_FLAG_ISO_DESIGNATION in CODING->FLAGS. Code tuned for
no-conversion and undecided.
(detect_coding): Adjusted for the new variable
Vcoding_category_table.
(syms_of_coding): Initialize Vcoding_category_table and staticpro
it. Register select-safe-coding-system as a Lisp variable.
(DECODE_CHARACTER_ASCII): Update coding->produced_char.
(DECODE_CHARACTER_DIMENSION1): Likewise.
(Qraw_text, Qcoding_category): New variables.
(syms_of_coding): Intern and staticpro them.
(coding_system_table): New variable.
(CHARSET_OK, SHIFT_OUT_OK): New macros.
(detect_coding_iso2022): Detection algorithm improved.
(decode_coding_iso2022): Arg CONSUMED deleted, and the meaning of
return value changed. Update members produced, produced_char,
consumed, consumed_char of the struct *coding. Pay attention to
CODING_MODE_INHIBIT_INCONSISTENT_EOL.
(encode_coding_iso2022): Likewise.
(decode_coding_sjis_big5, encode_coding_sjis_big5): Likewise.
(decode_eol, encode_eol): Likewise.
(ENCODE_ISO_CHARACTER): Update coding->consumed_char.
(DECODE_SJIS_BIG5_CHARACTER): Update coding->produced_char.
(ENCODE_SJIS_BIG5_CHARACTER): Update coding->consumed_char.
(detect_coding_mask): New args PRIORITIES and SKIP.
(detect_coding): Adjusted for the change of detect_coding_mask.
Update coding->heading_ascii.
(detect_eol_type): New arg SKIP.
(detect_eol): Adjusted for the change of detect_eol_type.
(ccl_coding_driver): New function.
(decode_coding): Arg CONSUMED deleted, and the meaning of return
value changed. Update members produced, produced_char, consumed,
consumed_char of the struct *coding.
(encode_coding): Likewise.
(shrink_decoding_region, shrink_encoding_region): New functions.
(code_convert_region, code_convert_string): Completely rewritten.
(detect_coding_sy(detect_coding_sy(detect_coding_sy(detect_coding_sy(detect_codiT.
(Fdetect_coding_string): New function.
(Fdecode_coding_region, Fencode_coding_region): Adjusted for the
change of code_convert_region.
(Fdecode_coding_string, Fencode_coding_string): Adjusted for the
change of code_convert_string.
(Fupdate_iso_coding_systems): New function.
(init_coding_once): Initialize coding_system_table.
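
The interface change noted for the decode and encode functions (arg CONSUMED deleted, counts stored in the members produced, produced_char, consumed and consumed_char, and a status code returned) can be illustrated with a minimal sketch. Every `toy_` and `TOY_` name below is hypothetical, the converter only turns CR LF into LF, each byte is treated as one character, and the overlapping-buffer case (DST_BYTES zero) is not modeled. It is an illustration of the calling convention, not the code in coding.c.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the CODING_FINISH_XXX status codes.  */
enum toy_finish
{
  TOY_FINISH_NORMAL,            /* All source text was converted.  */
  TOY_FINISH_INSUFFICIENT_SRC,  /* Source ends in the middle of a sequence.  */
  TOY_FINISH_INSUFFICIENT_DST   /* Destination buffer is full.  */
};

/* Hypothetical stand-in for the counters kept in struct coding_system.  */
struct toy_coding
{
  int produced, produced_char;  /* Bytes and characters written.  */
  int consumed, consumed_char;  /* Bytes and characters read.  */
};

/* Convert CR LF to LF.  Progress is reported through *CODING; the
   return value only says how the conversion finished.  */
int
toy_decode_crlf (struct toy_coding *coding, const unsigned char *source,
                 unsigned char *destination, int src_bytes, int dst_bytes)
{
  const unsigned char *src = source, *src_end = source + src_bytes;
  unsigned char *dst = destination, *dst_end = destination + dst_bytes;
  int result = TOY_FINISH_NORMAL;

  coding->produced_char = 0;
  coding->consumed_char = 0;
  while (src < src_end)
    {
      if (dst >= dst_end)
        {
          result = TOY_FINISH_INSUFFICIENT_DST;
          break;
        }
      if (*src == '\r')
        {
          if (src + 1 >= src_end)
            {
              /* Cannot tell yet whether a LF follows; stop before the CR.  */
              result = TOY_FINISH_INSUFFICIENT_SRC;
              break;
            }
          if (src[1] == '\n')
            {
              src += 2;
              *dst++ = '\n';
              coding->consumed_char += 2;
              coding->produced_char++;
              continue;
            }
        }
      /* Ordinary byte (or a CR not followed by LF): copy it unchanged.  */
      *dst++ = *src++;
      coding->consumed_char++;
      coding->produced_char++;
    }
  coding->consumed = (int) (src - source);
  coding->produced = (int) (dst - destination);
  return result;
}
```

With this shape a caller can resume from `source + coding->consumed` after supplying more input or a larger buffer, which a single "length of the decoded text" return value could not express.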
Diffstat (limited to 'src/coding.c')
| -rw-r--r-- | src/coding.c | 2731 |
1 files changed, 1771 insertions, 960 deletions
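
The reworked detect_coding_iso2022 in the diff below keeps two bit sets: `mask` (categories not yet ruled out) and `mask_found` (categories for which positive evidence was seen), and returns their intersection. The following is a simplified sketch of that idea only. The two `TOY_CAT_` bits and `toy_detect` are hypothetical, and the rules are reduced to "ESC followed by `$` or `(` suggests a 7-bit encoding" and "a byte of 0xA0 or above suggests an 8-bit encoding"; the real function distinguishes many more cases.

```c
#include <stddef.h>

/* Hypothetical category bits; the real code uses CODING_CATEGORY_MASK_XXX.  */
enum { TOY_CAT_7BIT = 1, TOY_CAT_8BIT = 2 };

/* Return the set of categories that are both still possible and
   positively indicated by the text in [SRC, SRC_END).  */
int
toy_detect (const unsigned char *src, const unsigned char *src_end)
{
  int mask = TOY_CAT_7BIT | TOY_CAT_8BIT;  /* Not yet ruled out.  */
  int mask_found = 0;                      /* Positively indicated.  */

  while (mask && src < src_end)
    {
      int c = *src++;

      if (c == 0x1B)            /* ESC */
        {
          if (src < src_end && (*src == '$' || *src == '('))
            {
              /* Looks like a designation: evidence for 7-bit,
                 and against the 8-bit category.  */
              mask &= ~TOY_CAT_8BIT;
              mask_found |= TOY_CAT_7BIT;
            }
          /* Any other escape sequence is ignored, not treated as fatal.  */
        }
      else if (c >= 0xA0)
        {
          /* High-bit byte: evidence for 8-bit, and against 7-bit.  */
          mask &= ~TOY_CAT_7BIT;
          mask_found |= TOY_CAT_8BIT;
        }
    }
  return mask & mask_found;
}
```

Under these simplified rules, plain ASCII text yields 0 because nothing was positively found, whereas returning `mask` alone would have reported every category as a match. Text containing both a designation and a high-bit byte also yields 0, since each piece of evidence rules out the other category.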
diff --git a/src/coding.c b/src/coding.c index 752ca765994..60bfed358a7 100644 --- a/src/coding.c +++ b/src/coding.c | |||
| @@ -79,8 +79,8 @@ Boston, MA 02111-1307, USA. */ | |||
| 79 | (Code Conversion Language) programs. Emacs executes the CCL program | 79 | (Code Conversion Language) programs. Emacs executes the CCL program |
| 80 | while reading/writing. | 80 | while reading/writing. |
| 81 | 81 | ||
| 82 | Emacs represents a coding-system by a Lisp symbol that has a property | 82 | Emacs represents a coding system by a Lisp symbol that has a property |
| 83 | `coding-system'. But, before actually using the coding-system, the | 83 | `coding-system'. But, before actually using the coding system, the |
| 84 | information about it is set in a structure of type `struct | 84 | information about it is set in a structure of type `struct |
| 85 | coding_system' for rapid processing. See section 6 for more details. | 85 | coding_system' for rapid processing. See section 6 for more details. |
| 86 | 86 | ||
| @@ -91,7 +91,8 @@ Boston, MA 02111-1307, USA. */ | |||
| 91 | How end-of-line of a text is encoded depends on a system. For | 91 | How end-of-line of a text is encoded depends on a system. For |
| 92 | instance, Unix's format is just one byte of `line-feed' code, | 92 | instance, Unix's format is just one byte of `line-feed' code, |
| 93 | whereas DOS's format is two-byte sequence of `carriage-return' and | 93 | whereas DOS's format is two-byte sequence of `carriage-return' and |
| 94 | `line-feed' codes. MacOS's format is one byte of `carriage-return'. | 94 | `line-feed' codes. MacOS's format is usually one byte of |
| 95 | `carriage-return'. | ||
| 95 | 96 | ||
| 96 | Since text characters encoding and end-of-line encoding are | 97 | Since text characters encoding and end-of-line encoding are |
| 97 | independent, any coding system described above can take | 98 | independent, any coding system described above can take |
| @@ -120,16 +121,24 @@ detect_coding_emacs_mule (src, src_end) | |||
| 120 | 121 | ||
| 121 | These functions decode SRC_BYTES length text at SOURCE encoded in | 122 | These functions decode SRC_BYTES length text at SOURCE encoded in |
| 122 | CODING to Emacs' internal format (emacs-mule). The resulting text | 123 | CODING to Emacs' internal format (emacs-mule). The resulting text |
| 123 | goes to a place pointed to by DESTINATION, the length of which should | 124 | goes to a place pointed to by DESTINATION, the length of which |
| 124 | not exceed DST_BYTES. The number of bytes actually processed is | 125 | should not exceed DST_BYTES. These functions set the information of |
| 125 | returned as *CONSUMED. The return value is the length of the decoded | 126 | original and decoded texts in the members produced, produced_char, |
| 126 | text. Below is a template of these functions. */ | 127 | consumed, and consumed_char of the structure *CODING. |
| 128 | |||
| 129 | The return value is an integer (CODING_FINISH_XXX) indicating how | ||
| 130 | the decoding finished. | ||
| 131 | |||
| 132 | DST_BYTES zero means that source area and destination area are | ||
| 133 | overlapped, which means that we can produce a decoded text until it | ||
| 134 | reaches at the head of not-yet-decoded source text. | ||
| 135 | |||
| 136 | Below is a template of these functions. */ | ||
| 127 | #if 0 | 137 | #if 0 |
| 128 | decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes, consumed) | 138 | decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes) |
| 129 | struct coding_system *coding; | 139 | struct coding_system *coding; |
| 130 | unsigned char *source, *destination; | 140 | unsigned char *source, *destination; |
| 131 | int src_bytes, dst_bytes; | 141 | int src_bytes, dst_bytes; |
| 132 | int *consumed; | ||
| 133 | { | 142 | { |
| 134 | ... | 143 | ... |
| 135 | } | 144 | } |
| @@ -140,15 +149,23 @@ decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes, consumed) | |||
| 140 | These functions encode SRC_BYTES length text at SOURCE of Emacs' | 149 | These functions encode SRC_BYTES length text at SOURCE of Emacs' |
| 141 | internal format (emacs-mule) to CODING. The resulting text goes to | 150 | internal format (emacs-mule) to CODING. The resulting text goes to |
| 142 | a place pointed to by DESTINATION, the length of which should not | 151 | a place pointed to by DESTINATION, the length of which should not |
| 143 | exceed DST_BYTES. The number of bytes actually processed is | 152 | exceed DST_BYTES. These functions set the information of |
| 144 | returned as *CONSUMED. The return value is the length of the | 153 | original and encoded texts in the members produced, produced_char, |
| 145 | encoded text. Below is a template of these functions. */ | 154 | consumed, and consumed_char of the structure *CODING. |
| 155 | |||
| 156 | The return value is an integer (CODING_FINISH_XXX) indicating how | ||
| 157 | the encoding finished. | ||
| 158 | |||
| 159 | DST_BYTES zero means that source area and destination area are | ||
| 160 | overlapped, which means that we can produce a decoded text until it | ||
| 161 | reaches at the head of not-yet-decoded source text. | ||
| 162 | |||
| 163 | Below is a template of these functions. */ | ||
| 146 | #if 0 | 164 | #if 0 |
| 147 | encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes, consumed) | 165 | encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes) |
| 148 | struct coding_system *coding; | 166 | struct coding_system *coding; |
| 149 | unsigned char *source, *destination; | 167 | unsigned char *source, *destination; |
| 150 | int src_bytes, dst_bytes; | 168 | int src_bytes, dst_bytes; |
| 151 | int *consumed; | ||
| 152 | { | 169 | { |
| 153 | ... | 170 | ... |
| 154 | } | 171 | } |
| @@ -200,7 +217,10 @@ encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes, consumed) | |||
| 200 | if (COMPOSING_P (coding->composing)) \ | 217 | if (COMPOSING_P (coding->composing)) \ |
| 201 | *dst++ = 0xA0, *dst++ = (c) | 0x80; \ | 218 | *dst++ = 0xA0, *dst++ = (c) | 0x80; \ |
| 202 | else \ | 219 | else \ |
| 203 | *dst++ = (c); \ | 220 | { \ |
| 221 | *dst++ = (c); \ | ||
| 222 | coding->produced_char++; \ | ||
| 223 | } \ | ||
| 204 | } while (0) | 224 | } while (0) |
| 205 | 225 | ||
| 206 | /* Decode one DIMENSION1 character whose charset is CHARSET and whose | 226 | /* Decode one DIMENSION1 character whose charset is CHARSET and whose |
| @@ -212,7 +232,10 @@ encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes, consumed) | |||
| 212 | if (COMPOSING_P (coding->composing)) \ | 232 | if (COMPOSING_P (coding->composing)) \ |
| 213 | *dst++ = leading_code + 0x20; \ | 233 | *dst++ = leading_code + 0x20; \ |
| 214 | else \ | 234 | else \ |
| 215 | *dst++ = leading_code; \ | 235 | { \ |
| 236 | *dst++ = leading_code; \ | ||
| 237 | coding->produced_char++; \ | ||
| 238 | } \ | ||
| 216 | if (leading_code = CHARSET_LEADING_CODE_EXT (charset)) \ | 239 | if (leading_code = CHARSET_LEADING_CODE_EXT (charset)) \ |
| 217 | *dst++ = leading_code; \ | 240 | *dst++ = leading_code; \ |
| 218 | *dst++ = (c) | 0x80; \ | 241 | *dst++ = (c) | 0x80; \ |
| @@ -260,6 +283,8 @@ Lisp_Object Qcall_process, Qcall_process_region, Qprocess_argument; | |||
| 260 | Lisp_Object Qstart_process, Qopen_network_stream; | 283 | Lisp_Object Qstart_process, Qopen_network_stream; |
| 261 | Lisp_Object Qtarget_idx; | 284 | Lisp_Object Qtarget_idx; |
| 262 | 285 | ||
| 286 | Lisp_Object Vselect_safe_coding_system_function; | ||
| 287 | |||
| 263 | /* Mnemonic character of each format of end-of-line. */ | 288 | /* Mnemonic character of each format of end-of-line. */ |
| 264 | int eol_mnemonic_unix, eol_mnemonic_dos, eol_mnemonic_mac; | 289 | int eol_mnemonic_unix, eol_mnemonic_dos, eol_mnemonic_mac; |
| 265 | /* Mnemonic character to indicate format of end-of-line is not yet | 290 | /* Mnemonic character to indicate format of end-of-line is not yet |
| @@ -276,8 +301,9 @@ Lisp_Object Vcoding_system_list, Vcoding_system_alist; | |||
| 276 | 301 | ||
| 277 | Lisp_Object Qcoding_system_p, Qcoding_system_error; | 302 | Lisp_Object Qcoding_system_p, Qcoding_system_error; |
| 278 | 303 | ||
| 279 | /* Coding system emacs-mule is for converting only end-of-line format. */ | 304 | /* Coding system emacs-mule and raw-text are for converting only |
| 280 | Lisp_Object Qemacs_mule; | 305 | end-of-line format. */ |
| 306 | Lisp_Object Qemacs_mule, Qraw_text; | ||
| 281 | 307 | ||
| 282 | /* Coding-systems are handed between Emacs Lisp programs and C internal | 308 | /* Coding-systems are handed between Emacs Lisp programs and C internal |
| 283 | routines by the following three variables. */ | 309 | routines by the following three variables. */ |
| @@ -311,19 +337,20 @@ Lisp_Object Vnetwork_coding_system_alist; | |||
| 311 | 337 | ||
| 312 | #endif /* emacs */ | 338 | #endif /* emacs */ |
| 313 | 339 | ||
| 314 | Lisp_Object Qcoding_category_index; | 340 | Lisp_Object Qcoding_category, Qcoding_category_index; |
| 315 | 341 | ||
| 316 | /* List of symbols `coding-category-xxx' ordered by priority. */ | 342 | /* List of symbols `coding-category-xxx' ordered by priority. */ |
| 317 | Lisp_Object Vcoding_category_list; | 343 | Lisp_Object Vcoding_category_list; |
| 318 | 344 | ||
| 319 | /* Table of coding-systems currently assigned to each coding-category. */ | 345 | /* Table of coding categories (Lisp symbols). */ |
| 320 | Lisp_Object coding_category_table[CODING_CATEGORY_IDX_MAX]; | 346 | Lisp_Object Vcoding_category_table; |
| 321 | 347 | ||
| 322 | /* Table of names of symbol for each coding-category. */ | 348 | /* Table of names of symbol for each coding-category. */ |
| 323 | char *coding_category_name[CODING_CATEGORY_IDX_MAX] = { | 349 | char *coding_category_name[CODING_CATEGORY_IDX_MAX] = { |
| 324 | "coding-category-emacs-mule", | 350 | "coding-category-emacs-mule", |
| 325 | "coding-category-sjis", | 351 | "coding-category-sjis", |
| 326 | "coding-category-iso-7", | 352 | "coding-category-iso-7", |
| 353 | "coding-category-iso-7-tight", | ||
| 327 | "coding-category-iso-8-1", | 354 | "coding-category-iso-8-1", |
| 328 | "coding-category-iso-8-2", | 355 | "coding-category-iso-8-2", |
| 329 | "coding-category-iso-7-else", | 356 | "coding-category-iso-7-else", |
| @@ -333,6 +360,10 @@ char *coding_category_name[CODING_CATEGORY_IDX_MAX] = { | |||
| 333 | "coding-category-binary" | 360 | "coding-category-binary" |
| 334 | }; | 361 | }; |
| 335 | 362 | ||
| 363 | /* Table pointers to coding systems corresponding to each coding | ||
| 364 | categories. */ | ||
| 365 | struct coding_system *coding_system_table[CODING_CATEGORY_IDX_MAX]; | ||
| 366 | |||
| 336 | /* Flag to tell if we look up unification table on character code | 367 | /* Flag to tell if we look up unification table on character code |
| 337 | conversion. */ | 368 | conversion. */ |
| 338 | Lisp_Object Venable_character_unification; | 369 | Lisp_Object Venable_character_unification; |
| @@ -399,7 +430,7 @@ enum emacs_code_class_type emacs_code_class[256]; | |||
| 399 | 430 | ||
| 400 | /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". | 431 | /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". |
| 401 | Check if a text is encoded in Emacs' internal format. If it is, | 432 | Check if a text is encoded in Emacs' internal format. If it is, |
| 402 | return CODING_CATEGORY_MASK_EMASC_MULE, else return 0. */ | 433 | return CODING_CATEGORY_MASK_EMACS_MULE, else return 0. */ |
| 403 | 434 | ||
| 404 | int | 435 | int |
| 405 | detect_coding_emacs_mule (src, src_end) | 436 | detect_coding_emacs_mule (src, src_end) |
| @@ -609,10 +640,19 @@ detect_coding_emacs_mule (src, src_end) | |||
| 609 | 640 | ||
| 610 | enum iso_code_class_type iso_code_class[256]; | 641 | enum iso_code_class_type iso_code_class[256]; |
| 611 | 642 | ||
| 643 | #define CHARSET_OK(idx, charset) \ | ||
| 644 | (CODING_SPEC_ISO_REQUESTED_DESIGNATION \ | ||
| 645 | (coding_system_table[idx], charset) \ | ||
| 646 | != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION) | ||
| 647 | |||
| 648 | #define SHIFT_OUT_OK(idx) \ | ||
| 649 | (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding_system_table[idx], 1) >= 0) | ||
| 650 | |||
| 612 | /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". | 651 | /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". |
| 613 | Check if a text is encoded in ISO2022. If it is, returns an | 652 | Check if a text is encoded in ISO2022. If it is, returns an |
| 614 | integer in which appropriate flag bits any of: | 653 | integer in which appropriate flag bits any of: |
| 615 | CODING_CATEGORY_MASK_ISO_7 | 654 | CODING_CATEGORY_MASK_ISO_7 |
| 655 | CODING_CATEGORY_MASK_ISO_7_TIGHT | ||
| 616 | CODING_CATEGORY_MASK_ISO_8_1 | 656 | CODING_CATEGORY_MASK_ISO_8_1 |
| 617 | CODING_CATEGORY_MASK_ISO_8_2 | 657 | CODING_CATEGORY_MASK_ISO_8_2 |
| 618 | CODING_CATEGORY_MASK_ISO_7_ELSE | 658 | CODING_CATEGORY_MASK_ISO_7_ELSE |
| @@ -624,24 +664,12 @@ int | |||
| 624 | detect_coding_iso2022 (src, src_end) | 664 | detect_coding_iso2022 (src, src_end) |
| 625 | unsigned char *src, *src_end; | 665 | unsigned char *src, *src_end; |
| 626 | { | 666 | { |
| 627 | int mask = (CODING_CATEGORY_MASK_ISO_7 | 667 | int mask = CODING_CATEGORY_MASK_ISO; |
| 628 | | CODING_CATEGORY_MASK_ISO_8_1 | 668 | int mask_found = 0; |
| 629 | | CODING_CATEGORY_MASK_ISO_8_2 | 669 | int reg[4], shift_out = 0; |
| 630 | | CODING_CATEGORY_MASK_ISO_7_ELSE | 670 | int c, c1, i, charset; |
| 631 | | CODING_CATEGORY_MASK_ISO_8_ELSE | ||
| 632 | ); | ||
| 633 | int g1 = 0; /* 1 iff designating to G1. */ | ||
| 634 | int c, i; | ||
| 635 | struct coding_system coding_iso_8_1, coding_iso_8_2; | ||
| 636 | |||
| 637 | /* Coding systems of these categories may accept latin extra codes. */ | ||
| 638 | setup_coding_system | ||
| 639 | (XSYMBOL (coding_category_table[CODING_CATEGORY_IDX_ISO_8_1])->value, | ||
| 640 | &coding_iso_8_1); | ||
| 641 | setup_coding_system | ||
| 642 | (XSYMBOL (coding_category_table[CODING_CATEGORY_IDX_ISO_8_2])->value, | ||
| 643 | &coding_iso_8_2); | ||
| 644 | 671 | ||
| 672 | reg[0] = CHARSET_ASCII, reg[1] = reg[2] = reg[3] = -1; | ||
| 645 | while (mask && src < src_end) | 673 | while (mask && src < src_end) |
| 646 | { | 674 | { |
| 647 | c = *src++; | 675 | c = *src++; |
| @@ -651,15 +679,17 @@ detect_coding_iso2022 (src, src_end) | |||
| 651 | if (src >= src_end) | 679 | if (src >= src_end) |
| 652 | break; | 680 | break; |
| 653 | c = *src++; | 681 | c = *src++; |
| 654 | if ((c >= '(' && c <= '/')) | 682 | if (c >= '(' && c <= '/') |
| 655 | { | 683 | { |
| 656 | /* Designation sequence for a charset of dimension 1. */ | 684 | /* Designation sequence for a charset of dimension 1. */ |
| 657 | if (src >= src_end) | 685 | if (src >= src_end) |
| 658 | break; | 686 | break; |
| 659 | c = *src++; | 687 | c1 = *src++; |
| 660 | if (c < ' ' || c >= 0x80) | 688 | if (c1 < ' ' || c1 >= 0x80 |
| 661 | /* Invalid designation sequence. */ | 689 | || (charset = iso_charset_table[0][c >= ','][c1]) < 0) |
| 662 | return 0; | 690 | /* Invalid designation sequence. Just ignore. */ |
| 691 | break; | ||
| 692 | reg[(c - '(') % 4] = charset; | ||
| 663 | } | 693 | } |
| 664 | else if (c == '$') | 694 | else if (c == '$') |
| 665 | { | 695 | { |
| @@ -669,37 +699,91 @@ detect_coding_iso2022 (src, src_end) | |||
| 669 | c = *src++; | 699 | c = *src++; |
| 670 | if (c >= '@' && c <= 'B') | 700 | if (c >= '@' && c <= 'B') |
| 671 | /* Designation for JISX0208.1978, GB2312, or JISX0208. */ | 701 | /* Designation for JISX0208.1978, GB2312, or JISX0208. */ |
| 672 | ; | 702 | reg[0] = charset = iso_charset_table[1][0][c]; |
| 673 | else if (c >= '(' && c <= '/') | 703 | else if (c >= '(' && c <= '/') |
| 674 | { | 704 | { |
| 675 | if (src >= src_end) | 705 | if (src >= src_end) |
| 676 | break; | 706 | break; |
| 677 | c = *src++; | 707 | c1 = *src++; |
| 678 | if (c < ' ' || c >= 0x80) | 708 | if (c1 < ' ' || c1 >= 0x80 |
| 679 | /* Invalid designation sequence. */ | 709 | || (charset = iso_charset_table[1][c >= ','][c1]) < 0) |
| 680 | return 0; | 710 | /* Invalid designation sequence. Just ignore. */ |
| 711 | break; | ||
| 712 | reg[(c - '(') % 4] = charset; | ||
| 681 | } | 713 | } |
| 682 | else | 714 | else |
| 683 | /* Invalid designation sequence. */ | 715 | /* Invalid designation sequence. Just ignore. */ |
| 684 | return 0; | 716 | break; |
| 717 | } | ||
| 718 | else if (c == 'N' || c == 'n') | ||
| 719 | { | ||
| 720 | if (shift_out == 0 | ||
| 721 | && (reg[1] >= 0 | ||
| 722 | || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_7_ELSE) | ||
| 723 | || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_8_ELSE))) | ||
| 724 | { | ||
| 725 | /* Locking shift out. */ | ||
| 726 | mask &= ~CODING_CATEGORY_MASK_ISO_7BIT; | ||
| 727 | mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT; | ||
| 728 | shift_out = 1; | ||
| 729 | } | ||
| 730 | break; | ||
| 731 | } | ||
| 732 | else if (c == 'O' || c == 'o') | ||
| 733 | { | ||
| 734 | if (shift_out == 1) | ||
| 735 | { | ||
| 736 | /* Locking shift in. */ | ||
| 737 | mask &= ~CODING_CATEGORY_MASK_ISO_7BIT; | ||
| 738 | mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT; | ||
| 739 | shift_out = 0; | ||
| 740 | } | ||
| 741 | break; | ||
| 685 | } | 742 | } |
| 686 | else if (c == 'N' || c == 'O' || c == 'n' || c == 'o') | ||
| 687 | /* Locking shift. */ | ||
| 688 | mask &= (CODING_CATEGORY_MASK_ISO_7_ELSE | ||
| 689 | | CODING_CATEGORY_MASK_ISO_8_ELSE); | ||
| 690 | else if (c == '0' || c == '1' || c == '2') | 743 | else if (c == '0' || c == '1' || c == '2') |
| 691 | /* Start/end composition. */ | 744 | /* Start/end composition. Just ignore. */ |
| 692 | ; | 745 | break; |
| 693 | else | 746 | else |
| 694 | /* Invalid escape sequence. */ | 747 | /* Invalid escape sequence. Just ignore. */ |
| 695 | return 0; | 748 | break; |
| 749 | |||
| 750 | /* We found a valid designation sequence for CHARSET. */ | ||
| 751 | mask &= ~CODING_CATEGORY_MASK_ISO_8BIT; | ||
| 752 | if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7, charset)) | ||
| 753 | mask_found |= CODING_CATEGORY_MASK_ISO_7; | ||
| 754 | else | ||
| 755 | mask &= ~CODING_CATEGORY_MASK_ISO_7; | ||
| 756 | if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT, charset)) | ||
| 757 | mask_found |= CODING_CATEGORY_MASK_ISO_7_TIGHT; | ||
| 758 | else | ||
| 759 | mask &= ~CODING_CATEGORY_MASK_ISO_7_TIGHT; | ||
| 760 | if (! CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_ELSE, charset)) | ||
| 761 | mask &= ~CODING_CATEGORY_MASK_ISO_7_ELSE; | ||
| 762 | if (! CHARSET_OK (CODING_CATEGORY_IDX_ISO_8_ELSE, charset)) | ||
| 763 | mask &= ~CODING_CATEGORY_MASK_ISO_8_ELSE; | ||
| 696 | break; | 764 | break; |
| 697 | 765 | ||
| 698 | case ISO_CODE_SO: | 766 | case ISO_CODE_SO: |
| 699 | mask &= (CODING_CATEGORY_MASK_ISO_7_ELSE | 767 | if (shift_out == 0 |
| 700 | | CODING_CATEGORY_MASK_ISO_8_ELSE); | 768 | && (reg[1] >= 0 |
| 769 | || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_7_ELSE) | ||
| 770 | || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_8_ELSE))) | ||
| 771 | { | ||
| 772 | /* Locking shift out. */ | ||
| 773 | mask &= ~CODING_CATEGORY_MASK_ISO_7BIT; | ||
| 774 | mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT; | ||
| 775 | } | ||
| 701 | break; | 776 | break; |
| 702 | 777 | ||
| 778 | case ISO_CODE_SI: | ||
| 779 | if (shift_out == 1) | ||
| 780 | { | ||
| 781 | /* Locking shift in. */ | ||
| 782 | mask &= ~CODING_CATEGORY_MASK_ISO_7BIT; | ||
| 783 | mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT; | ||
| 784 | } | ||
| 785 | break; | ||
| 786 | |||
| 703 | case ISO_CODE_CSI: | 787 | case ISO_CODE_CSI: |
| 704 | case ISO_CODE_SS2: | 788 | case ISO_CODE_SS2: |
| 705 | case ISO_CODE_SS3: | 789 | case ISO_CODE_SS3: |
| @@ -708,20 +792,25 @@ detect_coding_iso2022 (src, src_end) | |||
| 708 | 792 | ||
| 709 | if (c != ISO_CODE_CSI) | 793 | if (c != ISO_CODE_CSI) |
| 710 | { | 794 | { |
| 711 | if (coding_iso_8_1.flags & CODING_FLAG_ISO_SINGLE_SHIFT) | 795 | if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags |
| 796 | & CODING_FLAG_ISO_SINGLE_SHIFT) | ||
| 712 | newmask |= CODING_CATEGORY_MASK_ISO_8_1; | 797 | newmask |= CODING_CATEGORY_MASK_ISO_8_1; |
| 713 | if (coding_iso_8_2.flags & CODING_FLAG_ISO_SINGLE_SHIFT) | 798 | if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags |
| 799 | & CODING_FLAG_ISO_SINGLE_SHIFT) | ||
| 714 | newmask |= CODING_CATEGORY_MASK_ISO_8_2; | 800 | newmask |= CODING_CATEGORY_MASK_ISO_8_2; |
| 715 | } | 801 | } |
| 716 | if (VECTORP (Vlatin_extra_code_table) | 802 | if (VECTORP (Vlatin_extra_code_table) |
| 717 | && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c])) | 803 | && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c])) |
| 718 | { | 804 | { |
| 719 | if (coding_iso_8_1.flags & CODING_FLAG_ISO_LATIN_EXTRA) | 805 | if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags |
| 806 | & CODING_FLAG_ISO_LATIN_EXTRA) | ||
| 720 | newmask |= CODING_CATEGORY_MASK_ISO_8_1; | 807 | newmask |= CODING_CATEGORY_MASK_ISO_8_1; |
| 721 | if (coding_iso_8_2.flags & CODING_FLAG_ISO_LATIN_EXTRA) | 808 | if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags |
| 809 | & CODING_FLAG_ISO_LATIN_EXTRA) | ||
| 722 | newmask |= CODING_CATEGORY_MASK_ISO_8_2; | 810 | newmask |= CODING_CATEGORY_MASK_ISO_8_2; |
| 723 | } | 811 | } |
| 724 | mask &= newmask; | 812 | mask &= newmask; |
| 813 | mask_found |= newmask; | ||
| 725 | } | 814 | } |
| 726 | break; | 815 | break; |
| 727 | 816 | ||
| @@ -735,11 +824,14 @@ detect_coding_iso2022 (src, src_end) | |||
| 735 | { | 824 | { |
| 736 | int newmask = 0; | 825 | int newmask = 0; |
| 737 | 826 | ||
| 738 | if (coding_iso_8_1.flags & CODING_FLAG_ISO_LATIN_EXTRA) | 827 | if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags |
| 828 | & CODING_FLAG_ISO_LATIN_EXTRA) | ||
| 739 | newmask |= CODING_CATEGORY_MASK_ISO_8_1; | 829 | newmask |= CODING_CATEGORY_MASK_ISO_8_1; |
| 740 | if (coding_iso_8_2.flags & CODING_FLAG_ISO_LATIN_EXTRA) | 830 | if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags |
| 831 | & CODING_FLAG_ISO_LATIN_EXTRA) | ||
| 741 | newmask |= CODING_CATEGORY_MASK_ISO_8_2; | 832 | newmask |= CODING_CATEGORY_MASK_ISO_8_2; |
| 742 | mask &= newmask; | 833 | mask &= newmask; |
| 834 | mask_found |= newmask; | ||
| 743 | } | 835 | } |
| 744 | else | 836 | else |
| 745 | return 0; | 837 | return 0; |
| @@ -748,18 +840,21 @@ detect_coding_iso2022 (src, src_end) | |||
| 748 | { | 840 | { |
| 749 | unsigned char *src_begin = src; | 841 | unsigned char *src_begin = src; |
| 750 | 842 | ||
| 751 | mask &= ~(CODING_CATEGORY_MASK_ISO_7 | 843 | mask &= ~(CODING_CATEGORY_MASK_ISO_7BIT |
| 752 | | CODING_CATEGORY_MASK_ISO_7_ELSE); | 844 | | CODING_CATEGORY_MASK_ISO_7_ELSE); |
| 845 | mask_found |= CODING_CATEGORY_MASK_ISO_8_1; | ||
| 753 | while (src < src_end && *src >= 0xA0) | 846 | while (src < src_end && *src >= 0xA0) |
| 754 | src++; | 847 | src++; |
| 755 | if ((src - src_begin - 1) & 1 && src < src_end) | 848 | if ((src - src_begin - 1) & 1 && src < src_end) |
| 756 | mask &= ~CODING_CATEGORY_MASK_ISO_8_2; | 849 | mask &= ~CODING_CATEGORY_MASK_ISO_8_2; |
| 850 | else | ||
| 851 | mask_found |= CODING_CATEGORY_MASK_ISO_8_2; | ||
| 757 | } | 852 | } |
| 758 | break; | 853 | break; |
| 759 | } | 854 | } |
| 760 | } | 855 | } |
| 761 | 856 | ||
| 762 | return mask; | 857 | return (mask & mask_found); |
| 763 | } | 858 | } |
| 764 | 859 | ||
| 765 | /* Decode a character of which charset is CHARSET and the 1st position | 860 | /* Decode a character of which charset is CHARSET and the 1st position |
| @@ -808,29 +903,89 @@ detect_coding_iso2022 (src, src_end) | |||
| 808 | } while (0) | 903 | } while (0) |
| 809 | 904 | ||
| 810 | /* Set designation state into CODING. */ | 905 | /* Set designation state into CODING. */ |
| 811 | #define DECODE_DESIGNATION(reg, dimension, chars, final_char) \ | 906 | #define DECODE_DESIGNATION(reg, dimension, chars, final_char) \ |
| 812 | do { \ | 907 | do { \ |
| 813 | int charset = ISO_CHARSET_TABLE (make_number (dimension), \ | 908 | int charset = ISO_CHARSET_TABLE (make_number (dimension), \ |
| 814 | make_number (chars), \ | 909 | make_number (chars), \ |
| 815 | make_number (final_char)); \ | 910 | make_number (final_char)); \ |
| 816 | if (charset >= 0) \ | 911 | if (charset >= 0 \ |
| 817 | { \ | 912 | && CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) == reg) \ |
| 818 | if (coding->direction == 1 \ | 913 | { \ |
| 819 | && CHARSET_REVERSE_CHARSET (charset) >= 0) \ | 914 | if (coding->spec.iso2022.last_invalid_designation_register == 0 \ |
| 820 | charset = CHARSET_REVERSE_CHARSET (charset); \ | 915 | && reg == 0 \ |
| 821 | CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \ | 916 | && charset == CHARSET_ASCII) \ |
| 822 | } \ | 917 | { \ |
| 918 | /* We should insert this designation sequence as is so \ | ||
| 919 | that it is surely written back to a file. */ \ | ||
| 920 | coding->spec.iso2022.last_invalid_designation_register = -1; \ | ||
| 921 | goto label_invalid_code; \ | ||
| 922 | } \ | ||
| 923 | coding->spec.iso2022.last_invalid_designation_register = -1; \ | ||
| 924 | if ((coding->mode & CODING_MODE_DIRECTION) \ | ||
| 925 | && CHARSET_REVERSE_CHARSET (charset) >= 0) \ | ||
| 926 | charset = CHARSET_REVERSE_CHARSET (charset); \ | ||
| 927 | CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \ | ||
| 928 | } \ | ||
| 929 | else \ | ||
| 930 | { \ | ||
| 931 | coding->spec.iso2022.last_invalid_designation_register = reg; \ | ||
| 932 | goto label_invalid_code; \ | ||
| 933 | } \ | ||
| 823 | } while (0) | 934 | } while (0) |
| 824 | 935 | ||
| 936 | /* Check if the current composing sequence contains only valid codes. | ||
| 937 | If the composing sequence doesn't end before SRC_END, return -1. | ||
| 938 | Else, if it contains only valid codes, return 0. | ||
| 939 | Else return the length of the composing sequence. */ | ||
| 940 | |||
| 941 | int check_composing_code (coding, src, src_end) | ||
| 942 | struct coding_system *coding; | ||
| 943 | unsigned char *src, *src_end; | ||
| 944 | { | ||
| 945 | unsigned char *src_start = src; | ||
| 946 | int invalid_code_found = 0; | ||
| 947 | int charset, c, c1, dim; | ||
| 948 | |||
| 949 | while (src < src_end) | ||
| 950 | { | ||
| 951 | if (*src++ != ISO_CODE_ESC) continue; | ||
| 952 | if (src >= src_end) break; | ||
| 953 | if ((c = *src++) == '1') /* end of compsition */ | ||
| 954 | return (invalid_code_found ? src - src_start : 0); | ||
| 955 | if (src + 2 >= src_end) break; | ||
| 956 | if (!coding->flags & CODING_FLAG_ISO_DESIGNATION) | ||
| 957 | invalid_code_found = 1; | ||
| 958 | else | ||
| 959 | { | ||
| 960 | dim = 0; | ||
| 961 | if (c == '$') | ||
| 962 | { | ||
| 963 | dim = 1; | ||
| 964 | c = (*src >= '@' && *src <= 'B') ? '(' : *src++; | ||
| 965 | } | ||
| 966 | if (c >= '(' && c <= '/') | ||
| 967 | { | ||
| 968 | c1 = *src++; | ||
| 969 | if ((c1 < ' ' || c1 >= 0x80) | ||
| 970 | || (charset = iso_charset_table[dim][c >= ','][c1]) < 0 | ||
| 971 | || (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) | ||
| 972 | == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION)) | ||
| 973 | invalid_code_found = 1; | ||
| 974 | } | ||
| 975 | else | ||
| 976 | invalid_code_found = 1; | ||
| 977 | } | ||
| 978 | } | ||
| 979 | return ((coding->mode & CODING_MODE_LAST_BLOCK) ? src_end - src_start : -1); | ||
| 980 | } | ||
| 981 | |||
| 825 | /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */ | 982 | /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */ |
| 826 | 983 | ||
| 827 | int | 984 | int |
| 828 | decode_coding_iso2022 (coding, source, destination, | 985 | decode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes) |
| 829 | src_bytes, dst_bytes, consumed) | ||
| 830 | struct coding_system *coding; | 986 | struct coding_system *coding; |
| 831 | unsigned char *source, *destination; | 987 | unsigned char *source, *destination; |
| 832 | int src_bytes, dst_bytes; | 988 | int src_bytes, dst_bytes; |
| 833 | int *consumed; | ||
| 834 | { | 989 | { |
| 835 | unsigned char *src = source; | 990 | unsigned char *src = source; |
| 836 | unsigned char *src_end = source + src_bytes; | 991 | unsigned char *src_end = source + src_bytes; |
| @@ -845,12 +1000,16 @@ decode_coding_iso2022 (coding, source, destination, | |||
| 845 | int charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); | 1000 | int charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); |
| 846 | int charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1); | 1001 | int charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1); |
| 847 | Lisp_Object unification_table | 1002 | Lisp_Object unification_table |
| 848 | = coding->character_unification_table_for_decode; | 1003 | = coding->character_unification_table_for_decode; |
| 1004 | int result = CODING_FINISH_NORMAL; | ||
| 849 | 1005 | ||
| 850 | if (!NILP (Venable_character_unification) && NILP (unification_table)) | 1006 | if (!NILP (Venable_character_unification) && NILP (unification_table)) |
| 851 | unification_table = Vstandard_character_unification_table_for_decode; | 1007 | unification_table = Vstandard_character_unification_table_for_decode; |
| 852 | 1008 | ||
| 853 | while (src < src_end && dst < adjusted_dst_end) | 1009 | coding->produced_char = 0; |
| 1010 | while (src < src_end && (dst_bytes | ||
| 1011 | ? (dst < adjusted_dst_end) | ||
| 1012 | : (dst < src - 6))) | ||
| 854 | { | 1013 | { |
| 855 | /* SRC_BASE remembers the start position in source in each loop. | 1014 | /* SRC_BASE remembers the start position in source in each loop. |
| 856 | The loop will be exited when there's not enough source text | 1015 | The loop will be exited when there's not enough source text |
| @@ -868,6 +1027,7 @@ decode_coding_iso2022 (coding, source, destination, | |||
| 868 | { | 1027 | { |
| 869 | /* This is SPACE or DEL. */ | 1028 | /* This is SPACE or DEL. */ |
| 870 | *dst++ = c1; | 1029 | *dst++ = c1; |
| 1030 | coding->produced_char++; | ||
| 871 | break; | 1031 | break; |
| 872 | } | 1032 | } |
| 873 | /* This is a graphic character, we fall down ... */ | 1033 | /* This is a graphic character, we fall down ... */ |
| @@ -884,29 +1044,45 @@ decode_coding_iso2022 (coding, source, destination, | |||
| 884 | break; | 1044 | break; |
| 885 | 1045 | ||
| 886 | case ISO_0xA0_or_0xFF: | 1046 | case ISO_0xA0_or_0xFF: |
| 887 | if (charset1 < 0 || CHARSET_CHARS (charset1) == 94) | 1047 | if (charset1 < 0 || CHARSET_CHARS (charset1) == 94 |
| 1048 | || coding->flags & CODING_FLAG_ISO_SEVEN_BITS) | ||
| 888 | { | 1049 | { |
| 889 | /* Invalid code. */ | 1050 | /* Invalid code. */ |
| 890 | *dst++ = c1; | 1051 | *dst++ = c1; |
| 1052 | coding->produced_char++; | ||
| 891 | break; | 1053 | break; |
| 892 | } | 1054 | } |
| 893 | /* This is a graphic character, we fall down ... */ | 1055 | /* This is a graphic character, we fall down ... */ |
| 894 | 1056 | ||
| 895 | case ISO_graphic_plane_1: | 1057 | case ISO_graphic_plane_1: |
| 896 | DECODE_ISO_CHARACTER (charset1, c1); | 1058 | if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) |
| 1059 | { | ||
| 1060 | /* Invalid code. */ | ||
| 1061 | *dst++ = c1; | ||
| 1062 | coding->produced_char++; | ||
| 1063 | } | ||
| 1064 | else | ||
| 1065 | DECODE_ISO_CHARACTER (charset1, c1); | ||
| 897 | break; | 1066 | break; |
| 898 | 1067 | ||
| 899 | case ISO_control_code: | 1068 | case ISO_control_code: |
| 900 | /* All ISO2022 control characters in this class have the | 1069 | /* All ISO2022 control characters in this class have the |
| 901 | same representation in Emacs internal format. */ | 1070 | same representation in Emacs internal format. */ |
| 1071 | if (c1 == '\n' | ||
| 1072 | && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL) | ||
| 1073 | && (coding->eol_type == CODING_EOL_CR | ||
| 1074 | || coding->eol_type == CODING_EOL_CRLF)) | ||
| 1075 | { | ||
| 1076 | result = CODING_FINISH_INCONSISTENT_EOL; | ||
| 1077 | goto label_end_of_loop_2; | ||
| 1078 | } | ||
| 902 | *dst++ = c1; | 1079 | *dst++ = c1; |
| 1080 | coding->produced_char++; | ||
| 903 | break; | 1081 | break; |
| 904 | 1082 | ||
| 905 | case ISO_carriage_return: | 1083 | case ISO_carriage_return: |
| 906 | if (coding->eol_type == CODING_EOL_CR) | 1084 | if (coding->eol_type == CODING_EOL_CR) |
| 907 | { | 1085 | *dst++ = '\n'; |
| 908 | *dst++ = '\n'; | ||
| 909 | } | ||
| 910 | else if (coding->eol_type == CODING_EOL_CRLF) | 1086 | else if (coding->eol_type == CODING_EOL_CRLF) |
| 911 | { | 1087 | { |
| 912 | ONE_MORE_BYTE (c1); | 1088 | ONE_MORE_BYTE (c1); |
| @@ -914,35 +1090,46 @@ decode_coding_iso2022 (coding, source, destination, | |||
| 914 | *dst++ = '\n'; | 1090 | *dst++ = '\n'; |
| 915 | else | 1091 | else |
| 916 | { | 1092 | { |
| 1093 | if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL) | ||
| 1094 | { | ||
| 1095 | result = CODING_FINISH_INCONSISTENT_EOL; | ||
| 1096 | goto label_end_of_loop_2; | ||
| 1097 | } | ||
| 917 | src--; | 1098 | src--; |
| 918 | *dst++ = c1; | 1099 | *dst++ = '\r'; |
| 919 | } | 1100 | } |
| 920 | } | 1101 | } |
| 921 | else | 1102 | else |
| 922 | { | 1103 | *dst++ = c1; |
| 923 | *dst++ = c1; | 1104 | coding->produced_char++; |
| 924 | } | ||
| 925 | break; | 1105 | break; |
| 926 | 1106 | ||
| 927 | case ISO_shift_out: | 1107 | case ISO_shift_out: |
| 928 | if (CODING_SPEC_ISO_DESIGNATION (coding, 1) < 0) | 1108 | if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT) |
| 929 | goto label_invalid_escape_sequence; | 1109 | || CODING_SPEC_ISO_DESIGNATION (coding, 1) < 0) |
| 1110 | goto label_invalid_code; | ||
| 930 | CODING_SPEC_ISO_INVOCATION (coding, 0) = 1; | 1111 | CODING_SPEC_ISO_INVOCATION (coding, 0) = 1; |
| 931 | charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); | 1112 | charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); |
| 932 | break; | 1113 | break; |
| 933 | 1114 | ||
| 934 | case ISO_shift_in: | 1115 | case ISO_shift_in: |
| 1116 | if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)) | ||
| 1117 | goto label_invalid_code; | ||
| 935 | CODING_SPEC_ISO_INVOCATION (coding, 0) = 0; | 1118 | CODING_SPEC_ISO_INVOCATION (coding, 0) = 0; |
| 936 | charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); | 1119 | charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); |
| 937 | break; | 1120 | break; |
| 938 | 1121 | ||
| 939 | case ISO_single_shift_2_7: | 1122 | case ISO_single_shift_2_7: |
| 940 | case ISO_single_shift_2: | 1123 | case ISO_single_shift_2: |
| 1124 | if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)) | ||
| 1125 | goto label_invalid_code; | ||
| 941 | /* SS2 is handled as an escape sequence of ESC 'N' */ | 1126 | /* SS2 is handled as an escape sequence of ESC 'N' */ |
| 942 | c1 = 'N'; | 1127 | c1 = 'N'; |
| 943 | goto label_escape_sequence; | 1128 | goto label_escape_sequence; |
| 944 | 1129 | ||
| 945 | case ISO_single_shift_3: | 1130 | case ISO_single_shift_3: |
| 1131 | if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)) | ||
| 1132 | goto label_invalid_code; | ||
| 946 | /* SS3 is handled as an escape sequence of ESC 'O' */ | 1133 | /* SS3 is handled as an escape sequence of ESC 'O' */ |
| 947 | c1 = 'O'; | 1134 | c1 = 'O'; |
| 948 | goto label_escape_sequence; | 1135 | goto label_escape_sequence; |
| @@ -963,14 +1150,16 @@ decode_coding_iso2022 (coding, source, destination, | |||
| 963 | case '&': /* revision of following character set */ | 1150 | case '&': /* revision of following character set */ |
| 964 | ONE_MORE_BYTE (c1); | 1151 | ONE_MORE_BYTE (c1); |
| 965 | if (!(c1 >= '@' && c1 <= '~')) | 1152 | if (!(c1 >= '@' && c1 <= '~')) |
| 966 | goto label_invalid_escape_sequence; | 1153 | goto label_invalid_code; |
| 967 | ONE_MORE_BYTE (c1); | 1154 | ONE_MORE_BYTE (c1); |
| 968 | if (c1 != ISO_CODE_ESC) | 1155 | if (c1 != ISO_CODE_ESC) |
| 969 | goto label_invalid_escape_sequence; | 1156 | goto label_invalid_code; |
| 970 | ONE_MORE_BYTE (c1); | 1157 | ONE_MORE_BYTE (c1); |
| 971 | goto label_escape_sequence; | 1158 | goto label_escape_sequence; |
| 972 | 1159 | ||
| 973 | case '$': /* designation of 2-byte character set */ | 1160 | case '$': /* designation of 2-byte character set */ |
| 1161 | if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION)) | ||
| 1162 | goto label_invalid_code; | ||
| 974 | ONE_MORE_BYTE (c1); | 1163 | ONE_MORE_BYTE (c1); |
| 975 | if (c1 >= '@' && c1 <= 'B') | 1164 | if (c1 >= '@' && c1 <= 'B') |
| 976 | { /* designation of JISX0208.1978, GB2312.1980, | 1165 | { /* designation of JISX0208.1978, GB2312.1980, |
| @@ -988,84 +1177,118 @@ decode_coding_iso2022 (coding, source, destination, | |||
| 988 | DECODE_DESIGNATION (c1 - 0x2C, 2, 96, c2); | 1177 | DECODE_DESIGNATION (c1 - 0x2C, 2, 96, c2); |
| 989 | } | 1178 | } |
| 990 | else | 1179 | else |
| 991 | goto label_invalid_escape_sequence; | 1180 | goto label_invalid_code; |
| 992 | break; | 1181 | break; |
| 993 | 1182 | ||
| 994 | case 'n': /* invocation of locking-shift-2 */ | 1183 | case 'n': /* invocation of locking-shift-2 */ |
| 995 | if (CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0) | 1184 | if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT) |
| 996 | goto label_invalid_escape_sequence; | 1185 | || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0) |
| 1186 | goto label_invalid_code; | ||
| 997 | CODING_SPEC_ISO_INVOCATION (coding, 0) = 2; | 1187 | CODING_SPEC_ISO_INVOCATION (coding, 0) = 2; |
| 998 | charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); | 1188 | charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); |
| 999 | break; | 1189 | break; |
| 1000 | 1190 | ||
| 1001 | case 'o': /* invocation of locking-shift-3 */ | 1191 | case 'o': /* invocation of locking-shift-3 */ |
| 1002 | if (CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0) | 1192 | if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT) |
| 1003 | goto label_invalid_escape_sequence; | 1193 | || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0) |
| 1194 | goto label_invalid_code; | ||
| 1004 | CODING_SPEC_ISO_INVOCATION (coding, 0) = 3; | 1195 | CODING_SPEC_ISO_INVOCATION (coding, 0) = 3; |
| 1005 | charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); | 1196 | charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); |
| 1006 | break; | 1197 | break; |
| 1007 | 1198 | ||
| 1008 | case 'N': /* invocation of single-shift-2 */ | 1199 | case 'N': /* invocation of single-shift-2 */ |
| 1009 | if (CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0) | 1200 | if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT) |
| 1010 | goto label_invalid_escape_sequence; | 1201 | || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0) |
| 1202 | goto label_invalid_code; | ||
| 1011 | ONE_MORE_BYTE (c1); | 1203 | ONE_MORE_BYTE (c1); |
| 1012 | charset = CODING_SPEC_ISO_DESIGNATION (coding, 2); | 1204 | charset = CODING_SPEC_ISO_DESIGNATION (coding, 2); |
| 1013 | DECODE_ISO_CHARACTER (charset, c1); | 1205 | DECODE_ISO_CHARACTER (charset, c1); |
| 1014 | break; | 1206 | break; |
| 1015 | 1207 | ||
| 1016 | case 'O': /* invocation of single-shift-3 */ | 1208 | case 'O': /* invocation of single-shift-3 */ |
| 1017 | if (CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0) | 1209 | if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT) |
| 1018 | goto label_invalid_escape_sequence; | 1210 | || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0) |
| 1211 | goto label_invalid_code; | ||
| 1019 | ONE_MORE_BYTE (c1); | 1212 | ONE_MORE_BYTE (c1); |
| 1020 | charset = CODING_SPEC_ISO_DESIGNATION (coding, 3); | 1213 | charset = CODING_SPEC_ISO_DESIGNATION (coding, 3); |
| 1021 | DECODE_ISO_CHARACTER (charset, c1); | 1214 | DECODE_ISO_CHARACTER (charset, c1); |
| 1022 | break; | 1215 | break; |
| 1023 | 1216 | ||
| 1024 | case '0': /* start composing without embedded rules */ | 1217 | case '0': case '2': /* start composing */ |
| 1025 | coding->composing = COMPOSING_NO_RULE_HEAD; | 1218 | /* Before processing composing, we must be sure that all |
| 1219 | characters being composed are supported by CODING. | ||
| 1220 | If not, we must give up composing and insert the | ||
| 1221 | bunch of codes for composing as is without decoding. */ | ||
| 1222 | { | ||
| 1223 | int result1; | ||
| 1224 | |||
| 1225 | result1 = check_composing_code (coding, src, src_end); | ||
| 1226 | if (result1 == 0) | ||
| 1227 | coding->composing = (c1 == '0' | ||
| 1228 | ? COMPOSING_NO_RULE_HEAD | ||
| 1229 | : COMPOSING_WITH_RULE_HEAD); | ||
| 1230 | else if (result1 > 0) | ||
| 1231 | { | ||
| 1232 | if (result1 + 2 < (dst_bytes ? dst_end : src_base) - dst) | ||
| 1233 | { | ||
| 1234 | bcopy (src_base, dst, result1 + 2); | ||
| 1235 | src += result1; | ||
| 1236 | dst += result1 + 2; | ||
| 1237 | coding->produced_char += result1 + 2; | ||
| 1238 | } | ||
| 1239 | else | ||
| 1240 | { | ||
| 1241 | result = CODING_FINISH_INSUFFICIENT_DST; | ||
| 1242 | goto label_end_of_loop_2; | ||
| 1243 | } | ||
| 1244 | } | ||
| 1245 | else | ||
| 1246 | goto label_end_of_loop; | ||
| 1247 | } | ||
| 1026 | break; | 1248 | break; |
| 1027 | 1249 | ||
| 1028 | case '1': /* end composing */ | 1250 | case '1': /* end composing */ |
| 1029 | coding->composing = COMPOSING_NO; | 1251 | coding->composing = COMPOSING_NO; |
| 1030 | break; | 1252 | coding->produced_char++; |
| 1031 | |||
| 1032 | case '2': /* start composing with embedded rules */ | ||
| 1033 | coding->composing = COMPOSING_WITH_RULE_HEAD; | ||
| 1034 | break; | 1253 | break; |
| 1035 | 1254 | ||
| 1036 | case '[': /* specification of direction */ | 1255 | case '[': /* specification of direction */ |
| 1256 | if (coding->flags & CODING_FLAG_ISO_NO_DIRECTION) | ||
| 1257 | goto label_invalid_code; | ||
| 1037 | /* For the moment, nested direction is not supported. | 1258 | /* For the moment, nested direction is not supported. |
| 1038 | So, the value of `coding->direction' is 0 or 1: 0 | 1259 | So, `coding->mode & CODING_MODE_DIRECTION' zero means |
| 1039 | means left-to-right, 1 means right-to-left. */ | 1260 | left-to-right, and nonzero means right-to-left. */ |
| 1040 | ONE_MORE_BYTE (c1); | 1261 | ONE_MORE_BYTE (c1); |
| 1041 | switch (c1) | 1262 | switch (c1) |
| 1042 | { | 1263 | { |
| 1043 | case ']': /* end of the current direction */ | 1264 | case ']': /* end of the current direction */ |
| 1044 | coding->direction = 0; | 1265 | coding->mode &= ~CODING_MODE_DIRECTION; |
| 1045 | 1266 | ||
| 1046 | case '0': /* end of the current direction */ | 1267 | case '0': /* end of the current direction */ |
| 1047 | case '1': /* start of left-to-right direction */ | 1268 | case '1': /* start of left-to-right direction */ |
| 1048 | ONE_MORE_BYTE (c1); | 1269 | ONE_MORE_BYTE (c1); |
| 1049 | if (c1 == ']') | 1270 | if (c1 == ']') |
| 1050 | coding->direction = 0; | 1271 | coding->mode &= ~CODING_MODE_DIRECTION; |
| 1051 | else | 1272 | else |
| 1052 | goto label_invalid_escape_sequence; | 1273 | goto label_invalid_code; |
| 1053 | break; | 1274 | break; |
| 1054 | 1275 | ||
| 1055 | case '2': /* start of right-to-left direction */ | 1276 | case '2': /* start of right-to-left direction */ |
| 1056 | ONE_MORE_BYTE (c1); | 1277 | ONE_MORE_BYTE (c1); |
| 1057 | if (c1 == ']') | 1278 | if (c1 == ']') |
| 1058 | coding->direction= 1; | 1279 | coding->mode |= CODING_MODE_DIRECTION; |
| 1059 | else | 1280 | else |
| 1060 | goto label_invalid_escape_sequence; | 1281 | goto label_invalid_code; |
| 1061 | break; | 1282 | break; |
| 1062 | 1283 | ||
| 1063 | default: | 1284 | default: |
| 1064 | goto label_invalid_escape_sequence; | 1285 | goto label_invalid_code; |
| 1065 | } | 1286 | } |
| 1066 | break; | 1287 | break; |
| 1067 | 1288 | ||
| 1068 | default: | 1289 | default: |
| 1290 | if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION)) | ||
| 1291 | goto label_invalid_code; | ||
| 1069 | if (c1 >= 0x28 && c1 <= 0x2B) | 1292 | if (c1 >= 0x28 && c1 <= 0x2B) |
| 1070 | { /* designation of DIMENSION1_CHARS94 character set */ | 1293 | { /* designation of DIMENSION1_CHARS94 character set */ |
| 1071 | ONE_MORE_BYTE (c2); | 1294 | ONE_MORE_BYTE (c2); |
| @@ -1078,7 +1301,7 @@ decode_coding_iso2022 (coding, source, destination, | |||
| 1078 | } | 1301 | } |
| 1079 | else | 1302 | else |
| 1080 | { | 1303 | { |
| 1081 | goto label_invalid_escape_sequence; | 1304 | goto label_invalid_code; |
| 1082 | } | 1305 | } |
| 1083 | } | 1306 | } |
| 1084 | /* We must update these variables now. */ | 1307 | /* We must update these variables now. */ |
| @@ -1086,41 +1309,43 @@ decode_coding_iso2022 (coding, source, destination, | |||
| 1086 | charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1); | 1309 | charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1); |
| 1087 | break; | 1310 | break; |
| 1088 | 1311 | ||
| 1089 | label_invalid_escape_sequence: | 1312 | label_invalid_code: |
| 1090 | { | 1313 | coding->produced_char += src - src_base; |
| 1091 | int length = src - src_base; | 1314 | while (src_base < src) |
| 1092 | 1315 | *dst++ = *src_base++; | |
| 1093 | bcopy (src_base, dst, length); | ||
| 1094 | dst += length; | ||
| 1095 | } | ||
| 1096 | } | 1316 | } |
| 1097 | continue; | 1317 | continue; |
| 1098 | 1318 | ||
| 1099 | label_end_of_loop: | 1319 | label_end_of_loop: |
| 1100 | coding->carryover_size = src - src_base; | 1320 | result = CODING_FINISH_INSUFFICIENT_SRC; |
| 1101 | bcopy (src_base, coding->carryover, coding->carryover_size); | 1321 | label_end_of_loop_2: |
| 1102 | src = src_base; | 1322 | src = src_base; |
| 1103 | break; | 1323 | break; |
| 1104 | } | 1324 | } |
| 1105 | 1325 | ||
| 1326 | if (result == CODING_FINISH_NORMAL | ||
| 1327 | && src < src_end) | ||
| 1328 | result = CODING_FINISH_INSUFFICIENT_DST; | ||
| 1329 | |||
| 1106 | /* If this is the last block of the text to be decoded, we had | 1330 | /* If this is the last block of the text to be decoded, we had |
| 1107 | better just flush out all remaining codes in the text although | 1331 | better just flush out all remaining codes in the text although |
| 1108 | they are not valid characters. */ | 1332 | they are not valid characters. */ |
| 1109 | if (coding->last_block) | 1333 | if (coding->mode & CODING_MODE_LAST_BLOCK) |
| 1110 | { | 1334 | { |
| 1111 | bcopy (src, dst, src_end - src); | 1335 | bcopy (src, dst, src_end - src); |
| 1112 | dst += (src_end - src); | 1336 | dst += (src_end - src); |
| 1113 | src = src_end; | 1337 | src = src_end; |
| 1114 | } | 1338 | } |
| 1115 | *consumed = src - source; | 1339 | coding->consumed = coding->consumed_char = src - source; |
| 1116 | return dst - destination; | 1340 | coding->produced = dst - destination; |
| 1341 | return result; | ||
| 1117 | } | 1342 | } |
| 1118 | 1343 | ||
| 1119 | /* ISO2022 encoding stuff. */ | 1344 | /* ISO2022 encoding stuff. */ |
| 1120 | 1345 | ||
| 1121 | /* | 1346 | /* |
| 1122 | It is not enough to say just "ISO2022" on encoding, we have to | 1347 | It is not enough to say just "ISO2022" on encoding, we have to |
| 1123 | specify more details. In Emacs, each coding-system of ISO2022 | 1348 | specify more details. In Emacs, each coding system of ISO2022 |
| 1124 | variant has the following specifications: | 1349 | variant has the following specifications: |
| 1125 | 1. Initial designation to G0 thru G3. | 1350 | 1. Initial designation to G0 thru G3. |
| 1126 | 2. Allows short-form designation? | 1351 | 2. Allows short-form designation? |
| @@ -1329,6 +1554,8 @@ decode_coding_iso2022 (coding, source, destination, | |||
| 1329 | ENCODE_ISO_CHARACTER_DIMENSION1 (charset_alt, c1); \ | 1554 | ENCODE_ISO_CHARACTER_DIMENSION1 (charset_alt, c1); \ |
| 1330 | else \ | 1555 | else \ |
| 1331 | ENCODE_ISO_CHARACTER_DIMENSION2 (charset_alt, c1, c2); \ | 1556 | ENCODE_ISO_CHARACTER_DIMENSION2 (charset_alt, c1, c2); \ |
| 1557 | if (! COMPOSING_P (coding->composing)) \ | ||
| 1558 | coding->consumed_char++; \ | ||
| 1332 | } while (0) | 1559 | } while (0) |
| 1333 | 1560 | ||
| 1334 | /* Produce designation and invocation codes at a place pointed by DST | 1561 | /* Produce designation and invocation codes at a place pointed by DST |
| @@ -1431,10 +1658,11 @@ encode_invocation_designation (charset, coding, dst) | |||
| 1431 | } while (0) | 1658 | } while (0) |
| 1432 | 1659 | ||
| 1433 | /* Produce designation sequences of charsets in the line started from | 1660 | /* Produce designation sequences of charsets in the line started from |
| 1434 | *SRC to a place pointed by DSTP. | 1661 | SRC to a place pointed by *DSTP, and update DSTP. |
| 1435 | 1662 | ||
| 1436 | If the current block ends before any end-of-line, we may fail to | 1663 | If the current block ends before any end-of-line, we may fail to |
| 1437 | find all the necessary *designations. */ | 1664 | find all the necessary designations. */ |
| 1665 | |||
| 1438 | encode_designation_at_bol (coding, table, src, src_end, dstp) | 1666 | encode_designation_at_bol (coding, table, src, src_end, dstp) |
| 1439 | struct coding_system *coding; | 1667 | struct coding_system *coding; |
| 1440 | Lisp_Object table; | 1668 | Lisp_Object table; |
| @@ -1465,7 +1693,7 @@ encode_designation_at_bol (coding, table, src, src_end, dstp) | |||
| 1465 | } | 1693 | } |
| 1466 | 1694 | ||
| 1467 | reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset); | 1695 | reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset); |
| 1468 | if (r[reg] < 0) | 1696 | if (reg != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION && r[reg] < 0) |
| 1469 | { | 1697 | { |
| 1470 | found++; | 1698 | found++; |
| 1471 | r[reg] = charset; | 1699 | r[reg] = charset; |
| @@ -1487,12 +1715,10 @@ encode_designation_at_bol (coding, table, src, src_end, dstp) | |||
| 1487 | /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */ | 1715 | /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */ |
| 1488 | 1716 | ||
| 1489 | int | 1717 | int |
| 1490 | encode_coding_iso2022 (coding, source, destination, | 1718 | encode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes) |
| 1491 | src_bytes, dst_bytes, consumed) | ||
| 1492 | struct coding_system *coding; | 1719 | struct coding_system *coding; |
| 1493 | unsigned char *source, *destination; | 1720 | unsigned char *source, *destination; |
| 1494 | int src_bytes, dst_bytes; | 1721 | int src_bytes, dst_bytes; |
| 1495 | int *consumed; | ||
| 1496 | { | 1722 | { |
| 1497 | unsigned char *src = source; | 1723 | unsigned char *src = source; |
| 1498 | unsigned char *src_end = source + src_bytes; | 1724 | unsigned char *src_end = source + src_bytes; |
| @@ -1504,11 +1730,15 @@ encode_coding_iso2022 (coding, source, destination, | |||
| 1504 | unsigned char *adjusted_dst_end = dst_end - 19; | 1730 | unsigned char *adjusted_dst_end = dst_end - 19; |
| 1505 | Lisp_Object unification_table | 1731 | Lisp_Object unification_table |
| 1506 | = coding->character_unification_table_for_encode; | 1732 | = coding->character_unification_table_for_encode; |
| 1733 | int result = CODING_FINISH_NORMAL; | ||
| 1507 | 1734 | ||
| 1508 | if (!NILP (Venable_character_unification) && NILP (unification_table)) | 1735 | if (!NILP (Venable_character_unification) && NILP (unification_table)) |
| 1509 | unification_table = Vstandard_character_unification_table_for_encode; | 1736 | unification_table = Vstandard_character_unification_table_for_encode; |
| 1510 | 1737 | ||
| 1511 | while (src < src_end && dst < adjusted_dst_end) | 1738 | coding->consumed_char = 0; |
| 1739 | while (src < src_end && (dst_bytes | ||
| 1740 | ? (dst < adjusted_dst_end) | ||
| 1741 | : (dst < src - 19))) | ||
| 1512 | { | 1742 | { |
| 1513 | /* SRC_BASE remembers the start position in source in each loop. | 1743 | /* SRC_BASE remembers the start position in source in each loop. |
| 1514 | The loop will be exited when there's not enough source text | 1744 | The loop will be exited when there's not enough source text |
| @@ -1529,16 +1759,18 @@ encode_coding_iso2022 (coding, source, destination, | |||
| 1529 | 1759 | ||
| 1530 | c1 = *src++; | 1760 | c1 = *src++; |
| 1531 | /* If we are seeing a component of a composite character, we are | 1761 | /* If we are seeing a component of a composite character, we are |
| 1532 | seeing a leading-code specially encoded for composition, or a | 1762 | seeing a leading-code encoded irregularly for composition, or |
| 1533 | composition rule if composing with rule. We must set C1 | 1763 | a composition rule if composing with rule. We must set C1 to |
| 1534 | to a normal leading-code or an ASCII code. If we are not at | 1764 | a normal leading-code or an ASCII code. If we are not seeing |
| 1535 | a composed character, we must reset the composition state. */ | 1765 | a composite character, we must reset composition, |
| 1766 | designation, and invocation states. */ | ||
| 1536 | if (COMPOSING_P (coding->composing)) | 1767 | if (COMPOSING_P (coding->composing)) |
| 1537 | { | 1768 | { |
| 1538 | if (c1 < 0xA0) | 1769 | if (c1 < 0xA0) |
| 1539 | { | 1770 | { |
| 1540 | /* We are not in a composite character any longer. */ | 1771 | /* We are not in a composite character any longer. */ |
| 1541 | coding->composing = COMPOSING_NO; | 1772 | coding->composing = COMPOSING_NO; |
| 1773 | ENCODE_RESET_PLANE_AND_REGISTER; | ||
| 1542 | ENCODE_COMPOSITION_END; | 1774 | ENCODE_COMPOSITION_END; |
| 1543 | } | 1775 | } |
| 1544 | else | 1776 | else |
| @@ -1575,14 +1807,16 @@ encode_coding_iso2022 (coding, source, destination, | |||
| 1575 | if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL) | 1807 | if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL) |
| 1576 | ENCODE_RESET_PLANE_AND_REGISTER; | 1808 | ENCODE_RESET_PLANE_AND_REGISTER; |
| 1577 | *dst++ = c1; | 1809 | *dst++ = c1; |
| 1810 | coding->consumed_char++; | ||
| 1578 | break; | 1811 | break; |
| 1579 | 1812 | ||
| 1580 | case EMACS_carriage_return_code: | 1813 | case EMACS_carriage_return_code: |
| 1581 | if (!coding->selective) | 1814 | if (! (coding->mode & CODING_MODE_SELECTIVE_DISPLAY)) |
| 1582 | { | 1815 | { |
| 1583 | if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL) | 1816 | if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL) |
| 1584 | ENCODE_RESET_PLANE_AND_REGISTER; | 1817 | ENCODE_RESET_PLANE_AND_REGISTER; |
| 1585 | *dst++ = c1; | 1818 | *dst++ = c1; |
| 1819 | coding->consumed_char++; | ||
| 1586 | break; | 1820 | break; |
| 1587 | } | 1821 | } |
| 1588 | /* fall down to treat '\r' as '\n' ... */ | 1822 | /* fall down to treat '\r' as '\n' ... */ |
| @@ -1602,6 +1836,7 @@ encode_coding_iso2022 (coding, source, destination, | |||
| 1602 | else | 1836 | else |
| 1603 | *dst++ = ISO_CODE_CR; | 1837 | *dst++ = ISO_CODE_CR; |
| 1604 | CODING_SPEC_ISO_BOL (coding) = 1; | 1838 | CODING_SPEC_ISO_BOL (coding) = 1; |
| 1839 | coding->consumed_char++; | ||
| 1605 | break; | 1840 | break; |
| 1606 | 1841 | ||
| 1607 | case EMACS_leading_code_2: | 1842 | case EMACS_leading_code_2: |
| @@ -1611,6 +1846,7 @@ encode_coding_iso2022 (coding, source, destination, | |||
| 1611 | /* invalid sequence */ | 1846 | /* invalid sequence */ |
| 1612 | *dst++ = c1; | 1847 | *dst++ = c1; |
| 1613 | *dst++ = c2; | 1848 | *dst++ = c2; |
| 1849 | coding->consumed_char += 2; | ||
| 1614 | } | 1850 | } |
| 1615 | else | 1851 | else |
| 1616 | ENCODE_ISO_CHARACTER (c1, c2, /* dummy */ c3); | 1852 | ENCODE_ISO_CHARACTER (c1, c2, /* dummy */ c3); |
| @@ -1624,6 +1860,7 @@ encode_coding_iso2022 (coding, source, destination, | |||
| 1624 | *dst++ = c1; | 1860 | *dst++ = c1; |
| 1625 | *dst++ = c2; | 1861 | *dst++ = c2; |
| 1626 | *dst++ = c3; | 1862 | *dst++ = c3; |
| 1863 | coding->consumed_char += 3; | ||
| 1627 | } | 1864 | } |
| 1628 | else if (c1 < LEADING_CODE_PRIVATE_11) | 1865 | else if (c1 < LEADING_CODE_PRIVATE_11) |
| 1629 | ENCODE_ISO_CHARACTER (c1, c2, c3); | 1866 | ENCODE_ISO_CHARACTER (c1, c2, c3); |
| @@ -1640,6 +1877,7 @@ encode_coding_iso2022 (coding, source, destination, | |||
| 1640 | *dst++ = c2; | 1877 | *dst++ = c2; |
| 1641 | *dst++ = c3; | 1878 | *dst++ = c3; |
| 1642 | *dst++ = c4; | 1879 | *dst++ = c4; |
| 1880 | coding->consumed_char += 4; | ||
| 1643 | } | 1881 | } |
| 1644 | else | 1882 | else |
| 1645 | ENCODE_ISO_CHARACTER (c2, c3, c4); | 1883 | ENCODE_ISO_CHARACTER (c2, c3, c4); |
| @@ -1652,51 +1890,52 @@ encode_coding_iso2022 (coding, source, destination, | |||
| 1652 | /* invalid sequence */ | 1890 | /* invalid sequence */ |
| 1653 | *dst++ = c1; | 1891 | *dst++ = c1; |
| 1654 | *dst++ = c2; | 1892 | *dst++ = c2; |
| 1893 | coding->consumed_char += 2; | ||
| 1655 | } | 1894 | } |
| 1656 | else if (c2 == 0xFF) | 1895 | else if (c2 == 0xFF) |
| 1657 | { | 1896 | { |
| 1897 | ENCODE_RESET_PLANE_AND_REGISTER; | ||
| 1658 | coding->composing = COMPOSING_WITH_RULE_HEAD; | 1898 | coding->composing = COMPOSING_WITH_RULE_HEAD; |
| 1659 | ENCODE_COMPOSITION_WITH_RULE_START; | 1899 | ENCODE_COMPOSITION_WITH_RULE_START; |
| 1900 | coding->consumed_char++; | ||
| 1660 | } | 1901 | } |
| 1661 | else | 1902 | else |
| 1662 | { | 1903 | { |
| 1904 | ENCODE_RESET_PLANE_AND_REGISTER; | ||
| 1663 | /* Rewind one byte because it is a character code of | 1905 | /* Rewind one byte because it is a character code of |
| 1664 | composition elements. */ | 1906 | composition elements. */ |
| 1665 | src--; | 1907 | src--; |
| 1666 | coding->composing = COMPOSING_NO_RULE_HEAD; | 1908 | coding->composing = COMPOSING_NO_RULE_HEAD; |
| 1667 | ENCODE_COMPOSITION_NO_RULE_START; | 1909 | ENCODE_COMPOSITION_NO_RULE_START; |
| 1910 | coding->consumed_char++; | ||
| 1668 | } | 1911 | } |
| 1669 | break; | 1912 | break; |
| 1670 | 1913 | ||
| 1671 | case EMACS_invalid_code: | 1914 | case EMACS_invalid_code: |
| 1672 | *dst++ = c1; | 1915 | *dst++ = c1; |
| 1916 | coding->consumed_char++; | ||
| 1673 | break; | 1917 | break; |
| 1674 | } | 1918 | } |
| 1675 | continue; | 1919 | continue; |
| 1676 | label_end_of_loop: | 1920 | label_end_of_loop: |
| 1677 | /* We reach here because the source data ends not at character | 1921 | result = CODING_FINISH_INSUFFICIENT_SRC; |
| 1678 | boundary. */ | 1922 | src = src_base; |
| 1679 | coding->carryover_size = src_end - src_base; | ||
| 1680 | bcopy (src_base, coding->carryover, coding->carryover_size); | ||
| 1681 | src = src_end; | ||
| 1682 | break; | 1923 | break; |
| 1683 | } | 1924 | } |
| 1684 | 1925 | ||
| 1926 | if (result == CODING_FINISH_NORMAL | ||
| 1927 | && src < src_end) | ||
| 1928 | result = CODING_FINISH_INSUFFICIENT_DST; | ||
| 1929 | |||
| 1685 | /* If this is the last block of the text to be encoded, we must | 1930 | /* If this is the last block of the text to be encoded, we must |
| 1686 | reset graphic planes and registers to the initial state. */ | 1931 | reset graphic planes and registers to the initial state, and |
| 1687 | if (src >= src_end && coding->last_block) | 1932 | flush out the carryover if any. */ |
| 1688 | { | 1933 | if (coding->mode & CODING_MODE_LAST_BLOCK) |
| 1689 | ENCODE_RESET_PLANE_AND_REGISTER; | 1934 | ENCODE_RESET_PLANE_AND_REGISTER; |
| 1690 | if (coding->carryover_size > 0 | 1935 | |
| 1691 | && coding->carryover_size < (dst_end - dst)) | 1936 | coding->consumed = src - source; |
| 1692 | { | 1937 | coding->produced = coding->produced_char = dst - destination; |
| 1693 | bcopy (coding->carryover, dst, coding->carryover_size); | 1938 | return result; |
| 1694 | dst += coding->carryover_size; | ||
| 1695 | coding->carryover_size = 0; | ||
| 1696 | } | ||
| 1697 | } | ||
| 1698 | *consumed = src - source; | ||
| 1699 | return dst - destination; | ||
| 1700 | } | 1939 | } |
| 1701 | 1940 | ||
| 1702 | 1941 | ||
| @@ -1787,6 +2026,7 @@ encode_coding_iso2022 (coding, source, destination, | |||
| 1787 | DECODE_CHARACTER_DIMENSION1 (charset_alt, c1); \ | 2026 | DECODE_CHARACTER_DIMENSION1 (charset_alt, c1); \ |
| 1788 | else \ | 2027 | else \ |
| 1789 | DECODE_CHARACTER_DIMENSION2 (charset_alt, c1, c2); \ | 2028 | DECODE_CHARACTER_DIMENSION2 (charset_alt, c1, c2); \ |
| 2029 | coding->produced_char++; \ | ||
| 1790 | } while (0) | 2030 | } while (0) |
| 1791 | 2031 | ||
| 1792 | #define ENCODE_SJIS_BIG5_CHARACTER(charset, c1, c2) \ | 2032 | #define ENCODE_SJIS_BIG5_CHARACTER(charset, c1, c2) \ |
| @@ -1829,6 +2069,7 @@ encode_coding_iso2022 (coding, source, destination, | |||
| 1829 | else \ | 2069 | else \ |
| 1830 | *dst++ = charset_alt, *dst++ = c1, *dst++ = c2; \ | 2070 | *dst++ = charset_alt, *dst++ = c1, *dst++ = c2; \ |
| 1831 | } \ | 2071 | } \ |
| 2072 | coding->consumed_char++; \ | ||
| 1832 | } while (0); | 2073 | } while (0); |
| 1833 | 2074 | ||
| 1834 | /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". | 2075 | /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". |
| @@ -1844,8 +2085,6 @@ detect_coding_sjis (src, src_end) | |||
| 1844 | while (src < src_end) | 2085 | while (src < src_end) |
| 1845 | { | 2086 | { |
| 1846 | c = *src++; | 2087 | c = *src++; |
| 1847 | if (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) | ||
| 1848 | return 0; | ||
| 1849 | if ((c >= 0x80 && c < 0xA0) || c >= 0xE0) | 2088 | if ((c >= 0x80 && c < 0xA0) || c >= 0xE0) |
| 1850 | { | 2089 | { |
| 1851 | if (src < src_end && *src++ < 0x40) | 2090 | if (src < src_end && *src++ < 0x40) |
| @@ -1868,8 +2107,6 @@ detect_coding_big5 (src, src_end) | |||
| 1868 | while (src < src_end) | 2107 | while (src < src_end) |
| 1869 | { | 2108 | { |
| 1870 | c = *src++; | 2109 | c = *src++; |
| 1871 | if (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) | ||
| 1872 | return 0; | ||
| 1873 | if (c >= 0xA1) | 2110 | if (c >= 0xA1) |
| 1874 | { | 2111 | { |
| 1875 | if (src >= src_end) | 2112 | if (src >= src_end) |
| @@ -1887,11 +2124,10 @@ detect_coding_big5 (src, src_end) | |||
| 1887 | 2124 | ||
| 1888 | int | 2125 | int |
| 1889 | decode_coding_sjis_big5 (coding, source, destination, | 2126 | decode_coding_sjis_big5 (coding, source, destination, |
| 1890 | src_bytes, dst_bytes, consumed, sjis_p) | 2127 | src_bytes, dst_bytes, sjis_p) |
| 1891 | struct coding_system *coding; | 2128 | struct coding_system *coding; |
| 1892 | unsigned char *source, *destination; | 2129 | unsigned char *source, *destination; |
| 1893 | int src_bytes, dst_bytes; | 2130 | int src_bytes, dst_bytes; |
| 1894 | int *consumed; | ||
| 1895 | int sjis_p; | 2131 | int sjis_p; |
| 1896 | { | 2132 | { |
| 1897 | unsigned char *src = source; | 2133 | unsigned char *src = source; |
| @@ -1904,11 +2140,15 @@ decode_coding_sjis_big5 (coding, source, destination, | |||
| 1904 | unsigned char *adjusted_dst_end = dst_end - 3; | 2140 | unsigned char *adjusted_dst_end = dst_end - 3; |
| 1905 | Lisp_Object unification_table | 2141 | Lisp_Object unification_table |
| 1906 | = coding->character_unification_table_for_decode; | 2142 | = coding->character_unification_table_for_decode; |
| 2143 | int result = CODING_FINISH_NORMAL; | ||
| 1907 | 2144 | ||
| 1908 | if (!NILP (Venable_character_unification) && NILP (unification_table)) | 2145 | if (!NILP (Venable_character_unification) && NILP (unification_table)) |
| 1909 | unification_table = Vstandard_character_unification_table_for_decode; | 2146 | unification_table = Vstandard_character_unification_table_for_decode; |
| 1910 | 2147 | ||
| 1911 | while (src < src_end && dst < adjusted_dst_end) | 2148 | coding->produced_char = 0; |
| 2149 | while (src < src_end && (dst_bytes | ||
| 2150 | ? (dst < adjusted_dst_end) | ||
| 2151 | : (dst < src - 3))) | ||
| 1912 | { | 2152 | { |
| 1913 | /* SRC_BASE remembers the start position in source in each loop. | 2153 | /* SRC_BASE remembers the start position in source in each loop. |
| 1914 | The loop will be exited when there's not enough source text | 2154 | The loop will be exited when there's not enough source text |
| @@ -1917,24 +2157,41 @@ decode_coding_sjis_big5 (coding, source, destination, | |||
| 1917 | unsigned char *src_base = src; | 2157 | unsigned char *src_base = src; |
| 1918 | unsigned char c1 = *src++, c2, c3, c4; | 2158 | unsigned char c1 = *src++, c2, c3, c4; |
| 1919 | 2159 | ||
| 1920 | if (c1 == '\r') | 2160 | if (c1 < 0x20) |
| 1921 | { | 2161 | { |
| 1922 | if (coding->eol_type == CODING_EOL_CRLF) | 2162 | if (c1 == '\r') |
| 1923 | { | 2163 | { |
| 1924 | ONE_MORE_BYTE (c2); | 2164 | if (coding->eol_type == CODING_EOL_CRLF) |
| 1925 | if (c2 == '\n') | 2165 | { |
| 1926 | *dst++ = c2; | 2166 | ONE_MORE_BYTE (c2); |
| 2167 | if (c2 == '\n') | ||
| 2168 | *dst++ = c2; | ||
| 2169 | else if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL) | ||
| 2170 | { | ||
| 2171 | result = CODING_FINISH_INCONSISTENT_EOL; | ||
| 2172 | goto label_end_of_loop_2; | ||
| 2173 | } | ||
| 2174 | else | ||
| 2175 | /* To process C2 again, SRC is subtracted by 1. */ | ||
| 2176 | *dst++ = c1, src--; | ||
| 2177 | } | ||
| 2178 | else if (coding->eol_type == CODING_EOL_CR) | ||
| 2179 | *dst++ = '\n'; | ||
| 1927 | else | 2180 | else |
| 1928 | /* To process C2 again, SRC is subtracted by 1. */ | 2181 | *dst++ = c1; |
| 1929 | *dst++ = c1, src--; | 2182 | } |
| 2183 | else if (c1 == '\n' | ||
| 2184 | && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL) | ||
| 2185 | && (coding->eol_type == CODING_EOL_CR | ||
| 2186 | || coding->eol_type == CODING_EOL_CRLF)) | ||
| 2187 | { | ||
| 2188 | result = CODING_FINISH_INCONSISTENT_EOL; | ||
| 2189 | goto label_end_of_loop_2; | ||
| 1930 | } | 2190 | } |
| 1931 | else if (coding->eol_type == CODING_EOL_CR) | ||
| 1932 | *dst++ = '\n'; | ||
| 1933 | else | 2191 | else |
| 1934 | *dst++ = c1; | 2192 | *dst++ = c1; |
| 2193 | coding->produced_char++; | ||
| 1935 | } | 2194 | } |
| 1936 | else if (c1 < 0x20) | ||
| 1937 | *dst++ = c1; | ||
| 1938 | else if (c1 < 0x80) | 2195 | else if (c1 < 0x80) |
| 1939 | DECODE_SJIS_BIG5_CHARACTER (charset_ascii, c1, /* dummy */ c2); | 2196 | DECODE_SJIS_BIG5_CHARACTER (charset_ascii, c1, /* dummy */ c2); |
| 1940 | else if (c1 < 0xA0 || c1 >= 0xE0) | 2197 | else if (c1 < 0xA0 || c1 >= 0xE0) |
| @@ -1955,13 +2212,17 @@ decode_coding_sjis_big5 (coding, source, destination, | |||
| 1955 | DECODE_SJIS_BIG5_CHARACTER (charset, c3, c4); | 2212 | DECODE_SJIS_BIG5_CHARACTER (charset, c3, c4); |
| 1956 | } | 2213 | } |
| 1957 | else /* Invalid code */ | 2214 | else /* Invalid code */ |
| 1958 | *dst++ = c1; | 2215 | { |
| 2216 | *dst++ = c1; | ||
| 2217 | coding->produced_char++; | ||
| 2218 | } | ||
| 1959 | } | 2219 | } |
| 1960 | else | 2220 | else |
| 1961 | { | 2221 | { |
| 1962 | /* SJIS -> JISX0201-Kana, BIG5 -> Big5 */ | 2222 | /* SJIS -> JISX0201-Kana, BIG5 -> Big5 */ |
| 1963 | if (sjis_p) | 2223 | if (sjis_p) |
| 1964 | DECODE_SJIS_BIG5_CHARACTER (charset_katakana_jisx0201, c1, /* dummy */ c2); | 2224 | DECODE_SJIS_BIG5_CHARACTER (charset_katakana_jisx0201, c1, |
| 2225 | /* dummy */ c2); | ||
| 1965 | else | 2226 | else |
| 1966 | { | 2227 | { |
| 1967 | int charset; | 2228 | int charset; |
| @@ -1974,14 +2235,19 @@ decode_coding_sjis_big5 (coding, source, destination, | |||
| 1974 | continue; | 2235 | continue; |
| 1975 | 2236 | ||
| 1976 | label_end_of_loop: | 2237 | label_end_of_loop: |
| 1977 | coding->carryover_size = src - src_base; | 2238 | result = CODING_FINISH_INSUFFICIENT_SRC; |
| 1978 | bcopy (src_base, coding->carryover, coding->carryover_size); | 2239 | label_end_of_loop_2: |
| 1979 | src = src_base; | 2240 | src = src_base; |
| 1980 | break; | 2241 | break; |
| 1981 | } | 2242 | } |
| 1982 | 2243 | ||
| 1983 | *consumed = src - source; | 2244 | if (result == CODING_FINISH_NORMAL |
| 1984 | return dst - destination; | 2245 | && src < src_end) |
| 2246 | result = CODING_FINISH_INSUFFICIENT_DST; | ||
| 2247 | |||
| 2248 | coding->consumed = coding->consumed_char = src - source; | ||
| 2249 | coding->produced = dst - destination; | ||
| 2250 | return result; | ||
| 1985 | } | 2251 | } |
| 1986 | 2252 | ||
| 1987 | /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". | 2253 | /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". |
| @@ -1994,11 +2260,10 @@ decode_coding_sjis_big5 (coding, source, destination, | |||
| 1994 | 2260 | ||
| 1995 | int | 2261 | int |
| 1996 | encode_coding_sjis_big5 (coding, source, destination, | 2262 | encode_coding_sjis_big5 (coding, source, destination, |
| 1997 | src_bytes, dst_bytes, consumed, sjis_p) | 2263 | src_bytes, dst_bytes, sjis_p) |
| 1998 | struct coding_system *coding; | 2264 | struct coding_system *coding; |
| 1999 | unsigned char *source, *destination; | 2265 | unsigned char *source, *destination; |
| 2000 | int src_bytes, dst_bytes; | 2266 | int src_bytes, dst_bytes; |
| 2001 | int *consumed; | ||
| 2002 | int sjis_p; | 2267 | int sjis_p; |
| 2003 | { | 2268 | { |
| 2004 | unsigned char *src = source; | 2269 | unsigned char *src = source; |
| @@ -2011,11 +2276,15 @@ encode_coding_sjis_big5 (coding, source, destination, | |||
| 2011 | unsigned char *adjusted_dst_end = dst_end - 1; | 2276 | unsigned char *adjusted_dst_end = dst_end - 1; |
| 2012 | Lisp_Object unification_table | 2277 | Lisp_Object unification_table |
| 2013 | = coding->character_unification_table_for_encode; | 2278 | = coding->character_unification_table_for_encode; |
| 2279 | int result = CODING_FINISH_NORMAL; | ||
| 2014 | 2280 | ||
| 2015 | if (!NILP (Venable_character_unification) && NILP (unification_table)) | 2281 | if (!NILP (Venable_character_unification) && NILP (unification_table)) |
| 2016 | unification_table = Vstandard_character_unification_table_for_encode; | 2282 | unification_table = Vstandard_character_unification_table_for_encode; |
| 2017 | 2283 | ||
| 2018 | while (src < src_end && dst < adjusted_dst_end) | 2284 | coding->consumed_char = 0; |
| 2285 | while (src < src_end && (dst_bytes | ||
| 2286 | ? (dst < adjusted_dst_end) | ||
| 2287 | : (dst < src - 1))) | ||
| 2019 | { | 2288 | { |
| 2020 | /* SRC_BASE remembers the start position in source in each loop. | 2289 | /* SRC_BASE remembers the start position in source in each loop. |
| 2021 | The loop will be exited when there's not enough source text | 2290 | The loop will be exited when there's not enough source text |
| @@ -2046,12 +2315,14 @@ encode_coding_sjis_big5 (coding, source, destination, | |||
| 2046 | 2315 | ||
| 2047 | case EMACS_control_code: | 2316 | case EMACS_control_code: |
| 2048 | *dst++ = c1; | 2317 | *dst++ = c1; |
| 2318 | coding->consumed_char++; | ||
| 2049 | break; | 2319 | break; |
| 2050 | 2320 | ||
| 2051 | case EMACS_carriage_return_code: | 2321 | case EMACS_carriage_return_code: |
| 2052 | if (!coding->selective) | 2322 | if (! (coding->mode & CODING_MODE_SELECTIVE_DISPLAY)) |
| 2053 | { | 2323 | { |
| 2054 | *dst++ = c1; | 2324 | *dst++ = c1; |
| 2325 | coding->consumed_char++; | ||
| 2055 | break; | 2326 | break; |
| 2056 | } | 2327 | } |
| 2057 | /* fall down to treat '\r' as '\n' ... */ | 2328 | /* fall down to treat '\r' as '\n' ... */ |
| @@ -2064,6 +2335,7 @@ encode_coding_sjis_big5 (coding, source, destination, | |||
| 2064 | *dst++ = '\r', *dst++ = '\n'; | 2335 | *dst++ = '\r', *dst++ = '\n'; |
| 2065 | else | 2336 | else |
| 2066 | *dst++ = '\r'; | 2337 | *dst++ = '\r'; |
| 2338 | coding->consumed_char++; | ||
| 2067 | break; | 2339 | break; |
| 2068 | 2340 | ||
| 2069 | case EMACS_leading_code_2: | 2341 | case EMACS_leading_code_2: |
| @@ -2087,18 +2359,22 @@ encode_coding_sjis_big5 (coding, source, destination, | |||
| 2087 | 2359 | ||
| 2088 | default: /* i.e. case EMACS_invalid_code: */ | 2360 | default: /* i.e. case EMACS_invalid_code: */ |
| 2089 | *dst++ = c1; | 2361 | *dst++ = c1; |
| 2362 | coding->consumed_char++; | ||
| 2090 | } | 2363 | } |
| 2091 | continue; | 2364 | continue; |
| 2092 | 2365 | ||
| 2093 | label_end_of_loop: | 2366 | label_end_of_loop: |
| 2094 | coding->carryover_size = src_end - src_base; | 2367 | result = CODING_FINISH_INSUFFICIENT_SRC; |
| 2095 | bcopy (src_base, coding->carryover, coding->carryover_size); | 2368 | src = src_base; |
| 2096 | src = src_end; | ||
| 2097 | break; | 2369 | break; |
| 2098 | } | 2370 | } |
| 2099 | 2371 | ||
| 2100 | *consumed = src - source; | 2372 | if (result == CODING_FINISH_NORMAL |
| 2101 | return dst - destination; | 2373 | && src < src_end) |
| 2374 | result = CODING_FINISH_INSUFFICIENT_DST; | ||
| 2375 | coding->consumed = src - source; | ||
| 2376 | coding->produced = coding->produced_char = dst - destination; | ||
| 2377 | return result; | ||
| 2102 | } | 2378 | } |
| 2103 | 2379 | ||
| 2104 | 2380 | ||
| @@ -2108,17 +2384,19 @@ encode_coding_sjis_big5 (coding, source, destination, | |||
| 2108 | This function is called only when `coding->eol_type' is | 2384 | This function is called only when `coding->eol_type' is |
| 2109 | CODING_EOL_CRLF or CODING_EOL_CR. */ | 2385 | CODING_EOL_CRLF or CODING_EOL_CR. */ |
| 2110 | 2386 | ||
| 2111 | decode_eol (coding, source, destination, src_bytes, dst_bytes, consumed) | 2387 | decode_eol (coding, source, destination, src_bytes, dst_bytes) |
| 2112 | struct coding_system *coding; | 2388 | struct coding_system *coding; |
| 2113 | unsigned char *source, *destination; | 2389 | unsigned char *source, *destination; |
| 2114 | int src_bytes, dst_bytes; | 2390 | int src_bytes, dst_bytes; |
| 2115 | int *consumed; | ||
| 2116 | { | 2391 | { |
| 2117 | unsigned char *src = source; | 2392 | unsigned char *src = source; |
| 2118 | unsigned char *src_end = source + src_bytes; | 2393 | unsigned char *src_end = source + src_bytes; |
| 2119 | unsigned char *dst = destination; | 2394 | unsigned char *dst = destination; |
| 2120 | unsigned char *dst_end = destination + dst_bytes; | 2395 | unsigned char *dst_end = destination + dst_bytes; |
| 2121 | int produced; | 2396 | int result = CODING_FINISH_NORMAL; |
| 2397 | |||
| 2398 | if (src_bytes <= 0) | ||
| 2399 | return result; | ||
| 2122 | 2400 | ||
| 2123 | switch (coding->eol_type) | 2401 | switch (coding->eol_type) |
| 2124 | { | 2402 | { |
| @@ -2129,7 +2407,9 @@ decode_eol (coding, source, destination, src_bytes, dst_bytes, consumed) | |||
| 2129 | necessary only at the head of loop. */ | 2407 | necessary only at the head of loop. */ |
| 2130 | unsigned char *adjusted_dst_end = dst_end - 1; | 2408 | unsigned char *adjusted_dst_end = dst_end - 1; |
| 2131 | 2409 | ||
| 2132 | while (src < src_end && dst < adjusted_dst_end) | 2410 | while (src < src_end && (dst_bytes |
| 2411 | ? (dst < adjusted_dst_end) | ||
| 2412 | : (dst < src - 1))) | ||
| 2133 | { | 2413 | { |
| 2134 | unsigned char *src_base = src; | 2414 | unsigned char *src_base = src; |
| 2135 | unsigned char c = *src++; | 2415 | unsigned char c = *src++; |
| @@ -2137,110 +2417,147 @@ decode_eol (coding, source, destination, src_bytes, dst_bytes, consumed) | |||
| 2137 | { | 2417 | { |
| 2138 | ONE_MORE_BYTE (c); | 2418 | ONE_MORE_BYTE (c); |
| 2139 | if (c != '\n') | 2419 | if (c != '\n') |
| 2140 | *dst++ = '\r'; | 2420 | { |
| 2421 | if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL) | ||
| 2422 | { | ||
| 2423 | result = CODING_FINISH_INCONSISTENT_EOL; | ||
| 2424 | goto label_end_of_loop_2; | ||
| 2425 | } | ||
| 2426 | *dst++ = '\r'; | ||
| 2427 | } | ||
| 2141 | *dst++ = c; | 2428 | *dst++ = c; |
| 2142 | } | 2429 | } |
| 2430 | else if (c == '\n' | ||
| 2431 | && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)) | ||
| 2432 | { | ||
| 2433 | result = CODING_FINISH_INCONSISTENT_EOL; | ||
| 2434 | goto label_end_of_loop_2; | ||
| 2435 | } | ||
| 2143 | else | 2436 | else |
| 2144 | *dst++ = c; | 2437 | *dst++ = c; |
| 2145 | continue; | 2438 | continue; |
| 2146 | 2439 | ||
| 2147 | label_end_of_loop: | 2440 | label_end_of_loop: |
| 2148 | coding->carryover_size = src - src_base; | 2441 | result = CODING_FINISH_INSUFFICIENT_SRC; |
| 2149 | bcopy (src_base, coding->carryover, coding->carryover_size); | 2442 | label_end_of_loop_2: |
| 2150 | src = src_base; | 2443 | src = src_base; |
| 2151 | break; | 2444 | break; |
| 2152 | } | 2445 | } |
| 2153 | *consumed = src - source; | 2446 | if (result == CODING_FINISH_NORMAL |
| 2154 | produced = dst - destination; | 2447 | && src < src_end) |
| 2155 | break; | 2448 | result = CODING_FINISH_INSUFFICIENT_DST; |
| 2156 | } | 2449 | } |
| 2450 | break; | ||
| 2157 | 2451 | ||
| 2158 | case CODING_EOL_CR: | 2452 | case CODING_EOL_CR: |
| 2159 | produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes; | 2453 | if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL) |
| 2160 | bcopy (source, destination, produced); | 2454 | { |
| 2161 | dst_end = destination + produced; | 2455 | while (src < src_end) if (*src++ == '\n') break; |
| 2162 | while (dst < dst_end) | 2456 | if (*--src == '\n') |
| 2163 | if (*dst++ == '\r') dst[-1] = '\n'; | 2457 | { |
| 2164 | *consumed = produced; | 2458 | src_bytes = src - source; |
| 2459 | result = CODING_FINISH_INCONSISTENT_EOL; | ||
| 2460 | } | ||
| 2461 | } | ||
| 2462 | if (dst_bytes && src_bytes > dst_bytes) | ||
| 2463 | { | ||
| 2464 | result = CODING_FINISH_INSUFFICIENT_DST; | ||
| 2465 | src_bytes = dst_bytes; | ||
| 2466 | } | ||
| 2467 | if (dst_bytes) | ||
| 2468 | bcopy (source, destination, src_bytes); | ||
| 2469 | else | ||
| 2470 | safe_bcopy (source, destination, src_bytes); | ||
| 2471 | src = source + src_bytes; | ||
| 2472 | while (src_bytes--) if (*dst++ == '\r') dst[-1] = '\n'; | ||
| 2165 | break; | 2473 | break; |
| 2166 | 2474 | ||
| 2167 | default: /* i.e. case: CODING_EOL_LF */ | 2475 | default: /* i.e. case: CODING_EOL_LF */ |
| 2168 | produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes; | 2476 | if (dst_bytes && src_bytes > dst_bytes) |
| 2169 | bcopy (source, destination, produced); | 2477 | { |
| 2170 | *consumed = produced; | 2478 | result = CODING_FINISH_INSUFFICIENT_DST; |
| 2479 | src_bytes = dst_bytes; | ||
| 2480 | } | ||
| 2481 | if (dst_bytes) | ||
| 2482 | bcopy (source, destination, src_bytes); | ||
| 2483 | else | ||
| 2484 | safe_bcopy (source, destination, src_bytes); | ||
| 2485 | src += src_bytes; | ||
| 2486 | dst += dst_bytes; | ||
| 2171 | break; | 2487 | break; |
| 2172 | } | 2488 | } |
| 2173 | 2489 | ||
| 2174 | return produced; | 2490 | coding->consumed = coding->consumed_char = src - source; |
| 2491 | coding->produced = coding->produced_char = dst - destination; | ||
| 2492 | return result; | ||
| 2175 | } | 2493 | } |
| 2176 | 2494 | ||
| 2177 | /* See "GENERAL NOTES about `encode_coding_XXX ()' functions". Encode | 2495 | /* See "GENERAL NOTES about `encode_coding_XXX ()' functions". Encode |
| 2178 | format of end-of-line according to `coding->eol_type'. If | 2496 | format of end-of-line according to `coding->eol_type'. If |
| 2179 | `coding->selective' is 1, code '\r' in source text also means | 2497 | `coding->mode & CODING_MODE_SELECTIVE_DISPLAY' is nonzero, code |
| 2180 | end-of-line. */ | 2498 | '\r' in source text also means end-of-line. */ |
| 2181 | 2499 | ||
| 2182 | encode_eol (coding, source, destination, src_bytes, dst_bytes, consumed) | 2500 | encode_eol (coding, source, destination, src_bytes, dst_bytes) |
| 2183 | struct coding_system *coding; | 2501 | struct coding_system *coding; |
| 2184 | unsigned char *source, *destination; | 2502 | unsigned char *source, *destination; |
| 2185 | int src_bytes, dst_bytes; | 2503 | int src_bytes, dst_bytes; |
| 2186 | int *consumed; | ||
| 2187 | { | 2504 | { |
| 2188 | unsigned char *src = source; | 2505 | unsigned char *src = source; |
| 2189 | unsigned char *dst = destination; | 2506 | unsigned char *dst = destination; |
| 2190 | int produced; | 2507 | int result = CODING_FINISH_NORMAL; |
| 2191 | |||
| 2192 | if (src_bytes <= 0) | ||
| 2193 | return 0; | ||
| 2194 | 2508 | ||
| 2195 | switch (coding->eol_type) | 2509 | if (coding->eol_type == CODING_EOL_CRLF) |
| 2196 | { | 2510 | { |
| 2197 | case CODING_EOL_LF: | 2511 | unsigned char c; |
| 2198 | case CODING_EOL_UNDECIDED: | 2512 | unsigned char *src_end = source + src_bytes; |
| 2199 | produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes; | 2513 | unsigned char *dst_end = destination + dst_bytes; |
| 2200 | bcopy (source, destination, produced); | 2514 | /* Since the maximum bytes produced by each loop is 2, we |
| 2201 | if (coding->selective) | 2515 | subtract 1 from DST_END to assure overflow checking is |
| 2516 | necessary only at the head of loop. */ | ||
| 2517 | unsigned char *adjusted_dst_end = dst_end - 1; | ||
| 2518 | |||
| 2519 | while (src < src_end && (dst_bytes | ||
| 2520 | ? (dst < adjusted_dst_end) | ||
| 2521 | : (dst < src - 1))) | ||
| 2202 | { | 2522 | { |
| 2203 | int i = produced; | 2523 | c = *src++; |
| 2204 | while (i--) | 2524 | if (c == '\n' |
| 2525 | || (c == '\r' && (coding->mode & CODING_MODE_SELECTIVE_DISPLAY))) | ||
| 2526 | *dst++ = '\r', *dst++ = '\n'; | ||
| 2527 | else | ||
| 2528 | *dst++ = c; | ||
| 2529 | } | ||
| 2530 | if (src < src_end) | ||
| 2531 | result = CODING_FINISH_INSUFFICIENT_DST; | ||
| 2532 | } | ||
| 2533 | else | ||
| 2534 | { | ||
| 2535 | if (dst_bytes && src_bytes > dst_bytes) | ||
| 2536 | { | ||
| 2537 | src_bytes = dst_bytes; | ||
| 2538 | result = CODING_FINISH_INSUFFICIENT_DST; | ||
| 2539 | } | ||
| 2540 | if (dst_bytes) | ||
| 2541 | bcopy (source, destination, src_bytes); | ||
| 2542 | else | ||
| 2543 | safe_bcopy (source, destination, src_bytes); | ||
| 2544 | if (coding->eol_type == CODING_EOL_CRLF) | ||
| 2545 | { | ||
| 2546 | while (src_bytes--) | ||
| 2547 | if (*dst++ == '\n') dst[-1] = '\r'; | ||
| 2548 | } | ||
| 2549 | else if (coding->mode & CODING_MODE_SELECTIVE_DISPLAY) | ||
| 2550 | { | ||
| 2551 | while (src_bytes--) | ||
| 2205 | if (*dst++ == '\r') dst[-1] = '\n'; | 2552 | if (*dst++ == '\r') dst[-1] = '\n'; |
| 2206 | } | 2553 | } |
| 2207 | *consumed = produced; | 2554 | src += src_bytes; |
| 2208 | 2555 | dst += src_bytes; | |
| 2209 | case CODING_EOL_CRLF: | ||
| 2210 | { | ||
| 2211 | unsigned char c; | ||
| 2212 | unsigned char *src_end = source + src_bytes; | ||
| 2213 | unsigned char *dst_end = destination + dst_bytes; | ||
| 2214 | /* Since the maximum bytes produced by each loop is 2, we | ||
| 2215 | subtract 1 from DST_END to assure overflow checking is | ||
| 2216 | necessary only at the head of loop. */ | ||
| 2217 | unsigned char *adjusted_dst_end = dst_end - 1; | ||
| 2218 | |||
| 2219 | while (src < src_end && dst < adjusted_dst_end) | ||
| 2220 | { | ||
| 2221 | c = *src++; | ||
| 2222 | if (c == '\n' || (c == '\r' && coding->selective)) | ||
| 2223 | *dst++ = '\r', *dst++ = '\n'; | ||
| 2224 | else | ||
| 2225 | *dst++ = c; | ||
| 2226 | } | ||
| 2227 | produced = dst - destination; | ||
| 2228 | *consumed = src - source; | ||
| 2229 | break; | ||
| 2230 | } | ||
| 2231 | |||
| 2232 | default: /* i.e. case CODING_EOL_CR: */ | ||
| 2233 | produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes; | ||
| 2234 | bcopy (source, destination, produced); | ||
| 2235 | { | ||
| 2236 | int i = produced; | ||
| 2237 | while (i--) | ||
| 2238 | if (*dst++ == '\n') dst[-1] = '\r'; | ||
| 2239 | } | ||
| 2240 | *consumed = produced; | ||
| 2241 | } | 2556 | } |
| 2242 | 2557 | ||
| 2243 | return produced; | 2558 | coding->consumed = coding->consumed_char = src - source; |
| 2559 | coding->produced = coding->produced_char = dst - destination; | ||
| 2560 | return result; | ||
| 2244 | } | 2561 | } |
| 2245 | 2562 | ||
| 2246 | 2563 | ||
| @@ -2317,36 +2634,66 @@ setup_coding_system (coding_system, coding) | |||
| 2317 | Lisp_Object coding_system; | 2634 | Lisp_Object coding_system; |
| 2318 | struct coding_system *coding; | 2635 | struct coding_system *coding; |
| 2319 | { | 2636 | { |
| 2320 | Lisp_Object coding_spec, plist, type, eol_type; | 2637 | Lisp_Object coding_spec, coding_type, eol_type, plist; |
| 2321 | Lisp_Object val; | 2638 | Lisp_Object val; |
| 2322 | int i; | 2639 | int i; |
| 2323 | 2640 | ||
| 2324 | /* At first, set several fields to default values. */ | 2641 | /* Initialize some fields required for all kinds of coding systems. */ |
| 2325 | coding->last_block = 0; | ||
| 2326 | coding->selective = 0; | ||
| 2327 | coding->composing = 0; | ||
| 2328 | coding->direction = 0; | ||
| 2329 | coding->carryover_size = 0; | ||
| 2330 | coding->post_read_conversion = coding->pre_write_conversion = Qnil; | ||
| 2331 | coding->character_unification_table_for_decode = Qnil; | ||
| 2332 | coding->character_unification_table_for_encode = Qnil; | ||
| 2333 | |||
| 2334 | coding->symbol = coding_system; | 2642 | coding->symbol = coding_system; |
| 2335 | eol_type = Qnil; | 2643 | coding->common_flags = 0; |
| 2336 | 2644 | coding->mode = 0; | |
| 2337 | /* Get values of property `coding-system' and `eol-type'. | 2645 | coding->heading_ascii = -1; |
| 2338 | Also get values of coding system properties: | 2646 | coding->post_read_conversion = coding->pre_write_conversion = Qnil; |
| 2339 | `post-read-conversion', `pre-write-conversion', | ||
| 2340 | `character-unification-table-for-decode', | ||
| 2341 | `character-unification-table-for-encode'. */ | ||
| 2342 | coding_spec = Fget (coding_system, Qcoding_system); | 2647 | coding_spec = Fget (coding_system, Qcoding_system); |
| 2343 | if (!VECTORP (coding_spec) | 2648 | if (!VECTORP (coding_spec) |
| 2344 | || XVECTOR (coding_spec)->size != 5 | 2649 | || XVECTOR (coding_spec)->size != 5 |
| 2345 | || !CONSP (XVECTOR (coding_spec)->contents[3])) | 2650 | || !CONSP (XVECTOR (coding_spec)->contents[3])) |
| 2346 | goto label_invalid_coding_system; | 2651 | goto label_invalid_coding_system; |
| 2347 | if (!inhibit_eol_conversion) | ||
| 2348 | eol_type = Fget (coding_system, Qeol_type); | ||
| 2349 | 2652 | ||
| 2653 | eol_type = inhibit_eol_conversion ? Qnil : Fget (coding_system, Qeol_type); | ||
| 2654 | if (VECTORP (eol_type)) | ||
| 2655 | { | ||
| 2656 | coding->eol_type = CODING_EOL_UNDECIDED; | ||
| 2657 | coding->common_flags = CODING_REQUIRE_DETECTION_MASK; | ||
| 2658 | } | ||
| 2659 | else if (XFASTINT (eol_type) == 1) | ||
| 2660 | { | ||
| 2661 | coding->eol_type = CODING_EOL_CRLF; | ||
| 2662 | coding->common_flags | ||
| 2663 | = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK; | ||
| 2664 | } | ||
| 2665 | else if (XFASTINT (eol_type) == 2) | ||
| 2666 | { | ||
| 2667 | coding->eol_type = CODING_EOL_CR; | ||
| 2668 | coding->common_flags | ||
| 2669 | = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK; | ||
| 2670 | } | ||
| 2671 | else | ||
| 2672 | coding->eol_type = CODING_EOL_LF; | ||
| 2673 | |||
| 2674 | coding_type = XVECTOR (coding_spec)->contents[0]; | ||
| 2675 | /* Try short cut. */ | ||
| 2676 | if (SYMBOLP (coding_type)) | ||
| 2677 | { | ||
| 2678 | if (EQ (coding_type, Qt)) | ||
| 2679 | { | ||
| 2680 | coding->type = coding_type_undecided; | ||
| 2681 | coding->common_flags |= CODING_REQUIRE_DETECTION_MASK; | ||
| 2682 | } | ||
| 2683 | else | ||
| 2684 | coding->type = coding_type_no_conversion; | ||
| 2685 | return 0; | ||
| 2686 | } | ||
| 2687 | |||
| 2688 | /* Initialize remaining fields. */ | ||
| 2689 | coding->composing = 0; | ||
| 2690 | coding->character_unification_table_for_decode = Qnil; | ||
| 2691 | coding->character_unification_table_for_encode = Qnil; | ||
| 2692 | |||
| 2693 | /* Get values of coding system properties: | ||
| 2694 | `post-read-conversion', `pre-write-conversion', | ||
| 2695 | `character-unification-table-for-decode', | ||
| 2696 | `character-unification-table-for-encode'. */ | ||
| 2350 | plist = XVECTOR (coding_spec)->contents[3]; | 2697 | plist = XVECTOR (coding_spec)->contents[3]; |
| 2351 | coding->post_read_conversion = Fplist_get (plist, Qpost_read_conversion); | 2698 | coding->post_read_conversion = Fplist_get (plist, Qpost_read_conversion); |
| 2352 | coding->pre_write_conversion = Fplist_get (plist, Qpre_write_conversion); | 2699 | coding->pre_write_conversion = Fplist_get (plist, Qpre_write_conversion); |
| @@ -2360,6 +2707,17 @@ setup_coding_system (coding_system, coding) | |||
| 2360 | val = Fget (val, Qcharacter_unification_table_for_encode); | 2707 | val = Fget (val, Qcharacter_unification_table_for_encode); |
| 2361 | coding->character_unification_table_for_encode | 2708 | coding->character_unification_table_for_encode |
| 2362 | = CHAR_TABLE_P (val) ? val : Qnil; | 2709 | = CHAR_TABLE_P (val) ? val : Qnil; |
| 2710 | val = Fplist_get (plist, Qcoding_category); | ||
| 2711 | if (!NILP (val)) | ||
| 2712 | { | ||
| 2713 | val = Fget (val, Qcoding_category_index); | ||
| 2714 | if (INTEGERP (val)) | ||
| 2715 | coding->category_idx = XINT (val); | ||
| 2716 | else | ||
| 2717 | goto label_invalid_coding_system; | ||
| 2718 | } | ||
| 2719 | else | ||
| 2720 | goto label_invalid_coding_system; | ||
| 2363 | 2721 | ||
| 2364 | val = Fplist_get (plist, Qsafe_charsets); | 2722 | val = Fplist_get (plist, Qsafe_charsets); |
| 2365 | if (EQ (val, Qt)) | 2723 | if (EQ (val, Qt)) |
| @@ -2378,31 +2736,7 @@ setup_coding_system (coding_system, coding) | |||
| 2378 | } | 2736 | } |
| 2379 | } | 2737 | } |
| 2380 | 2738 | ||
| 2381 | if (VECTORP (eol_type)) | 2739 | switch (XFASTINT (coding_type)) |
| 2382 | { | ||
| 2383 | coding->eol_type = CODING_EOL_UNDECIDED; | ||
| 2384 | coding->common_flags = CODING_REQUIRE_DETECTION_MASK; | ||
| 2385 | } | ||
| 2386 | else if (XFASTINT (eol_type) == 1) | ||
| 2387 | { | ||
| 2388 | coding->eol_type = CODING_EOL_CRLF; | ||
| 2389 | coding->common_flags | ||
| 2390 | = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK; | ||
| 2391 | } | ||
| 2392 | else if (XFASTINT (eol_type) == 2) | ||
| 2393 | { | ||
| 2394 | coding->eol_type = CODING_EOL_CR; | ||
| 2395 | coding->common_flags | ||
| 2396 | = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK; | ||
| 2397 | } | ||
| 2398 | else | ||
| 2399 | { | ||
| 2400 | coding->eol_type = CODING_EOL_LF; | ||
| 2401 | coding->common_flags = 0; | ||
| 2402 | } | ||
| 2403 | |||
| 2404 | type = XVECTOR (coding_spec)->contents[0]; | ||
| 2405 | switch (XFASTINT (type)) | ||
| 2406 | { | 2740 | { |
| 2407 | case 0: | 2741 | case 0: |
| 2408 | coding->type = coding_type_emacs_mule; | 2742 | coding->type = coding_type_emacs_mule; |
| @@ -2425,7 +2759,7 @@ setup_coding_system (coding_system, coding) | |||
| 2425 | { | 2759 | { |
| 2426 | Lisp_Object val, temp; | 2760 | Lisp_Object val, temp; |
| 2427 | Lisp_Object *flags; | 2761 | Lisp_Object *flags; |
| 2428 | int i, charset, default_reg_bits = 0; | 2762 | int i, charset, reg_bits = 0; |
| 2429 | 2763 | ||
| 2430 | val = XVECTOR (coding_spec)->contents[4]; | 2764 | val = XVECTOR (coding_spec)->contents[4]; |
| 2431 | 2765 | ||
| @@ -2480,7 +2814,7 @@ setup_coding_system (coding_system, coding) | |||
| 2480 | list of integer, nil, or t: designate the first | 2814 | list of integer, nil, or t: designate the first |
| 2481 | element (if integer) to REG initially, the remaining | 2815 | element (if integer) to REG initially, the remaining |
| 2482 | elements (if integer) is designated to REG on request, | 2816 | elements (if integer) is designated to REG on request, |
| 2483 | if an element is t, REG can be used by any charset, | 2817 | if an element is t, REG can be used by any charsets, |
| 2484 | nil: REG is never used. */ | 2818 | nil: REG is never used. */ |
| 2485 | for (charset = 0; charset <= MAX_CHARSET; charset++) | 2819 | for (charset = 0; charset <= MAX_CHARSET; charset++) |
| 2486 | CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) | 2820 | CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) |
| @@ -2497,12 +2831,14 @@ setup_coding_system (coding_system, coding) | |||
| 2497 | else if (EQ (flags[i], Qt)) | 2831 | else if (EQ (flags[i], Qt)) |
| 2498 | { | 2832 | { |
| 2499 | CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1; | 2833 | CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1; |
| 2500 | default_reg_bits |= 1 << i; | 2834 | reg_bits |= 1 << i; |
| 2835 | coding->flags |= CODING_FLAG_ISO_DESIGNATION; | ||
| 2501 | } | 2836 | } |
| 2502 | else if (CONSP (flags[i])) | 2837 | else if (CONSP (flags[i])) |
| 2503 | { | 2838 | { |
| 2504 | Lisp_Object tail = flags[i]; | 2839 | Lisp_Object tail = flags[i]; |
| 2505 | 2840 | ||
| 2841 | coding->flags |= CODING_FLAG_ISO_DESIGNATION; | ||
| 2506 | if (INTEGERP (XCONS (tail)->car) | 2842 | if (INTEGERP (XCONS (tail)->car) |
| 2507 | && (charset = XINT (XCONS (tail)->car), | 2843 | && (charset = XINT (XCONS (tail)->car), |
| 2508 | CHARSET_VALID_P (charset)) | 2844 | CHARSET_VALID_P (charset)) |
| @@ -2523,7 +2859,7 @@ setup_coding_system (coding_system, coding) | |||
| 2523 | CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) | 2859 | CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) |
| 2524 | = i; | 2860 | = i; |
| 2525 | else if (EQ (XCONS (tail)->car, Qt)) | 2861 | else if (EQ (XCONS (tail)->car, Qt)) |
| 2526 | default_reg_bits |= 1 << i; | 2862 | reg_bits |= 1 << i; |
| 2527 | tail = XCONS (tail)->cdr; | 2863 | tail = XCONS (tail)->cdr; |
| 2528 | } | 2864 | } |
| 2529 | } | 2865 | } |
| @@ -2534,46 +2870,39 @@ setup_coding_system (coding_system, coding) | |||
| 2534 | = CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i); | 2870 | = CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i); |
| 2535 | } | 2871 | } |
| 2536 | 2872 | ||
| 2537 | if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)) | 2873 | if (reg_bits && ! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)) |
| 2538 | { | 2874 | { |
| 2539 | /* REG 1 can be used only by locking shift in 7-bit env. */ | 2875 | /* REG 1 can be used only by locking shift in 7-bit env. */ |
| 2540 | if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) | 2876 | if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) |
| 2541 | default_reg_bits &= ~2; | 2877 | reg_bits &= ~2; |
| 2542 | if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)) | 2878 | if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)) |
| 2543 | /* Without any shifting, only REG 0 and 1 can be used. */ | 2879 | /* Without any shifting, only REG 0 and 1 can be used. */ |
| 2544 | default_reg_bits &= 3; | 2880 | reg_bits &= 3; |
| 2545 | } | 2881 | } |
| 2546 | 2882 | ||
| 2547 | for (charset = 0; charset <= MAX_CHARSET; charset++) | 2883 | if (reg_bits) |
| 2548 | if (CHARSET_VALID_P (charset) | 2884 | for (charset = 0; charset <= MAX_CHARSET; charset++) |
| 2549 | && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) | ||
| 2550 | == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION)) | ||
| 2551 | { | 2885 | { |
| 2552 | /* We have not yet decided where to designate CHARSET. */ | 2886 | if (CHARSET_VALID_P (charset)) |
| 2553 | int reg_bits = default_reg_bits; | 2887 | { |
| 2554 | 2888 | /* There exist some default graphic registers to be | |
| 2555 | if (CHARSET_CHARS (charset) == 96) | 2889 | used by CHARSET. */ |
| 2556 | /* A charset of CHARS96 can't be designated to REG 0. */ | 2890 | |
| 2557 | reg_bits &= ~1; | 2891 | /* We had better avoid designating a charset of |
| 2558 | 2892 | CHARS96 to REG 0 as far as possible. */ | |
| 2559 | if (reg_bits) | 2893 | if (CHARSET_CHARS (charset) == 96) |
| 2560 | /* There exist some default graphic register. */ | 2894 | CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) |
| 2561 | CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) | 2895 | = (reg_bits & 2 |
| 2562 | = (reg_bits & 1 | 2896 | ? 1 : (reg_bits & 4 ? 2 : (reg_bits & 8 ? 3 : 0))); |
| 2563 | ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3))); | 2897 | else |
| 2564 | else | 2898 | CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) |
| 2565 | /* We anyway have to designate CHARSET to somewhere. */ | 2899 | = (reg_bits & 1 |
| 2566 | CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) | 2900 | ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3))); |
| 2567 | = (CHARSET_CHARS (charset) == 94 | 2901 | } |
| 2568 | ? 0 | ||
| 2569 | : ((coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT | ||
| 2570 | || ! coding->flags & CODING_FLAG_ISO_SEVEN_BITS) | ||
| 2571 | ? 1 | ||
| 2572 | : (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT | ||
| 2573 | ? 2 : 0))); | ||
| 2574 | } | 2902 | } |
| 2575 | } | 2903 | } |
| 2576 | coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK; | 2904 | coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK; |
| 2905 | coding->spec.iso2022.last_invalid_designation_register = -1; | ||
| 2577 | break; | 2906 | break; |
| 2578 | 2907 | ||
| 2579 | case 3: | 2908 | case 3: |
| @@ -2610,23 +2939,16 @@ setup_coding_system (coding_system, coding) | |||
| 2610 | break; | 2939 | break; |
| 2611 | 2940 | ||
| 2612 | default: | 2941 | default: |
| 2613 | if (EQ (type, Qt)) | 2942 | goto label_invalid_coding_system; |
| 2614 | { | ||
| 2615 | coding->type = coding_type_undecided; | ||
| 2616 | coding->common_flags |= CODING_REQUIRE_DETECTION_MASK; | ||
| 2617 | } | ||
| 2618 | else | ||
| 2619 | coding->type = coding_type_no_conversion; | ||
| 2620 | break; | ||
| 2621 | } | 2943 | } |
| 2622 | return 0; | 2944 | return 0; |
| 2623 | 2945 | ||
| 2624 | label_invalid_coding_system: | 2946 | label_invalid_coding_system: |
| 2625 | coding->type = coding_type_no_conversion; | 2947 | coding->type = coding_type_no_conversion; |
| 2948 | coding->category_idx = CODING_CATEGORY_IDX_BINARY; | ||
| 2626 | coding->common_flags = 0; | 2949 | coding->common_flags = 0; |
| 2627 | coding->eol_type = CODING_EOL_LF; | 2950 | coding->eol_type = CODING_EOL_LF; |
| 2628 | coding->symbol = coding->pre_write_conversion = coding->post_read_conversion | 2951 | coding->pre_write_conversion = coding->post_read_conversion = Qnil; |
| 2629 | = Qnil; | ||
| 2630 | return -1; | 2952 | return -1; |
| 2631 | } | 2953 | } |
| 2632 | 2954 | ||
| @@ -2652,8 +2974,14 @@ setup_coding_system (coding_system, coding) | |||
| 2652 | 2974 | ||
| 2653 | The category for a coding system which has the same code range | 2975 | The category for a coding system which has the same code range |
| 2654 | as ISO2022 of 7-bit environment. This doesn't use any locking | 2976 | as ISO2022 of 7-bit environment. This doesn't use any locking |
| 2655 | shift and single shift functions. Assigned the coding-system | 2977 | shift and single shift functions. This can encode/decode all |
| 2656 | (Lisp symbol) `iso-2022-7bit' by default. | 2978 | charsets. Assigned the coding-system (Lisp symbol) |
| 2979 | `iso-2022-7bit' by default. | ||
| 2980 | |||
| 2981 | o coding-category-iso-7-tight | ||
| 2982 | |||
| 2983 | Same as coding-category-iso-7 except that this can | ||
| 2984 | encode/decode only the specified charsets. | ||
| 2657 | 2985 | ||
| 2658 | o coding-category-iso-8-1 | 2986 | o coding-category-iso-8-1 |
| 2659 | 2987 | ||
| @@ -2707,19 +3035,23 @@ setup_coding_system (coding_system, coding) | |||
| 2707 | 3035 | ||
| 2708 | */ | 3036 | */ |
| 2709 | 3037 | ||
| 2710 | /* Detect how a text of length SRC_BYTES pointed by SRC is encoded. | 3038 | /* Detect how a text of length SRC_BYTES pointed by SOURCE is encoded. |
| 2711 | If it detects possible coding systems, return an integer in which | 3039 | If it detects possible coding systems, return an integer in which |
| 2712 | appropriate flag bits are set. Flag bits are defined by macros | 3040 | appropriate flag bits are set. Flag bits are defined by macros |
| 2713 | CODING_CATEGORY_MASK_XXX in `coding.h'. */ | 3041 | CODING_CATEGORY_MASK_XXX in `coding.h'. |
| 2714 | 3042 | ||
| 2715 | int | 3043 | How many ASCII characters are at the head is returned as *SKIP. */ |
| 2716 | detect_coding_mask (src, src_bytes) | 3044 | |
| 2717 | unsigned char *src; | 3045 | static int |
| 2718 | int src_bytes; | 3046 | detect_coding_mask (source, src_bytes, priorities, skip) |
| 3047 | unsigned char *source; | ||
| 3048 | int src_bytes, *priorities, *skip; | ||
| 2719 | { | 3049 | { |
| 2720 | register unsigned char c; | 3050 | register unsigned char c; |
| 2721 | unsigned char *src_end = src + src_bytes; | 3051 | unsigned char *src = source, *src_end = source + src_bytes; |
| 2722 | int mask; | 3052 | unsigned int mask = (CODING_CATEGORY_MASK_ISO_7BIT |
| 3053 | | CODING_CATEGORY_MASK_ISO_SHIFT); | ||
| 3054 | int i; | ||
| 2723 | 3055 | ||
| 2724 | /* At first, skip all ASCII characters and control characters except | 3056 | /* At first, skip all ASCII characters and control characters except |
| 2725 | for three ISO2022 specific control characters. */ | 3057 | for three ISO2022 specific control characters. */ |
| @@ -2728,14 +3060,18 @@ detect_coding_mask (src, src_bytes) | |||
| 2728 | { | 3060 | { |
| 2729 | c = *src; | 3061 | c = *src; |
| 2730 | if (c >= 0x80 | 3062 | if (c >= 0x80 |
| 2731 | || (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO)) | 3063 | || ((mask & CODING_CATEGORY_MASK_ISO_7BIT) |
| 3064 | && c == ISO_CODE_ESC) | ||
| 3065 | || ((mask & CODING_CATEGORY_MASK_ISO_SHIFT) | ||
| 3066 | && (c == ISO_CODE_SI || c == ISO_CODE_SO))) | ||
| 2732 | break; | 3067 | break; |
| 2733 | src++; | 3068 | src++; |
| 2734 | } | 3069 | } |
| 3070 | *skip = src - source; | ||
| 2735 | 3071 | ||
| 2736 | if (src >= src_end) | 3072 | if (src >= src_end) |
| 2737 | /* We found nothing other than ASCII. There's nothing to do. */ | 3073 | /* We found nothing other than ASCII. There's nothing to do. */ |
| 2738 | return CODING_CATEGORY_MASK_ANY; | 3074 | return 0; |
| 2739 | 3075 | ||
| 2740 | /* The text seems to be encoded in some multilingual coding system. | 3076 | /* The text seems to be encoded in some multilingual coding system. |
| 2741 | Now, try to find in which coding system the text is encoded. */ | 3077 | Now, try to find in which coding system the text is encoded. */ |
| @@ -2744,49 +3080,90 @@ detect_coding_mask (src, src_bytes) | |||
| 2744 | /* i.e. (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) */ | 3080 | /* i.e. (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) */ |
| 2745 | /* C is an ISO2022 specific control code of C0. */ | 3081 | /* C is an ISO2022 specific control code of C0. */ |
| 2746 | mask = detect_coding_iso2022 (src, src_end); | 3082 | mask = detect_coding_iso2022 (src, src_end); |
| 2747 | src++; | ||
| 2748 | if (mask == 0) | 3083 | if (mask == 0) |
| 2749 | /* No valid ISO2022 code follows C. Try again. */ | 3084 | { |
| 2750 | goto label_loop_detect_coding; | 3085 | /* No valid ISO2022 code follows C. Try again. */ |
| 2751 | mask |= CODING_CATEGORY_MASK_RAW_TEXT; | 3086 | src++; |
| 3087 | mask = (c != ISO_CODE_ESC | ||
| 3088 | ? CODING_CATEGORY_MASK_ISO_7BIT | ||
| 3089 | : CODING_CATEGORY_MASK_ISO_SHIFT); | ||
| 3090 | goto label_loop_detect_coding; | ||
| 3091 | } | ||
| 3092 | if (priorities) | ||
| 3093 | goto label_return_highest_only; | ||
| 2752 | } | 3094 | } |
| 2753 | else if (c < 0xA0) | 3095 | else |
| 2754 | { | 3096 | { |
| 2755 | /* If C is a special latin extra code, | 3097 | int try; |
| 2756 | or is an ISO2022 specific control code of C1 (SS2 or SS3), | ||
| 2757 | or is an ISO2022 control-sequence-introducer (CSI), | ||
| 2758 | we should also consider the possibility of ISO2022 codings. */ | ||
| 2759 | if ((VECTORP (Vlatin_extra_code_table) | ||
| 2760 | && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c])) | ||
| 2761 | || (c == ISO_CODE_SS2 || c == ISO_CODE_SS3) | ||
| 2762 | || (c == ISO_CODE_CSI | ||
| 2763 | && (src < src_end | ||
| 2764 | && (*src == ']' | ||
| 2765 | || (src + 1 < src_end | ||
| 2766 | && src[1] == ']' | ||
| 2767 | && (*src == '0' || *src == '1' || *src == '2')))))) | ||
| 2768 | mask = (detect_coding_iso2022 (src, src_end) | ||
| 2769 | | detect_coding_sjis (src, src_end) | ||
| 2770 | | detect_coding_emacs_mule (src, src_end) | ||
| 2771 | | CODING_CATEGORY_MASK_RAW_TEXT); | ||
| 2772 | 3098 | ||
| 3099 | if (c < 0xA0) | ||
| 3100 | { | ||
| 3101 | /* C is the first byte of SJIS character code, | ||
| 3102 | or a leading-code of Emacs' internal format (emacs-mule). */ | ||
| 3103 | try = CODING_CATEGORY_MASK_SJIS | CODING_CATEGORY_MASK_EMACS_MULE; | ||
| 3104 | |||
| 3105 | /* Or, if C is a special latin extra code, | ||
| 3106 | or is an ISO2022 specific control code of C1 (SS2 or SS3), | ||
| 3107 | or is an ISO2022 control-sequence-introducer (CSI), | ||
| 3108 | we should also consider the possibility of ISO2022 codings. */ | ||
| 3109 | if ((VECTORP (Vlatin_extra_code_table) | ||
| 3110 | && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c])) | ||
| 3111 | || (c == ISO_CODE_SS2 || c == ISO_CODE_SS3) | ||
| 3112 | || (c == ISO_CODE_CSI | ||
| 3113 | && (src < src_end | ||
| 3114 | && (*src == ']' | ||
| 3115 | || ((*src == '0' || *src == '1' || *src == '2') | ||
| 3116 | && src + 1 < src_end | ||
| 3117 | && src[1] == ']'))))) | ||
| 3118 | try |= (CODING_CATEGORY_MASK_ISO_8_ELSE | ||
| 3119 | | CODING_CATEGORY_MASK_ISO_8BIT); | ||
| 3120 | } | ||
| 2773 | else | 3121 | else |
| 2774 | /* C is the first byte of SJIS character code, | 3122 | /* C is a character of ISO2022 in graphic plane right, |
| 2775 | or a leading-code of Emacs' internal format (emacs-mule). */ | 3123 | or a SJIS's 1-byte character code (i.e. JISX0201), |
| 2776 | mask = (detect_coding_sjis (src, src_end) | 3124 | or the first byte of BIG5's 2-byte code. */ |
| 2777 | | detect_coding_emacs_mule (src, src_end) | 3125 | try = (CODING_CATEGORY_MASK_ISO_8_ELSE |
| 2778 | | CODING_CATEGORY_MASK_RAW_TEXT); | 3126 | | CODING_CATEGORY_MASK_ISO_8BIT |
| 3127 | | CODING_CATEGORY_MASK_SJIS | ||
| 3128 | | CODING_CATEGORY_MASK_BIG5); | ||
| 3129 | |||
| 3130 | mask = 0; | ||
| 3131 | if (priorities) | ||
| 3132 | { | ||
| 3133 | for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++) | ||
| 3134 | { | ||
| 3135 | priorities[i] &= try; | ||
| 3136 | if (priorities[i] & CODING_CATEGORY_MASK_ISO) | ||
| 3137 | mask = detect_coding_iso2022 (src, src_end); | ||
| 3138 | else if (priorities[i] & CODING_CATEGORY_MASK_SJIS) | ||
| 3139 | mask = detect_coding_sjis (src, src_end); | ||
| 3140 | else if (priorities[i] & CODING_CATEGORY_MASK_BIG5) | ||
| 3141 | mask = detect_coding_big5 (src, src_end); | ||
| 3142 | else if (priorities[i] & CODING_CATEGORY_MASK_EMACS_MULE) | ||
| 3143 | mask = detect_coding_emacs_mule (src, src_end); | ||
| 3144 | if (mask) | ||
| 3145 | goto label_return_highest_only; | ||
| 3146 | } | ||
| 3147 | return CODING_CATEGORY_MASK_RAW_TEXT; | ||
| 3148 | } | ||
| 3149 | if (try & CODING_CATEGORY_MASK_ISO) | ||
| 3150 | mask |= detect_coding_iso2022 (src, src_end); | ||
| 3151 | if (try & CODING_CATEGORY_MASK_SJIS) | ||
| 3152 | mask |= detect_coding_sjis (src, src_end); | ||
| 3153 | if (try & CODING_CATEGORY_MASK_BIG5) | ||
| 3154 | mask |= detect_coding_big5 (src, src_end); | ||
| 3155 | if (try & CODING_CATEGORY_MASK_EMACS_MULE) | ||
| 3156 | mask |= detect_coding_emacs_mule (src, src_end); | ||
| 2779 | } | 3157 | } |
| 2780 | else | 3158 | return (mask | CODING_CATEGORY_MASK_RAW_TEXT); |
| 2781 | /* C is a character of ISO2022 in graphic plane right, | 3159 | |
| 2782 | or a SJIS's 1-byte character code (i.e. JISX0201), | 3160 | label_return_highest_only: |
| 2783 | or the first byte of BIG5's 2-byte code. */ | 3161 | for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++) |
| 2784 | mask = (detect_coding_iso2022 (src, src_end) | 3162 | { |
| 2785 | | detect_coding_sjis (src, src_end) | 3163 | if (mask & priorities[i]) |
| 2786 | | detect_coding_big5 (src, src_end) | 3164 | return priorities[i]; |
| 2787 | | CODING_CATEGORY_MASK_RAW_TEXT); | 3165 | } |
| 2788 | 3166 | return CODING_CATEGORY_MASK_RAW_TEXT; | |
| 2789 | return mask; | ||
| 2790 | } | 3167 | } |
| 2791 | 3168 | ||
| 2792 | /* Detect how a text of length SRC_BYTES pointed by SRC is encoded. | 3169 | /* Detect how a text of length SRC_BYTES pointed by SRC is encoded. |
| @@ -2798,61 +3175,81 @@ detect_coding (coding, src, src_bytes) | |||
| 2798 | unsigned char *src; | 3175 | unsigned char *src; |
| 2799 | int src_bytes; | 3176 | int src_bytes; |
| 2800 | { | 3177 | { |
| 2801 | int mask = detect_coding_mask (src, src_bytes); | 3178 | unsigned int idx; |
| 2802 | int idx; | 3179 | int skip, mask, i; |
| 3180 | int priorities[CODING_CATEGORY_IDX_MAX]; | ||
| 2803 | Lisp_Object val = Vcoding_category_list; | 3181 | Lisp_Object val = Vcoding_category_list; |
| 2804 | 3182 | ||
| 2805 | if (mask == CODING_CATEGORY_MASK_ANY) | 3183 | i = 0; |
| 2806 | /* We found nothing other than ASCII. There's nothing to do. */ | 3184 | while (CONSP (val) && i < CODING_CATEGORY_IDX_MAX) |
| 2807 | return; | 3185 | { |
| 3186 | if (! SYMBOLP (XCONS (val)->car)) | ||
| 3187 | break; | ||
| 3188 | idx = XFASTINT (Fget (XCONS (val)->car, Qcoding_category_index)); | ||
| 3189 | if (idx >= CODING_CATEGORY_IDX_MAX) | ||
| 3190 | break; | ||
| 3191 | priorities[i++] = (1 << idx); | ||
| 3192 | val = XCONS (val)->cdr; | ||
| 3193 | } | ||
| 3194 | /* If coding-category-list is valid and contains all coding | ||
| 3195 | categories, `i' should be CODING_CATEGORY_IDX_MAX now. If not, | ||
| 3196 | the following code saves Emacs from crashing. */ | ||
| 3197 | while (i < CODING_CATEGORY_IDX_MAX) | ||
| 3198 | priorities[i++] = CODING_CATEGORY_MASK_RAW_TEXT; | ||
| 2808 | 3199 | ||
| 2809 | /* We found some plausible coding systems. Let's use a coding | 3200 | mask = detect_coding_mask (src, src_bytes, priorities, &skip); |
| 2810 | system of the highest priority. */ | 3201 | coding->heading_ascii = skip; |
| 2811 | 3202 | ||
| 2812 | if (CONSP (val)) | 3203 | if (!mask) return; |
| 2813 | while (!NILP (val)) | 3204 | |
| 2814 | { | 3205 | /* We found a single coding system of the highest priority in MASK. */ |
| 2815 | idx = XFASTINT (Fget (XCONS (val)->car, Qcoding_category_index)); | 3206 | idx = 0; |
| 2816 | if ((idx < CODING_CATEGORY_IDX_MAX) && (mask & (1 << idx))) | 3207 | while (mask && ! (mask & 1)) mask >>= 1, idx++; |
| 2817 | break; | 3208 | if (! mask) |
| 2818 | val = XCONS (val)->cdr; | 3209 | idx = CODING_CATEGORY_IDX_RAW_TEXT; |
| 2819 | } | ||
| 2820 | else | ||
| 2821 | val = Qnil; | ||
| 2822 | 3210 | ||
| 2823 | if (NILP (val)) | 3211 | val = XSYMBOL (XVECTOR (Vcoding_category_table)->contents[idx])->value; |
| 3212 | |||
| 3213 | if (coding->eol_type != CODING_EOL_UNDECIDED) | ||
| 2824 | { | 3214 | { |
| 2825 | /* For unknown reason, `Vcoding_category_list' contains none of | 3215 | Lisp_Object tmp = Fget (val, Qeol_type); |
| 2826 | found categories. Let's use any of them. */ | 3216 | |
| 2827 | for (idx = 0; idx < CODING_CATEGORY_IDX_MAX; idx++) | 3217 | if (VECTORP (tmp)) |
| 2828 | if (mask & (1 << idx)) | 3218 | val = XVECTOR (tmp)->contents[coding->eol_type]; |
| 2829 | break; | ||
| 2830 | } | 3219 | } |
| 2831 | setup_coding_system (XSYMBOL (coding_category_table[idx])->value, coding); | 3220 | setup_coding_system (val, coding); |
| 3221 | /* Set this again because setup_coding_system reset this member. */ | ||
| 3222 | coding->heading_ascii = skip; | ||
| 2832 | } | 3223 | } |
| 2833 | 3224 | ||
| 2834 | /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC | 3225 | /* Detect how end-of-line of a text of length SRC_BYTES pointed by |
| 2835 | is encoded. Return one of CODING_EOL_LF, CODING_EOL_CRLF, | 3226 | SOURCE is encoded. Return one of CODING_EOL_LF, CODING_EOL_CRLF, |
| 2836 | CODING_EOL_CR, and CODING_EOL_UNDECIDED. */ | 3227 | CODING_EOL_CR, and CODING_EOL_UNDECIDED. |
| 3228 | |||
| 3229 | How many non-eol characters are at the head is returned as *SKIP. */ | ||
| 2837 | 3230 | ||
| 2838 | #define MAX_EOL_CHECK_COUNT 3 | 3231 | #define MAX_EOL_CHECK_COUNT 3 |
| 2839 | 3232 | ||
| 2840 | int | 3233 | static int |
| 2841 | detect_eol_type (src, src_bytes) | 3234 | detect_eol_type (source, src_bytes, skip) |
| 2842 | unsigned char *src; | 3235 | unsigned char *source; |
| 2843 | int src_bytes; | 3236 | int src_bytes, *skip; |
| 2844 | { | 3237 | { |
| 2845 | unsigned char *src_end = src + src_bytes; | 3238 | unsigned char *src = source, *src_end = src + src_bytes; |
| 2846 | unsigned char c; | 3239 | unsigned char c; |
| 2847 | int total = 0; /* How many end-of-lines are found so far. */ | 3240 | int total = 0; /* How many end-of-lines are found so far. */ |
| 2848 | int eol_type = CODING_EOL_UNDECIDED; | 3241 | int eol_type = CODING_EOL_UNDECIDED; |
| 2849 | int this_eol_type; | 3242 | int this_eol_type; |
| 2850 | 3243 | ||
| 3244 | *skip = 0; | ||
| 3245 | |||
| 2851 | while (src < src_end && total < MAX_EOL_CHECK_COUNT) | 3246 | while (src < src_end && total < MAX_EOL_CHECK_COUNT) |
| 2852 | { | 3247 | { |
| 2853 | c = *src++; | 3248 | c = *src++; |
| 2854 | if (c == '\n' || c == '\r') | 3249 | if (c == '\n' || c == '\r') |
| 2855 | { | 3250 | { |
| 3251 | if (*skip == 0) | ||
| 3252 | *skip = src - 1 - source; | ||
| 2856 | total++; | 3253 | total++; |
| 2857 | if (c == '\n') | 3254 | if (c == '\n') |
| 2858 | this_eol_type = CODING_EOL_LF; | 3255 | this_eol_type = CODING_EOL_LF; |
| @@ -2865,12 +3262,16 @@ detect_eol_type (src, src_bytes) | |||
| 2865 | /* This is the first end-of-line. */ | 3262 | /* This is the first end-of-line. */ |
| 2866 | eol_type = this_eol_type; | 3263 | eol_type = this_eol_type; |
| 2867 | else if (eol_type != this_eol_type) | 3264 | else if (eol_type != this_eol_type) |
| 2868 | /* The found type is different from what found before. | 3265 | { |
| 2869 | Let's notice the caller about this inconsistency. */ | 3266 | /* The found type is different from what found before. */ |
| 2870 | return CODING_EOL_INCONSISTENT; | 3267 | eol_type = CODING_EOL_INCONSISTENT; |
| 3268 | break; | ||
| 3269 | } | ||
| 2871 | } | 3270 | } |
| 2872 | } | 3271 | } |
| 2873 | 3272 | ||
| 3273 | if (*skip == 0) | ||
| 3274 | *skip = src_end - source; | ||
| 2874 | return eol_type; | 3275 | return eol_type; |
| 2875 | } | 3276 | } |
| 2876 | 3277 | ||
| @@ -2885,12 +3286,16 @@ detect_eol (coding, src, src_bytes) | |||
| 2885 | int src_bytes; | 3286 | int src_bytes; |
| 2886 | { | 3287 | { |
| 2887 | Lisp_Object val; | 3288 | Lisp_Object val; |
| 2888 | int eol_type = detect_eol_type (src, src_bytes); | 3289 | int skip; |
| 3290 | int eol_type = detect_eol_type (src, src_bytes, &skip); | ||
| 3291 | |||
| 3292 | if (coding->heading_ascii > skip) | ||
| 3293 | coding->heading_ascii = skip; | ||
| 3294 | else | ||
| 3295 | skip = coding->heading_ascii; | ||
| 2889 | 3296 | ||
| 2890 | if (eol_type == CODING_EOL_UNDECIDED) | 3297 | if (eol_type == CODING_EOL_UNDECIDED) |
| 2891 | /* We found no end-of-line in the source text. */ | ||
| 2892 | return; | 3298 | return; |
| 2893 | |||
| 2894 | if (eol_type == CODING_EOL_INCONSISTENT) | 3299 | if (eol_type == CODING_EOL_INCONSISTENT) |
| 2895 | { | 3300 | { |
| 2896 | #if 0 | 3301 | #if 0 |
| @@ -2911,7 +3316,121 @@ detect_eol (coding, src, src_bytes) | |||
| 2911 | 3316 | ||
| 2912 | val = Fget (coding->symbol, Qeol_type); | 3317 | val = Fget (coding->symbol, Qeol_type); |
| 2913 | if (VECTORP (val) && XVECTOR (val)->size == 3) | 3318 | if (VECTORP (val) && XVECTOR (val)->size == 3) |
| 2914 | setup_coding_system (XVECTOR (val)->contents[eol_type], coding); | 3319 | { |
| 3320 | setup_coding_system (XVECTOR (val)->contents[eol_type], coding); | ||
| 3321 | coding->heading_ascii = skip; | ||
| 3322 | } | ||
| 3323 | } | ||
| 3324 | |||
| 3325 | #define CONVERSION_BUFFER_EXTRA_ROOM 256 | ||
| 3326 | |||
| 3327 | #define DECODING_BUFFER_MAG(coding) \ | ||
| 3328 | (coding->type == coding_type_iso2022 \ | ||
| 3329 | ? 3 \ | ||
| 3330 | : ((coding->type == coding_type_sjis || coding->type == coding_type_big5) \ | ||
| 3331 | ? 2 \ | ||
| 3332 | : (coding->type == coding_type_raw_text \ | ||
| 3333 | ? 1 \ | ||
| 3334 | : (coding->type == coding_type_ccl \ | ||
| 3335 | ? coding->spec.ccl.decoder.buf_magnification \ | ||
| 3336 | : 2)))) | ||
| 3337 | |||
| 3338 | /* Return maximum size (bytes) of a buffer enough for decoding | ||
| 3339 | SRC_BYTES of text encoded in CODING. */ | ||
| 3340 | |||
| 3341 | int | ||
| 3342 | decoding_buffer_size (coding, src_bytes) | ||
| 3343 | struct coding_system *coding; | ||
| 3344 | int src_bytes; | ||
| 3345 | { | ||
| 3346 | return (src_bytes * DECODING_BUFFER_MAG (coding) | ||
| 3347 | + CONVERSION_BUFFER_EXTRA_ROOM); | ||
| 3348 | } | ||
| 3349 | |||
| 3350 | /* Return maximum size (bytes) of a buffer enough for encoding | ||
| 3351 | SRC_BYTES of text to CODING. */ | ||
| 3352 | |||
| 3353 | int | ||
| 3354 | encoding_buffer_size (coding, src_bytes) | ||
| 3355 | struct coding_system *coding; | ||
| 3356 | int src_bytes; | ||
| 3357 | { | ||
| 3358 | int magnification; | ||
| 3359 | |||
| 3360 | if (coding->type == coding_type_ccl) | ||
| 3361 | magnification = coding->spec.ccl.encoder.buf_magnification; | ||
| 3362 | else | ||
| 3363 | magnification = 3; | ||
| 3364 | |||
| 3365 | return (src_bytes * magnification + CONVERSION_BUFFER_EXTRA_ROOM); | ||
| 3366 | } | ||
| 3367 | |||
| 3368 | #ifndef MINIMUM_CONVERSION_BUFFER_SIZE | ||
| 3369 | #define MINIMUM_CONVERSION_BUFFER_SIZE 1024 | ||
| 3370 | #endif | ||
| 3371 | |||
| 3372 | char *conversion_buffer; | ||
| 3373 | int conversion_buffer_size; | ||
| 3374 | |||
| 3375 | /* Return a pointer to a SIZE bytes of buffer to be used for encoding | ||
| 3376 | or decoding. Sufficient memory is allocated automatically. If we | ||
| 3377 | run out of memory, return NULL. */ | ||
| 3378 | |||
| 3379 | char * | ||
| 3380 | get_conversion_buffer (size) | ||
| 3381 | int size; | ||
| 3382 | { | ||
| 3383 | if (size > conversion_buffer_size) | ||
| 3384 | { | ||
| 3385 | char *buf; | ||
| 3386 | int real_size = conversion_buffer_size * 2; | ||
| 3387 | |||
| 3388 | while (real_size < size) real_size *= 2; | ||
| 3389 | buf = (char *) xmalloc (real_size); | ||
| 3390 | xfree (conversion_buffer); | ||
| 3391 | conversion_buffer = buf; | ||
| 3392 | conversion_buffer_size = real_size; | ||
| 3393 | } | ||
| 3394 | return conversion_buffer; | ||
| 3395 | } | ||
| 3396 | |||
| 3397 | int | ||
| 3398 | ccl_coding_driver (coding, source, destination, src_bytes, dst_bytes, encodep) | ||
| 3399 | struct coding_system *coding; | ||
| 3400 | unsigned char *source, *destination; | ||
| 3401 | int src_bytes, dst_bytes, encodep; | ||
| 3402 | { | ||
| 3403 | struct ccl_program *ccl | ||
| 3404 | = encodep ? &coding->spec.ccl.encoder : &coding->spec.ccl.decoder; | ||
| 3405 | int result; | ||
| 3406 | |||
| 3407 | coding->produced = ccl_driver (ccl, source, destination, | ||
| 3408 | src_bytes, dst_bytes, &(coding->consumed)); | ||
| 3409 | if (encodep) | ||
| 3410 | { | ||
| 3411 | coding->produced_char = coding->produced; | ||
| 3412 | coding->consumed_char | ||
| 3413 | = multibyte_chars_in_text (source, coding->consumed); | ||
| 3414 | } | ||
| 3415 | else | ||
| 3416 | { | ||
| 3417 | coding->produced_char | ||
| 3418 | = multibyte_chars_in_text (destination, coding->produced); | ||
| 3419 | coding->consumed_char = coding->consumed; | ||
| 3420 | } | ||
| 3421 | switch (ccl->status) | ||
| 3422 | { | ||
| 3423 | case CCL_STAT_SUSPEND_BY_SRC: | ||
| 3424 | result = CODING_FINISH_INSUFFICIENT_SRC; | ||
| 3425 | break; | ||
| 3426 | case CCL_STAT_SUSPEND_BY_DST: | ||
| 3427 | result = CODING_FINISH_INSUFFICIENT_DST; | ||
| 3428 | break; | ||
| 3429 | default: | ||
| 3430 | result = CODING_FINISH_NORMAL; | ||
| 3431 | break; | ||
| 3432 | } | ||
| 3433 | return result; | ||
| 2915 | } | 3434 | } |
| 2916 | 3435 | ||
| 2917 | /* See "GENERAL NOTES about `decode_coding_XXX ()' functions". Before | 3436 | /* See "GENERAL NOTES about `decode_coding_XXX ()' functions". Before |
| @@ -2919,18 +3438,18 @@ detect_eol (coding, src, src_bytes) | |||
| 2919 | those are not yet decided. */ | 3438 | those are not yet decided. */ |
| 2920 | 3439 | ||
| 2921 | int | 3440 | int |
| 2922 | decode_coding (coding, source, destination, src_bytes, dst_bytes, consumed) | 3441 | decode_coding (coding, source, destination, src_bytes, dst_bytes) |
| 2923 | struct coding_system *coding; | 3442 | struct coding_system *coding; |
| 2924 | unsigned char *source, *destination; | 3443 | unsigned char *source, *destination; |
| 2925 | int src_bytes, dst_bytes; | 3444 | int src_bytes, dst_bytes; |
| 2926 | int *consumed; | ||
| 2927 | { | 3445 | { |
| 2928 | int produced; | 3446 | int result; |
| 2929 | 3447 | ||
| 2930 | if (src_bytes <= 0) | 3448 | if (src_bytes <= 0) |
| 2931 | { | 3449 | { |
| 2932 | *consumed = 0; | 3450 | coding->produced = coding->produced_char = 0; |
| 2933 | return 0; | 3451 | coding->consumed = coding->consumed_char = 0; |
| 3452 | return CODING_FINISH_NORMAL; | ||
| 2934 | } | 3453 | } |
| 2935 | 3454 | ||
| 2936 | if (coding->type == coding_type_undecided) | 3455 | if (coding->type == coding_type_undecided) |
| @@ -2939,184 +3458,714 @@ decode_coding (coding, source, destination, src_bytes, dst_bytes, consumed) | |||
| 2939 | if (coding->eol_type == CODING_EOL_UNDECIDED) | 3458 | if (coding->eol_type == CODING_EOL_UNDECIDED) |
| 2940 | detect_eol (coding, source, src_bytes); | 3459 | detect_eol (coding, source, src_bytes); |
| 2941 | 3460 | ||
| 2942 | coding->carryover_size = 0; | ||
| 2943 | switch (coding->type) | 3461 | switch (coding->type) |
| 2944 | { | 3462 | { |
| 2945 | case coding_type_no_conversion: | ||
| 2946 | label_no_conversion: | ||
| 2947 | produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes; | ||
| 2948 | bcopy (source, destination, produced); | ||
| 2949 | *consumed = produced; | ||
| 2950 | break; | ||
| 2951 | |||
| 2952 | case coding_type_emacs_mule: | 3463 | case coding_type_emacs_mule: |
| 2953 | case coding_type_undecided: | 3464 | case coding_type_undecided: |
| 2954 | case coding_type_raw_text: | 3465 | case coding_type_raw_text: |
| 2955 | if (coding->eol_type == CODING_EOL_LF | 3466 | if (coding->eol_type == CODING_EOL_LF |
| 2956 | || coding->eol_type == CODING_EOL_UNDECIDED) | 3467 | || coding->eol_type == CODING_EOL_UNDECIDED) |
| 2957 | goto label_no_conversion; | 3468 | goto label_no_conversion; |
| 2958 | produced = decode_eol (coding, source, destination, | 3469 | result = decode_eol (coding, source, destination, src_bytes, dst_bytes); |
| 2959 | src_bytes, dst_bytes, consumed); | ||
| 2960 | break; | 3470 | break; |
| 2961 | 3471 | ||
| 2962 | case coding_type_sjis: | 3472 | case coding_type_sjis: |
| 2963 | produced = decode_coding_sjis_big5 (coding, source, destination, | 3473 | result = decode_coding_sjis_big5 (coding, source, destination, |
| 2964 | src_bytes, dst_bytes, consumed, | 3474 | src_bytes, dst_bytes, 1); |
| 2965 | 1); | ||
| 2966 | break; | 3475 | break; |
| 2967 | 3476 | ||
| 2968 | case coding_type_iso2022: | 3477 | case coding_type_iso2022: |
| 2969 | produced = decode_coding_iso2022 (coding, source, destination, | 3478 | result = decode_coding_iso2022 (coding, source, destination, |
| 2970 | src_bytes, dst_bytes, consumed); | 3479 | src_bytes, dst_bytes); |
| 2971 | break; | 3480 | break; |
| 2972 | 3481 | ||
| 2973 | case coding_type_big5: | 3482 | case coding_type_big5: |
| 2974 | produced = decode_coding_sjis_big5 (coding, source, destination, | 3483 | result = decode_coding_sjis_big5 (coding, source, destination, |
| 2975 | src_bytes, dst_bytes, consumed, | 3484 | src_bytes, dst_bytes, 0); |
| 2976 | 0); | ||
| 2977 | break; | 3485 | break; |
| 2978 | 3486 | ||
| 2979 | case coding_type_ccl: | 3487 | case coding_type_ccl: |
| 2980 | produced = ccl_driver (&coding->spec.ccl.decoder, source, destination, | 3488 | result = ccl_coding_driver (coding, source, destination, |
| 2981 | src_bytes, dst_bytes, consumed); | 3489 | src_bytes, dst_bytes, 0); |
| 3490 | break; | ||
| 3491 | |||
| 3492 | default: /* i.e. case coding_type_no_conversion: */ | ||
| 3493 | label_no_conversion: | ||
| 3494 | if (dst_bytes && src_bytes > dst_bytes) | ||
| 3495 | { | ||
| 3496 | coding->produced = dst_bytes; | ||
| 3497 | result = CODING_FINISH_INSUFFICIENT_DST; | ||
| 3498 | } | ||
| 3499 | else | ||
| 3500 | { | ||
| 3501 | coding->produced = src_bytes; | ||
| 3502 | result = CODING_FINISH_NORMAL; | ||
| 3503 | } | ||
| 3504 | if (dst_bytes) | ||
| 3505 | bcopy (source, destination, coding->produced); | ||
| 3506 | else | ||
| 3507 | safe_bcopy (source, destination, coding->produced); | ||
| 3508 | coding->consumed | ||
| 3509 | = coding->consumed_char = coding->produced_char = coding->produced; | ||
| 2982 | break; | 3510 | break; |
| 2983 | } | 3511 | } |
| 2984 | 3512 | ||
| 2985 | return produced; | 3513 | return result; |
| 2986 | } | 3514 | } |
| 2987 | 3515 | ||
| 2988 | /* See "GENERAL NOTES about `encode_coding_XXX ()' functions". */ | 3516 | /* See "GENERAL NOTES about `encode_coding_XXX ()' functions". */ |
| 2989 | 3517 | ||
| 2990 | int | 3518 | int |
| 2991 | encode_coding (coding, source, destination, src_bytes, dst_bytes, consumed) | 3519 | encode_coding (coding, source, destination, src_bytes, dst_bytes) |
| 2992 | struct coding_system *coding; | 3520 | struct coding_system *coding; |
| 2993 | unsigned char *source, *destination; | 3521 | unsigned char *source, *destination; |
| 2994 | int src_bytes, dst_bytes; | 3522 | int src_bytes, dst_bytes; |
| 2995 | int *consumed; | ||
| 2996 | { | 3523 | { |
| 2997 | int produced; | 3524 | int result; |
| 2998 | 3525 | ||
| 2999 | switch (coding->type) | 3526 | if (src_bytes <= 0) |
| 3000 | { | 3527 | { |
| 3001 | case coding_type_no_conversion: | 3528 | coding->produced = coding->produced_char = 0; |
| 3002 | label_no_conversion: | 3529 | coding->consumed = coding->consumed_char = 0; |
| 3003 | produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes; | 3530 | return CODING_FINISH_NORMAL; |
| 3004 | if (produced > 0) | 3531 | } |
| 3005 | { | ||
| 3006 | bcopy (source, destination, produced); | ||
| 3007 | if (coding->selective) | ||
| 3008 | { | ||
| 3009 | unsigned char *p = destination, *pend = destination + produced; | ||
| 3010 | while (p < pend) | ||
| 3011 | if (*p++ == '\015') p[-1] = '\n'; | ||
| 3012 | } | ||
| 3013 | } | ||
| 3014 | *consumed = produced; | ||
| 3015 | break; | ||
| 3016 | 3532 | ||
| 3533 | switch (coding->type) | ||
| 3534 | { | ||
| 3017 | case coding_type_emacs_mule: | 3535 | case coding_type_emacs_mule: |
| 3018 | case coding_type_undecided: | 3536 | case coding_type_undecided: |
| 3019 | case coding_type_raw_text: | 3537 | case coding_type_raw_text: |
| 3020 | if (coding->eol_type == CODING_EOL_LF | 3538 | if (coding->eol_type == CODING_EOL_LF |
| 3021 | || coding->eol_type == CODING_EOL_UNDECIDED) | 3539 | || coding->eol_type == CODING_EOL_UNDECIDED) |
| 3022 | goto label_no_conversion; | 3540 | goto label_no_conversion; |
| 3023 | produced = encode_eol (coding, source, destination, | 3541 | result = encode_eol (coding, source, destination, src_bytes, dst_bytes); |
| 3024 | src_bytes, dst_bytes, consumed); | ||
| 3025 | break; | 3542 | break; |
| 3026 | 3543 | ||
| 3027 | case coding_type_sjis: | 3544 | case coding_type_sjis: |
| 3028 | produced = encode_coding_sjis_big5 (coding, source, destination, | 3545 | result = encode_coding_sjis_big5 (coding, source, destination, |
| 3029 | src_bytes, dst_bytes, consumed, | 3546 | src_bytes, dst_bytes, 1); |
| 3030 | 1); | ||
| 3031 | break; | 3547 | break; |
| 3032 | 3548 | ||
| 3033 | case coding_type_iso2022: | 3549 | case coding_type_iso2022: |
| 3034 | produced = encode_coding_iso2022 (coding, source, destination, | 3550 | result = encode_coding_iso2022 (coding, source, destination, |
| 3035 | src_bytes, dst_bytes, consumed); | 3551 | src_bytes, dst_bytes); |
| 3036 | break; | 3552 | break; |
| 3037 | 3553 | ||
| 3038 | case coding_type_big5: | 3554 | case coding_type_big5: |
| 3039 | produced = encode_coding_sjis_big5 (coding, source, destination, | 3555 | result = encode_coding_sjis_big5 (coding, source, destination, |
| 3040 | src_bytes, dst_bytes, consumed, | 3556 | src_bytes, dst_bytes, 0); |
| 3041 | 0); | ||
| 3042 | break; | 3557 | break; |
| 3043 | 3558 | ||
| 3044 | case coding_type_ccl: | 3559 | case coding_type_ccl: |
| 3045 | produced = ccl_driver (&coding->spec.ccl.encoder, source, destination, | 3560 | result = ccl_coding_driver (coding, source, destination, |
| 3046 | src_bytes, dst_bytes, consumed); | 3561 | src_bytes, dst_bytes, 1); |
| 3562 | break; | ||
| 3563 | |||
| 3564 | default: /* i.e. case coding_type_no_conversion: */ | ||
| 3565 | label_no_conversion: | ||
| 3566 | if (dst_bytes && src_bytes > dst_bytes) | ||
| 3567 | { | ||
| 3568 | coding->produced = dst_bytes; | ||
| 3569 | result = CODING_FINISH_INSUFFICIENT_DST; | ||
| 3570 | } | ||
| 3571 | else | ||
| 3572 | { | ||
| 3573 | coding->produced = src_bytes; | ||
| 3574 | result = CODING_FINISH_NORMAL; | ||
| 3575 | } | ||
| 3576 | if (dst_bytes) | ||
| 3577 | bcopy (source, destination, coding->produced); | ||
| 3578 | else | ||
| 3579 | safe_bcopy (source, destination, coding->produced); | ||
| 3580 | if (coding->mode & CODING_MODE_SELECTIVE_DISPLAY) | ||
| 3581 | { | ||
| 3582 | unsigned char *p = destination, *pend = p + coding->produced; | ||
| 3583 | while (p < pend) | ||
| 3584 | if (*p++ == '\015') p[-1] = '\n'; | ||
| 3585 | } | ||
| 3586 | coding->consumed | ||
| 3587 | = coding->consumed_char = coding->produced_char = coding->produced; | ||
| 3047 | break; | 3588 | break; |
| 3048 | } | 3589 | } |
| 3049 | 3590 | ||
| 3050 | return produced; | 3591 | return result; |
| 3051 | } | 3592 | } |
| 3052 | 3593 | ||
| 3053 | #define CONVERSION_BUFFER_EXTRA_ROOM 256 | 3594 | /* Scan text in the region between *BEG and *END, skip characters |
| 3595 | which we don't have to decode by coding system CODING at the head | ||
| 3596 | and tail, then set *BEG and *END to the region of the text we | ||
| 3597 | actually have to convert. | ||
| 3054 | 3598 | ||
| 3055 | /* Return maximum size (bytes) of a buffer enough for decoding | 3599 | If STR is not NULL, *BEG and *END are indices into STR. */ |
| 3056 | SRC_BYTES of text encoded in CODING. */ | ||
| 3057 | 3600 | ||
| 3058 | int | 3601 | static void |
| 3059 | decoding_buffer_size (coding, src_bytes) | 3602 | shrink_decoding_region (beg, end, coding, str) |
| 3603 | int *beg, *end; | ||
| 3060 | struct coding_system *coding; | 3604 | struct coding_system *coding; |
| 3061 | int src_bytes; | 3605 | unsigned char *str; |
| 3062 | { | 3606 | { |
| 3063 | int magnification; | 3607 | unsigned char *begp_orig, *begp, *endp_orig, *endp; |
| 3608 | int eol_conversion; | ||
| 3064 | 3609 | ||
| 3065 | if (coding->type == coding_type_iso2022) | 3610 | if (coding->type == coding_type_ccl |
| 3066 | magnification = 3; | 3611 | || coding->type == coding_type_undecided |
| 3067 | else if (coding->type == coding_type_ccl) | 3612 | || !NILP (coding->post_read_conversion)) |
| 3068 | magnification = coding->spec.ccl.decoder.buf_magnification; | 3613 | { |
| 3614 | /* We can't skip any data. */ | ||
| 3615 | return; | ||
| 3616 | } | ||
| 3617 | else if (coding->type == coding_type_no_conversion) | ||
| 3618 | { | ||
| 3619 | /* We need no conversion. */ | ||
| 3620 | *beg = *end; | ||
| 3621 | return; | ||
| 3622 | } | ||
| 3623 | |||
| 3624 | if (coding->heading_ascii >= 0) | ||
| 3625 | /* Detection routine has already found how much we can skip at the | ||
| 3626 | head. */ | ||
| 3627 | *beg += coding->heading_ascii; | ||
| 3628 | |||
| 3629 | if (str) | ||
| 3630 | { | ||
| 3631 | begp_orig = begp = str + *beg; | ||
| 3632 | endp_orig = endp = str + *end; | ||
| 3633 | } | ||
| 3069 | else | 3634 | else |
| 3070 | magnification = 2; | 3635 | { |
| 3636 | move_gap (*beg); | ||
| 3637 | begp_orig = begp = GAP_END_ADDR; | ||
| 3638 | endp_orig = endp = begp + *end - *beg; | ||
| 3639 | } | ||
| 3071 | 3640 | ||
| 3072 | return (src_bytes * magnification + CONVERSION_BUFFER_EXTRA_ROOM); | 3641 | eol_conversion = (coding->eol_type != CODING_EOL_LF); |
| 3642 | |||
| 3643 | switch (coding->type) | ||
| 3644 | { | ||
| 3645 | case coding_type_emacs_mule: | ||
| 3646 | case coding_type_raw_text: | ||
| 3647 | if (eol_conversion) | ||
| 3648 | { | ||
| 3649 | if (coding->heading_ascii < 0) | ||
| 3650 | while (begp < endp && *begp != '\r') begp++; | ||
| 3651 | while (begp < endp && *(endp - 1) != '\r') endp--; | ||
| 3652 | } | ||
| 3653 | else | ||
| 3654 | begp = endp; | ||
| 3655 | break; | ||
| 3656 | |||
| 3657 | case coding_type_sjis: | ||
| 3658 | case coding_type_big5: | ||
| 3659 | /* We can skip all ASCII characters at the head. */ | ||
| 3660 | if (coding->heading_ascii < 0) | ||
| 3661 | { | ||
| 3662 | if (eol_conversion) | ||
| 3663 | while (begp < endp && *begp < 0x80 && *begp != '\n') begp++; | ||
| 3664 | else | ||
| 3665 | while (begp < endp && *begp < 0x80) begp++; | ||
| 3666 | } | ||
| 3667 | /* We can skip all ASCII characters at the tail except for the | ||
| 3668 | second byte of SJIS or BIG5 code. */ | ||
| 3669 | if (eol_conversion) | ||
| 3670 | while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\n') endp--; | ||
| 3671 | else | ||
| 3672 | while (begp < endp && endp[-1] < 0x80) endp--; | ||
| 3673 | if (begp < endp && endp < endp_orig && endp[-1] >= 0x80) | ||
| 3674 | endp++; | ||
| 3675 | break; | ||
| 3676 | |||
| 3677 | default: /* i.e. case coding_type_iso2022: */ | ||
| 3678 | if (coding->heading_ascii < 0) | ||
| 3679 | { | ||
| 3680 | unsigned char c; | ||
| 3681 | |||
| 3682 | /* We can skip all ASCII characters at the head except for a | ||
| 3683 | few control codes. */ | ||
| 3684 | while (begp < endp && (c = *begp) < 0x80 | ||
| 3685 | && c != ISO_CODE_CR && c != ISO_CODE_SO | ||
| 3686 | && c != ISO_CODE_SI && c != ISO_CODE_ESC | ||
| 3687 | && (!eol_conversion || c != ISO_CODE_LF)) | ||
| 3688 | begp++; | ||
| 3689 | } | ||
| 3690 | switch (coding->category_idx) | ||
| 3691 | { | ||
| 3692 | case CODING_CATEGORY_IDX_ISO_8_1: | ||
| 3693 | case CODING_CATEGORY_IDX_ISO_8_2: | ||
| 3694 | /* We can skip all ASCII characters at the tail. */ | ||
| 3695 | if (eol_conversion) | ||
| 3696 | while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\n') endp--; | ||
| 3697 | else | ||
| 3698 | while (begp < endp && endp[-1] < 0x80) endp--; | ||
| 3699 | break; | ||
| 3700 | |||
| 3701 | case CODING_CATEGORY_IDX_ISO_7: | ||
| 3702 | case CODING_CATEGORY_IDX_ISO_7_TIGHT: | ||
| 3703 | /* We can skip all characters at the tail except for ESC and | ||
| 3704 | the 2 bytes following it. */ | ||
| 3705 | if (eol_conversion) | ||
| 3706 | while (begp < endp && endp[-1] != ISO_CODE_ESC && endp[-1] != '\n') | ||
| 3707 | endp--; | ||
| 3708 | else | ||
| 3709 | while (begp < endp && endp[-1] != ISO_CODE_ESC) | ||
| 3710 | endp--; | ||
| 3711 | if (begp < endp && endp[-1] == ISO_CODE_ESC) | ||
| 3712 | { | ||
| 3713 | if (endp + 1 < endp_orig && endp[0] == '(' && endp[1] == 'B') | ||
| 3714 | /* This is an ASCII designation sequence. We can | ||
| 3715 | surely skip the tail. */ | ||
| 3716 | endp += 2; | ||
| 3717 | else | ||
| 3718 | /* Hmmm, we can't skip the tail. */ | ||
| 3719 | endp = endp_orig; | ||
| 3720 | } | ||
| 3721 | } | ||
| 3722 | } | ||
| 3723 | *beg += begp - begp_orig; | ||
| 3724 | *end += endp - endp_orig; | ||
| 3725 | return; | ||
| 3073 | } | 3726 | } |
| 3074 | 3727 | ||
| 3075 | /* Return maximum size (bytes) of a buffer enough for encoding | 3728 | /* Like shrink_decoding_region but for encoding. */ |
| 3076 | SRC_BYTES of text to CODING. */ | ||
| 3077 | 3729 | ||
| 3078 | int | 3730 | static void |
| 3079 | encoding_buffer_size (coding, src_bytes) | 3731 | shrink_encoding_region (beg, end, coding, str) |
| 3732 | int *beg, *end; | ||
| 3080 | struct coding_system *coding; | 3733 | struct coding_system *coding; |
| 3081 | int src_bytes; | 3734 | unsigned char *str; |
| 3082 | { | 3735 | { |
| 3083 | int magnification; | 3736 | unsigned char *begp_orig, *begp, *endp_orig, *endp; |
| 3737 | int eol_conversion; | ||
| 3084 | 3738 | ||
| 3085 | if (coding->type == coding_type_ccl) | 3739 | if (coding->type == coding_type_ccl) |
| 3086 | magnification = coding->spec.ccl.encoder.buf_magnification; | 3740 | /* We can't skip any data. */ |
| 3741 | return; | ||
| 3742 | else if (coding->type == coding_type_no_conversion) | ||
| 3743 | { | ||
| 3744 | /* We need no conversion. */ | ||
| 3745 | *beg = *end; | ||
| 3746 | return; | ||
| 3747 | } | ||
| 3748 | |||
| 3749 | if (str) | ||
| 3750 | { | ||
| 3751 | begp_orig = begp = str + *beg; | ||
| 3752 | endp_orig = endp = str + *end; | ||
| 3753 | } | ||
| 3087 | else | 3754 | else |
| 3088 | magnification = 3; | 3755 | { |
| 3756 | move_gap (*beg); | ||
| 3757 | begp_orig = begp = GAP_END_ADDR; | ||
| 3758 | endp_orig = endp = begp + *end - *beg; | ||
| 3759 | } | ||
| 3089 | 3760 | ||
| 3090 | return (src_bytes * magnification + CONVERSION_BUFFER_EXTRA_ROOM); | 3761 | eol_conversion = (coding->eol_type == CODING_EOL_CR |
| 3762 | || coding->eol_type == CODING_EOL_CRLF); | ||
| 3763 | |||
| 3764 | /* Here, we don't have to check coding->pre_write_conversion because | ||
| 3765 | the caller is expected to have handled it already. */ | ||
| 3766 | switch (coding->type) | ||
| 3767 | { | ||
| 3768 | case coding_type_undecided: | ||
| 3769 | case coding_type_emacs_mule: | ||
| 3770 | case coding_type_raw_text: | ||
| 3771 | if (eol_conversion) | ||
| 3772 | { | ||
| 3773 | while (begp < endp && *begp != '\n') begp++; | ||
| 3774 | while (begp < endp && endp[-1] != '\n') endp--; | ||
| 3775 | } | ||
| 3776 | else | ||
| 3777 | begp = endp; | ||
| 3778 | break; | ||
| 3779 | |||
| 3780 | case coding_type_iso2022: | ||
| 3781 | if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL) | ||
| 3782 | { | ||
| 3783 | unsigned char *bol = begp; | ||
| 3784 | while (begp < endp && *begp < 0x80) | ||
| 3785 | { | ||
| 3786 | begp++; | ||
| 3787 | if (begp[-1] == '\n') | ||
| 3788 | bol = begp; | ||
| 3789 | } | ||
| 3790 | begp = bol; | ||
| 3791 | goto label_skip_tail; | ||
| 3792 | } | ||
| 3793 | /* fall down ... */ | ||
| 3794 | |||
| 3795 | default: | ||
| 3796 | /* We can skip all ASCII characters at the head and tail. */ | ||
| 3797 | if (eol_conversion) | ||
| 3798 | while (begp < endp && *begp < 0x80 && *begp != '\n') begp++; | ||
| 3799 | else | ||
| 3800 | while (begp < endp && *begp < 0x80) begp++; | ||
| 3801 | label_skip_tail: | ||
| 3802 | if (eol_conversion) | ||
| 3803 | while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\n') endp--; | ||
| 3804 | else | ||
| 3805 | while (begp < endp && *(endp - 1) < 0x80) endp--; | ||
| 3806 | break; | ||
| 3807 | } | ||
| 3808 | |||
| 3809 | *beg += begp - begp_orig; | ||
| 3810 | *end += endp - endp_orig; | ||
| 3811 | return; | ||
| 3091 | } | 3812 | } |
| 3092 | 3813 | ||
| 3093 | #ifndef MINIMUM_CONVERSION_BUFFER_SIZE | 3814 | /* Decode (if ENCODEP is zero) or encode (if ENCODEP is nonzero) the |
| 3094 | #define MINIMUM_CONVERSION_BUFFER_SIZE 1024 | 3815 | text from FROM to TO by coding system CODING, and return number of |
| 3095 | #endif | 3816 | characters in the resulting text. |
| 3096 | 3817 | ||
| 3097 | char *conversion_buffer; | 3818 | If ADJUST is nonzero, we do various things as if the original text |
| 3098 | int conversion_buffer_size; | 3819 | is deleted and a new text is inserted. See the comments in |
| 3820 | replace_range (insdel.c) to know what we are doing. | ||
| 3099 | 3821 | ||
| 3100 | /* Return a pointer to a SIZE bytes of buffer to be used for encoding | 3822 | ADJUST nonzero also means that post-read-conversion or |
| 3101 | or decoding. Sufficient memory is allocated automatically. If we | 3823 | pre-write-conversion functions (if any) should be processed. */ |
| 3102 | run out of memory, return NULL. */ | ||
| 3103 | 3824 | ||
| 3104 | char * | 3825 | int |
| 3105 | get_conversion_buffer (size) | 3826 | code_convert_region (from, to, coding, encodep, adjust) |
| 3106 | int size; | 3827 | int from, to, encodep, adjust; |
| 3828 | struct coding_system *coding; | ||
| 3107 | { | 3829 | { |
| 3108 | if (size > conversion_buffer_size) | 3830 | int len = to - from, require, inserted, inserted_byte; |
| 3831 | int from_byte, to_byte, len_byte; | ||
| 3832 | int from_byte_orig, to_byte_orig; | ||
| 3833 | Lisp_Object saved_coding_symbol = Qnil; | ||
| 3834 | |||
| 3835 | if (adjust) | ||
| 3109 | { | 3836 | { |
| 3110 | char *buf; | 3837 | prepare_to_modify_buffer (from, to, &from); |
| 3111 | int real_size = conversion_buffer_size * 2; | 3838 | to = from + len; |
| 3839 | } | ||
| 3840 | from_byte = CHAR_TO_BYTE (from); to_byte = CHAR_TO_BYTE (to); | ||
| 3841 | len_byte = to_byte - from_byte; | ||
| 3112 | 3842 | ||
| 3113 | while (real_size < size) real_size *= 2; | 3843 | if (! encodep && CODING_REQUIRE_DETECTION (coding)) |
| 3114 | buf = (char *) xmalloc (real_size); | 3844 | { |
| 3115 | xfree (conversion_buffer); | 3845 | /* We must detect encoding of text and eol. Even if detection |
| 3116 | conversion_buffer = buf; | 3846 | routines can't decide the encoding, we should not leave them |
| 3117 | conversion_buffer_size = real_size; | 3847 | undecided because the deeper decoding routine (decode_coding) |
| 3848 | would try to detect the encodings in vain in that case. */ | ||
| 3849 | |||
| 3850 | if (from < GPT && to > GPT) | ||
| 3851 | move_gap_both (from, from_byte); | ||
| 3852 | if (coding->type == coding_type_undecided) | ||
| 3853 | { | ||
| 3854 | detect_coding (coding, BYTE_POS_ADDR (from_byte), len_byte); | ||
| 3855 | if (coding->type == coding_type_undecided) | ||
| 3856 | coding->type = coding_type_emacs_mule; | ||
| 3857 | } | ||
| 3858 | if (coding->eol_type == CODING_EOL_UNDECIDED) | ||
| 3859 | { | ||
| 3860 | saved_coding_symbol = coding->symbol; | ||
| 3861 | detect_eol (coding, BYTE_POS_ADDR (from_byte), len_byte); | ||
| 3862 | if (coding->eol_type == CODING_EOL_UNDECIDED) | ||
| 3863 | coding->eol_type = CODING_EOL_LF; | ||
| 3864 | /* We had better recover the original eol format if we | ||
| 3865 | encounter an inconsistent eol format while decoding. */ | ||
| 3866 | coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL; | ||
| 3867 | } | ||
| 3118 | } | 3868 | } |
| 3119 | return conversion_buffer; | 3869 | |
| 3870 | if (encodep | ||
| 3871 | ? ! CODING_REQUIRE_ENCODING (coding) | ||
| 3872 | : ! CODING_REQUIRE_DECODING (coding)) | ||
| 3873 | return len; | ||
| 3874 | |||
| 3875 | /* Now we convert the text. */ | ||
| 3876 | |||
| 3877 | /* For encoding, we must process pre-write-conversion in advance. */ | ||
| 3878 | if (encodep | ||
| 3879 | && adjust | ||
| 3880 | && ! NILP (coding->pre_write_conversion) | ||
| 3881 | && SYMBOLP (coding->pre_write_conversion) | ||
| 3882 | && ! NILP (Ffboundp (coding->pre_write_conversion))) | ||
| 3883 | { | ||
| 3884 | /* The function in pre-write-conversion may put the new text in a | ||
| 3885 | new buffer. */ | ||
| 3886 | struct buffer *prev = current_buffer, *new; | ||
| 3887 | |||
| 3888 | call2 (coding->pre_write_conversion, make_number (from), make_number (to)); | ||
| 3889 | if (current_buffer != prev) | ||
| 3890 | { | ||
| 3891 | len = ZV - BEGV; | ||
| 3892 | new = current_buffer; | ||
| 3893 | set_buffer_internal_1 (prev); | ||
| 3894 | del_range (from, to); | ||
| 3895 | insert_from_buffer (new, BEG, len, 0); | ||
| 3896 | to = from + len; | ||
| 3897 | to_byte = CHAR_TO_BYTE (to); | ||
| 3898 | len_byte = to_byte - from_byte; | ||
| 3899 | } | ||
| 3900 | } | ||
| 3901 | |||
| 3902 | /* Try to skip the leading and trailing ASCII characters. */ | ||
| 3903 | from_byte_orig = from_byte; to_byte_orig = to_byte; | ||
| 3904 | if (encodep) | ||
| 3905 | shrink_encoding_region (&from_byte, &to_byte, coding, NULL); | ||
| 3906 | else | ||
| 3907 | shrink_decoding_region (&from_byte, &to_byte, coding, NULL); | ||
| 3908 | if (from_byte == to_byte) | ||
| 3909 | return len; | ||
| 3910 | /* Here, the region excluded by shrinking contains only ASCII characters. */ | ||
| 3911 | from += (from_byte - from_byte_orig); | ||
| 3912 | to += (to_byte - to_byte_orig); | ||
| 3913 | len = to - from; | ||
| 3914 | len_byte = to_byte - from_byte; | ||
| 3915 | |||
| 3916 | /* For conversion, we must put the gap before the text to be decoded | ||
| 3917 | in addition to making the gap larger for efficient decoding. The | ||
| 3918 | required gap size starts from 2000 which is the magic number used | ||
| 3919 | in make_gap. But, after one batch of conversion, it will be | ||
| 3920 | incremented if we find that it is not enough. */ | ||
| 3921 | require = 2000; | ||
| 3922 | |||
| 3923 | if (GAP_SIZE < require) | ||
| 3924 | make_gap (require - GAP_SIZE); | ||
| 3925 | move_gap_both (from, from_byte); | ||
| 3926 | |||
| 3927 | if (adjust) | ||
| 3928 | adjust_before_replace (from, from_byte, to, to_byte); | ||
| 3929 | |||
| 3930 | if (GPT - BEG < beg_unchanged) | ||
| 3931 | beg_unchanged = GPT - BEG; | ||
| 3932 | if (Z - GPT < end_unchanged) | ||
| 3933 | end_unchanged = Z - GPT; | ||
| 3934 | |||
| 3935 | inserted = inserted_byte = 0; | ||
| 3936 | for (;;) | ||
| 3937 | { | ||
| 3938 | int result, diff_char, diff_byte; | ||
| 3939 | |||
| 3940 | /* The buffer memory is changed from: | ||
| 3941 | +--------+converted-text+------------+-----original-text-----+---+ | ||
| 3942 | |<-from->|<--inserted-->|<-GAP_SIZE->|<---------len--------->|---| */ | ||
| 3943 | |||
| 3944 | if (encodep) | ||
| 3945 | result = encode_coding (coding, GAP_END_ADDR, GPT_ADDR, len_byte, 0); | ||
| 3946 | else | ||
| 3947 | result = decode_coding (coding, GAP_END_ADDR, GPT_ADDR, len_byte, 0); | ||
| 3948 | /* to: | ||
| 3949 | +--------+-------converted-text--------+--+---original-text--+---+ | ||
| 3950 | |<-from->|<----(inserted+produced)---->|--|<-(len-consumed)->|---| */ | ||
| 3951 | |||
| 3952 | diff_char = coding->produced_char - coding->consumed_char; | ||
| 3953 | diff_byte = coding->produced - coding->consumed; | ||
| 3954 | |||
| 3955 | GAP_SIZE -= diff_byte; | ||
| 3956 | ZV += diff_char; ZV_BYTE += diff_byte; | ||
| 3957 | Z += diff_char; Z_BYTE += diff_byte; | ||
| 3958 | GPT += coding->produced_char; GPT_BYTE += coding->produced; | ||
| 3959 | |||
| 3960 | inserted += coding->produced_char; | ||
| 3961 | inserted_byte += coding->produced; | ||
| 3962 | len -= coding->consumed_char; | ||
| 3963 | len_byte -= coding->consumed; | ||
| 3964 | |||
| 3965 | if (! encodep && result == CODING_FINISH_INCONSISTENT_EOL) | ||
| 3966 | { | ||
| 3967 | unsigned char *p = GPT_ADDR - inserted_byte, *pend = GPT_ADDR; | ||
| 3968 | |||
| 3969 | /* Encode LFs back to the original eol format (CR or CRLF). */ | ||
| 3970 | if (coding->eol_type == CODING_EOL_CR) | ||
| 3971 | { | ||
| 3972 | while (p < pend) if (*p++ == '\n') p[-1] = '\r'; | ||
| 3973 | } | ||
| 3974 | else | ||
| 3975 | { | ||
| 3976 | unsigned char *p2 = p; | ||
| 3977 | int count = 0; | ||
| 3978 | |||
| 3979 | while (p2 < pend) if (*p2++ == '\n') count++; | ||
| 3980 | if (GAP_SIZE < count) | ||
| 3981 | make_gap (count - GAP_SIZE); | ||
| 3982 | p2 = GPT_ADDR + count; | ||
| 3983 | while (p < pend) | ||
| 3984 | { | ||
| 3985 | *--p2 = *--pend; | ||
| 3986 | if (*pend == '\n') *--p2 = '\r'; | ||
| 3987 | } | ||
| 3988 | GPT += count; GAP_SIZE -= count; ZV += count; Z += count; | ||
| 3989 | ZV_BYTE += count; Z_BYTE += count; | ||
| 3990 | coding->produced += count; | ||
| 3991 | coding->produced_char += count; | ||
| 3992 | inserted += count; | ||
| 3993 | inserted_byte += count; | ||
| 3994 | } | ||
| 3995 | |||
| 3996 | /* Suppress eol-format conversion in the further conversion. */ | ||
| 3997 | coding->eol_type = CODING_EOL_LF; | ||
| 3998 | |||
| 3999 | /* Restore the original symbol. */ | ||
| 4000 | coding->symbol = saved_coding_symbol; | ||
| 4001 | } | ||
| 4002 | if (len_byte <= 0) | ||
| 4003 | break; | ||
| 4004 | if (result == CODING_FINISH_INSUFFICIENT_SRC) | ||
| 4005 | { | ||
| 4006 | /* The source text ends in invalid codes. Let's just | ||
| 4007 | make them valid buffer contents, and finish conversion. */ | ||
| 4008 | inserted += len; | ||
| 4009 | inserted_byte += len_byte; | ||
| 4010 | break; | ||
| 4011 | } | ||
| 4012 | if (inserted == coding->produced_char) | ||
| 4013 | /* We have just done the first batch of conversion. Let's | ||
| 4014 | reconsider the required gap size now. | ||
| 4015 | |||
| 4016 | We have converted CONSUMED bytes into PRODUCED bytes. To | ||
| 4017 | convert the remaining LEN bytes, we may need REQUIRE bytes | ||
| 4018 | of gap, where: | ||
| 4019 | REQUIRE + LEN = (LEN * PRODUCED / CONSUMED) | ||
| 4020 | REQUIRE = LEN * (PRODUCED - CONSUMED) / CONSUMED | ||
| 4021 | = LEN * DIFF / CONSUMED | ||
| 4022 | Here, we are sure that DIFF is positive. */ | ||
| 4023 | require = len_byte * diff_byte / coding->consumed; | ||
| 4024 | if (GAP_SIZE < require) | ||
| 4025 | make_gap (require - GAP_SIZE); | ||
| 4026 | } | ||
| 4027 | if (GAP_SIZE > 0) *GPT_ADDR = 0; /* Put an anchor. */ | ||
| 4028 | |||
| 4029 | if (adjust) | ||
| 4030 | { | ||
| 4031 | adjust_after_replace (from, from_byte, to, to_byte, | ||
| 4032 | inserted, inserted_byte); | ||
| 4033 | |||
| 4034 | if (! encodep && ! NILP (coding->post_read_conversion)) | ||
| 4035 | { | ||
| 4036 | Lisp_Object val; | ||
| 4037 | int orig_inserted = inserted, pos = PT; | ||
| 4038 | |||
| 4039 | temp_set_point_both (current_buffer, from, from_byte); | ||
| 4040 | val = call1 (coding->post_read_conversion, make_number (inserted)); | ||
| 4041 | if (! NILP (val)) | ||
| 4042 | { | ||
| 4043 | CHECK_NUMBER (val, 0); | ||
| 4044 | inserted = XFASTINT (val); | ||
| 4045 | } | ||
| 4046 | if (pos >= from + orig_inserted) | ||
| 4047 | temp_set_point (current_buffer, pos + (inserted - orig_inserted)); | ||
| 4048 | } | ||
| 4049 | } | ||
| 4050 | |||
| 4051 | return ((from_byte - from_byte_orig) + inserted + (to_byte_orig - to_byte)); | ||
| 4052 | } | ||
| 4053 | |||
| 4054 | Lisp_Object | ||
| 4055 | code_convert_string (str, coding, encodep, nocopy) | ||
| 4056 | Lisp_Object str; | ||
| 4057 | struct coding_system *coding; | ||
| 4058 | int encodep, nocopy; | ||
| 4059 | { | ||
| 4060 | int len; | ||
| 4061 | char *buf; | ||
| 4062 | int from = 0, to = XSTRING (str)->size, to_byte = XSTRING (str)->size_byte; | ||
| 4063 | struct gcpro gcpro1; | ||
| 4064 | Lisp_Object saved_coding_symbol = Qnil; | ||
| 4065 | int result; | ||
| 4066 | |||
| 4067 | if (encodep && !NILP (coding->pre_write_conversion) | ||
| 4068 | || !encodep && !NILP (coding->post_read_conversion)) | ||
| 4069 | { | ||
| 4070 | /* Since we have to call Lisp functions which assume target text | ||
| 4071 | is in a buffer, after setting a temporary buffer, call | ||
| 4072 | code_convert_region. */ | ||
| 4073 | int count = specpdl_ptr - specpdl; | ||
| 4074 | struct buffer *prev = current_buffer; | ||
| 4075 | |||
| 4076 | record_unwind_protect (Fset_buffer, Fcurrent_buffer ()); | ||
| 4077 | temp_output_buffer_setup (" *code-converting-work*"); | ||
| 4078 | set_buffer_internal (XBUFFER (Vstandard_output)); | ||
| 4079 | if (encodep) | ||
| 4080 | insert_from_string (str, 0, 0, to, to_byte, 0); | ||
| 4081 | else | ||
| 4082 | { | ||
| 4083 | /* We must insert the contents of STR as is without | ||
| 4084 | unibyte<->multibyte conversion. */ | ||
| 4085 | current_buffer->enable_multibyte_characters = Qnil; | ||
| 4086 | insert_from_string (str, 0, 0, to_byte, to_byte, 0); | ||
| 4087 | current_buffer->enable_multibyte_characters = Qt; | ||
| 4088 | } | ||
| 4089 | code_convert_region (BEGV, ZV, coding, encodep, 1); | ||
| 4090 | if (encodep) | ||
| 4091 | /* We must return the buffer contents as unibyte string. */ | ||
| 4092 | current_buffer->enable_multibyte_characters = Qnil; | ||
| 4093 | str = make_buffer_string (BEGV, ZV, 0); | ||
| 4094 | set_buffer_internal (prev); | ||
| 4095 | return unbind_to (count, str); | ||
| 4096 | } | ||
| 4097 | |||
| 4098 | if (! encodep && CODING_REQUIRE_DETECTION (coding)) | ||
| 4099 | { | ||
| 4100 | /* See the comments in code_convert_region. */ | ||
| 4101 | if (coding->type == coding_type_undecided) | ||
| 4102 | { | ||
| 4103 | detect_coding (coding, XSTRING (str)->data, to_byte); | ||
| 4104 | if (coding->type == coding_type_undecided) | ||
| 4105 | coding->type = coding_type_emacs_mule; | ||
| 4106 | } | ||
| 4107 | if (coding->eol_type == CODING_EOL_UNDECIDED) | ||
| 4108 | { | ||
| 4109 | saved_coding_symbol = coding->symbol; | ||
| 4110 | detect_eol (coding, XSTRING (str)->data, to_byte); | ||
| 4111 | if (coding->eol_type == CODING_EOL_UNDECIDED) | ||
| 4112 | coding->eol_type = CODING_EOL_LF; | ||
| 4113 | /* We had better recover the original eol format if we | ||
| 4114 | encounter an inconsistent eol format while decoding. */ | ||
| 4115 | coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL; | ||
| 4116 | } | ||
| 4117 | } | ||
| 4118 | |||
| 4119 | if (encodep | ||
| 4120 | ? ! CODING_REQUIRE_ENCODING (coding) | ||
| 4121 | : ! CODING_REQUIRE_DECODING (coding)) | ||
| 4122 | from = to_byte; | ||
| 4123 | else | ||
| 4124 | { | ||
| 4125 | /* Try to skip the leading and trailing ASCII characters. */ | ||
| 4126 | if (encodep) | ||
| 4127 | shrink_encoding_region (&from, &to_byte, coding, XSTRING (str)->data); | ||
| 4128 | else | ||
| 4129 | shrink_decoding_region (&from, &to_byte, coding, XSTRING (str)->data); | ||
| 4130 | } | ||
| 4131 | if (from == to_byte) | ||
| 4132 | return (nocopy ? str : Fcopy_sequence (str)); | ||
| 4133 | |||
| 4134 | if (encodep) | ||
| 4135 | len = encoding_buffer_size (coding, to_byte - from); | ||
| 4136 | else | ||
| 4137 | len = decoding_buffer_size (coding, to_byte - from); | ||
| 4138 | len += from + XSTRING (str)->size_byte - to_byte; | ||
| 4139 | GCPRO1 (str); | ||
| 4140 | buf = get_conversion_buffer (len); | ||
| 4141 | UNGCPRO; | ||
| 4142 | |||
| 4143 | if (from > 0) | ||
| 4144 | bcopy (XSTRING (str)->data, buf, from); | ||
| 4145 | result = (encodep | ||
| 4146 | ? encode_coding (coding, XSTRING (str)->data + from, | ||
| 4147 | buf + from, to_byte - from, len) | ||
| 4148 | : decode_coding (coding, XSTRING (str)->data + from, | ||
| 4149 | buf + from, to_byte - from, len)); | ||
| 4150 | if (! encodep && result == CODING_FINISH_INCONSISTENT_EOL) | ||
| 4151 | { | ||
| 4152 | /* We simply try to decode the whole string again but without | ||
| 4153 | eol-conversion this time. */ | ||
| 4154 | coding->eol_type = CODING_EOL_LF; | ||
| 4155 | coding->symbol = saved_coding_symbol; | ||
| 4156 | return code_convert_string (str, coding, encodep, nocopy); | ||
| 4157 | } | ||
| 4158 | |||
| 4159 | bcopy (XSTRING (str)->data + to_byte, buf + from + coding->produced, | ||
| 4160 | XSTRING (str)->size_byte - to_byte); | ||
| 4161 | |||
| 4162 | len = from + XSTRING (str)->size_byte - to_byte; | ||
| 4163 | if (encodep) | ||
| 4164 | str = make_unibyte_string (buf, len + coding->produced); | ||
| 4165 | else | ||
| 4166 | str = make_multibyte_string (buf, len + coding->produced_char, | ||
| 4167 | len + coding->produced); | ||
| 4168 | return str; | ||
| 3120 | } | 4169 | } |
| 3121 | 4170 | ||
| 3122 | 4171 | ||
| @@ -3187,465 +4236,173 @@ The value of property should be a vector of length 5.") | |||
| 3187 | Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil)); | 4236 | Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil)); |
| 3188 | } | 4237 | } |
| 3189 | 4238 | ||
| 3190 | DEFUN ("detect-coding-region", Fdetect_coding_region, Sdetect_coding_region, | 4239 | Lisp_Object |
| 3191 | 2, 2, 0, | 4240 | detect_coding_system (src, src_bytes, highest) |
| 3192 | "Detect coding system of the text in the region between START and END.\n\ | 4241 | unsigned char *src; |
| 3193 | Return a list of possible coding systems ordered by priority.\n\ | 4242 | int src_bytes, highest; |
| 3194 | If only ASCII characters are found, it returns `undecided'\n\ | ||
| 3195 | or its subsidiary coding system according to a detected end-of-line format.") | ||
| 3196 | (b, e) | ||
| 3197 | Lisp_Object b, e; | ||
| 3198 | { | 4243 | { |
| 3199 | int coding_mask, eol_type; | 4244 | int coding_mask, eol_type; |
| 3200 | Lisp_Object val; | 4245 | Lisp_Object val, tmp; |
| 3201 | int beg, end; | 4246 | int dummy; |
| 3202 | int beg_byte, end_byte; | ||
| 3203 | |||
| 3204 | validate_region (&b, &e); | ||
| 3205 | beg = XINT (b), end = XINT (e); | ||
| 3206 | beg_byte = CHAR_TO_BYTE (beg); | ||
| 3207 | end_byte = CHAR_TO_BYTE (end); | ||
| 3208 | 4247 | ||
| 3209 | if (beg < GPT && end >= GPT) | 4248 | coding_mask = detect_coding_mask (src, src_bytes, NULL, &dummy); |
| 3210 | move_gap_both (end, end_byte); | 4249 | eol_type = detect_eol_type (src, src_bytes, &dummy); |
| 3211 | 4250 | if (eol_type == CODING_EOL_INCONSISTENT) | |
| 3212 | coding_mask = detect_coding_mask (BYTE_POS_ADDR (beg_byte), | 4251 | eol_type = CODING_EOL_UNDECIDED; |
| 3213 | end_byte - beg_byte); | ||
| 3214 | eol_type = detect_eol_type (BYTE_POS_ADDR (beg_byte), end_byte - beg_byte); | ||
| 3215 | 4252 | ||
| 3216 | if (coding_mask == CODING_CATEGORY_MASK_ANY) | 4253 | if (!coding_mask) |
| 3217 | { | 4254 | { |
| 3218 | val = Qundecided; | 4255 | val = Qundecided; |
| 3219 | if (eol_type != CODING_EOL_UNDECIDED | 4256 | if (eol_type != CODING_EOL_UNDECIDED) |
| 3220 | && eol_type != CODING_EOL_INCONSISTENT) | ||
| 3221 | { | 4257 | { |
| 3222 | Lisp_Object val2; | 4258 | Lisp_Object val2; |
| 3223 | val2 = Fget (Qundecided, Qeol_type); | 4259 | val2 = Fget (Qundecided, Qeol_type); |
| 3224 | if (VECTORP (val2)) | 4260 | if (VECTORP (val2)) |
| 3225 | val = XVECTOR (val2)->contents[eol_type]; | 4261 | val = XVECTOR (val2)->contents[eol_type]; |
| 3226 | } | 4262 | } |
| 4263 | return val; | ||
| 3227 | } | 4264 | } |
| 3228 | else | ||
| 3229 | { | ||
| 3230 | Lisp_Object val2; | ||
| 3231 | |||
| 3232 | /* At first, gather possible coding-systems in VAL in a reverse | ||
| 3233 | order. */ | ||
| 3234 | val = Qnil; | ||
| 3235 | for (val2 = Vcoding_category_list; | ||
| 3236 | !NILP (val2); | ||
| 3237 | val2 = XCONS (val2)->cdr) | ||
| 3238 | { | ||
| 3239 | int idx | ||
| 3240 | = XFASTINT (Fget (XCONS (val2)->car, Qcoding_category_index)); | ||
| 3241 | if (coding_mask & (1 << idx)) | ||
| 3242 | { | ||
| 3243 | #if 0 | ||
| 3244 | /* This code is suppressed until we find a better way to | ||
| 3245 | distinguish raw text file and binary file. */ | ||
| 3246 | 4265 | ||
| 3247 | if (idx == CODING_CATEGORY_IDX_RAW_TEXT | 4266 | /* At first, gather possible coding systems in VAL. */ |
| 3248 | && eol_type == CODING_EOL_INCONSISTENT) | 4267 | val = Qnil; |
| 3249 | val = Fcons (Qno_conversion, val); | 4268 | for (tmp = Vcoding_category_list; !NILP (tmp); tmp = XCONS (tmp)->cdr) |
| 3250 | else | ||
| 3251 | #endif /* 0 */ | ||
| 3252 | val = Fcons (Fsymbol_value (XCONS (val2)->car), val); | ||
| 3253 | } | ||
| 3254 | } | ||
| 3255 | |||
| 3256 | /* Then, change the order of the list, while getting subsidiary | ||
| 3257 | coding-systems. */ | ||
| 3258 | val2 = val; | ||
| 3259 | val = Qnil; | ||
| 3260 | if (eol_type == CODING_EOL_INCONSISTENT) | ||
| 3261 | eol_type == CODING_EOL_UNDECIDED; | ||
| 3262 | for (; !NILP (val2); val2 = XCONS (val2)->cdr) | ||
| 3263 | { | ||
| 3264 | if (eol_type == CODING_EOL_UNDECIDED) | ||
| 3265 | val = Fcons (XCONS (val2)->car, val); | ||
| 3266 | else | ||
| 3267 | { | ||
| 3268 | Lisp_Object val3; | ||
| 3269 | val3 = Fget (XCONS (val2)->car, Qeol_type); | ||
| 3270 | if (VECTORP (val3)) | ||
| 3271 | val = Fcons (XVECTOR (val3)->contents[eol_type], val); | ||
| 3272 | else | ||
| 3273 | val = Fcons (XCONS (val2)->car, val); | ||
| 3274 | } | ||
| 3275 | } | ||
| 3276 | } | ||
| 3277 | |||
| 3278 | return val; | ||
| 3279 | } | ||
| 3280 | |||
| 3281 | /* Scan text in the region between *BEGP and *ENDP, skip characters | ||
| 3282 | which we never have to encode to (iff ENCODEP is 1) or decode from | ||
| 3283 | coding system CODING at the head and tail, then set BEGP and ENDP | ||
| 3284 | to the addresses of start and end of the text we actually convert. */ | ||
| 3285 | |||
| 3286 | void | ||
| 3287 | shrink_conversion_area (begp, endp, coding, encodep) | ||
| 3288 | unsigned char **begp, **endp; | ||
| 3289 | struct coding_system *coding; | ||
| 3290 | int encodep; | ||
| 3291 | { | ||
| 3292 | register unsigned char *beg_addr = *begp, *end_addr = *endp; | ||
| 3293 | |||
| 3294 | if (coding->eol_type != CODING_EOL_LF | ||
| 3295 | && coding->eol_type != CODING_EOL_UNDECIDED) | ||
| 3296 | /* Since we anyway have to convert end-of-line format, it is not | ||
| 3297 | worth skipping at most 100 bytes or so. */ | ||
| 3298 | return; | ||
| 3299 | |||
| 3300 | if (encodep) /* for encoding */ | ||
| 3301 | { | ||
| 3302 | switch (coding->type) | ||
| 3303 | { | ||
| 3304 | case coding_type_no_conversion: | ||
| 3305 | case coding_type_emacs_mule: | ||
| 3306 | case coding_type_undecided: | ||
| 3307 | case coding_type_raw_text: | ||
| 3308 | /* We need no conversion. */ | ||
| 3309 | *begp = *endp; | ||
| 3310 | return; | ||
| 3311 | case coding_type_ccl: | ||
| 3312 | /* We can't skip any data. */ | ||
| 3313 | return; | ||
| 3314 | case coding_type_iso2022: | ||
| 3315 | if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL) | ||
| 3316 | { | ||
| 3317 | unsigned char *bol = beg_addr; | ||
| 3318 | while (beg_addr < end_addr && *beg_addr < 0x80) | ||
| 3319 | { | ||
| 3320 | beg_addr++; | ||
| 3321 | if (*(beg_addr - 1) == '\n') | ||
| 3322 | bol = beg_addr; | ||
| 3323 | } | ||
| 3324 | beg_addr = bol; | ||
| 3325 | goto label_skip_tail; | ||
| 3326 | } | ||
| 3327 | /* fall down ... */ | ||
| 3328 | default: | ||
| 3329 | /* We can skip all ASCII characters at the head and tail. */ | ||
| 3330 | while (beg_addr < end_addr && *beg_addr < 0x80) beg_addr++; | ||
| 3331 | label_skip_tail: | ||
| 3332 | while (beg_addr < end_addr && *(end_addr - 1) < 0x80) end_addr--; | ||
| 3333 | break; | ||
| 3334 | } | ||
| 3335 | } | ||
| 3336 | else /* for decoding */ | ||
| 3337 | { | 4269 | { |
| 3338 | switch (coding->type) | 4270 | int idx |
| 4271 | = XFASTINT (Fget (XCONS (tmp)->car, Qcoding_category_index)); | ||
| 4272 | if (coding_mask & (1 << idx)) | ||
| 3339 | { | 4273 | { |
| 3340 | case coding_type_no_conversion: | 4274 | val = Fcons (Fsymbol_value (XCONS (tmp)->car), val); |
| 3341 | /* We need no conversion. */ | 4275 | if (highest) |
| 3342 | *begp = *endp; | 4276 | break; |
| 3343 | return; | ||
| 3344 | case coding_type_emacs_mule: | ||
| 3345 | case coding_type_raw_text: | ||
| 3346 | if (coding->eol_type == CODING_EOL_LF) | ||
| 3347 | { | ||
| 3348 | /* We need no conversion. */ | ||
| 3349 | *begp = *endp; | ||
| 3350 | return; | ||
| 3351 | } | ||
| 3352 | /* We can skip all but carriage-return. */ | ||
| 3353 | while (beg_addr < end_addr && *beg_addr != '\r') beg_addr++; | ||
| 3354 | while (beg_addr < end_addr && *(end_addr - 1) != '\r') end_addr--; | ||
| 3355 | break; | ||
| 3356 | case coding_type_sjis: | ||
| 3357 | case coding_type_big5: | ||
| 3358 | /* We can skip all ASCII characters at the head. */ | ||
| 3359 | while (beg_addr < end_addr && *beg_addr < 0x80) beg_addr++; | ||
| 3360 | /* We can skip all ASCII characters at the tail except for | ||
| 3361 | the second byte of SJIS or BIG5 code. */ | ||
| 3362 | while (beg_addr < end_addr && *(end_addr - 1) < 0x80) end_addr--; | ||
| 3363 | if (end_addr != *endp) | ||
| 3364 | end_addr++; | ||
| 3365 | break; | ||
| 3366 | case coding_type_ccl: | ||
| 3367 | /* We can't skip any data. */ | ||
| 3368 | return; | ||
| 3369 | default: /* i.e. case coding_type_iso2022: */ | ||
| 3370 | { | ||
| 3371 | unsigned char c; | ||
| 3372 | |||
| 3373 | /* We can skip all ASCII characters except for a few | ||
| 3374 | control codes at the head. */ | ||
| 3375 | while (beg_addr < end_addr && (c = *beg_addr) < 0x80 | ||
| 3376 | && c != ISO_CODE_CR && c != ISO_CODE_SO | ||
| 3377 | && c != ISO_CODE_SI && c != ISO_CODE_ESC) | ||
| 3378 | beg_addr++; | ||
| 3379 | } | ||
| 3380 | break; | ||
| 3381 | } | 4277 | } |
| 3382 | } | 4278 | } |
| 3383 | *begp = beg_addr; | 4279 | if (!highest) |
| 3384 | *endp = end_addr; | 4280 | val = Fnreverse (val); |
| 3385 | return; | ||
| 3386 | } | ||
| 3387 | |||
| 3388 | /* Encode into or decode from (according to ENCODEP) coding system CODING | ||
| 3389 | the text between char positions B and E. */ | ||
| 3390 | |||
| 3391 | Lisp_Object | ||
| 3392 | code_convert_region (b, e, coding, encodep) | ||
| 3393 | Lisp_Object b, e; | ||
| 3394 | struct coding_system *coding; | ||
| 3395 | int encodep; | ||
| 3396 | { | ||
| 3397 | int beg, end, len, consumed, produced; | ||
| 3398 | char *buf; | ||
| 3399 | unsigned char *begp, *endp; | ||
| 3400 | int opoint = PT, opoint_byte = PT_BYTE; | ||
| 3401 | int beg_byte, end_byte, len_byte; | ||
| 3402 | int zv_before = ZV; | ||
| 3403 | int zv_byte_before = ZV_BYTE; | ||
| 3404 | |||
| 3405 | validate_region (&b, &e); | ||
| 3406 | beg = XINT (b), end = XINT (e); | ||
| 3407 | beg_byte = CHAR_TO_BYTE (beg); | ||
| 3408 | end_byte = CHAR_TO_BYTE (end); | ||
| 3409 | |||
| 3410 | if (beg < GPT && end >= GPT) | ||
| 3411 | move_gap_both (end, end_byte); | ||
| 3412 | 4281 | ||
| 3413 | if (encodep && !NILP (coding->pre_write_conversion)) | 4282 | /* Then, substitute the elements by subsidiary coding systems. */ |
| 4283 | for (tmp = val; !NILP (tmp); tmp = XCONS (tmp)->cdr) | ||
| 3414 | { | 4284 | { |
| 3415 | /* We must call a pre-conversion function which may put a new | 4285 | if (eol_type != CODING_EOL_UNDECIDED) |
| 3416 | text to be converted in a new buffer. */ | ||
| 3417 | struct buffer *old = current_buffer, *new; | ||
| 3418 | |||
| 3419 | TEMP_SET_PT_BOTH (beg, beg_byte); | ||
| 3420 | call2 (coding->pre_write_conversion, b, e); | ||
| 3421 | if (old != current_buffer) | ||
| 3422 | { | 4286 | { |
| 3423 | /* Replace the original text by the text just generated. */ | 4287 | Lisp_Object eol; |
| 3424 | len = ZV - BEGV; | 4288 | eol = Fget (XCONS (tmp)->car, Qeol_type); |
| 3425 | len_byte = ZV_BYTE - BEGV_BYTE; | 4289 | if (VECTORP (eol)) |
| 3426 | new = current_buffer; | 4290 | XCONS (tmp)->car = XVECTOR (eol)->contents[eol_type]; |
| 3427 | set_buffer_internal (old); | ||
| 3428 | del_range_both (beg, end, beg_byte, end_byte, 1); | ||
| 3429 | insert_from_buffer (new, 1, len, 0); | ||
| 3430 | end = beg + len; | ||
| 3431 | end_byte = len_byte; | ||
| 3432 | } | 4291 | } |
| 3433 | } | 4292 | } |
| 4293 | return (highest ? XCONS (val)->car : val); | ||
| 4294 | } | ||
| 3434 | 4295 | ||
| 3435 | /* We may be able to shrink the conversion region. */ | 4296 | DEFUN ("detect-coding-region", Fdetect_coding_region, Sdetect_coding_region, |
| 3436 | begp = BYTE_POS_ADDR (beg_byte); | 4297 | 2, 3, 0, |
| 3437 | endp = begp + (end_byte - beg_byte); | 4298 | "Detect coding system of the text in the region between START and END.\n\ |
| 3438 | shrink_conversion_area (&begp, &endp, coding, encodep); | 4299 | Return a list of possible coding systems ordered by priority.\n\ |
| 3439 | 4300 | \n\ | |
| 3440 | if (begp == endp) | 4301 | If only ASCII characters are found, it returns `undecided'\n\ |
| 3441 | /* We need no conversion. */ | 4302 | or its subsidiary coding system according to a detected end-of-line format.\n\ |
| 3442 | len = end - beg; | 4303 | \n\ |
| 3443 | else | 4304 | If optional argument HIGHEST is non-nil, return the coding system of\n\ |
| 3444 | { | 4305 | highest priority.") |
| 3445 | int shrunk_beg_byte, shrunk_end_byte; | 4306 | (start, end, highest) |
| 3446 | int shrunk_beg; | 4307 | Lisp_Object start, end, highest; |
| 3447 | int shrunk_len_byte; | 4308 | { |
| 3448 | int new_len_byte; | 4309 | int from, to; |
| 3449 | int buflen; | 4310 | int from_byte, to_byte; |
| 3450 | |||
| 3451 | shrunk_beg_byte = PTR_BYTE_POS (begp); | ||
| 3452 | shrunk_beg = BYTE_TO_CHAR (shrunk_beg_byte); | ||
| 3453 | shrunk_end_byte = PTR_BYTE_POS (endp); | ||
| 3454 | shrunk_len_byte = shrunk_end_byte - shrunk_beg_byte; | ||
| 3455 | |||
| 3456 | if (encodep) | ||
| 3457 | buflen = encoding_buffer_size (coding, shrunk_len_byte); | ||
| 3458 | else | ||
| 3459 | buflen = decoding_buffer_size (coding, shrunk_len_byte); | ||
| 3460 | buf = get_conversion_buffer (buflen); | ||
| 3461 | |||
| 3462 | coding->last_block = 1; | ||
| 3463 | produced = (encodep | ||
| 3464 | ? encode_coding (coding, begp, buf, shrunk_len_byte, buflen, | ||
| 3465 | &consumed) | ||
| 3466 | : decode_coding (coding, begp, buf, shrunk_len_byte, buflen, | ||
| 3467 | &consumed)); | ||
| 3468 | |||
| 3469 | TEMP_SET_PT_BOTH (shrunk_beg, shrunk_beg_byte); | ||
| 3470 | |||
| 3471 | /* We let the number of characters in the result | ||
| 3472 | be computed in accord with enable-multibyte-characters | ||
| 3473 | even when encoding. Otherwise the buffer contents | ||
| 3474 | will be inconsistent. */ | ||
| 3475 | insert (buf, produced); | ||
| 3476 | |||
| 3477 | del_range_byte (PT_BYTE, PT_BYTE + shrunk_len_byte, 1); | ||
| 3478 | |||
| 3479 | if (opoint >= end) | ||
| 3480 | { | ||
| 3481 | opoint += ZV - zv_before; | ||
| 3482 | opoint_byte += ZV_BYTE - zv_byte_before; | ||
| 3483 | } | ||
| 3484 | else if (opoint > beg) | ||
| 3485 | { | ||
| 3486 | opoint = beg; | ||
| 3487 | opoint_byte = beg_byte; | ||
| 3488 | } | ||
| 3489 | TEMP_SET_PT_BOTH (opoint, opoint_byte); | ||
| 3490 | 4311 | ||
| 3491 | end += ZV - zv_before; | 4312 | CHECK_NUMBER_COERCE_MARKER (start, 0); |
| 3492 | } | 4313 | CHECK_NUMBER_COERCE_MARKER (end, 1); |
| 3493 | 4314 | ||
| 3494 | if (!encodep && !NILP (coding->post_read_conversion)) | 4315 | validate_region (&start, &end); |
| 3495 | { | 4316 | from = XINT (start), to = XINT (end); |
| 3496 | Lisp_Object insval; | 4317 | from_byte = CHAR_TO_BYTE (from); |
| 4318 | to_byte = CHAR_TO_BYTE (to); | ||
| 3497 | 4319 | ||
| 3498 | /* We must call a post-conversion function which may alter | 4320 | if (from < GPT && to >= GPT) |
| 3499 | the text just converted. */ | 4321 | move_gap_both (to, to_byte); |
| 3500 | zv_before = ZV; | ||
| 3501 | zv_byte_before = ZV_BYTE; | ||
| 3502 | 4322 | ||
| 3503 | TEMP_SET_PT_BOTH (beg, beg_byte); | 4323 | return detect_coding_system (BYTE_POS_ADDR (from_byte), |
| 3504 | insval = call1 (coding->post_read_conversion, make_number (end - beg)); | 4324 | to_byte - from_byte, |
| 3505 | CHECK_NUMBER (insval, 0); | 4325 | !NILP (highest)); |
| 4326 | } | ||
| 3506 | 4327 | ||
| 3507 | if (opoint >= beg + ZV - zv_before) | 4328 | DEFUN ("detect-coding-string", Fdetect_coding_string, Sdetect_coding_string, |
| 3508 | { | 4329 | 1, 2, 0, |
| 3509 | opoint += ZV - zv_before; | 4330 | "Detect coding system of the text in STRING.\n\ |
| 3510 | opoint_byte += ZV_BYTE - zv_byte_before; | 4331 | Return a list of possible coding systems ordered by priority.\n\ |
| 3511 | } | 4332 | \n\ |
| 3512 | else if (opoint > beg) | 4333 | If only ASCII characters are found, it returns `undecided'\n\ |
| 3513 | { | 4334 | or its subsidiary coding system according to a detected end-of-line format.\n\ |
| 3514 | opoint = beg; | 4335 | \n\ |
| 3515 | opoint_byte = beg_byte; | 4336 | If optional argument HIGHEST is non-nil, return the coding system of\n\ |
| 3516 | } | 4337 | highest priority.") |
| 3517 | TEMP_SET_PT_BOTH (opoint, opoint_byte); | 4338 | (string, highest) |
| 3518 | len = XINT (insval); | 4339 | Lisp_Object string, highest; |
| 3519 | } | 4340 | { |
| 4341 | CHECK_STRING (string, 0); | ||
| 3520 | 4342 | ||
| 3521 | return make_number (len); | 4343 | return detect_coding_system (XSTRING (string)->data, |
| 4344 | XSTRING (string)->size_byte, | ||
| 4345 | !NILP (highest)); | ||
| 3522 | } | 4346 | } |
| 3523 | 4347 | ||
| 3524 | DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region, | 4348 | DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region, |
| 3525 | 3, 3, "r\nzCoding system: ", | 4349 | 3, 3, "r\nzCoding system: ", |
| 3526 | "Decode current region by specified coding system.\n\ | 4350 | "Decode the current region by specified coding system.\n\ |
| 3527 | When called from a program, takes three arguments:\n\ | 4351 | When called from a program, takes three arguments:\n\ |
| 3528 | START, END, and CODING-SYSTEM. START END are buffer positions.\n\ | 4352 | START, END, and CODING-SYSTEM. START and END are buffer positions.\n\ |
| 3529 | Return length of decoded text.") | 4353 | Return length of decoded text.") |
| 3530 | (b, e, coding_system) | 4354 | (start, end, coding_system) |
| 3531 | Lisp_Object b, e, coding_system; | 4355 | Lisp_Object start, end, coding_system; |
| 3532 | { | 4356 | { |
| 3533 | struct coding_system coding; | 4357 | struct coding_system coding; |
| 4358 | int from, to; | ||
| 3534 | 4359 | ||
| 3535 | CHECK_NUMBER_COERCE_MARKER (b, 0); | 4360 | CHECK_NUMBER_COERCE_MARKER (start, 0); |
| 3536 | CHECK_NUMBER_COERCE_MARKER (e, 1); | 4361 | CHECK_NUMBER_COERCE_MARKER (end, 1); |
| 3537 | CHECK_SYMBOL (coding_system, 2); | 4362 | CHECK_SYMBOL (coding_system, 2); |
| 3538 | 4363 | ||
| 4364 | validate_region (&start, &end); | ||
| 4365 | from = XFASTINT (start); | ||
| 4366 | to = XFASTINT (end); | ||
| 4367 | |||
| 3539 | if (NILP (coding_system)) | 4368 | if (NILP (coding_system)) |
| 3540 | return make_number (XFASTINT (e) - XFASTINT (b)); | 4369 | return make_number (to - from); |
| 4370 | |||
| 3541 | if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) | 4371 | if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) |
| 3542 | error ("Invalid coding-system: %s", XSYMBOL (coding_system)->name->data); | 4372 | error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data); |
| 3543 | 4373 | ||
| 3544 | return code_convert_region (b, e, &coding, 0); | 4374 | coding.mode |= CODING_MODE_LAST_BLOCK; |
| 4375 | return code_convert_region (from, to, &coding, 0, 1); | ||
| 3545 | } | 4376 | } |
| 3546 | 4377 | ||
| 3547 | DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region, | 4378 | DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region, |
| 3548 | 3, 3, "r\nzCoding system: ", | 4379 | 3, 3, "r\nzCoding system: ", |
| 3549 | "Encode current region by specified coding system.\n\ | 4380 | "Encode the current region by specified coding system.\n\ |
| 3550 | When called from a program, takes three arguments:\n\ | 4381 | When called from a program, takes three arguments:\n\ |
| 3551 | START, END, and CODING-SYSTEM. START END are buffer positions.\n\ | 4382 | START, END, and CODING-SYSTEM. START and END are buffer positions.\n\ |
| 3552 | Return length of encoded text.") | 4383 | Return length of encoded text.") |
| 3553 | (b, e, coding_system) | 4384 | (start, end, coding_system) |
| 3554 | Lisp_Object b, e, coding_system; | 4385 | Lisp_Object start, end, coding_system; |
| 3555 | { | 4386 | { |
| 3556 | struct coding_system coding; | 4387 | struct coding_system coding; |
| 4388 | int from, to; | ||
| 3557 | 4389 | ||
| 3558 | CHECK_NUMBER_COERCE_MARKER (b, 0); | 4390 | CHECK_NUMBER_COERCE_MARKER (start, 0); |
| 3559 | CHECK_NUMBER_COERCE_MARKER (e, 1); | 4391 | CHECK_NUMBER_COERCE_MARKER (end, 1); |
| 3560 | CHECK_SYMBOL (coding_system, 2); | 4392 | CHECK_SYMBOL (coding_system, 2); |
| 3561 | 4393 | ||
| 3562 | if (NILP (coding_system)) | 4394 | validate_region (&start, &end); |
| 3563 | return make_number (XFASTINT (e) - XFASTINT (b)); | 4395 | from = XFASTINT (start); |
| 3564 | if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) | 4396 | to = XFASTINT (end); |
| 3565 | error ("Invalid coding-system: %s", XSYMBOL (coding_system)->name->data); | ||
| 3566 | 4397 | ||
| 3567 | return code_convert_region (b, e, &coding, 1); | 4398 | if (NILP (coding_system)) |
| 3568 | } | 4399 | return make_number (to - from); |
| 3569 | |||
| 3570 | /* Encode or decode (according to ENCODEP) the text of string STR | ||
| 3571 | using coding CODING. If NOCOPY is nil, we never return STR | ||
| 3572 | itself, but always a copy. If NOCOPY is non-nil, we return STR | ||
| 3573 | if no change is needed. */ | ||
| 3574 | |||
| 3575 | Lisp_Object | ||
| 3576 | code_convert_string (str, coding, encodep, nocopy) | ||
| 3577 | Lisp_Object str, nocopy; | ||
| 3578 | struct coding_system *coding; | ||
| 3579 | int encodep; | ||
| 3580 | { | ||
| 3581 | int len, consumed, produced; | ||
| 3582 | char *buf; | ||
| 3583 | unsigned char *begp, *endp; | ||
| 3584 | int head_skip, tail_skip; | ||
| 3585 | struct gcpro gcpro1; | ||
| 3586 | |||
| 3587 | if (encodep && !NILP (coding->pre_write_conversion) | ||
| 3588 | || !encodep && !NILP (coding->post_read_conversion)) | ||
| 3589 | { | ||
| 3590 | /* Since we have to call Lisp functions which assume target text | ||
| 3591 | is in a buffer, after setting a temporary buffer, call | ||
| 3592 | code_convert_region. */ | ||
| 3593 | int count = specpdl_ptr - specpdl; | ||
| 3594 | int len = XSTRING (str)->size_byte; | ||
| 3595 | Lisp_Object result; | ||
| 3596 | struct buffer *old = current_buffer; | ||
| 3597 | |||
| 3598 | record_unwind_protect (Fset_buffer, Fcurrent_buffer ()); | ||
| 3599 | temp_output_buffer_setup (" *code-converting-work*"); | ||
| 3600 | set_buffer_internal (XBUFFER (Vstandard_output)); | ||
| 3601 | insert_from_string (str, 0, 0, XSTRING (str)->size, len, 0); | ||
| 3602 | code_convert_region (make_number (BEGV), make_number (ZV), | ||
| 3603 | coding, encodep); | ||
| 3604 | result = make_buffer_string (BEGV, ZV, 0); | ||
| 3605 | set_buffer_internal (old); | ||
| 3606 | return unbind_to (count, result); | ||
| 3607 | } | ||
| 3608 | |||
| 3609 | /* We may be able to shrink the conversion region. */ | ||
| 3610 | begp = XSTRING (str)->data; | ||
| 3611 | endp = begp + XSTRING (str)->size_byte; | ||
| 3612 | shrink_conversion_area (&begp, &endp, coding, encodep); | ||
| 3613 | |||
| 3614 | if (begp == endp) | ||
| 3615 | /* We need no conversion. */ | ||
| 3616 | return (NILP (nocopy) ? Fcopy_sequence (str) : str); | ||
| 3617 | |||
| 3618 | /* We assume that head_skip and tail_skip count single-byte characters. */ | ||
| 3619 | head_skip = begp - XSTRING (str)->data; | ||
| 3620 | tail_skip = XSTRING (str)->size_byte - head_skip - (endp - begp); | ||
| 3621 | |||
| 3622 | GCPRO1 (str); | ||
| 3623 | |||
| 3624 | if (encodep) | ||
| 3625 | len = encoding_buffer_size (coding, endp - begp); | ||
| 3626 | else | ||
| 3627 | len = decoding_buffer_size (coding, endp - begp); | ||
| 3628 | buf = get_conversion_buffer (len + head_skip + tail_skip); | ||
| 3629 | |||
| 3630 | bcopy (XSTRING (str)->data, buf, head_skip); | ||
| 3631 | coding->last_block = 1; | ||
| 3632 | produced = (encodep | ||
| 3633 | ? encode_coding (coding, XSTRING (str)->data + head_skip, | ||
| 3634 | buf + head_skip, endp - begp, len, &consumed) | ||
| 3635 | : decode_coding (coding, XSTRING (str)->data + head_skip, | ||
| 3636 | buf + head_skip, endp - begp, len, &consumed)); | ||
| 3637 | bcopy (XSTRING (str)->data + head_skip + (endp - begp), | ||
| 3638 | buf + head_skip + produced, | ||
| 3639 | tail_skip); | ||
| 3640 | |||
| 3641 | UNGCPRO; | ||
| 3642 | 4400 | ||
| 3643 | if (encodep) | 4401 | if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) |
| 3644 | /* When encoding, the result is all single-byte characters. */ | 4402 | error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data); |
| 3645 | return make_unibyte_string (buf, head_skip + produced + tail_skip); | ||
| 3646 | 4403 | ||
| 3647 | /* When decoding, count properly the number of chars in the string. */ | 4404 | coding.mode |= CODING_MODE_LAST_BLOCK; |
| 3648 | return make_string (buf, head_skip + produced + tail_skip); | 4405 | return code_convert_region (from, to, &coding, 1, 1); |
| 3649 | } | 4406 | } |
| 3650 | 4407 | ||
| 3651 | DEFUN ("decode-coding-string", Fdecode_coding_string, Sdecode_coding_string, | 4408 | DEFUN ("decode-coding-string", Fdecode_coding_string, Sdecode_coding_string, |
| @@ -3663,10 +4420,12 @@ if the decoding operation is trivial.") | |||
| 3663 | 4420 | ||
| 3664 | if (NILP (coding_system)) | 4421 | if (NILP (coding_system)) |
| 3665 | return (NILP (nocopy) ? Fcopy_sequence (string) : string); | 4422 | return (NILP (nocopy) ? Fcopy_sequence (string) : string); |
| 4423 | |||
| 3666 | if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) | 4424 | if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) |
| 3667 | error ("Invalid coding-system: %s", XSYMBOL (coding_system)->name->data); | 4425 | error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data); |
| 3668 | 4426 | ||
| 3669 | return code_convert_string (string, &coding, 0, nocopy); | 4427 | coding.mode |= CODING_MODE_LAST_BLOCK; |
| 4428 | return code_convert_string (string, &coding, 0, !NILP (nocopy)); | ||
| 3670 | } | 4429 | } |
| 3671 | 4430 | ||
| 3672 | DEFUN ("encode-coding-string", Fencode_coding_string, Sencode_coding_string, | 4431 | DEFUN ("encode-coding-string", Fencode_coding_string, Sencode_coding_string, |
| @@ -3684,10 +4443,12 @@ if the encoding operation is trivial.") | |||
| 3684 | 4443 | ||
| 3685 | if (NILP (coding_system)) | 4444 | if (NILP (coding_system)) |
| 3686 | return (NILP (nocopy) ? Fcopy_sequence (string) : string); | 4445 | return (NILP (nocopy) ? Fcopy_sequence (string) : string); |
| 4446 | |||
| 3687 | if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) | 4447 | if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) |
| 3688 | error ("Invalid coding-system: %s", XSYMBOL (coding_system)->name->data); | 4448 | error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data); |
| 3689 | 4449 | ||
| 3690 | return code_convert_string (string, &coding, 1, nocopy); | 4450 | coding.mode |= CODING_MODE_LAST_BLOCK; |
| 4451 | return code_convert_string (string, &coding, 1, !NILP (nocopy)); | ||
| 3691 | } | 4452 | } |
| 3692 | 4453 | ||
| 3693 | DEFUN ("decode-sjis-char", Fdecode_sjis_char, Sdecode_sjis_char, 1, 1, 0, | 4454 | DEFUN ("decode-sjis-char", Fdecode_sjis_char, Sdecode_sjis_char, 1, 1, 0, |
| @@ -3708,7 +4469,7 @@ Return the corresponding character.") | |||
| 3708 | } | 4469 | } |
| 3709 | 4470 | ||
| 3710 | DEFUN ("encode-sjis-char", Fencode_sjis_char, Sencode_sjis_char, 1, 1, 0, | 4471 | DEFUN ("encode-sjis-char", Fencode_sjis_char, Sencode_sjis_char, 1, 1, 0, |
| 3711 | "Encode a JISX0208 character CHAR to SJIS coding-system.\n\ | 4472 | "Encode a JISX0208 character CHAR to SJIS coding system.\n\ |
| 3712 | Return the corresponding character code in SJIS.") | 4473 | Return the corresponding character code in SJIS.") |
| 3713 | (ch) | 4474 | (ch) |
| 3714 | Lisp_Object ch; | 4475 | Lisp_Object ch; |
| @@ -3729,7 +4490,7 @@ Return the corresponding character code in SJIS.") | |||
| 3729 | } | 4490 | } |
| 3730 | 4491 | ||
| 3731 | DEFUN ("decode-big5-char", Fdecode_big5_char, Sdecode_big5_char, 1, 1, 0, | 4492 | DEFUN ("decode-big5-char", Fdecode_big5_char, Sdecode_big5_char, 1, 1, 0, |
| 3732 | "Decode a Big5 character CODE of BIG5 coding-system.\n\ | 4493 | "Decode a Big5 character CODE of BIG5 coding system.\n\ |
| 3733 | CODE is the character code in BIG5.\n\ | 4494 | CODE is the character code in BIG5.\n\ |
| 3734 | Return the corresponding character.") | 4495 | Return the corresponding character.") |
| 3735 | (code) | 4496 | (code) |
| @@ -3747,7 +4508,7 @@ Return the corresponding character.") | |||
| 3747 | } | 4508 | } |
| 3748 | 4509 | ||
| 3749 | DEFUN ("encode-big5-char", Fencode_big5_char, Sencode_big5_char, 1, 1, 0, | 4510 | DEFUN ("encode-big5-char", Fencode_big5_char, Sencode_big5_char, 1, 1, 0, |
| 3750 | "Encode the Big5 character CHAR to BIG5 coding-system.\n\ | 4511 | "Encode the Big5 character CHAR to BIG5 coding system.\n\ |
| 3751 | Return the corresponding character code in Big5.") | 4512 | Return the corresponding character code in Big5.") |
| 3752 | (ch) | 4513 | (ch) |
| 3753 | Lisp_Object ch; | 4514 | Lisp_Object ch; |
| @@ -3915,6 +4676,31 @@ which is a list of all the arguments given to this function.") | |||
| 3915 | return Qnil; | 4676 | return Qnil; |
| 3916 | } | 4677 | } |
| 3917 | 4678 | ||
| 4679 | DEFUN ("update-iso-coding-systems", Fupdate_iso_coding_systems, | ||
| 4680 | Supdate_iso_coding_systems, 0, 0, 0, | ||
| 4681 | "Update internal database for ISO2022 based coding systems.\n\ | ||
| 4682 | When values of the following coding categories are changed, you must\n\ | ||
| 4683 | call this function:\n\ | ||
| 4684 | coding-category-iso-7, coding-category-iso-7-tight,\n\ | ||
| 4685 | coding-category-iso-8-1, coding-category-iso-8-2,\n\ | ||
| 4686 | coding-category-iso-7-else, coding-category-iso-8-else") | ||
| 4687 | () | ||
| 4688 | { | ||
| 4689 | int i; | ||
| 4690 | |||
| 4691 | for (i = CODING_CATEGORY_IDX_ISO_7; i <= CODING_CATEGORY_IDX_ISO_8_ELSE; | ||
| 4692 | i++) | ||
| 4693 | { | ||
| 4694 | if (! coding_system_table[i]) | ||
| 4695 | coding_system_table[i] | ||
| 4696 | = (struct coding_system *) xmalloc (sizeof (struct coding_system)); | ||
| 4697 | setup_coding_system | ||
| 4698 | (XSYMBOL (XVECTOR (Vcoding_category_table)->contents[i])->value, | ||
| 4699 | coding_system_table[i]); | ||
| 4700 | } | ||
| 4701 | return Qnil; | ||
| 4702 | } | ||
| 4703 | |||
| 3918 | #endif /* emacs */ | 4704 | #endif /* emacs */ |
| 3919 | 4705 | ||
| 3920 | 4706 | ||
| @@ -3967,6 +4753,8 @@ init_coding_once () | |||
| 3967 | setup_coding_system (Qnil, &terminal_coding); | 4753 | setup_coding_system (Qnil, &terminal_coding); |
| 3968 | setup_coding_system (Qnil, &safe_terminal_coding); | 4754 | setup_coding_system (Qnil, &safe_terminal_coding); |
| 3969 | 4755 | ||
| 4756 | bzero (coding_system_table, sizeof coding_system_table); | ||
| 4757 | |||
| 3970 | #if defined (MSDOS) || defined (WINDOWSNT) | 4758 | #if defined (MSDOS) || defined (WINDOWSNT) |
| 3971 | system_eol_type = CODING_EOL_CRLF; | 4759 | system_eol_type = CODING_EOL_CRLF; |
| 3972 | #else | 4760 | #else |
| @@ -4042,17 +4830,22 @@ syms_of_coding () | |||
| 4042 | Fput (Qcoding_system_error, Qerror_message, | 4830 | Fput (Qcoding_system_error, Qerror_message, |
| 4043 | build_string ("Invalid coding system")); | 4831 | build_string ("Invalid coding system")); |
| 4044 | 4832 | ||
| 4833 | Qcoding_category = intern ("coding-category"); | ||
| 4834 | staticpro (&Qcoding_category); | ||
| 4045 | Qcoding_category_index = intern ("coding-category-index"); | 4835 | Qcoding_category_index = intern ("coding-category-index"); |
| 4046 | staticpro (&Qcoding_category_index); | 4836 | staticpro (&Qcoding_category_index); |
| 4047 | 4837 | ||
| 4838 | Vcoding_category_table | ||
| 4839 | = Fmake_vector (make_number (CODING_CATEGORY_IDX_MAX), Qnil); | ||
| 4840 | staticpro (&Vcoding_category_table); | ||
| 4048 | { | 4841 | { |
| 4049 | int i; | 4842 | int i; |
| 4050 | for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++) | 4843 | for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++) |
| 4051 | { | 4844 | { |
| 4052 | coding_category_table[i] = intern (coding_category_name[i]); | 4845 | XVECTOR (Vcoding_category_table)->contents[i] |
| 4053 | staticpro (&coding_category_table[i]); | 4846 | = intern (coding_category_name[i]); |
| 4054 | Fput (coding_category_table[i], Qcoding_category_index, | 4847 | Fput (XVECTOR (Vcoding_category_table)->contents[i], |
| 4055 | make_number (i)); | 4848 | Qcoding_category_index, make_number (i)); |
| 4056 | } | 4849 | } |
| 4057 | } | 4850 | } |
| 4058 | 4851 | ||
| @@ -4075,11 +4868,15 @@ syms_of_coding () | |||
| 4075 | Qemacs_mule = intern ("emacs-mule"); | 4868 | Qemacs_mule = intern ("emacs-mule"); |
| 4076 | staticpro (&Qemacs_mule); | 4869 | staticpro (&Qemacs_mule); |
| 4077 | 4870 | ||
| 4871 | Qraw_text = intern ("raw-text"); | ||
| 4872 | staticpro (&Qraw_text); | ||
| 4873 | |||
| 4078 | defsubr (&Scoding_system_p); | 4874 | defsubr (&Scoding_system_p); |
| 4079 | defsubr (&Sread_coding_system); | 4875 | defsubr (&Sread_coding_system); |
| 4080 | defsubr (&Sread_non_nil_coding_system); | 4876 | defsubr (&Sread_non_nil_coding_system); |
| 4081 | defsubr (&Scheck_coding_system); | 4877 | defsubr (&Scheck_coding_system); |
| 4082 | defsubr (&Sdetect_coding_region); | 4878 | defsubr (&Sdetect_coding_region); |
| 4879 | defsubr (&Sdetect_coding_string); | ||
| 4083 | defsubr (&Sdecode_coding_region); | 4880 | defsubr (&Sdecode_coding_region); |
| 4084 | defsubr (&Sencode_coding_region); | 4881 | defsubr (&Sencode_coding_region); |
| 4085 | defsubr (&Sdecode_coding_string); | 4882 | defsubr (&Sdecode_coding_string); |
| @@ -4094,6 +4891,7 @@ syms_of_coding () | |||
| 4094 | defsubr (&Sset_keyboard_coding_system_internal); | 4891 | defsubr (&Sset_keyboard_coding_system_internal); |
| 4095 | defsubr (&Skeyboard_coding_system); | 4892 | defsubr (&Skeyboard_coding_system); |
| 4096 | defsubr (&Sfind_operation_coding_system); | 4893 | defsubr (&Sfind_operation_coding_system); |
| 4894 | defsubr (&Supdate_iso_coding_systems); | ||
| 4097 | 4895 | ||
| 4098 | DEFVAR_LISP ("coding-system-list", &Vcoding_system_list, | 4896 | DEFVAR_LISP ("coding-system-list", &Vcoding_system_list, |
| 4099 | "List of coding systems.\n\ | 4897 | "List of coding systems.\n\ |
| @@ -4121,7 +4919,8 @@ updated by the functions `make-coding-system' and\n\ | |||
| 4121 | Vcoding_category_list = Qnil; | 4919 | Vcoding_category_list = Qnil; |
| 4122 | for (i = CODING_CATEGORY_IDX_MAX - 1; i >= 0; i--) | 4920 | for (i = CODING_CATEGORY_IDX_MAX - 1; i >= 0; i--) |
| 4123 | Vcoding_category_list | 4921 | Vcoding_category_list |
| 4124 | = Fcons (coding_category_table[i], Vcoding_category_list); | 4922 | = Fcons (XVECTOR (Vcoding_category_table)->contents[i], |
| 4923 | Vcoding_category_list); | ||
| 4125 | } | 4924 | } |
| 4126 | 4925 | ||
| 4127 | DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read, | 4926 | DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read, |
| @@ -4249,6 +5048,18 @@ a coding system of ISO 2022 variant which has a flag\n\ | |||
| 4249 | or reading output of a subprocess.\n\ | 5048 | or reading output of a subprocess.\n\ |
| 4250 | Only 128th through 159th elements have a meaning."); | 5049 | Only 128th through 159th elements have a meaning."); |
| 4251 | Vlatin_extra_code_table = Fmake_vector (make_number (256), Qnil); | 5050 | Vlatin_extra_code_table = Fmake_vector (make_number (256), Qnil); |
| 5051 | |||
| 5052 | DEFVAR_LISP ("select-safe-coding-system-function", | ||
| 5053 | &Vselect_safe_coding_system_function, | ||
| 5054 | "Function to call to select safe coding system for encoding a text.\n\ | ||
| 5055 | \n\ | ||
| 5056 | If set, this function is called to force a user to select a proper\n\ | ||
| 5057 | coding system which can encode the text in the case that a default\n\ | ||
| 5058 | coding system used in each operation can't encode the text.\n\ | ||
| 5059 | \n\ | ||
| 5060 | The default value is `select-safe-coding-system' (which see)."); | ||
| 5061 | Vselect_safe_coding_system_function = Qnil; | ||
| 5062 | |||
| 4252 | } | 5063 | } |
| 4253 | 5064 | ||
| 4254 | #endif /* emacs */ | 5065 | #endif /* emacs */ |
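
The candidate-gathering step of the new `detect_coding_system` can be sketched outside Emacs as follows. This is only an illustrative model, not the code from the commit: the enum values, the `category_coding` table and its coding-system names, and the function `candidates_by_priority` are all hypothetical stand-ins for the Lisp symbols, `Vcoding_category_list`, and the cons list used in `coding.c`. It shows how a detection bit mask is filtered through a priority-ordered category list, and how a non-zero `highest` stops at the first match. The subsidiary end-of-line substitution is left out.

```c
#include <stddef.h>

/* Hypothetical category indices; the real ones are the
   CODING_CATEGORY_IDX_* constants in coding.h.  */
enum { CAT_EMACS_MULE, CAT_SJIS, CAT_ISO_7, CAT_ISO_8_1, CAT_RAW_TEXT, CAT_MAX };

/* Hypothetical category -> coding system binding.  In Emacs this is the
   symbol value of each coding-category-* symbol.  */
static const char *const category_coding[CAT_MAX] = {
  "emacs-mule", "sjis", "iso-2022-7bit", "iso-latin-1", "raw-text"
};

/* Walk PRIORITY (category indices, highest priority first) and store in
   OUT the coding system of every category whose bit is set in MASK.
   OUT must have room for CAT_MAX entries.  If HIGHEST is non-zero, stop
   after the first match.  A zero MASK yields the single entry
   "undecided".  Return the number of entries stored.

   The original conses each match onto the front of a list and reverses
   it afterwards; appending to an array gives the same final order.  */
size_t
candidates_by_priority (unsigned mask, const int *priority,
                        size_t n_priority, int highest, const char **out)
{
  size_t n = 0;
  size_t i;

  if (!mask)
    {
      out[0] = "undecided";
      return 1;
    }

  for (i = 0; i < n_priority && n < CAT_MAX; i++)
    {
      int idx = priority[i];

      if (idx < 0 || idx >= CAT_MAX)
        continue;               /* ignore malformed entries */
      if (mask & (1u << idx))
        {
          out[n++] = category_coding[idx];
          if (highest)
            break;              /* only the best candidate is wanted */
        }
    }
  return n;
}
```

With a priority order of `CAT_ISO_8_1, CAT_SJIS, CAT_RAW_TEXT` and a mask that has only the SJIS and raw-text bits set, the sketch stores `sjis` then `raw-text`; with `highest` non-zero it stores only `sjis`.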