| author | Richard M. Stallman | 2006-07-03 15:48:23 +0000 |
|---|---|---|
| committer | Richard M. Stallman | 2006-07-03 15:48:23 +0000 |
| commit | 50148a91b426fb6be9009d968f64b0e6d645e799 (patch) | |
| tree | 204edd374edd45b246e087ea692bc8dc8704e9fe | |
| parent | 43d6731323bf64459bc5a4d9aaa18cceca9c7eb1 (diff) | |
(Coding Systems): Move char translation stuff here.
(Specify Coding, Output Coding): New nodes, out of Recognize Coding.
(Recognize Coding): Substantial local rewrites.
(International): Update menu.
| -rw-r--r-- | man/mule.texi | 144 |
1 files changed, 79 insertions, 65 deletions
diff --git a/man/mule.texi b/man/mule.texi index 8220a5097d1..15ec08ce9b0 100644 --- a/man/mule.texi +++ b/man/mule.texi | |||
| @@ -91,6 +91,8 @@ to make sure Emacs interprets keyboard input correctly; see | |||
| 91 | * Coding Systems:: Character set conversion when you read and | 91 | * Coding Systems:: Character set conversion when you read and |
| 92 | write files, and so on. | 92 | write files, and so on. |
| 93 | * Recognize Coding:: How Emacs figures out which conversion to use. | 93 | * Recognize Coding:: How Emacs figures out which conversion to use. |
| 94 | * Specify Coding:: Specifying a file's coding system explicitly. | ||
| 95 | * Output Coding:: Choosing coding systems for output. | ||
| 94 | * Text Coding:: Choosing conversion to use for file text. | 96 | * Text Coding:: Choosing conversion to use for file text. |
| 95 | * Communication Coding:: Coding systems for interprocess communication. | 97 | * Communication Coding:: Coding systems for interprocess communication. |
| 96 | * File Name Coding:: Coding systems for file @emph{names}. | 98 | * File Name Coding:: Coding systems for file @emph{names}. |
| @@ -718,6 +720,23 @@ non-@acronym{ASCII} characters stored with the internal Emacs encoding. It | |||
| 718 | handles end-of-line conversion based on the data encountered, and has | 720 | handles end-of-line conversion based on the data encountered, and has |
| 719 | the usual three variants to specify the kind of end-of-line conversion. | 721 | the usual three variants to specify the kind of end-of-line conversion. |
| 720 | 722 | ||
| 723 | @findex unify-8859-on-decoding-mode | ||
| 724 | The @dfn{character translation} feature can modify the effect of | ||
| 725 | various coding systems, by changing the internal Emacs codes that | ||
| 726 | decoding produces. For instance, the command | ||
| 727 | @code{unify-8859-on-decoding-mode} enables a mode that ``unifies'' the | ||
| 728 | Latin alphabets when decoding text. This works by converting all | ||
| 729 | non-@acronym{ASCII} Latin-@var{n} characters to either Latin-1 or | ||
| 730 | Unicode characters. This way it is easier to use various | ||
| 731 | Latin-@var{n} alphabets together. (In a future Emacs version we hope | ||
| 732 | to move towards full Unicode support and complete unification of | ||
| 733 | character sets.) | ||
| 734 | |||
| 735 | @vindex enable-character-translation | ||
| 736 | If you set the variable @code{enable-character-translation} to | ||
| 737 | @code{nil}, that disables all character translation (including | ||
| 738 | @code{unify-8859-on-decoding-mode}). | ||
| 739 | |||
| 721 | @node Recognize Coding | 740 | @node Recognize Coding |
| 722 | @section Recognizing Coding Systems | 741 | @section Recognizing Coding Systems |
| 723 | 742 | ||
| @@ -812,26 +831,6 @@ coding system @code{iso-2022-7bit}, and they won't be | |||
| 812 | decoded correctly when you visit those files if you suppress the | 831 | decoded correctly when you visit those files if you suppress the |
| 813 | escape sequence detection. | 832 | escape sequence detection. |
| 814 | 833 | ||
| 815 | @vindex coding | ||
| 816 | You can specify the coding system for a particular file using the | ||
| 817 | @w{@samp{-*-@dots{}-*-}} construct at the beginning of a file, or a | ||
| 818 | local variables list at the end (@pxref{File Variables}). You do this | ||
| 819 | by defining a value for the ``variable'' named @code{coding}. Emacs | ||
| 820 | does not really have a variable @code{coding}; instead of setting a | ||
| 821 | variable, this uses the specified coding system for the file. For | ||
| 822 | example, @samp{-*-mode: C; coding: latin-1;-*-} specifies use of the | ||
| 823 | Latin-1 coding system, as well as C mode. When you specify the coding | ||
| 824 | explicitly in the file, that overrides | ||
| 825 | @code{file-coding-system-alist}. | ||
| 826 | |||
| 827 | If you add the character @samp{!} at the end of the coding system | ||
| 828 | name, it disables any character translation while decoding the file. | ||
| 829 | For instance, it effectively cancels the effect of | ||
| 830 | @code{unify-8859-on-decoding-mode}. This is useful when you need to | ||
| 831 | make sure that the character codes in the Emacs buffer will not | ||
| 832 | according to user settings; for instance, for the sake of strings in | ||
| 833 | Emacs Lisp source files. | ||
| 834 | |||
| 835 | @vindex auto-coding-alist | 834 | @vindex auto-coding-alist |
| 836 | @vindex auto-coding-regexp-alist | 835 | @vindex auto-coding-regexp-alist |
| 837 | @vindex auto-coding-functions | 836 | @vindex auto-coding-functions |
| @@ -848,6 +847,24 @@ RMAIL files, whose names in general don't match any particular | |||
| 848 | pattern, are decoded correctly. One of the builtin | 847 | pattern, are decoded correctly. One of the builtin |
| 849 | @code{auto-coding-functions} detects the encoding for XML files. | 848 | @code{auto-coding-functions} detects the encoding for XML files. |
| 850 | 849 | ||
| 850 | @vindex rmail-decode-mime-charset | ||
| 851 | When you get new mail in Rmail, each message is translated | ||
| 852 | automatically from the coding system it is written in, as if it were a | ||
| 853 | separate file. This uses the priority list of coding systems that you | ||
| 854 | have specified. If a MIME message specifies a character set, Rmail | ||
| 855 | obeys that specification, unless @code{rmail-decode-mime-charset} is | ||
| 856 | @code{nil}. | ||
| 857 | |||
| 858 | @vindex rmail-file-coding-system | ||
| 859 | For reading and saving Rmail files themselves, Emacs uses the coding | ||
| 860 | system specified by the variable @code{rmail-file-coding-system}. The | ||
| 861 | default value is @code{nil}, which means that Rmail files are not | ||
| 862 | translated (they are read and written in the Emacs internal character | ||
| 863 | code). | ||
| 864 | |||
| 865 | @node Specify Coding | ||
| 866 | @section Specifying a File's Coding System | ||
| 867 | |||
| 851 | If Emacs recognizes the encoding of a file incorrectly, you can | 868 | If Emacs recognizes the encoding of a file incorrectly, you can |
| 852 | reread the file using the correct coding system by typing @kbd{C-x | 869 | reread the file using the correct coding system by typing @kbd{C-x |
| 853 | @key{RET} r @var{coding-system} @key{RET}}. To see what coding system | 870 | @key{RET} r @var{coding-system} @key{RET}}. To see what coding system |
| @@ -855,33 +872,45 @@ Emacs actually used to decode the file, look at the coding system | |||
| 855 | mnemonic letter near the left edge of the mode line (@pxref{Mode | 872 | mnemonic letter near the left edge of the mode line (@pxref{Mode |
| 856 | Line}), or type @kbd{C-h C @key{RET}}. | 873 | Line}), or type @kbd{C-h C @key{RET}}. |
| 857 | 874 | ||
| 858 | @findex unify-8859-on-decoding-mode | 875 | @vindex coding |
| 859 | The command @code{unify-8859-on-decoding-mode} enables a mode that | 876 | You can specify the coding system for a particular file in the file |
| 860 | ``unifies'' the Latin alphabets when decoding text. This works by | 877 | itself, using the @w{@samp{-*-@dots{}-*-}} construct at the beginning, |
| 861 | converting all non-@acronym{ASCII} Latin-@var{n} characters to either | 878 | or a local variables list at the end (@pxref{File Variables}). You do |
| 862 | Latin-1 or Unicode characters. This way it is easier to use various | 879 | this by defining a value for the ``variable'' named @code{coding}. |
| 863 | Latin-@var{n} alphabets together. In a future Emacs version we hope | 880 | Emacs does not really have a variable @code{coding}; instead of |
| 864 | to move towards full Unicode support and complete unification of | 881 | setting a variable, this uses the specified coding system for the |
| 865 | character sets. | 882 | file. For example, @samp{-*-mode: C; coding: latin-1;-*-} specifies |
| 883 | use of the Latin-1 coding system, as well as C mode. When you specify | ||
| 884 | the coding explicitly in the file, that overrides | ||
| 885 | @code{file-coding-system-alist}. | ||
| 886 | |||
| 887 | If you add the character @samp{!} at the end of the coding system | ||
| 888 | name in @code{coding}, it disables any character translation while | ||
| 889 | decoding the file. For instance, it effectively cancels the effect of | ||
| 890 | @code{unify-8859-on-decoding-mode}. This is useful when you need to | ||
| 891 | make sure that the character codes in the Emacs buffer will not vary | ||
| 892 | due to changes in user settings; for instance, for the sake of strings | ||
| 893 | in Emacs Lisp source files. | ||
| 894 | |||
| 895 | @node Output Coding | ||
| 896 | @section Choosing Coding Systems for Output | ||
| 866 | 897 | ||
| 867 | @vindex buffer-file-coding-system | 898 | @vindex buffer-file-coding-system |
| 868 | Once Emacs has chosen a coding system for a buffer, it stores that | 899 | Once Emacs has chosen a coding system for a buffer, it stores that |
| 869 | coding system in @code{buffer-file-coding-system} and uses that coding | 900 | coding system in @code{buffer-file-coding-system}. That makes it the |
| 870 | system, by default, for operations that write from this buffer into a | 901 | default for operations that write from this buffer into a file, such |
| 871 | file. This includes the commands @code{save-buffer} and | 902 | as @code{save-buffer} and @code{write-region}. You can specify a |
| 872 | @code{write-region}. If you want to write files from this buffer using | 903 | different coding system for further file output from the buffer using |
| 873 | a different coding system, you can specify a different coding system for | 904 | @code{set-buffer-file-coding-system} (@pxref{Text Coding}). |
| 874 | the buffer using @code{set-buffer-file-coding-system} (@pxref{Text | 905 | |
| 875 | Coding}). | 906 | You can insert any character Emacs supports into any Emacs buffer, |
| 876 | 907 | but most coding systems can only handle a subset of these characters. | |
| 877 | You can insert any possible character into any Emacs buffer, but | 908 | Therefore, you can insert characters that cannot be encoded with the |
| 878 | most coding systems can only handle some of the possible characters. | 909 | coding system that will be used to save the buffer. For example, you |
| 879 | This means that it is possible for you to insert characters that | 910 | could start with an @acronym{ASCII} file and insert a few Latin-1 |
| 880 | cannot be encoded with the coding system that will be used to save the | 911 | characters into it, or you could edit a text file in Polish encoded in |
| 881 | buffer. For example, you could start with an @acronym{ASCII} file and insert a | 912 | @code{iso-8859-2} and add some Russian words to it. When you save |
| 882 | few Latin-1 characters into it, or you could edit a text file in | 913 | that buffer, Emacs cannot use the current value of |
| 883 | Polish encoded in @code{iso-8859-2} and add some Russian words to it. | ||
| 884 | When you save the buffer, Emacs cannot use the current value of | ||
| 885 | @code{buffer-file-coding-system}, because the characters you added | 914 | @code{buffer-file-coding-system}, because the characters you added |
| 886 | cannot be encoded by that coding system. | 915 | cannot be encoded by that coding system. |
| 887 | 916 | ||
| @@ -896,12 +925,12 @@ contents, and asks you to choose one of those coding systems. | |||
| 896 | If you insert the unsuitable characters in a mail message, Emacs | 925 | If you insert the unsuitable characters in a mail message, Emacs |
| 897 | behaves a bit differently. It additionally checks whether the | 926 | behaves a bit differently. It additionally checks whether the |
| 898 | most-preferred coding system is recommended for use in MIME messages; | 927 | most-preferred coding system is recommended for use in MIME messages; |
| 899 | if not, Emacs tells you that the most-preferred coding system is | 928 | if not, Emacs tells you that the most-preferred coding system is not |
| 900 | not recommended and prompts you for another coding system. This is so | 929 | recommended and prompts you for another coding system. This is so you |
| 901 | you won't inadvertently send a message encoded in a way that your | 930 | won't inadvertently send a message encoded in a way that your |
| 902 | recipient's mail software will have difficulty decoding. (If you do | 931 | recipient's mail software will have difficulty decoding. (You can |
| 903 | want to use the most-preferred coding system, you can still type its | 932 | still use an unsuitable coding system if you type its name in response |
| 904 | name in response to the question.) | 933 | to the question.) |
| 905 | 934 | ||
| 906 | @vindex sendmail-coding-system | 935 | @vindex sendmail-coding-system |
| 907 | When you send a message with Mail mode (@pxref{Sending Mail}), Emacs has | 936 | When you send a message with Mail mode (@pxref{Sending Mail}), Emacs has |
| @@ -914,21 +943,6 @@ new files, which is controlled by your choice of language environment, | |||
| 914 | if that is non-@code{nil}. If all of these three values are @code{nil}, | 943 | if that is non-@code{nil}. If all of these three values are @code{nil}, |
| 915 | Emacs encodes outgoing mail using the Latin-1 coding system. | 944 | Emacs encodes outgoing mail using the Latin-1 coding system. |
| 916 | 945 | ||
| 917 | @vindex rmail-decode-mime-charset | ||
| 918 | When you get new mail in Rmail, each message is translated | ||
| 919 | automatically from the coding system it is written in, as if it were a | ||
| 920 | separate file. This uses the priority list of coding systems that you | ||
| 921 | have specified. If a MIME message specifies a character set, Rmail | ||
| 922 | obeys that specification, unless @code{rmail-decode-mime-charset} is | ||
| 923 | @code{nil}. | ||
| 924 | |||
| 925 | @vindex rmail-file-coding-system | ||
| 926 | For reading and saving Rmail files themselves, Emacs uses the coding | ||
| 927 | system specified by the variable @code{rmail-file-coding-system}. The | ||
| 928 | default value is @code{nil}, which means that Rmail files are not | ||
| 929 | translated (they are read and written in the Emacs internal character | ||
| 930 | code). | ||
| 931 | |||
| 932 | @node Text Coding | 946 | @node Text Coding |
| 933 | @section Specifying a Coding System for File Text | 947 | @section Specifying a Coding System for File Text |
| 934 | 948 | ||
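
As a rough illustration of the settings described in the nodes this change touches, the following Emacs Lisp fragment is a minimal sketch, not part of the commit. It uses only names that appear in the manual text above (`enable-character-translation`, `set-buffer-file-coding-system`, the `coding` file "variable"); the helper name `my-use-latin-2-for-output` is hypothetical, and `iso-8859-2` is just the Polish example from the text. Behavior in other Emacs versions may differ.

```elisp
;; Sketch only: assumes the Emacs version documented by this diff.

;; Per the Coding Systems node: setting this variable to nil disables
;; all character translation, including unify-8859-on-decoding-mode.
(setq enable-character-translation nil)

;; Per the Output Coding node: choose a different coding system for
;; further file output from the current buffer.
;; `my-use-latin-2-for-output' is a hypothetical helper name.
(defun my-use-latin-2-for-output ()
  "Use iso-8859-2 when the current buffer is next written to a file."
  (interactive)
  (set-buffer-file-coding-system 'iso-8859-2))

;; Per the Specify Coding node: a file can name its own coding system
;; on its first line, and a trailing `!' disables character translation
;; while decoding that file.  The first line would look like:
;;
;;   -*-mode: C; coding: latin-1!;-*-
```

The helper is meant to be run in the buffer whose output coding should change, for example with `M-x my-use-latin-2-for-output`. Evaluating the `setq` form in an init file applies globally, whereas the `coding` cookie affects only the file that contains it.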