author Richard M. Stallman 2006-07-03 15:48:23 +0000
committer Richard M. Stallman 2006-07-03 15:48:23 +0000
commit 50148a91b426fb6be9009d968f64b0e6d645e799
tree 204edd374edd45b246e087ea692bc8dc8704e9fe
parent 43d6731323bf64459bc5a4d9aaa18cceca9c7eb1
(Coding Systems): Move char translation stuff here.
(Specify Coding, Output Coding): New nodes, out of Recognize Coding.
(Recognize Coding): Substantial local rewrites.
(International): Update menu.
-rw-r--r-- man/mule.texi 144
1 files changed, 79 insertions, 65 deletions
diff --git a/man/mule.texi b/man/mule.texi
index 8220a5097d1..15ec08ce9b0 100644
--- a/man/mule.texi
+++ b/man/mule.texi
@@ -91,6 +91,8 @@ to make sure Emacs interprets keyboard input correctly; see
 * Coding Systems:: Character set conversion when you read and
  write files, and so on.
 * Recognize Coding:: How Emacs figures out which conversion to use.
+* Specify Coding:: Specifying a file's coding system explicitly.
+* Output Coding:: Choosing coding systems for output.
 * Text Coding:: Choosing conversion to use for file text.
 * Communication Coding:: Coding systems for interprocess communication.
 * File Name Coding:: Coding systems for file @emph{names}.
@@ -718,6 +720,23 @@ non-@acronym{ASCII} characters stored with the internal Emacs encoding. It
 handles end-of-line conversion based on the data encountered, and has
 the usual three variants to specify the kind of end-of-line conversion.
 
+@findex unify-8859-on-decoding-mode
+ The @dfn{character translation} feature can modify the effect of
+various coding systems, by changing the internal Emacs codes that
+decoding produces. For instance, the command
+@code{unify-8859-on-decoding-mode} enables a mode that ``unifies'' the
+Latin alphabets when decoding text. This works by converting all
+non-@acronym{ASCII} Latin-@var{n} characters to either Latin-1 or
+Unicode characters. This way it is easier to use various
+Latin-@var{n} alphabets together. (In a future Emacs version we hope
+to move towards full Unicode support and complete unification of
+character sets.)
+
+@vindex enable-character-translation
+ If you set the variable @code{enable-character-translation} to
+@code{nil}, that disables all character translation (including
+@code{unify-8859-on-decoding-mode}).
+
 @node Recognize Coding
 @section Recognizing Coding Systems
 
@@ -812,26 +831,6 @@ coding system @code{iso-2022-7bit}, and they won't be
 decoded correctly when you visit those files if you suppress the
 escape sequence detection.
 
-@vindex coding
- You can specify the coding system for a particular file using the
-@w{@samp{-*-@dots{}-*-}} construct at the beginning of a file, or a
-local variables list at the end (@pxref{File Variables}). You do this
-by defining a value for the ``variable'' named @code{coding}. Emacs
-does not really have a variable @code{coding}; instead of setting a
-variable, this uses the specified coding system for the file. For
-example, @samp{-*-mode: C; coding: latin-1;-*-} specifies use of the
-Latin-1 coding system, as well as C mode. When you specify the coding
-explicitly in the file, that overrides
-@code{file-coding-system-alist}.
-
- If you add the character @samp{!} at the end of the coding system
-name, it disables any character translation while decoding the file.
-For instance, it effectively cancels the effect of
-@code{unify-8859-on-decoding-mode}. This is useful when you need to
-make sure that the character codes in the Emacs buffer will not
-according to user settings; for instance, for the sake of strings in
-Emacs Lisp source files.
-
 @vindex auto-coding-alist
 @vindex auto-coding-regexp-alist
 @vindex auto-coding-functions
@@ -848,6 +847,24 @@ RMAIL files, whose names in general don't match any particular
 pattern, are decoded correctly. One of the builtin
 @code{auto-coding-functions} detects the encoding for XML files.
 
+@vindex rmail-decode-mime-charset
+ When you get new mail in Rmail, each message is translated
+automatically from the coding system it is written in, as if it were a
+separate file. This uses the priority list of coding systems that you
+have specified. If a MIME message specifies a character set, Rmail
+obeys that specification, unless @code{rmail-decode-mime-charset} is
+@code{nil}.
+
+@vindex rmail-file-coding-system
+ For reading and saving Rmail files themselves, Emacs uses the coding
+system specified by the variable @code{rmail-file-coding-system}. The
+default value is @code{nil}, which means that Rmail files are not
+translated (they are read and written in the Emacs internal character
+code).
+
+@node Specify Coding
+@section Specifying a File's Coding System
+
  If Emacs recognizes the encoding of a file incorrectly, you can
 reread the file using the correct coding system by typing @kbd{C-x
 @key{RET} r @var{coding-system} @key{RET}}. To see what coding system
@@ -855,33 +872,45 @@ Emacs actually used to decode the file, look at the coding system
 mnemonic letter near the left edge of the mode line (@pxref{Mode
 Line}), or type @kbd{C-h C @key{RET}}.
 
-@findex unify-8859-on-decoding-mode
- The command @code{unify-8859-on-decoding-mode} enables a mode that
-``unifies'' the Latin alphabets when decoding text. This works by
-converting all non-@acronym{ASCII} Latin-@var{n} characters to either
-Latin-1 or Unicode characters. This way it is easier to use various
-Latin-@var{n} alphabets together. In a future Emacs version we hope
-to move towards full Unicode support and complete unification of
-character sets.
+@vindex coding
+ You can specify the coding system for a particular file in the file
+itself, using the @w{@samp{-*-@dots{}-*-}} construct at the beginning,
+or a local variables list at the end (@pxref{File Variables}). You do
+this by defining a value for the ``variable'' named @code{coding}.
+Emacs does not really have a variable @code{coding}; instead of
+setting a variable, this uses the specified coding system for the
+file. For example, @samp{-*-mode: C; coding: latin-1;-*-} specifies
+use of the Latin-1 coding system, as well as C mode. When you specify
+the coding explicitly in the file, that overrides
+@code{file-coding-system-alist}.
+
+ If you add the character @samp{!} at the end of the coding system
+name in @code{coding}, it disables any character translation while
+decoding the file. For instance, it effectively cancels the effect of
+@code{unify-8859-on-decoding-mode}. This is useful when you need to
+make sure that the character codes in the Emacs buffer will not vary
+due to changes in user settings; for instance, for the sake of strings
+in Emacs Lisp source files.
+
+@node Output Coding
+@section Choosing Coding Systems for Output
 
 @vindex buffer-file-coding-system
  Once Emacs has chosen a coding system for a buffer, it stores that
-coding system in @code{buffer-file-coding-system} and uses that coding
-system, by default, for operations that write from this buffer into a
-file. This includes the commands @code{save-buffer} and
-@code{write-region}. If you want to write files from this buffer using
-a different coding system, you can specify a different coding system for
-the buffer using @code{set-buffer-file-coding-system} (@pxref{Text
-Coding}).
-
- You can insert any possible character into any Emacs buffer, but
-most coding systems can only handle some of the possible characters.
-This means that it is possible for you to insert characters that
-cannot be encoded with the coding system that will be used to save the
-buffer. For example, you could start with an @acronym{ASCII} file and insert a
-few Latin-1 characters into it, or you could edit a text file in
-Polish encoded in @code{iso-8859-2} and add some Russian words to it.
-When you save the buffer, Emacs cannot use the current value of
+coding system in @code{buffer-file-coding-system}. That makes it the
+default for operations that write from this buffer into a file, such
+as @code{save-buffer} and @code{write-region}. You can specify a
+different coding system for further file output from the buffer using
+@code{set-buffer-file-coding-system} (@pxref{Text Coding}).
+
+ You can insert any character Emacs supports into any Emacs buffer,
+but most coding systems can only handle a subset of these characters.
+Therefore, you can insert characters that cannot be encoded with the
+coding system that will be used to save the buffer. For example, you
+could start with an @acronym{ASCII} file and insert a few Latin-1
+characters into it, or you could edit a text file in Polish encoded in
+@code{iso-8859-2} and add some Russian words to it. When you save
+that buffer, Emacs cannot use the current value of
 @code{buffer-file-coding-system}, because the characters you added
 cannot be encoded by that coding system.
 
@@ -896,12 +925,12 @@ contents, and asks you to choose one of those coding systems.
  If you insert the unsuitable characters in a mail message, Emacs
 behaves a bit differently. It additionally checks whether the
 most-preferred coding system is recommended for use in MIME messages;
-if not, Emacs tells you that the most-preferred coding system is
-not recommended and prompts you for another coding system. This is so
-you won't inadvertently send a message encoded in a way that your
-recipient's mail software will have difficulty decoding. (If you do
-want to use the most-preferred coding system, you can still type its
-name in response to the question.)
+if not, Emacs tells you that the most-preferred coding system is not
+recommended and prompts you for another coding system. This is so you
+won't inadvertently send a message encoded in a way that your
+recipient's mail software will have difficulty decoding. (You can
+still use an unsuitable coding system if you type its name in response
+to the question.)
 
 @vindex sendmail-coding-system
  When you send a message with Mail mode (@pxref{Sending Mail}), Emacs has
@@ -914,21 +943,6 @@ new files, which is controlled by your choice of language environment,
 if that is non-@code{nil}. If all of these three values are @code{nil},
 Emacs encodes outgoing mail using the Latin-1 coding system.
 
-@vindex rmail-decode-mime-charset
- When you get new mail in Rmail, each message is translated
-automatically from the coding system it is written in, as if it were a
-separate file. This uses the priority list of coding systems that you
-have specified. If a MIME message specifies a character set, Rmail
-obeys that specification, unless @code{rmail-decode-mime-charset} is
-@code{nil}.
-
-@vindex rmail-file-coding-system
- For reading and saving Rmail files themselves, Emacs uses the coding
-system specified by the variable @code{rmail-file-coding-system}. The
-default value is @code{nil}, which means that Rmail files are not
-translated (they are read and written in the Emacs internal character
-code).
-
 @node Text Coding
 @section Specifying a Coding System for File Text
 