-rw-r--r-- doc/emacs/ChangeLog   16
-rw-r--r-- doc/emacs/basic.texi  42
-rw-r--r-- doc/emacs/emacs.texi   1
-rw-r--r-- doc/emacs/mule.texi  310
4 files changed, 162 insertions, 207 deletions
diff --git a/doc/emacs/ChangeLog b/doc/emacs/ChangeLog
index fc2c277972c..29be8c714d3 100644
--- a/doc/emacs/ChangeLog
+++ b/doc/emacs/ChangeLog
@@ -1,3 +1,19 @@
+2009-05-06 Chong Yidong <cyd@stupidchicken.com>
+
+	* basic.texi (Inserting Text): Document ucs-insert.
+
+	* mule.texi (International Chars): Define "multibyte". Note that
+	internal representation is unicode-based. Simplify definition of raw
+	bytes. Mention ucs-insert.
+	(Enabling Multibyte): Remove obsolete discussion. Copyedits.
+	(Language Environments): Add language environments new to Emacs 23.
+	(Multibyte Conversion): Node deleted.
+	(Coding Systems): Remove obsolete unify-8859-on-decoding-mode. Don't
+	mention obsolete emacs-mule coding system.
+	(Output Coding): Copyedits.
+
+	* emacs.texi (Top): Update node listing.
+
 2009-05-05 Per Starbäck <per@starback.se> (tiny change)
 
 	* trouble.texi (Lossage): Use new binding of view-emacs-problems.
diff --git a/doc/emacs/basic.texi b/doc/emacs/basic.texi
index 710a093f495..72ab17c33ac 100644
--- a/doc/emacs/basic.texi
+++ b/doc/emacs/basic.texi
@@ -64,9 +64,11 @@ key; other keys act as editing commands and do not insert themselves.
 For instance, @kbd{DEL} runs the command @code{delete-backward-char}
 by default (some modes bind it to a different command); it does not
 insert a literal @samp{DEL} character (@acronym{ASCII} character code
-127). To insert a non-graphic character, first @dfn{quote} it by
-typing @kbd{C-q} (@code{quoted-insert}). There are two ways to use
-@kbd{C-q}:
+127).
+
+  To insert a non-graphic character, or a character that your keyboard
+does not support, first @dfn{quote} it by typing @kbd{C-q}
+(@code{quoted-insert}). There are two ways to use @kbd{C-q}:
 
 @itemize @bullet
 @item
@@ -87,32 +89,24 @@ Overwrite mode, to give you a convenient way to insert a digit instead
 of overwriting with it.
 @end itemize
 
-@cindex 8-bit character codes
-@noindent
-If you specify a code in the octal range 0200 through 0377, @kbd{C-q}
-assumes that you intend to use some ISO 8859-@var{n} character set,
-and converts the specified code to the corresponding Emacs character
-code. Your choice of language environment determines which of the ISO
-8859 character sets to use (@pxref{Language Environments}). This
-feature is disabled if multibyte characters are disabled
-(@pxref{Enabling Multibyte}).
-
 @vindex read-quoted-char-radix
+@noindent
 To use decimal or hexadecimal instead of octal, set the variable
-@code{read-quoted-char-radix} to 10 or 16. If the radix is greater than
-10, some letters starting with @kbd{a} serve as part of a character
-code, just like digits.
+@code{read-quoted-char-radix} to 10 or 16. If the radix is greater
+than 10, some letters starting with @kbd{a} serve as part of a
+character code, just like digits.
 
-A numeric argument tells @kbd{C-q} how many copies of the quoted
+  A numeric argument tells @kbd{C-q} how many copies of the quoted
 character to insert (@pxref{Arguments}).
 
-@findex newline
-@findex self-insert
-  Customization information: @key{DEL} in most modes runs the command
-@code{delete-backward-char}; @key{RET} runs the command
-@code{newline}, and self-inserting printing characters run the command
-@code{self-insert}, which inserts whatever character you typed. Some
-major modes rebind @key{DEL} to other commands.
+@findex ucs-insert
+@cindex Unicode
+  Instead of @kbd{C-q}, you can use @kbd{C-x 8 @key{RET}}
+(@code{ucs-insert}) to insert a character based on its Unicode name or
+code-point. This commands prompts for a character to insert, using
+the minibuffer; you can specify the character using either (i) the
+character's name in the Unicode standard, or (ii) the character's
+code-point in the Unicode standard.
 
 @node Moving Point
 @section Changing the Location of Point
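The basic.texi change keeps the rule that, with read-quoted-char-radix set to 10 or 16, letters starting with a act as digits. The following Python sketch shows only that arithmetic. The helper quoted_char_code is hypothetical and is not Emacs code. The sketch does not model how C-q decides that the digit sequence has ended.

```python
def quoted_char_code(digits, radix=8):
    """Hypothetical helper: the character code a digit string denotes.

    Mirrors the documented read-quoted-char-radix rule: digits 0-9, plus
    letters starting with 'a' as digit values 10, 11, ... when the radix
    is greater than 10. Not Emacs's own implementation.
    """
    if radix not in (8, 10, 16):
        raise ValueError("radix must be 8, 10 or 16")
    code = 0
    for ch in digits.lower():
        if "0" <= ch <= "9":
            value = ord(ch) - ord("0")
        elif "a" <= ch <= "z":
            value = ord(ch) - ord("a") + 10  # a=10, b=11, ...
        else:
            raise ValueError(f"not a digit: {ch!r}")
        if value >= radix:
            raise ValueError(f"{ch!r} is not a digit in radix {radix}")
        code = code * radix + value
    return code


# Character code 233 (U+00E9 in Unicode) entered in each radix:
print(quoted_char_code("351", 8))   # 233
print(quoted_char_code("233", 10))  # 233
print(quoted_char_code("e9", 16))   # 233
```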
diff --git a/doc/emacs/emacs.texi b/doc/emacs/emacs.texi
index 4fb083ad22b..717e2b78c3e 100644
--- a/doc/emacs/emacs.texi
+++ b/doc/emacs/emacs.texi
@@ -507,7 +507,6 @@ International Character Set Support
 * Language Environments:: Setting things up for the language you use.
 * Input Methods:: Entering text characters not on your keyboard.
 * Select Input Method:: Specifying your choice of input methods.
-* Multibyte Conversion:: How single-byte characters convert to multibyte.
 * Coding Systems:: Character set conversion when you read and
   write files, and so on.
 * Recognize Coding:: How Emacs figures out which conversion to use.
diff --git a/doc/emacs/mule.texi b/doc/emacs/mule.texi
index a622722f1c6..aa25ed371de 100644
--- a/doc/emacs/mule.texi
+++ b/doc/emacs/mule.texi
@@ -89,7 +89,6 @@ to make sure Emacs interprets keyboard input correctly; see
 * Language Environments:: Setting things up for the language you use.
 * Input Methods:: Entering text characters not on your keyboard.
 * Select Input Method:: Specifying your choice of input methods.
-* Multibyte Conversion:: How single-byte characters convert to multibyte.
 * Coding Systems:: Character set conversion when you read and
   write files, and so on.
 * Recognize Coding:: How Emacs figures out which conversion to use.
@@ -115,14 +114,17 @@ to make sure Emacs interprets keyboard input correctly; see
 
   The users of international character sets and scripts have
 established many more-or-less standard coding systems for storing
-files. Emacs internally uses a single multibyte character encoding,
-so that it can intermix characters from all these scripts in a single
-buffer or string. This encoding represents each non-@acronym{ASCII}
-character as a sequence of bytes in the range 0200 through 0377.
-Emacs translates between the multibyte character encoding and various
-other coding systems when reading and writing files, when exchanging
-data with subprocesses, and (in some cases) in the @kbd{C-q} command
-(@pxref{Multibyte Conversion}).
+files. These coding systems are typically @dfn{multibyte}, meaning
+that sequences of two or more bytes are used to represent individual
+non-@acronym{ASCII} characters.
+
+@cindex Unicode
+  Internally, Emacs uses its own multibyte character encoding, which
+is a superset of the @dfn{Unicode} standard. This internal encoding
+allows characters from almost every known script to be intermixed in a
+single buffer or string. Emacs translates between the multibyte
+character encoding and various other coding systems when reading and
+writing files, and when exchanging data with subprocesses.
 
 @kindex C-h h
 @findex view-hello-file
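The new wording defines a multibyte coding system as one that uses two or more bytes for each non-ASCII character. Python's UTF-8 codec makes this easy to see. This sketch uses UTF-8 only as a familiar example of such a coding system. It is not Emacs's internal encoding, which the text describes as a superset of Unicode. The helper name encoded_lengths is hypothetical.

```python
def encoded_lengths(text, coding="utf-8"):
    """Hypothetical helper: map each character of text to the number of
    bytes it occupies in the given coding system (a Python codec name)."""
    return {ch: len(ch.encode(coding)) for ch in text}


sample = "a\u00e9\u4e2d\U0001F600"  # 'a', e-acute, a CJK ideograph, an emoji
for ch, n in encoded_lengths(sample).items():
    print(f"U+{ord(ch):04X}: {n} byte(s)")
# U+0061: 1 byte(s)
# U+00E9: 2 byte(s)
# U+4E2D: 3 byte(s)
# U+1F600: 4 byte(s)
```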
@@ -134,10 +136,14 @@ This illustrates various scripts. If some characters can't be
 displayed on your terminal, they appear as @samp{?} or as hollow boxes
 (@pxref{Undisplayable Characters}).
 
-  Keyboards, even in the countries where these character sets are used,
-generally don't have keys for all the characters in them. So Emacs
-supports various @dfn{input methods}, typically one for each script or
-language, to make it convenient to type them.
+  Keyboards, even in the countries where these character sets are
+used, generally don't have keys for all the characters in them. You
+can insert characters that your keyboard does not support, using
+@kbd{C-q} (@code{quoted-insert}) or @kbd{C-x 8 @key{RET}}
+(@code{ucs-insert}). @xref{Inserting Text}. Emacs also supports
+various @dfn{input methods}, typically one for each script or
+language, which make it easier to type characters in the script.
+@xref{Input Methods}.
 
 @kindex C-x RET
   The prefix key @kbd{C-x @key{RET}} is used for commands that pertain
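The rewritten paragraph points readers to C-x 8 RET, which accepts either a Unicode character name or a code point. The Python sketch below imitates that lookup with the standard unicodedata module. The helper name char_from_ucs_input is hypothetical. Reading the code point as hexadecimal is an assumption based on the usual U+XXXX convention, because this text does not say which radix ucs-insert expects.

```python
import unicodedata


def char_from_ucs_input(spec):
    """Hypothetical helper: resolve a Unicode name or a hex code point.

    Tries the input as a character name first. If no character has that
    name, it reads the input as a hexadecimal code point (an assumption).
    Raises ValueError if the input is neither.
    """
    try:
        return unicodedata.lookup(spec)
    except KeyError:
        return chr(int(spec, 16))


# Both spellings name the same character; print its Unicode name.
print(unicodedata.name(char_from_ucs_input("GREEK SMALL LETTER ALPHA")))
print(unicodedata.name(char_from_ucs_input("3B1")))
```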
@@ -165,12 +171,12 @@ system encodes the character safely and with a single byte
 (@pxref{Coding Systems}). If the character's encoding is longer than
 one byte, Emacs shows @samp{file ...}.
 
-  However, if the character displayed is in the range 0200 through
-0377 octal, it may actually stand for an invalid UTF-8 byte read from
-a file. In Emacs, that byte is represented as a sequence of 8-bit
-characters, but all of them together display as the original invalid
-byte, in octal code. In this case, @kbd{C-x =} shows @samp{part of
-display ...} instead of @samp{file}.
+  As a special case, if the character lies in the range 128 (0200
+octal) through 159 (0237 octal), it stands for a ``raw'' byte that
+does not correspond to any specific displayable character. Such a
+``character'' lies within the @code{eight-bit-control} character set,
+and is displayed as an escaped octal character code. In this case,
+@kbd{C-x =} shows @samp{part of display ...} instead of @samp{file}.
 
 @cindex character set of character at point
 @cindex font of character at point
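The new text says that codes 128 through 159 (0200 through 0237 octal) stand for raw bytes and are displayed as escaped octal codes. Here is a minimal Python sketch of that display rule. It assumes that every other code is shown as the character itself, which ignores Emacs's other display rules, such as those for control characters. The function display_code is hypothetical.

```python
def display_code(code):
    """Hypothetical helper: how a character code would be shown.

    Codes 128-159 (0o200-0o237), the eight-bit-control range, become a
    backslash followed by the octal code. All other codes are returned
    as the character itself (a simplifying assumption).
    """
    if 0o200 <= code <= 0o237:
        return "\\" + format(code, "o")
    return chr(code)


for code in (65, 128, 150, 159):
    print(code, display_code(code))
```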
@@ -235,74 +241,62 @@ There are text properties here:
 @node Enabling Multibyte
 @section Enabling Multibyte Characters
 
-  By default, Emacs starts in multibyte mode, because that allows you to
-use all the supported languages and scripts without limitations.
+  By default, Emacs starts in multibyte mode: it stores the contents
+of buffers and strings using an internal encoding that represents
+non-@acronym{ASCII} characters using multi-byte sequences. Multibyte
+mode allows you to use all the supported languages and scripts without
+limitations.
 
 @cindex turn multibyte support on or off
-  You can enable or disable multibyte character support, either for
-Emacs as a whole, or for a single buffer. When multibyte characters
-are disabled in a buffer, we call that @dfn{unibyte mode}. Then each
-byte in that buffer represents a character, even codes 0200 through
-0377.
-
-  The old features for supporting the European character sets, ISO
-Latin-1 and ISO Latin-2, work in unibyte mode as they did in Emacs 19
-and also work for the other ISO 8859 character sets. However, there
-is no need to turn off multibyte character support to use ISO Latin;
-the Emacs multibyte character set includes all the characters in these
-character sets, and Emacs can translate automatically to and from the
-ISO codes.
+  Under very special circumstances, you may want to disable multibyte
+character support, either for Emacs as a whole, or for a single
+buffer. When multibyte characters are disabled in a buffer, we call
+that @dfn{unibyte mode}. In unibyte mode, each character in the
+buffer has a character code ranging from 0 through 255 (0377 octal); 0
+through 127 (0177 octal) represent @acronym{ASCII} characters, and 128
+(0200 octal) through 255 (0377 octal) represent non-@acronym{ASCII}
+characters.
 
   To edit a particular file in unibyte representation, visit it using
-@code{find-file-literally}. @xref{Visiting}. To convert a buffer in
-multibyte representation into a single-byte representation of the same
-characters, the easiest way is to save the contents in a file, kill the
-buffer, and find the file again with @code{find-file-literally}. You
-can also use @kbd{C-x @key{RET} c}
-(@code{universal-coding-system-argument}) and specify @samp{raw-text} as
-the coding system with which to find or save a file. @xref{Text
-Coding}. Finding a file as @samp{raw-text} doesn't disable format
-conversion, uncompression and auto mode selection as
-@code{find-file-literally} does.
+@code{find-file-literally}. @xref{Visiting}. You can convert a
+multibyte buffer to unibyte by saving it to a file, killing the
+buffer, and visiting the file again with @code{find-file-literally}.
+Alternatively, you can use @kbd{C-x @key{RET} c}
+(@code{universal-coding-system-argument}) and specify @samp{raw-text}
+as the coding system with which to visit or save a file. @xref{Text
+Coding}. Unlike @code{find-file-literally}, finding a file as
+@samp{raw-text} doesn't disable format conversion, uncompression, or
+auto mode selection.
 
 @vindex enable-multibyte-characters
 @vindex default-enable-multibyte-characters
+@cindex environment variables, and non-@acronym{ASCII} characters
   To turn off multibyte character support by default, start Emacs with
 the @samp{--unibyte} option (@pxref{Initial Options}), or set the
 environment variable @env{EMACS_UNIBYTE}. You can also customize
 @code{enable-multibyte-characters} or, equivalently, directly set the
 variable @code{default-enable-multibyte-characters} to @code{nil} in
 your init file to have basically the same effect as @samp{--unibyte}.
-
-@findex toggle-enable-multibyte-characters
-  To convert a unibyte session to a multibyte session, set
-@code{default-enable-multibyte-characters} to @code{t}. Buffers which
-were created in the unibyte session before you turn on multibyte support
-will stay unibyte. You can turn on multibyte support in a specific
-buffer by invoking the command @code{toggle-enable-multibyte-characters}
-in that buffer.
+With @samp{--unibyte}, multibyte strings are not created during
+initialization from the values of environment variables,
+@file{/etc/passwd} entries etc., even if those contain
+non-@acronym{ASCII} characters.
 
 @cindex Lisp files, and multibyte operation
 @cindex multibyte operation, and Lisp files
 @cindex unibyte operation, and Lisp files
 @cindex init file, and non-@acronym{ASCII} characters
-@cindex environment variables, and non-@acronym{ASCII} characters
-  With @samp{--unibyte}, multibyte strings are not created during
-initialization from the values of environment variables,
-@file{/etc/passwd} entries etc.@: that contain non-@acronym{ASCII} 8-bit
-characters.
-
   Emacs normally loads Lisp files as multibyte, regardless of whether
-you used @samp{--unibyte}. This includes the Emacs initialization file,
-@file{.emacs}, and the initialization files of Emacs packages such as
-Gnus. However, you can specify unibyte loading for a particular Lisp
-file, by putting @w{@samp{-*-unibyte: t;-*-}} in a comment on the first
-line (@pxref{File Variables}). Then that file is always loaded as
-unibyte text, even if you did not start Emacs with @samp{--unibyte}.
-The motivation for these conventions is that it is more reliable to
-always load any particular Lisp file in the same way. However, you can
-load a Lisp file as unibyte, on any one occasion, by typing @kbd{C-x
-@key{RET} c raw-text @key{RET}} immediately before loading it.
+you used @samp{--unibyte}. This includes the Emacs initialization
+file, @file{.emacs}, and the initialization files of Emacs packages
+such as Gnus. However, you can specify unibyte loading for a
+particular Lisp file, by putting @w{@samp{-*-unibyte: t;-*-}} in a
+comment on the first line (@pxref{File Variables}). Then that file is
+always loaded as unibyte text. The motivation for these conventions
+is that it is more reliable to always load any particular Lisp file in
+the same way. However, you can load a Lisp file as unibyte, on any
+one occasion, by typing @kbd{C-x @key{RET} c raw-text @key{RET}}
+immediately before loading it.
 
   The mode line indicates whether multibyte character support is
 enabled in the current buffer. If it is, there are two or more
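The rewritten Enabling Multibyte text defines unibyte mode as one character per byte. Codes 0 through 127 are ASCII and codes 128 through 255 are non-ASCII. The text also suggests find-file-literally for a unibyte view of a file. As a rough Python analogy, a bytes object stands in for a unibyte buffer and a str for a multibyte one. This analogy is an assumption made for illustration. The sketch shows why a file stored as UTF-8 looks longer when visited literally. The helper unibyte_classes is hypothetical.

```python
def unibyte_classes(data):
    """Hypothetical helper: classify each byte as a unibyte character code.

    Codes 0-127 (0o0-0o177) are ASCII; 128-255 (0o200-0o377) are
    non-ASCII, as described for unibyte mode.
    """
    return [(b, "ASCII" if b <= 0o177 else "non-ASCII") for b in data]


text = "caf\u00e9"           # 4 characters in a multibyte view
raw = text.encode("utf-8")   # the bytes of the file if it is stored as UTF-8
print(len(text), len(raw))   # 4 5
for code, kind in unibyte_classes(raw):
    print(code, oct(code), kind)
```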
@@ -312,6 +306,14 @@ convention (colon, backslash, etc.). When multibyte characters
 are not enabled, nothing precedes the colon except a single dash.
 @xref{Mode Line}, for more details about this.
 
+@findex toggle-enable-multibyte-characters
+  To convert a unibyte session to a multibyte session, set
+@code{default-enable-multibyte-characters} to @code{t}. Buffers which
+were created in the unibyte session before you turn on multibyte
+support will stay unibyte. You can turn on multibyte support in a
+specific buffer by invoking the command
+@code{toggle-enable-multibyte-characters} in that buffer.
+
 @node Language Environments
 @section Language Environments
 @cindex language environments
@@ -319,43 +321,41 @@ are not enabled, nothing precedes the colon except a single dash.
   All supported character sets are supported in Emacs buffers whenever
 multibyte characters are enabled; there is no need to select a
 particular language in order to display its characters in an Emacs
-buffer. However, it is important to select a @dfn{language environment}
-in order to set various defaults. The language environment really
-represents a choice of preferred script (more or less) rather than a
-choice of language.
+buffer. However, it is important to select a @dfn{language
+environment} in order to set various defaults. Roughly speaking, the
+language environment represents a choice of preferred script rather
+than a choice of language.
 
   The language environment controls which coding systems to recognize
 when reading text (@pxref{Recognize Coding}). This applies to files,
-incoming mail, netnews, and any other text you read into Emacs. It may
-also specify the default coding system to use when you create a file.
-Each language environment also specifies a default input method.
+incoming mail, and any other text you read into Emacs. It may also
+specify the default coding system to use when you create a file. Each
+language environment also specifies a default input method.
 
 @findex set-language-environment
 @vindex current-language-environment
-  To select a language environment, you can customize the variable
+  To select a language environment, customize the variable
 @code{current-language-environment} or use the command @kbd{M-x
 set-language-environment}. It makes no difference which buffer is
-current when you use this command, because the effects apply globally to
-the Emacs session. The supported language environments include:
+current when you use this command, because the effects apply globally
+to the Emacs session. The supported language environments include:
 
 @cindex Euro sign
 @cindex UTF-8
 @quotation
-ASCII, Belarusian, Brazilian Portuguese, Bulgarian, Chinese-BIG5,
-Chinese-CNS, Chinese-EUC-TW, Chinese-GB, Croatian, Cyrillic-ALT,
-Cyrillic-ISO, Cyrillic-KOI8, Czech, Devanagari, Dutch, English,
-Esperanto, Ethiopic, French, Georgian, German, Greek, Hebrew, IPA,
-Italian, Japanese, Kannada, Korean, Lao, Latin-1, Latin-2, Latin-3,
-Latin-4, Latin-5, Latin-6, Latin-7, Latin-8 (Celtic), Latin-9 (updated
-Latin-1 with the Euro sign), Latvian, Lithuanian, Malayalam, Polish,
-Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Tajik, Tamil,
-Thai, Tibetan, Turkish, UTF-8 (for a setup which prefers Unicode
-characters and files encoded in UTF-8), Ukrainian, Vietnamese, Welsh,
-and Windows-1255 (for a setup which prefers Cyrillic characters and
-files encoded in Windows-1255).
-@tex
-\hbadness=10000\par % just avoid underfull hbox warning
-@end tex
+ASCII, Belarusian, Bengali, Brazilian Portuguese, Bulgarian,
+Chinese-BIG5, Chinese-CNS, Chinese-EUC-TW, Chinese-GB, Chinese-GBK,
+Chinese-GB18030, Croatian, Cyrillic-ALT, Cyrillic-ISO, Cyrillic-KOI8,
+Czech, Devanagari, Dutch, English, Esperanto, Ethiopic, French,
+Georgian, German, Greek, Gujarati, Hebrew, IPA, Italian, Japanese,
+Kannada, Khmer, Korean, Lao, Latin-1, Latin-2, Latin-3, Latin-4,
+Latin-5, Latin-6, Latin-7, Latin-8 (Celtic), Latin-9 (updated Latin-1
+with the Euro sign), Latvian, Lithuanian, Malayalam, Oriya, Polish,
+Punjabi, Romanian, Russian, Sinhala, Slovak, Slovenian, Spanish,
+Swedish, TaiViet, Tajik, Tamil, Telugu, Thai, Tibetan, Turkish, UTF-8
+(for a setup which prefers Unicode characters and files encoded in
+UTF-8), Ukrainian, Vietnamese, Welsh, and Windows-1255 (for a setup
+which prefers Cyrillic characters and files encoded in Windows-1255).
 @end quotation
 
 @cindex fonts for various scripts
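The Language Environments text says that the environment controls which coding systems Emacs recognizes when reading text. The Rmail paragraph in this same file also refers to a priority list of coding systems. The following Python sketch is a simplified version of decoding with such a list: it tries each coding system in order and keeps the first one that decodes the bytes without error. The function name and the example list are hypothetical, and Python codec names stand in for Emacs coding-system names. Real Emacs detection also examines byte patterns rather than only trying decoders one after another.

```python
def decode_with_priorities(data, priorities):
    """Hypothetical helper: decode data with the first coding system in
    priorities that accepts it, returning (text, coding_name)."""
    for coding in priorities:
        try:
            return data.decode(coding), coding
        except UnicodeDecodeError:
            continue  # not valid in this coding system; try the next one
    raise ValueError("no coding system in the priority list fits")


# A hypothetical preference order for someone who mostly edits Polish.
polish_first = ["utf-8", "iso-8859-2"]

word = "za\u017c\u00f3\u0142\u0107"
for data in (word.encode("utf-8"), word.encode("iso-8859-2")):
    text, coding = decode_with_priorities(data, polish_first)
    print(coding)  # utf-8, then iso-8859-2
```

The order of the list matters. Python's ISO 8859-2 codec assigns a character to every byte value, so placing it first would accept any input, including UTF-8 files, and decode them into the wrong characters.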
@@ -657,34 +657,6 @@ character.
 list-input-methods}. The list gives information about each input
 method, including the string that stands for it in the mode line.
 
-@node Multibyte Conversion
-@section Unibyte and Multibyte Non-@acronym{ASCII} characters
-
-  When multibyte characters are enabled, character codes 0240 (octal)
-through 0377 (octal) are not really legitimate in the buffer. The valid
-non-@acronym{ASCII} printing characters have codes that start from 0400.
-
-  If you type a self-inserting character in the range 0240 through
-0377, or if you use @kbd{C-q} to insert one, Emacs assumes you
-intended to use one of the ISO Latin-@var{n} character sets, and
-converts it to the Emacs code representing that Latin-@var{n}
-character. You select @emph{which} ISO Latin character set to use
-through your choice of language environment
-@iftex
-(see above).
-@end iftex
-@ifnottex
-(@pxref{Language Environments}).
-@end ifnottex
-If you do not specify a choice, the default is Latin-1.
-
-  If you insert a character in the range 0200 through 0237, which
-forms the @code{eight-bit-control} character set, it is inserted
-literally. You should normally avoid doing this since buffers
-containing such characters have to be written out in either the
-@code{emacs-mule} or @code{raw-text} coding system, which is usually
-not what you want.
-
 @node Coding Systems
 @section Coding Systems
 @cindex coding systems
@@ -698,11 +670,11 @@ possible in reading or writing files, in sending or receiving from the
 terminal, and in exchanging data with subprocesses.
 
   Emacs assigns a name to each coding system. Most coding systems are
-used for one language, and the name of the coding system starts with the
-language name. Some coding systems are used for several languages;
-their names usually start with @samp{iso}. There are also special
-coding systems @code{no-conversion}, @code{raw-text} and
-@code{emacs-mule} which do not convert printing characters at all.
+used for one language, and the name of the coding system starts with
+the language name. Some coding systems are used for several
+languages; their names usually start with @samp{iso}. There are also
+special coding systems, such as @code{no-conversion}, @code{raw-text},
+and @code{emacs-internal}.
 
 @cindex international files from DOS/Windows systems
   A special class of coding systems, collectively known as
@@ -814,37 +786,21 @@ the @kbd{M-x find-file-literally} command. This uses
 @code{no-conversion}, and also suppresses other Emacs features that
 might convert the file contents before you see them. @xref{Visiting}.
 
-  The coding system @code{emacs-mule} means that the file contains
-non-@acronym{ASCII} characters stored with the internal Emacs encoding. It
-handles end-of-line conversion based on the data encountered, and has
-the usual three variants to specify the kind of end-of-line conversion.
-
-@findex unify-8859-on-decoding-mode
-@anchor{Character Translation}
-  The @dfn{character translation} feature can modify the effect of
-various coding systems, by changing the internal Emacs codes that
-decoding produces. For instance, the command
-@code{unify-8859-on-decoding-mode} enables a mode that ``unifies'' the
-Latin alphabets when decoding text. This works by converting all
-non-@acronym{ASCII} Latin-@var{n} characters to either Latin-1 or
-Unicode characters. This way it is easier to use various
-Latin-@var{n} alphabets together. (In a future Emacs version we hope
-to move towards full Unicode support and complete unification of
-character sets.)
-
-@vindex enable-character-translation
-  If you set the variable @code{enable-character-translation} to
-@code{nil}, that disables all character translation (including
-@code{unify-8859-on-decoding-mode}).
+  The coding system @code{emacs-internal} (or @code{utf-8-emacs},
+which is equivalent) means that the file contains non-@acronym{ASCII}
+characters stored with the internal Emacs encoding. This coding
+system handles end-of-line conversion based on the data encountered,
+and has the usual three variants to specify the kind of end-of-line
+conversion.
 
 @node Recognize Coding
 @section Recognizing Coding Systems
 
-  Emacs tries to recognize which coding system to use for a given text
-as an integral part of reading that text. (This applies to files
-being read, output from subprocesses, text from X selections, etc.)
-Emacs can select the right coding system automatically most of the
-time---once you have specified your preferences.
+  Whenever Emacs reads a given piece of text, it tries to recognize
+which coding system to use. This applies to files being read, output
+from subprocesses, text from X selections, etc. Emacs can select the
+right coding system automatically most of the time---once you have
+specified your preferences.
 
   Some coding systems can be recognized or distinguished by which byte
 sequences appear in the data. However, there are coding systems that
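The emacs-internal paragraph says that the coding system handles end-of-line conversion based on the data and has the usual three end-of-line variants. In Emacs these are the -unix (LF), -dos (CRLF) and -mac (CR) forms, as in utf-8-unix. Below is a minimal Python sketch of the idea. It makes the simplifying assumption that the first line break decides the variant, whereas Emacs's real detector does more. Both function names are hypothetical.

```python
def detect_eol_variant(data):
    """Hypothetical helper: guess the end-of-line variant suffix.

    Looks only at the first line break: CRLF -> '-dos', lone CR ->
    '-mac', LF -> '-unix'. Returns '' when data has no line break,
    leaving the variant undecided.
    """
    for i, byte in enumerate(data):
        if byte == 0x0D:  # carriage return
            return "-dos" if data[i + 1:i + 2] == b"\n" else "-mac"
        if byte == 0x0A:  # line feed
            return "-unix"
    return ""


def to_buffer_newlines(text):
    """Convert CRLF and lone CR line ends to LF, the newline a buffer uses."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


print("utf-8" + detect_eol_variant(b"one\r\ntwo\r\n"))               # utf-8-dos
print(to_buffer_newlines("one\r\ntwo\rthree") == "one\ntwo\nthree")  # True
```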
@@ -948,19 +904,17 @@ pattern, are decoded correctly. One of the builtin
 @code{auto-coding-functions} detects the encoding for XML files.
 
 @vindex rmail-decode-mime-charset
+@vindex rmail-file-coding-system
   When you get new mail in Rmail, each message is translated
 automatically from the coding system it is written in, as if it were a
 separate file. This uses the priority list of coding systems that you
 have specified. If a MIME message specifies a character set, Rmail
 obeys that specification, unless @code{rmail-decode-mime-charset} is
-@code{nil}.
-
-@vindex rmail-file-coding-system
-  For reading and saving Rmail files themselves, Emacs uses the coding
-system specified by the variable @code{rmail-file-coding-system}. The
-default value is @code{nil}, which means that Rmail files are not
-translated (they are read and written in the Emacs internal character
-code).
+@code{nil}. For reading and saving Rmail files themselves, Emacs uses
+the coding system specified by the variable
+@code{rmail-file-coding-system}. The default value is @code{nil},
+which means that Rmail files are not translated (they are read and
+written in the Emacs internal character code).
 
 @node Specify Coding
 @section Specifying a File's Coding System
@@ -984,13 +938,6 @@ use of the Latin-1 coding system, as well as C mode. When you specify
 the coding explicitly in the file, that overrides
 @code{file-coding-system-alist}.
 
- If you add the character @samp{!} at the end of the coding system
-name in @code{coding}, it disables any character translation
-(@pxref{Character Translation}) while decoding the file. This is
-useful when you need to make sure that the character codes in the
-Emacs buffer will not vary due to changes in user settings; for
-instance, for the sake of strings in Emacs Lisp source files.
-
 @node Output Coding
 @section Choosing Coding Systems for Output
 
@@ -1004,22 +951,21 @@ different coding system for further file output from the buffer using
 
  You can insert any character Emacs supports into any Emacs buffer,
 but most coding systems can only handle a subset of these characters.
-Therefore, you can insert characters that cannot be encoded with the
-coding system that will be used to save the buffer. For example, you
-could start with an @acronym{ASCII} file and insert a few Latin-1
-characters into it, or you could edit a text file in Polish encoded in
-@code{iso-8859-2} and add some Russian words to it. When you save
+Therefore, it's possible that the characters you insert cannot be
+encoded with the coding system that will be used to save the buffer.
+For example, you could visit a text file in Polish, encoded in
+@code{iso-8859-2}, and add some Russian words to it. When you save
 that buffer, Emacs cannot use the current value of
 @code{buffer-file-coding-system}, because the characters you added
 cannot be encoded by that coding system.
 
  When that happens, Emacs tries the most-preferred coding system (set
 by @kbd{M-x prefer-coding-system} or @kbd{M-x
-set-language-environment}), and if that coding system can safely
-encode all of the characters in the buffer, Emacs uses it, and stores
-its value in @code{buffer-file-coding-system}. Otherwise, Emacs
-displays a list of coding systems suitable for encoding the buffer's
-contents, and asks you to choose one of those coding systems.
+set-language-environment}). If that coding system can safely encode
+all of the characters in the buffer, Emacs uses it, and stores its
+value in @code{buffer-file-coding-system}. Otherwise, Emacs displays
+a list of coding systems suitable for encoding the buffer's contents,
+and asks you to choose one of those coding systems.
 
  If you insert the unsuitable characters in a mail message, Emacs
 behaves a bit differently. It additionally checks whether the
@@ -1248,9 +1194,9 @@ interactively.
 
  If @code{file-name-coding-system} is @code{nil}, Emacs uses a
 default coding system determined by the selected language environment.
-In the default language environment, any non-@acronym{ASCII}
-characters in file names are not encoded specially; they appear in the
-file system using the internal Emacs representation.
+In the default language environment, non-@acronym{ASCII} characters in
+file names are not encoded specially; they appear in the file system
+using the internal Emacs representation.
 
  @strong{Warning:} if you change @code{file-name-coding-system} (or the
 language environment) in the middle of an Emacs session, problems can
@@ -1317,7 +1263,7 @@ You can do this by putting
 @end lisp
 
 @noindent
-in your @file{~/.emacs} file.
+in your init file.
 
  There is a similarity between using a coding system translation for
 keyboard input, and using an input method: both define sequences of