| author | Chong Yidong | 2009-05-06 03:55:12 +0000 |
|---|---|---|
| committer | Chong Yidong | 2009-05-06 03:55:12 +0000 |
| commit | ad36c4224c80d2ca0caf26a8fe9a96cdce43d64b (patch) | |
| tree | 79bf0100a27e4766a619b80b4e9abae4d691329e | |
| parent | 5996e1b74c382855c5af0ccf737aeef3ad5f4626 (diff) | |
* basic.texi (Inserting Text): Document ucs-insert.
* mule.texi (International Chars): Define "multibyte". Note that
internal representation is unicode-based. Simplify definition of raw
bytes. Mention ucs-insert.
(Enabling Multibyte): Remove obsolete discussion. Copyedits.
(Language Environments): Add language environments new to Emacs 23.
(Multibyte Conversion): Node deleted.
(Coding Systems): Remove obsolete unify-8859-on-decoding-mode. Don't
mention obsolete emacs-mule coding system.
(Output Coding): Copyedits.
* emacs.texi (Top): Update node listing.
| -rw-r--r-- | doc/emacs/ChangeLog | 16 | ||||
| -rw-r--r-- | doc/emacs/basic.texi | 42 | ||||
| -rw-r--r-- | doc/emacs/emacs.texi | 1 | ||||
| -rw-r--r-- | doc/emacs/mule.texi | 310 |
4 files changed, 162 insertions, 207 deletions
diff --git a/doc/emacs/ChangeLog b/doc/emacs/ChangeLog
index fc2c277972c..29be8c714d3 100644
--- a/doc/emacs/ChangeLog
+++ b/doc/emacs/ChangeLog
| @@ -1,3 +1,19 @@ | |||
| 1 | 2009-05-06 Chong Yidong <cyd@stupidchicken.com> | ||
| 2 | |||
| 3 | * basic.texi (Inserting Text): Document ucs-insert. | ||
| 4 | |||
| 5 | * mule.texi (International Chars): Define "multibyte". Note that | ||
| 6 | internal representation is unicode-based. Simplify definition of raw | ||
| 7 | bytes. Mention ucs-insert. | ||
| 8 | (Enabling Multibyte): Remove obsolete discussion. Copyedits. | ||
| 9 | (Language Environments): Add language environments new to Emacs 23. | ||
| 10 | (Multibyte Conversion): Node deleted. | ||
| 11 | (Coding Systems): Remove obsolete unify-8859-on-decoding-mode. Don't | ||
| 12 | mention obsolete emacs-mule coding system. | ||
| 13 | (Output Coding): Copyedits. | ||
| 14 | |||
| 15 | * emacs.texi (Top): Update node listing. | ||
| 16 | |||
| 1 | 2009-05-05 Per Starbäck <per@starback.se> (tiny change) | 17 | 2009-05-05 Per Starbäck <per@starback.se> (tiny change) |
| 2 | 18 | ||
| 3 | * trouble.texi (Lossage): Use new binding of view-emacs-problems. | 19 | * trouble.texi (Lossage): Use new binding of view-emacs-problems. |
diff --git a/doc/emacs/basic.texi b/doc/emacs/basic.texi
index 710a093f495..72ab17c33ac 100644
--- a/doc/emacs/basic.texi
+++ b/doc/emacs/basic.texi
| @@ -64,9 +64,11 @@ key; other keys act as editing commands and do not insert themselves. | |||
| 64 | For instance, @kbd{DEL} runs the command @code{delete-backward-char} | 64 | For instance, @kbd{DEL} runs the command @code{delete-backward-char} |
| 65 | by default (some modes bind it to a different command); it does not | 65 | by default (some modes bind it to a different command); it does not |
| 66 | insert a literal @samp{DEL} character (@acronym{ASCII} character code | 66 | insert a literal @samp{DEL} character (@acronym{ASCII} character code |
| 67 | 127). To insert a non-graphic character, first @dfn{quote} it by | 67 | 127). |
| 68 | typing @kbd{C-q} (@code{quoted-insert}). There are two ways to use | 68 | |
| 69 | @kbd{C-q}: | 69 | To insert a non-graphic character, or a character that your keyboard |
| 70 | does not support, first @dfn{quote} it by typing @kbd{C-q} | ||
| 71 | (@code{quoted-insert}). There are two ways to use @kbd{C-q}: | ||
| 70 | 72 | ||
| 71 | @itemize @bullet | 73 | @itemize @bullet |
| 72 | @item | 74 | @item |
| @@ -87,32 +89,24 @@ Overwrite mode, to give you a convenient way to insert a digit instead | |||
| 87 | of overwriting with it. | 89 | of overwriting with it. |
| 88 | @end itemize | 90 | @end itemize |
| 89 | 91 | ||
| 90 | @cindex 8-bit character codes | ||
| 91 | @noindent | ||
| 92 | If you specify a code in the octal range 0200 through 0377, @kbd{C-q} | ||
| 93 | assumes that you intend to use some ISO 8859-@var{n} character set, | ||
| 94 | and converts the specified code to the corresponding Emacs character | ||
| 95 | code. Your choice of language environment determines which of the ISO | ||
| 96 | 8859 character sets to use (@pxref{Language Environments}). This | ||
| 97 | feature is disabled if multibyte characters are disabled | ||
| 98 | (@pxref{Enabling Multibyte}). | ||
| 99 | |||
| 100 | @vindex read-quoted-char-radix | 92 | @vindex read-quoted-char-radix |
| 93 | @noindent | ||
| 101 | To use decimal or hexadecimal instead of octal, set the variable | 94 | To use decimal or hexadecimal instead of octal, set the variable |
| 102 | @code{read-quoted-char-radix} to 10 or 16. If the radix is greater than | 95 | @code{read-quoted-char-radix} to 10 or 16. If the radix is greater |
| 103 | 10, some letters starting with @kbd{a} serve as part of a character | 96 | than 10, some letters starting with @kbd{a} serve as part of a |
| 104 | code, just like digits. | 97 | character code, just like digits. |
| 105 | 98 | ||
| 106 | A numeric argument tells @kbd{C-q} how many copies of the quoted | 99 | A numeric argument tells @kbd{C-q} how many copies of the quoted |
| 107 | character to insert (@pxref{Arguments}). | 100 | character to insert (@pxref{Arguments}). |
| 108 | 101 | ||
| 109 | @findex newline | 102 | @findex ucs-insert |
| 110 | @findex self-insert | 103 | @cindex Unicode |
| 111 | Customization information: @key{DEL} in most modes runs the command | 104 | Instead of @kbd{C-q}, you can use @kbd{C-x 8 @key{RET}} |
| 112 | @code{delete-backward-char}; @key{RET} runs the command | 105 | (@code{ucs-insert}) to insert a character based on its Unicode name or |
| 113 | @code{newline}, and self-inserting printing characters run the command | 106 | code-point. This commands prompts for a character to insert, using |
| 114 | @code{self-insert}, which inserts whatever character you typed. Some | 107 | the minibuffer; you can specify the character using either (i) the |
| 115 | major modes rebind @key{DEL} to other commands. | 108 | character's name in the Unicode standard, or (ii) the character's |
| 109 | code-point in the Unicode standard. | ||
| 116 | 110 | ||
| 117 | @node Moving Point | 111 | @node Moving Point |
| 118 | @section Changing the Location of Point | 112 | @section Changing the Location of Point |
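
The new basic.texi text describes two ways of naming a character numerically or symbolically: `C-q` followed by digits in the radix given by `read-quoted-char-radix`, and `C-x 8 RET` (`ucs-insert`) with a Unicode name or code point. The following is a minimal Python sketch of that lookup logic, written only to illustrate the idea outside Emacs; it is not how Emacs implements either command. The function names `quoted_char` and `resolve_character` are hypothetical, and treating a bare code point as hexadecimal is an assumption that this diff does not state.

```python
import unicodedata


def quoted_char(digits: str, radix: int = 8) -> str:
    """Interpret digits as a character code in the given radix.

    The default of 8 mirrors the octal default described for C-q;
    10 and 16 correspond to the other radix values the text mentions.
    """
    return chr(int(digits, radix))


def resolve_character(spec: str) -> str:
    """Return the character identified by a Unicode name or a code point."""
    try:
        # First try the input as an official Unicode character name.
        return unicodedata.lookup(spec)
    except KeyError:
        # Assumption: anything that is not a known name is a hex code point.
        return chr(int(spec, 16))


if __name__ == "__main__":
    print(quoted_char("101"))                             # octal 101
    print(quoted_char("41", radix=16))                    # hex 41
    print(resolve_character("GREEK SMALL LETTER ALPHA"))  # by name
    print(resolve_character("3b1"))                       # by code point
```

With a radix above 10, letters starting from `a` act as digits, which is why `int(digits, 16)` accepts input such as `3b1`.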
diff --git a/doc/emacs/emacs.texi b/doc/emacs/emacs.texi
index 4fb083ad22b..717e2b78c3e 100644
--- a/doc/emacs/emacs.texi
+++ b/doc/emacs/emacs.texi
| @@ -507,7 +507,6 @@ International Character Set Support | |||
| 507 | * Language Environments:: Setting things up for the language you use. | 507 | * Language Environments:: Setting things up for the language you use. |
| 508 | * Input Methods:: Entering text characters not on your keyboard. | 508 | * Input Methods:: Entering text characters not on your keyboard. |
| 509 | * Select Input Method:: Specifying your choice of input methods. | 509 | * Select Input Method:: Specifying your choice of input methods. |
| 510 | * Multibyte Conversion:: How single-byte characters convert to multibyte. | ||
| 511 | * Coding Systems:: Character set conversion when you read and | 510 | * Coding Systems:: Character set conversion when you read and |
| 512 | write files, and so on. | 511 | write files, and so on. |
| 513 | * Recognize Coding:: How Emacs figures out which conversion to use. | 512 | * Recognize Coding:: How Emacs figures out which conversion to use. |
diff --git a/doc/emacs/mule.texi b/doc/emacs/mule.texi
index a622722f1c6..aa25ed371de 100644
--- a/doc/emacs/mule.texi
+++ b/doc/emacs/mule.texi
| @@ -89,7 +89,6 @@ to make sure Emacs interprets keyboard input correctly; see | |||
| 89 | * Language Environments:: Setting things up for the language you use. | 89 | * Language Environments:: Setting things up for the language you use. |
| 90 | * Input Methods:: Entering text characters not on your keyboard. | 90 | * Input Methods:: Entering text characters not on your keyboard. |
| 91 | * Select Input Method:: Specifying your choice of input methods. | 91 | * Select Input Method:: Specifying your choice of input methods. |
| 92 | * Multibyte Conversion:: How single-byte characters convert to multibyte. | ||
| 93 | * Coding Systems:: Character set conversion when you read and | 92 | * Coding Systems:: Character set conversion when you read and |
| 94 | write files, and so on. | 93 | write files, and so on. |
| 95 | * Recognize Coding:: How Emacs figures out which conversion to use. | 94 | * Recognize Coding:: How Emacs figures out which conversion to use. |
| @@ -115,14 +114,17 @@ to make sure Emacs interprets keyboard input correctly; see | |||
| 115 | 114 | ||
| 116 | The users of international character sets and scripts have | 115 | The users of international character sets and scripts have |
| 117 | established many more-or-less standard coding systems for storing | 116 | established many more-or-less standard coding systems for storing |
| 118 | files. Emacs internally uses a single multibyte character encoding, | 117 | files. These coding systems are typically @dfn{multibyte}, meaning |
| 119 | so that it can intermix characters from all these scripts in a single | 118 | that sequences of two or more bytes are used to represent individual |
| 120 | buffer or string. This encoding represents each non-@acronym{ASCII} | 119 | non-@acronym{ASCII} characters. |
| 121 | character as a sequence of bytes in the range 0200 through 0377. | 120 | |
| 122 | Emacs translates between the multibyte character encoding and various | 121 | @cindex Unicode |
| 123 | other coding systems when reading and writing files, when exchanging | 122 | Internally, Emacs uses its own multibyte character encoding, which |
| 124 | data with subprocesses, and (in some cases) in the @kbd{C-q} command | 123 | is a superset of the @dfn{Unicode} standard. This internal encoding |
| 125 | (@pxref{Multibyte Conversion}). | 124 | allows characters from almost every known script to be intermixed in a |
| 125 | single buffer or string. Emacs translates between the multibyte | ||
| 126 | character encoding and various other coding systems when reading and | ||
| 127 | writing files, and when exchanging data with subprocesses. | ||
| 126 | 128 | ||
| 127 | @kindex C-h h | 129 | @kindex C-h h |
| 128 | @findex view-hello-file | 130 | @findex view-hello-file |
| @@ -134,10 +136,14 @@ This illustrates various scripts. If some characters can't be | |||
| 134 | displayed on your terminal, they appear as @samp{?} or as hollow boxes | 136 | displayed on your terminal, they appear as @samp{?} or as hollow boxes |
| 135 | (@pxref{Undisplayable Characters}). | 137 | (@pxref{Undisplayable Characters}). |
| 136 | 138 | ||
| 137 | Keyboards, even in the countries where these character sets are used, | 139 | Keyboards, even in the countries where these character sets are |
| 138 | generally don't have keys for all the characters in them. So Emacs | 140 | used, generally don't have keys for all the characters in them. You |
| 139 | supports various @dfn{input methods}, typically one for each script or | 141 | can insert characters that your keyboard does not support, using |
| 140 | language, to make it convenient to type them. | 142 | @kbd{C-q} (@code{quoted-insert}) or @kbd{C-x 8 @key{RET}} |
| 143 | (@code{ucs-insert}). @xref{Inserting Text}. Emacs also supports | ||
| 144 | various @dfn{input methods}, typically one for each script or | ||
| 145 | language, which make it easier to type characters in the script. | ||
| 146 | @xref{Input Methods}. | ||
| 141 | 147 | ||
| 142 | @kindex C-x RET | 148 | @kindex C-x RET |
| 143 | The prefix key @kbd{C-x @key{RET}} is used for commands that pertain | 149 | The prefix key @kbd{C-x @key{RET}} is used for commands that pertain |
| @@ -165,12 +171,12 @@ system encodes the character safely and with a single byte | |||
| 165 | (@pxref{Coding Systems}). If the character's encoding is longer than | 171 | (@pxref{Coding Systems}). If the character's encoding is longer than |
| 166 | one byte, Emacs shows @samp{file ...}. | 172 | one byte, Emacs shows @samp{file ...}. |
| 167 | 173 | ||
| 168 | However, if the character displayed is in the range 0200 through | 174 | As a special case, if the character lies in the range 128 (0200 |
| 169 | 0377 octal, it may actually stand for an invalid UTF-8 byte read from | 175 | octal) through 159 (0237 octal), it stands for a ``raw'' byte that |
| 170 | a file. In Emacs, that byte is represented as a sequence of 8-bit | 176 | does not correspond to any specific displayable character. Such a |
| 171 | characters, but all of them together display as the original invalid | 177 | ``character'' lies within the @code{eight-bit-control} character set, |
| 172 | byte, in octal code. In this case, @kbd{C-x =} shows @samp{part of | 178 | and is displayed as an escaped octal character code. In this case, |
| 173 | display ...} instead of @samp{file}. | 179 | @kbd{C-x =} shows @samp{part of display ...} instead of @samp{file}. |
| 174 | 180 | ||
| 175 | @cindex character set of character at point | 181 | @cindex character set of character at point |
| 176 | @cindex font of character at point | 182 | @cindex font of character at point |
| @@ -235,74 +241,62 @@ There are text properties here: | |||
| 235 | @node Enabling Multibyte | 241 | @node Enabling Multibyte |
| 236 | @section Enabling Multibyte Characters | 242 | @section Enabling Multibyte Characters |
| 237 | 243 | ||
| 238 | By default, Emacs starts in multibyte mode, because that allows you to | 244 | By default, Emacs starts in multibyte mode: it stores the contents |
| 239 | use all the supported languages and scripts without limitations. | 245 | of buffers and strings using an internal encoding that represents |
| 246 | non-@acronym{ASCII} characters using multi-byte sequences. Multibyte | ||
| 247 | mode allows you to use all the supported languages and scripts without | ||
| 248 | limitations. | ||
| 240 | 249 | ||
| 241 | @cindex turn multibyte support on or off | 250 | @cindex turn multibyte support on or off |
| 242 | You can enable or disable multibyte character support, either for | 251 | Under very special circumstances, you may want to disable multibyte |
| 243 | Emacs as a whole, or for a single buffer. When multibyte characters | 252 | character support, either for Emacs as a whole, or for a single |
| 244 | are disabled in a buffer, we call that @dfn{unibyte mode}. Then each | 253 | buffer. When multibyte characters are disabled in a buffer, we call |
| 245 | byte in that buffer represents a character, even codes 0200 through | 254 | that @dfn{unibyte mode}. In unibyte mode, each character in the |
| 246 | 0377. | 255 | buffer has a character code ranging from 0 through 255 (0377 octal); 0 |
| 247 | 256 | through 127 (0177 octal) represent @acronym{ASCII} characters, and 128 | |
| 248 | The old features for supporting the European character sets, ISO | 257 | (0200 octal) through 255 (0377 octal) represent non-@acronym{ASCII} |
| 249 | Latin-1 and ISO Latin-2, work in unibyte mode as they did in Emacs 19 | 258 | characters. |
| 250 | and also work for the other ISO 8859 character sets. However, there | ||
| 251 | is no need to turn off multibyte character support to use ISO Latin; | ||
| 252 | the Emacs multibyte character set includes all the characters in these | ||
| 253 | character sets, and Emacs can translate automatically to and from the | ||
| 254 | ISO codes. | ||
| 255 | 259 | ||
| 256 | To edit a particular file in unibyte representation, visit it using | 260 | To edit a particular file in unibyte representation, visit it using |
| 257 | @code{find-file-literally}. @xref{Visiting}. To convert a buffer in | 261 | @code{find-file-literally}. @xref{Visiting}. You can convert a |
| 258 | multibyte representation into a single-byte representation of the same | 262 | multibyte buffer to unibyte by saving it to a file, killing the |
| 259 | characters, the easiest way is to save the contents in a file, kill the | 263 | buffer, and visiting the file again with @code{find-file-literally}. |
| 260 | buffer, and find the file again with @code{find-file-literally}. You | 264 | Alternatively, you can use @kbd{C-x @key{RET} c} |
| 261 | can also use @kbd{C-x @key{RET} c} | 265 | (@code{universal-coding-system-argument}) and specify @samp{raw-text} |
| 262 | (@code{universal-coding-system-argument}) and specify @samp{raw-text} as | 266 | as the coding system with which to visit or save a file. @xref{Text |
| 263 | the coding system with which to find or save a file. @xref{Text | 267 | Coding}. Unlike @code{find-file-literally}, finding a file as |
| 264 | Coding}. Finding a file as @samp{raw-text} doesn't disable format | 268 | @samp{raw-text} doesn't disable format conversion, uncompression, or |
| 265 | conversion, uncompression and auto mode selection as | 269 | auto mode selection. |
| 266 | @code{find-file-literally} does. | ||
| 267 | 270 | ||
| 268 | @vindex enable-multibyte-characters | 271 | @vindex enable-multibyte-characters |
| 269 | @vindex default-enable-multibyte-characters | 272 | @vindex default-enable-multibyte-characters |
| 273 | @cindex environment variables, and non-@acronym{ASCII} characters | ||
| 270 | To turn off multibyte character support by default, start Emacs with | 274 | To turn off multibyte character support by default, start Emacs with |
| 271 | the @samp{--unibyte} option (@pxref{Initial Options}), or set the | 275 | the @samp{--unibyte} option (@pxref{Initial Options}), or set the |
| 272 | environment variable @env{EMACS_UNIBYTE}. You can also customize | 276 | environment variable @env{EMACS_UNIBYTE}. You can also customize |
| 273 | @code{enable-multibyte-characters} or, equivalently, directly set the | 277 | @code{enable-multibyte-characters} or, equivalently, directly set the |
| 274 | variable @code{default-enable-multibyte-characters} to @code{nil} in | 278 | variable @code{default-enable-multibyte-characters} to @code{nil} in |
| 275 | your init file to have basically the same effect as @samp{--unibyte}. | 279 | your init file to have basically the same effect as @samp{--unibyte}. |
| 276 | 280 | With @samp{--unibyte}, multibyte strings are not created during | |
| 277 | @findex toggle-enable-multibyte-characters | 281 | initialization from the values of environment variables, |
| 278 | To convert a unibyte session to a multibyte session, set | 282 | @file{/etc/passwd} entries etc., even if those contain |
| 279 | @code{default-enable-multibyte-characters} to @code{t}. Buffers which | 283 | non-@acronym{ASCII} characters. |
| 280 | were created in the unibyte session before you turn on multibyte support | ||
| 281 | will stay unibyte. You can turn on multibyte support in a specific | ||
| 282 | buffer by invoking the command @code{toggle-enable-multibyte-characters} | ||
| 283 | in that buffer. | ||
| 284 | 284 | ||
| 285 | @cindex Lisp files, and multibyte operation | 285 | @cindex Lisp files, and multibyte operation |
| 286 | @cindex multibyte operation, and Lisp files | 286 | @cindex multibyte operation, and Lisp files |
| 287 | @cindex unibyte operation, and Lisp files | 287 | @cindex unibyte operation, and Lisp files |
| 288 | @cindex init file, and non-@acronym{ASCII} characters | 288 | @cindex init file, and non-@acronym{ASCII} characters |
| 289 | @cindex environment variables, and non-@acronym{ASCII} characters | ||
| 290 | With @samp{--unibyte}, multibyte strings are not created during | ||
| 291 | initialization from the values of environment variables, | ||
| 292 | @file{/etc/passwd} entries etc.@: that contain non-@acronym{ASCII} 8-bit | ||
| 293 | characters. | ||
| 294 | |||
| 295 | Emacs normally loads Lisp files as multibyte, regardless of whether | 289 | Emacs normally loads Lisp files as multibyte, regardless of whether |
| 296 | you used @samp{--unibyte}. This includes the Emacs initialization file, | 290 | you used @samp{--unibyte}. This includes the Emacs initialization |
| 297 | @file{.emacs}, and the initialization files of Emacs packages such as | 291 | file, @file{.emacs}, and the initialization files of Emacs packages |
| 298 | Gnus. However, you can specify unibyte loading for a particular Lisp | 292 | such as Gnus. However, you can specify unibyte loading for a |
| 299 | file, by putting @w{@samp{-*-unibyte: t;-*-}} in a comment on the first | 293 | particular Lisp file, by putting @w{@samp{-*-unibyte: t;-*-}} in a |
| 300 | line (@pxref{File Variables}). Then that file is always loaded as | 294 | comment on the first line (@pxref{File Variables}). Then that file is |
| 301 | unibyte text, even if you did not start Emacs with @samp{--unibyte}. | 295 | always loaded as unibyte text. The motivation for these conventions |
| 302 | The motivation for these conventions is that it is more reliable to | 296 | is that it is more reliable to always load any particular Lisp file in |
| 303 | always load any particular Lisp file in the same way. However, you can | 297 | the same way. However, you can load a Lisp file as unibyte, on any |
| 304 | load a Lisp file as unibyte, on any one occasion, by typing @kbd{C-x | 298 | one occasion, by typing @kbd{C-x @key{RET} c raw-text @key{RET}} |
| 305 | @key{RET} c raw-text @key{RET}} immediately before loading it. | 299 | immediately before loading it. |
| 306 | 300 | ||
| 307 | The mode line indicates whether multibyte character support is | 301 | The mode line indicates whether multibyte character support is |
| 308 | enabled in the current buffer. If it is, there are two or more | 302 | enabled in the current buffer. If it is, there are two or more |
| @@ -312,6 +306,14 @@ convention (colon, backslash, etc.). When multibyte characters | |||
| 312 | are not enabled, nothing precedes the colon except a single dash. | 306 | are not enabled, nothing precedes the colon except a single dash. |
| 313 | @xref{Mode Line}, for more details about this. | 307 | @xref{Mode Line}, for more details about this. |
| 314 | 308 | ||
| 309 | @findex toggle-enable-multibyte-characters | ||
| 310 | To convert a unibyte session to a multibyte session, set | ||
| 311 | @code{default-enable-multibyte-characters} to @code{t}. Buffers which | ||
| 312 | were created in the unibyte session before you turn on multibyte | ||
| 313 | support will stay unibyte. You can turn on multibyte support in a | ||
| 314 | specific buffer by invoking the command | ||
| 315 | @code{toggle-enable-multibyte-characters} in that buffer. | ||
| 316 | |||
| 315 | @node Language Environments | 317 | @node Language Environments |
| 316 | @section Language Environments | 318 | @section Language Environments |
| 317 | @cindex language environments | 319 | @cindex language environments |
| @@ -319,43 +321,41 @@ are not enabled, nothing precedes the colon except a single dash. | |||
| 319 | All supported character sets are supported in Emacs buffers whenever | 321 | All supported character sets are supported in Emacs buffers whenever |
| 320 | multibyte characters are enabled; there is no need to select a | 322 | multibyte characters are enabled; there is no need to select a |
| 321 | particular language in order to display its characters in an Emacs | 323 | particular language in order to display its characters in an Emacs |
| 322 | buffer. However, it is important to select a @dfn{language environment} | 324 | buffer. However, it is important to select a @dfn{language |
| 323 | in order to set various defaults. The language environment really | 325 | environment} in order to set various defaults. Roughly speaking, the |
| 324 | represents a choice of preferred script (more or less) rather than a | 326 | language environment represents a choice of preferred script rather |
| 325 | choice of language. | 327 | than a choice of language. |
| 326 | 328 | ||
| 327 | The language environment controls which coding systems to recognize | 329 | The language environment controls which coding systems to recognize |
| 328 | when reading text (@pxref{Recognize Coding}). This applies to files, | 330 | when reading text (@pxref{Recognize Coding}). This applies to files, |
| 329 | incoming mail, netnews, and any other text you read into Emacs. It may | 331 | incoming mail, and any other text you read into Emacs. It may also |
| 330 | also specify the default coding system to use when you create a file. | 332 | specify the default coding system to use when you create a file. Each |
| 331 | Each language environment also specifies a default input method. | 333 | language environment also specifies a default input method. |
| 332 | 334 | ||
| 333 | @findex set-language-environment | 335 | @findex set-language-environment |
| 334 | @vindex current-language-environment | 336 | @vindex current-language-environment |
| 335 | To select a language environment, you can customize the variable | 337 | To select a language environment, customize the variable |
| 336 | @code{current-language-environment} or use the command @kbd{M-x | 338 | @code{current-language-environment} or use the command @kbd{M-x |
| 337 | set-language-environment}. It makes no difference which buffer is | 339 | set-language-environment}. It makes no difference which buffer is |
| 338 | current when you use this command, because the effects apply globally to | 340 | current when you use this command, because the effects apply globally |
| 339 | the Emacs session. The supported language environments include: | 341 | to the Emacs session. The supported language environments include: |
| 340 | 342 | ||
| 341 | @cindex Euro sign | 343 | @cindex Euro sign |
| 342 | @cindex UTF-8 | 344 | @cindex UTF-8 |
| 343 | @quotation | 345 | @quotation |
| 344 | ASCII, Belarusian, Brazilian Portuguese, Bulgarian, Chinese-BIG5, | 346 | ASCII, Belarusian, Bengali, Brazilian Portuguese, Bulgarian, |
| 345 | Chinese-CNS, Chinese-EUC-TW, Chinese-GB, Croatian, Cyrillic-ALT, | 347 | Chinese-BIG5, Chinese-CNS, Chinese-EUC-TW, Chinese-GB, Chinese-GBK, |
| 346 | Cyrillic-ISO, Cyrillic-KOI8, Czech, Devanagari, Dutch, English, | 348 | Chinese-GB18030, Croatian, Cyrillic-ALT, Cyrillic-ISO, Cyrillic-KOI8, |
| 347 | Esperanto, Ethiopic, French, Georgian, German, Greek, Hebrew, IPA, | 349 | Czech, Devanagari, Dutch, English, Esperanto, Ethiopic, French, |
| 348 | Italian, Japanese, Kannada, Korean, Lao, Latin-1, Latin-2, Latin-3, | 350 | Georgian, German, Greek, Gujarati, Hebrew, IPA, Italian, Japanese, |
| 349 | Latin-4, Latin-5, Latin-6, Latin-7, Latin-8 (Celtic), Latin-9 (updated | 351 | Kannada, Khmer, Korean, Lao, Latin-1, Latin-2, Latin-3, Latin-4, |
| 350 | Latin-1 with the Euro sign), Latvian, Lithuanian, Malayalam, Polish, | 352 | Latin-5, Latin-6, Latin-7, Latin-8 (Celtic), Latin-9 (updated Latin-1 |
| 351 | Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Tajik, Tamil, | 353 | with the Euro sign), Latvian, Lithuanian, Malayalam, Oriya, Polish, |
| 352 | Thai, Tibetan, Turkish, UTF-8 (for a setup which prefers Unicode | 354 | Punjabi, Romanian, Russian, Sinhala, Slovak, Slovenian, Spanish, |
| 353 | characters and files encoded in UTF-8), Ukrainian, Vietnamese, Welsh, | 355 | Swedish, TaiViet, Tajik, Tamil, Telugu, Thai, Tibetan, Turkish, UTF-8 |
| 354 | and Windows-1255 (for a setup which prefers Cyrillic characters and | 356 | (for a setup which prefers Unicode characters and files encoded in |
| 355 | files encoded in Windows-1255). | 357 | UTF-8), Ukrainian, Vietnamese, Welsh, and Windows-1255 (for a setup |
| 356 | @tex | 358 | which prefers Cyrillic characters and files encoded in Windows-1255). |
| 357 | \hbadness=10000\par % just avoid underfull hbox warning | ||
| 358 | @end tex | ||
| 359 | @end quotation | 359 | @end quotation |
| 360 | 360 | ||
| 361 | @cindex fonts for various scripts | 361 | @cindex fonts for various scripts |
| @@ -657,34 +657,6 @@ character. | |||
| 657 | list-input-methods}. The list gives information about each input | 657 | list-input-methods}. The list gives information about each input |
| 658 | method, including the string that stands for it in the mode line. | 658 | method, including the string that stands for it in the mode line. |
| 659 | 659 | ||
| 660 | @node Multibyte Conversion | ||
| 661 | @section Unibyte and Multibyte Non-@acronym{ASCII} characters | ||
| 662 | |||
| 663 | When multibyte characters are enabled, character codes 0240 (octal) | ||
| 664 | through 0377 (octal) are not really legitimate in the buffer. The valid | ||
| 665 | non-@acronym{ASCII} printing characters have codes that start from 0400. | ||
| 666 | |||
| 667 | If you type a self-inserting character in the range 0240 through | ||
| 668 | 0377, or if you use @kbd{C-q} to insert one, Emacs assumes you | ||
| 669 | intended to use one of the ISO Latin-@var{n} character sets, and | ||
| 670 | converts it to the Emacs code representing that Latin-@var{n} | ||
| 671 | character. You select @emph{which} ISO Latin character set to use | ||
| 672 | through your choice of language environment | ||
| 673 | @iftex | ||
| 674 | (see above). | ||
| 675 | @end iftex | ||
| 676 | @ifnottex | ||
| 677 | (@pxref{Language Environments}). | ||
| 678 | @end ifnottex | ||
| 679 | If you do not specify a choice, the default is Latin-1. | ||
| 680 | |||
| 681 | If you insert a character in the range 0200 through 0237, which | ||
| 682 | forms the @code{eight-bit-control} character set, it is inserted | ||
| 683 | literally. You should normally avoid doing this since buffers | ||
| 684 | containing such characters have to be written out in either the | ||
| 685 | @code{emacs-mule} or @code{raw-text} coding system, which is usually | ||
| 686 | not what you want. | ||
| 687 | |||
| 688 | @node Coding Systems | 660 | @node Coding Systems |
| 689 | @section Coding Systems | 661 | @section Coding Systems |
| 690 | @cindex coding systems | 662 | @cindex coding systems |
| @@ -698,11 +670,11 @@ possible in reading or writing files, in sending or receiving from the | |||
| 698 | terminal, and in exchanging data with subprocesses. | 670 | terminal, and in exchanging data with subprocesses. |
| 699 | 671 | ||
| 700 | Emacs assigns a name to each coding system. Most coding systems are | 672 | Emacs assigns a name to each coding system. Most coding systems are |
| 701 | used for one language, and the name of the coding system starts with the | 673 | used for one language, and the name of the coding system starts with |
| 702 | language name. Some coding systems are used for several languages; | 674 | the language name. Some coding systems are used for several |
| 703 | their names usually start with @samp{iso}. There are also special | 675 | languages; their names usually start with @samp{iso}. There are also |
| 704 | coding systems @code{no-conversion}, @code{raw-text} and | 676 | special coding systems, such as @code{no-conversion}, @code{raw-text}, |
| 705 | @code{emacs-mule} which do not convert printing characters at all. | 677 | and @code{emacs-internal}. |
| 706 | 678 | ||
| 707 | @cindex international files from DOS/Windows systems | 679 | @cindex international files from DOS/Windows systems |
| 708 | A special class of coding systems, collectively known as | 680 | A special class of coding systems, collectively known as |
| @@ -814,37 +786,21 @@ the @kbd{M-x find-file-literally} command. This uses | |||
| 814 | @code{no-conversion}, and also suppresses other Emacs features that | 786 | @code{no-conversion}, and also suppresses other Emacs features that |
| 815 | might convert the file contents before you see them. @xref{Visiting}. | 787 | might convert the file contents before you see them. @xref{Visiting}. |
| 816 | 788 | ||
| 817 | The coding system @code{emacs-mule} means that the file contains | 789 | The coding system @code{emacs-internal} (or @code{utf-8-emacs}, |
| 818 | non-@acronym{ASCII} characters stored with the internal Emacs encoding. It | 790 | which is equivalent) means that the file contains non-@acronym{ASCII} |
| 819 | handles end-of-line conversion based on the data encountered, and has | 791 | characters stored with the internal Emacs encoding. This coding |
| 820 | the usual three variants to specify the kind of end-of-line conversion. | 792 | system handles end-of-line conversion based on the data encountered, |
| 821 | 793 | and has the usual three variants to specify the kind of end-of-line | |
| 822 | @findex unify-8859-on-decoding-mode | 794 | conversion. |
| 823 | @anchor{Character Translation} | ||
| 824 | The @dfn{character translation} feature can modify the effect of | ||
| 825 | various coding systems, by changing the internal Emacs codes that | ||
| 826 | decoding produces. For instance, the command | ||
| 827 | @code{unify-8859-on-decoding-mode} enables a mode that ``unifies'' the | ||
| 828 | Latin alphabets when decoding text. This works by converting all | ||
| 829 | non-@acronym{ASCII} Latin-@var{n} characters to either Latin-1 or | ||
| 830 | Unicode characters. This way it is easier to use various | ||
| 831 | Latin-@var{n} alphabets together. (In a future Emacs version we hope | ||
| 832 | to move towards full Unicode support and complete unification of | ||
| 833 | character sets.) | ||
| 834 | |||
| 835 | @vindex enable-character-translation | ||
| 836 | If you set the variable @code{enable-character-translation} to | ||
| 837 | @code{nil}, that disables all character translation (including | ||
| 838 | @code{unify-8859-on-decoding-mode}). | ||
| 839 | 795 | ||
| 840 | @node Recognize Coding | 796 | @node Recognize Coding |
| 841 | @section Recognizing Coding Systems | 797 | @section Recognizing Coding Systems |
| 842 | 798 | ||
| 843 | Emacs tries to recognize which coding system to use for a given text | 799 | Whenever Emacs reads a given piece of text, it tries to recognize |
| 844 | as an integral part of reading that text. (This applies to files | 800 | which coding system to use. This applies to files being read, output |
| 845 | being read, output from subprocesses, text from X selections, etc.) | 801 | from subprocesses, text from X selections, etc. Emacs can select the |
| 846 | Emacs can select the right coding system automatically most of the | 802 | right coding system automatically most of the time---once you have |
| 847 | time---once you have specified your preferences. | 803 | specified your preferences. |
| 848 | 804 | ||
| 849 | Some coding systems can be recognized or distinguished by which byte | 805 | Some coding systems can be recognized or distinguished by which byte |
| 850 | sequences appear in the data. However, there are coding systems that | 806 | sequences appear in the data. However, there are coding systems that |
| @@ -948,19 +904,17 @@ pattern, are decoded correctly. One of the builtin | |||
| 948 | @code{auto-coding-functions} detects the encoding for XML files. | 904 | @code{auto-coding-functions} detects the encoding for XML files. |
| 949 | 905 | ||
| 950 | @vindex rmail-decode-mime-charset | 906 | @vindex rmail-decode-mime-charset |
| 907 | @vindex rmail-file-coding-system | ||
| 951 | When you get new mail in Rmail, each message is translated | 908 | When you get new mail in Rmail, each message is translated |
| 952 | automatically from the coding system it is written in, as if it were a | 909 | automatically from the coding system it is written in, as if it were a |
| 953 | separate file. This uses the priority list of coding systems that you | 910 | separate file. This uses the priority list of coding systems that you |
| 954 | have specified. If a MIME message specifies a character set, Rmail | 911 | have specified. If a MIME message specifies a character set, Rmail |
| 955 | obeys that specification, unless @code{rmail-decode-mime-charset} is | 912 | obeys that specification, unless @code{rmail-decode-mime-charset} is |
| 956 | @code{nil}. | 913 | @code{nil}. For reading and saving Rmail files themselves, Emacs uses |
| 957 | 914 | the coding system specified by the variable | |
| 958 | @vindex rmail-file-coding-system | 915 | @code{rmail-file-coding-system}. The default value is @code{nil}, |
| 959 | For reading and saving Rmail files themselves, Emacs uses the coding | 916 | which means that Rmail files are not translated (they are read and |
| 960 | system specified by the variable @code{rmail-file-coding-system}. The | 917 | written in the Emacs internal character code). |
| 961 | default value is @code{nil}, which means that Rmail files are not | ||
| 962 | translated (they are read and written in the Emacs internal character | ||
| 963 | code). | ||
| 964 | 918 | ||
| 965 | @node Specify Coding | 919 | @node Specify Coding |
| 966 | @section Specifying a File's Coding System | 920 | @section Specifying a File's Coding System |
| @@ -984,13 +938,6 @@ use of the Latin-1 coding system, as well as C mode. When you specify | |||
| 984 | the coding explicitly in the file, that overrides | 938 | the coding explicitly in the file, that overrides |
| 985 | @code{file-coding-system-alist}. | 939 | @code{file-coding-system-alist}. |
| 986 | 940 | ||
| 987 | If you add the character @samp{!} at the end of the coding system | ||
| 988 | name in @code{coding}, it disables any character translation | ||
| 989 | (@pxref{Character Translation}) while decoding the file. This is | ||
| 990 | useful when you need to make sure that the character codes in the | ||
| 991 | Emacs buffer will not vary due to changes in user settings; for | ||
| 992 | instance, for the sake of strings in Emacs Lisp source files. | ||
| 993 | |||
| 994 | @node Output Coding | 941 | @node Output Coding |
| 995 | @section Choosing Coding Systems for Output | 942 | @section Choosing Coding Systems for Output |
| 996 | 943 | ||
| @@ -1004,22 +951,21 @@ different coding system for further file output from the buffer using | |||
| 1004 | 951 | ||
| 1005 | You can insert any character Emacs supports into any Emacs buffer, | 952 | You can insert any character Emacs supports into any Emacs buffer, |
| 1006 | but most coding systems can only handle a subset of these characters. | 953 | but most coding systems can only handle a subset of these characters. |
| 1007 | Therefore, you can insert characters that cannot be encoded with the | 954 | Therefore, it's possible that the characters you insert cannot be |
| 1008 | coding system that will be used to save the buffer. For example, you | 955 | encoded with the coding system that will be used to save the buffer. |
| 1009 | could start with an @acronym{ASCII} file and insert a few Latin-1 | 956 | For example, you could visit a text file in Polish, encoded in |
| 1010 | characters into it, or you could edit a text file in Polish encoded in | 957 | @code{iso-8859-2}, and add some Russian words to it. When you save |
| 1011 | @code{iso-8859-2} and add some Russian words to it. When you save | ||
| 1012 | that buffer, Emacs cannot use the current value of | 958 | that buffer, Emacs cannot use the current value of |
| 1013 | @code{buffer-file-coding-system}, because the characters you added | 959 | @code{buffer-file-coding-system}, because the characters you added |
| 1014 | cannot be encoded by that coding system. | 960 | cannot be encoded by that coding system. |
| 1015 | 961 | ||
| 1016 | When that happens, Emacs tries the most-preferred coding system (set | 962 | When that happens, Emacs tries the most-preferred coding system (set |
| 1017 | by @kbd{M-x prefer-coding-system} or @kbd{M-x | 963 | by @kbd{M-x prefer-coding-system} or @kbd{M-x |
| 1018 | set-language-environment}), and if that coding system can safely | 964 | set-language-environment}). If that coding system can safely encode |
| 1019 | encode all of the characters in the buffer, Emacs uses it, and stores | 965 | all of the characters in the buffer, Emacs uses it, and stores its |
| 1020 | its value in @code{buffer-file-coding-system}. Otherwise, Emacs | 966 | value in @code{buffer-file-coding-system}. Otherwise, Emacs displays |
| 1021 | displays a list of coding systems suitable for encoding the buffer's | 967 | a list of coding systems suitable for encoding the buffer's contents, |
| 1022 | contents, and asks you to choose one of those coding systems. | 968 | and asks you to choose one of those coding systems. |
| 1023 | 969 | ||
| 1024 | If you insert the unsuitable characters in a mail message, Emacs | 970 | If you insert the unsuitable characters in a mail message, Emacs |
| 1025 | behaves a bit differently. It additionally checks whether the | 971 | behaves a bit differently. It additionally checks whether the |
| @@ -1248,9 +1194,9 @@ interactively. | |||
| 1248 | 1194 | ||
| 1249 | If @code{file-name-coding-system} is @code{nil}, Emacs uses a | 1195 | If @code{file-name-coding-system} is @code{nil}, Emacs uses a |
| 1250 | default coding system determined by the selected language environment. | 1196 | default coding system determined by the selected language environment. |
| 1251 | In the default language environment, any non-@acronym{ASCII} | 1197 | In the default language environment, non-@acronym{ASCII} characters in |
| 1252 | characters in file names are not encoded specially; they appear in the | 1198 | file names are not encoded specially; they appear in the file system |
| 1253 | file system using the internal Emacs representation. | 1199 | using the internal Emacs representation. |
| 1254 | 1200 | ||
| 1255 | @strong{Warning:} if you change @code{file-name-coding-system} (or the | 1201 | @strong{Warning:} if you change @code{file-name-coding-system} (or the |
| 1256 | language environment) in the middle of an Emacs session, problems can | 1202 | language environment) in the middle of an Emacs session, problems can |
| @@ -1317,7 +1263,7 @@ You can do this by putting | |||
| 1317 | @end lisp | 1263 | @end lisp |
| 1318 | 1264 | ||
| 1319 | @noindent | 1265 | @noindent |
| 1320 | in your @file{~/.emacs} file. | 1266 | in your init file. |
| 1321 | 1267 | ||
| 1322 | There is a similarity between using a coding system translation for | 1268 | There is a similarity between using a coding system translation for |
| 1323 | keyboard input, and using an input method: both define sequences of | 1269 | keyboard input, and using an input method: both define sequences of |