author Eli Zaretskii 2008-11-29 12:18:14 +0000
committer Eli Zaretskii 2008-11-29 12:18:14 +0000
commit 800702607a4a0e84eb2ccea967d6819d4073a3ac
tree dcf9b29df4ff01975259ac13a3f2f42e31617121
parent 2543eb396b7c5b2754ed10c46e333e144c1967ce
(Explicit Encoding): Update for Emacs 23.
(Character Codes): Document `max-char'.
-rw-r--r-- doc/lispref/nonascii.texi 192
1 files changed, 123 insertions, 69 deletions
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi
index eab748bab8d..256d2c8f38a 100644
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
@@ -298,12 +298,36 @@ This returns @code{t} if @var{charcode} is a valid character, and
 @code{nil} otherwise.
 
 @example
+@group
 (characterp 65)
  @result{} t
+@end group
+@group
 (characterp 4194303)
  @result{} t
+@end group
+@group
 (characterp 4194304)
  @result{} nil
+@end group
+@end example
+@end defun
+
+@cindex maximum value of character codepoint
+@cindex codepoint, largest value
+@defun max-char
+This function returns the largest value that a valid character
+codepoint can have.
+
+@example
+@group
+(characterp (max-char))
+ @result{} t
+@end group
+@group
+(characterp (1+ (max-char)))
+ @result{} nil
+@end group
 @end example
 @end defun
 
@@ -579,48 +603,51 @@ documented here.
 @subsection Basic Concepts of Coding Systems
 
 @cindex character code conversion
- @dfn{Character code conversion} involves conversion between the encoding
-used inside Emacs and some other encoding. Emacs supports many
-different encodings, in that it can convert to and from them. For
-example, it can convert text to or from encodings such as Latin 1, Latin
-2, Latin 3, Latin 4, Latin 5, and several variants of ISO 2022. In some
-cases, Emacs supports several alternative encodings for the same
-characters; for example, there are three coding systems for the Cyrillic
-(Russian) alphabet: ISO, Alternativnyj, and KOI8.
-
+ @dfn{Character code conversion} involves conversion between the
+internal representation of characters used inside Emacs and some other
+encoding. Emacs supports many different encodings, in that it can
+convert to and from them. For example, it can convert text to or from
+encodings such as Latin 1, Latin 2, Latin 3, Latin 4, Latin 5, and
+several variants of ISO 2022. In some cases, Emacs supports several
+alternative encodings for the same characters; for example, there are
+three coding systems for the Cyrillic (Russian) alphabet: ISO,
+Alternativnyj, and KOI8.
+
+@c I think this paragraph is no longer correct.
+@ignore
  Most coding systems specify a particular character code for
 conversion, but some of them leave the choice unspecified---to be chosen
 heuristically for each file, based on the data.
+@end ignore
 
  In general, a coding system doesn't guarantee roundtrip identity:
 decoding a byte sequence using coding system, then encoding the
 resulting text in the same coding system, can produce a different byte
-sequence. However, the following coding systems do guarantee that the
-byte sequence will be the same as what you originally decoded:
+sequence. But some coding systems do guarantee that the byte sequence
+will be the same as what you originally decoded. Here are a few
+examples:
 
 @quotation
-chinese-big5 chinese-iso-8bit cyrillic-iso-8bit emacs-mule
-greek-iso-8bit hebrew-iso-8bit iso-latin-1 iso-latin-2 iso-latin-3
-iso-latin-4 iso-latin-5 iso-latin-8 iso-latin-9 iso-safe
-japanese-iso-8bit japanese-shift-jis korean-iso-8bit raw-text
+iso-8859-1, utf-8, big5, shift_jis, euc-jp
 @end quotation
 
  Encoding buffer text and then decoding the result can also fail to
-reproduce the original text. For instance, if you encode Latin-2
-characters with @code{utf-8} and decode the result using the same
-coding system, you'll get Unicode characters (of charset
-@code{mule-unicode-0100-24ff}). If you encode Unicode characters with
-@code{iso-latin-2} and decode the result with the same coding system,
-you'll get Latin-2 characters.
+reproduce the original text. For instance, if you encode a character
+with a coding system which does not support that character, the result
+is unpredictable, and thus decoding it using the same coding system
+may produce a different text. Currently, Emacs can't report errors
+that result from encoding unsupported characters.
 
 @cindex EOL conversion
 @cindex end-of-line conversion
 @cindex line end conversion
- @dfn{End of line conversion} handles three different conventions used
-on various systems for representing end of line in files. The Unix
-convention is to use the linefeed character (also called newline). The
-DOS convention is to use a carriage-return and a linefeed at the end of
-a line. The Mac convention is to use just carriage-return.
+ @dfn{End of line conversion} handles three different conventions
+used on various systems for representing end of line in files. The
+Unix convention, used on GNU and Unix systems, is to use the linefeed
+character (also called newline). The DOS convention, used on
+MS-Windows and MS-DOS systems, is to use a carriage-return and a
+linefeed at the end of a line. The Mac convention is to use just
+carriage-return.
 
 @cindex base coding system
 @cindex variant coding system
@@ -639,7 +666,8 @@ data, and has the usual three variants which specify the end-of-line
 conversion. @code{no-conversion} is equivalent to @code{raw-text-unix}:
 it specifies no conversion of either character codes or end-of-line.
 
- The coding system @code{emacs-mule} specifies that the data is
+@vindex emacs-internal@r{ coding system}
+ The coding system @code{emacs-internal} specifies that the data is
 represented in the internal Emacs encoding. This is like
 @code{raw-text} in that no code conversion happens, but different in
 that the result is multibyte data.
@@ -647,20 +675,20 @@ that the result is multibyte data.
 @defun coding-system-get coding-system property
 This function returns the specified property of the coding system
 @var{coding-system}. Most coding system properties exist for internal
-purposes, but one that you might find useful is @code{mime-charset}.
+purposes, but one that you might find useful is @code{:mime-charset}.
 That property's value is the name used in MIME for the character coding
 which this coding system can read and write. Examples:
 
 @example
-(coding-system-get 'iso-latin-1 'mime-charset)
+(coding-system-get 'iso-latin-1 :mime-charset)
  @result{} iso-8859-1
-(coding-system-get 'iso-2022-cn 'mime-charset)
+(coding-system-get 'iso-2022-cn :mime-charset)
  @result{} iso-2022-cn
-(coding-system-get 'cyrillic-koi8 'mime-charset)
+(coding-system-get 'cyrillic-koi8 :mime-charset)
  @result{} koi8-r
 @end example
 
-The value of the @code{mime-charset} property is also defined
+The value of the @code{:mime-charset} property is also defined
 as an alias for the coding system.
 @end defun
 
@@ -763,9 +791,11 @@ name or @code{nil}.
 @end defun
 
 @defun check-coding-system coding-system
-This function checks the validity of @var{coding-system}.
-If that is valid, it returns @var{coding-system}.
-Otherwise it signals an error with condition @code{coding-system-error}.
+This function checks the validity of @var{coding-system}. If that is
+valid, it returns @var{coding-system}. If @var{coding-system} is
+@code{nil}, the function return @code{nil}. For any other values, it
+signals an error whose @code{error-symbol} is @code{coding-system-error}
+(@pxref{Signaling Errors, signal}).
 @end defun
 
 @defun coding-system-eol-type coding-system
@@ -837,8 +867,9 @@ encode all the character sets in the list @var{charsets}.
 
 @defun detect-coding-region start end &optional highest
 This function chooses a plausible coding system for decoding the text
-from @var{start} to @var{end}. This text should be a byte sequence
-(@pxref{Explicit Encoding}).
+from @var{start} to @var{end}. This text should be a byte sequence,
+i.e.@: unibyte text or multibyte text with only @acronym{ASCII} and
+eight-bit characters (@pxref{Explicit Encoding}).
 
 Normally this function returns a list of coding systems that could
 handle decoding the text that was scanned. They are listed in order of
@@ -1160,10 +1191,12 @@ in this section.
 
  The result of encoding, and the input to decoding, are not ordinary
 text. They logically consist of a series of byte values; that is, a
-series of characters whose codes are in the range 0 through 255. In a
-multibyte buffer or string, character codes 128 through 159 are
-represented by multibyte sequences, but this is invisible to Lisp
-programs.
+series of @acronym{ASCII} and eight-bit characters. In unibyte
+buffers and strings, these characters have codes in the range 0
+through 255. In a multibyte buffer or string, eight-bit characters
+have character codes higher than 255 (@pxref{Text Representations}),
+but Emacs transparently converts them to their single-byte values when
+you encode or decode such text.
 
  The usual way to read a file into a buffer as a sequence of bytes, so
 you can decode the contents explicitly, is with
@@ -1181,19 +1214,28 @@ encoding by binding @code{coding-system-for-write} to
  Here are the functions to perform explicit encoding or decoding. The
 encoding functions produce sequences of bytes; the decoding functions
 are meant to operate on sequences of bytes. All of these functions
-discard text properties.
+discard text properties. They also set @code{last-coding-system-used}
+to the precise coding system they used.
 
-@deffn Command encode-coding-region start end coding-system
+@deffn Command encode-coding-region start end coding-system &optional destination
 This command encodes the text from @var{start} to @var{end} according
-to coding system @var{coding-system}. The encoded text replaces the
-original text in the buffer. The result of encoding is logically a
-sequence of bytes, but the buffer remains multibyte if it was multibyte
-before.
-
-This command returns the length of the encoded text.
+to coding system @var{coding-system}. Normally, the encoded text
+replaces the original text in the buffer, but the optional argument
+@var{destination} can change that. If @var{destination} is a buffer,
+the encoded text is inserted in that buffer after point (point does
+not move); if it is @code{t}, the command returns the encoded text as
+a unibyte string without inserting it.
+
+If encoded text is inserted in some buffer, this command returns the
+length of the encoded text.
+
+The result of encoding is logically a sequence of bytes, but the
+buffer remains multibyte if it was multibyte before, and any 8-bit
+bytes are converted to their multibyte representation (@pxref{Text
+Representations}).
 @end deffn
 
-@defun encode-coding-string string coding-system &optional nocopy
+@defun encode-coding-string string coding-system &optional nocopy buffer
 This function encodes the text in @var{string} according to coding
 system @var{coding-system}. It returns a new string containing the
 encoded text, except when @var{nocopy} is non-@code{nil}, in which
@@ -1201,24 +1243,36 @@ case the function may return @var{string} itself if the encoding
 operation is trivial. The result of encoding is a unibyte string.
 @end defun
 
-@deffn Command decode-coding-region start end coding-system
+@deffn Command decode-coding-region start end coding-system destination
 This command decodes the text from @var{start} to @var{end} according
-to coding system @var{coding-system}. The decoded text replaces the
-original text in the buffer. To make explicit decoding useful, the text
-before decoding ought to be a sequence of byte values, but both
-multibyte and unibyte buffers are acceptable.
-
-This command returns the length of the decoded text.
+to coding system @var{coding-system}. To make explicit decoding
+useful, the text before decoding ought to be a sequence of byte
+values, but both multibyte and unibyte buffers are acceptable (in the
+multibyte case, the raw byte values should be represented as eight-bit
+characters). Normally, the decoded text replaces the original text in
+the buffer, but the optional argument @var{destination} can change
+that. If @var{destination} is a buffer, the decoded text is inserted
+in that buffer after point (point does not move); if it is @code{t},
+the command returns the decoded text as a multibyte string without
+inserting it.
+
+If decoded text is inserted in some buffer, this command returns the
+length of the decoded text.
 @end deffn
 
-@defun decode-coding-string string coding-system &optional nocopy
-This function decodes the text in @var{string} according to coding
-system @var{coding-system}. It returns a new string containing the
-decoded text, except when @var{nocopy} is non-@code{nil}, in which
-case the function may return @var{string} itself if the decoding
-operation is trivial. To make explicit decoding useful, the contents
-of @var{string} ought to be a sequence of byte values, but a multibyte
-string is acceptable.
+@defun decode-coding-string string coding-system &optional nocopy buffer
+This function decodes the text in @var{string} according to
+@var{coding-system}. It returns a new string containing the decoded
+text, except when @var{nocopy} is non-@code{nil}, in which case the
+function may return @var{string} itself if the decoding operation is
+trivial. To make explicit decoding useful, the contents of
+@var{string} ought to be a unibyte string with a sequence of byte
+values, but a multibyte string is also acceptable (assuming it
+contains 8-bit bytes in their multibyte form).
+
+If optional argument @var{buffer} specifies a buffer, the decoded text
+is inserted in that buffer after point (point does not move). In this
+case, the return value is the length of the decoded text.
 @end defun
 
 @defun decode-coding-inserted-region from to filename &optional visit beg end replace
@@ -1236,10 +1290,10 @@ decoding, you can call this function.
 @subsection Terminal I/O Encoding
 
  Emacs can decode keyboard input using a coding system, and encode
-terminal output. This is useful for terminals that transmit or display
-text using a particular encoding such as Latin-1. Emacs does not set
-@code{last-coding-system-used} for encoding or decoding for the
-terminal.
+terminal output. This is useful for terminals that transmit or
+display text using a particular encoding such as Latin-1. Emacs does
+not set @code{last-coding-system-used} for encoding or decoding of
+terminal I/O.
 
 @defun keyboard-coding-system
 This function returns the coding system that is in use for decoding