-rw-r--r--  doc/lispref/ChangeLog      7
-rw-r--r--  doc/lispref/nonascii.texi  54
2 files changed, 36 insertions, 25 deletions
diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog
index 0f4abc7a984..419e81904ed 100644
--- a/doc/lispref/ChangeLog
+++ b/doc/lispref/ChangeLog
@@ -1,3 +1,10 @@
+2010-01-02 Chong Yidong <cyd@stupidchicken.com>
+
+	* nonascii.texi (Text Representations, Character Codes)
+	(Converting Representations, Explicit Encoding)
+	(Translation of Characters): Use hex notation consistently.
+	(Character Sets): Fix map-charset-chars doc (Bug#5197).
+
 2010-01-01 Chong Yidong <cyd@stupidchicken.com>
 
 	* loading.texi (Where Defined): Make it clearer that these are
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi
index d3bbc2c114f..59f790c90da 100644
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
@@ -46,12 +46,12 @@ in most any known written language.
 follows the @dfn{Unicode Standard}. The Unicode Standard assigns a
 unique number, called a @dfn{codepoint}, to each and every character.
 The range of codepoints defined by Unicode, or the Unicode
-@dfn{codespace}, is @code{0..10FFFF} (in hex), inclusive. Emacs
-extends this range with codepoints in the range @code{110000..3FFFFF},
-which it uses for representing characters that are not unified with
-Unicode and raw 8-bit bytes that cannot be interpreted as characters
-(the latter occupy the range @code{3FFF80..3FFFFF}). Thus, a
-character codepoint in Emacs is a 22-bit integer number.
+@dfn{codespace}, is @code{0..#x10FFFF} (in hexadecimal notation),
+inclusive. Emacs extends this range with codepoints in the range
+@code{#x110000..#x3FFFFF}, which it uses for representing characters
+that are not unified with Unicode and @dfn{raw 8-bit bytes} that
+cannot be interpreted as characters. Thus, a character codepoint in
+Emacs is a 22-bit integer number.
 
 @cindex internal representation of characters
 @cindex characters, representation in buffers and strings
@@ -189,8 +189,8 @@ of characters as @var{string}. If @var{string} is a multibyte string,
 it is returned unchanged. The function assumes that @var{string}
 includes only @acronym{ASCII} characters and raw 8-bit bytes; the
 latter are converted to their multibyte representation corresponding
-to the codepoints in the @code{3FFF80..3FFFFF} area (@pxref{Text
-Representations, codepoints}).
+to the codepoints @code{#x3FFF80} through @code{#x3FFFFF}, inclusive
+(@pxref{Text Representations, codepoints}).
 @end defun
 
 @defun string-to-unibyte string
@@ -271,15 +271,19 @@ contains no text properties.
 
  The unibyte and multibyte text representations use different
 character codes. The valid character codes for unibyte representation
-range from 0 to 255---the values that can fit in one byte. The valid
-character codes for multibyte representation range from 0 to 4194303
-(#x3FFFFF). In this code space, values 0 through 127 are for
-@acronym{ASCII} characters, and values 128 through 4194175 (#x3FFF7F)
-are for non-@acronym{ASCII} characters. Values 0 through 1114111
-(#10FFFF) correspond to Unicode characters of the same codepoint;
-values 1114112 (#110000) through 4194175 (#x3FFF7F) represent
-characters that are not unified with Unicode; and values 4194176
-(#x3FFF80) through 4194303 (#x3FFFFF) represent eight-bit raw bytes.
+range from 0 to @code{#xFF} (255)---the values that can fit in one
+byte. The valid character codes for multibyte representation range
+from 0 to @code{#x3FFFFF}. In this code space, values 0 through
+@code{#x7F} (127) are for @acronym{ASCII} characters, and values
+@code{#x80} (128) through @code{#x3FFF7F} (4194175) are for
+non-@acronym{ASCII} characters.
+
+ Emacs character codes are a superset of the Unicode standard.
+Values 0 through @code{#x10FFFF} (1114111) correspond to Unicode
+characters of the same codepoint; values @code{#x110000} (1114112)
+through @code{#x3FFF7F} (4194175) represent characters that are not
+unified with Unicode; and values @code{#x3FFF80} (4194176) through
+@code{#x3FFFFF} (4194303) represent eight-bit raw bytes.
 
 @defun characterp charcode
 This returns @code{t} if @var{charcode} is a valid character, and
@@ -540,7 +544,7 @@ and strings.
 @cindex @code{eight-bit}, a charset
  Emacs defines several special character sets. The character set
 @code{unicode} includes all the characters whose Emacs code points are
-in the range @code{0..10FFFF}. The character set @code{emacs}
+in the range @code{0..#x10FFFF}. The character set @code{emacs}
 includes all @acronym{ASCII} and non-@acronym{ASCII} characters.
 Finally, the @code{eight-bit} charset includes the 8-bit raw bytes;
 Emacs uses it to represent raw bytes encountered in text.
@@ -628,12 +632,12 @@ that fits the second argument of @code{decode-char} above. If
  The following function comes in handy for applying a certain
 function to all or part of the characters in a charset:
 
-@defun map-charset-chars function charset &optional arg from to
+@defun map-charset-chars function charset &optional arg from-code to-code
 Call @var{function} for characters in @var{charset}. @var{function}
 is called with two arguments. The first one is a cons cell
 @code{(@var{from} . @var{to})}, where @var{from} and @var{to}
 indicate a range of characters contained in charset. The second
-argument is the optional argument @var{arg}.
+argument passed to @var{function} is @var{arg}.
 
 By default, the range of codepoints passed to @var{function} includes
 all the characters in @var{charset}, but optional arguments
@@ -751,7 +755,7 @@ This variable automatically becomes buffer-local when set.
 
 @defun make-translation-table-from-vector vec
 This function returns a translation table made from @var{vec} that is
-an array of 256 elements to map byte values 0 through 255 to
+an array of 256 elements to map bytes (values 0 through #xFF) to
 characters. Elements may be @code{nil} for untranslated bytes. The
 returned table has a translation table for reverse mapping in the
 first extra slot, and the value @code{1} in the second extra slot.
@@ -1562,10 +1566,10 @@ in this section.
 text. They logically consist of a series of byte values; that is, a
 series of @acronym{ASCII} and eight-bit characters. In unibyte
 buffers and strings, these characters have codes in the range 0
-through 255. In a multibyte buffer or string, eight-bit characters
-have character codes higher than 255 (@pxref{Text Representations}),
-but Emacs transparently converts them to their single-byte values when
-you encode or decode such text.
+through #xFF (255). In a multibyte buffer or string, eight-bit
+characters have character codes higher than #xFF (@pxref{Text
+Representations}), but Emacs transparently converts them to their
+single-byte values when you encode or decode such text.
 
  The usual way to read a file into a buffer as a sequence of bytes, so
 you can decode the contents explicitly, is with