| author | Chong Yidong | 2010-01-02 13:55:19 -0500 |
|---|---|---|
| committer | Chong Yidong | 2010-01-02 13:55:19 -0500 |
| commit | 85eeac935fabe769e3b73d228215005f2a5cece2 (patch) | |
| tree | 907051c346643b11e375015d83bbc59507c13ef4 | |
| parent | b894c439536a226d8524941bcc3d0117e26da11b (diff) | |
Consistently use hex notation to represent character codes.
* nonascii.texi (Text Representations, Character Codes)
(Converting Representations, Explicit Encoding)
(Translation of Characters): Use hex notation consistently.
(Character Sets): Fix map-charset-chars doc (Bug#5197).
| -rw-r--r-- | doc/lispref/ChangeLog | 7 | ||||
| -rw-r--r-- | doc/lispref/nonascii.texi | 54 |
2 files changed, 36 insertions, 25 deletions
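
The ranges this change writes out in hex can be sanity-checked from Lisp. The following is a minimal sketch, not part of the commit: `my-char-code-class` is a hypothetical helper name, and the range boundaries are taken directly from the revised `nonascii.texi` text in the diff below. Only the built-in predicate `characterp` is assumed.

```elisp
;; Hypothetical helper (not part of Emacs or of this commit): classify
;; a character code according to the ranges documented in nonascii.texi.
(defun my-char-code-class (code)
  "Return a symbol naming the documented range that CODE falls in."
  (cond
   ;; `characterp' rejects anything outside 0..#x3FFFFF.
   ((not (characterp code)) 'invalid)
   ;; 0 through #x7F (127): ASCII characters.
   ((<= code #x7F) 'ascii)
   ;; Up to #x10FFFF (1114111): Unicode characters of the same codepoint.
   ((<= code #x10FFFF) 'unicode)
   ;; #x110000 through #x3FFF7F: characters not unified with Unicode.
   ((<= code #x3FFF7F) 'not-unified)
   ;; #x3FFF80 through #x3FFFFF: eight-bit raw bytes.
   (t 'raw-byte)))

;; Example calls; each range boundary is written in hex, as in the manual.
(my-char-code-class ?a)        ; ASCII range
(my-char-code-class #x10FFFF)  ; last Unicode codepoint
(my-char-code-class #x110000)  ; first code beyond Unicode
(my-char-code-class #x3FFF80)  ; first raw-byte code
(my-char-code-class #x400000)  ; beyond the 22-bit limit
```

Writing the boundaries as `#x...` read syntax keeps the code visually aligned with the manual text, which is the point of the notation change.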
diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog index 0f4abc7a984..419e81904ed 100644 --- a/doc/lispref/ChangeLog +++ b/doc/lispref/ChangeLog | |||
| @@ -1,3 +1,10 @@ | |||
| 1 | 2010-01-02 Chong Yidong <cyd@stupidchicken.com> | ||
| 2 | |||
| 3 | * nonascii.texi (Text Representations, Character Codes) | ||
| 4 | (Converting Representations, Explicit Encoding) | ||
| 5 | (Translation of Characters): Use hex notation consistently. | ||
| 6 | (Character Sets): Fix map-charset-chars doc (Bug#5197). | ||
| 7 | |||
| 1 | 2010-01-01 Chong Yidong <cyd@stupidchicken.com> | 8 | 2010-01-01 Chong Yidong <cyd@stupidchicken.com> |
| 2 | 9 | ||
| 3 | * loading.texi (Where Defined): Make it clearer that these are | 10 | * loading.texi (Where Defined): Make it clearer that these are |
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi index d3bbc2c114f..59f790c90da 100644 --- a/doc/lispref/nonascii.texi +++ b/doc/lispref/nonascii.texi | |||
| @@ -46,12 +46,12 @@ in most any known written language. | |||
| 46 | follows the @dfn{Unicode Standard}. The Unicode Standard assigns a | 46 | follows the @dfn{Unicode Standard}. The Unicode Standard assigns a |
| 47 | unique number, called a @dfn{codepoint}, to each and every character. | 47 | unique number, called a @dfn{codepoint}, to each and every character. |
| 48 | The range of codepoints defined by Unicode, or the Unicode | 48 | The range of codepoints defined by Unicode, or the Unicode |
| 49 | @dfn{codespace}, is @code{0..10FFFF} (in hex), inclusive. Emacs | 49 | @dfn{codespace}, is @code{0..#x10FFFF} (in hexadecimal notation), |
| 50 | extends this range with codepoints in the range @code{110000..3FFFFF}, | 50 | inclusive. Emacs extends this range with codepoints in the range |
| 51 | which it uses for representing characters that are not unified with | 51 | @code{#x110000..#x3FFFFF}, which it uses for representing characters |
| 52 | Unicode and raw 8-bit bytes that cannot be interpreted as characters | 52 | that are not unified with Unicode and @dfn{raw 8-bit bytes} that |
| 53 | (the latter occupy the range @code{3FFF80..3FFFFF}). Thus, a | 53 | cannot be interpreted as characters. Thus, a character codepoint in |
| 54 | character codepoint in Emacs is a 22-bit integer number. | 54 | Emacs is a 22-bit integer number. |
| 55 | 55 | ||
| 56 | @cindex internal representation of characters | 56 | @cindex internal representation of characters |
| 57 | @cindex characters, representation in buffers and strings | 57 | @cindex characters, representation in buffers and strings |
| @@ -189,8 +189,8 @@ of characters as @var{string}. If @var{string} is a multibyte string, | |||
| 189 | it is returned unchanged. The function assumes that @var{string} | 189 | it is returned unchanged. The function assumes that @var{string} |
| 190 | includes only @acronym{ASCII} characters and raw 8-bit bytes; the | 190 | includes only @acronym{ASCII} characters and raw 8-bit bytes; the |
| 191 | latter are converted to their multibyte representation corresponding | 191 | latter are converted to their multibyte representation corresponding |
| 192 | to the codepoints in the @code{3FFF80..3FFFFF} area (@pxref{Text | 192 | to the codepoints @code{#x3FFF80} through @code{#x3FFFFF}, inclusive |
| 193 | Representations, codepoints}). | 193 | (@pxref{Text Representations, codepoints}). |
| 194 | @end defun | 194 | @end defun |
| 195 | 195 | ||
| 196 | @defun string-to-unibyte string | 196 | @defun string-to-unibyte string |
| @@ -271,15 +271,19 @@ contains no text properties. | |||
| 271 | 271 | ||
| 272 | The unibyte and multibyte text representations use different | 272 | The unibyte and multibyte text representations use different |
| 273 | character codes. The valid character codes for unibyte representation | 273 | character codes. The valid character codes for unibyte representation |
| 274 | range from 0 to 255---the values that can fit in one byte. The valid | 274 | range from 0 to @code{#xFF} (255)---the values that can fit in one |
| 275 | character codes for multibyte representation range from 0 to 4194303 | 275 | byte. The valid character codes for multibyte representation range |
| 276 | (#x3FFFFF). In this code space, values 0 through 127 are for | 276 | from 0 to @code{#x3FFFFF}. In this code space, values 0 through |
| 277 | @acronym{ASCII} characters, and values 128 through 4194175 (#x3FFF7F) | 277 | @code{#x7F} (127) are for @acronym{ASCII} characters, and values |
| 278 | are for non-@acronym{ASCII} characters. Values 0 through 1114111 | 278 | @code{#x80} (128) through @code{#x3FFF7F} (4194175) are for |
| 279 | (#10FFFF) correspond to Unicode characters of the same codepoint; | 279 | non-@acronym{ASCII} characters. |
| 280 | values 1114112 (#110000) through 4194175 (#x3FFF7F) represent | 280 | |
| 281 | characters that are not unified with Unicode; and values 4194176 | 281 | Emacs character codes are a superset of the Unicode standard. |
| 282 | (#x3FFF80) through 4194303 (#x3FFFFF) represent eight-bit raw bytes. | 282 | Values 0 through @code{#x10FFFF} (1114111) correspond to Unicode |
| 283 | characters of the same codepoint; values @code{#x110000} (1114112) | ||
| 284 | through @code{#x3FFF7F} (4194175) represent characters that are not | ||
| 285 | unified with Unicode; and values @code{#x3FFF80} (4194176) through | ||
| 286 | @code{#x3FFFFF} (4194303) represent eight-bit raw bytes. | ||
| 283 | 287 | ||
| 284 | @defun characterp charcode | 288 | @defun characterp charcode |
| 285 | This returns @code{t} if @var{charcode} is a valid character, and | 289 | This returns @code{t} if @var{charcode} is a valid character, and |
| @@ -540,7 +544,7 @@ and strings. | |||
| 540 | @cindex @code{eight-bit}, a charset | 544 | @cindex @code{eight-bit}, a charset |
| 541 | Emacs defines several special character sets. The character set | 545 | Emacs defines several special character sets. The character set |
| 542 | @code{unicode} includes all the characters whose Emacs code points are | 546 | @code{unicode} includes all the characters whose Emacs code points are |
| 543 | in the range @code{0..10FFFF}. The character set @code{emacs} | 547 | in the range @code{0..#x10FFFF}. The character set @code{emacs} |
| 544 | includes all @acronym{ASCII} and non-@acronym{ASCII} characters. | 548 | includes all @acronym{ASCII} and non-@acronym{ASCII} characters. |
| 545 | Finally, the @code{eight-bit} charset includes the 8-bit raw bytes; | 549 | Finally, the @code{eight-bit} charset includes the 8-bit raw bytes; |
| 546 | Emacs uses it to represent raw bytes encountered in text. | 550 | Emacs uses it to represent raw bytes encountered in text. |
| @@ -628,12 +632,12 @@ that fits the second argument of @code{decode-char} above. If | |||
| 628 | The following function comes in handy for applying a certain | 632 | The following function comes in handy for applying a certain |
| 629 | function to all or part of the characters in a charset: | 633 | function to all or part of the characters in a charset: |
| 630 | 634 | ||
| 631 | @defun map-charset-chars function charset &optional arg from to | 635 | @defun map-charset-chars function charset &optional arg from-code to-code |
| 632 | Call @var{function} for characters in @var{charset}. @var{function} | 636 | Call @var{function} for characters in @var{charset}. @var{function} |
| 633 | is called with two arguments. The first one is a cons cell | 637 | is called with two arguments. The first one is a cons cell |
| 634 | @code{(@var{from} . @var{to})}, where @var{from} and @var{to} | 638 | @code{(@var{from} . @var{to})}, where @var{from} and @var{to} |
| 635 | indicate a range of characters contained in charset. The second | 639 | indicate a range of characters contained in charset. The second |
| 636 | argument is the optional argument @var{arg}. | 640 | argument passed to @var{function} is @var{arg}. |
| 637 | 641 | ||
| 638 | By default, the range of codepoints passed to @var{function} includes | 642 | By default, the range of codepoints passed to @var{function} includes |
| 639 | all the characters in @var{charset}, but optional arguments | 643 | all the characters in @var{charset}, but optional arguments |
| @@ -751,7 +755,7 @@ This variable automatically becomes buffer-local when set. | |||
| 751 | 755 | ||
| 752 | @defun make-translation-table-from-vector vec | 756 | @defun make-translation-table-from-vector vec |
| 753 | This function returns a translation table made from @var{vec} that is | 757 | This function returns a translation table made from @var{vec} that is |
| 754 | an array of 256 elements to map byte values 0 through 255 to | 758 | an array of 256 elements to map bytes (values 0 through #xFF) to |
| 755 | characters. Elements may be @code{nil} for untranslated bytes. The | 759 | characters. Elements may be @code{nil} for untranslated bytes. The |
| 756 | returned table has a translation table for reverse mapping in the | 760 | returned table has a translation table for reverse mapping in the |
| 757 | first extra slot, and the value @code{1} in the second extra slot. | 761 | first extra slot, and the value @code{1} in the second extra slot. |
| @@ -1562,10 +1566,10 @@ in this section. | |||
| 1562 | text. They logically consist of a series of byte values; that is, a | 1566 | text. They logically consist of a series of byte values; that is, a |
| 1563 | series of @acronym{ASCII} and eight-bit characters. In unibyte | 1567 | series of @acronym{ASCII} and eight-bit characters. In unibyte |
| 1564 | buffers and strings, these characters have codes in the range 0 | 1568 | buffers and strings, these characters have codes in the range 0 |
| 1565 | through 255. In a multibyte buffer or string, eight-bit characters | 1569 | through #xFF (255). In a multibyte buffer or string, eight-bit |
| 1566 | have character codes higher than 255 (@pxref{Text Representations}), | 1570 | characters have character codes higher than #xFF (@pxref{Text |
| 1567 | but Emacs transparently converts them to their single-byte values when | 1571 | Representations}), but Emacs transparently converts them to their |
| 1568 | you encode or decode such text. | 1572 | single-byte values when you encode or decode such text. |
| 1569 | 1573 | ||
| 1570 | The usual way to read a file into a buffer as a sequence of bytes, so | 1574 | The usual way to read a file into a buffer as a sequence of bytes, so |
| 1571 | you can decode the contents explicitly, is with | 1575 | you can decode the contents explicitly, is with |
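
The `map-charset-chars` fix in the Character Sets hunk above (Bug#5197) is easier to follow with a call in hand. This is a minimal sketch under the corrected signature `function charset &optional arg from-code to-code`; `my-charset-ranges` is a hypothetical wrapper, not something this commit adds. It collects each `(from . to)` cons cell that Emacs passes as the first argument to the callback; the second argument is whatever was given as `arg`.

```elisp
;; Hypothetical wrapper (not part of Emacs): gather the ranges that
;; `map-charset-chars' reports for CHARSET.
(defun my-charset-ranges (charset &optional from-code to-code)
  "Return the list of (FROM . TO) ranges reported for CHARSET.
FROM-CODE and TO-CODE, if non-nil, are passed through to
`map-charset-chars' to limit the range of code points visited."
  (let ((ranges '()))
    (map-charset-chars
     ;; The callback receives the range cons cell and ARG (nil here).
     (lambda (range _arg) (push range ranges))
     charset nil from-code to-code)
    (nreverse ranges)))

;; Example calls: all of ASCII, then only the code points #x41..#x5A.
(my-charset-ranges 'ascii)
(my-charset-ranges 'ascii #x41 #x5A)
```

The wrapper passes `nil` for `arg` because it has no extra state to hand to the callback; a caller that needs such state would supply it there instead of closing over a variable.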