| author | Kenichi Handa | 2009-06-17 01:14:36 +0000 |
|---|---|---|
| committer | Kenichi Handa | 2009-06-17 01:14:36 +0000 |
| commit | 3af970a06ef84118eea62944da16ba37b4bb41d9 (patch) | |
| tree | 1d107a868a65061627a5ca64d85fadc4d68aae2f | |
| parent | 7f1faf1cc202ec9ee543bd9c6b35d89e162fbe5b (diff) | |
(Charsets): Update the description for the new charset.
(list-character-sets): New findex.
| -rw-r--r-- | doc/emacs/mule.texi | 56 |
1 files changed, 37 insertions, 19 deletions
diff --git a/doc/emacs/mule.texi b/doc/emacs/mule.texi
index 9302ef2f988..a663d206536 100644
--- a/doc/emacs/mule.texi
+++ b/doc/emacs/mule.texi
| @@ -1620,30 +1620,48 @@ Use @kbd{C-x 8 C-h} to list all the available @kbd{C-x 8} translations. | |||
| 1620 | @section Charsets | 1620 | @section Charsets |
| 1621 | @cindex charsets | 1621 | @cindex charsets |
| 1622 | 1622 | ||
| 1623 | Emacs groups all supported characters into disjoint @dfn{charsets}. | 1623 | Emacs defines most of popular character sets (e.g. ascii, |
| 1624 | Each character code belongs to one and only one charset. For | 1624 | iso-8859-1, cp1250, big5, unicode) as @dfn{charsets} and a few of its |
| 1625 | historical reasons, Emacs typically divides an 8-bit character code | 1625 | own charsets (e.g. emacs, unicode-bmp, eight-bit). All supported |
| 1626 | for an extended version of @acronym{ASCII} into two charsets: | 1626 | characters belong to one or more charsets. Usually you don't have to |
| 1627 | @acronym{ASCII}, which covers the codes 0 through 127, plus another | 1627 | take care of ``charset'', but knowing about it may help understanding |
| 1628 | charset which covers the ``right-hand part'' (the codes 128 and up). | 1628 | the behavior of Emacs in some cases. |
| 1629 | For instance, the characters of Latin-1 include the Emacs charset | 1629 | |
| 1630 | @code{ascii} plus the Emacs charset @code{latin-iso8859-1}. | 1630 | One example is a font selection. In each language environment, |
| 1631 | 1631 | charsets have different priorities. Emacs, at first, tries to use a | |
| 1632 | Emacs characters belonging to different charsets may look the same, | 1632 | font that matches with charsets of higher priority. For instance, in |
| 1633 | but they are still different characters. For example, the letter | 1633 | Japanese language environment, the charset @code{japanese-jisx0208} |
| 1634 | @samp{o} with acute accent in charset @code{latin-iso8859-1}, used for | 1634 | has the highest priority (@xref{describe-language-environment}). So, |
| 1635 | Latin-1, is different from the letter @samp{o} with acute accent in | 1635 | Emacs tries to use a font whose @code{registry} property is |
| 1636 | charset @code{latin-iso8859-2}, used for Latin-2. | 1636 | ``JISX0208.1983-0'' for characters belonging to that charset. |
| 1637 | |||
| 1638 | Another example is a use of @code{charset} text property. When | ||
| 1639 | Emacs reads a file encoded in a coding systems that uses escape | ||
| 1640 | sequences to switch charsets (e.g. iso-2022-int-1), the buffer text | ||
| 1641 | keep the information of the original charset by @code{charset} text | ||
| 1642 | property. By using this information, Emacs can write the file with | ||
| 1643 | the same byte sequence as the original. | ||
| 1637 | 1644 | ||
| 1638 | @findex list-charset-chars | 1645 | @findex list-charset-chars |
| 1639 | @cindex characters in a certain charset | 1646 | @cindex characters in a certain charset |
| 1640 | @findex describe-character-set | 1647 | @findex describe-character-set |
| 1641 | There are two commands for obtaining information about Emacs | 1648 | There are two commands for obtaining information about Emacs |
| 1642 | charsets. The command @kbd{M-x list-charset-chars} prompts for a name | 1649 | charsets. The command @kbd{M-x list-charset-chars} prompts for a |
| 1643 | of a character set, and displays all the characters in that character | 1650 | charset name, and displays all the characters in that character set. |
| 1644 | set. The command @kbd{M-x describe-character-set} prompts for a | 1651 | The command @kbd{M-x describe-character-set} prompts for a charset |
| 1645 | charset name and displays information about that charset, including | 1652 | name and displays information about that charset, including its |
| 1646 | its internal representation within Emacs. | 1653 | internal representation within Emacs. |
| 1654 | |||
| 1655 | @findex list-character-sets | ||
| 1656 | To display a list of all the supported charsets, type @kbd{M-x | ||
| 1657 | list-character-sets}. The list gives the names of charsets and | ||
| 1658 | additional information to identity each charset (see ISO/IEC's this | ||
| 1659 | page <http://www.itscj.ipsj.or.jp/ISO-IR/> for the detail). In the | ||
| 1660 | list, charsets are categorized into two; the normal charsets are | ||
| 1661 | listed first, and the supplementary charsets are listed last. A | ||
| 1662 | charset in the latter category is used for defining another charset | ||
| 1663 | (as a parent or a subset), or was used only in Emacs of the older | ||
| 1664 | versions. | ||
| 1647 | 1665 | ||
| 1648 | To find out which charset a character in the buffer belongs to, | 1666 | To find out which charset a character in the buffer belongs to, |
| 1649 | put point before it and type @kbd{C-u C-x =}. | 1667 | put point before it and type @kbd{C-u C-x =}. |
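
The new text says that some coding systems use escape sequences to switch charsets, and that Emacs records the original charset so it can write back the same bytes. The sketch below illustrates only the first half of that, outside Emacs, using Python's standard `iso2022_jp` codec rather than the `iso-2022-int-1` coding system named in the patch. The helper `charset_runs` and the labels in `DESIGNATIONS` are hypothetical names chosen for this illustration; they are not Emacs charset names or part of any Emacs API.

```python
ESC = 0x1B

# Three-byte designation sequences used by ISO-2022-JP.
# The labels on the right are illustrative only.
DESIGNATIONS = {
    b"\x1b(B": "ascii",
    b"\x1b(J": "jisx0201-roman",
    b"\x1b$@": "jisx0208-1978",
    b"\x1b$B": "jisx0208-1983",
}


def charset_runs(data: bytes):
    """Split ISO-2022-JP bytes into (charset label, payload bytes) runs."""
    runs = []
    current = "ascii"  # an ISO-2022-JP stream starts in ASCII
    payload = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC:
            # An escape sequence closes the current run and selects a new charset.
            if payload:
                runs.append((current, bytes(payload)))
                payload = bytearray()
            current = DESIGNATIONS[data[i:i + 3]]
            i += 3
        else:
            payload.append(data[i])
            i += 1
    if payload:
        runs.append((current, bytes(payload)))
    return runs


text = "abc 日本語 xyz"
encoded = text.encode("iso2022_jp")
runs = charset_runs(encoded)

for label, chunk in runs:
    print(label, chunk)
```

Each run corresponds to a stretch of text that a reader of the byte stream would tag with one charset. Keeping that tag alongside the decoded text is what makes it possible to choose the same designation again on output, which is the role the patch assigns to the `charset` text property.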