(Charsets): Update the description for the new charset.

(list-character-sets): New findex.
author: Kenichi Handa 2009-06-17 01:14:36 +0000
committer: Kenichi Handa 2009-06-17 01:14:36 +0000
commit: 3af970a06ef84118eea62944da16ba37b4bb41d9
tree: 1d107a868a65061627a5ca64d85fadc4d68aae2f
parent: 7f1faf1cc202ec9ee543bd9c6b35d89e162fbe5b
1 files changed, 37 insertions, 19 deletions
diff --git a/doc/emacs/mule.texi b/doc/emacs/mule.texi
index 9302ef2f988..a663d206536 100644
--- a/doc/emacs/mule.texi
+++ b/doc/emacs/mule.texi
@@ -1620,30 +1620,48 @@ Use @kbd{C-x 8 C-h} to list all the available @kbd{C-x 8} translations.
 @section Charsets
 @cindex charsets

-  Emacs groups all supported characters into disjoint @dfn{charsets}.
-Each character code belongs to one and only one charset.  For
-historical reasons, Emacs typically divides an 8-bit character code
-for an extended version of @acronym{ASCII} into two charsets:
-@acronym{ASCII}, which covers the codes 0 through 127, plus another
-charset which covers the ``right-hand part'' (the codes 128 and up).
-For instance, the characters of Latin-1 include the Emacs charset
-@code{ascii} plus the Emacs charset @code{latin-iso8859-1}.
-
-  Emacs characters belonging to different charsets may look the same,
-but they are still different characters.  For example, the letter
-@samp{o} with acute accent in charset @code{latin-iso8859-1}, used for
-Latin-1, is different from the letter @samp{o} with acute accent in
-charset @code{latin-iso8859-2}, used for Latin-2.
+  Emacs defines most of popular character sets (e.g. ascii,
+iso-8859-1, cp1250, big5, unicode) as @dfn{charsets} and a few of its
+own charsets (e.g. emacs, unicode-bmp, eight-bit).  All supported
+characters belong to one or more charsets.  Usually you don't have to
+take care of ``charset'', but knowing about it may help understanding
+the behavior of Emacs in some cases.
+
+  One example is a font selection.  In each language environment,
+charsets have different priorities.  Emacs, at first, tries to use a
+font that matches with charsets of higher priority.  For instance, in
+Japanese language environment, the charset @code{japanese-jisx0208}
+has the highest priority (@xref{describe-language-environment}).  So,
+Emacs tries to use a font whose @code{registry} property is
+``JISX0208.1983-0'' for characters belonging to that charset.
+
+  Another example is a use of @code{charset} text property.  When
+Emacs reads a file encoded in a coding systems that uses escape
+sequences to switch charsets (e.g. iso-2022-int-1), the buffer text
+keep the information of the original charset by @code{charset} text
+property.  By using this information, Emacs can write the file with
+the same byte sequence as the original.

 @findex list-charset-chars
 @cindex characters in a certain charset
 @findex describe-character-set
  There are two commands for obtaining information about Emacs
-charsets.  The command @kbd{M-x list-charset-chars} prompts for a name
-of a character set, and displays all the characters in that character
-set.  The command @kbd{M-x describe-character-set} prompts for a
-charset name and displays information about that charset, including
-its internal representation within Emacs.
+charsets.  The command @kbd{M-x list-charset-chars} prompts for a
+charset name, and displays all the characters in that character set.
+The command @kbd{M-x describe-character-set} prompts for a charset
+name and displays information about that charset, including its
+internal representation within Emacs.
+
+@findex list-character-sets
+  To display a list of all the supported charsets, type @kbd{M-x
+list-character-sets}.  The list gives the names of charsets and
+additional information to identity each charset (see ISO/IEC's this
+page <http://www.itscj.ipsj.or.jp/ISO-IR/> for the detail).  In the
+list, charsets are categorized into two; the normal charsets are
+listed first, and the supplementary charsets are listed last.  A
+charset in the latter category is used for defining another charset
+(as a parent or a subset), or was used only in Emacs of the older
+versions.
  To find out which charset a character in the buffer belongs to,
 put point before it and type @kbd{C-u C-x =}.
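
Note on the @code{charset} text property paragraph added above: the sketch below is an illustration only, not Emacs code. It assumes Python's standard `iso2022_jp` codec as a stand-in for a coding system that switches charsets with escape sequences, and it shows why the decoded text alone is not enough to reproduce the original bytes. The same character can arrive through two different designations (JIS X 0208-1978 via `ESC $ @`, JIS X 0208-1983 via `ESC $ B`), and a decoder that keeps no per-character charset record writes it back using only one of them.

```python
# Illustration only: Python's "iso2022_jp" codec stands in for an
# escape-sequence-based coding system. This is not how Emacs is implemented.
ESC = b"\x1b"

text = "a\u3042b"  # "a", HIRAGANA LETTER A, "b"
encoded = text.encode("iso2022_jp")
# The encoder designates JIS X 0208-1983 with ESC $ B, emits the two-byte
# code 0x24 0x22 for the hiragana, then returns to ASCII with ESC ( B.

# The same character, designated through the older JIS X 0208-1978 set.
older = b"a" + ESC + b"$@" + b'$"' + ESC + b"(B" + b"b"
decoded = older.decode("iso2022_jp")

# Decoding discards which designation was used, so re-encoding cannot
# know that the input said ESC $ @ rather than ESC $ B.
reencoded = decoded.encode("iso2022_jp")
round_trips = reencoded == older
```

Here `decoded` equals `text`, yet `round_trips` is false because the rewritten bytes use `ESC $ B`. Keeping the original charset alongside each character, as the paragraph describes for the @code{charset} text property, is what lets an editor avoid this loss.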