2 files changed, 36 insertions, 25 deletions
diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog
index 0f4abc7a984..419e81904ed 100644
--- a/doc/lispref/ChangeLog
+++ b/doc/lispref/ChangeLog
@@ -1,3 +1,10 @@
+2010-01-02  Chong Yidong  <cyd@stupidchicken.com>
+
+        * nonascii.texi (Text Representations, Character Codes)
+        (Converting Representations, Explicit Encoding)
+        (Translation of Characters): Use hex notation consistently.
+        (Character Sets): Fix map-charset-chars doc (Bug#5197).
+
 2010-01-01  Chong Yidong  <cyd@stupidchicken.com>
 
        * loading.texi (Where Defined): Make it clearer that these are
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi
index d3bbc2c114f..59f790c90da 100644
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
@@ -46,12 +46,12 @@ in most any known written language.
 follows the @dfn{Unicode Standard}.  The Unicode Standard assigns a
 unique number, called a @dfn{codepoint}, to each and every character.
 The range of codepoints defined by Unicode, or the Unicode
-@dfn{codespace}, is @code{0..10FFFF} (in hex), inclusive.  Emacs
-extends this range with codepoints in the range @code{110000..3FFFFF},
-which it uses for representing characters that are not unified with
-Unicode and raw 8-bit bytes that cannot be interpreted as characters
-(the latter occupy the range @code{3FFF80..3FFFFF}).  Thus, a
-character codepoint in Emacs is a 22-bit integer number.
+@dfn{codespace}, is @code{0..#x10FFFF} (in hexadecimal notation),
+inclusive.  Emacs extends this range with codepoints in the range
+@code{#x110000..#x3FFFFF}, which it uses for representing characters
+that are not unified with Unicode and @dfn{raw 8-bit bytes} that
+cannot be interpreted as characters.  Thus, a character codepoint in
+Emacs is a 22-bit integer number.
 @cindex internal representation of characters
 @cindex characters, representation in buffers and strings
@@ -189,8 +189,8 @@ of characters as @var{string}.  If @var{string} is a multibyte string,
 it is returned unchanged.  The function assumes that @var{string}
 includes only @acronym{ASCII} characters and raw 8-bit bytes; the
 latter are converted to their multibyte representation corresponding
-to the codepoints in the @code{3FFF80..3FFFFF} area (@pxref{Text
-Representations, codepoints}).
+to the codepoints @code{#x3FFF80} through @code{#x3FFFFF}, inclusive
+(@pxref{Text Representations, codepoints}).
 @end defun
 @defun string-to-unibyte string
@@ -271,15 +271,19 @@ contains no text properties.
  The unibyte and multibyte text representations use different
 character codes.  The valid character codes for unibyte representation
-range from 0 to 255---the values that can fit in one byte.  The valid
-character codes for multibyte representation range from 0 to 4194303
-(#x3FFFFF).  In this code space, values 0 through 127 are for
-@acronym{ASCII} characters, and values 128 through 4194175 (#x3FFF7F)
-are for non-@acronym{ASCII} characters.  Values 0 through 1114111
-(#10FFFF) correspond to Unicode characters of the same codepoint;
-values 1114112 (#110000) through 4194175 (#x3FFF7F) represent
-characters that are not unified with Unicode; and values 4194176
-(#x3FFF80) through 4194303 (#x3FFFFF) represent eight-bit raw bytes.
+range from 0 to @code{#xFF} (255)---the values that can fit in one
+byte.  The valid character codes for multibyte representation range
+from 0 to @code{#x3FFFFF}.  In this code space, values 0 through
+@code{#x7F} (127) are for @acronym{ASCII} characters, and values
+@code{#x80} (128) through @code{#x3FFF7F} (4194175) are for
+non-@acronym{ASCII} characters.
+
+  Emacs character codes are a superset of the Unicode standard.
+Values 0 through @code{#x10FFFF} (1114111) correspond to Unicode
+characters of the same codepoint; values @code{#x110000} (1114112)
+through @code{#x3FFF7F} (4194175) represent characters that are not
+unified with Unicode; and values @code{#x3FFF80} (4194176) through
+@code{#x3FFFFF} (4194303) represent eight-bit raw bytes.
 @defun characterp charcode
 This returns @code{t} if @var{charcode} is a valid character, and
@@ -540,7 +544,7 @@ and strings.
 @cindex @code{eight-bit}, a charset
  Emacs defines several special character sets.  The character set
 @code{unicode} includes all the characters whose Emacs code points are
-in the range @code{0..10FFFF}.  The character set @code{emacs}
+in the range @code{0..#x10FFFF}.  The character set @code{emacs}
 includes all @acronym{ASCII} and non-@acronym{ASCII} characters.
 Finally, the @code{eight-bit} charset includes the 8-bit raw bytes;
 Emacs uses it to represent raw bytes encountered in text.
@@ -628,12 +632,12 @@ that fits the second argument of @code{decode-char} above.  If
  The following function comes in handy for applying a certain
 function to all or part of the characters in a charset:
-@defun map-charset-chars function charset &optional arg from to
+@defun map-charset-chars function charset &optional arg from-code to-code
 Call @var{function} for characters in @var{charset}.  @var{function}
 is called with two arguments.  The first one is a cons cell
 @code{(@var{from} .  @var{to})}, where @var{from} and @var{to}
 indicate a range of characters contained in charset.  The second
-argument is the optional argument @var{arg}.
+argument passed to @var{function} is @var{arg}.
 By default, the range of codepoints passed to @var{function} includes
 all the characters in @var{charset}, but optional arguments
@@ -751,7 +755,7 @@ This variable automatically becomes buffer-local when set.
 @defun make-translation-table-from-vector vec
 This function returns a translation table made from @var{vec} that is
-an array of 256 elements to map byte values 0 through 255 to
+an array of 256 elements to map bytes (values 0 through #xFF) to
 characters.  Elements may be @code{nil} for untranslated bytes.  The
 returned table has a translation table for reverse mapping in the
 first extra slot, and the value @code{1} in the second extra slot.
@@ -1562,10 +1566,10 @@ in this section.
 text.  They logically consist of a series of byte values; that is, a
 series of @acronym{ASCII} and eight-bit characters.  In unibyte
 buffers and strings, these characters have codes in the range 0
-through 255.  In a multibyte buffer or string, eight-bit characters
-have character codes higher than 255 (@pxref{Text Representations}),
-but Emacs transparently converts them to their single-byte values when
-you encode or decode such text.
+through #xFF (255).  In a multibyte buffer or string, eight-bit
+characters have character codes higher than #xFF (@pxref{Text
+Representations}), but Emacs transparently converts them to their
+single-byte values when you encode or decode such text.
  The usual way to read a file into a buffer as a sequence of bytes, so
 you can decode the contents explicitly, is with
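
The ranges that the revised text spells out can be checked with a few lines of arithmetic. The sketch below is written in Python rather than Emacs Lisp and is not part of the patch. The names `classify` and `raw_byte_code` are made up for illustration, and the in-order mapping of raw bytes #x80..#xFF onto #x3FFF80..#x3FFFFF is an assumption that the documentation text itself does not state.

```python
# Boundaries quoted from the revised nonascii.texi text in the diff above.
MAX_ASCII = 0x7F          # last ASCII character code (127)
MAX_UNICODE = 0x10FFFF    # last Unicode codepoint (1114111)
MAX_NON_BYTE = 0x3FFF7F   # last code that is not a raw byte (4194175)
MAX_CHAR = 0x3FFFFF       # last valid multibyte character code (4194303)


def classify(code: int) -> str:
    """Name the part of the Emacs character code space that holds `code`.

    ASCII is reported separately even though it is also part of the
    Unicode range 0..#x10FFFF.
    """
    if not 0 <= code <= MAX_CHAR:
        return "invalid"
    if code <= MAX_ASCII:
        return "ascii"
    if code <= MAX_UNICODE:
        return "unicode"
    if code <= MAX_NON_BYTE:
        return "not-unified"
    return "raw-byte"


def raw_byte_code(byte: int) -> int:
    """Map a raw byte #x80..#xFF to a code in #x3FFF80..#x3FFFFF.

    Assumption: the bytes are laid out in order, so the offset is constant.
    """
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("only bytes #x80..#xFF are raw 8-bit bytes")
    return 0x3FFF00 + byte


if __name__ == "__main__":
    for code in (0x41, 0x10FFFF, 0x110000, 0x3FFF80):
        print(f"#x{code:X} ({code}): {classify(code)}")
    # The largest character code needs exactly 22 bits.
    print(MAX_CHAR.bit_length())
```

Writing the boundaries in hex, as the patch does, makes the structure visible at a glance: the top of the code space is 22 one-bits, and the raw-byte block is its last 128 values.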
