* nonascii.texi (Text Representations): Copyedits.

(Coding System Basics): Also mention utf-8-emacs.
(Converting Representations, Selecting a Representation)
(Scanning Charsets, Translation of Characters, Encoding and I/O): Copyedits.
(Character Codes): Mention role of codepoints 1114112 to 4194175.
author: Chong Yidong 2009-04-10 01:16:27 +0000
committer: Chong Yidong 2009-04-10 01:16:27 +0000
commit: 97d8273fa2687731d652687cf6b4c7c48dd0661a (patch)
tree: d65f28226463eb7dd59f7c4aa925db18d2d8dc3c
parent: c872c51e2b8805ca4ee674ee7600f5b914492a68 (diff)
download: emacs-97d8273fa2687731d652687cf6b4c7c48dd0661a.tar.gz
emacs-97d8273fa2687731d652687cf6b4c7c48dd0661a.zip
2 files changed, 83 insertions, 79 deletions
diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog
index 50e87de8332..283598c2137 100644
--- a/doc/lispref/ChangeLog
+++ b/doc/lispref/ChangeLog
@@ -1,3 +1,12 @@
+2009-04-10  Chong Yidong  <cyd@stupidchicken.com>
+        * nonascii.texi (Text Representations): Copyedits.
+        (Coding System Basics): Also mention utf-8-emacs.
+        (Converting Representations, Selecting a Representation)
+        (Scanning Charsets, Translation of Characters, Encoding and I/O):
+        Copyedits.
+        (Character Codes): Mention role of codepoints 1114112 to 4194175.
 2009-04-09  Chong Yidong  <cyd@stupidchicken.com>
        * text.texi (Yank Commands): Note that yank uses push-mark.
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi
index 478a9eca060..818cc096b83 100644
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
@@ -36,8 +36,8 @@ how they are stored in strings and buffers.
 @cindex text representation
  Emacs buffers and strings support a large repertoire of characters
-from many different scripts.  This is so users could type and display
+from many different scripts, allowing users to type and display text
-text in most any known written language.
+in most any known written language.
 @cindex character codepoint
 @cindex codespace
@@ -65,15 +65,13 @@ This internal representation is based on one of the encodings defined
 by the Unicode Standard, called @dfn{UTF-8}, for representing any
 Unicode codepoint, but Emacs extends UTF-8 to represent the additional
 codepoints it uses for raw 8-bit bytes and characters not unified with
-Unicode.}.
+Unicode.}.  For example, any @acronym{ASCII} character takes up only 1
-For example, any @acronym{ASCII} character takes up only 1 byte, a
+byte, a Latin-1 character takes up 2 bytes, etc.  We call this
-Latin-1 character takes up 2 bytes, etc.  We call this representation
+representation of text @dfn{multibyte}.
-of text @dfn{multibyte}, because it uses several bytes for each
-character.
  Outside Emacs, characters can be represented in many different
 encodings, such as ISO-8859-1, GB-2312, Big-5, etc.  Emacs converts
-between these external encodings and the internal representation, as
+between these external encodings and its internal representation, as
 appropriate, when it reads text into a buffer or a string, or when it
 writes text to a disk file or passes it to some other process.
@@ -87,9 +85,9 @@ Before the conversion, the buffer holds encoded text.
  Encoded text is not really text, as far as Emacs is concerned, but
 rather a sequence of raw 8-bit bytes.  We call buffers and strings
 that hold encoded text @dfn{unibyte} buffers and strings, because
-Emacs treats them as a sequence of individual bytes.  In particular,
+Emacs treats them as a sequence of individual bytes.  Usually, Emacs
-Emacs usually displays unibyte buffers and strings as octal codes such
+displays unibyte buffers and strings as octal codes such as
-as @code{\237}.  We recommend that you never use unibyte buffers and
+@code{\237}.  We recommend that you never use unibyte buffers and
 strings except for manipulating encoded text or binary non-text data.
  In a buffer, the buffer-local value of the variable
@@ -165,10 +163,10 @@ conversions happen when inserting text into a buffer, or when putting
 text from several strings together in one string.  You can also
 explicitly convert a string's contents to either representation.
-  Emacs chooses the representation for a string based on the text that
+  Emacs chooses the representation for a string based on the text from
-it is constructed from.  The general rule is to convert unibyte text to
+which it is constructed.  The general rule is to convert unibyte text
-multibyte text when combining it with other multibyte text, because the
+to multibyte text when combining it with other multibyte text, because
-multibyte representation is more general and can hold whatever
+the multibyte representation is more general and can hold whatever
 characters the unibyte text has.
  When inserting text into a buffer, Emacs converts the text to the
@@ -181,9 +179,9 @@ alternative, to convert the buffer contents to multibyte, is not
 acceptable because the buffer's representation is a choice made by the
 user that cannot be overridden automatically.
-  Converting unibyte text to multibyte text leaves @acronym{ASCII} characters
+  Converting unibyte text to multibyte text leaves @acronym{ASCII}
-unchanged, and converts bytes with codes 128 through 159 to the
+characters unchanged, and converts bytes with codes 128 through 159 to
-multibyte representation of raw eight-bit bytes.
+the multibyte representation of raw eight-bit bytes.
  Converting multibyte text to unibyte converts all @acronym{ASCII}
 and eight-bit characters to their single-byte form, but loses
@@ -214,9 +212,9 @@ characters.
 @end defun
 @defun multibyte-char-to-unibyte char
-This convert the multibyte character @var{char} to a unibyte
+This converts the multibyte character @var{char} to a unibyte
-character.  If @var{char} is a character that is neither
+character, and returns that character.  If @var{char} is neither
-@acronym{ASCII} nor eight-bit, the value is -1.
+@acronym{ASCII} nor eight-bit, the function returns -1.
 @end defun
 @defun unibyte-char-to-multibyte char
@@ -238,9 +236,9 @@ is @code{nil}, the buffer becomes unibyte.
 This function leaves the buffer contents unchanged when viewed as a
 sequence of bytes.  As a consequence, it can change the contents
-viewed as characters; a sequence of three bytes which is treated as
+viewed as characters; for instance, a sequence of three bytes which is
-one character in multibyte representation will count as three
+treated as one character in multibyte representation will count as
-characters in unibyte representation.  Eight-bit characters
+three characters in unibyte representation.  Eight-bit characters
 representing raw bytes are an exception.  They are represented by one
 byte in a unibyte buffer, but when the buffer is set to multibyte,
 they are converted to two-byte sequences, and vice versa.
@@ -256,28 +254,24 @@ base buffer.
 @end defun
 @defun string-as-unibyte string
-This function returns a string with the same bytes as @var{string} but
+If @var{string} is already a unibyte string, this function returns
-treating each byte as a character.  This means that the value may have
+@var{string} itself.  Otherwise, it returns a new string with the same
-more characters than @var{string} has.  Eight-bit characters
+bytes as @var{string}, but treating each byte as a separate character
-representing raw bytes are an exception: each one of them is converted
+(so that the value may have more characters than @var{string}); as an
-to a single byte.
+exception, each eight-bit character representing a raw byte is
+converted into a single byte.  The newly-created string contains no
-If @var{string} is already a unibyte string, then the value is
-@var{string} itself.  Otherwise it is a newly created string, with no
 text properties.
 @end defun
 @defun string-as-multibyte string
-This function returns a string with the same bytes as @var{string} but
+If @var{string} is a multibyte string, this function returns
-treating each multibyte sequence as one character.  This means that
+@var{string} itself.  Otherwise, it returns a new string with the same
-the value may have fewer characters than @var{string} has.  If a byte
+bytes as @var{string}, but treating each multibyte sequence as one
-sequence in @var{string} is invalid as a multibyte representation of a
+character.  This means that the value may have fewer characters than
-single character, each byte in the sequence is treated as raw 8-bit
+@var{string} has.  If a byte sequence in @var{string} is invalid as a
-byte.
+multibyte representation of a single character, each byte in the
+sequence is treated as a raw 8-bit byte.  The newly-created string
-If @var{string} is already a multibyte string, then the value is
+contains no text properties.
-@var{string} itself.  Otherwise it is a newly created string, with no
-text properties.
 @end defun
 @node Character Codes
@@ -291,9 +285,10 @@ character codes for multibyte representation range from 0 to 4194303
 (#x3FFFFF).  In this code space, values 0 through 127 are for
 @acronym{ASCII} charcters, and values 129 through 4194175 (#x3FFF7F)
 are for non-@acronym{ASCII} characters.  Values 0 through 1114111
-(#10FFFF) corresponds to Unicode characters of the same codepoint,
+(#10FFFF) correspond to Unicode characters of the same codepoint;
-while values 4194176 (#x3FFF80) through 4194303 (#x3FFFFF) are for
+values 1114112 (#110000) through 4194175 (#x3FFF7F) represent
-representing eight-bit raw bytes.
+characters that are not unified with Unicode; and values 4194176
+(#x3FFF80) through 4194303 (#x3FFFFF) represent eight-bit raw bytes.
 @defun characterp charcode
 This returns @code{t} if @var{charcode} is a valid character, and
@@ -334,9 +329,9 @@ codepoint can have.
 @end defun
 @defun get-byte pos &optional string
-This function returns the byte at current buffer's character position
+This function returns the byte at character position @var{pos} in the
-@var{pos}.  If the current buffer is unibyte, this is literally the
+current buffer.  If the current buffer is unibyte, this is literally
-byte at that position.  If the buffer is multibyte, byte values of
+the byte at that position.  If the buffer is multibyte, byte values of
 @acronym{ASCII} characters are the same as character codepoints,
 whereas eight-bit raw bytes are converted to their 8-bit codes.  The
 function signals an error if the character at @var{pos} is
@@ -360,13 +355,11 @@ of character properties.  In particular, Emacs supports the
 Model}, and the Emacs character property database is derived from the
 Unicode Character Database (@acronym{UCD}).  See the
 @uref{http://www.unicode.org/versions/Unicode5.0.0/ch04.pdf, Character
-Properties chapter of the Unicode Standard}, for detailed description
+Properties chapter of the Unicode Standard}, for a detailed
-of Unicode character properties and their meaning.  This section
+description of Unicode character properties and their meaning.  This
-assumes you are already familiar with that chapter of the Unicode
+section assumes you are already familiar with that chapter of the
-Standard, and want to apply that knowledge to Emacs Lisp programs.
+Unicode Standard, and want to apply that knowledge to Emacs Lisp
+programs.
-  The facilities documented in this section are useful for setting and
-retrieving properties of characters.
  In Emacs, each property has a name, which is a symbol, and a set of
 possible values, whose types depend on the property; if a character
@@ -378,8 +371,8 @@ replacing each @samp{_} character with a dash @samp{-}.  For example,
 @code{canonical-combining-class}.  However, sometimes we shorten the
 names to make their use easier.
-  Here's the full list of value types for all the character properties
+  Here is the full list of value types for all the character
-that Emacs knows about:
+properties that Emacs knows about:
 @table @code
 @item name
@@ -428,7 +421,7 @@ corresponding number.
 @item numeric-value
 Corresponds to the Unicode @code{Numeric_Value} property for
 characters whose @code{Numeric_Type} is @samp{Numeric}.  The value of
-this property is an integer of a floating-point number.  Examples of
+this property is an integer or a floating-point number.  Examples of
 characters that have this property include fractions, subscripts,
 superscripts, Roman numerals, currency numerators, and encircled
 numbers.  For example, the value of this property for the character
@@ -656,16 +649,15 @@ or last codepoint of @var{charset}, respectively.
 @node Scanning Charsets
 @section Scanning for Character Sets
-  Sometimes it is useful to find out, for characters that appear in a
+  Sometimes it is useful to find out which character set a particular
-certain part of a buffer or a string, to which character sets they
+character belongs to.  One use for this is in determining which coding
-belong.  One use for this is in determining which coding systems
+systems (@pxref{Coding Systems}) are capable of representing all of
-(@pxref{Coding Systems}) are capable of representing all of the text
+the text in question; another is to determine the font(s) for
-in question; another is to determine the font(s) for displaying that
+displaying that text.
-text.
 @defun charset-after &optional pos
 This function returns the charset of highest priority containing the
-character in the current buffer at position @var{pos}.  If @var{pos}
+character at position @var{pos} in the current buffer.  If @var{pos}
 is omitted or @code{nil}, it defaults to the current value of point.
 If @var{pos} is out of range, the value is @code{nil}.
 @end defun
@@ -675,15 +667,15 @@ This function returns a list of the character sets of highest priority
 that contain characters in the current buffer between positions
 @var{beg} and @var{end}.
-The optional argument @var{translation} specifies a translation table to
+The optional argument @var{translation} specifies a translation table
-be used in scanning the text (@pxref{Translation of Characters}).  If it
+to use for scanning the text (@pxref{Translation of Characters}).  If
-is non-@code{nil}, then each character in the region is translated
+it is non-@code{nil}, then each character in the region is translated
 through this table, and the value returned describes the translated
 characters instead of the characters actually in the buffer.
 @end defun
 @defun find-charset-string string &optional translation
-This function returns a list of the character sets of highest priority
+This function returns a list of character sets of highest priority
 that contain characters in @var{string}.  It is just like
 @code{find-charset-region}, except that it applies to the contents of
 @var{string} instead of part of the current buffer.
@@ -721,7 +713,7 @@ character, say @var{to-alt}, @var{from} is also translated to
  During decoding, the translation table's translations are applied to
 the characters that result from ordinary decoding.  If a coding system
-has property @code{:decode-translation-table}, that specifies the
+has the property @code{:decode-translation-table}, that specifies the
 translation table to use, or a list of translation tables to apply in
 sequence.  (This is a property of the coding system, as returned by
 @code{coding-system-get}, not a property of the symbol that is the
@@ -779,8 +771,8 @@ respectively in the @var{props} argument to
 This function is similar to @code{make-translation-table} but returns
 a complex translation table rather than a simple one-to-one mapping.
 Each element of @var{alist} is of the form @code{(@var{from}
-. @var{to})}, where @var{from} and @var{to} are either a character or
+. @var{to})}, where @var{from} and @var{to} are either characters or
-a vector specifying a sequence of characters.  If @var{from} is a
+vectors specifying a sequence of characters.  If @var{from} is a
 character, that character is translated to @var{to} (i.e.@: to a
 character or a character sequence).  If @var{from} is a vector of
 characters, that sequence is translated to @var{to}.  The returned
@@ -891,10 +883,13 @@ end-of-line conversion.
 codes or end-of-line.
 @vindex emacs-internal@r{ coding system}
-  The coding system @code{emacs-internal} specifies that the data is
+@vindex utf-8-emacs@r{ coding system}
-represented in the internal Emacs encoding.  This is like
+  The coding system @code{utf-8-emacs} specifies that the data is
-@code{raw-text} in that no code conversion happens, but different in
+represented in the internal Emacs encoding (@pxref{Text
-that the result is multibyte data.
+Representations}).  This is like @code{raw-text} in that no code
+conversion happens, but different in that the result is multibyte
+data.  The name @code{emacs-internal} is an alias for
+@code{utf-8-emacs}.
 @defun coding-system-get coding-system property
 This function returns the specified property of the coding system
@@ -924,9 +919,9 @@ This function returns the list of aliases of @var{coding-system}.
 @subsection Encoding and I/O
  The principal purpose of coding systems is for use in reading and
-writing files.  The function @code{insert-file-contents} uses
+writing files.  The function @code{insert-file-contents} uses a coding
-a coding system for decoding the file data, and @code{write-region}
+system to decode the file data, and @code{write-region} uses one to
-uses one to encode the buffer contents.
+encode the buffer contents.
  You can specify the coding system to use either explicitly
 (@pxref{Specifying Coding Systems}), or implicitly using a default