(Coding System Basics): Rewrite @ignore'd paragraph to speak about `undecided'.

(Character Properties): Don't explain the meaning of each property; instead, identify their Unicode Standard names.
author: Eli Zaretskii 2008-12-05 16:11:37 +0000
committer: Eli Zaretskii 2008-12-05 16:11:37 +0000
commit: af38459ffe2a4f4b9ce4492e19520e4f46bf46d5 (patch)
tree: c719260f03542abcb44a379d99f8959d901529a5
parent: 6530de7d397e2c051d1076fd4d75a04993006b77 (diff)
2 files changed, 66 insertions, 59 deletions
diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog
index 749ead0708c..96118a3afe9 100644
--- a/doc/lispref/ChangeLog
+++ b/doc/lispref/ChangeLog
@@ -1,3 +1,10 @@
+2008-12-05  Eli Zaretskii  <eliz@gnu.org>
+
+        * nonascii.texi (Coding System Basics): Rewrite @ignore'd
+        paragraph to speak about `undecided'.
+        (Character Properties): Don't explain the meaning of each
+        property; instead, identify their Unicode Standard names.
+
 2008-12-02  Glenn Morris  <rgm@gnu.org>
 
         * files.texi (Format Conversion Round-Trip): Rewrite format-write-file
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi
index c967c28f631..131b27d030e 100644
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
@@ -360,95 +360,97 @@ of character properties.  In particular, Emacs supports the
 Model}, and the Emacs character property database is derived from the
 Unicode Character Database (@acronym{UCD}).  See the
 @uref{http://www.unicode.org/versions/Unicode5.0.0/ch04.pdf, Character
-Properties chapter of the Unicode Standard}, for more details about
-Unicode character properties and their meaning.
+Properties chapter of the Unicode Standard}, for detailed description
+of Unicode character properties and their meaning.  This section
+assumes you are already familiar with that chapter of the Unicode
+Standard, and want to apply that knowledge to Emacs Lisp programs.
  The facilities documented in this section are useful for setting and
 retrieving properties of characters.
  In Emacs, each property has a name, which is a symbol, and a set of
-possible values, whose types depend on the property.  Here's the full
-list of character properties that Emacs knows about:
+possible values, whose types depend on the property; if a character
+does not have a certain property, the value is @code{nil}.  Here's the
+full list of value types for all the character properties that Emacs
+knows about:
 @table @code
 @item name
-The character's canonical unique name.  The value of the property is a
-string consisting of upper-case Latin letters A to Z, digits, spaces,
-and hyphen @samp{-} characters.
+This property corresponds to the Unicode @code{Name} property.  The
+value is a string consisting of upper-case Latin letters A to Z,
+digits, spaces, and hyphen @samp{-} characters.
 @item general-category
-This property assigns the character to one of the major classes, such
-as letters, punctuation, and symbols, and its important subclasses.
-The value is a symbol whose name is a 2-letter abbreviation.  The
-first letter specifies the character's major class and the second
-letter designates a subclass of that major class.
+This property corresponds to the Unicode @code{General_Category}
+property.  The value is a symbol whose name is a 2-letter abbreviation
+of the character's classification.
 @item canonical-combining-class
-This property classifies combining characters into several classes,
-depending on the details of their behavior in sequences of combining
-characters.  The property's value is an integer number.
+Corresponds to the Unicode @code{Canonical_Combining_Class} property.
+The value is an integer number.
 @item bidi-class
-This property specifies character attributes required for correct
-display of @dfn{bidirectional text} used by right-to-left scripts,
-such as Arabic and Hebrew.  The value is a symbol whose name is the
-Unicode @dfn{directional type} of the character.
+Corresponds to the Unicode @code{Bidi_Class} property.  The value is a
+symbol whose name is the Unicode @dfn{directional type} of the
+character.
 @item decomposition
-This property defines a mapping from a character to a sequence of one
-or more characters that is a canonical or compatibility equivalent to
-it.  The value is a list, whose first element may be a symbol
-representing a compatibility formatting tag, such as @code{<small>};
-the other elements are characters that give the compatibility
-decomposition sequence.
+Corresponds to the Unicode @code{Decomposition_Type} and
+@code{Decomposition_Value} properties.  The value is a list, whose
+first element may be a symbol representing a compatibility formatting
+tag, such as @code{small}@footnote{
+Note that Emacs strips the @samp{<..>} brackets from the corresponding
+Unicode tags; e.g., Unicode specifies @samp{<small>} where Emacs uses
+@samp{small}.
+}; the other elements are characters that give the compatibility
+decomposition sequence of this character.
 @item decimal-digit-value
-This property specifies a numeric value of characters that represent
-decimal digits.  The value is an integer number.
+Corresponds to the Unicode @code{Numeric_Value} property for
+characters whose @code{Numeric_Type} is @samp{Decimal}.  The value is an
+integer number.
 @item digit
-This property specifies a numeric value of characters that represent
-digits, but not necessarily decimal.  Examples include compatibility
-subscript and superscript digits.  The value is an integer number.
+Corresponds to the Unicode @code{Numeric_Value} property for
+characters whose @code{Numeric_Type} is @samp{Digit}.  The value is
+an integer number.  Examples of such characters include compatibility
+subscript and superscript digits, for which the value is the
+corresponding number.
 @item numeric-value
-This property specifies whether the character represents a number.
-Examples of characters that do include fractions, subscripts,
+Corresponds to the Unicode @code{Numeric_Value} property for
+characters whose @code{Numeric_Type} is @samp{Numeric}.  The value of
+this property is an integer or a floating-point number.  Examples of
+characters that have this property include fractions, subscripts,
 superscripts, Roman numerals, currency numerators, and encircled
-numbers.  The value is a symbol whose name gives the numeric value;
-for example, the value of this property for the character
-@code{U+2155} (@sc{vulgar fraction one fifth}) is the symbol
-@samp{1/5}.
+numbers.  For example, the value of this property for the character
+@code{U+2155} (@sc{vulgar fraction one fifth}) is @code{0.2}.
 @item mirrored
-This is a property of characters such as parentheses, which need to be
-mirrored horizontally in right to left scripts.  The value is a
-symbol, either @samp{Y} or @samp{N}.
+Corresponds to the Unicode @code{Bidi_Mirrored} property.  The value
+of this property is a symbol, either @samp{Y} or @samp{N}.
 @item old-name
-This property's value specifies the name, if any, of the character in
-the old version 1.0 of the Unicode Standard.  The value is a string.
+Corresponds to the Unicode @code{Unicode_1_Name} property.  The value
+is a string.
 @item iso-10646-comment
-This character's comment field from the ISO 10646 standard.  The value
-is a string, or @code{nil} if there's no comment.
+Corresponds to the Unicode @code{ISO_Comment} property.  The value is
+a string.
 @item uppercase
-If this character has an upper-case equivalent that is a single
-character, then the value of this property is that upper-case
-equivalent.  Otherwise, the value is @code{nil}.
+Corresponds to the Unicode @code{Simple_Uppercase_Mapping} property.
+The value of this property is a single character.
 @item lowercase
-If this character has an lower-case equivalent that is a single
-character, then the value of this property is that lower-case
-equivalent.  Otherwise, the value is @code{nil}.
+Corresponds to the Unicode @code{Simple_Lowercase_Mapping} property.
+The value of this property is a single character.
 @item titlecase
+Corresponds to the Unicode @code{Simple_Titlecase_Mapping} property.
 @dfn{Title case} is a special form of a character used when the first
-character of a word needs to be capitalized.  If a character has a
-title-case equivalent that is a single character, then the value of
-this property is that title-case equivalent.  Otherwise, the value is
-@code{nil}.
+character of a word needs to be capitalized.  The value of this
+property is a single character.
 @end table
 @defun get-char-code-property char propname
@@ -793,12 +795,10 @@ alternative encodings for the same characters; for example, there are
 three coding systems for the Cyrillic (Russian) alphabet: ISO,
 Alternativnyj, and KOI8.
-@c I think this paragraph is no longer correct.
-@ignore
-  Most coding systems specify a particular character code for
-conversion, but some of them leave the choice unspecified---to be chosen
-heuristically for each file, based on the data.
-@end ignore
+  Every coding system specifies a particular set of character code
+conversions, but the coding system @code{undecided} is special: it
+leaves the choice unspecified, to be chosen heuristically for each
+file, based on the file's data.
  In general, a coding system doesn't guarantee roundtrip identity:
 decoding a byte sequence using coding system, then encoding the