author Eli Zaretskii 2008-12-05 16:11:37 +0000
committer Eli Zaretskii 2008-12-05 16:11:37 +0000
commit af38459ffe2a4f4b9ce4492e19520e4f46bf46d5
tree c719260f03542abcb44a379d99f8959d901529a5
parent 6530de7d397e2c051d1076fd4d75a04993006b77
(Coding System Basics): Rewrite @ignore'd paragraph to speak about `undecided'.
(Character Properties): Don't explain the meaning of each property; instead, identify their Unicode Standard names.
-rw-r--r-- doc/lispref/ChangeLog 7
-rw-r--r-- doc/lispref/nonascii.texi 118
2 files changed, 66 insertions, 59 deletions
diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog
index 749ead0708c..96118a3afe9 100644
--- a/doc/lispref/ChangeLog
+++ b/doc/lispref/ChangeLog
@@ -1,3 +1,10 @@
+2008-12-05 Eli Zaretskii <eliz@gnu.org>
+
+ * nonascii.texi (Coding System Basics): Rewrite @ignore'd
+ paragraph to speak about `undecided'.
+ (Character Properties): Don't explain the meaning of each
+ property; instead, identify their Unicode Standard names.
+
 2008-12-02 Glenn Morris <rgm@gnu.org>
 
  * files.texi (Format Conversion Round-Trip): Rewrite format-write-file
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi
index c967c28f631..131b27d030e 100644
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
@@ -360,95 +360,97 @@ of character properties. In particular, Emacs supports the
 Model}, and the Emacs character property database is derived from the
 Unicode Character Database (@acronym{UCD}). See the
 @uref{http://www.unicode.org/versions/Unicode5.0.0/ch04.pdf, Character
-Properties chapter of the Unicode Standard}, for more details about
-Unicode character properties and their meaning.
+Properties chapter of the Unicode Standard}, for detailed description
+of Unicode character properties and their meaning. This section
+assumes you are already familiar with that chapter of the Unicode
+Standard, and want to apply that knowledge to Emacs Lisp programs.
 
  The facilities documented in this section are useful for setting and
 retrieving properties of characters.
 
  In Emacs, each property has a name, which is a symbol, and a set of
-possible values, whose types depend on the property. Here's the full
-list of character properties that Emacs knows about:
+possible values, whose types depend on the property; if a character
+does not have a certain property, the value is @code{nil}. Here's the
+full list of value types for all the character properties that Emacs
+knows about:
 
 @table @code
 @item name
-The character's canonical unique name. The value of the property is a
-string consisting of upper-case Latin letters A to Z, digits, spaces,
-and hyphen @samp{-} characters.
+This property corresponds to the Unicode @code{Name} property. The
+value is a string consisting of upper-case Latin letters A to Z,
+digits, spaces, and hyphen @samp{-} characters.
 
 @item general-category
-This property assigns the character to one of the major classes, such
-as letters, punctuation, and symbols, and its important subclasses.
-The value is a symbol whose name is a 2-letter abbreviation. The
-first letter specifies the character's major class and the second
-letter designates a subclass of that major class.
+This property corresponds to the Unicode @code{General_Category}
+property. The value is a symbol whose name is a 2-letter abbreviation
+of the character's classification.
 
 @item canonical-combining-class
-This property classifies combining characters into several classes,
-depending on the details of their behavior in sequences of combining
-characters. The property's value is an integer number.
+Corresponds to the Unicode @code{Canonical_Combining_Class} property.
+The value is an integer number.
 
 @item bidi-class
-This property specifies character attributes required for correct
-display of @dfn{bidirectional text} used by right-to-left scripts,
-such as Arabic and Hebrew. The value is a symbol whose name is the
-Unicode @dfn{directional type} of the character.
+Corresponds to the Unicode @code{Bidi_Class} property. The value is a
+symbol whose name is the Unicode @dfn{directional type} of the
+character.
 
 @item decomposition
-This property defines a mapping from a character to a sequence of one
-or more characters that is a canonical or compatibility equivalent to
-it. The value is a list, whose first element may be a symbol
-representing a compatibility formatting tag, such as @code{<small>};
-the other elements are characters that give the compatibility
-decomposition sequence.
+Corresponds to the Unicode @code{Decomposition_Type} and
+@code{Decomposition_Value} properties. The value is a list, whose
+first element may be a symbol representing a compatibility formatting
+tag, such as @code{small}@footnote{
+Note that Emacs strips the @samp{<..>} brackets from the corresponding
+Unicode tags; e.g., Unicode specifies @samp{<small>} where Emacs uses
+@samp{small}.
+}; the other elements are characters that give the compatibility
+decomposition sequence of this character.
 
 @item decimal-digit-value
-This property specifies a numeric value of characters that represent
-decimal digits. The value is an integer number.
+Corresponds to the Unicode @code{Numeric_Value} property for
+characters whose @code{Numeric_Type} is @samp{Digit}. The value is an
+integer number.
 
 @item digit
-This property specifies a numeric value of characters that represent
-digits, but not necessarily decimal. Examples include compatibility
-subscript and superscript digits. The value is an integer number.
+Corresponds to the Unicode @code{Numeric_Value} property for
+characters whose @code{Numeric_Type} is @samp{Decimal}. The value is
+an integer number. Examples of such characters include compatibility
+subscript and superscript digits, for which the value is the
+corresponding number.
 
 @item numeric-value
-This property specifies whether the character represents a number.
-Examples of characters that do include fractions, subscripts,
+Corresponds to the Unicode @code{Numeric_Value} property for
+characters whose @code{Numeric_Type} is @samp{Numeric}. The value of
+this property is an integer of a floating-point number. Examples of
+characters that have this property include fractions, subscripts,
 superscripts, Roman numerals, currency numerators, and encircled
-numbers. The value is a symbol whose name gives the numeric value;
-for example, the value of this property for the character
-@code{U+2155} (@sc{vulgar fraction one fifth}) is the symbol
-@samp{1/5}.
+numbers. For example, the value of this property for the character
+@code{U+2155} (@sc{vulgar fraction one fifth}) is @code{0.2}.
 
 @item mirrored
-This is a property of characters such as parentheses, which need to be
-mirrored horizontally in right to left scripts. The value is a
-symbol, either @samp{Y} or @samp{N}.
+Corresponds to the Unicode @code{Bidi_Mirrored} property. The value
+of this property is a symbol, either @samp{Y} or @samp{N}.
 
 @item old-name
-This property's value specifies the name, if any, of the character in
-the old version 1.0 of the Unicode Standard. The value is a string.
+Corresponds to the Unicode @code{Unicode_1_Name} property. The value
+is a string.
 
 @item iso-10646-comment
-This character's comment field from the ISO 10646 standard. The value
-is a string, or @code{nil} if there's no comment.
+Corresponds to the Unicode @code{ISO_Comment} property. The value is
+a string.
 
 @item uppercase
-If this character has an upper-case equivalent that is a single
-character, then the value of this property is that upper-case
-equivalent. Otherwise, the value is @code{nil}.
+Corresponds to the Unicode @code{Simple_Uppercase_Mapping} property.
+The value of this property is a single character.
 
 @item lowercase
-If this character has an lower-case equivalent that is a single
-character, then the value of this property is that lower-case
-equivalent. Otherwise, the value is @code{nil}.
+Corresponds to the Unicode @code{Simple_Lowercase_Mapping} property.
+The value of this property is a single character.
 
 @item titlecase
+Corresponds to the Unicode @code{Simple_Titlecase_Mapping} property.
 @dfn{Title case} is a special form of a character used when the first
-character of a word needs to be capitalized. If a character has a
-title-case equivalent that is a single character, then the value of
-this property is that title-case equivalent. Otherwise, the value is
-@code{nil}.
+character of a word needs to be capitalized. The value of this
+property is a single character.
 @end table
 
 @defun get-char-code-property char propname
@@ -793,12 +795,10 @@ alternative encodings for the same characters; for example, there are
 three coding systems for the Cyrillic (Russian) alphabet: ISO,
 Alternativnyj, and KOI8.
 
-@c I think this paragraph is no longer correct.
-@ignore
- Most coding systems specify a particular character code for
-conversion, but some of them leave the choice unspecified---to be chosen
-heuristically for each file, based on the data.
-@end ignore
+ Every coding system specifies a particular set of character code
+conversions, but the coding system @code{undecided} is special: it
+leaves the choice unspecified, to be chosen heuristically for each
+file, based on the file's data.
 
  In general, a coding system doesn't guarantee roundtrip identity:
 decoding a byte sequence using coding system, then encoding the