| author | Eli Zaretskii | 2008-12-05 16:11:37 +0000 |
|---|---|---|
| committer | Eli Zaretskii | 2008-12-05 16:11:37 +0000 |
| commit | af38459ffe2a4f4b9ce4492e19520e4f46bf46d5 (patch) | |
| tree | c719260f03542abcb44a379d99f8959d901529a5 | |
| parent | 6530de7d397e2c051d1076fd4d75a04993006b77 (diff) | |
(Coding System Basics): Rewrite @ignore'd paragraph to speak about `undecided'.
(Character Properties): Don't explain the meaning of each property; instead,
identify their Unicode Standard names.
| -rw-r--r-- | doc/lispref/ChangeLog | 7 | ||||
| -rw-r--r-- | doc/lispref/nonascii.texi | 118 |
2 files changed, 66 insertions, 59 deletions
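
For orientation, the Character Properties change described in the commit message concerns properties that Lisp code reads with `get-char-code-property`, the function documented right after the property table in nonascii.texi. The following is a minimal sketch, not part of the commit. It assumes an Emacs new enough to provide that function, and the values it prints depend on the Unicode data shipped with the running Emacs.

```elisp
;; Illustrative sketch only, not part of this commit.
;; Query several of the properties named in the patched table for
;; U+2155 (VULGAR FRACTION ONE FIFTH) and show each value.
(dolist (prop '(name general-category bidi-class numeric-value mirrored))
  (message "%s: %S" prop (get-char-code-property #x2155 prop)))

;; Per the patched text, general-category yields a symbol whose name
;; is a 2-letter abbreviation, and lowercase yields a single character.
(get-char-code-property ?A 'general-category)
(get-char-code-property ?A 'lowercase)
```

According to the patched text, `numeric-value` for U+2155 should be `0.2`, and a property the character lacks should come back as `nil`.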
diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog index 749ead0708c..96118a3afe9 100644 --- a/doc/lispref/ChangeLog +++ b/doc/lispref/ChangeLog | |||
| @@ -1,3 +1,10 @@ | |||
| 1 | 2008-12-05 Eli Zaretskii <eliz@gnu.org> | ||
| 2 | |||
| 3 | * nonascii.texi (Coding System Basics): Rewrite @ignore'd | ||
| 4 | paragraph to speak about `undecided'. | ||
| 5 | (Character Properties): Don't explain the meaning of each | ||
| 6 | property; instead, identify their Unicode Standard names. | ||
| 7 | |||
| 1 | 2008-12-02 Glenn Morris <rgm@gnu.org> | 8 | 2008-12-02 Glenn Morris <rgm@gnu.org> |
| 2 | 9 | ||
| 3 | * files.texi (Format Conversion Round-Trip): Rewrite format-write-file | 10 | * files.texi (Format Conversion Round-Trip): Rewrite format-write-file |
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi index c967c28f631..131b27d030e 100644 --- a/doc/lispref/nonascii.texi +++ b/doc/lispref/nonascii.texi | |||
| @@ -360,95 +360,97 @@ of character properties. In particular, Emacs supports the | |||
| 360 | Model}, and the Emacs character property database is derived from the | 360 | Model}, and the Emacs character property database is derived from the |
| 361 | Unicode Character Database (@acronym{UCD}). See the | 361 | Unicode Character Database (@acronym{UCD}). See the |
| 362 | @uref{http://www.unicode.org/versions/Unicode5.0.0/ch04.pdf, Character | 362 | @uref{http://www.unicode.org/versions/Unicode5.0.0/ch04.pdf, Character |
| 363 | Properties chapter of the Unicode Standard}, for more details about | 363 | Properties chapter of the Unicode Standard}, for a detailed description |
| 364 | Unicode character properties and their meaning. | 364 | of Unicode character properties and their meaning. This section |
| 365 | assumes you are already familiar with that chapter of the Unicode | ||
| 366 | Standard, and want to apply that knowledge to Emacs Lisp programs. | ||
| 365 | 367 | ||
| 366 | The facilities documented in this section are useful for setting and | 368 | The facilities documented in this section are useful for setting and |
| 367 | retrieving properties of characters. | 369 | retrieving properties of characters. |
| 368 | 370 | ||
| 369 | In Emacs, each property has a name, which is a symbol, and a set of | 371 | In Emacs, each property has a name, which is a symbol, and a set of |
| 370 | possible values, whose types depend on the property. Here's the full | 372 | possible values, whose types depend on the property; if a character |
| 371 | list of character properties that Emacs knows about: | 373 | does not have a certain property, the value is @code{nil}. Here's the |
| 374 | full list of value types for all the character properties that Emacs | ||
| 375 | knows about: | ||
| 372 | 376 | ||
| 373 | @table @code | 377 | @table @code |
| 374 | @item name | 378 | @item name |
| 375 | The character's canonical unique name. The value of the property is a | 379 | This property corresponds to the Unicode @code{Name} property. The |
| 376 | string consisting of upper-case Latin letters A to Z, digits, spaces, | 380 | value is a string consisting of upper-case Latin letters A to Z, |
| 377 | and hyphen @samp{-} characters. | 381 | digits, spaces, and hyphen @samp{-} characters. |
| 378 | 382 | ||
| 379 | @item general-category | 383 | @item general-category |
| 380 | This property assigns the character to one of the major classes, such | 384 | This property corresponds to the Unicode @code{General_Category} |
| 381 | as letters, punctuation, and symbols, and its important subclasses. | 385 | property. The value is a symbol whose name is a 2-letter abbreviation |
| 382 | The value is a symbol whose name is a 2-letter abbreviation. The | 386 | of the character's classification. |
| 383 | first letter specifies the character's major class and the second | ||
| 384 | letter designates a subclass of that major class. | ||
| 385 | 387 | ||
| 386 | @item canonical-combining-class | 388 | @item canonical-combining-class |
| 387 | This property classifies combining characters into several classes, | 389 | Corresponds to the Unicode @code{Canonical_Combining_Class} property. |
| 388 | depending on the details of their behavior in sequences of combining | 390 | The value is an integer number. |
| 389 | characters. The property's value is an integer number. | ||
| 390 | 391 | ||
| 391 | @item bidi-class | 392 | @item bidi-class |
| 392 | This property specifies character attributes required for correct | 393 | Corresponds to the Unicode @code{Bidi_Class} property. The value is a |
| 393 | display of @dfn{bidirectional text} used by right-to-left scripts, | 394 | symbol whose name is the Unicode @dfn{directional type} of the |
| 394 | such as Arabic and Hebrew. The value is a symbol whose name is the | 395 | character. |
| 395 | Unicode @dfn{directional type} of the character. | ||
| 396 | 396 | ||
| 397 | @item decomposition | 397 | @item decomposition |
| 398 | This property defines a mapping from a character to a sequence of one | 398 | Corresponds to the Unicode @code{Decomposition_Type} and |
| 399 | or more characters that is a canonical or compatibility equivalent to | 399 | @code{Decomposition_Value} properties. The value is a list, whose |
| 400 | it. The value is a list, whose first element may be a symbol | 400 | first element may be a symbol representing a compatibility formatting |
| 401 | representing a compatibility formatting tag, such as @code{<small>}; | 401 | tag, such as @code{small}@footnote{ |
| 402 | the other elements are characters that give the compatibility | 402 | Note that Emacs strips the @samp{<..>} brackets from the corresponding |
| 403 | decomposition sequence. | 403 | Unicode tags; e.g., Unicode specifies @samp{<small>} where Emacs uses |
| 404 | @samp{small}. | ||
| 405 | }; the other elements are characters that give the compatibility | ||
| 406 | decomposition sequence of this character. | ||
| 404 | 407 | ||
| 405 | @item decimal-digit-value | 408 | @item decimal-digit-value |
| 406 | This property specifies a numeric value of characters that represent | 409 | Corresponds to the Unicode @code{Numeric_Value} property for |
| 407 | decimal digits. The value is an integer number. | 410 | characters whose @code{Numeric_Type} is @samp{Decimal}. The value is an |
| 411 | integer number. | ||
| 408 | 412 | ||
| 409 | @item digit | 413 | @item digit |
| 410 | This property specifies a numeric value of characters that represent | 414 | Corresponds to the Unicode @code{Numeric_Value} property for |
| 411 | digits, but not necessarily decimal. Examples include compatibility | 415 | characters whose @code{Numeric_Type} is @samp{Digit}. The value is |
| 412 | subscript and superscript digits. The value is an integer number. | 416 | an integer number. Examples of such characters include compatibility |
| 417 | subscript and superscript digits, for which the value is the | ||
| 418 | corresponding number. | ||
| 413 | 419 | ||
| 414 | @item numeric-value | 420 | @item numeric-value |
| 415 | This property specifies whether the character represents a number. | 421 | Corresponds to the Unicode @code{Numeric_Value} property for |
| 416 | Examples of characters that do include fractions, subscripts, | 422 | characters whose @code{Numeric_Type} is @samp{Numeric}. The value of |
| 423 | this property is an integer or a floating-point number. Examples of | ||
| 424 | characters that have this property include fractions, subscripts, | ||
| 417 | superscripts, Roman numerals, currency numerators, and encircled | 425 | superscripts, Roman numerals, currency numerators, and encircled |
| 418 | numbers. The value is a symbol whose name gives the numeric value; | 426 | numbers. For example, the value of this property for the character |
| 419 | for example, the value of this property for the character | 427 | @code{U+2155} (@sc{vulgar fraction one fifth}) is @code{0.2}. |
| 420 | @code{U+2155} (@sc{vulgar fraction one fifth}) is the symbol | ||
| 421 | @samp{1/5}. | ||
| 422 | 428 | ||
| 423 | @item mirrored | 429 | @item mirrored |
| 424 | This is a property of characters such as parentheses, which need to be | 430 | Corresponds to the Unicode @code{Bidi_Mirrored} property. The value |
| 425 | mirrored horizontally in right to left scripts. The value is a | 431 | of this property is a symbol, either @samp{Y} or @samp{N}. |
| 426 | symbol, either @samp{Y} or @samp{N}. | ||
| 427 | 432 | ||
| 428 | @item old-name | 433 | @item old-name |
| 429 | This property's value specifies the name, if any, of the character in | 434 | Corresponds to the Unicode @code{Unicode_1_Name} property. The value |
| 430 | the old version 1.0 of the Unicode Standard. The value is a string. | 435 | is a string. |
| 431 | 436 | ||
| 432 | @item iso-10646-comment | 437 | @item iso-10646-comment |
| 433 | This character's comment field from the ISO 10646 standard. The value | 438 | Corresponds to the Unicode @code{ISO_Comment} property. The value is |
| 434 | is a string, or @code{nil} if there's no comment. | 439 | a string. |
| 435 | 440 | ||
| 436 | @item uppercase | 441 | @item uppercase |
| 437 | If this character has an upper-case equivalent that is a single | 442 | Corresponds to the Unicode @code{Simple_Uppercase_Mapping} property. |
| 438 | character, then the value of this property is that upper-case | 443 | The value of this property is a single character. |
| 439 | equivalent. Otherwise, the value is @code{nil}. | ||
| 440 | 444 | ||
| 441 | @item lowercase | 445 | @item lowercase |
| 442 | If this character has an lower-case equivalent that is a single | 446 | Corresponds to the Unicode @code{Simple_Lowercase_Mapping} property. |
| 443 | character, then the value of this property is that lower-case | 447 | The value of this property is a single character. |
| 444 | equivalent. Otherwise, the value is @code{nil}. | ||
| 445 | 448 | ||
| 446 | @item titlecase | 449 | @item titlecase |
| 450 | Corresponds to the Unicode @code{Simple_Titlecase_Mapping} property. | ||
| 447 | @dfn{Title case} is a special form of a character used when the first | 451 | @dfn{Title case} is a special form of a character used when the first |
| 448 | character of a word needs to be capitalized. If a character has a | 452 | character of a word needs to be capitalized. The value of this |
| 449 | title-case equivalent that is a single character, then the value of | 453 | property is a single character. |
| 450 | this property is that title-case equivalent. Otherwise, the value is | ||
| 451 | @code{nil}. | ||
| 452 | @end table | 454 | @end table |
| 453 | 455 | ||
| 454 | @defun get-char-code-property char propname | 456 | @defun get-char-code-property char propname |
| @@ -793,12 +795,10 @@ alternative encodings for the same characters; for example, there are | |||
| 793 | three coding systems for the Cyrillic (Russian) alphabet: ISO, | 795 | three coding systems for the Cyrillic (Russian) alphabet: ISO, |
| 794 | Alternativnyj, and KOI8. | 796 | Alternativnyj, and KOI8. |
| 795 | 797 | ||
| 796 | @c I think this paragraph is no longer correct. | 798 | Every coding system specifies a particular set of character code |
| 797 | @ignore | 799 | conversions, but the coding system @code{undecided} is special: it |
| 798 | Most coding systems specify a particular character code for | 800 | leaves the choice unspecified, to be chosen heuristically for each |
| 799 | conversion, but some of them leave the choice unspecified---to be chosen | 801 | file, based on the file's data. |
| 800 | heuristically for each file, based on the data. | ||
| 801 | @end ignore | ||
| 802 | 802 | ||
| 803 | In general, a coding system doesn't guarantee roundtrip identity: | 803 | In general, a coding system doesn't guarantee roundtrip identity: |
| 804 | decoding a byte sequence using coding system, then encoding the | 804 | decoding a byte sequence using coding system, then encoding the |