| author | Eli Zaretskii | 2011-08-23 17:45:14 +0300 |
|---|---|---|
| committer | Eli Zaretskii | 2011-08-23 17:45:14 +0300 |
| commit | bca633fb296b17c0e86d589c50fb3414b361e0b3 (patch) | |
| tree | 1b1e93f6017f7614f6aa950fa78ced1249a99b99 /doc | |
| parent | 4a5885a74a3310ed4f4ba86eee3c406019b2c334 (diff) | |
Followup for character properties in 2011-08-23T11:48:07Z!handa@m17n.org.
src/bidi.c (bidi_get_type): Abort if we get zero as the bidi type of
a character.
admin/unidata/unidata-gen.el (unidata-prop-alist): Update the default
values of bidi-class according to DerivedBidiClass.txt from the
latest UCD.
lisp/international/uni-bidi.el: Regenerated.
doc/lispref/nonascii.texi (Character Properties): Document the values for
unassigned codepoints.
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/lispref/ChangeLog | 5 | ||||
| -rw-r--r-- | doc/lispref/nonascii.texi | 53 |
2 files changed, 43 insertions, 15 deletions
diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog index 4cb4d0a6f50..43add469ec0 100644 --- a/doc/lispref/ChangeLog +++ b/doc/lispref/ChangeLog | |||
| @@ -1,3 +1,8 @@ | |||
| 1 | 2011-08-23 Eli Zaretskii <eliz@gnu.org> | ||
| 2 | |||
| 3 | * nonascii.texi (Character Properties): Document the values for | ||
| 4 | unassigned codepoints. | ||
| 5 | |||
| 1 | 2011-08-18 Eli Zaretskii <eliz@gnu.org> | 6 | 2011-08-18 Eli Zaretskii <eliz@gnu.org> |
| 2 | 7 | ||
| 3 | * nonascii.texi (Character Properties): Document use of | 8 | * nonascii.texi (Character Properties): Document use of |
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi index 7b6d665b2ac..298c7c3d1a8 100644 --- a/doc/lispref/nonascii.texi +++ b/doc/lispref/nonascii.texi | |||
| @@ -369,6 +369,12 @@ replacing each @samp{_} character with a dash @samp{-}. For example, | |||
| 369 | @code{canonical-combining-class}. However, sometimes we shorten the | 369 | @code{canonical-combining-class}. However, sometimes we shorten the |
| 370 | names to make their use easier. | 370 | names to make their use easier. |
| 371 | 371 | ||
| 372 | @cindex unassigned character codepoints | ||
| 373 | Some codepoints are left @dfn{unassigned} by the | ||
| 374 | @acronym{UCD}---they don't correspond to any character. The Unicode | ||
| 375 | Standard defines default values of properties for such codepoints; | ||
| 376 | they are mentioned below for each property. | ||
| 377 | |||
| 372 | Here is the full list of value types for all the character | 378 | Here is the full list of value types for all the character |
| 373 | properties that Emacs knows about: | 379 | properties that Emacs knows about: |
| 374 | 380 | ||
| @@ -376,24 +382,31 @@ properties that Emacs knows about: | |||
| 376 | @item name | 382 | @item name |
| 377 | Corresponds to the @code{Name} Unicode property. The value is a | 383 | Corresponds to the @code{Name} Unicode property. The value is a |
| 378 | string consisting of upper-case Latin letters A to Z, digits, spaces, | 384 | string consisting of upper-case Latin letters A to Z, digits, spaces, |
| 379 | and hyphen @samp{-} characters. | 385 | and hyphen @samp{-} characters. For unassigned codepoints, the value |
| 386 | is an empty string. | ||
| 380 | 387 | ||
| 381 | @cindex unicode general category | 388 | @cindex unicode general category |
| 382 | @item general-category | 389 | @item general-category |
| 383 | Corresponds to the @code{General_Category} Unicode property. The | 390 | Corresponds to the @code{General_Category} Unicode property. The |
| 384 | value is a symbol whose name is a 2-letter abbreviation of the | 391 | value is a symbol whose name is a 2-letter abbreviation of the |
| 385 | character's classification. | 392 | character's classification. For unassigned codepoints, the value |
| 393 | is @code{Cn}. | ||
| 386 | 394 | ||
| 387 | @item canonical-combining-class | 395 | @item canonical-combining-class |
| 388 | Corresponds to the @code{Canonical_Combining_Class} Unicode property. | 396 | Corresponds to the @code{Canonical_Combining_Class} Unicode property. |
| 389 | The value is an integer number. | 397 | The value is an integer number. For unassigned codepoints, the value |
| 398 | is zero. | ||
| 390 | 399 | ||
| 391 | @cindex bidirectional class of characters | 400 | @cindex bidirectional class of characters |
| 392 | @item bidi-class | 401 | @item bidi-class |
| 393 | Corresponds to the Unicode @code{Bidi_Class} property. The value is a | 402 | Corresponds to the Unicode @code{Bidi_Class} property. The value is a |
| 394 | symbol whose name is the Unicode @dfn{directional type} of the | 403 | symbol whose name is the Unicode @dfn{directional type} of the |
| 395 | character. Emacs uses this property when it reorders bidirectional | 404 | character. Emacs uses this property when it reorders bidirectional |
| 396 | text for display (@pxref{Bidirectional Display}). | 405 | text for display (@pxref{Bidirectional Display}). For unassigned |
| 406 | codepoints, the value depends on the code blocks to which the | ||
| 407 | codepoint belongs: most unassigned codepoints get the value of | ||
| 408 | @code{L} (strong L), but some get values of @code{AL} (Arabic letter) | ||
| 409 | or @code{R} (strong R). | ||
| 397 | 410 | ||
| 398 | @item decomposition | 411 | @item decomposition |
| 399 | Corresponds to the Unicode @code{Decomposition_Type} and | 412 | Corresponds to the Unicode @code{Decomposition_Type} and |
| @@ -405,19 +418,22 @@ Note that the Unicode spec writes these tag names inside | |||
| 405 | brackets; e.g., Unicode specifies @samp{<small>} where Emacs uses | 418 | brackets; e.g., Unicode specifies @samp{<small>} where Emacs uses |
| 406 | @samp{small}. | 419 | @samp{small}. |
| 407 | }; the other elements are characters that give the compatibility | 420 | }; the other elements are characters that give the compatibility |
| 408 | decomposition sequence of this character. | 421 | decomposition sequence of this character. For unassigned codepoints, |
| 422 | the value is the character itself. | ||
| 409 | 423 | ||
| 410 | @item decimal-digit-value | 424 | @item decimal-digit-value |
| 411 | Corresponds to the Unicode @code{Numeric_Value} property for | 425 | Corresponds to the Unicode @code{Numeric_Value} property for |
| 412 | characters whose @code{Numeric_Type} is @samp{Digit}. The value is an | 426 | characters whose @code{Numeric_Type} is @samp{Digit}. The value is an |
| 413 | integer number. | 427 | integer number. For unassigned codepoints, the value is @code{nil}, |
| 428 | which means @acronym{NaN}, or ``not-a-number''. | ||
| 414 | 429 | ||
| 415 | @item digit-value | 430 | @item digit-value |
| 416 | Corresponds to the Unicode @code{Numeric_Value} property for | 431 | Corresponds to the Unicode @code{Numeric_Value} property for |
| 417 | characters whose @code{Numeric_Type} is @samp{Decimal}. The value is | 432 | characters whose @code{Numeric_Type} is @samp{Decimal}. The value is |
| 418 | an integer number. Examples of such characters include compatibility | 433 | an integer number. Examples of such characters include compatibility |
| 419 | subscript and superscript digits, for which the value is the | 434 | subscript and superscript digits, for which the value is the |
| 420 | corresponding number. | 435 | corresponding number. For unassigned codepoints, the value is |
| 436 | @code{nil}, which means @acronym{NaN}. | ||
| 421 | 437 | ||
| 422 | @item numeric-value | 438 | @item numeric-value |
| 423 | Corresponds to the Unicode @code{Numeric_Value} property for | 439 | Corresponds to the Unicode @code{Numeric_Value} property for |
| @@ -426,12 +442,15 @@ this property is an integer or a floating-point number. Examples of | |||
| 426 | characters that have this property include fractions, subscripts, | 442 | characters that have this property include fractions, subscripts, |
| 427 | superscripts, Roman numerals, currency numerators, and encircled | 443 | superscripts, Roman numerals, currency numerators, and encircled |
| 428 | numbers. For example, the value of this property for the character | 444 | numbers. For example, the value of this property for the character |
| 429 | @code{U+2155} (@sc{vulgar fraction one fifth}) is @code{0.2}. | 445 | @code{U+2155} (@sc{vulgar fraction one fifth}) is @code{0.2}. For |
| 446 | unassigned codepoints, the value is @code{nil}, which means | ||
| 447 | @acronym{NaN}. | ||
| 430 | 448 | ||
| 431 | @cindex mirroring of characters | 449 | @cindex mirroring of characters |
| 432 | @item mirrored | 450 | @item mirrored |
| 433 | Corresponds to the Unicode @code{Bidi_Mirrored} property. The value | 451 | Corresponds to the Unicode @code{Bidi_Mirrored} property. The value |
| 434 | of this property is a symbol, either @code{Y} or @code{N}. | 452 | of this property is a symbol, either @code{Y} or @code{N}. For |
| 453 | unassigned codepoints, the value is @code{N}. | ||
| 435 | 454 | ||
| 436 | @item mirroring | 455 | @item mirroring |
| 437 | Corresponds to the Unicode @code{Bidi_Mirroring_Glyph} property. The | 456 | Corresponds to the Unicode @code{Bidi_Mirroring_Glyph} property. The |
| @@ -443,29 +462,33 @@ property; however, some characters whose @code{mirrored} property is | |||
| 443 | @code{Y} also have @code{nil} for @code{mirroring}, because no | 462 | @code{Y} also have @code{nil} for @code{mirroring}, because no |
| 444 | appropriate characters exist with mirrored glyphs. Emacs uses this | 463 | appropriate characters exist with mirrored glyphs. Emacs uses this |
| 445 | property to display mirror images of characters when appropriate | 464 | property to display mirror images of characters when appropriate |
| 446 | (@pxref{Bidirectional Display}). | 465 | (@pxref{Bidirectional Display}). For unassigned codepoints, the value |
| 466 | is @code{nil}. | ||
| 447 | 467 | ||
| 448 | @item old-name | 468 | @item old-name |
| 449 | Corresponds to the Unicode @code{Unicode_1_Name} property. The value | 469 | Corresponds to the Unicode @code{Unicode_1_Name} property. The value |
| 450 | is a string. | 470 | is a string. For unassigned codepoints, the value is an empty string. |
| 451 | 471 | ||
| 452 | @item iso-10646-comment | 472 | @item iso-10646-comment |
| 453 | Corresponds to the Unicode @code{ISO_Comment} property. The value is | 473 | Corresponds to the Unicode @code{ISO_Comment} property. The value is |
| 454 | a string. | 474 | a string. For unassigned codepoints, the value is an empty string. |
| 455 | 475 | ||
| 456 | @item uppercase | 476 | @item uppercase |
| 457 | Corresponds to the Unicode @code{Simple_Uppercase_Mapping} property. | 477 | Corresponds to the Unicode @code{Simple_Uppercase_Mapping} property. |
| 458 | The value of this property is a single character. | 478 | The value of this property is a single character. For unassigned |
| 479 | codepoints, the value is @code{nil}, which means the character itself. | ||
| 459 | 480 | ||
| 460 | @item lowercase | 481 | @item lowercase |
| 461 | Corresponds to the Unicode @code{Simple_Lowercase_Mapping} property. | 482 | Corresponds to the Unicode @code{Simple_Lowercase_Mapping} property. |
| 462 | The value of this property is a single character. | 483 | The value of this property is a single character. For unassigned |
| 484 | codepoints, the value is @code{nil}, which means the character itself. | ||
| 463 | 485 | ||
| 464 | @item titlecase | 486 | @item titlecase |
| 465 | Corresponds to the Unicode @code{Simple_Titlecase_Mapping} property. | 487 | Corresponds to the Unicode @code{Simple_Titlecase_Mapping} property. |
| 466 | @dfn{Title case} is a special form of a character used when the first | 488 | @dfn{Title case} is a special form of a character used when the first |
| 467 | character of a word needs to be capitalized. The value of this | 489 | character of a word needs to be capitalized. The value of this |
| 468 | property is a single character. | 490 | property is a single character. For unassigned codepoints, the value |
| 491 | is @code{nil}, which means the character itself. | ||
| 469 | @end table | 492 | @end table |
| 470 | 493 | ||
| 471 | @defun get-char-code-property char propname | 494 | @defun get-char-code-property char propname |