author Eli Zaretskii 2011-08-23 17:45:14 +0300
committer Eli Zaretskii 2011-08-23 17:45:14 +0300
commit bca633fb296b17c0e86d589c50fb3414b361e0b3
tree 1b1e93f6017f7614f6aa950fa78ced1249a99b99 /doc
parent 4a5885a74a3310ed4f4ba86eee3c406019b2c334
Followup for character properties in 2011-08-23T11:48:07Z!handa@m17n.org.

src/bidi.c (bidi_get_type): Abort if we get zero as the bidi type of a character.
admin/unidata/unidata-gen.el (unidata-prop-alist): Update the default values of bidi-class according to DerivedBidiClass.txt from the latest UCD.
lisp/international/uni-bidi.el: Regenerated.
doc/lispref/nonascii.texi (Character Properties): Document the values for unassigned codepoints.
Diffstat (limited to 'doc')
-rw-r--r-- doc/lispref/ChangeLog 5
-rw-r--r-- doc/lispref/nonascii.texi 53
2 files changed, 43 insertions, 15 deletions
diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog
index 4cb4d0a6f50..43add469ec0 100644
--- a/doc/lispref/ChangeLog
+++ b/doc/lispref/ChangeLog
@@ -1,3 +1,8 @@
+2011-08-23 Eli Zaretskii <eliz@gnu.org>
+
+	* nonascii.texi (Character Properties): Document the values for
+	unassigned codepoints.
+
 2011-08-18 Eli Zaretskii <eliz@gnu.org>
 
 	* nonascii.texi (Character Properties): Document use of
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi
index 7b6d665b2ac..298c7c3d1a8 100644
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
@@ -369,6 +369,12 @@ replacing each @samp{_} character with a dash @samp{-}. For example,
 @code{canonical-combining-class}. However, sometimes we shorten the
 names to make their use easier.
 
+@cindex unassigned character codepoints
+ Some codepoints are left @dfn{unassigned} by the
+@acronym{UCD}---they don't correspond to any character. The Unicode
+Standard defines default values of properties for such codepoints;
+they are mentioned below for each property.
+
  Here is the full list of value types for all the character
 properties that Emacs knows about:
 
@@ -376,24 +382,31 @@ properties that Emacs knows about:
 @item name
 Corresponds to the @code{Name} Unicode property. The value is a
 string consisting of upper-case Latin letters A to Z, digits, spaces,
-and hyphen @samp{-} characters.
+and hyphen @samp{-} characters. For unassigned codepoints, the value
+is an empty string.
 
 @cindex unicode general category
 @item general-category
 Corresponds to the @code{General_Category} Unicode property. The
 value is a symbol whose name is a 2-letter abbreviation of the
-character's classification.
+character's classification. For unassigned codepoints, the value
+is @code{Cn}.
 
 @item canonical-combining-class
 Corresponds to the @code{Canonical_Combining_Class} Unicode property.
-The value is an integer number.
+The value is an integer number. For unassigned codepoints, the value
+is zero.
 
 @cindex bidirectional class of characters
 @item bidi-class
 Corresponds to the Unicode @code{Bidi_Class} property. The value is a
 symbol whose name is the Unicode @dfn{directional type} of the
 character. Emacs uses this property when it reorders bidirectional
-text for display (@pxref{Bidirectional Display}).
+text for display (@pxref{Bidirectional Display}). For unassigned
+codepoints, the value depends on the code blocks to which the
+codepoint belongs: most unassigned codepoints get the value of
+@code{L} (strong L), but some get values of @code{AL} (Arabic letter)
+or @code{R} (strong R).
 
 @item decomposition
 Corresponds to the Unicode @code{Decomposition_Type} and
@@ -405,19 +418,22 @@ Note that the Unicode spec writes these tag names inside
 brackets; e.g., Unicode specifies @samp{<small>} where Emacs uses
 @samp{small}.
 }; the other elements are characters that give the compatibility
-decomposition sequence of this character.
+decomposition sequence of this character. For unassigned codepoints,
+the value is the character itself.
 
 @item decimal-digit-value
 Corresponds to the Unicode @code{Numeric_Value} property for
 characters whose @code{Numeric_Type} is @samp{Digit}. The value is an
-integer number.
+integer number. For unassigned codepoints, the value is @code{nil},
+which means @acronym{NaN}, or ``not-a-number''.
 
 @item digit-value
 Corresponds to the Unicode @code{Numeric_Value} property for
 characters whose @code{Numeric_Type} is @samp{Decimal}. The value is
 an integer number. Examples of such characters include compatibility
 subscript and superscript digits, for which the value is the
-corresponding number.
+corresponding number. For unassigned codepoints, the value is
+@code{nil}, which means @acronym{NaN}.
 
 @item numeric-value
 Corresponds to the Unicode @code{Numeric_Value} property for
@@ -426,12 +442,15 @@ this property is an integer or a floating-point number. Examples of
 characters that have this property include fractions, subscripts,
 superscripts, Roman numerals, currency numerators, and encircled
 numbers. For example, the value of this property for the character
-@code{U+2155} (@sc{vulgar fraction one fifth}) is @code{0.2}.
+@code{U+2155} (@sc{vulgar fraction one fifth}) is @code{0.2}. For
+unassigned codepoints, the value is @code{nil}, which means
+@acronym{NaN}.
 
 @cindex mirroring of characters
 @item mirrored
 Corresponds to the Unicode @code{Bidi_Mirrored} property. The value
-of this property is a symbol, either @code{Y} or @code{N}.
+of this property is a symbol, either @code{Y} or @code{N}. For
+unassigned codepoints, the value is @code{N}.
 
 @item mirroring
 Corresponds to the Unicode @code{Bidi_Mirroring_Glyph} property. The
@@ -443,29 +462,33 @@ property; however, some characters whose @code{mirrored} property is
 @code{Y} also have @code{nil} for @code{mirroring}, because no
 appropriate characters exist with mirrored glyphs. Emacs uses this
 property to display mirror images of characters when appropriate
-(@pxref{Bidirectional Display}).
+(@pxref{Bidirectional Display}). For unassigned codepoints, the value
+is @code{nil}.
 
 @item old-name
 Corresponds to the Unicode @code{Unicode_1_Name} property. The value
-is a string.
+is a string. For unassigned codepoints, the value is an empty string.
 
 @item iso-10646-comment
 Corresponds to the Unicode @code{ISO_Comment} property. The value is
-a string.
+a string. For unassigned codepoints, the value is an empty string.
 
 @item uppercase
 Corresponds to the Unicode @code{Simple_Uppercase_Mapping} property.
-The value of this property is a single character.
+The value of this property is a single character. For unassigned
+codepoints, the value is @code{nil}, which means the character itself.
 
 @item lowercase
 Corresponds to the Unicode @code{Simple_Lowercase_Mapping} property.
-The value of this property is a single character.
+The value of this property is a single character. For unassigned
+codepoints, the value is @code{nil}, which means the character itself.
 
 @item titlecase
 Corresponds to the Unicode @code{Simple_Titlecase_Mapping} property.
 @dfn{Title case} is a special form of a character used when the first
 character of a word needs to be capitalized. The value of this
-property is a single character.
+property is a single character. For unassigned codepoints, the value
+is @code{nil}, which means the character itself.
 @end table
 
 @defun get-char-code-property char propname
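
Reviewer note: the defaults documented in this patch come from the Unicode Character Database, so several of them can be cross-checked outside Emacs. The sketch below is only an illustration under stated assumptions: it uses Python's standard `unicodedata` module rather than `get-char-code-property`, the helper names `first_unassigned` and `unassigned_defaults` are hypothetical, and the dictionary keys merely borrow the Emacs property names. Python reports "no value" in its own way (an empty string or a caller-supplied default), so this does not show what Emacs itself returns.

```python
import unicodedata


def first_unassigned(start, end):
    """Return the first codepoint in [start, end) whose General_Category is Cn.

    Hypothetical helper: scanning avoids hard-coding a codepoint that a
    later Unicode version might assign.
    """
    for cp in range(start, end):
        if unicodedata.category(chr(cp)) == "Cn":
            return cp
    raise ValueError("no unassigned codepoint in the given range")


def unassigned_defaults(cp):
    """Collect a few UCD-derived values for cp, keyed by Emacs property names."""
    ch = chr(cp)
    return {
        # unicodedata.name raises ValueError without a default; "" stands
        # in for the "empty string" default described in the patch.
        "name": unicodedata.name(ch, ""),
        "general-category": unicodedata.category(ch),
        "canonical-combining-class": unicodedata.combining(ch),
        # None plays the role of nil (not-a-number) here.
        "numeric-value": unicodedata.numeric(ch, None),
        "mirrored": "Y" if unicodedata.mirrored(ch) else "N",
        # Characters without a case mapping map to themselves.
        "uppercase": ch.upper(),
        "lowercase": ch.lower(),
    }


if __name__ == "__main__":
    # The Greek and Coptic block contains reserved gaps, so the scan
    # is expected to find an unassigned codepoint there.
    cp = first_unassigned(0x0370, 0x0400)
    for prop, value in unassigned_defaults(cp).items():
        print(f"U+{cp:04X} {prop}: {value!r}")
```

The results depend on the Unicode version bundled with the Python interpreter, which may differ from the UCD snapshot used to regenerate the Emacs tables.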