author Paul Eggert 2016-04-21 19:26:34 -0700
committer Paul Eggert 2016-04-21 19:29:41 -0700
commit bd1c7ca67e7429e07f78d4ff49163fd7a67a6765
tree 941d5cf573be2a4588468b3a315c0c6cb47e2c97 /doc
parent e7cb38edc946ff60c1c878b30b068376d6ef56d2
Improve character name escapes

* doc/lispref/nonascii.texi (Character Properties): Avoid duplication of Unicode names. Reformat examples to fit in narrow pages.
* doc/lispref/objects.texi (General Escape Syntax): Simplify and better-organize explanation of \N{...} escapes.
* src/character.h (CHAR_SURROGATE_PAIR_P): Remove; unused.
(char_surrogate_p): New inline function.
* src/lread.c: Do not include string.h; no longer needed.
(invalid_character_name, check_scalar_value): Remove; the ideas behind these functions are now bundled into character_name_to_code.
(character_name_to_code): Remove undocumented support for "CJK IDEOGRAPH-XXXX" names, as "U+XXXX" suffices. Reject monstrosities like "\N{U+-0}" and null bytes in \N escapes. Reject floating point in \N escapes instead of returning garbage. Use AUTO_STRING_WITH_LEN to lessen pressure on the garbage collector.
* test/src/lread-tests.el (lread-char-number, lread-char-name)
(lread-string-char-number, lread-string-char-name): Test runtime behavior, not compile-time, as the test framework is not set up to test compile-time.
(lread-char-surrogate-1, lread-char-surrogate-2)
(lread-char-surrogate-3, lread-char-surrogate-4)
(lread-string-char-number-2, lread-string-char-number-3): New tests.
(lread-string-char-number-1): Rename from lread-string-char-number.

Diffstat (limited to 'doc')
-rw-r--r-- doc/lispref/nonascii.texi 15
-rw-r--r-- doc/lispref/objects.texi 52
2 files changed, 35 insertions, 32 deletions
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi
index 66ad9aca71e..0e4aa86e48b 100644
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
@@ -622,18 +622,21 @@ This function returns the value of @var{char}'s @var{propname} property.
      @result{} Nd
 @end group
 @group
-;; U+2084 SUBSCRIPT FOUR
-(get-char-code-property ?\u2084 'digit-value)
+;; U+2084
+(get-char-code-property ?\N@{SUBSCRIPT FOUR@}
+                        'digit-value)
      @result{} 4
 @end group
 @group
-;; U+2155 VULGAR FRACTION ONE FIFTH
-(get-char-code-property ?\u2155 'numeric-value)
+;; U+2155
+(get-char-code-property ?\N@{VULGAR FRACTION ONE FIFTH@}
+                        'numeric-value)
      @result{} 0.2
 @end group
 @group
-;; U+2163 ROMAN NUMERAL FOUR
-(get-char-code-property ?\N@{ROMAN NUMERAL FOUR@} 'numeric-value)
+;; U+2163
+(get-char-code-property ?\N@{ROMAN NUMERAL FOUR@}
+                        'numeric-value)
      @result{} 4
 @end group
 @group
diff --git a/doc/lispref/objects.texi b/doc/lispref/objects.texi
index 96b334d2b81..54894b8e24e 100644
--- a/doc/lispref/objects.texi
+++ b/doc/lispref/objects.texi
@@ -353,25 +353,32 @@ following text.)
 control characters, Emacs provides several types of escape syntax that
 you can use to specify non-@acronym{ASCII} text characters.
 
+@enumerate
+@item
 @cindex @samp{\} in character constant
 @cindex backslash in character constants
 @cindex unicode character escape
-  Firstly, you can specify characters by their Unicode values.
-@code{?\u@var{nnnn}} represents a character with Unicode code point
-@samp{U+@var{nnnn}}, where @var{nnnn} is (by convention) a hexadecimal
-number with exactly four digits. The backslash indicates that the
-subsequent characters form an escape sequence, and the @samp{u}
-specifies a Unicode escape sequence.
+You can specify characters by their Unicode names, if any.
+@code{?\N@{@var{NAME}@}} represents the Unicode character named
+@var{NAME}. Thus, @samp{?\N@{LATIN SMALL LETTER A WITH GRAVE@}} is
+equivalent to @code{?à} and denotes the Unicode character U+00E0. To
+simplify entering multi-line strings, you can replace spaces in the
+names by non-empty sequences of whitespace (e.g., newlines).
 
-  There is a slightly different syntax for specifying Unicode
-characters with code points higher than @code{U+@var{ffff}}:
-@code{?\U00@var{nnnnnn}} represents the character with code point
-@samp{U+@var{nnnnnn}}, where @var{nnnnnn} is a six-digit hexadecimal
-number. The Unicode Standard only defines code points up to
-@samp{U+@var{10ffff}}, so if you specify a code point higher than
-that, Emacs signals an error.
-
-  Secondly, you can specify characters by their hexadecimal character
+@item
+You can specify characters by their Unicode values.
+@code{?\N@{U+@var{X}@}} represents a character with Unicode code point
+@var{X}, where @var{X} is a hexadecimal number. Also,
+@code{?\u@var{xxxx}} and @code{?\U@var{xxxxxxxx}} represent code
+points @var{xxxx} and @var{xxxxxxxx}, respectively, where each @var{x}
+is a single hexadecimal digit. For example, @code{?\N@{U+E0@}},
+@code{?\u00e0} and @code{?\U000000E0} are all equivalent to @code{?à}
+and to @samp{?\N@{LATIN SMALL LETTER A WITH GRAVE@}}. The Unicode
+Standard defines code points only up to @samp{U+@var{10ffff}}, so if
+you specify a code point higher than that, Emacs signals an error.
+
+@item
+You can specify characters by their hexadecimal character
 codes. A hexadecimal escape sequence consists of a backslash,
 @samp{x}, and the hexadecimal character code. Thus, @samp{?\x41} is
 the character @kbd{A}, @samp{?\x1} is the character @kbd{C-a}, and
@@ -379,23 +386,16 @@ the character @kbd{A}, @samp{?\x1} is the character @kbd{C-a}, and
 You can use any number of hex digits, so you can represent any
 character code in this way.
 
+@item
 @cindex octal character code
-  Thirdly, you can specify characters by their character code in
+You can specify characters by their character code in
 octal. An octal escape sequence consists of a backslash followed by
 up to three octal digits; thus, @samp{?\101} for the character
 @kbd{A}, @samp{?\001} for the character @kbd{C-a}, and @code{?\002}
 for the character @kbd{C-b}. Only characters up to octal code 777 can
 be specified this way.
 
-  Fourthly, you can specify characters by their name. A character
-name escape sequence consists of a backslash, @samp{N@{}, the Unicode
-character name, and @samp{@}}. Alternatively, you can also put the
-numeric code point value between the braces, using the syntax
-@samp{\N@{U+nnnn@}}, where @samp{nnnn} denotes between one and eight
-hexadecimal digits. Thus, @samp{?\N@{LATIN CAPITAL LETTER A@}} and
-@samp{?\N@{U+41@}} both denote the character @kbd{A}. To simplify
-entering multi-line strings, you can replace spaces in the character
-names by arbitrary non-empty sequence of whitespace (e.g., newlines).
+@end enumerate
 
   These escape sequences may also be used in strings. @xref{Non-ASCII
 in Strings}.
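
As a rough illustration of the escapes described in the revised objects.texi text, the following Emacs Lisp sketch compares the equivalent spellings of U+00E0 and then probes one of the malformed escapes that the commit message says is now rejected. It is not part of the commit. It assumes an Emacs build whose reader includes this change, and it deliberately states no expected results.

```elisp
;; Sketch only: assumes an Emacs whose reader includes this change.
;; Per the revised manual text, these forms all denote U+00E0.
(let ((chars (list ?\N{LATIN SMALL LETTER A WITH GRAVE}
                   ?\N{U+E0}
                   ?\u00e0
                   ?\U000000E0)))
  ;; `=' accepts any number of arguments; a non-nil result means
  ;; every form produced the same code point.
  (apply #'= chars))

;; The commit message says malformed escapes such as \N{U+-0} are
;; rejected.  Reading from a string defers any failure to run time,
;; where `condition-case' can observe it instead of aborting the load.
(condition-case err
    (read "?\\N{U+-0}")
  (error (list 'rejected (car err))))
```

Reading the malformed escape from a string follows the approach the commit message describes for the updated tests, which check runtime rather than compile-time behavior.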