author Gerd Moellmann 2000-05-11 15:44:54 +0000
committer Gerd Moellmann 2000-05-11 15:44:54 +0000
commit 0ace421a2d9e1f69f139c3316df662a541acbd67
tree 37db4c604f04574142f0c4a5d9b2966ec968c202
parent 796184bc2047de12f0cfe7ae178be236f5a0256a
*** empty log message ***
-rw-r--r-- lispref/nonascii.texi 138
-rw-r--r-- src/ChangeLog 3
2 files changed, 54 insertions, 87 deletions
diff --git a/lispref/nonascii.texi b/lispref/nonascii.texi
index 149d0354c29..29d97d81acd 100644
--- a/lispref/nonascii.texi
+++ b/lispref/nonascii.texi
@@ -59,12 +59,13 @@ stored. The first byte of a multibyte character is always in the range
 character are always in the range 160 through 255 (octal 0240 through
 0377); these values are @dfn{trailing codes}.
 
- Some sequences of bytes do not form meaningful multibyte characters:
-for example, a single isolated byte in the range 128 through 255 is
-never meaningful. Such byte sequences are not entirely valid, and never
-appear in proper multibyte text (since that consists of a sequence of
-@emph{characters}); but they can appear as part of ``raw bytes''
-(@pxref{Explicit Encoding}).
+ Some sequences of bytes are not valid in multibyte text: for example,
+a single isolated byte in the range 128 through 159 is not allowed.
+But character codes 128 through 159 can appear in multibyte text,
+represented as two-byte sequences. None of the character codes 128
+through 255 normally appear in ordinary multibyte text, but they do
+appear in multibyte buffers and strings when you do explicit encoding
+and decoding (@pxref{Explicit Encoding}).
 
  In a buffer, the buffer-local value of the variable
 @code{enable-multibyte-characters} specifies the representation used.
@@ -237,10 +238,11 @@ If @var{string} is already a multibyte string, then the value is
 codes. The valid character codes for unibyte representation range from
 0 to 255---the values that can fit in one byte. The valid character
 codes for multibyte representation range from 0 to 524287, but not all
-values in that range are valid. In particular, the values 128 through
-255 are not legitimate in multibyte text (though they can occur in ``raw
-bytes''; @pxref{Explicit Encoding}). Only the @sc{ascii} codes 0
-through 127 are fully legitimate in both representations.
+values in that range are valid. The values 128 through 255 are not
+really proper in multibyte text, but they can occur if you do explicit
+encoding and decoding (@pxref{Explicit Encoding}). Some other character
+codes cannot occur at all in multibyte text. Only the @sc{ascii} codes
+0 through 127 are truly legitimate in both representations.
 
 @defun char-valid-p charcode
 This returns @code{t} if @var{charcode} is valid for either one of the two
@@ -410,17 +412,9 @@ is non-@code{nil}, then each character in the region is translated
 through this table, and the value returned describes the translated
 characters instead of the characters actually in the buffer.
 
-In two peculiar cases, the value includes the symbol @code{unknown}:
-
-@itemize @bullet
-@item
-When a unibyte buffer contains non-@sc{ascii} characters.
-
-@item
-When a multibyte buffer contains invalid byte-sequences (raw bytes).
-@xref{Explicit Encoding}.
-@end itemize
-@end defun
+When a buffer contains non-@sc{ascii} characters, codes 128 through 255,
+they are assigned the character set @code{unknown}. @xref{Explicit
+Encoding}.
 
 @defun find-charset-string string &optional translation
 This function returns a list of the character sets that appear in the
@@ -690,7 +684,7 @@ encode all the character sets in the list @var{charsets}.
 
 @defun detect-coding-region start end &optional highest
 This function chooses a plausible coding system for decoding the text
-from @var{start} to @var{end}. This text should be ``raw bytes''
+from @var{start} to @var{end}. This text should be a byte sequence
 (@pxref{Explicit Encoding}).
 
 Normally this function returns a list of coding systems that could
@@ -923,90 +917,59 @@ ability to use a coding system to encode or decode the text.
 You can also explicitly encode and decode text using the functions
 in this section.
 
-@cindex raw bytes
  The result of encoding, and the input to decoding, are not ordinary
-text. They are ``raw bytes''---bytes that represent text in the same
-way that an external file would. When a buffer contains raw bytes, it
-is most natural to mark that buffer as using unibyte representation,
-using @code{set-buffer-multibyte} (@pxref{Selecting a Representation}),
-but this is not required. If the buffer's contents are only temporarily
-raw, leave the buffer multibyte, which will be correct after you decode
-them.
-
- The usual way to get raw bytes in a buffer, for explicit decoding, is
-to read them from a file with @code{insert-file-contents-literally}
-(@pxref{Reading from Files}) or specify a non-@code{nil} @var{rawfile}
-argument when visiting a file with @code{find-file-noselect}.
+text. They logically consist of a series of byte values; that is, a
+series of characters whose codes are in the range 0 through 255. In a
+multibyte buffer or string, character codes 128 through 159 are
+represented by multibyte sequences, but this is invisible to Lisp
+programs.
+
+ The usual way to read a file into a buffer as a sequence of bytes, so
+you can decode the contents explicitly, is with
+@code{insert-file-contents-literally} (@pxref{Reading from Files});
+alternatively, specify a non-@code{nil} @var{rawfile} argument when
+visiting a file with @code{find-file-noselect}. These methods result in
+a unibyte buffer.
 
- The usual way to use the raw bytes that result from explicitly
-encoding text is to copy them to a file or process---for example, to
-write them with @code{write-region} (@pxref{Writing to Files}), and
-suppress encoding for that @code{write-region} call by binding
-@code{coding-system-for-write} to @code{no-conversion}.
-
- Raw bytes typically contain stray individual bytes with values in the
-range 128 through 255, that are legitimate only as part of multibyte
-sequences. Even if the buffer is multibyte, Emacs treats each such
-individual byte as a character and uses the byte value as its character
-code. In this way, character codes 128 through 255 can be found in a
-multibyte buffer, even though they are not legitimate multibyte
-character codes.
-
- Raw bytes sometimes contain overlong byte-sequences that look like a
-proper multibyte character plus extra superfluous trailing codes. For
-most purposes, Emacs treats such a sequence in a buffer or string as a
-single character, and if you look at its character code, you get the
-value that corresponds to the multibyte character
-sequence---disregarding the extra trailing codes. This is not quite
-clean, but raw bytes are used only in limited ways, so as a practical
-matter it is not worth the trouble to treat this case differently.
-
- When a multibyte buffer contains illegitimate byte sequences,
-sometimes insertion or deletion can cause them to coalesce into a
-legitimate multibyte character. For example, suppose the buffer
-contains the sequence 129 68 192, 68 being the character @samp{D}. If
-you delete the @samp{D}, the bytes 129 and 192 become adjacent, and thus
-become one multibyte character (Latin-1 A with grave accent). Point
-moves to one side or the other of the character, since it cannot be
-within a character. Don't be alarmed by this.
-
- Some really peculiar situations prevent proper coalescence. For
-example, if you narrow the buffer so that the accessible portion begins
-just before the @samp{D}, then delete the @samp{D}, the two surrounding
-bytes cannot coalesce because one of them is outside the accessible
-portion of the buffer. In this case, the deletion cannot be done, so
-@code{delete-region} signals an error.
+ The usual way to use the byte sequence that results from explicitly
+encoding text is to copy it to a file or process---for example, to write
+it with @code{write-region} (@pxref{Writing to Files}), and suppress
+encoding by binding @code{coding-system-for-write} to
+@code{no-conversion}.
 
  Here are the functions to perform explicit encoding or decoding. The
-decoding functions produce ``raw bytes''; the encoding functions are
-meant to operate on ``raw bytes''. All of these functions discard text
-properties.
+decoding functions produce sequences of bytes; the encoding functions
+are meant to operate on sequences of bytes. All of these functions
+discard text properties.
 
 @defun encode-coding-region start end coding-system
 This function encodes the text from @var{start} to @var{end} according
 to coding system @var{coding-system}. The encoded text replaces the
-original text in the buffer. The result of encoding is ``raw bytes,''
-but the buffer remains multibyte if it was multibyte before.
+original text in the buffer. The result of encoding is logically a
+sequence of bytes, but the buffer remains multibyte if it was multibyte
+before.
 @end defun
 
 @defun encode-coding-string string coding-system
 This function encodes the text in @var{string} according to coding
 system @var{coding-system}. It returns a new string containing the
-encoded text. The result of encoding is a unibyte string of ``raw bytes.''
+encoded text. The result of encoding is a unibyte string.
 @end defun
 
 @defun decode-coding-region start end coding-system
 This function decodes the text from @var{start} to @var{end} according
 to coding system @var{coding-system}. The decoded text replaces the
 original text in the buffer. To make explicit decoding useful, the text
-before decoding ought to be ``raw bytes.''
+before decoding ought to be a sequence of byte values, but both
+multibyte and unibyte buffers are acceptable.
 @end defun
 
 @defun decode-coding-string string coding-system
 This function decodes the text in @var{string} according to coding
 system @var{coding-system}. It returns a new string containing the
 decoded text. To make explicit decoding useful, the contents of
-@var{string} ought to be ``raw bytes.''
+@var{string} ought to be a sequence of byte values, but a multibyte
+string is acceptable.
 @end defun
 
 @node Terminal I/O Encoding
@@ -1051,7 +1014,7 @@ that means do not encode terminal output.
 
  On MS-DOS and Microsoft Windows, Emacs guesses the appropriate
 end-of-line conversion for a file by looking at the file's name. This
-feature classifies fils as @dfn{text files} and @dfn{binary files}. By
+feature classifies files as @dfn{text files} and @dfn{binary files}. By
 ``binary file'' we mean a file of literal byte values that are not
 necessarily meant to be characters; Emacs does no end-of-line conversion
 and no character code conversion for them. On the other hand, the bytes
@@ -1157,14 +1120,14 @@ Here @var{input-method} is the input method name, a string;
 environment this input method is recommended for. (That serves only for
 documentation purposes.)
 
-@var{title} is a string to display in the mode line while this method is
-active. @var{description} is a string describing this method and what
-it is good for.
-
 @var{activate-func} is a function to call to activate this method. The
 @var{args}, if any, are passed as arguments to @var{activate-func}. All
 told, the arguments to @var{activate-func} are @var{input-method} and
 the @var{args}.
+
+@var{title} is a string to display in the mode line while this method is
+active. @var{description} is a string describing this method and what
+it is good for.
 @end defvar
 
  The fundamental interface to input methods is through the
@@ -1202,3 +1165,4 @@ Changing the locale can cause messages to appear according to the
 conventions of a different language. If the variable is @code{nil}, the
 locale is specified by environment variables in the usual POSIX fashion.
 @end defvar
+
diff --git a/src/ChangeLog b/src/ChangeLog
index 3403e9f14f0..0d4d19feeef 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,5 +1,8 @@
 2000-05-11 Gerd Moellmann <gerd@gnu.org>
 
+	* xdisp.c (add_to_log): Don't pass the terminating NUL byte
+	of the message to message_dolog.
+
 	* keyboard.c (read_char): Don't clear current message for help
 	events; let the code handling help events handle this. Change
 	code detecting help events that should be ignored.