| author | Gerd Moellmann | 2000-05-11 15:44:54 +0000 |
|---|---|---|
| committer | Gerd Moellmann | 2000-05-11 15:44:54 +0000 |
| commit | 0ace421a2d9e1f69f139c3316df662a541acbd67 (patch) | |
| tree | 37db4c604f04574142f0c4a5d9b2966ec968c202 | |
| parent | 796184bc2047de12f0cfe7ae178be236f5a0256a (diff) | |
*** empty log message ***
| -rw-r--r-- | lispref/nonascii.texi | 138 | ||||
| -rw-r--r-- | src/ChangeLog | 3 |
2 files changed, 54 insertions, 87 deletions
diff --git a/lispref/nonascii.texi b/lispref/nonascii.texi index 149d0354c29..29d97d81acd 100644 --- a/lispref/nonascii.texi +++ b/lispref/nonascii.texi | |||
| @@ -59,12 +59,13 @@ stored. The first byte of a multibyte character is always in the range | |||
| 59 | character are always in the range 160 through 255 (octal 0240 through | 59 | character are always in the range 160 through 255 (octal 0240 through |
| 60 | 0377); these values are @dfn{trailing codes}. | 60 | 0377); these values are @dfn{trailing codes}. |
| 61 | 61 | ||
| 62 | Some sequences of bytes do not form meaningful multibyte characters: | 62 | Some sequences of bytes are not valid in multibyte text: for example, |
| 63 | for example, a single isolated byte in the range 128 through 255 is | 63 | a single isolated byte in the range 128 through 159 is not allowed. |
| 64 | never meaningful. Such byte sequences are not entirely valid, and never | 64 | But character codes 128 through 159 can appear in multibyte text, |
| 65 | appear in proper multibyte text (since that consists of a sequence of | 65 | represented as two-byte sequences. None of the character codes 128 |
| 66 | @emph{characters}); but they can appear as part of ``raw bytes'' | 66 | through 255 normally appear in ordinary multibyte text, but they do |
| 67 | (@pxref{Explicit Encoding}). | 67 | appear in multibyte buffers and strings when you do explicit encoding |
| 68 | and decoding (@pxref{Explicit Encoding}). | ||
| 68 | 69 | ||
| 69 | In a buffer, the buffer-local value of the variable | 70 | In a buffer, the buffer-local value of the variable |
| 70 | @code{enable-multibyte-characters} specifies the representation used. | 71 | @code{enable-multibyte-characters} specifies the representation used. |
| @@ -237,10 +238,11 @@ If @var{string} is already a multibyte string, then the value is | |||
| 237 | codes. The valid character codes for unibyte representation range from | 238 | codes. The valid character codes for unibyte representation range from |
| 238 | 0 to 255---the values that can fit in one byte. The valid character | 239 | 0 to 255---the values that can fit in one byte. The valid character |
| 239 | codes for multibyte representation range from 0 to 524287, but not all | 240 | codes for multibyte representation range from 0 to 524287, but not all |
| 240 | values in that range are valid. In particular, the values 128 through | 241 | values in that range are valid. The values 128 through 255 are not |
| 241 | 255 are not legitimate in multibyte text (though they can occur in ``raw | 242 | really proper in multibyte text, but they can occur if you do explicit |
| 242 | bytes''; @pxref{Explicit Encoding}). Only the @sc{ascii} codes 0 | 243 | encoding and decoding (@pxref{Explicit Encoding}). Some other character |
| 243 | through 127 are fully legitimate in both representations. | 244 | codes cannot occur at all in multibyte text. Only the @sc{ascii} codes |
| 245 | 0 through 127 are truly legitimate in both representations. | ||
| 244 | 246 | ||
| 245 | @defun char-valid-p charcode | 247 | @defun char-valid-p charcode |
| 246 | This returns @code{t} if @var{charcode} is valid for either one of the two | 248 | This returns @code{t} if @var{charcode} is valid for either one of the two |
| @@ -410,17 +412,9 @@ is non-@code{nil}, then each character in the region is translated | |||
| 410 | through this table, and the value returned describes the translated | 412 | through this table, and the value returned describes the translated |
| 411 | characters instead of the characters actually in the buffer. | 413 | characters instead of the characters actually in the buffer. |
| 412 | 414 | ||
| 413 | In two peculiar cases, the value includes the symbol @code{unknown}: | 415 | When a buffer contains non-@sc{ascii} characters, codes 128 through 255, |
| 414 | 416 | they are assigned the character set @code{unknown}. @xref{Explicit | |
| 415 | @itemize @bullet | 417 | Encoding}. |
| 416 | @item | ||
| 417 | When a unibyte buffer contains non-@sc{ascii} characters. | ||
| 418 | |||
| 419 | @item | ||
| 420 | When a multibyte buffer contains invalid byte-sequences (raw bytes). | ||
| 421 | @xref{Explicit Encoding}. | ||
| 422 | @end itemize | ||
| 423 | @end defun | ||
| 424 | 418 | ||
| 425 | @defun find-charset-string string &optional translation | 419 | @defun find-charset-string string &optional translation |
| 426 | This function returns a list of the character sets that appear in the | 420 | This function returns a list of the character sets that appear in the |
| @@ -690,7 +684,7 @@ encode all the character sets in the list @var{charsets}. | |||
| 690 | 684 | ||
| 691 | @defun detect-coding-region start end &optional highest | 685 | @defun detect-coding-region start end &optional highest |
| 692 | This function chooses a plausible coding system for decoding the text | 686 | This function chooses a plausible coding system for decoding the text |
| 693 | from @var{start} to @var{end}. This text should be ``raw bytes'' | 687 | from @var{start} to @var{end}. This text should be a byte sequence |
| 694 | (@pxref{Explicit Encoding}). | 688 | (@pxref{Explicit Encoding}). |
| 695 | 689 | ||
| 696 | Normally this function returns a list of coding systems that could | 690 | Normally this function returns a list of coding systems that could |
| @@ -923,90 +917,59 @@ ability to use a coding system to encode or decode the text. | |||
| 923 | You can also explicitly encode and decode text using the functions | 917 | You can also explicitly encode and decode text using the functions |
| 924 | in this section. | 918 | in this section. |
| 925 | 919 | ||
| 926 | @cindex raw bytes | ||
| 927 | The result of encoding, and the input to decoding, are not ordinary | 920 | The result of encoding, and the input to decoding, are not ordinary |
| 928 | text. They are ``raw bytes''---bytes that represent text in the same | 921 | text. They logically consist of a series of byte values; that is, a |
| 929 | way that an external file would. When a buffer contains raw bytes, it | 922 | series of characters whose codes are in the range 0 through 255. In a |
| 930 | is most natural to mark that buffer as using unibyte representation, | 923 | multibyte buffer or string, character codes 128 through 159 are |
| 931 | using @code{set-buffer-multibyte} (@pxref{Selecting a Representation}), | 924 | represented by multibyte sequences, but this is invisible to Lisp |
| 932 | but this is not required. If the buffer's contents are only temporarily | 925 | programs. |
| 933 | raw, leave the buffer multibyte, which will be correct after you decode | 926 | |
| 934 | them. | 927 | The usual way to read a file into a buffer as a sequence of bytes, so |
| 935 | 928 | you can decode the contents explicitly, is with | |
| 936 | The usual way to get raw bytes in a buffer, for explicit decoding, is | 929 | @code{insert-file-contents-literally} (@pxref{Reading from Files}); |
| 937 | to read them from a file with @code{insert-file-contents-literally} | 930 | alternatively, specify a non-@code{nil} @var{rawfile} argument when |
| 938 | (@pxref{Reading from Files}) or specify a non-@code{nil} @var{rawfile} | 931 | visiting a file with @code{find-file-noselect}. These methods result in |
| 939 | argument when visiting a file with @code{find-file-noselect}. | 932 | a unibyte buffer. |
| 940 | 933 | ||
| 941 | The usual way to use the raw bytes that result from explicitly | 934 | The usual way to use the byte sequence that results from explicitly |
| 942 | encoding text is to copy them to a file or process---for example, to | 935 | encoding text is to copy it to a file or process---for example, to write |
| 943 | write them with @code{write-region} (@pxref{Writing to Files}), and | 936 | it with @code{write-region} (@pxref{Writing to Files}), and suppress |
| 944 | suppress encoding for that @code{write-region} call by binding | 937 | encoding by binding @code{coding-system-for-write} to |
| 945 | @code{coding-system-for-write} to @code{no-conversion}. | 938 | @code{no-conversion}. |
| 946 | |||
| 947 | Raw bytes typically contain stray individual bytes with values in the | ||
| 948 | range 128 through 255, that are legitimate only as part of multibyte | ||
| 949 | sequences. Even if the buffer is multibyte, Emacs treats each such | ||
| 950 | individual byte as a character and uses the byte value as its character | ||
| 951 | code. In this way, character codes 128 through 255 can be found in a | ||
| 952 | multibyte buffer, even though they are not legitimate multibyte | ||
| 953 | character codes. | ||
| 954 | |||
| 955 | Raw bytes sometimes contain overlong byte-sequences that look like a | ||
| 956 | proper multibyte character plus extra superfluous trailing codes. For | ||
| 957 | most purposes, Emacs treats such a sequence in a buffer or string as a | ||
| 958 | single character, and if you look at its character code, you get the | ||
| 959 | value that corresponds to the multibyte character | ||
| 960 | sequence---disregarding the extra trailing codes. This is not quite | ||
| 961 | clean, but raw bytes are used only in limited ways, so as a practical | ||
| 962 | matter it is not worth the trouble to treat this case differently. | ||
| 963 | |||
| 964 | When a multibyte buffer contains illegitimate byte sequences, | ||
| 965 | sometimes insertion or deletion can cause them to coalesce into a | ||
| 966 | legitimate multibyte character. For example, suppose the buffer | ||
| 967 | contains the sequence 129 68 192, 68 being the character @samp{D}. If | ||
| 968 | you delete the @samp{D}, the bytes 129 and 192 become adjacent, and thus | ||
| 969 | become one multibyte character (Latin-1 A with grave accent). Point | ||
| 970 | moves to one side or the other of the character, since it cannot be | ||
| 971 | within a character. Don't be alarmed by this. | ||
| 972 | |||
| 973 | Some really peculiar situations prevent proper coalescence. For | ||
| 974 | example, if you narrow the buffer so that the accessible portion begins | ||
| 975 | just before the @samp{D}, then delete the @samp{D}, the two surrounding | ||
| 976 | bytes cannot coalesce because one of them is outside the accessible | ||
| 977 | portion of the buffer. In this case, the deletion cannot be done, so | ||
| 978 | @code{delete-region} signals an error. | ||
| 979 | 939 | ||
| 980 | Here are the functions to perform explicit encoding or decoding. The | 940 | Here are the functions to perform explicit encoding or decoding. The |
| 981 | decoding functions produce ``raw bytes''; the encoding functions are | 941 | decoding functions produce sequences of bytes; the encoding functions |
| 982 | meant to operate on ``raw bytes''. All of these functions discard text | 942 | are meant to operate on sequences of bytes. All of these functions |
| 983 | properties. | 943 | discard text properties. |
| 984 | 944 | ||
| 985 | @defun encode-coding-region start end coding-system | 945 | @defun encode-coding-region start end coding-system |
| 986 | This function encodes the text from @var{start} to @var{end} according | 946 | This function encodes the text from @var{start} to @var{end} according |
| 987 | to coding system @var{coding-system}. The encoded text replaces the | 947 | to coding system @var{coding-system}. The encoded text replaces the |
| 988 | original text in the buffer. The result of encoding is ``raw bytes,'' | 948 | original text in the buffer. The result of encoding is logically a |
| 989 | but the buffer remains multibyte if it was multibyte before. | 949 | sequence of bytes, but the buffer remains multibyte if it was multibyte |
| 950 | before. | ||
| 990 | @end defun | 951 | @end defun |
| 991 | 952 | ||
| 992 | @defun encode-coding-string string coding-system | 953 | @defun encode-coding-string string coding-system |
| 993 | This function encodes the text in @var{string} according to coding | 954 | This function encodes the text in @var{string} according to coding |
| 994 | system @var{coding-system}. It returns a new string containing the | 955 | system @var{coding-system}. It returns a new string containing the |
| 995 | encoded text. The result of encoding is a unibyte string of ``raw bytes.'' | 956 | encoded text. The result of encoding is a unibyte string. |
| 996 | @end defun | 957 | @end defun |
| 997 | 958 | ||
| 998 | @defun decode-coding-region start end coding-system | 959 | @defun decode-coding-region start end coding-system |
| 999 | This function decodes the text from @var{start} to @var{end} according | 960 | This function decodes the text from @var{start} to @var{end} according |
| 1000 | to coding system @var{coding-system}. The decoded text replaces the | 961 | to coding system @var{coding-system}. The decoded text replaces the |
| 1001 | original text in the buffer. To make explicit decoding useful, the text | 962 | original text in the buffer. To make explicit decoding useful, the text |
| 1002 | before decoding ought to be ``raw bytes.'' | 963 | before decoding ought to be a sequence of byte values, but both |
| 964 | multibyte and unibyte buffers are acceptable. | ||
| 1003 | @end defun | 965 | @end defun |
| 1004 | 966 | ||
| 1005 | @defun decode-coding-string string coding-system | 967 | @defun decode-coding-string string coding-system |
| 1006 | This function decodes the text in @var{string} according to coding | 968 | This function decodes the text in @var{string} according to coding |
| 1007 | system @var{coding-system}. It returns a new string containing the | 969 | system @var{coding-system}. It returns a new string containing the |
| 1008 | decoded text. To make explicit decoding useful, the contents of | 970 | decoded text. To make explicit decoding useful, the contents of |
| 1009 | @var{string} ought to be ``raw bytes.'' | 971 | @var{string} ought to be a sequence of byte values, but a multibyte |
| 972 | string is acceptable. | ||
| 1010 | @end defun | 973 | @end defun |
| 1011 | 974 | ||
| 1012 | @node Terminal I/O Encoding | 975 | @node Terminal I/O Encoding |
| @@ -1051,7 +1014,7 @@ that means do not encode terminal output. | |||
| 1051 | 1014 | ||
| 1052 | On MS-DOS and Microsoft Windows, Emacs guesses the appropriate | 1015 | On MS-DOS and Microsoft Windows, Emacs guesses the appropriate |
| 1053 | end-of-line conversion for a file by looking at the file's name. This | 1016 | end-of-line conversion for a file by looking at the file's name. This |
| 1054 | feature classifies fils as @dfn{text files} and @dfn{binary files}. By | 1017 | feature classifies files as @dfn{text files} and @dfn{binary files}. By |
| 1055 | ``binary file'' we mean a file of literal byte values that are not | 1018 | ``binary file'' we mean a file of literal byte values that are not |
| 1056 | necessarily meant to be characters; Emacs does no end-of-line conversion | 1019 | necessarily meant to be characters; Emacs does no end-of-line conversion |
| 1057 | and no character code conversion for them. On the other hand, the bytes | 1020 | and no character code conversion for them. On the other hand, the bytes |
| @@ -1157,14 +1120,14 @@ Here @var{input-method} is the input method name, a string; | |||
| 1157 | environment this input method is recommended for. (That serves only for | 1120 | environment this input method is recommended for. (That serves only for |
| 1158 | documentation purposes.) | 1121 | documentation purposes.) |
| 1159 | 1122 | ||
| 1160 | @var{title} is a string to display in the mode line while this method is | ||
| 1161 | active. @var{description} is a string describing this method and what | ||
| 1162 | it is good for. | ||
| 1163 | |||
| 1164 | @var{activate-func} is a function to call to activate this method. The | 1123 | @var{activate-func} is a function to call to activate this method. The |
| 1165 | @var{args}, if any, are passed as arguments to @var{activate-func}. All | 1124 | @var{args}, if any, are passed as arguments to @var{activate-func}. All |
| 1166 | told, the arguments to @var{activate-func} are @var{input-method} and | 1125 | told, the arguments to @var{activate-func} are @var{input-method} and |
| 1167 | the @var{args}. | 1126 | the @var{args}. |
| 1127 | |||
| 1128 | @var{title} is a string to display in the mode line while this method is | ||
| 1129 | active. @var{description} is a string describing this method and what | ||
| 1130 | it is good for. | ||
| 1168 | @end defvar | 1131 | @end defvar |
| 1169 | 1132 | ||
| 1170 | The fundamental interface to input methods is through the | 1133 | The fundamental interface to input methods is through the |
| @@ -1202,3 +1165,4 @@ Changing the locale can cause messages to appear according to the | |||
| 1202 | conventions of a different language. If the variable is @code{nil}, the | 1165 | conventions of a different language. If the variable is @code{nil}, the |
| 1203 | locale is specified by environment variables in the usual POSIX fashion. | 1166 | locale is specified by environment variables in the usual POSIX fashion. |
| 1204 | @end defvar | 1167 | @end defvar |
| 1168 | |||
diff --git a/src/ChangeLog b/src/ChangeLog index 3403e9f14f0..0d4d19feeef 100644 --- a/src/ChangeLog +++ b/src/ChangeLog | |||
| @@ -1,5 +1,8 @@ | |||
| 1 | 2000-05-11 Gerd Moellmann <gerd@gnu.org> | 1 | 2000-05-11 Gerd Moellmann <gerd@gnu.org> |
| 2 | 2 | ||
| 3 | * xdisp.c (add_to_log): Don't pass the terminating NUL byte | ||
| 4 | of the message to message_dolog. | ||
| 5 | |||
| 3 | * keyboard.c (read_char): Don't clear current message for help | 6 | * keyboard.c (read_char): Don't clear current message for help |
| 4 | events; let the code handling help events handle this. Change | 7 | events; let the code handling help events handle this. Change |
| 5 | code detecting help events that should be ignored. | 8 | code detecting help events that should be ignored. |