| author | Eli Zaretskii | 2021-11-12 10:53:52 +0200 |
|---|---|---|
| committer | Eli Zaretskii | 2021-11-12 10:53:52 +0200 |
| commit | 0d0125daaeb77af5aa6091059ff6d0c1ce9f6cff | |
| tree | 3e499aaac6f4af23f08fd5931d45a8f50609c91c | |
| parent | a6905e90cc3358a21726646c4ee9154e80fc96d6 | |
Improve documentation of 'decode-coding-region'
* src/coding.c (Fdecode_coding_region): Doc fix.
* doc/lispref/nonascii.texi (Coding System Basics)
(Explicit Encoding): Explain the significance of using 'undecided'
in 'decode-coding-*' functions.
| -rw-r--r-- | doc/lispref/nonascii.texi | 26 |
| -rw-r--r-- | src/coding.c | 9 |
2 files changed, 27 insertions, 8 deletions
diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi
index 6980920a7b9..24117b50014 100644
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
| @@ -1048,9 +1048,9 @@ Alternativnyj, and KOI8. | |||
| 1048 | Every coding system specifies a particular set of character code | 1048 | Every coding system specifies a particular set of character code |
| 1049 | conversions, but the coding system @code{undecided} is special: it | 1049 | conversions, but the coding system @code{undecided} is special: it |
| 1050 | leaves the choice unspecified, to be chosen heuristically for each | 1050 | leaves the choice unspecified, to be chosen heuristically for each |
| 1051 | file, based on the file's data. The coding system @code{prefer-utf-8} | 1051 | file or string, based on the file's or string's data, when they are |
| 1052 | is like @code{undecided}, but it prefers to choose @code{utf-8} when | 1052 | decoded or encoded. The coding system @code{prefer-utf-8} is like |
| 1053 | possible. | 1053 | @code{undecided}, but it prefers to choose @code{utf-8} when possible. |
| 1054 | 1054 | ||
| 1055 | In general, a coding system doesn't guarantee roundtrip identity: | 1055 | In general, a coding system doesn't guarantee roundtrip identity: |
| 1056 | decoding a byte sequence using a coding system, then encoding the | 1056 | decoding a byte sequence using a coding system, then encoding the |
| @@ -1921,9 +1921,24 @@ length of the decoded text. If that buffer is a unibyte buffer | |||
| 1921 | the decoded text (@pxref{Text Representations}) is inserted into the | 1921 | the decoded text (@pxref{Text Representations}) is inserted into the |
| 1922 | buffer as individual bytes. | 1922 | buffer as individual bytes. |
| 1923 | 1923 | ||
| 1924 | @cindex @code{charset}, text property on buffer text | ||
| 1924 | This command puts a @code{charset} text property on the decoded text. | 1925 | This command puts a @code{charset} text property on the decoded text. |
| 1925 | The value of the property states the character set used to decode the | 1926 | The value of the property states the character set used to decode the |
| 1926 | original text. | 1927 | original text. |
| 1928 | |||
| 1929 | @cindex undecided coding-system, when decoding | ||
| 1930 | This command detects the encoding of the text if necessary. If | ||
| 1931 | @var{coding-system} is @code{undecided}, the command detects the | ||
| 1932 | encoding of the text based on the byte sequences it finds in the text, | ||
| 1933 | and also detects the type of end-of-line convention used by the text | ||
| 1934 | (@pxref{Lisp and Coding Systems, eol type}). If @var{coding-system} | ||
| 1935 | is @code{undecided-@var{eol-type}}, where @var{eol-type} is | ||
| 1936 | @code{unix}, @code{dos}, or @code{mac}, then the command detects only | ||
| 1937 | the encoding of the text. Any @var{coding-system} that doesn't | ||
| 1938 | specify @var{eol-type}, as in @code{utf-8}, causes the command to | ||
| 1939 | detect the end-of-line convention; specify the encoding completely, as | ||
| 1940 | in @code{utf-8-unix}, if the EOL convention used by the text is known | ||
| 1941 | in advance, to prevent any automatic detection. | ||
| 1927 | @end deffn | 1942 | @end deffn |
| 1928 | 1943 | ||
| 1929 | @defun decode-coding-string string coding-system &optional nocopy buffer | 1944 | @defun decode-coding-string string coding-system &optional nocopy buffer |
| @@ -1936,13 +1951,16 @@ trivial. To make explicit decoding useful, the contents of | |||
| 1936 | values, but a multibyte string is also acceptable (assuming it | 1951 | values, but a multibyte string is also acceptable (assuming it |
| 1937 | contains 8-bit bytes in their multibyte form). | 1952 | contains 8-bit bytes in their multibyte form). |
| 1938 | 1953 | ||
| 1954 | This function detects the encoding of the string if needed, like | ||
| 1955 | @code{decode-coding-region} does. | ||
| 1956 | |||
| 1939 | If optional argument @var{buffer} specifies a buffer, the decoded text | 1957 | If optional argument @var{buffer} specifies a buffer, the decoded text |
| 1940 | is inserted in that buffer after point (point does not move). In this | 1958 | is inserted in that buffer after point (point does not move). In this |
| 1941 | case, the return value is the length of the decoded text. If that | 1959 | case, the return value is the length of the decoded text. If that |
| 1942 | buffer is a unibyte buffer, the internal representation of the decoded | 1960 | buffer is a unibyte buffer, the internal representation of the decoded |
| 1943 | text is inserted into it as individual bytes. | 1961 | text is inserted into it as individual bytes. |
| 1944 | 1962 | ||
| 1945 | @cindex @code{charset}, text property | 1963 | @cindex @code{charset}, text property on strings |
| 1946 | This function puts a @code{charset} text property on the decoded text. | 1964 | This function puts a @code{charset} text property on the decoded text. |
| 1947 | The value of the property states the character set used to decode the | 1965 | The value of the property states the character set used to decode the |
| 1948 | original text: | 1966 | original text: |
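
The detection rules added to the manual in the hunks above can be tried out with a short Emacs Lisp sketch. It is not part of the commit. It assumes a default session in which `inhibit-eol-conversion` is nil, and the sample byte strings are invented for illustration. The result for `undecided` depends on the session's coding-system priorities, so no value is claimed for that call.

```elisp
;; Illustrative sketch only; the sample strings are assumptions, not taken
;; from the commit.  Each `string=' form below should evaluate to t.

;; The bytes #xc2 #xa9 are the UTF-8 encoding of U+00A9 (COPYRIGHT SIGN);
;; decoding them yields a one-character string.
(string= (decode-coding-string (unibyte-string #xc2 #xa9) 'utf-8)
         "\u00a9")

;; `utf-8' does not specify an EOL type, so the EOL convention is detected:
;; the CRLF line endings are recognized and converted to newlines.
(string= (decode-coding-string "a\r\nb\r\n" 'utf-8)
         "a\nb\n")

;; `utf-8-unix' specifies the encoding completely, so nothing is detected
;; and the carriage returns stay in the decoded text.
(string= (decode-coding-string "a\r\nb\r\n" 'utf-8-unix)
         "a\r\nb\r\n")

;; `undecided' asks Emacs to guess both the encoding and the EOL type.
;; The outcome depends on the current coding-system priorities.
(decode-coding-string (unibyte-string #xc2 #xa9 ?\r ?\n) 'undecided)
```

Comparing with `string=` rather than inspecting the printed value sidesteps any `charset` text property that decoding may attach to the result.
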
diff --git a/src/coding.c b/src/coding.c
index 7030a53869a..02dccf5bdb0 100644
--- a/src/coding.c
+++ b/src/coding.c
| @@ -9455,11 +9455,12 @@ code_convert_region (Lisp_Object start, Lisp_Object end, | |||
| 9455 | DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region, | 9455 | DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region, |
| 9456 | 3, 4, "r\nzCoding system: ", | 9456 | 3, 4, "r\nzCoding system: ", |
| 9457 | doc: /* Decode the current region from the specified coding system. | 9457 | doc: /* Decode the current region from the specified coding system. |
| 9458 | Interactively, prompt for the coding system to decode the region. | ||
| 9458 | 9459 | ||
| 9459 | What's meant by \"decoding\" is transforming bytes into text | 9460 | \"Decoding\" means transforming bytes into readable text (characters). |
| 9460 | (characters). If, for instance, you have a region that contains data | 9461 | If, for instance, you have a region that contains data that represents |
| 9461 | that represents the two bytes #xc2 #xa9, after calling this function | 9462 | the two bytes #xc2 #xa9, after calling this function with the utf-8 |
| 9462 | with the utf-8 coding system, the region will contain the single | 9463 | coding system, the region will contain the single |
| 9463 | character ?\\N{COPYRIGHT SIGN}. | 9464 | character ?\\N{COPYRIGHT SIGN}. |
| 9464 | 9465 | ||
| 9465 | When called from a program, takes four arguments: | 9466 | When called from a program, takes four arguments: |