| author | Kenichi Handa | 2005-04-01 00:29:51 +0000 |
|---|---|---|
| committer | Kenichi Handa | 2005-04-01 00:29:51 +0000 |
| commit | 6fa886202fdca940dd16e9f0b863347c4f565e8a (patch) | |
| tree | d007919400bd9acc188ec3ea308b3463f0eb30f1 /lispref | |
| parent | 9b06ffa3dc182766ec67ee4fe06c2f7141602bc2 (diff) | |
(Coding System Basics): Describe the roundtrip
identity of coding systems.
Diffstat (limited to 'lispref')
| -rw-r--r-- | lispref/nonascii.texi | 22 |
1 files changed, 22 insertions, 0 deletions
diff --git a/lispref/nonascii.texi b/lispref/nonascii.texi index 70e77e0a837..91a47ea50f9 100644 --- a/lispref/nonascii.texi +++ b/lispref/nonascii.texi | |||
| @@ -628,6 +628,28 @@ characters; for example, there are three coding systems for the Cyrillic | |||
| 628 | conversion, but some of them leave the choice unspecified---to be chosen | 628 | conversion, but some of them leave the choice unspecified---to be chosen |
| 629 | heuristically for each file, based on the data. | 629 | heuristically for each file, based on the data. |
| 630 | 630 | ||
| 631 | In general, a coding system doesn't guarantee a roundtrip identity, | ||
| 632 | i.e. decoding followed by encoding in the same coding system can | ||
| 633 | result in a different byte sequence. But there are several coding | ||
| 634 | systems that do guarantee that the result will be the same as what you | ||
| 635 | originally decoded. They are: | ||
| 636 | |||
| 637 | @quotation | ||
| 638 | chinese-big5 chinese-iso-8bit cyrillic-iso-8bit emacs-mule | ||
| 639 | greek-iso-8bit hebrew-iso-8bit iso-latin-1 iso-latin-2 iso-latin-3 | ||
| 640 | iso-latin-4 iso-latin-5 iso-latin-8 iso-latin-9 iso-safe | ||
| 641 | japanese-iso-8bit japanese-shift-jis korean-iso-8bit raw-text | ||
| 642 | @end quotation | ||
| 643 | |||
| 644 | Likewise, a coding system doesn't guarantee the other way of roundtrip | ||
| 645 | identity, i.e. encoding buffer text into a coding system followed by | ||
| 646 | decoding again with the same coding system can produce different | ||
| 647 | buffer text. For instance, when you encode Latin-2 characters by | ||
| 648 | @code{utf-8} and decode them back by the same coding system, you'll get | ||
| 649 | Unicode characters (of charset @code{mule-unicode-0100-24ff}), and when | ||
| 650 | you encode Unicode characters by @code{iso-latin-2} and decode them back | ||
| 651 | by the same coding system, you'll get Latin-2 characters. | ||
| 652 | |||
| 631 | @cindex end of line conversion | 653 | @cindex end of line conversion |
| 632 | @dfn{End of line conversion} handles three different conventions used | 654 | @dfn{End of line conversion} handles three different conventions used |
| 633 | on various systems for representing end of line in files. The Unix | 655 | on various systems for representing end of line in files. The Unix |