(Coding System Basics): Describe roundtrip identity of coding systems.
author: Kenichi Handa 2005-04-01 00:29:51 +0000
committer: Kenichi Handa 2005-04-01 00:29:51 +0000
commit: 6fa886202fdca940dd16e9f0b863347c4f565e8a (patch)
tree: d007919400bd9acc188ec3ea308b3463f0eb30f1 /lispref
parent: 9b06ffa3dc182766ec67ee4fe06c2f7141602bc2 (diff)
1 files changed, 22 insertions, 0 deletions
diff --git a/lispref/nonascii.texi b/lispref/nonascii.texi
index 70e77e0a837..91a47ea50f9 100644
--- a/lispref/nonascii.texi
+++ b/lispref/nonascii.texi
@@ -628,6 +628,28 @@ characters; for example, there are three coding systems for the Cyrillic
 conversion, but some of them leave the choice unspecified---to be chosen
 heuristically for each file, based on the data.
 
+In general, a coding system doesn't guarantee roundtrip identity;
+that is, decoding followed by encoding in the same coding system can
+result in a different byte sequence.  But there are several coding
+systems that do guarantee that the result will be the same as what you
+originally decoded.  They are:
+
+@quotation
+chinese-big5 chinese-iso-8bit cyrillic-iso-8bit emacs-mule
+greek-iso-8bit hebrew-iso-8bit iso-latin-1 iso-latin-2 iso-latin-3
+iso-latin-4 iso-latin-5 iso-latin-8 iso-latin-9 iso-safe
+japanese-iso-8bit japanese-shift-jis korean-iso-8bit raw-text
+@end quotation
+
+Likewise, a coding system doesn't guarantee roundtrip identity in the
+other direction; that is, encoding buffer text into a coding system and
+then decoding it with the same coding system can produce different
+buffer text.  For instance, when you encode Latin-2 characters with
+@code{utf-8} and decode them back with the same coding system, you'll get
+Unicode characters (of charset @code{mule-unicode-0100-24ff}), and when
+you encode Unicode characters with @code{iso-latin-2} and decode them back
+with the same coding system, you'll get Latin-2 characters.
+
 @cindex end of line conversion
  @dfn{End of line conversion} handles three different conventions used
 on various systems for representing end of line in files.  The Unix
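
Note on the patch above: the two roundtrip properties it describes can be checked mechanically. The following is a minimal sketch in Python, not Emacs Lisp, and Python's codecs only stand in for Emacs coding systems here; they behave differently in detail (for example, Python has no notion of Emacs charsets such as mule-unicode-0100-24ff). The helper names `bytes_roundtrip` and `text_roundtrip` are hypothetical and chosen for illustration.

```python
def bytes_roundtrip(data: bytes, codec: str) -> bool:
    """Decode then encode with the same codec; True if the bytes survive."""
    # errors="replace" substitutes a placeholder for undecodable input,
    # which is one way a decoder can lose information (an assumption made
    # for this sketch; other error policies behave differently).
    text = data.decode(codec, errors="replace")
    return text.encode(codec, errors="replace") == data


def text_roundtrip(text: str, codec: str) -> bool:
    """Encode then decode with the same codec; True if the text survives."""
    encoded = text.encode(codec, errors="replace")
    return encoded.decode(codec, errors="replace") == text


if __name__ == "__main__":
    every_byte = bytes(range(256))
    # A single-byte codec that maps every byte value to a character
    # preserves any byte sequence.
    print("latin-1, all byte values:", bytes_roundtrip(every_byte, "latin-1"))
    # A lone 0xFF byte is not valid UTF-8, so it is replaced on decoding.
    print("utf-8, invalid byte:", bytes_roundtrip(b"\xff", "utf-8"))
    # Text with a character outside the codec's repertoire is altered.
    print("latin-1, Polish l-stroke:", text_roundtrip("\u0142", "latin-1"))
```

The loss shown in the last case comes from an unencodable character, which is a different mechanism from the charset change described in the patch; the sketch only illustrates what "roundtrip identity" means in each direction.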