| author | Chong Yidong | 2012-03-08 13:27:03 +0800 |
|---|---|---|
| committer | Chong Yidong | 2012-03-08 13:27:03 +0800 |
| commit | 483ab23014e2879d1f83620cd27e1c5f7b3c3d46 (patch) | |
| tree | 08d32c4b029737e9b7fc39904054acc64216a5d7 | |
| parent | d9507ec54e19ebd27d0161fd3a5906de08e65ad8 (diff) | |
More updates to Text chapter of Lisp manual.
* doc/lispref/text.texi (Mode-Specific Indent): Document new behavior of
indent-for-tab-command. Document tab-always-indent.
(Special Properties): Copyedits.
(Checksum/Hash): Improve secure-hash doc. Do not recommend MD5.
(Parsing HTML/XML): Rename from Parsing HTML. Update doc of
libxml-parse-html-region.
| -rw-r--r-- | doc/lispref/ChangeLog | 9 | ||||
| -rw-r--r-- | doc/lispref/elisp.texi | 3 | ||||
| -rw-r--r-- | doc/lispref/text.texi | 295 | ||||
| -rw-r--r-- | doc/lispref/vol1.texi | 3 | ||||
| -rw-r--r-- | doc/lispref/vol2.texi | 3 | ||||
| -rw-r--r-- | etc/NEWS | 11 |
6 files changed, 182 insertions, 142 deletions
diff --git a/doc/lispref/ChangeLog b/doc/lispref/ChangeLog index 42ec24fac5f..16291e144d3 100644 --- a/doc/lispref/ChangeLog +++ b/doc/lispref/ChangeLog | |||
| @@ -1,3 +1,12 @@ | |||
| 1 | 2012-03-08 Chong Yidong <cyd@gnu.org> | ||
| 2 | |||
| 3 | * text.texi (Mode-Specific Indent): Document new behavior of | ||
| 4 | indent-for-tab-command. Document tab-always-indent. | ||
| 5 | (Special Properties): Copyedits. | ||
| 6 | (Checksum/Hash): Improve secure-hash doc. Do not recommend MD5. | ||
| 7 | (Parsing HTML/XML): Rename from Parsing HTML. Update doc of | ||
| 8 | libxml-parse-html-region. | ||
| 9 | |||
| 1 | 2012-03-07 Glenn Morris <rgm@gnu.org> | 10 | 2012-03-07 Glenn Morris <rgm@gnu.org> |
| 2 | 11 | ||
| 3 | * markers.texi (The Region): Briefly mention use-empty-active-region | 12 | * markers.texi (The Region): Briefly mention use-empty-active-region |
diff --git a/doc/lispref/elisp.texi b/doc/lispref/elisp.texi index 7a444ee4039..ea304292497 100644 --- a/doc/lispref/elisp.texi +++ b/doc/lispref/elisp.texi | |||
| @@ -1054,7 +1054,8 @@ Text | |||
| 1054 | * Registers:: How registers are implemented. Accessing | 1054 | * Registers:: How registers are implemented. Accessing |
| 1055 | the text or position stored in a register. | 1055 | the text or position stored in a register. |
| 1056 | * Base 64:: Conversion to or from base 64 encoding. | 1056 | * Base 64:: Conversion to or from base 64 encoding. |
| 1057 | * Checksum/Hash:: Computing "message digests"/"checksums"/"hashes". | 1057 | * Checksum/Hash:: Computing cryptographic hashes. |
| 1058 | * Parsing HTML/XML:: Parsing HTML and XML. | ||
| 1058 | * Atomic Changes:: Installing several buffer changes "atomically". | 1059 | * Atomic Changes:: Installing several buffer changes "atomically". |
| 1059 | * Change Hooks:: Supplying functions to be run when text is changed. | 1060 | * Change Hooks:: Supplying functions to be run when text is changed. |
| 1060 | 1061 | ||
diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi index 88cb6a157f8..c60150cc061 100644 --- a/doc/lispref/text.texi +++ b/doc/lispref/text.texi | |||
| @@ -56,8 +56,8 @@ the character after point. | |||
| 56 | * Registers:: How registers are implemented. Accessing the text or | 56 | * Registers:: How registers are implemented. Accessing the text or |
| 57 | position stored in a register. | 57 | position stored in a register. |
| 58 | * Base 64:: Conversion to or from base 64 encoding. | 58 | * Base 64:: Conversion to or from base 64 encoding. |
| 59 | * Checksum/Hash:: Computing "message digests"/"checksums"/"hashes". | 59 | * Checksum/Hash:: Computing cryptographic hashes. |
| 60 | * Parsing HTML:: Parsing HTML and XML. | 60 | * Parsing HTML/XML:: Parsing HTML and XML. |
| 61 | * Atomic Changes:: Installing several buffer changes "atomically". | 61 | * Atomic Changes:: Installing several buffer changes "atomically". |
| 62 | * Change Hooks:: Supplying functions to be run when text is changed. | 62 | * Change Hooks:: Supplying functions to be run when text is changed. |
| 63 | @end menu | 63 | @end menu |
| @@ -2203,14 +2203,48 @@ key to indent properly for the language being edited. This section | |||
| 2203 | describes the mechanism of the @key{TAB} key and how to control it. | 2203 | describes the mechanism of the @key{TAB} key and how to control it. |
| 2204 | The functions in this section return unpredictable values. | 2204 | The functions in this section return unpredictable values. |
| 2205 | 2205 | ||
| 2206 | @defvar indent-line-function | 2206 | @deffn Command indent-for-tab-command &optional rigid |
| 2207 | This variable's value is the function to be used by @key{TAB} (and | 2207 | This is the command bound to @key{TAB} in most editing modes. Its |
| 2208 | various commands) to indent the current line. The command | 2208 | usual action is to indent the current line, but it can alternatively |
| 2209 | @code{indent-according-to-mode} does little more than call this function. | 2209 | insert a tab character or indent a region. |
| 2210 | |||
| 2211 | Here is what it does: | ||
| 2210 | 2212 | ||
| 2211 | In Lisp mode, the value is the symbol @code{lisp-indent-line}; in C | 2213 | @itemize |
| 2212 | mode, @code{c-indent-line}; in Fortran mode, @code{fortran-indent-line}. | 2214 | @item |
| 2213 | The default value is @code{indent-relative}. @xref{Auto-Indentation}. | 2215 | First, it checks whether Transient Mark mode is enabled and the region |
| 2216 | is active. If so, it calls @code{indent-region} to indent all the | ||
| 2217 | text in the region (@pxref{Region Indent}). | ||
| 2218 | |||
| 2219 | @item | ||
| 2220 | Otherwise, if the indentation function in @code{indent-line-function} | ||
| 2221 | is @code{indent-to-left-margin} (a trivial command that inserts a tab | ||
| 2222 | character), or if the variable @code{tab-always-indent} specifies that | ||
| 2223 | a tab character ought to be inserted (see below), then it inserts a | ||
| 2224 | tab character. | ||
| 2225 | |||
| 2226 | @item | ||
| 2227 | Otherwise, it indents the current line; this is done by calling the | ||
| 2228 | function in @code{indent-line-function}. If the line is already | ||
| 2229 | indented, and the value of @code{tab-always-indent} is @code{complete} | ||
| 2230 | (see below), it tries completing the text at point. | ||
| 2231 | @end itemize | ||
| 2232 | |||
| 2233 | If @var{rigid} is non-@code{nil} (interactively, with a prefix | ||
| 2234 | argument), then after this command indents a line or inserts a tab, it | ||
| 2235 | also rigidly indents the entire balanced expression which starts at | ||
| 2236 | the beginning of the current line, in order to reflect the new | ||
| 2237 | indentation. This argument is ignored if the command indents the | ||
| 2238 | region. | ||
| 2239 | @end deffn | ||
| 2240 | |||
| 2241 | @defvar indent-line-function | ||
| 2242 | This variable's value is the function to be used by | ||
| 2243 | @code{indent-for-tab-command}, and various other indentation commands, | ||
| 2244 | to indent the current line. It is usually assigned by the major mode; | ||
| 2245 | for instance, Lisp mode sets it to @code{lisp-indent-line}, C mode | ||
| 2246 | sets it to @code{c-indent-line}, and so on. The default value is | ||
| 2247 | @code{indent-relative}. @xref{Auto-Indentation}. | ||
| 2214 | @end defvar | 2248 | @end defvar |
| 2215 | 2249 | ||
| 2216 | @deffn Command indent-according-to-mode | 2250 | @deffn Command indent-according-to-mode |
| @@ -2218,41 +2252,31 @@ This command calls the function in @code{indent-line-function} to | |||
| 2218 | indent the current line in a way appropriate for the current major mode. | 2252 | indent the current line in a way appropriate for the current major mode. |
| 2219 | @end deffn | 2253 | @end deffn |
| 2220 | 2254 | ||
| 2221 | @deffn Command indent-for-tab-command &optional rigid | ||
| 2222 | This command calls the function in @code{indent-line-function} to | ||
| 2223 | indent the current line; however, if that function is | ||
| 2224 | @code{indent-to-left-margin}, @code{insert-tab} is called instead. | ||
| 2225 | (That is a trivial command that inserts a tab character.) If | ||
| 2226 | @var{rigid} is non-@code{nil}, this function also rigidly indents the | ||
| 2227 | entire balanced expression that starts at the beginning of the current | ||
| 2228 | line, to reflect change in indentation of the current line. | ||
| 2229 | @end deffn | ||
| 2230 | |||
| 2231 | @deffn Command newline-and-indent | 2255 | @deffn Command newline-and-indent |
| 2232 | This function inserts a newline, then indents the new line (the one | 2256 | This function inserts a newline, then indents the new line (the one |
| 2233 | following the newline just inserted) according to the major mode. | 2257 | following the newline just inserted) according to the major mode. It |
| 2234 | 2258 | does indentation by calling @code{indent-according-to-mode}. | |
| 2235 | It does indentation by calling the current @code{indent-line-function}. | ||
| 2236 | In programming language modes, this is the same thing @key{TAB} does, | ||
| 2237 | but in some text modes, where @key{TAB} inserts a tab, | ||
| 2238 | @code{newline-and-indent} indents to the column specified by | ||
| 2239 | @code{left-margin}. | ||
| 2240 | @end deffn | 2259 | @end deffn |
| 2241 | 2260 | ||
| 2242 | @deffn Command reindent-then-newline-and-indent | 2261 | @deffn Command reindent-then-newline-and-indent |
| 2243 | @comment !!SourceFile simple.el | ||
| 2244 | This command reindents the current line, inserts a newline at point, | 2262 | This command reindents the current line, inserts a newline at point, |
| 2245 | and then indents the new line (the one following the newline just | 2263 | and then indents the new line (the one following the newline just |
| 2246 | inserted). | 2264 | inserted). It does indentation on both lines by calling |
| 2247 | 2265 | @code{indent-according-to-mode}. | |
| 2248 | This command does indentation on both lines according to the current | ||
| 2249 | major mode, by calling the current value of @code{indent-line-function}. | ||
| 2250 | In programming language modes, this is the same thing @key{TAB} does, | ||
| 2251 | but in some text modes, where @key{TAB} inserts a tab, | ||
| 2252 | @code{reindent-then-newline-and-indent} indents to the column specified | ||
| 2253 | by @code{left-margin}. | ||
| 2254 | @end deffn | 2266 | @end deffn |
| 2255 | 2267 | ||
| 2268 | @defopt tab-always-indent | ||
| 2269 | This variable can be used to customize the behavior of the @key{TAB} | ||
| 2270 | (@code{indent-for-tab-command}) command. If the value is @code{t} | ||
| 2271 | (the default), the command normally just indents the current line. If | ||
| 2272 | the value is @code{nil}, the command indents the current line only if | ||
| 2273 | point is at the left margin or in the line's indentation; otherwise, | ||
| 2274 | it inserts a tab character. If the value is @code{complete}, the | ||
| 2275 | command first tries to indent the current line, and if the line was | ||
| 2276 | already indented, it calls @code{completion-at-point} to complete the | ||
| 2277 | text at point (@pxref{Completion in Buffers}). | ||
| 2278 | @end defopt | ||
| 2279 | |||
| 2256 | @node Region Indent | 2280 | @node Region Indent |
| 2257 | @subsection Indenting an Entire Region | 2281 | @subsection Indenting an Entire Region |
| 2258 | 2282 | ||
| @@ -2827,7 +2851,7 @@ faster to process chunks of text that have the same property value. | |||
| 2827 | comparing property values. In all cases, @var{object} defaults to the | 2851 | comparing property values. In all cases, @var{object} defaults to the |
| 2828 | current buffer. | 2852 | current buffer. |
| 2829 | 2853 | ||
| 2830 | For high performance, it's very important to use the @var{limit} | 2854 | For good performance, it's very important to use the @var{limit} |
| 2831 | argument to these functions, especially the ones that search for a | 2855 | argument to these functions, especially the ones that search for a |
| 2832 | single property---otherwise, they may spend a long time scanning to the | 2856 | single property---otherwise, they may spend a long time scanning to the |
| 2833 | end of the buffer, if the property you are interested in does not change. | 2857 | end of the buffer, if the property you are interested in does not change. |
| @@ -2839,15 +2863,15 @@ different properties. | |||
| 2839 | 2863 | ||
| 2840 | @defun next-property-change pos &optional object limit | 2864 | @defun next-property-change pos &optional object limit |
| 2841 | The function scans the text forward from position @var{pos} in the | 2865 | The function scans the text forward from position @var{pos} in the |
| 2842 | string or buffer @var{object} till it finds a change in some text | 2866 | string or buffer @var{object} until it finds a change in some text |
| 2843 | property, then returns the position of the change. In other words, it | 2867 | property, then returns the position of the change. In other words, it |
| 2844 | returns the position of the first character beyond @var{pos} whose | 2868 | returns the position of the first character beyond @var{pos} whose |
| 2845 | properties are not identical to those of the character just after | 2869 | properties are not identical to those of the character just after |
| 2846 | @var{pos}. | 2870 | @var{pos}. |
| 2847 | 2871 | ||
| 2848 | If @var{limit} is non-@code{nil}, then the scan ends at position | 2872 | If @var{limit} is non-@code{nil}, then the scan ends at position |
| 2849 | @var{limit}. If there is no property change before that point, | 2873 | @var{limit}. If there is no property change before that point, this |
| 2850 | @code{next-property-change} returns @var{limit}. | 2874 | function returns @var{limit}. |
| 2851 | 2875 | ||
| 2852 | The value is @code{nil} if the properties remain unchanged all the way | 2876 | The value is @code{nil} if the properties remain unchanged all the way |
| 2853 | to the end of @var{object} and @var{limit} is @code{nil}. If the value | 2877 | to the end of @var{object} and @var{limit} is @code{nil}. If the value |
| @@ -2980,10 +3004,9 @@ character. | |||
| 2980 | @item face | 3004 | @item face |
| 2981 | @cindex face codes of text | 3005 | @cindex face codes of text |
| 2982 | @kindex face @r{(text property)} | 3006 | @kindex face @r{(text property)} |
| 2983 | You can use the property @code{face} to control the font and color of | 3007 | The @code{face} property controls the appearance of the character, |
| 2984 | text. @xref{Faces}, for more information. | 3008 | such as its font and color. @xref{Faces}. The value of the property |
| 2985 | 3009 | can be the following: | |
| 2986 | @code{face} can be the following: | ||
| 2987 | 3010 | ||
| 2988 | @itemize @bullet | 3011 | @itemize @bullet |
| 2989 | @item | 3012 | @item |
| @@ -2996,10 +3019,10 @@ face attribute name and @var{value} is a meaningful value for that | |||
| 2996 | attribute. With this feature, you do not need to create a face each | 3019 | attribute. With this feature, you do not need to create a face each |
| 2997 | time you want to specify a particular attribute for certain text. | 3020 | time you want to specify a particular attribute for certain text. |
| 2998 | @xref{Face Attributes}. | 3021 | @xref{Face Attributes}. |
| 2999 | @end itemize | ||
| 3000 | 3022 | ||
| 3001 | @code{face} can also be a list, where each element uses one of the | 3023 | @item |
| 3002 | forms listed above. | 3024 | A list, where each element uses one of the two forms listed above. |
| 3025 | @end itemize | ||
| 3003 | 3026 | ||
| 3004 | Font Lock mode (@pxref{Font Lock Mode}) works in most buffers by | 3027 | Font Lock mode (@pxref{Font Lock Mode}) works in most buffers by |
| 3005 | dynamically updating the @code{face} property of characters based on | 3028 | dynamically updating the @code{face} property of characters based on |
| @@ -3354,15 +3377,15 @@ of the text. | |||
| 3354 | Self-inserting characters normally take on the same properties as the | 3377 | Self-inserting characters normally take on the same properties as the |
| 3355 | preceding character. This is called @dfn{inheritance} of properties. | 3378 | preceding character. This is called @dfn{inheritance} of properties. |
| 3356 | 3379 | ||
| 3357 | In a Lisp program, you can do insertion with inheritance or without, | 3380 | A Lisp program can do insertion with inheritance or without, |
| 3358 | depending on your choice of insertion primitive. The ordinary text | 3381 | depending on the choice of insertion primitive. The ordinary text |
| 3359 | insertion functions such as @code{insert} do not inherit any properties. | 3382 | insertion functions, such as @code{insert}, do not inherit any |
| 3360 | They insert text with precisely the properties of the string being | 3383 | properties. They insert text with precisely the properties of the |
| 3361 | inserted, and no others. This is correct for programs that copy text | 3384 | string being inserted, and no others. This is correct for programs |
| 3362 | from one context to another---for example, into or out of the kill ring. | 3385 | that copy text from one context to another---for example, into or out |
| 3363 | To insert with inheritance, use the special primitives described in this | 3386 | of the kill ring. To insert with inheritance, use the special |
| 3364 | section. Self-inserting characters inherit properties because they work | 3387 | primitives described in this section. Self-inserting characters |
| 3365 | using these primitives. | 3388 | inherit properties because they work using these primitives. |
| 3366 | 3389 | ||
| 3367 | When you do insertion with inheritance, @emph{which} properties are | 3390 | When you do insertion with inheritance, @emph{which} properties are |
| 3368 | inherited, and from where, depends on which properties are @dfn{sticky}. | 3391 | inherited, and from where, depends on which properties are @dfn{sticky}. |
| @@ -4063,46 +4086,64 @@ The decoding functions ignore newline characters in the encoded text. | |||
| 4063 | @node Checksum/Hash | 4086 | @node Checksum/Hash |
| 4064 | @section Checksum/Hash | 4087 | @section Checksum/Hash |
| 4065 | @cindex MD5 checksum | 4088 | @cindex MD5 checksum |
| 4066 | @cindex hashing, secure | 4089 | @cindex SHA hash |
| 4067 | @cindex SHA-1 | 4090 | @cindex hash, cryptographic |
| 4068 | @cindex message digest computation | 4091 | @cindex cryptographic hash |
| 4069 | 4092 | ||
| 4070 | MD5 cryptographic checksums, or @dfn{message digests}, are 128-bit | 4093 | Emacs has built-in support for computing @dfn{cryptographic hashes}. |
| 4071 | ``fingerprints'' of a document or program. They are used to verify | 4094 | A cryptographic hash, or @dfn{checksum}, is a digital ``fingerprint'' |
| 4072 | that you have an exact and unaltered copy of the data. The algorithm | 4095 | of a piece of data (e.g.@: a block of text) which can be used to check |
| 4073 | to calculate the MD5 message digest is defined in Internet | 4096 | that you have an unaltered copy of that data. |
| 4074 | RFC@footnote{ | 4097 | |
| 4075 | For an explanation of what is an RFC, see the footnote in @ref{Base | 4098 | @cindex message digest |
| 4076 | 64}. | 4099 | Emacs supports several common cryptographic hash algorithms: MD5, |
| 4077 | }1321. This section describes the Emacs facilities for computing | 4100 | SHA-1, SHA-2, SHA-224, SHA-256, SHA-384 and SHA-512. MD5 is the |
| 4078 | message digests and other forms of ``secure hash''. | 4101 | oldest of these algorithms, and is commonly used in @dfn{message |
| 4102 | digests} to check the integrity of messages transmitted over a | ||
| 4103 | network. MD5 is not ``collision resistant'' (i.e.@: it is possible to | ||
| 4104 | deliberately design different pieces of data which have the same MD5 | ||
| 4105 | hash), so you should not use it for anything security-related. A | ||
| 4106 | similar theoretical weakness also exists in SHA-1. Therefore, for | ||
| 4107 | security-related applications you should use the other hash types, | ||
| 4108 | such as SHA-2. | ||
| 4079 | 4109 | ||
| 4080 | @defun md5 object &optional start end coding-system noerror | 4110 | @defun secure-hash algorithm object &optional start end binary |
| 4081 | This function returns the MD5 message digest of @var{object}, which | 4111 | This function returns a hash for @var{object}. The argument |
| 4082 | should be a buffer or a string. | 4112 | @var{algorithm} is a symbol stating which hash to compute: one of |
| 4113 | @code{md5}, @code{sha1}, @code{sha224}, @code{sha256}, @code{sha384} | ||
| 4114 | or @code{sha512}. The argument @var{object} should be a buffer or a | ||
| 4115 | string. | ||
| 4083 | 4116 | ||
| 4084 | The two optional arguments @var{start} and @var{end} are character | 4117 | The optional arguments @var{start} and @var{end} are character |
| 4085 | positions specifying the portion of @var{object} to compute the | 4118 | positions specifying the portion of @var{object} to compute the |
| 4086 | message digest for. If they are @code{nil} or omitted, the digest is | 4119 | message digest for. If they are @code{nil} or omitted, the hash is |
| 4087 | computed for the whole of @var{object}. | 4120 | computed for the whole of @var{object}. |
| 4088 | 4121 | ||
| 4089 | The function @code{md5} does not compute the message digest directly | 4122 | If the argument @var{binary} is omitted or @code{nil}, the function |
| 4090 | from the internal Emacs representation of the text (@pxref{Text | 4123 | returns the @dfn{text form} of the hash, as an ordinary Lisp string. |
| 4091 | Representations}). Instead, it encodes the text using a coding | 4124 | If @var{binary} is non-@code{nil}, it returns the hash in @dfn{binary |
| 4092 | system, and computes the message digest from the encoded text. The | 4125 | form}, as a sequence of bytes stored in a unibyte string. |
| 4093 | optional fourth argument @var{coding-system} specifies which coding | 4126 | |
| 4094 | system to use for encoding the text. It should be the same coding | 4127 | This function does not compute the hash directly from the internal |
| 4095 | system that you used to read the text, or that you used or will use | 4128 | representation of @var{object}'s text (@pxref{Text Representations}). |
| 4096 | when saving or sending the text. @xref{Coding Systems}, for more | 4129 | Instead, it encodes the text using a coding system (@pxref{Coding |
| 4097 | information about coding systems. | 4130 | Systems}), and computes the hash from that encoded text. If |
| 4098 | 4131 | @var{object} is a buffer, the coding system used is the one which | |
| 4099 | If @var{coding-system} is @code{nil} or omitted, the default depends | 4132 | would be chosen by default for writing the text into a file. If |
| 4100 | on @var{object}. If @var{object} is a buffer, the default for | 4133 | @var{object} is a string, the user's preferred coding system is used |
| 4101 | @var{coding-system} is whatever coding system would be chosen by | 4134 | (@pxref{Recognize Coding,,, emacs, GNU Emacs Manual}). |
| 4102 | default for writing this text into a file. If @var{object} is a | 4135 | @end defun |
| 4103 | string, the user's most preferred coding system (@pxref{Recognize | 4136 | |
| 4104 | Coding, prefer-coding-system, the description of | 4137 | @defun md5 object &optional start end coding-system noerror |
| 4105 | @code{prefer-coding-system}, emacs, GNU Emacs Manual}) is used. | 4138 | This function returns an MD5 hash. It is semi-obsolete, since for |
| 4139 | most purposes it is equivalent to calling @code{secure-hash} with | ||
| 4140 | @code{md5} as the @var{algorithm} argument. The @var{object}, | ||
| 4141 | @var{start} and @var{end} arguments have the same meanings as in | ||
| 4142 | @code{secure-hash}. | ||
| 4143 | |||
| 4144 | If @var{coding-system} is non-@code{nil}, it specifies a coding system | ||
| 4145 | to use to encode the text; if omitted or @code{nil}, the default | ||
| 4146 | coding system is used, like in @code{secure-hash}. | ||
| 4106 | 4147 | ||
| 4107 | Normally, @code{md5} signals an error if the text can't be encoded | 4148 | Normally, @code{md5} signals an error if the text can't be encoded |
| 4108 | using the specified or chosen coding system. However, if | 4149 | using the specified or chosen coding system. However, if |
| @@ -4110,65 +4151,53 @@ using the specified or chosen coding system. However, if | |||
| 4110 | coding instead. | 4151 | coding instead. |
| 4111 | @end defun | 4152 | @end defun |
| 4112 | 4153 | ||
| 4113 | @defun secure-hash algorithm object &optional start end binary | 4154 | @node Parsing HTML/XML |
| 4114 | This function provides a general interface to a variety of secure | 4155 | @section Parsing HTML and XML |
| 4115 | hashing algorithms. As well as the MD5 algorithm, it supports SHA-1, | ||
| 4116 | SHA-2, SHA-224, SHA-256, SHA-384 and SHA-512. The argument | ||
| 4117 | @var{algorithm} is a symbol stating which hash to compute. The | ||
| 4118 | arguments @var{object}, @var{start}, and @var{end} are as for the | ||
| 4119 | @code{md5} function. If the optional argument @var{binary} is | ||
| 4120 | non-@code{nil}, returns a string in binary form. | ||
| 4121 | @end defun | ||
| 4122 | |||
| 4123 | @node Parsing HTML | ||
| 4124 | @section Parsing HTML | ||
| 4125 | @cindex parsing html | 4156 | @cindex parsing html |
| 4126 | 4157 | ||
| 4158 | When Emacs is compiled with libxml2 support, the following functions | ||
| 4159 | are available to parse HTML or XML text into Lisp object trees. | ||
| 4160 | |||
| 4127 | @defun libxml-parse-html-region start end &optional base-url | 4161 | @defun libxml-parse-html-region start end &optional base-url |
| 4128 | This function provides HTML parsing via the @code{libxml2} library. | 4162 | This function parses the text between @var{start} and @var{end} as |
| 4129 | It parses ``real world'' HTML and tries to return a sensible parse tree | 4163 | HTML, and returns a list representing the HTML @dfn{parse tree}. It |
| 4130 | regardless. | 4164 | attempts to handle ``real world'' HTML by robustly coping with syntax |
| 4165 | mistakes. | ||
| 4131 | 4166 | ||
| 4132 | In addition to @var{start} and @var{end} (specifying the start and end | 4167 | The optional argument @var{base-url}, if non-@code{nil}, should be a |
| 4133 | of the region to act on), it takes an optional parameter, | 4168 | string specifying the base URL for relative URLs occurring in links. |
| 4134 | @var{base-url}, which is used to expand relative URLs in the document, | ||
| 4135 | if any. | ||
| 4136 | 4169 | ||
| 4137 | Here's an example demonstrating the structure of the parsed data you | 4170 | In the parse tree, each HTML node is represented by a list in which |
| 4138 | get out. Given this HTML document: | 4171 | the first element is a symbol representing the node name, the second |
| 4172 | element is an alist of node attributes, and the remaining elements are | ||
| 4173 | the subnodes. | ||
| 4174 | |||
| 4175 | The following example demonstrates this. Given this (malformed) HTML | ||
| 4176 | document: | ||
| 4139 | 4177 | ||
| 4140 | @example | 4178 | @example |
| 4141 | <html><hEad></head><body width=101><div class=thing>Foo<div>Yes | 4179 | <html><head></head><body width=101><div class=thing>Foo<div>Yes |
| 4142 | @end example | 4180 | @end example |
| 4143 | 4181 | ||
| 4144 | You get this parse tree: | 4182 | @noindent |
| 4183 | A call to @code{libxml-parse-html-region} returns this: | ||
| 4145 | 4184 | ||
| 4146 | @example | 4185 | @example |
| 4147 | (html | 4186 | (html () |
| 4148 | (head) | 4187 | (head ()) |
| 4149 | (body | 4188 | (body ((width . "101")) |
| 4150 | (:width . "101") | 4189 | (div ((class . "thing")) |
| 4151 | (div | 4190 | "Foo" |
| 4152 | (:class . "thing") | 4191 | (div () |
| 4153 | (text . "Foo") | 4192 | "Yes")))) |
| 4154 | (div | ||
| 4155 | (text . "Yes\n"))))) | ||
| 4156 | @end example | 4193 | @end example |
| 4157 | |||
| 4158 | It's a simple tree structure, where the @code{car} for each node is | ||
| 4159 | the name of the node, and the @code{cdr} is the value, or the list of | ||
| 4160 | values. | ||
| 4161 | |||
| 4162 | Attributes are coded the same way as child nodes, but with @samp{:} as | ||
| 4163 | the first character. | ||
| 4164 | @end defun | 4194 | @end defun |
| 4165 | 4195 | ||
| 4166 | @cindex parsing xml | 4196 | @cindex parsing xml |
| 4167 | @defun libxml-parse-xml-region start end &optional base-url | 4197 | @defun libxml-parse-xml-region start end &optional base-url |
| 4168 | 4198 | This function is the same as @code{libxml-parse-html-region}, except | |
| 4169 | This is much the same as @code{libxml-parse-html-region} above, but | 4199 | that it parses the text as XML rather than HTML (so it is stricter |
| 4170 | operates on XML instead of HTML, and is correspondingly stricter about | 4200 | about syntax). |
| 4171 | syntax. | ||
| 4172 | @end defun | 4201 | @end defun |
| 4173 | 4202 | ||
| 4174 | @node Atomic Changes | 4203 | @node Atomic Changes |
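
Reviewer note on the revised Checksum/Hash text in the text.texi hunks above: the following is a minimal sketch of the `secure-hash` calls that the new documentation describes, assuming a string argument and the default coding system. The helper name `my-demo-hash-summary` is hypothetical and is not part of Emacs or of this commit; only `secure-hash` and `md5` come from the documented text.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-demo-hash-summary (text)
  "Return a plist holding several hashes of the string TEXT."
  (list
   ;; Text form: an ordinary Lisp string of hexadecimal digits.
   :sha256-hex (secure-hash 'sha256 text)
   ;; Binary form: START and END are nil (hash the whole string),
   ;; and the non-nil BINARY argument requests a unibyte string.
   :sha256-bytes (secure-hash 'sha256 text nil nil t)
   ;; MD5 through the general interface and through the older `md5'.
   :md5-hex (secure-hash 'md5 text)
   :md5-legacy (md5 text)))
```

For a plain ASCII string such as `"hello"`, the text form of the SHA-256 hash should be 64 hexadecimal characters, the binary form 32 bytes, and the two MD5 results should be equal. As the revised text says, MD5 and SHA-1 are best avoided for anything security-related.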
diff --git a/doc/lispref/vol1.texi b/doc/lispref/vol1.texi index a92a807b747..58092f23157 100644 --- a/doc/lispref/vol1.texi +++ b/doc/lispref/vol1.texi | |||
| @@ -1076,7 +1076,8 @@ Text | |||
| 1076 | * Registers:: How registers are implemented. Accessing | 1076 | * Registers:: How registers are implemented. Accessing |
| 1077 | the text or position stored in a register. | 1077 | the text or position stored in a register. |
| 1078 | * Base 64:: Conversion to or from base 64 encoding. | 1078 | * Base 64:: Conversion to or from base 64 encoding. |
| 1079 | * Checksum/Hash:: Computing "message digests"/"checksums"/"hashes". | 1079 | * Checksum/Hash:: Computing cryptographic hashes. |
| 1080 | * Parsing HTML/XML:: Parsing HTML and XML. | ||
| 1080 | * Atomic Changes:: Installing several buffer changes "atomically". | 1081 | * Atomic Changes:: Installing several buffer changes "atomically". |
| 1081 | * Change Hooks:: Supplying functions to be run when text is changed. | 1082 | * Change Hooks:: Supplying functions to be run when text is changed. |
| 1082 | 1083 | ||
diff --git a/doc/lispref/vol2.texi b/doc/lispref/vol2.texi index 97b21aba10b..a42b70d77a4 100644 --- a/doc/lispref/vol2.texi +++ b/doc/lispref/vol2.texi | |||
| @@ -1075,7 +1075,8 @@ Text | |||
| 1075 | * Registers:: How registers are implemented. Accessing | 1075 | * Registers:: How registers are implemented. Accessing |
| 1076 | the text or position stored in a register. | 1076 | the text or position stored in a register. |
| 1077 | * Base 64:: Conversion to or from base 64 encoding. | 1077 | * Base 64:: Conversion to or from base 64 encoding. |
| 1078 | * Checksum/Hash:: Computing "message digests"/"checksums"/"hashes". | 1078 | * Checksum/Hash:: Computing cryptographic hashes. |
| 1079 | * Parsing HTML/XML:: Parsing HTML and XML. | ||
| 1079 | * Atomic Changes:: Installing several buffer changes "atomically". | 1080 | * Atomic Changes:: Installing several buffer changes "atomically". |
| 1080 | * Change Hooks:: Supplying functions to be run when text is changed. | 1081 | * Change Hooks:: Supplying functions to be run when text is changed. |
| 1081 | 1082 | ||
diff --git a/etc/NEWS b/etc/NEWS --- a/etc/NEWS +++ b/etc/NEWS | |||
| @@ -1482,13 +1482,12 @@ These require Emacs to be built with ImageMagick support. | |||
| 1482 | image-transform-fit-to-height, image-transform-fit-to-width, | 1482 | image-transform-fit-to-height, image-transform-fit-to-width, |
| 1483 | image-transform-set-rotation, image-transform-set-scale. | 1483 | image-transform-set-rotation, image-transform-set-scale. |
| 1484 | 1484 | ||
| 1485 | +++ | ||
| 1485 | ** XML and HTML parsing | 1486 | ** XML and HTML parsing |
| 1486 | If Emacs is compiled with libxml2 support, there are two new functions: | 1487 | If Emacs is compiled with libxml2 support, there are two new |
| 1487 | `libxml-parse-html-region' (which parses "real world" HTML) and | 1488 | functions: `libxml-parse-html-region' (which parses "real world" HTML) |
| 1488 | `libxml-parse-xml-region' (which parses XML). Both return an Emacs | 1489 | and `libxml-parse-xml-region' (which parses XML). Both return an |
| 1489 | Lisp parse tree. | 1490 | Emacs Lisp parse tree. |
| 1490 | |||
| 1491 | FIXME: These should be front-ended by xml.el. | ||
| 1492 | 1491 | ||
| 1493 | ** GnuTLS | 1492 | ** GnuTLS |
| 1494 | 1493 | ||
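
Reviewer note on the libxml2 entry: the sketch below shows one way a caller might guard against an Emacs built without libxml2 and then walk the parse tree in the shape the revised text.texi describes (node name, then an alist of attributes, then the subnodes). All `my-demo-` names are hypothetical and are not part of Emacs or of this commit. The exact tree returned for a given document may differ between Emacs and libxml2 versions.

```elisp
;; Hypothetical helpers, for illustration only.
(defun my-demo-parse-html (html)
  "Parse the string HTML; return a parse tree, or nil without libxml2."
  (when (fboundp 'libxml-parse-html-region)
    (with-temp-buffer
      (insert html)
      (libxml-parse-html-region (point-min) (point-max)))))

(defun my-demo-node-name (node)
  "Return the symbol naming NODE, or nil if NODE is a text string."
  (car-safe node))

(defun my-demo-node-attributes (node)
  "Return the attribute alist of NODE."
  (car-safe (cdr-safe node)))

(defun my-demo-node-children (node)
  "Return the subnodes of NODE (strings or nested node lists)."
  (cdr-safe (cdr-safe node)))
```

Applied to the `body` node from the example tree in the text.texi hunk, `my-demo-node-attributes` would return `((width . "101"))` and `my-demo-node-children` would return a one-element list holding the outer `div` node.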