| author | Lars Magne Ingebrigtsen | 2010-09-10 18:44:35 +0200 |
|---|---|---|
| committer | Lars Magne Ingebrigtsen | 2010-09-10 18:44:35 +0200 |
| commit | 381408e2192b8fd606babaa8c9a103186589d708 (patch) | |
| tree | 488a49b786d5cffcd0b068a527ec1ebe8339114a /doc | |
| parent | 36f7d3666905e1447a2e80957735a1ade23c894c (diff) | |
Add support for the libxml2 library.
This adds the html-parse-string and xml-parse-string functions in the
new file src/xml.c, as well as autoconf detection of the library.
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/lispref/text.texi | 44 |
1 files changed, 44 insertions, 0 deletions
diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi index 142a071f494..ff4e65d299f 100644 --- a/doc/lispref/text.texi +++ b/doc/lispref/text.texi | |||
| @@ -59,6 +59,7 @@ the character after point. | |||
| 59 | position stored in a register. | 59 | position stored in a register. |
| 60 | * Base 64:: Conversion to or from base 64 encoding. | 60 | * Base 64:: Conversion to or from base 64 encoding. |
| 61 | * MD5 Checksum:: Compute the MD5 "message digest"/"checksum". | 61 | * MD5 Checksum:: Compute the MD5 "message digest"/"checksum". |
| 62 | * Parsing HTML:: Parsing HTML and XML. | ||
| 62 | * Atomic Changes:: Installing several buffer changes "atomically". | 63 | * Atomic Changes:: Installing several buffer changes "atomically". |
| 63 | * Change Hooks:: Supplying functions to be run when text is changed. | 64 | * Change Hooks:: Supplying functions to be run when text is changed. |
| 64 | @end menu | 65 | @end menu |
| @@ -4106,6 +4107,49 @@ using the specified or chosen coding system. However, if | |||
| 4106 | coding instead. | 4107 | coding instead. |
| 4107 | @end defun | 4108 | @end defun |
| 4108 | 4109 | ||
| 4110 | @node Parsing HTML | ||
| 4111 | @section Parsing HTML | ||
| 4112 | @cindex parsing html | ||
| 4113 | @cindex parsing xml | ||
| 4114 | |||
| 4115 | Emacs provides an interface to the @code{libxml2} library via two | ||
| 4116 | functions: @code{html-parse-buffer} and @code{xml-parse-buffer}. The | ||
| 4117 | HTML function will parse ``real world'' HTML and try to return a | ||
| 4118 | sensible parse tree, while the XML function is somewhat stricter about | ||
| 4119 | syntax. | ||
| 4120 | |||
| 4121 | They both take two optional parameters. The first is a buffer, and | ||
| 4122 | the second is a base URL to be used to expand relative URLs in the | ||
| 4123 | document, if any. | ||
| 4124 | |||
| 4125 | Here's an example demonstrating the structure of the parsed data you | ||
| 4126 | get out. Given this HTML document: | ||
| 4127 | |||
| 4128 | @example | ||
| 4129 | <html><hEad></head><body width=101><div class=thing>Foo<div>Yes | ||
| 4130 | @end example | ||
| 4131 | |||
| 4132 | You get this parse tree: | ||
| 4133 | |||
| 4134 | @example | ||
| 4135 | (html | ||
| 4136 | (head) | ||
| 4137 | (body | ||
| 4138 | (:width . "101") | ||
| 4139 | (div | ||
| 4140 | (:class . "thing") | ||
| 4141 | (text . "Foo") | ||
| 4142 | (div | ||
| 4143 | (text . "Yes\n"))))) | ||
| 4144 | @end example | ||
| 4145 | |||
| 4146 | It's a simple tree structure, where the @code{car} of each node is | ||
| 4147 | the name of the node, and the @code{cdr} is either a string value or | ||
| 4148 | a list of attributes and child nodes. | ||
| 4149 | |||
| 4150 | Attributes are encoded the same way as child nodes, but their names | ||
| 4151 | begin with @samp{:}. | ||
| 4152 | |||
| 4109 | @node Atomic Changes | 4153 | @node Atomic Changes |
| 4110 | @section Atomic Change Groups | 4154 | @section Atomic Change Groups |
| 4111 | @cindex atomic changes | 4155 | @cindex atomic changes |
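
The tree format documented in the new section (node name in the `car`, attributes as `(:name . "value")` pairs, text as `(text . "string")` pairs) can be walked with ordinary list functions. The following is a minimal sketch under that assumption; it works on a quoted copy of the example tree from the patch rather than calling the parser, and the names `my-example-tree`, `my-parse-tree-attributes` and `my-parse-tree-text` are hypothetical helpers, not part of this commit.

```elisp
;; Hypothetical helpers: a sketch that assumes the tree shape shown in
;; the documentation example above.

(defvar my-example-tree
  '(html
    (head)
    (body
     (:width . "101")
     (div
      (:class . "thing")
      (text . "Foo")
      (div
       (text . "Yes\n")))))
  "Quoted copy of the parse tree from the documentation example.")

(defun my-parse-tree-attributes (node)
  "Return an alist of the attributes directly attached to NODE.
An attribute is a pair whose car is a keyword and whose cdr is a string."
  (let (attrs)
    (dolist (child (cdr node) (nreverse attrs))
      (when (and (consp child)
                 (keywordp (car child))
                 (stringp (cdr child)))
        (push child attrs)))))

(defun my-parse-tree-text (node)
  "Return the text found at or below NODE, concatenated depth first.
Attribute values are skipped; only `text' pairs contribute."
  (if (stringp (cdr node))
      ;; A leaf pair: keep it only if it is a text node.
      (if (eq (car node) 'text) (cdr node) "")
    ;; An element: recurse into its attributes and children.
    (mapconcat #'my-parse-tree-text (cdr node) "")))
```

With these definitions, `(my-parse-tree-attributes (nth 2 my-example-tree))` collects the attributes of the `body` node, and `(my-parse-tree-text my-example-tree)` gathers the text of the whole document. Checking whether the `cdr` is a string is what separates attribute and text pairs from element nodes, since an element's `cdr` is always a (possibly empty) list.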