author Lars Magne Ingebrigtsen 2010-09-10 18:44:35 +0200
committer Lars Magne Ingebrigtsen 2010-09-10 18:44:35 +0200
commit 381408e2192b8fd606babaa8c9a103186589d708
tree 488a49b786d5cffcd0b068a527ec1ebe8339114a /doc
parent 36f7d3666905e1447a2e80957735a1ade23c894c
Add support for the libxml2 library.
This adds the html-parse-string and xml-parse-string functions in the new file src/xml.c, as well as autoconf detection of the library.
Diffstat (limited to 'doc')
-rw-r--r-- doc/lispref/text.texi 44
1 files changed, 44 insertions, 0 deletions
diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi
index 142a071f494..ff4e65d299f 100644
--- a/doc/lispref/text.texi
+++ b/doc/lispref/text.texi
@@ -59,6 +59,7 @@ the character after point.
 position stored in a register.
 * Base 64:: Conversion to or from base 64 encoding.
 * MD5 Checksum:: Compute the MD5 "message digest"/"checksum".
+* Parsing HTML:: Parsing HTML and XML.
 * Atomic Changes:: Installing several buffer changes "atomically".
 * Change Hooks:: Supplying functions to be run when text is changed.
 @end menu
@@ -4106,6 +4107,49 @@ using the specified or chosen coding system. However, if
 coding instead.
 @end defun
 
+@node Parsing HTML
+@section Parsing HTML
+@cindex parsing html
+@cindex parsing xml
+
+Emacs provides an interface to the @code{libxml2} library via two
+functions: @code{html-parse-buffer} and @code{xml-parse-buffer}. The
+HTML function will parse ``real world'' HTML and try to return a
+sensible parse tree, while the XML function is somewhat stricter about
+syntax.
+
+They both take two optional parameters. The first is a buffer, and
+the second is a base URL to be used to expand relative URLs in the
+document, if any.
+
+Here's an example demonstrating the structure of the parsed data you
+get out. Given this HTML document:
+
+@example
+<html><hEad></head><body width=101><div class=thing>Foo<div>Yes
+@end example
+
+You get this parse tree:
+
+@example
+(html
+ (head)
+ (body
+  (:width . "101")
+  (div
+   (:class . "thing")
+   (text . "Foo")
+   (div
+    (text . "Yes\n")))))
+@end example
+
+It's a simple tree structure, where the @code{car} for each node is
+the name of the node, and the @code{cdr} is the value, or the list of
+values.
+
+Attributes are coded the same way as child nodes, but with @samp{:} as
+the first character.
+
 @node Atomic Changes
 @section Atomic Change Groups
 @cindex atomic changes
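
Note on the parse tree format documented in this patch: the node layout described in the new section (node name in the car, attributes and children in the cdr, attribute names starting with a colon) can be walked with ordinary list functions. The following is only a minimal sketch under stated assumptions. It works on the literal tree quoted in the patch and does not call the parser itself. The names my-sample-tree, my-node-attributes and my-node-text are hypothetical helpers made up for illustration and are not part of this commit.

```elisp
;; Hypothetical helpers, not part of the commit.  They assume the node
;; layout described in the new manual section:
;;   (NAME ENTRY...) where each ENTRY is an attribute (:NAME . "VALUE"),
;;   a text leaf (text . "STRING"), or a child element (NAME ENTRY...).

(defvar my-sample-tree
  '(html
    (head)
    (body
     (:width . "101")
     (div
      (:class . "thing")
      (text . "Foo")
      (div
       (text . "Yes\n")))))
  "The parse tree quoted in the manual text, as a literal.")

(defun my-node-attributes (node)
  "Return the attribute entries of NODE as an alist.
An entry counts as an attribute when its car is a keyword symbol,
that is, a symbol whose name starts with a colon."
  (let (result)
    (dolist (entry (cdr node))
      (when (keywordp (car-safe entry))
        (push entry result)))
    (nreverse result)))

(defun my-node-text (node)
  "Concatenate the text of NODE and all its descendants, in document order."
  (mapconcat
   (lambda (entry)
     (cond
      ;; Attribute entries contribute no text.
      ((keywordp (car-safe entry)) "")
      ;; Text leaf: the cdr is the string itself.
      ((and (eq (car-safe entry) 'text) (stringp (cdr entry)))
       (cdr entry))
      ;; Anything else is treated as a child element; recurse into it.
      (t (my-node-text entry))))
   (cdr node)
   ""))

;; Usage on the sample tree:
;; (my-node-attributes (assq 'body (cdr my-sample-tree)))
;;   => ((:width . "101"))
;; (my-node-text my-sample-tree)
;;   => "FooYes\n"
```

Since attributes and children share one list, a caller has to filter on the leading colon to tell them apart, which is why both helpers test the car of each entry with keywordp.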