<!-- admin/notes/tree-sitter/html-manual/Pattern-Matching.html
     Added in commit cb183f6467401fb5ed2b7fc98ca75be9d943cbe3
     ("Add tree-sitter admin notes") by Yuan Fu, 2022-10-05 14:11:33 -0700.
     html-manual: HTML version of the manual for easy access. -->
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This is the GNU Emacs Lisp Reference Manual
corresponding to Emacs version 29.0.50.

Copyright © 1990-1996, 1998-2022 Free Software Foundation,
Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License," with the
Front-Cover Texts being "A GNU Manual," and with the Back-Cover
Texts as in (a) below. A copy of the license is included in the
section entitled "GNU Free Documentation License."

(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
modify this GNU manual. Buying copies from the FSF supports it in
developing GNU and promoting software freedom." -->
<title>Pattern Matching (GNU Emacs Lisp Reference Manual)</title>

<meta name="description" content="Pattern Matching (GNU Emacs Lisp Reference Manual)">
<meta name="keywords" content="Pattern Matching (GNU Emacs Lisp Reference Manual)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta name="viewport" content="width=device-width,initial-scale=1">

<link href="index.html" rel="start" title="Top">
<link href="Index.html" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
<link href="Multiple-Languages.html" rel="next" title="Multiple Languages">
<link href="Accessing-Node.html" rel="prev" title="Accessing Node">
<style type="text/css">
<!--
a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
span:hover a.copiable-anchor {visibility: visible}
ul.no-bullet {list-style: none}
-->
</style>
<link rel="stylesheet" type="text/css" href="./manual.css">


</head>

<body lang="en">
<div class="section" id="Pattern-Matching">
<div class="header">
<p>
Next: <a href="Multiple-Languages.html" accesskey="n" rel="next">Parsing Text in Multiple Languages</a>, Previous: <a href="Accessing-Node.html" accesskey="p" rel="prev">Accessing Node Information</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<span id="Pattern-Matching-Tree_002dsitter-Nodes"></span><h3 class="section">37.5 Pattern Matching Tree-sitter Nodes</h3>

<p>Tree-sitter lets us pattern match with a small declarative language.
Pattern matching consists of two steps: first tree-sitter matches a
<em>pattern</em> against nodes in the syntax tree, then it <em>captures</em>
specific nodes in that pattern and returns the captured nodes.
</p>
<p>We first describe how to write the most basic query pattern and how to
capture nodes in a pattern, then the pattern-matching function, and
finally more advanced pattern syntax.
</p>
<span id="Basic-query-syntax"></span><h3 class="heading">Basic query syntax</h3>

<span id="index-Tree_002dsitter-query-syntax"></span>
<span id="index-Tree_002dsitter-query-pattern"></span>
<p>A <em>query</em> consists of one or more <em>patterns</em>. Each pattern is an
s-expression that matches a certain node in the syntax tree. A
pattern has the following shape:
</p>
<div class="example">
<pre class="example">(<var>type</var> <var>child</var>...)
</pre></div>

<p>For example, a pattern that matches a <code>binary_expression</code> node that
contains <code>number_literal</code> child nodes would look like
</p>
<div class="example">
<pre class="example">(binary_expression (number_literal))
</pre></div>

<p>To <em>capture</em> a node in the query pattern above, append
<code>@capture-name</code> after the node pattern you want to capture. For
example,
</p>
<div class="example">
<pre class="example">(binary_expression (number_literal) @number-in-exp)
</pre></div>

<p>captures <code>number_literal</code> nodes that are inside a
<code>binary_expression</code> node with the capture name <code>number-in-exp</code>.
</p>
<p>We can capture the <code>binary_expression</code> node too, with the capture
name <code>biexp</code>:
</p>
<div class="example">
<pre class="example">(binary_expression
  (number_literal) @number-in-exp) @biexp
</pre></div>

<span id="Query-function"></span><h3 class="heading">Query function</h3>

<p>Now we can introduce the query functions.
</p>
<dl class="def">
<dt id="index-treesit_002dquery_002dcapture"><span class="category">Function: </span><span><strong>treesit-query-capture</strong> <em>node query &amp;optional beg end node-only</em><a href='#index-treesit_002dquery_002dcapture' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function matches patterns in <var>query</var> within <var>node</var>.
The parameter <var>query</var> can be a string, an s-expression, or a
compiled query object. For now, we focus on the string syntax;
s-expression syntax and compiled queries are described at the end of
the section.
</p>
<p>The parameter <var>node</var> can also be a parser or a language symbol. If
it is a parser, this function uses the parser&rsquo;s root node; if it is a
language symbol, this function finds or creates a parser for that
language in the current buffer and uses that parser&rsquo;s root node.
</p>
<p>The function returns all captured nodes in a list whose elements have
the form <code>(<var>capture_name</var> . <var>node</var>)</code>. If <var>node-only</var> is
non-nil, it returns a list of nodes instead. If <var>beg</var> and
<var>end</var> are both non-nil, this function only matches nodes in that
range.
</p>
<span id="index-treesit_002dquery_002derror"></span>
<p>This function signals a <var>treesit-query-error</var> if <var>query</var> is
malformed. The signal data contains a description of the specific
error. You can use <code>treesit-query-validate</code> to debug the query.
</p></dd></dl>

<p>For example, suppose <var>node</var>&rsquo;s content is <code>1 + 2</code>, and
<var>query</var> is
</p>
<div class="example">
<pre class="example">(setq query
      &quot;(binary_expression
         (number_literal) @number-in-exp) @biexp&quot;)
</pre></div>

<p>Calling <code>treesit-query-capture</code> with that query would return
</p>
<div class="example">
<pre class="example">(treesit-query-capture node query)
     &rArr; ((biexp . <var>&lt;node for &quot;1 + 2&quot;&gt;</var>)
         (number-in-exp . <var>&lt;node for &quot;1&quot;&gt;</var>)
         (number-in-exp . <var>&lt;node for &quot;2&quot;&gt;</var>))
</pre></div>
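
<p>To try the example above in a real buffer, the following sketch parses a small C function in a temporary buffer and turns each captured node into its text. It assumes that the tree-sitter grammar for C is installed under the language symbol <code>c</code>; the helper <code>my-treesit-capture-texts</code> is hypothetical. Because a pattern can match once for each matching child, the sketch does not rely on the exact number or order of the returned captures.
</p>
<div class="example">
<pre class="example">(require 'treesit)

(defun my-treesit-capture-texts (code query)
  "Match QUERY against C source CODE; return (CAPTURE-NAME . TEXT) pairs.
Hypothetical helper; assumes the tree-sitter grammar for C is installed."
  (with-temp-buffer
    (insert code)
    ;; Passing the language symbol `c' as NODE makes
    ;; `treesit-query-capture' find or create a C parser in this
    ;; buffer and match QUERY against that parser's root node.
    (mapcar (lambda (capture)
              ;; CAPTURE is (CAPTURE-NAME . NODE); keep the node's text.
              (cons (car capture) (treesit-node-text (cdr capture) t)))
            (treesit-query-capture 'c query))))

;; Usage: the result contains (biexp . "1 + 2"),
;; (number-in-exp . "1") and (number-in-exp . "2").
;; (my-treesit-capture-texts
;;  "int f (void) { return 1 + 2; }"
;;  "(binary_expression (number_literal) @number-in-exp) @biexp")
</pre></div>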

<p>As we mentioned earlier, a <var>query</var> can contain multiple
patterns. For example, it could have two top-level patterns, the second
of which gives each captured node two capture names:
</p>
<div class="example">
<pre class="example">(setq query
      &quot;(binary_expression) @biexp
       (number_literal) @number @biexp&quot;)
</pre></div>
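
<p>The sketch below runs that two-pattern query on a small C declaration. As before, it assumes the C grammar is installed, and the function name is hypothetical. Since the second pattern attaches both <code>number</code> and <code>biexp</code> to every number literal, each literal is expected to appear under both capture names, alongside the whole binary expression captured by the first pattern.
</p>
<div class="example">
<pre class="example">(require 'treesit)

(defun my-treesit-two-pattern-captures (code)
  "Run a two-pattern query on C source CODE; return (NAME . TEXT) pairs.
Hypothetical example; assumes the tree-sitter grammar for C is installed."
  (with-temp-buffer
    (insert code)
    (mapcar (lambda (capture)
              (cons (car capture) (treesit-node-text (cdr capture) t)))
            ;; Two top-level patterns in one query string; the second
            ;; gives each number literal two capture names.
            (treesit-query-capture
             'c "(binary_expression) @biexp
                 (number_literal) @number @biexp"))))

;; Usage:
;; (my-treesit-two-pattern-captures "int x = 1 + 2;")
</pre></div>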

<dl class="def">
<dt id="index-treesit_002dquery_002dstring"><span class="category">Function: </span><span><strong>treesit-query-string</strong> <em>string query language</em><a href='#index-treesit_002dquery_002dstring' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function parses <var>string</var> with <var>language</var>, matches
<var>query</var> against its root node, and returns the result.
</p></dd></dl>
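
<p>Because <code>treesit-query-string</code> takes care of parsing, it is handy for quick experiments. The sketch below assumes the C grammar is installed and uses a hypothetical function name. As far as we know, the string is parsed in a temporary buffer that is gone by the time the function returns, so the sketch only counts the captures instead of reading the text of the returned nodes.
</p>
<div class="example">
<pre class="example">(require 'treesit)

(defun my-treesit-count-number-literals (code)
  "Return how many number literals the C source CODE contains.
Hypothetical example; assumes the tree-sitter grammar for C is installed."
  ;; Each number literal yields exactly one (num . NODE) capture,
  ;; so the length of the result is the number of literals.
  (length (treesit-query-string code "(number_literal) @num" 'c)))

;; Usage, which returns 2:
;; (my-treesit-count-number-literals "int a = 1, b = 2;")
</pre></div>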

<span id="More-query-syntax"></span><h3 class="heading">More query syntax</h3>

<p>Besides node types and captures, tree-sitter&rsquo;s query syntax can express
anonymous nodes, field names, wildcards, quantification, grouping,
alternation, anchors, and predicates.
</p>
<span id="Anonymous-node"></span><h4 class="subheading">Anonymous node</h4>

<p>An anonymous node is written verbatim, surrounded by quotes. A
pattern that matches (and captures) the keyword <code>return</code> would be
</p>
<div class="example">
<pre class="example">&quot;return&quot; @keyword
</pre></div>
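
<p>Here is a small sketch that captures the anonymous <code>return</code> keyword nodes in a C function. It assumes the C grammar is installed, uses a hypothetical function name, and passes a non-nil <var>node-only</var> so that <code>treesit-query-capture</code> returns bare nodes.
</p>
<div class="example">
<pre class="example">(require 'treesit)

(defun my-treesit-return-keywords (code)
  "Return the text of every \"return\" keyword node in C source CODE.
Hypothetical example; assumes the tree-sitter grammar for C is installed."
  (with-temp-buffer
    (insert code)
    ;; With NODE-ONLY non-nil, each element of the result is a node
    ;; rather than a (CAPTURE-NAME . NODE) pair.
    (mapcar (lambda (node) (treesit-node-text node t))
            (treesit-query-capture 'c "\"return\" @keyword" nil nil t))))

;; Usage, which returns ("return" "return"):
;; (my-treesit-return-keywords
;;  "int f (int x) { if (x) return 1; return 0; }")
</pre></div>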

<span id="Wild-card"></span><h4 class="subheading">Wild card</h4>

<p>In a query pattern, &lsquo;<samp>(_)</samp>&rsquo; matches any named node, and &lsquo;<samp>_</samp>&rsquo;
matches any node, named or anonymous. For example, to capture any
named child of a <code>binary_expression</code> node, the pattern would be
</p>
<div class="example">
<pre class="example">(binary_expression (_) @in_biexp)
</pre></div>
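
<p>The difference between &lsquo;<samp>(_)</samp>&rsquo; and &lsquo;<samp>_</samp>&rsquo; is easy to observe on a binary expression, whose operator is an anonymous node. The sketch below assumes the C grammar is installed; the function name is hypothetical.
</p>
<div class="example">
<pre class="example">(require 'treesit)

(defun my-treesit-binary-children (code wildcard)
  "Return the texts of binary-expression children in C source CODE.
WILDCARD is either the string \"(_)\" or the string \"_\".
Hypothetical example; assumes the tree-sitter grammar for C is installed."
  (with-temp-buffer
    (insert code)
    (mapcar (lambda (node) (treesit-node-text node t))
            (treesit-query-capture
             'c (format "(binary_expression %s @child)" wildcard)
             nil nil t))))

;; Usage:
;; (my-treesit-binary-children "int y = x + 2;" "(_)")
;;   captures "x" and "2"; the anonymous "+" node is skipped.
;; (my-treesit-binary-children "int y = x + 2;" "_")
;;   should also capture "+".
</pre></div>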

<span id="Field-name"></span><h4 class="subheading">Field name</h4>

<p>We can capture child nodes that have specific field names:
</p>
<div class="example">
<pre class="example">(function_definition
  declarator: (_) @func-declarator
  body: (_) @func-body)
</pre></div>

<p>We can also capture a node that doesn&rsquo;t have a certain field, say, a
<code>function_definition</code> without a <code>body</code> field:
</p>
<div class="example">
<pre class="example">(function_definition !body) @func-no-body
</pre></div>
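
<p>The following sketch applies the field-name pattern above to a one-line C function. It assumes the C grammar is installed, where <code>function_definition</code> nodes have <code>declarator</code> and <code>body</code> fields; the function name is hypothetical.
</p>
<div class="example">
<pre class="example">(require 'treesit)

(defun my-treesit-function-parts (code)
  "Return (CAPTURE-NAME . TEXT) pairs for function definitions in CODE.
CODE is C source.  Hypothetical example; assumes the tree-sitter
grammar for C is installed."
  (with-temp-buffer
    (insert code)
    (mapcar (lambda (capture)
              (cons (car capture) (treesit-node-text (cdr capture) t)))
            ;; Only children stored under these fields are captured.
            (treesit-query-capture
             'c "(function_definition
                   declarator: (_) @func-declarator
                   body: (_) @func-body)"))))

;; Usage: the result contains
;; (func-declarator . "add (int a, int b)") and
;; (func-body . "{ return a + b; }").
;; (my-treesit-function-parts "int add (int a, int b) { return a + b; }")
</pre></div>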

<span id="Quantify-node"></span><h4 class="subheading">Quantify node</h4>

<p>Tree-sitter recognizes the quantification operators &lsquo;<samp>*</samp>&rsquo;, &lsquo;<samp>+</samp>&rsquo; and
&lsquo;<samp>?</samp>&rsquo;. Their meanings are the same as in regular expressions:
&lsquo;<samp>*</samp>&rsquo; matches the preceding pattern zero or more times, &lsquo;<samp>+</samp>&rsquo;
matches one or more times, and &lsquo;<samp>?</samp>&rsquo; matches zero or one time.
</p>
<p>For example, this pattern matches <code>type_declaration</code> nodes
that have <em>zero or more</em> <code>long</code> keywords:
</p>
<div class="example">
<pre class="example">(type_declaration &quot;long&quot;*) @long-type
</pre></div>

<p>And this pattern matches a type declaration that has zero or one
<code>long</code> keyword:
</p>
<div class="example">
<pre class="example">(type_declaration &quot;long&quot;?) @long-type
</pre></div>
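
<p>To see &lsquo;<samp>?</samp>&rsquo; at work on C code, the sketch below makes the value of a <code>return_statement</code> optional, so both <code>return;</code> and <code>return 1;</code> match. It assumes the C grammar is installed; the function name is hypothetical.
</p>
<div class="example">
<pre class="example">(require 'treesit)

(defun my-treesit-return-statements (code)
  "Return (CAPTURE-NAME . TEXT) pairs for return statements in CODE.
CODE is C source.  Hypothetical example; assumes the tree-sitter
grammar for C is installed."
  (with-temp-buffer
    (insert code)
    (mapcar (lambda (capture)
              (cons (car capture) (treesit-node-text (cdr capture) t)))
            ;; `?' makes the returned value optional, so a bare
            ;; "return;" also matches, just without a @value capture.
            (treesit-query-capture
             'c "(return_statement (_)? @value) @ret"))))

;; Usage: the result contains (ret . "return;"),
;; (ret . "return 1;") and (value . "1").
;; (my-treesit-return-statements
;;  "void f (void) { return; } int g (void) { return 1; }")
</pre></div>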

<span id="Grouping"></span><h4 class="subheading">Grouping</h4>

<p>Similar to groups in regular expressions, we can bundle patterns into a
group and apply quantification operators to it. For example, to
express a comma-separated list of identifiers, one could write
</p>
<div class="example">
<pre class="example">(identifier) (&quot;,&quot; (identifier))*
</pre></div>
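
<p>A grouped sequence can also appear inside a node pattern. The sketch below captures the names declared by a C declaration such as <code>int a, b, c;</code>, where the names are separated by anonymous &lsquo;<samp>,</samp>&rsquo; nodes. It assumes the C grammar is installed and uses a hypothetical function name. Because the first identifier is not anchored, several overlapping matches may be returned, so the usage note below only describes which pairs appear.
</p>
<div class="example">
<pre class="example">(require 'treesit)

(defun my-treesit-declared-names (code)
  "Return (CAPTURE-NAME . TEXT) pairs for comma-separated names in CODE.
CODE is C source.  Hypothetical example; assumes the tree-sitter
grammar for C is installed."
  (with-temp-buffer
    (insert code)
    (mapcar (lambda (capture)
              (cons (car capture) (treesit-node-text (cdr capture) t)))
            ;; The group ("," (identifier) @rest) is repeated by `*'.
            (treesit-query-capture
             'c "(declaration
                   (identifier) @first
                   (\",\" (identifier) @rest)*)"))))

;; Usage: the result contains (first . "a"), (rest . "b")
;; and (rest . "c").
;; (my-treesit-declared-names "int a, b, c;")
</pre></div>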

<span id="Alternation"></span><h4 class="subheading">Alternation</h4>

<p>Again, similar to regular expressions, we can express &ldquo;match any one
of these patterns&rdquo; in a query pattern. The syntax is a list of
patterns enclosed in square brackets. For example, to capture some
keywords in C, the query pattern would be
</p>
<div class="example">
<pre class="example">[
  &quot;return&quot;
  &quot;break&quot;
  &quot;if&quot;
  &quot;else&quot;
] @keyword
</pre></div>
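
<p>The sketch below applies the alternation pattern above to a small C function. It assumes the C grammar is installed; the function name is hypothetical. Each keyword node is expected to be captured once, whichever alternative it matched.
</p>
<div class="example">
<pre class="example">(require 'treesit)

(defun my-treesit-keywords (code)
  "Return the keywords in C source CODE matched by an alternation.
Hypothetical example; assumes the tree-sitter grammar for C is installed."
  (with-temp-buffer
    (insert code)
    (mapcar (lambda (node) (treesit-node-text node t))
            ;; A node matching any one of the bracketed patterns is
            ;; captured as @keyword.
            (treesit-query-capture
             'c "[\"return\" \"break\" \"if\" \"else\"] @keyword"
             nil nil t))))

;; Usage: the result holds "if", "else" and two "return" strings.
;; (my-treesit-keywords "void f (int x) { if (x) return; else return; }")
</pre></div>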

<span id="Anchor"></span><h4 class="subheading">Anchor</h4>

<p>The anchor operator &lsquo;<samp>.</samp>&rsquo; can be used to enforce juxtaposition,
i.e., to require that two things be directly next to each other. The
two &ldquo;things&rdquo; can be two nodes, or a child and the beginning or end
of its parent. For example, to capture the first child, the last
child, or two adjacent children:
</p>
<div class="example">
<pre class="example">;; Anchor the child with the end of its parent.
(compound_expression (_) @last-child .)

;; Anchor the child with the beginning of its parent.
(compound_expression . (_) @first-child)

;; Anchor two adjacent children.
(compound_expression
  (_) @prev-child
  .
  (_) @next-child)
</pre></div>

<p>Note that anonymous nodes are ignored when enforcing juxtaposition.
</p>
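
<p>On C code, the same anchors can pick out the first and last arguments of a call. The parentheses and commas of an argument list are anonymous nodes, so they do not get in the way of the anchors. The sketch below assumes the C grammar is installed and uses a hypothetical function name.
</p>
<div class="example">
<pre class="example">(require 'treesit)

(defun my-treesit-first-and-last-arguments (code)
  "Return (CAPTURE-NAME . TEXT) pairs for first and last call arguments.
CODE is C source.  Hypothetical example; assumes the tree-sitter
grammar for C is installed."
  (with-temp-buffer
    (insert code)
    (mapcar (lambda (capture)
              (cons (car capture) (treesit-node-text (cdr capture) t)))
            ;; Two patterns: one anchors a child to the beginning of the
            ;; argument list, the other anchors a child to its end.
            (treesit-query-capture
             'c "(argument_list . (_) @first-arg)
                 (argument_list (_) @last-arg .)"))))

;; Usage: the result contains (first-arg . "a") and
;; (last-arg . "c"), and nothing for "b".
;; (my-treesit-first-and-last-arguments "void g (void) { f (a, b, c); }")
</pre></div>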
<span id="Predicate"></span><h4 class="subheading">Predicate</h4>

<p>We can add predicate constraints to a pattern. For example, if we use
the following query pattern
</p>
<div class="example">
<pre class="example">(
  (array . (_) @first (_) @last .)
  (#equal @first @last)
)
</pre></div>

<p>then tree-sitter only matches arrays whose first element equals their
last element. To attach a predicate to a pattern, we need to group
them together. A predicate always starts with a &lsquo;<samp>#</samp>&rsquo;.
Currently there are two predicates, <code>#equal</code> and <code>#match</code>.
</p>
<dl class="def">
<dt id="index-equal-1"><span class="category">Predicate: </span><span><strong>equal</strong> <em>arg1 arg2</em><a href='#index-equal-1' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Matches if <var>arg1</var> equals <var>arg2</var>. Arguments can be either
strings or capture names. A capture name represents the text that the
captured node spans in the buffer.
</p></dd></dl>

<dl class="def">
<dt id="index-match"><span class="category">Predicate: </span><span><strong>match</strong> <em>regexp capture-name</em><a href='#index-match' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>Matches if the text that <var>capture-name</var>&rsquo;s node spans in the buffer
matches the regular expression <var>regexp</var>. Matching is case-sensitive.
</p></dd></dl>

<p>Note that a predicate can only refer to capture names that appear in
the same pattern. Indeed, it makes little sense to refer to capture
names in other patterns anyway.
</p>
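
<p>The sketch below groups a pattern with a <code>#match</code> predicate to keep only the C identifiers whose text contains <code>tmp</code>. It assumes the C grammar is installed; the function name is hypothetical.
</p>
<div class="example">
<pre class="example">(require 'treesit)

(defun my-treesit-tmp-identifiers (code)
  "Return identifiers in C source CODE whose text matches \"tmp\".
Hypothetical example; assumes the tree-sitter grammar for C is installed."
  (with-temp-buffer
    (insert code)
    (mapcar (lambda (node) (treesit-node-text node t))
            ;; The node pattern and its predicate are grouped together.
            (treesit-query-capture
             'c "((identifier) @id (#match \"tmp\" @id))"
             nil nil t))))

;; Usage, which returns ("tmp1" "tmp1"):
;; (my-treesit-tmp-identifiers "int tmp1 = 0, count = tmp1;")
</pre></div>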
<span id="S_002dexpression-patterns"></span><h3 class="heading">S-expression patterns</h3>

<p>Besides strings, Emacs provides an s-expression based syntax for query
patterns. It largely resembles the string-based syntax. For example,
the following query
</p>
<div class="example">
<pre class="example">(treesit-query-capture
 node &quot;(addition_expression
        left: (_) @left
        \&quot;+\&quot; @plus-sign
        right: (_) @right) @addition

        [\&quot;return\&quot; \&quot;break\&quot;] @keyword&quot;)
</pre></div>

<p>is equivalent to
</p>
<div class="example">
<pre class="example">(treesit-query-capture
 node '((addition_expression
         left: (_) @left
         &quot;+&quot; @plus-sign
         right: (_) @right) @addition

        [&quot;return&quot; &quot;break&quot;] @keyword))
</pre></div>

<p>Most pattern syntax can be written directly as strange but
nevertheless valid s-expressions. Only a few constructs need
modification:
</p>
<ul>
<li> Anchor &lsquo;<samp>.</samp>&rsquo; is written as <code>:anchor</code>.
</li><li> &lsquo;<samp>?</samp>&rsquo; is written as &lsquo;<samp>:?</samp>&rsquo;.
</li><li> &lsquo;<samp>*</samp>&rsquo; is written as &lsquo;<samp>:*</samp>&rsquo;.
</li><li> &lsquo;<samp>+</samp>&rsquo; is written as &lsquo;<samp>:+</samp>&rsquo;.
</li><li> <code>#equal</code> is written as <code>:equal</code>. In general, predicates
change their &lsquo;<samp>#</samp>&rsquo; to &lsquo;<samp>:</samp>&rsquo;.
</li></ul>
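
<p>Since <code>treesit-pattern-expand</code>, described later in this section, converts s-expression syntax to string syntax, it can be used to check the substitutions listed above. This sketch needs an Emacs built with tree-sitter support but no language grammar; the function name is hypothetical.
</p>
<div class="example">
<pre class="example">(require 'treesit)

(defun my-treesit-expanded-keywords ()
  "Return the string syntax of the special s-expression pattern keywords.
Hypothetical example; needs an Emacs built with tree-sitter support."
  ;; Each keyword is expanded on its own, in the order listed above.
  (mapcar #'treesit-pattern-expand '(:anchor :? :* :+ :equal :match)))

;; Usage, which returns ("." "?" "*" "+" "#equal" "#match"):
;; (my-treesit-expanded-keywords)
</pre></div>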

<p>For example,
</p>
<div class="example">
<pre class="example">&quot;(
   (compound_expression . (_) @first (_)* @rest)
   (#match \&quot;love\&quot; @first)
 )&quot;
</pre></div>

<p>is written in s-expression syntax as
</p>
<div class="example">
<pre class="example">'((
   (compound_expression :anchor (_) @first (_) :* @rest)
   (:match &quot;love&quot; @first)
   ))
</pre></div>

<span id="Compiling-queries"></span><h3 class="heading">Compiling queries</h3>

<p>If a query will be used repeatedly, especially in tight loops, it is
important to compile that query, because a compiled query is much
faster than an uncompiled one. A compiled query can be used anywhere
a query is accepted.
</p>
<dl class="def">
<dt id="index-treesit_002dquery_002dcompile"><span class="category">Function: </span><span><strong>treesit-query-compile</strong> <em>language query</em><a href='#index-treesit_002dquery_002dcompile' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function compiles <var>query</var> for <var>language</var> into a compiled
query object and returns it.
</p>
<p>This function signals a <var>treesit-query-error</var> if <var>query</var> is
malformed. The signal data contains a description of the specific
error. You can use <code>treesit-query-validate</code> to debug the query.
</p></dd></dl>
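
<p>The sketch below compiles a query once and reuses the compiled object for several strings of C code. It assumes the C grammar is installed; the function name is hypothetical.
</p>
<div class="example">
<pre class="example">(require 'treesit)

(defun my-treesit-count-numbers (sources)
  "Return the total number of number literals in SOURCES.
SOURCES is a list of strings of C code.  The query is compiled once
and the compiled object is reused for every string.  Hypothetical
example; assumes the tree-sitter grammar for C is installed."
  (let ((query (treesit-query-compile 'c "(number_literal) @num"))
        (total 0))
    (dolist (source sources total)
      (with-temp-buffer
        (insert source)
        ;; A compiled query is accepted wherever a query is.
        (setq total (+ total (length (treesit-query-capture 'c query))))))))

;; Usage, which returns 3:
;; (my-treesit-count-numbers '("int a = 1;" "int b = 2, c = 3;"))
</pre></div>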

<dl class="def">
<dt id="index-treesit_002dquery_002dexpand"><span class="category">Function: </span><span><strong>treesit-query-expand</strong> <em>query</em><a href='#index-treesit_002dquery_002dexpand' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function expands the s-expression <var>query</var> into a string
query.
</p></dd></dl>

<dl class="def">
<dt id="index-treesit_002dpattern_002dexpand"><span class="category">Function: </span><span><strong>treesit-pattern-expand</strong> <em>pattern</em><a href='#index-treesit_002dpattern_002dexpand' class='copiable-anchor'> &para;</a></span></dt>
<dd><p>This function expands the s-expression <var>pattern</var> into a string
pattern.
</p></dd></dl>
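
<p>For instance, expanding the s-expression query from the <code>#match</code> example earlier in this section should produce the equivalent string syntax. This sketch needs an Emacs built with tree-sitter support but no language grammar; the function name is hypothetical.
</p>
<div class="example">
<pre class="example">(require 'treesit)

(defun my-treesit-expand-love-query ()
  "Expand an s-expression query into string syntax.
Hypothetical example; needs an Emacs built with tree-sitter support."
  ;; The query holds one group: a node pattern followed by a predicate.
  (treesit-query-expand
   '(((compound_expression :anchor (_) @first (_) :* @rest)
      (:match "love" @first)))))

;; Usage, which returns the string
;; "((compound_expression . (_) @first (_) * @rest) (#match \"love\" @first))"
;; (my-treesit-expand-love-query)
</pre></div>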

<p>Finally, the tree-sitter project&rsquo;s documentation on pattern
matching can be found at
<a href="https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries">https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries</a>.
</p>
</div>

</body>
</html>