path: root/admin/notes/tree-sitter/html-manual/Pattern-Matching.html
author: Eli Zaretskii, 2023-03-19 08:09:33 +0200
committer: Eli Zaretskii, 2023-03-19 08:09:33 +0200
commit: 0bebd0e5f09b6fbed2e54f9b8464e93bdd6ad11e
tree: 41d19f431cd3e1e293d9f9a8f829e84ad100fa63 /admin/notes/tree-sitter/html-manual/Pattern-Matching.html
parent: 6674c362ad94373dacd22b7fd426406539e8d957
; Remove 'build-module' and 'html-manual' directories from 'admin'
These files were temporarily in the repository and are no longer needed, once they fulfilled their job.
Diffstat (limited to 'admin/notes/tree-sitter/html-manual/Pattern-Matching.html')
-rw-r--r-- admin/notes/tree-sitter/html-manual/Pattern-Matching.html | 450
1 files changed, 0 insertions, 450 deletions
diff --git a/admin/notes/tree-sitter/html-manual/Pattern-Matching.html b/admin/notes/tree-sitter/html-manual/Pattern-Matching.html
deleted file mode 100644
index 9ef536b79dd..00000000000
--- a/admin/notes/tree-sitter/html-manual/Pattern-Matching.html
+++ /dev/null
@@ -1,450 +0,0 @@
1<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
2<html>
3<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
4<head>
5<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
6<!-- This is the GNU Emacs Lisp Reference Manual
7corresponding to Emacs version 29.0.50.
8
9Copyright © 1990-1996, 1998-2023 Free Software Foundation, Inc.
10
11Permission is granted to copy, distribute and/or modify this document
12under the terms of the GNU Free Documentation License, Version 1.3 or
13any later version published by the Free Software Foundation; with the
14Invariant Sections being "GNU General Public License," with the
15Front-Cover Texts being "A GNU Manual," and with the Back-Cover
16Texts as in (a) below. A copy of the license is included in the
17section entitled "GNU Free Documentation License."
18
19(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
20modify this GNU manual. Buying copies from the FSF supports it in
21developing GNU and promoting software freedom." -->
22<title>Pattern Matching (GNU Emacs Lisp Reference Manual)</title>
23
24<meta name="description" content="Pattern Matching (GNU Emacs Lisp Reference Manual)">
25<meta name="keywords" content="Pattern Matching (GNU Emacs Lisp Reference Manual)">
26<meta name="resource-type" content="document">
27<meta name="distribution" content="global">
28<meta name="Generator" content="makeinfo">
29<meta name="viewport" content="width=device-width,initial-scale=1">
30
31<link href="index.html" rel="start" title="Top">
32<link href="Index.html" rel="index" title="Index">
33<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
34<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
35<link href="Multiple-Languages.html" rel="next" title="Multiple Languages">
36<link href="Accessing-Node-Information.html" rel="prev" title="Accessing Node Information">
37<style type="text/css">
38<!--
39a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
40a.summary-letter {text-decoration: none}
41blockquote.indentedblock {margin-right: 0em}
42div.display {margin-left: 3.2em}
43div.example {margin-left: 3.2em}
44kbd {font-style: oblique}
45pre.display {font-family: inherit}
46pre.format {font-family: inherit}
47pre.menu-comment {font-family: serif}
48pre.menu-preformatted {font-family: serif}
49span.nolinebreak {white-space: nowrap}
50span.roman {font-family: initial; font-weight: normal}
51span.sansserif {font-family: sans-serif; font-weight: normal}
52span:hover a.copiable-anchor {visibility: visible}
53ul.no-bullet {list-style: none}
54-->
55</style>
56<link rel="stylesheet" type="text/css" href="./manual.css">
57
58
59</head>
60
61<body lang="en">
62<div class="section" id="Pattern-Matching">
63<div class="header">
64<p>
65Next: <a href="Multiple-Languages.html" accesskey="n" rel="next">Parsing Text in Multiple Languages</a>, Previous: <a href="Accessing-Node-Information.html" accesskey="p" rel="prev">Accessing Node Information</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
66</div>
67<hr>
68<span id="Pattern-Matching-Tree_002dsitter-Nodes"></span><h3 class="section">37.5 Pattern Matching Tree-sitter Nodes</h3>
69<span id="index-pattern-matching-with-tree_002dsitter-nodes"></span>
70
71<span id="index-capturing_002c-tree_002dsitter-node"></span>
72<p>Tree-sitter lets Lisp programs match patterns using a small
73declarative language. This pattern matching consists of two steps:
74first tree-sitter matches a <em>pattern</em> against nodes in the syntax
75tree, then it <em>captures</em> specific nodes that matched the pattern
76and returns the captured nodes.
77</p>
78<p>We describe first how to write the most basic query pattern and how to
79capture nodes in a pattern, then the pattern-matching function, and
80finally the more advanced pattern syntax.
81</p>
82<span id="Basic-query-syntax"></span><h3 class="heading">Basic query syntax</h3>
83
84<span id="index-tree_002dsitter-query-pattern-syntax"></span>
85<span id="index-pattern-syntax_002c-tree_002dsitter-query"></span>
86<span id="index-query_002c-tree_002dsitter"></span>
87<p>A <em>query</em> consists of one or more <em>patterns</em>. Each pattern is an
88s-expression that matches a certain node in the syntax tree. A
89pattern has the form <code>(<var>type</var>&nbsp;(<var>child</var>&hellip;))</code><!-- /@w -->.
90</p>
91<p>For example, a pattern that matches a <code>binary_expression</code> node that
92contains <code>number_literal</code> child nodes would look like
93</p>
94<div class="example">
95<pre class="example">(binary_expression (number_literal))
96</pre></div>
97
98<p>To <em>capture</em> a node using the query pattern above, append
99<code>@<var>capture-name</var></code> after the node pattern you want to
100capture. For example,
101</p>
102<div class="example">
103<pre class="example">(binary_expression (number_literal) @number-in-exp)
104</pre></div>
105
106<p>captures <code>number_literal</code> nodes that are inside a
107<code>binary_expression</code> node with the capture name
108<code>number-in-exp</code>.
109</p>
110<p>We can capture the <code>binary_expression</code> node as well, with, for
111example, the capture name <code>biexp</code>:
112</p>
113<div class="example">
114<pre class="example">(binary_expression
115 (number_literal) @number-in-exp) @biexp
116</pre></div>
117
118<span id="Query-function"></span><h3 class="heading">Query function</h3>
119
120<span id="index-query-functions_002c-tree_002dsitter"></span>
121<p>Now we can introduce the <em>query functions</em>.
122</p>
123<dl class="def">
124<dt id="index-treesit_002dquery_002dcapture"><span class="category">Function: </span><span><strong>treesit-query-capture</strong> <em>node query &amp;optional beg end node-only</em><a href='#index-treesit_002dquery_002dcapture' class='copiable-anchor'> &para;</a></span></dt>
125<dd><p>This function matches patterns in <var>query</var> within <var>node</var>.
126The argument <var>query</var> can be a string, an s-expression, or a
127compiled query object. For now, we focus on the string syntax;
128s-expression syntax and compiled query are described at the end of the
129section.
130</p>
131<p>The argument <var>node</var> can also be a parser or a language symbol. If
132it is a parser, its root node is used; if it is a language symbol, the
133function finds or creates a parser for that language in the current
134buffer and uses that parser&rsquo;s root node.
135</p>
136<p>The function returns all the captured nodes in a list whose elements have the form
137<code>(<var><span class="nolinebreak">capture_name</span></var>&nbsp;.&nbsp;<var>node</var>)</code><!-- /@w -->. If <var>node-only</var> is
138non-<code>nil</code>, it returns the list of nodes instead. By default the
139entire text of <var>node</var> is searched, but if <var>beg</var> and <var>end</var>
140are both non-<code>nil</code>, they specify the region of buffer text where
141this function should match nodes. Any matching node whose span
142overlaps with the region between <var>beg</var> and <var>end</var> is captured;
143it doesn&rsquo;t have to be completely within the region.
144</p>
145<span id="index-treesit_002dquery_002derror"></span>
146<span id="index-treesit_002dquery_002dvalidate"></span>
147<p>This function raises the <code>treesit-query-error</code> error if
148<var>query</var> is malformed. The signal data contains a description of
149the specific error. You can use <code>treesit-query-validate</code> to
150validate and debug the query.
151</p></dd></dl>
152
153<p>For example, suppose <var>node</var>&rsquo;s text is <code>1 + 2</code>, and
154<var>query</var> is
155</p>
156<div class="example">
157<pre class="example">(setq query
158 &quot;(binary_expression
159 (number_literal) @number-in-exp) @biexp&quot;)
160</pre></div>
161
162<p>Matching that query would return
163</p>
164<div class="example">
165<pre class="example">(treesit-query-capture node query)
166 &rArr; ((biexp . <var>&lt;node for &quot;1 + 2&quot;&gt;</var>)
167 (number-in-exp . <var>&lt;node for &quot;1&quot;&gt;</var>)
168 (number-in-exp . <var>&lt;node for &quot;2&quot;&gt;</var>))
169</pre></div>
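<p>A minimal sketch of the optional arguments, assuming an Emacs built
with tree-sitter support, an installed C grammar, and a current buffer
that holds C source. It passes the language symbol <code>c</code> as
<var>node</var>, restricts matching to the current line, and asks for
bare nodes:
</p>
<div class="example">
<pre class="example">;; Assumption: the current buffer contains C code and the C grammar
;; is available; otherwise this form simply returns nil.
(when (treesit-language-available-p 'c)
  (treesit-query-capture
   'c                          ; language symbol: use the C parser's root node
   &quot;(binary_expression (number_literal) @number-in-exp)&quot;
   (line-beginning-position)   ; BEG
   (line-end-position)         ; END
   t))                         ; NODE-ONLY: return nodes, not (name . node) pairs
</pre></div>

<p>Because only overlap with the region is required, a
<code>binary_expression</code> that starts on the current line and
continues on the next one can still contribute captures.
</p>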
170
171<p>As mentioned earlier, <var>query</var> could contain multiple patterns.
172For example, it could have two top-level patterns:
173</p>
174<div class="example">
175<pre class="example">(setq query
176 &quot;(binary_expression) @biexp
177 (number_literal) @number @biexp&quot;)
178</pre></div>
179
180<dl class="def">
181<dt id="index-treesit_002dquery_002dstring"><span class="category">Function: </span><span><strong>treesit-query-string</strong> <em>string query language</em><a href='#index-treesit_002dquery_002dstring' class='copiable-anchor'> &para;</a></span></dt>
182<dd><p>This function parses <var>string</var> with <var>language</var>, matches its
183root node with <var>query</var>, and returns the result.
184</p></dd></dl>
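<p>As an illustration, and assuming a C grammar is installed, the
following sketch runs the earlier query against a short string rather
than a buffer. The sample source text is made up for this example:
</p>
<div class="example">
<pre class="example">;; Assumption: the C grammar is available.  The result should be a
;; list of (capture-name . node) pairs, as for `treesit-query-capture'.
(when (treesit-language-available-p 'c)
  (treesit-query-string
   &quot;int x = 1 + 2;&quot;
   &quot;(binary_expression (number_literal) @number-in-exp)&quot;
   'c))
</pre></div>

<p>This is convenient for trying out a query interactively before
using it on real buffers.
</p>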
185
186<span id="More-query-syntax"></span><h3 class="heading">More query syntax</h3>
187
188<p>Besides node type and capture, tree-sitter&rsquo;s pattern syntax can
189express anonymous node, field name, wildcard, quantification,
190grouping, alternation, anchor, and predicate.
191</p>
192<span id="Anonymous-node"></span><h4 class="subheading">Anonymous node</h4>
193
194<p>An anonymous node is written verbatim, surrounded by quotes. A
195pattern matching (and capturing) keyword <code>return</code> would be
196</p>
197<div class="example">
198<pre class="example">&quot;return&quot; @keyword
199</pre></div>
200
201<span id="Wild-card"></span><h4 class="subheading">Wild card</h4>
202
203<p>In a pattern, &lsquo;<samp>(_)</samp>&rsquo; matches any named node, and &lsquo;<samp>_</samp>&rsquo; matches
204any node, named or anonymous. For example, to capture any named child
205of a <code>binary_expression</code> node, the pattern would be
206</p>
207<div class="example">
208<pre class="example">(binary_expression (_) @in_biexp)
209</pre></div>
210
211<span id="Field-name"></span><h4 class="subheading">Field name</h4>
212
213<p>It is possible to capture child nodes that have specific field names.
214In the pattern below, <code>declarator</code> and <code>body</code> are field
215names, indicated by the colon following them.
216</p>
217<div class="example">
218<pre class="example">(function_definition
219 declarator: (_) @func-declarator
220 body: (_) @func-body)
221</pre></div>
222
223<p>It is also possible to capture a node that doesn&rsquo;t have a certain
224field, say, a <code>function_definition</code> without a <code>body</code> field.
225</p>
226<div class="example">
227<pre class="example">(function_definition !body) @func-no-body
228</pre></div>
229
230<span id="Quantify-node"></span><h4 class="subheading">Quantify node</h4>
231
232<span id="index-quantify-node_002c-tree_002dsitter"></span>
233<p>Tree-sitter recognizes quantification operators &lsquo;<samp>*</samp>&rsquo;, &lsquo;<samp>+</samp>&rsquo; and
234&lsquo;<samp>?</samp>&rsquo;. Their meanings are the same as in regular expressions:
235&lsquo;<samp>*</samp>&rsquo; matches the preceding pattern zero or more times, &lsquo;<samp>+</samp>&rsquo;
236matches one or more times, and &lsquo;<samp>?</samp>&rsquo; matches zero or one time.
237</p>
238<p>For example, the following pattern matches <code>type_declaration</code>
239nodes that have <em>zero or more</em> <code>long</code> keywords.
240</p>
241<div class="example">
242<pre class="example">(type_declaration &quot;long&quot;*) @long-type
243</pre></div>
244
245<p>The following pattern matches a type declaration that has zero or one
246<code>long</code> keyword:
247</p>
248<div class="example">
249<pre class="example">(type_declaration &quot;long&quot;?) @long-type
250</pre></div>
251
252<span id="Grouping"></span><h4 class="subheading">Grouping</h4>
253
254<p>Similar to groups in regular expressions, we can bundle patterns into
255groups and apply quantification operators to them. For example, to
256express a comma separated list of identifiers, one could write
257</p>
258<div class="example">
259<pre class="example">(identifier) (&quot;,&quot; (identifier))*
260</pre></div>
261
262<span id="Alternation"></span><h4 class="subheading">Alternation</h4>
263
264<p>Again, similar to regular expressions, we can express &ldquo;match any one
265of this group of patterns&rdquo; in a pattern. The syntax is a list of
266patterns enclosed in square brackets. For example, to capture some
267keywords in C, the pattern would be
268</p>
269<div class="example">
270<pre class="example">[
271 &quot;return&quot;
272 &quot;break&quot;
273 &quot;if&quot;
274 &quot;else&quot;
275] @keyword
276</pre></div>
277
278<span id="Anchor"></span><h4 class="subheading">Anchor</h4>
279
280<p>The anchor operator &lsquo;<samp>.</samp>&rsquo; can be used to enforce juxtaposition,
281i.e., to enforce two things to be directly next to each other. The
282two &ldquo;things&rdquo; can be two nodes, or a child and the end of its parent.
283For example, to capture the first child, the last child, or two
284adjacent children:
285</p>
286<div class="example">
287<pre class="example">;; Anchor the child with the end of its parent.
288(compound_expression (_) @last-child .)
289</pre><pre class="example">
290
291</pre><pre class="example">;; Anchor the child with the beginning of its parent.
292(compound_expression . (_) @first-child)
293</pre><pre class="example">
294
295</pre><pre class="example">;; Anchor two adjacent children.
296(compound_expression
297 (_) @prev-child
298 .
299 (_) @next-child)
300</pre></div>
301
302<p>Note that the enforcement of juxtaposition ignores any anonymous
303nodes.
304</p>
305<span id="Predicate"></span><h4 class="subheading">Predicate</h4>
306
307<p>It is possible to add predicate constraints to a pattern. For
308example, with the following pattern:
309</p>
310<div class="example">
311<pre class="example">(
312 (array . (_) @first (_) @last .)
313 (#equal @first @last)
314)
315</pre></div>
316
317<p>tree-sitter only matches arrays where the first element equals
318the last element. To attach a predicate to a pattern, we need to
319group them together. A predicate always starts with a &lsquo;<samp>#</samp>&rsquo;.
320Currently there are two predicates, <code>#equal</code> and <code>#match</code>.
321</p>
322<dl class="def">
323<dt id="index-equal-1"><span class="category">Predicate: </span><span><strong>equal</strong> <em>arg1 arg2</em><a href='#index-equal-1' class='copiable-anchor'> &para;</a></span></dt>
324<dd><p>Matches if <var>arg1</var> equals <var>arg2</var>. Arguments can be either
325strings or capture names. Capture names represent the text that the
326captured node spans in the buffer.
327</p></dd></dl>
328
329<dl class="def">
330<dt id="index-match-1"><span class="category">Predicate: </span><span><strong>match</strong> <em>regexp capture-name</em><a href='#index-match-1' class='copiable-anchor'> &para;</a></span></dt>
331<dd><p>Matches if the text that <var>capture-name</var>&rsquo;s node spans in the buffer
332matches regular expression <var>regexp</var>. Matching is case-sensitive.
333</p></dd></dl>
334
335<p>Note that a predicate can only refer to capture names that appear in
336the same pattern. Indeed, it makes little sense to refer to capture
337names in other patterns.
338</p>
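<p>A sketch of a predicate in use, assuming a C grammar is installed
and the current buffer holds C source. The capture name
<code>constant</code> and the regular expression are illustrative
choices, not names defined by Emacs or tree-sitter:
</p>
<div class="example">
<pre class="example">;; Capture only identifiers written entirely in upper case.
;; The pattern and its predicate are grouped in one pair of
;; parentheses, and the predicate refers to a capture name from
;; that same pattern.
(when (treesit-language-available-p 'c)
  (treesit-query-capture
   'c
   &quot;((identifier) @constant
     (#match \&quot;^[A-Z_]+$\&quot; @constant))&quot;))
</pre></div>

<p>Since matching is case-sensitive, an identifier such as
<code>max_size</code> is not expected to be captured by this query.
</p>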
339<span id="S_002dexpression-patterns"></span><h3 class="heading">S-expression patterns</h3>
340
341<span id="index-tree_002dsitter-patterns-as-sexps"></span>
342<span id="index-patterns_002c-tree_002dsitter_002c-in-sexp-form"></span>
343<p>Besides strings, Emacs provides an s-expression-based syntax for
344tree-sitter patterns. It largely resembles the string-based syntax.
345For example, the following query
346</p>
347<div class="example">
348<pre class="example">(treesit-query-capture
349 node &quot;(addition_expression
350 left: (_) @left
351 \&quot;+\&quot; @plus-sign
352 right: (_) @right) @addition
353
354 [\&quot;return\&quot; \&quot;break\&quot;] @keyword&quot;)
355</pre></div>
356
357<p>is equivalent to
358</p>
359<div class="example">
360<pre class="example">(treesit-query-capture
361 node '((addition_expression
362 left: (_) @left
363 &quot;+&quot; @plus-sign
364 right: (_) @right) @addition
365
366 [&quot;return&quot; &quot;break&quot;] @keyword))
367</pre></div>
368
369<p>Most patterns can be written directly as strange but nevertheless
370valid s-expressions. Only a few of them need modification:
371</p>
372<ul>
373<li> Anchor &lsquo;<samp>.</samp>&rsquo; is written as <code>:anchor</code>.
374</li><li> &lsquo;<samp>?</samp>&rsquo; is written as &lsquo;<samp>:?</samp>&rsquo;.
375</li><li> &lsquo;<samp>*</samp>&rsquo; is written as &lsquo;<samp>:*</samp>&rsquo;.
376</li><li> &lsquo;<samp>+</samp>&rsquo; is written as &lsquo;<samp>:+</samp>&rsquo;.
377</li><li> <code>#equal</code> is written as <code>:equal</code>. In general, predicates
378change their &lsquo;<samp>#</samp>&rsquo; to &lsquo;<samp>:</samp>&rsquo;.
379</li></ul>
380
381<p>For example,
382</p>
383<div class="example">
384<pre class="example">&quot;(
385 (compound_expression . (_) @first (_)* @rest)
386 (#match \&quot;love\&quot; @first)
387 )&quot;
388</pre></div>
389
390<p>is written in s-expression as
391</p>
392<div class="example">
393<pre class="example">'((
394 (compound_expression :anchor (_) @first (_) :* @rest)
395 (:match &quot;love&quot; @first)
396 ))
397</pre></div>
398
399<span id="Compiling-queries"></span><h3 class="heading">Compiling queries</h3>
400
401<span id="index-compiling-tree_002dsitter-queries"></span>
402<span id="index-queries_002c-compiling"></span>
403<p>If a query is intended to be used repeatedly, especially in tight
404loops, it is important to compile that query, because a compiled query
405is much faster than an uncompiled one. A compiled query can be used
406anywhere a query is accepted.
407</p>
408<dl class="def">
409<dt id="index-treesit_002dquery_002dcompile"><span class="category">Function: </span><span><strong>treesit-query-compile</strong> <em>language query</em><a href='#index-treesit_002dquery_002dcompile' class='copiable-anchor'> &para;</a></span></dt>
410<dd><p>This function compiles <var>query</var> for <var>language</var> into a compiled
411query object and returns it.
412</p>
413<p>This function raises the <code>treesit-query-error</code> error if
414<var>query</var> is malformed. The signal data contains a description of
415the specific error. You can use <code>treesit-query-validate</code> to
416validate and debug the query.
417</p></dd></dl>
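<p>One possible way to apply this, assuming a C grammar is installed:
compile the query once, keep the compiled object in a variable, and
reuse it on every call. The names <code>my-number-query</code> and
<code>my-numbers-in-buffer</code> are hypothetical:
</p>
<div class="example">
<pre class="example">(defvar my-number-query
  ;; Compile once, when the variable is first defined.
  (when (treesit-language-available-p 'c)
    (treesit-query-compile
     'c &quot;(binary_expression (number_literal) @number-in-exp)&quot;))
  &quot;Compiled query for numbers inside binary expressions, or nil.&quot;)

(defun my-numbers-in-buffer ()
  &quot;Return the number nodes captured in the current C buffer.&quot;
  (when my-number-query
    ;; A compiled query is accepted wherever a query is expected.
    (treesit-query-capture 'c my-number-query nil nil t)))
</pre></div>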
418
419<dl class="def">
420<dt id="index-treesit_002dquery_002dlanguage"><span class="category">Function: </span><span><strong>treesit-query-language</strong> <em>query</em><a href='#index-treesit_002dquery_002dlanguage' class='copiable-anchor'> &para;</a></span></dt>
421<dd><p>This function returns the language of <var>query</var>.
422</p></dd></dl>
423
424<dl class="def">
425<dt id="index-treesit_002dquery_002dexpand"><span class="category">Function: </span><span><strong>treesit-query-expand</strong> <em>query</em><a href='#index-treesit_002dquery_002dexpand' class='copiable-anchor'> &para;</a></span></dt>
426<dd><p>This function converts the s-expression <var>query</var> into the string
427format.
428</p></dd></dl>
429
430<dl class="def">
431<dt id="index-treesit_002dpattern_002dexpand"><span class="category">Function: </span><span><strong>treesit-pattern-expand</strong> <em>pattern</em><a href='#index-treesit_002dpattern_002dexpand' class='copiable-anchor'> &para;</a></span></dt>
432<dd><p>This function converts the s-expression <var>pattern</var> into the string
433format.
434</p></dd></dl>
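<p>These two functions are handy for checking how an s-expression
query will be read. A small sketch, assuming an Emacs built with
tree-sitter support (no grammar is needed, since nothing is parsed):
</p>
<div class="example">
<pre class="example">;; Convert a whole s-expression query to its string form.
(treesit-query-expand
 '((binary_expression (number_literal) @number-in-exp) @biexp))

;; Convert a single pattern element, here the anchor keyword.
(treesit-pattern-expand :anchor)
</pre></div>

<p>Evaluating the forms and comparing the returned strings with the
string syntax described above is a quick way to debug a query written
as an s-expression.
</p>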
435
436<p>For more details, read the tree-sitter project&rsquo;s documentation about
437pattern-matching, which can be found at
438<a href="https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries">https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries</a>.
439</p>
440</div>
441<hr>
442<div class="header">
443<p>
444Next: <a href="Multiple-Languages.html">Parsing Text in Multiple Languages</a>, Previous: <a href="Accessing-Node-Information.html">Accessing Node Information</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
445</div>
446
447
448
449</body>
450</html>