path: root/admin/notes/tree-sitter/html-manual/Language-Definitions.html
author Yuan Fu 2022-10-05 14:11:33 -0700
committer Yuan Fu 2022-10-05 14:11:33 -0700
commit cb183f6467401fb5ed2b7fc98ca75be9d943cbe3 (patch)
tree ef42ea6ae71e0829d900ffb46d8306fbba962a8e /admin/notes/tree-sitter/html-manual/Language-Definitions.html
parent 1ea503ed4b3a14b3dc0a597cfbfe57d73b871422 (diff)
Add tree-sitter admin notes
starter-guide: Guide on writing major mode features.
build-module: Script for building official language definitions.
html-manual: HTML version of the manual for easy access.

* admin/notes/tree-sitter/build-module/README: New file.
* admin/notes/tree-sitter/build-module/batch.sh: New file.
* admin/notes/tree-sitter/build-module/build.sh: New file.
* admin/notes/tree-sitter/starter-guide: New file.
* admin/notes/tree-sitter/html-manual/Accessing-Node.html: New file.
* admin/notes/tree-sitter/html-manual/Language-Definitions.html: New file.
* admin/notes/tree-sitter/html-manual/Multiple-Languages.html: New file.
* admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html: New file.
* admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html: New file.
* admin/notes/tree-sitter/html-manual/Parsing-Program-Source.html: New file.
* admin/notes/tree-sitter/html-manual/Pattern-Matching.html: New file.
* admin/notes/tree-sitter/html-manual/Retrieving-Node.html: New file.
* admin/notes/tree-sitter/html-manual/Tree_002dsitter-C-API.html: New file.
* admin/notes/tree-sitter/html-manual/Using-Parser.html: New file.
* admin/notes/tree-sitter/html-manual/build-manual.sh: New file.
* admin/notes/tree-sitter/html-manual/manual.css: New file.
Diffstat (limited to 'admin/notes/tree-sitter/html-manual/Language-Definitions.html')
-rw-r--r-- admin/notes/tree-sitter/html-manual/Language-Definitions.html 326
1 files changed, 326 insertions, 0 deletions
diff --git a/admin/notes/tree-sitter/html-manual/Language-Definitions.html b/admin/notes/tree-sitter/html-manual/Language-Definitions.html
new file mode 100644
index 00000000000..ba3eeb9eeb9
--- /dev/null
+++ b/admin/notes/tree-sitter/html-manual/Language-Definitions.html
@@ -0,0 +1,326 @@
1<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
2<html>
3<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
4<head>
5<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
6<!-- This is the GNU Emacs Lisp Reference Manual
7corresponding to Emacs version 29.0.50.
8
9Copyright © 1990-1996, 1998-2022 Free Software Foundation,
10Inc.
11
12Permission is granted to copy, distribute and/or modify this document
13under the terms of the GNU Free Documentation License, Version 1.3 or
14any later version published by the Free Software Foundation; with the
15Invariant Sections being "GNU General Public License," with the
16Front-Cover Texts being "A GNU Manual," and with the Back-Cover
17Texts as in (a) below. A copy of the license is included in the
18section entitled "GNU Free Documentation License."
19
20(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
21modify this GNU manual. Buying copies from the FSF supports it in
22developing GNU and promoting software freedom." -->
23<title>Language Definitions (GNU Emacs Lisp Reference Manual)</title>
24
25<meta name="description" content="Language Definitions (GNU Emacs Lisp Reference Manual)">
26<meta name="keywords" content="Language Definitions (GNU Emacs Lisp Reference Manual)">
27<meta name="resource-type" content="document">
28<meta name="distribution" content="global">
29<meta name="Generator" content="makeinfo">
30<meta name="viewport" content="width=device-width,initial-scale=1">
31
32<link href="index.html" rel="start" title="Top">
33<link href="Index.html" rel="index" title="Index">
34<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
35<link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source">
36<link href="Using-Parser.html" rel="next" title="Using Parser">
37<style type="text/css">
38<!--
39a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em}
40a.summary-letter {text-decoration: none}
41blockquote.indentedblock {margin-right: 0em}
42div.display {margin-left: 3.2em}
43div.example {margin-left: 3.2em}
44kbd {font-style: oblique}
45pre.display {font-family: inherit}
46pre.format {font-family: inherit}
47pre.menu-comment {font-family: serif}
48pre.menu-preformatted {font-family: serif}
49span.nolinebreak {white-space: nowrap}
50span.roman {font-family: initial; font-weight: normal}
51span.sansserif {font-family: sans-serif; font-weight: normal}
52span:hover a.copiable-anchor {visibility: visible}
53ul.no-bullet {list-style: none}
54-->
55</style>
56<link rel="stylesheet" type="text/css" href="./manual.css">
57
58
59</head>
60
61<body lang="en">
62<div class="section" id="Language-Definitions">
63<div class="header">
64<p>
65Next: <a href="Using-Parser.html" accesskey="n" rel="next">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
66</div>
67<hr>
68<span id="Tree_002dsitter-Language-Definitions"></span><h3 class="section">37.1 Tree-sitter Language Definitions</h3>
69
70<span id="Loading-a-language-definition"></span><h3 class="heading">Loading a language definition</h3>
71
72<p>Tree-sitter relies on language definitions to parse text in a given
73language. In Emacs, a language definition is represented by a symbol.
74For example, the C language definition is represented as <code>c</code>, and
75<code>c</code> can be passed to tree-sitter functions as the <var>language</var>
76argument.
77</p>
78<span id="index-treesit_002dextra_002dload_002dpath"></span>
79<span id="index-treesit_002dload_002dlanguage_002derror"></span>
80<span id="index-treesit_002dload_002dsuffixes"></span>
81<p>Tree-sitter language definitions are distributed as dynamic libraries.
82In order to use a language definition in Emacs, you need to make sure
83that the dynamic library is installed on the system. Emacs looks for
84language definitions under load paths in
85<code>treesit-extra-load-path</code>, <code>user-emacs-directory</code>/tree-sitter,
86and system default locations for dynamic libraries, in that order.
87Emacs tries each extension in <code>treesit-load-suffixes</code>. If Emacs
88cannot find the library or has a problem loading it, Emacs signals
89<code>treesit-load-language-error</code>. The signal data is a list of
90specific error messages.
91</p>
92<dl class="def">
93<dt id="index-treesit_002dlanguage_002davailable_002dp"><span class="category">Function: </span><span><strong>treesit-language-available-p</strong> <em>language</em><a href='#index-treesit_002dlanguage_002davailable_002dp' class='copiable-anchor'> &para;</a></span></dt>
94<dd><p>This function checks whether the dynamic library for <var>language</var> is
95present on the system, and returns non-nil if it is.
96</p></dd></dl>
97
98<span id="index-treesit_002dload_002dname_002doverride_002dlist"></span>
99<p>By convention, the dynamic library for <var>language</var> is
100<code>libtree-sitter-<var>language</var>.<var>ext</var></code>, where <var>ext</var> is the
101system-specific extension for dynamic libraries. Also by convention,
102the function provided by that library is named
103<code>tree_sitter_<var>language</var></code>. If a language definition doesn&rsquo;t
104follow this convention, you should add an entry
105</p>
106<div class="example">
107<pre class="example">(<var>language</var> <var>library-base-name</var> <var>function-name</var>)
108</pre></div>
109
110<p>to <code>treesit-load-name-override-list</code>, where
111<var>library-base-name</var> is the base filename for the dynamic library
112(conventionally <code>libtree-sitter-<var>language</var></code>), and
113<var>function-name</var> is the function provided by the library
114(conventionally <code>tree_sitter_<var>language</var></code>). For example,
115</p>
116<div class="example">
117<pre class="example">(cool-lang &quot;libtree-sitter-coool&quot; &quot;tree_sitter_cooool&quot;)
118</pre></div>
119
120<p>for a language too cool to abide by conventions.
121</p>
122<dl class="def">
123<dt id="index-treesit_002dlanguage_002dversion"><span class="category">Function: </span><span><strong>treesit-language-version</strong> <em>&amp;optional min-compatible</em><a href='#index-treesit_002dlanguage_002dversion' class='copiable-anchor'> &para;</a></span></dt>
124<dd><p>The tree-sitter library has a <em>language version</em>; a language
125definition&rsquo;s version needs to match this version to be compatible.
126</p>
127<p>This function returns the tree-sitter library’s language version. If
128<var>min-compatible</var> is non-nil, it returns the minimal compatible
129version.
130</p></dd></dl>
131
132<span id="Concrete-syntax-tree"></span><h3 class="heading">Concrete syntax tree</h3>
133
134<p>A syntax tree is what a parser generates. In a syntax tree, each node
135represents a piece of text, and nodes are connected to one another by
136parent-child relationships. For example, if the source text is
137</p>
138<div class="example">
139<pre class="example">1 + 2
140</pre></div>
141
142<p>its syntax tree could be
143</p>
144<div class="example">
145<pre class="example"> +--------------+
146 | root &quot;1 + 2&quot; |
147 +--------------+
148 |
149 +--------------------------------+
150 | expression &quot;1 + 2&quot; |
151 +--------------------------------+
152 | | |
153+------------+ +--------------+ +------------+
154| number &quot;1&quot; | | operator &quot;+&quot; | | number &quot;2&quot; |
155+------------+ +--------------+ +------------+
156</pre></div>
157
158<p>We can also represent it as an s-expression:
159</p>
160<div class="example">
161<pre class="example">(root (expression (number) (operator) (number)))
162</pre></div>
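<p>As a minimal sketch, the tree above can be modeled as nested JavaScript
objects and printed as an s-expression. The object layout (<code>type</code>,
<code>text</code>, <code>children</code>) and the helper <code>toSexp</code> are
hypothetical illustrations, not the node API of tree-sitter or Emacs.
</p>

```javascript
// Hypothetical model of the syntax tree for "1 + 2".
// This object shape is an assumption made for illustration only.
const tree = {
  type: "root",
  text: "1 + 2",
  children: [
    {
      type: "expression",
      text: "1 + 2",
      children: [
        { type: "number", text: "1", children: [] },
        { type: "operator", text: "+", children: [] },
        { type: "number", text: "2", children: [] },
      ],
    },
  ],
};

// Render a node as "(type child child ...)", recursing into its children.
function toSexp(node) {
  const kids = node.children.map((child) => toSexp(child));
  return "(" + [node.type, ...kids].join(" ") + ")";
}

console.log(toSexp(tree));
```

<p>The printed form keeps only the node types and their nesting, which is
why the s-expression is shorter than the box diagram.
</p>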
163
164<span id="Node-types"></span><h4 class="subheading">Node types</h4>
165
166<span id="index-tree_002dsitter-node-type"></span>
167<span id="tree_002dsitter-node-type"></span><span id="index-tree_002dsitter-named-node"></span>
168<span id="tree_002dsitter-named-node"></span><span id="index-tree_002dsitter-anonymous-node"></span>
169<p>Names like <code>root</code>, <code>expression</code>, <code>number</code>, and
170<code>operator</code> are the nodes&rsquo; <em>types</em>. However, not all nodes in a
171syntax tree have a type. Nodes that don&rsquo;t are <em>anonymous nodes</em>,
172and nodes with a type are <em>named nodes</em>. Anonymous nodes are
173tokens with fixed spellings, including punctuation characters like
174the bracket &lsquo;<samp>]</samp>&rsquo;, and keywords like <code>return</code>.
175</p>
176<span id="Field-names"></span><h4 class="subheading">Field names</h4>
177
178<span id="index-tree_002dsitter-node-field-name"></span>
179<span id="tree_002dsitter-node-field-name"></span><p>To make the syntax tree easier to
180analyze, many language definitions assign <em>field names</em> to child
181nodes. For example, a <code>function_definition</code> node could have a
182<code>declarator</code> and a <code>body</code>:
183</p>
184<div class="example">
185<pre class="example">(function_definition
186 declarator: (declaration)
187 body: (compound_statement))
188</pre></div>
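<p>The following sketch suggests why field names make analysis easier: a
child can be found by its role rather than by its position. The node
layout and the helper <code>childByFieldName</code> are hypothetical names
chosen for this illustration; they are not part of tree-sitter or Emacs.
</p>

```javascript
// Hypothetical node model: each child entry may carry a field name.
const functionDefinition = {
  type: "function_definition",
  children: [
    { field: "declarator", node: { type: "declaration", children: [] } },
    { field: "body", node: { type: "compound_statement", children: [] } },
  ],
};

// Look up a child by its field name instead of by its index.
// Returns null when no child has the requested field name.
function childByFieldName(node, name) {
  const entry = node.children.find((child) => child.field === name);
  return entry ? entry.node : null;
}

console.log(childByFieldName(functionDefinition, "body").type);
```

<p>Code written this way keeps working if the grammar later adds or
reorders unnamed children, as long as the field names stay the same.
</p>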
189
190<dl class="def">
191<dt id="index-treesit_002dinspect_002dmode"><span class="category">Command: </span><span><strong>treesit-inspect-mode</strong><a href='#index-treesit_002dinspect_002dmode' class='copiable-anchor'> &para;</a></span></dt>
192<dd><p>This minor mode displays the node that <em>starts</em> at point in
193the mode-line. The mode-line will display
194</p>
195<div class="example">
196<pre class="example"><var>parent</var> <var>field-name</var>: (<var>child</var> (<var>grand-child</var> (...)))
197</pre></div>
198
199<p><var>child</var>, <var>grand-child</var>, <var>grand-grand-child</var>, etc., are
200nodes that have their beginning at point. And <var>parent</var> is the
201parent of <var>child</var>.
202</p>
203<p>If there is no node that starts at point, i.e., point is in the middle
204of a node, then the mode-line only displays the smallest node that
205spans point, and its immediate parent.
206</p>
207<p>This minor mode doesn&rsquo;t create parsers on its own. It simply uses the
208first parser in <code>(treesit-parser-list)</code> (see <a href="Using-Parser.html">Using Tree-sitter Parser</a>).
209</p></dd></dl>
210
211<span id="Reading-the-grammar-definition"></span><h3 class="heading">Reading the grammar definition</h3>
212
213<p>Authors of language definitions define the <em>grammar</em> of a
214language, and this grammar determines how a parser constructs a
215concrete syntax tree out of the text. In order to use the syntax
216tree effectively, we need to read the <em>grammar file</em>.
217</p>
218<p>The grammar file is usually <code>grammar.js</code> in a language
219definition’s project repository. The link to a language definition’s
220home page can be found on tree-sitter’s homepage
221(<a href="https://tree-sitter.github.io/tree-sitter">https://tree-sitter.github.io/tree-sitter</a>).
222</p>
223<p>The grammar is written in JavaScript syntax. For example, the rule
224matching a <code>function_definition</code> node looks like
225</p>
226<div class="example">
227<pre class="example">function_definition: $ =&gt; seq(
228 $.declaration_specifiers,
229 field('declarator', $.declaration),
230 field('body', $.compound_statement)
231)
232</pre></div>
233
234<p>The rule is represented by a function that takes a single argument
235<var>$</var>, representing the whole grammar. The function body is
236built from other functions: the <code>seq</code> function puts together a
237sequence of children; the <code>field</code> function annotates a child with
238a field name. If we write the above definition in BNF syntax, it
239would look like
240</p>
241<div class="example">
242<pre class="example">function_definition :=
243 &lt;declaration_specifiers&gt; &lt;declaration&gt; &lt;compound_statement&gt;
244</pre></div>
245
246<p>and the node returned by the parser would look like
247</p>
248<div class="example">
249<pre class="example">(function_definition
250 (declaration_specifiers)
251 declarator: (declaration)
252 body: (compound_statement))
253</pre></div>
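<p>To make the role of <var>$</var>, <code>seq</code> and <code>field</code> more
concrete, here is a minimal sketch that evaluates the rule above outside
of the tree-sitter tooling. The definitions of <code>seq</code>,
<code>field</code> and <code>$</code> below are simplified stand-ins written for
this example; the real ones are supplied by tree-sitter when it
processes <code>grammar.js</code>, and their exact return values are an
assumption here.
</p>

```javascript
// Stand-ins for the grammar DSL (assumptions for illustration only).
const seq = (...members) => ({ type: "SEQ", members });
const field = (name, content) => ({ type: "FIELD", name, content });

// `$` maps any property name to a reference to the rule of that name.
const $ = new Proxy({}, {
  get: (_target, name) => ({ type: "SYMBOL", name }),
});

const rules = {
  function_definition: ($) => seq(
    $.declaration_specifiers,
    field("declarator", $.declaration),
    field("body", $.compound_statement)
  ),
};

// Calling the rule function with `$` yields a plain description of the rule.
const rule = rules.function_definition($);

// Collect the children that carry a field name.
const fields = rule.members
  .filter((member) => member.type === "FIELD")
  .map((member) => `${member.name}: (${member.content.name})`);

console.log(fields.join(" "));
```

<p>The point of the sketch is only that a rule is ordinary data built by
function calls, so the field names seen in a parsed node can be read
directly off the rule definition.
</p>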
254
255<p>Below is a list of functions that one will see in a grammar
256definition. Each function takes other rules as arguments and returns
257a new rule.
258</p>
259<ul>
260<li> <code>seq(rule1, rule2, ...)</code> matches each rule one after another.
261
262</li><li> <code>choice(rule1, rule2, ...)</code> matches one of the rules in its
263arguments.
264
265</li><li> <code>repeat(rule)</code> matches <var>rule</var> <em>zero or more</em> times.
266This is like the &lsquo;<samp>*</samp>&rsquo; operator in regular expressions.
267
268</li><li> <code>repeat1(rule)</code> matches <var>rule</var> <em>one or more</em> times.
269This is like the &lsquo;<samp>+</samp>&rsquo; operator in regular expressions.
270
271</li><li> <code>optional(rule)</code> matches <var>rule</var> <em>zero or one</em> time.
272This is like the &lsquo;<samp>?</samp>&rsquo; operator in regular expressions.
273
274</li><li> <code>field(name, rule)</code> assigns field name <var>name</var> to the child
275node matched by <var>rule</var>.
276
277</li><li> <code>alias(rule, alias)</code> makes nodes matched by <var>rule</var> appear as
278<var>alias</var> in the syntax tree generated by the parser. For example,
279
280<div class="example">
281<pre class="example">alias($.preprocessor_call_exp, $.call_expression)
282</pre></div>
283
284<p>makes any node matched by <code>preprocessor_call_exp</code> appear as
285<code>call_expression</code>.
286</p></li></ul>
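<p>The matching behavior described in the list above can be sketched with
a few toy combinators over an array of tokens. This is not how
tree-sitter works internally (it generates a parser from the grammar);
the helper <code>tok</code>, the token names and the greedy,
non-backtracking matching strategy are all assumptions made to keep the
example short.
</p>

```javascript
// Toy combinators (hypothetical, not tree-sitter's implementation).
// A rule is a function (tokens, pos) => next position, or -1 on failure.
const tok = (t) => (tokens, pos) => (tokens[pos] === t ? pos + 1 : -1);

// seq: match each rule one after another.
const seq = (...rules) => (tokens, pos) => {
  for (const rule of rules) {
    pos = rule(tokens, pos);
    if (pos === -1) return -1;
  }
  return pos;
};

// choice: match the first rule that succeeds.
const choice = (...rules) => (tokens, pos) => {
  for (const rule of rules) {
    const next = rule(tokens, pos);
    if (next !== -1) return next;
  }
  return -1;
};

// repeat: match zero or more times, like `*` in regular expressions.
const repeat = (rule) => (tokens, pos) => {
  for (;;) {
    const next = rule(tokens, pos);
    if (next === -1 || next === pos) return pos;
    pos = next;
  }
};

// repeat1: match one or more times, like `+`.
const repeat1 = (rule) => seq(rule, repeat(rule));

// optional: match zero or one time, like `?`.
const optional = (rule) => (tokens, pos) => {
  const next = rule(tokens, pos);
  return next === -1 ? pos : next;
};

// num (("+" | "-") num)* ";"?
const sum = seq(
  tok("num"),
  repeat(seq(choice(tok("+"), tok("-")), tok("num"))),
  optional(tok(";"))
);

// A rule matches when it consumes every token.
const matches = (rule, tokens) => rule(tokens, 0) === tokens.length;

console.log(matches(sum, ["num", "+", "num", "-", "num", ";"]));
console.log(matches(sum, ["num", "+"]));
```

<p>The first call succeeds because every token is consumed; the second
fails because the trailing <code>+</code> has no operand after it.
</p>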
287
288<p>Below are grammar functions that are of less interest to a reader of a
289language definition.
290</p>
291<ul>
292<li> <code>token(rule)</code> marks <var>rule</var> to produce a single leaf node.
293That is, instead of generating a parent node with individual child
294nodes under it, everything is combined into a single leaf node.
295
296</li><li> Normally, grammar rules ignore preceding whitespace;
297<code>token.immediate(rule)</code> changes <var>rule</var> to match only when
298there is no preceding whitespace.
299
300</li><li> <code>prec(n, rule)</code> gives <var>rule</var> precedence level <var>n</var>.
301
302</li><li> <code>prec.left([n,] rule)</code> marks <var>rule</var> as left-associative,
303optionally with level <var>n</var>.
304
305</li><li> <code>prec.right([n,] rule)</code> marks <var>rule</var> as right-associative,
306optionally with level <var>n</var>.
307
308</li><li> <code>prec.dynamic(n, rule)</code> is like <code>prec</code>, but the precedence
309is applied at runtime instead.
310</li></ul>
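<p>As a rough illustration of what associativity changes, the sketch below
groups the operands of <code>1 - 2 - 3</code> in the two possible ways. The
helpers <code>groupLeft</code> and <code>groupRight</code> are hypothetical and
only mimic the shape of the resulting tree; they do not call
<code>prec.left</code> or <code>prec.right</code>.
</p>

```javascript
// Group operands the way a left-associative rule would nest them.
const groupLeft = (operands, op) =>
  operands.reduce((acc, x) => `(${acc} ${op} ${x})`);

// Group operands the way a right-associative rule would nest them.
const groupRight = (operands, op) =>
  operands.reduceRight((acc, x) => `(${x} ${op} ${acc})`);

console.log(groupLeft(["1", "2", "3"], "-"));  // ((1 - 2) - 3)
console.log(groupRight(["1", "2", "3"], "-")); // (1 - (2 - 3))
```

<p>For subtraction the two groupings denote different values, which is
why a grammar author has to pick one for ambiguous rules.
</p>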
311
312<p>The tree-sitter project discusses writing a grammar in more detail:
313<a href="https://tree-sitter.github.io/tree-sitter/creating-parsers">https://tree-sitter.github.io/tree-sitter/creating-parsers</a>.
314See especially the &ldquo;The Grammar DSL&rdquo; section.
315</p>
316</div>
317<hr>
318<div class="header">
319<p>
320Next: <a href="Using-Parser.html">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p>
321</div>
322
323
324
325</body>
326</html>