| author | Yuan Fu | 2022-10-05 14:11:33 -0700 |
|---|---|---|
| committer | Yuan Fu | 2022-10-05 14:11:33 -0700 |
| commit | cb183f6467401fb5ed2b7fc98ca75be9d943cbe3 (patch) | |
| tree | ef42ea6ae71e0829d900ffb46d8306fbba962a8e /admin/notes/tree-sitter/html-manual/Language-Definitions.html | |
| parent | 1ea503ed4b3a14b3dc0a597cfbfe57d73b871422 (diff) | |
Add tree-sitter admin notes
starter-guide: Guide on writing major mode features.
build-module: Script for building official language definitions.
html-manual: HTML version of the manual for easy access.
* admin/notes/tree-sitter/build-module/README: New file.
* admin/notes/tree-sitter/build-module/batch.sh: New file.
* admin/notes/tree-sitter/build-module/build.sh: New file.
* admin/notes/tree-sitter/starter-guide: New file.
* admin/notes/tree-sitter/html-manual/Accessing-Node.html: New file.
* admin/notes/tree-sitter/html-manual/Language-Definitions.html: New file.
* admin/notes/tree-sitter/html-manual/Multiple-Languages.html: New file.
* admin/notes/tree-sitter/html-manual/Parser_002dbased-Font-Lock.html:
New file.
* admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html:
New file.
* admin/notes/tree-sitter/html-manual/Parsing-Program-Source.html: New
file.
* admin/notes/tree-sitter/html-manual/Pattern-Matching.html: New file.
* admin/notes/tree-sitter/html-manual/Retrieving-Node.html: New file.
* admin/notes/tree-sitter/html-manual/Tree_002dsitter-C-API.html: New
file.
* admin/notes/tree-sitter/html-manual/Using-Parser.html: New file.
* admin/notes/tree-sitter/html-manual/build-manual.sh: New file.
* admin/notes/tree-sitter/html-manual/manual.css: New file.
Diffstat (limited to 'admin/notes/tree-sitter/html-manual/Language-Definitions.html')
| -rw-r--r-- | admin/notes/tree-sitter/html-manual/Language-Definitions.html | 326 |
1 files changed, 326 insertions, 0 deletions
diff --git a/admin/notes/tree-sitter/html-manual/Language-Definitions.html b/admin/notes/tree-sitter/html-manual/Language-Definitions.html new file mode 100644 index 00000000000..ba3eeb9eeb9 --- /dev/null +++ b/admin/notes/tree-sitter/html-manual/Language-Definitions.html | |||
| @@ -0,0 +1,326 @@ | |||
| 1 | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> | ||
| 2 | <html> | ||
| 3 | <!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ --> | ||
| 4 | <head> | ||
| 5 | <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | ||
| 6 | <!-- This is the GNU Emacs Lisp Reference Manual | ||
| 7 | corresponding to Emacs version 29.0.50. | ||
| 8 | |||
| 9 | Copyright © 1990-1996, 1998-2022 Free Software Foundation, | ||
| 10 | Inc. | ||
| 11 | |||
| 12 | Permission is granted to copy, distribute and/or modify this document | ||
| 13 | under the terms of the GNU Free Documentation License, Version 1.3 or | ||
| 14 | any later version published by the Free Software Foundation; with the | ||
| 15 | Invariant Sections being "GNU General Public License," with the | ||
| 16 | Front-Cover Texts being "A GNU Manual," and with the Back-Cover | ||
| 17 | Texts as in (a) below. A copy of the license is included in the | ||
| 18 | section entitled "GNU Free Documentation License." | ||
| 19 | |||
| 20 | (a) The FSF's Back-Cover Text is: "You have the freedom to copy and | ||
| 21 | modify this GNU manual. Buying copies from the FSF supports it in | ||
| 22 | developing GNU and promoting software freedom." --> | ||
| 23 | <title>Language Definitions (GNU Emacs Lisp Reference Manual)</title> | ||
| 24 | |||
| 25 | <meta name="description" content="Language Definitions (GNU Emacs Lisp Reference Manual)"> | ||
| 26 | <meta name="keywords" content="Language Definitions (GNU Emacs Lisp Reference Manual)"> | ||
| 27 | <meta name="resource-type" content="document"> | ||
| 28 | <meta name="distribution" content="global"> | ||
| 29 | <meta name="Generator" content="makeinfo"> | ||
| 30 | <meta name="viewport" content="width=device-width,initial-scale=1"> | ||
| 31 | |||
| 32 | <link href="index.html" rel="start" title="Top"> | ||
| 33 | <link href="Index.html" rel="index" title="Index"> | ||
| 34 | <link href="index.html#SEC_Contents" rel="contents" title="Table of Contents"> | ||
| 35 | <link href="Parsing-Program-Source.html" rel="up" title="Parsing Program Source"> | ||
| 36 | <link href="Using-Parser.html" rel="next" title="Using Parser"> | ||
| 37 | <style type="text/css"> | ||
| 38 | <!-- | ||
| 39 | a.copiable-anchor {visibility: hidden; text-decoration: none; line-height: 0em} | ||
| 40 | a.summary-letter {text-decoration: none} | ||
| 41 | blockquote.indentedblock {margin-right: 0em} | ||
| 42 | div.display {margin-left: 3.2em} | ||
| 43 | div.example {margin-left: 3.2em} | ||
| 44 | kbd {font-style: oblique} | ||
| 45 | pre.display {font-family: inherit} | ||
| 46 | pre.format {font-family: inherit} | ||
| 47 | pre.menu-comment {font-family: serif} | ||
| 48 | pre.menu-preformatted {font-family: serif} | ||
| 49 | span.nolinebreak {white-space: nowrap} | ||
| 50 | span.roman {font-family: initial; font-weight: normal} | ||
| 51 | span.sansserif {font-family: sans-serif; font-weight: normal} | ||
| 52 | span:hover a.copiable-anchor {visibility: visible} | ||
| 53 | ul.no-bullet {list-style: none} | ||
| 54 | --> | ||
| 55 | </style> | ||
| 56 | <link rel="stylesheet" type="text/css" href="./manual.css"> | ||
| 57 | |||
| 58 | |||
| 59 | </head> | ||
| 60 | |||
| 61 | <body lang="en"> | ||
| 62 | <div class="section" id="Language-Definitions"> | ||
| 63 | <div class="header"> | ||
| 64 | <p> | ||
| 65 | Next: <a href="Using-Parser.html" accesskey="n" rel="next">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html" accesskey="u" rel="up">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p> | ||
| 66 | </div> | ||
| 67 | <hr> | ||
| 68 | <span id="Tree_002dsitter-Language-Definitions"></span><h3 class="section">37.1 Tree-sitter Language Definitions</h3> | ||
| 69 | |||
| 70 | <span id="Loading-a-language-definition"></span><h3 class="heading">Loading a language definition</h3> | ||
| 71 | |||
| 72 | <p>Tree-sitter relies on language definitions to parse text in that | ||
| 73 | language. In Emacs, a language definition is represented by a symbol. | ||
| 74 | For example, the C language definition is represented as <code>c</code>, and | ||
| 75 | <code>c</code> can be passed to tree-sitter functions as the <var>language</var> | ||
| 76 | argument. | ||
| 77 | </p> | ||
| 78 | <span id="index-treesit_002dextra_002dload_002dpath"></span> | ||
| 79 | <span id="index-treesit_002dload_002dlanguage_002derror"></span> | ||
| 80 | <span id="index-treesit_002dload_002dsuffixes"></span> | ||
| 81 | <p>Tree-sitter language definitions are distributed as dynamic libraries. | ||
| 82 | In order to use a language definition in Emacs, you need to make sure | ||
| 83 | that the dynamic library is installed on the system. Emacs looks for | ||
| 84 | language definitions under load paths in | ||
| 85 | <code>treesit-extra-load-path</code>, <code>user-emacs-directory</code>/tree-sitter, | ||
| 86 | and system default locations for dynamic libraries, in that order. | ||
| 87 | Emacs tries each extension in <code>treesit-load-suffixes</code>. If Emacs | ||
| 88 | cannot find the library or has a problem loading it, Emacs signals | ||
| 89 | <code>treesit-load-language-error</code>. The signal data is a list of | ||
| 90 | specific error messages. | ||
| 91 | </p> | ||
| 92 | <dl class="def"> | ||
| 93 | <dt id="index-treesit_002dlanguage_002davailable_002dp"><span class="category">Function: </span><span><strong>treesit-language-available-p</strong> <em>language</em><a href='#index-treesit_002dlanguage_002davailable_002dp' class='copiable-anchor'> ¶</a></span></dt> | ||
| 94 | <dd><p>This function checks whether the dynamic library for <var>language</var> is | ||
| 95 | present on the system, and returns non-nil if it is. | ||
| 96 | </p></dd></dl> | ||
| 97 | |||
| 98 | <span id="index-treesit_002dload_002dname_002doverride_002dlist"></span> | ||
| 99 | <p>By convention, the dynamic library for <var>language</var> is | ||
| 100 | <code>libtree-sitter-<var>language</var>.<var>ext</var></code>, where <var>ext</var> is the | ||
| 101 | system-specific extension for dynamic libraries. Also by convention, | ||
| 102 | the function provided by that library is named | ||
| 103 | <code>tree_sitter_<var>language</var></code>. If a language definition doesn’t | ||
| 104 | follow this convention, you should add an entry | ||
| 105 | </p> | ||
| 106 | <div class="example"> | ||
| 107 | <pre class="example">(<var>language</var> <var>library-base-name</var> <var>function-name</var>) | ||
| 108 | </pre></div> | ||
| 109 | |||
| 110 | <p>to <code>treesit-load-name-override-list</code>, where | ||
| 111 | <var>library-base-name</var> is the base filename for the dynamic library | ||
| 112 | (conventionally <code>libtree-sitter-<var>language</var></code>), and | ||
| 113 | <var>function-name</var> is the function provided by the library | ||
| 114 | (conventionally <code>tree_sitter_<var>language</var></code>). For example, | ||
| 115 | </p> | ||
| 116 | <div class="example"> | ||
| 117 | <pre class="example">(cool-lang "libtree-sitter-coool" "tree_sitter_cooool") | ||
| 118 | </pre></div> | ||
| 119 | |||
| 120 | <p>for a language too cool to abide by conventions. | ||
| 121 | </p> | ||
| 122 | <dl class="def"> | ||
| 123 | <dt id="index-treesit_002dlanguage_002dversion"><span class="category">Function: </span><span><strong>treesit-language-version</strong> <em>&optional min-compatible</em><a href='#index-treesit_002dlanguage_002dversion' class='copiable-anchor'> ¶</a></span></dt> | ||
| 124 | <dd><p>The tree-sitter library has a <em>language version</em>; a language | ||
| 125 | definition’s version needs to match this version to be compatible. | ||
| 126 | </p> | ||
| 127 | <p>This function returns the tree-sitter library’s language version. If | ||
| 128 | <var>min-compatible</var> is non-nil, it returns the minimal compatible | ||
| 129 | version. | ||
| 130 | </p></dd></dl> | ||
| 131 | |||
| 132 | <span id="Concrete-syntax-tree"></span><h3 class="heading">Concrete syntax tree</h3> | ||
| 133 | |||
| 134 | <p>A syntax tree is what a parser generates. In a syntax tree, each node | ||
| 135 | represents a piece of text, and nodes are connected to one another by | ||
| 136 | parent-child relationships. For example, if the source text is | ||
| 137 | </p> | ||
| 138 | <div class="example"> | ||
| 139 | <pre class="example">1 + 2 | ||
| 140 | </pre></div> | ||
| 141 | |||
| 142 | <p>its syntax tree could be | ||
| 143 | </p> | ||
| 144 | <div class="example"> | ||
| 145 | <pre class="example"> +--------------+ | ||
| 146 | | root "1 + 2" | | ||
| 147 | +--------------+ | ||
| 148 | | | ||
| 149 | +--------------------------------+ | ||
| 150 | | expression "1 + 2" | | ||
| 151 | +--------------------------------+ | ||
| 152 | | | | | ||
| 153 | +------------+ +--------------+ +------------+ | ||
| 154 | | number "1" | | operator "+" | | number "2" | | ||
| 155 | +------------+ +--------------+ +------------+ | ||
| 156 | </pre></div> | ||
| 157 | |||
| 158 | <p>We can also represent it as an s-expression: | ||
| 159 | </p> | ||
| 160 | <div class="example"> | ||
| 161 | <pre class="example">(root (expression (number) (operator) (number))) | ||
| 162 | </pre></div> | ||
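
The correspondence between the tree and its s-expression can be sketched in a few lines of JavaScript. This is only a toy model under stated assumptions: `makeNode` and `toSexp` are hypothetical helpers invented for illustration, not part of tree-sitter or Emacs.

```javascript
// Toy model of the tree above; this is NOT the tree-sitter API.
// `makeNode` and `toSexp` are hypothetical helpers for illustration.
function makeNode(type, text, children = []) {
  return { type, text, children };
}

// Render a node the way the s-expression above does: the node type,
// followed by the rendered children, all wrapped in parentheses.
function toSexp(node) {
  const parts = [node.type, ...node.children.map(toSexp)];
  return "(" + parts.join(" ") + ")";
}

const tree = makeNode("root", "1 + 2", [
  makeNode("expression", "1 + 2", [
    makeNode("number", "1"),
    makeNode("operator", "+"),
    makeNode("number", "2"),
  ]),
]);

console.log(toSexp(tree));
// prints: (root (expression (number) (operator) (number)))
```

Note that the s-expression keeps only the node types and the nesting; the text each node covers is dropped.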
| 163 | |||
| 164 | <span id="Node-types"></span><h4 class="subheading">Node types</h4> | ||
| 165 | |||
| 166 | <span id="index-tree_002dsitter-node-type"></span> | ||
| 167 | <span id="tree_002dsitter-node-type"></span><span id="index-tree_002dsitter-named-node"></span> | ||
| 168 | <span id="tree_002dsitter-named-node"></span><span id="index-tree_002dsitter-anonymous-node"></span> | ||
| 169 | <p>Names like <code>root</code>, <code>expression</code>, <code>number</code>, and | ||
| 170 | <code>operator</code> are node <em>types</em>. However, not all nodes in a | ||
| 171 | syntax tree have a type. Nodes that don’t are <em>anonymous nodes</em>, | ||
| 172 | and nodes with a type are <em>named nodes</em>. Anonymous nodes are | ||
| 173 | tokens with fixed spellings, including punctuation characters like | ||
| 174 | bracket ‘<samp>]</samp>’, and keywords like <code>return</code>. | ||
| 175 | </p> | ||
| 176 | <span id="Field-names"></span><h4 class="subheading">Field names</h4> | ||
| 177 | |||
| 178 | <span id="index-tree_002dsitter-node-field-name"></span> | ||
| 179 | <span id="tree_002dsitter-node-field-name"></span><p>To make the syntax tree easier to | ||
| 180 | analyze, many language definitions assign <em>field names</em> to child | ||
| 181 | nodes. For example, a <code>function_definition</code> node could have a | ||
| 182 | <code>declarator</code> and a <code>body</code>: | ||
| 183 | </p> | ||
| 184 | <div class="example"> | ||
| 185 | <pre class="example">(function_definition | ||
| 186 | declarator: (declaration) | ||
| 187 | body: (compound_statement)) | ||
| 188 | </pre></div> | ||
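
To see why field names make analysis easier, here is a minimal sketch in JavaScript. It assumes a toy representation in which every child is stored together with its field name; `childByField` is a hypothetical helper, not a tree-sitter or Emacs function.

```javascript
// Toy model, not the tree-sitter API: each child is stored together
// with an optional field name.  `childByField` is a hypothetical helper.
const functionDefinition = {
  type: "function_definition",
  children: [
    { field: "declarator", node: { type: "declaration", children: [] } },
    { field: "body", node: { type: "compound_statement", children: [] } },
  ],
};

// Return the first child carrying the given field name, or null.
function childByField(node, field) {
  const entry = node.children.find((child) => child.field === field);
  return entry ? entry.node : null;
}

console.log(childByField(functionDefinition, "body").type);
// prints: compound_statement
```

With field names, a child can be found by its role rather than by its position among its siblings.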
| 189 | |||
| 190 | <dl class="def"> | ||
| 191 | <dt id="index-treesit_002dinspect_002dmode"><span class="category">Command: </span><span><strong>treesit-inspect-mode</strong><a href='#index-treesit_002dinspect_002dmode' class='copiable-anchor'> ¶</a></span></dt> | ||
| 192 | <dd><p>This minor mode displays the node that <em>starts</em> at point in | ||
| 193 | mode-line. The mode-line will display | ||
| 194 | </p> | ||
| 195 | <div class="example"> | ||
| 196 | <pre class="example"><var>parent</var> <var>field-name</var>: (<var>child</var> (<var>grand-child</var> (...))) | ||
| 197 | </pre></div> | ||
| 198 | |||
| 199 | <p><var>child</var>, <var>grand-child</var>, <var>grand-grand-child</var>, etc., are | ||
| 200 | nodes that have their beginning at point. And <var>parent</var> is the | ||
| 201 | parent of <var>child</var>. | ||
| 202 | </p> | ||
| 203 | <p>If there is no node that starts at point, i.e., point is in the middle | ||
| 204 | of a node, then the mode-line only displays the smallest node that | ||
| 205 | spans point, and its immediate parent. | ||
| 206 | </p> | ||
| 207 | <p>This minor mode doesn’t create parsers on its own. It simply uses the | ||
| 208 | first parser in <code>(treesit-parser-list)</code> (see <a href="Using-Parser.html">Using Tree-sitter Parser</a>). | ||
| 209 | </p></dd></dl> | ||
| 210 | |||
| 211 | <span id="Reading-the-grammar-definition"></span><h3 class="heading">Reading the grammar definition</h3> | ||
| 212 | |||
| 213 | <p>Authors of language definitions define the <em>grammar</em> of a | ||
| 214 | language, and this grammar determines how a parser constructs a | ||
| 215 | concrete syntax tree out of the text. In order to use the syntax | ||
| 216 | tree effectively, we need to read the <em>grammar file</em>. | ||
| 217 | </p> | ||
| 218 | <p>The grammar file is usually <code>grammar.js</code> in a language | ||
| 219 | definition’s project repository. The link to a language definition’s | ||
| 220 | home page can be found on tree-sitter’s home page | ||
| 221 | (<a href="https://tree-sitter.github.io/tree-sitter">https://tree-sitter.github.io/tree-sitter</a>). | ||
| 222 | </p> | ||
| 223 | <p>The grammar is written in JavaScript syntax. For example, the rule | ||
| 224 | matching a <code>function_definition</code> node looks like | ||
| 225 | </p> | ||
| 226 | <div class="example"> | ||
| 227 | <pre class="example">function_definition: $ => seq( | ||
| 228 | $.declaration_specifiers, | ||
| 229 | field('declarator', $.declaration), | ||
| 230 | field('body', $.compound_statement) | ||
| 231 | ) | ||
| 232 | </pre></div> | ||
| 233 | |||
| 234 | <p>The rule is represented by a function that takes a single argument | ||
| 235 | <var>$</var>, representing the whole grammar. The body of the function is | ||
| 236 | built from other functions: the <code>seq</code> function puts together a | ||
| 237 | sequence of children; the <code>field</code> function annotates a child with | ||
| 238 | a field name. If we write the above definition in BNF syntax, it | ||
| 239 | would look like | ||
| 240 | </p> | ||
| 241 | <div class="example"> | ||
| 242 | <pre class="example">function_definition := | ||
| 243 | <declaration_specifiers> <declaration> <compound_statement> | ||
| 244 | </pre></div> | ||
| 245 | |||
| 246 | <p>and the node returned by the parser would look like | ||
| 247 | </p> | ||
| 248 | <div class="example"> | ||
| 249 | <pre class="example">(function_definition | ||
| 250 | (declaration_specifier) | ||
| 251 | declarator: (declaration) | ||
| 252 | body: (compound_statement)) | ||
| 253 | </pre></div> | ||
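
The idea that a rule is just a function of the grammar, built by calling other functions, can be tried out with a minimal sketch. The definitions of `seq`, `field`, and `$` below are hypothetical stand-ins written for this illustration; the real ones are supplied by tree-sitter and may build different data.

```javascript
// Hypothetical stand-ins for the grammar DSL functions; the real ones
// are supplied by tree-sitter and may build different data.
const seq = (...members) => ({ kind: "seq", members });
const field = (name, content) => ({ kind: "field", name, content });

// Stand-in for `$`: looking up any property yields a reference to the
// rule of that name.
const $ = new Proxy({}, {
  get: (_target, name) => ({ kind: "symbol", name }),
});

// The rule from the example above, written as an ordinary function.
const function_definition = ($) => seq(
  $.declaration_specifiers,
  field("declarator", $.declaration),
  field("body", $.compound_statement)
);

const rule = function_definition($);
console.log(rule.members.map((m) => m.kind).join(" "));
// prints: symbol field field
```

Calling the rule function yields a plain description of the rule: a sequence of three members, the last two annotated with field names.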
| 254 | |||
| 255 | <p>Below is a list of functions that one will see in a grammar | ||
| 256 | definition. Each function takes other rules as arguments and returns | ||
| 257 | a new rule. | ||
| 258 | </p> | ||
| 259 | <ul> | ||
| 260 | <li> <code>seq(rule1, rule2, ...)</code> matches each rule one after another. | ||
| 261 | |||
| 262 | </li><li> <code>choice(rule1, rule2, ...)</code> matches one of the rules in its | ||
| 263 | arguments. | ||
| 264 | |||
| 265 | </li><li> <code>repeat(rule)</code> matches <var>rule</var> <em>zero or more</em> times. | ||
| 266 | This is like the ‘<samp>*</samp>’ operator in regular expressions. | ||
| 267 | |||
| 268 | </li><li> <code>repeat1(rule)</code> matches <var>rule</var> <em>one or more</em> times. | ||
| 269 | This is like the ‘<samp>+</samp>’ operator in regular expressions. | ||
| 270 | |||
| 271 | </li><li> <code>optional(rule)</code> matches <var>rule</var> <em>zero or one</em> time. | ||
| 272 | This is like the ‘<samp>?</samp>’ operator in regular expressions. | ||
| 273 | |||
| 274 | </li><li> <code>field(name, rule)</code> assigns field name <var>name</var> to the child | ||
| 275 | node matched by <var>rule</var>. | ||
| 276 | |||
| 277 | </li><li> <code>alias(rule, alias)</code> makes nodes matched by <var>rule</var> appear as | ||
| 278 | <var>alias</var> in the syntax tree generated by the parser. For example, | ||
| 279 | |||
| 280 | <div class="example"> | ||
| 281 | <pre class="example">alias(preprocessor_call_exp, call_expression) | ||
| 282 | </pre></div> | ||
| 283 | |||
| 284 | <p>makes any node matched by <code>preprocessor_call_exp</code> appear as | ||
| 285 | <code>call_expression</code>. | ||
| 286 | </p></li></ul> | ||
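
The matching behavior of `seq`, `choice`, `repeat`, `repeat1`, and `optional` described in the list above can be sketched with toy matchers over an array of tokens. Everything here is a hypothetical stand-in written for illustration (including the `tok` and `matches` helpers); it is not how tree-sitter implements these functions, and the matching is greedy without backtracking.

```javascript
// Toy matchers over an array of tokens; these are hypothetical
// stand-ins, not tree-sitter's implementation.  A rule is a function
// (tokens, pos) that returns the position after the match, or -1.
// Matching is greedy and never backtracks, which is enough here.
const tok = (t) => (tokens, pos) => (tokens[pos] === t ? pos + 1 : -1);

// Match each rule one after another.
const seq = (...rules) => (tokens, pos) => {
  for (const rule of rules) {
    pos = rule(tokens, pos);
    if (pos < 0) return -1;
  }
  return pos;
};

// Match the first rule that succeeds.
const choice = (...rules) => (tokens, pos) => {
  for (const rule of rules) {
    const next = rule(tokens, pos);
    if (next >= 0) return next;
  }
  return -1;
};

// Match zero or more times, like `*` in regular expressions.
const repeat = (rule) => (tokens, pos) => {
  // Stop when the rule fails or stops consuming input.
  for (let next = rule(tokens, pos); next > pos; next = rule(tokens, pos)) {
    pos = next;
  }
  return pos;
};

// One or more times (`+`), and zero or one time (`?`).
const repeat1 = (rule) => seq(rule, repeat(rule));
const optional = (rule) => choice(rule, (_tokens, pos) => pos);

// True when the rule consumes every token.
const matches = (rule, tokens) => rule(tokens, 0) === tokens.length;

// num (("+" | "-") num)* with an optional trailing ";"
const sum = seq(
  tok("num"),
  repeat(seq(choice(tok("+"), tok("-")), tok("num"))),
  optional(tok(";"))
);

console.log(matches(sum, ["num", "+", "num", "-", "num"])); // true
console.log(matches(sum, ["num", ";"]));                     // true
console.log(matches(sum, ["num", "+"]));                     // false
console.log(matches(repeat1(tok("num")), []));               // false
```

Each function takes rules and returns a new rule, so larger rules such as `sum` are built purely by composition.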
| 287 | |||
| 288 | <p>Below are grammar functions less interesting for a reader of a | ||
| 289 | language definition. | ||
| 290 | </p> | ||
| 291 | <ul> | ||
| 292 | <li> <code>token(rule)</code> marks <var>rule</var> to produce a single leaf node. | ||
| 293 | That is, instead of generating a parent node with individual child | ||
| 294 | nodes under it, everything is combined into a single leaf node. | ||
| 295 | |||
| 296 | </li><li> Normally, grammar rules ignore preceding whitespace; | ||
| 297 | <code>token.immediate(rule)</code> changes <var>rule</var> to match only when | ||
| 298 | there is no preceding whitespace. | ||
| 299 | |||
| 300 | </li><li> <code>prec(n, rule)</code> gives <var>rule</var> precedence level <var>n</var>. | ||
| 301 | |||
| 302 | </li><li> <code>prec.left([n,] rule)</code> marks <var>rule</var> as left-associative, | ||
| 303 | optionally with level <var>n</var>. | ||
| 304 | |||
| 305 | </li><li> <code>prec.right([n,] rule)</code> marks <var>rule</var> as right-associative, | ||
| 306 | optionally with level <var>n</var>. | ||
| 307 | |||
| 308 | </li><li> <code>prec.dynamic(n, rule)</code> is like <code>prec</code>, but the precedence | ||
| 309 | is applied at runtime instead. | ||
| 310 | </li></ul> | ||
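
What left- and right-associativity mean for the resulting tree can be illustrated with a small sketch. `groupLeft` and `groupRight` are hypothetical helpers that only show how a chain of operands would be grouped; they are not part of tree-sitter and do not model precedence levels.

```javascript
// Hypothetical helpers, not part of tree-sitter: show how a chain of
// operands joined by one operator is grouped under each associativity.
const groupLeft = (operands, op) =>
  operands.reduce((left, right) => `(${left} ${op} ${right})`);

const groupRight = (operands, op) =>
  operands.reduceRight((right, left) => `(${left} ${op} ${right})`);

console.log(groupLeft(["a", "b", "c"], "-"));  // prints: ((a - b) - c)
console.log(groupRight(["a", "b", "c"], "-")); // prints: (a - (b - c))
```

A left-associative rule nests repeated operators toward the left, while a right-associative rule nests them toward the right.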
| 311 | |||
| 312 | <p>The tree-sitter project talks about writing a grammar in more detail: | ||
| 313 | <a href="https://tree-sitter.github.io/tree-sitter/creating-parsers">https://tree-sitter.github.io/tree-sitter/creating-parsers</a>. | ||
| 314 | See especially the section “The Grammar DSL”. | ||
| 315 | </p> | ||
| 316 | </div> | ||
| 317 | <hr> | ||
| 318 | <div class="header"> | ||
| 319 | <p> | ||
| 320 | Next: <a href="Using-Parser.html">Using Tree-sitter Parser</a>, Up: <a href="Parsing-Program-Source.html">Parsing Program Source</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html" title="Index" rel="index">Index</a>]</p> | ||
| 321 | </div> | ||
| 322 | |||
| 323 | |||
| 324 | |||
| 325 | </body> | ||
| 326 | </html> | ||