author Yuan Fu 2023-03-18 14:13:31 -0700
committer Yuan Fu 2023-03-18 14:15:43 -0700
commit e84f878e19a892f66a1659c45e9f9b96e375b016
tree 9b9a4a393ef85f098e7732e6623517a596da21f9
parent 11592bcfda6cf85d797d333072453c98994790e1
; * admin/notes/tree-sitter/starter-guide: Update starter-guide.
-rw-r--r-- admin/notes/tree-sitter/starter-guide 157
1 files changed, 80 insertions, 77 deletions
diff --git a/admin/notes/tree-sitter/starter-guide b/admin/notes/tree-sitter/starter-guide
index b8910aab5ca..846614f1446 100644
--- a/admin/notes/tree-sitter/starter-guide
+++ b/admin/notes/tree-sitter/starter-guide
@@ -17,6 +17,7 @@ TOC:
 - More features?
 - Common tasks (code snippets)
 - Manual
+- Appendix 1
 
 * Building Emacs with tree-sitter
 
@@ -42,11 +43,9 @@ You can use this script that I put together here:
 
     https://github.com/casouri/tree-sitter-module
 
-You can also find them under this directory in /build-modules.
-
 This script automatically pulls and builds language definitions for C,
 C++, Rust, JSON, Go, HTML, JavaScript, CSS, Python, Typescript,
-and C#. Better yet, I pre-built these language definitions for
+C#, etc. Better yet, I pre-built these language definitions for
 GNU/Linux and macOS, they can be downloaded here:
 
     https://github.com/casouri/tree-sitter-module/releases/tag/v2.1
@@ -68,6 +67,10 @@ organization has all the "official" language definitions:
 
     https://github.com/tree-sitter
 
+Alternatively, you can use treesit-install-language-grammar command
+and follow its instructions. If everything goes right, it should
+automatically download and compile the language grammar for you.
+
 * Setting up for adding major mode features
 
 Start Emacs and load tree-sitter with
@@ -78,6 +81,10 @@ Now check if Emacs is built with tree-sitter library
 
     (treesit-available-p)
 
+Make sure Emacs can find the language grammar you want to use
+
+    (treesit-language-available-p 'lang)
+
 * Tree-sitter major modes
 
 Tree-sitter modes should be separate major modes, so other modes
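
The two checks shown in the hunk above can be combined into a single guard. This is a minimal sketch, not something from the commit: `demo-treesit-usable-p' is a hypothetical helper name, and only `treesit-available-p' and `treesit-language-available-p' come from the guide.

```elisp
(defun demo-treesit-usable-p (lang)
  "Return non-nil if tree-sitter and the grammar for LANG are usable.
`demo-treesit-usable-p' is a hypothetical helper, not part of Emacs."
  (and (fboundp 'treesit-available-p)   ; not defined before Emacs 29
       (treesit-available-p)            ; Emacs was built with the library
       (treesit-language-available-p lang)))

;; Example: check whether the Python grammar can be loaded.
(demo-treesit-usable-p 'python)
```

The `fboundp' test is only there so the form can be evaluated on an older Emacs without signaling an error.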
@@ -89,12 +96,15 @@ modes.
 
 If the tree-sitter variant and the "native" variant could share some
 setup, you can create a "base mode", which only contains the common
-setup. For example, there is python-base-mode (shared), python-mode
-(native), and python-ts-mode (tree-sitter).
+setup. For example, python.el defines python-base-mode (shared),
+python-mode (native), and python-ts-mode (tree-sitter).
 
 In the tree-sitter mode, check if we can use tree-sitter with
 treesit-ready-p, it will error out if tree-sitter is not ready.
 
+In Emacs 30 we'll introduce some mechanism to more gracefully inherit
+modes and fallback to other modes.
+
 * Naming convention
 
 Use tree-sitter for text (documentation, comment), use treesit for
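
The base-mode arrangement described in the hunk above could look roughly like the following. This is a sketch under assumptions: every `demo-*' name is hypothetical, the Python grammar is used only as a stand-in, and Emacs 29 or later is assumed so that treesit.el can be loaded.

```elisp
(require 'treesit)

;; All `demo-*' names are hypothetical; the layout mirrors the
;; python-base-mode / python-mode / python-ts-mode split.
(define-derived-mode demo-base-mode prog-mode "Demo"
  "Shared setup for the native and tree-sitter variants."
  (setq-local comment-start "# "))

(define-derived-mode demo-mode demo-base-mode "Demo"
  "Native variant; regexp-based font-lock setup would go here.")

(define-derived-mode demo-ts-mode demo-base-mode "Demo"
  "Tree-sitter variant."
  ;; Only do the tree-sitter setup when `treesit-ready-p' says the
  ;; library and the grammar are usable.
  (when (treesit-ready-p 'python)
    (treesit-parser-create 'python)
    (treesit-major-mode-setup)))
```

Both variants inherit the shared settings from the base mode, so neither needs to know about the other.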
@@ -180,18 +190,17 @@ mark the offending part in red.
 To enable tree-sitter font-lock, set ‘treesit-font-lock-settings’ and
 ‘treesit-font-lock-feature-list’ buffer-locally and call
 ‘treesit-major-mode-setup’. For example, see
-‘python--treesit-settings’ in python.el. Below I paste a snippet of
-it.
+‘python--treesit-settings’ in python.el. Below is a snippet of it.
 
-Note that like the current font-lock, if the to-be-fontified region
-already has a face (ie, an earlier match fontified part/all of the
-region), the new face is discarded rather than applied. If you want
-later matches always override earlier matches, use the :override
-keyword.
+Just like the current font-lock, if the to-be-fontified region already
+has a face (ie, an earlier match fontified part/all of the region),
+the new face is discarded rather than applied. If you want later
+matches always override earlier matches, use the :override keyword.
 
 Each rule should have a :feature, like function-name,
 string-interpolation, builtin, etc. Users can then enable/disable each
-feature individually.
+feature individually. See Appendix 1 at the bottom for a set of common
+features names.
 
 #+begin_src elisp
 (defvar python--treesit-settings
@@ -247,8 +256,7 @@ Concretely, something like this:
                   (string-interpolation decorator)))
     (treesit-major-mode-setup))
    (t
-    ;; No tree-sitter
-    (setq-local font-lock-defaults ...)
+    ;; No tree-sitter, do nothing or fallback to another mode.
     ...)))
 #+end_src
 
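
To illustrate the :feature and :override keywords that the font-lock section of the guide discusses, here is a small sketch. It is not taken from python.el: `demo--font-lock-settings' is a hypothetical variable, and the node types `comment' and `string' are assumed to exist in the Python grammar.

```elisp
;; Hypothetical rules for illustration only.  The outer guard lets
;; this defvar be evaluated on an Emacs without tree-sitter support.
(defvar demo--font-lock-settings
  (when (and (fboundp 'treesit-available-p) (treesit-available-p))
    (treesit-font-lock-rules
     :language 'python
     :feature 'comment
     '((comment) @font-lock-comment-face)

     :language 'python
     :feature 'string
     ;; With :override t this rule replaces a face that an earlier
     ;; rule already put on the same text.
     :override t
     '((string) @font-lock-string-face)))
  "Example font-lock settings with one rule per feature.")
```

A mode would then assign this value to `treesit-font-lock-settings' and list the feature names `comment' and `string' in `treesit-font-lock-feature-list'.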
@@ -289,6 +297,7 @@ For ANCHOR we have
     first-sibling     => start of the first sibling
     parent            => start of parent
     parent-bol        => BOL of the line parent is on.
+    standalone-parent => Like parent-bol but handles more edge cases
     prev-sibling      => start of previous sibling
     no-indent         => current position (don’t indent)
     prev-line         => start of previous line
@@ -329,7 +338,8 @@ tells you which rule is applied in the echo area.
        ...))))
 #+end_src
 
-Then you set ‘treesit-simple-indent-rules’ to your rules, and call
+To setup indentation for your major mode, set
+‘treesit-simple-indent-rules’ to your rules, and call
 ‘treesit-major-mode-setup’:
 
 #+begin_src elisp
@@ -339,36 +349,14 @@ Then you set ‘treesit-simple-indent-rules’ to your rules, and call
 
 * Imenu
 
-Not much to say except for utilizing ‘treesit-induce-sparse-tree’ (and
-explicitly pass a LIMIT argument: most of the time you don't need more
-than 10). See ‘js--treesit-imenu-1’ in js.el for an example.
-
-Once you have the index builder, set ‘imenu-create-index-function’ to
-it.
+Set ‘treesit-simple-imenu-settings’ and call
+‘treesit-major-mode-setup’.
 
 * Navigation
 
-Mainly ‘beginning-of-defun-function’ and ‘end-of-defun-function’.
-You can find the end of a defun with something like
-
-(treesit-search-forward-goto "function_definition" 'end)
-
-where "function_definition" matches the node type of a function
-definition node, and ’end means we want to go to the end of that node.
-
-Tree-sitter has default implementations for
-‘beginning-of-defun-function’ and ‘end-of-defun-function’. So for
-ordinary languages, it is enough to set ‘treesit-defun-type-regexp’
-to something that matches all the defun struct types in the language,
-and call ‘treesit-major-mode-setup’. For example,
-
-#+begin_src emacs-lisp
-(setq-local treesit-defun-type-regexp (rx bol
-                                          (or "function" "class")
-                                          "_definition"
-                                          eol))
-(treesit-major-mode-setup)
-#+end_src>
+Set ‘treesit-defun-type-regexp’ and call
+‘treesit-major-mode-setup’. You can additionally set
+‘treesit-defun-name-function’.
 
 * Which-func
 
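
For the Imenu and Navigation text added in the hunk above, the variables might be filled in along these lines. This is a sketch with hypothetical `demo--*' names; node types such as "function_definition" depend on the grammar, and the string anchors bos/eos are my choice here, where the removed example used bol/eol.

```elisp
(require 'rx)

;; Hypothetical value for a Python-like grammar.
(defvar demo--defun-type-regexp
  (rx bos (or "function" "class") "_definition" eos)
  "Regexp matching the node types that count as defuns.")

(defun demo--defun-name (node)
  "Return the text of the \"name\" field of NODE, or nil if absent."
  (treesit-node-text
   (treesit-node-child-by-field-name node "name")
   t))

;; Inside the major mode body one would then write something like:
;;   (setq-local treesit-defun-type-regexp demo--defun-type-regexp)
;;   (setq-local treesit-defun-name-function #'demo--defun-name)
;;   (setq-local treesit-simple-imenu-settings
;;               '(("Function" "\\`function_definition\\'" nil nil)
;;                 ("Class" "\\`class_definition\\'" nil nil)))
;;   (treesit-major-mode-setup)
```

Anchoring the regexp at both ends keeps it from matching longer node type names that merely contain "function_definition".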
@@ -376,36 +364,7 @@ If you have an imenu implementation, set ‘which-func-functions’ to
 nil, and which-func will automatically use imenu’s data.
 
 If you want an independent implementation for which-func, you can
-find the current function by going up the tree and looking for the
-function_definition node. See the function below for an example.
-Since Python allows nested function definitions, that function keeps
-going until it reaches the root node, and records all the function
-names along the way.
-
-#+begin_src elisp
-(defun python-info-treesit-current-defun (&optional include-type)
-  "Identical to `python-info-current-defun' but use tree-sitter.
-For INCLUDE-TYPE see `python-info-current-defun'."
-  (let ((node (treesit-node-at (point)))
-        (name-list ())
-        (type nil))
-    (cl-loop while node
-             if (pcase (treesit-node-type node)
-                  ("function_definition"
-                   (setq type 'def))
-                  ("class_definition"
-                   (setq type 'class))
-                  (_ nil))
-             do (push (treesit-node-text
-                       (treesit-node-child-by-field-name node "name")
-                       t)
-                      name-list)
-             do (setq node (treesit-node-parent node))
-             finally return (concat (if include-type
-                                        (format "%s " type)
-                                      "")
-                                    (string-join name-list ".")))))
-#+end_src
+find the current function by ‘treesit-defun-at-point’.
 
 * More features?
 
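
For the Which-func change in the hunk above, an independent implementation built on ‘treesit-defun-at-point’ might look like this. It is a sketch, not code from the commit: `demo-which-func' is a hypothetical name, and it assumes the mode has set ‘treesit-defun-type-regexp’ and ‘treesit-defun-name-function’.

```elisp
(defun demo-which-func ()
  "Return the name of the defun at point, or nil.
Hypothetical helper intended for `which-func-functions'."
  (when (and (fboundp 'treesit-defun-at-point)
             (bound-and-true-p treesit-defun-type-regexp))
    (let ((node (treesit-defun-at-point)))
      ;; `treesit-defun-name' calls `treesit-defun-name-function'.
      (and node (treesit-defun-name node)))))

;; In the major mode body:
;;   (add-hook 'which-func-functions #'demo-which-func nil t)
```

The function returns nil in buffers where no defun regexp is set, so which-func falls through to its other sources.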
@@ -449,7 +408,51 @@ section is Parsing Program Source. Typing
 
     C-h i d m elisp RET g Parsing Program Source RET
 
-will bring you to that section. You can also read the HTML version
-under /html-manual in this directory. I find the HTML version easier
-to read. You don’t need to read through every sentence, just read the
-text paragraphs and glance over function names.
+will bring you to that section. You don’t need to read through every
+sentence, just read the text paragraphs and glance over function
+names.
+
+* Appendix 1
+
+Below is a set of common features used by built-in major mode.
+
+Basic tokens:
+
+delimiter             ,.; (delimit things)
+operator              == != || (produces a value)
+bracket               []{}()
+misc-punctuation      (other punctuation that you want to highlight)
+
+constant              true, false, null
+number
+keyword
+comment               (includes doc-comments)
+string                (includes chars and docstrings)
+string-interpolation  f"text {variable}"
+escape-sequence       "\n\t\\"
+function              every function identifier
+variable              every variable identifier
+type                  every type identifier
+property              a.b <--- highlight b
+key                   { a: b, c: d } <--- highlight a, c
+error                 highlight parse error
+
+Abstract features:
+
+assignment: the LHS of an assignment (thing being assigned to), eg:
+
+a = b <--- highlight a
+a.b = c <--- highlight b
+a[1] = d <--- highlight a
+
+definition: the thing being defined, eg:
+
+int a(int b) { <--- highlight a
+  return 0
+}
+
+int a; <-- highlight a
+
+struct a { <--- highlight a
+  int b; <--- highlight b
+}