| author | Yuan Fu | 2023-03-18 14:13:31 -0700 |
|---|---|---|
| committer | Yuan Fu | 2023-03-18 14:15:43 -0700 |
| commit | e84f878e19a892f66a1659c45e9f9b96e375b016 | |
| tree | 9b9a4a393ef85f098e7732e6623517a596da21f9 | |
| parent | 11592bcfda6cf85d797d333072453c98994790e1 | |
; * admin/notes/tree-sitter/starter-guide: Update starter-guide.
| -rw-r--r-- | admin/notes/tree-sitter/starter-guide | 157 |
1 files changed, 80 insertions, 77 deletions
diff --git a/admin/notes/tree-sitter/starter-guide b/admin/notes/tree-sitter/starter-guide index b8910aab5ca..846614f1446 100644 --- a/admin/notes/tree-sitter/starter-guide +++ b/admin/notes/tree-sitter/starter-guide | |||
| @@ -17,6 +17,7 @@ TOC: | |||
| 17 | - More features? | 17 | - More features? |
| 18 | - Common tasks (code snippets) | 18 | - Common tasks (code snippets) |
| 19 | - Manual | 19 | - Manual |
| 20 | - Appendix 1 | ||
| 20 | 21 | ||
| 21 | * Building Emacs with tree-sitter | 22 | * Building Emacs with tree-sitter |
| 22 | 23 | ||
| @@ -42,11 +43,9 @@ You can use this script that I put together here: | |||
| 42 | 43 | ||
| 43 | https://github.com/casouri/tree-sitter-module | 44 | https://github.com/casouri/tree-sitter-module |
| 44 | 45 | ||
| 45 | You can also find them under this directory in /build-modules. | ||
| 46 | |||
| 47 | This script automatically pulls and builds language definitions for C, | 46 | This script automatically pulls and builds language definitions for C, |
| 48 | C++, Rust, JSON, Go, HTML, JavaScript, CSS, Python, Typescript, | 47 | C++, Rust, JSON, Go, HTML, JavaScript, CSS, Python, Typescript, |
| 49 | and C#. Better yet, I pre-built these language definitions for | 48 | C#, etc. Better yet, I pre-built these language definitions for |
| 50 | GNU/Linux and macOS, they can be downloaded here: | 49 | GNU/Linux and macOS, they can be downloaded here: |
| 51 | 50 | ||
| 52 | https://github.com/casouri/tree-sitter-module/releases/tag/v2.1 | 51 | https://github.com/casouri/tree-sitter-module/releases/tag/v2.1 |
| @@ -68,6 +67,10 @@ organization has all the "official" language definitions: | |||
| 68 | 67 | ||
| 69 | https://github.com/tree-sitter | 68 | https://github.com/tree-sitter |
| 70 | 69 | ||
| 70 | Alternatively, you can use treesit-install-language-grammar command | ||
| 71 | and follow its instructions. If everything goes right, it should | ||
| 72 | automatically download and compile the language grammar for you. | ||
| 73 | |||
| 71 | * Setting up for adding major mode features | 74 | * Setting up for adding major mode features |
| 72 | 75 | ||
| 73 | Start Emacs and load tree-sitter with | 76 | Start Emacs and load tree-sitter with |
| @@ -78,6 +81,10 @@ Now check if Emacs is built with tree-sitter library | |||
| 78 | 81 | ||
| 79 | (treesit-available-p) | 82 | (treesit-available-p) |
| 80 | 83 | ||
| 84 | Make sure Emacs can find the language grammar you want to use | ||
| 85 | |||
| 86 | (treesit-language-available-p 'lang) | ||
| 87 | |||
| 81 | * Tree-sitter major modes | 88 | * Tree-sitter major modes |
| 82 | 89 | ||
| 83 | Tree-sitter modes should be separate major modes, so other modes | 90 | Tree-sitter modes should be separate major modes, so other modes |
| @@ -89,12 +96,15 @@ modes. | |||
| 89 | 96 | ||
| 90 | If the tree-sitter variant and the "native" variant could share some | 97 | If the tree-sitter variant and the "native" variant could share some |
| 91 | setup, you can create a "base mode", which only contains the common | 98 | setup, you can create a "base mode", which only contains the common |
| 92 | setup. For example, there is python-base-mode (shared), python-mode | 99 | setup. For example, python.el defines python-base-mode (shared), |
| 93 | (native), and python-ts-mode (tree-sitter). | 100 | python-mode (native), and python-ts-mode (tree-sitter). |
| 94 | 101 | ||
| 95 | In the tree-sitter mode, check if we can use tree-sitter with | 102 | In the tree-sitter mode, check if we can use tree-sitter with |
| 96 | treesit-ready-p, it will error out if tree-sitter is not ready. | 103 | treesit-ready-p, it will error out if tree-sitter is not ready. |
| 97 | 104 | ||
| 105 | In Emacs 30 we'll introduce some mechanism to more gracefully inherit | ||
| 106 | modes and fallback to other modes. | ||
| 107 | |||
| 98 | * Naming convention | 108 | * Naming convention |
| 99 | 109 | ||
| 100 | Use tree-sitter for text (documentation, comment), use treesit for | 110 | Use tree-sitter for text (documentation, comment), use treesit for |
| @@ -180,18 +190,17 @@ mark the offending part in red. | |||
| 180 | To enable tree-sitter font-lock, set ‘treesit-font-lock-settings’ and | 190 | To enable tree-sitter font-lock, set ‘treesit-font-lock-settings’ and |
| 181 | ‘treesit-font-lock-feature-list’ buffer-locally and call | 191 | ‘treesit-font-lock-feature-list’ buffer-locally and call |
| 182 | ‘treesit-major-mode-setup’. For example, see | 192 | ‘treesit-major-mode-setup’. For example, see |
| 183 | ‘python--treesit-settings’ in python.el. Below I paste a snippet of | 193 | ‘python--treesit-settings’ in python.el. Below is a snippet of it. |
| 184 | it. | ||
| 185 | 194 | ||
| 186 | Note that like the current font-lock, if the to-be-fontified region | 195 | Just like the current font-lock, if the to-be-fontified region already |
| 187 | already has a face (ie, an earlier match fontified part/all of the | 196 | has a face (ie, an earlier match fontified part/all of the region), |
| 188 | region), the new face is discarded rather than applied. If you want | 197 | the new face is discarded rather than applied. If you want later |
| 189 | later matches always override earlier matches, use the :override | 198 | matches always override earlier matches, use the :override keyword. |
| 190 | keyword. | ||
| 191 | 199 | ||
| 192 | Each rule should have a :feature, like function-name, | 200 | Each rule should have a :feature, like function-name, |
| 193 | string-interpolation, builtin, etc. Users can then enable/disable each | 201 | string-interpolation, builtin, etc. Users can then enable/disable each |
| 194 | feature individually. | 202 | feature individually. See Appendix 1 at the bottom for a set of common |
| 203 | features names. | ||
| 195 | 204 | ||
| 196 | #+begin_src elisp | 205 | #+begin_src elisp |
| 197 | (defvar python--treesit-settings | 206 | (defvar python--treesit-settings |
| @@ -247,8 +256,7 @@ Concretely, something like this: | |||
| 247 | (string-interpolation decorator))) | 256 | (string-interpolation decorator))) |
| 248 | (treesit-major-mode-setup)) | 257 | (treesit-major-mode-setup)) |
| 249 | (t | 258 | (t |
| 250 | ;; No tree-sitter | 259 | ;; No tree-sitter, do nothing or fallback to another mode. |
| 251 | (setq-local font-lock-defaults ...) | ||
| 252 | ...))) | 260 | ...))) |
| 253 | #+end_src | 261 | #+end_src |
| 254 | 262 | ||
| @@ -289,6 +297,7 @@ For ANCHOR we have | |||
| 289 | first-sibling => start of the first sibling | 297 | first-sibling => start of the first sibling |
| 290 | parent => start of parent | 298 | parent => start of parent |
| 291 | parent-bol => BOL of the line parent is on. | 299 | parent-bol => BOL of the line parent is on. |
| 300 | standalone-parent => Like parent-bol but handles more edge cases | ||
| 292 | prev-sibling => start of previous sibling | 301 | prev-sibling => start of previous sibling |
| 293 | no-indent => current position (don’t indent) | 302 | no-indent => current position (don’t indent) |
| 294 | prev-line => start of previous line | 303 | prev-line => start of previous line |
| @@ -329,7 +338,8 @@ tells you which rule is applied in the echo area. | |||
| 329 | ...)))) | 338 | ...)))) |
| 330 | #+end_src | 339 | #+end_src |
| 331 | 340 | ||
| 332 | Then you set ‘treesit-simple-indent-rules’ to your rules, and call | 341 | To setup indentation for your major mode, set |
| 342 | ‘treesit-simple-indent-rules’ to your rules, and call | ||
| 333 | ‘treesit-major-mode-setup’: | 343 | ‘treesit-major-mode-setup’: |
| 334 | 344 | ||
| 335 | #+begin_src elisp | 345 | #+begin_src elisp |
| @@ -339,36 +349,14 @@ Then you set ‘treesit-simple-indent-rules’ to your rules, and call | |||
| 339 | 349 | ||
| 340 | * Imenu | 350 | * Imenu |
| 341 | 351 | ||
| 342 | Not much to say except for utilizing ‘treesit-induce-sparse-tree’ (and | 352 | Set ‘treesit-simple-imenu-settings’ and call |
| 343 | explicitly pass a LIMIT argument: most of the time you don't need more | 353 | ‘treesit-major-mode-setup’. |
| 344 | than 10). See ‘js--treesit-imenu-1’ in js.el for an example. | ||
| 345 | |||
| 346 | Once you have the index builder, set ‘imenu-create-index-function’ to | ||
| 347 | it. | ||
| 348 | 354 | ||
| 349 | * Navigation | 355 | * Navigation |
| 350 | 356 | ||
| 351 | Mainly ‘beginning-of-defun-function’ and ‘end-of-defun-function’. | 357 | Set ‘treesit-defun-type-regexp’ and call |
| 352 | You can find the end of a defun with something like | 358 | ‘treesit-major-mode-setup’. You can additionally set |
| 353 | 359 | ‘treesit-defun-name-function’. | |
| 354 | (treesit-search-forward-goto "function_definition" 'end) | ||
| 355 | |||
| 356 | where "function_definition" matches the node type of a function | ||
| 357 | definition node, and ’end means we want to go to the end of that node. | ||
| 358 | |||
| 359 | Tree-sitter has default implementations for | ||
| 360 | ‘beginning-of-defun-function’ and ‘end-of-defun-function’. So for | ||
| 361 | ordinary languages, it is enough to set ‘treesit-defun-type-regexp’ | ||
| 362 | to something that matches all the defun struct types in the language, | ||
| 363 | and call ‘treesit-major-mode-setup’. For example, | ||
| 364 | |||
| 365 | #+begin_src emacs-lisp | ||
| 366 | (setq-local treesit-defun-type-regexp (rx bol | ||
| 367 | (or "function" "class") | ||
| 368 | "_definition" | ||
| 369 | eol)) | ||
| 370 | (treesit-major-mode-setup) | ||
| 371 | #+end_src> | ||
| 372 | 360 | ||
| 373 | * Which-func | 361 | * Which-func |
| 374 | 362 | ||
| @@ -376,36 +364,7 @@ If you have an imenu implementation, set ‘which-func-functions’ to | |||
| 376 | nil, and which-func will automatically use imenu’s data. | 364 | nil, and which-func will automatically use imenu’s data. |
| 377 | 365 | ||
| 378 | If you want an independent implementation for which-func, you can | 366 | If you want an independent implementation for which-func, you can |
| 379 | find the current function by going up the tree and looking for the | 367 | find the current function by ‘treesit-defun-at-point’. |
| 380 | function_definition node. See the function below for an example. | ||
| 381 | Since Python allows nested function definitions, that function keeps | ||
| 382 | going until it reaches the root node, and records all the function | ||
| 383 | names along the way. | ||
| 384 | |||
| 385 | #+begin_src elisp | ||
| 386 | (defun python-info-treesit-current-defun (&optional include-type) | ||
| 387 | "Identical to `python-info-current-defun' but use tree-sitter. | ||
| 388 | For INCLUDE-TYPE see `python-info-current-defun'." | ||
| 389 | (let ((node (treesit-node-at (point))) | ||
| 390 | (name-list ()) | ||
| 391 | (type nil)) | ||
| 392 | (cl-loop while node | ||
| 393 | if (pcase (treesit-node-type node) | ||
| 394 | ("function_definition" | ||
| 395 | (setq type 'def)) | ||
| 396 | ("class_definition" | ||
| 397 | (setq type 'class)) | ||
| 398 | (_ nil)) | ||
| 399 | do (push (treesit-node-text | ||
| 400 | (treesit-node-child-by-field-name node "name") | ||
| 401 | t) | ||
| 402 | name-list) | ||
| 403 | do (setq node (treesit-node-parent node)) | ||
| 404 | finally return (concat (if include-type | ||
| 405 | (format "%s " type) | ||
| 406 | "") | ||
| 407 | (string-join name-list "."))))) | ||
| 408 | #+end_src | ||
| 409 | 368 | ||
| 410 | * More features? | 369 | * More features? |
| 411 | 370 | ||
| @@ -449,7 +408,51 @@ section is Parsing Program Source. Typing | |||
| 449 | 408 | ||
| 450 | C-h i d m elisp RET g Parsing Program Source RET | 409 | C-h i d m elisp RET g Parsing Program Source RET |
| 451 | 410 | ||
| 452 | will bring you to that section. You can also read the HTML version | 411 | will bring you to that section. You don’t need to read through every |
| 453 | under /html-manual in this directory. I find the HTML version easier | 412 | sentence, just read the text paragraphs and glance over function |
| 454 | to read. You don’t need to read through every sentence, just read the | 413 | names. |
| 455 | text paragraphs and glance over function names. | 414 | |
| 415 | * Appendix 1 | ||
| 416 | |||
| 417 | Below is a set of common features used by built-in major mode. | ||
| 418 | |||
| 419 | Basic tokens: | ||
| 420 | |||
| 421 | delimiter ,.; (delimit things) | ||
| 422 | operator == != || (produces a value) | ||
| 423 | bracket []{}() | ||
| 424 | misc-punctuation (other punctuation that you want to highlight) | ||
| 425 | |||
| 426 | constant true, false, null | ||
| 427 | number | ||
| 428 | keyword | ||
| 429 | comment (includes doc-comments) | ||
| 430 | string (includes chars and docstrings) | ||
| 431 | string-interpolation f"text {variable}" | ||
| 432 | escape-sequence "\n\t\\" | ||
| 433 | function every function identifier | ||
| 434 | variable every variable identifier | ||
| 435 | type every type identifier | ||
| 436 | property a.b <--- highlight b | ||
| 437 | key { a: b, c: d } <--- highlight a, c | ||
| 438 | error highlight parse error | ||
| 439 | |||
| 440 | Abstract features: | ||
| 441 | |||
| 442 | assignment: the LHS of an assignment (thing being assigned to), eg: | ||
| 443 | |||
| 444 | a = b <--- highlight a | ||
| 445 | a.b = c <--- highlight b | ||
| 446 | a[1] = d <--- highlight a | ||
| 447 | |||
| 448 | definition: the thing being defined, eg: | ||
| 449 | |||
| 450 | int a(int b) { <--- highlight a | ||
| 451 | return 0 | ||
| 452 | } | ||
| 453 | |||
| 454 | int a; <-- highlight a | ||
| 455 | |||
| 456 | struct a { <--- highlight a | ||
| 457 | int b; <--- highlight b | ||
| 458 | } | ||
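
The rewritten Imenu and Navigation sections only name the variables to set. As a rough illustration of how they might fit together, here is a minimal sketch. It is not taken from the commit: the language symbol `mylang`, the mode name `mylang-ts-mode`, the helper `mylang-ts--defun-name`, and the node types `function_definition` and `class_definition` are all hypothetical and depend on the grammar actually in use.

```elisp
;; Sketch only: `mylang' and its node types are assumptions, not a
;; real grammar.  Requires Emacs 29 or later built with tree-sitter.
(require 'treesit)

(defun mylang-ts--defun-name (node)
  "Return the name of the defun NODE, or nil if it has no name field."
  (let ((name-node (treesit-node-child-by-field-name node "name")))
    (when name-node
      (treesit-node-text name-node t))))

(define-derived-mode mylang-ts-mode prog-mode "MyLang"
  "Hypothetical tree-sitter major mode used to illustrate the setup."
  (when (treesit-ready-p 'mylang)
    (treesit-parser-create 'mylang)
    ;; Navigation: which node types count as a defun, and how to
    ;; extract a defun's name.
    (setq-local treesit-defun-type-regexp
                (rx bos (or "function" "class") "_definition" eos))
    (setq-local treesit-defun-name-function #'mylang-ts--defun-name)
    ;; Imenu: each entry is (CATEGORY REGEXP PRED NAME-FN).  With a nil
    ;; NAME-FN, `treesit-defun-name-function' supplies the name.
    (setq-local treesit-simple-imenu-settings
                '(("Function" "\\`function_definition\\'" nil nil)
                  ("Class" "\\`class_definition\\'" nil nil)))
    ;; Installs the navigation and imenu support configured above.
    (treesit-major-mode-setup)))
```

With a setup along these lines, `treesit-defun-at-point` should return the enclosing defun node, which is what the updated Which-func section relies on.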