Using Parser (GNU Emacs Lisp Reference Manual)

37.2 Using Tree-sitter Parser

This section describes how to create and configure a tree-sitter parser. In Emacs, each tree-sitter parser is associated with a buffer. As we edit the buffer, the associated parser and its syntax tree are automatically kept up to date.

Variable: treesit-max-buffer-size

This variable contains the maximum size of buffers in which tree-sitter can be activated. Major modes should check this value when deciding whether to enable tree-sitter features.

Function: treesit-can-enable-p

This function checks whether the current buffer is suitable for activating tree-sitter features. In essence, it checks treesit-available-p and compares the buffer’s size against treesit-max-buffer-size.
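
The kind of check described above can be sketched as follows. This sketch assumes Emacs 29 or later. It uses only treesit-available-p, treesit-language-available-p and treesit-max-buffer-size. The name my-treesit-usable-p is hypothetical, and this is not the actual definition of treesit-can-enable-p. It also checks that the grammar for the requested language is installed.

```elisp
(require 'treesit)

(defun my-treesit-usable-p (language)
  "Return non-nil if tree-sitter can parse LANGUAGE in the current buffer.
Hypothetical helper, not the definition of `treesit-can-enable-p'."
  (and (fboundp 'treesit-available-p)     ; Emacs may lack tree-sitter support
       (treesit-available-p)                ; library compiled in and loadable
       (treesit-language-available-p language) ; grammar for LANGUAGE installed
       ;; Refuse buffers larger than the configured maximum.
       (<= (buffer-size) treesit-max-buffer-size)))
```

A major mode could call, say, (my-treesit-usable-p 'json) before creating a parser. In that call, json stands in for whatever language the mode handles.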

Function: treesit-parser-create language &optional buffer no-reuse

To create a parser, we provide a buffer and the language to use (see Tree-sitter Language Definitions). If buffer is nil, the current buffer is used.

By default, this function reuses the existing parser for language in buffer, if there is one. If no-reuse is non-nil, it always creates a new parser. In either case, it returns the parser.
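
The reuse rule can be seen by comparing parser objects with eq. This is a minimal sketch, assuming Emacs 29 or later with tree-sitter support and an installed JSON grammar. The json language is used only for illustration, and my-treesit-reuse-demo is a hypothetical name.

```elisp
(require 'treesit)

(defun my-treesit-reuse-demo ()
  "Return (REUSED FRESH) for `treesit-parser-create' in a scratch buffer.
Hypothetical demo; assumes the JSON grammar is installed."
  (with-temp-buffer
    (let* ((p1 (treesit-parser-create 'json))
           ;; Same language, same buffer: the existing parser is returned.
           (p2 (treesit-parser-create 'json))
           ;; NO-REUSE is non-nil: a new parser is always created.
           (p3 (treesit-parser-create 'json nil t)))
      (list (eq p1 p2) (eq p1 p3)))))
```

With the grammar installed, (my-treesit-reuse-demo) should return (t nil).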

Given a parser, we can query information about it:

Function: treesit-parser-buffer parser

This function returns the buffer associated with parser.

Function: treesit-parser-language parser

This function returns the language that parser uses.

Function: treesit-parser-p object

This function returns non-nil if object is a tree-sitter parser, and nil otherwise.
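
The three query functions above can be combined in a small helper. This is a sketch, assuming Emacs 29 or later with tree-sitter support. my-treesit-describe-parser is a hypothetical name, and the description format is arbitrary.

```elisp
(require 'treesit)

(defun my-treesit-describe-parser (parser)
  "Return a short description of PARSER (hypothetical helper)."
  ;; Reject anything that is not a tree-sitter parser.
  (unless (treesit-parser-p parser)
    (signal 'wrong-type-argument (list 'treesit-parser-p parser)))
  (format "%s parser for buffer %s"
          (treesit-parser-language parser)      ; a language symbol
          (buffer-name (treesit-parser-buffer parser))))
```

For example, in a buffer where the JSON grammar is available, (my-treesit-describe-parser (treesit-parser-create 'json)) describes a json parser for that buffer.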

There is no need to explicitly parse a buffer, because parsing is done automatically and lazily. A parser only parses when we query for a node in its syntax tree. Therefore, when a parser is first created, it doesn’t parse the buffer; it waits until we query for a node for the first time. Similarly, when a change is made to the buffer, the parser doesn’t re-parse immediately; it re-parses the next time we query for a node.
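
The lazy behavior can be observed by requesting the root node before and after an edit. This is a minimal sketch, assuming Emacs 29 or later with tree-sitter support and an installed JSON grammar. my-treesit-lazy-demo is a hypothetical name.

```elisp
(require 'treesit)

(defun my-treesit-lazy-demo ()
  "Return the root node text before and after an edit.
Hypothetical demo; assumes the JSON grammar is installed."
  (with-temp-buffer
    (insert "[1]")
    ;; Creating the parser does not parse the buffer yet.
    (let* ((parser (treesit-parser-create 'json))
           ;; Asking for the root node triggers the first parse.
           (before (treesit-node-text (treesit-parser-root-node parser) t)))
      ;; The edit is recorded for the parser, but nothing is re-parsed yet.
      (goto-char 3)
      (insert ", 2")
      ;; Requesting a node again makes the parser re-parse.
      (list before
            (treesit-node-text (treesit-parser-root-node parser) t)))))
```

With the grammar installed, the result should be ("[1]" "[1, 2]").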

When a parser does parse, it checks the size of the buffer. Tree-sitter can only handle buffers no larger than about 4 GB. If the size exceeds that limit, Emacs signals the error treesit-buffer-too-large, with the buffer size as the signal data.
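
Because the size check happens at parse time, code that queries nodes in arbitrary buffers might guard the query with condition-case. This is a sketch, assuming Emacs 29 or later with tree-sitter support. my-treesit-root-or-nil is a hypothetical name. The handler prints the signal data without assuming its exact shape.

```elisp
(require 'treesit)

(defun my-treesit-root-or-nil (parser)
  "Return PARSER's root node, or nil if its buffer is too large.
Hypothetical helper built around `treesit-buffer-too-large'."
  (condition-case err
      ;; Querying the root node is what triggers parsing.
      (treesit-parser-root-node parser)
    (treesit-buffer-too-large
     ;; The signal data carries the buffer size.
     (message "Buffer too large for tree-sitter: %S" (cdr err))
     nil)))
```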

Once a parser is created, Emacs automatically adds it to the buffer’s internal parser list. Every time a change is made to the buffer, Emacs updates the parsers in this list so they can update their syntax trees incrementally.

Function: treesit-parser-list &optional buffer

This function returns the parser list of buffer. If buffer is nil or omitted, it defaults to the current buffer.

Function: treesit-parser-delete parser

This function deletes parser, removing it from its buffer’s parser list.
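
The relationship between treesit-parser-list and treesit-parser-delete can be checked directly. This is a minimal sketch, assuming Emacs 29 or later with tree-sitter support and an installed JSON grammar. my-treesit-list-delete-demo is a hypothetical name.

```elisp
(require 'treesit)

(defun my-treesit-list-delete-demo ()
  "Return whether a parser is listed before and after deletion.
Hypothetical demo; assumes the JSON grammar is installed."
  (with-temp-buffer
    (let* ((parser (treesit-parser-create 'json))
           ;; A newly created parser appears in the buffer's parser list.
           (before (and (memq parser (treesit-parser-list)) t)))
      ;; Deleting the parser removes it from that list.
      (treesit-parser-delete parser)
      (list before (and (memq parser (treesit-parser-list)) t)))))
```

With the grammar installed, (my-treesit-list-delete-demo) should return (t nil).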

Normally, a parser “sees” the whole buffer, but when the buffer is narrowed (see Narrowing), the parser only sees the accessible portion. As far as the parser can tell, the hidden text has been deleted, and when the buffer is later widened, the parser thinks text has been inserted at the beginning and at the end. Although parsers respect narrowing, narrowing should not be the means of handling a multi-language buffer; instead, set the ranges in which a parser should operate. See Parsing Text in Multiple Languages.

Because a parser parses lazily, narrowing the buffer does not affect the parser immediately; as long as we don’t query for a node while the buffer is narrowed, the parser is oblivious to the narrowing.
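
The effect of narrowing shows up in the text of the root node. The demo compares the root text while the buffer is narrowed with the root text after widening. This is a minimal sketch, assuming Emacs 29 or later with tree-sitter support and an installed JSON grammar. my-treesit-narrowing-demo is a hypothetical name. Positions 7 and 13 delimit the array in the sample text.

```elisp
(require 'treesit)

(defun my-treesit-narrowing-demo ()
  "Return the root node text while narrowed and after widening.
Hypothetical demo; assumes the JSON grammar is installed."
  (with-temp-buffer
    (insert "{\"a\": [1, 2]}")
    (let ((parser (treesit-parser-create 'json))
          narrowed)
      ;; Narrow to "[1, 2]"; the parser notices only when queried.
      (narrow-to-region 7 13)
      (setq narrowed (treesit-node-text (treesit-parser-root-node parser) t))
      ;; After widening, the next query sees the whole buffer again.
      (widen)
      (list narrowed
            (treesit-node-text (treesit-parser-root-node parser) t)))))
```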

Function: treesit-parse-string string language

Besides creating a parser for a buffer, we can also parse a string directly. Unlike parsing a buffer, parsing a string is a one-time operation, and there is no way to update the result.

This function parses string with language, and returns the root node of the generated syntax tree.
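
The node returned by treesit-parse-string can be inspected with the ordinary node functions. This is a minimal sketch, assuming Emacs 29 or later with tree-sitter support and an installed JSON grammar. my-treesit-string-tree is a hypothetical name.

```elisp
(require 'treesit)

(defun my-treesit-string-tree (string)
  "Parse STRING as JSON and return its syntax tree as a string.
Hypothetical helper; assumes the JSON grammar is installed."
  ;; `treesit-parse-string' returns the root node of a one-off tree;
  ;; `treesit-node-string' renders that tree as an S-expression string.
  (treesit-node-string (treesit-parse-string string 'json)))
```

For example, (my-treesit-string-tree "[1, 2]") returns a textual rendering of the tree. Because parsing a string is a one-time operation, a changed string must be parsed again from scratch.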