API Reference

This module enhances the Python AST with token and source code information, sufficient to detect the source text of each AST node. This is helpful for tools that perform source code transformations.

ASTTokens

class asttokens.ASTTokens(source_text: Any, parse: bool = False, tree: Module | None = None, filename: str = '<unknown>', tokens: Iterable[TokenInfo] = None)

ASTTokens maintains the text of Python code in several forms: as a string, as line numbers, and as tokens, and is used to mark and access token and position information.

source_text must be a unicode or UTF8-encoded string. If you pass in UTF8 bytes, remember that all offsets you get refer to the unicode text, which is available as the .text property.

If parse is set, the source_text will be parsed with ast.parse(), and the resulting tree marked with token info and made available as the .tree property.

If tree is given, it will be marked and made available as the .tree property. In addition to the trees produced by the ast module, ASTTokens will also mark trees produced using the astroid library <https://www.astroid.org>.

If only source_text is given, you may use .mark_tokens(tree) to mark the nodes of an AST tree created separately.

property filename: str: The filename that was parsed

find_token(start_token: Token, tok_type: int, tok_str: str | None = None, reverse: bool = False) → Token: Looks for the first token, starting at start_token, that matches tok_type and, if given, the token string. Searches backwards if reverse is True. Returns ENDMARKER token if not found (you can check it with token.ISEOF(t.type)).

get_text_positions(node: AstNode, padded: bool) → Tuple[Tuple[int, int], Tuple[int, int]]

Returns two (lineno, col_offset) tuples for the start and end of the given node. If the positions can’t be determined, or the node doesn’t correspond to any particular text, returns (1, 0) for both.

padded corresponds to the padded argument to ast.get_source_segment(). This means that if padded is True, the start position will be adjusted to include leading whitespace if node is a multiline statement.

get_token(lineno: int, col_offset: int) → Token: Returns the token containing the given (lineno, col_offset) position, or the preceding token if the position is between tokens.

get_token_from_offset(offset: int) → Token: Returns the token containing the given character offset (0-based position in source text), or the preceding token if the position is between tokens.

get_token_from_utf8(lineno: int, col_offset: int) → Token: Same as get_token(), but interprets col_offset as a UTF8 offset, which is what ast uses.

get_tokens(node: AstNode, include_extra: bool = False) → Iterator[Token]: Yields all tokens making up the given node. If include_extra is True, includes non-coding tokens such as tokenize.NL and .COMMENT.

mark_tokens(root_node: Module) → None: Given the root of the AST or Astroid tree produced from source_text, visits all nodes marking them with token and position information by adding .first_token and .last_token attributes. This is done automatically in the constructor when parse or tree arguments are set, but may be used manually with a separate AST or Astroid tree.

next_token(tok: Token, include_extra: bool = False) → Token: Returns the next token after the given one. If include_extra is True, includes non-coding tokens from the tokenize module, such as NL and COMMENT.

prev_token(tok: Token, include_extra: bool = False) → Token: Returns the previous token before the given one. If include_extra is True, includes non-coding tokens from the tokenize module, such as NL and COMMENT.

property text: str: The source code passed into the constructor.

token_range(first_token: Token, last_token: Token, include_extra: bool = False) → Iterator[Token]: Yields all tokens in order from first_token through and including last_token. If include_extra is True, includes non-coding tokens such as tokenize.NL and .COMMENT.

property tokens: List[Token]: The list of tokens corresponding to the source code from the constructor.

property tree: Module | None: The root of the AST tree passed into the constructor or parsed from the source code.

ASTText

class asttokens.ASTText(source_text: Any, tree: Module | None = None, filename: str = '<unknown>')

Supports the same get_text* methods as ASTTokens, but uses the AST to determine the text positions instead of tokens. This is faster than ASTTokens as it requires less setup work.

It also (sometimes) supports nodes inside f-strings, which ASTTokens doesn’t.

Some node types and/or Python versions are not supported. In these cases the get_text* methods will fall back to using ASTTokens which incurs the usual setup cost the first time. If you want to avoid this, check supports_tokenless(node) before calling get_text* methods.

get_text_positions(node: AstNode, padded: bool) → Tuple[Tuple[int, int], Tuple[int, int]]

Returns two (lineno, col_offset) tuples for the start and end of the given node. If the positions can’t be determined, or the node doesn’t correspond to any particular text, returns (1, 0) for both.

padded corresponds to the padded argument to ast.get_source_segment(). This means that if padded is True, the start position will be adjusted to include leading whitespace if node is a multiline statement.

LineNumbers

class asttokens.LineNumbers(text: str)

Class to convert between character offsets in a text string, and pairs (line, column) of 1-based line and 0-based column numbers, as used by tokens and AST nodes.

This class expects unicode for input and stores positions in unicode. But it supports translating to and from utf8 offsets, which are used by ast parsing.

from_utf8_col(line: int, utf8_column: int) → int: Given a 1-based line number and 0-based utf8 column, returns a 0-based unicode column.

line_to_offset(line: int, column: int) → int: Converts 1-based line number and 0-based column to 0-based character offset into text.

offset_to_line(offset: int) → Tuple[int, int]: Converts 0-based character offset to pair (line, col) of 1-based line and 0-based column numbers.

util

class asttokens.util.Token(type, string, start, end, line, index, startpos, endpos)

Token is an 8-tuple containing the same 5 fields as the TokenInfo tuples produced by the tokenize module, and 3 additional ones useful for this module:

[0] .type Token type (see token.py)
[1] .string Token (a string)
[2] .start Starting (row, column) indices of the token (a 2-tuple of ints)
[3] .end Ending (row, column) indices of the token (a 2-tuple of ints)
[4] .line Original line (string)
[5] .index Index of the token in the list of tokens that it belongs to.
[6] .startpos Starting character offset into the input text.
[7] .endpos Ending character offset into the input text.

asttokens.util.replace(text: str, replacements: List[Tuple[int, int, str]]) → str

Replaces multiple slices of text with new values. This is a convenience method for making code modifications of ranges e.g. as identified by ASTTokens.get_text_range(node). Replacements is an iterable of (start, end, new_text) tuples.

For example, replace("this is a test", [(0, 4, "X"), (8, 9, "THE")]) produces "X is THE test".

asttokens.util.token_repr(tok_type: int, string: str | None) → str: Returns a human-friendly representation of a token with the given type and string.

asttokens.util.visit_tree(node, previsit, postvisit)

Scans the tree under the node depth-first using an explicit stack. This avoids recursing through the function call stack, so deeply nested trees don’t hit the ‘maximum recursion depth exceeded’ error.

It calls previsit() and postvisit() as follows:

previsit(node, par_value) - should return (par_value, value)
    par_value is as returned from previsit() of the parent.

postvisit(node, par_value, value) - should return value
    par_value is as returned from previsit() of the parent, and value is as returned from previsit() of this node itself. The return value is ignored except the one for the root node, which is returned from the overall visit_tree() call.

For the initial node, par_value is None. postvisit may be None.

asttokens.util.walk(node: AST, include_joined_str: bool = False) → Iterator[Module | AstNode]

Recursively yield all descendant nodes in the tree starting at node (including node itself), using depth-first pre-order traversal (yielding parents before their children).

This is similar to ast.walk(), but with a different order, and it works for both ast and astroid trees. Also, like iter_children(), it skips singleton nodes generated by ast.

By default, JoinedStr (f-string) nodes and their contents are skipped because they previously couldn’t be handled. Set include_joined_str to True to include them.