API Reference
This module enhances the Python AST tree with token and source code information, sufficient to detect the source text of each AST node. This is helpful for tools that make source code transformations.
ASTTokens
- class asttokens.ASTTokens(source_text: Any, parse: bool = False, tree: Module | None = None, filename: str = '<unknown>', tokens: Iterable[TokenInfo] = None)
ASTTokens maintains the text of Python code in several forms: as a string, as line numbers, and as tokens, and is used to mark and access token and position information.
source_text must be a unicode or UTF8-encoded string. If you pass in UTF8 bytes, remember that all offsets you’ll get are to the unicode text, which is available as the .text property.
If parse is set, the source_text will be parsed with ast.parse(), and the resulting tree marked with token info and made available as the .tree property.
If tree is given, it will be marked and made available as the .tree property. In addition to the trees produced by the ast module, ASTTokens will also mark trees produced using the astroid library (https://www.astroid.org).
If only source_text is given, you may use .mark_tokens(tree) to mark the nodes of an AST tree created separately.
- property filename: str
The filename that was parsed.
- find_token(start_token: Token, tok_type: int, tok_str: str | None = None, reverse: bool = False) -> Token
Looks for the first token, starting at start_token, that matches tok_type and, if given, the token string. Searches backwards if reverse is True. Returns ENDMARKER token if not found (you can check it with token.ISEOF(t.type)).
- get_text_positions(node: AstNode, padded: bool) -> Tuple[Tuple[int, int], Tuple[int, int]]
Returns two (lineno, col_offset) tuples for the start and end of the given node. If the positions can’t be determined, or the nodes don’t correspond to any particular text, returns (1, 0) for both.
padded corresponds to the padded argument to ast.get_source_segment(). This means that if padded is True, the start position will be adjusted to include leading whitespace if node is a multiline statement.
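To illustrate what padded changes, the following sketch calls the standard library's ast.get_source_segment() directly rather than get_text_positions(). The sample source and variable names are made up for illustration.

```python
import ast

# Hypothetical sample: an indented, multiline `if` statement inside a function.
source = (
    "def f(x):\n"
    "    if x:\n"
    "        return 1\n"
)

tree = ast.parse(source)
if_node = tree.body[0].body[0]  # the ast.If node

unpadded = ast.get_source_segment(source, if_node)
padded = ast.get_source_segment(source, if_node, padded=True)

# Without padding, the segment starts at the node's own column offset.
print(repr(unpadded))
# With padding, the first line keeps its leading whitespace, so the
# relative indentation of the lines in the segment is preserved.
print(repr(padded))
```

The padded form is usually the one to use when the extracted text will be re-parsed or re-indented, since the first line stays aligned with the rest.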
- get_token(lineno: int, col_offset: int) -> Token
Returns the token containing the given (lineno, col_offset) position, or the preceding token if the position is between tokens.
- get_token_from_offset(offset: int) -> Token
Returns the token containing the given character offset (0-based position in source text), or the preceding token if the position is between tokens.
- get_token_from_utf8(lineno: int, col_offset: int) -> Token
Same as get_token(), but interprets col_offset as a UTF8 offset, which is what ast uses.
- get_tokens(node: AstNode, include_extra: bool = False) -> Iterator[Token]
Yields all tokens making up the given node. If include_extra is True, includes non-coding tokens such as tokenize.NL and .COMMENT.
- mark_tokens(root_node: Module) -> None
Given the root of the AST or Astroid tree produced from source_text, visits all nodes marking them with token and position information by adding .first_token and .last_token attributes. This is done automatically in the constructor when the parse or tree arguments are set, but may be used manually with a separate AST or Astroid tree.
- next_token(tok: Token, include_extra: bool = False) -> Token
Returns the next token after the given one. If include_extra is True, includes non-coding tokens from the tokenize module, such as NL and COMMENT.
- prev_token(tok: Token, include_extra: bool = False) -> Token
Returns the previous token before the given one. If include_extra is True, includes non-coding tokens from the tokenize module, such as NL and COMMENT.
- property text: str
The source code passed into the constructor.
- token_range(first_token: Token, last_token: Token, include_extra: bool = False) -> Iterator[Token]
Yields all tokens in order from first_token through and including last_token. If include_extra is True, includes non-coding tokens such as tokenize.NL and .COMMENT.
- property tokens: List[Token]
The list of tokens corresponding to the source code from the constructor.
- property tree: Module | None
The root of the AST tree passed into the constructor or parsed from the source code.
ASTText
- class asttokens.ASTText(source_text: Any, tree: Module | None = None, filename: str = '<unknown>')
Supports the same get_text* methods as ASTTokens, but uses the AST to determine the text positions instead of tokens. This is faster than ASTTokens as it requires less setup work.
It also (sometimes) supports nodes inside f-strings, which ASTTokens doesn’t.
Some node types and/or Python versions are not supported. In these cases the get_text* methods will fall back to using ASTTokens, which incurs the usual setup cost the first time. If you want to avoid this, check supports_tokenless(node) before calling get_text* methods.
- get_text_positions(node: AstNode, padded: bool) -> Tuple[Tuple[int, int], Tuple[int, int]]
Returns two (lineno, col_offset) tuples for the start and end of the given node. If the positions can’t be determined, or the nodes don’t correspond to any particular text, returns (1, 0) for both.
padded corresponds to the padded argument to ast.get_source_segment(). This means that if padded is True, the start position will be adjusted to include leading whitespace if node is a multiline statement.
LineNumbers
- class asttokens.LineNumbers(text: str)
Class to convert between character offsets in a text string, and pairs (line, column) of 1-based line and 0-based column numbers, as used by tokens and AST nodes.
This class expects unicode for input and stores positions in unicode. But it supports translating to and from utf8 offsets, which are used by ast parsing.
- from_utf8_col(line: int, utf8_column: int) -> int
Given a 1-based line number and 0-based utf8 column, returns a 0-based unicode column.
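The difference between UTF8 and unicode columns can be sketched with the standard library alone. The helper below is hypothetical and is not how LineNumbers is implemented; it assumes the byte column falls on a character boundary.

```python
def utf8_col_to_unicode_col(line_text: str, utf8_column: int) -> int:
    """Convert a 0-based UTF-8 byte column into a 0-based character column.

    Hypothetical helper for illustration; it is not part of asttokens.
    """
    prefix = line_text.encode("utf-8")[:utf8_column]
    # Decoding the byte prefix and counting its characters gives the
    # column in unicode code points.
    return len(prefix.decode("utf-8"))


line = "s = 'héllo' + x"
# ast reports col_offset in UTF-8 bytes. The character 'é' takes two
# bytes, so the name `x` sits one byte column further right than its
# character column.
byte_col = len(line.encode("utf-8")) - 1
char_col = utf8_col_to_unicode_col(line, byte_col)

print(byte_col, char_col, line[char_col])
```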
util
- class asttokens.util.Token(type, string, start, end, line, index, startpos, endpos)
Token is an 8-tuple containing the same 5 fields as the TokenInfo tuples produced by the tokenize module, and 3 additional ones useful for this module:
[0] .type Token type (see token.py)
[1] .string Token (a string)
[2] .start Starting (row, column) indices of the token (a 2-tuple of ints)
[3] .end Ending (row, column) indices of the token (a 2-tuple of ints)
[4] .line Original line (string)
[5] .index Index of the token in the list of tokens that it belongs to.
[6] .startpos Starting character offset into the input text.
[7] .endpos Ending character offset into the input text.
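How the three extra fields relate to the five tokenize fields can be sketched with the standard tokenize module. Tok and tokens_with_offsets are hypothetical names, not the library's implementation, and the sketch assumes a source with ordinary line endings.

```python
import io
import tokenize
from collections import namedtuple

# Hypothetical stand-in for the 8-field tuple described above.
Tok = namedtuple("Tok", "type string start end line index startpos endpos")


def tokens_with_offsets(text):
    """Yield tokenize tokens extended with index, startpos and endpos."""
    # Character offset at which each line starts, plus a final entry for
    # the ENDMARKER token, which sits on the line after the last one.
    line_starts = [0]
    for line in text.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))

    readline = io.StringIO(text).readline
    for index, tok in enumerate(tokenize.generate_tokens(readline)):
        (srow, scol), (erow, ecol) = tok.start, tok.end
        yield Tok(tok.type, tok.string, tok.start, tok.end, tok.line, index,
                  line_starts[srow - 1] + scol, line_starts[erow - 1] + ecol)


text = "x = 1\ny = 22\n"
toks = list(tokens_with_offsets(text))

for t in toks:
    # Slicing the source with the offsets gives back the token string.
    print(t.index, repr(t.string), t.startpos, t.endpos)
```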
- asttokens.util.replace(text: str, replacements: List[Tuple[int, int, str]]) -> str
Replaces multiple slices of text with new values. This is a convenience method for making code modifications of ranges e.g. as identified by ASTTokens.get_text_range(node). Replacements is an iterable of (start, end, new_text) tuples.
For example, replace("this is a test", [(0, 4, "X"), (8, 9, "THE")]) produces "X is THE test".
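One way such a multi-range replacement could work is sketched below. apply_replacements is a hypothetical helper written for illustration, not the library's code, and it assumes the ranges do not overlap.

```python
def apply_replacements(text, replacements):
    """Replace several (start, end, new_text) slices of text in one pass.

    Hypothetical helper for illustration; assumes non-overlapping ranges.
    """
    pieces = []
    pos = 0
    # Process ranges left to right so that every offset still refers to
    # the original, unmodified text.
    for start, end, new_text in sorted(replacements):
        pieces.append(text[pos:start])  # unchanged text before the range
        pieces.append(new_text)         # replacement for text[start:end]
        pos = end
    pieces.append(text[pos:])           # unchanged tail
    return "".join(pieces)


# The ranges may be given in any order.
result = apply_replacements("this is a test", [(8, 9, "THE"), (0, 4, "X")])
print(result)
```

Because all offsets are interpreted against the original text, callers can collect ranges from several nodes first and apply them in a single call.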
- asttokens.util.token_repr(tok_type: int, string: str | None) -> str
Returns a human-friendly representation of a token with the given type and string.
- asttokens.util.visit_tree(node: Module, previsit: Callable[[AstNode, Token | None], Tuple[Token | None, Token | None]], postvisit: Callable[[AstNode, Token | None, Token | None], None] | None) -> None
Scans the tree under the node depth-first using an explicit stack. It avoids implicit recursion via the function call stack to avoid hitting a ‘maximum recursion depth exceeded’ error.
It calls previsit() and postvisit() as follows:
previsit(node, par_value) - should return (par_value, value)
par_value is as returned from previsit() of the parent.
postvisit(node, par_value, value) - should return value
par_value is as returned from previsit() of the parent, and value is as returned from previsit() of this node itself. The return value is ignored except the one for the root node, which is returned from the overall visit_tree() call.
For the initial node, par_value is None. postvisit may be None.
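The previsit/postvisit contract described above can be sketched with an explicit stack as follows. visit_tree_sketch is a hypothetical reimplementation for plain ast trees that uses ast.iter_child_nodes(); it is not the library's code and does not skip singleton nodes.

```python
import ast

_PREVISIT = object()  # sentinel meaning "previsit not yet called"


def visit_tree_sketch(root, previsit, postvisit=None):
    """Depth-first traversal with an explicit stack (illustrative only)."""
    result = None
    stack = [(root, None, _PREVISIT)]
    while stack:
        node, par_value, value = stack.pop()
        if value is _PREVISIT:
            child_par_value, own_value = previsit(node, par_value)
            # Revisit this node for postvisit once its children are done.
            stack.append((node, par_value, own_value))
            # Push children reversed so the first child is popped first.
            for child in reversed(list(ast.iter_child_nodes(node))):
                stack.append((child, child_par_value, _PREVISIT))
        elif postvisit is not None:
            result = postvisit(node, par_value, value)
    return result


def previsit(node, depth):
    # The root receives par_value None; children receive the parent's depth.
    depth = 0 if depth is None else depth + 1
    return depth, depth


visited = []


def postvisit(node, par_value, value):
    visited.append((type(node).__name__, value))
    return len(visited)


# The value returned for the root node becomes the overall result.
total = visit_tree_sketch(ast.parse("x = 1"), previsit, postvisit)
print(total, visited)
```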
- asttokens.util.walk(node: AST) -> Iterator[Module | AstNode]
Recursively yield all descendant nodes in the tree starting at node (including node itself), using depth-first pre-order traversal (yielding parents before their children).
This is similar to ast.walk(), but with a different order, and it works for both ast and astroid trees. Also, like iter_children(), it skips singleton nodes generated by ast.
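The difference in order from ast.walk() can be seen with a small standard-library sketch. preorder_walk is a hypothetical stand-in for the traversal described above; unlike the library function it handles only ast trees and does not skip singleton nodes.

```python
import ast


def preorder_walk(node):
    """Yield node and its descendants, parents before children (illustrative)."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        # Reverse so that the first child is yielded next.
        stack.extend(reversed(list(ast.iter_child_nodes(current))))


def names(nodes):
    """Collect identifiers of Name nodes in the order they are yielded."""
    return [n.id for n in nodes if isinstance(n, ast.Name)]


tree = ast.parse("f(g(a), b)\n")

# Pre-order follows the source order of the nested call.
depth_first = names(preorder_walk(tree))
# ast.walk() is breadth-first, so `b` is reached before `g` and `a`.
breadth_first = names(ast.walk(tree))

print(depth_first, breadth_first)
```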