This module enhances the Python AST tree with token and source code information, sufficient to detect the source text of each AST node. This is helpful for tools that make source code transformations.
ASTTokens(source_text, parse=False, tree=None, filename='<unknown>')
ASTTokens maintains the text of Python code in several forms: as a string, as line numbers, and as tokens, and is used to mark and access token and position information.
source_text must be a unicode or UTF8-encoded string. If you pass in UTF8 bytes, remember that all offsets you’ll get are to the unicode text, which is available as the .text property.
If parse is set, the source_text will be parsed with ast.parse(), and the resulting tree marked with token info and made available as the .tree property.
If tree is given, it will be marked and made available as the .tree property. In addition to the trees produced by the ast module, ASTTokens will also mark trees produced using the astroid library.
If only source_text is given, you may use .mark_tokens(tree) to mark the nodes of an AST tree created separately.
filename
The filename that was parsed.
find_token(start_token, tok_type, tok_str=None, reverse=False)
Looks for the first token, starting at start_token, that matches tok_type and, if given, the token string. Searches backwards if reverse is True. Returns the ENDMARKER token if not found (you can check for it with token.ISEOF(t.type)).
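A minimal sketch of the search that find_token() describes, using only the standard library. It assumes a plain list of tokenize.TokenInfo objects plus a start index instead of asttokens’ own Token objects; the function name and parameters here are illustrative, not the library’s code.

```python
import io
import token
import tokenize


def find_token(tokens, start_index, tok_type, tok_str=None, reverse=False):
    """Return the first token at or after tokens[start_index] (or before it,
    if reverse is True) with type tok_type and, if given, string tok_str.
    Falls back to the final ENDMARKER token when nothing matches."""
    step = -1 if reverse else 1
    i = start_index
    while 0 <= i < len(tokens):
        t = tokens[i]
        if t.type == tok_type and (tok_str is None or t.string == tok_str):
            return t
        i += step
    return tokens[-1]  # generate_tokens() always ends the stream with ENDMARKER


source = "x = foo(1, 2)\n"
toks = list(tokenize.generate_tokens(io.StringIO(source).readline))
paren = find_token(toks, 0, token.OP, "(")
print(paren.string, paren.start)             # ( (1, 7)
name = find_token(toks, len(toks) - 1, token.NAME, reverse=True)
print(name.string)                           # foo
missing = find_token(toks, 0, token.OP, "[")
print(token.ISEOF(missing.type))             # True
```

Returning ENDMARKER rather than None keeps the result a token in every case, so callers test it with token.ISEOF() as the text suggests.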
get_text(node)
After mark_tokens() has been called, returns the text corresponding to the given node. Returns ‘’ for nodes (like Load) that don’t correspond to any particular text.
get_text_range(node)
After mark_tokens() has been called, returns the (startpos, endpos) positions in source text corresponding to the given node. Returns (0, 0) for nodes (like Load) that don’t correspond to any particular text.
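For comparison, the sketch below derives a (startpos, endpos) pair like the one get_text_range() returns, but from the standard library alone. It assumes Python 3.8+, where ast nodes carry end_lineno and end_col_offset, and the helper text_range() is hypothetical. Because ast columns count UTF-8 bytes, each column is first converted to a unicode column and then added to the offset at which its line starts.

```python
import ast


def text_range(source, node):
    """Hypothetical helper: (startpos, endpos) of `node` in `source`, computed
    from ast's own position fields (Python 3.8+) instead of marked tokens."""
    if getattr(node, "end_lineno", None) is None:
        return (0, 0)  # nodes such as ast.Load carry no position
    # Assumes ordinary "\n" line endings.
    lines = source.splitlines(keepends=True)

    def to_offset(lineno, utf8_col):
        line = lines[lineno - 1]
        # ast columns count UTF-8 bytes; convert to a count of characters.
        col = len(line.encode("utf8")[:utf8_col].decode("utf8"))
        return sum(len(l) for l in lines[:lineno - 1]) + col

    return (to_offset(node.lineno, node.col_offset),
            to_offset(node.end_lineno, node.end_col_offset))


source = "s = 'café'; n = len(s)\n"
call = ast.parse(source).body[1].value      # the len(s) call
start, end = text_range(source, call)
print(start, end, source[start:end])        # 16 22 len(s)
print(text_range(source, ast.Load()))       # (0, 0)
```

Here ast reports the call at byte columns 17 to 23; slicing the string with those raw numbers would be off by one because "é" occupies two bytes.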
get_token(lineno, col_offset)
Returns the token containing the given (lineno, col_offset) position, or the preceding token if the position is between tokens.
get_token_from_offset(offset)
Returns the token containing the given character offset (0-based position in source text), or the preceding token if the position is between tokens.
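Finding "the token containing the offset, or the preceding token" reduces to a binary search over token start offsets. The sketch below assumes stdlib tokenize tokens and the hypothetical helpers token_starts() and token_at_offset(); it illustrates the idea rather than reproducing asttokens’ implementation.

```python
import bisect
import io
import tokenize


def token_starts(source, tokens):
    """Hypothetical helper: the character offset where each token starts."""
    line_starts = [0]
    for line in io.StringIO(source).readlines():
        line_starts.append(line_starts[-1] + len(line))
    return [line_starts[t.start[0] - 1] + t.start[1] for t in tokens]


def token_at_offset(tokens, starts, offset):
    # The last token starting at or before `offset` either contains it or,
    # if `offset` falls in whitespace, is the token just before the gap.
    return tokens[bisect.bisect_right(starts, offset) - 1]


source = "total = a  +  b\n"
toks = list(tokenize.generate_tokens(io.StringIO(source).readline))
starts = token_starts(source, toks)
print(token_at_offset(toks, starts, 8).string)    # a
print(token_at_offset(toks, starts, 10).string)   # a (offset 10 is a space)
print(token_at_offset(toks, starts, 11).string)   # +
```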
get_token_from_utf8(lineno, col_offset)
Same as get_token(), but interprets col_offset as a UTF8 offset, which is what ast uses.
get_tokens(node, include_extra=False)
Yields all tokens making up the given node. If include_extra is True, includes non-coding tokens such as tokenize.NL and .COMMENT.
mark_tokens(root_node)
Given the root of the AST or Astroid tree produced from source_text, visits all nodes, marking them with token and position information by adding .first_token and .last_token attributes. This is done automatically in the constructor when the parse or tree arguments are set, but may be used manually with a separate AST or Astroid tree.
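To make the .first_token/.last_token idea concrete, here is a deliberately simplified marker. It is not asttokens’ algorithm: it trusts the end positions that ast provides on Python 3.8+, assumes ASCII source so that ast’s UTF-8 columns equal tokenize’s character columns, and ignores non-coding tokens. The name mark_tokens_sketch() is hypothetical.

```python
import ast
import io
import token
import tokenize

# Token types that carry no code of their own (a choice made for this sketch).
NON_CODING = {token.NL, token.COMMENT, token.NEWLINE,
              token.INDENT, token.DEDENT, token.ENDMARKER}


def mark_tokens_sketch(source, tree):
    """Attach .first_token/.last_token to every positioned node in `tree`.
    Assumes ASCII source and Python 3.8+ end positions."""
    tokens = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
              if t.type not in NON_CODING]
    for node in ast.walk(tree):
        if getattr(node, "end_lineno", None) is None:
            continue  # e.g. Module, Load: no text of their own
        start = (node.lineno, node.col_offset)
        end = (node.end_lineno, node.end_col_offset)
        # Tokens lying entirely inside the node's span, in source order.
        inside = [t for t in tokens if t.start >= start and t.end <= end]
        if inside:
            node.first_token, node.last_token = inside[0], inside[-1]


source = "def f(a, b):\n    return a * (b + 1)\n"
tree = ast.parse(source)
mark_tokens_sketch(source, tree)
ret = tree.body[0].body[0]
print(ret.first_token.string, ret.last_token.string)   # return )
```

The linear scan per node keeps the sketch short; a real implementation would look tokens up by position instead.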
next_token(tok, include_extra=False)
Returns the next token after the given one. If include_extra is True, includes non-coding tokens from the tokenize module, such as NL and COMMENT.
prev_token(tok, include_extra=False)
Returns the previous token before the given one. If include_extra is True, includes non-coding tokens from the tokenize module, such as NL and COMMENT.
text
The source code passed into the constructor.
token_range(first_token, last_token, include_extra=False)
Yields all tokens in order from first_token through and including last_token. If include_extra is True, includes non-coding tokens such as tokenize.NL and .COMMENT.
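The include_extra flag shared by next_token(), prev_token(), get_tokens() and token_range() amounts to a filter on token type. The sketch below uses index-based helpers over stdlib tokens (next_index() and token_range() here are assumptions, not the library’s signatures) and treats NL and COMMENT as the non-coding types.

```python
import io
import token
import tokenize

# Non-coding token types, following the NL/COMMENT examples in the text.
NON_CODING = {token.NL, token.COMMENT}


def next_index(tokens, i, include_extra=False):
    """Hypothetical helper: index of the token after tokens[i]."""
    i += 1
    while not include_extra and tokens[i].type in NON_CODING:
        i += 1
    return i


def token_range(tokens, first, last, include_extra=False):
    """Yield tokens[first] through tokens[last], inclusive."""
    for tok in tokens[first:last + 1]:
        if include_extra or tok.type not in NON_CODING:
            yield tok


source = "items = [  # opening bracket\n    1,\n]\n"
toks = list(tokenize.generate_tokens(io.StringIO(source).readline))
lbrack = next(i for i, t in enumerate(toks) if t.string == "[")
rbrack = next(i for i, t in enumerate(toks) if t.string == "]")
print(toks[next_index(toks, lbrack)].string)          # 1
extra = toks[next_index(toks, lbrack, include_extra=True)]
print(tokenize.tok_name[extra.type])                  # COMMENT
print([t.string for t in token_range(toks, lbrack, rbrack)])
# ['[', '1', ',', ']']
```

Inside brackets, line breaks are tokenized as NL rather than NEWLINE, which is why they disappear from the default range.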
tokens
The list of tokens corresponding to the source code from the constructor.
tree
The root of the AST tree passed into the constructor or parsed from the source code.
LineNumbers(text)
Class to convert between character offsets in a text string, and pairs (line, column) of 1-based line and 0-based column numbers, as used by tokens and AST nodes.
This class expects unicode for input and stores positions in unicode. But it supports translating to and from utf8 offsets, which are used by ast parsing.
from_utf8_col(line, utf8_column)
Given a 1-based line number and 0-based utf8 column, returns a 0-based unicode column.
line_to_offset(line, column)
Converts 1-based line number and 0-based column to 0-based character offset into text.
offset_to_line(offset)
Converts 0-based character offset to pair (line, col) of 1-based line and 0-based column numbers.
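The three conversions can be sketched with a table of line-start offsets and bisect. LineNumbersSketch is a hypothetical stand-in, not the library class; in particular it skips the bounds checking a real implementation would need.

```python
import ast
import bisect


class LineNumbersSketch:
    """Hypothetical stand-in for LineNumbers (1-based lines, 0-based columns).
    No bounds checking: arguments are assumed to be valid positions."""

    def __init__(self, text):
        self.text = text
        # line_offsets[i] is the character offset at which line i + 1 starts.
        self.line_offsets = [0]
        for i, ch in enumerate(text):
            if ch == "\n":
                self.line_offsets.append(i + 1)

    def line_to_offset(self, line, column):
        return self.line_offsets[line - 1] + column

    def offset_to_line(self, offset):
        index = bisect.bisect_right(self.line_offsets, offset) - 1
        return (index + 1, offset - self.line_offsets[index])

    def from_utf8_col(self, line, utf8_column):
        start = self.line_offsets[line - 1]
        line_text = self.text[start:].split("\n", 1)[0]
        # Keep the first utf8_column bytes, then count the characters they decode to.
        return len(line_text.encode("utf8")[:utf8_column].decode("utf8"))


text = "greeting = 'héllo'\nsize = len(greeting)\n"
ln = LineNumbersSketch(text)
print(ln.offset_to_line(23), ln.line_to_offset(2, 4))   # (2, 4) 23

string_node = ast.parse(text).body[0].value             # the 'héllo' literal
print(string_node.end_col_offset)                       # 19 (UTF-8 bytes)
print(ln.from_utf8_col(1, string_node.end_col_offset))  # 18 (characters)
```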
Token
TokenInfo is an 8-tuple containing the same 5 fields as the tokens produced by the tokenize module, and 3 additional ones useful for this module:
-  .type Token type (see token.py)
-  .string Token (a string)
-  .start Starting (row, column) indices of the token (a 2-tuple of ints)
-  .end Ending (row, column) indices of the token (a 2-tuple of ints)
-  .line Original line (string)
-  .index Index of the token in the list of tokens that it belongs to.
-  .startpos Starting character offset into the input text.
-  .endpos Ending character offset into the input text.
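The sketch below builds such 8-field tuples from stdlib tokenize output. The namedtuple mirrors the field list above, and make_tokens() is a hypothetical helper. It computes startpos and endpos from the same lines that tokenize reads, so source[startpos:endpos] recovers each token’s string.

```python
import collections
import io
import tokenize

# Field names follow the list above; a sketch, not the library's definition.
Token = collections.namedtuple(
    "Token", "type string start end line index startpos endpos")


def make_tokens(source):
    """Hypothetical helper: tokenize `source` and extend each token to 8 fields."""
    # Offsets of each line start, using the same line splitting as tokenize.
    line_starts = [0]
    for line in io.StringIO(source).readlines():
        line_starts.append(line_starts[-1] + len(line))
    tokens = []
    readline = io.StringIO(source).readline
    for index, tok in enumerate(tokenize.generate_tokens(readline)):
        startpos = line_starts[tok.start[0] - 1] + tok.start[1]
        endpos = line_starts[tok.end[0] - 1] + tok.end[1]
        tokens.append(Token(*tok[:5], index, startpos, endpos))
    return tokens


source = "print('ok')\n"
tokens = make_tokens(source)
string_tok = tokens[2]
print(string_tok.string, string_tok.index, string_tok.startpos, string_tok.endpos)
# 'ok' 2 6 10
```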
token_repr(tok_type, string)
Returns a human-friendly representation of a token with the given type and string.
visit_tree(node, previsit, postvisit)
Scans the tree under the node depth-first using an explicit stack. It avoids implicit recursion via the function call stack to avoid hitting the ‘maximum recursion depth exceeded’ error.
It calls previsit() and postvisit() as follows:
-  previsit(node, par_value) should return (par_value, value). The incoming par_value is as returned from previsit() of the parent.
-  postvisit(node, par_value, value) should return value. par_value is as returned from previsit() of the parent, and value is as returned from previsit() of this node itself. The return value is ignored except the one for the root node, which is returned from the overall visit_tree() call.
For the initial node, par_value is None. Either previsit or postvisit may be None.
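The previsit/postvisit contract can be sketched with an explicit stack in which each node is pushed twice. The first push previsits the node and expands its children. The second, which sits below those children on the stack, postvisits the node after they are done. This sketch assumes children come from ast.iter_child_nodes(); it is not asttokens’ own code.

```python
import ast

_PREVISIT = object()  # marks a stack entry that has not been previsited yet


def visit_tree(node, previsit, postvisit):
    """Explicit-stack sketch of the previsit/postvisit contract above."""
    if previsit is None:
        previsit = lambda n, par_value: (None, None)
    if postvisit is None:
        postvisit = lambda n, par_value, value: None
    ret = None
    stack = [(node, None, _PREVISIT)]
    while stack:
        current, par_value, value = stack.pop()
        if value is _PREVISIT:
            child_par_value, value = previsit(current, par_value)
            # Re-push this node so it is postvisited after all its children.
            stack.append((current, par_value, value))
            # Push children in reverse so the first child is handled first.
            for child in reversed(list(ast.iter_child_nodes(current))):
                stack.append((child, child_par_value, _PREVISIT))
        else:
            ret = postvisit(current, par_value, value)
    return ret


lines = []


def previsit(node, depth):
    depth = 0 if depth is None else depth   # par_value is None at the root
    lines.append("  " * depth + type(node).__name__)
    return depth + 1, depth                 # (par_value for children, value)


def postvisit(node, depth, value):
    return value                            # only the root's result is kept


root_value = visit_tree(ast.parse("x = y + 1"), previsit, postvisit)
print("\n".join(lines))
print(root_value)                           # 0
```

Passing depth down as par_value shows why previsit returns a pair: the first element flows to the children, the second to this node’s own postvisit.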
walk(node)
Recursively yield all descendant nodes in the tree starting at node (including node itself), using depth-first pre-order traversal (yielding parents before their children).
This is similar to ast.walk(), but with a different order, and it works for both ast and astroid trees. Also, like iter_children(), it skips singleton nodes generated by ast.
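The difference in order shows up when a stack-based pre-order walk is compared with ast.walk(). In CPython’s current implementation, ast.walk() visits nodes breadth-first, although its documented order is unspecified. The walk_preorder() helper below is hypothetical; its singleton filter, covering expression contexts and operators, is modeled on the description above.

```python
import ast

# Shared singleton node types (contexts and operators), skipped as described above.
SINGLETON_TYPES = (ast.expr_context, ast.boolop, ast.operator,
                   ast.unaryop, ast.cmpop)


def walk_preorder(node):
    """Yield node and its descendants, parents before children (sketch)."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        children = [c for c in ast.iter_child_nodes(current)
                    if not isinstance(c, SINGLETON_TYPES)]
        # Reverse so the first child is popped (and yielded) first.
        stack.extend(reversed(children))


def label(n):
    return getattr(n, "id", type(n).__name__)


tree = ast.parse("a = f(b)\nc = 2\n")
print([label(n) for n in walk_preorder(tree)])
# ['Module', 'Assign', 'a', 'Call', 'f', 'b', 'Assign', 'c', 'Constant']
print([label(n) for n in ast.walk(tree)][:4])
# ['Module', 'Assign', 'Assign', 'a']
```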
replace(text, replacements)
Replaces multiple slices of text with new values. This is a convenience method for making code modifications of ranges, e.g. as identified by ASTTokens.get_text_range(node). Replacements is an iterable of (start, end, new_text) tuples.
For example, replace("this is a test", [(0, 4, "X"), (8, 9, "THE")]) produces "X is THE test".
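Below is a sketch of the single-pass splice that replace() describes, followed by a usage that feeds it ranges. For the usage, ast’s own col_offset/end_col_offset on a one-line ASCII snippet stand in for ASTTokens.get_text_range(). The sketch assumes, and does not check, that the edits do not overlap.

```python
import ast


def replace(text, replacements):
    """Apply non-overlapping (start, end, new_text) edits in one pass.
    All offsets refer to the original text, so edits are applied in order."""
    parts = []
    pos = 0
    for start, end, new_text in sorted(replacements):
        parts.append(text[pos:start])   # unchanged text before this edit
        parts.append(new_text)
        pos = end
    parts.append(text[pos:])            # unchanged tail
    return "".join(parts)


print(replace("this is a test", [(0, 4, "X"), (8, 9, "THE")]))  # X is THE test

# Rename every `n` in a one-line, ASCII-only snippet.
source = "n = n + 1"
edits = [(node.col_offset, node.end_col_offset, "count")
         for node in ast.walk(ast.parse(source))
         if isinstance(node, ast.Name) and node.id == "n"]
print(replace(source, edits))  # count = count + 1
```

Because every range refers to the original text, the caller never has to shift later offsets to account for earlier replacements of a different length.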