Source code modification can be useful in a number of testing and analysis scenarios. Here, we’ll look at how you can modify Python source code using the ast module, and some tools where this technique is used.
The CPython compilation process
To begin, let’s take a look at the CPython compilation process, as described in PEP 339.
Detailed knowledge of these steps isn’t required for reading this article, but it helps to have a rough idea of the whole process.
First, a parse tree is generated from the source code. Next, an Abstract Syntax Tree (AST) is built from the parse tree. From the AST, a control flow graph is generated, and finally, the code object is compiled from the control flow graph.
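The two ends of this pipeline that are visible from Python itself can be sketched as follows. This is a minimal illustration assuming Python 3; the source string and the "<example>" filename are placeholders of our own, and the parse tree and control flow graph stay internal to CPython.

```python
import ast
import dis

source = "x = 1 + 2"

# Source text -> AST (the intermediate parse tree is not exposed here).
tree = ast.parse(source, filename="<example>", mode="exec")

# AST -> code object (the control flow graph is built internally).
code = compile(tree, filename="<example>", mode="exec")

# Show the bytecode held by the code object, then run it.
dis.dis(code)
namespace = {}
exec(code, namespace)
print(namespace["x"])
```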
Of these stages, the AST is what we’ll be focusing on today. The ast module appeared in its current form in Python 2.6, and exposes a simple method of visiting and modifying the AST.
From there, we can generate a code object from our modified AST. We can also re-generate source code from our modified AST for explanatory purposes.
Creating an AST
Let’s define a simple expression, an add function, and inspect the generated AST.
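As a minimal sketch, assuming Python 3 and an add function that takes two arguments (the argument names arg1 and arg2 and the variable names are our own choice), the parsing step might look like this:

```python
import ast

# Assumed source: a two-argument add function (names are illustrative).
expr = """
def add(arg1, arg2):
    return arg1 + arg2
"""

# Parse the source text into an ast.Module object.
expr_ast = ast.parse(expr)
print(type(expr_ast))
```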
Now that we have generated an ast.Module object, let’s dump the contents:
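One way to dump the tree, sketched here under the assumption of Python 3 and the same illustrative add function, is ast.dump. The exact text it produces differs between Python versions, so we also walk down the nodes by hand:

```python
import ast

expr_ast = ast.parse("def add(arg1, arg2):\n    return arg1 + arg2\n")

# ast.dump returns a string representation of the whole tree.
dumped = ast.dump(expr_ast)
print(dumped)

# Walk down by hand: Module -> FunctionDef -> Return -> BinOp.
func = expr_ast.body[0]
ret = func.body[0]
print(func.name, [a.arg for a in func.args.args])
print(type(ret).__name__, type(ret.value).__name__, type(ret.value.op).__name__)
```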
As we can see, Module is the parent node. The Module body contains a list with a single element: our function definition. The function definition has a name, a list of arguments, and a body. The body contains a single Return node, which contains the binary operation (BinOp) being returned.
Modifying the AST
How can we modify this tree and change how the code works? To illustrate, let’s do something crazy that you would never want to do in your code. We’ll traverse the tree, and replace the Add operation with a Mult operation. See, I told you it was crazy!
We’ll start by subclassing the NodeTransformer class and defining the visit_BinOp method, which is called when the transformer visits a binary operator node:
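A transformer along these lines could be sketched as follows, assuming Python 3. The class name CrazyTransformer is hypothetical, and the isinstance check is our own addition so that only additions are touched:

```python
import ast


class CrazyTransformer(ast.NodeTransformer):
    """Replace the Add operation of a binary operator node with Mult."""

    def visit_BinOp(self, node):
        # Show the node's attributes before and after the change.
        print(node.__dict__)
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        print(node.__dict__)
        # Child nodes are not visited here, so nested operations
        # inside this node are left untouched.
        return node
```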
Now that we’ve defined our transformer which performs this unhealthy action, let’s see it run on the expression defined above:
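Running such a transformer might look like the sketch below. It assumes Python 3, repeats a hypothetical CrazyTransformer so the example is self-contained, and uses the illustrative add function from earlier:

```python
import ast


class CrazyTransformer(ast.NodeTransformer):
    def visit_BinOp(self, node):
        print(node.__dict__)
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        print(node.__dict__)
        return node


expr_ast = ast.parse("def add(arg1, arg2):\n    return arg1 + arg2\n")

# visit() walks the tree, calls visit_BinOp where it applies,
# and returns the (modified) root node.
transformer = CrazyTransformer()
modified_ast = transformer.visit(expr_ast)
```

Note that the tree is modified in place: modified_ast and expr_ast refer to the same Module object.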
You can see the Add node is replaced with a Mult, as shown by the modified __dict__. There are a couple of things we haven’t dealt with here, like visiting child nodes, but this is enough to illustrate the principle.
Compiling and executing the modified AST
After adding a call that prints the result of our function to the end of our script, let’s see how the code evaluates:
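The following sketch shows one way to compile and execute both versions, assuming Python 3. The arguments 4 and 5 are an assumption chosen to be consistent with the results discussed here, and the transformer is the same hypothetical CrazyTransformer, minus the debugging prints:

```python
import ast
import copy


class CrazyTransformer(ast.NodeTransformer):
    def visit_BinOp(self, node):
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


source = """
def add(arg1, arg2):
    return arg1 + arg2

print(add(4, 5))
"""

unmodified = ast.parse(source)
# Transform a copy so that the original tree stays intact.
modified = CrazyTransformer().visit(copy.deepcopy(unmodified))
# Fills in line numbers on any new nodes that need them; harmless here.
ast.fix_missing_locations(modified)

# Compile each tree to a code object and run it in its own namespace.
ns_plain, ns_crazy = {}, {}
exec(compile(unmodified, "<string>", "exec"), ns_plain)  # prints 9
exec(compile(modified, "<string>", "exec"), ns_crazy)    # prints 20
```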
As we can see, our unmodified and modified ASTs compile to code that prints 9 and 20 respectively.
Translating back to source code
Finally, we can use
unparse found here to view the source code corresponding to our modified AST:
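On Python 3.9 and later, the standard library’s ast.unparse function can play this role. The sketch below uses it in place of the standalone unparse script, which is a substitution on our part, together with the same hypothetical CrazyTransformer:

```python
import ast


class CrazyTransformer(ast.NodeTransformer):
    def visit_BinOp(self, node):
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


tree = ast.parse("def add(arg1, arg2):\n    return arg1 + arg2\n")
modified = CrazyTransformer().visit(tree)

# Turn the modified AST back into source text (requires Python 3.9+).
regenerated = ast.unparse(modified)
print(regenerated)
```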
As we can see, the * operator has replaced +. Unparse is useful for understanding how your AST transformer modifies code.
Clearly, our above example serves little practical purpose. However, static analysis and modification of source code can be extremely useful.
You could, for instance, inject code for testing purposes. See this PyCon talk for an example of how a node transformer can be used to inject instrumentation code.
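To give a rough idea of what such injection can look like, here is a small sketch of our own (not taken from the talk), assuming Python 3. The CallTracer class and the _calls list are hypothetical names; the transformer prepends a statement to every function body that records the call:

```python
import ast


class CallTracer(ast.NodeTransformer):
    """Prepend a recording statement to every function body."""

    def visit_FunctionDef(self, node):
        # Handle nested function definitions first.
        self.generic_visit(node)
        # Build the statement `_calls.append('<function name>')`.
        trace = ast.parse(f"_calls.append({node.name!r})").body[0]
        node.body.insert(0, trace)
        return node


source = "def add(arg1, arg2):\n    return arg1 + arg2\n"
tree = CallTracer().visit(ast.parse(source))
ast.fix_missing_locations(tree)

# The injected code looks up _calls in the namespace it runs in.
namespace = {"_calls": []}
exec(compile(tree, "<instrumented>", "exec"), namespace)
result = namespace["add"](4, 5)
print(result, namespace["_calls"])
```

A real tool would need more care, for example keeping a docstring as the first statement of the body.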
In addition, the pythoscope project uses an AST visitor to process source files and generate tests from method signatures.
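The general idea of collecting signatures with a visitor can be sketched as below. This is an illustration of the technique under our own assumptions (Python 3, a hypothetical SignatureCollector class and sample source), not pythoscope’s actual implementation:

```python
import ast


class SignatureCollector(ast.NodeVisitor):
    """Record the name and positional arguments of every function."""

    def __init__(self):
        self.signatures = []

    def visit_FunctionDef(self, node):
        args = [a.arg for a in node.args.args]
        self.signatures.append((node.name, args))
        # Keep walking so nested definitions are found too.
        self.generic_visit(node)


source = """
class Calculator:
    def add(self, arg1, arg2):
        return arg1 + arg2

def helper(value):
    return value
"""

collector = SignatureCollector()
collector.visit(ast.parse(source))

# Emit a test stub name for each function that was found.
for name, args in collector.signatures:
    print(f"test_{name}: arguments {', '.join(args)}")
```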
Projects such as pylint also use an AST walking method to analyze source code. In the case of pylint, Logilab have created a module which aims:
to provide a common base representation of python source code for projects such as pychecker, pyreverse, pylint…
See here for more information.