Source code modification can be useful in a number of testing and analysis scenarios. Here, we’ll look at how you can modify Python source code using the ast
module, and some tools where this technique is used.
The CPython compilation process
To begin, let’s take a look at the CPython compilation process, as described in PEP 339.
Detailed knowledge of these steps isn’t required for reading this article, but it helps to have a rough idea of the whole process.
First, a parse tree is generated from the source code. Next, an Abstract Syntax Tree (AST) is built from the parse tree. From the AST, a control flow graph is generated, and finally, the code object is compiled from the control flow graph.
Marked in blue is the AST stage, since that’s what we’ll be focusing on today. The ast
module appeared in its current form in Python 2.6, and exposes a simple method of visiting and modifying the AST.
From there, we can generate a code object from our modified AST. We can also re-generate source code from our modified AST for explanatory purposes.
Creating an AST
Let’s define a simple expression, an add
function, and inspect the generated AST.
1 2 3 4 5 |
|
Now that we have generated an ast.Module
object, let’s dump the contents:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
|
As we can see, Module
is the parent node. The Module
body contains a list with a single element: our function definition. The function definition has a name, list of arguments, and a body. The body contains a single Return
node, which contains the Add
operation.
Modifying the AST
How can we modify this tree and change how the code works? To illustrate, let’s do something crazy that you would never want to do in your code. We’ll traverse the tree, and replace the Add
operation with a Mult
operation. See, I told you it was crazy!
We’ll start by subclassing the NodeTransformer
class, and defining the visit_BinOp
method which is called when the transformer visits a binary operator node:
1 2 3 4 5 6 7 |
|
Now that we’ve defined our transformer which performs this unhealthy action, let’s see it run on the expression defined above:
1 2 3 |
|
You can see the Add
node is replaced with a Mult
, as shown by the modified __dict__
. There are a couple of things we haven’t dealt with here, like visiting child nodes, but this is enough to illustrate the principle.
Compiling and executing the modified AST
After adding a call to our operation, i.e.:
1
|
|
…to the end of our script, let’s see how the code evaluates:
1 2 3 4 5 6 7 |
|
As we can see, our unmodified and modified ASTs compile to code that prints 9 and 20 respectively.
Translating back to source code
Finally, we can use unparse
found here to view the source code corresponding to our modified AST:
1 2 3 4 5 |
|
As we can see, the *
operator has replaced +
. Unparse is useful for understanding how your AST transformer modifies code.
Practical applications
Clearly, our above example serves little practical purpose. However, static analysis and modification of source code can be extremely useful.
You could, for instance, inject code for testing purposes. See this Pycon talk for an understanding of how a node transformer can be used to inject instrumentation code for testing purposes.
In addition, the pythoscope project uses an AST visitor to process source files and generate tests from method signatures.
Projects such as pylint also use an AST walking method to analyze source code. In the case of pylint, Logilab have created a module which aims:
to provide a common base representation of python source code for projects such as pychecker, pyreverse, pylint…
See here for more information.
References
This Pycon talk by Matthew J Desmarais and this blog post by Eli Bendersky were invaluable in writing this post.