Blueprint Forge.

Building things with computers.

Static Modification of Python With Python: The AST Module

Source code modification can be useful in a number of testing and analysis scenarios. Here, we’ll look at how you can modify Python source code using the ast module, and some tools where this technique is used.

The CPython compilation process

To begin, let’s take a look at the CPython compilation process, as described in PEP 339.

Detailed knowledge of these steps isn’t required for reading this article, but it helps to have a rough idea of the whole process.

First, a parse tree is generated from the source code. Next, an Abstract Syntax Tree (AST) is built from the parse tree. From the AST, a control flow graph is generated, and finally, the code object is compiled from the control flow graph.

Marked in blue is the AST stage, since that’s what we’ll be focusing on today. The ast module appeared in its current form in Python 2.6, and exposes a simple method of visiting and modifying the AST.

From there, we can generate a code object from our modified AST. We can also re-generate source code from our modified AST for explanatory purposes.


Creating an AST

Let’s define a simple expression, an add function, and inspect the generated AST.

1
2
3
4
5
>>> import ast
>>> expr = """def add(arg1, arg2): return arg1 + arg2"""
>>> expr_ast = ast.parse(expr)
>>> expr_ast
<_ast.Module object at 0x7f8718b171d0>

Now that we have generated an ast.Module object, let’s dump the contents:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
>>> ast.dump(expr_ast)
"Module(
    body=[
        FunctionDef(
            name='add', args=arguments(
                args=[
                    Name(id='arg1', ctx=Param()),
                    Name(id='arg2', ctx=Param())
                    ],
                    vararg=None,
                    kwarg=None,
                    defaults=[]),
            body=[
                Return(
                    value=BinOp(
                        left=Name(id='arg1', ctx=Load()),
                        op=Add(),
                        right=Name(id='arg2', ctx=Load())))
                ], 
             decorator_list=[])
    ])"

As we can see, Module is the parent node. The Module body contains a list with a single element: our function definition. The function definition has a name, list of arguments, and a body. The body contains a single Return node, which contains the Add operation.

Modifying the AST

How can we modify this tree and change how the code works? To illustrate, let’s do something crazy that you would never want to do in your code. We’ll traverse the tree, and replace the Add operation with a Mult operation. See, I told you it was crazy!

We’ll start by subclassing the NodeTransformer class, and defining the visit_BinOp method which is called when the transformer visits a binary operator node:

1
2
3
4
5
6
7
class CrazyTransformer(ast.NodeTransformer):

    def visit_BinOp(self, node):
        print node.__dict__
        node.op = ast.Mult()
        print node.__dict__
        return node

Now that we’ve defined our transformer which performs this unhealthy action, let’s see it run on the expression defined above:

1
2
3
>>> parser.visit(expr_ast)
{'op': <_ast.Add object at 0x226b810>, 'right': <_ast.Name object at 0x285fe10>, 'lineno': 1, 'col_offset': 28, 'left': <_ast.Name object at 0x285fdd0>}
{'op': <_ast.Mult object at 0x28eb090>, 'right': <_ast.Name object at 0x285fe10>, 'lineno': 1, 'col_offset': 28, 'left': <_ast.Name object at 0x285fdd0>}

You can see the Add node is replaced with a Mult, as shown by the modified __dict__. There are a couple of things we haven’t dealt with here, like visiting child nodes, but this is enough to illustrate the principle.

Compiling and executing the modified AST

After adding a call to our operation, i.e.:

1
print add(4, 5)

…to the end of our script, let’s see how the code evaluates:

1
2
3
4
5
6
7
>>> unmodified = ast.parse(expr)
>>> exec compile(unmodified, '<string>', 'exec')
9
>>> transformer = CrazyTransformer()
>>> modified = transformer.visit(unmodified)
>>> exec compile(modified, '<string>', 'exec')
20

As we can see, our unmodified and modified ASTs compile to code that prints 9 and 20 respectively.

Translating back to source code

Finally, we can use unparse found here to view the source code corresponding to our modified AST:

1
2
3
4
5
>>> unparse.Unparser(modified, sys.stdout)

def add(arg1, arg2):
    return (arg1 * arg2)
print add(4, 5)

As we can see, the * operator has replaced +. Unparse is useful for understanding how your AST transformer modifies code.

Practical applications

Clearly, our above example serves little practical purpose. However, static analysis and modification of source code can be extremely useful.

You could, for instance, inject code for testing purposes. See this Pycon talk for an understanding of how a node transformer can be used to inject instrumentation code for testing purposes.

In addition, the pythoscope project uses an AST visitor to process source files and generate tests from method signatures.

Projects such as pylint also use an AST walking method to analyze source code. In the case of pylint, Logilab have created a module which aims:

to provide a common base representation of python source code for projects such as pychecker, pyreverse, pylint…

See here for more information.

References

This Pycon talk by Matthew J Desmarais and this blog post by Eli Bendersky were invaluable in writing this post.