Lets get started

In this post I am going to write a byte-code interpreter capable of executing the bytecode of this simple function.
I am using Python 2.7.1, the bytecode can be different for different versions of Python. The aim is not to exactly reproduce the way CPython works in every detail, but to understand the concepts.

def test():
    a = 2
    b = a + 4
    return (a + 1) * (b - 2)

CPython uses a stack-based bytecode. This means that inputs and intermediary results for evaluating an expression are pushed on a stack. Most operations in the bytecode will operate on the top one or two items on the stack. In my last post i demonstrated how you can look at the bytecode of test().

The bytecode for the first line looks like this: 0x64 0x0100 0x7D 0x0000
Which is interpreted as

LOAD_CONST(1) # push(code.co_consts[1])
STORE_FAST(0) # locals[0] = pop()

The interpreter will need to process each bytecode (line in the disassembly) sequentially. It will need to keep track of the current instruction to be executed. This is done using a variable which keeps the index of the current instruction in the code. I call this index PC for program counter.

Then it will also need to keep track of the values of all local variables. The code object has the field co_nlocals which tells us how many local variables we need to reserve space for. So far the names of the local variables do not matter, the locals are accessed by a number in the bytecode.

And of course the interpreter needs the stack where all the calculations and shoveling around of values happen. For implementing the stack I will use an ordinary list, .append() for pushing an item, [-1] for accessing the top of stack and .pop() for popping the top of stack.

Even though this is not needed now, I define a frame object to hold all these objects defining the state of the interpreter, as well as a reference to the code object being executed. This will prove useful when we will be implementing function calls in the future. Right now all it does is initialize the PC, reserve space for the locals, and create an empty stack.

class Frame(object):
    def __init__(self, code):
        self.locals = [None] * code.co_nlocals
        self.PC = 0
        self.stack = []
        self.code = code

The core part of the interpreter is a loop executing one bytecode after another. At the beginning of each loop the variable bc is assigned the next opcode to be performed, i.e. the code at the position PC. Note that I use the frame object to hold all the state of the interpreter. In the complete source you can see that I obtain the numeric values for the various opcodes from the dis module.

def execute(code):
    f = Frame(code)

    while True:
        bc = ord(f.code.co_code[f.PC])
        if bc==LOAD_FAST:
            # [...]
        elif bc==LOAD_CONST:
            # [...]
            raise Exception('Unknown Opcode %d (%s)' % (bc, dis.opname[bc]))

An exception is raised in case an opcode is encountered, that I did not implement. That way it is easy to see what is still missing when applying the interpreter to more and more complex test functions.

Let’s now look into the implementation of each opcode:

LOAD_FAST and LOAD_CONST are pretty similar. Both take a two-byte argument stored after the opcode at positions PC+1 and PC+2. The arguments is an index into either the locals (stored in the frame) or the constants (stored in the code object). The value is pushed onto the stack and then the program counter is advanced to the next opcode. Both instructions are three bytes wide.

if bc==LOAD_FAST:
    arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
    f.PC += 3

elif bc==LOAD_CONST:
    arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
    f.PC += 3

Writing to a local variable is also pretty similar. The value written is popped off the stack.

elif bc==STORE_FAST:
    arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
    f.locals[arg] = f.stack.pop()
    f.PC += 3

The binary operations pop two items off the stack and push the result. Be careful about the order in which the operands are expected on the stack. Because these instructions take no argument, the PC needs only to be incremented by one. (I only show subtraction, addition and multiplication work analogously.)

    b = f.stack.pop()
    a = f.stack.pop()
    f.PC += 1

Finally the result of the function needs to be returned to the caller. The RETURN_VALUE instruction expects the value to be returned on the top of the stack. So far there is only one function, the top-level test(). Therefore we can halt the interpreter when reaching the return instruction. The value returned from the test function is returned to the caller of the interpreter in the “normal” python world.

elif bc==RETURN_VALUE:
    return f.stack.pop()

In the future the interpreter is going to implement function calls and we will have to revise the implementation of RETURN_VALUE as it will no longer be the end of execution.

That’s it! Lets test our implementation, both the CPython interpreter on the first line and the newly developed execute() need to give the same result: 12

print 'normal Python call:', test()
print 'own execute function:', execute(test.func_code)

Download the source code developped in this post.

Edit: GitHub
I have created a GitHub repository for this project. See the revision for this post on GitHub.

  1. Roger said:

    You may be interested in a project I did named Java Mini Python – http://code.google.com/p/java-mini-python/

    The actual interpreter is implemented in Java, A Python program uses the ast module to turn Python source into custom bytecode which makes things quite interesting. You can see a sample dump at http://doc.java-mini-python.googlecode.com/hg/jmp-compile.html#dumping

    Fun things are issues like how to store line number information as compactly as possible. Another challenge was that Python’s evaluation ordering isn’t as consistent as you think. For example consider code like “a()[b()]”, “{a():b()}” and other permutations – does a() or b() get evaluated first? I had to write lots of little programs to find out.

    The ast module is a lot of fun though.

  2. leovt said:

    Thank you for the pointers. The generation of custom bytecode seems very interesting. I am going to have a closer look on your bytecode interpreter in MiniPython.java. What I am going to take from your inspiration is to move the reading of bytecode arguments and advancing of PC to the start of the loop and not repeat it in every bytecode case.

  3. Roger said:

    Making your own bytecode is a lot of fun, and very easy when using the ast module. My compiler is effectively just using the ast.NodeVisitor and then generating code in each construct handled.

    The only drawback of advancing the PC at the top of the loop is that it is then off by one when later examined, except it may be off by more if there were arguments. You can’t reliably work out what the prior PC was by scanning backwards – you have to start from the beginning of the bytecode. That value is needed for correct line numbers. What Python used to do instead was have an opcode that set the current line number, but this is very inefficient as you don’t need to know the line number most of the time.

    What I am currently grappling with is how to do profiling for test coverage of user code. There is no need to track every instruction executed, only basic blocks, so it is only the beginning of each basic block that has to be marked. The easiest solution is to recompile the code with a suitable flag, and that then inserts a profile instruction at the beginning of each basic block (this can easily be done after normal code generation). But that then means you are running different code than you do in production, and have to arrange for it to be executed instead of production code. There are also some tricky issues when using the coverage run data, especially if it doesn’t match the source due to different compilation flags or code changes. Another approach is to add code to the interpreter such that turning on profiling means it goes and swaps out the instruction at the beginning of each basic block into a lookaside putting in an TRACE opcode. TRACE would then update the profiling data, find the original instruction from the lookaside and then execute that. Unfortunately it is very difficult to do that as Java doesn’t have goto. This will also make “mini” Python larger – something that is not desirable.

    I’m also going to have revisit how to map PC locations to line numbers. The naive approach results in 4 bytes per line of code in a table which is too large.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: