This post is again about the Python bytecode interpreter I have started in the three previous posts. This time I won’t develop the interpreter any further but rather start to set up an environment for this little project.

Until now I have just had a single test case in my main script, and as you could see if you had followed the download links I the three versions of the script were just saved as three different files,,, and
This manual version control is maybe feasible as long as I have only one file and as long as I am the only one developing, but already under these conditions a finer versioning would be desirable.

I could use a local version control system, but as I want to publish the code anyway, a distributed, public version control would be the better suited tool. I am new to distributed version control systems, and have quickly looked at git and mercurial. For both there are public repositories available: GitHub and BitBucket.

I found this question on StackOverflow about the differences of git and mercurial and I think for this project it does not matter much which system I will use. I went for git / GitHub because I think it can be lower level and if want to try out more complicated things later, I would already have this as a real example. But for the moment I will probably use only basic features.

I have created leovt/interp on GitHub and have already checked in the files from the previous posts.
Each version posted previously is now a tag in the GitHub repository.

All I needed to do for setting up this repository is clearly explained on the GitHub website.


Lets extend the interpreter with the capability to call into a function. Unlike the builtin function range in the last post, I would like also the called function to be interpreted in my interpreter and not just being executed in the host Python.

def test():
    return square(4) + square(3)

def square(n):
    return n * n

In order to simplify the testing I do not want to adapt the globals directory every time I change the test function. Fortunately every function keeps a reference to the global environment it is defined in, so I will just tell the execute function to use the appropriate globals of the test function.

print 'own execute function:', execute(test.func_code, test.func_globals)

The called function will need its own space to run in, e.g. to store the local variables. Also the program counter for the called function must not overwrite the program counter in the calling function. Now it comes handy that all the execution state is stored in a frame object: we can simply create a clean new frame for the called function.
Then we simply point our variable f, holding the executed frame to the newly created frame for transferring control to the called function.

    elif type(callee) is types.FunctionType:
        subframe = Frame(callee.func_code, callee.func_globals)
        subframe.locals[:arg] = call_args
        f = subframe

Note that the functions arguments are just copied into the locals of the called function. In fact (ignoring any * or ** magic) the functions arguments are always the first local variables.

Lets try to execute the test:

normal Python call: 25
own execute function: 16

What happened? Apparently we see only the result of the first function call square(4). The interpreter has just quit too early – upon seeing the first return statement.

Of course when we are calling into functions we also need to transfer control back to the calling function when the called function returns. At the moment we just lose the calling frame when we transfer control to the subframe. For being able to return control to the caller we first need to keep the calling frame. We will give the frame object a reference to the caller.

class Frame(object):
    def __init__(self, code, globals, caller):
        self.locals = [None] * code.co_nlocals
        self.PC = 0
        self.stack = []
        self.globals = globals
        self.code = code
        self.caller = caller

The first function executed by our interpreter is not called from inside the interpreter, so for this first frame we set the caller to None.

f = Frame(code, globals, None)

In the CALL_FUNCTION implementation we keep the reference to the calling frame and we also update the calling frames program counter to point to the next instruction before transferring control, so everything will be ready when we decide to return.

    elif type(callee) is types.FunctionType:
        subframe = Frame(callee.func_code, callee.func_globals, f)
        subframe.locals[:arg] = call_args
        f.PC += 3
        f = subframe

Now everything is ready for the return statement to be implemented in the RETURN_VALUE bytecode. For this we check if we have a calling frame and if so we push the returned value on the calling frame and transfer control back.

elif bc==RETURN_VALUE:
    if f.caller is None:
        return f.stack.pop()
        ret = f.stack.pop()
        f = f.caller

And finally we get the desired result.

normal Python call: 25
own execute function: 25

You can download the code of the updated interpreter.

Edit: GitHub
I have created a GitHub repository for this project. See the revision for this post on GitHub.

A note on the value stack:
In my implementation each frame has got its own stack, so a called function has no chance of corrupting the callers stack. If the called function guarantees not to alter the bottom of the stack and also to pop off all values it pushed, then the same stack could be shared between the caller and the callee.
In this hobby / educational implementation I prefer the seperate approach as it is probably more robust and easier to debug.

I continue building my interpreter by making it capable of running a test function which uses some new features.

def test():
    for i in range(10):
        print i

This little test function makes it necessary to support four new concepts:

  • global variables
  • calling a built-in function
  • the for loop
  • printing values

Global Variables

So far the code only used local variables. They are only needed during execution of the code object and are stored in the execution frame. In Python global (or more exactly module-level) variables are stored in a dictionary (you can see this dictionary using the globals() built-in). So before we can load global variables into our value stack, we need to keep a reference to the relevant globals dictionary. Therefore I extend the frame object to hold this reference.

Also we need to somehow initialize the globals of our execution environment, therefore the execute function gets a new parameter.

class Frame(object):
    def __init__(self, code, globals):
        self.locals = [None] * code.co_nlocals
        self.PC = 0
        self.stack = []
        self.globals = globals
        self.code = code

def execute(code, globals):
    f = Frame(code, globals)

my_globals = {'range': range}
print 'own execute function:', execute(test.func_code, my_globals)

You can see that I initialized the globals with the minimum needed to execute our test function, the reference to the built-in range.

The access to the globals is made by the LOAD_GLOBAL bytecode, whose argument is an index into the co_names field which contains the names of the globals used in the function.

        elif bc==LOAD_GLOBAL:
            arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
            f.PC += 3

Calling a Built-in Function

The relevant bytecode CALL_FUNCTION is going to be quite complex to implement.
For the moment I implement only the simplest case: calling a built-in function without any
* or ** magic arguments.

For a function call the stack need to look like this: The function arguments lie on top of the function object. The first (leftmost) argument is lowest, the rightmost function argument is on the top of the stack. The byte at PC+1 contains the number of arguments used in the function call.

So first we pop the function arguments off the stack, pop off the function itself and
push the result of the function call (which happens outside of our execution environment) back on the stack.

    arg = ord(f.code.co_code[f.PC+1])

    call_args = [None] * arg
    for i in range(arg-1, -1, -1):
        call_args[i] = f.stack.pop()

    callee = f.stack.pop()
    if type(callee) is types.BuiltinFunctionType:
        f.PC += 3

The For Loop

The for loop in python uses the iterator protocol. Essentially it works like this: for each loop try to get the next value of the iterator. Once the iterator is exhausted it raises the StopIteration exception. The for loop swallows this exception and continues with the next statement after the loop.
At the end of the loop body a jump is made back to the head which will then continue with the next iteration or end the loop.

When you look at the bytecode yo will see the SETUP_LOOP and the POP_BLOCK instructions. I have not yet figured what they do exactly, and in this simple example they are not needed, so I implement them as no-operations.

elif bc==SETUP_LOOP:
    f.PC += 3
elif bc==POP_BLOCK:
    f.PC += 1

The jump at the end of the loop is the simplest jump possible, an unconditional absolute jump.
It will just set the next instruction to the address provided as an argument to the bytecode.

    arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
    f.PC = arg

The for loop in python needs an iterator. Python gets an iterator for the object provided to the for loop by the GET_ITER bytecode. Now i implement it with the built_in iter() function of the host python, later I may want to make my implementation more independent and use __magic__ methods instead.

elif bc==GET_ITER:
    f.PC += 1

The most interesting bytecode for this post is finally the FOR_ITER. It can either push the next value of the iterator or jump to the end of the loop. Note that the jump is relative to the bytecode after FOR_ITER.

elif bc==FOR_ITER:
    arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
        nx = next(f.stack[-1])
    except StopIteration:
        f.PC += arg + 3
        f.PC += 3


Python prints the top of stack using the PRINT_VALUE bytecode, and the final newline using the PRINT_NEWLINE code. Again I simply use the functionality of the host Python to implement these two bytecodes.

elif bc==PRINT_ITEM:
    print f.stack.pop(),
    f.PC += 1
    f.PC += 1


As I wrote in the introduction post, I took a lot of advantage of the host Python in which I implement my own Python interpreter.

You can download the source code of the extended interpreter.

Edit: GitHub
I have created a GitHub repository for this project. See the revision for this post on GitHub.

In this post I am going to write a byte-code interpreter capable of executing the bytecode of this simple function.
I am using Python 2.7.1, the bytecode can be different for different versions of Python. The aim is not to exactly reproduce the way CPython works in every detail, but to understand the concepts.

def test():
    a = 2
    b = a + 4
    return (a + 1) * (b - 2)

CPython uses a stack-based bytecode. This means that inputs and intermediary results for evaluating an expression are pushed on a stack. Most operations in the bytecode will operate on the top one or two items on the stack. In my last post i demonstrated how you can look at the bytecode of test().

The bytecode for the first line looks like this: 0x64 0x0100 0x7D 0x0000
Which is interpreted as

LOAD_CONST(1) # push(code.co_consts[1])
STORE_FAST(0) # locals[0] = pop()

The interpreter will need to process each bytecode (line in the disassembly) sequentially. It will need to keep track of the current instruction to be executed. This is done using a variable which keeps the index of the current instruction in the code. I call this index PC for program counter.

Then it will also need to keep track of the values of all local variables. The code object has the field co_nlocals which tells us how many local variables we need to reserve space for. So far the names of the local variables do not matter, the locals are accessed by a number in the bytecode.

And of course the interpreter needs the stack where all the calculations and shoveling around of values happen. For implementing the stack I will use an ordinary list, .append() for pushing an item, [-1] for accessing the top of stack and .pop() for popping the top of stack.

Even though this is not needed now, I define a frame object to hold all these objects defining the state of the interpreter, as well as a reference to the code object being executed. This will prove useful when we will be implementing function calls in the future. Right now all it does is initialize the PC, reserve space for the locals, and create an empty stack.

class Frame(object):
    def __init__(self, code):
        self.locals = [None] * code.co_nlocals
        self.PC = 0
        self.stack = []
        self.code = code

The core part of the interpreter is a loop executing one bytecode after another. At the beginning of each loop the variable bc is assigned the next opcode to be performed, i.e. the code at the position PC. Note that I use the frame object to hold all the state of the interpreter. In the complete source you can see that I obtain the numeric values for the various opcodes from the dis module.

def execute(code):
    f = Frame(code)

    while True:
        bc = ord(f.code.co_code[f.PC])
        if bc==LOAD_FAST:
            # [...]
        elif bc==LOAD_CONST:
            # [...]
            raise Exception('Unknown Opcode %d (%s)' % (bc, dis.opname[bc]))

An exception is raised in case an opcode is encountered, that I did not implement. That way it is easy to see what is still missing when applying the interpreter to more and more complex test functions.

Let’s now look into the implementation of each opcode:

LOAD_FAST and LOAD_CONST are pretty similar. Both take a two-byte argument stored after the opcode at positions PC+1 and PC+2. The arguments is an index into either the locals (stored in the frame) or the constants (stored in the code object). The value is pushed onto the stack and then the program counter is advanced to the next opcode. Both instructions are three bytes wide.

if bc==LOAD_FAST:
    arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
    f.PC += 3

elif bc==LOAD_CONST:
    arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
    f.PC += 3

Writing to a local variable is also pretty similar. The value written is popped off the stack.

elif bc==STORE_FAST:
    arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
    f.locals[arg] = f.stack.pop()
    f.PC += 3

The binary operations pop two items off the stack and push the result. Be careful about the order in which the operands are expected on the stack. Because these instructions take no argument, the PC needs only to be incremented by one. (I only show subtraction, addition and multiplication work analogously.)

    b = f.stack.pop()
    a = f.stack.pop()
    f.PC += 1

Finally the result of the function needs to be returned to the caller. The RETURN_VALUE instruction expects the value to be returned on the top of the stack. So far there is only one function, the top-level test(). Therefore we can halt the interpreter when reaching the return instruction. The value returned from the test function is returned to the caller of the interpreter in the “normal” python world.

elif bc==RETURN_VALUE:
    return f.stack.pop()

In the future the interpreter is going to implement function calls and we will have to revise the implementation of RETURN_VALUE as it will no longer be the end of execution.

That’s it! Lets test our implementation, both the CPython interpreter on the first line and the newly developed execute() need to give the same result: 12

print 'normal Python call:', test()
print 'own execute function:', execute(test.func_code)

Download the source code developped in this post.

Edit: GitHub
I have created a GitHub repository for this project. See the revision for this post on GitHub.

This is part of my series about writing a Python virtual machine in python.

The python source code is compiled into bytecode which is contained in code objects. In addition to the bytecode they contain information about the variables and constants contained in the code.

Code objects can be accessed most easily from functions. Accessing the code of modules is harder, because they are executed when the module is imported and the code is not needed later any more. Python functions have the attribute func_code which exposes the code object implementing the function code.

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def test(a, b=5):
...     'docstring of test'
...     z = abs(a - 15)
...     return z - b
>>> test.func_code
<code object test at 0xb77b8e78, file "<stdin>", line 1>
>>> test.func_code.co_consts
('docstring of test', 15)
>>> test.func_code.co_varnames
('a', 'b', 'z')
>>> test.func_code.co_names
>>> test.func_code.co_code
>>> ' '.join('%02X' % ord(c) for c in test.func_code.co_code)
'74 00 00 7C 00 00 64 01 00 18 83 01 00 7D 02 00 7C 02 00 7C 01 00 18 53'

Note that most (or all?) of this is implementation detail, so you should not rely on it to be the same in other implementations of python or even future versions of CPython. I do not intend to document the code objects completely here, I encourage you to explore. I found some quick descriptions of the fields in the documentation of the inspect module.

We can see that the constants used in the code – including the docstring – are stored in the co_consts field, the names external to the code (e.g. the global abs) in the co_names and the local names in the co_varnames attribute.

The bytecode in the co_code attribute is a byte-string which is conveniently analyzed using the dis module. The documentation of the dis module is also valuable for getting a quick glance at the various bytecodes.

>>> import dis
>>> dis.dis(test.func_code)
.  3           0 LOAD_GLOBAL              0 (abs)
.              3 LOAD_FAST                0 (a)
.              6 LOAD_CONST               1 (15)
.              9 BINARY_SUBTRACT
.             10 CALL_FUNCTION            1
.             13 STORE_FAST               2 (z)
.  4          16 LOAD_FAST                2 (z)
.             19 LOAD_FAST                1 (b)
.             22 BINARY_SUBTRACT
.             23 RETURN_VALUE

For this simple test function it is pretty easy to make the connection between the bytecode and the source code. The numbers to the left (3 and 4) are the line numbers in the source code. The numbers next to the mnemonics are the offsets into the byte code. We can see that some bytecodes use one byte (e.g. BINARY_SUBSTRACT), while others use three bytes, taking a two-byte argument. The disassembler conveniently annotates the argument with the item it is referencing. For example the code at offset 6 (LOAD_CONST) takes an argument (1) which is an index into the co_consts field of the code object.

Note that the default argument (5) is not stored in the code object. Also for the global abs only the name is stored, but no reference to the abs function. These links to external objects are made in the function object and not in the code object.

In a first series of posts I would like to explore how python works behind the scenes. I am going to re build some parts of Python in Python itself. My aim is not to reproduce PyPy, a project that is implementing a production grade Python interpreter in Python. Rather I think that by rewriting – probably simplified – parts of the interpreter, I can understand them better.

There are two main advantages of using Python for implementing parts of Python. One is that I am using a language that I am familiar with and like to program in. The second but very important advantage is that i can use Python to do all the stuff I have not (yet) rewritten. As we will see Python exposes a lot of its internals readily accessible from within Python programs.

These are the major parts of the Python interpreter.

Components of the Python Interpreter

  1. A  compiler which produces bytecode, constant objects, lists of variable names and more from Python source code.
  2. An object model and built-in types, modules and functions.
  3. The interpreter itself, also known as a virtual machine, executing the bytecode and thereby modifying objects.

In my next posts I am going to write the third part, the virtual machine. The aim is to get it to run simple Python programs, and to understand how a virtual machine can be built.