Looping and more

I continue building my interpreter by making it capable of running a test function which uses some new features.

def test():
    for i in range(10):
        print i

This little test function makes it necessary to support four new concepts:

  • global variables
  • calling a built-in function
  • the for loop
  • printing values

Global Variables

So far the code only used local variables. They are only needed during execution of the code object and are stored in the execution frame. In Python global (or more exactly module-level) variables are stored in a dictionary (you can see this dictionary using the globals() built-in). So before we can load global variables into our value stack, we need to keep a reference to the relevant globals dictionary. Therefore I extend the frame object to hold this reference.

Also we need to somehow initialize the globals of our execution environment, therefore the execute function gets a new parameter.

class Frame(object):
    def __init__(self, code, globals):
        self.locals = [None] * code.co_nlocals
        self.PC = 0
        self.stack = []
        self.globals = globals
        self.code = code

def execute(code, globals):
    f = Frame(code, globals)
    ...

my_globals = {'range': range}
print 'own execute function:', execute(test.func_code, my_globals)

You can see that I initialized the globals with the minimum needed to execute our test function, the reference to the built-in range.

The access to the globals is made by the LOAD_GLOBAL bytecode, whose argument is an index into the co_names field which contains the names of the globals used in the function.

        elif bc==LOAD_GLOBAL:
            arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
            f.stack.append(f.globals[f.code.co_names[arg]])
            f.PC += 3

Calling a Built-in Function

The relevant bytecode CALL_FUNCTION is going to be quite complex to implement.
For the moment I implement only the simplest case: calling a built-in function without any
* or ** magic arguments.

For a function call the stack need to look like this: The function arguments lie on top of the function object. The first (leftmost) argument is lowest, the rightmost function argument is on the top of the stack. The byte at PC+1 contains the number of arguments used in the function call.

So first we pop the function arguments off the stack, pop off the function itself and
push the result of the function call (which happens outside of our execution environment) back on the stack.

elif bc==CALL_FUNCTION:
    arg = ord(f.code.co_code[f.PC+1])

    call_args = [None] * arg
    for i in range(arg-1, -1, -1):
        call_args[i] = f.stack.pop()

    callee = f.stack.pop()
    if type(callee) is types.BuiltinFunctionType:
        f.stack.append(callee(*call_args))
        f.PC += 3

The For Loop

The for loop in python uses the iterator protocol. Essentially it works like this: for each loop try to get the next value of the iterator. Once the iterator is exhausted it raises the StopIteration exception. The for loop swallows this exception and continues with the next statement after the loop.
At the end of the loop body a jump is made back to the head which will then continue with the next iteration or end the loop.

When you look at the bytecode yo will see the SETUP_LOOP and the POP_BLOCK instructions. I have not yet figured what they do exactly, and in this simple example they are not needed, so I implement them as no-operations.

elif bc==SETUP_LOOP:
    f.PC += 3
elif bc==POP_BLOCK:
    f.PC += 1

The jump at the end of the loop is the simplest jump possible, an unconditional absolute jump.
It will just set the next instruction to the address provided as an argument to the bytecode.

elif bc==JUMP_ABSOLUTE:
    arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
    f.PC = arg

The for loop in python needs an iterator. Python gets an iterator for the object provided to the for loop by the GET_ITER bytecode. Now i implement it with the built_in iter() function of the host python, later I may want to make my implementation more independent and use __magic__ methods instead.

elif bc==GET_ITER:
    f.stack.append(iter(f.stack.pop()))
    f.PC += 1

The most interesting bytecode for this post is finally the FOR_ITER. It can either push the next value of the iterator or jump to the end of the loop. Note that the jump is relative to the bytecode after FOR_ITER.

elif bc==FOR_ITER:
    arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
    try:
        nx = next(f.stack[-1])
    except StopIteration:
        f.stack.pop()
        f.PC += arg + 3
    else:
        f.stack.append(nx)
        f.PC += 3

Printing

Python prints the top of stack using the PRINT_VALUE bytecode, and the final newline using the PRINT_NEWLINE code. Again I simply use the functionality of the host Python to implement these two bytecodes.

elif bc==PRINT_ITEM:
    print f.stack.pop(),
    f.PC += 1
elif bc==PRINT_NEWLINE:
    print
    f.PC += 1

Conclusion

As I wrote in the introduction post, I took a lot of advantage of the host Python in which I implement my own Python interpreter.

You can download the source code of the extended interpreter.

Edit: GitHub
I have created a GitHub repository for this project. See the revision for this post on GitHub.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: