This is part of my series about writing a Python virtual machine in Python.
Python source code is compiled into bytecode, which is stored in code objects. In addition to the bytecode, code objects contain information about the variables and constants used in the code.
Code objects can be accessed most easily from functions. Accessing the code of a module is harder, because that code is executed when the module is imported and is not needed afterwards. Python functions have the attribute func_code (spelled __code__ in Python 3), which exposes the code object implementing the function.
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def test(a, b=5):
...     'docstring of test'
...     z = abs(a - 15)
...     return z - b
...
>>> test.func_code
<code object test at 0xb77b8e78, file "<stdin>", line 1>
>>> test.func_code.co_consts
('docstring of test', 15)
>>> test.func_code.co_varnames
('a', 'b', 'z')
>>> test.func_code.co_names
('abs',)
>>> test.func_code.co_code
't\x00\x00|\x00\x00d\x01\x00\x18\x83\x01\x00}\x02\x00|\x02\x00|\x01\x00\x18S'
>>> ' '.join('%02X' % ord(c) for c in test.func_code.co_code)
'74 00 00 7C 00 00 64 01 00 18 83 01 00 7D 02 00 7C 02 00 7C 01 00 18 53'
Note that most (or all?) of this is implementation detail, so you should not rely on it being the same in other implementations of Python, or even in future versions of CPython. I do not intend to document code objects completely here; I encourage you to explore. I found some quick descriptions of the fields in the documentation of the inspect module.
We can see that the constants used in the code (including the docstring) are stored in the co_consts field, the names external to the code (e.g. the global abs) in co_names, and the local names in co_varnames.
The bytecode in the co_code attribute is a byte string, which is conveniently analyzed using the dis module. The documentation of the dis module is also valuable for getting a quick overview of the various bytecode instructions.
>>> import dis
>>> dis.dis(test.func_code)
  3           0 LOAD_GLOBAL              0 (abs)
              3 LOAD_FAST                0 (a)
              6 LOAD_CONST               1 (15)
              9 BINARY_SUBTRACT
             10 CALL_FUNCTION            1
             13 STORE_FAST               2 (z)

  4          16 LOAD_FAST                2 (z)
             19 LOAD_FAST                1 (b)
             22 BINARY_SUBTRACT
             23 RETURN_VALUE
For this simple test function it is pretty easy to make the connection between the bytecode and the source code. The numbers on the far left (3 and 4) are the line numbers in the source code. The numbers directly to the left of the mnemonics are the offsets into the bytecode. We can see that some instructions use one byte (e.g. BINARY_SUBTRACT), while others use three bytes: one for the opcode and two for its argument. The disassembler conveniently annotates the argument with the item it references. For example, the instruction at offset 6 (LOAD_CONST) takes an argument (1), which is an index into the co_consts field of the code object.
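To make the one-byte versus three-byte layout concrete, here is a small sketch that decodes the hex dump from the first session by hand. It is not how dis is implemented. The names CODE, OPNAMES and decode are made up for this example, and the opcode table is hand-copied for only the seven opcodes that occur in test, using the CPython 2.7 numbering. The rule that opcodes from 90 upwards carry a two-byte little-endian argument is specific to that bytecode format; newer CPython versions encode instructions differently.

```python
# Bytecode of test() as printed in the session above (CPython 2.7 format).
CODE = bytearray.fromhex(
    "74 00 00 7C 00 00 64 01 00 18 83 01 00 "
    "7D 02 00 7C 02 00 7C 01 00 18 53"
)

# In CPython 2.7, opcodes >= 90 are followed by a two-byte argument.
HAVE_ARGUMENT = 90

# Hand-copied subset of the CPython 2.7 opcode table (assumption: only
# the opcodes used by test() are needed here).
OPNAMES = {
    24: "BINARY_SUBTRACT",
    83: "RETURN_VALUE",
    100: "LOAD_CONST",
    116: "LOAD_GLOBAL",
    124: "LOAD_FAST",
    125: "STORE_FAST",
    131: "CALL_FUNCTION",
}


def decode(code):
    """Return a list of (offset, mnemonic, argument or None) tuples."""
    result = []
    i = 0
    while i < len(code):
        offset = i
        op = code[i]
        i += 1
        arg = None
        if op >= HAVE_ARGUMENT:
            # The argument is stored low byte first.
            arg = code[i] | (code[i + 1] << 8)
            i += 2
        result.append((offset, OPNAMES[op], arg))
    return result


for offset, name, arg in decode(CODE):
    print("%3d %-16s %s" % (offset, name, "" if arg is None else arg))
```

The offsets and arguments it prints line up with the dis listing above; the only thing missing is the annotation in parentheses, which dis gets by looking the argument up in co_consts, co_names or co_varnames.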
Note that the default argument (5) is not stored in the code object. Also, for the global abs, only the name is stored, not a reference to the abs function itself. These links to external objects are made in the function object and not in the code object.