Table of contents
Open Table of contents
Scope Lookups
To understand what’s going on here, we need to take a look at scopes.
Let’s start with a simple question, if I’m in a python function, and I
encounter something named open
, how does python go about figuring out
the value of open
?
# <GLOBAL: bunch of code here>
def myfunc():
# <LOCAL: bunch of code here>
with open('foo.txt', 'w') as f:
pass
The short answer is that without knowing the contents of the GLOBAL and
the LOCAL section, you can’t know for certain the value of open
.
Conceptually, python checks three namespaces for a name (ignoring nested
scopes to keep things simple):
- locals
- globals
- builtin
So in the myfunc
function, if we’re trying to find a value for
open
, we’ll first check the local namespace, then the globals
namespace, then the builtins namespace. And if open
is not defined in
any namespace, a NameError
is raised.
Scope Lookups, the Implementation
The lookup process above is just conceptual. The implementation of this lookup process gives us room to exploit the implementation.
def foo():
a = 1
return a
def bar():
return a
def baz(a=1):
return a
Let’s look at the bytecode of each function:
>>> import dis
>>> dis.dis(foo)
2 0 LOAD_CONST 1 (1)
3 STORE_FAST 0 (a)
3 6 LOAD_FAST 0 (a)
9 RETURN_VALUE
>>> dis.dis(bar)
2 0 LOAD_GLOBAL 0 (a)
3 RETURN_VALUE
>>> dis.dis(baz)
2 0 LOAD_FAST 0 (a)
3 RETURN_VALUE
Look at the differences between foo and bar. Right away we can see that
at the bytecode level python has already determined what’s a local
variable and what is not because foo
is using LOAD_FAST
and bar
is
using LOAD_GLOBAL
.
We won’t get into the details of how python’s compiler knows when to emit which bytecode (perhaps that’s another post), but suffice to say python knows which type of lookup it needs to perform when it executes a function.
One other thing that can be confusing is that LOAD_GLOBAL
is used for
lookups in the global as well as the builtin namespace. You can think of
this as “not local”, again ignoring the issue of nested scopes. The C
code for this is roughly1:
case LOAD_GLOBAL:
v = PyObject_GetItem(f->f_globals, name);
if (v == NULL) {
v = PyObject_GetItem(f->f_builtins, name);
if (v == NULL) {
if (PyErr_ExceptionMatches(PyExc_KeyError))
format_exc_check_arg(
PyExc_NameError,
NAME_ERROR_MSG, name);
goto error;
}
}
PUSH(v);
Even if you’ve never seen any of the C code for CPython, the above code
is pretty straightforward. First, check if the key name we’re looking
for is in f->f_globals
(the globals dict), then check if the name is
in f->f_builtins
(the builtins dict), and finally, raise a NameError
if both checks failed.
Binding Constants to the Local Scope
Now when we look at the initial code sample, we can see that the last
argument is binding a function into the local scope of a function. It
does this by assigning a value, dict.__setitem__
, as the default value
of an argument. Here’s another example:
def not_list_or_dict(value):
return not (isinstance(value, dict) or isinstance(value, list))
def not_list_or_dict(value, _isinstance=isinstance, _dict=dict, _list=list):
return not (_isinstance(value, _dict) or _isinstance(value, _list))
We’re doing the same thing here, binding what would normally be objects
that are in the builtin namespace into the local namespace instead. So
instead of requiring the use of LOAD_GLOBAL
(a global lookup), python
instead will use LOCAL_FAST
. So how much faster is this? Let’s do
some crude testing:
$ python -m timeit -s 'def not_list_or_dict(value): return not (isinstance(value, dict) or isinstance(value, list))' 'not_list_or_dict(50)'
1000000 loops, best of 3: 0.48 usec per loop
$ python -m timeit -s 'def not_list_or_dict(value, _isinstance=isinstance, _dict=dict, _list=list): return not (_isinstance(value, _dict) or _isinstance(value, _list))' 'not_list_or_dict(50)'
1000000 loops, best of 3: 0.423 usec per loop
Or in other words, that’s an 11.9% improvement2. That’s way more than the 5% I promised at the beginning of this post!
There’s More to the Story
It’s reasonable to think that the speed improvment is because
LOAD_FAST
reads from the local namespace whereas LOAD_GLOBAL
will
first check the global namespace before falling back to checking the
builtin namespace. And in the example function above, isinstance
,
dict
, and list
all come from the built in namespace.
However, there’s more going on. Not only are we able to skip additional
lookup with LOAD_FAST
, it’s also a different type of lookup.
The C code snippet above showed the code for LOAD_GLOBAL
, but here’s
the code for LOAD_FAST
:
case LOAD_FAST:
PyObject *value = fastlocal[oparg];
if (value == NULL) {
format_exc_check_arg(PyExc_UnboundLocalError,
UNBOUNDLOCAL_ERROR_MSG,
PyTuple_GetItem(co->co_varnames, oparg));
goto error;
}
Py_INCREF(value);
PUSH(value);
FAST_DISPATCH();
We’re retrieving the local value by indexing into an array. It’s not
shown here, but oparg
is just an index into that array.
Now it’s starting to make sense. In our first version
not_list_or_dict
had to perform 4 lookups, and each name was in the
builtins namespace which we only look at after looking in the globals
namespace. That’s 8 dictionary key lookups. Compare that to directly
indexing into a C array 4 times, which is what happens in the second
version of not_list_or_dict
, which all use LOAD_FAST
under the hood.
This is why lookups in the local namespace are faster.
Wrapping Up
Now the next time you see this in someone else’s code you’ll know what’s going on.
And one final thing. Please don’t actually do these kinds of optimizations unless you really need to. And most of the time you don’t need to. But when the time really comes, and you really need to squeeze out every last bit of performance, you’ll have this in your back pocket.
Footnotes
Footnotes
-
Though keep in mind that I removed some performance optimizations in the above code to make it simpler to read. The real code is slightly more complicated. ↩
-
On a toy example for a function that doesn’t really do anything interesting nor does it perform any IO and is mostly bound by the python VM loop. ↩