I'm going to show you how a micro optimization can speed up your python code by a whopping 5%. 5%! It can also annoy anyone that has to maintain your code.

But really, this is about explaining code might you see occasionally see in the standard library or in other people's code. Let's take an example from the standard library, specifically the collections.OrderedDict class:

def __setitem__(self, key, value, dict_setitem=dict.__setitem__):
    if key not in self:
        root = self.__root
        last = root[0]
        last[1] = root[0] = self.__map[key] = [last, root, key]
    return dict_setitem(self, key, value)

Notice the last arg: dict_setitem=dict.__setitem__. It makes sense if you think about it. To associate a key with a value, you'll need to provide a __setitem__ method which takes three arguments: the key you're setting, the value associated with the key, and the __setitem__ class method to the built in dict class. Wait. Ok maybe the last argument makes no sense.

Scope Lookups

To understand what's going on here, we need to take a look at scopes. Let's start with a simple question, if I'm in a python function, and I encounter something named open, how does python go about figuring out the value of open?

# <GLOBAL: bunch of code here>

def myfunc():
    # <LOCAL: bunch of code here>
    with open('foo.txt', 'w') as f:

The short answer is that without knowing the contents of the GLOBAL and the LOCAL section, you can't know for certain the value of open. Conceptually, python checks three namespaces for a name (ignoring nested scopes to keep things simple):

  • locals
  • globals
  • builtin

So in the myfunc function, if we're trying to find a value for open, we'll first check the local namespace, then the globals namespace, then the builtins namespace. And if open is not defined in any namespace, a NameError is raised.

Scope Lookups, the Implementation

The lookup process above is just conceptual. The implementation of this lookup process gives us room to exploit the implementation.

def foo():
    a = 1
    return a

def bar():
    return a

def baz(a=1):
    return a

Let's look at the bytecode of each function:

>>> import dis
>>> dis.dis(foo)
  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               0 (a)

  3           6 LOAD_FAST                0 (a)
              9 RETURN_VALUE

>>> dis.dis(bar)
  2           0 LOAD_GLOBAL              0 (a)
              3 RETURN_VALUE

>>> dis.dis(baz)
  2           0 LOAD_FAST                0 (a)
              3 RETURN_VALUE

Look at the differences between foo and bar. Right away we can see that at the bytecode level python has already determined what's a local variable and what is not because foo is using LOAD_FAST and bar is using LOAD_GLOBAL.

We won't get into the details of how python's compiler knows when to emit which bytecode (perhaps that's another post), but suffice to say python knows which type of lookup it needs to perform when it executes a function.

One other thing that can be confusing is that LOAD_GLOBAL is used for lookups in the global as well as the builtin namespace. You can think of this as "not local", again ignoring the issue of nested scopes. The C code for this is roughly [1]:

    v = PyObject_GetItem(f->f_globals, name);
    if (v == NULL) {
        v = PyObject_GetItem(f->f_builtins, name);
        if (v == NULL) {
            if (PyErr_ExceptionMatches(PyExc_KeyError))
                            NAME_ERROR_MSG, name);
            goto error;

Even if you've never seen any of the C code for CPython, the above code is pretty straightforward. First, check if the key name we're looking for is in f->f_globals (the globals dict), then check if the name is in f->f_builtins (the builtins dict), and finally, raise a NameError if both checks failed.

Binding Constants to the Local Scope

Now when we look at the initial code sample, we can see that the last argument is binding a function into the local scope of a function. It does this by assigning a value, dict.__setitem__, as the default value of an argument. Here's another example:

def not_list_or_dict(value):
  return not (isinstance(value, dict) or isinstance(value, list))

def not_list_or_dict(value, _isinstance=isinstance, _dict=dict, _list=list):
  return not (_isinstance(value, _dict) or _isinstance(value, _list))

We're doing the same thing here, binding what would normally be objects that are in the builtin namespace into the local namespace instead. So instead of requiring the use of LOAD_GLOBAL (a global lookup), python instead will use LOCAL_FAST. So how much faster is this? Let's do some crude testing:

$ python -m timeit -s 'def not_list_or_dict(value): return not (isinstance(value, dict) or isinstance(value, list))' 'not_list_or_dict(50)'
1000000 loops, best of 3: 0.48 usec per loop
$ python -m timeit -s 'def not_list_or_dict(value, _isinstance=isinstance, _dict=dict, _list=list): return not (_isinstance(value, _dict) or _isinstance(value, _list))' 'not_list_or_dict(50)'
1000000 loops, best of 3: 0.423 usec per loop

Or in other words, that's an 11.9% improvement [2]. That's way more than the 5% I promised at the beginning of this post!

There's More to the Story

It's reasonable to think that the speed improvment is because LOAD_FAST reads from the local namespace whereas LOAD_GLOBAL will first check the global namespace before falling back to checking the builtin namespace. And in the example function above, isinstance, dict, and list all come from the built in namespace.

However, there's more going on. Not only are we able to skip additional lookup with LOAD_FAST, it's also a different type of lookup.

The C code snippet above showed the code for LOAD_GLOBAL, but here's the code for LOAD_FAST:

    PyObject *value = fastlocal[oparg];
    if (value == NULL) {
                             PyTuple_GetItem(co->co_varnames, oparg));
        goto error;

We're retrieving the local value by indexing into an array. It's not shown here, but oparg is just an index into that array.

Now it's starting to make sense. In our first version not_list_or_dict had to perform 4 lookups, and each name was in the builtins namespace which we only look at after looking in the globals namespace. That's 8 dictionary key lookups. Compare that to directly indexing into a C array 4 times, which is what happens in the second version of not_list_or_dict, which all use LOAD_FAST under the hood. This is why lookups in the local namespace are faster.

Wrapping Up

Now the next time you see this in someone else's code you'll know what's going on.

And one final thing. Please don't actually do these kinds of optimizations unless you really need to. And most of the time you don't need to. But when the time really comes, and you really need to squeeze out every last bit of performance, you'll have this in your back pocket.


[1]Though keep in mind that I removed some performance optimizations in the above code to make it simpler to read. The real code is slightly more complicated.
[2]On a toy example for a function that doesn't really do anything interesting nor does it perform any IO and is mostly bound by the python VM loop.

JMESPath is an expression language that allows you to manipulate JSON. From selecting specific keys from a hash or only selecting keys based on certain filter criteria, JMESPath gives you a lot of power when working with JSON.

In my experience, the quickest way to get up to speed with a language is to try the language out. The JMESPath tutorial gives you a brief introduction to the language, but to really solidify the concepts you really just need to spend some time experimenting with the language.

You could accomplish this by using one of the existing JMESPath libraries, but there's an easier to way to accomplish this. You can use the JMESPath terminal. Either specify what JSON file to use or pipe the JSON document into the jmespath-terminal command.

The JMESPath terminal README has instructions on getting setup and how to use the JMESPath terminal.

Check it out, and feel free to leave any feedback and suggestions on the issue tracker.

I've just released 0.4.0 of semidbm. This represents a number of really cool features. See the full changelog for more details.

One of the biggest features is python 3 support. I was worried about not introducing a performance regression by supporting python 3. Fortunately, this was not the case.

In fact, performance increased. This was possible for a number of reasons. First, the index file and data file were combined into a single file. This means that a __setitem__ call results in only a single write() call. Also, semidbm now uses a binary format. This results in a more compact form and it's easier to create the sequence of bytes we need to write out to disk. This is also including the fact that semidbm now includes checksum data for each write that occurs.

Try it out for yourself.

What's Next?

I think at this time, semidbm has more than exceeded it's original goal, which was to be a pure python cross platform key value storage that had reasonable performance. So what's next for semidbm? In a nutshell, higher level abstractions (aka the "fun stuff"). Code that builds on the simple key value storage of semidbm.db and provides additional features. And as we get higher level, I think it makes sense to reevaluate the original goals of semidbm and whether or not it makes sense to carry those goals forward:

  • Cross platform. I'm inclined to not support windows for these higher level abstractions.
  • Pure python. I think the big reason for remaining pure python was for ease of installation. Especially on windows, pip installing a package should just work. With C extensions, this becomes much harder on windows. If semidbm isn't going to support windows for these higher level abstractions, then C extensions are fair game.

Some ideas I've been considering:

  • A C version of _Semidbm.
  • A dict like interface that is concurrent (possibly single writer multiple reader).
  • A sorted version of semidbm (supporting things like range queries).
  • Caching reads (need an efficient LRU cache).
  • Automatic background compaction of data file.
  • Batched writes
  • Transactions
  • Compression (I played around with this earlier. Zlib turned out to be too slow for the smaller sized values (~100 bytes) but it might be worth being able to configure this on a per db basis.

Now that fakeredis 0.3.0 is out I think it's a good time to discuss the finer points of fakeredis, and why you should consider using it for your redis unit testing needs.

What exactly is fakeredis? Other than the pedantic naming of "fake" instead of "mock", it is an in memory implementation of the redis client used for python. This allows you to write tests that use the redis-py client interface without having to have redis running.

Setting up redis is not hard, even compiling from source is easy; there's not even a ./configure step! But unit tests should require no configuration to run. Someone should be able to checkout/clone the repo, and be able to run your unit tests.

There's one big problem with writing fakes:

How do you know your fake implementation matches the real implementation?

Fakeredis verifies this in a simple way. First, there's unit tests for fakeredis. And for every unit test for fakeredis, there's the equivalent integration test that actually talks to a real redis server. This ensures that every single test for fakeredis has the exact same behavior as real redis. There's nothing worse than writing unit tests against a fake implementation only to find out that the real implementation is actually different!

In fakeredis, this is implemented with a factory method pattern. The fakeredis tests instantiate a fakeredis.FakeRedis class while the real redis integration tests instantiate a redis.Redis instance:

class TestFakeRedis(unittest.TestCase):
    def setUp(self):
        self.redis = self.create_redis()

    def create_redis(self, db=0):
        return fakeredis.FakeStrictRedis(db=db)

    def test_set_then_get(self):
        self.assertEqual(self.redis.set('foo', 'bar'), True)
        self.assertEqual(self.redis.get('foo'), 'bar')

class TestRealRedis(TestFakeRedis):
    def create_redis(self, db=0):
        return redis.Redis('localhost', port=6379, db=db)

Now every test written in the TestFakeRedis class will be automatically run against both a FakeRedis instance and a Redis instance, ensuring parity between the two.

This also makes it easier for contributors. If they notice an inconsistency between fakeredis and redis, they only need to write a single test and they'll have a simple repro that shows that the test passes for redis but fails against FakeRedis.

And finally test coverage. Every single implemented command in fakeredis has test cases. I only accept contributions for bug fixes/new features if they have tests. I normally don't worry about actual coverage numbers, but out of curiosity I checked what those numbers actually were:

$ coverage report fakeredis.py
Name        Stmts   Miss  Cover
fakeredis     640     19    97%

Not bad. Most of the missing lines are either unimplemented commands (pass statements counted as missing coverage) or precondition checks such as:

def zadd(self, name, *args, **kwargs):
   # ...
   if len(args) % 2 != 0:
       raise redis.RedisError("ZADD requires an equal number of "
                              "values and scores")

I do plan on going through and ensuring that all of these precondition checks have tests.

So the next time you're looking for a fake implementation of redis, consider fakeredis.

A new release of fakeredis is out. This 0.3.0 release adds:

  • Support for redis 2.6.
  • Improved support for pipelines/watch/multi/exec.
  • Full support for variadic commands.
  • Better consistency with the actual behavior of redis.

And of course, a handful of bug fixes. This release was tested against:

  • redis 2.6.4
  • redis-py 2.6.2
  • python 2.7.3, 2.6

You can install fakeredis via pip install fakeredis. Also check out: