My first (and still favorite) mock library for Python has been Michael Foord's voidspace mock. It's an excellent library, and for me it feels like the most pythonic take on mocking. I'd like to go over the code and examine one of the functions provided in the API, namely the patch function. It's a pretty interesting decorator whose implementation is worth looking at. But of course, first a detour on how I got to examining the implementation of patch (skip this part if you only care about the examination of patch itself):

So anyways, we've been using version 0.4.0 of the library for years now (I know, we're horribly out of date). Looking at the awesome changelog, I saw so many features that looked like they would help us write tests that I made an attempt at upgrading from 0.4.0 to 0.6.0.

My hopes of 0.6.0 being a drop-in replacement for 0.4.0 were quickly dispelled. There was no chance of this happening. This was frustrating, but understandable. It clearly states on the main docs page that:

The current version is 0.6.0, dated 23rd August 2009. Mock is still experimental; the API may change. If you find bugs or have suggestions for improvements / extensions then please email me.

Not to mention that the pre-1.0 version numbers are a clear indicator as well that the project is not to be considered stable. So I started looking at what sort of backwards-incompatible changes were made, and what it would take to update to 0.6.0. At this point, I figured I might as well sync up with svn and just use the latest version in trunk (magic method support is highly desirable for us). Most of the changes were pretty straightforward: reset() has been renamed to reset_mock(), side_effect works a little differently, but ultimately, nothing too surprising.
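The rename, for example, was a mechanical change in each test (a minimal sketch):

from mock import Mock

m = Mock()
m.some_method(42)   # record a call

# 0.4.0:
# m.reset()

# 0.6.0 and later (same behavior, new name):
m.reset_mock()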

However...

There was one change I just couldn't wrap my head around, one change that didn't make a whole lot of sense: the ordering of patch decorators was reversed! In other words:

@patch('a')
@patch('b')
@patch('c')
def test(self, a_mock, b_mock, c_mock): pass

now becomes:

@patch('a')
@patch('b')
@patch('c')
def test(self, c_mock, b_mock, a_mock): pass

Let's see. This didn't add or enable any new feature, and (as far as my limited knowledge goes) there is no convention dictating the accepted and proper ordering of stacked decorators. So why the change?

Well, curiosity got the best of me, so I thought maybe there's some issue associated with this. The best I could find was this issue, in which someone asks for the ordering to be put back, and the author states that flipping the order back (once again) would be even more confusing, and that the change was the result of a bugfix, that "decorators were being applied incorrectly previously". Ok, I can go with that. If this was the result of a necessary bugfix, I can understand the need for a change.

So at this point, I was no longer interested in the "why the change?" question, but merely "what was the bug?"

My next step was to track down the commit that caused this change. Easy, right? Well, not exactly. Eventually I gave up trying to use SVN to track down the commit (or more correctly, the SVN server gave up on me and kept timing out), so I imported the whole thing to git using git-svn. A few minutes with git bisect showed the commit I was after. Its commit message:

Yeah, that's right, an empty commit message. Now I'm really curious, so I start looking at the diff to see if I can understand what the patch bug was and how it was fixed:

-def _patch(target, attribute, new):
+def _patch(target, attribute, new, methods, spec):

     def patcher(func):
         original = getattr(target, attribute)
         if hasattr(func, 'restore_list'):
             func.restore_list.append((target, attribute, original))
-            func.patch_list.append((target, attribute, new))
+            func.patch_list.append((target, attribute, new, methods, spec))
             return func

-        func.restore_list = [(target, attribute, original)]
-        func.patch_list = [(target, attribute, new)]
+        patch_list = [(target, attribute, new, methods, spec)]
+        restore_list = [(target, attribute, original)]

         def patched(*args, **keywargs):
-            for target, attribute, new in func.patch_list:
+            for target, attribute, new, methods, spec in patch_list:
                 if new is DEFAULT:
-                    new = Mock()
+                    new = Mock(methods=methods, spec=spec)
                     args += (new,)
                 setattr(target, attribute, new)
             try:
                 return func(*args, **keywargs)
             finally:
-                for target, attribute, original in func.restore_list:
+                for target, attribute, original in restore_list:
                     setattr(target, attribute, original)

+        patched.restore_list = restore_list
+        patched.patch_list = patch_list
         patched.__name__ = func.__name__
         patched.compat_co_firstlineno = getattr(func, "compat_co_firstlineno",
                                                 func.func_code.co_firstlineno)

Did you see it? Did you see the bug? Well, neither did I at first. In fact, I stared at it for a while, still not understanding what the bug was or even how this somehow reversed the order of patch decorators. So, now in full detective mode, I sketched out exactly what the code was doing both before and after the fix to fully understand how this all worked.

The Implementation of @patch

For anyone not familiar with how patch works, see the API docs. The basic idea is that whatever object I specify as an argument to patch will be replaced with a mock and passed into the decorated function as an argument. After the function is run, the patched-out object is restored to its original value. You can stack patch decorators on top of each other to patch out multiple things for the duration of the function.
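For example (the module and function names here are made up, but the mechanics are what the docs describe): for the duration of the test, the named object is replaced with a Mock, and that mock is passed in as an extra argument.

from mock import patch

import mymodule

@patch('mymodule.get_data')
def test_get_data(get_data_mock):
    get_data_mock.return_value = [1, 2, 3]
    assert mymodule.get_data() == [1, 2, 3]

test_get_data()
# mymodule.get_data has now been restored to the original function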

This can get confusing when you think about what is going on behind the scenes in a call like this:

@patch('a')
@patch('b')
def f(a_mock, b_mock):
    pass

The first thing to do is to substitute the syntactic equivalent of the decorators without using the decorator syntax:

def f(a_mock, b_mock):
    pass

f = patch('a')(patch('b')(f))

It's now a little more clear what exactly is going on. Still, to be even clearer, here's the call sequence:

r1 = patch('a')
r2 = patch('b')
r3 = r2(f)
r4 = r1(r3)
f = r4

Of course, this all happens behind the scenes. This is what happens at import time (or whenever the function being decorated is encountered). The function hasn't even been invoked yet. What happens when the new f is finally invoked (in this case, r4)? Well, this is implementation-specific. If we assume that each function call above returns a brand new function, then the invocation chain would look something like this (invoking r4 invokes r3, which invokes the original f):

r4(r3(f))

Now this could be arbitrarily large. If we had 10 patches, the invocation would be:

r20(r19(r18(....r11(f)...)))
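To make the nesting concrete, here's a toy decorator of my own (not the mock library's code) that prints as each level "patches" on the way in and "restores" on the way out:

def noisy_patch(name):
    def decorator(func):
        def wrapper(*args, **kwargs):
            print('patching %s' % name)
            try:
                return func(*args, **kwargs)
            finally:
                print('restoring %s' % name)
        return wrapper
    return decorator

@noisy_patch('a')
@noisy_patch('b')
def f():
    print('running f')

f()
# patching a
# patching b
# running f
# restoring b
# restoring a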

Can we do any better, given what we know about patch? Actually, we can. What we ultimately want is for the execution of the original f to occur in an environment where all of the objects referenced in the patches above have been replaced with mocks. If every invocation of r_n first patches the object associated with it before invoking the next function down, r_(n-1), then by the time f is finally invoked, all the necessary patches will have been applied. Similarly, on the way out, if each level restores its patched object before returning control to its caller, then by the time r_n returns, the original environment will have been put back.
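Here's a minimal sketch of that fully nested strategy (again, my own illustration rather than the library's actual code; Mock is just the stand-in object being swapped in):

from mock import Mock

def nested_patch(target, attribute):
    def patcher(func):
        def patched(*args, **kwargs):
            original = getattr(target, attribute)
            new = Mock()
            setattr(target, attribute, new)   # patch before descending
            try:
                # pass the mock down; inner levels append theirs after ours
                return func(*(args + (new,)), **kwargs)
            finally:
                setattr(target, attribute, original)  # restore on the way out
        return patched
    return patcher

Note that with this nested strategy, the outermost decorator's mock naturally lands first in the argument list, which is exactly the old top-down ordering.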

But we don't need all that nesting. Think of an alternative implementation that accomplishes the same thing. One way to do it would be to wait until we actually invoke the original f and then patch all the objects out at once. When f returns, we can restore all the objects in one pass as well.

This is what the implementation of _patch was trying to do. The idea was to tag the function with some metadata containing a list of objects to patch. Each application of the decorator would check for that attribute on the function, and if it existed, it would just append its own patch values to the end of the list and return the same exact function instead of creating a new function. Essentially:

r1 = patch('a')
r2 = patch('b')
r3 = r2(f)
r3 = r1(r3)
f = r3

This becomes more obvious with more patches:

r1 = patch('a')
r2 = patch('b')
r3 = patch('c')
r4 = patch('d')
r5 = r4(f)  # before returning r5: r5.metadata = ['d']
# Subsequent calls will always return r5
r3(r5) # r5.metadata.append('c') --> r5
r2(r5) # r5.metadata.append('b') --> r5
r1(r5) # r5.metadata.append('a') --> r5
f = r5

In reality there would still be return values (an r6 through r8), but I show the calls without them to emphasize that they all just return r5 itself after modifying its state. So by the end of the above sequence, metadata looks like:

metadata = ['d', 'c', 'b', 'a']

And you can now see how, when the patches are applied in order, the argument list maps to:

def f(d_mock, c_mock, b_mock, a_mock): pass

Even though this is what we'd like it to do, this was not what the original implementation was doing. So what was the original implementation doing that caused the args to be passed in reverse order? Well, the key parts of the diff above are these two hunks:

-        func.restore_list = [(target, attribute, original)]
-        func.patch_list = [(target, attribute, new)]
+        patch_list = [(target, attribute, new, methods, spec)]
+        restore_list = [(target, attribute, original)]

and:

+        patched.restore_list = restore_list
+        patched.patch_list = patch_list

In other words, it was tagging the inner function with the metadata instead of tagging the newly created function! What effect did this have? Well, remember that if the passed-in function is not tagged with the attribute, the decorator creates a brand new function that wraps it, and then (because of the bug) tags the passed-in function, not the new wrapper, with the metadata. This maps to something like this:

r1 = patch('a')
r2 = patch('b')
r3 = patch('c')
r4 = patch('d')
# This is important. Note how it's tagging f and NOT r5. This is the bug.
# And it will propagate all the way through.
r5 = r4(f)  # before returning r5: f.metadata = ['d']
r6 = r3(r5) # since f is tagged and not r5, it will create a new function
            # and tag the inner function r5:  r5.metadata = ['c']
r7 = r2(r6) # create a new r7 and do: r6.metadata = ['b']
r8 = r1(r7) # create a new r8 and do: r7.metadata = ['a']
f = r8

So what happens when we execute r8? Well, the wrapper logic is the same as before: it iterates through the list of "metadata" objects it can see (which, because of the bug, is the list tagged on the function it wraps) and patches them out before calling the wrapped function. So r8 will iterate through a single-element list, which patches out 'a'. It will then call r7. r7 will patch out 'b' and then call r6, which patches 'c' and calls r5, which patches 'd' and finally calls the original f. Now you can see how that processing results in the original f function having the signature:

def f(a_mock, b_mock, c_mock, d_mock): pass

Hopefully at this point you understand what the patch bug was. Sure, it happened to work, but this was not the way it was supposed to work.
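To pull the pieces together, here's a simplified, self-contained sketch of the fixed, flat approach from the diff above (I've dropped the methods/spec plumbing for brevity):

from mock import Mock

DEFAULT = object()

def flat_patch(target, attribute, new=DEFAULT):
    def patcher(func):
        original = getattr(target, attribute)
        if hasattr(func, 'patch_list'):
            # func is already a patched wrapper: extend its lists and
            # hand back the very same function -- no new nesting.
            func.restore_list.append((target, attribute, original))
            func.patch_list.append((target, attribute, new))
            return func

        patch_list = [(target, attribute, new)]
        restore_list = [(target, attribute, original)]

        def patched(*args, **kwargs):
            for t, a, n in patch_list:
                if n is DEFAULT:
                    n = Mock()
                    args += (n,)
                setattr(t, a, n)
            try:
                return func(*args, **kwargs)
            finally:
                for t, a, o in restore_list:
                    setattr(t, a, o)

        # The essence of the fix: tag the *new* wrapper, not the function
        # being wrapped.
        patched.restore_list = restore_list
        patched.patch_list = patch_list
        patched.__name__ = func.__name__
        return patched
    return patcher

Stack this three deep and you get a single wrapper whose patch_list is applied in one pass, bottom decorator first, which is exactly the reversed argument ordering described at the top of the post.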

Now, the astute reader will notice that we could have made the bug fix (that is, make it so all the nesting is removed and all the patches are in a single list) and at the same time preserved backwards compatibility. Simply change:

def patcher(func):
    original = getattr(target, attribute)
    if hasattr(func, 'restore_list'):
        func.restore_list.append((target, attribute, original))
        func.patch_list.append((target, attribute, new, methods, spec))

to:

def patcher(func):
    original = getattr(target, attribute)
    if hasattr(func, 'restore_list'):
        func.restore_list.insert(0, (target, attribute, original))
        func.patch_list.insert(0, (target, attribute, new, methods, spec))

This is, in fact, what we ended up doing at work, as we had way too many tests that the new ordering would break (plus, we were already used to the in-order semantics of the old implementation). Unfortunately, carrying a local modification like this could prove troublesome if we intend to pull in updates in the future (which we do).

And that is the voidspace mock bug.


To gain more confidence in production code, tests at varying levels accompany the development of that code. Unittests and integration tests are common, and sometimes suites of functional/acceptance tests are present as well. There's been quite a bit of discussion surrounding how best to write unittests (especially if you're using some xUnit framework variant), and sometimes these best practices even make their way into integration tests. Unfortunately, I still have not seen these best practices make their way into functional tests. In particular, I frequently see functional tests that are very complicated and/or obscure.

Before going on, it's worth voicing why I think complicated and obscure tests are bad. Typically, production code is complicated. The domain is filled with special cases and unintuitive behavior. To make sure the production code works as we expect, we write unittests (which also drive the design, if you use TDD). There's a secret so fundamental to writing unittests, yet it's rarely explicitly called out:

We don't test our tests!

Think about that for a moment. We have complicated production code. We write tests to ensure that the production code works as expected. But what tests the tests? If you have really complicated test logic, are you really better off for having written tests? Testcases are only effective if we can assume they are correct, in which case they are correctly asserting things about the system under test. If testcases are incorrect, that is, if they assert incorrect things about the system, then the validity of the system under test is unknown, and you're arguably worse off than before (is a test failure really a failure?).

So what's the solution? Well, test the tests, of course! And what if those tests are still complicated? Well, we'll test those as well! In fact, it's tests all the way down. In order to avoid this recursion, we have to set a practical limitation on testcases: they must be simple enough to not require tests of their own. In practice, this typically means two things (illustrated in the sketch after the list):

  • No conditionals
  • Small in length
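To illustrate, with a made-up total() function as the system under test, compare a test that embeds its own logic with straight-line equivalents:

import unittest

# A made-up system under test, just enough to run the example.
def total(prices):
    return sum(prices)

class TotalTest(unittest.TestCase):
    # What NOT to do: the test itself branches, so the test has logic
    # of its own that could be wrong -- and nothing tests the test.
    def test_total_with_logic(self):
        for prices in ([], [5, 10]):
            if prices:
                self.assertEqual(total(prices), sum(prices))
            else:
                self.assertEqual(total(prices), 0)

    # Straight-line alternatives: fixed inputs, one assertion each,
    # simple enough that they don't need tests of their own.
    def test_total_of_two_items(self):
        self.assertEqual(total([5, 10]), 15)

    def test_total_of_no_items(self):
        self.assertEqual(total([]), 0)

if __name__ == '__main__':
    unittest.main()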

Now, how to avoid these two things (or how to replace them with better alternatives) is a topic of its own. What I'm interested in for this post is why the trend of simple and small tests still has not made its way into the realm of functional testing.

If you look at any (reasonable) code base's unittests, they're typically not so bad. They're short, they're small, and you can read a test and more or less understand what it's testing. If the code base contains functional tests (and by functional tests, I basically mean any test that simulates interacting with the system in a similar way to how a user would), you'll typically see these characteristics instead:

  • Long
  • Complicated logic
  • Hard to understand
  • Fragile (failures aren't always real failures)
  • Terrible defect localization

Why is this? Is it because functional testing is still progressing towards what unittests currently are? Is it because of a lack of consensus that functional testing should have the same characteristics of unittests? Is it because writing small tests with no conditionals is hard to do at such a high level?

My take is that writing good functional tests is much harder than writing unittests, and because of this, best practices are typically ignored. Fortunately, I think this situation can be remedied, and in the next post, I'll show several things you can do to achieve more succinct tests.