MY PYTHON CODE ISN'T WORKING!! We've all been there right? This is a series where I'll share miscellaneous tips I've learned for troubleshooting python code. This is aimed at people who are relatively new to python. In this first series, I'd like to cover one of those common things you'll run into: the traceback.
Many times an error in python code is accompanied by a traceback. If you want to get really good at troubleshooting python programs, you'll need to become really comfortable with reading a traceback. You should be able to look at a traceback and have a general idea of what's happening in the traceback. One of the things I always notice when working with people new to python is how puzzled they look when they first see tracebacks.
So let's work through an example. Consider this script:
import httplib2 def a(): b() def b(): c() def c(): d() def d(): h = httplib2.Http() h.request(uri=None) a()
When this script is run we get this traceback:
Traceback (most recent call last): File "issue.py", line 19, in <module> a() File "issue.py", line 5, in a b() File "issue.py", line 9, in b c() File "issue.py", line 12, in c d() File "issue.py", line 16, in d h.request(uri=None) File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 1394, in request (scheme, authority, request_uri, defrag_uri) = urlnorm(uri) File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 206, in urlnorm (scheme, authority, path, query, fragment) = parse_uri(uri) File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 202, in parse_uri groups = URI.match(uri).groups() TypeError: expected string or buffer
While this can look intimidating at first, there's a few basic things to remember when reading a traceback:
So let's see how we can apply these steps to the traceback above. First, let's use the first item: the stack frames go from oldest frame at the beginning of the output to the newest frame at the bottom. To be absolutely clear, in the above code, the call chain is: a() -> b() -> c() -> d() -> httplib2.Http.request. The oldest stack frame is associated with the a() function call (it's the call the triggered all the remaining calls), and the newest stack frame is for httplib2.Http.request (it's the call that actually triggered the exception being raised). Conceptually, you think of a python traceback as growing downwards, any time something is pushed onto the stack, it is appended to the output. And when something is popped off the stack, its output is removed from the end of the stack.
Now let's apply the second item: pay attention to the filenames associated with each level of the traceback. Right off the bat we can see there are two main modules involved in this interaction. There's the issue module, which looks like this in the traceback:
File "issue.py", line 19, in <module> a() File "issue.py", line 5, in a b()
and there's httplib2, which looks like this in the traceback:
File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 1394, in request (scheme, authority, request_uri, defrag_uri) = urlnorm(uri)
There's a few important observations:
We know that an error occurs in a 3rd party library because the location of this library is under the "site-packages" directory. As a rule of thumb, if something is under the "site-packages" directory it's a third party module (i.e. not something in the python standard library). This is typically where packages installed the pip are replaced (e.g. pip install httplib2).
The second item also says to pay attention to where the frames jump across modules or package "types." In this traceback we can see that we jump across modules and packages "types" here:
File "issue.py", line 16, in d h.request(uri=None) File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 1394, in request (scheme, authority, request_uri, defrag_uri) = urlnorm(uri)
In these four lines we can see that we jump from issue.py to httplib2. By jumping across package "types", I simply mean where we jump from our modules/packages to either standard library packages or 3rd party packages. From the four lines shown above we can see that by calling h.request() we jump into the httplib2 module.
Now let's apply the third item: Read the bottom most line to read the actual exception message. In our example, the actual exception that's raised is:
TypeError: expected string or buffer
Admittedly, not the most helpful error message. If we look at the line before this line, we can see the actual line that caused this TypeError:
groups = URI.match(uri).groups()
The two most likely things to cause a TypeError would be a call to match() or a call to groups(). Noticing that uri arg is seen at multiple frames in the traceback, our first guess would be that the value of uri is causing a TypeError. If we go bottom up until we don't see the uri param mentioned, we can see that it's first mentioned here:
File "issue.py", line 16, in d h.request(uri=None) File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 1394, in request (scheme, authority, request_uri, defrag_uri) = urlnorm(uri)
Given that the h.request(uri=None) comes from our code, this is probably the first place we should look.
It turns out that the uri parameter needs to be a string:
h = httplib2.Http() response = h.request(uri='http://www.google.com')
Now, it doesn't always work out as nicely as this, but having a basic example helps to serve as a basis for further debugging techniques.
The first question one might ask is:
Another persistent key value store, really?
It all started when I was working on a project where I needed a simple key value store, accessible from python. Technically, I was using the shelve module, and it decided to use the Berkeley DB (via anydbm). So far so good. But there were a few issues:
The first issue is regarding availability. Not all python installations are the same. Just because a user has python installed does not mean they necessarily have all the standard libraries installed. I just checked my python install on my Macbook, and I don't have the bsddb module available. On my debian system I don't have the gdbm module installed. Given that these packages are just python bindings to C based dbms, installing these packages involves:
None of these steps are that much work, but are there any alternatives?
On all of my systems I have the dbm module available. This is a C based DBM that seems to available on most python installations. How fast is it? There's a scripts/benchmark script available in the semidbm repo that can benchmark any dbm like module. Here's the results for the dbm module:
$ scripts/benchmark -d dbm Generating random data. Benchmarking: <module 'dbm' from '/Users/jsaryer/.virtualenvs/semidbm/lib/python2.7/lib-dynload/dbm.so'> num_keys : 1000000 key_size : 16 value_size: 100 HASH: Out of overflow pages. Increase page size ERROR: exception caught when benchmarking <module 'dbm' from '/Users/jsaryer/.virtualenvs/semidbm/lib/python2.7/lib-dynload/dbm.so'>: cannot add item to database
Or in other words, it made it to about 450000 keys before this error was generated. So storing a large number of keys doesn't seem possible with python's dbm module.
While some dbms that aren't available simply require compiling/installing the right packages and files, there are some dbms that just aren't available on certain platforms (notoriously windows).
Well fortunately, there's a fallback python module that's guaranteed to be available on every single python installation: dumbdbm.
Unfortunately, the performance is terrible. There's also a number of undesirable qualities:
To be fair, dumbdbm was most likely written as a last resort fallback to the more classic dbms. It's also really old (written by Guido himself if I remember correctly).
Hopefully the goals of semidbm are becoming clearer. I just wanted a dbm that was:
The first two points I felt I could achieve by simply using python, and not requiring any C libraries or C extensions.
The third point I felt I could improve by taking dumbdbm and making some minor improvements.
So that's the background of semidbm.
I think so. The benchmark page has more details regarding the performance, but as a quick comparison to semidbm:
$ scripts/benchmark -d semidbm -n 10000 Generating random data. Benchmarking: <module 'semidbm'> num_keys : 10000 key_size : 16 value_size: 100 fill_sequential : time: 0.126, micros/ops: 12.597, ops/s: 79382.850, MB/s: 8.782 read_hot : time: 0.041, micros/ops: 4.115, ops/s: 243036.754, MB/s: 26.886 read_sequential : time: 0.039, micros/ops: 3.861, ops/s: 258973.197, MB/s: 28.649 read_random : time: 0.042, micros/ops: 4.181, ops/s: 239171.571, MB/s: 26.459 delete_sequential : time: 0.058, micros/ops: 5.819, ops/s: 171856.854, MB/s: 19.012 $ scripts/benchmark -d dumbdbm -n 10000 Generating random data. Benchmarking: <module 'dumbdbm'> num_keys : 10000 key_size : 16 value_size: 100 fill_sequential : time: 1.824, micros/ops: 182.400, ops/s: 5482.447, MB/s: 0.607 read_hot : time: 0.165, micros/ops: 16.543, ops/s: 60450.332, MB/s: 6.687 read_sequential : time: 0.167, micros/ops: 16.733, ops/s: 59762.818, MB/s: 6.611 read_random : time: 0.175, micros/ops: 17.505, ops/s: 57126.529, MB/s: 6.320 delete_sequential : time: 99.025, micros/ops: 9902.522, ops/s: 100.984, MB/s: 0.011
From the output above, writes are an order of magnitude faster (and semidbm computes and writes out a checksum for every value) and reads are almost 4 times faster. Deletion performance is much better (0.058 seconds vs. 99.025 seconds for deleting 10000 keys).
Also, every single insertion/update/deletion is immediately written out to disk so if python crashes, at worst you'd lose one key, the key that was being writen out to disk when python crashed.
I think if you ever need to use a pure python dbm, semidbm is a great choice. Any time you'd otherwise have to use dumbdbm, use semidbm instead.
There's a number of things I'd like to investigate in the future:
My first (and still favorite) mock library for python has been Michael Foord's voidspace mock. It's an excellent library, and for me feels like the most pythonic take on mocking. I'd like to go over the code and examine one of the functions provided in the API, namely the patch function. It's a pretty interesting decorator whose implementation is worth looking at. But of course, first a detour on how I got to examining the implementation of patch (skip this part if you don't care and are only interested in the examination of patch):
So anyways, we've been using version 0.4.0 of the library for years now (I know, we're horribly out of date). Looking at the awesome changelog, there were so many features that looked like they would help us write tests that I made the attempt at upgrading from 0.4.0 to 0.6.0.
My hopes of 0.6.0 being a drop in replacement for 0.4.0 were quickly dispelled. There was no chance of this happening. This was frustrating, but understandable. It clearly states on the main docs page that:
The current version is 0.6.0, dated 23rd August 2009. Mock is still experimental; the API may change. If you find bugs or have suggestions for improvements / extensions then please email me.
Not to mention the pre 1.0 version numbers are a clear indicator as well that the project is not to be considered stable. So I start looking at what sort of backwards incompatible changes were made, and what it would take to update to 0.6.0. At this point, I figured I might as well sync up with svn and just use the latest version in trunk (magic method support is highly desirable for us). So most of the changes were pretty straightforward, reset() has been renamed to reset_mock(), side_effect works a little differently, but ultimately, nothing too surprising.
There was one change I just couldn't wrap my head around. One change that didn't make a whole lot of sense. The ordering of patch decorators was reversed! In other words:
@patch('a') @patch('b') @patch('c') def test(self, a_mock, b_mock, c_mock): pass
@patch('a') @patch('b') @patch('c') def test(self, c_mock, b_mock, a_mock): pass
Let's see. This didn't add/enable any new feature. There's nothing (as far as my limited knowledge goes) about the accepted and proper ordering of stacked decorators. So why the change?
Well curiosity got the best of me, so I thought maybe there's some issue associated with this. The best i could find was this issue which is someone asking for the ordering to be put back, and for the author stating to flip the order back (once again) would be even more confusing, and that this was the result of a bugfix, that "decorators were being applied incorrectly previously". Ok, I can go with that. If this was the result of a necessary bugfix, I can understand the need for a change.
So at this point, I wasn't interested in the "why the change" question, but merely "what was the bug?" Curiosity got the best of me.
My next step was to track down the commit that caused this bug. Easy right? Well, not exactly. Eventually I gave up trying to use SVN to track down the commit (or more correctly the SVN server gave up on me and kept timing out), so I imported the whole thing to git using git-svn. A few minutes with git bisect showed the commit I was after. It's commit message:
Yeah, that's right, an empty commit message. Now I'm really curious so I start looking at the diff to see if I can understand what the patch bug was and how it was fixed:
-def _patch(target, attribute, new): +def _patch(target, attribute, new, methods, spec): def patcher(func): original = getattr(target, attribute) if hasattr(func, 'restore_list'): func.restore_list.append((target, attribute, original)) - func.patch_list.append((target, attribute, new)) + func.patch_list.append((target, attribute, new, methods, spec)) return func - func.restore_list = [(target, attribute, original)] - func.patch_list = [(target, attribute, new)] + patch_list = [(target, attribute, new, methods, spec)] + restore_list = [(target, attribute, original)] def patched(*args, **keywargs): - for target, attribute, new in func.patch_list: + for target, attribute, new, methods, spec in patch_list: if new is DEFAULT: - new = Mock() + new = Mock(methods=methods, spec=spec) args += (new,) setattr(target, attribute, new) try: return func(*args, **keywargs) finally: - for target, attribute, original in func.restore_list: + for target, attribute, original in restore_list: setattr(target, attribute, original) + patched.restore_list = restore_list + patched.patch_list = patch_list patched.__name__ = func.__name__ patched.compat_co_firstlineno = getattr(func, "compat_co_firstlineno", func.func_code.co_firstlineno)
Did you see it? Did you see the bug? Well neither did I at first. In fact, I stared at it for a while still not understanding what the bug was or even how this somehow reversed the order of patch decorators. So now in full detective mode, I sketched exactly what the code was doing both pre patch and post patch to fully understand how this all worked.
For anyone not familiar with how patch works, see the API docs. The basic idea is that whatever object I specify as an argument to patch will replaced with a mock and passed into the decorated function as an argument. After the function is run, it will replace the patched out object with it's original value. You can stack patch decorators on top of each other to patch multiple things out for the duration of the function.
This can get confusing when you think about what is going on behind the scenes in a call like this:
@patch('a') @patch('b') def f(a_mock, b_mock): pass
The first thing to do is to substitute the syntactic equivalent of the decorators without using the decorator synax:
def f(a_mock, b_mock): pass f = patch('a')(patch('b')(f))
It's now a little more clear what exactly is going on. Still to be even more clear, here's the call sequence:
r1 = patch('a') r2 = patch('b') r3 = r2(f) r4 = r1(r3) f = r4
Of course, this all happens behind the scenes. This is what happens at import time (or whenever the function being decorated is encountered). The function hasn't even been invoked yet. What happens when the new f is finally invoked (in this case, r4)? Well, this is implementation specific. If we assume that each function call above returns a brand new function, then it would look something like this:
Now this could be arbitrarily large. If we had we 10 patches, the invocation would be:
Can we do any better given what we know about patch? Actually we can. What we ultimately want is for the execution of the original f to occur in an environment where all of the objects referenced in the patches above have been replaced with mocks. If every invocation of r_n first patches the object associated with itself before invoking the next r_(n-1) function, by the time the f is finally invoked, the necessary patches will have already occurred. Similarly when the function returns, if we restore the patched object before returning control to the calling function, by the time r_n returns, the original environment will have been put back.
But we don't need all that nesting. Think of an alternative implementation that accomplishes the same thing. One way to do it would be to wait until we actually invoke the original f and then patch all the objects out at once. When f returns, we can restore all the objects in one swoop as well.
This is what the implementation of _patch was trying to do. The idea was to tag the function with some metadata containing a list of objects to patch. Each application of the decoration would check for that attribute on the function, and if it existed, it would just append its own patched values to the end of the list and return the same exact function instead of creating a new function. Essentially:
r1 = patch('a') r2 = patch('b') r3 = r2(f) r3 = r1(r3) f = r3
This becomes more obvious with more patches:
r1 = patch('a') r2 = patch('b') r3 = patch('c') r4 = patch('d') r5 = r4(f) # before returning r5: r5.metadata = ['d'] # Subsequent calls will always return r5 r3(r5) # r5.metadata.append('c') --> r5 r2(r5) # r5.metadata.append('b') --> r5 r1(r5) # r5.metadata.append('a') --> r5 f = r5
In reality there would still be an r6-r8, but I show them without return values to emphasize they are actually modifying the state of r5. So by the end of the above sequence metadata looks like:
metadata = ['d', 'c', 'b', 'a']
And you can now see how when we apply the patches in order, your argument list will map to:
def f(d_mock, c_mock, b_mock, a_mock): pass
Even though this is what we'd like it to do, this was not what the original implementation was doing. So what was the original implementation doing that caused the args to be passed in reverse order? Well the key part of the diff above are these two parts:
- func.restore_list = [(target, attribute, original)] - func.patch_list = [(target, attribute, new)] + patch_list = [(target, attribute, new, methods, spec)] + restore_list = [(target, attribute, original)]
+ patched.restore_list = restore_list + patched.patch_list = patch_list
In other words, it was tagging the inner function with metadata information instead of tagging the newly created function! What effect did this have? Well, remember that if the function tagged with a certain attribute, it will create a new function that wraps the passed in function and tag the function with metadata. In other words, it creates a brand new function. This maps to something like this:
r1 = patch('a') r2 = patch('b') r3 = patch('c') r4 = patch('d') # This is important. Note how it's tagging f and NOT r5. This is the bug. # And it will propogate all the way through. r5 = r4(f) # before returning r5: f.metadata = ['d'] r6 = r3(r5) # since f is tagged and not r5, it will create a new function # and tag the inner function r5: r5.metadata = ['c'] r7 = r2(r6) # create a new r7 and do: r6.metadata = ['b'] r8 = r1(r7) # create a new r8 and do: r7.metadata = ['a'] f = r8
So what happens when we execute r8? Well, it's the exact same function as before. It will iterate through its list of "metadata" objects, and patch them out before calling the function it wraps. So r8 will iterate through a single element list, which will patch out 'a'. It will then call r7. r7 will patch out 'b' and then call r6, which patches 'c' and calls r5, which patches 'd' and finally calls the original f. Now you can see how that processing results in the original f function having the signature:
def f(a_mock, b_mock, c_mock, d_mock): pass
Hopefully at this point you understand what the patch bug was. Sure it happened to work, but this was not the intended way it was suppose to work.
Now, the astute reader will notice that we could have made the bug fix (that is, make it so all the nesting is removed and all the patches are in a single list) and at the same time preserve backwards compatability. Simply change:
def patcher(func): original = getattr(target, attribute) if hasattr(func, 'restore_list'): func.restore_list.append((target, attribute, original)) func.patch_list.append((target, attribute, new, methods, spec))
def patcher(func): original = getattr(target, attribute) if hasattr(func, 'restore_list'): func.restore_list.insert(0, (target, attribute, original)) func.patch_list.insert(0, (target, attribute, new, methods, spec))
Unfortunately, this is what we ended up doing at work, as we had way too many tests that this would break. Plus, we were already used to the in order semantics of the old implementation. This could be potentially troublesome if we intend to get updates in the future (which we do).
And that, is the voidspace mock bug.
To gain more confidence in production code, tests at varying levels accompany the development of such code. Unittests and integration tests are common, and sometimes suites of functional/acceptance tests are present as well. There's been quite a bit of discussion surrounding how to best write unittests (especially if you're using some xUnit framework variant), and sometimes these best practices even make their way into integration tests. Unfortunately, I still have not seen these best practices make their way into functional tests. In particular, I frequently see functional tests that are either very complicated, and/or obscure.
Before going on, it's worth voicing why I think complicated and obscure tests are bad. Typically production code is complicated. The domain is filled with special cases and unintuitive behavior. In order to help write production code, unittests are written to help make sure the production code works as we expect (in addition to driving the design if you use TDD). There's a secret so fundamental to writing unittests, yet it's rarely explicitly called out:
Think about that for a moment. We have complicated production code. We write tests to ensure that the production code works as expected. What tests the test? Well if you have really complicated test logic, then are you really better off because you've written tests? Testcases are only effective if we can assume they are correct, in which case they are correctly asserting things about the system under test. If testcases are incorrect, that is, they are asserting incorrect things about the system, then the validity of the system under test is unknown, and you're arguably worse off than before (is a test fail really a fail?).
So what's the solution for this? Well, test the tests of course! And what if those tests are still complicated? Well, we'll test those as well! In fact it's tests all the way down. In order to avoid this recursion, we have to set a practical limitation to testcases: they must be simple enough to not require tests. In practice, this typically means two things:
Now, how to avoid these two things (or how to replace these two things with better alternatives) is the topic on it's own. What I'm interested in for this post is why the trend of simple and small tests has still not made it's way into the realm of functional testing.
If you look at any (reasonable) code base's unittests, they're typically not so bad. They're short, they're small, you can read a test and more or less understand what it's testing. If this code base contains functional tests (and by functional tests, I basically mean any test that simulates interacting with the system in similar way to how a user would interact with the system), you'll typically see these characteristics instead:
Why is this? Is it because functional testing is still progressing towards what unittests currently are? Is it because of a lack of consensus that functional testing should have the same characteristics of unittests? Is it because writing small tests with no conditionals is hard to do at such a high level?
My take on this is that I believe writing good functional tests is much harder than writing unittests, and because of this, best practices are typically ignored. Fortunately, I think this situation can be remedied, and in the next post, I'll show several things you can do to achieve more succinct tests.