Troubleshooting Python Code

Wed 21 March 2012 by James Saryerwinnie

MY PYTHON CODE ISN'T WORKING!! We've all been there, right? This is a series where I'll share miscellaneous tips I've learned for troubleshooting Python code. It's aimed at people who are relatively new to Python. In this first post, I'd like to cover one of the most common things you'll run into: the traceback.

Reading Python Tracebacks

Many times an error in Python code is accompanied by a traceback. If you want to get really good at troubleshooting Python programs, you'll need to become comfortable reading one. You should be able to look at a traceback and get a general idea of what happened. One of the things I always notice when working with people new to Python is how puzzled they look when they first see a traceback.

So let's work through an example. Consider this script:

import httplib2


def a():
    b()


def b():
    c()

def c():
    d()

def d():
    h = httplib2.Http()
    h.request(uri=None)


a()

When this script is run we get this traceback:

Traceback (most recent call last):
  File "issue.py", line 19, in <module>
    a()
  File "issue.py", line 5, in a
    b()
  File "issue.py", line 9, in b
    c()
  File "issue.py", line 12, in c
    d()
  File "issue.py", line 16, in d
    h.request(uri=None)
  File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 1394, in request
    (scheme, authority, request_uri, defrag_uri) = urlnorm(uri)
  File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 206, in urlnorm
    (scheme, authority, path, query, fragment) = parse_uri(uri)
  File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 202, in parse_uri
    groups = URI.match(uri).groups()
TypeError: expected string or buffer

While this can look intimidating at first, there are a few basic things to remember when reading a traceback:

  • The oldest frame in the stack is at the top, and the newest frame is at the bottom. This means that the bottom of the traceback output is where the uncaught exception was originally raised. This is the opposite of other languages such as Java and C/C++, where the first line shows the newest frame (the frame where the uncaught exception originated).
  • Pay attention to the filenames associated with each level of the traceback, and pay attention to where the frames jump across modules and package "types" (more on this later).
  • Read the bottommost line to get the actual exception message.
  • Above all, remember that the traceback alone may not be sufficient to understand what went wrong.

So let's see how we can apply these steps to the traceback above. First, let's use the first item: the stack frames go from the oldest frame at the top of the output to the newest frame at the bottom. To be absolutely clear, in the above code, the call chain is: a() -> b() -> c() -> d() -> httplib2.Http.request. The oldest stack frame is associated with the a() function call (it's the call that triggered all the remaining calls), and the newest stack frames are the ones inside httplib2, starting at httplib2.Http.request and ending in parse_uri (the call where the exception was actually raised). Conceptually, you can think of a Python traceback as growing downwards: any time something is pushed onto the stack, it is appended to the output, and when something is popped off the stack, its entry is removed from the end of the output.
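To see this ordering for yourself, here's a minimal sketch. It assumes Python 3, and the functions (including the frame_names helper) are made up for illustration; they aren't part of the script above. It catches an exception and lists the function names in the same order the traceback would print them.

```python
import traceback


def a():
    b()


def b():
    c()


def c():
    raise ValueError("boom")


def frame_names():
    """Return the function names in the traceback, in printed order."""
    try:
        a()
    except ValueError as exc:
        # extract_tb returns the frames in the order they are printed:
        # oldest call first, newest call last.
        frames = traceback.extract_tb(exc.__traceback__)
        return [frame.name for frame in frames]


print(frame_names())  # ['frame_names', 'a', 'b', 'c']
```

The list reads oldest to newest: frame_names is where the exception was caught, and c is where it was raised.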

Now let's apply the second item: pay attention to the filenames associated with each level of the traceback. Right off the bat we can see there are two main modules involved in this interaction. There's the issue module, which looks like this in the traceback:

File "issue.py", line 19, in <module>
  a()
File "issue.py", line 5, in a
  b()

and there's httplib2, which looks like this in the traceback:

File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 1394, in request
  (scheme, authority, request_uri, defrag_uri) = urlnorm(uri)

There are a few important observations:

  • The length of the filenames. In this case issue.py appears as a short relative path, which suggests that it was run from our current working directory, while the httplib2 frames show a long absolute path.
  • The error actually occurs in a 3rd party library (the last three frames of the traceback).

We know that the error occurs in a 3rd party library because the location of this library is under the "site-packages" directory. As a rule of thumb, if something is under the "site-packages" directory it's a third party module (i.e. not something in the Python standard library). This is typically where packages installed with pip are placed (e.g. pip install httplib2).
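As a rough sketch, the rule of thumb can be written as a tiny helper. The is_third_party name is my own invention, and the check is deliberately simplistic: it only looks for a "site-packages" path component, so setups that use a different directory name (Debian's dist-packages, for example) would be missed.

```python
def is_third_party(filename):
    """Rule of thumb only: treat any path with a site-packages
    component as third party code."""
    parts = filename.replace("\\", "/").split("/")
    return "site-packages" in parts


# The two kinds of filenames seen in the traceback above.
print(is_third_party("issue.py"))  # False
print(is_third_party(
    "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/"
    "site-packages/httplib2/__init__.py"))  # True
```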

The second item also says to pay attention to where the frames jump across modules or package "types." In this traceback we can see that we jump across modules and package "types" here:

File "issue.py", line 16, in d
  h.request(uri=None)
File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 1394, in request
  (scheme, authority, request_uri, defrag_uri) = urlnorm(uri)

In these four lines we can see that we jump from issue.py to httplib2. By jumping across package "types", I simply mean where we jump from our modules/packages to either standard library packages or 3rd party packages. From the four lines shown above we can see that by calling h.request() we jump into the httplib2 module.
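If you wanted to find that jump mechanically, one possible sketch looks like this. The FRAMES list is copied by hand from the traceback above, and find_boundaries is a hypothetical helper that simply compares each frame with the one after it using the "site-packages" rule of thumb.

```python
# Frames copied from the traceback above as (filename, function) pairs.
HTTPLIB2 = ("/Users/jsaryer/.virtualenvs/90a/lib/python2.7/"
            "site-packages/httplib2/__init__.py")
FRAMES = [
    ("issue.py", "<module>"),
    ("issue.py", "a"),
    ("issue.py", "b"),
    ("issue.py", "c"),
    ("issue.py", "d"),
    (HTTPLIB2, "request"),
    (HTTPLIB2, "urlnorm"),
    (HTTPLIB2, "parse_uri"),
]


def find_boundaries(frames):
    """Return (caller, callee) function names wherever the traceback
    crosses between our code and site-packages code."""
    crossings = []
    for caller, callee in zip(frames, frames[1:]):
        caller_is_lib = "site-packages" in caller[0]
        callee_is_lib = "site-packages" in callee[0]
        if caller_is_lib != callee_is_lib:
            crossings.append((caller[1], callee[1]))
    return crossings


print(find_boundaries(FRAMES))  # [('d', 'request')]
```

The single crossing, from d into request, is the same spot we picked out by eye.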

Now let's apply the third item: read the bottommost line to get the actual exception message. In our example, the actual exception that's raised is:

TypeError: expected string or buffer

Admittedly, not the most helpful error message. If we look at the line before this line, we can see the actual line that caused this TypeError:

groups = URI.match(uri).groups()

The two most likely things to cause a TypeError would be the call to match() or the call to groups(). Noticing that the uri argument appears in multiple frames of the traceback, our first guess would be that the value of uri is causing the TypeError. If we work from the bottom up until uri is no longer mentioned, we can see that it first appears here:

File "issue.py", line 16, in d
  h.request(uri=None)
File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 1394, in request
  (scheme, authority, request_uri, defrag_uri) = urlnorm(uri)

Given that the h.request(uri=None) call comes from our code, this is probably the first place we should look.
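To check the guess without digging through httplib2, here's a small sketch that mimics the failing line. The URI pattern is a made-up stand-in (the real httplib2 pattern isn't shown here), and last_line is a hypothetical helper that returns just the final line of the traceback. It assumes Python 3, where the wording of the TypeError message differs from the Python 2.7 output above.

```python
import re
import traceback

# Hypothetical stand-in for httplib2's URI pattern.
URI = re.compile(r"^(\w+)://(.*)$")


def parse_uri(uri):
    # Same shape as the failing line: match() first, then groups().
    return URI.match(uri).groups()


def last_line(uri):
    """Return the final traceback line from parse_uri(uri), or None."""
    try:
        parse_uri(uri)
    except Exception as exc:
        return traceback.format_exception_only(type(exc), exc)[-1].strip()
    return None


print(last_line(None))                     # a TypeError raised by match()
print(last_line("no scheme here"))         # an AttributeError raised by .groups()
print(last_line("http://www.google.com"))  # None
```

Passing None makes match() itself raise the TypeError, while a string that doesn't match fails differently (match() returns None, so .groups() raises an AttributeError). That supports the idea that the None value of uri is the culprit.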

It turns out that the uri parameter needs to be a string:

h = httplib2.Http()
response = h.request(uri='http://www.google.com')

Now, it doesn't always work out as nicely as this, but a basic example like this one serves as a foundation for further debugging techniques.

