<p>jamesls: a weblog, http://jamesls.com/</p>
<p>Writing Redis in Python with asyncio 2: Shared State. James Saryerwinnie, 2016-09-20.</p>
<p>I've been writing redis in python using asyncio. I started this project
because I wanted to learn more about asyncio and thought that porting
an existing project to asyncio would give me an excellent opportunity to learn
about this new library.</p>
<p><a class="reference external" href="http://jamesls.com/writing-redis-in-python-with-asyncio-part-1.html">Part 1</a>
of this series covered how to implement a basic request/response for redis
using asyncio. It covered how to use protocols, how they hooked into asyncio,
and how you could parse and serialize requests and responses.
Since part 1 was first published over a year ago (I know...), a few things have
happened:</p>
<ol class="arabic simple">
<li>Python 3.5 added <tt class="docutils literal">async</tt> and <tt class="docutils literal">await</tt> keywords which changed the
recommended way for working with coroutines.</li>
<li>I had the opportunity to speak about this topic at EuroPython 2016.</li>
</ol>
<p>The slides for my talk are available <a class="reference external" href="https://speakerdeck.com/jamesls/writing-redis-in-python-with-asyncio">on Speaker Deck</a>.</p>
<p>You can also check out the talk <a class="reference external" href="https://www.youtube.com/watch?v=CF8zt8l6SeI">here</a>.</p>
<p>If you've read <a class="reference external" href="http://jamesls.com/writing-redis-in-python-with-asyncio-part-1.html">Part 1</a>
of this series, note that the EuroPython talk went beyond it and covered several additional topics:</p>
<ul class="simple">
<li>PUBLISH/SUBSCRIBE</li>
<li>BLPOP/BRPOP (blocking queues)</li>
</ul>
<p>Covering BLPOP/BRPOP also required a quick detour into <tt class="docutils literal">async</tt> and <tt class="docutils literal">await</tt>.</p>
<p>In the next series of posts, I wanted to discuss these topics in more detail,
and cover some of the additional topics that I omitted from my talk due to
time.</p>
<p>For the remainder of this post, we're going to look at how to implement
the <tt class="docutils literal">PUBLISH</tt> and <tt class="docutils literal">SUBSCRIBE</tt> commands in redis using asyncio.</p>
<div class="section" id="publish-subscribe">
<h2>Publish/Subscribe</h2>
<p>I'm assuming you're familiar with redis, but here's a quick reminder
of the PUBSUB feature in redis, and what we're shooting for in this post:</p>
<div class="figure">
<object data="http://jamesls.com/images/redis-async/pubsub.svg" style="width: 100%;" type="image/svg+xml">
</object>
</div>
<p>And here's a video of this in action:</p>
<script type="text/javascript" src="https://asciinema.org/a/3fv15a3oalrfbtlyckb1ngllj.js" id="asciicast-3fv15a3oalrfbtlyckb1ngllj" async></script><p>In the video, you see two clients <tt class="docutils literal">SUBSCRIBE</tt> to a channel. Those clients
will then block until another client comes along and issues a <tt class="docutils literal">PUBLISH</tt>
command to that channel. You can see that when the bottom client issues
a <tt class="docutils literal">PUBLISH</tt> command, the two top clients subscribed to the channel receive
the published message.
The <a class="reference external" href="http://redis.io/topics/pubsub">redis docs on pubsub</a> have a more
detailed overview of this feature.</p>
<p>Let's look at how to do this in python using asyncio.</p>
</div>
<div class="section" id="sharing-state">
<h2>Sharing State</h2>
<p>Whenever we receive a <tt class="docutils literal">PUBLISH</tt> command from a client, we need to send
the message being published to every previously subscribed client. Transports
need to be able to talk to other transports. More generally, we need a way
to share state across transports.</p>
<p>As a newcomer to the async world, this was one of the hardest things for me to figure out.
How are you supposed to share state across connections?</p>
<p>What I found helpful was to first write code that assumed shared state and then figure
out how to plumb it all together later.
For this PUBSUB feature, let's create a <tt class="docutils literal">PubSub</tt> class that allows transports
to subscribe and publish to channels:</p>
<div class="highlight"><pre><span class="k">class</span> <span class="nc">PubSub</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_channels</span> <span class="o">=</span> <span class="p">{}</span>

    <span class="k">def</span> <span class="nf">subscribe</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">channel</span><span class="p">,</span> <span class="n">transport</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_channels</span><span class="o">.</span><span class="n">setdefault</span><span class="p">(</span><span class="n">channel</span><span class="p">,</span> <span class="p">[])</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">transport</span><span class="p">)</span>
        <span class="k">return</span> <span class="p">[</span><span class="s">'subscribe'</span><span class="p">,</span> <span class="n">channel</span><span class="p">,</span> <span class="mi">1</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">publish</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">channel</span><span class="p">,</span> <span class="n">message</span><span class="p">):</span>
        <span class="n">transports</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_channels</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">channel</span><span class="p">,</span> <span class="p">[])</span>
        <span class="n">message</span> <span class="o">=</span> <span class="n">serializer</span><span class="o">.</span><span class="n">serialize_to_wire</span><span class="p">(</span>
            <span class="p">[</span><span class="s">'message'</span><span class="p">,</span> <span class="n">channel</span><span class="p">,</span> <span class="n">message</span><span class="p">])</span>
        <span class="k">for</span> <span class="n">transport</span> <span class="ow">in</span> <span class="n">transports</span><span class="p">:</span>
            <span class="n">transport</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">message</span><span class="p">)</span>
        <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="n">transports</span><span class="p">)</span>
</pre></div>
<p>In the class above, we maintain a mapping of channel names (which are strings)
to lists of transports. Every time a client wants to subscribe to a channel, we add its
transport to the list associated with that channel. Whenever a client wants to
publish a message, we iterate through every transport subscribed to that channel and
write the message being published.</p>
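<p>To see the bookkeeping in isolation, here is a minimal sketch that exercises the same
<tt class="docutils literal">PubSub</tt> logic without any sockets. The <tt class="docutils literal">FakeTransport</tt>
class and the <tt class="docutils literal">serialize_to_wire</tt> function are assumptions made up for
this example (the real serializer lives in the post's <tt class="docutils literal">serializer</tt> module,
which isn't shown here); the encoder only handles integers and byte strings.</p>

```python
def serialize_to_wire(items):
    """Hypothetical stand-in for the post's serializer: encode a list as a RESP array."""
    out = [b"*%d\r\n" % len(items)]
    for item in items:
        if isinstance(item, int):
            out.append(b":%d\r\n" % item)
        else:
            data = item if isinstance(item, bytes) else str(item).encode()
            out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


class FakeTransport:
    """Hypothetical transport that records writes instead of using a socket."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class PubSub:
    def __init__(self):
        self._channels = {}

    def subscribe(self, channel, transport):
        self._channels.setdefault(channel, []).append(transport)
        return ['subscribe', channel, 1]

    def publish(self, channel, message):
        transports = self._channels.get(channel, [])
        payload = serialize_to_wire(['message', channel, message])
        for transport in transports:
            transport.write(payload)
        return len(transports)


pubsub = PubSub()
alice, bob = FakeTransport(), FakeTransport()
pubsub.subscribe(b'news', alice)
pubsub.subscribe(b'news', bob)

# Both subscribers get the same serialized message; the publisher gets a count.
delivered = pubsub.publish(b'news', b'hello')
nobody = pubsub.publish(b'empty-channel', b'hello')
```

<p>Publishing to a channel nobody has subscribed to simply writes nothing and reports zero receivers.</p>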
<p>The way we'd use this class is in our <tt class="docutils literal">RedisServerProtocol</tt> class where we'll
assume we have an instance of this <tt class="docutils literal">PubSub</tt> class available as the
<tt class="docutils literal">self._pubsub</tt> instance variable:</p>
<div class="highlight"><pre><span class="c"># In the RedisServerProtocol class:</span>

<span class="k">class</span> <span class="nc">RedisServerProtocol</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pubsub</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_pubsub</span> <span class="o">=</span> <span class="n">pubsub</span>

    <span class="k">def</span> <span class="nf">data_received</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="n">parsed</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse_wire_protocol</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
        <span class="c"># [COMMAND, arg1, arg2]</span>
        <span class="n">command</span> <span class="o">=</span> <span class="n">parsed</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">command</span> <span class="o">==</span> <span class="n">b</span><span class="s">'subscribe'</span><span class="p">:</span>
            <span class="n">response</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_pubsub</span><span class="o">.</span><span class="n">subscribe</span><span class="p">(</span><span class="n">parsed</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="bp">self</span><span class="o">.</span><span class="n">transport</span><span class="p">)</span>
        <span class="k">elif</span> <span class="n">command</span> <span class="o">==</span> <span class="n">b</span><span class="s">'publish'</span><span class="p">:</span>
            <span class="n">response</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_pubsub</span><span class="o">.</span><span class="n">publish</span><span class="p">(</span><span class="n">parsed</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">parsed</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
</pre></div>
<p>For this code to work, there can only be a single instance of the
<tt class="docutils literal">PubSub</tt> class that's shared across all the incoming connections. We need a
way to make sure that whenever we create a protocol instance, we can also inject
a shared reference to a <tt class="docutils literal">PubSub</tt> instance.</p>
<p>Let's refresh our memories first. In <a class="reference external" href="http://jamesls.com/writing-redis-in-python-with-asyncio-part-1.html">part 1</a> of
this series, we talked about protocols and transports.
One of the main takeaways from that post is that
every time a client connects to our server, there is a protocol instance and a
transport instance associated with that connection. It looks like this:</p>
<div class="figure">
<object data="http://jamesls.com/images/redis-async/transportpair.svg" style="width: 100%;" type="image/svg+xml">
</object>
</div>
<p>A protocol factory is used to create a protocol instance which is
associated with a single connection.
This factory is just a callable that returns an instance of a protocol.
Here's how a protocol factory is used in the asyncio code base,
<tt class="docutils literal">asyncio/selector_events.py</tt>:</p>
<div class="highlight"><pre>    <span class="k">def</span> <span class="nf">_accept_connection2</span><span class="p">(</span>
            <span class="bp">self</span><span class="p">,</span> <span class="n">protocol_factory</span><span class="p">,</span> <span class="n">conn</span><span class="p">,</span> <span class="n">extra</span><span class="p">,</span> <span class="n">server</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="n">protocol</span> <span class="o">=</span> <span class="bp">None</span>
        <span class="n">transport</span> <span class="o">=</span> <span class="bp">None</span>
        <span class="k">try</span><span class="p">:</span>
<span class="hll">            <span class="n">protocol</span> <span class="o">=</span> <span class="n">protocol_factory</span><span class="p">()</span>  <span class="c"># RedisServerProtocol</span>
</span>            <span class="n">waiter</span> <span class="o">=</span> <span class="n">futures</span><span class="o">.</span><span class="n">Future</span><span class="p">(</span><span class="n">loop</span><span class="o">=</span><span class="bp">self</span><span class="p">)</span>
            <span class="n">transport</span> <span class="o">=</span> <span class="n">_SelectorSocketTransport</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">conn</span><span class="p">,</span> <span class="n">protocol</span><span class="p">,</span>
                                                 <span class="n">waiter</span><span class="p">,</span> <span class="n">extra</span><span class="p">,</span> <span class="n">server</span><span class="p">)</span>
            <span class="c"># ...</span>
        <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">exc</span><span class="p">:</span>
            <span class="c"># ...</span>
            <span class="k">pass</span>
</pre></div>
<p>Because a protocol factory is called with no arguments, we need some
other way to bind our <tt class="docutils literal">PubSub</tt> instance to this factory. We could use
<tt class="docutils literal">functools.partial</tt> (which is actually what's used in part 1), but I've
found that having a distinct class for this has made things easier:</p>
<div class="highlight"><pre><span class="k">class</span> <span class="nc">ProtocolFactory</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">protocol_cls</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_protocol_cls</span> <span class="o">=</span> <span class="n">protocol_cls</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_args</span> <span class="o">=</span> <span class="n">args</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_kwargs</span> <span class="o">=</span> <span class="n">kwargs</span>

    <span class="k">def</span> <span class="nf">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="c"># No arg callable is used to instantiate</span>
        <span class="c"># protocols in asyncio.</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_protocol_cls</span><span class="p">(</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">_args</span><span class="p">,</span> <span class="o">**</span><span class="bp">self</span><span class="o">.</span><span class="n">_kwargs</span><span class="p">)</span>
</pre></div>
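<p>As a quick sanity check of the idea, the sketch below shows that the factory class and
<tt class="docutils literal">functools.partial</tt> behave the same way for this purpose: each call
creates a new protocol object, but every protocol holds a reference to the same shared object.
<tt class="docutils literal">DummyProtocol</tt> and the <tt class="docutils literal">shared</tt>
placeholder are invented for the example and stand in for <tt class="docutils literal">RedisServerProtocol</tt>
and <tt class="docutils literal">PubSub</tt>.</p>

```python
import functools


class ProtocolFactory:
    def __init__(self, protocol_cls, *args, **kwargs):
        self._protocol_cls = protocol_cls
        self._args = args
        self._kwargs = kwargs

    def __call__(self):
        # asyncio calls the factory with no arguments, once per connection.
        return self._protocol_cls(*self._args, **self._kwargs)


class DummyProtocol:
    """Hypothetical protocol that only remembers what it was given."""

    def __init__(self, pubsub):
        self.pubsub = pubsub


shared = object()  # placeholder for the single PubSub instance

factory = ProtocolFactory(DummyProtocol, shared)
partial_factory = functools.partial(DummyProtocol, shared)

# Simulate three incoming connections.
first, second = factory(), factory()
third = partial_factory()
```

<p>Here <tt class="docutils literal">first</tt>, <tt class="docutils literal">second</tt>, and
<tt class="docutils literal">third</tt> are distinct objects, yet all three point at the same
<tt class="docutils literal">shared</tt> instance, which is exactly the property the pubsub code relies on.</p>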
<p>Now instead of passing the <tt class="docutils literal">RedisServerProtocol</tt> to the
<tt class="docutils literal">loop.create_server</tt> call, we can pass an instance of the protocol
factory class we just created. Here's how everything looks once
it's wired together:</p>
<div class="highlight"><pre><span class="n">factory</span> <span class="o">=</span> <span class="n">ProtocolFactory</span><span class="p">(</span>
    <span class="n">RedisServerProtocol</span><span class="p">,</span> <span class="n">PubSub</span><span class="p">()</span>
<span class="p">)</span>
<span class="n">coro</span> <span class="o">=</span> <span class="n">loop</span><span class="o">.</span><span class="n">create_server</span><span class="p">(</span><span class="n">factory</span><span class="p">,</span> <span class="n">hostname</span><span class="p">,</span> <span class="n">port</span><span class="p">)</span>
<span class="n">server</span> <span class="o">=</span> <span class="n">loop</span><span class="o">.</span><span class="n">run_until_complete</span><span class="p">(</span><span class="n">coro</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Listening on port {}"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">port</span><span class="p">))</span>
<span class="k">try</span><span class="p">:</span>
    <span class="n">loop</span><span class="o">.</span><span class="n">run_forever</span><span class="p">()</span>
<span class="k">except</span> <span class="ne">KeyboardInterrupt</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Ctrl-C received, shutting down."</span><span class="p">)</span>
<span class="k">finally</span><span class="p">:</span>
    <span class="n">server</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
    <span class="n">loop</span><span class="o">.</span><span class="n">run_until_complete</span><span class="p">(</span><span class="n">server</span><span class="o">.</span><span class="n">wait_closed</span><span class="p">())</span>
    <span class="n">loop</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Server shutdown."</span><span class="p">)</span>
<span class="k">return</span> <span class="mi">0</span>
</pre></div>
<p>And that's all you need to get a basic PUBSUB implementation
up and running using asyncio.</p>
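<p>For readers who want to try the whole flow in one file, here is a rough end-to-end sketch.
It is not the code from the project: it assumes Python 3.7 or later (for
<tt class="docutils literal">asyncio.run</tt>), binds to an ephemeral localhost port, uses
<tt class="docutils literal">functools.partial</tt> as the factory, and relies on hypothetical
<tt class="docutils literal">serialize</tt> and <tt class="docutils literal">parse</tt> helpers that
assume each read contains exactly one complete command.</p>

```python
import asyncio
import functools


def serialize(items):
    """Hypothetical minimal RESP encoder (ints and bytes only)."""
    parts = [b"*%d\r\n" % len(items)]
    for item in items:
        if isinstance(item, int):
            parts.append(b":%d\r\n" % item)
        else:
            parts.append(b"$%d\r\n%s\r\n" % (len(item), item))
    return b"".join(parts)


def parse(data):
    """Naive parser: one complete command per read, no CRLF inside values."""
    lines = data.split(b"\r\n")
    return lines[2::2][:int(lines[0][1:])]


class PubSub:
    def __init__(self):
        self._channels = {}

    def subscribe(self, channel, transport):
        self._channels.setdefault(channel, []).append(transport)
        return [b"subscribe", channel, 1]

    def publish(self, channel, message):
        transports = self._channels.get(channel, [])
        payload = serialize([b"message", channel, message])
        for transport in transports:
            transport.write(payload)
        return len(transports)


class RedisServerProtocol(asyncio.Protocol):
    def __init__(self, pubsub):
        self._pubsub = pubsub

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        parsed = parse(data)
        command = parsed[0].lower()
        if command == b"subscribe":
            reply = self._pubsub.subscribe(parsed[1], self.transport)
            self.transport.write(serialize(reply))
        elif command == b"publish":
            count = self._pubsub.publish(parsed[1], parsed[2])
            self.transport.write(b":%d\r\n" % count)


async def demo():
    loop = asyncio.get_running_loop()
    pubsub = PubSub()  # the single instance shared by every connection
    factory = functools.partial(RedisServerProtocol, pubsub)
    server = await loop.create_server(factory, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # Subscriber connects and waits for the subscribe confirmation.
    sub_reader, sub_writer = await asyncio.open_connection("127.0.0.1", port)
    sub_writer.write(serialize([b"SUBSCRIBE", b"news"]))
    confirmation = serialize([b"subscribe", b"news", 1])
    await sub_reader.readexactly(len(confirmation))

    # Publisher connects on a separate connection and publishes.
    pub_reader, pub_writer = await asyncio.open_connection("127.0.0.1", port)
    pub_writer.write(serialize([b"PUBLISH", b"news", b"hello"]))
    receivers = await pub_reader.readexactly(4)

    expected = serialize([b"message", b"news", b"hello"])
    pushed = await sub_reader.readexactly(len(expected))

    for writer in (sub_writer, pub_writer):
        writer.close()
        await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return receivers, pushed


receivers, pushed = asyncio.run(demo())
```

<p>The publisher's reply is the integer count of receivers, and the subscriber's connection
receives the serialized <tt class="docutils literal">message</tt> array even though it never sent
another request. That push across connections is only possible because both protocol instances
hold the same <tt class="docutils literal">PubSub</tt> object.</p>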
</div>
<div class="section" id="wrapping-up">
<h2>Wrapping Up</h2>
<p>To summarize what we've done:</p>
<ul class="simple">
<li>Create a new <tt class="docutils literal">PubSub</tt> class that gives you the ability to subscribe
a transport to a channel name as well as the ability to publish
a message to a channel.</li>
<li>Update the <tt class="docutils literal">RedisServerProtocol</tt> class to accept a reference to this
object in its <tt class="docutils literal">__init__</tt>.</li>
<li>Update <tt class="docutils literal">RedisServerProtocol.data_received</tt> to use this <tt class="docutils literal">_pubsub</tt>
instance whenever we receive a <tt class="docutils literal">PUBLISH</tt> or <tt class="docutils literal">SUBSCRIBE</tt> command.</li>
<li>Create a protocol factory that passes the same shared <tt class="docutils literal">PubSub</tt>
object to every protocol instance that gets created.</li>
</ul>
<p>In the next post, we'll look at how you can implement <tt class="docutils literal">BLPOP/BRPOP</tt>
with asyncio.</p>
<p>One last thing. I'm in the process of getting this code on github.
I'll update this post with a link once the repo is available, or you can
<a class="reference external" href="https://twitter.com/jsaryer">follow me on twitter</a> where I'll also
post a link.</p>
</div>
<p>Writing Redis in Python with asyncio: Part 1. James Saryerwinnie, 2015-03-23.</p>
<p>Python 3.4 featured a brand new library that's been getting a lot of attention:
<tt class="docutils literal">asyncio</tt>. For numerous reasons, including the fact that the originator of
<a class="reference external" href="https://www.python.org/dev/peps/pep-3156/">the pep</a> is Guido himself, the
<tt class="docutils literal">asyncio</tt> library is growing in popularity within the python community.</p>
<p>So, I'm thinking that it might be fun to try to use this new <tt class="docutils literal">asyncio</tt>
library to write redis in pure python.</p>
<div class="section" id="the-pitch">
<h2>The Pitch</h2>
<p>I maintain a library, <a class="reference external" href="https://github.com/jamesls/fakeredis">fakeredis</a>,
which is a testing library that emulates redis via the <tt class="docutils literal"><span class="pre">redis-py</span></tt> client API.
It doesn't have any of the runtime guarantees that redis has (yet), but for the
most part, it has the same <em>functional</em> behavior as redis. All of its
state is kept in memory, and it does nothing for persisting state to disk.
After all, it's meant as a testing library, to avoid having to spin up a real
redis server during your python unit tests. Here's fakeredis in a nutshell:</p>
<div class="highlight"><pre><span class="o">>>></span> <span class="kn">import</span> <span class="nn">fakeredis</span>
<span class="o">>>></span> <span class="n">r</span> <span class="o">=</span> <span class="n">fakeredis</span><span class="o">.</span><span class="n">FakeStrictRedis</span><span class="p">()</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s">'foo'</span><span class="p">,</span> <span class="s">'bar'</span><span class="p">)</span>
<span class="bp">True</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">'foo'</span><span class="p">)</span>
<span class="s">'bar'</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">lpush</span><span class="p">(</span><span class="s">'bar'</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="mi">1</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">lpush</span><span class="p">(</span><span class="s">'bar'</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="mi">2</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">lrange</span><span class="p">(</span><span class="s">'bar'</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
<span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">]</span>
</pre></div>
<p>Same semantics as the redis-py client, with all state stored in the memory of
the process.</p>
<p>So the idea is to use <tt class="docutils literal">asyncio</tt> to provide a server that accepts client
connections that can parse the wire protocol for redis. It then figures out the
corresponding calls to make into <tt class="docutils literal">fakeredis</tt>, which provide all the
functional redis semantics, and takes the return values from <tt class="docutils literal">fakeredis</tt> and
constructs the appropriate wire response.</p>
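<p>The flow described above (parse the request, call into a backend, serialize the result) can be
sketched in a few lines. Everything here is an assumption for illustration:
<tt class="docutils literal">TinyStore</tt> is a made-up stand-in for
<tt class="docutils literal">fakeredis</tt> so the example has no dependencies, and
<tt class="docutils literal">parse_request</tt>, <tt class="docutils literal">serialize_response</tt>,
and <tt class="docutils literal">handle</tt> are hypothetical names, not the ones used later in the series.</p>

```python
def parse_request(data):
    """Parse one RESP array of bulk strings into a list of bytes.

    Naive on purpose: assumes ``data`` holds exactly one complete request.
    """
    header, _sep, rest = data.partition(b"\r\n")
    if not header.startswith(b"*"):
        raise ValueError("expected a RESP array")
    args = []
    for _ in range(int(header[1:])):
        size_line, _sep, rest = rest.partition(b"\r\n")
        size = int(size_line[1:])  # drop the leading "$"
        args.append(rest[:size])
        rest = rest[size + 2:]     # skip the value and its trailing CRLF
    return args


class TinyStore:
    """Hypothetical stand-in for fakeredis with just GET and SET."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)


def serialize_response(value):
    """Turn a backend return value into a RESP reply."""
    if value is None:
        return b"$-1\r\n"   # null bulk string
    if value is True:
        return b"+OK\r\n"   # simple string
    return b"$%d\r\n%s\r\n" % (len(value), value)


def handle(store, data):
    """Parse a request, dispatch it to the store, and serialize the result."""
    command, *args = parse_request(data)
    handlers = {b"get": store.get, b"set": store.set}
    handler = handlers.get(command.lower())
    if handler is None:
        return b"-ERR unknown command\r\n"
    return serialize_response(handler(*args))


store = TinyStore()
set_reply = handle(store, b"*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n")
get_reply = handle(store, b"*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n")
missing_reply = handle(store, b"*2\r\n$3\r\nGET\r\n$4\r\nnope\r\n")
unknown_reply = handle(store, b"*2\r\n$6\r\nFOOBAR\r\n$4\r\nasdf\r\n")
```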
<!-- add a cool diagram here -->
<p>If all goes well I should have something that any redis client can talk to,
without knowing they're not actually talking to the real redis server. We will
have created a slower, less memory efficient implementation of the redis server
without the "required" features like "persistence" or "replication". It's
redis, written in python, using <tt class="docutils literal">asyncio</tt>. Sounds like fun.</p>
<p>If nothing else, we'll learn a little more about <tt class="docutils literal">asyncio</tt> in the process.</p>
<div class="section" id="setting-scope">
<h3>Setting Scope</h3>
<p>Now first off, I plan for this to be a multipart series.</p>
<p>The scope for this post, part 1, is to get to the point where we can make redis
calls for all its basic functionality, which includes the API calls for
manipulating data for all of redis's supported types. Perhaps what's more
interesting is what I'm leaving out in this post.</p>
<p>What I won't look at in this post is:</p>
<ul class="simple">
<li>saving to disk</li>
<li>blocking operations, such as <a class="reference external" href="http://redis.io/commands/BLPOP">BLPOP</a></li>
<li>performance</li>
<li>handling slow clients</li>
<li>expirations</li>
<li>any kind of replication</li>
<li>testing</li>
</ul>
<p>These items will be the subject of future posts. This is a long-winded way
of me saying that we're going to be taking shortcuts. It'll be ok.</p>
</div>
<div class="section" id="assumptions">
<h3>Assumptions</h3>
<p>To get the most out of this post, I'm assuming that:</p>
<ul class="simple">
<li>You're familiar with redis from an end-user perspective. You know what redis
is and you're familiar with the basic commands.</li>
<li>You're new to <tt class="docutils literal">asyncio</tt>, but you're not necessarily new to event driven
programming.</li>
<li><strong>You're using python 3.4 or greater.</strong></li>
</ul>
</div>
</div>
<div class="section" id="get-the-skeleton-up-and-running">
<h2>Get the Skeleton Up and Running</h2>
<p>The very first thing I want to do is get something up and running. It doesn't
have to do much, but I want to be able to at least have the server handle a
request and return a response, even if it's hardcoded. I'm going to pick the
<tt class="docutils literal">GET</tt> command because it's the simplest operation that provides useful
functionality. Once we get this running, we'll pick it apart and figure
out how it actually works.</p>
<p>So first things first, let's hop over to the
<a class="reference external" href="https://docs.python.org/3/library/asyncio.html">asyncio reference docs</a>.</p>
<div class="section" id="end-to-end-skeleton">
<h3>End to End Skeleton</h3>
<p>Asyncio appears to have a <strong>huge</strong> amount of documentation,
but most of it is stuff I don't care about right now.
The closest thing that looks interesting
is this <a class="reference external" href="https://docs.python.org/3/library/asyncio-protocol.html#tcp-echo-server-protocol">TCP echo server protocol</a>,
which shows a basic echo server with asyncio. We should be able to start with
the echo server and adapt that to what we want, at least initially.
Here's what I came up with after trying to adapt the echo server example above
to a hard coded redis <tt class="docutils literal">GET</tt> command.</p>
<div class="highlight"><pre><span class="kn">import</span> <span class="nn">asyncio</span>


<span class="k">class</span> <span class="nc">RedisServerProtocol</span><span class="p">(</span><span class="n">asyncio</span><span class="o">.</span><span class="n">Protocol</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">connection_made</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">transport</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">transport</span> <span class="o">=</span> <span class="n">transport</span>

    <span class="k">def</span> <span class="nf">data_received</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="n">message</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span>
        <span class="k">if</span> <span class="s">'GET'</span> <span class="ow">in</span> <span class="n">message</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">transport</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">b</span><span class="s">"$3</span><span class="se">\r\n</span><span class="s">"</span><span class="p">)</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">transport</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">b</span><span class="s">"BAZ</span><span class="se">\r\n</span><span class="s">"</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">transport</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">b</span><span class="s">"-ERR unknown command</span><span class="se">\r\n</span><span class="s">"</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">hostname</span><span class="o">=</span><span class="s">'localhost'</span><span class="p">,</span> <span class="n">port</span><span class="o">=</span><span class="mi">6379</span><span class="p">):</span>
    <span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">get_event_loop</span><span class="p">()</span>
    <span class="n">coro</span> <span class="o">=</span> <span class="n">loop</span><span class="o">.</span><span class="n">create_server</span><span class="p">(</span><span class="n">RedisServerProtocol</span><span class="p">,</span>
                              <span class="n">hostname</span><span class="p">,</span> <span class="n">port</span><span class="p">)</span>
    <span class="n">server</span> <span class="o">=</span> <span class="n">loop</span><span class="o">.</span><span class="n">run_until_complete</span><span class="p">(</span><span class="n">coro</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Listening on port {}"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">port</span><span class="p">))</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">loop</span><span class="o">.</span><span class="n">run_forever</span><span class="p">()</span>
    <span class="k">except</span> <span class="ne">KeyboardInterrupt</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">"User requested shutdown."</span><span class="p">)</span>
    <span class="k">finally</span><span class="p">:</span>
        <span class="n">server</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
        <span class="n">loop</span><span class="o">.</span><span class="n">run_until_complete</span><span class="p">(</span><span class="n">server</span><span class="o">.</span><span class="n">wait_closed</span><span class="p">())</span>
        <span class="n">loop</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Redis is now ready to exit."</span><span class="p">)</span>
    <span class="k">return</span> <span class="mi">0</span>


<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">'__main__'</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</pre></div>
<p>Save the above code to a file <tt class="docutils literal"><span class="pre">redis-asyncio</span></tt> and run it. We'll use the
<tt class="docutils literal"><span class="pre">redis-cli</span></tt> to verify this has the behavior that we want:</p>
<pre class="literal-block">
$ ./redis-asyncio &
[1] 96221
Listening on port 6379
$ redis-cli
127.0.0.1:6379> GET foo
"BAZ"
127.0.0.1:6379> GET bar
"BAZ"
127.0.0.1:6379> GET anything
"BAZ"
127.0.0.1:6379> FOOBAR asdf
(error) ERR unknown command
</pre>
<p>It works!</p>
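<p>To make it clearer why this works, it helps to look at the bytes involved. As far as I can
tell, <tt class="docutils literal"><span class="pre">redis-cli</span></tt> sends each command as a RESP
array of bulk strings, so <tt class="docutils literal">GET foo</tt> arrives as
<tt class="docutils literal"><span class="pre">*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n</span></tt>. The sketch
below feeds such bytes straight into the protocol using a made-up
<tt class="docutils literal">RecordingTransport</tt> and a hypothetical
<tt class="docutils literal">send</tt> helper, so no server or socket is needed.</p>

```python
import asyncio


class RedisServerProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        message = data.decode()
        if 'GET' in message:
            self.transport.write(b"$3\r\n")
            self.transport.write(b"BAZ\r\n")
        else:
            self.transport.write(b"-ERR unknown command\r\n")


class RecordingTransport:
    """Hypothetical transport that collects writes in a list."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


def send(raw):
    """Drive the protocol by hand, the way the event loop would."""
    protocol = RedisServerProtocol()
    transport = RecordingTransport()
    protocol.connection_made(transport)
    protocol.data_received(raw)
    return b"".join(transport.written)


get_reply = send(b"*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n")
unknown_reply = send(b"*2\r\n$6\r\nFOOBAR\r\n$4\r\nasdf\r\n")

# The substring check is a shortcut: any request containing "GET" matches,
# including a SET whose key happens to be "TARGET".
accidental_reply = send(b"*3\r\n$3\r\nSET\r\n$6\r\nTARGET\r\n$1\r\n1\r\n")
```

<p>The last call shows the cost of the hard-coded check: a <tt class="docutils literal">SET</tt>
request is answered as if it were a <tt class="docutils literal">GET</tt>, which is why a real
parser is needed before going much further.</p>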
</div>
<div class="section" id="but-how-does-it-work">
<h3>But How Does it Work?</h3>
<p>There's a lot we haven't explained yet.</p>
<p>While I'm going to skip over the <tt class="docutils literal">get_event_loop</tt> and <tt class="docutils literal">run_until_complete</tt>
for now, the <tt class="docutils literal">create_server</tt> is interesting. How exactly
does this server we create integrate with the <tt class="docutils literal">RedisServerProtocol</tt> we made?
For example, how do we go from <tt class="docutils literal">create_server</tt> to calling
<tt class="docutils literal">RedisServerProtocol.connection_made</tt>?</p>
<p>What helped me the most was just digging into the source code for
<tt class="docutils literal">asyncio</tt>, so let's do that.
I've annotated and simplified the code to give you a high level
view of what's going on. We'll start with <tt class="docutils literal">create_server</tt>,
and keep going through the various methods until we see
our protocol's <tt class="docutils literal">connection_made</tt> method being called.</p>
<div class="highlight"><pre><span class="c"># These are all methods within an EventLoop class.</span>
<span class="nd">@coroutine</span>
<span class="k">def</span> <span class="nf">create_server</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">protocol_factory</span><span class="p">,</span> <span class="n">host</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">port</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
                  <span class="o">*</span><span class="p">,</span>
                  <span class="n">family</span><span class="o">=</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_UNSPEC</span><span class="p">,</span>
                  <span class="n">flags</span><span class="o">=</span><span class="n">socket</span><span class="o">.</span><span class="n">AI_PASSIVE</span><span class="p">,</span>
                  <span class="n">sock</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
                  <span class="n">backlog</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span>
                  <span class="n">ssl</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
                  <span class="n">reuse_address</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="c"># In this scenario the ``protocol_factory`` maps</span>
    <span class="c"># to the ``RedisServerProtocol`` class object.</span>
    <span class="c"># Create listening socket(s).</span>
    <span class="n">sock</span> <span class="o">=</span> <span class="n">lots_of_code</span><span class="p">()</span>
    <span class="n">server</span> <span class="o">=</span> <span class="n">Server</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">sock</span><span class="p">)</span>
    <span class="c"># Once we create a server, we call _start_serving.</span>
    <span class="c"># Note how we're passing along the protocol_factory</span>
    <span class="c"># argument (our ``RedisServerProtocol`` class).</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">_start_serving</span><span class="p">(</span><span class="n">protocol_factory</span><span class="p">,</span> <span class="n">sock</span><span class="p">,</span> <span class="n">ssl</span><span class="p">,</span> <span class="n">server</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">server</span>

<span class="k">def</span> <span class="nf">_start_serving</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">protocol_factory</span><span class="p">,</span> <span class="n">sock</span><span class="p">,</span>
                   <span class="n">sslcontext</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">server</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="c"># We're registering the _accept_connection method to be called</span>
    <span class="c"># when a new connection is made. Again notice how we're</span>
    <span class="c"># still passing along our protocol_factory (``RedisServerProtocol``</span>
    <span class="c"># class) object.</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">add_reader</span><span class="p">(</span><span class="n">sock</span><span class="o">.</span><span class="n">fileno</span><span class="p">(),</span> <span class="bp">self</span><span class="o">.</span><span class="n">_accept_connection</span><span class="p">,</span>
                    <span class="n">protocol_factory</span><span class="p">,</span> <span class="n">sock</span><span class="p">,</span> <span class="n">sslcontext</span><span class="p">,</span> <span class="n">server</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">_accept_connection</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">protocol_factory</span><span class="p">,</span> <span class="n">sock</span><span class="p">,</span>
                       <span class="n">sslcontext</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">server</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="c"># Accept the pending client connection on the listening socket.</span>
    <span class="n">conn</span><span class="p">,</span> <span class="n">addr</span> <span class="o">=</span> <span class="n">sock</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span>
    <span class="c"># Finally! We can see that we instantiate</span>
    <span class="c"># the protocol_factory class to actually get</span>
    <span class="c"># an instance of ``RedisServerProtocol``.</span>
    <span class="c"># We've gone from a class to an instance. So</span>
    <span class="c"># what about connection_made? When does this get</span>
    <span class="c"># called? Down the stack to _make_socket_transport.</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">_make_socket_transport</span><span class="p">(</span>
        <span class="n">conn</span><span class="p">,</span> <span class="n">protocol_factory</span><span class="p">(),</span> <span class="n">extra</span><span class="o">=</span><span class="p">{</span><span class="s">'peername'</span><span class="p">:</span> <span class="n">addr</span><span class="p">},</span>
        <span class="n">server</span><span class="o">=</span><span class="n">server</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">_make_socket_transport</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">sock</span><span class="p">,</span> <span class="n">protocol</span><span class="p">,</span> <span class="n">waiter</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="o">*</span><span class="p">,</span>
                           <span class="n">extra</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">server</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="c"># At least we can see we've gone from protocol_factory to</span>
    <span class="c"># just protocol, so now "protocol" in this scenario is an</span>
    <span class="c"># instance of ``RedisServerProtocol``.</span>
    <span class="k">return</span> <span class="n">_SelectorSocketTransport</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">sock</span><span class="p">,</span> <span class="n">protocol</span><span class="p">,</span> <span class="n">waiter</span><span class="p">,</span>
                                    <span class="n">extra</span><span class="p">,</span> <span class="n">server</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">_SelectorSocketTransport</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">loop</span><span class="p">,</span> <span class="n">sock</span><span class="p">,</span> <span class="n">protocol</span><span class="p">,</span> <span class="n">waiter</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
                 <span class="n">extra</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">server</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">__init__</span><span class="p">(</span><span class="n">loop</span><span class="p">,</span> <span class="n">sock</span><span class="p">,</span> <span class="n">protocol</span><span class="p">,</span> <span class="n">extra</span><span class="p">,</span> <span class="n">server</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_eof</span> <span class="o">=</span> <span class="bp">False</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_paused</span> <span class="o">=</span> <span class="bp">False</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_loop</span><span class="o">.</span><span class="n">add_reader</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_sock_fd</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_read_ready</span><span class="p">)</span>
        <span class="c"># And finally, we see that we ask the event loop to call</span>
        <span class="c"># the connection_made method of our protocol instance, and we're</span>
        <span class="c"># passing "self" (the transport object) as an argument to</span>
        <span class="c"># connection_made.</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_loop</span><span class="o">.</span><span class="n">call_soon</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_protocol</span><span class="o">.</span><span class="n">connection_made</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span>
</pre></div>
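<p>To watch this chain run end to end without reading the <tt class="docutils literal">asyncio</tt> source, here is a
minimal sketch that starts a server, connects one client, and records the callback the
event loop makes. <tt class="docutils literal">RecordingProtocol</tt> and <tt class="docutils literal">demo</tt> are hypothetical names
made up for this illustration, port <tt class="docutils literal">0</tt> asks the OS for any free port, and
<tt class="docutils literal">asyncio.run</tt> assumes Python 3.7 or newer rather than the
<tt class="docutils literal">run_until_complete</tt> style used in this post.</p>

```python
import asyncio


class RecordingProtocol(asyncio.Protocol):
    """Hypothetical protocol that records the callbacks asyncio makes."""

    def __init__(self, events, connected):
        self.events = events
        self.connected = connected

    def connection_made(self, transport):
        # Invoked by the event loop with the transport for this connection.
        self.events.append(('connection_made', type(transport).__name__))
        self.connected.set()
        transport.close()


async def demo():
    events = []
    connected = asyncio.Event()
    loop = asyncio.get_running_loop()
    # The factory takes no arguments and is called once per connection.
    server = await loop.create_server(
        lambda: RecordingProtocol(events, connected), '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    # Wait until the server side has seen the connection.
    await asyncio.wait_for(connected.wait(), timeout=5)
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return events


events = asyncio.run(demo())
print(events)
```

<p>The recorded transport class name depends on the platform's event loop, so treat the
printed value as informational only.</p>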
</div>
<div class="section" id="recap">
<h3>Recap</h3>
<p>So far, we've learned:</p>
<ul class="simple">
<li>It looks like the interesting stuff we'll be writing is in the Protocol. To
write our own redis server, we're going to flesh out a proper
<tt class="docutils literal">RedisServerProtocol</tt> class that understands the redis wire protocol.</li>
<li>We get 1 protocol per client connection. Storing state on the protocol
will be scoped to the lifetime of that connection.</li>
<li>To wire things up, hand the protocol class to the <tt class="docutils literal">create_server</tt>, which
is called on an event loop instance. As we saw in the code snippet above
in <tt class="docutils literal">_accept_connection()</tt>, the <tt class="docutils literal">protocol_factory</tt> argument is called
with no args to create a protocol instance. While a class object works fine
for now, we're going to have to use a closure or a factory class to pass
arguments to the protocol when it's created.</li>
<li>The protocols themselves let you define methods that are invoked by the event
loop. That is, <tt class="docutils literal">asyncio</tt> will call <tt class="docutils literal">connection_made()</tt> when a
client connects and <tt class="docutils literal">data_received()</tt> when data arrives. Looking at the
<a class="reference external" href="https://docs.python.org/3/library/asyncio-protocol.html#protocol-classes">Protocol classes</a>,
there appear to be a few more methods you can implement.</li>
</ul>
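<p>The "one protocol per client connection" point can be sketched without a network at all.
In the snippet below, <tt class="docutils literal">CountingProtocol</tt> is a hypothetical class, and the
callbacks are invoked by hand purely to stand in for what the event loop would do for
two separate clients.</p>

```python
import asyncio


class CountingProtocol(asyncio.Protocol):
    """Hypothetical protocol that keeps per-connection state."""

    def __init__(self):
        self.bytes_seen = 0

    def data_received(self, data):
        self.bytes_seen += len(data)


# asyncio calls the factory once per accepted connection; here we
# call it ourselves to simulate two clients.
factory = CountingProtocol
client_a, client_b = factory(), factory()

client_a.data_received(b'PING\r\n')
client_a.data_received(b'PING\r\n')
client_b.data_received(b'PING\r\n')

print(client_a.bytes_seen, client_b.bytes_seen)  # 12 6
```

<p>Each instance only sees its own client's traffic, so anything that must be visible to
every client has to live outside the protocol instance.</p>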
<p>Now that we understand the basics, we can start looking at the redis
wire protocol.</p>
</div>
</div>
<div class="section" id="parsing-the-wire-protocol">
<h2>Parsing the Wire Protocol</h2>
<p>The first thing we need in order to properly handle requests is a protocol
parser: the code that takes the redis request off the TCP socket and
parses it into something meaningful. The code for this isn't that
interesting. After reading the docs for the <a class="reference external" href="http://redis.io/topics/protocol">redis wire protocol</a>, it's straightforward to implement.</p>
<p>Now, none of this is optimized yet, but here's a basic implementation
of parsing the redis wire protocol. It accepts a byte string, and returns
python objects.</p>
<div class="highlight"><pre><span class="kn">import</span> <span class="nn">io</span>


<span class="k">def</span> <span class="nf">parse_wire_protocol</span><span class="p">(</span><span class="n">message</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">_parse_wire_protocol</span><span class="p">(</span><span class="n">io</span><span class="o">.</span><span class="n">BytesIO</span><span class="p">(</span><span class="n">message</span><span class="p">))</span>


<span class="k">def</span> <span class="nf">_parse_wire_protocol</span><span class="p">(</span><span class="n">msg_buffer</span><span class="p">):</span>
    <span class="n">current_line</span> <span class="o">=</span> <span class="n">msg_buffer</span><span class="o">.</span><span class="n">readline</span><span class="p">()</span>
    <span class="n">msg_type</span><span class="p">,</span> <span class="n">remaining</span> <span class="o">=</span> <span class="nb">chr</span><span class="p">(</span><span class="n">current_line</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="n">current_line</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
    <span class="k">if</span> <span class="n">msg_type</span> <span class="o">==</span> <span class="s">'+'</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">remaining</span><span class="o">.</span><span class="n">rstrip</span><span class="p">(</span><span class="n">b</span><span class="s">'</span><span class="se">\r\n</span><span class="s">'</span><span class="p">)</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span>
    <span class="k">elif</span> <span class="n">msg_type</span> <span class="o">==</span> <span class="s">':'</span><span class="p">:</span>
        <span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="n">remaining</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">msg_type</span> <span class="o">==</span> <span class="s">'$'</span><span class="p">:</span>
        <span class="n">msg_length</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">remaining</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">msg_length</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">None</span>
        <span class="n">result</span> <span class="o">=</span> <span class="n">msg_buffer</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">msg_length</span><span class="p">)</span>
        <span class="c"># There's a '\r\n' that comes after a bulk string</span>
        <span class="c"># so we .readline() to move past that crlf.</span>
        <span class="n">msg_buffer</span><span class="o">.</span><span class="n">readline</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">result</span>
    <span class="k">elif</span> <span class="n">msg_type</span> <span class="o">==</span> <span class="s">'*'</span><span class="p">:</span>
        <span class="n">array_length</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">remaining</span><span class="p">)</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">_parse_wire_protocol</span><span class="p">(</span><span class="n">msg_buffer</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">array_length</span><span class="p">)]</span>
</pre></div>
<p>We're also going to need the inverse of this, something that takes a response
from fakeredis and converts it back into bytes that can be sent across the
wire. Again, nothing too interesting about this code, but here's what I came
up with:</p>
<div class="highlight"><pre><span class="k">def</span> <span class="nf">serialize_to_wire</span><span class="p">(</span><span class="n">value</span><span class="p">):</span>
    <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="nb">str</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">(</span><span class="s">'+</span><span class="si">%s</span><span class="s">'</span> <span class="o">%</span> <span class="n">value</span><span class="p">)</span><span class="o">.</span><span class="n">encode</span><span class="p">()</span> <span class="o">+</span> <span class="n">b</span><span class="s">'</span><span class="se">\r\n</span><span class="s">'</span>
    <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="nb">bool</span><span class="p">)</span> <span class="ow">and</span> <span class="n">value</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">b</span><span class="s">"+OK</span><span class="se">\r\n</span><span class="s">"</span>
    <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="nb">int</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">(</span><span class="s">':</span><span class="si">%s</span><span class="s">'</span> <span class="o">%</span> <span class="n">value</span><span class="p">)</span><span class="o">.</span><span class="n">encode</span><span class="p">()</span> <span class="o">+</span> <span class="n">b</span><span class="s">'</span><span class="se">\r\n</span><span class="s">'</span>
    <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="nb">bytes</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">(</span><span class="n">b</span><span class="s">'$'</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">value</span><span class="p">))</span><span class="o">.</span><span class="n">encode</span><span class="p">()</span> <span class="o">+</span>
                <span class="n">b</span><span class="s">'</span><span class="se">\r\n</span><span class="s">'</span> <span class="o">+</span> <span class="n">value</span> <span class="o">+</span> <span class="n">b</span><span class="s">'</span><span class="se">\r\n</span><span class="s">'</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">value</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">b</span><span class="s">'$-1</span><span class="se">\r\n</span><span class="s">'</span>
    <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="nb">list</span><span class="p">):</span>
        <span class="n">base</span> <span class="o">=</span> <span class="n">b</span><span class="s">'*'</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">value</span><span class="p">))</span><span class="o">.</span><span class="n">encode</span><span class="p">()</span> <span class="o">+</span> <span class="n">b</span><span class="s">'</span><span class="se">\r\n</span><span class="s">'</span>
        <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">value</span><span class="p">:</span>
            <span class="n">base</span> <span class="o">+=</span> <span class="n">serialize_to_wire</span><span class="p">(</span><span class="n">item</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">base</span>
</pre></div>
<p>Let's try this out:</p>
<div class="highlight"><pre><span class="o">>>></span> <span class="n">set_request</span> <span class="o">=</span> <span class="n">b</span><span class="s">'*3</span><span class="se">\r\n</span><span class="s">$3</span><span class="se">\r\n</span><span class="s">set</span><span class="se">\r\n</span><span class="s">$3</span><span class="se">\r\n</span><span class="s">foo</span><span class="se">\r\n</span><span class="s">$3</span><span class="se">\r\n</span><span class="s">bar</span><span class="se">\r\n</span><span class="s">'</span>
<span class="o">>>></span> <span class="n">parse_wire_protocol</span><span class="p">(</span><span class="n">set_request</span><span class="p">)</span>
<span class="p">[</span><span class="n">b</span><span class="s">'set'</span><span class="p">,</span> <span class="n">b</span><span class="s">'foo'</span><span class="p">,</span> <span class="n">b</span><span class="s">'bar'</span><span class="p">]</span>
<span class="o">>>></span> <span class="n">serialize_to_wire</span><span class="p">([</span><span class="n">b</span><span class="s">'5'</span><span class="p">,</span> <span class="n">b</span><span class="s">'4'</span><span class="p">,</span> <span class="n">b</span><span class="s">'3'</span><span class="p">,</span> <span class="n">b</span><span class="s">'2'</span><span class="p">,</span> <span class="n">b</span><span class="s">'1'</span><span class="p">])</span>
<span class="n">b</span><span class="s">'*5</span><span class="se">\r\n</span><span class="s">$1</span><span class="se">\r\n</span><span class="s">5</span><span class="se">\r\n</span><span class="s">$1</span><span class="se">\r\n</span><span class="s">4</span><span class="se">\r\n</span><span class="s">$1</span><span class="se">\r\n</span><span class="s">3</span><span class="se">\r\n</span><span class="s">$1</span><span class="se">\r\n</span><span class="s">2</span><span class="se">\r\n</span><span class="s">$1</span><span class="se">\r\n</span><span class="s">1</span><span class="se">\r\n</span><span class="s">'</span>
</pre></div>
<p>After calling <tt class="docutils literal">parse_wire_protocol</tt> we can see that we get a list of
<tt class="docutils literal">[command_name, arg1, arg2, <span class="pre">...]</span></tt>.</p>
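<p>To make the request format concrete, here is a small sketch of the client side: a
command travels as an array (<tt class="docutils literal">*</tt>) of bulk strings (<tt class="docutils literal">$</tt>), each prefixed
with its length. <tt class="docutils literal">encode_command</tt> is a hypothetical helper written for this
illustration, not part of the server code.</p>

```python
def encode_command(*parts):
    """Encode a command as a redis array of bulk strings."""
    out = b'*' + str(len(parts)).encode() + b'\r\n'
    for part in parts:
        if isinstance(part, str):
            part = part.encode()
        # Each bulk string is "$<length>\r\n<payload>\r\n".
        out += b'$' + str(len(part)).encode() + b'\r\n' + part + b'\r\n'
    return out


print(encode_command('set', 'foo', 'bar'))
# b'*3\r\n$3\r\nset\r\n$3\r\nfoo\r\n$3\r\nbar\r\n'
```

<p>The result is byte-for-byte the same string used as <tt class="docutils literal">set_request</tt> above.</p>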
</div>
<div class="section" id="implementing-the-protocol-class">
<h2>Implementing the Protocol Class</h2>
<p>We should have everything we need to make a more realistic
<tt class="docutils literal">RedisServerProtocol</tt> class now. We're making the assumption
for now that the entire command is provided when
<tt class="docutils literal">data_received</tt> is called.</p>
<div class="highlight"><pre><span class="k">class</span> <span class="nc">RedisServerProtocol</span><span class="p">(</span><span class="n">asyncio</span><span class="o">.</span><span class="n">Protocol</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">redis</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_redis</span> <span class="o">=</span> <span class="n">redis</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">transport</span> <span class="o">=</span> <span class="bp">None</span>

    <span class="k">def</span> <span class="nf">connection_made</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">transport</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">transport</span> <span class="o">=</span> <span class="n">transport</span>

    <span class="k">def</span> <span class="nf">data_received</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="n">parsed</span> <span class="o">=</span> <span class="n">parse_wire_protocol</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
        <span class="c"># parsed is an array of [command, *args]</span>
        <span class="n">command</span> <span class="o">=</span> <span class="n">parsed</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">method</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_redis</span><span class="p">,</span> <span class="n">command</span><span class="p">)</span>
        <span class="k">except</span> <span class="ne">AttributeError</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">transport</span><span class="o">.</span><span class="n">write</span><span class="p">(</span>
                <span class="n">b</span><span class="s">"-ERR unknown command "</span> <span class="o">+</span> <span class="n">parsed</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">b</span><span class="s">"</span><span class="se">\r\n</span><span class="s">"</span><span class="p">)</span>
            <span class="k">return</span>
        <span class="n">result</span> <span class="o">=</span> <span class="n">method</span><span class="p">(</span><span class="o">*</span><span class="n">parsed</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span>
        <span class="n">serialized</span> <span class="o">=</span> <span class="n">serialize_to_wire</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">transport</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">serialized</span><span class="p">)</span>


<span class="k">class</span> <span class="nc">WireRedisConverter</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">redis</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_redis</span> <span class="o">=</span> <span class="n">redis</span>

    <span class="k">def</span> <span class="nf">lrange</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">start</span><span class="p">,</span> <span class="n">end</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_redis</span><span class="o">.</span><span class="n">lrange</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="nb">int</span><span class="p">(</span><span class="n">start</span><span class="p">),</span> <span class="nb">int</span><span class="p">(</span><span class="n">end</span><span class="p">))</span>

    <span class="k">def</span> <span class="nf">hmset</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
        <span class="n">converted</span> <span class="o">=</span> <span class="p">{}</span>
        <span class="n">iter_args</span> <span class="o">=</span> <span class="nb">iter</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">args</span><span class="p">))</span>
        <span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">val</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">iter_args</span><span class="p">,</span> <span class="n">iter_args</span><span class="p">):</span>
            <span class="n">converted</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="n">val</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_redis</span><span class="o">.</span><span class="n">hmset</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">converted</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">__getattr__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">getattr</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_redis</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>
</pre></div>
<p>The most important part here is the <tt class="docutils literal">data_received</tt> method. Note that the
first thing we do is take the bytes data we're given and immediately parse that
into a python list using our <tt class="docutils literal">parse_wire_protocol</tt>. The next thing we do is
try to look for a corresponding method in the <tt class="docutils literal">WireRedisConverter</tt> class
based on the command we've been given. The <tt class="docutils literal">WireRedisConverter</tt> class takes
the parsed python list we receive from clients and maps that into the
appropriate calls into fakeredis. For example:</p>
<pre class="literal-block">
HMSET myhash field1 "Hello" <- redis-cli
['hmset', 'myhash', 'field1', 'Hello'] <- parsed
WireRedisConverter.hmset('myhash', 'field1', 'Hello')
FakeRedis.hmset('myhash', {'field1': 'Hello'})
</pre>
<p>I've only shown a portion of <tt class="docutils literal">WireRedisConverter</tt>, but there's enough to
give you the basic idea of how a python list is mapped to
<tt class="docutils literal">fakeredis</tt> calls.</p>
<p>And finally, we serialize the python response back to bytes using
<tt class="docutils literal">serialize_to_wire</tt> and write this value out to the <tt class="docutils literal">transport</tt> we received
from <tt class="docutils literal">connection_made</tt>.</p>
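<p>The dispatch path can be exercised in isolation. The sketch below swaps fakeredis for a
hypothetical <tt class="docutils literal">StubBackend</tt> that only records calls, and uses a trimmed-down
<tt class="docutils literal">Converter</tt> with the same shape as <tt class="docutils literal">WireRedisConverter</tt>; the
<tt class="docutils literal">dispatch</tt> function is likewise an assumption that mirrors the lookup logic in
<tt class="docutils literal">data_received</tt> without a transport.</p>

```python
class StubBackend:
    """Hypothetical stand-in for fakeredis; it only records calls."""

    def __init__(self):
        self.calls = []

    def hmset(self, name, mapping):
        self.calls.append(('hmset', name, mapping))
        return True

    def get(self, name):
        self.calls.append(('get', name))
        return None


class Converter:
    """Same shape as WireRedisConverter, trimmed to one override."""

    def __init__(self, backend):
        self._backend = backend

    def hmset(self, name, *args):
        # Pair up the flat [field, value, field, value, ...] arguments.
        pairs = iter(args)
        return self._backend.hmset(name, dict(zip(pairs, pairs)))

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for commands
        # that need no argument conversion.
        return getattr(self._backend, name)


def dispatch(converter, parsed):
    command = parsed[0].decode().lower()
    method = getattr(converter, command, None)
    if method is None:
        return b'-ERR unknown command ' + parsed[0] + b'\r\n'
    return method(*parsed[1:])


backend = StubBackend()
converter = Converter(backend)
dispatch(converter, [b'HMSET', b'myhash', b'field1', b'Hello'])
dispatch(converter, [b'GET', b'foo'])
print(backend.calls)
print(dispatch(converter, [b'FOOBAR', b'asdf']))
```

<p>Commands with an explicit method on the converter get their arguments reshaped first,
while everything else falls through <tt class="docutils literal">__getattr__</tt> to the backend unchanged.</p>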
</div>
<div class="section" id="wiring-up-the-protocol-class">
<h2>Wiring Up the Protocol Class</h2>
<p>We'll also need to make a change to our main function, mostly in how we wire
up the <tt class="docutils literal">RedisServerProtocol</tt>:</p>
<div class="highlight"><pre><span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">hostname</span><span class="o">=</span><span class="s">'localhost'</span><span class="p">,</span> <span class="n">port</span><span class="o">=</span><span class="mi">6379</span><span class="p">):</span>
    <span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">get_event_loop</span><span class="p">()</span>
    <span class="n">wrapped_redis</span> <span class="o">=</span> <span class="n">WireRedisConverter</span><span class="p">(</span><span class="n">fakeredis</span><span class="o">.</span><span class="n">FakeStrictRedis</span><span class="p">())</span>
    <span class="n">bound_protocol</span> <span class="o">=</span> <span class="n">functools</span><span class="o">.</span><span class="n">partial</span><span class="p">(</span><span class="n">RedisServerProtocol</span><span class="p">,</span>
                                       <span class="n">wrapped_redis</span><span class="p">)</span>
    <span class="n">coro</span> <span class="o">=</span> <span class="n">loop</span><span class="o">.</span><span class="n">create_server</span><span class="p">(</span><span class="n">bound_protocol</span><span class="p">,</span>
                              <span class="n">hostname</span><span class="p">,</span> <span class="n">port</span><span class="p">)</span>
    <span class="n">server</span> <span class="o">=</span> <span class="n">loop</span><span class="o">.</span><span class="n">run_until_complete</span><span class="p">(</span><span class="n">coro</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Listening on port {}"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">port</span><span class="p">))</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">loop</span><span class="o">.</span><span class="n">run_forever</span><span class="p">()</span>
    <span class="k">except</span> <span class="ne">KeyboardInterrupt</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">"User requested shutdown."</span><span class="p">)</span>
    <span class="k">finally</span><span class="p">:</span>
        <span class="n">server</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
        <span class="n">loop</span><span class="o">.</span><span class="n">run_until_complete</span><span class="p">(</span><span class="n">server</span><span class="o">.</span><span class="n">wait_closed</span><span class="p">())</span>
        <span class="n">loop</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Redis is now ready to exit."</span><span class="p">)</span>
    <span class="k">return</span> <span class="mi">0</span>
</pre></div>
<p>The biggest difference here is that we're using <tt class="docutils literal">functools.partial</tt>
so that we can pass in our wrapped fakeredis instance to the
<tt class="docutils literal">RedisServerProtocol</tt> class whenever it's created. As we saw earlier, the
<tt class="docutils literal">protocol_factory</tt> is called with no args and is expected to return a
protocol instance. While we could write a protocol factory class, we're using
<tt class="docutils literal">functools.partial</tt> because that's all we need for now.</p>
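<p>To make the zero-argument contract concrete, here's a minimal standalone sketch. <tt class="docutils literal">EchoProtocol</tt> and the <tt class="docutils literal">"shared-db"</tt> string are hypothetical stand-ins for <tt class="docutils literal">RedisServerProtocol</tt> and the wrapped fakeredis instance; the only thing being demonstrated is what <tt class="docutils literal">functools.partial</tt> gives us.</p>

```python
import asyncio
import functools


class EchoProtocol(asyncio.Protocol):
    """Hypothetical protocol that needs a shared object at construction time."""

    def __init__(self, shared_db):
        self.shared_db = shared_db


# partial() pre-binds the argument, so the result can be called with no args,
# which is how a protocol factory gets called for each new connection.
factory = functools.partial(EchoProtocol, "shared-db")

first = factory()
second = factory()
print(first is second, first.shared_db is second.shared_db)
```

<p>Each call produces a new protocol instance, but every instance holds a reference to the same pre-bound object. That's the shared state we want across connections.</p>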
</div>
<div class="section" id="testing-it-out">
<h2>Testing it Out</h2>
<p>And finally, we should have something that vaguely resembles redis. Let's try
it out:</p>
<pre class="literal-block">
$ ./redis-asyncio &
[1] 55470
$ redis-cli
127.0.0.1:6379> set foo bar
OK
127.0.0.1:6379> get foo
"bar"
127.0.0.1:6379> set foo baz
OK
127.0.0.1:6379> get foo
"baz"
127.0.0.1:6379> lpush abc 1
(integer) 1
127.0.0.1:6379> lpush abc 2
(integer) 2
127.0.0.1:6379> lpush abc 3
(integer) 3
127.0.0.1:6379> lrange abc 0 -1
1) "3"
2) "2"
3) "1"
127.0.0.1:6379> hmset myhash field1 "hello" field2 "world"
OK
127.0.0.1:6379> hget myhash field1
"hello"
127.0.0.1:6379> hget myhash field2
"world"
127.0.0.1:6379> sadd myset "Hello"
(integer) 1
127.0.0.1:6379> sadd myset "World"
(integer) 1
127.0.0.1:6379> sadd myset "World"
(integer) 0
127.0.0.1:6379> smembers myset
1) "Hello"
2) "World"
</pre>
<p>Let's even try talking to <tt class="docutils literal"><span class="pre">./redis-asyncio</span></tt> using the <tt class="docutils literal"><span class="pre">redis-py</span></tt> module:</p>
<div class="highlight"><pre><span class="o">>>></span> <span class="kn">import</span> <span class="nn">redis</span>
<span class="o">>>></span> <span class="n">r</span> <span class="o">=</span> <span class="n">redis</span><span class="o">.</span><span class="n">Redis</span><span class="p">()</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s">'foo'</span><span class="p">,</span> <span class="s">'bar'</span><span class="p">)</span>
<span class="bp">True</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">'foo'</span><span class="p">)</span>
<span class="n">b</span><span class="s">'bar'</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">lpush</span><span class="p">(</span><span class="s">'mylist'</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="mi">1</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">lpush</span><span class="p">(</span><span class="s">'mylist'</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="mi">2</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">lrange</span><span class="p">(</span><span class="s">'mylist'</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
<span class="p">[</span><span class="n">b</span><span class="s">'2'</span><span class="p">,</span> <span class="n">b</span><span class="s">'1'</span><span class="p">]</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">sadd</span><span class="p">(</span><span class="s">'myset'</span><span class="p">,</span> <span class="s">'hello'</span><span class="p">)</span>
<span class="mi">1</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">sadd</span><span class="p">(</span><span class="s">'myset'</span><span class="p">,</span> <span class="s">'world'</span><span class="p">)</span>
<span class="mi">1</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">sadd</span><span class="p">(</span><span class="s">'myset'</span><span class="p">,</span> <span class="s">'world'</span><span class="p">)</span>
<span class="mi">0</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">smembers</span><span class="p">(</span><span class="s">'myset'</span><span class="p">)</span>
<span class="p">{</span><span class="n">b</span><span class="s">'world'</span><span class="p">,</span> <span class="n">b</span><span class="s">'hello'</span><span class="p">}</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">hmset</span><span class="p">(</span><span class="s">'myhash'</span><span class="p">,</span> <span class="p">{</span><span class="s">'a'</span><span class="p">:</span> <span class="s">'b'</span><span class="p">,</span> <span class="s">'c'</span><span class="p">:</span> <span class="s">'d'</span><span class="p">})</span>
<span class="bp">True</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">hget</span><span class="p">(</span><span class="s">'myhash'</span><span class="p">,</span> <span class="s">'a'</span><span class="p">)</span>
<span class="n">b</span><span class="s">'b'</span>
<span class="o">>>></span> <span class="n">r</span><span class="o">.</span><span class="n">hget</span><span class="p">(</span><span class="s">'myhash'</span><span class="p">,</span> <span class="s">'c'</span><span class="p">)</span>
<span class="n">b</span><span class="s">'d'</span>
</pre></div>
</div>
<div class="section" id="wrapping-up">
<h2>Wrapping Up</h2>
<p>In this post, we looked at getting a basic redis implementation up and running
using <tt class="docutils literal">asyncio</tt> and <tt class="docutils literal">fakeredis</tt>. We were able to run basic commands such
as <tt class="docutils literal">get, set, lpush, lrange, sadd, smembers, hmset, hget</tt>, etc.</p>
<p>In the next post, we'll look at implementing blocking operations such as
<tt class="docutils literal">BLPOP</tt>.</p>
<p>And, because you're probably just as curious as I was, here's a benchmark
comparison between what we've written and the real redis server. These benchmarks
were run on the same machine so it's the relative difference that's interesting
to me. I wouldn't read too much into it though.</p>
<div class="section" id="redis-benchmark-t-set-n-200000">
<h3>redis-benchmark -t set -n 200000</h3>
<table class="u-full-width">
<thead>
<tr>
<th>redis-server</th>
<th>redis-asyncio</th>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top"><pre>
====== SET ======
200000 requests completed
in 1.52 seconds
50 parallel clients
3 bytes payload
keep alive: 1
99.97% <= 1 milliseconds
100.00% <= 1 milliseconds
131926.12 requests per second
</pre></td>
<td style="vertical-align: top"><pre>
====== SET ======
200000 requests completed
in 5.17 seconds
50 parallel clients
3 bytes payload
keep alive: 1
0.25% <= 1 milliseconds
99.48% <= 2 milliseconds
99.96% <= 3 milliseconds
99.98% <= 4 milliseconds
99.99% <= 5 milliseconds
99.99% <= 6 milliseconds
99.99% <= 7 milliseconds
99.99% <= 8 milliseconds
99.99% <= 9 milliseconds
99.99% <= 10 milliseconds
99.99% <= 11 milliseconds
99.99% <= 12 milliseconds
99.99% <= 13 milliseconds
99.99% <= 14 milliseconds
99.99% <= 15 milliseconds
99.99% <= 16 milliseconds
99.99% <= 18 milliseconds
100.00% <= 19 milliseconds
100.00% <= 20 milliseconds
100.00% <= 21 milliseconds
100.00% <= 22 milliseconds
100.00% <= 24 milliseconds
100.00% <= 25 milliseconds
100.00% <= 26 milliseconds
100.00% <= 27 milliseconds
100.00% <= 29 milliseconds
100.00% <= 30 milliseconds
100.00% <= 32 milliseconds
38654.81 requests per second
</pre></td>
</tr>
</tbody>
</table></div>
<div class="section" id="redis-benchmark-t-get-n-200000">
<h3>redis-benchmark -t get -n 200000</h3>
<table class="u-full-width">
<thead>
<tr>
<th>redis-server</th>
<th>redis-asyncio</th>
</tr>
</thead>
<tbody>
<tr>
<td style="vertical-align: top"><pre>
====== GET ======
200000 requests completed
in 1.53 seconds
50 parallel clients
3 bytes payload
keep alive: 1
100.00% <= 0 milliseconds
130975.77 requests per second
</pre></td>
<td style="vertical-align: top"><pre>
====== GET ======
200000 requests completed
in 6.42 seconds
50 parallel clients
3 bytes payload
keep alive: 1
0.18% <= 1 milliseconds
96.55% <= 2 milliseconds
99.83% <= 3 milliseconds
99.98% <= 4 milliseconds
99.99% <= 5 milliseconds
99.99% <= 6 milliseconds
99.99% <= 7 milliseconds
99.99% <= 8 milliseconds
99.99% <= 9 milliseconds
99.99% <= 10 milliseconds
99.99% <= 11 milliseconds
99.99% <= 12 milliseconds
99.99% <= 13 milliseconds
99.99% <= 14 milliseconds
99.99% <= 15 milliseconds
99.99% <= 17 milliseconds
99.99% <= 18 milliseconds
99.99% <= 19 milliseconds
99.99% <= 21 milliseconds
99.99% <= 22 milliseconds
99.99% <= 23 milliseconds
100.00% <= 24 milliseconds
100.00% <= 26 milliseconds
100.00% <= 27 milliseconds
100.00% <= 29 milliseconds
100.00% <= 31 milliseconds
100.00% <= 32 milliseconds
100.00% <= 34 milliseconds
100.00% <= 35 milliseconds
100.00% <= 37 milliseconds
100.00% <= 39 milliseconds
100.00% <= 41 milliseconds
31157.50 requests per second
</pre></td>
</tr>
</tbody>
</table></div>
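<p>For a rough sense of the relative difference, the requests-per-second figures from the tables above can simply be divided. This is plain arithmetic on the reported numbers from a single run, nothing more:</p>

```python
# Requests per second, copied from the benchmark output above.
results = {
    "SET": {"redis-server": 131926.12, "redis-asyncio": 38654.81},
    "GET": {"redis-server": 130975.77, "redis-asyncio": 31157.50},
}

slowdown = {
    command: numbers["redis-server"] / numbers["redis-asyncio"]
    for command, numbers in results.items()
}

for command, ratio in slowdown.items():
    print("{}: redis-server handled about {:.1f}x the requests per second".format(
        command, ratio))
```

<p>On this one machine, that works out to roughly 3.4x for SET and 4.2x for GET.</p>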
</div>
Micro-Optimizations in Python Code: Speeding Up Lookups2015-02-04T19:30:00-08:00James Saryerwinnietag:jamesls.com,2015-02-04:micro-optimizations-in-python-code-speeding-up-lookups.html<p>I'm going to show you how a micro optimization can speed up your python code by
a whopping 5%. 5%! It can also annoy anyone that has to maintain your code.</p>
<p>But really, this is about explaining code you might occasionally
see in the standard library or in other people's code. Let's take an example
from the standard library, specifically the <tt class="docutils literal">collections.OrderedDict</tt> class:</p>
<div class="highlight"><pre><span class="k">def</span> <span class="nf">__setitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">dict_setitem</span><span class="o">=</span><span class="nb">dict</span><span class="o">.</span><span class="n">__setitem__</span><span class="p">):</span>
<span class="k">if</span> <span class="n">key</span> <span class="ow">not</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">:</span>
<span class="n">root</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">__root</span>
<span class="n">last</span> <span class="o">=</span> <span class="n">root</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">last</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">root</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">__map</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="n">last</span><span class="p">,</span> <span class="n">root</span><span class="p">,</span> <span class="n">key</span><span class="p">]</span>
<span class="k">return</span> <span class="n">dict_setitem</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
</pre></div>
<p>Notice the last arg: <tt class="docutils literal">dict_setitem=dict.__setitem__</tt>. It makes sense if you
think about it. To associate a key with a value, you'll need to provide a
<tt class="docutils literal">__setitem__</tt> method which takes three arguments: the key you're setting, the
value associated with the key, and the <tt class="docutils literal">__setitem__</tt> method of the
built-in dict class. Wait. OK, maybe the last argument makes no sense.</p>
<div class="section" id="scope-lookups">
<h2>Scope Lookups</h2>
<p>To understand what's going on here, we need to take a look at scopes. Let's
start with a simple question: if I'm in a python function and I encounter
something named <tt class="docutils literal">open</tt>, how does python go about figuring out the value of
<tt class="docutils literal">open</tt>?</p>
<div class="highlight"><pre><span class="c"># <GLOBAL: bunch of code here></span>
<span class="k">def</span> <span class="nf">myfunc</span><span class="p">():</span>
<span class="c"># <LOCAL: bunch of code here></span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">'foo.txt'</span><span class="p">,</span> <span class="s">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="k">pass</span>
</pre></div>
<p>The short answer is that without knowing the contents of the GLOBAL and the
LOCAL section, you can't know for certain the value of <tt class="docutils literal">open</tt>. Conceptually,
python checks three namespaces for a name (ignoring nested scopes to keep
things simple):</p>
<ul class="simple">
<li>locals</li>
<li>globals</li>
<li>builtin</li>
</ul>
<p>So in the <tt class="docutils literal">myfunc</tt> function, if we're trying to find a value for <tt class="docutils literal">open</tt>,
we'll first check the local namespace, then the globals namespace, then the
builtins namespace. And if <tt class="docutils literal">open</tt> is not defined in any namespace, a
<tt class="docutils literal">NameError</tt> is raised.</p>
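<p>Here's a small sketch of that lookup order in action, using <tt class="docutils literal">len</tt> instead of <tt class="docutils literal">open</tt> so nothing touches the filesystem. The function names and <tt class="docutils literal">name_defined_nowhere</tt> are made up for illustration. Writing to <tt class="docutils literal">globals()</tt> has the same effect as a plain <tt class="docutils literal">len = fake_len</tt> at module level; it's spelled out here just to make the namespace explicit.</p>

```python
import builtins


def current_len():
    # There is no local named `len`, so the name is resolved at call time:
    # module globals first, then builtins.
    return len


found_in_builtins = current_len() is builtins.len


def fake_len(obj):
    return -1


globals()["len"] = fake_len  # a module-level binding now shadows the builtin
found_in_globals = current_len() is fake_len

del globals()["len"]  # remove the shadow; the builtin is visible again
found_in_builtins_again = current_len() is builtins.len


def local_wins():
    len = "local"  # a local binding hides both the global and the builtin
    return len


def missing():
    # Hypothetical name that exists in no namespace: NameError when called.
    return name_defined_nowhere
```

<p>The same function returns a different object depending on what the globals namespace holds at the time it's called, which is why you can't know the value of a name just by reading the function body.</p>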
</div>
<div class="section" id="scope-lookups-the-implementation">
<h2>Scope Lookups, the Implementation</h2>
<p>The lookup process above is just conceptual. The actual implementation
differs in ways that we can exploit.</p>
<div class="highlight"><pre><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
<span class="n">a</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">return</span> <span class="n">a</span>
<span class="k">def</span> <span class="nf">bar</span><span class="p">():</span>
<span class="k">return</span> <span class="n">a</span>
<span class="k">def</span> <span class="nf">baz</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
<span class="k">return</span> <span class="n">a</span>
</pre></div>
<p>Let's look at the bytecode of each function:</p>
<pre class="literal-block">
>>> import dis
>>> dis.dis(foo)
2 0 LOAD_CONST 1 (1)
3 STORE_FAST 0 (a)
3 6 LOAD_FAST 0 (a)
9 RETURN_VALUE
>>> dis.dis(bar)
2 0 LOAD_GLOBAL 0 (a)
3 RETURN_VALUE
>>> dis.dis(baz)
2 0 LOAD_FAST 0 (a)
3 RETURN_VALUE
</pre>
<p>Look at the differences between foo and bar. Right away we can
see that at the bytecode level python has already determined
what's a local variable and what is not because <tt class="docutils literal">foo</tt> is using
<tt class="docutils literal">LOAD_FAST</tt> and <tt class="docutils literal">bar</tt> is using <tt class="docutils literal">LOAD_GLOBAL</tt>.</p>
<p>We won't get into the details of how python's compiler knows when to emit which
bytecode (perhaps that's another post), but suffice to say python knows which
type of lookup it needs to perform when it executes a function.</p>
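<p>If you'd rather check this programmatically than read the <tt class="docutils literal">dis.dis</tt> output, here's a minimal sketch using <tt class="docutils literal">dis.get_instructions</tt>. The helper <tt class="docutils literal">load_ops</tt> and the two functions are made up for this example. Exact opcode names vary between python versions (newer releases may show variants of <tt class="docutils literal">LOAD_FAST</tt>), so it's safer to match on the prefix than on the exact name.</p>

```python
import dis


def uses_local(a=1):
    return a


def uses_global():
    return some_module_level_name  # hypothetical name; never called here


def load_ops(func):
    """Return the names of the LOAD_* instructions in func's bytecode."""
    return [
        ins.opname
        for ins in dis.get_instructions(func)
        if ins.opname.startswith("LOAD_")
    ]


local_ops = load_ops(uses_local)
global_ops = load_ops(uses_global)
print(local_ops, global_ops)
```

<p>Note that <tt class="docutils literal">uses_global</tt> compiles fine even though the name doesn't exist anywhere. The compiler only decides <em>which kind</em> of lookup to emit; whether the lookup succeeds is a runtime question.</p>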
<p>One other thing that can be confusing is that <tt class="docutils literal">LOAD_GLOBAL</tt> is used
for lookups in the global as well as the builtin namespace. You can
think of this as "not local", again ignoring the issue of nested scopes.
The C code for this is roughly <a class="footnote-reference" href="#id3" id="id1">[1]</a>:</p>
<div class="highlight"><pre><span class="k">case</span> <span class="nl">LOAD_GLOBAL</span><span class="p">:</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">PyObject_GetItem</span><span class="p">(</span><span class="n">f</span><span class="o">-></span><span class="n">f_globals</span><span class="p">,</span> <span class="n">name</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">PyObject_GetItem</span><span class="p">(</span><span class="n">f</span><span class="o">-></span><span class="n">f_builtins</span><span class="p">,</span> <span class="n">name</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">v</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">PyErr_ExceptionMatches</span><span class="p">(</span><span class="n">PyExc_KeyError</span><span class="p">))</span>
<span class="n">format_exc_check_arg</span><span class="p">(</span>
<span class="n">PyExc_NameError</span><span class="p">,</span>
<span class="n">NAME_ERROR_MSG</span><span class="p">,</span> <span class="n">name</span><span class="p">);</span>
<span class="k">goto</span> <span class="n">error</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">PUSH</span><span class="p">(</span><span class="n">v</span><span class="p">);</span>
</pre></div>
<p>Even if you've never seen any of the C code for CPython, the above code is
pretty straightforward. First, check if the key name we're looking for is in
<tt class="docutils literal"><span class="pre">f->f_globals</span></tt> (the globals dict), then check if the name is in
<tt class="docutils literal"><span class="pre">f->f_builtins</span></tt> (the builtins dict), and finally, raise a <tt class="docutils literal">NameError</tt> if
both checks failed.</p>
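<p>If C isn't your thing, the same two-step logic can be modeled in a few lines of python. This is only a sketch of the simplified C code above, not how the interpreter is actually written; <tt class="docutils literal">load_global</tt> and its parameters are names invented for the example.</p>

```python
import builtins


def load_global(name, globals_ns, builtins_ns):
    """Model of the two-step lookup: globals dict first, then builtins dict."""
    try:
        return globals_ns[name]
    except KeyError:
        pass
    try:
        return builtins_ns[name]
    except KeyError:
        raise NameError("name {!r} is not defined".format(name)) from None


builtins_ns = vars(builtins)
from_builtins = load_global("len", {}, builtins_ns)
from_globals = load_global("len", {"len": "shadow"}, builtins_ns)
```

<p>A name that lives in builtins costs two dictionary lookups (one miss, one hit), while a name that lives in globals costs one.</p>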
</div>
<div class="section" id="binding-constants-to-the-local-scope">
<h2>Binding Constants to the Local Scope</h2>
<p>Now when we look at the initial code sample, we can see that the
last argument is binding a function into the local scope of a function.
It does this by assigning a value, <tt class="docutils literal">dict.__setitem__</tt>, as the default
value of an argument. Here's another example:</p>
<div class="highlight"><pre><span class="k">def</span> <span class="nf">not_list_or_dict</span><span class="p">(</span><span class="n">value</span><span class="p">):</span>
<span class="k">return</span> <span class="ow">not</span> <span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="nb">dict</span><span class="p">)</span> <span class="ow">or</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="nb">list</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">not_list_or_dict</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="n">_isinstance</span><span class="o">=</span><span class="nb">isinstance</span><span class="p">,</span> <span class="n">_dict</span><span class="o">=</span><span class="nb">dict</span><span class="p">,</span> <span class="n">_list</span><span class="o">=</span><span class="nb">list</span><span class="p">):</span>
<span class="k">return</span> <span class="ow">not</span> <span class="p">(</span><span class="n">_isinstance</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="n">_dict</span><span class="p">)</span> <span class="ow">or</span> <span class="n">_isinstance</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="n">_list</span><span class="p">))</span>
</pre></div>
<p>We're doing the same thing here, binding objects that would normally be
looked up in the builtin namespace into the local namespace instead.
So instead of requiring the use of <tt class="docutils literal">LOAD_GLOBAL</tt> (a global lookup),
python will use <tt class="docutils literal">LOAD_FAST</tt>. So
how much faster is this? Let's do some crude testing:</p>
<pre class="literal-block">
$ python -m timeit -s 'def not_list_or_dict(value): return not (isinstance(value, dict) or isinstance(value, list))' 'not_list_or_dict(50)'
1000000 loops, best of 3: 0.48 usec per loop
$ python -m timeit -s 'def not_list_or_dict(value, _isinstance=isinstance, _dict=dict, _list=list): return not (_isinstance(value, _dict) or _isinstance(value, _list))' 'not_list_or_dict(50)'
1000000 loops, best of 3: 0.423 usec per loop
</pre>
<p>Or in other words, <strong>that's an 11.9% improvement</strong> <a class="footnote-reference" href="#id4" id="id2">[2]</a>. That's way more than the
5% I promised at the beginning of this post!</p>
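<p>If you want to repeat the measurement from a script instead of the shell, here's a rough sketch using the <tt class="docutils literal">timeit</tt> module directly. The names <tt class="docutils literal">plain</tt>, <tt class="docutils literal">bound</tt>, <tt class="docutils literal">best_time</tt> and <tt class="docutils literal">percent_faster</tt> are made up for this example. Don't expect to reproduce the numbers above: timings depend on the machine and the python version, and newer interpreters have optimized global lookups, so the gap may well be smaller.</p>

```python
import timeit


def plain(value):
    return not (isinstance(value, dict) or isinstance(value, list))


def bound(value, _isinstance=isinstance, _dict=dict, _list=list):
    return not (_isinstance(value, _dict) or _isinstance(value, _list))


def best_time(func, repeat=5, number=100000):
    """Best-of-N wall time in seconds for `number` calls of func(50).

    The lambda adds the same call overhead to both versions.
    """
    return min(timeit.repeat(lambda: func(50), repeat=repeat, number=number))


def percent_faster(slow, fast):
    return (slow - fast) / slow * 100


plain_time = best_time(plain)
bound_time = best_time(bound)
print("plain: {:.4f}s  bound: {:.4f}s".format(plain_time, bound_time))

# The percentage quoted above, recomputed from the two shell timings.
quoted = percent_faster(0.48, 0.423)
print("{:.1f}%".format(quoted))
```

<p>The last line is just the arithmetic behind the figure quoted above: (0.48 - 0.423) / 0.48 is about 11.9%.</p>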
</div>
<div class="section" id="there-s-more-to-the-story">
<h2>There's More to the Story</h2>
<p>It's reasonable to think that the speed improvement is because <tt class="docutils literal">LOAD_FAST</tt>
reads from the local namespace whereas <tt class="docutils literal">LOAD_GLOBAL</tt> will first check the
global namespace before falling back to checking the builtin namespace. And in
the example function above, <tt class="docutils literal">isinstance</tt>, <tt class="docutils literal">dict</tt>, and <tt class="docutils literal">list</tt> all come
from the built in namespace.</p>
<p>However, there's more going on. Not only are we able to skip an additional lookup
with <tt class="docutils literal">LOAD_FAST</tt>, <strong>it's also a different type of lookup</strong>.</p>
<p>The C code snippet above showed the code for <tt class="docutils literal">LOAD_GLOBAL</tt>, but here's the
code for <tt class="docutils literal">LOAD_FAST</tt>:</p>
<div class="highlight"><pre><span class="k">case</span> <span class="nl">LOAD_FAST</span><span class="p">:</span>
<span class="n">PyObject</span> <span class="o">*</span><span class="n">value</span> <span class="o">=</span> <span class="n">fastlocal</span><span class="p">[</span><span class="n">oparg</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">value</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">format_exc_check_arg</span><span class="p">(</span><span class="n">PyExc_UnboundLocalError</span><span class="p">,</span>
<span class="n">UNBOUNDLOCAL_ERROR_MSG</span><span class="p">,</span>
<span class="n">PyTuple_GetItem</span><span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">co_varnames</span><span class="p">,</span> <span class="n">oparg</span><span class="p">));</span>
<span class="k">goto</span> <span class="n">error</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">Py_INCREF</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="n">PUSH</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="n">FAST_DISPATCH</span><span class="p">();</span>
</pre></div>
<p>We're retrieving the local value by indexing into an <em>array</em>. It's not shown
here, but <tt class="docutils literal">oparg</tt> is just an index into that array.</p>
<p>Now it's starting to make sense. In our first version, <tt class="docutils literal">not_list_or_dict</tt>
had to perform 4 name lookups, and each name was in the builtins namespace, which
we only look at after looking in the globals namespace. That's 8 dictionary
key lookups. Compare that to directly indexing into a C array 4 times,
which is what happens in the second version of <tt class="docutils literal">not_list_or_dict</tt>, where
every name is loaded with <tt class="docutils literal">LOAD_FAST</tt> under the hood. This is why lookups
in the local namespace are faster.</p>
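<p>You can see both sides of this from python by inspecting the code objects. In the sketch below (<tt class="docutils literal">plain</tt> and <tt class="docutils literal">bound</tt> are just shorter names for the two versions of <tt class="docutils literal">not_list_or_dict</tt>), <tt class="docutils literal">co_varnames</tt> lists the local variable slots in order, and <tt class="docutils literal">co_names</tt> lists the names that have to be found by key at run time. Treating the position in <tt class="docutils literal">co_varnames</tt> as the array index is a simplification that holds for a plain function like this one.</p>

```python
def plain(value):
    return not (isinstance(value, dict) or isinstance(value, list))


def bound(value, _isinstance=isinstance, _dict=dict, _list=list):
    return not (_isinstance(value, _dict) or _isinstance(value, _list))


# Local variable slots, in order. The position of a name in this tuple is
# the index used when loading it as a local.
local_slots = dict(enumerate(bound.__code__.co_varnames))

# Names that need a dictionary lookup (globals, then builtins) at run time.
plain_names = plain.__code__.co_names
bound_names = bound.__code__.co_names

print(local_slots)
print(plain_names, bound_names)
```

<p>The second version has nothing left in <tt class="docutils literal">co_names</tt>: every name it touches is a numbered local slot.</p>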
</div>
<div class="section" id="wrapping-up">
<h2>Wrapping Up</h2>
<p>Now the next time you see this in someone else's code you'll know what's
going on.</p>
<p>And one final thing. Please don't actually do these kinds of optimizations
unless you really need to. And most of the time you don't need to. But when
the time really comes, and you really need to squeeze out every last bit of
performance, you'll have this in your back pocket.</p>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id3" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id1">[1]</a></td><td>Though keep in mind that I removed some performance optimizations
in the above code to make it simpler to read. The real code is
slightly more complicated.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id4" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id2">[2]</a></td><td>On a toy example for a function that doesn't really do anything
interesting nor does it perform any IO and is mostly bound by the
python VM loop.</td></tr>
</tbody>
</table>
</div>
</div>
How to Easily Explore JMESPath on the Command Line2015-01-24T09:00:00-08:00James Saryerwinnietag:jamesls.com,2015-01-24:how-to-easily-explore-jmespath-on-the-command-line.html<p><a class="reference external" href="http://jamesls.org">JMESPath is an expression language</a> that allows you to
manipulate JSON. From selecting specific keys from a hash to selecting only
those keys that match certain filter criteria, JMESPath gives you a lot of power
when working with JSON.</p>
<p>In my experience, the quickest way to get up to speed with a language is
to try the language out. The
<a class="reference external" href="http://jmespath.org/tutorial.html">JMESPath tutorial</a> gives you a brief
introduction to the language, but to really solidify the concepts you
just need to spend some time experimenting with the language.</p>
<p>You <em>could</em> accomplish this by using one of the existing
<a class="reference external" href="http://jmespath.org/libraries.html">JMESPath libraries</a>, but there's an
easier way to accomplish this. You can use the
<a class="reference external" href="https://github.com/jmespath/jmespath.terminal">JMESPath terminal</a>.
Either specify what JSON file to use or pipe the JSON document into
the <tt class="docutils literal"><span class="pre">jmespath-terminal</span></tt> command.</p>
<p>The <a class="reference external" href="https://github.com/jmespath/jmespath.terminal#getting-started">JMESPath terminal README</a>
has instructions on getting set up and how to use the JMESPath terminal.</p>
<p>Check it out, and feel free to leave any feedback and suggestions
on the <a class="reference external" href="https://github.com/jmespath/jmespath.terminal/issues">issue tracker</a>.</p>
Semidbm: 0.4.0 Released2013-05-29T10:32:00-07:00James Saryerwinnietag:jamesls.com,2013-05-29:semidbm-040-released.html<p>I've just released 0.4.0 of semidbm. This release includes a number of really
cool features. See the <a class="reference external" href="http://semidbm.readthedocs.org/en/latest/changelog.html#id1">full changelog</a>
for more details.</p>
<p>One of the biggest features is python 3 support. I
was worried about introducing a performance regression by supporting
python 3. Fortunately, this was <strong>not</strong> the case.</p>
<p>In fact, performance
<em>increased</em>. This was possible for a number of reasons. First, the index file
and data file were combined into a single file. This means that a
<tt class="docutils literal">__setitem__</tt> call results in only a single <tt class="docutils literal">write()</tt> call. Also, semidbm
now uses a binary format. This results in a more compact form and it's easier
to create the sequence of bytes we need to write out to disk. And this is
even with semidbm now including checksum data for each write that
occurs.</p>
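<p>To give a feel for what "one binary record, one <tt class="docutils literal">write()</tt>, with a checksum" can look like, here's a purely hypothetical sketch. This is <strong>not</strong> semidbm's actual file format; the field layout and the names <tt class="docutils literal">encode_record</tt> and <tt class="docutils literal">decode_record</tt> are assumptions made up for illustration.</p>

```python
import io
import struct
import zlib

HEADER = struct.Struct("!II")   # key length, value length (assumed layout)
CHECKSUM = struct.Struct("!I")  # CRC32 of key + value


def encode_record(key, value):
    """Build one record as a single bytes object."""
    body = key + value
    return (HEADER.pack(len(key), len(value))
            + body
            + CHECKSUM.pack(zlib.crc32(body)))


def decode_record(data):
    """Parse a record, raising ValueError if the checksum does not match."""
    key_len, value_len = HEADER.unpack_from(data, 0)
    start = HEADER.size
    end = start + key_len + value_len
    body = data[start:end]
    (stored,) = CHECKSUM.unpack_from(data, end)
    if stored != zlib.crc32(body):
        raise ValueError("checksum mismatch")
    return body[:key_len], body[key_len:]


storage = io.BytesIO()                        # stands in for the data file
storage.write(encode_record(b"foo", b"bar"))  # one write() per record
key, value = decode_record(storage.getvalue())
```

<p>Because the lengths, the data and the checksum are packed into one bytes object, setting a key needs only a single write, and a corrupted record can be detected when it's read back.</p>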
<p><a class="reference external" href="https://pypi.python.org/pypi/semidbm/0.4.0">Try it out</a> for yourself.</p>
<div class="section" id="what-s-next">
<h2>What's Next?</h2>
<p>I think at this time, semidbm has more than exceeded its original goal, which
was to be a pure python cross platform key value storage that had reasonable
performance. So what's next for semidbm? In a nutshell, higher level
abstractions (aka the "fun stuff"). Code that builds on the simple key value storage of
<tt class="docutils literal">semidbm.db</tt> and provides additional features. And as we get higher level, I
think it makes sense to reevaluate the original goals of semidbm and whether or
not it makes sense to carry those goals forward:</p>
<ul class="simple">
<li>Cross platform. I'm inclined to not support windows for these higher level
abstractions.</li>
<li>Pure python. I think the big reason for remaining pure python was for ease
of installation. Especially on windows, pip installing a package should just
work. With C extensions, this becomes much harder on windows. If semidbm
isn't going to support windows for these higher level abstractions, then C
extensions are fair game.</li>
</ul>
<p>Some ideas I've been considering:</p>
<ul class="simple">
<li>A C version of <tt class="docutils literal">_Semidbm</tt>.</li>
<li>A dict like interface that is concurrent (possibly single writer multiple
reader).</li>
<li>A sorted version of semidbm (supporting things like range queries).</li>
<li>Caching reads (need an efficient LRU cache).</li>
<li>Automatic background compaction of data file.</li>
<li>Batched writes</li>
<li>Transactions</li>
<li>Compression (I played around with this earlier. Zlib turned out to be too
slow for the smaller sized values (~100 bytes) but it might be worth being
able to configure this on a per db basis).</li>
</ul>
</div>
Trusting a Fake: Why Use Fakeredis2012-11-18T17:46:00-08:00James Saryerwinnietag:jamesls.com,2012-11-18:trusting-a-fake-why-use-fakeredis.html<p>Now that <a class="reference external" href="http://jamesls.com/blog/2012/11/18/fakeredis-0-dot-3-0-released/">fakeredis 0.3.0 is
out</a>
I think it's a good time to discuss the finer points of fakeredis, and
why you should consider using it for your redis unit testing needs.</p>
<p>What exactly is fakeredis? Other than the pedantic naming of "fake"
instead of "mock", it is an in memory implementation of the redis client
used for python. This allows you to write tests that use the redis-py
client interface without having to have redis running.</p>
<p>Setting up redis is not hard. Even compiling from source is easy;
there's not even a <tt class="docutils literal">./configure</tt> step! But unit tests should require
no configuration to run. Someone should be able to checkout/clone the
repo, and be able to run your unit tests.</p>
<p>There's one big problem with writing fakes:</p>
<p><strong>How do you know your fake implementation matches the real
implementation?</strong></p>
<p>Fakeredis verifies this in a simple way. First, there are unit tests for
fakeredis. And for every unit test for fakeredis, there's an equivalent
integration test that actually talks to a real redis server. This
ensures that every single test for fakeredis has the <em>exact</em> same
behavior as real redis. There's nothing worse than writing unit tests
against a fake implementation only to find out that the real
implementation is actually different!</p>
<p>In fakeredis, this is implemented with a factory method pattern. The
fakeredis tests instantiate a <tt class="docutils literal">fakeredis.FakeRedis</tt> class while the
real redis integration tests instantiate a <tt class="docutils literal">redis.Redis</tt> instance:</p>
<div class="highlight"><pre><span class="k">class</span> <span class="nc">TestFakeRedis</span><span class="p">(</span><span class="n">unittest</span><span class="o">.</span><span class="n">TestCase</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">setUp</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">redis</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">create_redis</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">create_redis</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">db</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">fakeredis</span><span class="o">.</span><span class="n">FakeStrictRedis</span><span class="p">(</span><span class="n">db</span><span class="o">=</span><span class="n">db</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">test_set_then_get</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">redis</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s">'foo'</span><span class="p">,</span> <span class="s">'bar'</span><span class="p">),</span> <span class="bp">True</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">assertEqual</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">redis</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">'foo'</span><span class="p">),</span> <span class="s">'bar'</span><span class="p">)</span>


<span class="k">class</span> <span class="nc">TestRealRedis</span><span class="p">(</span><span class="n">TestFakeRedis</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">create_redis</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">db</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">redis</span><span class="o">.</span><span class="n">Redis</span><span class="p">(</span><span class="s">'localhost'</span><span class="p">,</span> <span class="n">port</span><span class="o">=</span><span class="mi">6379</span><span class="p">,</span> <span class="n">db</span><span class="o">=</span><span class="n">db</span><span class="p">)</span>
</pre></div>
<p>Now every test written in the <tt class="docutils literal">TestFakeRedis</tt> class will be
automatically run against both a <tt class="docutils literal">FakeRedis</tt> instance and a <tt class="docutils literal">Redis</tt>
instance, ensuring parity between the two.</p>
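<p>The same factory method pattern works for any fake/real pair, not just
redis. Here is a minimal, self-contained sketch of the idea. The
<tt class="docutils literal">InMemoryStore</tt> and <tt class="docutils literal">JsonFileStore</tt> classes are hypothetical
stand-ins invented for this example (they are not part of fakeredis or
redis-py), chosen so the example runs without a redis server:</p>

```python
import json
import os
import shutil
import tempfile
import unittest


class InMemoryStore(object):
    """The "fake": keeps everything in a plain dict."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)


class JsonFileStore(object):
    """Stands in for the "real" backend: persists every write to a file."""

    def __init__(self, path):
        self._path = path

    def _load(self):
        if not os.path.exists(self._path):
            return {}
        with open(self._path) as f:
            return json.load(f)

    def set(self, key, value):
        data = self._load()
        data[key] = value
        with open(self._path, 'w') as f:
            json.dump(data, f)
        return True

    def get(self, key):
        return self._load().get(key)


class TestInMemoryStore(unittest.TestCase):
    def setUp(self):
        self.store = self.create_store()

    def create_store(self):
        return InMemoryStore()

    def test_set_then_get(self):
        self.assertEqual(self.store.set('foo', 'bar'), True)
        self.assertEqual(self.store.get('foo'), 'bar')

    def test_get_missing_key(self):
        self.assertIsNone(self.store.get('missing'))


class TestJsonFileStore(TestInMemoryStore):
    # Only the factory method changes; every test above is inherited.
    def create_store(self):
        tmpdir = tempfile.mkdtemp()
        self.addCleanup(shutil.rmtree, tmpdir)
        return JsonFileStore(os.path.join(tmpdir, 'store.json'))
```

<p>Running this module with <tt class="docutils literal">python -m unittest</tt> executes each test twice,
once per implementation, so any behavioral difference between the two
shows up as a failure in exactly one of the classes.</p>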
<p>This also makes it easier for contributors. If they notice an
inconsistency between fakeredis and redis, they only need to write a
single test and they'll have a simple repro that shows that the test
passes for redis but fails against FakeRedis.</p>
<p>And finally test coverage. Every single implemented command in fakeredis
has test cases. I only accept contributions for bug fixes/new features
if they have tests. I normally don't worry about actual coverage
numbers, but out of curiosity I checked what those numbers actually
were:</p>
<pre class="literal-block">
$ coverage report fakeredis.py
Name Stmts Miss Cover
-------------------------------
fakeredis 640 19 97%
</pre>
<p>Not bad. Most of the missing lines are either unimplemented commands
(pass statements counted as missing coverage) or precondition checks
such as:</p>
<div class="highlight"><pre><span class="k">def</span> <span class="nf">zadd</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="c"># ...</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">args</span><span class="p">)</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
        <span class="k">raise</span> <span class="n">redis</span><span class="o">.</span><span class="n">RedisError</span><span class="p">(</span><span class="s">"ZADD requires an equal number of "</span>
                               <span class="s">"values and scores"</span><span class="p">)</span>
</pre></div>
<p>I do plan on going through and ensuring that all of these precondition
checks have tests.</p>
<p>So the next time you're looking for a fake implementation of redis,
consider fakeredis.</p>
fakeredis 0.3.0 released2012-11-18T15:21:00-08:00James Saryerwinnietag:jamesls.com,2012-11-18:fakeredis-030-released.html<p>A new release of <a class="reference external" href="http://pypi.python.org/pypi/fakeredis">fakeredis</a>
is out. This 0.3.0 release adds:</p>
<ul class="simple">
<li>Support for redis 2.6.</li>
<li>Improved support for pipelines/watch/multi/exec.</li>
<li>Full support for variadic commands.</li>
<li>Better consistency with the actual behavior of redis.</li>
</ul>
<p>And of course, a handful of bug fixes. This release was tested against:</p>
<ul class="simple">
<li>redis 2.6.4</li>
<li>redis-py 2.6.2</li>
<li>python 2.7.3, 2.6</li>
</ul>
<p>You can install fakeredis via <tt class="docutils literal">pip install fakeredis</tt>. Also check out:</p>
<ul class="simple">
<li><a class="reference external" href="http://pypi.python.org/pypi/fakeredis">pypi</a></li>
<li><a class="reference external" href="https://github.com/jamesls/fakeredis/">github repo</a></li>
<li><a class="reference external" href="https://github.com/jamesls/fakeredis/issues">report issues</a></li>
</ul>
Troubleshooting Python Code2012-03-21T15:07:00-07:00James Saryerwinnietag:jamesls.com,2012-03-21:troubleshooting-python-code.html<p><strong>MY PYTHON CODE ISN'T WORKING!!</strong> We've all been there, right? This is a
series where I'll share miscellaneous tips I've learned for
troubleshooting python code. This is aimed at people who are relatively
new to python. In this first post, I'd like to cover one of those
common things you'll run into: <strong>the traceback.</strong></p>
<div class="section" id="reading-python-tracebacks">
<h2>Reading Python Tracebacks</h2>
<p>Many times an error in python code is accompanied by a traceback. If you
want to get really good at troubleshooting python programs, you'll need
to become really comfortable with reading a traceback. You should be
able to look at a traceback and have a general idea of what's happening
in the traceback. One of the things I always notice when working with
people new to python is how puzzled they look when they first see
tracebacks.</p>
<p>So let's work through an example. Consider this script:</p>
<div class="highlight"><pre><span class="kn">import</span> <span class="nn">httplib2</span>


<span class="k">def</span> <span class="nf">a</span><span class="p">():</span>
    <span class="n">b</span><span class="p">()</span>


<span class="k">def</span> <span class="nf">b</span><span class="p">():</span>
    <span class="n">c</span><span class="p">()</span>

<span class="k">def</span> <span class="nf">c</span><span class="p">():</span>
    <span class="n">d</span><span class="p">()</span>

<span class="k">def</span> <span class="nf">d</span><span class="p">():</span>
    <span class="n">h</span> <span class="o">=</span> <span class="n">httplib2</span><span class="o">.</span><span class="n">Http</span><span class="p">()</span>
    <span class="n">h</span><span class="o">.</span><span class="n">request</span><span class="p">(</span><span class="n">uri</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>


<span class="n">a</span><span class="p">()</span>
</pre></div>
<p>When this script is run we get this traceback:</p>
<pre class="literal-block">
Traceback (most recent call last):
  File "issue.py", line 19, in <module>
    a()
  File "issue.py", line 5, in a
    b()
  File "issue.py", line 9, in b
    c()
  File "issue.py", line 12, in c
    d()
  File "issue.py", line 16, in d
    h.request(uri=None)
  File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 1394, in request
    (scheme, authority, request_uri, defrag_uri) = urlnorm(uri)
  File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 206, in urlnorm
    (scheme, authority, path, query, fragment) = parse_uri(uri)
  File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 202, in parse_uri
    groups = URI.match(uri).groups()
TypeError: expected string or buffer
</pre>
<p>While this can look intimidating at first, there's a few basic things to
remember when reading a traceback:</p>
<ul class="simple">
<li>The oldest frame in the stack is at the top, and the newest frame is
at the bottom. This means that <strong>the bottom of the traceback output
is where the uncaught exception was originally raised.</strong> This is the
opposite of other languages such as java and c/c++ where the first
line shows the newest frame (the frame where the uncaught exception
originated).</li>
<li>Pay attention to the filenames associated with each level of the
traceback, and pay attention where the frames jump across modules and
package "types" (more on this later).</li>
<li>Read the bottom most line to read the actual exception message.</li>
<li>Above all, remember that the traceback alone may not be sufficient to
understand what went wrong.</li>
</ul>
<p>So let's see how we can apply these steps to the traceback above. First,
let's use the first item: the stack frames go from oldest frame at the
beginning of the output to the newest frame at the bottom. To be
absolutely clear, in the above code, the call chain is:
<tt class="docutils literal">a() <span class="pre">-></span> b() <span class="pre">-></span> c() <span class="pre">-></span> d() <span class="pre">-></span> httplib2.Http.request</tt>, which
in turn calls <tt class="docutils literal">urlnorm</tt> and then <tt class="docutils literal">parse_uri</tt> inside httplib2. The oldest stack
frame is associated with the <tt class="docutils literal">a()</tt> function call (it's the call that
triggered all the remaining calls), and the newest stack frame is for
<tt class="docutils literal">parse_uri</tt> (it's the call that actually raised the
exception). Conceptually, you can think of a python traceback
as growing downwards: any time something is pushed onto the stack, it is
appended to the output. And when something is popped off the stack, its
output is removed from the end of the output.</p>
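<p>If you'd like to see this ordering for yourself, the standard library's
<tt class="docutils literal">traceback</tt> module can hand you the frames as a list. The sketch below
is written for Python 3 (where each entry has a <tt class="docutils literal">name</tt> attribute); the
functions <tt class="docutils literal">a</tt>, <tt class="docutils literal">b</tt>, <tt class="docutils literal">c</tt>, and <tt class="docutils literal">frame_names</tt> are made up for the
example and use <tt class="docutils literal">int(None)</tt> in place of the httplib2 call so that it
runs without any third party packages:</p>

```python
import sys
import traceback


def a():
    b()


def b():
    c()


def c():
    int(None)  # raises TypeError, much like passing uri=None did


def frame_names():
    """Return the function names in the traceback, oldest frame first."""
    try:
        a()
    except TypeError:
        tb = sys.exc_info()[2]
        return [frame.name for frame in traceback.extract_tb(tb)]


names = frame_names()
print(' -> '.join(names))  # prints: frame_names -> a -> b -> c
```

<p>The list comes back in the same order the traceback is printed: the
frame that caught the exception first, and the frame that raised it
last.</p>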
<p>Now let's apply the second item: pay attention to the filenames
associated with each level of the traceback. Right off the bat we can
see there are two main modules involved in this interaction. There's the
<tt class="docutils literal">issue</tt> module, which looks like this in the traceback:</p>
<pre class="literal-block">
  File "issue.py", line 19, in <module>
    a()
  File "issue.py", line 5, in a
    b()
</pre>
<p>and there's httplib2, which looks like this in the traceback:</p>
<pre class="literal-block">
  File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 1394, in request
    (scheme, authority, request_uri, defrag_uri) = urlnorm(uri)
</pre>
<p>There's a few important observations:</p>
<ul class="simple">
<li>The length of the filenames. In this case the <tt class="docutils literal">issue.py</tt> filename
suggests that this originated from our current working directory,
hence the relative path.</li>
<li>The error actually occurs in a 3rd party library (the last three
frames of the traceback output).</li>
</ul>
<p>We know that an error occurs in a 3rd party library because the location
of this library is under the "site-packages" directory. As a rule of
thumb, if something is under the "site-packages" directory it's a third
party module (i.e. not something in the python standard library). This
is typically where packages installed via pip are placed (e.g. pip
install httplib2).</p>
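<p>This rule of thumb is simple enough to write down as code. The helper
below is a rough sketch, not a robust classifier: the function name
<tt class="docutils literal">classify</tt> and its three labels are invented for this example, and it
also treats Debian's "dist-packages" directory as third party, which is
an assumption on my part:</p>

```python
import os
import sysconfig

# Directory that holds the standard library for the running interpreter.
STDLIB_DIR = os.path.abspath(sysconfig.get_paths()['stdlib'])


def classify(filename):
    """Roughly classify a traceback filename by where it lives."""
    parts = filename.replace('\\', '/').split('/')
    # Check this first: site-packages usually sits inside the stdlib dir.
    if 'site-packages' in parts or 'dist-packages' in parts:
        return 'third party'
    if os.path.abspath(filename).startswith(STDLIB_DIR):
        return 'standard library'
    return 'our code'


print(classify('issue.py'))
print(classify('/Users/jsaryer/.virtualenvs/90a/lib/python2.7/'
               'site-packages/httplib2/__init__.py'))
```

<p>Applied to the traceback above, <tt class="docutils literal">issue.py</tt> comes out as our code and
the httplib2 path as third party, which is exactly the boundary worth
paying attention to.</p>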
<p>The second item also says to pay attention to where the frames jump
across modules or package "types." In this traceback we can see that we
jump across modules and packages "types" here:</p>
<pre class="literal-block">
  File "issue.py", line 16, in d
    h.request(uri=None)
  File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 1394, in request
    (scheme, authority, request_uri, defrag_uri) = urlnorm(uri)
</pre>
<p>In these four lines we can see that we jump from <tt class="docutils literal">issue.py</tt> to
<tt class="docutils literal">httplib2</tt>. By jumping across package "types", I simply mean where we
jump from our modules/packages to either standard library packages or
3rd party packages. From the four lines shown above we can see that by
calling <tt class="docutils literal">h.request()</tt> we jump into the <tt class="docutils literal">httplib2</tt> module.</p>
<p>Now let's apply the third item: Read the bottom most line to read the
actual exception message. In our example, the actual exception that's
raised is:</p>
<pre class="literal-block">
TypeError: expected string or buffer
</pre>
<p>Admittedly, not the most helpful error message. If we look at the line
before this line, we can see the actual line that caused this TypeError:</p>
<pre class="literal-block">
groups = URI.match(uri).groups()
</pre>
<p>The two most likely things to cause a TypeError would be a call to
<tt class="docutils literal">match()</tt> or a call to <tt class="docutils literal">groups()</tt>. Noticing that the <tt class="docutils literal">uri</tt> argument appears
in multiple frames of the traceback, our first guess would be that the
value of <tt class="docutils literal">uri</tt> is causing the TypeError. If we work from the bottom up until
the uri parameter is no longer mentioned, we can see that it first appears
here:</p>
<pre class="literal-block">
  File "issue.py", line 16, in d
    h.request(uri=None)
  File "/Users/jsaryer/.virtualenvs/90a/lib/python2.7/site-packages/httplib2/__init__.py", line 1394, in request
    (scheme, authority, request_uri, defrag_uri) = urlnorm(uri)
</pre>
<p>Given that the <tt class="docutils literal">h.request(uri=None)</tt> comes from our code, this is
probably the first place we should look.</p>
<p>It turns out that the <tt class="docutils literal">uri</tt> parameter needs to be a string:</p>
<div class="highlight"><pre><span class="n">h</span> <span class="o">=</span> <span class="n">httplib2</span><span class="o">.</span><span class="n">Http</span><span class="p">()</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">h</span><span class="o">.</span><span class="n">request</span><span class="p">(</span><span class="n">uri</span><span class="o">=</span><span class="s">'http://www.google.com'</span><span class="p">)</span>
</pre></div>
<p>Now, it doesn't always work out as nicely as this, but having a basic
example helps to serve as a basis for further debugging techniques.</p>
</div>
Semidbm: A Pure Python DBM2012-03-03T11:32:00-08:00James Saryerwinnietag:jamesls.com,2012-03-03:semidbm-a-pure-python-dbm.html<p><a class="reference external" href="https://github.com/jamesls/semidbm">Semidbm</a> is a pure python dbm.
While the <a class="reference external" href="http://semidbm.readthedocs.org">docs</a> go into the
specifics of <em>how</em> to use the dbm, I'd like to offer a more
editorialized view of semidbm (the <em>why</em> of semidbm).</p>
<p>Semidbm is a pure python dbm, which is basically a key value store.
Similar python modules in the standard library include
<a class="reference external" href="http://docs.python.org/library/gdbm.html">gdbm</a>,
<a class="reference external" href="http://docs.python.org/library/bsddb.html">bsddb</a>, and
<a class="reference external" href="http://docs.python.org/library/dumbdbm.html">dumbdbm</a>.</p>
<p>The first question one might ask is:</p>
<p><strong>Another persistent key value store, really?</strong></p>
<p>Fair question.</p>
<p>It all started when I was working on a project where I needed a simple
key value store, accessible from python. Technically, I was using the
<a class="reference external" href="http://docs.python.org/library/shelve.html">shelve</a> module, and it
decided to use the Berkeley DB (via
<a class="reference external" href="http://docs.python.org/library/anydbm.html">anydbm</a>). So far so
good. But there were a few issues:</p>
<ul class="simple">
<li>Not everyone has the Berkeley DB python bindings installed. Or in
general, dbms that are based on C libraries have varying availability
on people's systems.</li>
<li>Not all dbms perform equally.</li>
<li>Not all dbms are portable.</li>
</ul>
<div class="section" id="c-based-dbms-and-their-availability">
<h2>C based DBMs and their availability</h2>
<p>The first issue is regarding availability. Not all python installations
are the same. Just because a user has python installed does not mean
they necessarily have all the standard libraries installed. I just
checked my python install on my Macbook, and I don't have the bsddb
module available. On my debian system I don't have the gdbm module
installed. Given that these packages are just python bindings to C based
dbms, installing these packages involves:</p>
<ul class="simple">
<li>Install the C libraries and development packages for the appropriate
dbm.</li>
<li>Have a development environment that can build python.</li>
<li>Rebuild python</li>
</ul>
<p>None of these steps are that much work, but are there any alternatives?</p>
</div>
<div class="section" id="not-all-dbms-perform-equally">
<h2>Not all dbms perform equally</h2>
<p>On all of my systems I have the
<a class="reference external" href="http://docs.python.org/library/dbm.html">dbm</a> module available. This
is a C based DBM that seems to be available on most python installations.
How fast is it? There's a <tt class="docutils literal">scripts/benchmark</tt> script available in the
semidbm repo that can benchmark any dbm like module. Here's the results
for the <a class="reference external" href="http://docs.python.org/library/dbm.html">dbm</a> module:</p>
<div class="highlight"><pre><span class="nv">$ </span>scripts/benchmark -d dbm
Generating random data.
Benchmarking: <module <span class="s1">'dbm'</span> from
<span class="s1">'/Users/jsaryer/.virtualenvs/semidbm/lib/python2.7/lib-dynload/dbm.so'</span>>
num_keys : 1000000
key_size : 16
value_size: 100
HASH: Out of overflow pages. Increase page size
ERROR: exception caught when benchmarking <module <span class="s1">'dbm'</span> from <span class="s1">'/Users/jsaryer/.virtualenvs/semidbm/lib/python2.7/lib-dynload/dbm.so'</span>>: cannot add item to database
</pre></div>
<p>Or in other words, it made it to about 450000 keys before this error was
generated. So storing a large number of keys doesn't seem possible with
python's dbm module.</p>
</div>
<div class="section" id="not-all-dbms-are-portable">
<h2>Not all dbms are portable</h2>
<p>While some dbms that aren't available simply require
compiling/installing the right packages and files, there are some dbms
that just aren't available on certain platforms (notoriously windows).</p>
<p>Well fortunately, there's a fallback python module that's guaranteed to
be available on every single python installation:
<a class="reference external" href="http://docs.python.org/library/dumbdbm.html">dumbdbm</a>.</p>
<p>Unfortunately, the performance is <strong>terrible.</strong> There's also a number of
undesirable qualities:</p>
<ul class="simple">
<li>When a key is added to the DB, the data file is updated, but the
index file is not updated, which means the data file and the index
file are not in sync. If python crashed, any newly added/updated keys
are lost.</li>
<li>Every deletion writes out the entire index. This makes deletions
painfully slow (O(n)).</li>
</ul>
<p>To be fair, dumbdbm was most likely written as a last resort fallback to
the more classic dbms. It's also really old (written by Guido himself if
I remember correctly).</p>
</div>
<div class="section" id="a-key-value-store-with-modest-aspirations">
<h2>A key value store with modest aspirations</h2>
<p>Hopefully the goals of semidbm are becoming clearer. I just wanted a dbm
that was:</p>
<ol class="arabic simple">
<li>Portable</li>
<li>Easily installable</li>
<li>Reasonable performance and semantics</li>
</ol>
<p>The first two points I felt I could achieve by simply using python, and
not requiring any C libraries or C extensions.</p>
<p>The third point I felt I could improve by taking dumbdbm and making some
minor improvements.</p>
<p>So that's the background of semidbm.</p>
</div>
<div class="section" id="can-simpler-really-be-better">
<h2>Can simpler really be better?</h2>
<p>I think so. The <a class="reference external" href="http://semidbm.readthedocs.org/en/latest/benchmarks.html">benchmark
page</a> has
more details regarding the performance, but here is a quick comparison of
semidbm to dumbdbm:</p>
<div class="highlight"><pre><span class="nv">$ </span>scripts/benchmark -d semidbm -n 10000
Generating random data.
Benchmarking: <module <span class="s1">'semidbm'</span>>
num_keys : 10000
key_size : 16
value_size: 100
fill_sequential : <span class="nb">time</span>: 0.126, micros/ops: 12.597, ops/s: 79382.850, MB/s: 8.782
read_hot : <span class="nb">time</span>: 0.041, micros/ops: 4.115, ops/s: 243036.754, MB/s: 26.886
read_sequential : <span class="nb">time</span>: 0.039, micros/ops: 3.861, ops/s: 258973.197, MB/s: 28.649
read_random : <span class="nb">time</span>: 0.042, micros/ops: 4.181, ops/s: 239171.571, MB/s: 26.459
delete_sequential : <span class="nb">time</span>: 0.058, micros/ops: 5.819, ops/s: 171856.854, MB/s: 19.012
<span class="nv">$ </span>scripts/benchmark -d dumbdbm -n 10000
Generating random data.
Benchmarking: <module <span class="s1">'dumbdbm'</span>>
num_keys : 10000
key_size : 16
value_size: 100
fill_sequential : <span class="nb">time</span>: 1.824, micros/ops: 182.400, ops/s: 5482.447, MB/s: 0.607
read_hot : <span class="nb">time</span>: 0.165, micros/ops: 16.543, ops/s: 60450.332, MB/s: 6.687
read_sequential : <span class="nb">time</span>: 0.167, micros/ops: 16.733, ops/s: 59762.818, MB/s: 6.611
read_random : <span class="nb">time</span>: 0.175, micros/ops: 17.505, ops/s: 57126.529, MB/s: 6.320
delete_sequential : <span class="nb">time</span>: 99.025, micros/ops: 9902.522, ops/s: 100.984, MB/s: 0.011
</pre></div>
<p>From the output above, writes are an order of magnitude faster (and
semidbm computes and writes out a checksum for every value) and reads
are almost 4 times faster. Deletion performance is much better (0.058
seconds vs. 99.025 seconds for deleting 10000 keys).</p>
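<p>To make those ratios concrete, here is the arithmetic spelled out. The
timings are copied directly from the benchmark output above; nothing is
re-measured:</p>

```python
# Elapsed seconds copied from the benchmark output above (10000 keys).
semidbm = {'fill_sequential': 0.126, 'read_hot': 0.041,
           'delete_sequential': 0.058}
dumbdbm = {'fill_sequential': 1.824, 'read_hot': 0.165,
           'delete_sequential': 99.025}

# How many times faster semidbm was for each operation.
speedups = {name: dumbdbm[name] / semidbm[name] for name in semidbm}
for name in sorted(speedups):
    print('%-18s %7.1fx faster' % (name, speedups[name]))
```

<p>That works out to roughly 14x for writes, 4x for reads, and about
1700x for deletes.</p>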
<p>Also, every single insertion/update/deletion is immediately written out
to disk, so if python crashes, at worst you'd lose one key: the key that
was being written out to disk when python crashed.</p>
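<p>To illustrate the general idea of writing every change straight to an
append only file, here is a toy sketch. To be clear, this is <em>not</em>
semidbm's actual code or file format: the <tt class="docutils literal">AppendOnlyStore</tt> class, the
record layout, and the use of CRC32 as the checksum are all assumptions
made for the example, and it doesn't try to recover from a partially
written record:</p>

```python
import os
import struct
import tempfile
import zlib

HEADER = struct.Struct('!III')  # key size, value size, crc32 of the value


class AppendOnlyStore(object):
    """Toy append-only key value store (illustrative only, not semidbm)."""

    def __init__(self, path):
        self._index = {}  # key -> (offset of value, value size, checksum)
        # 'a+b' creates the file if needed and always appends on write.
        self._file = open(path, 'a+b')
        self._load_index()

    def _load_index(self):
        # Scan the whole data file once; later records win over earlier ones.
        self._file.seek(0)
        while True:
            header = self._file.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            key_size, value_size, checksum = HEADER.unpack(header)
            key = self._file.read(key_size)
            self._index[key] = (self._file.tell(), value_size, checksum)
            self._file.seek(value_size, os.SEEK_CUR)

    def __setitem__(self, key, value):
        checksum = zlib.crc32(value) & 0xffffffff
        self._file.seek(0, os.SEEK_END)
        start = self._file.tell()
        header = HEADER.pack(len(key), len(value), checksum)
        self._file.write(header + key + value)
        # Push the record out right away so a crash loses at most this write.
        self._file.flush()
        os.fsync(self._file.fileno())
        self._index[key] = (start + HEADER.size + len(key), len(value),
                            checksum)

    def __getitem__(self, key):
        offset, size, checksum = self._index[key]
        self._file.seek(offset)
        value = self._file.read(size)
        if zlib.crc32(value) & 0xffffffff != checksum:
            raise ValueError('corrupt value for key %r' % (key,))
        return value

    def close(self):
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), 'toy.db')
db = AppendOnlyStore(path)
db[b'foo'] = b'bar'
db[b'foo'] = b'baz'  # appended; the old record stays in the file
db.close()

reopened = AppendOnlyStore(path)
print(reopened[b'foo'])
reopened.close()
```

<p>Note the tradeoff this design makes: writes are cheap because nothing
is ever rewritten in place, but opening the store means scanning the
whole file to rebuild the in memory index, and stale records pile up
until something compacts the file.</p>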
</div>
<div class="section" id="why-you-should-use-semidbm">
<h2>Why you should use semidbm</h2>
<p>I think if you ever need to use a pure python dbm, semidbm is a great
choice. Any time you'd otherwise have to use dumbdbm, use semidbm
instead.</p>
</div>
<div class="section" id="future-plans-for-semidbm">
<h2>Future plans for semidbm</h2>
<p>There's a number of things I'd like to investigate in the future:</p>
<ul class="simple">
<li>Faster db loading. Semidbm needs to read the entire data file to load
the db. There's potential to speed this up.</li>
<li>Caching reads. Looking at the implementation of other dbms, many of
them have some type of in memory cache to improve read performance.</li>
<li>Support for additional db methods. Semidbm does not support all of
the dict methods.</li>
<li>Batch writes/reads. Due to the append only nature of the file format,
this could substantially improve write performance.</li>
</ul>
<p>For more info, check out the
<a class="reference external" href="http://semidbm.readthedocs.org/">docs</a> and the <a class="reference external" href="https://github.com/jamesls/semidbm">github
repo</a>.</p>
</div>
Python and the "extra stuff"2012-02-01T20:16:00-08:00James Saryerwinnietag:jamesls.com,2012-02-01:python-and-the-extra-stuff.html<p>Learning a new programming language can be a daunting task. Even though
you start with the basic things like syntax, in order to become
productive in the language you must learn things like</p>
<ul class="simple">
<li>Common coding idioms and patterns</li>
<li>The standard library</li>
<li>Best practices (including what frameworks to use, what development
tools to use, etc)</li>
</ul>
<p>But then there's also the, for lack of a better term, "extra stuff." The
collection of miscellaneous tips and tricks you pick up while coding in
the language on a day to day basis. This set of tips ends up saving you
a lot of time in the long run, but it's hard to tell how useful a
tip really is when you first hear about it.</p>
<p>Well, this is my list of tips. It's not 100% complete, and focuses
mostly on various tidbits of information that, when I think about how I
code on a day to day basis, I find myself repeatedly doing.</p>
<div class="section" id="the-variable">
<h2>The <tt class="docutils literal">_</tt> variable</h2>
<p>This tip is useful when you're in an interactive python shell. The <tt class="docutils literal">_</tt>
variable stores the value of the most recently evaluated expression:</p>
<div class="highlight"><pre><span class="o">>>></span> <span class="mi">1</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">3</span>
<span class="mi">6</span>
<span class="o">>>></span> <span class="n">_</span> <span class="o">*</span> <span class="mi">24</span>
<span class="mi">144</span>
<span class="o">>>></span> <span class="n">_</span> <span class="o">/</span> <span class="mf">12.</span>
<span class="mf">12.0</span>
<span class="o">>>></span> <span class="p">[</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span> <span class="k">if</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">10</span><span class="p">]</span>
<span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">]</span>
<span class="o">>>></span> <span class="nb">len</span><span class="p">(</span><span class="n">_</span><span class="p">)</span>
<span class="mi">10</span>
</pre></div>
</div>
<div class="section" id="figuring-out-where-to-look">
<h2>Figuring Out Where to Look</h2>
<p>Sometimes if you're trying to debug a problem you'll need to figure
out where a module is located. A really easy way to do this is to use
the <tt class="docutils literal">__file__</tt> attribute of a module object:</p>
<div class="highlight"><pre><span class="o">>>></span> <span class="kn">import</span> <span class="nn">httplib</span>
<span class="o">>>></span> <span class="n">httplib</span><span class="o">.</span><span class="n">__file__</span>
<span class="s">'/usr/local/lib/python2.7/httplib.pyc'</span>
</pre></div>
<p>You can also use <tt class="docutils literal">inspect.getfile(obj)</tt> to find where an object is
located.</p>
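<p>As a quick sketch of the <tt class="docutils literal">inspect</tt> approach, here it is applied to
the standard library's <tt class="docutils literal">json</tt> package (picked only because it exists
on both python 2 and python 3). The exact paths printed will depend on
your installation:</p>

```python
import inspect
import json

# Where does the json package live?
print(json.__file__)

# Where was this class defined? Works for modules, classes, and functions.
print(inspect.getfile(json.JSONDecoder))

# getsourcefile() points at the .py source rather than compiled bytecode.
print(inspect.getsourcefile(json.JSONDecoder))
```

<p>This is handy when you have an object in hand but aren't sure which
module (or which copy of a module) it actually came from.</p>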
</div>
<div class="section" id="running-your-module-as-a-script">
<h2>Running Your Module as a Script</h2>
<p>Every module will have a <tt class="docutils literal">__name__</tt> attribute, but the value of that
attribute will depend on how the module is executed. Consider a module:</p>
<pre class="literal-block">
# foo.py
print __name__
</pre>
<p>When the module is imported the name will be "foo".</p>
<div class="highlight"><pre><span class="o">>>></span> <span class="kn">import</span> <span class="nn">foo</span>
<span class="n">foo</span>
<span class="o">>>></span>
</pre></div>
<p>However, when the module is executed as a script, the name will be
<tt class="docutils literal">__main__</tt>:</p>
<pre class="literal-block">
$ python foo.py
__main__
</pre>
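<p>You can observe both cases from a single script. The sketch below is
written for Python 3 (it relies on <tt class="docutils literal">importlib.util</tt>) and writes a
throwaway <tt class="docutils literal">foo.py</tt> to a temporary directory; the <tt class="docutils literal">NAME</tt> variable
is just something I made up to capture the value:</p>

```python
import importlib.util
import os
import runpy
import tempfile

# Write a tiny module that records its own __name__.
module_path = os.path.join(tempfile.mkdtemp(), 'foo.py')
with open(module_path, 'w') as f:
    f.write('NAME = __name__\n')

# Case 1: load it the way "import foo" would.
spec = importlib.util.spec_from_file_location('foo', module_path)
foo = importlib.util.module_from_spec(spec)
spec.loader.exec_module(foo)
print(foo.NAME)  # prints: foo

# Case 2: execute it the way "python foo.py" would.
script_globals = runpy.run_path(module_path, run_name='__main__')
print(script_globals['NAME'])  # prints: __main__
```

<p>Same file, same code, two different values of <tt class="docutils literal">__name__</tt>.</p>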
<p>It may not be obvious how this is useful. The way that this is typically
used is to allow a module to be both imported and used as a script.
Sometimes the script is a command line interface to the functionality
available in the module. Sometimes the script provides a demo of the
capabilities of the module. And sometimes the script runs any tests that
live in the module (for example all of the doctests). To use this in
your own library you can use something like this:</p>
<div class="highlight"><pre><span class="kn">import</span> <span class="nn">sys</span>


<span class="k">def</span> <span class="nf">do_something</span><span class="p">(</span><span class="n">args</span><span class="p">):</span>
    <span class="c"># Do something with args.</span>
    <span class="k">pass</span>


<span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">argv</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">argv</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
        <span class="n">argv</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span>
    <span class="c"># parse_args() stands in for your own argument parsing.</span>
    <span class="n">args</span> <span class="o">=</span> <span class="n">parse_args</span><span class="p">(</span><span class="n">argv</span><span class="p">)</span>
    <span class="n">do_something</span><span class="p">(</span><span class="n">args</span><span class="p">)</span>


<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">'__main__'</span><span class="p">:</span>
    <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="n">main</span><span class="p">())</span>
</pre></div>
<p>The <tt class="docutils literal">main()</tt> function is only called when the module is run directly.</p>
</div>
<div class="section" id="the-m-option">
<h2>The -m option</h2>
<p>Once your module has an <tt class="docutils literal">if __name__ == '__main__'</tt> clause (I usually
refer to this as just the ifmain clause), an easy way to invoke the
module is to use the <tt class="docutils literal"><span class="pre">-m</span></tt> option of python. This allows you to refer
to a module by its import name rather than its specific path. In the
previous example the foo.py module could be run using:</p>
<div class="highlight"><pre><span class="nv">$ </span>python -m foo
</pre></div>
<p>One final thing worth pointing out is that many modules in python's
stdlib have useful ifmain functionality. A few notable ones include:</p>
<div class="highlight"><pre>python -m SimpleHTTPServer
</pre></div>
<p>This will serve the current working directory on port 8000. I use this
command on almost a daily basis. From quickly downloading files to
viewing html files on a remote server, this is one of the most useful
ifmain clauses in the entire python standard library.</p>
<div class="highlight"><pre>python -m pdb myfile.py
</pre></div>
<p>Run a python script via pdb (the python debugger).</p>
<div class="highlight"><pre>python -m trace --trace myfile.py
</pre></div>
<p>Print each line to stdout before it's executed. Be sure to see the help
of the trace module; there are a lot of useful options besides printing
each line being executed.</p>
<div class="highlight"><pre>python -m profile myfile.py
</pre></div>
<p>Profile myfile.py and print out a summary.</p>
<p>So there it is. My list of tips. In the future I plan on expanding on
some of these tips in more depth (the profiling workflow for python code
and how to debug python code stand out), but in the meantime, may these
tips be as helpful to you as they are to me.</p>
</div>
The Voidspace Mock Bug (and a look at decorators)2010-06-07T22:44:44-07:00James Saryerwinnietag:jamesls.com,2010-06-07:the-voidspace-mock-bug-and-a-look-at-decorators.html<p>My first (and still favorite) mock library for python has been Michael
Foord's <a class="reference external" href="http://www.voidspace.org.uk/python/mock/">voidspace mock</a>.
It's an excellent library, and for me feels like the most pythonic take
on mocking. I'd like to go over the code and examine one of the
functions provided in the API, namely the <strong>patch</strong> function. It's a
pretty interesting decorator whose implementation is worth looking at.
But of course, first a detour on how I got to examining the
implementation of patch (skip this part if you don't care and are only
interested in the examination of patch):</p>
<p>So anyways, we've been using version 0.4.0 of the library for years now
(I know, we're horribly out of date). Looking at the awesome
<a class="reference external" href="http://www.voidspace.org.uk/python/mock/changelog.html">changelog</a>,
there were so many features that looked like they would help us write
tests that I made the attempt at upgrading from 0.4.0 to 0.6.0.</p>
<p>My hopes of 0.6.0 being a drop in replacement for 0.4.0 were quickly
dispelled. There was no chance of this happening. This was frustrating,
but understandable. It clearly states on the main docs page that:</p>
<blockquote>
The current version is 0.6.0, dated 23rd August 2009. Mock is still
experimental; the API may change. If you find bugs or have
suggestions for improvements / extensions then please email me.</blockquote>
<p>Not to mention the pre 1.0 version numbers are a clear indicator as well
that the project is not to be considered stable. So I start looking at
what sort of backwards incompatible changes were made, and what it would
take to update to 0.6.0. At this point, I figured I might as well sync
up with svn and just use the latest version in trunk (magic method
support is highly desirable for us). So most of the changes were pretty
straightforward, reset() has been renamed to reset_mock(), side_effect
works a little differently, but ultimately, nothing too surprising.</p>
<div class="section" id="however">
<h2><strong>However...</strong></h2>
<p>There was one change I just couldn't wrap my head around. One change
that didn't make a whole lot of sense. <strong>The ordering of patch
decorators was reversed!</strong> In other words:</p>
<div class="highlight"><pre><span class="nd">@patch</span><span class="p">(</span><span class="s">'a'</span><span class="p">)</span>
<span class="nd">@patch</span><span class="p">(</span><span class="s">'b'</span><span class="p">)</span>
<span class="nd">@patch</span><span class="p">(</span><span class="s">'c'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a_mock</span><span class="p">,</span> <span class="n">b_mock</span><span class="p">,</span> <span class="n">c_mock</span><span class="p">):</span> <span class="k">pass</span>
</pre></div>
<p>now becomes:</p>
<div class="highlight"><pre><span class="nd">@patch</span><span class="p">(</span><span class="s">'a'</span><span class="p">)</span>
<span class="nd">@patch</span><span class="p">(</span><span class="s">'b'</span><span class="p">)</span>
<span class="nd">@patch</span><span class="p">(</span><span class="s">'c'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">c_mock</span><span class="p">,</span> <span class="n">b_mock</span><span class="p">,</span> <span class="n">a_mock</span><span class="p">):</span> <span class="k">pass</span>
</pre></div>
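<p>For comparison, here is a small sketch using <tt class="docutils literal">unittest.mock</tt>, the standard library descendant of this library (available since Python 3.3). That is an assumption about what you have installed, not the 0.6.0 release discussed here. It follows the new ordering: the decorator closest to the function supplies the first mock argument. The patched names and fake values are only for illustration.</p>

```python
import os
from unittest.mock import patch


@patch('os.getcwd')   # outermost decorator: its mock is passed last
@patch('os.getpid')   # closest to the function: its mock is passed first
def check(getpid_mock, getcwd_mock):
    getpid_mock.return_value = -1
    getcwd_mock.return_value = '/not/a/real/dir'
    # Both names are replaced only while check() is running.
    return os.getpid(), os.getcwd()


result = check()
```

<p>Once <tt class="docutils literal">check()</tt> returns, both functions are restored to their originals.</p>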
<p>Let's see. This didn't add/enable any new feature. There's nothing (as
far as my limited knowledge goes) about the accepted and proper ordering
of stacked decorators. So why the change?</p>
<p>Well curiosity got the best of me, so I thought maybe there's some issue
associated with this. The best I could find was <a class="reference external" href="http://code.google.com/p/mock/issues/detail?id=13&can=1">this
issue</a>, in which
someone asks for the ordering to be put back. The author replies that
flipping the order back (once again) would be even more confusing, and
that the change was the result of a bugfix: "decorators
were being applied incorrectly previously". Ok, I can go with that. If
this was the result of a necessary bugfix, I can understand the need for
a change.</p>
<p>So at this point, I wasn't interested in the "why the change" question,
but merely "what was the bug?" Curiosity got the best of me.</p>
<p>My next step was to track down the commit that introduced this change. Easy right?
Well, not exactly. Eventually I gave up trying to use SVN to track down the
commit (or more correctly the SVN server gave up on me and kept timing out), so
I imported the whole thing to git using git-svn. A few minutes with git bisect
showed the commit I was after. Its commit message:</p>
<p>Yeah, that's right, an empty commit message. Now I'm really curious so I
start looking at the diff to see if I can understand what the patch bug
was and how it was fixed:</p>
<pre class="literal-block">
-def _patch(target, attribute, new):
+def _patch(target, attribute, new, methods, spec):
     def patcher(func):
         original = getattr(target, attribute)
         if hasattr(func, 'restore_list'):
             func.restore_list.append((target, attribute, original))
-            func.patch_list.append((target, attribute, new))
+            func.patch_list.append((target, attribute, new, methods, spec))
             return func
-        func.restore_list = [(target, attribute, original)]
-        func.patch_list = [(target, attribute, new)]
+        patch_list = [(target, attribute, new, methods, spec)]
+        restore_list = [(target, attribute, original)]
         def patched(*args, **keywargs):
-            for target, attribute, new in func.patch_list:
+            for target, attribute, new, methods, spec in patch_list:
                 if new is DEFAULT:
-                    new = Mock()
+                    new = Mock(methods=methods, spec=spec)
                     args += (new,)
                 setattr(target, attribute, new)
             try:
                 return func(*args, **keywargs)
             finally:
-                for target, attribute, original in func.restore_list:
+                for target, attribute, original in restore_list:
                     setattr(target, attribute, original)
+        patched.restore_list = restore_list
+        patched.patch_list = patch_list
         patched.__name__ = func.__name__
         patched.compat_co_firstlineno = getattr(func, "compat_co_firstlineno",
                                                 func.func_code.co_firstlineno)
</pre>
<p>Did you see it? Did you see the bug? Well neither did I at first. In
fact, I stared at it for a while still not understanding what the bug
was or even how this somehow reversed the order of patch decorators. So
now in full detective mode, I sketched exactly what the code was doing
both pre patch and post patch to fully understand how this all worked.</p>
</div>
<div class="section" id="the-implementation-of-patch">
<h2>The Implementation of @patch</h2>
<p>For anyone not familiar with how patch works, see the <a class="reference external" href="http://www.voidspace.org.uk/python/mock/patch.html#patch">API
docs</a>. The
basic idea is that whatever object I specify as an argument to patch
will be replaced with a mock and passed into the decorated function as an
argument. After the function is run, it will replace the patched out
object with its original value. You can stack patch decorators on top
of each other to patch multiple things out for the duration of the
function.</p>
<p>This can get confusing when you think about what is going on behind the
scenes in a call like this:</p>
<div class="highlight"><pre><span class="nd">@patch</span><span class="p">(</span><span class="s">'a'</span><span class="p">)</span>
<span class="nd">@patch</span><span class="p">(</span><span class="s">'b'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">f</span><span class="p">(</span><span class="n">a_mock</span><span class="p">,</span> <span class="n">b_mock</span><span class="p">):</span>
    <span class="k">pass</span>
</pre></div>
<p>The first thing to do is to substitute the syntactic equivalent of the
decorators without using the decorator syntax:</p>
<div class="highlight"><pre><span class="k">def</span> <span class="nf">f</span><span class="p">(</span><span class="n">a_mock</span><span class="p">,</span> <span class="n">b_mock</span><span class="p">):</span>
    <span class="k">pass</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">patch</span><span class="p">(</span><span class="s">'a'</span><span class="p">)(</span><span class="n">patch</span><span class="p">(</span><span class="s">'b'</span><span class="p">)(</span><span class="n">f</span><span class="p">))</span>
</pre></div>
<p>It's now a little more clear what exactly is going on. Still, to be even
clearer, here's the call sequence:</p>
<div class="highlight"><pre><span class="n">r1</span> <span class="o">=</span> <span class="n">patch</span><span class="p">(</span><span class="s">'a'</span><span class="p">)</span>
<span class="n">r2</span> <span class="o">=</span> <span class="n">patch</span><span class="p">(</span><span class="s">'b'</span><span class="p">)</span>
<span class="n">r3</span> <span class="o">=</span> <span class="n">r2</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="n">r4</span> <span class="o">=</span> <span class="n">r1</span><span class="p">(</span><span class="n">r3</span><span class="p">)</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">r4</span>
</pre></div>
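<p>You can check this call sequence yourself with a toy decorator. The sketch below uses a hypothetical <tt class="docutils literal">tag()</tt> decorator factory (not part of mock) that records when each decorator expression is evaluated and when each resulting decorator is applied.</p>

```python
calls = []


def tag(name):
    # Runs when the "@tag(...)" expression is evaluated (r1, r2 above).
    calls.append(('evaluate', name))

    def decorator(func):
        # Runs when the decorator is applied to a function (r3, r4 above).
        calls.append(('apply', name, func.__name__))

        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)

        # Name the wrapper after what it wraps so the nesting is visible.
        wrapper.__name__ = '%s(%s)' % (name, func.__name__)
        return wrapper

    return decorator


@tag('a')
@tag('b')
def f():
    pass
```

<p>The recorded calls show the decorator expressions being evaluated top to bottom, and the resulting decorators being applied bottom to top.</p>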
<p>Of course, this all happens behind the scenes. This is what happens at
import time (or whenever the function being decorated is encountered).
The function hasn't even been invoked yet. What happens when the new f
is finally invoked (in this case, r4)? Well, this is implementation
specific. If we assume that each function call above returns a brand new
function, then it would look something like this:</p>
<div class="highlight"><pre><span class="n">r4</span><span class="p">(</span><span class="n">r3</span><span class="p">(</span><span class="n">f</span><span class="p">))</span>
</pre></div>
<p>Now this could be arbitrarily large. If we had 10 patches, the
invocation would be:</p>
<div class="highlight"><pre><span class="n">r20</span><span class="p">(</span><span class="n">r19</span><span class="p">(</span><span class="n">r18</span><span class="p">(</span><span class="o">....</span><span class="n">r11</span><span class="p">(</span><span class="n">f</span><span class="p">)</span><span class="o">...</span><span class="p">)))</span>
</pre></div>
<p>Can we do any better given what we know about patch? Actually we can.
What we ultimately want is for the execution of the original f to occur
in an environment where all of the objects referenced in the patches
above have been replaced with mocks. If every invocation of r_n first
patches the object associated with <em>itself</em> before invoking the next
r_(n-1) function, by the time the f is finally invoked, the necessary
patches will have already occurred. Similarly when the function returns,
if we restore the patched object before returning control to the calling
function, by the time r_n returns, the original environment will have
been put back.</p>
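<p>Here is a minimal sketch of that nested approach. It is not mock's code: <tt class="docutils literal">nested_patch()</tt> and the <tt class="docutils literal">Target</tt> class are hypothetical, and plain strings stand in for real Mock objects.</p>

```python
class Target(object):
    # Hypothetical object whose attributes get patched out.
    a = 'real a'
    b = 'real b'


def nested_patch(attribute):
    def patcher(func):
        def patched(*args, **kwargs):
            # Patch only the attribute this wrapper is responsible for...
            original = getattr(Target, attribute)
            new = 'mock ' + attribute
            setattr(Target, attribute, new)
            try:
                # ...then hand control to the next wrapper (or to f).
                return func(*(args + (new,)), **kwargs)
            finally:
                # Restore before returning control to the caller.
                setattr(Target, attribute, original)
        return patched
    return patcher


@nested_patch('a')
@nested_patch('b')
def f(first, second):
    return first, second, Target.a, Target.b


result = f()
```

<p>With this scheme the outermost wrapper runs first, so the mocks arrive in the same top to bottom order as the decorators.</p>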
<p>But we don't need all that nesting. Think of an alternative
implementation that accomplishes the same thing. One way to do it would
be to wait until we actually invoke the original f and then patch all
the objects out at once. When f returns, we can restore all the objects
in one swoop as well.</p>
<p>This is what the implementation of _patch was trying to do. The idea
was to tag the function with some metadata containing a list of objects
to patch. Each application of the decorator would check for that
attribute on the function, and if it existed, it would just append its
own patched values to the end of the list and return <em>the same exact
function</em> instead of creating a new function. Essentially:</p>
<div class="highlight"><pre><span class="n">r1</span> <span class="o">=</span> <span class="n">patch</span><span class="p">(</span><span class="s">'a'</span><span class="p">)</span>
<span class="n">r2</span> <span class="o">=</span> <span class="n">patch</span><span class="p">(</span><span class="s">'b'</span><span class="p">)</span>
<span class="n">r3</span> <span class="o">=</span> <span class="n">r2</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="n">r3</span> <span class="o">=</span> <span class="n">r1</span><span class="p">(</span><span class="n">r3</span><span class="p">)</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">r3</span>
</pre></div>
<p>This becomes more obvious with more patches:</p>
<div class="highlight"><pre><span class="n">r1</span> <span class="o">=</span> <span class="n">patch</span><span class="p">(</span><span class="s">'a'</span><span class="p">)</span>
<span class="n">r2</span> <span class="o">=</span> <span class="n">patch</span><span class="p">(</span><span class="s">'b'</span><span class="p">)</span>
<span class="n">r3</span> <span class="o">=</span> <span class="n">patch</span><span class="p">(</span><span class="s">'c'</span><span class="p">)</span>
<span class="n">r4</span> <span class="o">=</span> <span class="n">patch</span><span class="p">(</span><span class="s">'d'</span><span class="p">)</span>
<span class="n">r5</span> <span class="o">=</span> <span class="n">r4</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="c"># before returning r5: r5.metadata = ['d']</span>
<span class="c"># Subsequent calls will always return r5</span>
<span class="n">r3</span><span class="p">(</span><span class="n">r5</span><span class="p">)</span> <span class="c"># r5.metadata.append('c') --> r5</span>
<span class="n">r2</span><span class="p">(</span><span class="n">r5</span><span class="p">)</span> <span class="c"># r5.metadata.append('b') --> r5</span>
<span class="n">r1</span><span class="p">(</span><span class="n">r5</span><span class="p">)</span> <span class="c"># r5.metadata.append('a') --> r5</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">r5</span>
</pre></div>
<p>In reality there would still be an r6-r8, but I show them without return
values to emphasize they are actually <em>modifying the state of r5</em>. So by
the end of the above sequence metadata looks like:</p>
<div class="highlight"><pre><span class="n">metadata</span> <span class="o">=</span> <span class="p">[</span><span class="s">'d'</span><span class="p">,</span> <span class="s">'c'</span><span class="p">,</span> <span class="s">'b'</span><span class="p">,</span> <span class="s">'a'</span><span class="p">]</span>
</pre></div>
<p>And you can now see how when we apply the patches in order, your
argument list will map to:</p>
<div class="highlight"><pre><span class="k">def</span> <span class="nf">f</span><span class="p">(</span><span class="n">d_mock</span><span class="p">,</span> <span class="n">c_mock</span><span class="p">,</span> <span class="n">b_mock</span><span class="p">,</span> <span class="n">a_mock</span><span class="p">):</span> <span class="k">pass</span>
</pre></div>
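<p>The single list idea can be sketched as follows. Again, this is an illustration under assumptions rather than mock's implementation: <tt class="docutils literal">flat_patch()</tt> and <tt class="docutils literal">Target</tt> are made-up names, and strings stand in for mocks.</p>

```python
class Target(object):
    # Hypothetical object whose attributes get patched out.
    a, b, c, d = 'real a', 'real b', 'real c', 'real d'


def flat_patch(attribute):
    def patcher(func):
        if hasattr(func, 'patch_list'):
            # Already wrapped: record the extra patch, reuse the function.
            func.patch_list.append(attribute)
            return func

        patch_list = [attribute]

        def patched(*args, **kwargs):
            originals = []
            # Apply every recorded patch in one pass, in list order.
            for name in patch_list:
                originals.append((name, getattr(Target, name)))
                new = 'mock ' + name
                setattr(Target, name, new)
                args += (new,)
            try:
                return func(*args, **kwargs)
            finally:
                # Restore everything in one pass as well.
                for name, original in originals:
                    setattr(Target, name, original)

        # Tag the NEW function so later decorators find the list.
        patched.patch_list = patch_list
        return patched
    return patcher


@flat_patch('a')
@flat_patch('b')
@flat_patch('c')
@flat_patch('d')
def f(*mocks):
    return mocks


result = f()
```

<p>Only one wrapper is ever created, and its list fills up from the bottom decorator to the top one.</p>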
<p>Even though this is what we'd like it to do, this was <em>not</em> what the
original implementation was doing. So what was the original
implementation doing that caused the args to be passed in reverse order?
Well, the key parts of the diff above are these two:</p>
<div class="highlight"><pre><span class="o">-</span> <span class="n">func</span><span class="o">.</span><span class="n">restore_list</span> <span class="o">=</span> <span class="p">[(</span><span class="n">target</span><span class="p">,</span> <span class="n">attribute</span><span class="p">,</span> <span class="n">original</span><span class="p">)]</span>
<span class="o">-</span> <span class="n">func</span><span class="o">.</span><span class="n">patch_list</span> <span class="o">=</span> <span class="p">[(</span><span class="n">target</span><span class="p">,</span> <span class="n">attribute</span><span class="p">,</span> <span class="n">new</span><span class="p">)]</span>
<span class="o">+</span> <span class="n">patch_list</span> <span class="o">=</span> <span class="p">[(</span><span class="n">target</span><span class="p">,</span> <span class="n">attribute</span><span class="p">,</span> <span class="n">new</span><span class="p">,</span> <span class="n">methods</span><span class="p">,</span> <span class="n">spec</span><span class="p">)]</span>
<span class="o">+</span> <span class="n">restore_list</span> <span class="o">=</span> <span class="p">[(</span><span class="n">target</span><span class="p">,</span> <span class="n">attribute</span><span class="p">,</span> <span class="n">original</span><span class="p">)]</span>
</pre></div>
<p>and:</p>
<div class="highlight"><pre><span class="o">+</span> <span class="n">patched</span><span class="o">.</span><span class="n">restore_list</span> <span class="o">=</span> <span class="n">restore_list</span>
<span class="o">+</span> <span class="n">patched</span><span class="o">.</span><span class="n">patch_list</span> <span class="o">=</span> <span class="n">patch_list</span>
</pre></div>
<p>In other words, it was tagging the <em>inner</em> function with metadata
information instead of tagging the newly created function! What effect
did this have? Well, remember that if the function passed in is not
already tagged with the attribute, the decorator creates a new function
that wraps the passed in function and tags a function with the metadata.
Since the newly created function was never tagged, every application of
the decorator created a brand new function. This maps to something like this:</p>
<div class="highlight"><pre><span class="n">r1</span> <span class="o">=</span> <span class="n">patch</span><span class="p">(</span><span class="s">'a'</span><span class="p">)</span>
<span class="n">r2</span> <span class="o">=</span> <span class="n">patch</span><span class="p">(</span><span class="s">'b'</span><span class="p">)</span>
<span class="n">r3</span> <span class="o">=</span> <span class="n">patch</span><span class="p">(</span><span class="s">'c'</span><span class="p">)</span>
<span class="n">r4</span> <span class="o">=</span> <span class="n">patch</span><span class="p">(</span><span class="s">'d'</span><span class="p">)</span>
<span class="c"># This is important. Note how it's tagging f and NOT r5. This is the bug.</span>
<span class="c"># And it will propagate all the way through.</span>
<span class="n">r5</span> <span class="o">=</span> <span class="n">r4</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="c"># before returning r5: f.metadata = ['d']</span>
<span class="n">r6</span> <span class="o">=</span> <span class="n">r3</span><span class="p">(</span><span class="n">r5</span><span class="p">)</span> <span class="c"># since f is tagged and not r5, it will create a new function</span>
<span class="c"># and tag the inner function r5: r5.metadata = ['c']</span>
<span class="n">r7</span> <span class="o">=</span> <span class="n">r2</span><span class="p">(</span><span class="n">r6</span><span class="p">)</span> <span class="c"># create a new r7 and do: r6.metadata = ['b']</span>
<span class="n">r8</span> <span class="o">=</span> <span class="n">r1</span><span class="p">(</span><span class="n">r7</span><span class="p">)</span> <span class="c"># create a new r8 and do: r7.metadata = ['a']</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">r8</span>
</pre></div>
<p>So what happens when we execute r8? Well, it's the exact same function
as before. It will iterate through its list of "metadata" objects, and
patch them out before calling the function it wraps. So r8 will iterate
through a single element list, which will patch out 'a'. It will then
call r7. r7 will patch out 'b' and then call r6, which patches 'c' and
calls r5, which patches 'd' and finally calls the original f. Now you
can see how that processing results in the original f function having
the signature:</p>
<div class="highlight"><pre><span class="k">def</span> <span class="nf">f</span><span class="p">(</span><span class="n">a_mock</span><span class="p">,</span> <span class="n">b_mock</span><span class="p">,</span> <span class="n">c_mock</span><span class="p">,</span> <span class="n">d_mock</span><span class="p">):</span> <span class="k">pass</span>
</pre></div>
<p>Hopefully at this point you understand what the patch bug was. Sure it
happened to work, but this was not the way it was supposed to
work.</p>
<p>Now, the astute reader will notice that we could have made the bug fix
(that is, make it so all the nesting is removed and all the patches are
in a single list) and <strong>at the same time preserve backwards
compatibility</strong>. Simply change:</p>
<div class="highlight"><pre><span class="k">def</span> <span class="nf">patcher</span><span class="p">(</span><span class="n">func</span><span class="p">):</span>
<span class="n">original</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">target</span><span class="p">,</span> <span class="n">attribute</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">func</span><span class="p">,</span> <span class="s">'restore_list'</span><span class="p">):</span>
<span class="n">func</span><span class="o">.</span><span class="n">restore_list</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">target</span><span class="p">,</span> <span class="n">attribute</span><span class="p">,</span> <span class="n">original</span><span class="p">))</span>
<span class="n">func</span><span class="o">.</span><span class="n">patch_list</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">target</span><span class="p">,</span> <span class="n">attribute</span><span class="p">,</span> <span class="n">new</span><span class="p">,</span> <span class="n">methods</span><span class="p">,</span> <span class="n">spec</span><span class="p">))</span>
</pre></div>
<p>to:</p>
<div class="highlight"><pre><span class="k">def</span> <span class="nf">patcher</span><span class="p">(</span><span class="n">func</span><span class="p">):</span>
<span class="n">original</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">target</span><span class="p">,</span> <span class="n">attribute</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">func</span><span class="p">,</span> <span class="s">'restore_list'</span><span class="p">):</span>
<span class="n">func</span><span class="o">.</span><span class="n">restore_list</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="n">target</span><span class="p">,</span> <span class="n">attribute</span><span class="p">,</span> <span class="n">original</span><span class="p">))</span>
<span class="n">func</span><span class="o">.</span><span class="n">patch_list</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="n">target</span><span class="p">,</span> <span class="n">attribute</span><span class="p">,</span> <span class="n">new</span><span class="p">,</span> <span class="n">methods</span><span class="p">,</span> <span class="n">spec</span><span class="p">))</span>
</pre></div>
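<p>To see why prepending preserves the old argument order, here is a compact sketch of the single list scheme with <tt class="docutils literal">insert(0, <span class="pre">...)</span></tt> in place of <tt class="docutils literal">append</tt>. The names <tt class="docutils literal">ordered_patch()</tt> and <tt class="docutils literal">Target</tt> are hypothetical, and strings stand in for mocks.</p>

```python
class Target(object):
    # Hypothetical object whose attributes get patched out.
    a, b, c = 'real a', 'real b', 'real c'


def ordered_patch(attribute):
    def patcher(func):
        if hasattr(func, 'patch_list'):
            # Prepend, so decorators higher up land earlier in the list.
            func.patch_list.insert(0, attribute)
            return func

        patch_list = [attribute]

        def patched(*args, **kwargs):
            originals = [(name, getattr(Target, name)) for name in patch_list]
            mocks = tuple('mock ' + name for name in patch_list)
            for name, new in zip(patch_list, mocks):
                setattr(Target, name, new)
            try:
                return func(*(args + mocks), **kwargs)
            finally:
                for name, original in originals:
                    setattr(Target, name, original)

        patched.patch_list = patch_list
        return patched
    return patcher


@ordered_patch('a')
@ordered_patch('b')
@ordered_patch('c')
def test(*mocks):
    return mocks


result = test()
```

<p>There is still only one wrapper function, but the mocks now arrive in the same order as the decorators are written.</p>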
<p>Unfortunately, this is what we ended up doing at work, as we had way too
many tests that this would break. Plus, we were already used to the in
order semantics of the old implementation. This could be potentially
troublesome if we intend to get updates in the future (which we do).</p>
<p>And that is the voidspace mock bug.</p>
</div>
Testing the Tests, or How to Avoid Recursion2010-03-14T07:12:03-07:00James Saryerwinnietag:jamesls.com,2010-03-14:testing-the-tests-or-how-to-avoid-recursion.html<p>To gain more confidence in production code, tests at varying levels
accompany the development of such code. Unittests and integration tests
are common, and sometimes suites of functional/acceptance tests are
present as well. There's been quite a bit of discussion surrounding how
to best write unittests (especially if you're using some xUnit framework
variant), and sometimes these best practices even make their way into
integration tests. Unfortunately, I still have not seen these best
practices make their way into functional tests. In particular, I
frequently see functional tests that are either very complicated, and/or
obscure.</p>
<p>Before going on, it's worth voicing why I think complicated and obscure
tests are bad. Typically production code is complicated. The domain is
filled with special cases and unintuitive behavior. In order to help
write production code, unittests are written to help make sure the
production code works as we expect (in addition to driving the design if
you use TDD). There's a secret so fundamental to writing unittests, yet
it's rarely explicitly called out:</p>
<div class="section" id="we-don-t-test-our-tests">
<h2>We don't test our tests!</h2>
<p>Think about that for a moment. We have complicated production code. We
write tests to ensure that the production code works as expected. What
tests the test? Well if you have really complicated test logic, then are
you really better off because you've written tests? Testcases are only
effective if we can assume they are correct, in which case they are
correctly asserting things about the system under test. If testcases are
incorrect, that is, they are asserting incorrect things about the
system, then the validity of the system under test is unknown, and
you're arguably worse off than before (is a test fail <em>really</em> a fail?).</p>
<p>So what's the solution for this? Well, test the tests of course! And
what if those tests are still complicated? Well, we'll test those as
well! In fact it's tests all the way down. In order to avoid this
recursion, we have to set a practical limitation to testcases: <em>they
must be simple enough to not require tests.</em> In practice, this typically
means two things:</p>
<ul class="simple">
<li>No conditionals</li>
<li>Small in length</li>
</ul>
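<p>As a rough illustration of what those two constraints look like in practice, here is a sketch using the standard <tt class="docutils literal">unittest</tt> module. The <tt class="docutils literal">clamp()</tt> function is a hypothetical system under test; each test is a single assertion with no branching, so there is nothing in the test itself that could be wrong in an interesting way.</p>

```python
import unittest


def clamp(value, low, high):
    """Hypothetical function under test: limit value to [low, high]."""
    return max(low, min(value, high))


class ClampTest(unittest.TestCase):
    # One behavior per test, no conditionals, a few lines each.

    def test_value_below_range_is_raised_to_low(self):
        self.assertEqual(clamp(-5, 0, 10), 0)

    def test_value_above_range_is_lowered_to_high(self):
        self.assertEqual(clamp(15, 0, 10), 10)

    def test_value_inside_range_is_unchanged(self):
        self.assertEqual(clamp(7, 0, 10), 7)
```

<p>Compare this with a single test that loops over inputs and branches on which result to expect: that test has logic of its own, and a failure no longer tells you which case broke.</p>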
<p>Now, how to avoid these two things (or how to replace these two things
with better alternatives) is a topic of its own. What I'm interested
in for this post is why the trend of simple and small tests has still
not made its way into the realm of functional testing.</p>
<p>If you look at any (reasonable) code base's unittests, they're typically
not so bad. They're short, they're small, you can read a test and more
or less understand what it's testing. If this code base contains
functional tests (and by functional tests, I basically mean any test
that simulates interacting with the system in a similar way to how a user
would interact with the system), you'll typically see these
characteristics instead:</p>
<ul class="simple">
<li>Long</li>
<li>Complicated logic</li>
<li>Hard to understand</li>
<li>Fragile, failures aren't really failures sometimes</li>
<li>Terrible defect localization</li>
</ul>
<p>Why is this? Is it because functional testing is still progressing
towards what unittests currently are? Is it because of a lack of
consensus that functional testing should have the same characteristics
of unittests? Is it because writing small tests with no conditionals is
hard to do at such a high level?</p>
<p>My take is that writing good functional tests is much
harder than writing unittests, and because of this, best practices are
typically ignored. Fortunately, I think this situation can be remedied,
and in the next post, I'll show several things you can do to achieve
more succinct tests.</p>
</div>