Semidbm: 0.4.0 Released

Wed 29 May 2013 by James Saryerwinnie

I've just released 0.4.0 of semidbm. This represents a number of really cool features. See the full changelog for more details.

One of the biggest features is python 3 support. I was worried about not introducing a performance regression by supporting python 3. Fortunately, this was not the case.

In fact, performance increased. This was possible for a number of reasons. First, the index file and data file were combined into a single file. This means that a __setitem__ call results in only a single write() call. Also, semidbm now uses a binary format. This results in a more compact form and it's easier to create the sequence of bytes we need to write out to disk. This is also including the fact that semidbm now includes checksum data for each write that occurs.

Try it out for yourself.

What's Next?

I think at this time, semidbm has more than exceeded it's original goal, which was to be a pure python cross platform key value storage that had reasonable performance. So what's next for semidbm? In a nutshell, higher level abstractions (aka the "fun stuff"). Code that builds on the simple key value storage of semidbm.db and provides additional features. And as we get higher level, I think it makes sense to reevaluate the original goals of semidbm and whether or not it makes sense to carry those goals forward:

  • Cross platform. I'm inclined to not support windows for these higher level abstractions.
  • Pure python. I think the big reason for remaining pure python was for ease of installation. Especially on windows, pip installing a package should just work. With C extensions, this becomes much harder on windows. If semidbm isn't going to support windows for these higher level abstractions, then C extensions are fair game.

Some ideas I've been considering:

  • A C version of _Semidbm.
  • A dict like interface that is concurrent (possibly single writer multiple reader).
  • A sorted version of semidbm (supporting things like range queries).
  • Caching reads (need an efficient LRU cache).
  • Automatic background compaction of data file.
  • Batched writes
  • Transactions
  • Compression (I played around with this earlier. Zlib turned out to be too slow for the smaller sized values (~100 bytes) but it might be worth being able to configure this on a per db basis.