There are a lot of things I like about programming in Python, but one of my favorites is how well Python allows code reuse. Between the excellent import semantics and the outrageous level of introspection in Python, just about any piece of Python code (assuming it’s generic enough) can be reused across a wide range of projects. However, since you can only reuse code that you know about, I thought I’d share a few of my favorite little Python utility modules.
These are mostly things that I’ve written myself (often more than once), but never managed to generify, package and release. Fortunately, other people out in the Python community are more disciplined than I am.
Here we go…
First
If you’re at all familiar with Python’s deep use of iterables, then you no doubt know all about the any and all builtins. These take an iterable (often a sequence) and return a boolean value which indicates whether any or all of the values in the iterable are "true" (or "truthy" if you prefer).
But I’ve often found myself wanting to know which of the values are true, and almost as often, I just wanted to know what the first true value in a sequence was. That’s what the first function gives you. Here’s an example:
from first import first
a = [0, 0.0, (), [], {}, None, 'win!']
print(first(a))
# win!
While you can get something similar in lots of ways, the code you write using first will almost certainly be more readable and clearer about its intention. And clarity and readability are what Python is all about.
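To make "something similar" concrete, here is a minimal sketch of the same idea using only builtins. The name first_truthy is my own placeholder, not part of the first module, and I'm only claiming it matches the behavior shown in the example above.

```python
def first_truthy(iterable, default=None):
    """Return the first truthy item in iterable, or default if there is none.

    Hypothetical helper for illustration; this is not the first module's API.
    """
    # The generator yields only truthy items; next() takes the first one,
    # or falls back to default when the generator is empty.
    return next((item for item in iterable if item), default)


a = [0, 0.0, (), [], {}, None, 'win!']
winner = first_truthy(a)
nothing = first_truthy([0, None, ''], default='no luck')
```

It works, but you have to stop and read the generator expression to see what it means, which is exactly the readability argument for a named function.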
Bunch
I don’t know how many times I’ve written code to access dictionaries using attribute access syntax, but it’s too many. I don’t do that any more. Now I use the bunch module instead.
The main use I have for this kind of thing is when I’m integrating two libraries, one that speaks in terms of dictionaries and one that speaks in terms of generic Python objects. bunch makes converting between them a breeze.
from bunch import Bunch, bunchify, unbunchify
d = {'a': 1, 'b': 2, 'c': 3}
bnch = bunchify(d)
# or you can do:
# bnch = Bunch(d)
print('bnch.a -> {!s}'.format(bnch.a))
print('bnch.b -> {!s}'.format(bnch.b))
print('bnch.c -> {!s}'.format(bnch.c))
# there's an inverse operation as well
assert d == unbunchify(bunchify(d))
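If you're wondering what I used to write by hand, it looked roughly like the sketch below. AttrDict is a made-up name for illustration and is not how bunch is implemented; it also skips the recursive conversion of nested dictionaries that bunchify gives you.

```python
class AttrDict(dict):
    """A dict whose keys can also be read and written as attributes.

    Hypothetical stand-in for illustration only, not the bunch implementation.
    """

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so real dict
        # methods such as .keys() and .items() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Store attribute assignments as ordinary dictionary items.
        self[name] = value


d = {'a': 1, 'b': 2, 'c': 3}
b = AttrDict(d)
b.d = 4
```

Every time I wrote this, I forgot some corner case (nested dicts, deletion, serialization), which is why I'm happy to let someone else maintain it.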
Filepath
If you’ve ever used Twisted, you might be familiar with the twisted.python.filepath module, which provides a nice object-oriented way to manipulate filesystem elements (files and directories). For example, you can do stuff like the following:
from twisted.python.filepath import FilePath
home = FilePath('/home/username')
# check whether the path exists
print(home.exists())
# True
# create a new directory
newdir = home.child('somenewdirectory')
print(newdir.exists())
# False
newdir.makedirs()
# restat() refreshes the FilePath object's internal state after we make changes
newdir.restat()
print(newdir.exists())
# True
# now create a file
myfile = newdir.child('somefile')
myfile.touch()
print(myfile.exists())
# True
# and write to it
myfile.setContent(b'Have you got any Red Leicester?')
There’s a great blog post which covers the highlights on the Twisted Blog.
Now this is pretty great, but pulling in all of Twisted means adding a pretty huge dependency to your project, which is probably not what you want. Fortunately the good people at the Twisted project recognize this, and provide the filepath module as a separate distribution. Use it, it’s awesome.
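For comparison only, newer versions of Python ship a pathlib module in the standard library that covers similar ground. The sketch below is not the FilePath API; it just mirrors the steps from the example above, run inside a temporary directory so it's safe to try. The directory and file names are carried over from that example.

```python
import tempfile
from pathlib import Path

# A throwaway directory standing in for a real home directory.
home = Path(tempfile.mkdtemp())

# create a new directory
newdir = home / 'somenewdirectory'
existed_before = newdir.exists()
newdir.mkdir(parents=True)

# now create a file and write to it
myfile = newdir / 'somefile'
myfile.touch()
myfile.write_text('Have you got any Red Leicester?')
contents = myfile.read_text()
```

Note that pathlib checks the filesystem on every exists() call, so there is no equivalent of restat() here.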
datadiff
If you spend a lot of time looking at large Python data structures, as I do (mostly parsed from giant JSON replies generated by REST APIs in my case), you often find yourself wanting to know whether two of them are the same or not. And if they’re not, you want to know what, exactly, has changed. There’s a plethora of ways to do this, and I’ve (re-)invented a number of them. But since I discovered the datadiff module, I don’t worry about this so much any more.
from datadiff import diff
# here are two dicts that are slightly different
dict_A = {'a': 1, 'b': 2, 'c': 3}
dict_B = {'a': 1, 'b': 2, 'c': 4}
print(diff(dict_A, dict_B))
# results in:
# --- a
# +++ b
# {
# 'a': 1,
# 'b': 2,
# -'c': 3,
# +'c': 4,
# }
Obviously this is a simple, contrived example, but hopefully you can see how useful this is when dealing with larger data structures (like those generated from deserialized JSON payloads).
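One of the ways I've (re-)invented this is sketched below: pretty-print both structures one item per line, then hand the lines to difflib from the standard library. The function name simple_diff is my own, and this is not how datadiff works internally; it's just the poor man's version, and it only compares the textual form of the data.

```python
import difflib
import pprint


def simple_diff(a, b):
    """Return a unified diff of the pretty-printed forms of a and b.

    Hypothetical helper for illustration; not part of datadiff.
    """
    # width=1 forces pprint to put every item on its own line, which
    # gives difflib something line-oriented to compare.
    a_lines = pprint.pformat(a, width=1).splitlines()
    b_lines = pprint.pformat(b, width=1).splitlines()
    return '\n'.join(
        difflib.unified_diff(a_lines, b_lines,
                             fromfile='a', tofile='b', lineterm=''))


dict_A = {'a': 1, 'b': 2, 'c': 3}
dict_B = {'a': 1, 'b': 2, 'c': 4}
report = simple_diff(dict_A, dict_B)
print(report)
```

This gets you surprisingly far, but the brackets and commas end up attached to the changed lines, and it knows nothing about the types involved. That's the gap datadiff fills.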
q
If you’ve ever wanted to add some simple print statements to your code, but the structure of your program makes dealing with this deceptively simple task more trouble than it should be, then you’ll probably like the q module.
q lets you easily send debugging/temporary output to a separate file stream (not stdout/stderr), trace function calls, and start an interactive Python session wherever you want. Easily.
Ka-Ping Yee explains this far better than I ever could in his lightning talk from PyCon 2013. Check it out, it’s enlightening and entertaining.
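To give a flavor of the idea (and only the idea), here is a rough standard-library sketch of tracing function calls to a separate file instead of stdout. Nothing here is q's actual API: make_tracer, trace, and the log file location are all names I made up for illustration.

```python
import functools
import os
import tempfile


def make_tracer(log_path):
    """Build a decorator that appends each call and its result to log_path.

    Hypothetical helper for illustration; this is not how q is used.
    """
    def trace(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # Append to the log file so stdout/stderr stay untouched.
            # Keyword arguments are left out of the log line for brevity.
            with open(log_path, 'a') as log:
                log.write('{}{!r} -> {!r}\n'.format(
                    func.__name__, args, result))
            return result
        return wrapper
    return trace


log_path = os.path.join(tempfile.mkdtemp(), 'debug.log')
trace = make_tracer(log_path)


@trace
def add(x, y):
    return x + y


add(2, 3)
with open(log_path) as log:
    logged = log.read()
```

The point of q is that you don't have to set any of this up yourself, which matters most when you're in the middle of debugging something else.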
Bonus: fake-factory
I was planning to stop there, but while writing this article a co-worker pointed me to another fantastic module called faker (formerly known as fake-factory). Now there are a couple of different modules called faker, but this one is (in my opinion, at least) far superior to the rest, so you’ll need to be sure you use the right one.
Find it here:
What faker does is generate fake data for you. More specifically, it generates random-but-reasonable data for a number of different real world data types, including names, addresses, email addresses, MIME types, browser user agent strings, country/language codes, md5/sha hashes, domain names, company names and slogans, all sorts of dates, geographic coordinates, and even lorem ipsum text.
This is some great stuff, but what sets this module apart from the others, for me, is that it’s really easy to use, fully localizable (set the locale to Italian, and it generates Italian names, addresses, etc.), and that you can feed it a specific seed to use for its random number generator, so if you need to, you can have it generate the same data repeatedly.
If you need to generate large amounts of data for functional or end-to-end testing, this module is truly great.
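The seeding trick is worth a quick illustration, since it's what makes generated test data reproducible. The sketch below uses only the standard library's random module; fake_names and the tiny name lists are placeholders I invented, not anything from faker, which ships far richer data and its own API.

```python
import random

# Tiny hypothetical data set for illustration only.
FIRST_NAMES = ['Ada', 'Grace', 'Guido', 'Barbara']
LAST_NAMES = ['Lovelace', 'Hopper', 'van Rossum', 'Liskov']


def fake_names(seed, count):
    """Generate count random-but-plausible names from a given seed."""
    # A private Random instance keeps the global random state untouched,
    # and the same seed always produces the same sequence of choices.
    rng = random.Random(seed)
    return ['{} {}'.format(rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES))
            for _ in range(count)]


run_one = fake_names(1234, 3)
run_two = fake_names(1234, 3)
```

Because both runs use the same seed, run_one and run_two hold identical lists, so a failing test can be replayed with exactly the data that broke it.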