I just submitted my talk for PyOhio 2010

I’d be happy to get any crititicism ahead of the presentation. I’m hosting the code examples and the talk material here.

Building your own kind of dictionary

Python level: novice

Description

My talk is based on a project that seemed very simple at first. I wanted an object like the regular python dictionary, but with a few small tweaks:

  • values for some keys should be restricted to elements of a set
  • values for some keys should be restricted to instances of a type

For example, pretend I want a dictionary called favorites, and I want the value for the “color” key to be any instance of my Color class. Meanwhile, for the “movie” key, I want to make sure that the value belongs to my set of movies.

In the talk, I’ll walk through how I used tests to validate my different implementations until I came up with a winner.

Unlike my talk last year on metaclass tomfoolery, and the year before that on fun with decorators (and decorator factories) I’m hoping to make this talk straightforward and friendly to beginning programmers.

You’ll see:

  • how I use tests to solve a real-world problem
  • gotchas with the super keyword
  • a little about how python works under the hood.

Extended description

I’m not done with the slides, but all my code examples are finished. You can read it all online at my github repository here.

Outline

  1. What kind of object I want
    1. Tests define the expected behavior
    2. How to run those tests
  2. First implementation (subclass dict)
    1. How the implementation is defined
    2. Examine test results
    3. Examine the C code behind the dict class to see why my subclassed __setitem__ method won’t get called from the parent class
  3. Composition-based implementation
    1. Explain composition vs inheritance
    2. Examine test results
    3. Point out irritating need to manually redefine every related dictionary method on the container class
    4. Show how to use __getattr__ to avoid all that boring wrapper code
    5. Show how __getattr__ doesn’t play nice with inspection tools
  4. UserDict.UserDict
    1. Explain implementation
    2. Examine test results
    3. Add a new test that uses this class as a parent for a subclass
    4. Explain how UserDict.UserDict is not a new-style class, so the super keyword behaves differently
  5. UserDict.DictMixin
    1. Explain implementation
    2. Examine test results
  6. PEP 3119 and why it is nice
    1. duck-typing, why it is awesome, why it isn’t perfect
    2. abstract base classes
    3. As of python 2.6, don’t use UserDict.DictMixin; use collections.MutableMapping

New year’s python meme

Doing this after reading about it on Ben’s blog.

1. What’s the coolest Python application, framework or library you have discovered in 2009 ?

I love the restructured text tools. I use rst2pdf at least once a week.

2. What new programming technique did you learn in 2009?

This was a year of giving up on techniques, paradigms, development methods, frameworks, and fanciness in general and going back to basics. I’m like Rocky Balboa in Rocky IV — jogging in the snow and chopping wood with an axe.

3. What’s the name of the open source project you contributed the most in 2009? What did you do?

I’ve been working on pitz for about 15 months now. I’ve learned about a lot of stuff that I didn’t expect to learn when I decided to build a to-do manager. Because of that project, I’ve learned about how templating systems like jinja2 work internally, how to use cProfile, how to write tests for all sorts of weird situations, how to write a simple querying system, etc.

4. What was the Python blog or website you read the most in 2009?

I just went through my bookmarks tagged with python and didn’t see an obvious pattern. These days, the hundred or so people I follow on twitter keep me supplied with too much to read.

5. What are the three top things you want to learn in 2010 ?

Add an option for your script to drop into the debugger

I like using the python debugger to, umm, debug stuff. Just recently I wanted to use pdb to poke around in a script that would intermittently crash. I wanted to add an option to my script that would drop me into pdb when the script dies.

This turned out to be trivial to write. The script scratch.py doesn’t do much, but if you add the --drop-into-debugger option, you’ll get dumped into a pdb session, where you can study the problem more closely, like this:$ python scratch.py --drop-into-debugger
> /home/matt/scratch.py(15)main()
-> 1/0
(Pdb)

When I don’t add the --drop-into-debugger option, then of course the uncaught exception just crashes the script:
$ python scratch.py
Traceback (most recent call last):
File "scratch.py", line 23, in
main()
File "scratch.py", line 15, in main
1/0
ZeroDivisionError: integer division or modulo by zero

The code involved was really easy to write. I wrote a decorator called into_decorator with the decorator module, but that’s not strictly necessary. Here’s the code:

# vim: set expandtab ts=4 sw=4 filetype=python:

import pdb, sys
from decorator import decorator

@decorator
def into_debugger(f, *args, **kwargs):
try:
return f(*args, **kwargs)
except:
pdb.post_mortem()

def main():
x = 99
1/0

if __name__ == '__main__':
if len(sys.argv) > 1 and sys.argv[1] == '--drop-into-debugger':
into_debugger(main)()
else:
main()

That pdb.post_mortem function does the interesting work of inspecting the most recently caught exception. The pdb documentation is pretty dang sparse, but in this case, I got what I needed.

Anyhow, I hope this helps somebody out. I’ll probably be adding a –drop-into-debugger option on every script I write from now on.

pitz has a CLI

I’ve exposed lots and lots of pitz functionality as command-line scripts. Here’s the list so far:

$ pitz-help
pitz-abandon-task Abandon a task
pitz-add-task Walks through the setup of a new Task.
pitz-components All components in the project
pitz-estimates All estimates in the project
pitz-everything No description
pitz-finish-task Finish a task
pitz-milestones No description
pitz-my-tasks List my tasks
pitz-people All people in the project
pitz-prioritize-above Put one task in front of another task
pitz-prioritize-below Put one task behind another task
pitz-recent-activity 10 recent activities
pitz-setup No description
pitz-shell Start an ipython session after loading in a ...
pitz-show Show detailed view of one entity
pitz-start-task Begin a task
pitz-statuses All statuses in the project
pitz-tasks All tasks in the project
pitz-todo List every unstarted and started task in the...
pitz-unassign-task Take this task off somebody's list of stuff ...

Help me rewrite some repetitive scripts

I have about a dozen functions (that are run as scripts) that have very similar sections interleaved with specialized code. I copied two of the scripts below. Here’s the first:
def pitz_estimate_task():

p = optparse.OptionParser()

p.add_option('--version', action='store_true',
help='Print the version and exit')

# This script requires these arguments.
p.set_usage("%prog task [estimate]")
options, args = p.parse_args()

if options.version:
print_version()
return

# This is unique to this script.
if not args:
p.print_usage()
return

# And now we're back to boring generic stuff.
pitzdir = Project.find_pitzdir(options.pitzdir)
proj = Project.from_pitzdir(pitzdir)
proj.find_me()

# This section is specific to this script.
t = proj[args[0]]

if len(args) == 2:
est = proj[args[1]]

else:
est = Estimate.choose_from_already_instantiated()

t['estimate'] = est
# That was the last thing that was specific to just this script.

# Save the project (generic).
proj.save_entities_to_yaml_files()

That script does some “generic” stuff to build an object p, then adds on some extra tweaks to p, and uses p to build an options object and an args object.

Then the script does some generic stuff to build a proj object based on the data in the options.pitzdir object, and does some various method calls on the proj object.

And here’s another script:
def pitz_attach_file():

p = optparse.OptionParser()

p.add_option('--version', action='store_true',
help='Print the version and exit')

# Notice this line is different than the one in pitz_estimate_task.
p.set_usage("%prog entity file-to-attach")
options, args = p.parse_args()

if options.version:
print_version()
return

# This section is different too.
if len(args) != 2:
p.print_usage()
return

# Back to the generic code to build the project.
pitzdir = Project.find_pitzdir(options.pitzdir)
proj = Project.from_pitzdir(pitzdir)
proj.find_me()

# Some interesting stuff that is specific just for this script.
e, filepath = proj[args[0]], args[1]
e.save_attachment(filepath)

# Save the project. (Generic).
proj.save_entities_to_yaml_files()

So, the pattern in every script is: generic code, specific code, generic code, specific code, generic code. And each step depends on the previous step.

I know I could do stuff like wrap all the generic stuff into functions, but I’m not really a fan of that approach. I’m looking for an interesting way to reduce all repetition, but keep the legibility. I’m thinking some nested context managers or decorators might be the way to go. I like to hear ideas from other people, so, please, let me hear them.

By the way, all this code is from the command-line module of pitz, available here. That’s where you can see all the different variations on the same theme.

Why I’m hooked on parameters

I have this need to make everything into a parameter. Here’s some code I wrote recently:

import subprocess, tempfile

def edit_with_editor(s=None):
"""
Returns the text typed in the editor, after running strip().
"""
with tempfile.NamedTemporaryFile() as t:
if s:
t.write(str(s))
t.seek(0)

subprocess.call([os.environ.get('EDITOR', 'vi'), t.name])
return t.read().strip()

There’s a voice in my head that says I shouldn’t be importing the subprocess and tempfile modules outside my function and then referring to them from within. And, the voice goes on, it wouldn’t be OK to move the import within my function either. Instead, if my function needs to use NamedTemporaryFile, I should pass that function in as a parameter.

I think it all starts with the fact that I studied economics in college. I had a lot of lectures that started with a professor drawing something like this on the chalk board:

labor supply = f(...)

And then during the lecture, she would slowly replace the ellipses with parameters. By the end of class, the function might look like:

labor supply = f(income, wealth)

Then I’d also have a few pages of notes explaining the nature of the relationship between how hard somebody is willing to work, the wage they can earn, and the wealth they already have. The thing that sunk in deep is the idea that parameters (and only parameters) are what drives the dependent variable on the left hand site. Anything that drives labor supply is listed as a parameter. If it ain’t on the right hand side, then it is not relevant.

By the way, the war between the wealth effect and the income effect is one of the areas of economics that really does explain our behavior pretty dang well, and there’s neat charts involved, so go read about it.

How to download photos from phone over bluetooth with ubuntu

I have a samsung sync phone. It is about two years old now. It isn’t fancy, but it has a camera and a bluetooth device.

I wrote a python script to copy photos from my phone to my laptop. Any time I take a bunch of pictures, I just run
$ getpics.py
and if my phone is anywhere nearby, my computer will pull all the photos off my phone.

Bluetooth is a great little tool and I’m surprised it hasn’t caught on more with people like us. I’ve been able to access my phone from my laptop when I’m on the second floor and the phone is across the house and downstairs, still in my work bag. And bluetooth USB dongles are cheap (like around $9). There’s no need for carrying around USB keys or plugging in iPods. Everything should use bluetooth.

The getpics.py script requires that the phone and the laptop have already been bonded with each other. That’s not much work. Put your phone in to discoverable mode, then search for devices with your computer. When you find your phone, punch the same 4-digit PIN into the phone and your computer.

Anyway, here’s the script. I hope it helps somebody out.
#! /usr/bin/env python

# W. Matthew Wilson wrote this script and he released
# it into the public domain.

# vim: set expandtab ts=4 sw=4 filetype=python:

"""
Pull all the images off my phone.
"""

import commands, logging, os

from xml.etree import ElementTree
from xml.parsers.expat import ExpatError

logging.basicConfig(level=logging.DEBUG)

def get_list_of_pictures():

"""
Yield a list of strings, where each string is the file name of a
picture.
"""

# You'll need to change the 00:1D... stuff to your phone's
# MAC address.
# You might need to change /Graphics/ to where your phone
# stores its photos.
status, output = commands.getstatusoutput(
"obexftp -b 00:1D:F6:55:7E:D9 -l /Graphics/")

logging.debug("status of obexftp: %s" % status)

logging.info("Finished running obexftp...")

# Unfortunately, the output from obexftp includes some goofy status
# crap and then some XML data. So I parse each line separately,
# and skip the lines where parsing fails.

for line in output.split('\n'):

logging.debug("About to parse line %s." % line.strip())

try:
x = ElementTree.XML(line)
# Skip lines without a name attribute.
if 'name' not in x.keys(): continue

# Extract and yield the name
photofilename = x.get('name')

logging.debug("About to yield %s." % photofilename)

yield photofilename

logging.debug("Finished yielding %s." % photofilename)

except ExpatError:

logging.debug("This line is not valid XML: %s."
% line.strip())

def grab_pic(photofilename):

"""
Pull the photo named photofilename and save it locally.
"""

# You probably want to change this.
os.chdir("/home/matt/Documents/Photos/phonepics")

status, output = commands.getstatusoutput(
"obexftp -b 00:1D:F6:55:7E:D9 -g /Graphics/%s" % photofilename)

logging.info("Finished storing %s locally." % photofilename)

if __name__ == "__main__":

logging.info("Starting...")

for photo in get_list_of_pictures():
grab_pic(photo)

logging.info("All done!")

virtualenvwrapper postactivate and screen is a wonderful combination

I switched to virtualenvwrapper from plain old virtualenv a while back because I found out that virtualenvwrapper has a postactivate hook I can define. Now when I do

$ workon pitz

to start my work day, I get a customized work environment exactly the way I want it. In addition to activating a virtualenv named pitz, the postactivate script changes me into the top of my pitz checkout directory and then it starts up or reattaches a screen session customized just for pitz.

Setting this up was trivially easy. Here’s what my postactivate script looks like:
$ cat ~/.virtualenvs/pitz/bin/postactivate
cd /home/matt/projects/pitz
screen -S pitz -c ~/.screenrc-pitz -d -R

And here is that .screenrc-pitz file:
$ cat ~/.screenrc-pitz
# Matt's homemade .screenrc.

# Draw that blue and red bar across the bottom of the terminal, with the cute
# stuff like the window names and system CPU load.
hardstatus on
hardstatus alwayslastline
hardstatus string "%{.bW}%-w%{.rW}%n %t%{-}%+w %=%{..G} %H %l %{..Y} %m/%d %C%a "

# Let me scroll back a thousand lines or so.
defscrollback 1024

# terminfo and termcap for nice 256 color terminal
# allow bold colors - necessary for some reason
attrcolor b ".I"

# tell screen how to set colors. AB = background, AF=foreground
termcapinfo mlterm 'Co#256:AB=\E[48;5;%dm:AF=\E[38;5;%dm'

# erase background with current bg color
defbce "on"

# Automatically start up a bunch of windows.
screen -t "shell" 0 pitz-shell
screen -t "code" 1
screen -t "tests" 2
chdir docs
screen -t "docs" 3

Everything from the top down to the comment # Automatically start up a bunch of windows. just tweaks how screen works. The last part creates a bunch of named windows. In the first wiindow, it doesn’t start a bash prompt. Instead it starts up the pitz-shell.

One tiny detail that may not be obvious is that if I do $ workon pitz in one terminal, and then maybe I ssh in and rerun $ workon pitz again later, I won’t cause a second screen to start up. Instead, I’ll remote-detach the original session and reconnect to it.

In practice, all my remote servers have highly customized screen sessions already waiting for me. I just need to log in and resume them, and I have all the interesting log files, database connections, and monitoring scripts already up. For example, I know my apache log is open in a pager in window 5. Window 2 has a psql session running. Window 1 has a supervisorctl client running.

This setup shaves off precious seconds when I’m trying to figure stuff out during an emergency.

This stuff isn’t rocket science, but I hope it helps somebody out.

Python niceties

nicety (plural niceties)

  1. A small detail that is nice or polite to have but isn’t necessary.

Make a comma-separated string

>>> x = ['a', 'b', 'c']
>>> ', '.join(x)
'a, b, c'

Isn’t that prettier than some big loop?

If you want to join a list of elements that aren’t strings, you’ll have to add a step:>>> x = [1, 2, 3]
>>> ', '.join(map(str, x))
'1, 2, 3'

I’m using map to convert every element to a string first.

Choose a random element from a sequence

>>> from random import choice
>>> choice(x) # doctest: +SKIP
1

The doctest: +SKIP text is just there because the element returned is randomly chosen, and so I can’t predict what the value will be.

Use keyword parameters in your larger string templates

Well-named keywords make big string templates MUCH easier to deal with:>>> tmpl = """
... Dear %(display_name)s,
...
... Greetings from %(origin_country)s! I have millions of
... %(currency_type)s in my bank account and I can send it all to you if
... you first deposit %(amount)s %(currency_type)s into my account
... first.
...
... Sincerely,
... %(alias)s
... %(origin_country)s
... """

Seeing the key names makes it much easier to remember what gets interpolated. Also, it makes it easier to use the same values more than once.

Usually I pass in a dictionary to the template, but sometimes I do this instead:>>> display_name = 'Matt'
>>> origin_country = 'Nigeria',
>>> currency_type = 'US dollars'
>>> amount = '250.00'
>>> s = tmpl % locals()