Now I have a reason to use staticmethod

The python builtin staticmethod has been one of those language features that I understood, but couldn’t figure out why I would ever use. When I did my PyOhio talk on decorators, I asked if anyone in the room could explain why to use it. People came up with these ideas:

  • Put associated functions inside the relevant class, so we don’t pollute the module namespace
  • Make java programmers feel less homesick

Both of these are valid, I guess, but in my mind, it just confirmed that I would never use staticmethod.

Then I watched this neat video. You don’t have to go watch that video before you read the next section. I just wanted to point to where I picked up this trick.

Remember that any function bound to a class gets called with an extra parameter inserted at the beginning. That parameter is a reference to the instance of the class. So, you can’t do this:

>>> def f(a, b):
...     return a - b
...
>>> class C(object):
...     pass
...
>>> C.f = f
>>> c = C()
>>> c.f(3, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() takes exactly 2 arguments (3 given)

The call to f blew up because f only takes two parameters, but it got three parameters instead because python automatically adds on the self parameter at the beginning. So f really got called like this: f(self, 3, 1)

Now the reason for staticmethod becomes obvious. If I want to allow instances of my class C to call function f, I have to make f a staticmethod on C, so that the python plumbing won’t insert that extra first argument at the beginning.

So, I’ll overwrite the first C.f method with a static method, and then call c.f again:

>>> C.f = staticmethod(f)
>>> c.f(3, 1) # This is the same already-instantiated c from earlier.
2

As a side effect of watching that video, I think about dependency injection (DI) differently now. In the video, Thomas Wouters says he uses staticmethod for dependency injection. This kind of DI is a different approach than what I’ve seen before. Most everybody that talks about DI emphasizes passing lots and lots of crap into the __init__ method or into any subsequent methods, so every component can be mocked out.

But it makes just as much sense to graft dependencies onto the class outside of the __init__ method, since the language supports that. The end goal is still reached: code can still tweak the dependencies and provide alternative objects, and at the same time, I don’t end up with ridiculously long function signatures.
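Here’s a minimal sketch of what I mean. The names (fetch_shifts, ShiftScreen) are made up for illustration, not taken from the video:

def fetch_shifts(department):
    """The real dependency: imagine a database call here."""
    raise NotImplementedError

class ShiftScreen(object):

    # Graft the dependency onto the class. staticmethod keeps python
    # from inserting the instance as an extra first argument.
    fetch_shifts = staticmethod(fetch_shifts)

    def render(self, department):
        return ', '.join(self.fetch_shifts(department))

# A test can swap in a fake without touching __init__:
ShiftScreen.fetch_shifts = staticmethod(lambda department: ['ER midnight'])
print ShiftScreen().render('ER')  # prints: ER midnight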

PyOhio was a smashing success

The Columbus Metro Library offered a fantastic location for us: wireless internet, multiple meeting rooms, and one room with about 30 workstations. A $15 donation makes you a “friend” of the library, and gets you a 15% discount at the coffee booth.

Catherine Devlin led the charge of organizing this conference, and she did it amazingly well.

The slides from my decorator talk are available here. I’ll be breaking them down into a series of blog posts with a lot more commentary, so stay tuned.

I need to write faster tests

This is not ideal:

----------------------------------------------------------------------
Ran 84 tests in 370.741s

OK

My tests take so long for two reasons. First of all, most of them use twill to simulate a browser walking through a version of the web app running on localhost. Second, my test code reads like a novel. Here’s an example, slightly embellished to make a point:

setup: connect to the database and find or create a hospital and an employee named “Nurse Ratched.” Find or create a bunch of open shifts in the emergency department. Find or create another nurse named Lunchlady Doris*.

test: Nurse Ratched wants to see what shifts are available to be picked up. So she logs into the app. Then she navigates to the “open shifts” screen, and then filters down to shifts in the emergency department over the next seven days. Then she wants to sign up for the shift starting at midnight on Saturday night. So, she clicks the “sign up” icon. The system verifies that this shift + her already-scheduled hours won’t push her into overtime, and she has no other flags on her account, so she is automatically scheduled.

Then the system sends her a confirmation message, which according to her preferences, is sent to her email address. Then the system queues an SMS message to be delivered an hour before the shift starts in order to remind her (also according to her preferences).

Finally, the test verifies that the shift is now not listed as available by simulating Lunchlady Doris logging in and checking that same “open shifts” screen.

If everything checks out, print a dot, and move on to the next chapter.

teardown: Unassign Nurse Ratched from the shift she picked up.

I think twill in itself is fine. The problem is marching through a long series of pages just to set up conditions for tests that come later. As a side benefit, though, I verify everything checks out along the way.

On the plus side, I’m confident that all these components do in fact play nice together. I don’t think it’s safe to abandon end-to-end testing like this, but I would like not to depend on it every time I want to make some slight change to a component. It would be nice to run these tests right before a commit, but only run some super-fast tests after each save.
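One way I might get there is nose’s attrib plugin: tag the slow twill tests so I can skip them during everyday hacking. A sketch, with a 'slow' tag name of my own choosing:

from nose.plugins.attrib import attr

@attr('slow')
def test_nurse_ratched_full_story():
    # The twill-driven, novel-length walk through the app goes here.
    pass

def test_overtime_math():
    # A fast, database-free check that can run after every save.
    assert 36 + 12 > 40

Then nosetests -a '!slow' runs just the fast tests after each save, and a plain nosetests runs everything right before a commit.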


[*]People that understand this reference should reevaluate their priorities in life.

I heart Python doctests

I wrote the doctests for the function below and then wrote the code to satisfy them in a total of about 30 seconds. As an extra plus, these doctests immediately clarify behavior in corner cases.

def has_no(s):
    """
    Return True if the string s contains the word 'no'.

    >>> has_no('no problem')
    True

    >>> has_no('not really')
    False

    >>> has_no('no')
    True

    >>> has_no('oh nothing')
    False
    """

    if s.lower() == 'no': return True
    if s.lower().startswith('no '): return True
    if s.lower().endswith(' no'): return True
    if ' no ' in s.lower(): return True

    return False
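Running them is the usual stanza at the bottom of the module:

if __name__ == '__main__':
    import doctest
    doctest.testmod()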

Writing tests in any other testing framework would have taken me much longer. Compared to writing these tests with nose, writing this:

assert not has_no('oh nothing')

wouldn’t take me any more time than

>>> has_no('oh nothing')
False

But that’s not all there is to it. With nose, I’d need to open a new test_blah.py file, then import my original blah.py module, then I would have to decide between putting each assert in a separate test function or just writing a single function with all my asserts.

That’s how a 30-second task turns into a 5-minute task.

Anyhow, I’m surprised doctests don’t get a lot more attention. They’re beautiful. Adding tests to an existing code base couldn’t be any simpler: just load the functions into an interpreter and play around with them (ipython has a %doctest_mode, by the way).

For a lot of simple functions (like the one above) it is easy to just write out the expected results manually rather than record from a session.

It is also possible to store doctests in external text files. The Django developers use this trick frequently.
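Loading one of those external files is just as easy; a quick sketch, assuming the doctests live in a tests.txt file next to the module:

import doctest
doctest.testfile('tests.txt')  # parses and runs the >>> examples in that file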

Finally, I don’t try to solve every testing problem with doctests. I avoid doctests when I need elaborate test fixtures or mock objects. Most of my modules have a mix of functions with doctests and nose tests somewhere else to exercise the weird or composite stuff.

Incidentally, this post is where Tim Peters introduced the doctest module.

Help improve my PyOhio talk

I ran through my PyOhio presentation at tonight’s Clepy meeting.

I think I’ll spend more time talking about the material in the slides, rather than pausing just long enough to scan them with my eyes and move to the next. I’m anxious about boring people, so I think I go at a frenzied pace.

Also I need to learn how to tweak s5 (or at least rst2s5.py) so that I can have more control over how my content appears. A fair number of code samples had the last few lines truncated.

Anyway, I welcome comments on my presentation.

Doing my part to promote PyOhio.

It’s going to be a brain-melting conference with amazing swag. We got people you wouldn’t believe lined up to present. Google recruiters will be there with suitcases full of cash looking for new hires. Terminator robots will travel backwards in time to try to prevent all the amazing breakthroughs that will happen on this day. We’re gonna shake the foundations of science.

I already mentioned my topic but here it is again because I’m shameless.

Wacky idea for python coroutines

Christian Dowski posts some uses for python’s enhanced generators. I tried to type a comment on that post, but I couldn’t figure out how to submit it successfully. Either comments are not allowed or I failed the CAPTCHA.

Anyhow, ever since I read about how common lisp handles exceptions, I’ve been daydreaming about how to do the same trick in python. In lisp, an exception jumps to some other place to get handled, just like in python. What is different is that the exception handler can repair the problem and then hand control back to the original block. For example, in the lisp toplevel, if you forget to define a variable before you try to copy its value into another variable, the exception will propagate to the debugger. In the debugger, you can then assign a value to that variable and resume your original program.

So that’s the background for my idea for generators. The generator that is trying the exception-raising code could yield the traceback to another generator when it hits an uncaught exception. Then the other generator, the exception-handling generator, could repair/log/do whatever, and then yield a value back to the original code.

For example, say the original code is iterating through a list of two-tuples, dividing the first element of each by the second. When it raises a ZeroDivisionError, it could catch that and yield it over to the exception handler. The exception handler could do whatever it wants, like prompt the programmer for a new denominator, then yield that back to the original generator, and the original generator could resume.
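Here’s a rough sketch of how that might look with generator send. All the names are mine, and I’m substituting a hard-coded repair for the interactive prompt:

def repair_denominator():
    """The exception-handling generator: receives the failing pair and
    yields a replacement denominator back. (Here it just substitutes 1;
    a fancier version could prompt the programmer.)"""
    pair = yield  # wait for the first failure
    while True:
        numerator, denominator = pair
        pair = yield 1  # hand a repaired denominator back

def divide_pairs(pairs, handler):
    """The original code: divides each two-tuple, shipping any
    ZeroDivisionError over to the handler and resuming with its answer."""
    for numerator, denominator in pairs:
        try:
            yield numerator / denominator
        except ZeroDivisionError:
            yield numerator / handler.send((numerator, denominator))

handler = repair_denominator()
handler.next()  # prime the handler so it waits at its first yield
print list(divide_pairs([(6, 2), (5, 0), (4, 4)], handler))  # [3, 5, 1]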

Lua calls generators that can receive values “coroutines” or “non-preemptible threads”. I think those are better labels because they hint that generators are way more than just iterators in drag.

defaultdict.fromkeys does not play nice.

I use defaultdicts a lot when I’m grouping elements into a dictionary of lists. Here’s a simple example:

>>> from collections import defaultdict

>>> a = defaultdict(list)

>>> a['x']
[]

>>> a['y'].append('yellow')

>>> a
defaultdict(<type 'list'>, {'y': ['yellow'], 'x': []})

Now here’s where I got silly. I used defaultdict.fromkeys to prepopulate the ‘x’ and ‘y’ keys right away, because I knew I would need them:

>>> b = defaultdict.fromkeys(['x', 'y'], list)

>>> b
defaultdict(None, {'y': <type 'list'>, 'x': <type 'list'>})

>>> b['x']
<type 'list'>

>>> b['z']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'z'

Wowsa! b calls itself a defaultdict, but it doesn’t act like one. fromkeys is inherited straight from dict, so it left the default factory set to None (note it in the repr) and stored the list type itself as the value for each key.

I haven’t really thought this through, but this behavior is so unexpected that I would prefer that defaultdict.fromkeys raised a NotImplementedError.
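What I should have done is let the default factory do the prepopulating for me; a quick sketch:

>>> from collections import defaultdict
>>> b = defaultdict(list)
>>> b['x'], b['y']  # merely touching the keys creates the empty lists
([], [])
>>> b
defaultdict(<type 'list'>, {'y': [], 'x': []})
>>> b['z']  # and unknown keys still get a default, unlike with fromkeys
[]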

My proposed talk for PyOhio

Here’s what I submitted for a presentation topic for PyOhio:

TITLE: Decorators are fun

EXPERTISE LEVEL: Hopefully, there will be something for everyone. Novices might mostly enjoy the material at the beginning, while experts would likely be more interested in the ruledispatch discussion.

SUMMARY: This talk will start with a friendly walkthrough of decorators for people that have never seen them, then go into some straightforward examples, then finish with a review of how decorators are used in Phillip Eby’s ruledispatch package.

OUTLINE:

  • The simplest possible decorator (previewed in the sketch after this outline).
  • Pass arguments to a decorator.
  • Write a decorator and still preserve the function signature of the decorated function.
  • Coerce values passed into a function into particular types using decorators.
  • Log values coming out of a function.
  • Phillip Eby’s ruledispatch package implements generic functions, aka multimethods, for python. I’ll walk through how he uses decorators, and why they’re such a good idea for this.
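As a tiny preview of the first two bullets, here is roughly what the simplest decorator and a parameterized one look like (shout, prefix, and greet are made-up names):

def shout(func):
    """The simplest possible decorator: wrap func and uppercase its result."""
    def wrapped(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapped

def prefix(tag):
    """A decorator that takes an argument."""
    def decorator(func):
        def wrapped(*args, **kwargs):
            return '%s: %s' % (tag, func(*args, **kwargs))
        return wrapped
    return decorator

@prefix('PyOhio')
@shout
def greet(name):
    return 'hello, %s' % name

print greet('Columbus')  # prints: PyOhio: HELLO, COLUMBUS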

How to use tg-admin sql upgrade

The tg-admin script that is bundled with turbogears is really helpful, but I had a hard time learning how to use it.

Before you read any more, you should know that this only works when you use SQLObject, not SQLAlchemy, for your ORM.

These are my notes on how I use tg-admin to upgrade an existing database.

  • I have a production database that uses prod.cfg;
  • I have a development database that uses dev.cfg;
  • Neither database has a sqlobject_db_version table initially, because I have never paid attention to it before.

The development database has a bunch of new columns, tables, and indexes that I want to add to the production database. For this example, I’ll pretend that all I want to do is add an index to a table.

First, I made sure that the dev database matches sqlobject classes:

tg-admin -c dev.cfg sql status

If those are out of sync, then do whatever you need to do to make sure your actual dev database matches your classes. Of course, tg-admin sql status is not perfect. For example, it overlooks missing indexes and constraints, at least with postgres.

Next, I recorded the state of the development database:

tg-admin -c dev.cfg sql record --force-db-version=2008-03-21

This will make a new table in the dev database called sqlobject_db_version. I am forcing it to have a value of today’s date (March 21st, 2008).

Now I connect to the production database and set a version on it with yesterday’s date:

tg-admin -c prod.cfg sql record --force-db-version=2008-03-20

Now I run this to try to upgrade the production database to match the development database:

tg-admin -c prod.cfg sql upgrade

Of course, that should fail, and I see an error message sort of like this:

$ tg-admin -c prod.cfg sql upgrade
Using database URI postgres://staffknex:staffknex@localhost/staffknex320
No way to upgrade from 2008-03-20 to 2008-03-21
(you need a 2008-03-20/upgrade_postgres_2008-03-21.sql script)

This is an example of a helpful error message. I need to write a script that will explain how to upgrade from yesterday’s version to today’s version.

That script will be really simple:

BEGIN;
CREATE UNIQUE INDEX majestic12 ON ufo_theorists (first_name, last_name);
END;

I suggest using BEGIN and END so that in case something goes wrong in the middle, your transaction will be rolled back automatically.

Now I can run this:

tg-admin -c prod.cfg sql upgrade

And my production database will be upgraded with the new index.

Now for some complaints:

  • Why isn’t this advertised better? This is a really nice feature.
  • You’re supposed to be able to specify the URI on the command line with the --connection option, but I could never get it to work.
  • I really wish that tg-admin sql status detected stuff like missing indexes and constraints. I use these things heavily.
  • It would be nice to be able to mix python into the upgrade script, rather than just SQL (a sketch of what I mean appears below). For example, I recently dropped a column that had both an employee’s first and last name, and separated this into two columns. I used SQL to make the new columns, then I used python to read data out of the old single column and write it to the two new columns. Then I used SQL again to drop the old column.
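Here’s roughly what that workaround looked like, done by hand with psycopg2. The table and column names (employee, full_name, first_name, last_name) are made up for illustration:

import psycopg2

conn = psycopg2.connect('dbname=staffknex320 user=staffknex')
cursor = conn.cursor()

# SQL part: make the new columns.
cursor.execute("ALTER TABLE employee ADD COLUMN first_name TEXT")
cursor.execute("ALTER TABLE employee ADD COLUMN last_name TEXT")

# Python part: split the old combined column into the new ones.
cursor.execute("SELECT id, full_name FROM employee")
for employee_id, full_name in cursor.fetchall():
    first, _, last = full_name.partition(' ')
    cursor.execute(
        "UPDATE employee SET first_name = %s, last_name = %s WHERE id = %s",
        (first, last, employee_id))

# SQL again: drop the old column, then commit everything as one transaction.
cursor.execute("ALTER TABLE employee DROP COLUMN full_name")
conn.commit()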

Like I said at the beginning, this is a really helpful script and I’m very grateful to whoever wrote it.