Help improve my PyOhio talk

I ran through my PyOhio presentation at tonight’s Clepy meeting.

I think I’ll spend more time talking through the material in the slides, rather than pausing just long enough to scan each one and then moving on to the next. I’m anxious about boring people, so I think I go at a frenzied pace.

Also, I need to learn how to tweak s5 (or at least rst2s5.py) so that I have more control over how my content appears. A fair number of code samples had their last few lines truncated.

Anyway, I welcome comments on my presentation.

Don’t spend your termite poison money on insurance against Martian invasions.

This post wanders all over the place and I’m not sure I’m articulating my thoughts very well. Comments and criticism are welcome.

Fannie Mae and Freddie Mac (I don’t know why these companies have such ridiculous names either) are bound by regulations to hold enough capital (cash dollars) to remain solvent across some theoretical worst-case scenarios. The regulators dreamed up some really extreme situations that could bankrupt these companies, and insisted that the companies hold enough cash to survive them.

When I worked at Fannie Mae in the department that wrestled with the C++ model that calculated our reserve requirements for these ten-year stress tests, we used to joke about how unlikely those stress-test scenarios really were. We would say that we might as well buy insurance against Martian invasions, or against all the animals teaming up to attack humanity.

While Fannie Mae was dutifully complying with regulations built on these unrealistic scenarios, the sub-prime crisis turned out to be a scenario they were not prepared for, and it slaughtered them. The CEO had to step down. The price fell from around $80 a share when I left in 2001 to $18 today.

The sub-prime crisis at its core is very mundane. Lenders got sloppy, and investors let their greed entice them into taking risks they shouldn’t have. That’s all there is to it. Local banks lent money to high-risk borrowers, then sold the loans to Fannie Mae, which sold them to Wall Street. Investors preferred the high-return investments over the low-return boring crap.

No perfect storm was necessary to trigger this. It was just a whole lot of people getting sloppy and eventually enough straws accumulated to break the camel’s back. The same pattern played out in the seventeenth century and probably a hundred times since then.

Now I’m a workaday programmer, and I see the same dynamic in code. People write elaborate systems to protect against ridiculously unlikely scenarios but then skimp on the boring stuff. Maybe they get the hard parts done but never make sure their app’s internals are well documented, easy to maintain, and intuitively designed.

In my experience, it’s the mundane bugs, not the diabolically clever hackers, that cause me the most grief.

If I write some algorithm that costs O(n²), I will almost immediately start trying to tame it. The voices in my brain scream about worst-case costs. Macho programmers write badass algorithms. However, I find that the really smart thing to do is to spend a few minutes thinking about the likely use cases. If I know that for the foreseeable future I’m never going to run this algorithm with n > 5, then I think the grown-up thing to do is to write a big fat docstring that reminds me later about this risk, and then move on to getting other stuff done.
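Here’s a sketch of what I mean; the function is made up, but the big fat docstring is the point:

def find_duplicate_pairs(items):
    """Return all (a, b) pairs from items where a == b.

    WARNING: this is O(n**2), since it compares every element
    against every later element.  For the foreseeable future n is
    never bigger than 5, so I'm leaving it alone.  If items ever
    grows, rewrite this with a dict-based grouping pass.
    """
    return [(a, b)
            for i, a in enumerate(items)
            for b in items[i + 1:]
            if a == b]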

The market rewards a good-enough, finished solution more than a potentially amazing but currently unfinished one.

If Fannie Mae had focused on just vetting the loans better, things wouldn’t have been so bad. The theoretical worst-case scenarios were never going to happen before the more likely stuff went wrong. I worked at Fannie Mae preparing for Martian invaders. We ignored the termites in the walls, so to speak.

Doing my part to promote PyOhio.

It’s going to be a brain-melting conference with amazing swag. We got people you wouldn’t believe lined up to present. Google recruiters will be there with suitcases full of cash looking for new hires. Terminator robots will travel backwards in time to try to prevent all the amazing breakthroughs that will happen on this day. We’re gonna shake the foundations of science.

I already mentioned my topic but here it is again because I’m shameless.

Wacky idea for python coroutines

Christian Dowski posts some uses for python’s enhanced generators. I tried to type a comment on that post, but I couldn’t figure out how to submit it successfully. Either comments are not allowed or I failed the CAPTCHA.

Anyhow, ever since I read about how common lisp handles exceptions, I’ve been daydreaming about how to do the same trick in python. In lisp, an exception jumps to some other place to get handled, just like in python. What’s different is that the lisp exception handler can repair the problem and then hand control back to the original block. For example, at the lisp toplevel, if you forget to define a variable before you try to read its value, the exception propagates to the debugger. In the debugger, you can assign a value to that variable and then resume your original program.

So that’s the background for my idea for generators. When the generator running the risky code hits an uncaught exception, it could yield the traceback to another generator. Then that other generator, the exception-handling generator, could repair/log/do whatever, and then yield a value back to the original code.

For example, say the original code is iterating through a list of two-tuples and dividing the first element of each by the second. When it raises a ZeroDivisionError, it could catch that and yield it over to the exception handler. The exception handler could do whatever it wants, like maybe prompt the programmer for a new denominator. The handler could then yield that value back to the original generator, and the original generator could resume.
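Here’s a rough sketch of that idea with the yield/send machinery we have today. The repair strategy (always substitute a denominator of 1) is made up just for illustration:

def fixer():
    """The exception-handling generator: receive an exception,
    repair the problem, and yield a replacement denominator."""
    replacement = None
    while True:
        exc = yield replacement     # receive the exception from the worker
        print 'caught %r, substituting denominator 1' % exc
        replacement = 1

def divider(pairs, handler):
    handler.next()                  # prime the handler coroutine
    for numerator, denominator in pairs:
        try:
            yield numerator / denominator
        except ZeroDivisionError, exc:
            denominator = handler.send(exc)   # hand control to the handler
            yield numerator / denominator     # resume with the repaired value

for result in divider([(6, 2), (4, 0), (9, 3)], fixer()):
    print result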

Lua calls generators that can receive values “coroutines” or “non-preemptible threads”. I think those are better labels, because they hint that generators are way more than just iterators in drag.

defaultdict.fromkeys does not play nice.

I use defaultdicts a lot when I’m grouping elements into a dictionary of lists. Here’s a simple example:

>>> from collections import defaultdict
>>> a = defaultdict(list)

>>> a['x']
[]

>>> a['y'].append('yellow')

>>> a
defaultdict(<type 'list'>, {'y': ['yellow'], 'x': []})

Now here’s where I got silly. I used defaultdict.fromkeys to prepopulate the ‘x’ and ‘y’ keys right away, because I knew I needed those:

>>> b = defaultdict.fromkeys(['x', 'y'], list)

>>> b
defaultdict(None, {'y': <type 'list'>, 'x': <type 'list'>})

>>> b['x']
<type 'list'>
>>> b['z']
------------------------------------------------------------
Traceback (most recent call last):
File "", line 1, in
KeyError: 'z'

Wowsa! b calls itself a defaultdict, but it doesn’t act like one: fromkeys stored the list class itself as each value and left the default_factory set to None, so missing keys just raise KeyError.

I haven’t really thought this through, but this behavior is so unexpected that I would prefer that defaultdict.fromkeys raised a NotImplementedError.
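For what it’s worth, the way I should have prepopulated the keys while keeping the defaultdict behavior seems to be to just touch each key:

>>> b = defaultdict(list)
>>> for key in ['x', 'y']:
...     b[key]          # looking up a missing key creates the default
...
[]
[]
>>> b['z'].append('zebra')
>>> b
defaultdict(<type 'list'>, {'y': [], 'x': [], 'z': ['zebra']})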

Frustration with postfix and sending email to a script

I want to set up an email address that executes a script with every incoming email, so in my /etc/aliases, I did this:

runscript: |/usr/local/bin/email_reading_script.py
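For context, postfix pipes each raw message to the script’s stdin, so email_reading_script.py is roughly shaped like this (the script body and log path here are made up for illustration):

#!/usr/bin/env python
# Read one raw message from stdin and log where it came from.
import email
import logging
import sys

logging.basicConfig(filename='/var/log/myapp/incoming.log',  # needs adm group
                    level=logging.INFO)

msg = email.message_from_file(sys.stdin)
logging.info('got mail from %s: %s', msg['From'], msg['Subject'])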

Then I rebuilt the postfix alias db with sudo postalias /etc/aliases, and sent an email to runscript@localhost. That’s when the fun began. The script runs as nobody/nogroup, so it couldn’t write to my logging directory, because I require writers to be in the adm unix group.

Then I created a user named runscript, and moved that | /usr/local/bin/email_reading_script.py line into a .forward file. I added adm to runscript’s unix groups.

AND STILL NO JOY.

I don’t know why, but when the script runs, the shell seems to pick up only the user’s primary group. So I kept getting permission-denied errors. I finally got everything to work when I made adm the primary group for my runscript user. Now everything is OK.

This ate up the better part of the @#$@#$ing day. Grr.

If this hadn’t worked, I was going to install procmail and go down that route.

I’m using Ubuntu Hardy Heron.

My proposed talk for PyOhio

Here’s what I submitted for a presentation topic for PyOhio:

TITLE: Decorators are fun

EXPERTISE LEVEL: Hopefully there will be something for everyone. Novices will probably get the most out of the material at the beginning, while experts will likely be more interested in the ruledispatch discussion.

SUMMARY: This talk will start with a friendly walkthrough of decorators for people that have never seen them, then go into some straightforward examples, then finish with a review of how decorators are used in Phillip Eby’s ruledispatch package.

OUTLINE:

  • The simplest possible decorator.
  • Pass arguments to a decorator.
  • Write a decorator and still preserve the function signature of the decorated function.
  • Coerce values passed into a function into particular types, using decorators.
  • Log values coming out of a function.
  • Phillip Eby’s ruledispatch package implements generic functions, aka multimethods, for python. I’ll walk through how he uses decorators, and why they’re such a good idea for this.
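To give a taste of the first few items, here’s about the simplest useful decorator I know of, a logger like the one mentioned in the outline:

import functools

def logged(func):
    """Log values coming out of a function."""
    @functools.wraps(func)   # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print '%s returned %r' % (func.__name__, result)
        return result
    return wrapper

@logged
def add(a, b):
    return a + b

add(2, 3)   # prints "add returned 5"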

The difference between syntactic analysis and code generation

I can parse the text for the paragraph below very quickly. The author uses simple grammar. However, it is taking me hours of study (following footnotes) to make any sense out of it:

The bin_rec function is an example of a hylomorphism (see citeseer.ist.psu.edu/meijer91functional.html)—the composition of an anamorphism (an unfolding function) and a catamorphism (a folding function). Hylomorphisms are interesting because they can be used to eliminate the construction of intermediate data structures (citeseer.ist.psu.edu/launchbury95warm.html).

From the article Cat: A Functional Stack-Based Little Language in the April issue of DDJ.

This experience matches how I imagine programming-language interpreters and compilers work. In the first pass, the interpreter reads in all the text and breaks it down grammatically, mapping chunks of text into nodes that have labels like “IDENTIFIER” or “FUNCTION DEFINITION” or whatever.

Then in the second pass, the system walks through the nodes and gets down to the business of writing out the ones and zeros that tell the little men inside my computer what to do.
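The standard library will actually demonstrate both passes for you; here’s a sketch using python 2’s old compiler module:

import compiler

source = 'x = 1 + 2'

# First pass: break the text down grammatically into labeled nodes.
tree = compiler.parse(source)
print tree   # Module(None, Stmt([Assign(...)]))

# Second pass: generate the actual bytecode ("the ones and zeros").
code = compiler.compile(source, '<example>', 'exec')
exec code
print x      # 3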

I haven’t studied compilers formally (hey, my degree is in economics!) so please let me know how far off base I am. I’m aware that in reality, the two passes may not be separate at all, or may be interleaved.

How to use tg-admin sql upgrade

The tg-admin script that is bundled with turbogears is really helpful, but I had a hard time learning how to use it.

Before you read any more, you should know that this only works when you use SQLObject, not SQLAlchemy, for your ORM.

These are my notes on how I use tg-admin to upgrade an existing database.

  • I have a production database that uses prod.cfg;
  • I have a development database that uses dev.cfg;
  • Neither database has a sqlobject_db_version table initially, because I had never paid attention to it before.

The development database has a bunch of new columns, tables, and indexes that I want to add to the production database. For this example, I’ll pretend that all I want to do is add an index to a table.

First, I made sure that the dev database matched my sqlobject classes:

tg-admin -c dev.cfg sql status

If those are out of sync, then do whatever you need to do to make sure your actual dev database matches your classes. Of course, tg-admin sql status is not perfect. For example, it overlooks missing indexes and constraints, at least with postgres.

Next, I recorded the state of the development database:

tg-admin -c dev.cfg sql record --force-db-version=2008-03-21

This will make a new table in the dev database called sqlobject_db_version. I am forcing it to have a value of today’s date (March 21st, 2008).

Now I connect to the production database and set a version on it with yesterday’s date:

tg-admin -c prod.cfg sql record --force-db-version=2008-03-20

Now I run this to try to upgrade the production database to match the development database:

tg-admin -c prod.cfg sql upgrade

Of course, that should fail, and I see an error message sort of like this:

$ tg-admin -c prod.cfg sql upgrade
Using database URI postgres://staffknex:staffknex@localhost/staffknex320
No way to upgrade from 2008-03-20 to 2008-03-21
(you need a 2008-03-20/upgrade_postgres_2008-03-21.sql script)

This is an example of a helpful error message. I need to write a script that will explain how to upgrade from yesterday’s version to today’s version.

That script will be really simple:

BEGIN;
CREATE UNIQUE INDEX majestic12 ON ufo_theorists (first_name, last_name);
END;

I suggest using BEGIN and END so that in case something goes wrong in the middle, your transaction will be rolled back automatically.

Now I can run this:

tg-admin -c prod.cfg sql upgrade

And my production database will be upgraded with the new index.

Now for some complaints:

  • Why isn’t this advertised better? This is a really nice feature.
  • You’re supposed to be able to specify the URI on the command line with the --connection option, but I could never get it to work.
  • I really wish that tg-admin sql status detected stuff like missing indexes and constraints. I use these things heavily.
  • It would be nice to be able to mix python into the upgrade script, rather than just SQL. For example, I recently dropped a column that had both an employee’s first and last name, and separated the data into two new columns. I used SQL to make the new columns, then I used python to read data out of the old single column and write it into the two new columns. Then I used SQL again to drop the old column. There’s a sketch of that approach below.
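Here’s roughly what that python-plus-SQL step looked like; the table and column names are invented for illustration, and I’m assuming a psycopg2 connection:

import psycopg2

conn = psycopg2.connect('dbname=mydatabase')
cur = conn.cursor()

# SQL: make the two new columns.
cur.execute("ALTER TABLE employees ADD COLUMN first_name text")
cur.execute("ALTER TABLE employees ADD COLUMN last_name text")

# Python: split the old combined column into the new ones.
cur.execute("SELECT id, full_name FROM employees")
for emp_id, full_name in cur.fetchall():
    first, _, last = full_name.partition(' ')
    cur.execute("UPDATE employees SET first_name = %s, last_name = %s "
                "WHERE id = %s", (first, last, emp_id))

# SQL: drop the old column.
cur.execute("ALTER TABLE employees DROP COLUMN full_name")
conn.commit()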

Like I said at the beginning, this is a really helpful script and I’m very grateful to whoever wrote it.

My ten most-frequently-used shell commands

Here they are:
$ history|awk '{a[$2]++ } END{for(i in a){print a[i] " " i}}'|sort -rn|head
80 cd
59 svn
49 bzr
40 sudo
35 vi
32 nosetests
26 l
15 rfcomm
14 screen
14 c

l is an alias for ls and c is an alias for clear. rfcomm is how I connect to my mobile phone over a virtual serial port via bluetooth.

I’m happy that vi and nosetests are right next to each other. It looks like I’m pretty good about rerunning my test cases after editing.

I got the idea for this post from this guy.