My article is finally online

Introduction to Python Decorators is available for you to read after you fill out the annoying registration form.

I have a few ideas for the next article. Do any of these seem interesting?

  1. Demystify metaclasses: use metaclasses to add camel-cased aliases for underscored method names, show how to automatically make methods into properties, and build a crude ORM.
  2. Explore logging with Python, ranging from writing out to local files to setting up a syslog-ng server. Show how to use logging config files and filters.
  3. Build a prototype inheritance system into Python. I got really interested in prototype inheritance when I studied Lua. Prototypes make it really easy to change class-based behaviors at run time.

Finally, the meaning behind the pirates-vs-ninjas debate became clear to me during a recent nitrous-oxide haze (no, not how you think; I was getting my teeth cleaned at the dentist). Anyhow, pirates and ninjas are symbols.

The ninja is a metaphor for the corporate employee. A ninja will get the job done or die trying. A ninja will kill everyone in his own family if he’s ordered to. A ninja has no sense of entitlement or dignity or flex time.

Meanwhile, the pirate is the entrepreneur, or maybe the upper-level executive. He has no sense of duty or honor. He seeks adventure and glory only. He’ll jump ship as soon as possible. He might even maroon his crew-mates on a desert island if it means he gets the treasure to himself.

Pirates love to hire ninjas because a ninja never disobeys. Ninjas love to kill pirates because they can pretend they’re killing their own pirate boss.

Neat code complexity tool.

David Stanek wrote a nice utility to measure code complexity. This post explains the details. Anyway, I downloaded his code and tried it out. I really like it:

$ cat matt.py
"""A few functions for use with pygenie."""
def f(x):
    "Make an 8-way branch, 1 layer deep."
    if x == 1: return 1
    elif x == 2: return 2
    elif x == 3: return 3
    elif x == 4: return 4
    elif x == 5: return 5
    elif x == 6: return 6
    elif x == 7: return 7
    elif x == 8: return 8

def g(a, b, c):
    "This function has 8 paths."
    if a:
        if b:
            if c:
                return 1 # a and b and c.
            else:
                return 2 # a and b and not c.
        else:
            if c:
                return 3 # a and not b and c.
            else:
                return 4 # a and not b and not c.
    else:
        if b:
            if c:
                return 5 # not a and b and c.
            else:
                return 6 # not a and b and not c.
        else:
            if c:
                return 7 # not a and not b and c.
            else:
                return 8 # not a and not b and not c.

def h(x):
    if x: return True
    else: return False

And this is what happens when I run the code:

$ ./pygenie.py complexity matt.py
File: /home/matt/svn-checkouts/cyclic_complexity/matt.py
Type Name Complexity
--------------------
F f 9
F g 8

The functions f and g have a complexity exceeding 7, so they print out.

This might make a nice nose plugin.
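
The scores above are consistent with the usual way of counting cyclomatic complexity: one, plus one for every decision point in the function. The following is only a rough sketch of that idea using the standard ast module under Python 3. It is not pygenie's actual implementation, and the set of node types counted as decision points is an assumption.

```python
import ast

SOURCE = '''
def f(x):
    if x == 1: return 1
    elif x == 2: return 2
    elif x == 3: return 3
    elif x == 4: return 4
    elif x == 5: return 5
    elif x == 6: return 6
    elif x == 7: return 7
    elif x == 8: return 8

def h(x):
    if x: return True
    else: return False
'''

# Node types treated as decision points in this sketch (an assumption,
# not necessarily the set that pygenie uses).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def complexity(func_node):
    """Return 1 plus the number of decision points inside one function."""
    return 1 + sum(isinstance(node, DECISION_NODES)
                   for node in ast.walk(func_node))


def report(source):
    """Map each top-level function name to its complexity score."""
    tree = ast.parse(source)
    return {node.name: complexity(node)
            for node in tree.body
            if isinstance(node, ast.FunctionDef)}


scores = report(SOURCE)
for name, score in sorted(scores.items()):
    print(name, score)
```

Each elif shows up in the syntax tree as an if nested inside the previous else, which is why the eight-way branch in f counts as eight decision points.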

Python’s hash lookup is insanely good

As near as I can tell, looking up a string inside a set is effectively free.

I made a 9000-element set, where each element is foo_0, or foo_1, foo_2, … , on up to foo_8999. Then I measured the time cost of testing whether an element belongs to that set:

$ python -m timeit -s 's = set(["foo_%s" % i for i in range(9000) ])' '"foo_4500" in s'
1000000 loops, best of 3: 0.447 usec per loop

Next I measured a few scans across a list of the same size:

$ python -m timeit -s 's = ["foo_%s" % i for i in range(9000) ]' '"foo_0" in s'
1000000 loops, best of 3: 0.447 usec per loop
$ python -m timeit -s 's = ["foo_%s" % i for i in range(9000) ]' '"foo_1" in s'
1000000 loops, best of 3: 0.659 usec per loop
$ python -m timeit -s 's = ["foo_%s" % i for i in range(9000) ]' '"foo_900" in s'
10000 loops, best of 3: 130 usec per loop
$ python -m timeit -s 's = ["foo_%s" % i for i in range(9000) ]' '"foo_4500" in s'
1000 loops, best of 3: 631 usec per loop

It takes more time to do two string comparisons than it does to hash the string and look it up in the set.
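
The same comparison can be run from inside a script with the timeit module instead of the command line. This is a minimal sketch for Python 3; the helper name time_lookup and the loop count of 200 are arbitrary choices made for illustration, and the absolute numbers will differ from machine to machine.

```python
import timeit

words = ["foo_%s" % i for i in range(9000)]
as_list = list(words)
as_set = set(words)


def time_lookup(container, needle, number=200):
    """Total seconds spent on `number` membership tests of needle."""
    return timeit.timeit(lambda: needle in container, number=number)


set_time = time_lookup(as_set, "foo_4500")
list_time = time_lookup(as_list, "foo_4500")

print("set:  %.6f s" % set_time)
print("list: %.6f s" % list_time)
```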

A few rules I try to follow with TurboGears

These are a few of the rules I try to follow in my design. So far, they’ve helped me out.

I aim to finish all interaction with the database before I get to the template layer.

This is non-trivial because it is so easy to forget that a method or an attribute will evaluate into a query. I use this rule because it lets me be certain about the number of interactions each page will have with the database.

I avoid branching (if-else clause) in my templates as much as possible.

I have a really hard time detangling code when I find a bunch of nested if statements. For all but the most trivial instances, I prefer to have a bunch of similar templates and then choose the best one. For example, instead of handling both a successful login and a failed login in a single template, I’ll make two different files and then choose the right one in my controller.

In practice, I have some really similar templates. But then I go back and strip out as much of the common code as possible and put those into widgets.

Any time I find a select() call in my controller, I consider making a new method in my model.

When I write something like this in a controller:

bluebirds = model.Bird.select(Bird.q.color == 'blue')

I usually come back later and put in something like this into the Bird class:

class Bird(SQLObject):
    color = UnicodeCol()

    @classmethod
    def by_color(cls, color):
        return cls.select(cls.q.color == color)

Now I have something that I can reuse. If I’m feeling whimsical I’ll use functools.partial to do something like this:

from functools import partial

class Bird(SQLObject):
    color = UnicodeCol()

    def by_color(self, color):
        return self.select(self.q.color == color)

    redbirds = classmethod(partial(by_color, color='red'))
    bluebirds = classmethod(partial(by_color, color='blue'))

Sidenote: I couldn’t figure out how to use the @classmethod decorator in the second version of by_color because partial complained. Apparently, callable(some_class_method) returns False, and partial requires the first argument to be a callable.

Maybe a reader can explain to me what’s going on there…
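
One likely explanation: inside the class body, a name decorated with @classmethod is bound to a classmethod object, which is a descriptor and not itself callable. It only turns into a callable bound method when it is looked up through the class. The sketch below shows this with a plain stand-in class (no SQLObject involved) under Python 3. Reaching through the __func__ attribute to get the undecorated function back is one possible workaround, not necessarily the best one.

```python
from functools import partial


def by_color(cls, color):
    return "%s birds from %s" % (color, cls.__name__)


wrapped = classmethod(by_color)

print(callable(by_color))  # the plain function is callable
print(callable(wrapped))   # the classmethod object is not

try:
    partial(wrapped, color="red")
    partial_error = None
except TypeError as exc:
    # partial insists that its first argument is callable.
    partial_error = exc


class Bird:
    """Stand-in class for illustration only."""

    @classmethod
    def by_color(cls, color):
        return "%s birds from %s" % (color, cls.__name__)

    # Inside the class body, by_color is still the raw classmethod object,
    # so go through __func__ to recover the plain function for partial.
    redbirds = classmethod(partial(by_color.__func__, color="red"))


print(Bird.redbirds())
```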

A few half-formed thoughts on SQLObject

I love SQLObject, but this is a rant about the tiny frustrations I face with it.

First, this is a minor point. I don’t really care about database independence that much. Postgres has a lot of wonderful features: I never have to worry about choosing the table engine that will enforce foreign key constraints, and I like creating indexes with functions inside:

create unique index nodup_parent on category (org_id, parent_cat, lower(name));

and I really like how easy it is to write stored procedures. Anyway, since I know I’m going to use postgresql, I don’t want to be restricted to only the features that exist or can be emulated in every platform. I know all about sqlmeta and createSQL and use it plenty. But I don’t like how when I set a default value, sometimes it is set in the database table, and other times, it isn’t.

Anyway, in practice, the most dangerous part of using SQLObject is that it hypnotizes you into forgetting about the queries behind everything. Imagine you have employees, departments, and a join table between them. You can set this up in SQLObject like this:

class Employee(SQLObject):
    name = UnicodeCol(alternateID=True)
    departments = RelatedJoin('Department')

class Department(SQLObject):
    name = UnicodeCol(alternateID=True)
    employees = RelatedJoin('Employee')

You want to draw a grid that indicates whether each employee is a member of each department, so you might dash off some code like this:

for emp in Employee.select():
    for d in Department.select():
        if d in emp.departments:
            print "yes!"
        else:
            print "no!"

In an ideal scenario, you can do this with three simple queries:

  • You need a list of employees
  • You need a list of departments
  • You need the list of employee-department associations.

People who talk about how you can use outer joins to cram all that into one query will be dropped into a bottomless pit. Besides, I profiled it, and three separate queries are often much cheaper.

Anyway, back to the point. SQLObject will only run a single query to get the employees and a separate single query to get all the departments. So that’s good.

However, the place where all hell breaks loose is that if clause in the middle. If we have three employees and four departments, this statement

if d in emp.departments:

executes a dozen times. That’s unavoidable. The problem is that each time it executes, SQLObject runs a query like:

select department_id from department_employee where employee_id = (whatever);

Every time you say “is this particular department in this employee’s list of departments?” SQLObject grabs the full list of departments for that employee. So, if you ask about 10 different departments, you will run the exact same query ten times. Sure, the database is likely to cache the results of the query for you, but it is still very wasteful.

With just a few employees and a few departments, that’s not so bad. Eventually, though, as the number of employees and departments grows, the cost of that code grows as N^2, which is just geek slang for sucky.
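
The three-query approach boils down to fetching the associations once and answering every membership question from memory. Here is a minimal sketch in Python 3, with plain lists standing in for the three query results. The names, IDs, and the build_grid helper are all made up for illustration; no SQLObject calls are involved.

```python
# Stand-ins for the results of the three queries (hypothetical data).
employees = [(1, "alice"), (2, "bob"), (3, "carol")]
departments = [(10, "sales"), (20, "ops"), (30, "legal"), (40, "it")]
memberships = [(1, 10), (1, 20), (2, 20), (3, 40)]  # (employee_id, department_id)


def build_grid(employees, departments, memberships):
    """Return {employee name: ["yes!" or "no!" for each department]}."""
    pairs = set(memberships)  # built once, then each lookup is a hash test
    grid = {}
    for emp_id, emp_name in employees:
        grid[emp_name] = [
            "yes!" if (emp_id, dept_id) in pairs else "no!"
            for dept_id, _dept_name in departments
        ]
    return grid


grid = build_grid(employees, departments, memberships)
for name, row in grid.items():
    print(name, row)
```

The loop still visits every employee-department pair, but no pair costs a round trip to the database.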

So, in conclusion, this may sound like a rant, but it really isn’t. SQLObject is great. But it isn’t magic. It’s a great scaffolding system. But now I find that I’m rewriting a fair portion of code in order to reduce the database costs.

Aside: when I started paying attention to the queries generated by SQLObject, I found it really useful to edit postgresql.conf and enable log_min_duration_statement. Then every query and its cost will be logged for you. This is really useful stuff. It’s helped me to relax about doing a lot of things that I used to think were really bad.

How to use itertools.cycle to set even and odd rows

I find code like this in a lot of web applications:

list_of_x_objects = ['a', 'b', 'c', 'd', 'e']
for i, x in enumerate(list_of_x_objects):
    if i % 2: htmlclass = "odd"
    else: htmlclass = "even"
    print """<li class="%s">%s</li>""" % (htmlclass, x)

    Never mind the print statement. That’s just to illustrate the point without having to explain some template syntax.

    The same thing can be expressed with itertools.cycle:

    import itertools

    list_of_x_objects = ['a', 'b', 'c', 'd', 'e']
    for htmlclass, x in zip(itertools.cycle(['even', 'odd']), list_of_x_objects):
        print """<li class="%s">%s</li>""" % (htmlclass, x)

    I see several advantages of the second approach:

    • It’s way more flexible. I can easily switch to a style that repeats every three lines (or four, or five…).
    • I don’t create the variable i when all I really want is a class variable that toggles between values.
    • The second approach avoids the modulus operator. Since I hardly ever use the modulus operator, when I do come across it, I always have to take a second and puzzle out what’s happening.
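
To illustrate the flexibility point, here is a small sketch in Python 3 where the repeating pattern is just a parameter. The tag_rows helper and the class names "first", "second", and "third" are invented for this example, and the list-item markup is only an assumption about how the rows get rendered.

```python
import itertools


def tag_rows(rows, classes=("even", "odd")):
    """Pair each row with the next class name, repeating the classes forever."""
    return list(zip(itertools.cycle(classes), rows))


rows = ['a', 'b', 'c', 'd', 'e']

# Switching to a three-row repeat only means passing a different tuple.
for htmlclass, x in tag_rows(rows, classes=("first", "second", "third")):
    print('<li class="%s">%s</li>' % (htmlclass, x))
```

zip stops as soon as the shorter input runs out, so the endless cycle is safe here.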

    Notes from Cleveland Ruby meeting on Thursday, Jan 25th

    This post contains some python-related information, I promise.

    Fun time. Corey Haines explained behavior-driven development and showed some examples using RSpec at last night’s Cleveland Ruby meetup.

    As an aside, Corey said “powershell is what the unix command line will be when it grows up” and a thousand angels fell over dead when they heard this blasphemy.

    The story-based tests in RSpec seem downright magic. You can write in an English-y syntax:

    Given a = 1,
    When
    b.foo(a)
    Then
    b should return "Hurray"

    Or something like that.

    I like that RSpec supports a result called “Pending”. This guy writes a good explanation of how it works, and I agree with this remark:

    It’s easy enough to rename a test method so it doesn’t execute, but before RSpec I’ve never worked with one where you can mark it as pending and it then reminds you that you still have work to come back too.

    I figure that it would be straightforward to add this into nose. Maybe raise a special exception called PendingTest that gets caught differently.
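
A rough sketch of the PendingTest idea follows. It uses the standard unittest module under Python 3 rather than nose, which is an assumption made to keep the example self-contained. PendingTest, the pending decorator, and the CouponTests class are all hypothetical names. The exception subclasses unittest.SkipTest so that the runner reports the test separately instead of counting it as a failure.

```python
import functools
import unittest


class PendingTest(unittest.SkipTest):
    """Raised by a test that is written down but not implemented yet."""


def pending(reason):
    """Decorator: report a test as pending instead of renaming it away."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            raise PendingTest("PENDING: " + reason)
        return wrapper
    return decorator


class CouponTests(unittest.TestCase):
    def test_done(self):
        self.assertEqual(1 + 1, 2)

    @pending("expiry rules not decided yet")
    def test_expiry(self):
        self.fail("never reached while the test is pending")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(CouponTests)
result = unittest.TestResult()
suite.run(result)

print("ran:", result.testsRun)
print("pending:", [reason for _test, reason in result.skipped])
```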

    I learned a neat way of using a mock object without having to pass it in as a parameter based on some code I saw last night.

    Corey had a couponcontroller that operated on coupon objects. He made a mock coupon object to use with his tests for his couponcontroller. Then, in his test code, he monkeypatched the coupon module so that when somebody said “give me a coupon” he got a mock coupon instead.

    I spent a few minutes trying something vaguely like that in python. I’m not sure I like it, but it gets the point across.

    I have a file coupon.py:

    # This is coupon.py.

    class Coupon(object):
        "I'm the real coupon"

        def foo(self):
            print "This is the real coupon"
            return "foo"

    And I have a file couponcontroller.py:

    # This is couponcontroller.py.

    from coupon import Coupon

    def couponcontroller():
        c = Coupon()
        return c.foo()

    In my test_couponcontroller.py, I want the couponcontroller to use my mock coupon, not the real one.

    # This is test_couponcontroller.py.

    import couponcontroller

    class mockCoupon(object):
        "I'm not the real coupon."
        def foo(self):
            print "Congratulations. You're using a mock."
            return "foo"

    def setup():
        # Mess with the module.
        couponcontroller.Coupon = mockCoupon

    def test_couponcontroller():
        "couponcontroller should return a string 'foo'"
        assert couponcontroller.couponcontroller() == "foo"

    It seems to work:

    $ nosetests -s test_couponcontroller.py
    couponcontroller should return a string 'foo' ... Congratulations. You're using a mock.
    ok

    ----------------------------------------------------------------------
    Ran 1 test in 0.003s

    OK
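
One drawback of patching the module in setup() is that the mock stays in place for every test that runs afterwards. A possible alternative, sketched here for Python 3, is unittest.mock.patch.object, which swaps the attribute only for the duration of a with block and then restores the original. To keep the example in a single file, a throwaway module object is built from a string. That is purely a device for this sketch, and the return values "real" and "mock" are made up so the difference is visible.

```python
import types
from unittest import mock

# Source for a stand-in couponcontroller module (hypothetical, single-file demo).
SOURCE = '''
class Coupon(object):
    def foo(self):
        return "real"

def couponcontroller():
    c = Coupon()
    return c.foo()
'''

couponcontroller = types.ModuleType("couponcontroller")
exec(SOURCE, couponcontroller.__dict__)


class MockCoupon(object):
    "I'm not the real coupon."
    def foo(self):
        return "mock"


# Inside the block the module-level name Coupon points at the mock.
with mock.patch.object(couponcontroller, "Coupon", MockCoupon):
    patched_result = couponcontroller.couponcontroller()

# Outside the block the original class is back.
restored_result = couponcontroller.couponcontroller()

print(patched_result, restored_result)
```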

    In summary, there are clearly a lot of smart people in the ruby community, even if they insist on using syntax like

    @adder ||= Adder.new

    Possible bug in 1.0.4b3 tag of turbogears

    The /visit/api.py file in the 1.0.4b3 tag of turbogears has this function, starting on line 177:

    def encode_utf8(params):
        '''
        will recursively encode to utf-8 all values in a dictionnary
        '''
        res = dict()
        for k, v in params.items():
            if type(v) is dict:
                res[k] = encode_utf8(v)
            else:
                res[k] = v.encode('utf-8')

        return res

    If you have a query string like ?a=1&a=2, then params has a key u'a' that points to a list that contains u'1' and u'2'. And encode isn’t defined for lists, so . . .

    Fortunately, the /visit/api.py file in the branches/1.0 branch already has a fix for this problem, so I ran setup.py develop in my checkout directory and was back in business.

    I lost so much time today figuring this out because I kept looking for the bug in my code, rather than in the framework itself. Also, the code works fine as long as the query string doesn’t have more than one value for the same key.
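
For illustration, here is one way such a function could cope with repeated keys. This is a hypothetical sketch written for Python 3, not the fix that is actually in the branches/1.0 branch. It assumes values are strings, lists or tuples of values, or nested dictionaries, and it passes anything else through untouched.

```python
def encode_utf8(params):
    """Recursively encode every string value in params to UTF-8 bytes."""
    if isinstance(params, dict):
        return {key: encode_utf8(value) for key, value in params.items()}
    if isinstance(params, (list, tuple)):
        # Repeated query-string keys such as ?a=1&a=2 arrive as a list.
        return [encode_utf8(value) for value in params]
    if isinstance(params, str):
        return params.encode("utf-8")
    return params  # leave any other type untouched


print(encode_utf8({"a": ["1", "2"], "b": {"c": "x"}}))
```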

    While I’m on the soapbox, I really wish that testutil.py would change this function:

    def tearDown(self):
        database.rollback_all()
        for item in self._get_soClasses():
            if isinstance(item, types.TypeType) and issubclass(item,
                sqlobject.SQLObject) and item != sqlobject.SQLObject \
                and item != InheritableSQLObject:
                item.dropTable(ifExists=True)

    to something sort of like this instead:

    def tearDown(self):
        database.rollback_all()
        import copy # Probably don't actually import here, but this is just for illustration.
        x = copy.copy(self._get_soClasses()) # store a copy of the list.
        x.reverse() # Now reverse it.
        for item in x: # Iterate the reversed copy.
            if isinstance(item, types.TypeType) and issubclass(item,
                sqlobject.SQLObject) and item != sqlobject.SQLObject \
                and item != InheritableSQLObject:
                item.dropTable(ifExists=True)

    The whole point of using self._get_soClasses is that it looks for a list that defines the order to follow when creating tables. You can define soClasses in your model to make sure that your independent tables are created before your dependent tables.

    Well, when it comes time to destroy all your tables, you should destroy the dependent tables first.
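
The ordering argument can be shown without a database. In this Python 3 sketch, the table names and the dependency lists are invented, and create_all and drop_all are hypothetical helpers that only simulate what a real create or drop would enforce through foreign keys.

```python
# Hypothetical tables in creation order: (name, names it depends on).
SO_CLASSES = [
    ("org", []),
    ("category", ["org"]),
    ("message", ["category"]),
]


def create_all(classes):
    """Create in list order; every parent table must already exist."""
    existing = []
    for name, parents in classes:
        missing = [p for p in parents if p not in existing]
        if missing:
            raise RuntimeError("%s needs %s first" % (name, missing))
        existing.append(name)
    return existing


def drop_all(classes, existing):
    """Drop in reverse list order; refuse to orphan a dependent table."""
    dropped = []
    for name, _parents in reversed(classes):
        dependents = [n for n, parents in classes
                      if name in parents and n in existing]
        if dependents:
            raise RuntimeError("%s still has dependents %s" % (name, dependents))
        existing.remove(name)
        dropped.append(name)
    return dropped


existing = create_all(SO_CLASSES)
drop_order = drop_all(SO_CLASSES, existing)
print(drop_order)
```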

    I posted this about a month ago to the turbogears trunk mailing list already.

    Sidenote — if you’re one of the people that are selflessly donating your time to working on turbogears, please don’t take my rants here personally. I’m really grateful that other people are building tools and giving them away, so that I can make a living.

    MVC Blasphemy

    I just put HTML code into my data model. I have a list-of-objects page. Each object is an instance of a class defined in my data model, derived from a row in a database. Each object needs a pretty link drawn to that object’s detailed-view page. So I added a property on my object:

    class Message(SQLObject):
        def _get_view(self):
            "Draw a link to the view page for this message."
            return cElementTree.XML("""VIEW""" % self.id)
        # Lots of other stuff snipped out.

    This is now what my kid template looks like:

    MESSAGE STUFF

    I pass in messages and columns; messages is a list of objects and columns is a tuple of strings that map to attributes or properties, like “view”.

    I’m happy with this decision. I know I could have manipulated the messages or created some new classes in my controller, but I couldn’t really see any advantage. This way works.

    I just don’t want anyone else doing this 🙂

    Don’t put parentheses around your assert expressions and the error string!

    I had a bunch of unit tests that were passing even though I knew they should be failing. I traced it down to the fact that I put parentheses around my assert statements because the tests were really really long and I wanted to put the error string on a separate line.

    This is what I spent the last 45 minutes trying to figure out:

    >>> assert 1 == 0, "OH NOES"
    ------------------------------------------------------------
    Traceback (most recent call last):
    File "", line 1, in
    AssertionError: OH NOES

    >>> assert (1 == 0,
    ... "OH NOES")

    >>> assert (1 == 0, "OH NOES")

    >>>

    The assertion doesn’t raise because the parentheses turn the condition and the error string into a single two-element tuple, and a non-empty tuple is always true, so the assert can never fail.

    And these don’t work, but for different reasons:

    >>> (assert 1 == 0, "OH NOES")
    ------------------------------------------------------------
    File "", line 1
    (assert 1 == 0, "OH NOES")
    ^
    SyntaxError: invalid syntax

    >>> assert 1 == 0,
    ------------------------------------------------------------
    File "", line 1
    assert 1 == 0,
    ^
    SyntaxError: invalid syntax

    Dangit.
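
A short sketch of what is going on, and of one way to split a long assert across lines without hitting this trap: put the parentheses around the message only. This is written for Python 3, and the check function and its message are made up for the example.

```python
# The parenthesized form is a two-item tuple, and any non-empty tuple is true.
condition_and_message = (1 == 0, "OH NOES")
print(bool(condition_and_message))


def check(value):
    # Parenthesize only the message; the condition stays outside the
    # parentheses, so the assert still tests the condition itself.
    assert value == 0, (
        "OH NOES: expected 0 but got %r" % (value,)
    )


try:
    check(1)
    raised = False
    message = None
except AssertionError as exc:
    raised = True
    message = str(exc)

print(raised, message)
```

A backslash at the end of the line after the comma is another option.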