Archive for Programming


The difference between syntactic analysis and code generation

I can parse the text for the paragraph below very quickly. The author uses simple grammar. However, it is taking me hours of study (following footnotes) to make any sense out of it:

The bin_rec function is an example of a hylomorphism (see citeseer.ist.psu.edu/meijer91functional.html)—the composition of an anamorphism (an unfolding function) and a catamorphism (a folding function). Hylomorphisms are interesting because they can be used to eliminate the construction of intermediate data structures (citeseer.ist.psu.edu/launchbury95warm.html).

From the article Cat: A Functional Stack-Based Little Language in the April issue of DDJ.

This experience matches how I imagine programming-language interpreters and compilers work. In the first pass, the interpreter reads in all the text and breaks it down grammatically, mapping chunks of text into nodes that have labels like “IDENTIFIER” or “FUNCTION DEFINITION” or whatever.

Then in the second pass, the system walks through the nodes and gets down to the business of writing out the ones and zeros that tell the little men inside my computer what to do.

I haven’t studied compilers formally (hey, my degree is in economics!) so please let me know how far off base I am. I’m aware that in reality, the first and second passes may not be separate from each other or can be interleaved.

Comments (2)

How to use tg-admin sql upgrade

The tg-admin script that is bundled with turbogears is really helpful, but I had a hard time learning how to use it.

Before you read any more, you should know that this only works when you use SQLObject, not SQLAlchemy, for your ORM.

These are my notes on how I use tg-admin to upgrade an existing database.

  • I have a production database that uses prod.cfg;
  • I have a development database that uses dev.cfg;
  • Neither databases have a sqlobject_db_version table initially, because I never payed attention to it yet.

The development database has a bunch of new columns, tables, and indexes that I want to add to the production database. For this example, I’ll pretend that all I want to do is add an index to a table.

First, I made sure that the dev database matches sqlobject classes:

tg-admin -c dev.cfg sql status

If those are out of sync, then do whatever you need to do to make sure your actual dev database matches your classes. Of course, tg-admin sql status is not perfect. For example, it overlooks missing indexes and constraints, at least with postgres.

Next, I recorded the state of the development database:

tg-admin -c dev.cfg sql record --force-db-version=2008-03-21

This will make a new table in the dev database called sqlobject_db_version. I am forcing it to have a value of today’s date (March 21st, 2008).

Now I connect to the production database and set a version on it with yesterday’s date:

tg-admin -c prod.cfg sql record --force-db-version=2008-03-20

Now I run this to try to upgrade the production database to match the development database:

tg-admin -c prod.cfg sql upgrade

Of course, that should fail, and I see some error message sort of like this:

$ tg-admin -c prod.cfg sql upgrade
Using database URI postgres://staffknex:staffknex@localhost/staffknex320
No way to upgrade from 2008-03-20 to 2008-03-21
(you need a 2008-03-20/upgrade_postgres_2008-03-21.sql script)

This is an example of a helpful error message. I need to write a script that will explain how to upgrade from yesterday’s version to today’s version.

That script will be really simple:

BEGIN;
CREATE UNIQUE INDEX majestic12 ON ufo_theorists (first_name, last_name);
END;

I suggest using BEGIN and END so that in case something goes wrong in the middle, your transaction will be rolled back automatically.

Now I can run this:

tg-admin -c prod.cfg sql upgrade

And my production database will be upgraded with the new index.

Now for some complaints:

  • Why isn’t this advertised better? This is a really nice feature.
  • You’re supposed to be able to specify the URI on the command-line with the –connection option, but I could never get it to work.
  • I really wish that tg-admin sql status detected stuff like missing indexes and constraints. I use these things heavily.
  • It would be nice to be able to mix python into the upgrade script, rather than just SQL. For example, I recently dropped a column that had both an employee’s first and last name, and separated this into two columns. I used SQL to make the new columns, then I used python to read data out of the old single column and write it to the two new columns. Then I used SQL again to drop the old column.

Like I said at the beginning, this is a really helpful script and I’m very grateful to whoever wrote it.

Comments (6)

My ten most-frequently-used shell commands

Here they are:
$ history|awk '{a[$2]++ } END{for(i in a){print a[i] " " i}}'|sort -rn|head
80 cd
59 svn
49 bzr
40 sudo
35 vi
32 nosetests
26 l
15 rfcomm
14 screen
14 c

l is an alias for ls and c is an alias for clear. rfcomm is how I connect to my mobile phone over a virtual serial port via bluetooth.

I’m happy that vi and nosetests are right next to each other. It looks like I’m pretty good about rerunning my test cases after editing.

I got the idea for this post from this guy.

Comments (4)

My article is finally online

Introduction to Python Decorators is available for you to read after you fill out the annoying registration form.

I have a few ideas for the next article. Do any of these seem interesting?

  1. Demystify metaclasses: use metaclasses to add camel-cased aliases for underscored method names, show how to automatically make methods into properties, and build a crude ORM.
  2. Explore logging with python, ranging from writing out to local files to setting up a syslog-ng server. Show how to use logging config files and filters.
  3. Build a prototype inheritance system into python. I got really interested in prototype inheritance when I studied lua. Prototypes make it really easy to change class-based behaviors at run time.

Finally, the meaning behind the pirates-vs-ninjas debate became clear to me during a recent nitrous-oxide haze (no, not how you think; I was getting my teeth cleaned at the dentist). Anyhow, pirates and ninjas are symbols.

The ninja is a metaphor for the corporate employee. A ninja will get the job done or die trying. A ninja will kill everyone in his own family if he’s ordered to. A ninja has no sense of entitlement or dignity or flex time.

Meanwhile, the pirate is the entrepeneur, or maybe the upper-level executive. He has no sense of duty or honor. He seeks adventure and glory only. He’ll jump ship as soon as possible. He might even maroon his crew-mates on a desert island if it means he gets the treasure to himself.

Pirates love to hire ninjas because a ninja never disobeys. Ninjas love to kill pirates because they can pretend they’re killing their own pirate boss.

Comments (5)

Flex Camp Cleveland was fun

Yesterday I attended Flex Camp Cleveland.

Slides from the presentation “Introduction to Object-Oriented Programming” are available here. Thanks to Kristopher Schultz for the link.

A few random notes and opinions mixed together:

  • Flex is a topic that attracted a lot of people! We had more than a hundred attendees, and at least drove from places like Columbus, Michigan, and Pittsburgh.
  • There’s a lot of technical talent in Ohio. I don’t believe that we have a technical shortage. Instead, we have a shortage of managers and entrepeneurs that know how to build technology-based businesses.
  • Flex addresses a lot of my personal frustrations with HTML. In particular, I’m thrilled to be able to add tags to the language itself.
  • Flex 3.0 is not just zero-cost, but truly open-source.
  • Search engine spiders may have difficulty indexing flex apps, but typically, you don’t want search engines linking into your app. You want them linking into your documentation and marketing text.
  • People say that there’s a difference between being knowledgeable about a subject and being able to teach that subject well. The same thing is true about public speaking. Speaking well is really hard to do. You can easily tell a professional from an amateur.
  • Air is now available on linux, in an alpha state. Hurray!
  • Flex offers server-initiated pushes out to clients. Flex uses a family of protocols to make this happen. Depending on network configuration, it uses anything from a true server-initiated push to client-side polling.
  • We all got flexcamp t-shirts, but they were all sized extra-large. Any reader that wants mine is welcome to it.
  • I’m a command-line snob and don’t like using GUIs to build user interfaces, but I was impressed by how powerful the designer view was in flex builder. I wasn’t impressed enough to pay for it though.

Comments (3)

Neat code complexity tool.

David Stanek wrote a nice utility to measure code complexity. This post explains the details. Anyway, I downloaded his code and tried it out. I really like it:

$ cat matt.py
"""A few functions for use with pygenie."""
def f(x):
    "Make an 8-way branch, 1 layer deep."
    if x == 1: return 1
    elif x == 2: return 2
    elif x == 3: return 3
    elif x == 4: return 4
    elif x == 5: return 5
    elif x == 6: return 6
    elif x == 7: return 7
    elif x == 8: return 8
 
def g(a, b, c):
    "This function has 8 paths."
    if a:
        if b:
            if c:
                return 1 # a and b and c.
            else:
                return 2 # a and b and not c.
        else:
            if c:        
                return 3 # a and not b and c
            else:
                return 4 # a and not b and not c.
    else:
        if b:
            if c:
                return 5 # not a and b and c.
            else:
                return 6 # not a and not b and c.
        else:
            if c:
                return 7 # not a and b and not c.
            else:
                return 8 # not a and not b and not c.
 
def h(x):
    if x: return True
    else: return False

And this is what happens when I run the code:

$ ./pygenie.py complexity matt.py
File: /home/matt/svn-checkouts/cyclic_complexity/matt.py
Type Name Complexity
------------------–
F    f    9          
F    g    8  

The functions f and g have a complexity exceeding 7, so they print out.

This might make a nice nose plugin.

Comments

Use bazaar and subversion together

Subversion is great and I have been using it for years. But I keep running into the same two problems over and over:

  • I did a bunch of work and I want to commit it, but I don’t have network access.
  • I spent a day playing around with a crazy idea, and I want to store my half-finished and buggy work. I don’t want to put it into the trunk because it will break a bunch of other people’s work. I don’t want to make a branch because I find branching and merging again to be tedious.

Both problems could be solved if I could make commits that were local to my hard drive and visible to me only.

Today I spent a few hours on IRC and I found out that bazaar plays really nicely with subversion, and now I have a solution.

  1. First, I loaded my subversion repository into bazaar. Here’s the command I ran:

    bzr branch svn+http://my-svn-repository.example.com/myproj

    My subversion repository has 990 revisions. This command took about 3 minutes to finish.

  2. Then I went into the local myproj directory and edited some code. I used bzr status and bzr diff to see what I changed while I worked. They operate nearly identically to their svn counterparts, but bzr diff has no --summarize option.
  3. Committing locally is really simple:

    bzr commit -m "This is a local commit and svn won't see it!"

  4. Then I did some more edits and committed those:

    bzr commit -m "OK, now maybe I have something that I want to put into subversion."

  5. Before committing to the subversion repository, I pulled down other people’s updates to the subversion repository like this:

    bzr pull svn+http://my-svn-repository.example.com/myproj

    This is a lot like running svn update.

  6. After making sure my stuff plays nice with other people’s updates, I pushed my stuff into subversion:

    bzr push svn+http://my-svn-repository.example.com/myproj

  7. Now when anyone runs svn update in their subversion working copy, they will get my work along with my commit comment.

So far, I’m really happy with this set up. The Bazaar-Subversion FAQ had a lot of really useful tips. Finally, there’s still a lot of stuff I haven’t tried yet, like renaming or copying files with svn.

Comments (2)

Python’s hash lookup is insanely good

As near as I can tell, looking up a string inside a set is effectively free.

I made a 9000-element set, where each element is foo_0, or foo_1, foo_2, … , on up to foo_8999. Then I measured the time cost of testing whether an element belongs to that set:

$ python -m timeit -s 's = set(["foo_%s" % i for i in range(9000) ])' '"foo_4500" in s'
1000000 loops, best of 3: 0.447 usec per loop

Next I measured a few scans across a list of the same size:

$ python -m timeit -s 's = ["foo_%s" % i for i in range(9000) ]' '"foo_0" in s'
1000000 loops, best of 3: 0.447 usec per loop
$ python -m timeit -s 's = ["foo_%s" % i for i in range(9000) ]' '"foo_1" in s'
1000000 loops, best of 3: 0.659 usec per loop
$ python -m timeit -s 's = ["foo_%s" % i for i in range(9000) ]' '"foo_900" in s'
10000 loops, best of 3: 130 usec per loop
$ python -m timeit -s 's = ["foo_%s" % i for i in range(9000) ]' '"foo_4500" in s'
1000 loops, best of 3: 631 usec per loop

It takes more time to do to two string comparisons than it does to hash the string and look it up in the set.

Comments (8)

A few rules I try to follow with TurboGears

These are a few of the rules I try to follow in my design. So far, they’ve helped me out.

I aim to finish all interaction with the database before I get to the template layer.

This is non-trivial because it is so easy to forget that a method or an attribute will evaluate into a query. I use this rule because it lets me be certain about the number of interactions each page will have with the database.

I avoid branching (if-else clause) in my templates as much as possible.

I have a really hard time detangling code when I find a bunch of nested if statements. For all but the most trivial instances, I prefer to have a bunch of similar templates and then choose the best one. For example, instead of handling both a successful login and a failed login in a single template, I’ll make two different files and then choose the right one in my controller.

In practice, I have some really similar templates. But then I go back and strip out as much of the common code as possible and put those into widgets.

Any time I find a select() call in my controller, I consider making a new method in my model.

When I write something like this in a controller:

bluebirds = model.Bird.select(Bird.q.color == 'blue')

I usually come back later and put in something like this into the Bird class:

class Bird(SQLObject):
    color = UnicodeCol()
 
    @classmethod
    def by_color(cls, color)
       return cls.select(cls.q.color == color)

Now I have something that I can reuse. If I’m feeling whimsical I’ll use functools.partial to do something like this:

class Bird(SQLObject):
    color = UnicodeCol()
 
    def by_color(self, color):
        return self.select(self.q.color == color)
 
    redbirds = classmethod(partial(by_color, color='red'))
    bluebirds = classmethod(partial(by_color, color='blue'))

Sidenote: I couldn’t figure out how to use the @classmethod decorator in the second version of by_color because partial complained. Appararently, callable(some_class_method) returns False, and partial requires the first argument to be a callable.

Maybe a reader can explain to me what’s going on there…

Comments (4)

A few half-formed thoughts on SQLObject

I love SQLObject, but this is a rant about the tiny frustrations I face with it.

First, this is a minor point. I don’t really care about database independence that much. Postgres has a lot of wonderful features: I never have to worry about choosing the table engine that will enforce foreign key constraints, I like creating indexes with function inside:

create unique index nodup_parent on category (org_id, parent_cat, lower(name));

and I really like how easy it is to write stored procedures. Anyway, since I know I’m going to use postgresql, I don’t want to be restricted to only the features that exist or can be emulated in every platform. I know all about sqlmeta and createSQL and use it plenty. But I don’t like how when I set a default value, sometimes it is set in the database table, and other times, it isn’t.

Anyway, in practice, the most dangerous part of using SQLObject is that it hypnotizes you into forgetting about the queries behind everything. Imagine you have employees, departments, and a join table between them. You can set this up in SQLObject like this:

class Employee(SQLobject):
    name = UnicodeCol(alternateID=True)
    departments = RelatedJoin('Department')
 
class Department(SQLObject):
    name = UnicodeCol(alternateID=True)
    employees = RelatedJoin('Employee')

You want to draw a grid that indicates whether each user is a member in every group, so you might dash off some code like this:

for emp in Employee.select():
    for d in Department.select():
        if d in emp.departments:
            print "yes!"
        else:
            print "no!"

In an ideal scenario, you can do this with three simple queries:

  • You need a list of employees
  • You need a list of departments
  • You need the list of employee-department of associations.

People that talk about how you can use outer joins to cram all that into one query will be dropped into a bottomless pit. Besides, I profiled it, and three separate queries is often much cheaper.

Anyway, back to the point. SQLObject will only run a single query to get the employees and a separate single query to get all the departments. So that’s good.

However, the place where all hell breaks loose is that if clause in the middle. If we have three employees and four departments, this statement

if d in emp.departments:

executes a dozen times. That’s unavoidable. The problem is that each time it executes, SQLObject runs a query like:

select department_id from department_employee where employee_id = (whatever);

Every time you say “is this particular department in this employee’s list of departments?” SQLObject grabs the full list of departments for that employee. So, if you ask about 10 different departments, you will run the exact same query ten times. Sure, the database is likely to cache the results of the query for you, but it is still very wasteful.

With just a few employees and a few departments, that’s not so bad. Eventually, though, as the number of employees and departments grow, the cost of that code grows at N2, which is just geek slang for sucky.

So, in conclusion, this may sound like a rant, but it really isnt. SQLObject is great. But it isn’t magic. It’s a great scaffolding system. But now I find that I’m rewriting a fair portion of code in order to reduce the database costs.

Aside: when I started paying attention to the queries generated by SQLObject, I found it really useful to edit postgresql.conf and enable log_min_duration_statement. Then every query and its cost will be logged for you. This is really useful stuff. It’s helped me to relax about doing a lot of things that I used to think were really bad.

Comments (4)

« Previous entries