Monthly Archives: December 2008

Ditz versus bugs everywhere

A few months ago, I sketched out a ticket-tracking system that would be married with my source code. Then some commenters told me about bugs everywhere (be) and ditz.

I’ve looked at both, but I’ve been using ditz full-time while just watching be. Anyhow, here are a few comparisons:

setting up a project

Here’s what it looks like when you set up a project in ditz:


$ ditz init
I wasn't able to find a configuration file ./.ditz-config.
We'll set it up right now.
Your name (enter for Matthew Wilson): 
Your email address (enter for matt@sprout.tplus1.com): 
Directory to store issues state in (enter for .ditz): 
Use your text editor for multi-line input when possible (y/n)? y
Paginate output (always/never/auto)? auto
Project name (enter for scratch): 
Issues can be tracked across the project as a whole, or the project can be
split into components, and issues tracked separately for each component.
Track issues separately for different components? (y/n): y

Current components:
None!

(A)dd component, (r)emove component, or (d)one: a
Component name: documentation

... snip ...

(A)dd component, (r)emove component, or (d)one: d
Ok, .ditz directory created successfully.

And here’s how you can create a single issue.

$ ditz add
Title: Write something justifying yet another web framework
Is this a (b)ugfix, a (f)eature, or a (t)ask? t
Choose a component:
  1) scratch
  2) documentation
  3) model code
  4) controller code
  5) view code
Component (1--5): 2
Issue creator (enter for Matthew Wilson <matt@sprout.tplus1.com>): 
Added issue documentation-1 (e8a4a43f78ee83300cc0372a13375d9534b97abb).

You can’t tell, but when I punched in the title, ditz opened my $EDITOR and I wrote a longer description in there.

Now the same thing in be:


$ be set-root
Guessing id 'matt <matt@sprout>'
No revision control detected.
Directory initialized.

$ be new 'Write something justifying yet another web framework'
Guessing id 'matt <matt@sprout>'
Guessing id 'matt <matt@sprout>'
Created bug with ID 4d4

Not quite the same experience!

Here’s what a ditz issue looks like:


$ ditz show documentation-1
Issue documentation-1
---------------------
      Title: Write something justifying yet another web framework
Description: Why not just polish any of the ones already out there?
       Type: task
     Status: unstarted
    Creator: Matthew Wilson <matt@sprout.tplus1.com>
        Age: four minutes
    Release: 
 References: 
 Identifier: e8a4a43f78ee83300cc0372a13375d9534b97abb

Event log:
- created (matt, four minutes ago)

And in be:


$ be show 4d4
Guessing id 'matt <matt@sprout>'
          ID : 4d4e6a17-2097-42bb-a3cd-3c17566ecce8
  Short name : 4d4
    Severity : minor
      Status : open
    Assigned : 
      Target : 
     Creator : matt <matt@sprout>
     Created : Mon, 22 Dec 2008 20:25 (Tue, 23 Dec 2008 01:25:04 +0000)
Write something justifying yet another web framework

Ditz issues have titles, long descriptions, types (feature, bugfix, or task), releases (optionally) and links to components (also optionally). There are ditz plugins to add support for assigning issues to people.

be has most of the same concepts, just with different names.

data serialization and storage

ditz makes a .ditz directory at the top of a project and be makes a .be directory in the top of the project.

Inside the .ditz folder, there’s one project.yaml file that lists releases (groupings of issues) and components (also groupings of issues, but cross-cutting). Then each issue lives in its own yaml file, and they look like this:


$ cat .ditz/issue-ac3177b3bf8c6757625977ef27279c1fe05df662.yaml 
--- !ditz.rubyforge.org,2008-03-06/issue 
title: Write some "WHY?" documentation
desc: Justify the existence of this project.
type: :task
component: documentation
release: 
reporter: Matthew Wilson <matt@sprout.tplus1.com>
status: :unstarted
disposition: 
creation_time: 2008-12-23 00:59:05.840956 Z
references: []

id: ac3177b3bf8c6757625977ef27279c1fe05df662
log_events: 
- - 2008-12-23 00:59:05.841349 Z
  - Matthew Wilson <matt@sprout.tplus1.com>
  - created
  - ""
- - 2008-12-23 01:08:58.605955 Z
  - Matthew Wilson <matt@sprout.tplus1.com>
  - commented
  - |-
    Yeah, if you're gonna build another web framework, this needs to be
    really good.

Meanwhile, be is fairly similar, but bugs get whole directories to themselves. be uses what seems to be a home-made plain-text format for storing bugs:

$ cat .be/bugs/4da8ee85-9353-4a92-a654-8510bb8be0d0/values 

creator=matt <matt@sprout>

severity=minor

status=open

summary=Write some "WHY?" documentation

time=Tue, 23 Dec 2008 01:12:13 +0000

There’s actually much more whitespace than that. I replaced the eight blank lines between each line of text with just two blank lines.
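Reading that format back looks easy enough. Here's a minimal Python sketch, assuming the file is nothing more than key=value lines padded with blank lines. parse_be_values is a made-up name, not part of be, and be's real parser may well handle cases this one doesn't:

```python
def parse_be_values(text):
    """Parse the key=value lines of a be `values` file into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the blank padding lines
        # Split on the first '=' only, since a value could contain '='.
        key, _, value = line.partition("=")
        values[key] = value
    return values


sample = """
creator=matt <matt@sprout>

severity=minor

status=open

summary=Write some "WHY?" documentation

time=Tue, 23 Dec 2008 01:12:13 +0000
"""

bug = parse_be_values(sample)
print(bug["status"], "-", bug["summary"])
```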

While ditz stores the comments inside the issue’s yaml file, be makes a directory under the issue’s directory, and then stores the text of the comments in one file and the information about who said it in a separate file.

The community

The ditz mailing list is really active with people debating ideas for new features. The be mailing list is now showing some signs of life after looking dead in August.

What ditz has that be lacks

ditz can make really pretty HTML pages for all the issues for a project. example.

yaml was a really good choice. yaml makes it easy to deserialize to higher objects than just crappy boring primitive types like arrays. Instead, you can hop all the way to your own weird home made objects by specifying a tag. Then all the stuff in the yaml file gets passed in to your object.

Ditz has lots and lots of commands that are only on the be roadmap. You can search your issues with regular expressions with ditz grep, you can claim issues for yourself, you can group issues by releases and components, etc, etc, etc.

The ditz issue data model can be extended with plugins. Like I mentioned earlier, one plugin makes it possible for people to claim issues as assigned to them.

What I like about be

It’s written in python. I hate to feed the python snobbery monster, but there are certain python niceties that I don’t like doing without. In particular, ipython is just too awesome. When I read the ditz code, I spent most of my time navigating the code to get to the part that I cared about. With ipython, I don’t have that problem. I just hit foo?? and immediately see the source code.

And ruby’s documentation is not what I’ve grown accustomed to with python. For comparison:

I think the Python docs have more explanatory text in just the table of contents.

In addition, ditz uses a lot of homemade code: there’s a homemade option parser library (trollop), a homemade hack on the way ruby stores data files so that all the HTML templates are available, and all sorts of gymnastic FP tricks to get a lot of shit done in a very small number of lines. That’s cool, but as a yellow-belt in Ruby, it is really @#$ing hard to make any contributions to this project. Here’s some code that I find a little difficult to read:


def operation method, desc, *args_spec, &options_blk
    @operations ||= {}
    @operations[method] = { :desc => desc, :args_spec => args_spec,
                              :options_blk => options_blk }
end

operation :stop, "Stop work on an issue", :started_issue do
    opt :comment, "Specify a comment", :short => 'm', :type => String
    opt :no_comment, "Skip asking for a comment", :default => false
end

def stop project, config, opts, issue
    puts "Stopping work on issue #{issue.name}: #{issue.title}."
    issue.stop_work config.user, get_comment(opts)
    puts "Recorded work stop for #{issue.name}."
end

After tracing through a few hundred lines of stuff like that, I usually get discouraged and just write a feature request rather than a patch.

In summary

I like ditz. I like reading nearly inscrutable Ruby code to see how wacky people solve problems. My experience with ditz so far has been about an A minus, which is pretty good!

Why I’m going to write my own

There have been a few times where the lack of a proper database system has bit me. Like when I renamed a release, I had to do some searching and replacing in lots and lots of files. Also, regenerating my HTML views is taking almost two minutes now that I have so many issues. Also, certain operations, like moving a handful of issues from one release to another, or searching for intersections of issue subsets, are trickier than what they should be.

Besides all that, I’m fascinated by couchdb, and I think this would be a good use.

I think my system is going to use a local couchdb server that loads in all the issues from local yaml files into the server on startup. Then after lots of work updating, I’ll write out all the issues back into yaml. So, when you update your checkout of your code, you’ll need to restart or reload your couchdb server. Then you can use the couchdb server to work with the system, and then at the end, re-serialize the data back out to JSON, and then to yaml.

ditz and be are sort of like old-school CGI web apps where each user action has to start up the framework, do the action, then tear down. My system will instead keep all the issue data in memory and require explicit startups and shutdowns.
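To make the load-once, work-in-memory, dump-at-the-end idea concrete, here's a rough Python sketch. It is not the real design: a plain dict stands in for the couchdb server, JSON files stand in for the yaml files (so the sketch only needs the standard library), and IssueStore and its methods are hypothetical names.

```python
import json
import os
import tempfile


class IssueStore:
    """Hypothetical in-memory issue store: load once, work, then dump."""

    def __init__(self):
        self.issues = {}  # issue id -> dict of fields

    def load(self, directory):
        # Startup: read every issue file into memory.
        for name in sorted(os.listdir(directory)):
            if name.endswith(".json"):
                with open(os.path.join(directory, name)) as f:
                    issue = json.load(f)
                self.issues[issue["id"]] = issue

    def rename_release(self, old, new):
        # One pass over memory instead of search-and-replace across files.
        for issue in self.issues.values():
            if issue.get("release") == old:
                issue["release"] = new

    def dump(self, directory):
        # Shutdown: write every issue back out, one file per issue.
        for issue_id, issue in self.issues.items():
            path = os.path.join(directory, "issue-%s.json" % issue_id)
            with open(path, "w") as f:
                json.dump(issue, f, indent=2, sort_keys=True)


# Seed a scratch directory with two issues, then load, edit, and dump.
workdir = tempfile.mkdtemp()
seed = IssueStore()
seed.issues = {
    "a1": {"id": "a1", "title": "Write docs", "release": "0.1"},
    "b2": {"id": "b2", "title": "Fix login", "release": "0.2"},
}
seed.dump(workdir)

store = IssueStore()
store.load(workdir)
store.rename_release("0.1", "alpha")
store.dump(workdir)

# A fresh load sees the renamed release.
reloaded = IssueStore()
reloaded.load(workdir)
print(reloaded.issues["a1"]["release"])
```

Renaming a release becomes one loop over a dict, which is the kind of operation that hurt with one-file-per-issue storage.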

A few simple PostgreSQL tricks

I’ve got a bunch of these things collecting dust in a text file in my home directory. Maybe they’ll help somebody else out.

Transpose display in psql with \x


    matt=# select now(), 1 + 1 as sum;
                  now              | sum 
    -------------------------------+-----
     2008-08-20 13:00:44.178076-04 |   2
    (1 row)

    matt=# \x
    Expanded display is on.
    matt=# select now(), 1 + 1 as sum;
    -[ RECORD 1 ]----------------------
    now | 2008-08-20 13:00:46.930375-04
    sum | 2
    matt=# \x
    Expanded display is off.
    matt=# select now(), 1 + 1 as sum;
                  now              | sum 
    -------------------------------+-----
     2008-08-20 13:01:19.725394-04 |   2
    (1 row)

See how long every query takes


matt=# \timing
Timing is on.
matt=# select now();
             now
-----------------------------
 2008-12-18 12:31:50.60008-05
(1 row)

Time: 76.322 ms

By the way, you can put these into your $HOME/.psqlrc to always turn on timing at the beginning of a session:

    $ cat $HOME/.psqlrc

    -- Report time used for each query.
    \timing

Define a function in python

First you have to install plpython. Then connect to the database as somebody with sufficient privileges and then type:


    CREATE LANGUAGE 'plpythonu'; 

to allow functions defined in plpython.

Here’s a toy example:


    matt=# create or replace function snoz (INT)
    returns INT AS
    $$
    x = args[0]
    return x * x * x
    $$ language 'plpythonu';
    CREATE FUNCTION

And here it is in use:


    matt=# select 3, snoz(3);
     ?column? | snoz 
    ----------+------
            3 |   27
    (1 row)

Use a trigger to set the modified date column

First define a function to set the modifieddate column:


    create or replace function ts_modifieddate()
    returns TRIGGER as
    '
    BEGIN
        NEW.modifieddate = now();
        RETURN NEW;
    END;
    ' language 'plpgsql';

Now set up the trigger to call that function:

    create trigger set_modifieddate before update
    on knid for each row
    execute procedure ts_modifieddate();

Use a trigger to set one column based on other columns

I’ve got a table like this:


    create table snoz (
        a bool default false, 
        b bool default false, 
        c bool default false
    );

I want a trigger to set c to true when a and/or b is true.

First, I define a function that does what I want:


    create or replace function set_c()
    returns trigger as $$
    BEGIN
        if NEW.a = true or NEW.b = true then
            NEW.c = true;
        end if;
        RETURN NEW;
    END;
    $$ language 'plpgsql';

Now I wire up that trigger to execute when I want:

    create trigger update_c 
    before insert or update 
    on snoz 
    for each row
    execute procedure set_c();

Here it is all in action:

    matt=# insert into snoz (a, b) values (false, false), (false, false);
    INSERT 0 2
    matt=# select * from snoz;
     a | b | c
    ---+---+---
     f | f | f
     f | f | f
    (2 rows)

    matt=# insert into snoz (a, b) values (false, true), (true, false);
    INSERT 0 2
    matt=# select * from snoz;
     a | b | c
    ---+---+---
     f | f | f
     f | f | f
     f | t | t
     t | f | t
    (4 rows)

See, it works!

code-formatting people: I need your help

This is the sort of post my wife says doesn’t count when I tell her I wrote a new blog entry.

Anyhow, I’m curious how people format their code. For the stuff below, how would you write it? Feel free to do whatever you want; make temporary variables, rearrange blocks, whatever. I’m looking for interesting new ways to make my code as easy to comprehend as possible.

Here’s an example of how I break assignments over multiple lines:


# By the way, self.categories is a dictionary, not a list, 
# because there's no guarantee that all the keys everywhere 
# will be contiguous.
cat1, cat2, cat3, cat4, cat5 = \
    [self.categories[x] for x in range(1, 6)]

This next excerpt shows a couple of my habits. I love the new conditional expressions in Python 2.5, I hate making lots of intermediate variables, and I really love using list comprehensions rather than for loops.


def my_send_methods(self, SendMethod, disabled=False):
    return [(x.id, x.display_name,
        {       
            'selected':1 if self.preferred_send_method == x else None,
            'disabled':1 if disabled else None,
        })                                             
        for x in SendMethod.constants.values()]

viewer discretion advised on this next block

This next block is the return statement from one of my TurboGears 1.0 controller methods.


return dict(htmlclass="advancedscheduling", v2org=v2org, name=name,

        employees=[(0, 'None')] + [(x.id, x.display_name, 
                {'selected':1 if x.id == employee_id else None}) 
            for x in v2org.employees],

        locations=[(x.id, x.display_name, {'selected':1 if x.id == location_id else None}) 
            for x in v2org.locations],

        statuses=[(x.id, x.display_name, {'selected':1 if x.id == status_id else None}) 
            for x in model.ShiftStatus.select()],

)

So, how to make that better? I want to preempt suggestions to do stuff like:


a = [...]
b = [...]
c = [...]
return dict(a=a, b=b, c=c)

That’s not better! For two reasons:

  1. You’re taking up way more vertical space

    Taken to its logical conclusion, you’ll take a statement like

    
    return dict(
        employees=[(0, 'None')] + [(x.id, x.display_name, 
                {'selected':1 if x.id == employee_id else None}) 
            for x in v2org.employees]
        )
    

    And then rewrite it as

    
    employees = [(0, 'None')]
    for emp in v2org.employees:
        if emp.id == employee_id:
            d = {'selected': 1}
        else:
            d = {'selected':None}
        t = (emp.id, emp.display_name, d)
        employees.append(t)
    return dict(employees=employees)
    

    I find this annoying, but not as annoying as the next point.

  2. It is no longer obvious that each value is calculated independently

    Repeating the example from above:

    
    a = [...]
    b = [...]
    c = [...]
    return dict(a=a, b=b, c=c)
    

    Until I read inside those list comprehensions, or even worse, follow the functions you used to define each one of those, I can’t be certain that each of those variables is determined independently of the others.

    In other words, in that version, at a casual glance, it is possible that b depends on a. But in the dictionary approach, it is obvious that b does not depend on a, since a doesn’t exist in any way that b can get access to it.
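One direction that keeps the single dict(...) call, so each value still visibly stands alone, is to pull the repeated comprehension into a helper. This is only a sketch: options is a made-up name, and Thing is a stand-in for the real model objects, which would come from v2org and the model.

```python
from collections import namedtuple


def options(items, selected_id):
    """Build (id, display_name, attrs) tuples for a select widget."""
    return [(x.id, x.display_name,
             {'selected': 1 if x.id == selected_id else None})
            for x in items]


# Stand-in data; the real objects would come from the model.
Thing = namedtuple('Thing', 'id display_name')
employees = [Thing(1, 'Alice'), Thing(2, 'Bob')]
locations = [Thing(7, 'Warehouse')]

result = dict(
    employees=[(0, 'None')] + options(employees, 2),
    locations=options(locations, None),
)
print(result['employees'])
```

Each keyword argument is still computed from its own inputs, and the "selected" logic lives in exactly one place.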

Now THIS is recycling

I subscribed to Countryside magazine a few months ago. It’s a magazine all about “off-the-grid” living; stuff like alternative energy, organic farming, making soap out of lard and lye, etc.

Anyhow, one letter from a reader blew my mind. He’s talking about everything he does when he finds an abandoned washing machine:

. . . When I come across one, I load it up on my truck and bring it home. . . . I get a storage container and start disassembling the machine. I start filling the container with the screws, washers, bolts, retaining clips, hose clamps, springs, etc. used in the machine as well as the wiring harness. Many times this hardware comes in handy when a need arises.

Also, I save the small clear plastic tubing (there are so many uses for this quarter inch or so tubing), the steel drive shaft, and the spent motor. The drive shaft and spent motor I usually take to the scrap yard once I accumulate enough to take down there. Also, the gear at the end of the drive shaft is encased in a gear box partially filled with oil (not much, perhaps a pint or so). I open the gear box and drain this oil into an oil can. I use this oil on metal parts when they need to be oiled. When I accumulate too much oil, I take it to Discount Auto for recycling.

I use the “skin” (top and all four sides) of the washer as a good source of sheet metal for auto body repairs and for replacing fatigued metal on things such as my lawn mower, chipper / shredder, and rusted out metal doors (to name a few). . . .

I use one washing machine tub for a small burn barrel (lots of air holes to facilitate a good hot fire), and other tubs are used as planting pots to grow various vegetables and herbs (good drainage and easy to move if relocation is necessary). . . .

I cut up and save the machine’s rubber tub liner for a source of gasket material when needed. . . .

I snipped out a lot more detail to make this easier to read. Go get the September/October 2008 issue for the unabridged version.

This guy is my hero. When civilization collapses, he will build a giant landwalker AT-AT from old Maytags and rule the earth.

Rewrite my ugly code

I have a big gnarly function named get_start_and_stop_dates. Please rewrite it into something that still passes all the doctests but isn’t so ugly.

If you save this in a file named matt_sucks.py, you can run all the doctests like this:


$ nosetests --with-doctest matt_sucks.py 
Doctest: matt_sucks.get_first_and_last_dom ... ok
Doctest: matt_sucks.get_start_and_stop_dates ... ok
Doctest: matt_sucks.stubborn_datetimeparser ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.028s

OK

Here’s the code:


import simplejson
from datetime import date, datetime, timedelta

def get_start_and_stop_dates(d, s):

    """
    Returns a tuple of datetime.date objects.

    First checks dictionary d, then looks in the cookie s, then returns
    the results of get_first_and_last_dom().

    We return values from the dictionary d, even if the values exist in
    simple_cookie s:

    >>> d = {'start_date':'12-07-2008', 'stop_date':'12-20-2008'}
    >>> import Cookie, simplejson
    >>> s = Cookie.SimpleCookie()
    >>> s['start_date'] = simplejson.dumps('12-08-2008')
    >>> s['stop_date'] = simplejson.dumps('12-11-2008')
    >>> a, b = get_start_and_stop_dates(d, s)
    >>> from datetime import date
    >>> isinstance(a, date) and isinstance(b, date)
    True
    >>> a.strftime('%m-%d-%Y'), b.strftime('%m-%d-%Y')
    ('12-07-2008', '12-20-2008')

    If the dictionary d doesn't have values, then we get them from the
    simple_cookie object s:

    >>> a, b = get_start_and_stop_dates({}, s)
    >>> from datetime import date
    >>> isinstance(a, date) and isinstance(b, date)
    True
    >>> a.strftime('%m-%d-%Y'), b.strftime('%m-%d-%Y')
    ('12-08-2008', '12-11-2008')

    We handle mix-and-match scenarios, like where one value is in d and
    another is in s:

    >>> s2 = Cookie.SimpleCookie()
    >>> s2['stop_date'] = simplejson.dumps('2-28-1975')
    >>> get_start_and_stop_dates({'start_date':'2-17-1975'}, s2)
    (datetime.date(1975, 2, 17), datetime.date(1975, 2, 28))

    When just one of the dates is specified, then the other will be
    the first/last day of the month containing the other date:

    >>> get_start_and_stop_dates({'start_date':'2-17-1975'},
    ...     Cookie.SimpleCookie())
    (datetime.date(1975, 2, 17), datetime.date(1975, 2, 28))

    >>> get_start_and_stop_dates({'stop_date':'2-17-1975'},
    ...     Cookie.SimpleCookie())
    (datetime.date(1975, 2, 1), datetime.date(1975, 2, 17))

    Finally, we call get_first_and_last_dom when all else fails:

    >>> get_first_and_last_dom() == get_start_and_stop_dates({}, 
    ...     Cookie.SimpleCookie())
    True
    """

    # I've revised this several times, but this is still pretty ugly.
    # It probably can be divided into more functions.

    # These are the last-resort values, holding the first and last days
    # of the current month.
    first, last = get_first_and_last_dom()

    # These are the dateformats that the dates will be in.
    dateformats = ['%m-%d-%Y', '%Y-%m-%d', '%Y-%m-%d %H:%M:%S']

    start_date = stop_date = None

    # Figure out the start_date first.
    if 'start_date' in d and d['start_date']:
        start_date = stubborn_datetimeparser(d['start_date'],
            dateformats).date()

    elif s.has_key('start_date') and s['start_date'].value:
        start_date = stubborn_datetimeparser(simplejson.loads(s['start_date'].value),
            dateformats).date()

    # Now repeat the process for stop_date.
    # TODO: pull this redundancy into a single function and call it
    # twice.
    if 'stop_date' in d and d['stop_date']:
        stop_date = stubborn_datetimeparser(d['stop_date'],
            dateformats).date()

    elif s.has_key('stop_date') and s['stop_date'].value:
        stop_date = stubborn_datetimeparser(simplejson.loads(s['stop_date'].value),
            dateformats).date()

    # Now figure out what to return.  Remember, if we found one date,
    # but not the other, then we return the first/last date of that month,
    # not the current month.

    if not start_date and not stop_date:
        return first, last

    elif start_date and stop_date:
        return start_date, stop_date

    elif start_date and not stop_date:
        a, b = get_first_and_last_dom(start_date)
        return start_date, b

    elif not start_date and stop_date:
        a, b = get_first_and_last_dom(stop_date)
        return a, stop_date

def get_first_and_last_dom(dt="today"):
    """
    Return a tuple of datetime.date objects with the first and last day of the
    month holding dt, which defaults to today.

    >>> first, last = get_first_and_last_dom(datetime(2008, 12, 6))
    >>> first == datetime(2008, 12, 1).date()
    True
    >>> last == datetime(2008, 12, 31).date()
    True

    >>> first, last = get_first_and_last_dom(datetime(2008, 11, 30))
    >>> first == datetime(2008, 11, 1).date()
    True
    >>> last == datetime(2008, 11, 30).date()
    True
    """

    if dt == "today":
        dt = datetime.now()

    # We have to be a little careful figuring out the last date.
    if dt.month < 12:
        last_day = (datetime(dt.year, dt.month+1, 1) - timedelta(days=1)).date()
    else:
        last_day = (datetime(dt.year+1, 1, 1) - timedelta(days=1)).date()

    first_day = datetime(dt.year, dt.month, 1).date()
    return first_day, last_day

def stubborn_datetimeparser(s, dateformats):
    """
    Keep trying to parse s into a datetime object until we succeed or
    run out of dateformats.
    
    When the first format works, we immediately return:

    >>> dateformats = ['%Y-%m-%d', '%m-%d-%Y', '%m-%d-%Y %H:%M']
    >>> stubborn_datetimeparser('2008-12-1', dateformats)
    datetime.datetime(2008, 12, 1, 0, 0)

    Otherwise, we keep trying until we parse it:

    >>> stubborn_datetimeparser('12-1-2008', dateformats)
    datetime.datetime(2008, 12, 1, 0, 0)

    >>> stubborn_datetimeparser('12-1-2008 15:47', dateformats)
    datetime.datetime(2008, 12, 1, 15, 47)

    or we run out of formats, and raise a ValueError:

    >>> stubborn_datetimeparser('12/1/2008', dateformats)
    Traceback (most recent call last):
        ...
    ValueError: I couldn't parse '12/1/2008' with any of my formats!
    """

    for datefmt in dateformats:
        try:
            return datetime.strptime(s, datefmt)

        except ValueError:
            pass

    # This else matches the for datefmt in dateformats loop.  It means
    # that we didn't break out of the loop early.
    else:
        raise ValueError("I couldn't parse '%s' with any of my formats!" % s)

Now get to work!
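As a starting point, the TODO about pulling the redundancy into a single function could go something like the sketch below. It covers only the lookup step, not the first/last-day-of-month fallback. The names lookup_date and parse_date are hypothetical, and it uses the Python 3 standard library (json and http.cookies) in place of simplejson and Cookie, so it won't drop into the file above unchanged.

```python
import json
from datetime import datetime
from http.cookies import SimpleCookie

DATEFORMATS = ['%m-%d-%Y', '%Y-%m-%d', '%Y-%m-%d %H:%M:%S']


def parse_date(raw, dateformats=DATEFORMATS):
    """Try each format in turn, in the spirit of stubborn_datetimeparser."""
    for fmt in dateformats:
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            pass
    raise ValueError("I couldn't parse %r with any of my formats!" % raw)


def lookup_date(key, d, s):
    """Return the date under key in dict d, else in cookie s, else None."""
    if d.get(key):
        return parse_date(d[key])
    if key in s and s[key].value:
        # Cookie values are JSON-encoded strings, as in the doctests.
        return parse_date(json.loads(s[key].value))
    return None


s = SimpleCookie()
s['stop_date'] = json.dumps('2-28-1975')
d = {'start_date': '2-17-1975'}
print(lookup_date('start_date', d, s), lookup_date('stop_date', d, s))
```

With that in place, the top of get_start_and_stop_dates would shrink to two calls, one per key, leaving only the four-way return logic to clean up.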

Define your validation schema inline

The TurboGears docs show how to assign validators for individual parameters in the validate decorator like this:


@validate(validators={'a':validators.Int(), 'b':validators.DateConverter()})
@error_handler()
def f(self, a, b, tg_errors=None):
    # Now a is already an integer and b is already a datetime.date object,
    # unless there were some validation errors.
    pass

That’s great, but there are some validations that depend on numerous parameters at the same time. For example, you might want to make sure that an employee’s hire date precedes the termination date.
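Stripped of any framework, that kind of compound check boils down to something like the plain-Python sketch below. It is not the FormEncode API: check_hire_before_termination and the field names hire_date and termination_date are made up for illustration, and a real chained validator would report errors through FormEncode rather than raising ValueError.

```python
from datetime import date


def check_hire_before_termination(fields):
    """Raise ValueError unless hire_date comes before termination_date."""
    hire = fields.get('hire_date')
    term = fields.get('termination_date')
    # Only compare when both values are present; requiring or converting
    # each field is the job of that field's own validator.
    if hire is not None and term is not None and hire >= term:
        raise ValueError("hire_date must come before termination_date")
    return fields


ok = {'hire_date': date(2007, 3, 1), 'termination_date': date(2008, 12, 1)}
print(check_hire_before_termination(ok) is ok)
```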

I already knew how to subclass validators.Schema to do this, and then pass that instance into the validate decorator like this:


class MattSchema(validators.Schema):
    a = validators.Int()
    b = validators.DateConverter()
    chained_validators = [blah] # pretend that blah does some compound validation. 

@validate(validators=MattSchema())
def f(self, a, b):
    pass

This approach is fine, but today I discovered that it is also possible to define a Schema inline, inside the validate decorator, and specify the chained_validators right there, like this:

@expose('.templates.shiftreports.overtime')
@validate(validators=validators.Schema(
        a=validators.Int(),
        b=validators.DateConverter(),
        chained_validators=[blah]),
    state_factory=matt_state_factory)
def f(self, a, b):
    pass

What’s the point? Well, it seems wasteful to define a class and hide it in another file if that schema is only going to be used for exactly one controller. Also, this makes it really fast for me to mix and match compound validators with controllers. I don’t need to pop open my separate validators file where all my elaborate schemas live. I can define them right here.

I’m very forgetful too, so I like to keep my code shallow so that I can instantly see what the heck something does. With all the validators right there, I can easily figure out what the system intends to do.

However, I would define a Schema subclass as soon as I see that I need the same thing twice.

I’m happy that the FormEncode authors had the foresight to support this inline approach along with the declarative style.