A few simple PostgreSQL tricks

I’ve got a bunch of these things collecting dust in a text file in my home directory. Maybe they’ll help somebody else out.

Transpose display in psql with \x


matt=# select now(), 1 + 1 as sum;
now | sum
-------------------------------+-----
2008-08-20 13:00:44.178076-04 | 2
(1 row)

matt=# \x
Expanded display is on.
matt=# select now(), 1 + 1 as sum;
-[ RECORD 1 ]----------------------
now | 2008-08-20 13:00:46.930375-04
sum | 2
matt=# \x
Expanded display is off.
matt=# select now(), 1 + 1 as sum;
now | sum
-------------------------------+-----
2008-08-20 13:01:19.725394-04 | 2
(1 row)

See how long every query takes


matt=# \timing
Timing is on.
matt=# select now();
now
-----------------------------
2008-12-18 12:31:50.60008-05
(1 row)

Time: 76.322 ms

By the way, you can put these into your $HOME/.psqlrc to always turn on timing at the beginning of a session::

$ cat $HOME/.psqlrc

-- Report time used for each query.
\timing

Define a function in python

First you have to install plpython. Then connect to the database as somebody with sufficient privileges and then type:

CREATE LANGUAGE 'plpythonu';

to allow functions defined in plpython.

Here’s a toy example:

matt=# create or replace function snoz (INT)
returns INT AS
$$
x = args[0]
return x * x * x
$$ language 'plpythonu';
CREATE FUNCTION

And here it is in use:

matt=# select 3, snoz(3);
?column? | snoz
----------+------
3 | 27
(1 row)

Use a trigger to set the modified date column

First define a function to set the modifieddate column:

create or replace function ts_modifieddate()
returns TRIGGER as
'
BEGIN
NEW.modifieddate = now();
RETURN NEW;
END;
' language 'plpgsql';

Now set up the trigger to call that function::

create trigger set_modifieddate before update
on knid for each row
execute procedure ts_modifieddate();

Use a trigger to set one column based on other columns

I got a table like this:

create table snoz (
a bool default false,
b bool default false,
c bool default false
);

I want a trigger to set c to true when a and/or b is true.

First, I define a function that does what I want:

create or replace function set_c()
returns trigger as $$
BEGIN
if NEW.a = true or NEW.b = true then NEW.c = true; END IF;
RETURN NEW;
END;
$$ language 'plpgsql';

Now I wire up that trigger to execute when I want:

create trigger update_c
before insert or update
on snoz
for each row
execute procedure set_c();

Here it is all in action:

matt=# insert into snoz (a, b) values (false, false), (false, false);
INSERT 0 2
matt=# select * from snoz;
a | b | c
---+---+---
f | f | f
f | f | f
(2 rows)

matt=# insert into snoz (a, b) values (false, true), (true, false);
INSERT 0 2
matt=# select * from snoz;
a | b | c
---+---+---
f | f | f
f | f | f
f | t | t
t | f | t
(4 rows)

See, it works!

code-formatting people: I need your help

This is the sort of post my wife says doesn’t count when I tell her I wrote a new blog entry.

Anyhow, I’m curious how people format their code. For the stuff below, how would you write it? Feel free to do whatever you want; make temporary variables, rearrange blocks, whatever. I’m looking for interesting new ways to make my code as easy to comprehend as possible.

Here’s an example of how I break assignments over multiple lines:

# By the way, self.categories is a dictionary, not a list,
# because there's no guarantee that all the keys everywhere
# will be contiguous.
cat1, cat2, cat3, cat4, cat5 = \
[self.categories[x] for x in range(1, 6)]

This next excerpt shows a couple of my habits. I love the new conditional assignment possible in 2.5, I hate making lots of intermediate variables, and I really love using list comprehensions rather than for loops.

def my_send_methods(self, SendMethod, disabled=False):
return [(x.id, x.display_name,
{
'selected':1 if self.preferred_send_method == x else None,
'disabled':1 if disabled else None,
})
for x in SendMethod.constants.values()]

viewer discretion advised on this next block

This next block is the return statement from one of my TurboGears 1.0 controller methods.

return dict(htmlclass="advancedscheduling", v2org=v2org, name=name,

employees=[(0, 'None')] + [(x.id, x.display_name,
{'selected':1 if x.id == employee_id else None})
for x in v2org.employees],

locations=[(x.id, x.display_name, {'selected':1 if x.id == location_id else None})
for x in v2org.locations],

statuses=[(x.id, x.display_name, {'selected':1 if x.id == status_id else None})
for x in model.ShiftStatus.select()],

)

So, how to make that better? I want to preempt suggestions to do stuff like:

a = [...]
b = [...]
c = [...]
return dict(a=a, b=b, c=c)

That’s not better! For two reasons:

  1. You’re taking up way more vertical space

    Taken to its logical conclusion, you’ll take a statement like

    return dict(
    employees=[(0, 'None')] + [(x.id, x.display_name,
    {'selected':1 if x.id == employee_id else None})
    for x in v2org.employees]
    )

    And then rewrite it as

    employees = [(0, 'None')]
    for emp in v2org.employees:
    if emp.id == employee_id:
    d = {'selected': 1}
    else:
    d = {'selected':None}
    t = (emp.id, emp.display_name, d)
    employees.append(t)
    return dict(employees=employees)

    I find this annoying, but not as annoying as the next point.

  2. It is no longer obvious that each value is calculated independently

    Repeating the example from above:

    a = [...]
    b = [...]
    c = [...]
    return dict(a=a, b=b, c=c)

    Until I read inside those list comprehensions, or even worse, follow the functions you used to define each one of those, I can’t be certain that each of those variables are determined independently of each other.

    In other words, in that version, at a casual glance, it is possible that b depends on a. But in the dictionary approach, it is obvious that b does not depend on a, since a doesn’t exist in any way that b can get access to it.

Rewrite my ugly code

I have a big gnarly function named get_start_and_stop_dates. Please rewrite it into something that still passes all the doctests but isn’t so ugly.

If you save this in a file named matt_sucks.py, you can run all the doctests like this:

$ nosetests --with-doctest matt_sucks.py
Doctest: matt_sucks.get_first_and_last_dom ... ok
Doctest: matt_sucks.get_start_and_stop_dates ... ok
Doctest: matt_sucks.stubborn_datetimeparser ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.028s

OK

Here’s the code:

import simplejson
from datetime import date, datetime, timedelta

def get_start_and_stop_dates(d, s):

"""
Returns a tuple of datetime.date objects.

First checks dictionary d, then looks in the cookie s, then returns
the results of get_first_and_last_dom().

We return values from the dictionary d, even if the values exist in
simple_cookie s:

>>> d = {'start_date':'12-07-2008', 'stop_date':'12-20-2008'}
>>> import Cookie, simplejson
>>> s = Cookie.SimpleCookie()
>>> s['start_date'] = simplejson.dumps('12-08-2008')
>>> s['stop_date'] = simplejson.dumps('12-11-2008')
>>> a, b = get_start_and_stop_dates(d, s)
>>> from datetime import date
>>> isinstance(a, date) and isinstance(b, date)
True
>>> a.strftime('%m-%d-%Y'), b.strftime('%m-%d-%Y')
('12-07-2008', '12-20-2008')

If the dictionary d doesn't have values, then we get them from the
simple_cookie object s:

>>> a, b = get_start_and_stop_dates({}, s)
>>> from datetime import date
>>> isinstance(a, date) and isinstance(b, date)
True
>>> a.strftime('%m-%d-%Y'), b.strftime('%m-%d-%Y')
('12-08-2008', '12-11-2008')

We handle mix-and-match scenarios, like where one value is in d and
another is in s:

>>> s2 = Cookie.SimpleCookie()
>>> s2['stop_date'] = simplejson.dumps('2-28-1975')
>>> get_start_and_stop_dates({'start_date':'2-17-1975'}, s2)
(datetime.date(1975, 2, 17), datetime.date(1975, 2, 28))

When just one of the dates is specified, then the other will be
the first/last day of the month containing the other date:

>>> get_start_and_stop_dates({'start_date':'2-17-1975'},
... Cookie.SimpleCookie())
(datetime.date(1975, 2, 17), datetime.date(1975, 2, 28))

>>> get_start_and_stop_dates({'stop_date':'2-17-1975'},
... Cookie.SimpleCookie())
(datetime.date(1975, 2, 1), datetime.date(1975, 2, 17))

Finally, we call get_first_and_last_dom when all else fails:

>>> get_first_and_last_dom() == get_start_and_stop_dates({},
... Cookie.SimpleCookie())
True
"""

# I've revised this several times, but this is still pretty ugly.
# It probably can be divided into more functions.

# These are the last-resort values, holding the first and last days
# of the current month.
first, last = get_first_and_last_dom()

# These are the dateformats that the dates will be in.
dateformats = ['%m-%d-%Y', '%Y-%m-%d', '%Y-%m-%d %H:%M:%S']

start_date = stop_date = None

# Figure out the start_date first.
if 'start_date' in d and d['start_date']:
start_date = stubborn_datetimeparser(d['start_date'],
dateformats).date()

elif s.has_key('start_date') and s['start_date'].value:
start_date = stubborn_datetimeparser(simplejson.loads(s['start_date'].value),
dateformats).date()

# Now repeat the process for stop_date.
# TODO: pull this redundancy into a single function and call it
# twice.
if 'stop_date' in d and d['stop_date']:
stop_date = stubborn_datetimeparser(d['stop_date'],
dateformats).date()

elif s.has_key('stop_date') and s['stop_date'].value:
stop_date = stubborn_datetimeparser(simplejson.loads(s['stop_date'].value),
dateformats).date()

# Now figure out what to return. Remember, if we found one date,
# but not the other, then we return the first/last date of that month,
# not the current month.

if not start_date and not stop_date:
return first, last

elif start_date and stop_date:
return start_date, stop_date

elif start_date and not stop_date:
a, b = get_first_and_last_dom(start_date)
return start_date, b

elif not start_date and stop_date:
a, b = get_first_and_last_dom(stop_date)
return a, stop_date

def get_first_and_last_dom(dt="today"):
"""
Return a tuple of datetime.date objects with the first and last day of the
month holding dt, which defaults to today.

>>> first, last = get_first_and_last_dom(datetime(2008, 12, 6))
>>> first == datetime(2008, 12, 1).date()
True
>>> last == datetime(2008, 12, 31).date()
True

>>> first, last = get_first_and_last_dom(datetime(2008, 11, 30))
>>> first == datetime(2008, 11, 1).date()
True
>>> last == datetime(2008, 11, 30).date()
True
"""

if dt == "today":
dt = datetime.now()

# We have to be a little careful figuring out the last date.
if dt.month < 12: last_day = (datetime(dt.year, dt.month+1, 1) - timedelta(days=1)).date() else: last_day = (datetime(dt.year+1, 1, 1) - timedelta(days=1)).date() first_day = datetime(dt.year, dt.month, 1).date() return first_day, last_day def stubborn_datetimeparser(s, dateformats): """ Keep trying to parse s into a datetime object until we succeed or run out of dateformats. When the first format works, we immediately return: >>> dateformats = ['%Y-%m-%d', '%m-%d-%Y', '%m-%d-%Y %H:%M']
>>> stubborn_datetimeparser('12-1-2008', dateformats)
datetime.datetime(2008, 12, 1, 0, 0)

Otherwise, we keep trying until we parse it:

>>> stubborn_datetimeparser('12-1-2008', dateformats)
datetime.datetime(2008, 12, 1, 0, 0)

>>> stubborn_datetimeparser('12-1-2008 15:47', dateformats)
datetime.datetime(2008, 12, 1, 15, 47)

or we run out of formats, and raise a ValueError:

>>> stubborn_datetimeparser('12/1/2008', dateformats)
Traceback (most recent call last):
...
ValueError: I couldn't parse '12/1/2008' with any of my formats!
"""

for datefmt in dateformats:
try:
return datetime.strptime(s, datefmt)

except ValueError:
pass

# This else matches the for datefmt in dateformats loop. It means
# that we didn't break out of the loop early.
else:
raise ValueError("I couldn't parse '%s' with any of my formats!" % s)

Now get to work!

Define your validation schema inline

The TurboGears docs show how to assign validators for individual parameters in the validate decorator like this:

@validate(validators={'a':validators.Int(), 'b':validators.DateConverter()})
@error_handler()
def f(self, a, b, tg_errors=None):
# Now a is already an integer and b is already a datetime.date object,
# unless there were some validation errors.

That’s great, but there are some validations that depend on numerous parameters at the same time. For example, you might want to make sure that an employee’s hire date precedes the termination date.

I already knew how to subclass validators.Schema to do this, and then pass that instance into the validate decorator like this:

class MattSchema(validators.Schema):
a = validators.Int()
b = validators.DateConverter()
chained_validators = [blah] # pretend that blah does some compound validation.

@validate(validators=MattSchema())
def f(self, a, b)

This approach is fine, but today I discovered that it is also possible to define a Schema inline, inside the validate decorator, and specify the chained_validators right there, like this:

@expose('.templates.shiftreports.overtime')
@validate(validators=validators.Schema(
a=validators.Int(),
dt=validators.DateConverter(),
chained_validators=[blah]),
state_factory=matt_state_factory)
def f(self, a, b):

What’s the point? Well, it seems wasteful to define a class and hide it in another file if that schema is only going to be used for exactly one controller. Also, this makes it really fast for me to mix and match comound validators with controllers. I don’t need to pop open my separate validators file where all my elaborate schemas live. I can define them right here.

I’m very forgetful too, so I like to keep my code shallow so that I can instantly see what the heck something does. With all the validators right there, I can easily figure out what the system intends to do.

However, I would define a Schema subclass as soon as I see that I need the same thing twice.

I’m happy that the FormEncode authors had the foresight to support this inline approach along with the declarative style.

Using state with FormEncode and TG’s validate decorator

I believe I figured out a way to reduce a few redundant lines from my controller methods. I’m looking for opinions about whether this is a wise idea.

At the top of nearly every method in my controllers, I look up the current user and the hospital this user belongs to, sort of like this:

@expose('.templates.m1')
def m1(self):
u = identity.current.user
hospital = u.hospital

Anyhow, I realized I can offload this irritating work to a validator that uses a state factory. Now my method looks like this:

@expose('.templates.m1')
@validate(validators=LookupSchema(), state_factory=my_state_factory)
def m1(self, u=None, hospital=None):

So now all my methods get called with those values already set up. I have to make u and hospital keyword parameters, because otherwise TG will try to pull their values out of the URL.

Here’s how it works. First I make my_state_factory that builds an object that has those values:

def my_state_factory():

class StateBlob(object):
pass

sb = StateBlob()
sb.u = identity.current.user
sb.hospital = u.hospital

return sb

Now the LookupSchema extracts those values out of the state blob object and adds them to the dictionary of values:

from formencode.schema import Schema, SimpleFormValidator
@SimpleFormValidator
def f(value_dict, state, validator):
value_dict['u'] = state.u
value_dict['hospital'] = state.hospital

class LookupSchema(Schema):
allow_extra_fields = True # otherwise, it fusses about self ?!?!?
chained_validators = [f]

So the benefit of all this is that some repetitive code is now just defined in a single place. Also, I’m getting more comfortable with the internals of FormEncode and the TG validate decorator.

Pretty soon, my controllers will be some really skinny methods. All the calculations of new variables based on the original parameters will happen outside the controller. The controller will just handle picking the right template.

Use FormEncode to verify one date precedes another

I have a form on my site that lets people choose a start date and a stop date. Then I show statistics for that date range. I wrote a FormEncode schema to verify that the start date is before the stop date.

The documentation on schema validators is fairly sparse, so I’m publishing this because it might help somebody else out.

Code


# This is in a file named formencodefun.py
from formencode import Schema
from formencode.validators import DateConverter, FancyValidator, Invalid

class DateCompare(FancyValidator):
messages = dict(invalid="Start date must be before stop date")

def validate_python(self, field_dict, state):

start_date = field_dict['start_date']
stop_date = field_dict['stop_date']

if start_date > stop_date:
msg = self.message('invalid', state)

raise Invalid(msg, field_dict, state,
error_dict=dict(stop_date=msg))

class MySchema(Schema):
start_date = DateConverter()
stop_date = DateConverter()

chained_validators = [DateCompare()]

Usage


>>> from formencodefun import MySchema
>>> s = MySchema()
>>> d1 = {'start_date':'11-02-2008', 'stop_date':'11-15-2008'}
>>> d2 = {'start_date':'11-15-2008', 'stop_date':'11-02-2008'}
>>> s.to_python(d1)
{'stop_date': datetime.date(2008, 11, 15), 'start_date': datetime.date(2008, 11, 2)}
>>> s.to_python(d2)
------------------------------------------------------------
Traceback (most recent call last):
File "", line 1, in
File "/home/matt/virtualenvs/scratch/lib/python2.5/site-packages/FormEncode-1.1-py2.5.egg/formencode/api.py", line 400, in to_python
value = tp(value, state)
File "/home/matt/virtualenvs/scratch/lib/python2.5/site-packages/FormEncode-1.1-py2.5.egg/formencode/schema.py", line 200, in _to_python
new = validator.to_python(new, state)
File "/home/matt/virtualenvs/scratch/lib/python2.5/site-packages/FormEncode-1.1-py2.5.egg/formencode/api.py", line 403, in to_python
vp(value, state)
File "formencodefun.py", line 18, in validate_python
error_dict=dict(stop_date=msg))
Invalid: Start date must be before stop date

How it works

Notice when I run s.to_python(d1), I get a dictionary back with the the values for start_date and stop_date replaced with datetime.date objects.

Then when I run my schema on d2, where the start_date is after the stop_date, my schema raises an Invalid exception. In a web framework like TurboGears, there is some exception handler that will catch that exception and take that error dictionary and redraw the form and print my error message.

Notice that the DateConverters first take my strings and turn them into datetime.date objects before the test in DateCompare. FormEncode runs the chained validators after it runs the individual validators.

In this case, I just want to make sure that the start date precedes the stop date. I have written other validators that add extra keys into the field dict or change the values, but I want to keep this example simple.

If the first DateConverters fail, then the chained validators never run:

>>> s.to_python({'start_date':'UNPARSEABLE', 'stop_date':'11-20-2008'})
------------------------------------------------------------
Traceback (most recent call last):
File "", line 1, in
File "/home/matt/virtualenvs/scratch/lib/python2.5/site-packages/FormEncode-1.1-py2.5.egg/formencode/api.py", line 400, in to_python
value = tp(value, state)
File "/home/matt/virtualenvs/scratch/lib/python2.5/site-packages/FormEncode-1.1-py2.5.egg/formencode/schema.py", line 197, in _to_python
error_dict=errors)
Invalid: start_date: Please enter the date in the form mm/dd/yyyy

When my DateCompare validator is run, I can be confident that the objects with the keys start_date and stop_date in the field_dict have already been converted to datetime.date objects.

In summary, FormEncode is awesome, but I have spent a lot of time beating my head against the wall trying to learn how to use it.

My metaclass article published

The November issue of Python Magazine has my article on metaclasses. I regain the rights to my text after 3 months, so I’ll be posting sections of that article here as blog posts over the next year.

In the meantime, I’d love to hear critical feedback so that I can improve the material.

Or if you want to write a message telling me how awesome I am, that’s cool too, but I would prefer you give me money. In fact, that applies to people that want to call me an idiot. You also should just give me money.

Mobile app idea

Here’s the problem: when you meet somebody, you can exchange business cards, and maybe that business card has your facebook or linkedin url.

Then later, that person can send you an invite or a friend request based on how you gave them your URL.

It would be nicer to more quickly close this gap, so that at the moment you meet somebody, you can instantly “friend” each other, sort of like this:

  1. I’m talking to somebody that asks me for my card. I pull out my mobile phone and so do they. I tell them my phone number, and they text a message to some other third party service with the message payload being my phone number.
  2. Then the other person tells me their phone number and I send a text message to that same third party service with the other guy’s phone number.
  3. The third-party system links the accounts associated with the two phone numbers as “friends”.
  4. Later when I check my account on the third-party service, I see all the contact info for the person I met.

This ain’t rocket science. Just a faster way to share contact information. Does this already exist?

Instead of using SMS, the system could also use HTTP posts, but not every phone has internet access. In addition to just using my phone number, maybe me and the other person could both agree to send in a short password, which must match before the system links the two accounts.

Got back from PyWorks

PyWorks was a lot of fun.

Mark Ramm did a talk about WSGI where in one slide he showed how to build up something akin to the full Pylons stack by just applying lots of WSGI middlewares. That really inspired me. Sure, frameworks are great, but I think I’m going to try to going to the opposite extreme for a while — building up from components, rather than using somebody else’s aggregation of components.

Based on a few hallway conversations, It turns out I’m not the only one using version control as a way to deploy code in production. My production box runs a git clone of my production branch.

jonEbird did a talk on using LDAP with Python. It was the first time I had realized that LDAP can do more than user authentication.

I liked Mike Naberezny’s talk on routes also. Good introduction. It reaffirmed my idea that it’s not going to be all that difficult to string a bunch of stuff together for my next web app.

Chris Perkins showed some neat TDD / agile programming techniques possible and made SQLAlchemy seem much friendlier than when I looked at it last year.