Archive for PostgreSQL


A few half-formed thoughts on SQLObject

I love SQLObject, but this is a rant about the tiny frustrations I face with it.

First, this is a minor point. I don’t really care about database independence that much. Postgres has a lot of wonderful features: I never have to worry about choosing the table engine that will enforce foreign key constraints, I like creating indexes with function inside:

create unique index nodup_parent on category (org_id, parent_cat, lower(name));

and I really like how easy it is to write stored procedures. Anyway, since I know I’m going to use postgresql, I don’t want to be restricted to only the features that exist or can be emulated in every platform. I know all about sqlmeta and createSQL and use it plenty. But I don’t like how when I set a default value, sometimes it is set in the database table, and other times, it isn’t.

Anyway, in practice, the most dangerous part of using SQLObject is that it hypnotizes you into forgetting about the queries behind everything. Imagine you have employees, departments, and a join table between them. You can set this up in SQLObject like this:

class Employee(SQLobject):
    name = UnicodeCol(alternateID=True)
    departments = RelatedJoin('Department')
 
class Department(SQLObject):
    name = UnicodeCol(alternateID=True)
    employees = RelatedJoin('Employee')

You want to draw a grid that indicates whether each user is a member in every group, so you might dash off some code like this:

for emp in Employee.select():
    for d in Department.select():
        if d in emp.departments:
            print "yes!"
        else:
            print "no!"

In an ideal scenario, you can do this with three simple queries:

  • You need a list of employees
  • You need a list of departments
  • You need the list of employee-department of associations.

People that talk about how you can use outer joins to cram all that into one query will be dropped into a bottomless pit. Besides, I profiled it, and three separate queries is often much cheaper.

Anyway, back to the point. SQLObject will only run a single query to get the employees and a separate single query to get all the departments. So that’s good.

However, the place where all hell breaks loose is that if clause in the middle. If we have three employees and four departments, this statement

if d in emp.departments:

executes a dozen times. That’s unavoidable. The problem is that each time it executes, SQLObject runs a query like:

select department_id from department_employee where employee_id = (whatever);

Every time you say “is this particular department in this employee’s list of departments?” SQLObject grabs the full list of departments for that employee. So, if you ask about 10 different departments, you will run the exact same query ten times. Sure, the database is likely to cache the results of the query for you, but it is still very wasteful.

With just a few employees and a few departments, that’s not so bad. Eventually, though, as the number of employees and departments grow, the cost of that code grows at N2, which is just geek slang for sucky.

So, in conclusion, this may sound like a rant, but it really isnt. SQLObject is great. But it isn’t magic. It’s a great scaffolding system. But now I find that I’m rewriting a fair portion of code in order to reduce the database costs.

Aside: when I started paying attention to the queries generated by SQLObject, I found it really useful to edit postgresql.conf and enable log_min_duration_statement. Then every query and its cost will be logged for you. This is really useful stuff. It’s helped me to relax about doing a lot of things that I used to think were really bad.

Comments (4)

How to use compound foreign keys with Postgresql

I am working on a multiple-choice quiz program. We have four tables related to questions and answers:

  • a table of questions
  • a table of answers
  • a table that links each question to multiple answers
  • a table that records a user’s actual answer to a question.

I needed a way to track answers to multiple-choice questions, and to limit possible answers to each question. For example, I want to make sure that if I ask the question “what is your favorite color?” then the answer is a color, and not something silly like “yes.” This is tricky, because “yes” is a valid answer for some questions, but not for others.

This table holds each question.


create table questions (
q_id serial primary key,
q_text text
);

Here are some questions. Since I used the serial data type, I’m not going to assign the q_id column. Postgresql will assign the q_id automatically.


insert into questions (q_text) values ('What is your favorite color?'); /* q_id 1 */
insert into questions (q_text) values ('Is it raining outside?');       /* q_id 2 */

After inserting, this is what the database looks like now:

select * from questions;
q_id |            q_text
------+------------------------------
1 | What is your favorite color?
2 | Is it raining outside?
(2 rows)

This table lists every possible answer, each with a unique id.

create table all_answers (
a_id serial primary key,
a_text text
);

These are some answers.

insert into all_answers (a_text) values ('red');      /* a_id 1 */
insert into all_answers (a_text) values ('yes');      /* a_id 2 */
insert into all_answers (a_text) values ('green');    /* a_id 3 */
insert into all_answers (a_text) values ('no');       /* a_id 4 */

Here’s what all_answers looks like after adding data:

select * from all_answers;
a_id | a_text
------+------–
1 | red
2 | yes
3 | green
4 | no
(4 rows)

This table links each question to meaningful possible answers. I am using question ID (q_id) and answer ID (a_id) numbers in order to save space.

create table possible_answers (
q_id integer references questions (q_id),
a_id integer references all_answers (a_id),
primary key (q_id, a_id)
);

Now, I’ll link certain questions with certain answers. This insert links ‘What is your favorite color?’ with ‘red’.

insert into possible_answers (q_id, a_id) values (1, 1);

This statement links ‘Is it raining outside?’ with ‘no’.

insert into possible_answers (q_id, a_id) values (1, 3);

And, just to continue the trend…

insert into possible_answers (q_id, a_id) values (2, 2);
insert into possible_answers (q_id, a_id) values (2, 4);

Now let’s see what we’ve got:

select * from possible_answers;
q_id | a_id
------+------
1 |    1
1 |    3
2 |    2
2 |    4
(4 rows)

Finally, this is the table that will record the actual answer given for a question and make sure that the answer is appropriate for the question.

create table real_answers (
q_id integer references questions (q_id),                         /* simple FK    */
a_id integer references all_answers (a_id),                       /* simple FK    */
foreign key (q_id, a_id) references possible_answers (q_id, a_id) /* compound FK  */
);

Now watch what happens when I try to insert an answer that doesn’t match the question into the database. I’m going to try to answer the question “What is your favorite color?” with the answer “yes”.

Hopefully the database will prevent me.

select q_text from questions where q_id = 1;
q_text
------------------------------
What is your favorite color?
(1 row)


select a_text from all_answers where a_id = 2;
a_text
------–
yes
(1 row)

insert into real_answers (q_id, a_id) values (1, 2);
ERROR:  $3 referential integrity violation - key referenced from real_answers
not found in possible_answers
Hurray! We are prevented from entering the data! Now, let’s try storing an acceptable answer.

insert into real_answers (q_id, a_id) values (1,1);
INSERT 17196 1

That means it worked!