Grab all the rows when parameter is null, or just grab the rows that match

If I pass in NULL for xyz, I want to get all the rows.

If I set xyz to a value, I only want the rows in the table where the xyz column matches the value I pass in.

And I don’t want to do build up a string in my app code.

Here is how:


select *
from my_table
where
case
when %(xyz)s is null then true
when %(xyz)s = xyz then true
else false
end
;

And here’s a slight tweak if you want to pass in an array of allowed values:


select *
from my_table
where
case
when %(xyz)s is null then true
when xyz = any(%(xyz)s) then true
else false
end
;

I hope this helps! If you know a better way, let me know!

Meaningful primary keys beat sequences and UUIDs

Summary

Instead of serial primary keys or UUIDs for primary keys, when possible, you should make a PK that somehow describes the underlying data. This is not as hard as you might think!

In detail

Everything you read about databases will say that every table really ought to have a primary key, in other words, some value that is unique in every row. The clumsiest, crudest, simplest way to do that is to use something like a sequence (or a row number) for that primary key.

Here’s a table where I track schedules at golf courses:

create table loops -- a "loop" is what people call a game of golf
(
loop_number serial primary key,
club_short_name text not null references clubs (short_name),
tee_time date not null default today(),
golfer integer not null references golfers (golfer_id),
number_holes integer not null check (number_holes in (9, 18)),
unique (club_name, golfer, tee_time, number_holes) -- prevent duplicates!
);

Every time somebody inserts a row in that loops table, the loop_number column will automatically get a unique number. Now if we’re making some kind of web app, it is easy to make a link to a particular game (aka loop) by just making a URL with the loop_number in there, like this:

https://example.com/loop?loop_number=99

or this:

https://example.com/loop/99

Depending on how fancy you like your URLs.

This style was super-popular in the Ruby on Rails heyday.

Then maybe a few years after that, when distributed databases started catching on, it wasn’t as easy to get the next sequential value, because you couldn’t check all the nodes quickly. So people started using stuff like UUIDs like this:

row_id uuid not null default uuid_generate_v4() primary key

instead of

row_number serial primary key

And then URLs started looking more like this:

https://example.com/loop/5f7664e6-15c2-4d08-858e-3306f7a8ca07

Sidenote

I’ve heard a lot of people argue that replacing sequential primary keys with UUIDs is somehow more secure because it is very easy for some malicious person to change

https://example.com/loop/99

to something like

https://example.com/loop/100

and possibly spy on information not meant for them. Doing the same trick with UUIDs is not so easy; in other words, if you hand out a URL like

https://example.com/loop/5f7664e6-15c2-4d08-858e-3306f7a8ca07

to some customer, they probably won’t find any valid rows by just slightly incrementing that UUID (and that’s if they can figure out HOW to increment it).

I personally don’t think that switching from serial PK’s to UUID’s is always enough to block this attack, but it is absolutely a great first step! In practice, it seems pretty hard to guess another valid UUID, but it certainly is not impossible, and the RFC for UUIDs has this little nugget of advice:

Do not assume that UUIDs are hard to guess; they should not be used
as security capabilities (identifiers whose mere possession grants
access), for example. A predictable random number source will
exacerbate the situation.

Incidentally, this URL tweaking is a big source of data breaches! I’ve seen it in the wild numerous times. I found this post that describes this kind of attack in more detail.

Unfortunately, the web popular frameworks don’t offer much help for this issue. They all make it easy to check that a user is authenticated (they are who they say they are), and maybe they offer some kind of role-based permissions, but you’re pretty much on your own when building a multi-tenant system with data privacy.

In other words, if you want to block user X from spying on data that should only be seen by user Y, you need to check that yourself, in all the different places in your code where you pull back data.

All that said, my favorite reason to go with UUIDs is that you don’t reveal that you only have like three clients on your system when you’re out doing demos, and that’s pretty dang important when you’re fundraising!

Back to the main point

Based on the table above, any loop is a unique combination of a club, a golfer, a tee time, and a number of holes of golf. For example, one row might track that Thurston Howell III is playing 18 holes at snooty-snooterton country club on April 1st, 2018.

It would be great if we could have a primary key like this:

snt-2018-04-01-th3-18

The snt part identifies the club, the 2018-04-01 part identifies the tee time, the th3 identifies the golfer, and the 18 part identifies the number of holes.

This makes for vastly easier to understand data! And it isn’t hard to do this. Just add a trigger on your table that fires before insert and update that sets your primary key column:

create or replace function set_loop_pk ()
returns trigger
as
$$
begin

NEW.loop_pk = NEW.club_short_name
|| '-'
|| to_char(NEW.tee_time, 'yyyy-mm-dd')
|| '-'
|| NEW.golfer_initials
|| '-'
|| number_holes,
;

return NEW;

end;
$$ language plpgsql;

create trigger set_loop_pk
before insert or update
on loops
for each row
execute procedure set_loop_pk();

Of course I replaced columns like golfer with golfer_initials, but hopefully that didn’t trip you up.

Also, the code above assumes you’re using the postgresql database, but you can translate it into whatever other database environment you want.

If you’re some kind of crazy person, I suppose you could even build that PK in your ORM layer.

Why is this better? This is better because anyone that sees a meaningful PK can infer a lot about the inner data. This is one of those things that will save you tons of time in a crisis because you don’t need to write lots of joins to understand who the heck is user ‘ac29a573-35f2-4200-b10a-384999426ee6’ or which club has club_id 876.

Your end-users will be more confident in the system as well. Labels printed with a meaningful PK are self-evident. URLs hint about the contents.

Sure, there are times when you need obfuscation, but it easy to have a meaningful PK and then scramble it somehow, with a real cryptographic solution, rather than leaning on a UUID.

Or really, the best approach in my opinion is to keep your meaningful PK, but also tag on another parameter that combines that meaningful PK with a secret value and then hashes it. People call this approach an HMAC.

Last point: if your data is only unique because you’re using a sequence or because you’re using a UUID, well, you’re not really “doing databases right”. Using a meaningful PK means you’ve figured out enough about what you’re storing to know what makes a row unique.

Product review: todoist

I’ve been using Todoist for a few months. It’s not bad!

What I like

  1. Since there’s a mobile app and a web interface, it is really likely I’ll get stuff stored in there.
  2. Nearly no extra data is required to store a task. I just can just put “bananas” in a new task’s title and hit save. Again, this increases the likelihood I’m actually going to use the product. If a bunch of fields were required and didn’t have defaults, I might put off using the app.
  3. The alexa interface is fun. I can say “alexa add to my todo list …” and then add a line. I can also have alexa read back my to-do list.
  4. I like how projects and tasks can both be nested. I like how a task only belongs to one project, but can have many labels on it. And I like the priority feature.

What isn’t perfect

  1. There’s no obvious way to track the estimated size / difficulty / required work for a task. In other words, I can’t mark a task as “easy” or “really tricky” and then rank by that.
  2. Linking to tasks isn’t fun or easy. Links look like this:
    https://todoist.com/showTask?id=2179109422

    I found that link buried behind two mouse clicks. Meanwhile, github issues start at #1 in each project and increment up from there. That is so much nicer! I can easily tell somebody “hey look up task XYZ-432” but I can’t remember ten digits!

  3. A task has a title, but I want another field where I can add more description of the task. For example, some times, I want to add add links to screenshots, or blog posts with discussions, etc.
  4. Tasks need more statuses, like “in progress” and “will not do this”. Right now, as far as I can tell, a task is either not finished or finished or deleted. I need more statuses!
  5. There’s not an easy way to put tasks in order relative to each other. It is possible to set priority levels on tasks, but if three tasks are all at the same priority level, it isn’t easy to put them in a particular order.
  6. This is kind of complex, and expects a lot from a single application, but there a lot of times that I want to store stuff related to a project that aren’t to-do entries. For example, say I have a conversation with a client. We probably talked about a bunch of things:
    • near-term to-do items
    • stuff that would be nice, but not immediately planned
    • background information about the project

    The last point doesn’t fit that well into the todoist model!

  7. You can’t (as far as I can tell) upload attachments to tasks. Update! You can, but you have to add them as comments!
  8. There’s a developer API, but not an official CLI program. Instead, there’s a bunch of half-finished CLI programs on github.

The Pareto principle (why some bugs are OK to ignore)

There’s this thing called the Pareto Principle, which says:

roughly 80% of the effects come from 20% of the causes

You can quibble about the specific number values. Maybe 80 and 20 aren’t exactly right. But as long as you have customers that aren’t perfectly evenly distributed across bugs, you should consider that maybe some of your bugs aren’t worth fixing.

Here’s a contrived example: Imagine you got a product XYZ, and you got 100 users. They’re all mad because of five bugs (bug A through bug E).

  • 80 of your users are mad because of bug A. (80% of 100)
  • 16 other users are mad because of bug B (80% of the remaining 20)
  • 3 users hate bug C (80% of the remaining 4 users)
  • User #100 filed two bug reports: D and E. He won’t be happy until both are resolved.

If you add up 80 + 16 + 3, you’re at 99 users. In other words, if you fix 3 out of 5 bugs, 99% of your customer base would be satisfied.

However, making that last customer happy is probably not worth it! You can satisfy 99% of your market by doing 60% of the required work.

Stop offering janky fixes

When doctors show up to work, they take time to wash hands thoroughly even if there are queued-up patients in critical status.

Meanwhile, us programmers deal with production bugs in the most expedient way possible. And usually that involves some janky fix and a comment like this:

# TODO: this won't work forever

and then we’re on to the next crisis.

We have to get better about this. Its fun to play the hero, and say we can fix everything right away, but in the end, we are digging our own graves.

This post is fueled by me cleaning up a mess caused by too many janky fixes all imploding simultaneously.

Last point: don’t blame your bosses and their unreasonable demands. Don’t expect them to understand the PROs and CONs. Simply do not offer any solution that makes the problem worse. We are the experts!

Going back to the doctor example, I’m sure the desperate patient would love to rush the doctor, because sure, 9 out of 10 times, their hands are probably clean enough, and if an infection does start, well, that’s what antibiotics are for.

But part of the reason why doctors are so revered and so well compensated is because they insist on being treated a certain way.

Ask a doctor for a “good enough” solution, or maybe ask how much would it cost if they don’t do it “the absolutely perfect” way, or any of the other lines your middle managers and sales people hit you with when trying whittle down your estimate.

Doctors will just stare at you like you’re an idiot. That’s what we need to start doing.

Postgresql: convert a string to a date or NULL

We’re working with some user-submitted text that we need to convert into dates. Most of the data looks correct, but some of it looks glitchy:


expiration_date
8/16
5/17
damaged
6/16

See that line with “damaged” in there? That will cause to_date to throw an error:

select to_date('damaged', 'YY/MM');
ERROR: invalid value "da" for "YY"
DETAIL: Value must be an integer.

So I wrote this function:

create or replace function dt_or_null (s text, fmt text)

returns date
as
$$
begin

return to_date(s, fmt);

exception
when others then return null;

end;

$$ language plpgsql;

And this is how it works:

select 'damaged', dt_or_null('damaged', 'YY/MM');
+----------+------------+
| ?column? | dt_or_null |
+----------+------------+
| damaged | |
+----------+------------+
(1 row)

My advice to new programmers looking to start their career

Your resume is probably pretty good, but you need to show you can build stuff beyond school assignments. You don’t need a job to do that though! Here’s my advice:

  1. Prove that you can build and maintain something without being supervised. Build some kind of web project in your free time and host it online on AWS or rackspace or my favorite, Linode. That link has my referral code in it, by the way 🙂

    Start with something as easy as possible. Don’t worry though — you will discover a ton of difficulties as you work through it. Your project can be anything:

    • a really simple recipe database
    • the most popular mens socks on Amazon
    • weather forecast for nearby cities

    At the bottom of every screen in that project, add a link to your github profile and your linkedin page, and put your email in there and say something like “I’m looking for work!”

    Once you’re done, pick a new project. Maybe rewrite the same thing in a different language. The point here is to make real things that regular people can interact with.

    Silly projects are likely to get more attention. For example, the KJV Programming tumblr site is hugely popular and doesn’t really do anything useful for anyone.

  2. Get involved with some volunteer programming work. In Cleveland, there are several groups of programmers that volunteer their time. Look at Cleveland Givecamp, for example, or Open Cleveland.

    Where ever you are, I bet there’s a group like this already. If not, start one!

    Or, just find an organization like a church or a club or a business that you like and offer to work with them to do something like set up a better website, automate some financial reports, or even just help them manage their facebook / instagram / twitter accounts.

    You will learn how to work with non-technical people this way. That is an important skill!

  3. Start a blog.

    Write tutorials for little things you figure out while building your projects. Write tutorials for stuff that you are learning in school, like recursion, or operator overloading in C++, or why you hate or love one language vs another.

    Write about the nonprofits or clubs or small businesses you’re working with.

    Practice writing clearly and succinctly.

    Read William Strunk’s The Elements of Style at least three times. It’s nearly a hundred years old and still the best writing guide out there.

    Publish what you do on twitter and reddit and hacker news and other places so you get more attention. Don’t waste a minute arguing with the haters though. Nobody cares about them.

    Add google analytics to your blog and study what posts attract the most attention.

  4. Go to as many technical meetups as you can and introduce yourself to people and tell them you are looking for work. Talk about what you are working on. Ask them where they work and if they like it and if they know of openings.

    If you’re anywhere near Columbus, Ohio, show up at PyOhio on July 30th and 31st and introduce yourself to as many people as you can. Maybe even do a 5-minute lightning talk on one of your projects — the sillier the project is, the better.

  5. Cold-call recruiters at companies like Robert Half, Oxford, Randstad, etc and tell them you’re looking for work. Ask them what skills are the most sought after.

    Learn those skills, and build projects with them, and then write out about it.

The point with all this stuff is to make yourself a programming celebrity. You don’t want to go looking for jobs — you want jobs to come to you.

Good luck on your quest!

Consider that you are lucky to live at a time where a few of us have vastly more upward economic mobility than ever before. It just takes effort.

Are you an animal or a human?

What’s good and bad about github issues

Ticketing / workflow / bugtracker systems are always nasty. Github’s is pretty good. Maybe the best of what’s out there. But it ain’t perfect.

Here’s what I like:

  • It’s ready to go immediately once you start your github repo.
  • You can link a commit to an issue by mentioning the issue number in the commit.
  • Labels let you store a TON of metadata.

And what I dislike:

  • No obvious way to tell if somebody is actively working on an issue. More generally, no “status” field exists on an issue.
  • No obvious way to do a query like “label X or label Y”.
  • No command-line interface.
  • Since github doesn’t include a built-in mailing list, github issues often get used for support requests. Then when somebody explains “here’s how to do … “, the issue gets closed, and that helpful expensive-to-write documentation is hidden away. The solution here is for github to host a mailing list for every repository.