Meaningful primary keys beat sequences and UUIDs

Posted on March 18, 2018 by matt

Summary

Instead of serial primary keys or UUIDs for primary keys, when possible, you should make a PK that somehow describes the underlying data. This is not as hard as you might think!

In detail

Everything you read about databases will say that every table really ought to have a primary key, in other words, some value that is unique in every row. The clumsiest, crudest, simplest way to do that is to use something like a sequence (or a row number) for that primary key.

Here’s a table where I track schedules at golf courses:
create table loops -- a "loop" is what people call a game of golf ( loop_number serial primary key, club_short_name text not null references clubs (short_name), tee_time date not null default today(), golfer integer not null references golfers (golfer_id), number_holes integer not null check (number_holes in (9, 18)), unique (club_name, golfer, tee_time, number_holes) -- prevent duplicates! );

Every time somebody inserts a row in that loops table, the loop_number column will automatically get a unique number. Now if we’re making some kind of web app, it is easy to make a link to a particular game (aka loop) by just making a URL with the loop_number in there, like this:
https://example.com/loop?loop_number=99
or this:
https://example.com/loop/99

Depending on how fancy you like your URLs.

This style was super-popular in the Ruby on Rails heyday.

Then maybe a few years after that, when distributed databases started catching on, it wasn’t as easy to get the next sequential value, because you couldn’t check all the nodes quickly. So people started using stuff like UUIDs like this:
row_id uuid not null default uuid_generate_v4() primary key
instead of
row_number serial primary key

And then URLs started looking more like this:
https://example.com/loop/5f7664e6-15c2-4d08-858e-3306f7a8ca07

Sidenote

I’ve heard a lot of people argue that replacing sequential primary keys with UUIDs is somehow more secure because it is very easy for some malicious person to change
https://example.com/loop/99
to something like
https://example.com/loop/100
and possibly spy on information not meant for them. Doing the same trick with UUIDs is not so easy; in other words, if you hand out a URL like

https://example.com/loop/5f7664e6-15c2-4d08-858e-3306f7a8ca07

to some customer, they probably won’t find any valid rows by just slightly incrementing that UUID (and that’s if they can figure out HOW to increment it).

I personally don’t think that switching from serial PK’s to UUID’s is always enough to block this attack, but it is absolutely a great first step! In practice, it seems pretty hard to guess another valid UUID, but it certainly is not impossible, and the RFC for UUIDs has this little nugget of advice:

Do not assume that UUIDs are hard to guess; they should not be used
as security capabilities (identifiers whose mere possession grants
access), for example. A predictable random number source will
exacerbate the situation.

Incidentally, this URL tweaking is a big source of data breaches! I’ve seen it in the wild numerous times. I found this post that describes this kind of attack in more detail.

Unfortunately, the web popular frameworks don’t offer much help for this issue. They all make it easy to check that a user is authenticated (they are who they say they are), and maybe they offer some kind of role-based permissions, but you’re pretty much on your own when building a multi-tenant system with data privacy.

In other words, if you want to block user X from spying on data that should only be seen by user Y, you need to check that yourself, in all the different places in your code where you pull back data.

All that said, my favorite reason to go with UUIDs is that you don’t reveal that you only have like three clients on your system when you’re out doing demos, and that’s pretty dang important when you’re fundraising!

Back to the main point

Based on the table above, any loop is a unique combination of a club, a golfer, a tee time, and a number of holes of golf. For example, one row might track that Thurston Howell III is playing 18 holes at snooty-snooterton country club on April 1st, 2018.

It would be great if we could have a primary key like this:
snt-2018-04-01-th3-18

The snt part identifies the club, the 2018-04-01 part identifies the tee time, the th3 identifies the golfer, and the 18 part identifies the number of holes.

This makes for vastly easier to understand data! And it isn’t hard to do this. Just add a trigger on your table that fires before insert and update that sets your primary key column:
create or replace function set_loop_pk () returns trigger as $$ begin


    NEW.loop_pk = NEW.club_short_name

        || '-'

        || to_char(NEW.tee_time, 'yyyy-mm-dd')

        || '-'

        || NEW.golfer_initials

        || '-'

        || number_holes,

    ;
    return NEW;
end;

$$ language plpgsql;

create trigger set_loop_pk before insert or update on loops for each row execute procedure set_loop_pk();

Of course I replaced columns like golfer with golfer_initials, but hopefully that didn’t trip you up.

Also, the code above assumes you’re using the postgresql database, but you can translate it into whatever other database environment you want.

If you’re some kind of crazy person, I suppose you could even build that PK in your ORM layer.

Why is this better? This is better because anyone that sees a meaningful PK can infer a lot about the inner data. This is one of those things that will save you tons of time in a crisis because you don’t need to write lots of joins to understand who the heck is user ‘ac29a573-35f2-4200-b10a-384999426ee6’ or which club has club_id 876.

Your end-users will be more confident in the system as well. Labels printed with a meaningful PK are self-evident. URLs hint about the contents.

Sure, there are times when you need obfuscation, but it easy to have a meaningful PK and then scramble it somehow, with a real cryptographic solution, rather than leaning on a UUID.

Or really, the best approach in my opinion is to keep your meaningful PK, but also tag on another parameter that combines that meaningful PK with a secret value and then hashes it. People call this approach an HMAC.

Last point: if your data is only unique because you’re using a sequence or because you’re using a UUID, well, you’re not really “doing databases right”. Using a meaningful PK means you’ve figured out enough about what you’re storing to know what makes a row unique.

I see a path to 1984-style thought crimes from where we are now

Posted on March 18, 2018 by matt

A little while ago I got admonished online.

I posted a remark that I was shocked by the sex assault allegations against Charlie Rose.

I wrote, “who’s next — Lurch from Addams Family?”

A friend wrote back I was trivializing sex assault. Then she went on to say something about how I was part of the problem. I thought about it for a while and I just didn’t agree with it.

So I decided not to spend time debating the issue and I deleted the comment.

Then an hour later my friend posted a screenshot showing how I deleted the comment and said I was gaslighting her.

And these were not short posts she was making. These were full-on rants. From somebody I’ve known for 20 years.

Again I decided there was no way we were going to have a good conversation about this, so I deleted the post and blocked this person. I’m not going to host somebody insulting me on my story.

After that, I heard through someone else that she had posted a rant about my behavior calling me a bully.

It seems like at this point in the game, just expressing a regressive point of view is apparently the same as committing acts of violence.

If you see me in the labor camps, please say hi!

I learned a little about the origin of the states secret privilege today

Posted on September 24, 2017 by matt

Just finished this book about the CIA, written a by a career employee.

The most interesting part to me was at the end of the book, when he explained the origin of the “state secrets” privilege that gets used so often these days.

It started when a bomber crashed in 1948 and killed a few civilian contractors on board, and their families sued the government.

The government said they couldn’t release details about the plane or the flight without revealing classified information.

Lower courts rejected this and ordered the government to cough up the data. The Supreme Court ruled in favor of the government and created the idea of the state secret privilege.

Plot twist: 50 years later, the daughter of one of the contractors used a FOIA request to get the data.

The reports had no secure information. The plane crashed because of garden-variety incompetence.

The state secret privilege was BORN out of a desire to cover up embarrassing information.

Product review: todoist

Posted on May 1, 2017 by matt

I’ve been using Todoist for a few months. It’s not bad!

What I like

Since there’s a mobile app and a web interface, it is really likely I’ll get stuff stored in there.
Nearly no extra data is required to store a task. I just can just put “bananas” in a new task’s title and hit save. Again, this increases the likelihood I’m actually going to use the product. If a bunch of fields were required and didn’t have defaults, I might put off using the app.
The alexa interface is fun. I can say “alexa add to my todo list …” and then add a line. I can also have alexa read back my to-do list.
I like how projects and tasks can both be nested. I like how a task only belongs to one project, but can have many labels on it. And I like the priority feature.

What isn’t perfect

There’s no obvious way to track the estimated size / difficulty / required work for a task. In other words, I can’t mark a task as “easy” or “really tricky” and then rank by that.
Linking to tasks isn’t fun or easy. Links look like this:
```
https://todoist.com/showTask?id=2179109422
```
I found that link buried behind two mouse clicks. Meanwhile, github issues start at #1 in each project and increment up from there. That is so much nicer! I can easily tell somebody “hey look up task XYZ-432” but I can’t remember ten digits!
A task has a title, but I want another field where I can add more description of the task. For example, some times, I want to add add links to screenshots, or blog posts with discussions, etc.
Tasks need more statuses, like “in progress” and “will not do this”. Right now, as far as I can tell, a task is either not finished or finished or deleted. I need more statuses!
There’s not an easy way to put tasks in order relative to each other. It is possible to set priority levels on tasks, but if three tasks are all at the same priority level, it isn’t easy to put them in a particular order.
This is kind of complex, and expects a lot from a single application, but there a lot of times that I want to store stuff related to a project that aren’t to-do entries. For example, say I have a conversation with a client. We probably talked about a bunch of things:
- near-term to-do items
- stuff that would be nice, but not immediately planned
- background information about the project
The last point doesn’t fit that well into the todoist model!
You can’t (as far as I can tell) upload attachments to tasks. Update! You can, but you have to add them as comments!
There’s a developer API, but not an official CLI program. Instead, there’s a bunch of half-finished CLI programs on github.

Book report: Winter’s Gambit by Dana McSwain

Posted on April 29, 2017 by matt

This is the book that I’m reviewing.

That link has my referral code in it, so I’ll get a few pennies if you buy it after clicking that link.
And you really should buy it, because Dana McSwain is brilliant.

Summary

I’ve read a bunch of Dana McSwain’s stuff over the last few years. Generally speaking, I go through her stuff in the same way as I go through a big tube of Sour Cream and Onion Pringles. There’s no stopping until it’s all gone and then I feel bad. But in this case, it’s because I envy how she writes such clever twisty plotlines and beautiful characters, rather than because of carbohydrate poisoning.

You know how hard it is to find an album where every song is really good? Even your favorite bands rarely crank out albums like that. This books is like those cherished albums.

The Alex and Frank Mythos

This is the fourth book Dana McSwain has published about characters named Alex and Frank and all four books vaguely exist in the same cosmos (kinda / sorta, anyway). Alex, Frank, Alexei, the pizza story, and many other things, places, and people are recurring icons in her books.

Incidentally, the books don’t build on each other and you can read them in any order.

Nothing about her style or subject matter reminds me at all of H. P. Lovecraft, but he also wrote a whole bunch of stories about different people running around in the same setting.

You won’t find any purple prose or eldritch horror here; instead, you’ll find music preference mockery and truck stop food poisoning and skeeball anecdotes. That’s not my point with this comparison. My point is that for both writers, us readers build up in our own heads a composite sketch of this universe from all these partial hints of unseen actors.

Trope synthesis

Don’t get me wrong — they’re characters beyond this overly glib description, but Frank drips with 1980s action movie imagery. And of course, he is tortured by his brutal past doing shady stuff for the government.

Meanwhile Alex projects a caustic exterior protecting an sweetness underneath, which reminded me so much of female roles in 90s movies like Reality Bites and Singles and Before Sunrise. Same thing with how she dresses.

The title

A gambit in chess is when you make a risky move that could put you in a much better position or a much worse position depending on how it plays out.

A gambit strategy is the opposite of building up a fortress and staying inside it. It requires optimism and vulnerability.

The text is breezy and you might fly right past all these poignant struggles between hope and fear, but this is so much more than just an exciting adventure story.

So is this book kind of like an exploration of what would happen if Die Hard and Winona Rider from Heathers took a cross-country road trip to a meeting with a Hollywood movie producer while pursued by Russian mobsters?

Yes. And it works. You’ll probably read it one setting.

Notes from Fight To Win 32 in Cleveland

Posted on April 23, 2017 by matt

I noticed a few themes at last night’s Fight To Win 32:

Takedowns are hard unless you’re Darren Branch. I am pretty sure he landed more takedowns in his match (which was like the 20th match) than in all previous fights summed up.
A few fighters hopped back up to their feet once the fight went to the ground, and it usually led to good things. BJJ exposes how other martial arts don’t train what to do when the fight hits the floor. Last night exposes how few fighters can keep the fight on the floor, except when both sides agree to play along.
A good part of the crowd (the casual fans) stop watching when the match begins with one fighter pulling half guard. If the goal is selling tickets, I can understand discouraging this stuff.
Fundamental whitebelt stuff like guard recovery / retention, mount escapes, and surviving in bad positions define the game at all skill levels. This sport really is a lot like chess. You learn the basics quickly and then spend a lifetime mastering them.
Classic submisssions like triangles, cross chokes, and kimura attacks were just as effective and popular as heel hooks at finishing fights.
The crowd was great! I brought my 12-year-old kid there and he had a wonderful time. I never saw any drunk meathead behavior. In fact, the crowd was absolutely quiet during a lot of matches.

Book Report: The Hard Thing About Hard Things by Ben Horowitz

Posted on March 26, 2017 by matt

Executive summary: Read this if you enjoy self-aggrandizing stories about how to be successful when you start with $20 million in investment.

Buy it through this link and I’ll get a few pennies!

What I liked

There’s a story about how Horowitz was trying to sell his business and one buyer backed out. The second interested buyer heard the news about the first buyer backing out, and then revised their offer downward.

Horowitz didn’t accept the lower offer. He said the buyer had to stick with their initial offer. There was a standoff, but eventually the buyer went with the original price.

Also, there’s some good advice and analysis on what makes software sales effective.

What I disliked

This is a much bigger list.

First off

Ben Horowitz uses “she” and “her” when talking about hypothetical CEOs. It is jarring to read a sentence where obviously, Ben Horowitz is talking about himself in the abstract, but he uses “she” and “her” for pronouns.

You might think using “she” and “her” as default pronouns makes sense if you call yourself a feminist. And Ben Horowitz tells you right at the beginning how he’s donating profits from this book to a charity focusing on women.

But if you read between the lines of the book, Ben Horowitz is no feminist. Here’s a few examples:

As far as I can tell, there are literally no quotes from any women at all in the book.
Ben Horowitz is all about making his employees work lots of overtime. This is maybe not the top offender, but every survey I’ve read mentions this expectation as being unfriendly to women.
His own personal life follows an old-school pattern: his wife stays home with his children, while he works all the time. This is how he expects his employees to operate as well.
Take a look at who his investment firm writes checks to, and count the boy names and girl names. The ratio is at least 20 to 1.
Go spend a few minutes reading articles like this one, showing how so few women have any of the top positions.

In short, every time I read him use “her” and “she, I got pulled out of the narrative, and started arguing with him in my head.

For the record, not that anyone cares, I don’t call myself a feminist. Mostly because it’s a term that applies to such a broad group of ideas that it has become useless.

The next thing

He starts every chapter with some hip hop lyrics. Mostly irrelevant lyrics, too. There’s no tie back to the lyrics in the text.

This all reminded of that brilliant scene in Office Space, where the guy blasts Geto Boys on the way to his tech job, but when he’s stuck at a red light, he turns the music way down as a black guy walks by selling flowers.

That’s how I see Ben Horowitz every time he tries to say that he and Jay Z or he and Kanye have stuff in common.

Side note: I don’t understand how this guy listens to DMX and then also acts like he cares about womens issues. You’re gonna start your book with a DMX Lyric, right after you tell us how you’re donating all the profits to women’s causes? Really?

Third and last point

Ben Horowitz calls himself a wartime CEO. This really bugs me. We’re living in a time when real combat vets are having a terrible time re-integrating in society.

Horowitz doesn’t know the first thing about their struggles. He’s just playing GI Joe.

Business and war are different. Business involves negotiations and contracts and agreements.

War involves napalm falling on children, or laying siege to cities, or raping and pillaging innocents. That’s war. Ben Horowitz may be good at the game he plays, but he’s not a warrior.

Ben Horowitz is not a feminist, not black, and not a Navy SEAL, but he sure likes to think he’s all three. In my view, he is a typical “drink champagne for charity”, limousine liberal, massive hypocrite. He grabs phrases and images from other people to make himself look more enlightened, less a member of the elite, and more macho than he really is.

And if he wants to get in the ring with me any time, I’m game.

The Pareto principle (why some bugs are OK to ignore)

Posted on March 22, 2017 by matt

There’s this thing called the Pareto Principle, which says:

roughly 80% of the effects come from 20% of the causes

You can quibble about the specific number values. Maybe 80 and 20 aren’t exactly right. But as long as you have customers that aren’t perfectly evenly distributed across bugs, you should consider that maybe some of your bugs aren’t worth fixing.

Here’s a contrived example: Imagine you got a product XYZ, and you got 100 users. They’re all mad because of five bugs (bug A through bug E).

80 of your users are mad because of bug A. (80% of 100)
16 other users are mad because of bug B (80% of the remaining 20)
3 users hate bug C (80% of the remaining 4 users)
User #100 filed two bug reports: D and E. He won’t be happy until both are resolved.

If you add up 80 + 16 + 3, you’re at 99 users. In other words, if you fix 3 out of 5 bugs, 99% of your customer base would be satisfied.

However, making that last customer happy is probably not worth it! You can satisfy 99% of your market by doing 60% of the required work.

Stop offering janky fixes

Posted on December 11, 2016 by matt

When doctors show up to work, they take time to wash hands thoroughly even if there are queued-up patients in critical status.

Meanwhile, us programmers deal with production bugs in the most expedient way possible. And usually that involves some janky fix and a comment like this:
# TODO: this won't work forever
and then we’re on to the next crisis.

We have to get better about this. Its fun to play the hero, and say we can fix everything right away, but in the end, we are digging our own graves.

This post is fueled by me cleaning up a mess caused by too many janky fixes all imploding simultaneously.

Last point: don’t blame your bosses and their unreasonable demands. Don’t expect them to understand the PROs and CONs. Simply do not offer any solution that makes the problem worse. We are the experts!

Going back to the doctor example, I’m sure the desperate patient would love to rush the doctor, because sure, 9 out of 10 times, their hands are probably clean enough, and if an infection does start, well, that’s what antibiotics are for.

But part of the reason why doctors are so revered and so well compensated is because they insist on being treated a certain way.

Ask a doctor for a “good enough” solution, or maybe ask how much would it cost if they don’t do it “the absolutely perfect” way, or any of the other lines your middle managers and sales people hit you with when trying whittle down your estimate.

Doctors will just stare at you like you’re an idiot. That’s what we need to start doing.

Postgresql: convert a string to a date or NULL

Posted on May 30, 2016 by matt

We’re working with some user-submitted text that we need to convert into dates. Most of the data looks correct, but some of it looks glitchy:

expiration_date 8/16 5/17 damaged 6/16

See that line with “damaged” in there? That will cause to_date to throw an error:
select to_date('damaged', 'YY/MM'); ERROR: invalid value "da" for "YY" DETAIL: Value must be an integer.

So I wrote this function:
create or replace function dt_or_null (s text, fmt text)


returns date

as

$$

begin
    return to_date(s, fmt);
exception

    when others then return null;
end;

$$ language plpgsql;

t+1

Programming, gardening, economics, life in Cleveland Heights

Meaningful primary keys beat sequences and UUIDs

Summary

In detail

Sidenote

Back to the main point

I see a path to 1984-style thought crimes from where we are now

I learned a little about the origin of the states secret privilege today

Product review: todoist

What I like

What isn’t perfect

Book report: Winter’s Gambit by Dana McSwain

Summary

The Alex and Frank Mythos

Trope synthesis

The title

Notes from Fight To Win 32 in Cleveland

Book Report: The Hard Thing About Hard Things by Ben Horowitz

What I liked

What I disliked

First off

The next thing

Third and last point

The Pareto principle (why some bugs are OK to ignore)

Stop offering janky fixes

Postgresql: convert a string to a date or NULL