The Pareto principle (why some bugs are OK to ignore)

There’s this thing called the Pareto Principle, which says:

roughly 80% of the effects come from 20% of the causes

You can quibble about the specific number values. Maybe 80 and 20 aren’t exactly right. But as long as you have customers that aren’t perfectly evenly distributed across bugs, you should consider that maybe some of your bugs aren’t worth fixing.

Here’s a contrived example: Imagine you have a product XYZ with 100 users. They’re all mad because of five bugs (bug A through bug E).

  • 80 of your users are mad because of bug A. (80% of 100)
  • 16 other users are mad because of bug B (80% of the remaining 20)
  • 3 users hate bug C (roughly 80% of the remaining 4 users)
  • User #100 filed two bug reports: D and E. He won’t be happy until both are resolved.

If you add up 80 + 16 + 3, you’re at 99 users. In other words, if you fix 3 out of 5 bugs, 99% of your customer base would be satisfied.

However, making that last customer happy is probably not worth it! You can satisfy 99% of your market by doing 60% of the required work.
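
If you want to check the arithmetic, here is a rough Python sketch of the same contrived example. The user ids and the satisfied_after_fixing helper are made up for illustration; the only assumption is that a user is satisfied once every bug they reported is fixed.

```python
# Hypothetical data from the contrived example: each bug maps to the
# set of user ids it is bothering. User 100 reported both D and E.
bugs = {
    "A": set(range(1, 81)),    # users 1..80
    "B": set(range(81, 97)),   # users 81..96
    "C": set(range(97, 100)),  # users 97..99
    "D": {100},
    "E": {100},
}


def satisfied_after_fixing(fixed, bugs, total_users=100):
    """Count users who have no unfixed bug left."""
    unhappy = set()
    for name, users in bugs.items():
        if name not in fixed:
            unhappy |= users
    return total_users - len(unhappy)


# Fix the bugs one at a time, most popular first.
fixed = []
for name in ["A", "B", "C", "D", "E"]:
    fixed.append(name)
    print(name, satisfied_after_fixing(fixed, bugs))
```

Fixing A, B, and C gets you to 99 satisfied users. Fixing D on top of that changes nothing, because user #100 is still stuck on E.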

Stop offering janky fixes

When doctors show up to work, they take time to wash hands thoroughly even if there are queued-up patients in critical status.

Meanwhile, us programmers deal with production bugs in the most expedient way possible. And usually that involves some janky fix and a comment like this:

# TODO: this won't work forever

and then we’re on to the next crisis.

We have to get better about this. It’s fun to play the hero, and say we can fix everything right away, but in the end, we are digging our own graves.

This post is fueled by me cleaning up a mess caused by too many janky fixes all imploding simultaneously.

Last point: don’t blame your bosses and their unreasonable demands. Don’t expect them to understand the PROs and CONs. Simply do not offer any solution that makes the problem worse. We are the experts!

Going back to the doctor example, I’m sure the desperate patient would love to rush the doctor, because sure, 9 out of 10 times, their hands are probably clean enough, and if an infection does start, well, that’s what antibiotics are for.

But part of the reason why doctors are so revered and so well compensated is because they insist on being treated a certain way.

Ask a doctor for a “good enough” solution, or maybe ask how much it would cost if they don’t do it “the absolutely perfect” way, or any of the other lines your middle managers and sales people hit you with when trying to whittle down your estimate.

Doctors will just stare at you like you’re an idiot. That’s what we need to start doing.

Postgresql: convert a string to a date or NULL

We’re working with some user-submitted text that we need to convert into dates. Most of the data looks correct, but some of it looks glitchy:


See that line with “damaged” in there? That will cause to_date to throw an error:

select to_date('damaged', 'YY/MM');
ERROR: invalid value "da" for "YY"
DETAIL: Value must be an integer.

So I wrote this function:

create or replace function dt_or_null (s text, fmt text)
returns date
as $$
begin
    return to_date(s, fmt);
exception
    -- any conversion error becomes a NULL instead of aborting the query
    when others then return null;
end;
$$ language plpgsql;

And this is how it works:

select 'damaged', dt_or_null('damaged', 'YY/MM');
 ?column? | dt_or_null
----------+------------
 damaged  |
(1 row)
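
If you would rather do the cleanup in application code before the data reaches the database, the same "convert or give up" idea can be sketched in Python. This is not a drop-in replacement for the function above: date_or_none is a name I made up, and strptime uses its own format codes (%y/%m here) and is stricter than Postgres's to_date about what it accepts.

```python
from datetime import datetime


def date_or_none(s, fmt):
    """Parse s with the given strptime format; return None if it doesn't fit."""
    try:
        return datetime.strptime(s, fmt).date()
    except ValueError:
        # strptime raises ValueError when the text doesn't match the format
        return None


print(date_or_none("damaged", "%y/%m"))
print(date_or_none("16/07", "%y/%m"))
```

The first call prints None. The second prints 2016-07-01, because strptime fills in the first of the month when the format has no day.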

My advice to new programmers looking to start their career

Your resume is probably pretty good, but you need to show you can build stuff beyond school assignments. You don’t need a job to do that though! Here’s my advice:

  1. Prove that you can build and maintain something without being supervised. Build some kind of web project in your free time and host it online on AWS or rackspace or my favorite, Linode. That link has my referral code in it, by the way 🙂

    Start with something as easy as possible. Don’t worry though — you will discover a ton of difficulties as you work through it. Your project can be anything:

    • a really simple recipe database
    • the most popular men’s socks on Amazon
    • weather forecast for nearby cities

    At the bottom of every screen in that project, add a link to your github profile and your linkedin page, and put your email in there and say something like “I’m looking for work!”

    Once you’re done, pick a new project. Maybe rewrite the same thing in a different language. The point here is to make real things that regular people can interact with.

    Silly projects are likely to get more attention. For example, the KJV Programming tumblr site is hugely popular and doesn’t really do anything useful for anyone.

  2. Get involved with some volunteer programming work. In Cleveland, there are several groups of programmers that volunteer their time. Look at Cleveland Givecamp, for example, or Open Cleveland.

    Wherever you are, I bet there’s a group like this already. If not, start one!

    Or, just find an organization like a church or a club or a business that you like and offer to work with them to do something like set up a better website, automate some financial reports, or even just help them manage their facebook / instagram / twitter accounts.

    You will learn how to work with non-technical people this way. That is an important skill!

  3. Start a blog.

    Write tutorials for little things you figure out while building your projects. Write tutorials for stuff that you are learning in school, like recursion, or operator overloading in C++, or why you hate or love one language vs another.

    Write about the nonprofits or clubs or small businesses you’re working with.

    Practice writing clearly and succinctly.

    Read William Strunk’s The Elements of Style at least three times. It’s nearly a hundred years old and still the best writing guide out there.

    Publish what you do on twitter and reddit and hacker news and other places so you get more attention. Don’t waste a minute arguing with the haters though. Nobody cares about them.

    Add google analytics to your blog and study what posts attract the most attention.

  4. Go to as many technical meetups as you can and introduce yourself to people and tell them you are looking for work. Talk about what you are working on. Ask them where they work and if they like it and if they know of openings.

    If you’re anywhere near Columbus, Ohio, show up at PyOhio on July 30th and 31st and introduce yourself to as many people as you can. Maybe even do a 5-minute lightning talk on one of your projects — the sillier the project is, the better.

  5. Cold-call recruiters at companies like Robert Half, Oxford, Randstad, etc and tell them you’re looking for work. Ask them what skills are the most sought after.

    Learn those skills, and build projects with them, and then write about it.

The point with all this stuff is to make yourself a programming celebrity. You don’t want to go looking for jobs — you want jobs to come to you.

Good luck on your quest!

Consider that you are lucky to live at a time when a few of us have vastly more upward economic mobility than ever before. It just takes effort.

Are you an animal or a human?

What’s good and bad about github issues

Ticketing / workflow / bugtracker systems are always nasty. Github’s is pretty good. Maybe the best of what’s out there. But it ain’t perfect.

Here’s what I like:

  • It’s ready to go immediately once you start your github repo.
  • You can link a commit to an issue by mentioning the issue number in the commit.
  • Labels let you store a TON of metadata.

And what I dislike:

  • No obvious way to tell if somebody is actively working on an issue. More generally, no “status” field exists on an issue.
  • No obvious way to do a query like “label X or label Y”.
  • No command-line interface.
  • Since github doesn’t include a built-in mailing list, github issues often get used for support requests. Then when somebody explains “here’s how to do … “, the issue gets closed, and that helpful expensive-to-write documentation is hidden away. The solution here is for github to host a mailing list for every repository.

old-school code checklist

You can’t call it old-school code unless a majority of these are true:

  • global vars are all registered at the top of the file, and are used to track state
  • Comments contain author’s initials and a date
  • Last line of the file is just the number 1;
  • Uses LDAP
  • You recognize the dude that wrote it because you’ve seen his email address at the bottom of some man pages

Obscure python syntax error

Been writing python for a long time. When I wrote this code, I could not figure out why I was getting a syntax error.

d1 = dict(
    display_name='Matt Wilson',)

d2 = dict(
    **d1,)
The problem is the trailing comma after **d1. It is not OK. Which is really weird, because the trailing comma after display_name='Matt Wilson' is just fine.
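
Whether you hit this error depends on your interpreter. As far as I know, Python 2 rejects the trailing comma after **d1 and newer Python 3 releases accept it. Here is a small sketch for checking your own interpreter without crashing the script. The compiles helper is a name I made up; it leans on the built-in compile(), which only parses the source and never runs it.

```python
def compiles(source):
    """Return True if this interpreter accepts source as valid syntax."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Trailing comma after a keyword argument: fine everywhere.
print(compiles("d1 = dict(display_name='Matt Wilson',)"))

# Trailing comma after **d1: the answer depends on the Python version.
print(compiles("d2 = dict(**d1,)"))
```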

We need to make our conference presentations more accessible

When I say “more accessible” I mean any of these:

  • useful for people with hearing / reading / seeing / cognitive / anything differences
  • approachable for people at many different skill levels
  • not intimidating. not insulting or offensive or exclusive.
  • useful for people that are studying the material afterward
  • Help me out here — what are some more examples?

I’m not an expert on how to do this, but I know this is a problem. I want help figuring out solutions, so please get in touch with me if you can help with that. I will ignore defenders of the status quo.

More detail on the problem

We pump a lot of energy into making really cool presentations for conferences, and then, when the conference is over, that great content often just disappears.

Or if it doesn’t disappear, we don’t do a good job of getting it out where more people can benefit from our work. Maybe there’s a zipfile with our slides on a page for our talk on the conference website afterwards.

Maybe the slides (without most of the commentary) will show up online in one of those flash widgets.

Or if you’re really lucky, a video recording of the presentation will show up. And that’s great. For example, Next Day Video records and edits the PyOhio presentations, and does fantastic work, but just a video is not sufficient for all audiences.

A video recording is great for some things, but not for others. It isn’t easy to copy a URL mentioned in a video, for example, or copy-paste a block of code. Or bookmark something 25 minutes in.

Consider that for every person in your audience, over the next few years, there’s probably at least 10 or a hundred or maybe even a thousand people that will be doing searches online for the facts you’re covering right now.

A lot of those people might be brand new to the language or library. A lot of those people might not be native English speakers. And maybe they’re on slow internet connections too.

A few ideas to make this better

I have a few ideas for what to do, listed below, but I’m more interested in getting feedback from readers. So please, let me know what you think we should do.

Anyhow, my ideas:

  • Require presenters to submit something like a paper, not just a stack of LOLCATS slides for their proposal.
  • Bundle the materials from the presenters as soon as possible and get those out on the web. The SAS Global Forum does this, and it is great. I have read nearly all the papers presented at their conferences, because they make it so easy to get their material, and because everybody is writing actual papers, not doing stream of consciousness performance art.
  • Use open formats for text. Avoid PDFs, PowerPoint slide decks, and similar stuff like the plague. Take SEO to heart. Not because we want to sell advertising, but because we want to share our knowledge.
  • Encourage attendees to critically react to the presentations. Maybe even consider the presentation material as open-source. If a presentation contains a code sample that confuses people in the audience, then people should rewrite it as something more intuitive and more obvious.