Spreadsheets are the devil, but here is how to avoid getting burned.

Spreadsheets seem like they are adequate tools for serious analysis. And unfortunately, people are graduating from stats and OR programs without mastering any of the other alternatives. But brother, I stand before you today to tell you that spreadsheets are the devil.

When you face a modeling problem, spreadsheets tempt you with the seemingly easy way out. It all starts with how easy it is to import data. Excel’s import wizard is fast and pretty smart about automatically assigning column types. Meanwhile, your hapless colleagues are going to spend a day reading manuals just to load in that same tab-delimited text file.

Now that you’ve got the raw inputs loaded, you figure that within a few days you’ll be done building your trendlines and you’ll kill time choosing fonts for your pie charts. But what happens — invariably — is that you think you are done and then you look at your number on your final worksheet and realize it can’t be right. You must now find the error in any of the possibly hundreds of tiny formulas all chained together. Welcome to cell HE11.

Meanwhile, while you’ve got numbers that are laughably wrong, your SAS friend after a few days at least has his PROC REPORT output to show the boss, even if he did have to print it on the basement mainframe dot-matrix printer.

So, despite all that, sometimes, I find that I just have to use a spreadsheet. In that circumstance, I try to follow a set of rules. Any time I deviate from these rules, I always get burned.

  1. Put at the top of each sheet a few paragraphs that describe the model. Ideally, this text should be so clear and specific that I can rebuild the spreadsheet just based on this information. (This also helps make sure that you implemented the logic correctly.)
  2. Indicate what are the cells that the user should play with, and what cells should not be tweaked. Point out where the final answer pops out. Establish a color scheme to distinguish between input data and formulas.
  3. Emulate the IRS 1040, where there is a column of text and just a few columns of numbers, and each row is as simple as possible. There’s a main column that gets summed at the bottom, and a secondary column where complex totals are broke down further.
  4. Decompose those formulas and don’t store literal data inside of formulas! For example, in a mortgage calculator, break out the interest rate, the mortgage size, and the number of years in the mortgage into separate cells, and then show the result in another cell:

    s1

    Don’t be tempted to cram all those numbers inside a single cell like this:

    s2

    Sure, you save a few rows and it compresses the size of your sheet, but in the end, you make your sheet much less flexible, and it will be more difficult to separate data-entry errors from formula errors.

  5. Finally, Put everything in top-to-bottom order in each sheet and have a single flow. Don’t have lots of parallel panels side-by-side. It becomes too confusing.

I am certain that there are even more rules that are better than these. Enlighten me.