Monthly Archives: August 2007

How to use vimdiff as the subversion diff tool

vimdiff is fantastic. Follow these instructions to make subversion use vimdiff when you run svn diff.

Get this diffwrap.sh script and save it anywhere. I saved mine in my $HOME/bin directory. Make sure to make it executable! I’m showing it below:


#!/bin/sh

# Configure your favorite diff program here.
DIFF="/usr/bin/vimdiff"

# Subversion provides the paths we need as the sixth and seventh
# parameters. Quote them so paths containing spaces survive.
LEFT="$6"
RIGHT="$7"

# Call the diff command (change the following line to make sense for
# your diff program).
"$DIFF" "$LEFT" "$RIGHT"

# Return an errorcode of 0 if no differences were detected, 1 if some were.
# Any other errorcode will be treated as fatal.

Then change your $HOME/.subversion/config file to point at that script:


[helpers]
diff-cmd = /home/matt/bin/diffwrap.sh

Then go diff a file!

See this section of the svn book for all the details.

The efficient market hypothesis and search-engine optimization

The efficient market hypothesis (EMH) is an idea in the finance world that the market price for a commodity accurately reflects all information available at the moment. There’s little point in us trying to pick particular winning stocks, unless we have some very secret information. The best investment strategy involves diversifying risk so that results match the aggregate market changes.

It’s a very appealing idea because it means the laziest strategy is also the best.

Anyhow, I suspect that something sort of like EMH dominates the search engine world, but instead of prices for commodities at points in time, the market is search engine rankings and keywords. In my inchoate model, each keyword is its own market. The “price” of a given website would be how well it ranks in a search for that keyword.

I brought up EMH because if we view a website’s search engine ranking as a market price, and we make the assumption that search engines on average operate as efficiently as any other market, then ultimately, your site’s search engine ranking can’t really be pushed up through artificial means, for the same reason that you can’t push up stock prices artificially. I should make it clear that I mean that you can’t do it sustainably.

Hopping back into the financial world, an unfortunately common tactic is to run a pump-and-dump scam. The principle is simple: I own a whole bunch of shares of a company, so I go out and promote the heck out of that company while selling all my shares to all the suckers I manage to convince. Merrill Lynch paid a $100 million fine a few years ago for doing this.

I get unsolicited email all the time with investment recommendations. These are sent from people running the same operation on a smaller scale.

Of course, even if these efforts do let the dumper get out and make some money, once the market learns what is really going on, the stock price craters. See what happened to Enron for one of the most famous examples of how the market reacts to information asymmetry.

Getting to the point, these tactics may push up the price just long enough for the dumper to dump, but no serious stockholder would ever consider using a pump-and-dump scam as a long-run strategy to improve the price per share.

I think this is what search-engine marketers mean when they talk about how black-hat tactics don’t work in the long run and will likely backfire. The more I read and learn about search engine optimization and website marketing, the more the industry experts all seem to be saying the same thing: you have to have good content on your site, and you have to have recommendations from the larger community. Everything I read says that the best SEO strategy is to build a really good website, rather than to build a shoddy product and market it aggressively.

In short, the invisible hand cannot be denied.

I don’t like the patronizing “we”

People around me at work say phrases like:

  • “Do we know how long this will take?”
  • “Do we have someone that can figure that out?”

Wikipedia calls this the patronizing we and the description is dead on:

The patronizing we is sometimes used in addressing instead of “you”. A doctor may ask a patient: And how are we feeling today? This usage is emotionally non-neutral and usually bears a condescending, ironic, praising, or some other flavor, depending on an intonation: “Aren’t we looking cute?”.

I don’t like it. People tend to use it to assign an activity implicitly, like when somebody says “We’ll take care of it” and they really want me to do something, but they also want to somehow associate themselves with my labor.

And when some of the lazy marketing people say “Do we know how many of X there are?” what they really mean is “I’m so mushy-headed I can’t even bother thinking who I should ask to find this out”.

Finally, the “We need to get this done!” and “We need to make this a priority!” imperatives are the absolute worst. The speaker is admonishing subordinates and at the same time taking credit for anything that may happen.

Perhaps later I will construct a lookup table to disambiguate these phrases.

When to use globals

I am dogmatic about never using global variables. But when people way smarter than me use them, like Brian Kernighan does when he builds a command-line interpreter in that chapter of The Unix Programming Environment, I wonder if maybe I’m being too reluctant.

I was looking at a python module vaguely like this:


def foo():
    do_foo_stuff()

def bar():
    do_bar_stuff()

def baz():
    do_baz_stuff()

def quux():
    do_quux_stuff()

I had a bunch of independent functions. I wanted to add logging. I saw two easy ways to do it:

def foo():
    logger = get_logging_singleton()
    logger.log_stuff()
    do_foo_stuff()

def bar():
    logger = get_logging_singleton()
    logger.log_stuff()
    do_bar_stuff()

def baz():
    logger = get_logging_singleton()
    logger.log_stuff()
    do_baz_stuff()

def quux():
    logger = get_logging_singleton()
    logger.log_stuff()
    do_quux_stuff()

In the above code, I would get a reference to my logger object in each function call. No globals. Maybe I am violating some tenet of dependency injection, but I’ll talk about that later. Anyhow, the point I want to make is that the above approach is the way I would do it in the past.

Here’s how I decided to write it this time:


logger = get_logging_singleton()

def foo():
    logger.log_stuff()
    do_foo_stuff()

def bar():
    logger.log_stuff()
    do_bar_stuff()

def baz():
    logger.log_stuff()
    do_baz_stuff()

def quux():
    logger.log_stuff()
    do_quux_stuff()

All the functions access the logger created in the main namespace of the module. It feels a tiny bit wrong, but I think it is the right thing to do. The other way violates DRY in a big fat way.
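For what it's worth, the standard library's logging module fits this module-level pattern fairly naturally. The sketch below is only an illustration, not the code from my app: the logger name "mymodule" and the stand-in do_foo_stuff and do_bar_stuff bodies are made up. logging.getLogger returns the same object every time it is called with the same name, so it already behaves like the singleton getter above.

```python
import logging

# Created once when the module is imported and shared by every function below.
# "mymodule" is a hypothetical name; logging.getLogger(__name__) is the usual idiom.
logger = logging.getLogger("mymodule")


def do_foo_stuff():
    # Stand-in for the real work.
    return "foo done"


def do_bar_stuff():
    # Stand-in for the real work.
    return "bar done"


def foo():
    logger.info("entering foo")
    return do_foo_stuff()


def bar():
    logger.info("entering bar")
    return do_bar_stuff()
```

Nothing is printed until the application attaches a handler and picks a level, for example with logging.basicConfig(level=logging.INFO).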

So, a third option would be to require the caller to pass in the logging object in every function call, like this:


def quux(logger):
    logger.log_stuff()
    do_quux_stuff()

This seems like the best possible outcome — it satisfies my hangup about avoiding global variables and the caller can make decisions about log levels by passing any particular logger it wants to.

There are two reasons why I didn’t take this approach:

  1. I was working on existing code, and I didn’t have the option of cramming extra parameters into the calling library. I could do something like def quux(logger=globally_defined_logger), but I’m trying to make this prettier, not uglier. The whole reason I wanted to add logging was to get some visibility into what the heck was going wrong in my app. I didn’t have time to monkey with overhauling the whole system.
  2. I plan to control my logger behavior from an external configuration system. I don’t want to change code inside the caller every time I want to bump the log level up or down. It is the conventional wisdom in my work environment that I face less risk just tweaking a configuration file setting and restarting my app rather than editing my code*.

[*]I suspect that in the final analysis, this belief will be exposed as garbage. But for right now, it seems pretty true that bugs occur more frequently after editing code than after editing config files.
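To make the two points above concrete, here is a rough sketch of the default-parameter variant combined with a log level that comes from outside the code. Everything named here is an assumption for illustration: the MYAPP_LOG_LEVEL environment variable, the "myapp" logger name, and the level_from_env helper are invented, and an environment variable is just the simplest stand-in for an external configuration system.

```python
import logging
import os

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def level_from_env(default="WARNING"):
    # MYAPP_LOG_LEVEL is a hypothetical variable name; fall back to the
    # default when it is unset or holds something unrecognized.
    name = os.environ.get("MYAPP_LOG_LEVEL", default).upper()
    return name if name in VALID_LEVELS else default


globally_defined_logger = logging.getLogger("myapp")
globally_defined_logger.setLevel(level_from_env())


def do_quux_stuff():
    # Stand-in for the real work.
    return "quux done"


def quux(logger=globally_defined_logger):
    # Existing callers keep calling quux() with no arguments;
    # new callers may pass their own logger.
    logger.debug("entering quux")
    return do_quux_stuff()
```

With this shape, bumping the level means changing the environment and restarting the app, while the calling library stays untouched.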

UPDATE: Apparently, I’m not just talking to myself here! Gary Bernhardt linked to this post and added some really interesting points. Also, his link to the post on the origin of the phrase now you have two problems was something I hadn’t heard of before.

Dependency Injection Demystified

I’m building a robot to make breakfast.

def make_breakfast():
    fridge = get_reference_to_fridge()
    eggs = fridge.get_eggs(number_of_eggs=2)
    fried_eggs = fry(eggs, over_easy)
    cabinet = get_reference_to_cabinet()
    plate = cabinet.get_plate()
    add(fried_eggs, plate)

    return plate

This is OK, but I realize my robot needs lots and lots of practice, and I don’t like wasting all my nice eggs and getting all my plates dirty. So, while my robot is still learning how to cook, I want to specify that it uses paper plates and some crappy expired eggs I fished out of the grocery store dumpster.
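One way this could look, as a minimal sketch: instead of the function fetching its own fridge and cabinet, the caller hands them in. The class names, the fry stand-in, and the dictionary used for a plate are all invented for illustration.

```python
class Fridge:
    def get_eggs(self, number_of_eggs):
        return ["fresh egg"] * number_of_eggs


class DumpsterFridge:
    # Practice supplier: same interface, worse eggs.
    def get_eggs(self, number_of_eggs):
        return ["expired egg"] * number_of_eggs


class Cabinet:
    def get_plate(self):
        return {"kind": "china", "food": []}


class PaperPlateStack:
    # Practice supplier: same interface, disposable plates.
    def get_plate(self):
        return {"kind": "paper", "food": []}


def fry(eggs, style):
    # Stand-in for the real cooking.
    return [f"{style} {egg}" for egg in eggs]


def make_breakfast(fridge, cabinet):
    # The dependencies are passed in, so the caller decides which ones to use.
    eggs = fridge.get_eggs(number_of_eggs=2)
    fried_eggs = fry(eggs, "over easy")
    plate = cabinet.get_plate()
    plate["food"].extend(fried_eggs)
    return plate


practice_plate = make_breakfast(DumpsterFridge(), PaperPlateStack())
real_plate = make_breakfast(Fridge(), Cabinet())
```

The robot's recipe is identical in both calls; only the suppliers change.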


Google presentation at Clepy on August 6th, 2007

Tonight Brian Fitzpatrick (Fitz) from the Chicago Google office did a presentation for the clepy group on version control at Google. They use subversion on top of their own super-cool bigtable filesystem back end.

We had a good discussion on the merits of centralized vs. decentralized version control. According to Fitz, decentralized systems discourage collaboration. He made the joke, “Did you hear about the decentralized version control conference? Nobody showed up.” He made the point that centralized repositories encourage review and discussion. I agree with that.

Apparently subversion 1.5, which will be released in a few months, will have much improved merging facilities. We won’t need to use --stop-on-copy to figure out where we branched. Also, it will be safe to repeat a merge, because nothing will happen on the second attempt.

I don’t like the dispatching system in turbogears

I wanted to translate a(1).b(2) into the TurboGears URL /a/1/b/2. A browser making a request for /a/1/b/2 would trigger that code.

This page explains how to do it. You build a single default method that catches everything and then does introspection to figure out where to send the request.
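The catch-all idea can be sketched in plain Python without any framework. This is not TurboGears code and not the code from the linked page: the Root and BResource classes and the handle function are hypothetical, and the sketch assumes the URL is always a series of name/argument pairs.

```python
class BResource:
    def __init__(self, a_id):
        self.a_id = a_id

    def b(self, b_id):
        return f"a={self.a_id}, b={b_id}"


class Root:
    def a(self, a_id):
        return BResource(a_id)

    def default(self, *segments):
        # Catch-all: walk the path as name/argument pairs and use
        # introspection (getattr) to find the method for each name.
        if len(segments) % 2 != 0:
            raise LookupError("expected name/argument pairs")
        target = self
        for name, arg in zip(segments[0::2], segments[1::2]):
            method = getattr(target, name, None)
            if name.startswith("_") or name == "default" or not callable(method):
                raise LookupError(f"nothing to call for {name!r}")
            target = method(arg)
        return target


def handle(path):
    # Stand-in for the framework handing the remaining path to default().
    segments = [s for s in path.split("/") if s]
    return Root().default(*segments)


result = handle("/a/1/b/2")  # equivalent to Root().a("1").b("2")
```

Note that the arguments arrive as strings, so any conversion to integers would have to happen inside the methods.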

It works fine, but it isn’t nearly as obvious or concise as the regular-expression approach I’ve seen in rails and Django.
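For comparison, here is a framework-free sketch of the regular-expression style using only the standard re module. It is not the actual Rails or Django API; the ROUTES table, the route function, and the show_b handler are made up to show the general shape.

```python
import re


def show_b(a_id, b_id):
    # Hypothetical handler; receives the captured groups as strings.
    return f"a={a_id}, b={b_id}"


# Each entry pairs a URL pattern with the function that handles it.
ROUTES = [
    (re.compile(r"^/a/(?P<a_id>\d+)/b/(?P<b_id>\d+)/?$"), show_b),
]


def route(path):
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler(**match.groupdict())
    raise LookupError(f"no route matches {path!r}")


result = route("/a/1/b/2")
```

The whole mapping from URL to code sits in one table, which is what makes this style easier to read at a glance.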