Monthly Archives: October 2008

Break up changes into different commits with git add -p

This guy’s post led to this one.

I’m irresponsible about committing after each conceptual unit of work. A lot of the time, I’ll edit a file to fix one bug, then while I’m in there, I’ll edit some other code because I see a better way to do something else. Then maybe I’ll add a few doctests to a completely different section because I feel like it.

After a few hours, I have edits in a single file that are related to multiple separate tasks. So back when I used svn, I would commit it with a message like “Fixed topics A, B, C”. Or I would say “Fixed A and a bunch of other stuff”.

Now with git, before I commit my changes, I run:


$ git add -p frob.py

Then git opens up an interactive session that walks through all the changes in that file and asks me if I want to stage each one. To walk through every change across the repository instead, leave off the file name.

In the first pass, I stage all the hunks related to the first issue. Then I commit those changes. Then I repeat the process and stage the hunks related to the next issue.

Keep in mind that I committed my changes after the first pass, so when I go through the file the second time, I won’t get prompted for those changes.
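In case the word “hunk” is fuzzy: a hunk is one section of a diff that starts with an @@ header. Here’s a minimal Python sketch that splits unified-diff text into hunks, which are roughly the units the interactive prompt asks about one at a time. This is not how git does it internally; the split_hunks name and the sample diff of frob.py are made up for illustration.

```python
import re

# Matches a unified-diff hunk header such as "@@ -17,7 +17,8 @@ some context"
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")


def split_hunks(diff_text):
    """Split unified-diff text into a list of hunks (each a list of lines)."""
    hunks = []
    current = None
    for line in diff_text.splitlines():
        if HUNK_HEADER.match(line):
            # A new @@ header starts a new hunk.
            current = [line]
            hunks.append(current)
        elif current is not None:
            current.append(line)
        # Lines before the first header (diff --git, ---, +++) are
        # file-level headers, so they are skipped.
    return hunks


# Hypothetical diff with two unrelated edits in the same file.
SAMPLE_DIFF = """\
diff --git a/frob.py b/frob.py
--- a/frob.py
+++ b/frob.py
@@ -1,3 +1,3 @@
 import os
-x = 1
+x = 2
 print(x)
@@ -10,2 +10,3 @@
 def f():
+    # new comment
     return 1
"""

hunks = split_hunks(SAMPLE_DIFF)
for number, hunk in enumerate(hunks, start=1):
    print(number, hunk[0])
```

Each of those two hunks would get its own “Stage this hunk” prompt, so each one can go into its own commit.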

A real-world example

I’ve got two edits in mkinstall.py. One is a change to the list of files I want to ignore, and the other edit is a silly stylistic change. I want to commit them separately.


$ git diff mkinstall.py 
diff --git a/mkinstall.py b/mkinstall.py
index 4c6bb4e..2cb43de 100644
--- a/mkinstall.py
+++ b/mkinstall.py
@@ -17,7 +17,8 @@ Otherwise, I'll add a symlink.
 import os, shutil
 

 # Anything you want to skip:
-skip_us = ["mkinstall.py", ".svn", "_vimrc"]
+skip_us = ["mkinstall.py", ".svn", "_vimrc", "diffwrap.sh", "lib",
+    "lynx_bookmarks.html", "ipythonrc-matt", ".git"]
 
 # Anything you want to copy rather than symlink to: 
 copy_us = [".vim"]
@@ -60,7 +61,8 @@ for thing in copy_us:
     if os.path.islink(homefile):
         print "A symbolic link to %s exists already, so I'm not going to copy over it." % homefile

-    elif os.path.exists(homefile): continue
+    elif os.path.exists(homefile): 
+        continue

     else:
         svnfile = os.path.join(svnpath, thing)

This is what happens when I run git add -p:


$ git add -p mkinstall.py 
diff --git a/mkinstall.py b/mkinstall.py
index 4c6bb4e..2cb43de 100644
--- a/mkinstall.py
+++ b/mkinstall.py
@@ -17,7 +17,8 @@ Otherwise, I'll add a symlink.
 import os, shutil

 # Anything you want to skip:
-skip_us = ["mkinstall.py", ".svn", "_vimrc"]
+skip_us = ["mkinstall.py", ".svn", "_vimrc", "diffwrap.sh", "lib",
+    "lynx_bookmarks.html", "ipythonrc-matt", ".git"]
 
 # Anything you want to copy rather than symlink to: 
 copy_us = [".vim"]
Stage this hunk [y/n/a/d/j/J/?]? 

At this point, I will hit y. Now that section of the file is staged to be committed. That is not the same as committing it.
Now git shows the next section of code that is different:


Stage this hunk [y/n/a/d/j/J/?]? y
@@ -60,7 +61,8 @@ for thing in copy_us:
     if os.path.islink(homefile):
         print "A symbolic link to %s exists already, so I'm not going to copy over it." % homefile
  
-    elif os.path.exists(homefile): continue
+    elif os.path.exists(homefile): 
+        continue
  
     else:
         svnfile = os.path.join(svnpath, thing)
Stage this hunk [y/n/a/d/K/?]? 

I don’t want to stage this right now, so I hit n. That’s the last edit in the file, so the interactive session completes. Now when I run git diff --cached, which tells me what is about to be committed, look what I see:


$ git diff --cached mkinstall.py 
diff --git a/mkinstall.py b/mkinstall.py
index 4c6bb4e..a348ee1 100644
--- a/mkinstall.py
+++ b/mkinstall.py
@@ -17,7 +17,8 @@ Otherwise, I'll add a symlink.
 import os, shutil
 
 # Anything you want to skip:
-skip_us = ["mkinstall.py", ".svn", "_vimrc"]
+skip_us = ["mkinstall.py", ".svn", "_vimrc", "diffwrap.sh", "lib",
+    "lynx_bookmarks.html", "ipythonrc-matt", ".git"]
  
 # Anything you want to copy rather than symlink to: 
 copy_us = [".vim"]

So now I’ll commit this edit with an appropriate remark:

$ git commit -m "Added some more files to the list of files to be skipped"
Created commit ce0478d: Added some more files to the list of files to be skipped
 1 files changed, 2 insertions(+), 1 deletions(-)

Now I’ll view the unstaged changes again in my file, and notice that the other change still remains:


$ git diff mkinstall.py 
diff --git a/mkinstall.py b/mkinstall.py
index a348ee1..2cb43de 100644
--- a/mkinstall.py
+++ b/mkinstall.py
@@ -61,7 +61,8 @@ for thing in copy_us:
     if os.path.islink(homefile):
         print "A symbolic link to %s exists already, so I'm not going to copy over it." % homefile
  
-    elif os.path.exists(homefile): continue
+    elif os.path.exists(homefile): 
+        continue
  
     else:
         svnfile = os.path.join(svnpath, thing)

At this point, I can rerun git add -p and stage up more stuff to be committed. In this case, it is more realistic that I would run

$ git commit -a -m "Made a silly style change"

That will stage and commit that last edit in one swoop.

After that hail storm, my tomatoes look like Admiral Adama

We had a pretty intense hailstorm that started yesterday afternoon and ran for a few hours. I have a lot of green tomatoes still on the vine outside. Today I brought some in. They’re all dented and pock-marked now. Here’s a closeup:

all the dents are from the hail

Now here’s a completely different tomato. This one didn’t get damaged. Anyhow, this is just one single tomato, not three conjoined tomatoes. I call it the rumpshaker.

zumma zoom zoom zoom and a boom boom, shake baby shake baby shake

Sometimes I think validate + formencode is more hassle than it is worth

I’m hoping somebody will read this and show me a better way.

In general, I like formencode. I like that I can do stuff like:


@validate(validator=SomeGnarlySchema())
def m(self, a, b, c, d, e=None):

And then I know that all my parameters have been converted from their original string values into whatever I want.

But I also find that I spend a lot of time getting my complex schemas to work. Like right now, I have an optional parameter e. e should either be a string representing a date, or it can be None.

I’ve got a validator with this logic in it for e:

  1. First try to return a datetime.date object from parsing e.
  2. Otherwise, look in the cookie for a key “e” and try to return that after parsing it into a datetime.date.
  3. Finally, just return today’s date.

So, the idea is that some visitor can come to page /m and always see data for today. Or, they can use a calendar widget to choose a value. On subsequent visits back to /m, I’ll keep showing them that same date they chose because I saved it in a cookie.
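To pin down the 1-2-3 logic, here it is as a plain Python function with no formencode in it at all. This is only a sketch of the behavior I want, not a validator. The resolve_date and parse_date names, the YYYY-MM-DD date format, and the idea of passing the cookies in as a dict are all assumptions made for the example.

```python
import datetime

# Assumed date format for both the URL parameter and the cookie value.
DATE_FORMAT = "%Y-%m-%d"


def parse_date(text):
    """Return a datetime.date parsed from text, or None if that fails."""
    if not text:
        return None
    try:
        return datetime.datetime.strptime(text, DATE_FORMAT).date()
    except (TypeError, ValueError):
        return None


def resolve_date(e, cookies, today=None):
    """Pick the date to show: parameter first, then cookie, then today."""
    # 1. Try the value that came in with the request.
    from_param = parse_date(e)
    if from_param is not None:
        return from_param

    # 2. Fall back to whatever was saved in the cookie under the key "e".
    from_cookie = parse_date(cookies.get("e"))
    if from_cookie is not None:
        return from_cookie

    # 3. Give up and use today's date (injectable so it can be tested).
    return today if today is not None else datetime.date.today()


print(resolve_date(None, {"e": "2008-10-01"}))
```

Note that in this sketch an unparseable e quietly falls through to the cookie instead of raising an error, which may or may not be what a real form should do.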

Here’s the problem. I have to make e an optional parameter because I don’t want to require that people hit the site with a url that contains a value for e.

However, when e is None, then my validator for e is ignored! So, as far as I know, at this point, I need to use a validator that operates on the whole set of parameters.

Which is also possible, but in my brain, it seems wrong that I have to use a schema-level validator when I really am only validating one single field.

More generally, anybody that subscribes to the formencode mailing list sees first-hand just how confusing a lot of people find formencode. It is a very powerful library, but very tricky to get right.

Here’s my question — does validate really need to use formencode? Is there some better, simpler solution? I’ve read about how django tackles this problem, and their approach does seem simpler, but I can’t say for sure until I really build something with it.

If any readers can show how to make a form.clean method that does the 1-2-3 logic I described above, I’d be really grateful.

Maybe formencode just needs a fat cookbook of solutions.

I had a strange dream last night

I somehow found a spreadsheet that listed assigned targets for a bunch of US gov’t hitmen. I don’t remember how I got the spreadsheet.

I thought it was a hoax at first until I heard on the news that somebody just died, and then sure enough, they were on the spreadsheet. Then I saw how I was being followed around the zoo by an old beat-up VW bug. Yeah, the whole dream took place at the zoo. And I was 17 or so. And I didn’t have any shoes on.

While I was running around the zoo, I decided I needed to upload the file on lots of servers, and set up some kind of dead-man switch where it would automatically be visible to lots of people unless I kept verifying that I hadn’t been killed or disappeared. That would be my insurance in case the old Chinese man driving the VW bug caught me (did I mention that he was Chinese? That was a later discovery).

So, the rest of the dream got pretty chaotic at this point. I kept sliding through a bunch of scenes where I was either writing my dead-man script, or running through the zoo snack bar while avoiding my pursuers, or having to explain the situation to anybody else who also had the spreadsheet, but thought it was just a hoax like I initially did.

Then I woke up and all day today I’ve been having intrusive thoughts about how I really need to finish the god-damn script and get the hell out of the office before that crusty old Chinese dude drives up in Herbie.

So here’s the point — what the fuck do they put in the food at Don Tequila’s? The food was your standard Taqueria-grade Mexican slop, except with a fat dose of mescaline in the ranchero sauce. I don’t think I’ll be going back, at least not on a school night. I got enough anxiety as it is.

The USA-Soviet collapse gap

I love doomsday prophecies.

I grew up in the 1980s in Texas, watching stuff on TV like The Day After and Damnation Alley and lots of other bad post-apocalyptic sci-fi schlock. Meanwhile, my parents took us to a church that was very focused on Christian eschatology*, so my childhood daydreams revolved around how I would survive in the inevitable supernatural, post-armageddon war zone. I used to imagine how buildings would look when they were all burned out and destroyed and possibly occupied by mutants.

I’m sure that’s why I’m so into growing vegetables now. I’ll be ready with my basement full of turnips when the shit goes down.

One of my college professors argued that Christianity at its heart is a religion about redemption in the afterlife, and if the ancient Jews weren’t so miserable under the Romans, the religion never would have caught on. Even today, it appeals most often to people who are at the end of their rope.

I suspect a similar dynamic applies with all these doomsday preachers. People like talking about the end of the world because these scenarios offer them hope out of whatever mess they’re stuck in currently. For example, as a delinquent 7th grade kid, I knew that if we went to war with the Russians, or if a meteor crashed into the Earth, or if a super-virus plague broke out, or if aliens landed and started harvesting our life force, I wouldn’t get in trouble for not doing my world history homework, so on some level, I wanted it to happen.

[*] Eschatology has very little in common with Christian scatology. It’s just a pretentious word that means what the religion believes will happen at the end of the world.

The Houston Chronicle endorses Obama

I’ve been wondering what my historically Republican hometown newspaper would do, and today they endorsed Obama. Here’s an excerpt:

Perhaps the worst mistake McCain made in his campaign for the White House was the choice of the inexperienced and inflammatory Palin as his vice-presidential running mate. Had he selected a moderate, experienced Republican lawmaker such as Texas Sen. Kay Bailey Hutchison with a strong appeal to independents, the Chronicle’s choice for an endorsement would have been far more difficult.

That captures my feelings. If McCain had stayed true to his remarks in 2000 about Jerry Falwell, and stayed true to his original position on the Bush tax cuts, and then chosen a VP like Joe Lieberman, truly showing his independent nature, then this could have been a very different race.

Poverty isn’t only about material stuff

I read on Jasdeep’s blog that today is blog action day, and so I’m jotting some stuff down.

Everybody seems to be vaguely aware of some supposed link between stress and illness, but I never see people considering the ramifications. A few years ago I read The Status Syndrome by Michael Marmot. He argues that life in a society with a big distance between the top and bottom is toxic to the people at the bottom. In other words, feeling like you occupy a low rung on a social ladder slowly kills you.

So, here’s what I want to say about poverty. Poverty isn’t the state of living without modern conveniences. It is the mental state that comes from too much anxiety, desperation, and hopelessness.

Some research on generic/EAV tables

Yesterday I confirmed a hunch I’ve had about database schema design. Here’s the background: I’m working on a feature where I track employees and their preferred locations, shifts, and stations.

For example, I’ll track that Alice likes the morning shift at the west-side location, and she likes to work the front register station most of all, but her second choice is the drive-through.

Meanwhile, Bob likes the west-side and north-side locations, is indifferent about the shift, and likes the dishwasher station. Note the one-to-many relationship between Bob and his preferred locations and his lack of shift preferences.

I came up with two ways to make my tables:

FIRST METHOD

create table preferred_location (
   employee_id int references employee (id),
   location_id int references location (id));

create table preferred_shift (
   employee_id int references employee (id),
   shift_id int references shift (id));

create table preferred_station (
   employee_id int references employee (id),
   station_id int references station (id));

Hopefully, this is obvious. I store that Alice likes the west-side location in the preferred_location table like this:


    (Alice's ID, west-side location ID)

Then I store the fact that she likes the morning shift in the preferred_shift table like this:


    (Alice's ID, morning shift ID)

Every time I want to add some new type of preference, e.g., hats, I need to make a table to hold all the legal hats and then make a table linking employees to their hat preference.

SECOND METHOD

This way keeps all the preferences in a single table.

create table preferences (
   employee_id int references employee (id),
   preference_type text,
   preference_value text);

Here’s how I would store that Bob likes to be a dishwasher:


    (Bob's ID, 'station', 'dishwasher')

Here’s what I like about method two: I don’t need to tweak the database schema whatsoever when I dream up new preferences. In fact, I can let system users create new preference types at run-time, and the system just works. In this scenario, adding each employee’s hat preference does not require updating my schema.

On the downside, I wouldn’t have any FK constraints. Somebody could store a preference like they want to work a nonexistent shift and I wouldn’t know until I get an angry customer calling me. I’d have to do a lot of application-level data validation, which I hate.
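Here’s a small sketch of that downside, using Python’s sqlite3 module instead of PostgreSQL so it runs anywhere. The employee and shift tables, the sample rows, and the try_insert helper are made up for the example. With method one, the database rejects a preference for a shift that doesn’t exist; with method two, it happily stores it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is turned on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    create table employee (id integer primary key, name text);
    create table shift (id integer primary key, name text);

    -- Method one: one linking table per preference type.
    create table preferred_shift (
        employee_id int references employee (id),
        shift_id int references shift (id));

    -- Method two: one generic table for every preference.
    create table preferences (
        employee_id int references employee (id),
        preference_type text,
        preference_value text);

    insert into employee (id, name) values (1, 'Alice');
    insert into shift (id, name) values (1, 'morning');
""")


def try_insert(sql, params):
    """Run an insert; return True if stored, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Shift 99 does not exist, so the foreign key blocks this row.
method_one_ok = try_insert(
    "insert into preferred_shift values (?, ?)", (1, 99))

# 'graveyard' is not a real shift either, but nothing checks free text.
method_two_ok = try_insert(
    "insert into preferences values (?, ?, ?)", (1, "shift", "graveyard"))

print(method_one_ok, method_two_ok)
```

The bad row under method two only gets caught if the application checks it, which is exactly the validation work I’d rather leave to the database.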

Finally, there’s just something about method two that seems … wrong, even though I’ve seen variations of this theme in production environments at previous jobs (cough, ALLCODES, cough, PINDATA, cough).

So, with this dilemma, I wrote a post to the PostgreSQL users mailing list and got a fantastic reply. Here are some excerpts:

Your “method 2” is something called an Entity-Attribute-Value table design[1].

That said, by going the EAV/“Method-2” route, you’re gaining flexibility, but at the cost of increased complication, and ultimately repurposing a relational database to do something that isn’t very database-like, that’s really more like a spreadsheet. (So why not just use a spreadsheet?) You have little room for recording additional information, like ordering preferences, or indicating that (say) a station preference depends on a location preference, or that a shift time depends on day of the week, etc., so you’re probably not getting as much flexibility as you think. Sure, you could add an “Extra_Data” column, so you have rows:


 Marie-Location-West-1,
 Marie-Location-East-2,
 Marie-Shift-Evening-Tuesday,
 Marie-Station-Register-West,
 Marie-Shift-Morning-Sunday,

etc. But you can see the data integrity nightmare already, when you somehow manage to record “Marie-Shift-Register-1”. Not to mention that you’ll have to do format conversions for that “Extra_Data” field, and incorporate logic somewhere else in your program that deciphers whatever’s in the generic data field to come up with ordering preferences for locations, station preferences by shift times, or whatever else you want to store.

[1] http://en.wikipedia.org/wiki/Entity-Attribute-Value_model

At this point, I was pretty sure I would go with method 1, but not absolutely certain. Then I read that linked article, which really just said more of the same.

Then I read this Ask Tom post and that erased the last bit of lingering doubt I had. Method 2 is incompatible with performance. Method 2 turns your database into a glorified flat file. Here are some of my favorite excerpts from the Ask Tom post:

Frequently I see applications built on a generic data model for “maximum flexibility” or applications built in ways that prohibit performance. Many times – these are one in the same thing! For example, it is well known you can represent any object in a database using just four tables:


Create table objects ( oid int primary key, name varchar2(255) );

Create table attributes 
( attrId int primary key, attrName varchar2(255), 
datatype varchar2(25) );

Create table object_Attributes 
( oid int, attrId int, value varchar2(4000), 
primary key(oid,attrId) );

Create table Links ( oid1 int, oid2 int, 
primary key (oid1, oid2) );  

Looks great, right? I mean, the developers don’t have to create tables anymore, we can add columns at the drop of a hat (just requires an insert into the ATTRIBUTES table). The developers can do whatever they want and the DBA can’t stop them. This is ultimate “flexibility”. I’ve seen people try to build entire systems on this model.

But, how does it perform? Miserably, terribly, horribly. A simple “select first_name, last_name from person” query is transformed into a 3-table join with aggregates and all.
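To see what that transformed query looks like, here is a rough sketch using Python’s sqlite3 module rather than Oracle. The sample rows are invented, the column types are simplified, and using objects.name to mark a row as a “person” is my own assumption about how the generic model would be used. It says nothing about actual performance; it only shows the shape of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table objects (oid integer primary key, name text);
    create table attributes (
        attrId integer primary key, attrName text, datatype text);
    create table object_attributes (
        oid int, attrId int, value text,
        primary key (oid, attrId));

    -- Made-up data: two "person" objects with two attributes each.
    insert into objects values (1, 'person'), (2, 'person');
    insert into attributes values
        (1, 'first_name', 'text'), (2, 'last_name', 'text');
    insert into object_attributes values
        (1, 1, 'Alice'), (1, 2, 'Smith'),
        (2, 1, 'Bob'),   (2, 2, 'Jones');
""")

# The generic-model stand-in for "select first_name, last_name from person":
# three tables joined, then aggregates to pivot attribute rows into columns.
EAV_QUERY = """
    select max(case when a.attrName = 'first_name' then oa.value end),
           max(case when a.attrName = 'last_name'  then oa.value end)
    from objects o
    join object_attributes oa on oa.oid = o.oid
    join attributes a on a.attrId = oa.attrId
    where o.name = 'person'
    group by o.oid
    order by o.oid
"""

rows = conn.execute(EAV_QUERY).fetchall()
print(rows)
```

Every extra column I want back means another max(case ...) line, and every value comes back as text no matter what it was supposed to be.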

There’s a comment on that story about some Java developers who insisted on this approach and then had to redesign the whole thing post-launch. I also like the “chief big table” remark.

Anyhow, it’s nice to know that (this time) my instincts were correct.