Dear Google

I hate experts-exchange.com. I am tired of getting my hopes up that I found an answer to my problem, and then finding out that they want money before they’ll show me the answer.

From now on, please filter them out of all my future searches.

How to use indexes with SAS datasets

I just came across some old SAS tutorials that I wrote a few years ago, so I’ll be posting them here when I’ve got nothing else to say.

Indexes are an alternative to sorting your dataset or building a format. They can speed up WHERE and BY processing.

Creating indexes

You can create an index in a data step like this:

data clm2 (index=(mbrno empssn mi=(mbrno iyymm) ei=(empssn iyymm)));
set clm1;
run;

The mi and ei are compound indexes, which behave as if you sorted your dataset by mbrno iyymm or by empssn iyymm.

You can use proc datasets to add an index to a dataset that already exists:

proc datasets library=saslib;
modify clm2;
index create mbrno empssn mi=(mbrno iyymm) ei=(empssn iyymm);
quit;

Using indexes

Indexes allow you to merge datasets that aren’t sorted. In the above example, you can now use clm2 as if it were sorted by any of the indexed vars (mem still has to be sorted or indexed by the same vars):

data clm_plus_elig;
merge clm2 mem;
by empssn iyymm;
run;

This is another example of how to do a lookup.

data d1;
infile datalines;
input col1 8.;
datalines;
101
106
102
102
103
103
104
105
;

data d2 (index=(col1));
infile datalines;
input col1 col2 $8.;
datalines;
104 ddd
102 bbb
103 ccc
101 aaa
;

data d3;
set d1;
set d2 key=col1 / unique;

/* This block handles bad lookups. A failed lookup also sets _ERROR_, so reset it. */
if _IORC_ eq %sysrc(_DSENOM) then do;
col2 = "xxx";
_ERROR_ = 0;
end;

run;

proc print heading=h data=d3;
run;

And this is the output:

this is the d3 output dataset. 14:43 Friday, March 11, 2005 1

Obs col1 col2

1 101 aaa
2 106 xxx
3 102 bbb
4 102 bbb
5 103 ccc
6 103 ccc
7 104 ddd
8 105 xxx
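
For readers who don't have SAS handy, the same lookup-with-a-default idea can be sketched in Python. This is only an analogy, not a translation of how SAS indexes work internally: the dict stands in for the indexed d2, and the names lookup and default are made up for this sketch.

```python
# Keys to look up, in the same order as the d1 datalines.
d1 = [101, 106, 102, 102, 103, 103, 104, 105]

# The dict plays the role of the indexed dataset d2 (col1 -> col2).
d2 = {104: "ddd", 102: "bbb", 103: "ccc", 101: "aaa"}


def lookup(keys, table, default="xxx"):
    """Pair each key with its looked-up value, or with default on a miss."""
    return [(key, table.get(key, default)) for key in keys]


d3 = lookup(d1, d2)
for obs, (col1, col2) in enumerate(d3, start=1):
    print(obs, col1, col2)
```

The misses (106 and 105) get the default, which plays the part of the _DSENOM branch in the data step.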


How to make sure you write more tests

I’ve been writing way more tests for my code lately, and they’ve become the backbone of my development style.

I used to write some web code, then play around with it in the browser and look for something to blow up. Automated testing was usually an afterthought or used for confirmation. Or maybe in some cases I would write tests diligently at the beginning of a project, but start skipping them as deadlines approached. Following TDD has always felt like sticking to high-school abstinence pledges. I wrote tests because I thought I should do it, not because I wanted to do it.

But now I’ve found a way to make writing tests central to getting my work done. It’s because of the computer I’m using now. It is such a crappy box that it takes about 5 minutes for Firefox to start up. Then each page load takes at least another 30 seconds. In the time it takes to click through three or four pages using Firefox or Epiphany, my twill scripts can run through about a hundred pages.

There’s a scene in Star Wars: A New Hope where Obi-Wan trains Luke to use the lightsaber while blindfolded. Well, he’s not blindfolded, really, he’s wearing a helmet with the blast shield down, but the idea is the same. Luke has to use “the force” to feel out where the floating droid is, rather than relying on his vision.

Anyway, writing web pages with this Compaq Presario 1200 feels kind of like that. It’s too frustrating to check my pages with Firefox. The only way I can make sure that anything really works is to write a test for it.

PS: I wrote this posting with lynx.

Possible bug in 1.0.4b3 tag of turbogears

The /visit/api.py file in the 1.0.4b3 tag of turbogears has this function, starting on line 177:

def encode_utf8(params):
    '''
    will recursively encode to utf-8 all values in a dictionnary
    '''
    res = dict()
    for k, v in params.items():
        if type(v) is dict:
            res[k] = encode_utf8(v)

        else:
            res[k] = v.encode('utf-8')

    return res

If you have a query string like ?a=1&a=2, then params has a key u'a' that points to a list that contains u'1' and u'2'. And encode isn’t defined for lists, so . . .
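
To make the failure concrete, here is a minimal sketch of a version that tolerates repeated keys. It is not the actual fix from the TurboGears repository, just one way the recursion could cover lists; the helper name _encode_value is made up, and it is written for Python 3 strings rather than the unicode objects the original code dealt with.

```python
def _encode_value(value):
    """Encode one value: recurse into dicts and lists, encode strings."""
    if isinstance(value, dict):
        return encode_utf8(value)
    if isinstance(value, (list, tuple)):
        # Repeated query-string keys such as ?a=1&a=2 arrive as a list.
        return [_encode_value(item) for item in value]
    return value.encode("utf-8")


def encode_utf8(params):
    """Recursively encode to UTF-8 all values in a dictionary."""
    return {key: _encode_value(value) for key, value in params.items()}


encoded = encode_utf8({"a": ["1", "2"], "b": "x", "c": {"d": "y"}})
print(encoded)
```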

Fortunately, the /visit/api.py file in the branches/1.0 branch already has a fix for this problem, so I ran setup.py develop in my checkout directory and was back in business.

I lost so much time today figuring this out because I kept looking for the bug in my code, rather than in the framework itself. Also, the code works fine as long as the query string doesn’t have more than one value for the same key.

While I’m on the soapbox, I really wish that testutil.py would change this function:

def tearDown(self):
    database.rollback_all()
    for item in self._get_soClasses():
        if isinstance(item, types.TypeType) and issubclass(item,
            sqlobject.SQLObject) and item != sqlobject.SQLObject \
            and item != InheritableSQLObject:
            item.dropTable(ifExists=True)

to something sort of like this instead:

def tearDown(self):
    database.rollback_all()
    import copy # Probably don't actually import here, but this is just for illustration.
    x = copy.copy(self._get_soClasses()) # store a copy of the list.
    x.reverse() # Now reverse it.
    for item in x: # Iterate the reversed copy.
        if isinstance(item, types.TypeType) and issubclass(item,
            sqlobject.SQLObject) and item != sqlobject.SQLObject \
            and item != InheritableSQLObject:
            item.dropTable(ifExists=True)

The whole point of using self._get_soClasses is that it looks for a list that defines the order to follow when creating tables. You can define soClasses in your model to make sure that your independent tables are created before your dependent tables.

Well, when it comes time to destroy all your tables, you should destroy the dependent tables first.
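
Here is a small standalone sketch of the create-in-order, drop-in-reverse idea. It uses sqlite3 from the standard library instead of SQLObject, and the parent/child schema and the function names are invented for illustration.

```python
import sqlite3

# Hypothetical schema: rows in "child" point at rows in "parent".
CREATION_ORDER = ["parent", "child"]
SCHEMA = {
    "parent": "CREATE TABLE parent (id INTEGER PRIMARY KEY)",
    "child": (
        "CREATE TABLE child (id INTEGER PRIMARY KEY, "
        "parent_id INTEGER NOT NULL REFERENCES parent(id))"
    ),
}


def create_all(conn):
    """Create independent tables first, dependent tables after."""
    for name in CREATION_ORDER:
        conn.execute(SCHEMA[name])


def drop_all(conn):
    """Drop in the opposite order, so dependent tables go first."""
    for name in reversed(CREATION_ORDER):
        conn.execute("DROP TABLE IF EXISTS %s" % name)


def table_names(conn):
    """List the tables that currently exist in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
create_all(conn)
conn.execute("INSERT INTO parent (id) VALUES (1)")
conn.execute("INSERT INTO child (id, parent_id) VALUES (1, 1)")
conn.commit()
tables_before = table_names(conn)
drop_all(conn)
tables_after = table_names(conn)
```

With foreign keys enforced, a database will generally object to dropping the parent while child rows still point at it, which is why the reversed order matters.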

I posted this about a month ago to the turbogears trunk mailing list already.

Sidenote: if you’re one of the people selflessly donating your time to working on turbogears, please don’t take my rants here personally. I’m really grateful that other people are building tools and giving them away, so that I can make a living.

Neat artwork

I follow this guy’s blog because he writes a fantastic comic called Gone With the Blastwave. Anyway, he posted this video showing him drawing a picture.

I really like watching him go from a blank screen to the finished work. It seems almost magical.


It’s Monster Truck season

My hometown paper is running a story about monster trucks. I’ve been to shows in Texas, and I’ve been to shows in Ohio. The ones in Ohio are fun, but the ones in Texas operate on a whole other level. I suspect my feelings about monster truck shows here match how expat Japanese people feel about USian Sumo wrestlers.

In Ohio, I saw a jeep with a jet engine race around, and I watched a few big trucks crush a bunch of cars. I have lots of photos of Bigfoots in mid-flight right here. I like this one in particular:

[photo: trucks]

Meanwhile, at the show in Houston, I saw a guy set himself on fire and jump from the ceiling of the Astrodome. I watched a 20-car demolition derby that went on for an hour.

In Ohio, at the end, a giant robot dinosaur came out and bit an old jalopy in half.

In Houston, a guy jumped his car off a ramp and flew into a tower of old custom vans.

Here’s a few pics of the robot dinosaur for those that were too square to be there:

[photos: t0, t2, t1]

In summary, I like Monster Trucks.

Test-driven-development can be labor intensive

I wanted to add a list of checkboxes to the “create employee” and “edit employee” pages. Each checkbox adds that employee to a group, and each group has certain privileges. Really standard stuff. I wrote the code in my controller and template in about 20 minutes.

Then I played around with the new pages in my browser to check for obvious errors. I created some employees, monkeyed with their groups, then saved them, then opened up the edit screen, verified everything worked right, then monkeyed with them some more, and so forth. That probably took 5 minutes of clicking around lazily.

In the past, that’s when I would have committed the code to the repository and moved on to something else. I played with it, it didn’t break, so we’re done. Total dev time: 30 minutes.

This time, I wrote a series of twill scripts to go through the different combinations of users and groups.

For example, if I have groups A and B, I would really test creating 4 different employees:

  • new user with no group membership
  • new user in group A
  • new user in group B
  • new user in groups A and B

After each creation, I verify that the page displays the data correctly and I then hit the database and make sure that everything is set correctly.
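
The list of configurations doesn't have to be typed out by hand. As a rough sketch (the function name is made up, and the real tests were twill scripts, not this), every possible group membership can be generated with itertools:

```python
from itertools import combinations


def group_configurations(groups):
    """Return every subset of groups, from no membership to all of them."""
    configs = []
    for size in range(len(groups) + 1):
        configs.extend(combinations(groups, size))
    return configs


configs = group_configurations(["A", "B"])
for config in configs:
    print(config or "no group membership")
```

With n groups this produces 2**n configurations, so the count grows quickly as groups are added.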

For the screen that allows editing employees, the most thorough possible test would take those four new employees and loop until each has been changed to every other possible configuration.

This took about another two hours by the time it was done. The next time I have to write code like this, it will be much faster because I figured out how to write code that yields tests iteratively. Using TDD, total dev time hit about 2.5 hours.
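
The post doesn't show the test-yielding code, so the following is only a guess at the general shape: a generator that yields a check function plus its arguments for every before/after pair, in the style nose uses for test generators. The apply_edit function is a hypothetical stand-in for submitting the edit form and reading the employee back, and run_generated is a tiny stand-in for the test runner.

```python
from itertools import combinations, permutations


def group_configurations(groups):
    """Return every subset of groups, from no membership to all of them."""
    configs = []
    for size in range(len(groups) + 1):
        configs.extend(combinations(groups, size))
    return configs


def apply_edit(before, after):
    """Hypothetical stand-in: pretend to save the edit and reload the row."""
    return set(after)


def check_edit(before, after):
    """One generated test: editing from before to after stores after."""
    stored = apply_edit(before, after)
    assert stored == set(after), "%r -> %r stored %r" % (before, after, stored)


def generate_edit_checks(groups):
    """Yield (check, before, after) for every pair of distinct configurations."""
    for before, after in permutations(group_configurations(groups), 2):
        yield check_edit, before, after


def run_generated(cases):
    """Run each yielded check and return how many ran."""
    count = 0
    for case in cases:
        check, args = case[0], case[1:]
        check(*args)
        count += 1
    return count


ran = run_generated(generate_edit_checks(["A", "B"]))
print(ran)
```

Four configurations give twelve ordered before/after pairs, which is the "every other possible configuration" loop described above.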

So, in this particular case, is it worth it?

Here’s the reasons why I would say that it was worth it:

I’m still learning how to write good tests. Writing thorough tests requires a different mindset, for me anyway. If I wait until I face some really gnarly complex code to write tests, I’m likely to write some crappy incomplete tests. Each test I write makes me faster at writing tests. Also, when I’m regularly writing tests, I write application code with testing in mind. I think more about design and protecting against what could go wrong, rather than just reaching the finish line any way I can.

And here’s the contrarian view:

Time is scarce and writing tests takes time. In a time-constrained environment, writing needless tests is as silly as blowing off real work to write blogs.

I’m not sure which voice in my head I will listen to on this one.

MVC Blasphemy

I just put HTML code into my data model. I have a list-of-objects page. Each object is an instance of an object defined in my data model, derived from a row in a database. Each object needs a pretty link drawn to that object’s detailed-view page. So I added a property on my object:

class Message(SQLObject):
    def _get_view(self):
        "Draw a link to the view page for this message."
        return cElementTree.XML("""VIEW""" % self.id)
    # Lots of other stuff snipped out.
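
Here is a self-contained sketch of the same trick without SQLObject. The link markup and the /message/%d URL pattern are placeholders invented for this example, and it uses xml.etree.ElementTree from the standard library rather than cElementTree.

```python
import xml.etree.ElementTree as ET


class Message(object):
    """Plain-Python stand-in for the SQLObject class."""

    def __init__(self, id):
        self.id = id

    @property
    def view(self):
        "Draw a link to the view page for this message."
        # The URL pattern is hypothetical; substitute the real route.
        return ET.XML('<a href="/message/%d">VIEW</a>' % self.id)


link = Message(7).view
print(ET.tostring(link))
```

Returning an element instead of a string means the template engine can treat it as markup rather than escaping it as text.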

This is now what my kid template looks like:

MESSAGE STUFF

I pass in messages and columns; messages is a list of objects and columns is a tuple of strings that map to attributes or properties, like “view”.
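
In plain Python terms, the template is doing roughly the following. This is a sketch of the idea only, not the kid template itself; the Message class, the subject attribute, and build_rows are all made up here.

```python
class Message(object):
    """Hypothetical message with one plain attribute and one property."""

    def __init__(self, id, subject):
        self.id = id
        self.subject = subject

    @property
    def view(self):
        return "view-link-for-%d" % self.id


def build_rows(messages, columns):
    """Look up each column name on each message, attribute or property alike."""
    return [[getattr(message, column) for column in columns]
            for message in messages]


rows = build_rows([Message(1, "hello"), Message(2, "bye")],
                  ("subject", "view"))
print(rows)
```

Because getattr doesn't care whether a name is a stored attribute or a computed property, the list page never needs to know that "view" is special.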

I’m happy with this decision. I know I could have manipulated the messages or created some new classes in my controller, but I couldn’t really see any advantage. This way works.

I just don’t want anyone else doing this 🙂

Don’t put parentheses around your assert expressions and the error string!

I had a bunch of unit tests that were passing even though I knew they should be failing. I traced it down to the fact that I put parentheses around my assert statements because the tests were really really long and I wanted to put the error string on a separate line.

This is what I spent the last 45 minutes trying to figure out:

>>> assert 1 == 0, "OH NOES"
------------------------------------------------------------
Traceback (most recent call last):
File "", line 1, in
AssertionError: OH NOES

>>> assert (1 == 0,
... "OH NOES")

>>> assert (1 == 0, "OH NOES")

>>>

The assertion doesn’t raise because the parentheses turn the condition and the error string into a single two-element tuple, and a non-empty tuple is always true. The assert never sees a separate error string at all.
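
A short sketch of what is going on, plus one way to keep a long error string on its own line: put the parentheses around the message only. The function name check_is_zero is just for illustration.

```python
# The parenthesized pair is one tuple, and a non-empty tuple is truthy,
# so "assert (1 == 0, 'OH NOES')" can never fail.
condition_and_message = (1 == 0, "OH NOES")
always_true = bool(condition_and_message)


def check_is_zero(value):
    # Parentheses around the message alone are safe: the comma still
    # separates the condition from the error string.
    assert value == 0, (
        "OH NOES: expected 0 but got %r" % (value,)
    )


message = None
try:
    check_is_zero(1)
except AssertionError as error:
    message = str(error)

print(always_true, message)
```

A backslash at the end of the first line works too, but wrapping only the message keeps the condition and the comma in plain sight.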

And these don’t work, but for different reasons:

>>> (assert 1 == 0, "OH NOES")
------------------------------------------------------------
File "", line 1
(assert 1 == 0, "OH NOES")
^
SyntaxError: invalid syntax

>>> assert 1 == 0,
------------------------------------------------------------
File "", line 1
assert 1 == 0,
^
SyntaxError: invalid syntax

Dangit.