Found a possible error in chapter 7 of the TurboGears book

I bought the TurboGears book about two weeks ago, and I have been working through it. I like the book in general, but I agree with the reviewers on Amazon who complain about the number of errors. I can’t think of another programming book I’ve read with this many errors.

All of the errors I noticed are little glitchy typographical errors, rather than incorrect theory. The authors really do a good job of illustrating the MVC approach to web design, so I’m glad I bought it.

Anyway, this page lists mistakes found after publication, and the community of readers seems to be doing a good job of helping each other out.

I think I might have found another tiny error. This code appears at the bottom of page 109:

class ProjectFields(widgets.WidgetsList):
    title = TextField(label="project", validator=validators.NotEmpty())
    client_revenue = widgets.TextField(validator=validators.Number())
    project_form = widgets.TableForm(fields=ProjectFields(), action="save_project_test")

I don’t see the point in using both TextField and widgets.TextField. But more importantly, I think the indentation is wrong in the last line. I don’t think project_form is supposed to be an attribute of the ProjectFields class. In fact, as written, that last line can’t even work: while Python is still executing the class body, the name ProjectFields isn’t bound yet, so calling ProjectFields() there would raise a NameError.

I think the code should look more like this:


class ProjectFields(widgets.WidgetsList):
    title = widgets.TextField(label="project", validator=validators.NotEmpty())
    client_revenue = widgets.TextField(validator=validators.Number())

# Moved outside the class.
project_form = widgets.TableForm(fields=ProjectFields(), action="save_project_test")

But maybe I’m missing something. I posted to the TurboGears Book mailing list, so hopefully I’ll find out.

becontrary.com is a neat site built with TurboGears

BeContrary.com is a very clever idea for a site. This debate on going dutch illustrates how the site works. And this is a good discussion of different styles of Python templates.

The site’s author, Will McGugan, wrote up a blog post describing his experience with TurboGears here. He says he chose TurboGears partly because he had already worked with CherryPy and really liked it. Will made this remark after talking about SQLObject:

Incidentally, I don’t like the way that Python ORMs specify the database schema as Python code. Since the schema doesn’t change when the application is running, I would prefer a simple XML definition that would be used by the ORM to dynamically build classes for each table that could be derived from in the model code.

I like this idea, but instead of writing XML, I would prefer to write SQL and have Python parse that to build the classes.
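Just to sketch what I mean (this is purely hypothetical, not anything SQLObject actually does): scrape the column names out of a CREATE TABLE statement with a crude regular expression, then let type() build a class on the fly. A real version would want a real SQL parser, but the idea looks something like this:

import re

ddl = """
create table project (
    title varchar(255),
    client_revenue float
)
"""

# Grab the table name and the block of column definitions.
match = re.search(r"create table (\w+)\s*\((.*)\)", ddl, re.S)
table_name, column_block = match.groups()

# The first word on each line is the column name.
columns = [line.split()[0] for line in column_block.strip().splitlines()]

# Build a class dynamically; each column becomes a class attribute.
Project = type(table_name.capitalize(), (object,), dict((c, None) for c in columns))

p = Project()
p.title = "redesign website"
print p.title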

Why write me a response back at all?

I wrote this email to Samsung technical support a few days ago:

SUBJECT: Need Hayes commands (AT commands) for phone

Hi —

I own a Samsung A707 phone with AT&T service.

I can make a serial port connection to my phone via bluetooth. However, it seems like my phone doesn’t understand most AT commands.

Is there a list anywhere with all the AT commands that this phone supports?

Thanks for the help.

Matt

And here’s the reply I got back:

Dear Matthew,

Thank you for your inquiry. We suggest searching the internet for Hayes AT commands.

Do you have more questions regarding your Samsung Mobile Phone? For 24 hour information and assistance, we offer a new FAQ/ARS System (Automated Response System) at http://www.samsungtelecom.com/support It’s like having your very own personal Samsung Technician at your fingertips.

Thank you for your continued interest in Samsung products.

Sincerely,
Technical Support
Renee

WOW. Is Renee a chatbot, or did a real human being actually spend time writing this response? I especially like that they link to the tech support section of the website. That is exactly where I sent my email in the first place.

Fortunately for me, the people on the gnokii mailing list are helping me out.

A few different ways to store data with varying attributes

Got an email from a friend:

I want to create a database that acts just like the index card note cards one used to make for doing research papers in HS and Univ. I’ve got most of it down, but I am having trouble figuring out how to normalize the page numbers and the source data.
Let’s say one has three kinds of sources – books, magazines, and websites. Well, a book will have:

author(s)
title
place of publication
publisher
copyright date

a magazine:
author(s) – but only maybe – what does one do about The Economist?
title of article
title of magazine
date of publication

a website:
author(s) – again only maybe
title of website
URL

Here’s what I said in reply:

So, I think I get your question. Books, magazines, and websites are all different examples of sources that you might cite. They have some attributes in common and some attributes that are unique.

Going with the high school term paper example, let’s pretend that you wrote a paper and your bibliography looks like this:

  • (book) Tom Sawyer by Mark Twain. Published by Hustler, 1833, in NY.
  • (book) Huckleberry Finn by Mark Twain. Published by Hustler, 1834, in NY.
  • (magazine) “Indonesia sucks”, The Economist. No author listed. February 2001 issue. page 67.
  • (magazine) “No, the Economist Sucks”, Jakarta Post. Joe Brown is the author. Article appeared in the March 14, 2007 issue, on page 6D.

  • (website): “Indonesia” on Wikipedia, http://en.wikipedia.org/wiki/Indonesia. Lots and lots of authors. I used text found on the site as of June 1st, 2007.
  • (website) “blog #96 post”, http://anncoulter.com, Ann Coulter is the author, article posted on July 4th, 2007. I used text found on the site as of this date.

I can see at least three ways to set this up:

1. You can make a single table called sources that includes the union of all these different types. So, you would have a column called “publisher” and another column called “URL”. The book sources would have a blank URL field, and the website sources would have a blank publisher field. You could have a column called “source type” which would have values of “book”, “magazine”, “website”, or anything else that fits.

PROs: simple!

CONs: It is tricky to bake good data validation into your database. You can’t easily add rules to enforce that you get all the required data for each row. Also, every time you discover a new source type, you may need to modify the table and add even more columns.

2. You create a separate table for each source type. So, you have a books table, a magazines table, and then a websites table.

PROs: Now you can easily make sure that every row in your books table has all the required data.

CONs: Accumulating all the results for one of your papers means you have to run a query against each table separately and then glue the results together with the UNION keyword, as in the sketch below. Also, when you need to add a new source type, you’ll need to add a new table to your schema.
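Here is a minimal sketch of that query in SQLite, with the schemas trimmed way down and made-up column names:

import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# One table per source type (approach #2), trimmed to two columns each.
c.execute("create table books (title text, author text)")
c.execute("create table magazines (article_title text, author text)")
c.execute("create table websites (site_title text, author text)")

c.execute("insert into books values ('Tom Sawyer', 'Mark Twain')")
c.execute("insert into magazines values ('Indonesia sucks', null)")
c.execute("insert into websites values ('Indonesia', null)")

# Getting everything back in one shot means one query per table,
# glued together with UNION ALL.
c.execute("""
    select 'book' as source_type, title, author from books
    union all
    select 'magazine', article_title, author from magazines
    union all
    select 'website', site_title, author from websites
""")

for row in c.fetchall():
    print row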

3. Make a bunch of tables:

sources (source_id)

fields (field_id, field_name)

source_fields(source_id, field_id, field_value)

So, this entry:

(book) Tom Sawyer by Mark Twain. Published by Hustler, 1833, in NY.

Would get a single row in the sources table.

And the fields table would have these values:

(field_id, field_name)
1, source type
2, title
3, author
4, publisher
5, publish date
6, publish location

Then finally, we’d put the actual data in the source fields table:

(source_id, field_id, field_value)
1, 1, “book”
1, 2, “Tom Sawyer”
1, 3, “Mark Twain”
1, 4, “Hustler”

… you get the idea, hopefully.

Then, when you want to store a magazine, the first thing you do is add any new field types you need to the fields table, and then add your data.

PROs: you can make up new attributes for your data any time you want, and never have to change your database. For example, if you need to start storing TV shows, you can just add the new types to the fields table and you’re good.

CONs: The field_value column needs to accept any kind of data. So you’ll probably want to make it something like a TEXT column that can hold arbitrarily large values, and convert everything to text before storing it. That means you won’t be able to index this data well, and you won’t be able to require that the data matches particular formats.
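To make that concrete, here is a minimal sketch of approach #3 in SQLite, using the tables and field IDs from above (data abbreviated). Notice that even a simple question like “all books written by Mark Twain” takes a self-join on source_fields for every attribute involved:

import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

c.execute("create table sources (source_id integer primary key)")
c.execute("create table fields (field_id integer primary key, field_name text)")
c.execute("create table source_fields (source_id integer, field_id integer, field_value text)")

c.execute("insert into sources values (1)")
c.executemany("insert into fields values (?, ?)",
    [(1, "source type"), (2, "title"), (3, "author")])
c.executemany("insert into source_fields values (?, ?, ?)",
    [(1, 1, "book"), (1, 2, "Tom Sawyer"), (1, 3, "Mark Twain")])

# One self-join per attribute: one alias for the title we want,
# one for the author filter, one for the source-type filter.
c.execute("""
    select t.field_value
    from source_fields t
    join source_fields a on a.source_id = t.source_id
    join source_fields s on s.source_id = t.source_id
    where t.field_id = 2
      and a.field_id = 3 and a.field_value = 'Mark Twain'
      and s.field_id = 1 and s.field_value = 'book'
""")
print c.fetchall()   # [(u'Tom Sawyer',)]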

So, figuring out which of these approaches is the correct one depends on the specifics of the scenario.

How well can you predict today all the future types of data? If you have perfect clairvoyance, or if you don’t mind monkeying with the database, approach #3 is pointless. I recommend approach #3 when you have lots of users and you don’t want each of them monkeying with the schema.

How worried are you about bad data getting entered in? You can always use triggers and stored procedures or some outer application code to add validation on any of these, but it won’t be easy. Using approach #2 will make validation the easiest.

How fast do queries need to be? If we want to know all the books written by Mark Twain, approach #2 will likely give the fastest response, followed by #1 and then #3.

By the way, while we’re on the subject of bibliographies, all my class notes on normal forms are here, and I wrote up a blog post on compound foreign key constraints a while back here.

Matt

PS: I’m using this email as my next blog entry. And here it is 🙂

I submitted a topic for CodeMash 2008

I attended last year’s CodeMash conference and learned a lot. This year, I submitted a topic. Here’s my description:

Bottom-up programming with IPython

IPython is a vastly improved Python interactive interpreter.

You can start up the interpreter, load in a module, execute a function from that module, then hop from the interpreter into your favorite editor to redefine that function, then go back to the interpreter and run the automatically reloaded function.

If your new code raises an uncaught exception, you can start a Python debugger session right there and inspect the traceback.

With all these tools integrated, you can attack problems by building separate components, putting them together in the interpreter, and watching what happens. If you find a bug, you have immediate access to the traceback, so you don’t need to go back into your code and add logging or print statements.

Fans of iterative development and introspection will likely enjoy this talk.

We’ll see what happens.

Lua metatable examples

Lua has exactly one data structure: tables. It uses tables to implement everything else.

This is how inheritance works in Lua:

t1 = {a = 3} -- t1 is a table with one name-value pair.
t2 = {} -- t2 is an empty table.
setmetatable(t1, t2) -- t2 is t1's metatable.
t3 = {c = 5} -- t3 is just another table like t1.

t2.__index = t3 -- when a lookup fails in t1, t2 will look for a value
-- in t3.

print(t1.a)
print(t1.b)
print(t1.c)

And the output is here:
$ lua lua_fun.lua
3
nil
5

This page explains with more detail.

When I first read this stuff, I wondered why I couldn’t just make the metatable t2 be the place where t1 goes when a lookup fails, rather than require t3 to hold the defaults. Then I realized that __index doesn’t necessarily need to point to another table. It could also hold a function, like this:

-- Now, we'll change the way t2 handles failed lookups in t1
-- so that it always returns the key that was asked for.
t2.__index = function (t, k)
    return k
end

print(t1.a)
print(t1.b)
print(t1.c)

And now we get:
3
b
c

It is actually possible to make t2 itself be the place where failed lookups go:
-- What happens with this?
t2.__index = t2
t2.d = 6

print(t1.a)
print(t1.b)
print(t1.c)
print(t1.d)

The results:
3
nil
nil
6

In conclusion, Lua is neat.

The python logging module is much better than print statements

A while back, I swore off adding print statements to my code while debugging. I forced myself to use the Python debugger to see values inside my code. I’m really glad I did it. Now I’m comfortable with all those cute single-letter commands that remind me of gdb. The pdb module and the command-line pdb.py script are both good friends now.

However, every once in a while, I find myself lapsing back into cramming a bunch of print statements into my code because they’re just so easy. Sometimes I don’t want to walk through my code using breakpoints. I just need to know a simple value when the script runs.

The bad thing is when I write in a bunch of print statements, then debug the problem, then comment out or remove all those print statements, then run into a slightly different bug later, and find myself adding in all those print statements again.

So I’m forcing myself to use logging in every script I do, no matter how trivial it is, so I can get comfortable with the python standard library logging module. So far, I’m really happy with it.

I’ll start with a script that uses print statements and revise it a few times and show off how logging is a better solution. Here is the original script, where I use print statements to watch what happens:

# This is a.py

def g():
    1 / 0

def f():
    print "inside f!"
    try:
        g()
    except Exception, ex:
        print "Something awful happened!"
    print "Finishing f!"

if __name__ == "__main__": f()

Running the script yields this output:

$ python a.py
inside f!
Something awful happened!
Finishing f!

It turns out that rewriting that script to use logging instead just ain’t that hard:

# This is b.py.

import logging

# Log everything, and send it to stderr.
logging.basicConfig(level=logging.DEBUG)

def g():
    1/0

def f():
    logging.debug("Inside f!")
    try:
        g()
    except Exception, ex:
        logging.exception("Something awful happened!")
    logging.debug("Finishing f!")

if __name__ == "__main__":
    f()

And here is the output:

$ python b.py
DEBUG 2007-09-18 23:30:19,912 debug 1327 Inside f!
ERROR 2007-09-18 23:30:19,913 error 1294 Something awful happened!
Traceback (most recent call last):
  File "b.py", line 22, in f
    g()
  File "b.py", line 14, in g
    1/0
ZeroDivisionError: integer division or modulo by zero
DEBUG 2007-09-18 23:30:19,915 debug 1327 Finishing f!

Note how we got that pretty view of the traceback when we used the exception method. Doing that with prints wouldn’t be very much fun.

So, at the cost of a few extra lines, we got something pretty close to print statements, which also gives us better views of tracebacks.

But that’s really just the tip of the iceberg. This is the same script written again, but I’m defining a custom logger object, and I’m using a more detailed format:

# This is c.py
import logging

# Make a global logging object.
x = logging.getLogger("logfun")
x.setLevel(logging.DEBUG)
h = logging.StreamHandler()
f = logging.Formatter("%(levelname)s %(asctime)s %(funcName)s %(lineno)d %(message)s")
h.setFormatter(f)
x.addHandler(h)

def g():
    1/0

def f():
    logfun = logging.getLogger("logfun")
    logfun.debug("Inside f!")
    try:
        g()
    except Exception, ex:
        logfun.exception("Something awful happened!")
    logfun.debug("Finishing f!")

if __name__ == "__main__":
    f()

And the output:
$ python c.py
DEBUG 2007-09-18 23:32:27,157 f 23 Inside f!
ERROR 2007-09-18 23:32:27,158 exception 1021 Something awful happened!
Traceback (most recent call last):
  File "c.py", line 27, in f
    g()
  File "c.py", line 17, in g
    1/0
ZeroDivisionError: integer division or modulo by zero
DEBUG 2007-09-18 23:32:27,159 f 33 Finishing f!

Now I will change how the script handles the different types of log messages. Debug messages will go to a text file, and error messages will be emailed to me so that I am forced to pay attention to them.

# This is d.py
import logging, logging.handlers

# Make a global logging object.
x = logging.getLogger("logfun")
x.setLevel(logging.DEBUG)

# This handler writes everything to a file.
h1 = logging.FileHandler("/var/log/myapp.log")
f = logging.Formatter("%(levelname)s %(asctime)s %(funcName)s %(lineno)d %(message)s")
h1.setFormatter(f)
h1.setLevel(logging.DEBUG)
x.addHandler(h1)

# This handler emails me anything that is an error or worse.
h2 = logging.handlers.SMTPHandler('localhost', '[email protected]', ['[email protected]'], 'ERROR log')
h2.setLevel(logging.ERROR)
h2.setFormatter(f)
x.addHandler(h2)

def g():
    1/0

def f():
    logfun = logging.getLogger("logfun")
    logfun.debug("Inside f!")
    try:
        g()
    except Exception, ex:
        logfun.exception("Something awful happened!")
    logfun.debug("Finishing f!")

if __name__ == "__main__":
    f()

Lots of really great handlers exist in the logging.handlers module. You can log by sending HTTP GETs or POSTs, you can send UDP packets, you can write to a local file, and so on.
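For instance, wiring two more handlers onto the same "logfun" logger might look like this (the host, port, and URL here are made up):

import logging, logging.handlers

x = logging.getLogger("logfun")

# Ship each record as a pickled UDP datagram to some log collector.
h3 = logging.handlers.DatagramHandler("logbox.example.com", 9999)
x.addHandler(h3)

# Or POST each record to a web application.
h4 = logging.handlers.HTTPHandler("logbox.example.com:8080", "/log", method="POST")
x.addHandler(h4)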

Finally, I’d like to point out that Mike Pirnat has some excellent slides (in PDF format) on logging using the standard python library here.

How to use vimdiff as the subversion diff tool

vimdiff is fantastic. Follow these instructions to make subversion use vimdiff when you run svn diff.

Get this diffwrap.sh script and save it anywhere. I saved mine in my $HOME/bin directory. Make sure to make it executable! I’m showing it below:

#!/bin/sh

# Configure your favorite diff program here.
DIFF="/usr/bin/vimdiff"

# Subversion provides the paths we need as the sixth and seventh
# parameters.
LEFT=${6}
RIGHT=${7}

# Call the diff command (change the following line to make sense for
# your merge program). The paths are quoted in case they contain
# spaces.
$DIFF "$LEFT" "$RIGHT"

# Return an errorcode of 0 if no differences were detected, 1 if some were.
# Any other errorcode will be treated as fatal.

Then change your $HOME/.subversion/config file to point at that script:

[helpers]
diff-cmd = /home/matt/bin/diffwrap.sh

Then go diff a file!
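For example (the filename here is made up):

$ svn diff trunk/models.py

vimdiff opens with the pristine BASE revision on the left and your working copy on the right. Quit with :qa and subversion moves on to the next changed file.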

See this section of the svn book for all the details.

When to use globals

I am dogmatic about never using global variables. But when people way smarter than me use them, like Brian Kernighan does when he builds a command-line interpreter in that chapter of The Unix Programming Environment, I wonder if maybe I’m being too reluctant.

I was looking at a python module vaguely like this:

def foo():
    do_foo_stuff()

def bar():
    do_bar_stuff()

def baz():
    do_baz_stuff()

def quux():
    do_quux_stuff()

I had a bunch of independent functions. I wanted to add logging. I saw two easy ways to do it:

def foo():
    logger = get_logging_singleton()
    logger.log_stuff()
    do_foo_stuff()

def bar():
    logger = get_logging_singleton()
    logger.log_stuff()
    do_bar_stuff()

def baz():
    logger = get_logging_singleton()
    logger.log_stuff()
    do_baz_stuff()

def quux():
    logger = get_logging_singleton()
    logger.log_stuff()
    do_quux_stuff()

In the above code, I would get a reference to my logger object in each function call. No globals. Maybe I am violating some tenet of dependency injection, but I’ll talk about that later. Anyhow, the point I want to make is that the above approach is the way I would do it in the past.

Here’s how I decided to write it this time:

logger = get_logging_singleton()

def foo():
    logger.log_stuff()
    do_foo_stuff()

def bar():
    logger.log_stuff()
    do_bar_stuff()

def baz():
    logger.log_stuff()
    do_baz_stuff()

def quux():
    logger.log_stuff()
    do_quux_stuff()

All the functions access the logger created in the main namespace of the module. It feels a tiny bit wrong, but I think it is the right thing to do. The other way violates DRY in a big fat way.
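For what it’s worth, the standard library blesses this pattern: the usual idiom is one module-level logger, named after the module, grabbed once at the top. A sketch in the same spirit as the code above:

import logging

# One module-level logger, named after the module itself.
logger = logging.getLogger(__name__)

def foo():
    logger.debug("inside foo")
    # ... do the real foo work here ...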

So, a third option would be to require the caller to pass in the logging object in every function call, like this:

def quux(logger):
    logger.log_stuff()
    do_quux_stuff()

This seems like the best possible outcome: it satisfies my hangup about avoiding global variables, and the caller can make decisions about log levels by passing in any particular logger it wants.

There are two reasons why I didn’t take this approach:

  1. I was working on existing code, and I didn’t have the option of cramming extra parameters into the calling library. So, I could do something like def quux(logger=globally_defined_logger), but I’m trying to make this prettier, not uglier. The whole reason I wanted to add logging was to get some visibility into what the heck was going wrong in my app. I didn’t have time to monkey with overhauling the whole system.
  2. I plan to control my logger behavior from an external configuration system. I don’t want to change code inside the caller every time I want to bump the log level up or down. It is the conventional wisdom in my work environment that I face less risk tweaking a configuration file setting and restarting my app than editing my code*.

[*]I suspect that in the final analysis, this belief will be exposed as garbage. But for right now, it seems pretty true that bugs occur more frequently after editing code than after editing config files.

UPDATE: Apparently, I’m not just talking to myself here! Gary Bernhardt linked to this post and added some really interesting points. Also, his link to the post on the origin of the phrase now you have two problems was something I hadn’t heard of before.

Dependency Injection Demystified

I’m building a robot to make breakfast.

def make_breakfast():
    fridge = get_reference_to_fridge()
    eggs = fridge.get_eggs(number_of_eggs=2)
    fried_eggs = fry(eggs, over_easy)
    cabinet = get_reference_to_cabinet()
    plate = cabinet.get_plate()
    add(fried_eggs, plate)

    return plate

This is OK, but I realize my robot needs lots and lots of practice, and I don’t like wasting all my nice eggs and getting all my plates dirty. So, while my robot is still learning how to cook, I want to specify that it uses paper plates and some crappy expired eggs I fished out of the grocery store dumpster.
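You can probably see where this is going: instead of make_breakfast() hunting down its own fridge and cabinet, the caller hands them in, and that swap is the whole trick behind dependency injection. Here is a sketch, reusing the imaginary fry and add helpers from above and making up two cheap supply classes:

class DumpsterEggSupply:
    def get_eggs(self, number_of_eggs):
        return ["crappy expired egg"] * number_of_eggs

class PaperPlateCabinet:
    def get_plate(self):
        return "paper plate"

def make_breakfast(fridge, cabinet):
    # The robot no longer decides where the eggs and plates come
    # from; whoever calls it does.
    eggs = fridge.get_eggs(number_of_eggs=2)
    fried_eggs = fry(eggs, over_easy)
    plate = cabinet.get_plate()
    add(fried_eggs, plate)

    return plate

# While the robot is still practicing, inject the cheap stuff.
make_breakfast(DumpsterEggSupply(), PaperPlateCabinet())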

Continue reading