python contextmanagers, subprocesses, and an ascii spinner

I have a command line script that takes too long to run. I profiled the script and the problem is in some fairly complex third-party code. So instead of fixing that stuff, I’m going to distract users with a pretty ascii spinner.

Run this script to see the spinner in action (you’ll have to kill the process afterward):

$ cat spin_forever.py
# vim: set expandtab ts=4 sw=4 filetype=python:

import sys, time

def draw_ascii_spinner(delay=0.2):
for char in '/-\\\\|': # there should be a backslash in here.
sys.stdout.write(char)
sys.stdout.flush()
time.sleep(delay)
sys.stdout.write('\\\\r') # this should be backslash r.

while "forever":
draw_ascii_spinner()

Next I made a context manager (this is the name of the things you use with the new python “with” keyword) that uses the subprocess module to fork off another process to run the spinner:

$ cat distraction.py
# vim: set expandtab ts=4 sw=4 filetype=python:

import contextlib, subprocess, time

@contextlib.contextmanager
def spinning_distraction():
p = subprocess.Popen(['python', 'spin_forever.py'])
yield
p.terminate()

Finally I added my new context manager to the code that takes so long to run:

def main():

with spinning_distraction():
time.sleep(3) # pretend this is real work here.

if __name__ == '__main__':
main()

If you’re following along at home, you can run python distraction.py and you’ll see the spinner go until the main application finishes sleeping.

The main thing I don’t like is how I’m using an external file named spin_forever.py. I don’t want to worry about the path to where spin_forever.py lives. I would prefer to use some function defined in a module that I can import.

Is this pylint error message valid or bogus?

The vast majority of pythoners use None, but I prefer descriptive strings for default values, like this:
from datetime import datetime, timedelta

def midnight_next_day(initial_time="use today's date"):

if initial_time == "use today's date":
initial_time = datetime.now()

return initial_time.date() + timedelta(days=1)

Pylint doesn’t like it though:

$ pylint -e f.py
No config file found, using default configuration
************* Module f
E: 10:midnight_next_day: Instance of 'str' has no 'date' member (but some types could not be inferred)

I changed from using a string as the default to None, and then pylint
didn’t mind:

$ cat f.py
from datetime import datetime
def midnight_next_day(initial_time=None):

if initial_time is None:
initial_time = datetime.now()

return initial_time.date() + timedelta(days=1)

$ pylint -e f.py
No config file found, using default configuration

Checking where a variable is None is probably much faster than doing string comparison, but I find the string more informative.

Looking for comments. I already posted this to comp.lang.python here and got some good feedback.

Which code layout do you like more?

I can lay out the same function in two ways. Which one is better?

One return statement and a mutable temporary variable


def f(a, b):
temp = g(a)
if b:
temp = h(temp)
return temp

Two return statements and everything uses single assignment


def f(a, b):
if b:
return h(g(a)
else:
return g(a)

I prefer the second approach. Ever since I learned how prolog requires single assignment, I avoid updating variables. I’m not adamant about it, but in general, I try not to. It’s just a style thing though. I’ve met a lot of programmers (older ones, usually) that like only having one return statement in a function. I don’t know why they like that, but I’ve heard it from numerous people.

Some notes on nosetests and coverage

I use Nose to run my automated tests. It can report on code coverage like this:

$ nosetests --quiet --with-coverage --cover-package pitz
Name Stmts Exec Cover Missing
------------------------------------------------------------
pitz 29 29 100%
pitz.bag 108 107 99% 150
pitz.cmdline 50 12 24% 23-54, 62-79, 92-93, 96, 109-114, 119-130
pitz.entity 105 105 100%
pitz.exceptions 1 1 100%
pitz.junkyard 0 0 100%
pitz.junkyard.ditzloader 22 15 68% 31-37, 45-47
pitz.pitzfiles 0 0 100%
pitz.project 52 52 100%
pitz.projecttypes 0 0 100%
pitz.projecttypes.agilepitz 55 54 98% 66
pitz.projecttypes.simplepitz 66 61 92% 84-90
------------------------------------------------------------
TOTAL 488 436 89%
----------------------------------------------------------------------
Ran 69 tests in 6.350s

OK (SKIP=6)

Most of the uncovered code is in the cmdline module, which does a lot of work with the filesystem and starting up IPython, and I’m having trouble writing tests there. You could help, you know 🙂.

I’m keenly aware that running with coverage makes tests much slower. Normally, my pitz tests run in about half a second:


$ nosetests --quiet
----------------------------------------------------------------------
Ran 69 tests in 0.504s

OK (SKIP=6)

Fortunately, I don”t have to rerun all your tests to see what lines are uncovered. I can query the .coverage file created by nose afterward to get details. It is really easy to get coverage for just one module:

$ coverage -r -m pitz/cmdline.py
Name Stmts Exec Cover Missing
--------------------------------------------
pitz/cmdline 50 12 24% 23-54, 62-79, 92-93, 96, 109-114, 119-130

And getting coverage on multiple modules is straightforward, but kind of tedious:

$ coverage -r -m pitz/cmdline.py pitz/__init__.py pitz/entity.py
Name Stmts Exec Cover Missing
---------------------------------------------
pitz/__init__ 29 29 100%
pitz/cmdline 50 12 24% 23-54, 62-79, 92-93, 96, 109-114, 119-130
pitz/entity 105 105 100%
---------------------------------------------
TOTAL 184 146 79%

I don’t know of an elegant solution to do what nosetests does, where it shows me all coverage for a package. Running coverage -r without any module lists dumps out everything, which is never what I want:

$ coverage -r
Name Stmts Exec Cover
--------------------------------------------------------------------------------------------------------------------------------------
/home/matt/checkouts/myfiles/bin/__init__ 0 0 100%
/home/matt/virtualenvs/pitz/lib/python2.6/site-packages/Jinja2-2.1.1-py2.6-linux-i686.egg/jinja2/__init__ 12 11 91%
/home/matt/virtualenvs/pitz/lib/python2.6/site-packages/Jinja2-2.1.1-py2.6-linux-i686.egg/jinja2/bccache 111 36 32%
Traceback (most recent call last):
KeyboardInterrupt

A while back on twitter I posted that I wanted my editor to read my .coverage file and highlight the lines that aren’t covered. I don’t know anything about how to control syntax highlighting, but I run this little deal from within vim to get the same facts:

:! coverage -r -m %

My edit buffer disappears and I see this instead:

Name Stmts Exec Cover Missing
---------------------------------------
cmdline 50 12 24% 23-54, 62-79, 92-93, 96, 109-114, 119-130

Press ENTER or type command to continue

In vim, :! is how you run a command in a shell. Incidentally, it is also possible to pull the results from the command into vim and write data from vim to the command, but that’s not what I want right now.

Vim replaces the % symbol with the file I’m currently editing. Of course, this command only works when I start vim in the same directory as the .coverage file. If I ain’t in that directory, then I have to specify how to find the .coverage file by setting the environmental variable COVERAGE_FILE like this:

:! COVERAGE_FILE=../.coverage coverage -r -m %

Setting it that way means it doesn’t last beyond that one shell. If I want vim to remember the setting, I could set COVERAGE_FILE when I start vim like this:

$ COVERAGE_FILE=../.coverage vi cmdline.py

Or I could export it like this:

$ export COVERAGE_FILE=../.coverage
$ vi cmdline.py

In summary, coverage is a neat tool, but it is silly to think that 100% test coverage guarantees anything. Coverage won’t warn you when your calculator returns 3 for 1 + 1.

Need help with data files and setup.py

I’m working on a package that includes some files that are meant to be copied and edited by people using the package.

My project is named “pitz” and it is a bugtracker. Instead of using a config file to set the options for a project, I want to use python files.

When somebody installs pitz, I want to save some .py files somewhere so that when they run my pitz-setup script, I can go find those .py files and copy them into their working directory.

I have two questions:

  1. Do I need to write my setup.py file to specify that the .py files in particular directory need to be treated like data, not code? For example, I don’t want the installer to hide those files inside an egg.
  2. How can I find those .py files later and copy them?

Here’s my setup.py so far:

from setuptools import setup, find_packages
version = '0.1'
setup(name='pitz',
version=version,
description="Python to-do tracker inspired by ditz (ditz.rubyforge.org)",

long_description="""\
ditz (http://ditz.rubyforge.org) is the best distributed ticketing
system that I know of. There's a few things I want to change, so I
started pitz.""",

classifiers=[],
keywords='ditz',
author='Matt Wilson',
author_email='[email protected]',
url='http://tplus1.com',
license='',
packages=find_packages(exclude=['ez_setup', 'examples', 'tests']),

include_package_data=True,
package_dir={'pitz':'pitz'},

data_files=[('share/pitz',
[
'pitz/pitztypes/agilepitz.py.sample',
'pitz/pitztypes/tracpitz.py.sample',
])],

zip_safe=False,
install_requires=[
# 'PyYAML',
# 'sphinx',
# 'nose',
# 'jinja2',
# -*- Extra requirements: -*-
],

# I know about the much fancier entry points, but I prefer this
# solution. Why does everything have to be zany?
scripts = ['scripts/pitz-shell'],

test_suite = 'nose.collector',
)

When I run python setup.py install, I do get those .sample files copied, but they get copied into a folder way inside of my pitz install:

$ cd ~/virtualenvs/scratch/lib/
$ find -type f -name '*.sample'
./python2.6/site-packages/pitz-0.1dev-py2.6.egg/share/pitz/tracpitz.py.sample
./python2.6/site-packages/pitz-0.1dev-py2.6.egg/share/pitz/agilepitz.py.sample

I don’t know how I can write a script to copy those tracpitz.py.sample files out. Maybe I can ask pitz what its version is, and then build a tring and use os.path.join, but that doesn’t look like any fun at all.

So, what should I do instead?

I submitted my proposal for PyOhio 2009

Here it is: Clever uses for metaclasses

SUMMARY

This talk introduces metaclasses and attempts to “defang” them, showing what they are good for, and when they are silly.

Metaclasses can reduce redundancy in code but can be very confusing, so in this talk, I will walk through several examples of how to use metaclasses to solve problems.

I plan to cover (at least) these examples:

  • Add extra methods and attributes to a class after its definition.
  • Verify a class correctly implements a specification.
  • Give a subclass its own class-variables rather than sharing them with the parent class.
  • Implement the basics of an object-relational mapper (ORM).

I published an article in the November 2008 issue of Python Magazine with the same title (Clever Uses for Metaclasses) and I’ll use some code from that article.

I want to give this talk in a friendly, informal manner, so that people that feel intimidated by metaclasses realize that there’s nothing to be scared of.

EXPERTISE LEVEL

This talk is aimed at the intermediate-level programmer, already familiar with object-oriented concepts and is really comfortable with Python.

I do not expect people in the audience to know ANYTHING about metaclasses before this talk.

AREAS OF PYTHON

  • object-oriented programming
  • metaprogramming
  • dynamic language tomfoolery
  • introspection

How to see the to-do list for pitz

pitz is (among other things) a to-do list tracker like trac or bugzilla or version one.

I’m storing the list of stuff to do for pitz in the pitz source code. Here’s how to see the unfinished stuff in pitz.

Get a copy of the code


$ git clone git://github.com/mw44118/pitz.git
Initialized empty Git repository in /home/matt/pitz/.git/
remote: Counting objects: 621, done.
remote: Compressing objects: 100% (604/604), done.
remote: Total 621 (delta 383), reused 0 (delta 0)
Receiving objects: 100% (621/621), 98.52 KiB | 135 KiB/s, done.
Resolving deltas: 100% (383/383), done.

Install it, probably in a virtualenv


$ source ~/virtualenvs/pitz/bin/activate
$ python setup.py develop

Fire up pitz-shell

I have written one command-line tool so far: pitz-shell. Use it to start a python interpreter loaded with any pitz project. Here’s how to start a session for pitz itself:

$ pitz-shell pitz/pitzfiles/project-99c58812-5c1c-4fec-874c-c998933ba88b.yaml
/home/matt/virtualenvs/pitz/lib/python2.6/site-packages/ipython-0.9.1-py2.6.egg/IPython/Magic.py:38: DeprecationWarning: the sets module is deprecated
from sets import Set

pitz-shell imports a bunch of classes and makes an object named p (p stands for project). p has all the information about the project described in the yaml file passed in as an argument to pitz-shell. The __repr__ method on p gives some summarized data:

In [1]: p
Out[1]:

p.todo is a property that just returns a bag of unfinished tasks for the project:

In [4]: p.todo

Out[4]:
You can print any bag to see all the contents of the bag, and p.todo is no different:

In [5]: print(p.todo)
===========
Stuff to do
===========

(23 task entities)
------------------

0: Add support for something like 'ditz grep' (unknown status)
1: Update entities by loading a CSV file (unknown status)
2: Figure out why some tasks are not converting pointers to objects (unknown status)
3: Support intersection, union, and other set operations on bags (unknown status)
4: Demonstrate really simple tasks and priorities workflow (unknown status)
5: Support a .pitz config file with all pitz scripts (unknown status)
6: Add a todo property on project (or maybe bag) (unknown status)
7: Write code to use strings as keys (unknown status)
8: Prompt to save work at the end of an interactive pitz session (unknown status)
9: Make it possible to support a filter like attribute!=value (unknown status)
10: Write code to support sorting by anything (unknown status)
11: Support hooks (unknown status)
12: Write an attributes property on a bag that lists count of each attribute in any entities (unknown status)
13: Allow two bags to be compared for equality by using their entities (unknown status)
14: Make it easy to list each employee's tasks (unknown status)
15: Support a $PITZDIR env var to tell where yaml files live (unknown status)
16: Demonstrate release -< iteration -< user story -< task workflow. (unknown status) 17: Load new entities from a CSV file (unknown status) 18: Support grep on entities (unknown status) 19: write data to yaml in order (unknown status) 20: Support entity subclasses like releases, iterations, user stories, and tasks (unknown status) 21: A bag should dump to a single CSV file (unknown status) 22: Support using substring of name as name (unknown status)

That's how you see the to-do list for pitz!

In a future post, I'll show how to make new tasks and how to update tasks.

I also need to explain how Pitz lets you come up with whatever wacky workflow you want. When you set up a pitz project, you can use the classes I came up with, or subclass Entity into your own weird types. In a future post, I'll show I'm using pitz to model an agile development system using releases, iterations, checkpoints, user stories, tasks, and people.

pitz data model outline

I just finished a whole bunch of documentation on the pitz data model. You can read it here or just read all the stuff I copied below:

There are two classes in pitz: entities and bags. Everything else are subclasses of these two.

Entities

Making them

Every entity is an object like a dictionary. You can make an entity like this:


>>> from pitz import Entity
>>> e = Entity(title="example entity",
... creator="Matt",
... importance="not very")

You can also load an entity from a yaml file, but I’ll explain that later.

You can look up a value for any attribute like this:


>>> e['title']
'example entity'
>>> e.keys() #doctest: +NORMALIZE_WHITESPACE
['name', 'creator', 'importance', 'title', 'modified_time',
'created_time', 'type']
>>> e['type']
'entity'

Viewing them

Entities have a summarized view useful when you want to see a list of entities, and a detailed view that shows all the boring detail:

>>> e.summarized_view
'example entity (entity)'

>>> print(e.detailed_view) #doctest: +SKIP
example entity (entity)
-----------------------

name:
entity-bdd31951-cff0-42a5-92b4-97ef966a6f6f

creator:
Matt

importance:
not very

title:
example entity

modified_time:
2009-04-04 07:47:09.456068

created_time:
2009-04-04 07:47:09.456068

type:
entity

Notice how our entity has some attributes we never set, like name, type, created_time, and modified_time. I make these in the __init__ method of the entity class.

By the way, you can ignore the #doctest: +SKIP comment. That is there so the doctests will skip trying to running this example, which will generate unpredictable values.

Saving and loading them

Entities have an instance method named to_yaml_file and a from_yaml_file classmethod. Here’s how to use them:

>>> outfile = e.to_yaml_file('.') # Writes file to this directory.
>>> e2 = Entity.from_yaml_file(outfile)

Bags

Making them

While entities are based on dictionaries, bags are based on lists. You can give a bag instance a title, which is nice for remembering what it is you want it for. Bags make it easy to organize a bunch of entities.

>>> from pitz import Bag
>>> b = Bag(title="Stuff that is not very important")
>>> b.append(e)

Viewing them

Converting a bag to a string prints the summarized view of all the entities inside:

>>> print(b) #doctest: +SKIP
================================
Stuff that is not very important
================================

1 entity entities
-----------------

0: example entity (entity)

That number 0 can be used to pull out the entity at that position, just like a regular boring old list:

>>> e == b[0]
True

Querying them

Bags have a matches_dict method that accepts a bunch of key-value pairs and then returns a new bag that contains all the entities in the first bag that match all those key-value pairs.

First, I’ll make a few more entities:

>>> e1 = Entity(title="example #1", creator="Matt",
... importance="Really important")
>>> e2 = Entity(title="example #2", creator="Matt",
... importance="not very")

Now I’ll make a new bag that has both of these new entities:

>>> b = Bag('Everything')
>>> b.append(e1)
>>> b.append(e2)
>>> print(b)
==========
Everything
==========

2 entity entities
-----------------

0: example #1 (entity)
1: example #2 (entity)


Here is how to get a new bag with just the entities that have an importance attribute set to “not very”:

>>> not_very_important = b.matches_dict(importance="not very")
>>> len(not_very_important) == 1
True
>>> not_very_important[0] == e2
True

Since matches_dict is the most common method I call on a bag, I made the __call__ method on the Bag class run matches_dict. So that means this works just as well:

>>> not_very_important = b(importance="not very")

Saving and loading them

Bags can send all contained entities to yaml files with to_yaml_files, and bags can load a bunch of entities from yaml files with from_yaml_files.

Right now, there is no way for a bag to save itself to yaml.

The Special Project Bag

After I finished bags and entities, I thought I was done, but then I ran into a few frustrations:

  • When I made a bunch of entities, but didn’t append them all into one bag, then I couldn’t run filters across all of them.
  • At the end of a session, it wasn’t easy for me to make sure that all of the entities got saved out to yaml.
  • I couldn’t figure out an elegant way to store one entity as a value for another entity’s attribute.

So I made a “special” Bag subclass called Project. The idea here is that every entity should be a member of the project bag. Also, every entity should have a reference back to the project.

Using a project is easy. Just pass it in as the first argument when you make an entity. Imagine I want to link some tasks to Matt and some other tasks to Lindsey. First I make a project:

>>> from pitz import Project
>>> weekend_chores = Project(title="Weekend chores")

Now I make the rest of the entities:

>>> matt = Entity(weekend_chores, title="Matt")
>>> lindsey = Entity(weekend_chores, title="Lindsey")
>>> t1 = Entity(weekend_chores, title="Mow the yard", assigned_to=matt)
>>> t2 = Entity(weekend_chores, title="Buy some groceries",
... assigned_to=lindsey)

Now it is easy to get tasks for matt:

>>> chores_for_matt = weekend_chores(assigned_to=matt)
>>> mow_the_yard = chores_for_matt[0]
>>> mow_the_yard['assigned_to'] == matt
True

Pointers

There’s a problem in that last example: when I send this mow_the_yard entity out to a YAML file, what will I store as the value for the “assigned_to” attribute?

In SQL, this is what foreign keys are good for. In my chores table, I would store a reference to a particular row in the people table.

I wanted the same functionality in pitz, so I came up with pointers. This is dry stuff, so here’s an example:

>>> class Chore(Entity):
... pointers = dict(assigned_to='person')
...
>>> class Person(Entity):
... pass
>>> matt = Person(weekend_chores, title="Matt")
>>> lindsey = Person(weekend_chores, title="Lindsey")
>>> ch1 = Chore(weekend_chores, title="Mow the yard", assigned_to=matt)
>>> ch2 = Chore(weekend_chores, title="Buy some groceries",
... assigned_to=lindsey)

Not much is different, but instead of matt, lindsey, and the various chores all being entities, they’re now subclasses. But here’s the advantage of defining pointers on Chore:

>>> ch1['assigned_to'] >>> matt['name'] # doctest: +SKIP
'person-530ad3cc-14f1-491a-bdb6-ed1dd65afe46'
>>> ch1.replace_objects_with_pointers()
>>> ch1['assigned_to'] # doctest: +SKIP
'person-530ad3cc-14f1-491a-bdb6-ed1dd65afe46'

First of all, notice how I printed out the name attribute on matt.

After running the replace_objects_with_pointers method, I don’t have a reference to the matt object. Instead, I have matt’s name now.

Now I can send this data out to a yaml file. And when I load it back in from yaml, I can then reverse this action, and go look up an entity with the same name:

>>> mn = matt.name
>>> matt == weekend_chores.by_name(mn)
True

In practice, I convert all the entities to pointers, then write out the yaml files, then convert all the pointers back into objects automatically.

That’s the end of the data model documentation. I hope that shines enough light so that it is obvious if pitz would be useful to you or not.

I’m working on a separate article where I show some real-world workflows modeled in pitz, but that will be next week’s post.