Need help with data files and setup.py

I’m working on a package that includes some files that are meant to be copied and edited by people using the package.

My project is named “pitz” and it is a bugtracker. Instead of using a config file to set the options for a project, I want to use python files.

When somebody installs pitz, I want to save some .py files somewhere so that when they run my pitz-setup script, I can go find those .py files and copy them into their working directory.

I have two questions:

  1. Do I need to write my setup.py file to specify that the .py files in particular directory need to be treated like data, not code? For example, I don’t want the installer to hide those files inside an egg.
  2. How can I find those .py files later and copy them?

Here’s my setup.py so far:

from setuptools import setup, find_packages
version = '0.1'
setup(name='pitz',
version=version,
description="Python to-do tracker inspired by ditz (ditz.rubyforge.org)",

long_description="""\
ditz (http://ditz.rubyforge.org) is the best distributed ticketing
system that I know of. There's a few things I want to change, so I
started pitz.""",

classifiers=[],
keywords='ditz',
author='Matt Wilson',
author_email='[email protected]',
url='http://tplus1.com',
license='',
packages=find_packages(exclude=['ez_setup', 'examples', 'tests']),

include_package_data=True,
package_dir={'pitz':'pitz'},

data_files=[('share/pitz',
[
'pitz/pitztypes/agilepitz.py.sample',
'pitz/pitztypes/tracpitz.py.sample',
])],

zip_safe=False,
install_requires=[
# 'PyYAML',
# 'sphinx',
# 'nose',
# 'jinja2',
# -*- Extra requirements: -*-
],

# I know about the much fancier entry points, but I prefer this
# solution. Why does everything have to be zany?
scripts = ['scripts/pitz-shell'],

test_suite = 'nose.collector',
)

When I run python setup.py install, I do get those .sample files copied, but they get copied into a folder way inside of my pitz install:

$ cd ~/virtualenvs/scratch/lib/
$ find -type f -name '*.sample'
./python2.6/site-packages/pitz-0.1dev-py2.6.egg/share/pitz/tracpitz.py.sample
./python2.6/site-packages/pitz-0.1dev-py2.6.egg/share/pitz/agilepitz.py.sample

I don’t know how I can write a script to copy those tracpitz.py.sample files out. Maybe I can ask pitz what its version is, and then build a tring and use os.path.join, but that doesn’t look like any fun at all.

So, what should I do instead?

I submitted my proposal for PyOhio 2009

Here it is: Clever uses for metaclasses

SUMMARY

This talk introduces metaclasses and attempts to “defang” them, showing what they are good for, and when they are silly.

Metaclasses can reduce redundancy in code but can be very confusing, so in this talk, I will walk through several examples of how to use metaclasses to solve problems.

I plan to cover (at least) these examples:

  • Add extra methods and attributes to a class after its definition.
  • Verify a class correctly implements a specification.
  • Give a subclass its own class-variables rather than sharing them with the parent class.
  • Implement the basics of an object-relational mapper (ORM).

I published an article in the November 2008 issue of Python Magazine with the same title (Clever Uses for Metaclasses) and I’ll use some code from that article.

I want to give this talk in a friendly, informal manner, so that people that feel intimidated by metaclasses realize that there’s nothing to be scared of.

EXPERTISE LEVEL

This talk is aimed at the intermediate-level programmer, already familiar with object-oriented concepts and is really comfortable with Python.

I do not expect people in the audience to know ANYTHING about metaclasses before this talk.

AREAS OF PYTHON

  • object-oriented programming
  • metaprogramming
  • dynamic language tomfoolery
  • introspection

How to see the to-do list for pitz

pitz is (among other things) a to-do list tracker like trac or bugzilla or version one.

I’m storing the list of stuff to do for pitz in the pitz source code. Here’s how to see the unfinished stuff in pitz.

Get a copy of the code


$ git clone git://github.com/mw44118/pitz.git
Initialized empty Git repository in /home/matt/pitz/.git/
remote: Counting objects: 621, done.
remote: Compressing objects: 100% (604/604), done.
remote: Total 621 (delta 383), reused 0 (delta 0)
Receiving objects: 100% (621/621), 98.52 KiB | 135 KiB/s, done.
Resolving deltas: 100% (383/383), done.

Install it, probably in a virtualenv


$ source ~/virtualenvs/pitz/bin/activate
$ python setup.py develop

Fire up pitz-shell

I have written one command-line tool so far: pitz-shell. Use it to start a python interpreter loaded with any pitz project. Here’s how to start a session for pitz itself:

$ pitz-shell pitz/pitzfiles/project-99c58812-5c1c-4fec-874c-c998933ba88b.yaml
/home/matt/virtualenvs/pitz/lib/python2.6/site-packages/ipython-0.9.1-py2.6.egg/IPython/Magic.py:38: DeprecationWarning: the sets module is deprecated
from sets import Set

pitz-shell imports a bunch of classes and makes an object named p (p stands for project). p has all the information about the project described in the yaml file passed in as an argument to pitz-shell. The __repr__ method on p gives some summarized data:

In [1]: p
Out[1]:

p.todo is a property that just returns a bag of unfinished tasks for the project:

In [4]: p.todo

Out[4]:
You can print any bag to see all the contents of the bag, and p.todo is no different:

In [5]: print(p.todo)
===========
Stuff to do
===========

(23 task entities)
------------------

0: Add support for something like 'ditz grep' (unknown status)
1: Update entities by loading a CSV file (unknown status)
2: Figure out why some tasks are not converting pointers to objects (unknown status)
3: Support intersection, union, and other set operations on bags (unknown status)
4: Demonstrate really simple tasks and priorities workflow (unknown status)
5: Support a .pitz config file with all pitz scripts (unknown status)
6: Add a todo property on project (or maybe bag) (unknown status)
7: Write code to use strings as keys (unknown status)
8: Prompt to save work at the end of an interactive pitz session (unknown status)
9: Make it possible to support a filter like attribute!=value (unknown status)
10: Write code to support sorting by anything (unknown status)
11: Support hooks (unknown status)
12: Write an attributes property on a bag that lists count of each attribute in any entities (unknown status)
13: Allow two bags to be compared for equality by using their entities (unknown status)
14: Make it easy to list each employee's tasks (unknown status)
15: Support a $PITZDIR env var to tell where yaml files live (unknown status)
16: Demonstrate release -< iteration -< user story -< task workflow. (unknown status) 17: Load new entities from a CSV file (unknown status) 18: Support grep on entities (unknown status) 19: write data to yaml in order (unknown status) 20: Support entity subclasses like releases, iterations, user stories, and tasks (unknown status) 21: A bag should dump to a single CSV file (unknown status) 22: Support using substring of name as name (unknown status)

That's how you see the to-do list for pitz!

In a future post, I'll show how to make new tasks and how to update tasks.

I also need to explain how Pitz lets you come up with whatever wacky workflow you want. When you set up a pitz project, you can use the classes I came up with, or subclass Entity into your own weird types. In a future post, I'll show I'm using pitz to model an agile development system using releases, iterations, checkpoints, user stories, tasks, and people.

pitz data model outline

I just finished a whole bunch of documentation on the pitz data model. You can read it here or just read all the stuff I copied below:

There are two classes in pitz: entities and bags. Everything else are subclasses of these two.

Entities

Making them

Every entity is an object like a dictionary. You can make an entity like this:


>>> from pitz import Entity
>>> e = Entity(title="example entity",
... creator="Matt",
... importance="not very")

You can also load an entity from a yaml file, but I’ll explain that later.

You can look up a value for any attribute like this:


>>> e['title']
'example entity'
>>> e.keys() #doctest: +NORMALIZE_WHITESPACE
['name', 'creator', 'importance', 'title', 'modified_time',
'created_time', 'type']
>>> e['type']
'entity'

Viewing them

Entities have a summarized view useful when you want to see a list of entities, and a detailed view that shows all the boring detail:

>>> e.summarized_view
'example entity (entity)'

>>> print(e.detailed_view) #doctest: +SKIP
example entity (entity)
-----------------------

name:
entity-bdd31951-cff0-42a5-92b4-97ef966a6f6f

creator:
Matt

importance:
not very

title:
example entity

modified_time:
2009-04-04 07:47:09.456068

created_time:
2009-04-04 07:47:09.456068

type:
entity

Notice how our entity has some attributes we never set, like name, type, created_time, and modified_time. I make these in the __init__ method of the entity class.

By the way, you can ignore the #doctest: +SKIP comment. That is there so the doctests will skip trying to running this example, which will generate unpredictable values.

Saving and loading them

Entities have an instance method named to_yaml_file and a from_yaml_file classmethod. Here’s how to use them:

>>> outfile = e.to_yaml_file('.') # Writes file to this directory.
>>> e2 = Entity.from_yaml_file(outfile)

Bags

Making them

While entities are based on dictionaries, bags are based on lists. You can give a bag instance a title, which is nice for remembering what it is you want it for. Bags make it easy to organize a bunch of entities.

>>> from pitz import Bag
>>> b = Bag(title="Stuff that is not very important")
>>> b.append(e)

Viewing them

Converting a bag to a string prints the summarized view of all the entities inside:

>>> print(b) #doctest: +SKIP
================================
Stuff that is not very important
================================

1 entity entities
-----------------

0: example entity (entity)

That number 0 can be used to pull out the entity at that position, just like a regular boring old list:

>>> e == b[0]
True

Querying them

Bags have a matches_dict method that accepts a bunch of key-value pairs and then returns a new bag that contains all the entities in the first bag that match all those key-value pairs.

First, I’ll make a few more entities:

>>> e1 = Entity(title="example #1", creator="Matt",
... importance="Really important")
>>> e2 = Entity(title="example #2", creator="Matt",
... importance="not very")

Now I’ll make a new bag that has both of these new entities:

>>> b = Bag('Everything')
>>> b.append(e1)
>>> b.append(e2)
>>> print(b)
==========
Everything
==========

2 entity entities
-----------------

0: example #1 (entity)
1: example #2 (entity)


Here is how to get a new bag with just the entities that have an importance attribute set to “not very”:

>>> not_very_important = b.matches_dict(importance="not very")
>>> len(not_very_important) == 1
True
>>> not_very_important[0] == e2
True

Since matches_dict is the most common method I call on a bag, I made the __call__ method on the Bag class run matches_dict. So that means this works just as well:

>>> not_very_important = b(importance="not very")

Saving and loading them

Bags can send all contained entities to yaml files with to_yaml_files, and bags can load a bunch of entities from yaml files with from_yaml_files.

Right now, there is no way for a bag to save itself to yaml.

The Special Project Bag

After I finished bags and entities, I thought I was done, but then I ran into a few frustrations:

  • When I made a bunch of entities, but didn’t append them all into one bag, then I couldn’t run filters across all of them.
  • At the end of a session, it wasn’t easy for me to make sure that all of the entities got saved out to yaml.
  • I couldn’t figure out an elegant way to store one entity as a value for another entity’s attribute.

So I made a “special” Bag subclass called Project. The idea here is that every entity should be a member of the project bag. Also, every entity should have a reference back to the project.

Using a project is easy. Just pass it in as the first argument when you make an entity. Imagine I want to link some tasks to Matt and some other tasks to Lindsey. First I make a project:

>>> from pitz import Project
>>> weekend_chores = Project(title="Weekend chores")

Now I make the rest of the entities:

>>> matt = Entity(weekend_chores, title="Matt")
>>> lindsey = Entity(weekend_chores, title="Lindsey")
>>> t1 = Entity(weekend_chores, title="Mow the yard", assigned_to=matt)
>>> t2 = Entity(weekend_chores, title="Buy some groceries",
... assigned_to=lindsey)

Now it is easy to get tasks for matt:

>>> chores_for_matt = weekend_chores(assigned_to=matt)
>>> mow_the_yard = chores_for_matt[0]
>>> mow_the_yard['assigned_to'] == matt
True

Pointers

There’s a problem in that last example: when I send this mow_the_yard entity out to a YAML file, what will I store as the value for the “assigned_to” attribute?

In SQL, this is what foreign keys are good for. In my chores table, I would store a reference to a particular row in the people table.

I wanted the same functionality in pitz, so I came up with pointers. This is dry stuff, so here’s an example:

>>> class Chore(Entity):
... pointers = dict(assigned_to='person')
...
>>> class Person(Entity):
... pass
>>> matt = Person(weekend_chores, title="Matt")
>>> lindsey = Person(weekend_chores, title="Lindsey")
>>> ch1 = Chore(weekend_chores, title="Mow the yard", assigned_to=matt)
>>> ch2 = Chore(weekend_chores, title="Buy some groceries",
... assigned_to=lindsey)

Not much is different, but instead of matt, lindsey, and the various chores all being entities, they’re now subclasses. But here’s the advantage of defining pointers on Chore:

>>> ch1['assigned_to'] >>> matt['name'] # doctest: +SKIP
'person-530ad3cc-14f1-491a-bdb6-ed1dd65afe46'
>>> ch1.replace_objects_with_pointers()
>>> ch1['assigned_to'] # doctest: +SKIP
'person-530ad3cc-14f1-491a-bdb6-ed1dd65afe46'

First of all, notice how I printed out the name attribute on matt.

After running the replace_objects_with_pointers method, I don’t have a reference to the matt object. Instead, I have matt’s name now.

Now I can send this data out to a yaml file. And when I load it back in from yaml, I can then reverse this action, and go look up an entity with the same name:

>>> mn = matt.name
>>> matt == weekend_chores.by_name(mn)
True

In practice, I convert all the entities to pointers, then write out the yaml files, then convert all the pointers back into objects automatically.

That’s the end of the data model documentation. I hope that shines enough light so that it is obvious if pitz would be useful to you or not.

I’m working on a separate article where I show some real-world workflows modeled in pitz, but that will be next week’s post.