A worse blogging system

I’ve been daydreaming about this for a while. I took some time to write out my thoughts. They’re still half-baked.

Blogs and RSS feeds are pretty good. I don’t have to manually go to sites. My reader polls the sites I subscribe to and it pulls the feeds. But the situation could be a lot better.

Problems with blogging from the reader’s POV

Feed readers don’t work all that well offline. Sure, maybe the RSS feed itself is downloaded, but the images probably won’t be.

Also, polling is kind of goofy. It would be nicer to use some kind of pub-sub framework where I get notified.

RSS feeds usually only store recent stories.

Very often I find a great blog that has dozens of stories. I would love to be able to download the entire blog for offline viewing.

What about Google Gears?

Yeah, what about it? I know of one single blog that actually uses it in this context. I would like to think there is a solution to this problem that doesn’t require building C++ extensions to the browser.

Problems from the writer’s POV

This section is based on my experiences with WordPress and Blogger. Obviously, publishing content on a remote site requires an internet connection to that remote site, but there is no real reason that I should need an internet connection to preview the rendering of my content.

Also, there’s no obvious way I can integrate my source control tools with my blog engine.

Several times I’ve started an article on my laptop, uploaded it as a draft to my server, worked on it there, then lost my internet connection and gone back to an out-of-date draft on my laptop to keep working.

I can write an article much more quickly using simplified markup and I can be pretty certain that it will render into valid HTML. There are a few plugins for WordPress that support writing with Markdown, but they require using the WordPress text editor. Sure, I could copy and paste from my real editor, but that’s less than ideal.

The idea

Take these ingredients:

  • Any decentralized source control system.
  • Any simplified markup language, like reStructuredText, Markdown, or Textile.
  • Any tool to make pretty HTML out of that markup language.

And optionally:

  • A new tool to build lots of index files and RSS feeds.
  • A new tool to notify interested parties that something new is ready, by email, jabber, pingback, etc.

Here’s a simple example:

  1. I write a text file using reStructuredText.
  2. I use a local git repo to track revisions.
  3. I use a local tool to render my text file into HTML and make sure I’m happy with the look. Git is set to ignore these HTML files.
  4. When I’m done, I use git to push my work to a remote repository on a box with a webserver.
  5. That repository has some code that fires whenever it receives a new push:
    • It runs the exact same HTML rendering programs I used locally.
    • It builds a new RSS feed.
    • It rebuilds any internal indexes, tables of contents, or whatever else is appropriate.
    • It interacts with whatever pub-sub crap is useful so other people learn about the new content.

On the remote git repository, all the rendered HTML, RSS, etc. would be available for cloning, and the webserver would still let people read my blog the old-fashioned way.
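To make step 5 a little more concrete, here is a minimal Python sketch of the publishing code. It assumes something like git's post-receive hook calls it after each push. The `Post` class, the `publish` and `build_rss` functions, the `feed.xml` file name, and the `render` callback are all made up for illustration; `render` stands in for whatever markup-to-HTML tool you already run locally.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Post:
    slug: str    # hypothetical: the source file name without its extension
    title: str
    source: str  # the raw markup text


def build_rss(site_url, site_title, posts):
    """Build a bare-bones RSS 2.0 document listing every post."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = site_title
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post.title
        ET.SubElement(item, "link").text = f"{site_url}/{post.slug}.html"
    return ET.tostring(rss, encoding="unicode")


def publish(site_url, site_title, posts, render):
    """Return {file name: file content} for everything the push should produce.

    `render` is the same markup-to-HTML function used for local previews.
    """
    files = {f"{post.slug}.html": render(post.source) for post in posts}
    files["feed.xml"] = build_rss(site_url, site_title, posts)
    return files
```

Because `render` is passed in, the local preview and the server-side publish can share exactly the same rendering code, which is the point of step 5.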

WordPress has other features like being able to navigate through archives, or select stories by tags, or send updates to twitter, etc. I think all of these could be solved somehow during the publishing phase.

For example, navigation through archives doesn’t really require any scripting. I just need to generate indexes for every date range.
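Here is a rough Python sketch of what "generate indexes for every date range" could look like if the ranges are calendar months. The function name, the `(date, title, url)` tuples, and the `archive/YYYY/MM/index.html` layout are my own assumptions, not anything WordPress does.

```python
import html
from collections import defaultdict
from datetime import date


def build_archive_indexes(posts):
    """posts: iterable of (published_date, title, url) tuples.

    Returns {relative path: HTML} with one static index page per month.
    """
    by_month = defaultdict(list)
    for published, title, url in posts:
        by_month[(published.year, published.month)].append((published, title, url))

    pages = {}
    for (year, month), entries in by_month.items():
        entries.sort(reverse=True)  # newest post first
        links = "\n".join(
            f'<li><a href="{html.escape(url)}">{html.escape(title)}</a></li>'
            for _, title, url in entries
        )
        pages[f"archive/{year:04d}/{month:02d}/index.html"] = f"<ul>\n{links}\n</ul>"
    return pages
```

The publishing step would write these pages to disk, and the webserver would serve them as plain files.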

Tag-based navigation also doesn’t really require running:

SELECT POSTS.*
FROM POSTS, POST_TAGS, TAGS
WHERE POSTS.ID = POST_TAGS.POST_ID
AND POST_TAGS.TAG_ID = TAGS.ID
AND TAGS.NAME = 'some inoffensive tag name';

It would be sufficient to just regenerate indexes for every tag after each post during the publishing phase.
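As a sketch of that idea in Python: the publishing step can invert the post-to-tags relationship once and write one static page per tag. The function names, the `(title, url, tags)` tuples, and the `tags/<slug>/index.html` layout are assumptions for illustration.

```python
import html
import re
from collections import defaultdict


def slugify(tag):
    """Turn a tag name into something safe to use as a directory name."""
    return re.sub(r"[^a-z0-9]+", "-", tag.lower()).strip("-")


def build_tag_indexes(posts):
    """posts: iterable of (title, url, tags) tuples.

    Returns {relative path: HTML} with one static index page per tag.
    """
    by_tag = defaultdict(list)
    for title, url, tags in posts:
        for tag in tags:
            by_tag[tag].append((title, url))

    pages = {}
    for tag, entries in by_tag.items():
        links = "\n".join(
            f'<li><a href="{html.escape(url)}">{html.escape(title)}</a></li>'
            for title, url in entries
        )
        pages[f"tags/{slugify(tag)}/index.html"] = f"<ul>\n{links}\n</ul>"
    return pages
```

This does the same join as the SQL, but once per publish instead of once per page view.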

What about comments?

WordPress allows visitors to post comments on a blog, and it does a pretty good job filtering out spammers with the Akismet plugin. I see two solutions; one is straightforward and mediocre and one is preposterous.

The straightforward solution is to use a service like Disqus to track comments on an external server.

The rendered HTML pages would include a blob of javascript. That javascript makes a request to pull all the comments for this URL to the site, and then it appends the text to the DOM. Of course, people that download the material for offline viewing won’t see the comments when they don’t have an internet connection.

Sure, it would be possible to regularly scrape the comments out of the remote server and rebuild all the files available for offline viewing, but that only solves the reading part.

Copyright issues with comments

Imagine I write a blog post with a mediocre code sample inside, and you think of a better way to write the same code.

You start writing a comment on my site (or on my Disqus section, it doesn’t matter) and you’re about to submit, when you see a little line that says all comments become my copyright, and you know you want to use this code in some GPL project.

Maybe you don’t see any lines at all that explain who owns blog comments, so then you’re uncertain about what applies.

Anyhow, there’s a deadweight loss here. You have something to say that would help me out, but you won’t say it. If I knew what you were going to say, I’d make a special exception just for this one comment.

By the way, if you want me to change my license so I don’t own the comments, then I’m faced with a bad situation where somebody can post a comment, and then demand later that I take it down. This is a serious problem for “real” sites. Look at the terms of service on reddit. It insists on a perpetual non-exclusive right to any content posted there.

The ridiculous solution

Just like it will be possible to clone my blog text, commenters should have their own repository where I can clone their comments.

So, when Lindsey comments on my (Matt’s) site, she really writes a post on her own site, and then sends my site a message that says:

Hi Matt,

I read your blog post [1] and I wrote a comment here on my site [2].

You can show my comment on your site as long as you agree with my comment license [3].

[1] http://matt.example.com/why-rinsing-is-as-good-as-washing

[2] http://lindsey.example.com/soap-is-not-optional

[3] http://lindsey.example.com/comment-license

Lindsey

This message could be an email, an HTTP post, whatever. I could manually process this message, or I could set up some handler that figures out what to do based on some rules ahead of time.
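Here is one way the "rules ahead of time" handler might look in Python. It is only a sketch: the `CommentNotice` fields mirror the three links in Lindsey's message, and the `decide` function, its three outcomes, and the idea of a blocked-hosts list are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CommentNotice:
    post_url: str     # [1] the post being commented on
    comment_url: str  # [2] where the comment actually lives
    license_url: str  # [3] the commenter's license


def decide(notice, my_host, accepted_licenses, blocked_hosts=frozenset()):
    """Return "pull", "review", or "reject" for an incoming notice."""
    # Ignore notices about posts that are not on my site.
    if urlparse(notice.post_url).hostname != my_host:
        return "reject"
    # Ignore commenters I have already decided not to listen to.
    if urlparse(notice.comment_url).hostname in blocked_hosts:
        return "reject"
    # A license I have already agreed to: clone the comment automatically.
    if notice.license_url in accepted_licenses:
        return "pull"
    # Anything else waits for me to read the license by hand.
    return "review"
```

The handler never receives the comment text itself. It only decides whether to go and fetch it.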

So, we’ve changed the flow of comments from lots of people pushing text to me to a system where they just send me notifications and if I want to pull them, then I can.

This system allows more offline work to be done. Lindsey can clone my site and read it. Then she can write a comment. The next time she has an internet connection, she publishes her comment to her site, which triggers the message to be sent to my site.

Conversation hubs

So, pretend that I don’t show Lindsey’s comment on my site because I think her point makes me look stupid. Now how do third-parties get to see her remarks?

Well, this is a solution that is better than the status quo. Imagine that when Lindsey sent me a message about her comment, she also sent a similar message to another server called a conversation hub.

She tells that hub that her post http://lindsey.example.com/soap-is-not-optional is a response to my post http://matt.example.com/why-rinsing-is-as-good-as-washing.

When somebody clones a feed from my site, they can also check a few of these conversation hubs and optionally clone any posts that have indicated they are relevant to that post.
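At its simplest, a conversation hub is just a lookup table from a post to the posts that claim to respond to it. Here is a toy in-memory version in Python; the class and method names are invented, and a real hub would need storage and some kind of network interface on top.

```python
from collections import defaultdict


class ConversationHub:
    """Remembers which posts say they are responses to which other posts."""

    def __init__(self):
        self._responses = defaultdict(list)

    def register(self, response_url, target_url):
        """Record that response_url is a response to target_url."""
        if response_url not in self._responses[target_url]:
            self._responses[target_url].append(response_url)

    def responses_to(self, target_url):
        """List every registered response to target_url, oldest first."""
        return list(self._responses.get(target_url, []))
```

A reader's tool would call something like `responses_to` for each post it clones and then decide which of those responses to fetch.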

We’d need better tools to assemble a conversation thread from all the different pieces. But that’s not really that hard.
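For instance, if a reader's tool has collected a list of "this post responds to that post" pairs, stitching them into a nested thread is a short recursive function. This Python sketch assumes the pairs are plain `(response_url, target_url)` tuples; the function name and the output shape are made up.

```python
from collections import defaultdict


def build_thread(root_url, links):
    """links: iterable of (response_url, target_url) pairs.

    Returns a nested dict: {"url": ..., "replies": [ ...same shape... ]}.
    """
    children = defaultdict(list)
    for response, target in links:
        children[target].append(response)

    def walk(url, ancestors):
        path = ancestors | {url}
        # Skip anything already on the path so a reply loop cannot recurse forever.
        replies = [walk(r, path) for r in children.get(url, []) if r not in path]
        return {"url": url, "replies": replies}

    return walk(root_url, frozenset())
```

The hard part is probably not the assembly but deciding which hubs to trust for the pairs in the first place.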

What about spamming the conversation hub?

A spammer could just send messages to the conversation hubs linking their posts to everything out there.

Well, the conversation hubs could insist on real authentication, and then allow feedback from people. Also, people that check for comments at a hub can request to only see comments that have received aggregate positive feedback.
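A sketch of that filter in Python, assuming the hub stores feedback as `(comment_url, voter_id, value)` records from authenticated voters. The function name, the one-vote-per-voter rule, and the `min_score` threshold are my own assumptions.

```python
from collections import defaultdict


def visible_comments(comment_urls, votes, min_score=1):
    """Keep only comments whose net feedback is at least min_score.

    votes: iterable of (comment_url, voter_id, value) where value is
    positive for an upvote and negative for a downvote. Each voter counts
    once per comment; a later vote replaces an earlier one.
    """
    latest = {}
    for url, voter, value in votes:
        latest[(url, voter)] = 1 if value > 0 else -1

    scores = defaultdict(int)
    for (url, _voter), value in latest.items():
        scores[url] += value

    return [url for url in comment_urls if scores[url] >= min_score]
```

Requiring authentication is what makes the one-vote-per-voter rule meaningful. Without it, a spammer could upvote their own links as often as they liked.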

What about Adsense?

Well, if I switch to this approach, and people start downloading my text files to read offline, they ain’t gonna see my adsense ads, and I’ll be deprived of my $15/year revenue.

But for people that actually make real money off adsense, the question is valid. Remember that we’re talking about helping people read your site offline. Those people that are mostly offline aren’t seeing the site now anyway.

The online visitors can still see them though. Also, people that view the HTML files after cloning my publish node may still see them if they have a working internet connection and they allow the embedded javascript to run.

Sure, there’s a risk that some online viewers will switch to the offline-views and then turn off javascript or their internet connection so that they can’t see the ads.

Publishers would need to weigh this risk. Maybe the solution could be to sell offline copies at a price equal to the expected lost revenue from the switchers.
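As back-of-the-envelope arithmetic, that price is just the ad revenue one reader would have generated. A tiny Python sketch, with a made-up function name and purely hypothetical numbers in the example afterwards:

```python
def break_even_price(pageviews_per_reader, revenue_per_1000_views):
    """Ad revenue lost when one online reader switches to an offline copy."""
    return pageviews_per_reader * revenue_per_1000_views / 1000
```

If a typical reader loads 200 pages a year and the ads earn $2.50 per thousand views, `break_even_price(200, 2.50)` comes to about fifty cents per offline copy per year.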

What about SEO?

It’s a non-issue. The HTML is available online just like it always was.

I’m so worked up over this bailout I’m participating in democracy

I just finished using a form on George Voinovich’s site to let him know my thoughts on this banking crisis.

I’m not adamantly opposed to the bailout in theory. I get the idea that some market activities have external consequences. But I also get that this administration always says “trust me!” right before shit gets really, really bad. If we’re going to do a bailout, let’s do it in a boring and well-thought-out way. I want to make sure that this bailout buys us enough safeguards and regulations so that we’re never faced with this crap again.

The villains on k5 have a pretty good discussion about this bailout. I like this comment:

Just about the only way that it would cost 700 billion to get with two chicks is if one was Natalie Portman and the other one was a clone of Natalie Portman. Even cloning a human probably wouldn’t get you particularly close to 700 billion but you might be in the same ballpark.

Ha ha.

Anyhow, I also went to Sherrod Brown’s website and read his statements from today’s hearing and I really like his angle. I’m not too worried about letting him know how I feel since he’s already there.

I also liked how Sherrod Brown has RSS feeds for his site, and a pretty nice looking color scheme. Maybe that’s because he just got there.

UPDATE

Another fine Ohio politician, Marcy Kaptur, is also on the right side of this:

The next season of Bizarre Foods starts Tuesday

Sometimes I wonder if Andrew Zimmern just wants a nice tame meal, but since he’s the star of Bizarre Foods, everywhere he goes, he’s imprisoned by it. Like maybe somebody invites him to a dinner party and everyone is eating spaghetti, but when the host delivers his bowl, he sees it has a bunch of crickets on it.

I think his neighbors probably dump all their rotten fruit at his house. People probably call him to ask if he wants to drink their expired milk.

Anyhow, this video has scenes from next season.


A bunch of random stuff

I’m going to do my “decorators are fun!” talk at PyWorks, bright and early Thursday morning, November 12th.

In completely unrelated news, my neighbor called the county health department because they saw a rat, and the inspector says we have a bunch of burrows in the overgrown, woodsy part of our yard, way in the back. This might be just the argument I need to convince my wife to let me buy a blunderbuss so I can hunt the little monsters down.

Last week I switched to git from svn + bzr and so far, it’s gone really well. I’ve gotten fantastic help from the people in #git on irc.freenode.org. I love being able to pull and push code across branches without really being careful. In subversion, I always needed to count revisions exactly, and make sure I never repeated, or else I’d end up with a big mess. And git is really fast, too. Even checking the status for my code tree seems like it goes faster than with svn, but maybe I’m imagining it.

I’ve been writing triggers and stored procedures in PostgreSQL recently, using both plpgsql and plpythonu. Some tasks were vastly simpler to write in plpythonu and others were easier in plpgsql. I’m working on a real post that describes what I like about each language.

I finished my second week of teaching. I’m really happy that my students are interested in learning more than just HTML and CSS — they want to learn how to process form data, rather than just design pretty forms. So next week, I think we’re going to get into PHP basics, with the ultimate goal being teaching them how to use CMS stuff like WordPress and Joomla for bigger projects.

Finally, it looks like that other Matt Wilson turned up in Berkeley, CA. I’m glad he’s safe.