A worse blogging system

I’ve been daydreaming about this for a while. I took some time to write out my thoughts. They’re still half-baked.

Blogs and RSS feeds are pretty good. I don’t have to manually visit sites; my reader polls the sites I subscribe to and pulls down their feeds. But the situation could be a lot better.

Problems with blogging from the reader’s POV

Feed readers don’t work all that well offline. Sure, maybe the RSS feed itself is downloaded, but the images likely won’t be pulled down with it.

Also, polling is kind of goofy. It would be nicer to use some kind of pub-sub framework where I get notified.

RSS feeds usually only include the most recent stories.

Very often I find a great blog that has dozens of stories. I would love to be able to download the entire blog for offline viewing.

What about Google Gears?

Yeah, what about it? I know of exactly one blog that actually uses it in this context. I would like to think there is a solution to this problem that doesn’t require building C++ extensions to the browser.

Problems from the writer’s POV

This section is based on my experiences with WordPress and Blogger. Obviously, publishing content on a remote site requires an internet connection to that remote site, but there is no real reason I should need an internet connection just to preview how my content will render.

Also, there’s no obvious way I can integrate my source control tools with my blog engine.

Several times I have started an article on my laptop, uploaded it as a draft to my server, worked on it there, lost my internet connection, and then gone back to an out-of-date draft on my laptop to continue working.

I can write an article much more quickly using simplified markup, and I can be pretty certain that it will render into valid HTML. There are a few plugins for WordPress that support writing with markdown, but they require using the WordPress text editor. Sure, I could copy and paste from my real editor, but that’s less than ideal.

The idea

Take these ingredients:

  • Any decentralized source control system.
  • Any simplified markup language, like reStructuredText, markdown, or textile.
  • Any tool to make pretty HTML out of that markup language.

And optionally:

  • A new tool to build lots of index files and RSS feeds.
  • A new tool to notify interested parties that something new is ready, by email, jabber, pingback, etc.

Here’s a simple example:

  1. I write a text file using reStructuredText.
  2. I use a local git repo to track revisions.
  3. I use a local tool to render my text file into HTML and make sure I’m happy with the look. Git is set to ignore these HTML files.
  4. When I’m done, I use git to push my work to a remote repository on a box with a webserver.
  5. That repository has some code that fires whenever it receives a new push (see the sketch after this list):
    • It runs the exact same HTML rendering programs I used locally.
    • It builds a new RSS feed.
    • It rebuilds any internal indexes, tables of contents, or whatever else is appropriate.
    • It interacts with whatever pub-sub crap is useful so other people learn about the new content.
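
Here’s a minimal sketch of that server-side code, as a post-receive hook written in Python. The directory layout and helper comments are my assumptions for the example; only the docutils call is a real API:

#!/usr/bin/env python
# Sketch of a git post-receive hook. Paths are hypothetical.
import glob
import os
from docutils.core import publish_file

POSTS_DIR = '/srv/blog/posts'    # assumed: checked-out work tree
HTML_DIR = '/srv/blog/public'    # assumed: webserver document root

# Render every reStructuredText post to a standalone HTML page,
# using the same docutils renderer I run locally.
for source in glob.glob(os.path.join(POSTS_DIR, '*.rst')):
    name = os.path.splitext(os.path.basename(source))[0]
    publish_file(source_path=source,
                 destination_path=os.path.join(HTML_DIR, name + '.html'),
                 writer_name='html')

# Rebuilding the RSS feed, regenerating indexes, and firing
# pub-sub notifications would follow here.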

On the remote git repository, all the rendered HTML, RSS, etc. would be available for cloning, and the webserver would support people reading my blog the old-fashioned way.

WordPress has other features like being able to navigate through archives, or select stories by tags, or send updates to twitter, etc. I think all of these could be handled during the publishing phase.

For example, navigation through archives doesn’t really require any scripting. I just need to generate indexes for every date range.
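
Here’s what that generation might look like, assuming each post file is named with a YYYY-MM-DD date prefix (my convention, just for the sketch):

import collections
import glob
import os

# Group posts by year and month, read off the filename prefix.
archives = collections.defaultdict(list)
for path in glob.glob('posts/*.rst'):
    year_month = os.path.basename(path)[:7]    # e.g. '2008-06'
    archives[year_month].append(path)

# One static index page per month; nothing runs at read time.
for year_month, paths in sorted(archives.items()):
    with open('public/archive-%s.html' % year_month, 'w') as index:
        index.write('<ul>\n')
        for path in sorted(paths):
            name = os.path.splitext(os.path.basename(path))[0]
            index.write('<li><a href="%s.html">%s</a></li>\n' % (name, name))
        index.write('</ul>\n')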

Tag-based navigation also doesn’t really require running:

SELECT POSTS.*
FROM POSTS, POST_TAGS, TAGS
WHERE POSTS.ID = POST_TAGS.POST_ID
AND POST_TAGS.TAG_ID = TAGS.ID
AND TAGS.NAME = 'some inoffensive tag name';

It would be sufficient to just regenerate indexes for every tag after each post during the publishing phase.
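
A sketch of that regeneration, with the tag data hard-coded for illustration (in practice it would be parsed out of each post’s metadata at publish time):

import collections

# Hypothetical post -> tags mapping; really this comes from post metadata.
post_tags = {
    'why-rinsing-is-as-good-as-washing': ['hygiene', 'laziness'],
    'soap-is-not-optional': ['hygiene'],
}

# Invert the mapping: tag -> posts.
posts_by_tag = collections.defaultdict(list)
for post, tags in post_tags.items():
    for tag in tags:
        posts_by_tag[tag].append(post)

# One static index page per tag replaces the SQL query above.
for tag, posts in posts_by_tag.items():
    with open('public/tag-%s.html' % tag, 'w') as index:
        for post in sorted(posts):
            index.write('<a href="%s.html">%s</a><br>\n' % (post, post))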

What about comments?

WordPress allows visitors to post comments on a blog, and it does a pretty good job filtering out spammers with the Akismet plugin. I see two solutions; one is straightforward and mediocre and one is preposterous.

The straightforward solution is to use a service like Disqus to track comments on an external server.

The rendered HTML pages would include a blob of JavaScript. That JavaScript requests all the comments for the page’s URL from the comment service and appends them to the DOM. Of course, people who download the material for offline viewing won’t see the comments when they don’t have an internet connection.

Sure, it would be possible to regularly scrape the comments out of the remote server and rebuild all the files available for offline viewing, but that only solves the reading part.

Copyright issues with comments

Imagine I write a blog post with a mediocre code sample inside, and you think of a better way to write the same code.

You start writing a comment on my site (or on my Disqus section, it doesn’t matter) and you’re about to submit, when you see a little line that says all comments become my copyright, and you know you want to use this code in some GPL project.

Maybe you don’t see any lines at all that explain who owns blog comments, so then you’re uncertain about what applies.

Anyhow, there’s a deadweight loss here. You have something to say that would help me out, but you won’t say it. If I knew what you were going to say, I’d make a special exception just for this one comment.

By the way, if you want me to change my license so I don’t own the comments, then I’m faced with a bad situation where somebody can post a comment and then demand later that I take it down. This is a serious problem for “real” sites. Look at the terms of service on reddit: they insist on a perpetual, non-exclusive right to any content posted there.

The ridiculous solution

Just as it will be possible to clone my blog text, commenters should have their own repositories from which I can clone their comments.

So, when Lindsey comments on my (Matt’s) site, she really writes a post on her own site, and then sends my site a message that says:

Hi Matt,

I read your blog post [1] and I wrote a comment here on my site [2].

You can show my comment on your site as long as you agree with my comment license [3].

[1] http://matt.example.com/why-rinsing-is-as-good-as-washing

[2] http://lindsey.example.com/soap-is-not-optional

[3] http://lindsey.example.com/comment-license

Lindsey

This message could be an email, an HTTP post, whatever. I could manually process this message, or I could set up some handler that figures out what to do based on rules I set up ahead of time.
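
As an HTTP POST, the message might look like the sketch below. The /comment-inbox endpoint and the field names are inventions of mine; nothing here is an established protocol:

import json
import urllib.request

# Lindsey's message from above, restated as a JSON payload.
notification = {
    'source': 'http://lindsey.example.com/soap-is-not-optional',
    'target': 'http://matt.example.com/why-rinsing-is-as-good-as-washing',
    'license': 'http://lindsey.example.com/comment-license',
}

request = urllib.request.Request(
    'http://matt.example.com/comment-inbox',    # assumed endpoint
    data=json.dumps(notification).encode('utf-8'),
    headers={'Content-Type': 'application/json'})
urllib.request.urlopen(request)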

So, we’ve changed the flow of comments: instead of lots of people pushing text to me, they just send me notifications, and if I want to pull their comments, I can.

This system allows more offline work to be done. Lindsey can clone my site and read it. Then she can write a comment. The next time she has an internet connection, she publishes her comment to her site, which triggers the message to be sent to my site.

Conversation hubs

So, pretend that I don’t show Lindsey’s comment on my site because I think her point makes me look stupid. Now how do third-parties get to see her remarks?

Well, here’s where this scheme beats the status quo. Imagine that when Lindsey sent me a message about her comment, she also sent a similar message to another server called a conversation hub.

She tells that hub that her post http://lindsey.example.com/soap-is-not-optional is a response to my post http://matt.example.com/why-rinsing-is-as-good-as-washing.

When somebody clones a feed from my site, they can also check a few of these conversation hubs and optionally clone any posts registered there as responses to the posts in the feed.

We’d need better tools to assemble a conversation thread from all the different pieces. But that’s not really that hard.
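
Here’s a sketch of such a tool, assuming each hub answers a query with a JSON list of responses; the hub API is imaginary:

import json
import urllib.parse
import urllib.request

def fetch_responses(hub, target):
    """Ask one conversation hub for posts registered as responses to target."""
    query = urllib.parse.urlencode({'target': target})
    with urllib.request.urlopen('%s/responses?%s' % (hub, query)) as reply:
        return json.load(reply)    # assumed: list of {'source': ..., 'license': ...}

def assemble_thread(post_url, hubs):
    """Merge responses from several hubs, deduplicated by source URL."""
    thread = {}
    for hub in hubs:
        for response in fetch_responses(hub, post_url):
            thread[response['source']] = response
    return [thread[source] for source in sorted(thread)]

thread = assemble_thread(
    'http://matt.example.com/why-rinsing-is-as-good-as-washing',
    ['http://hub-one.example.net', 'http://hub-two.example.net'])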

What about spamming the conversation hub?

A spammer could just send messages to the conversation hubs linking their posts to everything out there.

Well, the conversation hubs could insist on real authentication and then allow feedback from people. Also, people who check for comments at a hub can ask to see only comments that have received aggregate positive feedback.
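
Building on the thread-assembly sketch above, that filter is one line; the score field is again my invention:

# Keep only responses whose aggregate feedback clears a threshold.
def filter_by_feedback(responses, minimum_score=1):
    return [r for r in responses if r.get('score', 0) >= minimum_score]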

What about AdSense?

Well, if I switch to this approach, and people start downloading my text files to read offline, they ain’t gonna see my AdSense ads, and I’ll be deprived of my $15/year revenue.

But for people who actually make real money off AdSense, the question is valid. Remember that we’re talking about helping people read your site offline; the people who are mostly offline aren’t seeing the site now anyway.

The online visitors can still see the ads, though. Also, people who view the HTML files after cloning my publish node may still see them if they have a working internet connection and allow the embedded JavaScript to run.

Sure, there’s a risk that some online viewers will switch to the offline views and then turn off JavaScript or their internet connection so that they never see the ads.

Publishers would need to weigh this risk. Maybe the solution could be to sell offline copies at a price equal to the expected lost revenue from the switchers.

What about SEO?

It’s a non-issue. The HTML is available online just like it always was.

5 thoughts on “A worse blogging system”

  1. Apart from the pub-sub stuff, you could just use ikiwiki.

    You can run ikiwiki locally as well as online. There is no tool to provide “realtime preview”, but it isn't hard to write a plugin or script to do this.

  2. With regards to copyright issues with comments, what about a line stating that if they post, they agree to apply $COPYLEFT_LICENSE to their comment, with a link to the license? Creative Commons No Derivatives seems appropriate for comments.

  3. Jeremiah,

    The problem with that is that if the commenter does not agree with
    whatever license the blogger chooses, then the comment never appears.
    If the comments live on the commenter's server, then the commenter
    doesn't have to agree to anything before commenting.

  4. Hey, I got hooked up with this URL by you on kuro5hin. I read the text a few times and it's on point.

    I'm specifically interested in “tools to assemble a conversation thread from all the different pieces.” Sites could assemble comments in a way that weighs them for “karma”, optionally choosing to assemble comments by nepotism for all anyone cares. Let the site decide whatever they want.

    Right now either people surf the web to whore links in a thin veil of participating in a community, or someone tries to build “the next big community” because they want to serve advertisements.

    However, if conversations come to you, this kind of behavior dies. This is where Bayesian probability becomes interesting, because it finds stuff you are “probably” interested in, alongside other weights like “I want to read what my friends are writing” and “anything with bacon” or whatever else.

    I'm sure there are many other good reasons for this, with rationale much different than mine, but if you are serious with a penchant for “assembling content from various sources” and have practical solutions that could be done in PHP, please keep sux0r in mind (hence me, here, shamelessly link whoring… like a snake eating its tail, I guess, but that's a good thing)

    Thanks for reading.

  5. Yeah, I'll plan to check in on sux0r (great name, btw) and eventually
    wade through the code to extract that Bayesian part. I need my
    startup to get acquired so I can spend more time on these interesting
    side projects.

    Matt

Comments are closed.