Monthly Archives: January 2003

Tinkering with RSS and NNTP

RSS via NNTP is certainly not a new concept – I first read about the idea on Matt Webb’s site almost three years ago. More recently there’s been mention over at Jon’s (Crossing the bridge of weak ties) and Ben’s (RSS to NNTP and HEP Messaging Server).

This week, Ben mentioned the Panopticon in reference to the forthcoming ETCON. During last year’s conference, I hacked around with the Panopticon, creating a sort of Jabber-based information diffusion service to lighten the load on the Panopticon mechanism’s single source socket.

With all the talk of lightening the load from RSS consumers, my thoughts turned from these Panopticon experiments to NNTP, as it’s of course a technology designed for information diffusion, and for bearing and sharing load. I couldn’t resist a bit of tinkering with NNTP, partly to follow up on RSS to/via NNTP myself, but mostly to re-acquaint myself with the wonderfully arcane configuration of the majestic beast that is inn. In addition, there’s been talk recently of aggregators moving out of the realm of satellite applications and into the browser itself. The Blagg- and Blosxom-powered Morning Reading page – my personal (but open) news aggregator – is already web-based, so I thought I’d have a look in the other direction.

Aided partly by Jon’s Practical Internet Groupware book and partly by the man pages, I put together a simple configuration for a local server to which I could post weblog items as articles.

As I saw it, there were two approaches to newsgroup article creation in this context, each with its pros and cons.

Send items from all weblogs to the same newsgroup
This approach means that the ‘aggregation effect’ (stories from different weblogs) is explicit, as the posts in a single newsgroup are sourced from different RSS feeds, and you read them sequentially. [You can see this effect in the screenshot, in the highlighted public.test newsgroup (shortened to "p.test").] It also means, however, that such a ‘collective’ newsgroup is going to be less useful for diffusion and load sharing, as it’s specific to one (or a few) person’s feed tastes.

Send items from each weblog to a separate newsgroup
While the ‘aggregation effect’ is still available (by using the newsreader’s “read next unread” function, which will normally jump from one newsgroup to the next), it’s not as in-your-face. However, with a single newsgroup for each RSS feed, there are tremendous possibilities for NNTP peer exchange of (RSS weblog item) articles, and consequently for load sharing in the consumption of RSS – because picking and choosing feeds remains possible at the right (newsgroup) level of granularity. That said, this approach doesn’t exclude the possibility of composite newsgroups consisting of, say, finance news feeds, or feeds in similar categories, which would be interesting to more than one person.

After deliberating on such matters, I wrote a very simple plugin for Blagg which posts each weblog item to one or more newsgroups. For my purposes, I solved the question of what to call each newsgroup by using the ‘nickname’ required for each feed in Blagg’s rss.dat file, which controls the aggregation activity.

[Screenshot of Mozilla newsreader reading aggregated RSS items]

The plugin is called nntp. I modified Blagg slightly so it would pass the nickname to the plugin. My version of Blagg 0+4i is here (it has a number of other modifications too). Feel free to take the plugin and modify it to suit your purpose. It was only a bit of twiddling, but it seems to work.
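To give a flavour of the twiddling, here’s a minimal sketch of such a plugin using Net::NNTP. The post_item() name and arguments are assumptions for illustration – the real plugin hooks into Blagg’s own plugin interface (with my nickname-passing modification):

# nntp (sketch) - post a weblog item as a news article.
# The post_item() signature below is hypothetical; the real plugin
# uses Blagg's plugin interface, modified to pass the feed nickname.
use strict;
use Net::NNTP;

my $server = 'localhost';   # the local inn server
my $prefix = 'public';      # newsgroup hierarchy, e.g. public.jonu

sub post_item {
    my ($nick, $title, $link, $description) = @_;
    my $nntp = Net::NNTP->new($server)
        or warn "Can't connect to $server\n" and return;
    $nntp->post(
        "From: blagg <blagg\@localhost>\n",
        "Newsgroups: $prefix.$nick\n",
        "Subject: $title\n",
        "\n",
        "$description\n\n$link\n",
    ) or warn "Post to $prefix.$nick failed\n";
    $nntp->quit;
}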

There are plenty of possibilities for experimentation: combining the various weblog trackbacking mechanisms with NNTP article IDs to link articles together in a thread; replying (to the newsgroup) to an article might send a comment to the post at the source weblog. Hmmmm…

Asleep in 2002

Was I asleep in parts of 2002? Or is my memory really as bad as people tell me it is? Of course Amazon offer consumable XML. Now that an email from Chanticleer has jogged my grey matter, I have found this, which looks extremely promising. Interesting times ahead for me, although at first glance it looks as if it might be amazon.com-specific (i.e. not covering data on amazon.co.uk). Anyway, perhaps I’ll catch up with the present sometime this year…

Your wishlist in ‘consumable’ XML

While experimenting with wishlist data, it occurred to me that it might be desirable to have one’s wishlist exposed directly at a URL, in a consumable format. This would lend itself quite nicely to URL pipelining.

I hacked up a very simple module, WWW::Amazon::Wishlist::XML (keeping to the original namespace in CPAN), which acts as an Apache handler so you can plug in your wishlist ID (mine’s 3G7VX6N7NMGWM) and get some basic XML out via a simple HTTP GET request.

Here’s an example:

http://www.pipetree.com/service/wishlist/uk/3G7VX6N7NMGWM

Note the ‘uk’ part in the path. It signifies that the wishlist is held at amazon.co.uk. If held at amazon.com, specify ‘com’, like this:

http://www.pipetree.com/service/wishlist/com/11SZLJ2XQH8UE

It uses the patched version of WWW::Amazon::Wishlist so should be ok for now with .com-based wishlists too. Of course, it’s experimental anyway (as are most of the things I post here) and is likely to explode without warning.
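For a rough idea of the shape of such a handler, here’s a minimal mod_perl 1 sketch. The fetch_wishlist() helper is a hypothetical stand-in for the calls into the (patched) WWW::Amazon::Wishlist module, and the httpd.conf mapping is assumed:

# in httpd.conf (assumed):
#   <Location /service/wishlist>
#     SetHandler perl-script
#     PerlHandler WWW::Amazon::Wishlist::XML
#   </Location>

package WWW::Amazon::Wishlist::XML;
use strict;
use Apache::Constants qw(OK NOT_FOUND);

sub handler {
    my $r = shift;
    # path_info will be e.g. /uk/3G7VX6N7NMGWM
    my ($site, $id) = $r->path_info =~ m{^/(uk|com)/(\w+)$};
    return NOT_FOUND unless $site and $id;
    my @items = fetch_wishlist($id, $site);   # hypothetical helper
    $r->content_type('text/xml');
    $r->send_http_header;
    $r->print(qq[<?xml version="1.0"?>\n<wishlist id="$id" site="$site">\n]);
    $r->print(qq[  <item asin="$_->{asin}">$_->{title}</item>\n]) for @items;
    $r->print("</wishlist>\n");
    return OK;
}

1;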

The fragility of retro-engineering

I just discovered that while the CPAN module WWW::Amazon::Wishlist pulls ASINs out of amazon.co.uk-based wishlists, it seems unable to find ASINs in amazon.com-based ones. I guess the HTML layout that the module is scraping has changed – or at least the hrefs that the module is pulling the ASINs from.

While lamenting the fact that retro-fitting like this is like trying to put a wave into a box, I’ve made a second patch to the module (the $url regex) so it can successfully find ASINs in U.S. wishlists too.
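Purely as an illustration of the kind of change involved (the module’s actual $url regex, and the hrefs it targets, will differ), the fix is a matter of adjusting the pattern that pulls the 10-character ASIN out of the product links on the wishlist page:

# Hypothetical sketch only - Amazon product hrefs of the era carried
# the ASIN in paths like /exec/obidos/ASIN/0751327824/..., with the
# surrounding layout varying between .com and .co.uk
my ($asin) = $href =~ m{/ASIN/(\w{10})};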

I wonder when/if we will see consumable wishlist data available directly from Amazon, a la AllConsuming’s XML directory?

Transferring my Amazon wishlist to AllConsuming.net

Now that I can monitor comments about books I have in my AllConsuming collection, I thought it would be nice to add those books in my Amazon wishlist to that AllConsuming collection so that I could see what people were saying about the books I wanted to buy.

So I hacked up a few scripts, and here are the results.

Getting my wishlist

Using Simon Wistow’s very useful WWW::Amazon::Wishlist, it was a cinch to grab details of the books on my wishlist. (I had to patch the module very slightly because of a problem with the user agent string not being set.)

The script I wrote, wishlist, simply outputs a list of ISBN/ASINs and title/author details, like this:

[dj@cicero scraps]$ ./wishlist
0751327824 The Forgotten Arts and Crafts by John Seymour
090498205X With Ammon Wrigley in Saddleworth by Sam Seville
0672322404 Mod_perl Developer's Cookbook by Geoffrey Young,  et al
0465024750 Fluid Concepts and Creative Analogies: Computer Models of ...
0765304368 Down and Out in the Magic Kingdom by Cory Doctorow
...
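The script itself is tiny. As a sketch (the get_list() call and the hash keys here are assumptions standing in for WWW::Amazon::Wishlist’s real interface):

#!/usr/bin/perl
# wishlist (sketch) - print ASIN and title/author for each item
use strict;
use WWW::Amazon::Wishlist;

# hypothetical signature: wishlist ID plus site
my @items = WWW::Amazon::Wishlist::get_list('3G7VX6N7NMGWM', 'uk');
printf "%s %s by %s\n", $_->{asin}, $_->{title}, $_->{author} for @items;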

Interacting with AllConsuming

While I’m sure the allconsuming.net site and services are going to morph as services are added and changed, I nevertheless couldn’t resist writing a very simple Perl class, Allconsuming::Agent, which allows you to log in (logIn()) and add books to your collection (addToFavouriteBooks(), addToCurrentlyReading()). It’s very basic but does the job for now. It tries to play nice by logging you out (logOut()) of the site automatically when you’ve finished. It can also tell whether the site knows about a certain book (knowsBook()) – I think AllConsuming uses amazon.com to look books up, and so discrepancies between that and www.amazon.co.uk, for example, show themselves as AllConsuming’s innocent blankness with certain ISBNs.

Anyway, I’m prepared for the eventuality that things will change at allconsuming.net sooner or later, so this class won’t work forever…but it’s fine for now.
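To give a feel for the shape of the class, here’s a skeleton. The URLs, form fields, and scraping details below are hypothetical placeholders – the real agent reverse-engineers whatever the site happens to do today, which is exactly why it won’t work forever:

package Allconsuming::Agent;
use strict;
use LWP::UserAgent;
use HTTP::Cookies;

sub new {
    my ($class, %args) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->cookie_jar(HTTP::Cookies->new);   # session lives in cookies
    return bless { ua => $ua, %args }, $class;
}

sub logIn {
    my $self = shift;
    $self->{ua}->post('http://allconsuming.net/login.cgi',   # hypothetical URL
        { username => $self->{username}, password => $self->{password} });
}

sub knowsBook {
    my ($self, $isbn) = @_;
    my $r = $self->{ua}->get("http://allconsuming.net/item.cgi?isbn=$isbn");  # hypothetical
    return $r->is_success && $r->content !~ /not found/i;    # 'innocent blankness' check
}

# addToCurrentlyReading(), addToFavouriteBooks() and logOut() follow the
# same pattern: a form POST to the relevant page, session cookie attached

1;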

Adding my wishlisted books

So putting this all together, I wrote a driver script, acadd, which grabs my current reading list data from AllConsuming, and reads in a list of ISBN/ASINs such as would typically be produced by a script like wishlist.

Reading through the wishlist book data, acadd does this for each book (see the sketch after the list):

  • checks to make sure the book isn’t in my AllConsuming collection already
  • checks that AllConsuming knows about the book
  • adds the book to my collection at AllConsuming
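The heart of acadd might look like the following sketch, reusing the Allconsuming::Agent methods above; haveBook() is another hypothetical helper, checking against the collection data grabbed up front:

#!/usr/bin/perl
# acadd (sketch) - read "ISBN title" lines on stdin; add books that are
# known to AllConsuming but not yet in the collection
use strict;
use Allconsuming::Agent;

my $agent = Allconsuming::Agent->new(username => 'dj', password => 'secret');
$agent->logIn;

while (<STDIN>) {
    my ($isbn, $title) = /^(\S+)\s+(.+)$/ or next;
    my $status;
    if    ($agent->haveBook($isbn))   { $status = 'HAVE' }     # hypothetical helper
    elsif (!$agent->knowsBook($isbn)) { $status = 'UNKNOWN' }
    else  { $agent->addToCurrentlyReading($isbn); $status = 'ADDED OK' }
    printf "%-45.45s [%s]\n", "$isbn $title", $status;
}

$agent->logOut;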

Here’s a snippet of what actually happened when I piped the output of the one script into the other:

[dj@cicero scraps]$ ./wishlist | ./acadd
0751327824 The Forgotten Arts and Crafts by John Se... [UNKNOWN]
090498205X With Ammon Wrigley in Saddleworth by Sam... [UNKNOWN]
0672322404 Mod_perl Developer's Cookbook by Geoffre... [HAVE]
0465024750 Fluid Concepts and Creative Analogies: C... [HAVE]
0765304368 Down and Out in the Magic Kingdom by Cor... [ADDED OK]
...

Woo! Cory’s new book, appearing on my Amazon wishlist, was added to my allconsuming.net collection. (In case you’re wondering, I am only adding books like this to ‘Currently Reading’, rather than any other collection category, for now, as only the books in this category and in ‘Favourites’ can currently be retrieved with the SOAP API – and it’s upon this API that booktalk relies.)

Anyway, it’s late, time for bed, driving to Brussels early tomorrow morning. Mmmm. Belgian beer beckons!

Content-Type and Blosxom’s RSS

Agreeing with Sam on what content-type should be used for a weblog’s feed (basically, it should be whatever you specify in your link tag for that feed), last night I changed the appropriate Blosxom template file, content_type.rss, so that “application/rss+xml” would be sent out in the Content-Type header accompanying the RSS XML.

Unfortunately it broke the feed, in that none of the content was being entity-escaped (escaping of entities in RSS is of course a whole different story, which I’ll leave for now). Blosxom decides whether to do entity-escaping by checking whether the content-type is “text/xml”. So I made a quick fix to the check, so that the content of any flavour whose content-type matched \Wxml$ – that is, anything ending in a non-word character followed by “xml”, which catches “text/xml” and “application/rss+xml” alike – would be entity-escaped.
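In Blosxom terms, the change amounts to something like this sketch (the variable names in the real source differ):

# before (sketch): entity-escape only for the text/xml content-type
#   $escape = 1 if $content_type eq 'text/xml';
# after: escape for any content-type ending in a non-word character
# followed by "xml", catching text/xml and application/rss+xml alike
$escape = 1 if $content_type =~ m{\Wxml$};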

Funnily enough, I was talking about link rel=’…’ tags only last night, in Presentations, Wikis, and Site Navigation.

So apologies to those people whose readers may have choked on unescaped content from this site for the past few hours.

Presentations, Wikis, and Site Navigation

A while ago, inspired by others, I was looking at adding metadata to this weblog in the form of link rel=’…’ tags that link to related resources. The classic use of such tags in weblogging is for a weblog to point to its RSS feed.

Cut to the present, and Piers and I are thinking about a joint conference presentation. While the presentation format is not in question (HTML), I’ve been wondering how I might investigate these link rel=’…’ tags further, learn some more about wikis, and have a bit of fun in the process.

While HTML-based presentations are nice, something that has always jarred (for me) has been the presence of slide navigation links within the presentation display itself. Whether buttons, graphics, or hyperlinks, they invariably (a) get in the way and (b) can move around slightly with layout changes from page to page in the presentation.

I wanted to see if I could solve this problem.

The MoinMoin Wiki (which I use for documenting various things) generates link rel=’…’ tags for each page, to point to the “Front Page”, “Glossary”, “Index” and “Help” pages that are standard within that Wiki. The Wiki markup includes processing instructions that start with hash symbols (#), to control things like whether section and subsection headings should be automatically numbered, and so on. These name/value-style directives are known as ‘pragmas’.

[Screenshot of site navigation bar in Mozilla]

What I did was hack the MoinMoin Python code a little (only a few lines added) so that I could

  • specify any ‘previous’ and ‘next’ slide pages in the markup of a page using #pragma directives
  • have the Wiki automatically generate each page’s appropriate link rel=’…’ tags for site navigation according to these new directives

That way, browsers aware of these tags (including my browser of choice, Mozilla), can display a useful and discreet navigation bar automatically. Problem solved!

I tweaked two MoinMoin files, Page.py and wikiutil.py. It might have broken something else – you never know; it’s just a little hack. Also, so that you can get a feel for what I mean, have a browse of these few presentation demo wiki pages with your browser’s site navigation support turned on and/or visible. Use the EditPage feature to look at the markup source and see the #pragma directives. (Please don’t change anything – let others see it too. Thanks.)
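To give a taste of the markup (the pragma names here are invented for illustration – check the demo pages for the real ones), a slide page might start with directives like:

#pragma slide-prev SlideTwo
#pragma slide-next SlideFour

from which the Wiki would emit, in the page’s HTML head:

<link rel="prev" href="/wiki/SlideTwo">
<link rel="next" href="/wiki/SlideFour">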

So hurrah. We can build, present, and follow up on the presentation content in the rich hypertextual style that HTML and URIs afford, and collaborate on the content in the Wiki way.

On an incidental note, I’ve also added a link rel=’start’ tag to point to the homepage of this weblog. This is made available in Mozilla as the “Top” button in the site navigation bar.

Wisdom, diplomacy, or serendipity?

allconsuming.net has a SOAP interface. Nice and easy to call and use.

But for those (including me) who (also) have a REST bent, there is a tip-o’-the-hat style flavour with interesting possibilities: the (read-only) methods are also available as URLs like this:

http://allconsuming.net/soap-client.cgi?hourly=1

or

http://allconsuming.net/soap-client.cgi?friends=1&url=http://www.pipetree.com/qmacro

where the methods are “GetHourlyList()” (hourly=1) and “GetFriends()” (friends=1) respectively.

While the actual data returned in the message body is clearly Data::Dumpered output of the data structure that would be returned in the SOAP response, a slight change on the server side to produce the data in ‘original’ XML form would perhaps be very useful for pipeline-style applications.
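For pipeline experiments, fetching one of these URLs is a one-liner – though for now, what comes back is the Data::Dumper text rather than XML:

# grab the hourly list via the REST-ish URL
use LWP::Simple;
print get('http://allconsuming.net/soap-client.cgi?hourly=1');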

Erik is using these URLs to show readers examples of response output. But I bet the potential diplomacy wasn’t lost on him.

The universal canvas and RSS apps

It seems that beyond carrying syndication information, RSS is a very useful and flexible way to get all sorts of application data pushed to a user over time. In the same way that a web browser is a universal canvas upon which limitless services and information can be painted, so (albeit in a much smaller way) an RSS reader/aggregator might also find its place as an inbox for the time-based delivery of all sorts of information. This is borne out by the experimental “booktalk” application, which uses the RSS infrastructure to deliver information that appears over time.

Application data and RSS is something that Matthew Langham touched upon last December. And of course this isn’t just hot air: we’re already generating RSS out of SAP R/3 at work for a sales (SD) application.

The disruptive engineering spectrum, and “booktalk”, an AllConsuming app

At one end of the spectrum along which the building blocks for future cooperative web applications lie, we have the library software vendors who were unwitting participants in “LibraryLookup”, a great web service experiment built and described by Jon in a recent InfoWorld column. While I’m sure everything is fine now, I don’t think their initial reaction to their participation was favourable. Fair enough.

At the other end of the spectrum, enter Erik Benson and his creation allconsuming.net, a very interesting site which builds a representation of the collective literary consciousness of the weblogging community by scanning weblog RSS feeds for mentions of books (Amazon and other URLs, specifically ISBN/ASINs) and collating excerpts from those weblog posts with data from other web sources such as Amazon and Google. Add to that the ability to sign up and create your own lists of books (currently reading, favourites, and so on) and you have a fine web resource for aiding and abetting your bookworm tendencies.

A fine web resource not only for humans, but as a software service too. In constructing allconsuming.net, Erik has deliberately left software hooks and information bait dangling from the site, ready for us to connect and consume. Moreover, he encourages us to do so, telling us to “Use [his] XML” and try out his SOAP interface.

So I did.

While allconsuming.net can send you book reading recommendations (by email) based on what your friends are reading and commenting about, I thought it might be useful to be able to read any comments that were made on books that you had in your collection. “I’ve got book X. Let me know when someone says something about book X”.

So I whipped up a little script, booktalk, which indeed uses allconsuming.net’s hooks to build a new service. What booktalk does, crontabbed to run hourly, is grab a user’s ‘currently reading’ and ‘favourite’ books lists and then look at the hourly list of the latest books mentioned. Any intersections are pushed onto the top of a list of items in an RSS file, which represents a sort of ‘commentary alert’ feed for that user and his books. The point, of course, is that the user can easily monitor new comments on books in his collection by subscribing to that feed, which, aggregated by Blagg and rendered by Blosxom, looks something like this.
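In outline, booktalk’s hourly run might look like this sketch. GetHourlyList() is mentioned above, but the SOAP endpoint details, the collection-list method names, the data shapes, and the unshift_rss_item() helper are all assumptions:

#!/usr/bin/perl
# booktalk (sketch) - alert on hourly mentions of books in my collection
use strict;
use SOAP::Lite;

my $user = 'dj';
my $soap = SOAP::Lite
    ->uri('http://allconsuming.net/AllConsumingAPI')   # hypothetical namespace
    ->proxy('http://allconsuming.net/soap.cgi');       # hypothetical endpoint

# the user's collection: 'Currently Reading' plus 'Favourites',
# assumed here to come back as lists of ISBNs
my %mine = map { $_ => 1 }
    @{ $soap->GetCurrentlyReading($user)->result },    # hypothetical method
    @{ $soap->GetFavourites($user)->result };          # hypothetical method

# books mentioned across weblogs in the last hour, assumed to be
# hashrefs carrying at least an 'isbn' key
foreach my $book (@{ $soap->GetHourlyList->result }) {
    # made-up helper: prepend an <item> to the user's alert RSS file
    unshift_rss_item($book) if $mine{ $book->{isbn} };
}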

Of course, the usual caveats apply – it’s experimental and works for me; your mileage may vary.