Monthly Archives: May 2002

RSS’s underlying structure, and meta-RSS.

I’ve been pondering the nature of RSS, the lightweight syndication and feed format which is a heavy contender for the most talked-about XML format these days (uh, I suppose this post goes some way to help the cause too :-).

So what about RSS? What makes it so appealing? Well, a big reason is of course its position as a foundation of stability (despite its own temporary instability, format-wise), in the burgeoning, nay, blossoming, world of weblogs, syndication, and knowledge sharing. But I’ve been wondering if it’s more than that. It’s a simple format. But a powerful one. The RSS skeleton reflects an information model that can be found everywhere: header and body. You could say header and items. Items. Positions. What’s the fundamental structure of pretty much every piece of (business) transactional data in SAP (and other ERP) systems? A document. A document, which has a header, and items, or ‘positions’. Sales orders, invoices, purchase requisitions… the list goes on. Hmmm. Could it be that the RSS skeleton is so popular and flexible because it’s one of the netspace’s protean formats, and easy to grok?

RSS 1.0 celebrates that flexibility with its modular approach.

I’m celebrating that with a ‘meta’ RSS feed, available from the [meta] link in the My Feeds list on the right-hand side of this page. It’s a list of all the feeds I’m subscribed to right now, in RSS 1.0 format. Currently, I’m just using core tags, but it might be a better idea to create a simple module to enable an explicit statement of what the data is. (I know there’s OCS too, but hey, it’s Friday).
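
Here’s a minimal sketch of how a ‘meta’ feed like that could be put together using only core RSS 1.0 tags: the channel is the header, and each subscribed feed becomes an item. The feed titles, the example.org URLs and the sub names (xml_escape, meta_rss) are made up for illustration; this is not the script behind the [meta] link.

```perl
#!/usr/bin/perl -w
use strict;

# Escape characters that are special in XML text and attribute values
sub xml_escape {
  my ($s) = @_;
  $s =~ s/&/&amp;/g;
  $s =~ s/</&lt;/g;
  $s =~ s/>/&gt;/g;
  $s =~ s/"/&quot;/g;
  return $s;
}

# Build an RSS 1.0 document: the channel is the header, each feed is an item
sub meta_rss {
  my ($about, $title, @feeds) = @_;
  my ($seq, $items) = ('', '');
  for my $feed (@feeds) {
    my ($t, $u) = (xml_escape($feed->{title}), xml_escape($feed->{url}));
    $seq   .= qq[        <rdf:li rdf:resource="$u"/>\n];
    $items .= qq[  <item rdf:about="$u">\n    <title>$t</title>\n    <link>$u</link>\n  </item>\n];
  }
  ($about, $title) = (xml_escape($about), xml_escape($title));
  return <<"EOT";
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="$about">
    <title>$title</title>
    <link>$about</link>
    <description>Feeds I am subscribed to</description>
    <items>
      <rdf:Seq>
$seq      </rdf:Seq>
    </items>
  </channel>
$items</rdf:RDF>
EOT
}

# Hypothetical subscription list
print meta_rss('http://example.org/meta.rss', 'My Feeds',
  { title => 'Example Weblog', url => 'http://example.org/index.rss' },
  { title => 'Another Weblog', url => 'http://example.net/rss.xml' },
);
```

A dedicated module, as mused above, would add its own namespaced tags to each item rather than overloading title and link.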

HTML LINK to RSS source

I’m doing my bit for the weblog community domino effect. Mark, and others, have added <link/> tags to their weblog HTML, to point to the RSS feeds for the respective weblogs. I think this is a good idea, so have done it too in this, my Blosxom-powered weblog.

I do remember sitting in on an RSS BOF at last year’s OSCON where we discussed the idea of having an index.rss at the website’s root, rather like the robots.txt file. This <link/> based pointing is a nicer approach, as there’s an explicit relationship between the RSS XML and what it describes.

What’s more, Mark has a nifty bit of JavaScript that grabs any RSS URL that it finds in this new <link/> home, and bungs it at Radio Userland’s localhost-based webserver invoking a subscribe on that RSS feed. Very nice. I don’t run RU, nor use IE much, but nevertheless this would work, even with Blosxom, because I’m running bladder :-)
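
To show the other end of this, here’s a rough sketch of how a program might find the feed from a page’s <link/> tags. It assumes the tags carry type="application/rss+xml" and an href, as in the usual convention, and it uses a regular expression where a proper HTML parser would be the safer choice. The sub name rss_links and the sample page are made up.

```perl
#!/usr/bin/perl -w
use strict;

# Return the href of every <link> tag whose type is application/rss+xml
sub rss_links {
  my ($html) = @_;
  my @urls;
  while ($html =~ /<link\b([^>]*)>/gi) {
    my $attrs = $1;
    my %attr;
    # Collect name="value" or name='value' pairs from the tag
    while ($attrs =~ /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g) {
      $attr{lc $1} = defined $2 ? $2 : $3;
    }
    push @urls, $attr{href}
      if defined $attr{href} && lc($attr{type} || '') eq 'application/rss+xml';
  }
  return @urls;
}

# Hypothetical page head with one stylesheet link and one RSS link
my $page = <<'HTML';
<html><head>
<link rel="stylesheet" type="text/css" href="/style.css" />
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://example.org/index.rss" />
</head></html>
HTML

print "$_\n" for rss_links($page);
```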

ETag-enabled wget

Well, a little evening hack while watching Inspector Morse has produced a minimalist script wget.pl – a tiny wrapper around the wget command so that you can be more polite when retrieving HTTP based information – in particular RSS feeds.

The idea was sparked by Simon’s post about using HTTP 1.1’s ETag and If-None-Match headers. I wanted to write as small and minimal a script as possible, and rely on as little as possible (hence the cramped code style), in honour of Blosxom, and of course Blagg, the RSS aggregator, for which the script was designed. You should be able to drop this script reference into Blagg by specifying the RSS retrieval program like this:


my $get_prog = '/path/to/wget.pl';

Don’t forget, the ETag advantage is only to be had from static files served by HTTP. Information generated on the fly, such as the output of CGI scripts (Blosxom, for example), isn’t given an ETag unless the script sets one itself.

 

Update 06/06/2012

It’s now just over 10 years since I originally wrote this post, and in relation to a great post on REST by Sascha Wenninger over on the SAP Community Network, I’ve just re-found the script — thanks to a comment on Mark Baker’s blog that pointed to wget.pl being part of a Ruby RDF test package. Thanks mrG, whoever you are! The link above now works, but here’s the script in its rude entirety for your viewing pleasure.

#!/usr/bin/perl -w

# ETag-aware wget
# Uses wget to more politely retrieve HTTP based information
# DJ Adams
# Version 0+1b

# wget --header='If-None-Match: "3ea6d375;3e2eee38"' http://www.w3.org/

# Changes
# 0+1b 2003-02-03 dja added User-Agent string to wget call
# 0+1 original version

use strict;
my $cachedir = '/tmp/etagcache'; # change this if you want; the directory must already exist
# The cache file name is the hex-encoded URL
my $etagfile = "$cachedir/".unpack("H*", $ARGV[0]);
my $etag = `cat $etagfile 2>/dev/null`;
# Unescape the stored quotes, then turn the cached ETag into an If-None-Match header
$etag =~ s/\\"/"/g;
$etag =~ s/^ETag: (.*?)\n$/$1/ and $etag = qq[--header='If-None-Match: $etag'];

my $com="wget -U 'blagg/0+4i+ (wget.pl/0+1b)' --timeout=60 -s --quiet $etag -O - $ARGV[0]";
print "Running: $com";

# -s makes wget write the response headers ahead of the body
my ($headers, $body) = split(/\n\n/, `$com`, 2);
print "Got headers: $headers\n\n";
if (defined $body) {
  # Remember the new ETag for next time
  ($etag) = $headers =~ /^(ETag:.*?)$/m;
  print "Return value etag: $etag";
  defined $etag and $etag =~ s/"/\\"/g, `echo '$etag' > $etagfile`;
  print "\n==========\n";
  print $body;
}
else {
  print "Cached.";
}

Top tip

If you want to avoid a stint in Accident & Emergency in-patients, and a bandaged ankle, don’t bolt down the stairs at Clapham Junction station two at a time and then miss your footing before you get to the bottom.

Ouch.

Ahem.

More thoughts on HTTP, Email, and Jabber

Since writing the previous entry, some more thoughts have drawn themselves to my attention. There are advantages that HTTP does have over email. Built-in authentication for one thing. I’ve only used basic authentication, but what about digest? Moreover, Jabber goes one better and has a framework for identity.

Actually, talking about HTTP headers with basic and digest authentication, here’s something else I’ve been wondering. Simon Fell rightly suggests using a more polite and sensitive way to grab RSS sources, by use of the ETag and If-None-Match headers. Very sensible. But what about the If-Modified-Since header?
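
If-Modified-Since pairs with the server’s Last-Modified header in the same way that If-None-Match pairs with ETag, and a client can send both. Here’s a small sketch of the bookkeeping, with no network access: it builds the conditional headers from whatever was cached last time, and treats a 304 (Not Modified) status as “use the cached copy”. The sub names, the cache layout and the date are made up; the ETag value is the one from the comment in wget.pl.

```perl
#!/usr/bin/perl -w
use strict;

# Build request headers from whatever validators were cached last time
sub conditional_headers {
  my ($cache) = @_;
  my %h;
  $h{'If-None-Match'}     = $cache->{etag}          if defined $cache->{etag};
  $h{'If-Modified-Since'} = $cache->{last_modified} if defined $cache->{last_modified};
  return \%h;
}

# Update the cache from a response and return the body to use
sub handle_response {
  my ($cache, $status, $headers, $body) = @_;
  return $cache->{body} if $status == 304;   # Not Modified: reuse the cached copy
  $cache->{etag}          = $headers->{etag};
  $cache->{last_modified} = $headers->{'last-modified'};
  $cache->{body}          = $body;
  return $body;
}

my %cache;

# First fetch: nothing cached, so no conditional headers go out
my $h = conditional_headers(\%cache);
print scalar(keys %$h), " conditional headers on the first request\n";

# Simulated 200 response carrying both validators
handle_response(\%cache, 200,
  { etag => '"3ea6d375;3e2eee38"', 'last-modified' => 'Fri, 24 May 2002 10:00:00 GMT' },
  '<rss>...</rss>');

# Second fetch: both validators are sent; a simulated 304 returns the cached body
$h = conditional_headers(\%cache);
print "$_: $h->{$_}\n" for sort keys %$h;
print handle_response(\%cache, 304, {}, ''), "\n";
```

With a real client, for instance the core HTTP::Tiny module, the hash from conditional_headers could be passed as the headers option of get, and the status and headers fields of the response fed to handle_response.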

Here’s one advantage that email has over HTTP. A built-in queueing system. Ok, the actual queueing system is made most visible by use of email clients, where you see mails in a queue, ready to read or process. But this is just a mask over the flat stack of emails that you can pop with, er, the POP protocol.

“Yesbut”, as a friend used to say in meetings and discussions. Here’s something I’ve been pondering too. Last week I downloaded and installed the fabulous RT (“Request Tracker”) – a ticketing system written in Perl. It’s very flexible and extensible. RT allows tickets to be managed in queues. It also allows tickets to be created (or corresponded upon) through different interfaces – via a web interface, via email, or via the command line. Any incoming transaction is inserted into a queue (if it’s a new ticket) or appended to an existing queue entry (if it’s correspondence on an existing ticket). I wonder if I can build a small front end to accept HTTP-based business calls and stick them in an RT queue? Of course, I also wonder whether that would be useful, but if nothing else, it would be stimulating.

“Web^H^H^HInternet Services”? Some Ramblings.

I’ve been pondering the term “Web Services”. While I completely understand and agree with all the reasoning behind the term (the ‘original’ services were accessible via web clients, HTTP is the underlying and ubiquitous transport, blah blah blah), I’m wondering whether “Web Services” is the best term to use.

While the current rush of implementations use HTTP as the transport (witness HTTP as the most common transport for SOAP RPC, or HTTP as the designated transport in the XML-RPC specification), there are apparent pitfalls.

Firstly, look at the steam generated from the SOAP-through-firewalls debate. (On the one hand they have a point, on the other hand, it’s not necessarily up to a firewall to vet at the application level – look at EDI for example). Secondly, some people are of the opinion that HTTP needs to be replaced, in the light of its apparent weaknesses for the things that people want to use it for these days. If this happens, will we change the ‘Web Services’ name?

Thirdly, focusing on HTTP (and therefore the ‘Web’ in ‘Web Services’) does a, err, disservice to other protocols careening around the ‘net. What about the venerable SMTP, for example? There have been valid comments made about the applicability of HTTP in ‘increasingly asynchronous’ transactions. Fire off a request for some information, say, a quotation, and the response may take days to come back. Is this legal, moral, sensible, in HTTP? Ok, you could frame the asynchronicity in HTTP by using two request/responses (one pair in one direction and the other in the other direction: “I want a quotation, post it here when you’re ready with it” -> “Ok, will do” … “Hey, here’s the quotation” -> “Ooh, thanks”). (Hmmm, why do I think of RESTful things when mulling this over in my head?) You could of course go for one-way messages suspended in a SOAP solution to achieve the same effect, I guess. Hmmm, so many options, so little time.

Anyway, as an alternative to HTTP, how about transporting this stuff over other protocols, like the aforementioned SMTP (or a combination of SMTP and whatever endpoint protocol – POP, IMAP, and so on – you need)? Or even Jabber! Both lend themselves to asynchronous interaction more than HTTP does. Or so it seems to me. Both involve to a greater or lesser degree some modicum of store-n-forward, allowing the endpoints to talk at their leisure.

Of course, this is all very high level, and based, as usual, on my ignorance of detail. But I often prefer to wonder about things rather than to know straight away which is right and which is wrong. And here, just like in the REST vs SOAP RPC debate, I don’t think there is a definitive right and wrong way. Horses for courses.

Postscript

I wrote the above at 30000 feet (or however high it was) above the English Channel. Now that I’m on good old terra firma, travelling in a rickety South Central train from Victoria Station, I’ve had another thought. Revisiting the REST architectural style in extremis (what’s all this Latin doing here?) in the context of what I wrote above (ha, in both senses of the word) would be a good mental exercise and a focused way of finding out more about how it works. From what I understand, the URI is exalted as a holy pointer, being in many respects the blessed reference mechanism to the business objects that are exchanged in service provision and consumption.

I think I’ll stop now before this prose goes completely off the scale; suffice it to say that instead of the service returning a quotation, as a payload XML document in the body of the return email, it plonks it somewhere where it can be retrieved by HTTP, and sends a little notification with the URL instead.
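
As a toy illustration of that last idea, here’s a sketch in which the service publishes the finished quotation under a URL and sends only a short notification, and the requester pulls the URL out of the notification and fetches the document. Everything here is made up for the sake of the example: the %store hash merely stands in for documents on a web server, and the sub names and example.org URL are hypothetical.

```perl
#!/usr/bin/perl -w
use strict;

my %store;   # stands in for documents published on a web server

# Service side: publish the finished quotation, return a small notification
sub publish_quotation {
  my ($id, $xml) = @_;
  my $url = "http://example.org/quotations/$id";
  $store{$url} = $xml;
  return "Your quotation is ready: $url";
}

# Requester side: pull the URL out of the notification and fetch the document
sub fetch_from_notification {
  my ($note) = @_;
  my ($url) = $note =~ m{(http://\S+)};
  return defined $url ? $store{$url} : undef;
}

# Days may pass between the request and this notification arriving
my $note = publish_quotation('q1', '<quotation><total>100</total></quotation>');
print "$note\n";
print fetch_from_notification($note), "\n";
```

The notification itself could travel by email or Jabber; only the retrieval needs HTTP.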

Hmm, lots of things to think about…

(Jabber-)Browsing the Panopticon data

Ok, further to my initial Panopticon/Jabber experiments, I’ve extended the panpush.pl script to respond to jabber:iq:browse requests. As the script starts, and receives the initial gush of data from the Panopticon port, and as it receives further pushes, it stores the data on the avatar icons, and makes this data available as results to the jabber:iq:browse requests.

To get a list of avatars in the Panopticon, you can send a query like this:


<iq type='get' to='bot@gnu.mine.nu/panopticon' id='b1'>
  <query xmlns='jabber:iq:browse'/>
</iq>

The response will look something like this:


<iq type='result' from='bot@gnu.mine.nu/panopticon'
       to='dj@gnu.mine.nu/home' id='b1'>
  <panopticon xmlns='jabber:iq:browse'
              jid='bot@gnu.mine.nu/panopticon' name='The Panopticon'>
    <icon jid='bot@gnu.mine.nu/panopticon/2b8bf6a9e9a173f95f27ae1a8d6fb2f4'>
      <name>Blammo the Clown</name>
    </icon>
    <icon jid='bot@gnu.mine.nu/panopticon/3ab6c14732e8937cf26db26755c4aae7'>
      <name>Rael Dornfest</name>
    </icon>
    <icon jid='bot@gnu.mine.nu/panopticon/47e48c975621bf43fc81622265d47a31'>
      <name>Dan Gillmor</name>
    </icon>
    ...
    <icon jid='bot@gnu.mine.nu/panopticon/deedbeef'>
      <name>#etcon bot</name>
    </icon>
  </panopticon>
</iq>

(I’d originally just returned each icon without the <name/> tag, but figured that that would probably be less than useful.)

You can ‘drill down’ with a further query (sent to the JID of the icon you’re interested in – remember, Jabber browsing is most effective when you can navigate a hierarchy of information via its nodes’ JIDs) like this:


<iq type='get' id='b2'
    to='bot@gnu.mine.nu/panopticon/2b8bf6a9e9a173f95f27ae1a8d6fb2f4'>
  <query xmlns='jabber:iq:browse'/>
</iq>

Which should hopefully elicit a response like this:


<iq type='result' to='dj@gnu.mine.nu/home' id='b2'
    from='bot@gnu.mine.nu/panopticon/2b8bf6a9e9a173f95f27ae1a8d6fb2f4'>
  <icon xmlns='jabber:iq:browse'
        jid='bot@gnu.mine.nu/panopticon/2b8bf6a9e9a173f95f27ae1a8d6fb2f4'
        id='2b8bf6a9e9a173f95f27ae1a8d6fb2f4'>
    <url>http://progressquest.com/expo.php?name=Blammo the Clown</url>
    <text>Mmm...  Beer Elementals</text>
    <x>805</x>
    <y>494</y>
    <name>Blammo the Clown</name>
  </icon>
</iq>

This should reflect the latest information to be had on that avatar.
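
For the curious, here’s a simplified sketch of how browse results like the ones above might be assembled from a cache of avatar data keyed by icon id. It builds the XML with plain string concatenation rather than through a Jabber library, and the sub names (browse_all, browse_icon), the cache layout and the single sample entry are assumptions for illustration, not the actual panpush.pl code.

```perl
#!/usr/bin/perl -w
use strict;

# Escape characters that are special in XML text and attribute values
sub xml_escape {
  my ($s) = @_;
  $s =~ s/&/&amp;/g;
  $s =~ s/</&lt;/g;
  $s =~ s/>/&gt;/g;
  $s =~ s/'/&apos;/g;
  return $s;
}

my $bot = 'bot@gnu.mine.nu/panopticon';   # single quotes keep the @ literal

# Avatar cache keyed by icon id (one sample entry)
my %icons = (
  '2b8bf6a9e9a173f95f27ae1a8d6fb2f4' =>
    { name => 'Blammo the Clown', text => 'Mmm...  Beer Elementals', x => 805, y => 494 },
);

# Top-level browse: list every icon along with its name
sub browse_all {
  my ($to, $id, $icons) = @_;
  my $list = join '', map {
    "<icon jid='${bot}/" . xml_escape($_) . "'><name>"
      . xml_escape($icons->{$_}{name}) . "</name></icon>"
  } sort keys %$icons;
  return "<iq type='result' from='${bot}' to='" . xml_escape($to)
       . "' id='" . xml_escape($id) . "'>"
       . "<panopticon xmlns='jabber:iq:browse' jid='${bot}' name='The Panopticon'>"
       . $list . "</panopticon></iq>";
}

# Drill-down: everything currently known about one icon
sub browse_icon {
  my ($to, $id, $icons, $icon_id) = @_;
  my $icon = $icons->{$icon_id} or return;   # unknown icon: no result
  my $jid  = "${bot}/" . xml_escape($icon_id);
  my $props = join '', map {
    "<$_>" . xml_escape($icon->{$_}) . "</$_>"
  } sort keys %$icon;
  return "<iq type='result' to='" . xml_escape($to) . "' id='" . xml_escape($id)
       . "' from='${jid}'><icon xmlns='jabber:iq:browse' jid='${jid}' id='"
       . xml_escape($icon_id) . "'>" . $props . "</icon></iq>";
}

print browse_all('dj@gnu.mine.nu/home', 'b1', \%icons), "\n";
print browse_icon('dj@gnu.mine.nu/home', 'b2', \%icons,
                  '2b8bf6a9e9a173f95f27ae1a8d6fb2f4'), "\n";
```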

Amphetadesk Links

I was just chatting to Morbus in the Emerging Tech. Conference IRC channel (#etcon on irc.openprojects.net) and he mentioned his Amphetadesk news aggregator had a ‘subscribe to this’ feature for RSS URLs similar to the Radio Userland coffee-cup feature I’ve already included in the My Feeds section.

So Morbus told me what the links should look like, and I just added them to my crontab’d script that produces the My Feeds list from Blosxom’s rss.dat file.

You can see the result in the form of the [A] links in the list – click on these if you’re running Amphetadesk!
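
Here’s a rough sketch of what a list-generating script of that sort might look like. It assumes each line of the data file holds a nickname and a feed URL separated by whitespace, and that nicknames and URLs need no HTML escaping. The subscribe-link template points at a made-up aggregator.example host purely as a placeholder; it is not the real Amphetadesk link format, and the sub names are hypothetical too.

```perl
#!/usr/bin/perl -w
use strict;

# Percent-encode everything outside the URI "unreserved" character set
sub uri_escape {
  my ($s) = @_;
  $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
  return $s;
}

# Turn "nickname url" lines into one HTML line per feed, with a subscribe link
sub feed_list {
  my ($template, @lines) = @_;
  my @out;
  for my $line (@lines) {
    next if $line =~ /^\s*(?:#|$)/;           # skip blank lines and comments
    my ($nick, $url) = split ' ', $line;
    next unless defined $url;                 # skip lines without a URL
    my $subscribe = sprintf $template, uri_escape($url);
    push @out, qq{<a href="$url">$nick</a> <a href="$subscribe">[A]</a>};
  }
  return @out;
}

# Placeholder template and hypothetical feed entries
my $template = 'http://aggregator.example/subscribe?url=%s';
print "$_<br />\n" for feed_list($template,
  'example http://example.org/index.rss',
  'another http://example.net/rss.xml',
);
```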

The Panopticon

Everything that goes around, comes around. What thing links my old University (UCL), Jeremy Bentham (whose preserved figure sits in UCL’s South Cloisters), and this year’s Emerging Technology Conference?

Why, the Panopticon, of course. An architectural figure, envisioned by Bentham, which allows one to see but not be seen. “The Panopticon” is also the name given to a wonderful experiment in “blogger stalking” (a phrase from BoingBoing) with avatars and a floormap of the conference area.

This Panopticon’s creator, Danny O’Brien (of NTK fame), put out some instructions as to how the thing worked, and mentioned that he would be streaming the metadata out of a port on his server. He asked if anyone could regurgitate the data to a Jabber room so other clients could grab it from there rather than hammer his server, so I took up the challenge :-) This is, in essence, poor man’s pubsub (again) in the spirit of load dissipation: with a ratio of, say, 1:50 (Panopticon port connections – to – Jabber conference room listeners) we can relieve the strain and have a bit of fun.

Ok, well it was a very quick hack. The data coming out of the server port is a stream of XML. Hmmm. Sounds familiar ;-) I quickly hacked together a library, Panopticon.pm, based loosely upon Jabber::Connection, a Perl library for building Jabber entities (XML streams flow over Jabber connections, too, y’know). With this quick and dirty library in hand, I wrote an equally quick and dirty script, panpush.pl, which uses Panopticon.pm and Jabber::Connection to do this:

  • connect to a Jabber server and authenticate
  • join a conference room
  • open up the panopticon server port
  • fire the data that comes out of the port into the conference room for all to see and read, on a continuous basis

The Panopticon data is XML. Jabber is XML. So I decided the nice thing to do would be to avoid just blurting XML into the conference room – that would be like shouting gobbledygook in a room full of people. Instead, I wrote something sensible to the room each time some data fell out of the end of the Panopticon socket (the name of the blogger’s avatar), and attached the actual Panopticon XML as an extension to the groupchat message. Here’s an example:

Panopticon produces this:

<icon id='4ee9da17f5839275ad0ca5d58c2bacaa'>
  <x>456</x>
  <y>255</y>
</icon>

panpush.pl sends this to the room:

<message to='panopticon@conf.gnu.mine.nu' type='groupchat'>
  <body>DJ Adams</body>
  <x xmlns='panopticon:icon'>
    <icon id='4ee9da17f5839275ad0ca5d58c2bacaa'>
      <x>456</x>
      <y>255</y>
    </icon>
  </x>
</message>

The scary thing is that it seems to work! Grab your nearest Jabber client and enter room

panopticon@conf.gnu.mine.nu

(remember, you don’t have to have a Jabber user account on gnu.mine.nu to join a conference room there – just use your normal Jabber account, say, at jabber.org). If it’s still working, you should see ‘panopticon’ in that room – that’s the panpush.pl script. When some avatar metadata changes and pops out of the Panopticon server’s port, it will appear in the room – currently represented as the avatar’s name.

Want more? Want to actually do something with the data in the room?

Well, I’ve just written an example antithesis to panpush.pl – panclient.pl. This connects to the conference room, and listens out for packets containing the panopticon XML extensions. It just prints them out, but of course you can do with the data as you please. It’s just an example.
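
To make the two halves concrete, here’s a self-contained sketch of both: wrapping an icon update in a groupchat message, and recovering the icon’s id and coordinates on the listening side. It works on plain strings with regular expressions and assumes the exact serialization that wrap_icon produces; a real client would hand the packet to its Jabber library and an XML parser instead. The sub names are made up, and this is not the actual panpush.pl or panclient.pl code.

```perl
#!/usr/bin/perl -w
use strict;

# Escape characters that are special in XML text and attribute values
sub xml_escape {
  my ($s) = @_;
  $s =~ s/&/&amp;/g;
  $s =~ s/</&lt;/g;
  $s =~ s/>/&gt;/g;
  $s =~ s/'/&apos;/g;
  return $s;
}

# Sender side: readable body, raw Panopticon XML attached in its own namespace
sub wrap_icon {
  my ($room, $name, $icon_xml) = @_;
  return "<message to='" . xml_escape($room) . "' type='groupchat'>"
       . "<body>" . xml_escape($name) . "</body>"
       . "<x xmlns='panopticon:icon'>" . $icon_xml . "</x></message>";
}

# Listener side: recover the icon id and coordinates from a groupchat message
sub extract_icon {
  my ($message) = @_;
  # The extension's own </x> is the one directly before </message>,
  # which keeps the coordinate element <x>...</x> from ending the match early
  my ($ext) = $message =~ m{<x xmlns='panopticon:icon'>(.*?)</x>\s*</message>}s
    or return;
  my ($id) = $ext =~ m{<icon id='([^']*)'};
  my ($x)  = $ext =~ m{<x>(\d+)</x>};
  my ($y)  = $ext =~ m{<y>(\d+)</y>};
  return { id => $id, x => $x, y => $y };
}

my $msg = wrap_icon('panopticon@conf.gnu.mine.nu', 'DJ Adams',
  "<icon id='4ee9da17f5839275ad0ca5d58c2bacaa'><x>456</x><y>255</y></icon>");
my $icon = extract_icon($msg);
print "$icon->{id} is at ($icon->{x}, $icon->{y})\n";
```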

Oh, one more thing. As panpush.pl catches the panopticon XML and squirts it into the room, it also caches the actual avatar data, keyed by each icon’s id attribute. I plan to allow queries to be sent to the ‘panopticon’ room occupant, probably in the form of jabber:iq:browse IQ queries, so that clients can find out about what avatars are currently around, and what properties they have (name, url, xy coordinates, and so on).