Monthly Archives: October 2002

Notes to self: What should the rdf:about point to?

If nothing else, RSS 1.0 is a great source of contemplation and wondering. This morning, I’ve been considering the thoughts surrounding the rdf:about attribute in the channel element:

<channel rdf:about="...">
  ...
</channel>

What should the value of the rdf:about be? The URI of the RSS file itself or the URI of the document that the RSS is describing? This is not a new question – it’s been debated already, and I’m not trying to dredge up the issues in any particular way now; I just want to get my (seemingly random) thoughts down here (weblogging for me is a great framework in which to marshal my thinking, which I’m trying to make the most of in this period of infrequent connectivity).

There are some arguments for the value of rdf:about to be the URI of the HTML document (I’m using HTML examples in this post for simplicity’s sake), and others for it to be the URI of the RSS document itself. Here’s what I’ve been thinking:

Thoughts against the value being the RSS URI:

  • In every case that I’ve seen, the URI for the RSS data and the URI for the HTML are different. There are two separate documents. If the value of rdf:about is the RSS URI, what is going to describe the HTML URI?
  • There have been arguments put forward that the RSS and the HTML that it describes are really just two representations of the same thing. But I thought it was Not Good Practice to identify a single resource (the thing) with two (or more) different URIs? [On the other hand, if RSS was retrieved via content negotiation, we could use a single URI and just state in the HTTP request that we wanted (to Accept) the data or the metadata.]
  • If RSS ‘is just another representation’ of the HTML, and RSS is RDF, does it follow that all RDF descriptions of single URIs are to be considered as just other representations of said URIs? (This is clearly extreme and provocative)
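The content negotiation aside in the second point above can be sketched roughly like this. It is only an illustration of the idea, not how any particular server does it: the function name is made up, the media type application/rdf+xml is an assumption about how the RSS would be labelled, and q-values are ignored for brevity.

```python
def pick_representation(accept_header):
    """Choose which representation of a single URI to serve, based on Accept.

    Hypothetical helper: "application/rdf+xml" as the label for the RSS is
    an assumption, and q-values and other parameters are ignored.
    """
    wanted = [part.split(";")[0].strip().lower()
              for part in accept_header.split(",")]
    if "application/rdf+xml" in wanted:
        return "rss"   # the metadata (RSS 1.0 as RDF/XML)
    return "html"      # default: the human-readable page


print(pick_representation("application/rdf+xml"))
print(pick_representation("text/html, */*;q=0.8"))
```

With one URI and two representations, the client says which one it wants and the "two URIs for one thing" worry goes away, at the cost of needing a server that negotiates.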

Thoughts in favour of the value being the RSS URI:

  • Some RSS documents exist without any equivalent ‘describee’. For example, RSS feeds of aggregated items related by category are automatically constructed for further consumption. There are no equivalent HTML documents that these RSS documents are describing. So having to specify a URI of the HTML in the rdf:about attribute in these cases doesn’t make sense.
  • Often HTML pages have content above and beyond the news (or whatever) items being described in the RSS. As the RSS is not describing this other content, stating that the description is about the HTML URI is not entirely accurate, or at least not entirely representative.

One interesting thing to note concerning the ‘representation’ question, and in relation to the HTML <link rel=”…” … /> construction to

  1. point to the RSS feed from a weblog (in collaborative weblog activities earlier this year)
  2. point to RDF from HTML in general (in the RDF M&S)

is that different possible values for the <link>’s type attribute show both alternatives: for example the value “alt” suggests an alternative representation, whereas the value “home” suggests a separate resource altogether.

Regarding the content negotiation comments, and the HTTP headers that are employed, I am reminded of other thoughts about RDF and resources in general. A question I had (well, still have) is “Where are the statements about the statements?” Yes, I guess I’m talking about reification, but not in the specific technical sense. I’m more interested in this: given a resource, how do I know where the (RDF) statements are that describe it? Appendix B, “Transporting RDF”, in the RDF Model and Syntax (M&S) Specification describes four ways of associating descriptions with the resource they describe – “embedded”, “along-with”, “service bureau”, and “wrapped”. I’ve been thinking of the pros and cons of supplying an HTTP header when the resource is retrieved, like this:

X-RDF: (URI-of-RDF-file)

Now this isn’t a statement about a statement in the RDF sense, but it sure tells you where the statements about a resource are to be found. Hmmm…
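To make the header idea a little more concrete, here is a minimal sketch of what a client might do with it. X-RDF is only the header floated above, not a registered or standard header; the function name and the example.org URI are invented for illustration, and the response headers are modelled as a plain dict rather than fetched over the network.

```python
def rdf_location(headers):
    """Return the value of the (hypothetical) X-RDF header, or None if absent.

    Header names are compared case-insensitively, as HTTP requires.
    """
    for name, value in headers.items():
        if name.lower() == "x-rdf":
            return value.strip()
    return None


# Headers as they might arrive with an HTML page (illustrative values only).
response_headers = {
    "Content-Type": "text/html",
    "X-RDF": "http://example.org/index.rdf",
}
print(rdf_location(response_headers))
```

A client that finds nothing would simply fall back to one of the other association methods, such as a <link> in the HTML.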

Anyway, back to what the value of rdf:about should be. I think the difficulties and questions arise because of the special relationship between RDF and RSS that I mentioned last time. Perhaps because other (non-RDF) RSS formats exist, the RDF and RSS are seen as separate things, so the RSS is a valid candidate for description (by RDF). This becomes meta meta data – a description of the description. Hmmm. One perverse extrapolation of this (taking the fact that RSS is RDF, and considering rdf:about=”URI-of-RSS”) is that all RDF files would be written to describe themselves, and not the actual original resource. Perhaps what I’m trying to say is that this crazy scenario is an argument against having the RSS URI in the rdf:about.

So, what’s the answer?

I don’t think there is a definitive one, apart from “whatever makes more sense for your particular instance”. I don’t know if this is right or not, but it sure is stimulating.

One last question to bring this rambling to an end. What’s the semantic difference between specifying the RSS URI for rdf:about in an RSS file, and specifying blank (i.e. rdf:about=””)? About the same difference as there is between “self reference” and “self reflection”? ;-)
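One practical angle on that question, offered tentatively: as far as I understand it, an empty rdf:about is a relative URI reference and so resolves against the base URI of the document it appears in (assuming no xml:base overrides it). The sketch below uses Python’s standard urljoin to mimic that resolution; both URIs are made up.

```python
from urllib.parse import urljoin

# Hypothetical locations: the original feed and a copy served elsewhere.
original = "http://example.org/index.rdf"
mirror = "http://mirror.example.net/feed.rdf"

# rdf:about="" resolves to wherever the document was retrieved from.
blank_at_original = urljoin(original, "")
blank_at_mirror = urljoin(mirror, "")

# An explicit absolute URI stays the same wherever the file is served.
explicit_at_mirror = urljoin(mirror, "http://example.org/index.rdf")

print(blank_at_original == original)
print(blank_at_mirror == mirror)
print(explicit_at_mirror == original)
```

So under these assumptions the two forms agree only while the file lives at its original address; a blank value follows the file around, an explicit one does not.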

A moment of clarity: The significance of RSS 1.0 as an RDF application

Walking on the beach this morning (yes we’re on holidays) I suddenly had a moment of clarity. It may be obvious to you, but I’ve been struggling to see what the theme was that lay beyond the clutter of the investigations into RDF and RSS, the contemplation of namespaces and the various recent discussions on namespaces in comments to posts on Ben’s, Shelley’s, Sam’s, and doubtless others’ weblogs (all too much to keep track of).

I’d been contemplating what a namespace-aware XML parser for RSS 1.0 would look like, and how it would work in relation to the RSS modules. (Of course for Perl programmers, for example, there’s the XML::RSS parser on CPAN, which is namespace aware but relies on the RSS 1.0 namespace being the default namespace. In other words, if you specify a prefix for the RSS 1.0 namespace, say, “rss10”, rather than have it as the default namespace in the document, and prefix the RSS 1.0 elements with this prefix (“rss10:channel”, “rss10:item”, etc) then XML::RSS isn’t going to like it. But I digress…)
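For what it’s worth, here is a small sketch of what "not caring about the prefix" looks like in practice. It uses Python’s standard ElementTree rather than XML::RSS, the feed content is invented, and the helper name is mine; the point is only that a parser matching on namespace URI plus local name treats the default-namespace and prefixed forms identically.

```python
import xml.etree.ElementTree as ET

RSS10 = "http://purl.org/rss/1.0/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Same data twice: RSS 1.0 as the default namespace, then with a prefix.
default_ns = f'''<rdf:RDF xmlns:rdf="{RDF}" xmlns="{RSS10}">
  <channel rdf:about="http://example.org/"><title>Example</title></channel>
</rdf:RDF>'''

prefixed = f'''<rdf:RDF xmlns:rdf="{RDF}" xmlns:rss10="{RSS10}">
  <rss10:channel rdf:about="http://example.org/">
    <rss10:title>Example</rss10:title>
  </rss10:channel>
</rdf:RDF>'''


def channel_title(xml_text):
    """Find the channel title by qualified name, ignoring whatever prefix was used."""
    root = ET.fromstring(xml_text)
    channel = root.find(f"{{{RSS10}}}channel")
    return channel.find(f"{{{RSS10}}}title").text


print(channel_title(default_ns) == channel_title(prefixed))
```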

While namespace-aware XML parsing is indeed important, and namespaces are fundamental to RDF, the importance of handling namespaces correctly when parsing had clouded a question that I knew existed but hadn’t all the right words in the sentence, until now.

“What is the significance of RDF in RSS?” Actually, that’s not quite right.

While I’ve been looking at the RDF in RSS, I’ve been concentrating on the bits that ‘look like’ RDF: the stuff that I highlighted in bold in the example RSS (rdf:about, rdf:Seq, and so on). But it’s not as if there are some ‘bits’ of RDF in a format that’s RSS … the format RSS 1.0 is an RDF application. In other words, all of RSS 1.0 is RDF. The fact that it’s very similar to non-RDF RSS formats like 0.91 is of course an intended advantage. And the fact that the ‘transportable form’ that RDF takes is XML (RDF can be expressed in node/arc diagrams, or other forms such as Notation 3, or ‘N3’) also makes it nicely ‘compatible’.

“So what?”, I hear you ask.

Well, I’ve been wondering how complicated an XML parser (yes, a namespace aware one, but that’s not significant here) would have to get to support the plethora of RSS 1.0 modules available now and in the future. To be more specific, let’s take an example. Consider the creator property (element) from the Dublin Core (dc) module. The property is normally used by specifying a literal (a string) as its value, thus:

<dc:creator>DJ Adams</dc:creator>

But what about rich content model usage of properties? Consider the use of this property in the discussions of how to splice FOAF with RSS. Dealing with a new element from a defined namespace, where the usage is of the open tag – literal value – close tag variety, is not that difficult when parsing based on XML events. But what about this, which is based on one of the suggestions from Dan Brickley in the discussion and further discussed on the rdfweb-dev list:

<dc:creator>
  <foaf:Person>
    <foaf:name>DJ Adams</foaf:name>
    <rdfs:seeAlso rdf:resource='...' />
    ...
  </foaf:Person>
</dc:creator>

Suddenly having to parse this, as opposed to the simple ‘literal value’ example, is a whole new ballgame in state management (“where the hell am I now in this XML document and what do I do with these tags?”), and at least for this author, writing an XML parser to cope with all such data eventualities would be rather difficult in the context of XML-event based parsing.

But that’s just it. Considering an XML parser is missing the point. An RDF parser is more appropriate here. Looking at the structure of RSS 1.0 and the modules available for it from an RDF point of view suddenly made things clear for me. With RDF, the striped nature of the information encoded in XML is neatly parsed, regardless of difficult-to-predict hierarchical complexity, and translated into a flat set of triples (subject, predicate, object) that you can then interrogate. What you do with that information is then up to you.
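To give a feel for that flattening, here is a deliberately tiny sketch of walking the node/property "stripes" and emitting triples. It is nowhere near a real RDF parser: it ignores rdf:ID, rdf:nodeID, parseType, containers, property attributes and xml:base, and it does not distinguish literals from resources in its output. The namespace URIs are the published Dublin Core, FOAF and RDF Schema ones, which is an assumption on my part about the input, and the function names and genid labels are my own.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF}}}about"
RESOURCE = f"{{{RDF}}}resource"
DESCRIPTION = f"{{{RDF}}}Description"


def uri_of(tag):
    """Turn ElementTree's "{namespace}local" form into a plain URI string."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def parse(xml_text):
    """Return a flat list of (subject, predicate, object) tuples."""
    counter = itertools.count(1)
    triples = []

    def subject_of(node):
        # Nodes without rdf:about get a generated (blank node) label.
        return node.get(ABOUT) or f"_:genid{next(counter)}"

    def walk(node, subject):
        if node.tag != DESCRIPTION:
            # A typed node element implies an rdf:type triple.
            triples.append((subject, RDF + "type", uri_of(node.tag)))
        for prop in node:
            predicate = uri_of(prop.tag)
            resource = prop.get(RESOURCE)
            if resource is not None:
                triples.append((subject, predicate, resource))
            elif len(prop):
                # Nested node element: link to it, then descend one stripe.
                child = prop[0]
                obj = subject_of(child)
                triples.append((subject, predicate, obj))
                walk(child, obj)
            else:
                triples.append((subject, predicate, (prop.text or "").strip()))

    for node in ET.fromstring(xml_text):  # children of rdf:RDF
        walk(node, subject_of(node))
    return triples


doc = f'''<rdf:RDF xmlns:rdf="{RDF}"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://www.pipetree.com/qmacro">
    <dc:creator>
      <foaf:Person>
        <foaf:name>DJ Adams</foaf:name>
        <rdfs:seeAlso rdf:resource="http://www.pipetree.com/~dj/foaf.rdf"/>
      </foaf:Person>
    </dc:creator>
  </rdf:Description>
</rdf:RDF>'''

for triple in parse(doc):
    print(triple)
```

Note that the nesting depth never shows up in the result: however deep the structure goes, the same loop handles it and the output is still just a list of three-part statements to query.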

There are many RDF tools, including parsers, listed on Dave Beckett’s RDF resource site. One of them is Redland, his own RDF toolset. Just before I bring this post to a conclusion, let’s have a look at what the RDF parser in Redland produces for the two creator examples earlier.

Simple literal value example gives:

{[http://www.pipetree.com/qmacro], [http://purl.org/dc/1.1/elements/creator], "DJ Adams"}

In other words:

    /--------\  creator   +----------+
    | qmacro |----------->| DJ Adams |
    \--------/            +----------+

Complex FOAF element structure example gives:

{[http://www.pipetree.com/qmacro], [http://dublincore.com/creator], (genid1)}
{(genid1), [http://www.w3.org/1999/02/22-rdf-syntax-ns#type], [http://foaf.com/Person]}
{(genid1), [http://foaf.com/name], "DJ Adams"}
{(genid1), [http://www.w3.org/2000/01/rdf-schema#seeAlso], [http://www.pipetree.com/~dj/foaf.rdf]}

In other words:

                                     type      /--------\
                               +-------------->| Person |
                               |               \--------/
                               |
    /--------\  creator   /----------\    name      +----------+
    | qmacro |----------->|  genid1  |------------->| DJ Adams |
    \--------/            \----------/              +----------+
                               |
                               |               /----------\
                               +-------------->| foaf.rdf |
                                    seeAlso    \----------/

(Whee! ASCII art RDF diagrams :-)

So what conclusion is there to draw from this bit of rambling? For me, it’s the emphasis on RDF, rather than XML, of RSS (and in fact the subtle relationships between those three things) that is significant in itself, especially when one considers the journey to data richness that seems to demand complex (and tricky-to-parse) XML structures. And what’s more, it’s not specifically RSS that wins here. It’s any RDF application.