Monthly Archives: September 2002

The case of the missing rdf:Description

RDF is a framework for describing resources. We know that. We also know (from various sources, such as the RDF Model and Syntax Specification, or Pierre-Antoine Champin’s RDF Tutorial (recommended!)) that an RDF document consists of a number of descriptions, and these descriptions look like this:

<rdf:RDF ... >

  <rdf:Description ... >
    ...
  </rdf:Description>

  <rdf:Description ... >
    ...
  </rdf:Description>

  ...

</rdf:RDF>

How come, then, that instances of two of the more well-known RDF applications, RSS and FOAF, don’t seem to reflect this format? Following the root rdf:RDF node and the declarations of the namespaces, we have, respectively:

<channel rdf:about="http://www.pipetree.com/qmacro">
  <title>DJ's Weblog</title>
  ...
</channel>

and

<foaf:Person rdf:ID="qmacro">
  <foaf:mbox rdf:resource="mailto:dj.adams@pobox.com"/>
  ...
</foaf:Person>

What, no rdf:Description? Let’s have a look at what’s happening here. In the RSS example, we have channel – or in its fully qualified form http://purl.org/rss/1.0/channel – a class, of which http://www.pipetree.com/qmacro is declared as an instance with the rdf:about attribute.

The RDF subject-predicate-object triple looks like this:

http://www.pipetree.com/qmacro rdf:type http://purl.org/rss/1.0/channel

or in other words “the URI (which is about to be described) is a channel”.

Because RDF is about declaring and describing resources, it becomes clear that this sort of statement (technically the rdf:type triple, above) is very common. And what we saw in the RSS snippet above was the special RDF/XML construction that may be used to express such statements. If we didn’t have this special construction, we’d have to write:

<rdf:Description rdf:about="http://www.pipetree.com/qmacro">
  <rdf:type rdf:resource="http://purl.org/rss/1.0/channel" />
  <title>DJ's Weblog</title>
  ...
</rdf:Description>

which is a tad long-winded. Similarly, the long-winded equivalent for the FOAF example would look like this:

<rdf:Description rdf:ID="qmacro">
  <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
  <foaf:mbox rdf:resource="mailto:dj.adams@pobox.com"/>
  ...
</rdf:Description>

So there you have it. The rdf:Description isn’t there because a special construction is being used in both examples. Many thanks to Jon Hanna for turning the light bulb on in the first place.
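One way to check that the two forms say the same thing is a minimal Python sketch using only the standard library. The helper name type_triple and the two wrapper documents are assumptions made for illustration; this handles only the two constructions discussed above and is nowhere near a full RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

def type_triple(node):
    """Return the (subject, rdf:type, class) triple stated by a node element.

    Hypothetical helper: covers only rdf:about subjects and the two
    forms shown in this post.
    """
    subject = node.get("{%s}about" % RDF)
    if node.tag == "{%s}Description" % RDF:
        # Long-winded form: the class is named by an rdf:type child element.
        type_el = node.find("{%s}type" % RDF)
        cls = type_el.get("{%s}resource" % RDF)
    else:
        # Special construction: the element name itself is the class.
        # ElementTree's "{namespace}local" becomes namespace + local.
        cls = node.tag[1:].replace("}", "", 1)
    return (subject, RDF + "type", cls)

SHORT = """<rdf:RDF xmlns:rdf="%s" xmlns="%s">
  <channel rdf:about="http://www.pipetree.com/qmacro"/>
</rdf:RDF>""" % (RDF, RSS)

LONG = """<rdf:RDF xmlns:rdf="%s" xmlns="%s">
  <rdf:Description rdf:about="http://www.pipetree.com/qmacro">
    <rdf:type rdf:resource="http://purl.org/rss/1.0/channel"/>
  </rdf:Description>
</rdf:RDF>""" % (RDF, RSS)

short_triple = type_triple(ET.fromstring(SHORT)[0])
long_triple = type_triple(ET.fromstring(LONG)[0])
print(short_triple == long_triple)
```

Both documents yield the same rdf:type triple, so the comparison prints True.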

More thoughts to ponder

Here are two or three, er, thoughtbites, that I’ve come across over the past few days and that have stayed with me. I just thought I’d share them here as it’s the weekend and often a good time to think about things.

Open thinking about deep-linking
Tim Bray’s strawman defence of the principle that ‘deep linking’ on the web isn’t illegal. It’s a wonderfully calm and simple aspirin for the anger and frustration that builds up inside when one reads about silly legal action about ‘deep-linking’.

RDF, define thyself
In Sean B. Palmer’s document The Semantic Web: An Introduction (highly recommended!), RDF Schema is introduced, using (amongst other things) this snippet of RDF (read “rdf:type” as “is a”):


rdfs:Resource rdf:type rdfs:Class .
rdfs:Class rdf:type rdfs:Class .
rdf:Property rdf:type rdfs:Class .
rdf:type rdf:type rdf:Property .

I don’t know about you, but I had to go and have a sit down to consider the implications after reading that.

Using namespaces in code
Last week on #rss-dev, Ken MacLeod pointed to a post by Dan Connolly regarding namespaces. Ken said:

A very key point (I think) drawn out in this article is that namespaces are used only to derive a (URI+localname) pair — namespaces should never be considered separate from the element name they specify. … A namespace and localname make a single item of data, distinct from any other combination of namespace and localname.

Libraries and applications (tools) should not try to store a namespace as one “object” and try to link all of the names as “children” of those objects. So, if you’re working in a language that’s string-happy, like Tcl or Perl, the first thing you should do is take the namespace and element name and put them together and use them like that from then on, “{URI}LocalName” works well in Perl, for example.

Sounds obvious when you grok it, but (for me at least) it was a refreshing way to look at the whole issue of namespaces and how they’re represented in XML and used in deserialised data structures.
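The same “{URI}LocalName” idea can be seen in Python, whose standard xml.etree.ElementTree module reports element names in exactly that combined form. A small sketch; the two one-element documents and the handlers dictionary are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Two documents that use different prefixes for the same namespace.
DOC_A = '<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">DJ\'s Weblog</dc:title>'
DOC_B = '<x:title xmlns:x="http://purl.org/dc/elements/1.1/">DJ\'s Weblog</x:title>'

tag_a = ET.fromstring(DOC_A).tag
tag_b = ET.fromstring(DOC_B).tag

# The prefix is gone; what remains is the single combined name.
print(tag_a)
print(tag_a == tag_b)

# The combined name works directly as a dictionary key.
handlers = {"{http://purl.org/dc/elements/1.1/}title": "Dublin Core title"}
print(handlers[tag_b])
```

The prefixes dc and x never reach the application; only the namespace URI and local name, joined into one string, do.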

XML Scripting, data manipulation, and RDF

I’ve just read Jon’s latest post on XML Scripting, which mentions Adam Bosworth’s thoughts about an XML scripting language that could natively support XML.

While the advent of XML scripting sounds fascinating, I’ve also been wondering about RDF enabling us to “gracefully integrate with the world of objects” and enhance the “self-describing nature of XML”. Yes, it’s my current area of interest (read: I’m vacuuming as much information as I can about it right now), and this by itself is likely to taint my vision somewhat. But reading what was quoted from Adam immediately made me think of some of RDF’s features (or should I say ‘nature’, I guess I’m not trying to sell it):

  • in its XML incarnation, RDF can describe the XML data it ornaments
  • its core nature (and through association the nature of the XML described by it) is object-orientated: things in RDF are either (instances of) classes, properties, or values of properties
  • the concept and use of namespaces is a key strength of RDF and one on which it relies. Shared semantics, and the classes and properties by which such semantics are conveyed, are surely important when attempting to “convert from one XML format to another” and “synthesize complex XML documents for [from?] multiple sources”

Now it’s clear that XML is not RDF. There’s the bootstrapping issue with RDF applications of which we’re all aware. There’s no magic wand, but there are ways (such as transformations to wring out RDF essence from ‘flat’ XML) to get going. And in the context where REST, web services, business data, and the focus on resources (URIs) intersect, RDF – as a technology for describing, sharing and linking business data – seems too significant to ignore.

Going back to Adam’s quote that sparked this post, I am curious about the ‘native support’ of XML as a data type; my limited imagination cannot see how that might happen without some sort of serialisation/deserialisation (will a term like ‘serdes’ be this decade’s equivalent of ‘modem’?). I am ready and willing to be enlightened :-) The great thing about RDF is that there is already a bounty of software (storage mechanisms, model and query tools, serialisers and deserialisers) that can work with RDF in many existing programming languages.

Anyway, plenty to ponder. Life is good.

Moving from description to content:encoded in my RSS 1.0 feed

After spotting a comment on #rdfig regarding the contents of my RSS 1.0 feed’s <description>, I decided to take the plunge and use the draft part of RSS 1.0’s mod_content module, namely the content:encoded property, to hold the entity-encoded weblog item content. (The description element itself in core 1.0 is optional, and although I’m omitting it for now, I’m still uneasy about it – ideally I’ll have a text-only abstract and be a good RSS citizen). This is something that Jon, Sam and others have done already. While Timothy Appnel asks a good question, I’ll address it here at a later stage as Blosxom entity-encodes my HTML for me (i.e. there’s not much point trying to XSL-Transform it back).

So I have modified the RSS 1.0 feed for this site to use content:encoded with a stylesheet slightly modified from last time.

From RSS 0.91 to 1.0

Now that I understand what the RDF in RSS is, I’m ready to move up to RSS 1.0. I’m using Blosxom which generates RSS 0.91 by default. Flushed with a previous success using XSLT, I thought I’d use that technology again to generate 1.0 from 0.91.

Luckily, Eric van der Vlist has some XSLT stylesheets over at 4XT to do exactly that. This is the perfect opportunity for me to (a) learn more about XSLT by studying his stylesheets, and (b) to reflect upon the loosely connected nature of the web by employing the W3C’s XSLT Service and pointing directly to Eric’s 0.91-to-1.0 stylesheet and my RSS 0.91 source, in a URI recipe similar to the earlier sidebar experiment.

This link is the URI that will automagically return an RSS 1.0 version of my weblog. Hurrah! However, so as not to abuse the transformation service, I’m caching the result and making my RSS 1.0 feed ‘static’, like this (split up a bit for easier reading):

/usr/bin/wget -qO /tmp/qmacro.rss10
'http://www.w3.org/2000/06/webdata/xslt
?xslfile=http%3A%2F%2F4xt.org%2Fdownloads%2Frss%2Frss091-to-10.xsl
&xmlfile=http%3A%2F%2Fwww.pipetree.com%2Fqmacro%2Fxml&transform=Submit'
&& mv /tmp/qmacro.rss10 ~dj/public_html/

This is another example of the flexible nature of the shell (my favourite IDE) and programs designed and written for it. The wonderful wget program returns true if the retrieval of a resource was successful, otherwise false. I can then use the && to only overwrite the current static rendering if we’ve successfully got a fresh transform result.
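For anyone who would rather do this outside the shell, here is a rough Python sketch of the same "only overwrite on success" idea. The names refresh and failing_fetch are hypothetical, and fetch is a stand-in callable rather than a real call to the transformation service:

```python
import os
import tempfile

def refresh(fetch, dest):
    """Overwrite dest with the bytes from fetch() only if fetch() succeeds.

    `fetch` is a hypothetical callable standing in for the wget call;
    it should return bytes, or raise an exception on failure.
    """
    try:
        data = fetch()
    except Exception:
        return False  # like wget exiting non-zero: keep the old copy
    # Write to a temporary file in the same directory, then rename over dest.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(dest)))
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.replace(tmp, dest)
    return True

def failing_fetch():
    raise OSError("transformation service unavailable")

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "qmacro.rss10")

refresh(lambda: b"<rdf:RDF>fresh</rdf:RDF>", target)
refresh(failing_fetch, target)  # leaves the earlier result alone
with open(target, "rb") as handle:
    print(handle.read())
```

The second call fails, so the file still holds the first result. Note that mkstemp creates files readable only by their owner, so a feed served from public_html would also need its permissions adjusted.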

I arrange for this incantation to be made once an hour, and can announce that my RSS 1.0 feed is available here: http://www.pipetree.com/~dj/qmacro.rss10

The RDF in RSS

Ironically, it’s only been the recent and ongoing hubbub about the direction of RSS that’s got me wondering what the real truth is about the RDF in RSS. In a still handwavy sort of way, I understand that RDF is important for the (semantic|data) web that is to form as a layer above the current writhing disconnected mass of URIs. But I realised I hadn’t really thought much about what the RDF bits of RSS (1.0) were, and much less what they were for. I get the feeling that for most mortals, including me, including RDF in their RSS feeds seemed like building a racing car to do the shopping, and never even taking it out on a track after the shopping was finished. Actually, perhaps some people didn’t even see the racing car as a whole. So I did a little reading and thinking.

The point of RDF is to be able to describe resources. Resource Description Framework. So far so good. But what are resources? They’re things that we can point to on the web – things with URIs (REST axioms, anyone?). With RDF, we can make assertions, state facts, about things. These assertions are always in the form of

this thing has this property with this value.

These assertions are often expressed as having the form ‘subject-predicate-object’ and are referred to as ‘triples’. RDF exists independently of XML, but what I (and lots of other people) recognise RDF as is its XML incarnation. Here’s a simple example:

<rdf:Description rdf:about='http://www.pipetree.com/qmacro'>
  <dc:title>DJ's Weblog</dc:title>
</rdf:Description>

This makes the assertion that

the resource at http://www.pipetree.com/qmacro has a title (as defined in the Dublin Core) with the value “DJ’s Weblog”.

What’s obvious is that subjects are URIs. It’s also easy to realise that objects can be URIs too – instead of having a Literal (“DJ’s Weblog”) as in the example above, you can have another resource (a URI), for example:

<foaf:Person rdf:ID="qmacro">
  <foaf:depiction rdf:resource="http://www.pipetree.com/~dj/dj.png"/>
</foaf:Person>

Here, the object, the value of the foaf:depiction property, is a URI (http://www.pipetree.com/~dj/dj.png) pointed to directly with the rdf:resource attribute.

But what’s really mindblowingly meta is that the predicate parts of assertion triples, the properties, are resources, addressable by URIs, too. Yikes! This means that RDF can be used to describe … RDF. In case you’re wondering, the properties (dc:title, foaf:depiction) don’t look like URIs, but they are URIs in disguise – the URI for each property is made up from the namespace qualifying the XML element name, and the element name fragment on the end of that. So for example, the dc namespace http://purl.org/dc/elements/1.1/, plus the element name title, gives:

http://purl.org/dc/elements/1.1/title
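The namespace-plus-element-name rule is easy to try out. The following Python sketch (standard library only) pulls the one triple out of the dc:title example above; the helper property_uri is a made-up name, and the code copes only with literal values, not with rdf:resource objects:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.pipetree.com/qmacro">
    <dc:title>DJ's Weblog</dc:title>
  </rdf:Description>
</rdf:RDF>"""

def property_uri(tag):
    """Turn ElementTree's "{namespace}local" tag into namespace + local."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local

description = ET.fromstring(DOC)[0]
subject = description.get("{%s}about" % RDF)

# Each child element is one property: the predicate comes from its name,
# the (literal) object from its text.
triples = [(subject, property_uri(child.tag), child.text) for child in description]
for triple in triples:
    print(triple)
```

The single triple printed has http://purl.org/dc/elements/1.1/title as its predicate, the property URI in its undisguised form.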

Anyway, the point of RDF here is to be able to make connections between things on the web. To define, or describe, relations between things; to add richness to the data out there – to declare data about the data. If we, or our machines, can understand things about the data we’re throwing around, the world will be a better place for it. And to all those meta-data agnostics out there, ask yourself this – where would the database world be without data dictionaries?

So, what about these triples that exist in RSS 1.0? They’re just to add a layer of richness, a seam to be mined by RDF-aware tools. Let’s have a look at a simple RSS 1.0 file. I’ve highlighted the RDF bits (slightly cut to fit):

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/">

  <channel rdf:about="http://www.pipetree.com/qmacro/xml">
    <title>DJ's Weblog</title>
    <link>http://www.pipetree.com/qmacro</link>
    <description>
      Why make things simple when you can make them
      complicated?
    </description>
    
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www...#tech/moz-tab-bookmark"/>
        <rdf:li rdf:resource="http://www...#tech/google-idempotent" />
      </rdf:Seq>
    </items>
    
  </channel>

  <item rdf:about="http://www...#tech/moz-tab-bookmark">
    <title>Mozilla "Bookmark This Group of Tabs"</title>
    <link>http://www...#tech/moz-tab-bookmark</link>
    <description>
     I was just reading some background stuff ...
    </description>
  </item>

  ...

</rdf:RDF>

Here’s what we have, RDF-wise:

  • We have the rdf namespace (http://www.w3.org/1999/02/22-rdf-syntax-ns#), which qualifies the rdf elements in the file. The root element is RDF, which is standard for an RDF document.
  • An RDF document is made up of a list of descriptions of resources. The resources here are identified, by their URIs of course, via the rdf:about attributes on the <channel/> and <item/> elements. (Notice that the channel and item elements are on the same level, XML-wise.)
  • Then we have the <items/> element inside the <channel/> element, containing an RDF structure called a sequence (rdf:Seq), which is basically an ordered container for things.

And what do these RDF things do? First, the resources – the RSS channel, or the Weblog it represents, and the actual items – are identified as subjects of assertions, using the rdf:about attributes. You could say that they’re the “subjects of Descriptions of them”. Each has a unique URI. Then, an assertion of the following nature is made about the channel:

The channel http://www.pipetree.com/qmacro/xml contains an ordered sequence of things, namely http://www…#tech/moz-tab-bookmark and http://www…#tech/google-idempotent.

If the RSS file were to have an image, it would occur as in other RSS versions (i.e. as an element peer of the <channel/> element), and the <image/> element itself would have an rdf:about attribute pointing to that image resource’s URI. Then, inside the channel element, there’d be a simple:

<image rdf:resource="..." />

element pointing to the same URI as the <image/> element’s rdf:about attribute pointed to. This would say:

The channel http://www.pipetree.com/qmacro/xml has an image, namely (the image’s URI).

And so on.
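To see what an RDF-unaware but namespace-aware tool can already dig out of these structures, here is a small Python sketch using the standard library. The feed is a cut-down version of the one above; the item URIs under example.org and the item titles are placeholders, not the real weblog URIs.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# Item URIs under example.org are placeholders for illustration only.
FEED = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.pipetree.com/qmacro/xml">
    <title>DJ's Weblog</title>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/weblog#first"/>
        <rdf:li rdf:resource="http://example.org/weblog#second"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/weblog#first">
    <title>First item</title>
  </item>
  <item rdf:about="http://example.org/weblog#second">
    <title>Second item</title>
  </item>
</rdf:RDF>"""

root = ET.fromstring(FEED)
about = "{%s}about" % NS["rdf"]
resource = "{%s}resource" % NS["rdf"]

channel = root.find("rss:channel", NS)
# The arcs: the channel's sequence lists the item URIs in document order.
sequence = [li.get(resource)
            for li in channel.findall("rss:items/rdf:Seq/rdf:li", NS)]
# The nodes: each item is a peer of the channel, identified by rdf:about.
titles = {item.get(about): item.findtext("rss:title", namespaces=NS)
          for item in root.findall("rss:item", NS)}

print(channel.get(about))
for uri in sequence:
    print(uri, "->", titles[uri])
```

The rdf:resource values in the sequence and the rdf:about values on the items are the same URIs, which is what lets the code join the channel's list to the item descriptions.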

In other words, the RDF in RSS is there to identify resources (the nodes) and to describe properties of or relationships between them (the arcs). The RDF content of RSS is not large. I think some people might intermingle RDF and namespace content and think “ooh, there’s a lot of RDF in RSS”. Sure, namespaces are fundamental to RDF, but exist (both here in RSS and elsewhere) independent of it (although if you use namespaces such as the Dublin Core in RDF-enhanced RSS, then you’re effectively, and at no extra cost, adding to the data web with the triples that come into being because of how RDF, namespaces, and XML wonderfully work together).

So, there you have it. Just a bit of a brain dump of what I’ve been learning over the past couple of days. Now that I understand what’s going on, I for one would be very disappointed to see RDF go away from RSS. Although there are signs that this may not be the case after all. But that’s another story.

Mozilla “Bookmark This Group of Tabs”

I was just reading some background stuff before I posted my comment just now over on the interesting discussion on REST and idempotency over on Sam’s site (the comment is also partially in response to Sam’s post yesterday).

I had about 5 or 6 tabbed pages open with content relating to the discussion, and lo and behold, my new browser of choice, Mozilla (not least because I can now have a consistent experience on Linux and MS-Windows) allowed me to “Bookmark This Group of Tabs” all at once, and give the collection a nice little title.

Neat. It’s little things like this that make for a pleasant experience. Now if only I could move from tab to tab with the keyboard instead of the mouse …

REST, Google, and idempotency

Sam has asked Mark Baker a question, or rather, presented an apparent conundrum. It was a pleasant subject to ponder as I was rolling out the pastry for some apple and blackberry pies, and on returning to the keyboard, I’ve decided to put some of my own thoughts down on Sam’s points, even though I’m still learning about all this stuff.

Point #1: ‘GETs must not have side effects’ is perhaps REST’s most cherished axiom
If I had to pick one as being the most cherished, I’d go for the one that says that anything that’s important is a first class URI citizen (i.e. addressable by a URI). The ‘no side effects’ axiom appears to be ‘just’ a natural follow on from the presentation of how the HTTP verbs are supposed to be understood and used.

Point #2: The 1001st call to Google is different, and [so] the [GET] query is not idempotent
In the SOAP context, a SOAP Fault will be returned by Google if you exceed your limit of 1000 calls in a day. Returning a SOAP Fault within the context of an HTTP 200 OK status is one thing. But percolating this response up to a REST (i.e. HTTP) context would imply returning, say, an HTTP 403 FORBIDDEN, with a body explaining why. This is a valid response to a GET.

Having different results, different status codes, returned on a GET query doesn’t necessarily imply any side effects. Indeed, in our beloved canonical stock-quote example, we don’t even need to regard the HTTP status codes to see that results can be different on the same GET query (the stock market would be a very dull place if they weren’t). And what about Google itself? The same search query one day will not necessarily return the same results the next day. Different query results, no implied side-effects.

Point #3: So, what do you do?
Nothing different. Through REST-tinted spectacles, the 1001st GET receives a 403, and you act accordingly. No lives have been lost, no state has been changed. Potentatus idem manet. As the saying goes. Well. It does now.
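A toy model in Python makes the distinction concrete. The class QuotaLimitedSearch and its one-entry index are invented for illustration and bear no relation to how Google implements its limit; the call counter is assumed to be the service's own bookkeeping, separate from the resource being queried.

```python
class QuotaLimitedSearch:
    """Hypothetical stand-in for a search service with a daily call limit."""

    def __init__(self, index, limit=1000):
        self.index = index  # the resource state that a GET must not change
        self.limit = limit
        self.calls = 0      # the service's own bookkeeping

    def get(self, query):
        """Return an (HTTP status, body) pair for a GET on the query."""
        self.calls += 1
        if self.calls > self.limit:
            return 403, "Daily limit of %d calls exceeded" % self.limit
        return 200, self.index.get(query, [])

service = QuotaLimitedSearch({"rest": ["a page about REST"]})
before = dict(service.index)

statuses = [service.get("rest")[0] for _ in range(1001)]

print(statuses[0], statuses[999], statuses[1000])
print(service.index == before)
```

The first and thousandth calls get a 200, the 1001st gets a 403, and the index the queries were run against is unchanged throughout.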

Of course, these are just my thoughts. Apologies to Sam if I’ve misunderstood his points, and to Mark if I’ve potentially muddied the waters.

P.S. maybe I should have used ‘potestas’…

P.P.S. I’m a grey, not a black-and-white, person