Archive for June 2006

RDFa: The Gentle Road to RDF

A common point of confusion about RDFa is whether you must wait for XHTML 2 to arrive before you can use it. In this post we’ll try to clarify how RDFa relates to HTML as it stands today.

The key thing to understand about RDFa is that it doesn’t just add new attributes and techniques to XHTML for embedding rich metadata; it also brings to the fore the metadata features that already exist in XHTML. This matters because of a key design goal: if we are ever to bring about the so-called Semantic Web, people have to be convinced to put metadata into their documents. For that to happen it has to be easy, and to be easy it needs to build on what people already know how to do.

RDFa does a lot, but here we will describe just some of its features at a high level, to show how RDFa extends XHTML practice rather than replacing it.

1. Use @rel and @rev to give information about your links.

This first technique is just standard HTML. Best practice would say that a value should also be added to the profile attribute; RDFa currently says nothing about what that value should be, but it should.

2. Use namespace-prefixed @rel and @rev values to make this information globally understandable.

The problem with simply putting text values into @rel and @rev is that it is not clear whether your ‘event’ is the same as my ‘event’. If we want to share our metadata, or make it available to metadata search engines, then we need a way to say that we both mean the same thing. This is usually achieved with namespaces, and RDFa recommends using CURIEs (prefixed names, for example dc:creator) when you want to be precise.

Using namespace prefixes in @rel and @rev would still be valid XHTML, even though browsers wouldn’t do anything with the values (the whole point, of course, is that an RDFa parser could).
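To make the idea of a prefixed name concrete, here is a minimal Python sketch of how a parser might expand a CURIE into a full URI. The prefix table and the function name expand_curie are our own assumptions for illustration; in a real document the prefixes would come from the namespace declarations, and a real RDFa parser may well handle more cases than this.

```python
# Hypothetical prefix table for illustration only; in practice the
# mappings would be read from the document's namespace declarations.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_curie(value, prefixes=PREFIXES):
    """Expand 'prefix:reference' to a full URI.

    Values with no prefix, or with a prefix we do not know,
    are returned unchanged.
    """
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return value


print(expand_curie("dc:creator"))  # http://purl.org/dc/elements/1.1/creator
print(expand_curie("event"))       # event
```

Once both of our documents expand their values to the same URI, it no longer matters which prefix each of us happened to choose.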

3. Use @property to provide predicates for inline mark-up.

The addition of the @property attribute is new, but it’s not that difficult to understand–it’s just the equivalent of @rel and @rev for inline text, or the @name attribute on the meta element.

4. Use @about to indicate that something other than the document is the subject.

This is also new, but we feel it is extremely intuitive. @about sets a context for some block of metadata, allowing you to collect together statements that concern some external resource, such as an image, a sound file, a license, a friend, etc.

5. Use <meta> and <link> anywhere in the document.

Also new, but also intuitive; everyone is familiar with using meta and link to add metadata to their documents, and all this new usage does is to allow these elements to be used in a meaningful way elsewhere. It means that now you can indicate the source of a quote, the license for an image, the name of the photographer who took the photo, and so on, all inline within your document.
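Points 4 and 5 can be sketched in a few lines of Python. This is only an illustration built on the standard library's html.parser, not the RDFa processing rules themselves: the class name AboutSketch, the sample mark-up, and the example.org URI are all assumptions of ours. The sketch keeps a stack of subjects, so that a meta or link element inside a block carrying @about describes that resource, while one outside describes the document.

```python
from html.parser import HTMLParser


class AboutSketch(HTMLParser):
    """Toy illustration of @about setting the subject for nested metadata."""

    def __init__(self, document_uri):
        super().__init__()
        self.subjects = [document_uri]  # stack; the last entry is the current subject
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # An element with @about changes the subject for itself and its children.
        subject = a.get("about", self.subjects[-1])
        self.subjects.append(subject)
        # <meta property="..." content="..."/> gives a text value.
        if "property" in a and "content" in a:
            self.triples.append((subject, a["property"], a["content"]))
        # <link rel="..." href="..."/> points at another resource.
        if "rel" in a and "href" in a:
            self.triples.append((subject, a["rel"], a["href"]))

    def handle_endtag(self, tag):
        # Leaving an element restores the enclosing subject.
        if len(self.subjects) > 1:
            self.subjects.pop()


# Hypothetical sample mark-up, written XHTML-style with self-closed elements.
SAMPLE = """
<div about="photo.jpg">
  <meta property="dc:creator" content="Jane Doe" />
  <link rel="cc:license" href="http://creativecommons.org/licenses/by/2.5/" />
</div>
<meta property="dc:title" content="My holiday" />
"""

parser = AboutSketch("http://example.org/holiday")
parser.feed(SAMPLE)
for triple in parser.triples:
    print(triple)
```

The first two statements come out with photo.jpg as their subject and the last with the document URI. The sketch relies on meta and link being self-closed, as they would be in XHTML.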

In summary, RDFa gently builds on HTML best practice, but has the power to ramp up to full-blown RDF. In fact, it even supports reification and bnodes, although if you don’t know what they are, don’t lose any sleep over it!


Python RDFa Parser Added to RDFLib

Elias has just announced the incorporation of his RDFa parser into RDFLib, a powerful Python library for retrieving, storing and manipulating RDF.

He mentions in passing that he was inspired to do this whilst discussing his parser with Chimezie. That makes this doubly exciting…if Chimezie has his hands on RDFa, who knows what will happen next!


A Python RDFa Parser

If you’re starting to look at RDFa you may already have come across one of the key design concepts–that it is ‘generic’. What we mean by that is that once you know the handful of basic rules that make up RDFa, you can add any type of metadata you like to your XHTML documents. And because the rules are fixed, it’s clear what happens when more than one vocabulary is used in a document.

The advantage of this approach is that you only need one parser: making the metadata in your document available to a processor simply requires applying the RDFa rules, without knowing anything about what the mark-up is meant to mean. To give an example, in the hCard microformat you can say this:

<a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>

But to understand which of the values in the class attribute apply to the @href and which to the content of the anchor (or both, or neither) you need to look at the hCard specification; in other words, you need a specialised hCard processor. The same applies to every microformat produced, since microformat processors are all, much like GRDDL processors, specialised.
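To show what "specialised" means in practice, here is a deliberately tiny Python sketch of the kind of knowledge an hCard processor has to carry around. The rule table HCARD_RULES and the class HCardSketch are our own simplification, covering only the two class names in the example; a real hCard processor does considerably more.

```python
from html.parser import HTMLParser

# Hypothetical, heavily simplified rule table: which part of the element
# each hCard class name describes. This knowledge is not in the mark-up;
# it has to be taken from the hCard specification.
HCARD_RULES = {"email": "href", "fn": "text"}


class HCardSketch(HTMLParser):
    """Toy processor that only understands the class names listed above."""

    def __init__(self):
        super().__init__()
        self.values = {}
        self._pending_text = []  # class names waiting for the element's text

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        for name in (a.get("class") or "").split():
            rule = HCARD_RULES.get(name)
            if rule == "href" and a.get("href"):
                self.values[name] = a["href"]
            elif rule == "text":
                self._pending_text.append(name)

    def handle_data(self, data):
        for name in self._pending_text:
            self.values[name] = self.values.get(name, "") + data

    def handle_endtag(self, tag):
        self._pending_text.clear()


parser = HCardSketch()
parser.feed('<a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>')
print(parser.values)
```

Supporting another microformat would mean writing another rule table and, most likely, more special-case code to go with it.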

RDFa has taken a different approach from the beginning: the aim has always been to create a set of rules that are independent of any particular metadata language. The RDFa equivalent of the previous example would be:

<a rel="email" property="fn" href="mailto:jfriday@host.com">Joe Friday</a>

In XHTML the rel attribute already does the job we need it to do, namely providing some metadata about the relationship between the current document and some other resource–so it’s clear that email qualifies @href.

Unfortunately, there is nothing straightforward in XHTML that can be used to flag up the text value. The microformats approach is to use the class attribute, but whilst this logically tells us something about the object that has the class (a span or a, for example), it doesn’t feel right to say that this is also a property of the document that contains the mark-up. (In RDF terms, if we say that “Mark is of type author”, that doesn’t necessarily mean “This document has an author of Mark”.)

We therefore decided in RDFa to make further use of the attribute that is already used on the meta element–@property–since in RDFa we always try to build on what authors already know how to do, and are comfortable with.
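The two rules just described (@rel qualifies @href, @property qualifies the text) are enough to sketch a vocabulary-independent extractor in Python. This is a toy built on the standard library's html.parser under our own assumptions, not any of the real RDFa parsers: the class name GenericSketch and the example.org subject are hypothetical, and nested elements, @about, and prefix expansion are all ignored.

```python
from html.parser import HTMLParser


class GenericSketch(HTMLParser):
    """Toy extractor that applies two fixed rules and knows no vocabulary."""

    def __init__(self, subject):
        super().__init__()
        self.subject = subject
        self.triples = []
        self._open_property = None  # predicate waiting for the element's text
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # Rule 1: each @rel value names the relationship to the resource in @href.
        if a.get("rel") and a.get("href"):
            for predicate in a["rel"].split():
                self.triples.append((self.subject, predicate, a["href"]))
        # Rule 2: @property names the relationship to the element's text.
        if a.get("property"):
            self._open_property = a["property"]
            self._text = []

    def handle_data(self, data):
        if self._open_property is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first end tag closes the open property,
        # so nested mark-up inside the element is not handled.
        if self._open_property is not None:
            literal = "".join(self._text)
            self.triples.append((self.subject, self._open_property, literal))
            self._open_property = None


parser = GenericSketch("http://example.org/contact")
parser.feed('<a rel="email" property="fn" href="mailto:jfriday@host.com">Joe Friday</a>')
for triple in parser.triples:
    print(triple)
```

Nothing in the class mentions email or fn, so the same code would extract terms from a completely different vocabulary without modification.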

Although I said above that “you only need one parser”, I didn’t of course mean that you only need one parser! I just meant that, since the process is clearly defined and independent of any vocabulary, once you have written your parser you don’t need to write a new one when someone creates a new set of metadata terms. Whilst we were working on the language, the only parsers for RDFa were XSLT-based. Recently, however, Ben Adida implemented a JavaScript parser (or to be more precise, a flexible parsing framework), and yesterday Elias Torres announced a Python parser.

Elias’ parser is pretty impressive in its own right, but it is particularly important because it is the first parser to be written by someone not involved in writing the specification…and has therefore shown up some glaring inconsistencies in the spec! Not content with that, Elias has also gone on to create a web service that will retrieve any URL you give it, parse the RDFa, and give you back the RDF/XML.

This is all excellent work, and a substantial taste of what RDFa will be able to deliver.