RDFa Vocabularies

From RDFaWiki

Microformat Vocabularies in RDFa

Introduction

RDFa is a method for embedding RDF "triples" in other languages such as XHTML. There are many ways to define vocabularies for RDF (e.g., OWL) and we certainly recommend that you avail yourself of one of these methods if you are creating a new vocabulary. If, however, you have an existing vocabulary that was developed for use in microformats, you might want to make that vocabulary easily usable via RDFa. This document describes a method for doing so.

Note that much of this document is about the mechanism for allowing RDFa processors to learn about the microformat vocabularies. While this is not really of interest to most microformat users, it is critical to the RDF community. We have tried to isolate those more esoteric aspects of this proposal.


Requirements

There are some pretty simple requirements we want to address:

  • Ensure microformat users have an easy path to RDFa.
  • Do not require the use of prefixes to reference well-known vocabulary terms (but permit prefixed use).
  • Permit the mixing of microformat and RDF vocabularies.

There are also a handful of stretch goals:

  • Provide a tool to help transform microformat-using (X)HTML into (X)HTML+RDFa.
  • Ensure that documents in (X)HTML+RDFa can still adhere to microformat conventions to the extent possible.

What is a vocabulary (to RDF)?

The general purpose of RDF is to associate two URIs via some other term - a subject is associated with an object via a predicate. Usually at least one of these components is a vocabulary term that is defined somewhere on the internet. For example, in XHTML we define the term "next" to be the next resource in a sequence. This term has a formal URI (http://www.w3.org/1999/xhtml/vocab#next) but it can also be referenced in a compact form - because typing that full URI every time you wanted to talk about the next document in a sequence would be silly.


Terms via Compact URIs (CURIEs)

This compact form, called a CURIE, allows for the association of a "prefix" with a mapping value:

  vocab="xhv=http://www.w3.org/1999/xhtml/vocab#"

In this case, the prefix "xhv" is associated with the value "http://www.w3.org/1999/xhtml/vocab#". The purpose of making this association is so that in a document we can use terms like xhv:next and the RDFa processor will know that what we really mean is http://www.w3.org/1999/xhtml/vocab#next.
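As a minimal sketch of that expansion, assuming prefix mappings are held in a plain dictionary (the function name expand_curie is ours for illustration, not part of any RDFa library):

```python
def expand_curie(curie: str, prefixes: dict[str, str]) -> str:
    """Expand a prefixed CURIE such as 'xhv:next' into a full URI."""
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed CURIE: {curie!r}")
    if prefix not in prefixes:
        raise KeyError(f"no mapping declared for prefix {prefix!r}")
    # A CURIE is expanded by plain string concatenation.
    return prefixes[prefix] + reference


prefixes = {"xhv": "http://www.w3.org/1999/xhtml/vocab#"}
print(expand_curie("xhv:next", prefixes))
# prints http://www.w3.org/1999/xhtml/vocab#next
```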

Terms via unprefixed CURIEs

However, the RDFa designers recognized that next is a very common term in HTML documents, and that it should be possible to use it with no prefix at all. Consequently, they declared that this term (and a number of others) is part of a collection of "reserved words". This means that, in an (X)HTML+RDFa document, you can include a link like:

  <link rel="next" href="chapter2.html" />

and an RDFa processor will know that what you really mean is that the URI chapter2.html is related to the current document using the relationship "next", which is the same as "xhv:next" above, which is the same as http://www.w3.org/1999/xhtml/vocab#next.
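The lookup described above might be sketched as follows. The reserved-word table here is a small illustrative subset, and the base URI and the helper name link_triple are assumptions made only for the example:

```python
from urllib.parse import urljoin

XHV = "http://www.w3.org/1999/xhtml/vocab#"

# Illustrative subset of the reserved words, not the full collection.
RESERVED_WORDS = {word: XHV + word for word in ("next", "prev", "license")}


def link_triple(base, rel, href, reserved=RESERVED_WORDS):
    """Return the (subject, predicate, object) triple for a link element,
    or None when rel is not a known unprefixed term."""
    predicate = reserved.get(rel)
    if predicate is None:
        return None
    # The href is resolved against the location of the current document.
    return (base, predicate, urljoin(base, href))


# Hypothetical document location, chosen only for this example.
base = "http://www.example.org/book/chapter1.html"
print(link_triple(base, "next", "chapter2.html"))
```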

Extending the set of unprefixed CURIEs

So, the trick to incorporating microformats within RDFa documents is to ensure that the microformat "terms" are part of the collection of terms that can be referenced in an unprefixed fashion. Then it is simply a matter of taking advantage of the attributes that RDFa defines to structure the microformat terms and data. An RDFa processor will then be able to automatically extract the data and map it into real RDF "triples" that are meaningful to the "semantic web".

Creating a document that uses microformat terms via RDFa

Typically, a Microformat document re-uses the @class attribute to mark up semantic data. The following is an example of an audio recording being marked up using the hAudio Microformat:

<div class="haudio">
   <span class="title">Start Wearing Purple</span> by 
   <span class="contributor">Gogol Bordello</span>
</div>

The following Microformat object is generated from the HTML shown above (JSON representation):

{
   "microformat-type" : "haudio",
   "title"            : "Start Wearing Purple",
   "contributor"      : "Gogol Bordello"
}

Here is the same semantic data being marked up in RDFa, with the reserved word extension mechanism used to pull in Microformat vocabulary terms. The terms are used to resolve unprefixed CURIEs:

<div vocab="http://microformats.org/vocab#" typeof="haudio">
   <span property="title">Start Wearing Purple</span> by 
   <span property="contributor">Gogol Bordello</span>
</div>

The following RDF triples are generated from the markup above (Turtle representation):

_:bnode0
   <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
      <http://microformats.org/vocab#haudio> .
_:bnode0
   <http://microformats.org/vocab#title>
      "Start Wearing Purple" .
_:bnode0
   <http://microformats.org/vocab#contributor>
      "Gogol Bordello" .
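To make the mapping from markup to triples concrete, here is a deliberately tiny sketch built on Python's standard html.parser. It is not a conforming RDFa processor: it assumes a single vocab value for the whole document, ignores attribute scoping and nesting, and treats every property value as a plain-text literal. The class name TinyVocabParser is hypothetical.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TinyVocabParser(HTMLParser):
    """Collects triples from vocab/typeof/property attributes (sketch only)."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.vocab = ""        # URI prepended to unprefixed terms
        self.subject = None    # current blank node
        self.pending = None    # predicate waiting for its text content
        self.bnodes = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "vocab" in attrs:
            self.vocab = attrs["vocab"]
        if "typeof" in attrs:
            # Each typed element gets a fresh blank node as its subject.
            self.subject = f"_:bnode{self.bnodes}"
            self.bnodes += 1
            self.triples.append((self.subject, RDF_TYPE, self.vocab + attrs["typeof"]))
        if "property" in attrs:
            self.pending = self.vocab + attrs["property"]

    def handle_data(self, data):
        # The text that follows a property element becomes the literal object.
        if self.pending is not None:
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None


markup = """
<div vocab="http://microformats.org/vocab#" typeof="haudio">
   <span property="title">Start Wearing Purple</span> by
   <span property="contributor">Gogol Bordello</span>
</div>
"""

parser = TinyVocabParser()
parser.feed(markup)
parser.close()
for triple in parser.triples:
    print(triple)
```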

Extending RDFa Processors

The mechanism described in this document is not included in the RDFa Syntax specification. It is, however, completely consistent with that specification's view of how RDFa documents should be processed. Basically what we are doing is defining some "pre-processing" rules.

Discovering new unprefixed CURIEs

An RDFa processor that supports this extension can dynamically expand its collection of unprefixed CURIEs. It does so by retrieving the RDF graph identified by any vocab attribute value that has no actual "prefix" defined, for example:

  <html vocab="http://www.example.org/someVocab#">

This tells an RDFa processor that it should dereference the URI and find the RDF associated with it. That RDF may contain a great many triples. For RDFa purposes, the ones we are interested in define new "unprefixed" CURIEs and, optionally, zero or more prefix mappings that can be used in the document.

Alternative Mechanism

An alternative mechanism is that the RDFa processor does not dereference the URI to find the RDF associated with it. Instead, it simply assumes that any unprefixed term used where a CURIE is expected should have that URI prepended by default. -TobyInk 11:00, 2 September 2008 (UTC)
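Under this alternative, resolution needs no network access at all. A minimal sketch, assuming the vocab URI is simply concatenated with the term (the function name and the sample term are ours for illustration):

```python
def resolve_unprefixed(term: str, default_vocab: str) -> str:
    """Resolve an unprefixed term by prepending the vocab URI."""
    if ":" in term:
        raise ValueError(f"{term!r} carries a prefix; expand it as a CURIE instead")
    return default_vocab + term


vocab = "http://www.example.org/someVocab#"
print(resolve_unprefixed("author", vocab))
# prints http://www.example.org/someVocab#author
```

The trade-off is that the processor can no longer tell whether a term is actually defined by the vocabulary, since it never sees the vocabulary's graph.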

Defining unprefixed CURIEs

Unfortunately, the first version of the RDFa syntax specification does not define a mechanism for dynamically extending the collection of reserved words. However, the architecture is such that we CAN define a mechanism for permitting unprefixed CURIEs and the behavior when such an unprefixed CURIE is encountered.


To define an unprefixed CURIE, we map a term to a URI using the special XHTML vocabulary term "ReservedWord". In an XMDP document, for example, we might annotate it using RDFa as follows:

 <dl class="profile">
  <dt id='author' about="#author" property=":ReservedWord">author</dt>
   <dd>A person who wrote (at least part of) the document.</dd>
  <dt id='keywords' about="#keywords" property=":ReservedWord">keywords</dt>
   <dd>A comma and/or space separated list of the 
    keywords or keyphrases of the document.</dd>
  <dt id='copyright' about="#copyright" property=":ReservedWord">copyright</dt>
   <dd>The name (or names) of the copyright holder(s) 
    for this document, and/or a complete statement of copyright.</dd>
  <dt id='date' about="#date" property=":ReservedWord">date</dt>
   <dd>The last updated date of the document, in ISO8601 date format.</dd>
  <dt id='identifier' about="#identifier" property=":ReservedWord">identifier</dt>
   <dd>The normative URI for the document.</dd>
  <dt id='rel' about="#rel" property=":ReservedWord">rel</dt>
   <dd>
    <dl>
     <dt id='script' about="#script" property=":ReservedWord">script</dt>
     <dd>A reference to a client-side script. When used with the 
      LINK element, the script is evaluated as the document loads and 
      may modify the contents of the document dynamically.</dd> 
    </dl>
   </dd>
 </dl>

In this example, which is lifted from the XMDP definition document, we merely add the RDFa attributes for about and property to ensure that the appropriate triples are generated. When this document is run through an RDFa processor, it will generate (at least) the following triples:

<http://www.example.org/somevocab#author> <http://www.w3.org/1999/xhtml/vocab#ReservedWord> "author" .
<http://www.example.org/somevocab#keywords> <http://www.w3.org/1999/xhtml/vocab#ReservedWord> "keywords" .
<http://www.example.org/somevocab#copyright> <http://www.w3.org/1999/xhtml/vocab#ReservedWord> "copyright" .
<http://www.example.org/somevocab#date> <http://www.w3.org/1999/xhtml/vocab#ReservedWord> "date" .
<http://www.example.org/somevocab#identifier> <http://www.w3.org/1999/xhtml/vocab#ReservedWord> "identifier" .
<http://www.example.org/somevocab#rel> <http://www.w3.org/1999/xhtml/vocab#ReservedWord> "rel" .
<http://www.example.org/somevocab#script> <http://www.w3.org/1999/xhtml/vocab#ReservedWord> "script" .

An RDFa processor looking at this graph would know to add those reserved words to its collection, mapping the values on the right to the URIs on the left.
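One way a processor might build that collection is sketched below. It assumes the graph arrives serialized one triple per line, as shown above, and uses a simple regular expression rather than a real RDF parser; collect_reserved_words is a hypothetical helper name.

```python
import re

RESERVED_WORD = "http://www.w3.org/1999/xhtml/vocab#ReservedWord"

# Matches only the simple shape used here: <subject> <predicate> "literal" .
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+"([^"]*)"\s*\.$')


def collect_reserved_words(ntriples: str) -> dict[str, str]:
    """Map each reserved word (the literal) to its URI (the subject)."""
    words = {}
    for line in ntriples.splitlines():
        match = TRIPLE.match(line.strip())
        if match and match.group(2) == RESERVED_WORD:
            subject, _, literal = match.groups()
            words[literal] = subject
    return words


graph = """\
<http://www.example.org/somevocab#author> <http://www.w3.org/1999/xhtml/vocab#ReservedWord> "author" .
<http://www.example.org/somevocab#date> <http://www.w3.org/1999/xhtml/vocab#ReservedWord> "date" .
"""
print(collect_reserved_words(graph))
```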

Defining a prefix mapping

@@@@ This is a risky feature, and may not be introduced. -ShaneMcCarron @@@@

In general, prefix mappings in RDFa are defined using the xmlns or vocab attributes. Regardless of how a mapping is defined syntactically, the effect is the same: you are defining a prefix as being equivalent to some other string. These mappings are used when turning RDFa "CURIEs" into URIs.

When an RDFa processor retrieves an RDF graph and looks for extensions, one such extension is a prefix mapping definition. This is really just a convenience function so that a referenced document can define a bunch of interesting prefix mappings to simplify document authoring. For example, if an RDFa document contained the following:

  <link about="http://www.example.org/myvocab#" property=":PREFIX" content="ev" />

It would mean that the prefix 'ev' is defined as mapping to 'http://www.example.org/myvocab#'. A referenced resource can define as many of these as it likes; they will be incorporated into the referring document's prefix collection.
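The incorporation step could look roughly like this, with the retrieved graph represented as a list of (subject, predicate, object) tuples. Note one assumption this proposal does not settle: the sketch lets a prefix declared by the referring document itself take precedence over an imported one. The name merge_prefixes is hypothetical.

```python
PREFIX = "http://www.w3.org/1999/xhtml/vocab#PREFIX"


def merge_prefixes(document_prefixes, graph):
    """Return the document's prefix collection extended with PREFIX triples."""
    merged = dict(document_prefixes)
    for subject, predicate, obj in graph:
        if predicate == PREFIX:
            # Assumption: locally declared prefixes are never overridden.
            merged.setdefault(obj, subject)
    return merged


graph = [("http://www.example.org/myvocab#", PREFIX, "ev")]
local = {"xhv": "http://www.w3.org/1999/xhtml/vocab#"}
print(merge_prefixes(local, graph))
```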

Updating Microformat Vocabulary Definitions

The machine-readable definitions for microformats are written using XMDP, a convention for annotating HTML documents with specific values in the class attribute. To make these definitions also support RDFa, we merely need to add a couple of attributes to each declaration and (possibly) define a default prefix mapping in case people want to use the values in a qualified fashion.

Specifying a default prefix mapping

To define a default prefix mapping, use the XHTML vocabulary term "PREFIX" in a link element within the vocabulary definition. For example, if we wanted to annotate the hCard microformat profile so that it had a prefix of "hc" we could add to the head section of the document at http://www.w3.org/2006/03/hcard the following:

  <link about="#" property=":PREFIX" content="hc" /> 

That would cause the following RDF triple to be available in its RDF graph:

  <http://www.w3.org/2006/03/hcard#> <http://www.w3.org/1999/xhtml/vocab#PREFIX> "hc" .

Annotating XMDP with RDFa

In order to ensure that all the terms are available as unprefixed CURIEs, for each term defined in the vocabulary we need to add the following attributes:

  about="#theTerm" property=":ReservedWord"

Each term already has an id, so the about attribute references that id, and the property attribute indicates that the term is a ReservedWord as defined in the XHTML vocabulary.
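As a rough illustration of what these attributes yield, the sketch below lists the triples one would expect for a profile, given its URL and its term ids. It assumes each term's id is identical to its text content, as in the XMDP example earlier; the function name is hypothetical.

```python
RESERVED_WORD = "http://www.w3.org/1999/xhtml/vocab#ReservedWord"


def reserved_word_triples(profile_url, term_ids):
    """Return one triple per term, serialized one per line."""
    # about="#theTerm" resolves against the profile URL without its fragment.
    base = profile_url.split("#", 1)[0]
    return [f'<{base}#{term}> <{RESERVED_WORD}> "{term}" .' for term in term_ids]


for line in reserved_word_triples("http://www.example.org/somevocab", ["author", "keywords"]):
    print(line)
```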

Dealing with collisions

@@@last definition of an unprefixed term wins@@@
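If the "last definition wins" rule is adopted, it could be as simple as merging the reserved-word tables in the order the vocabularies are encountered. The two tables and their URIs below are hypothetical examples, not published mappings:

```python
def merge_reserved_words(*vocabularies):
    """Merge reserved-word tables; a later definition replaces an earlier one."""
    merged = {}
    for vocabulary in vocabularies:
        merged.update(vocabulary)
    return merged


# Hypothetical tables in which both vocabularies define "title".
first = {"title": "http://www.example.org/first#title",
         "author": "http://www.example.org/first#author"}
second = {"title": "http://www.example.org/second#title"}

print(merge_reserved_words(first, second)["title"])
# prints http://www.example.org/second#title
```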

Supporting microformats and RDFa in the same document

References

http://gmpg.org/xmdp/description

http://www.w3.org/1999/xhtml/vocab

http://www.w3.org/TR/owl-features/