RDFainHTML4
From RDFaWiki
- Experimental (validating) demonstration of RDFa in HTML: Example
Embedding RDFa in HTML 4 Documents
The RDFa attribute collection and rules are designed from the ground up to work with a variety of markup languages. While the focus of RDFa has been on XML languages, nothing in its design prevents it from being used in HTML 4. In fact, given the current state of HTML user agent implementations, including RDFa attributes works very well, in that popular user agents simply ignore them. There are only a couple of snags:
- RDFa uses CURIEs, and in XML grammars CURIEs rely upon the use of XML Namespace attributes for the definition of prefix mappings.
- Validation of HTML4 + RDFa requires an updated DTD and (possibly) updated validation software.
This page reflects the thinking of some of the RDFa creators on how to create an HTML 4 "profile" for RDFa.
HTML4 + RDFa
We have developed an extension to the HTML 4.01 Transitional document type that permits the use of RDFa. This document type permits RDFa in exactly the same way as XHTML+RDFa does: RDFa attributes are permitted everywhere. Documents written to use this extension need to use the following DOCTYPE declaration:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML4+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/html4-rdfa-1.dtd">
Prefix Mappings
RDF is all about associating URIs with other URIs. RDFa relies upon CURIEs as a mechanism for abbreviating URIs within RDFa attribute values. This is a simple extension mechanism that allows anyone to define new relationships easily. A CURIE is a combination of a "prefix" and a reference. The prefix is a shorthand that is mapped to a string, and concatenating this string with the reference yields a URI, which brings us back to how RDF describes relationships.
In XHTML+RDFa, these prefix mappings are defined using the XML Namespace mechanism (xmlns:foo="someURI"). This is a well-known mechanism, and people in the XML and RDF communities are used to it. The same mechanism will work fine in HTML4+RDFa documents: RDFa parsers will recognize the "xmlns" attributes as prefix mappings and act accordingly.
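The concatenation step described above can be sketched in a few lines of Python. This is a minimal illustration and not part of any RDFa processor: the function name expand_curie and the dictionary-based prefix map are assumptions made for this example.

```python
def expand_curie(curie, prefix_map):
    """Expand 'prefix:reference' into a URI by simple concatenation.

    prefix_map is a hypothetical dict of prefix -> mapped string, such as
    one built from the xmlns:* attributes that are in scope.
    """
    prefix, separator, reference = curie.partition(":")
    if not separator or prefix not in prefix_map:
        raise KeyError(f"no prefix mapping for CURIE {curie!r}")
    # The URI is the mapped string followed directly by the reference.
    return prefix_map[prefix] + reference


prefix_map = {"dc": "http://purl.org/dc/elements/1.1/"}
print(expand_curie("dc:creator", prefix_map))
# prints http://purl.org/dc/elements/1.1/creator
```

Note that no separator is added between the mapped string and the reference, so the mapped string normally ends in "/" or "#".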
Future Handling of Prefix Mappings
There is some feeling in the RDFa community that using "xmlns" attributes to define prefix mappings overloads the XML Namespace mechanism, particularly since CURIEs, the facility that relies upon the prefix mappings, do not really use XML Namespaces. In addition, it turns out to be tricky to accommodate XML Namespaces in HTML 4 validation: the way "xmlns" attributes work makes them difficult to handle in SGML grammars. To make this work better, and to decouple RDFa prefixes from XML namespaces, we may introduce an additional "prefix" attribute:
prefix="dcterms=someURI geo=someOtherURI"
Note that use of this attribute is not REQUIRED. Authors are welcome to continue to use the xmlns:prefix="someURI" syntax in their documents; RDFa processors handle it today and will continue to do so even if this new attribute is introduced. The syntax for the value of this proposed attribute is:
mapping := [ [ prefixName ] '=' ] URI [ ' ' mapping ]*
Each possible syntax variation has slightly different semantics:
- When a value 'prefixName=URI' is specified, CURIEs in the current scope that are prefixed with the string prefixName will be expanded to URIs starting with 'URI'.
- When a value '=URI' is specified, CURIEs that are unprefixed in the current scope will be expanded to URIs starting with 'URI'. Note that this feature is at risk - it is unlikely that there is a good use case for this that is not better addressed by the following point.
- When a value 'URI' is specified, the document at URI will be evaluated for CURIE "reserved word" definitions, and those definitions will be available for use in CURIEs in the current scope. See RDFa Vocabularies for more on how such definitions might be created. The external resource ultimately needs to be able to map down to basic RDF atoms that define terms.
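The three variants above could be told apart roughly as follows. This is only a sketch of the proposed syntax, not an implementation of any published specification. The function name parse_prefix_attribute is hypothetical, and the sketch assumes that a prefix name is an ASCII, NCName-like token, so that a bare URI containing "=" (for example in a query string) is not mistaken for a mapping. It also does not fetch or evaluate the vocabulary documents; it only collects their URIs.

```python
import re

# Assumption: prefix names are NCName-like (ASCII subset used here), so they
# cannot contain ':' or '/'. That keeps bare URIs out of the mapping branch.
_MAPPING = re.compile(r"([A-Za-z_][\w.-]*)?=(.+)")


def parse_prefix_attribute(value):
    """Split a proposed prefix="..." value into its three kinds of entry.

    Returns (prefixes, default, vocabularies):
      prefixes      dict of prefixName -> URI   ('prefixName=URI')
      default       URI for unprefixed CURIEs   ('=URI'), or None
      vocabularies  list of URIs given bare     ('URI')
    """
    prefixes, default, vocabularies = {}, None, []
    for token in value.split():
        match = _MAPPING.fullmatch(token)
        if match is None:
            vocabularies.append(token)      # bare URI
        elif match.group(1) is None:
            default = match.group(2)        # '=URI'
        else:
            prefixes[match.group(1)] = match.group(2)
    return prefixes, default, vocabularies


print(parse_prefix_attribute("dcterms=someURI geo=someOtherURI"))
# prints ({'dcterms': 'someURI', 'geo': 'someOtherURI'}, None, [])
```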
Dealing with Literals
RDFa defines a mechanism for easily extracting 'literal' values. You do this via the 'property' attribute - you associate a literal value with a subject via a property. For example:
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:dc="http://purl.org/dc/elements/1.1/"
>
<head>
<title>Jo's Friends and Family Blog</title>
<meta property="dc:creator" content="Jo" />
</head>
<body>
<h1 property="dc:title">Jo's Friends and Family Blog</h1>
...
</body>
</html>
In this example, there are two literal values. The first, in the 'meta' element, tells an RDFa processor that the value "Jo" is associated with the current document (the default subject) via the property "creator" as defined by Dublin Core. This type of value is called a "plain literal". The second, in the 'h1' element, tells an RDFa processor that the value "Jo's Friends and Family Blog" is associated with the current document via the property "title" (also) as defined by Dublin Core.
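To make the extraction concrete, here is a rough Python sketch of how the two plain literals might be collected using the standard library's html.parser. It is deliberately simplified and is not a conforming RDFa processor: the class name PlainLiteralCollector is hypothetical, and the sketch reads only the xmlns:* mappings and the 'property' and 'content' attributes, ignoring prefix scoping, other RDFa attributes, and markup nested inside the element.

```python
from html.parser import HTMLParser


class PlainLiteralCollector(HTMLParser):
    """Collect (predicate URI, literal) pairs for the default subject."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}   # prefix -> URI string, from xmlns:* attributes
        self.pairs = []      # (predicate URI, literal text)
        self._open = None    # (tag, predicate, text parts) while collecting

    def _expand(self, curie):
        prefix, _, reference = curie.partition(":")
        return self.prefixes[prefix] + reference

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        if "property" not in attrs:
            return
        predicate = self._expand(attrs["property"])
        if "content" in attrs:
            # An explicit content attribute supplies the literal directly.
            self.pairs.append((predicate, attrs["content"]))
        else:
            # Otherwise the literal is the element's text content.
            self._open = (tag, predicate, [])

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            self.pairs.append((self._open[1], "".join(self._open[2])))
            self._open = None


DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:dc="http://purl.org/dc/elements/1.1/"
>
<head>
<title>Jo's Friends and Family Blog</title>
<meta property="dc:creator" content="Jo" />
</head>
<body>
<h1 property="dc:title">Jo's Friends and Family Blog</h1>
</body>
</html>"""

collector = PlainLiteralCollector()
collector.feed(DOCUMENT)
collector.close()
for predicate, literal in collector.pairs:
    print(predicate, repr(literal))
```

Both pairs belong to the default subject, the current document, so a consuming application would read them as statements about the page itself.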
There are lots of other datatypes that you can easily associate with a subject via this same mechanism. One such type is, confusingly, called an "XML Literal". Such a literal has little to do with XML at all. Instead, at least in the RDFa context, what it means is that the literal will include all of the "markup". So, if we were to change our example:
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:dc="http://purl.org/dc/elements/1.1/"
>
<head>
<title>Jo's Friends and Family Blog</title>
<meta property="dc:creator" content="Jo" />
</head>
<body>
<h1 property="dc:title">Jo's <em>Friends</em> and Family Blog</h1>
...
</body>
</html>
Now the 'h1' element has children other than simple text. By default, an RDFa processor interprets the literal associated with this 'h1' as an XMLLiteral, which conveys the markup as well as the text to a consuming application. Its value would be something like:
"Jo's <em xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\">Friends</em> and Family Blog"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>
The upshot of all this is that, by default, an RDFa processor will do the right thing to ensure that values associated with properties are of the right "datatype" so that an RDF-consuming application will know what to do. In most cases these details should not matter to an HTML or XHTML content author. All an author needs to do is use the appropriate property, and the RDFa processor will do the work of capturing the data in a way that will make sense to RDF-consuming applications and linking it to the right subject.
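The default rule described above (text only gives a plain literal, child elements give an XML Literal) can be sketched as a small check. This illustration assumes well-formed XHTML-style fragments so that Python's xml.etree.ElementTree can parse them. The function name literal_datatype is hypothetical, and the sketch only classifies the literal; it does not attempt to serialize the markup.

```python
import xml.etree.ElementTree as ET

XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


def literal_datatype(element):
    """Return the XMLLiteral datatype URI if the element has child elements.

    Returns None for a plain literal, that is, an element containing only text.
    """
    # len() of an Element counts its child elements, not its text.
    return XML_LITERAL if len(element) > 0 else None


plain = ET.fromstring("<h1 property='dc:title'>Jo's Friends and Family Blog</h1>")
marked_up = ET.fromstring("<h1 property='dc:title'>Jo's <em>Friends</em> and Family Blog</h1>")

print(literal_datatype(plain))      # prints None
print(literal_datatype(marked_up))  # prints the XMLLiteral datatype URI
```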
Validation of HTML 4 + RDFa
HTML document validation is done using free and commercial validation tools (e.g., [1]). Unfortunately, the W3C validator currently complains about the use of xmlns attributes in the document. We are working on a fix for this in the W3C validator. In the interim, it is safe to ignore these warnings when validating your HTML4+RDFa documents.

