Rdfa-design-principles

From RDFaWiki

Introduction

RDFa has a number of design principles that are not always evident to someone who is new to web semantics. The list below attempts to summarize the key design decisions made when creating RDFa.

The Data Visibility Principle

[Ben: This is pretty much the same thing as DRY; in fact, that's how we've pitched it all along: there's a human-readable web page, and we should reuse the visible content and make it more meaningful. Since these are design principles, I think we should keep to as few as possible?]

The biggest problem with hidden structured data is that the human-readable data and the machine-readable data can drift out of sync. Hidden meta-data is currently difficult to debug, and only search engines really take advantage of it.

The Data Visibility Principle states that it is best to re-use existing human-readable data as the machine-readable data. While this principle overlaps with the DRY Principle, it goes further: hidden meta-data is generally a bad idea because it drifts out of sync with the visible page and is hard to debug.
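
As a rough illustration of the difference, compare a hidden `meta` element with the same fact expressed on visible content. The page titles are invented for the example, the `DC.title` name follows the common Dublin Core convention for `meta` elements, and the `dcterms` prefix is assumed to be declared elsewhere in the page:

```html
<!-- Hidden metadata: readers never see it, so nobody notices when it goes stale. -->
<meta name="DC.title" content="RDFa Design Notes" />
<h1>RDFa Design Principles</h1>

<!-- Visible data reused as the machine-readable value: both audiences read the same text. -->
<h1 property="dcterms:title">RDFa Design Principles</h1>
```

In the first fragment the hidden title has already diverged from the heading. In the second there is only one value, so it cannot diverge.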

The DRY Principle

The concept of Don't Repeat Yourself, also known as DRY or the Single Point of Truth principle, did not start in the RDFa community. Dave Thomas (no, not the founder of the Wendy's restaurant chain) and his co-author Andrew Hunt popularized it in their book "The Pragmatic Programmer" (Addison-Wesley, 1999). It is a philosophy aimed at reducing duplication in systems.

RDFa follows this philosophy by striving to ensure that the HTML author does not have to express meta-data for a machine in a parallel stream of markup alongside the markup written for a human. In other words, if they state the name of an author once in their HTML code, they shouldn't have to state the same author name again in another file or elsewhere in the same HTML document. All that should be required is for the HTML author to add a few RDFa attributes to the HTML to express the document's semantics to a machine.

To demonstrate a violation of the DRY principle, examine the following code:


The author's name is <span property="dcterms:creator" content="Albert Einstein">Albert Einstein</span>.

The code above lists "Albert Einstein" twice instead of once, which is a clear violation of the DRY principle. Avoiding such repetition is not always possible, but RDFa strives to help web authors avoid it.

To demonstrate the proper application of the DRY principle, examine the following code:


The author's name is <span property="dcterms:creator">Albert Einstein</span>.

The markup above is valid RDFa and shows that the same data can be expressed for both machines and humans without repeating it (here, the author's name).

The HTML Attribute Reuse Principle

[Ben: this should be part of the same principle as the DRTB principle below]

Semantic HTML isn't a new concept. HTML has had standardized semantics ever since it included the @rel attribute in HTML 3.2. In fact, the first IETF spec for @rel dates from December of 1995, where it was included as a way to "specify the relationship of the target @href to the anchor element".

One of the main design criteria for RDFa was to re-use existing semantic attributes and markup mechanisms as much as possible, inventing new attributes only when absolutely necessary to accomplish the requirements of a use case.

This design principle was the reason that RDFa re-uses @href, @rel, @rev, and @src.
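
A minimal sketch of what this reuse looks like in practice, assuming an ordinary license link that an author might already have on a page (the link text and license URL are chosen for illustration only):

```html
<!-- @rel and @href are plain HTML attributes. An RDFa processor can read
     @rel as the predicate and @href as the object of a statement about
     the current page, so no new attribute is needed here. -->
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">
  Creative Commons Attribution 3.0
</a>
```

A browser treats this as a normal hyperlink, while an RDFa-aware tool can additionally conclude that the page is licensed under the linked license.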

The DRTB Principle

The Don't Rock The Boat (DRTB) principle means that HTML authors shouldn't have to change their existing markup in any major way to express RDFa. In other words, RDFa shouldn't impact the layout of existing HTML pages because of markup requirements.

RDFa could have imposed markup rules in a number of ways that would have forced HTML authors to re-write the layout of their web pages. This design principle was a common argument against making HTML authors change their markup for the sake of semantics. The designers of RDFa felt that, for the semantic web to become successful, current authoring and deployment practices should be minimally impacted.

The Follow-Your-Nose Principle

The Follow-Your-Nose Principle states that there should be a common mechanism for discovering more about semantic objects, and that this mechanism should be regular URL navigation. This means that any concept that is linked to semantically can be de-referenced and more triples extracted from the document at the target URL, if they are available. This principle allows software agents to automatically discover more about a given subject by following links, just like a human would follow links to external documents to learn more about a specific topic.
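
The idea can be sketched with a small fragment. The `example.org` URL, the `#albert` identifier, and the choice of the FOAF vocabulary are assumptions made for this illustration:

```html
<div about="#albert" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <!-- The visible text becomes the literal value of foaf:name. -->
  <span property="foaf:name">Albert Einstein</span> knows
  <!-- @href names the related resource. It is also an ordinary URL
       that a software agent can fetch to look for more triples. -->
  <a rel="foaf:knows" href="http://example.org/people/niels#me">Niels Bohr</a>.
</div>
```

An agent that reads this fragment learns a name and a relationship. If it wants to know more about the linked person, it can de-reference `http://example.org/people/niels` and, if that document carries RDFa, extract further triples from it.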

More

[Ben: The more general design principles we had outlined early in the task force were along the lines of what ccREL describes: http://www.w3.org/Submission/2008/SUBM-ccREL-20080501/#SECTION00041000000000000000]

  • Independence and Extensibility: We cannot know in advance what new kinds of data we will want to integrate with Creative Commons licensing data. Currently, we already need to combine Creative Commons properties with simple media files (sound, images, videos) and there's a growing interest in providing markup for complex scientific data (biomedical records, experimental results). Therefore, the means of expressing the licensing information in HTML should be extensible: it should enable the reuse of existing data models and the addition of new properties, both by Creative Commons and by others. Adding new properties should not require extensive coordination across communities or approval from a central authority. Tools should not suddenly become obsolete when new properties are added, or when existing properties are applied to new kinds of data sets.
  • DRY (Don't Repeat Yourself): An HTML document often already displays the name of the author and a clickable link to a Creative Commons license. Providing machine-readable structure should not require duplicating this data in a separate format. Notably, if the human-clickable link to the license is changed, e.g. from v2.5 to v3.0, a machine processing the page should automatically note this change without the publisher having to update another part of the HTML file to keep it "in sync" with the human-readable portion.
  • Visual Locality: An HTML page may contain multiple items, for example a dozen photos, each with its own structured data, for example a different license. It should be easy for tools to associate the appropriate structured data with their corresponding visual display.
  • Remix Friendliness: It should be easy to copy an item from one document and paste it into a new document with all appropriate structured data included. In a world where we constantly remix old content to create new content, copy-and-paste, widgets, and sidebars are crucial elements of the remixable Web. As much as possible, ccREL should allow for easy copy-and-paste of data to carry along the appropriate licensing information.
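
The Visual Locality and Remix Friendliness goals above can be illustrated with a hedged sketch. The file names, `alt` text, and license choices are invented for the example:

```html
<!-- Each photo sits next to its own license statement. @about sets the
     subject, so each license applies to that photo, not to the whole page. -->
<div about="photo1.jpg">
  <img src="photo1.jpg" alt="Sunset" />
  <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">CC BY 3.0</a>
</div>

<div about="photo2.jpg">
  <img src="photo2.jpg" alt="Harbor" />
  <a rel="license" href="http://creativecommons.org/licenses/by-nc/3.0/">CC BY-NC 3.0</a>
</div>
```

Because each `div` is self-contained, copying one of them into another document carries the license statement along with the image. The relative file names would need to remain resolvable in the new location.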