Uri-vs-resource-ambiguity-problem
From RDFaWiki
Introduction
The Uniform Resource Identifier vs. Real Resource Problem is an ambiguity in RDF: it is not known whether retrieving a URI will produce the actual resource (such as an MP3 file described by the Audio RDF Vocabulary) or a description of a non-web resource (such as a person described by the FOAF Vocabulary).
Previous Discussions
This issue has been discussed in the following places before:
- URI aliases
- Re: Subclass of Thing/Resource
- Is the term 'anonymous resource' a misnomer?
- RDF equivalency issue postponed
The Problem
In RDF, subjects are always URIs or blank nodes. In general, if a subject has a URI, the URI identifies a specific resource, but nothing in the URI itself says whether that resource can be retrieved on the Web or is merely described there. If a subject is a blank node, it cannot be dereferenced and thus is not addressable on the Web.
Real World Examples
The real-world implication of the URI vs Real Resource Ambiguity Problem is that inference engines could come to the wrong conclusion when attempting to infer certain properties about resources both on and off the web. The following real-world examples outline specific problems as a result of the "theoretical" issue.
Books
Consider the following markup:
<div xmlns:dcterms="http://purl.org/dc/terms/" xmlns:book="http://purl.org/book#"
about="urn:isbn:978-0545029377" typeof="book:Novel">
<p>
<span property="dcterms:title">Harry Potter and the Deathly Hallows</span> by
<span property="dcterms:creator">J. K. Rowling</span>
</p>
<p>
Published: <span property="dcterms:issued">July 21, 2007</span>
</p>
</div>
and resulting triples:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix book: <http://purl.org/book#> .
<urn:isbn:978-0545029377> rdf:type book:Novel ;
dcterms:title "Harry Potter and the Deathly Hallows" ;
dcterms:creator "J. K. Rowling" ;
dcterms:issued "July 21, 2007" .
The issue here is that the subject, an ISBN, has an RDF type of book:Novel. A human knows that the ISBN refers to the book carrying that number, but a computer must be told that the URI given is not the real resource; it is a resource indicator (a pointer to the resource). This matters because one could also mark up the book like so:
<div xmlns:dcterms="http://purl.org/dc/terms/" xmlns:book="http://example.org/book#"
about="http://example.org/books/DeathlyHallows.pdf" typeof="book:Novel">
<p>
<span property="dcterms:title">Harry Potter and the Deathly Hallows</span> by
<span property="dcterms:creator">J. K. Rowling</span>
</p>
<p>
<span property="dcterms:issued">July 21, 2007</span>
</p>
</div>
generating the resulting triples:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix book: <http://example.org/book#> .
<http://example.org/books/DeathlyHallows.pdf> rdf:type book:Novel ;
dcterms:title "Harry Potter and the Deathly Hallows" ;
dcterms:creator "J. K. Rowling" ;
dcterms:issued "July 21, 2007" .
In this second example the subject URL is itself the novel (a retrievable PDF), so the markup and the generated triples mean the same thing to a human and to a machine without any modification. That cannot be said for the first example.
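The difference between the two subjects can be made concrete with a minimal Python sketch. It assumes a hypothetical agent that can only fetch a fixed set of URI schemes; the helper name `looks_retrievable` and the `WEB_SCHEMES` set are assumptions made for illustration and are not part of RDF or RDFa.

```python
from urllib.parse import urlparse

# Assumption: the schemes this hypothetical agent knows how to fetch.
WEB_SCHEMES = {"http", "https"}


def looks_retrievable(subject_uri: str) -> bool:
    """Return True if the URI uses a scheme the agent could fetch directly."""
    return urlparse(subject_uri).scheme.lower() in WEB_SCHEMES


# The two subjects used in the book examples above.
subjects = [
    "urn:isbn:978-0545029377",
    "http://example.org/books/DeathlyHallows.pdf",
]

for subject in subjects:
    label = "fetchable" if looks_retrievable(subject) else "identifier only"
    print(subject, "->", label)
```

A scheme check happens to separate these two subjects, but it is not a general answer: an http URI can just as easily name something that cannot be downloaded.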
People
Consider the following markup:
<div xmlns:foaf="http://xmlns.com/foaf/0.1/"
about="http://digitalbazaar.com/people/manu" typeof="foaf:Person">
<span property="foaf:name">Manu Sporny</span>
works for
<a rel="foaf:workplaceHomepage" href="http://digitalbazaar.com/">Digital Bazaar</a>, Inc.
</div>
and the resulting triples:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://digitalbazaar.com/people/manu> rdf:type foaf:Person ;
foaf:name "Manu Sporny" ;
foaf:workplaceHomepage <http://digitalbazaar.com/> .
While the subject URL above points to a web page, the RDF type of that resource is foaf:Person. The vocabulary defines foaf:Person as follows:
The foaf:Person class represents people. Something is a foaf:Person if it is a person. We don't nitpic about whether they're alive, dead, real, or imaginary.
If one wanted to be semantically pedantic, one could claim that the only kind of person that could be referenced via a URL is an imaginary foaf:Person who resides at that URL. There is never a case where one could retrieve a URL and download a person.
Music
When describing a song on the web, one may or may not refer to a digital recording of the particular song. For example, if one were to mention the following:
<div xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:audio="http://purl.org/media/audio#"
about="http://www.youtube.com/watch?v=p_81l4DXlwM" typeof="audio:Recording">
<span property="dcterms:title">Start Wearing Purple</span> by
<span property="dcterms:creator">Gogol Bordello</span>
</div>
the following triples would be generated:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix audio: <http://purl.org/media/audio#> .
<http://www.youtube.com/watch?v=p_81l4DXlwM> rdf:type audio:Recording ;
dcterms:title "Start Wearing Purple" ;
dcterms:creator "Gogol Bordello" .
However, the URL used as the subject does not retrieve the audio recording directly; the recording is only embedded in the web page at that URL. An autonomous agent would incorrectly deduce that the subject URL is itself an audio recording.
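The faulty deduction can be sketched in a few lines of Python. The rule below is a deliberately naive, hypothetical one (the function name `naive_audio_links` and the tuple representation of triples are assumptions for illustration); it is not how any particular RDF engine is known to behave.

```python
RDF_TYPE = "rdf:type"
PAGE = "http://www.youtube.com/watch?v=p_81l4DXlwM"

# The triples from the music example, written as (subject, predicate, object).
triples = [
    (PAGE, RDF_TYPE, "audio:Recording"),
    (PAGE, "dcterms:title", "Start Wearing Purple"),
    (PAGE, "dcterms:creator", "Gogol Bordello"),
]


def naive_audio_links(triples):
    """Hypothetical rule: every subject typed audio:Recording is an audio file URL."""
    return [s for (s, p, o) in triples if p == RDF_TYPE and o == "audio:Recording"]


links = naive_audio_links(triples)
print(links)
```

The rule returns the page URL as if it were the recording. Nothing in the triples lets the agent tell that fetching it yields an HTML page rather than audio data.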
Potential Solutions
At the moment, the problem results in markup that is, in some cases, semantically ambiguous in ways that a computer cannot detect. This leads algorithms to draw false conclusions from a particular set of RDF statements. While the issue may seem theoretical, it causes direct usability problems, such as providing the wrong link to download a book or piece of music.
Use New URI Scheme to Address Non-Web Resources
Steven Pemberton wrote about the creation of a new URI scheme to address this problem:
One question that has already been posed to me for which I do not have a good answer: In the examples I point to the dbpedia entry for the Washington Monument as the definitive *subject* I am describing. There is also a Washington Monument web page from the US Government. Why is that not a more appropriate subject? I use a link to it in the examples so there is something to "click" on. And if that were the subject for everything, is there a way to update the markup so that we wouldn't have to duplicate the URI?
As Mark pointed out, this is a question I recently raised. In short, there is a major difference between a thing and a web page about that thing.
The dc:creator of me is my mother*, but the dc:creator of my web page is me. If you mixed me up with my web page, you would have to conclude that I was my own mother. It's as simple as that.
I went to a talk at XTech by someone creating a search engine for RDF, and he said it was a major headache for them, since their engine was constantly concluding things like "Tim Berners-Lee" and "W3C" were the same thing.
So the approach I am using in the tutorial I am writing is to say:
<link about="_:WashingtonMonument" rel="foaf:primaryTopicOf"
href="http://www.dbpedia.org/resource/Washington_Monument" />
...
<p about="_:WashingtonMonument"
property="geo:lat_long" content="38.8895563,-77.0352546">During our trip ...
and then it doesn't matter if you use the Wikipedia page as referent, or the .gov page, or even both.
My recent proposal to shortcut this is to define a new URI scheme:
<p about="pto:http://www.dbpedia.org/resource/Washington_Monument"
property="geo:lat_long" content="38.8895563,-77.0352546">During our trip ...
(where 'pto' means 'primary topic of').
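As a rough illustration of why a separate name for the topic helps, the Python sketch below keeps a page and the thing it is about under two different keys. The helper names `topic_of` and `page_for`, the example.org URL, and the creator values are all hypothetical; `pto:` is treated purely as a string prefix, since the source describes it only as a proposal.

```python
PTO_PREFIX = "pto:"  # 'primary topic of', the scheme proposed above


def topic_of(page_uri: str) -> str:
    """Name the thing a page is about, rather than the page itself."""
    return PTO_PREFIX + page_uri


def page_for(subject: str):
    """Return the page URI wrapped in a pto: subject, or None for any other subject."""
    if subject.startswith(PTO_PREFIX):
        return subject[len(PTO_PREFIX):]
    return None


page = "http://example.org/author"  # hypothetical home page

# With distinct subjects, the two dc:creator statements no longer collide.
creators = {
    page: "the author",                     # the page was written by the author
    topic_of(page): "the author's mother",  # the person, not the page
}

for subject, creator in creators.items():
    print(subject, "dc:creator", creator)
```

If both statements used the bare page URL as their subject, an agent merging them would reach the "own mother" conclusion described above.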
Use Bnodes or Fragments to Denote Non-Web Resources
Steven Pemberton and Manu Sporny also proposed a slight modification of the primaryTopicOf (pto) URI proposal above that does not require a new URI scheme. The proposal is to use blank nodes (proposed by Steven Pemberton) or fragment identifiers (proposed by Manu Sporny) when describing resources that are not retrievable via the Web. Non-Web people, places, events, and real-world objects in general would therefore use blank nodes or fragment identifiers when being described on the web via RDF or RDFa. Here are a few examples:
People
<div xmlns:foaf="http://xmlns.com/foaf/0.1/"
about="_:Manu" typeof="foaf:Person">
<span property="foaf:name">Manu Sporny</span>
works for
<a rel="foaf:workplaceHomepage" href="http://digitalbazaar.com/">Digital Bazaar</a>, Inc.
</div>
and the resulting triples:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
_:Manu rdf:type foaf:Person ;
foaf:name "Manu Sporny" ;
foaf:workplaceHomepage <http://digitalbazaar.com/> .
Places
<p xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
about="_:WashingtonMonument" property="geo:lat_long" content="38.8895563,-77.0352546">
During our trip we visited the
<a rel="foaf:homepage" href="http://www.nps.gov/wamo/">Washington Monument</a>:
</p>
and the resulting triples:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
_:WashingtonMonument geo:lat_long "38.8895563,-77.0352546" ;
foaf:homepage <http://www.nps.gov/wamo/> .
Audio Recordings
<html version="XHTML+RDFa 1.0"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:audio="http://purl.org/media/audio#">
<div about="http://example.org/speeches#i-have-a-dream" typeof="audio:Recording">
<span property="dcterms:title">I Have a Dream</span>, a
<span property="dcterms:type">speech</span> by
<span property="dcterms:creator">Martin Luther King, Jr.</span>
</div>
</html>
and the resulting triples:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix audio: <http://purl.org/media/audio#> .
<http://example.org/speeches#i-have-a-dream> rdf:type audio:Recording ;
dcterms:title "I Have a Dream" ;
dcterms:type "speech" ;
dcterms:creator "Martin Luther King, Jr." .
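One way a consumer might apply this convention is sketched below in Python. The function names and the three category labels are assumptions for illustration; treating fragment URIs as non-Web resources is this proposal's convention, not something RDF itself defines.

```python
from urllib.parse import urldefrag


def classify_subject(subject: str) -> str:
    """Sort a subject into the three shapes used in the examples above."""
    if subject.startswith("_:"):
        return "blank node"
    _, fragment = urldefrag(subject)
    if fragment:
        return "fragment identifier"
    return "plain URI"


def denotes_non_web(subject: str) -> bool:
    """Under the proposal, blank nodes and fragment URIs name non-Web resources."""
    return classify_subject(subject) in {"blank node", "fragment identifier"}


for subject in [
    "_:Manu",
    "_:WashingtonMonument",
    "http://example.org/speeches#i-have-a-dream",
    "http://digitalbazaar.com/",
]:
    print(subject, "->", classify_subject(subject))
```

Under this reading, only the last subject would be treated as something an agent may try to retrieve.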
Always Assume Subject is a Non-web Resource
One potential solution offered by Manu Sporny is to always assume that a URI is a resource indicator for a non-Web resource unless a specific vocabulary term is used to state that the resource can be retrieved. While the specific vocabulary term is yet to be determined, this approach provides a great deal of backwards compatibility by assuming that all URIs currently used in RDF on the web describe non-web resources.
For example, assume that the vocabulary term used to identify whether the resource being described is a Web-based resource or a non-Web resource is rdf:realRepresentation. An author who is describing a non-Web book could write the following:
<div xmlns:dcterms="http://purl.org/dc/terms/" xmlns:book="http://purl.org/book#"
about="urn:isbn:978-0545029377" typeof="book:Novel">
<p>
<span property="dcterms:title">Harry Potter and the Deathly Hallows</span> by
<span property="dcterms:creator">J. K. Rowling</span>
</p>
<p>
Published: <span property="dcterms:issued">July 21, 2007</span>
</p>
</div>
or do the following if they were talking about a Web-based book at a particular URL:
<div xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dcterms="http://purl.org/dc/terms/" xmlns:book="http://purl.org/book#"
about="urn:isbn:978-0545029377" typeof="book:Novel">
<p>
<span property="dcterms:title">Harry Potter and the Deathly Hallows</span> by
<span property="dcterms:creator">J. K. Rowling</span>
</p>
<p>
Published: <span property="dcterms:issued">July 21, 2007</span>
( <a rel="rdf:realRepresentation" href="http://example.org/books/DeathlyHallows.pdf">Download</a> )
</p>
</div>
The first markup describes a book that may or may not be a web-based resource. The second describes a book that has a real representation on the web specified using the given URL.
A nuance of this approach that may be lost on some is that this changes a subject URI into just another long identifier with no particular meaning. It can't be assumed that the identifier has anything to do with a particular resource (although, in most cases it will). This approach splits RDF subjects into two categories: named resources (URIs) and unnamed resources (bnodes). Both should be considered non-web resources unless explicitly specified.
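A consumer following this rule might look something like the Python sketch below. It is only an illustration: rdf:realRepresentation is the placeholder term used in the example above and is not a defined RDF property, and the function name and tuple representation of triples are assumptions.

```python
# Placeholder term from the example above; not a defined RDF property.
REAL_REPRESENTATION = "rdf:realRepresentation"


def real_representations(triples, subject):
    """Return URLs explicitly marked as retrievable representations of the subject."""
    return [o for (s, p, o) in triples if s == subject and p == REAL_REPRESENTATION]


book = "urn:isbn:978-0545029377"

# First markup: no explicit statement, so the subject is only an identifier.
without_link = [
    (book, "rdf:type", "book:Novel"),
    (book, "dcterms:title", "Harry Potter and the Deathly Hallows"),
]

# Second markup: the same description plus an explicit representation.
with_link = without_link + [
    (book, REAL_REPRESENTATION, "http://example.org/books/DeathlyHallows.pdf"),
]

print(real_representations(without_link, book))
print(real_representations(with_link, book))
```

The agent never dereferences the subject itself; it offers a download only when a representation has been stated explicitly.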

