Developer-faq
From RDFaWiki
No, currently XHTML+RDFa 1.0 does not define a storage or persistence model. RDFa was created to express RDF in HTML family languages, but can be used in any structured document with attribute support. The storage model is dependent on the implementation language as well as the consuming application and will vary from use-case to use-case.
Ongoing Debate
What is the primary topic of a URL?
There is an ongoing discussion about best practices when using URLs to describe non-document semantic objects. For example, if we were to use this for Bob's URL:
http://example.org/people/bob
Is it a good idea to do this?
<span about="http://example.org/people/bob" typeof="foaf:Person">Bob is a person</span>
Some read the above markup as implying that the document located at Bob's URL is a person, when we are typically conditioned to treat a URL as identifying a regular document on the web. So, what is it? Is it a document or a person? Would the following be a better way of expressing the same information? (Note the #person at the end of Bob's URL.)
<span about="http://example.org/people/bob#person" typeof="foaf:Person">Bob is a person</span>
This attaches the foaf:Person type to the #person anchor within Bob's document rather than to the document itself. There is a good argument for each method:
1. Documents should only express semantics about documents; their contents, which should be identified by #anchors, should carry the descriptions of other semantic objects. This keeps a logical separation between documents and other "semantic things".
2. Documents can express semantics about both documents and people, because type information can be used to differentiate between the two kinds of semantic object in a page. While this muddles the logical separation between documents and other semantic objects on the web, the statements made about each remain separable.
Latest revision as of 23:13, 23 February 2009
The Data Model
Is RDF the best way to store data?
No, it isn't. Nobody is arguing that RDF is the best data model for everything. The only argument that the RDF community is making is that it is ONE minimal form of a universal data encoding mechanism. SQL Stored Procedures can, and do, make the same claim, as could Lisp or any other Turing-complete language.
RDF has the benefit of being both very simple and very expressive, two characteristics that will always be at odds with one another. Any simpler and you lose semantic meaning; any more complex and it becomes more difficult to express.
It could be argued that DNA isn't the best data model for the human genome because it starts to fail once telomeres start shortening. Telomere shortening is one reason why we, as humans, die. The problem with DNA is that we don't have anything better to replace it yet.
So, while it is true that DNA isn't necessarily the best way to encode the human genome, nobody has been able to come up with anything better.
The "best way to store data" argument against RDF is no different. Nobody is claiming that RDF is the best way to encode everything; it's just that nobody has been able to come up with anything better.
What is "best" is usually a decision that only the application developer can make, because it depends heavily on a large set of often conflicting requirements. RDF will work in some cases; in others, perhaps SQL or an OODBMS is best.
Attributes
Why does RDFa have both rel and property for defining predicates?
Semantic HTML isn't a new concept. HTML has had standardized semantics ever since it included the @rel attribute in HTML 3.2. In fact, the first IETF spec for rel dates from December of 1995, where it was included as a way to "specify the relationship of the target @href to the anchor element".
One of the main design criteria for RDFa was to re-use existing semantic attributes as much as possible, inventing new attributes only when absolutely necessary to accomplish the requirements of a use case. Since @rel has always been a semantic attribute associated with the anchor and link elements, leaving it out of the list of RDFa attributes would have violated the design principle of re-using existing semantic HTML attributes.
However, @rel can only be used on specific elements, so a new attribute was needed to specify predicates on any HTML element. This additional mechanism is @property.
There is also a functional difference between @rel and @property. @rel is used to relate two URL resources together with a predicate. In general, @property is used to relate a URL resource, using a predicate, to a non-URL object literal.
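The functional difference can be illustrated with a small Python sketch. It is a deliberate simplification and not the RDFa processing algorithm: the `Triple` type and the `element_to_triple` helper are hypothetical names, the subject is passed in by hand, and only a single element carrying either @rel with @href or @property is handled.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    obj_is_url: bool  # True for a URL resource, False for a literal


def element_to_triple(subject: str, attrs: dict, text: str) -> Optional[Triple]:
    """Map one element to one triple (hypothetical, simplified helper)."""
    if "rel" in attrs and "href" in attrs:
        # @rel relates the subject to another URL resource.
        return Triple(subject, attrs["rel"], attrs["href"], True)
    if "property" in attrs:
        # @property relates the subject to a literal taken from the element text.
        return Triple(subject, attrs["property"], text, False)
    return None


link = element_to_triple(
    "#me", {"rel": "foaf:homepage", "href": "http://joe.example.com/"}, "Joe Bloggs"
)
name = element_to_triple("#me", {"property": "foaf:name"}, "Joe Bloggs")
print(link)
print(name)
```

The first call yields a triple whose object is a URL, the second a triple whose object is the literal text.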
Why does RDFa use href, src, about and resource to specify URL-based resources?
There are several ways that one can specify URL-based resources in RDFa. These include @href and @src, which have been re-used from HTML based on the RDFa design principle of attribute reuse.
The newest attributes, @about and @resource, were added in the RDFa specification as alternatives to the URL-bearing attributes that already exist in HTML.
@about and @src are used to switch the currently active subject in RDFa. @resource is used to override @href because it is sometimes desirable to specify a different target URL for the machine-based semantics.
Why do we have rev, when the triple could just be given backwards?
@rev was included in RDFa because of three RDFa design principles - the DRTB principle, the DRY principle and the attribute reuse principle.
Since @rev was introduced to provide a reverse semantic relationship for @rel in HTML 3.2, the HTML Attribute Reuse Principle applies.
In order to save authors from having to change their markup, per the DRTB Principle, it is sometimes more efficient to use @rev instead of @rel to express certain types of semantic relationships.
If RDFa were to force a uni-directional triple markup mechanism, either data would be repeated, violating the DRY Principle, or the author would have to change their HTML layout, violating the DRTB Principle.
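As a rough illustration of how @rev avoids restating the triple, the Python sketch below builds triples for a single link. The `link_triples` helper and the example URLs are hypothetical, and the sketch ignores everything else an RDFa processor does.

```python
def link_triples(current_subject, href, rel=None, rev=None):
    """Return (subject, predicate, object) tuples for one link element."""
    triples = []
    if rel is not None:
        # @rel: the current subject points at the link target.
        triples.append((current_subject, rel, href))
    if rev is not None:
        # @rev: the same markup, read in the opposite direction.
        triples.append((href, rev, current_subject))
    return triples


bob = "http://example.org/people/bob"
album = "http://example.org/albums/1"

# On Bob's page, a link to the album with rev="dcterms:creator"
# states that the album's creator is Bob.
reverse = link_triples(bob, album, rev="dcterms:creator")
print(reverse)
```

The author keeps the link where it already sits in the page and only chooses the attribute that matches the direction of the relationship.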
Are typeof and datatype really that useful?
It depends on the use case, but in general, @typeof is used frequently. @datatype is used whenever a literal needs a more specific data type than the default "string" datatype.
The argument against @typeof is that it is just an alias for rdf:type, so this:
<span xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
about="#me" rel="rdf:type" resource="foaf:Person">I am a human being!</span>
is the same as this:
<span xmlns:foaf="http://xmlns.com/foaf/0.1/"
about="#me" typeof="foaf:Person">I am a human being!</span>
@typeof is also used for establishing blank nodes. Blank nodes are used to define semantic objects that do not have a permanent URL.
For example, Fuzzbot relies heavily on type information to generate the proper UIs for a group of triples about a typed semantic object.
@datatype is useful when specifying items that have a standardized format, such as dates, times, durations, base64-encoded data and numbers.
Search Questions
Would a search system based on RDFa give better results?
Would a search system based on RDF or RDFa give a better answer to searches done in Google (in 2009)? How? Does it require all data to be marked up as RDFa?
Which is better, RDFa or Natural Language Processing?
Can an RDF/RDFa system do better from a natural language query?
Will authors widely and reliably use RDFa?
Do we have reason to believe that it is more likely that we will get authors to widely and reliably include such relations than it is that we will get high quality natural language processing? Why?
How does RDFa deal with unstructured natural language search queries?
How would an RDF/RDFa system deal with the problem of the _questions_ being unstructured natural language?
Isn't an HREF good enough for expressing links between concepts on the web?
<a href> is a great way of providing a link, which is why RDFa uses it! For example, a person represented in RDFa might be:
<a xmlns:foaf="http://xmlns.com/foaf/0.1/" typeof="foaf:Person" property="foaf:name" rel="foaf:homepage" href="http://joe.example.com/">Joe Bloggs</a>
As well as href, RDFa also reuses another existing HTML attribute for linking to stuff: src. RDFa does introduce one new attribute for linking to things: resource. This is intended as a way of expressing a link without creating a clickable area on the page. It is often useful when the thing you want to link to is not directly consumable in browsers: you can use src or href to link to a "friendly" resource, and resource to link to the machine-consumable one.
Data Sharing
How does RDFa work with companies that publish their data in non-RDFa formats?
How would an RDF/RDFa system deal with data provided by companies that have no interest in providing the data in RDF or RDFa? (e.g. companies providing data dumps in XML or JSON.)
How does RDFa work with companies that don't want to provide their data for free?
How would an RDF/RDFa system deal with companies that do not want to provide the data free of charge?
How does RDFa deal with authors screwing up and encoding bad data?
Like most technologies, RDFa in and of itself is incapable of preventing misuse. The same pitfalls hold true for authoring HTML, JSON, or even e-mail and articles created using any written language. People make spelling and grammatical mistakes quite often and there are systems and tools to detect these mistakes and sometimes correct them.
RDFa is not much different than the English language. There are basic preventative measures that are included in the language, such as not generating triples when a prefix is unknown, or if a triple is malformed. However, it is fairly difficult to prevent authors from eventually making mistakes. There are tools out already, such as Fuzzbot, that allow web page authors to see the triples and data that they mark up.
If data is malformed, the UIs that use that data, such as Fuzzbot, will clearly show information that is not what the author intended. Seeing the error this way gives authors an opportunity to correct it.
Authoring tools are also planned that will reduce the number of hand-authoring errors in content management systems, such as Drupal and WordPress.
How does RDFa deal with apathy from sites that you want to scrape data from?
How does RDFa deal with spammers or other malicious authors encoding misleading data?
How does RDFa enable monetization for producers who are intentionally obfuscating the data today?
How does RDFa track per-developer usage of their data?
How would an RDF/RDFa system deal with companies that want to track per-developer usage of their data?
How is RDFa going to help sites like Wikipedia?
How is RDFa going to make the thousands or millions of Wikipedia contributors faster?
Doesn't RDFa create invisible meta-data and isn't that a bad idea?
Process
The RDFa Task Force was only chartered to solve the metadata problem in XHTML, so why bother with HTML4 and HTML5?
HTML and XHTML Differences
HTML is parsed differently than XHTML, is it possible to write one RDFa parser to parse both XHTML and HTML?
QNames have been identified as a known anti-pattern, does RDFa revive QName use?
It is a common misconception that RDFa uses QNames; it does not. The specification defines the CURIE datatype with explicit parsing rules, and a CURIE is specifically defined as mapping not to a (namespace, local name) pair but to a full URL. RDFa does not rely on a browser's handling of QNames, so whatever brokenness might exist with QNames doesn't apply to CURIEs or RDFa.
RDFa uses xmlns declarations only to map prefixes to URLs, never for QName expansion.
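The mapping from a CURIE to a full URL can be sketched in a few lines of Python. This is a simplified illustration, not the CURIE specification's parsing rules: the `expand_curie` helper is a hypothetical name, and default prefixes, blank nodes and safe CURIEs are left out.

```python
from typing import Optional


def expand_curie(curie: str, prefixes: dict) -> Optional[str]:
    """Expand 'prefix:reference' to a full URL by string concatenation.

    Returns None when the prefix is unknown, mirroring the rule that no
    triple is generated for an unknown prefix.
    """
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        return None
    # The result is a single URL string, not a (namespace, local name) pair.
    return prefixes[prefix] + reference


prefixes = {"foaf": "http://xmlns.com/foaf/0.1/"}
print(expand_curie("foaf:name", prefixes))
print(expand_curie("audio:Recording", prefixes))
```

The first call prints the full FOAF URL; the second prints `None` because the `audio` prefix was never declared.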
What about the conflict between HTML5 and RDFa with the use of CURIEs in the @rel attribute?
Authoring
Why does RDFa use URIs for identifying subjects and objects?
One of the design principles behind RDFa is called the Discovery Principle, also known as the Follow Your Nose Principle. The principle basically states that there should be a common mechanism to discover more about semantic objects, and that it should be provided by regular URL navigation. This means that any concept that is linked to semantically can be de-referenced and more relevant triples extracted from the target URL document, if they are available. This principle allows software agents to automatically discover more about a given subject by following links, just like a human would follow links to external documents to learn more about a specific topic.
In order to provide this functionality, some sort of navigable, global-identifier is needed. Since another design principle for RDFa was that of re-use, RDFa adopted the ubiquitous URL as the global-identifier mechanism because it is well known, and more importantly, it works.
Why does RDFa use CURIEs?
RDFa uses CURIEs for the following reasons:
- It eases the cognitive load for the web developer.
- It reduces clutter and eases readability of the HTML code.
- It reduces URL errors introduced by typing out complete URLs.
- It reduces the size of HTML files that contain a large number of RDFa statements.
Easing the Cognitive Load for the Web Developer
Instead of writing out full URLs for predicates (e.g. http://purl.org/media/audio#Recording), the use of CURIEs allows the author to write something easier to remember (e.g. "audio:Recording"). This reduces the cognitive load on the author if they are writing RDFa by hand. If they are not writing RDFa by hand, the authoring argument is a non-issue.
Reduces Clutter and Eases Readability of HTML Code
Using CURIEs reduces clutter and eases readability of HTML code by reducing the number of URLs that are placed in the HTML document. While this may seem like a small improvement, it certainly does help those that are debugging HTML code to not have to worry about checking every character in every URL that is used as a predicate in RDFa.
Reducing URL errors introduced by typing out complete URLs
The probability that a typing error will occur when repeatedly typing predicate URLs rises significantly as the document size grows. While the possibility still exists when typing a string like 'dcterms:title' or 'audio:Recording', it is less than when typing a URL like http://purl.org/dc/terms/title or http://purl.org/media/audio#Recording.
Reducing the size of HTML files that contain a large number of RDFa statements
While usually a minor issue, page size does matter to larger sites. This was a concern when developing RDFa, and a nice side-effect of CURIEs is that they reduce page size in almost all usage scenarios. For example, if we are marking up 20 audio recordings on a single web page, each with 3 predicates (type, title and singer), we will need to specify
20 * 3 == 60
sixty predicates. With CURIEs, this results in
len("http://purl.org/media/audio#") + len("audio:")*60
388 characters used to express the predicates. Without CURIEs, this results in
len("http://purl.org/media/audio#")*60
1680 characters used to express the predicates. Using CURIEs results in a roughly 4x reduction in characters used.
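The arithmetic above can be checked with a short Python script. The variable names are illustrative only. As in the text, the count covers just the vocabulary URL and the prefix, since the local names (Recording, title and so on) cost the same either way.

```python
BASE = "http://purl.org/media/audio#"  # vocabulary URL from the example
PREFIX = "audio:"                      # CURIE prefix, including the colon
RECORDINGS = 20
PREDICATES_PER_RECORDING = 3

uses = RECORDINGS * PREDICATES_PER_RECORDING

# With CURIEs: the URL is written once, the short prefix once per predicate.
with_curies = len(BASE) + len(PREFIX) * uses

# Without CURIEs: the full URL is written once per predicate.
without_curies = len(BASE) * uses

print(uses, with_curies, without_curies, round(without_curies / with_curies, 2))
```

Running it prints `60 388 1680 4.33`.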
What are the draw-backs of using CURIEs?
The most prevalent arguments against the use of CURIEs are:
- If the ratio of the number of vocabularies used to triples generated approaches 1, the HTML file is larger than if no CURIEs were used.
- They cause HTML markup to be fragile under copy-paste scenarios.
- Prefixes are difficult to teach and understand.
CURIEs bloat HTML files
While it has been demonstrated that CURIEs can offer a 4x reduction in character usage, it is true that if you only use a CURIE once, you will waste a number of characters. This is because the CURIE prefix must first be defined and then used. For example, if you were to specify just the title of a page using the dcterms vocabulary, you would use:
len("xmlns:dcterms='http://purl.org/dc/terms/'") + len("dcterms:title") - len("http://purl.org/dc/terms/title")
24 extra characters. However, if you were to use the dcterms prefix at least three times in your markup, you would save
len("http://purl.org/dc/terms/title")*3 - len("xmlns:dcterms='http://purl.org/dc/terms/'") - len("dcterms:title")*3
10 characters. Most RDFa markup benefits in size from the small up-front cost of defining prefixes.
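The point at which a prefix starts to pay for itself can be worked out with a small Python sketch. It assumes the prefix is declared once for the vocabulary base `http://purl.org/dc/terms/` and that each use is written as `dcterms:title`. The helper names `curie_overhead` and `break_even_uses` are hypothetical.

```python
def curie_overhead(prefix: str, base: str, local: str, uses: int) -> int:
    """Extra characters spent by using CURIEs; a negative value is a saving."""
    declaration = f"xmlns:{prefix}='{base}'"  # written once per document
    curie = f"{prefix}:{local}"               # written once per use
    full_url = base + local                   # what each use would otherwise cost
    return len(declaration) + len(curie) * uses - len(full_url) * uses


def break_even_uses(prefix: str, base: str, local: str) -> int:
    """Smallest number of uses at which the CURIE form is strictly shorter."""
    if len(f"{prefix}:{local}") >= len(base + local):
        raise ValueError("the CURIE is not shorter than the full URL")
    uses = 1
    while curie_overhead(prefix, base, local, uses) >= 0:
        uses += 1
    return uses


DC_BASE = "http://purl.org/dc/terms/"
for n in (1, 2, 3):
    print(n, curie_overhead("dcterms", DC_BASE, "title", n))
print("break-even:", break_even_uses("dcterms", DC_BASE, "title"))
```

The script prints the overhead for one, two and three uses, followed by the number of uses at which the prefix declaration has paid for itself.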
CURIEs make RDFa markup fragile
The most prevalent argument against CURIEs is that they cause page markup to be fragile. If one copies HTML from one website to another and forgets to copy the prefix definitions for the CURIEs (either by mistake or because they didn't know), then any triples that use those unknown prefixes will stop working. While this is true for cut-and-paste scenarios, it does not hold at all for authoring tools and content management systems, which take care of defining the prefixes for the author. The alternative of not using CURIEs was explored, but CURIEs provided too much benefit to ignore.
Defining Namespaces and Prefixes are Difficult to Teach and Understand
It has been asserted that prefixes, namespaces and non-URI structures are difficult to teach and understand. This is, however, hard to prove as many people use namespaces and prefixes in their everyday lives. http: is a namespace, as is a person's last name. Often it is the method of teaching that is lacking and not the strength of the concept.
Why doesn't RDFa use keywords instead of URIs for prefixes?
One of the biggest innovations that RDF and RDFa bring to the semantic web is the ability for a web designer or web developer to create their own vocabularies for describing structured data. If we were to use keywords without providing an external reference, the browser would have some sort of hard-coded understanding of those keywords. If we hard-code keywords into the browser, that means that the web designers/developers of the world couldn't use the technology to solve their specific problems. Only large problems deemed as important by the "powers that be" would be addressed.
URIs are used for prefixes as a mechanism to help anybody create and publish their own vocabulary.
Why doesn't RDFa use keywords with a central registry instead of URIs for prefixes?
Another angle on the URI authoring issue that has been proposed several times is the notion of a centralized keyword registry for prefixes. The issue with centralized registries is that there are fairly steep costs associated with operating the registry. The administration costs of running such a registry, as well as possible legal costs arising out of conflict resolution, can stretch into the hundreds of thousands, if not millions, of dollars. ICANN, the Internet Corporation for Assigned Names and Numbers, has an annual budget of $15.83 million. Even 1% of that represents a budget of $158,300 per year, and that's if everything is going according to plan.
Parallels have been drawn to the global Domain Name Service. Some have called the semantic web equivalent SNS, the Semantic Name Service. This service, however tempting, is unnecessary and constraining. The beauty of creating your own vocabulary and publishing it on your website is that you don't have to ask anybody for permission, it starts working the second you publish the vocabulary, and you are guaranteed to not clash with anybody else's vocabulary. This comes at the cost of having to specify a URL, something that everybody who uses the web is familiar with writing.
Can't you hide meta-data using RDFa, or make pages express something different to computers than they do to humans?
Yes, you can hide meta-data using RDFa, although it is heavily frowned upon by both the RDFa community and the Microformats community.
For example, RDFa allows you to override an object literal for a triple by using @content:
My name is <span about="#me" property="foaf:name" content="Bob">Jane</span>.
To computers, this RDFa statement would express that your name is "Bob", but humans would read that your name is "Jane". Obviously, this is not encouraged behavior. Hiding any sort of meta-data from a human is generally considered harmful because hidden meta-data easily gets out of sync with the displayable page data. If you can't see your meta-data when you look at the web page with a browser, there is a good chance that you won't catch errors and neither will your site visitors.
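The override can be made concrete with a Python sketch built on the standard library's `html.parser`. It only handles flat markup in which elements carrying @property are not nested, and the `LiteralExtractor` class and `extract_literals` function are hypothetical names, not part of any RDFa library.

```python
from html.parser import HTMLParser


class LiteralExtractor(HTMLParser):
    """Collect (property, literal) pairs from elements that carry @property."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._open = None  # [tag, property, content or None, text parts]

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "property" in attributes and self._open is None:
            self._open = [tag, attributes["property"], attributes.get("content"), []]

    def handle_data(self, data):
        if self._open is not None:
            self._open[3].append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            _, prop, content, parts = self._open
            # @content, when present, wins over the text a human reads.
            literal = content if content is not None else "".join(parts)
            self.results.append((prop, literal))
            self._open = None


def extract_literals(markup: str):
    parser = LiteralExtractor()
    parser.feed(markup)
    parser.close()
    return parser.results


hidden = extract_literals(
    'My name is <span about="#me" property="foaf:name" content="Bob">Jane</span>.'
)
visible = extract_literals('My name is <span about="#me" property="foaf:name">Jane</span>.')
print(hidden, visible)
```

With @content present the machine-readable name is "Bob"; without it, the literal falls back to the visible text "Jane".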
Aren't mechanisms like profile difficult for non-programmers to understand?
Some have said that the proposed @profile extension to RDFa would be difficult for web developers and designers to author:
<div profile="http://purl.org/media/audio" typeof="Recording">
   <span property="title">We're Going to Be Friends</span> by
   <span property="creator">The White Stripes</span>
</div>
However, the argument falls flat because this mechanism is almost directly analogous to how web designers currently use Cascading Style Sheets:
<link rel="stylesheet" href="http://purl.org/media/audio">
...
<div class="Recording">
   <span class="title">We're Going to Be Friends</span> by
   <span class="creator">The White Stripes</span>
</div>
Security
Are iframes a security risk to RDFa?
If data in iframes are processed and digital signatures are not used for the data on a page then iframes are indeed a "security risk". The issue is that another site could hijack the data on a page by re-writing or overwriting triple URLs that are defined on a page. For example, an advertisement loaded through an iframe could inject triples for their product/service into the page content that you are viewing. This could manifest itself as a link to the latest Beyonce CD, which is actually a link to the latest Viagra ad.
There are several proposed solutions to this issue:
- Do not process any data contained in an iframe.
- Only process iframe data that is digitally signed by a trusted party.
Do not process any data contained in an iframe
A security option setting could be to ignore all triples contained in iframe data. The option could be enabled when calling the RDFa parser. This would remove the threat entirely, but could result in blocking some interesting uses of RDFa.
Only process digitally signed data
Digital signatures are to become the primary method of verifying the truthiness of statements made on a page. It is important that trusted statements are given greater consideration by a browser viewing a web page. This means that one alternative is to digitally sign every triple or sign a bundle of triples so that a browser can differentiate between data that contains a high level of trust and data that does not contain any assurances. Standardized digital signature technology can be used for these purposes.
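The two proposals can be thought of as a policy setting on the parser. The Python sketch below is only an illustration under stated assumptions: the Frame model, the collect_triples function and the policy names "ignore" and "signed" are all hypothetical, and signature verification itself is assumed to have happened elsewhere (signed_by holds an already-verified signer, or None).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Frame:
    """Hypothetical model of a document and the iframes nested in it."""
    triples: List[Triple]
    signed_by: Optional[str] = None  # verified signer, if any (assumption)
    iframes: List["Frame"] = field(default_factory=list)


def collect_triples(page: Frame,
                    iframe_policy: str = "ignore",
                    trusted_signers: Optional[Set[str]] = None) -> List[Triple]:
    """Return the page's own triples plus iframe triples the policy allows.

    "ignore" drops all iframe data; "signed" keeps iframe data only when
    its verified signer is in trusted_signers.
    """
    if iframe_policy not in ("ignore", "signed"):
        raise ValueError("unknown iframe policy: " + iframe_policy)
    trusted = trusted_signers or set()
    result = list(page.triples)
    if iframe_policy == "signed":
        for child in page.iframes:
            if child.signed_by in trusted:
                result.extend(collect_triples(child, iframe_policy, trusted))
    return result


ad = Frame([("ex:cd", "ex:buy", "http://spam.example/")])
reviews = Frame([("ex:cd", "ex:review", "Great")], signed_by="reviews.example")
page = Frame([("ex:cd", "dc:title", "Album")], iframes=[ad, reviews])

print(collect_triples(page))
print(collect_triples(page, "signed", {"reviews.example"}))
```

Defaulting to "ignore" reflects the first proposal: it removes the threat entirely, at the cost of blocking legitimate iframe data unless the caller opts in to the signed-only policy.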
How does one prevent bad triples from corrupting a local triple store?
Like other things on the web, there will be certain data sources that you trust and certain ones that you do not trust. If long-term triple storage is a goal for an individual, the browser can shield them from bad data sources by only including data from the following sources:
- Digitally signed data sources
- White-listed data sources
This mechanism would allow browsers to clean and protect the triple store without user intervention, or by relying on an externally trusted white-listing source, similar to spam white-listing services. More can be found in the Security and Trust section of the wiki.
Also note that the phrase "triple store" is somewhat dated. Practically all RDF storage systems (since RDFCore in 2004 and a few years before) have effectively been "quad stores". While the core RDF specs are described in terms of triples, database systems for managing triples have almost always kept track of the source (or "provenance") of each piece of data. For this reason when RDF's data access / query language, SPARQL, was created, it included within the language a mechanism for querying this extra information. This ability to explicitly represent (and query) the source of each RDF data graph gives some extra machinery for dealing with trust. We might, for example, pose a SPARQL query that was targeted only at graphs tagged as trusted. RDF stores are no longer a simplistic melting pot in which data from multiple sources gets indecipherably tangled.
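To make the "quad" idea concrete, here is a small Python sketch rather than an actual SPARQL query. The Quad type, the triples_from_trusted function and the example graph URLs are all hypothetical; the point is only that each statement carries its source, so a query can be restricted to trusted sources.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Quad(NamedTuple):
    """A triple plus the graph (source) it came from."""
    subject: str
    predicate: str
    obj: str
    graph: str  # provenance: where the statement was found


def triples_from_trusted(quads: Iterable[Quad],
                         trusted_graphs: Set[str]) -> List[Tuple[str, str, str]]:
    """Keep only statements whose source graph is tagged as trusted."""
    return [(q.subject, q.predicate, q.obj)
            for q in quads
            if q.graph in trusted_graphs]


store = [
    Quad("#me", "foaf:name", "Jane", "http://example.org/people/jane"),
    Quad("#me", "foaf:name", "Bob", "http://spam.example/ad"),
]

print(triples_from_trusted(store, {"http://example.org/people/jane"}))
```

Because the untrusted statement is still stored alongside its source, it can be inspected or discarded later without disturbing data from other graphs.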
Hackers can digitally sign triples too, what's to stop hostile sites from interacting with the person browsing?
Digital Signatures don't prevent hostile sites that the user intended to go to from interacting with the user. How would digital signatures help here? Attackers can sign stuff just like anyone else can, no?
Yes, that is true: attackers can digitally sign anything. However, digital signatures are meant to give the person browsing control over which triples to persist, or to trust more than other, untrusted triples. They also give the person browsing the power to delegate that control to a trusted white-listing service. For example, one may be more prone to trust triples that are digitally signed by Google, Yahoo, Amazon or Slashdot than ones signed by somebody that they don't know.
Which digitally signed triples does one trust? Well, the ones from close acquaintances on Facebook, Twitter and LinkedIn for starters. It does require those services to generate public/private key-pairs, or at least provide public-key display capabilities. The bare minimum that would be needed is the capability of describing your public key using RDFa on your Facebook home page in order for your friends to verify your triples somewhere else on the web.
See the Security and Trust section of the wiki for more information on how digital signatures might eventually work in RDFa.
How does RDFa deal with cross-origin data load?
The current RDFa spec does not address cross-origin data load. There are fairly sane ways to deal with this issue, like not loading cross-origin data at all, or only trusting signed cross-origin data, but nothing has been standardized yet.
Persistence
Does RDFa define a storage model or persistence layer/API?
No, currently XHTML+RDFa 1.0 does not define a storage or persistence model. RDFa was created to express RDF in HTML family languages, but can be used in any structured document with attribute support. The storage model is dependent on the implementation language as well as the consuming application and will vary from use-case to use-case.
Ongoing Debate
What is the primary topic of a URL?
There is an ongoing discussion about best practices when using URLs to describe non-document semantic objects. For example, if we were to use this for Bob's URL:
http://example.org/people/bob
Is it a good idea to do this?
<span about="http://example.org/people/bob" typeof="foaf:Person">Bob is a person</span>
Some read the above markup as implying that the document located at Bob's URL is a person, when we are typically conditioned to treat a URL as referring to a regular document on the web. So, what is it? Is it a document or a person? Would the following be a better way of expressing the same information? (note the #person at the end of Bob's URL):
<span about="http://example.org/people/bob#person" typeof="foaf:Person">Bob is a person</span>
This moves the foaf:Person type from Bob's URL (the document) to the #person anchor within it. There is a good argument for each method:
- Documents should only express semantics about documents; their contents, which should be identified by #anchors, should contain the descriptions of other semantic objects. This keeps a logical separation between documents and other "semantic things".
- Documents can express semantics about documents and people because type information can be used to differentiate between the two types of semantic objects in a page. While this muddles the logical separation between documents and other semantic objects on the web, the statements made about each are separable.