Rdfa-in-html-issues
From RDFaWiki
Introduction
This page outlines a number of issues that have been raised in the RDFa in HTML discussion. Most of them were raised in response to Shane McCarron's proposed RDFa in HTML4 spec, Philip Taylor's RDFa in HTML5 document, and Philip Taylor's RDFa Parser Interoperability tests.
Issues Raised but Not Answered
The target of RDFa processing rules
Manu Sporny asked:
What is the target model for the RDFa processing rules? It is currently a well-formed XHTML document, but can be applied to almost any DOM representation. Some have asked that it should be an HTML5 DOM (for HTML+RDFa), others have asked that it focus on a mechanism that is more abstract than the HTML5 DOM. I would propose that we look closely at naming exactly what the RDFa processing rules operate on by studying the differences between parse trees, abstract syntax trees, and abstract semantic graphs. We want more flexibility than just the HTML5 DOM and Mark Birbeck has proposed that we could work on an "RDFa Core" document that applies the processing rules in a generic fashion to a generic model of some sort.
xml:lang attribute processing in HTML
Philip Taylor asked:
What is "the @xml:lang attribute"? Is it the attribute with local name "xml:lang" in no namespace (as would be produced by an HTML 5 parser (and by current HTML browser parser implementations))? or the attribute with local name "lang" in the namespace "http://www.w3.org/XML/1998/namespace" (as would be produced by an XML parser, and could be inserted in an HTML document via DOM APIs)? or both (in which case both could be specified on one element, in addition to "lang" in no namespace)?
And Shane McCarron replied:
Well - remember that the document you are looking at is written in the context of HTML 4. In HTML 4 none of what you say above makes any sense. Attributes are tokens - and the token "xml:lang" is what I was talking about. In HTML 4 those attribute names are case-insensitive - I need to add something about that to the draft. Thanks for the reminder!
Meaning of XMLLiteral in non-XML languages
Philip Taylor asked:
"If the object of a triple would be an XMLLiteral, and the input to the processor is not well-formed [XML]" - I don't understand what that means in an HTML context. Is it meant to mean something like "the bytes in the HTML file that correspond to the contents of the relevant element could be parsed as well-formed XML (modulo various namespace declaration issues)"? If so, that seems impossible to implement. The input to the RDFa processor will most likely be a DOM, possibly manipulated by the DOM APIs rather than coming straight from an HTML parser, so it may never have had a byte representation at all.
Even without scripting, there isn't always a contiguous sequence of bytes corresponding to the content of an element. E.g. if the HTML input is:
<table>
<tr some-attributes-to-say-this-element-outputs-an-XMLLiteral>
<td> This text goes inside the table </td>
This text gets parsed to *outside* the table
<td> This text goes inside the table </td>
</tr>
</table>
then (according to the HTML 5 parsing algorithm, and implemented in (at least) Firefox) the content of the <tr> element includes the first and third lines of text, but not the second. How would you decide whether the content is well-formed XML?
Mixing id and about on the same element
An issue has been raised about the ambiguity created when @id and @about on the same element resolve to the same URL. The following markup demonstrates the issue:
<p id="foo" about="#foo">Some text</p>
Assuming the base URL for the fragment above is http://example.org/index.html, the full URL for the @id value (http://example.org/index.html#foo) would be the same as the full URL for the @about value (http://example.org/index.html#foo). A best practices suggestion is needed for this particular markup. Here are the conflicting arguments surrounding the issue:
- Vocabulary authors usually set @id and @about to the same value to aid follow-your-nose navigation in a web browser.
- Counter argument: This aid would not be needed if web browsers first looked for fragments in @id and then in @about.
- Those who are concerned about the semantics of the document point out that there are two semantically different URLs in the document that point to the same location.
- Counter argument: It is up to an application to determine the meaning of a document. Web browsers would give preference to @id; semantic applications don't care about @id and would give preference to @about.
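The collision described above can be reproduced with a minimal sketch. It assumes Python's standard `urllib.parse.urljoin` as a stand-in for whatever URL resolution a browser or RDFa processor performs; the variable names are illustrative only.

```python
from urllib.parse import urljoin

# Base URL taken from the example above.
base = "http://example.org/index.html"

# @id="foo" names a fragment of the document, so its full URL is the
# base plus "#foo". @about="#foo" is a relative reference resolved
# against the same base.
id_url = urljoin(base, "#" + "foo")
about_url = urljoin(base, "#foo")

# Both attributes end up identifying the same URL.
same_url = id_url == about_url
print(id_url, about_url, same_url)
```

Nothing in the resolved URLs distinguishes the element from the resource being described, which is why the question is left to best-practice guidance.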
Processing of xmlns:* in non-XML languages
Philip Taylor asked:
How are xmlns:* attributes meant to be processed? E.g. what is the expected output in the following cases:
<div xmlns:T="test:">
<span typeof="t:x" property="t:y">Test</span>
</div>
<div XMLNS:t="test:">
<span typeof="t:x" property="t:y">Test</span>
</div>
<div xmlns:T="test:">
<span typeof="T:x" property="T:y">Test</span>
</div>
<div xmlns:t="test:">
<div xmlns:t="">
<span typeof="t:x" property="t:y">Test</span>
</div>
</div>
<div xmlns:t="test1:" id="d">
<span typeof="t:x" property="t:y">Test</span>
</div>
<script>
document.getElementById('d').setAttributeNS(
'http://www.w3.org/2000/xmlns/', 'xmlns:t', 'test2:');
/* (now the element has two distinct attributes,
each in different namespaces) */
</script>
Should the same processing rules be used for documents from both HTML and XHTML parsers, or would DOM-based implementations need to detect where the input came from and switch processing rules accordingly? If there is a difference, what happens if I adoptNode from an XHTML document into an HTML document, or vice versa?
Use of regular CURIEs in @rel
Julian Reschke wrote:
Manu Sporny wrote:
There should be one set of processing rules for all HTML family languages, if possible.
Which reminds me of the fact that RDFa-in-XHTML already breaks this by making @rel a CURIE instead of a safe-CURIE.
Whitespace characters in "whitespace separated" lists
From Philip Taylor's RDFa in HTML tests:
<link rel="next prev	first
lastsection
subsection" href="http://example.org/test.css" />
<link rel="nextprev" href="http://example.org/test.css" />
The preceding markup results in incompatible output from different RDFa processors.
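One plausible source of such divergence is that HTML5 and XML define different white space sets: HTML5's space characters include form feed, while XML 1.0 white space does not. The sketch below splits a hypothetical @rel value under both definitions; the `split_tokens` helper and the sample value are assumptions made for illustration, not taken from any processor.

```python
import re

# HTML5 "space characters": space, tab, line feed, form feed, carriage return.
HTML_SPACE = re.compile(r"[ \t\n\f\r]+")
# XML 1.0 white space (production S): space, tab, line feed, carriage return.
XML_SPACE = re.compile(r"[ \t\n\r]+")


def split_tokens(value, separator):
    """Split an attribute value into tokens, dropping empty strings."""
    return [token for token in separator.split(value) if token]


# Hypothetical @rel value mixing a space, a tab, a line feed and a form feed.
rel = "next prev\tfirst\nlast\fsection"

html_tokens = split_tokens(rel, HTML_SPACE)
xml_tokens = split_tokens(rel, XML_SPACE)
print(html_tokens)
print(xml_tokens)
```

Under the HTML5 set the value yields five tokens; under the XML set the form feed stays inside a single token, so a reserved-word lookup on that token would fail.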
Case sensitivity
From Philip Taylor's RDFa in HTML tests:
<link rel="NEXT" href="http://example.org/test.css" />
<p XMLNS:EX="http://example.org/" property="EX:test">Test</p>
The preceding markup results in incompatible output from different RDFa processors.
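The mismatch can be illustrated with Python's standard `html.parser`, which, like an HTML5 tokenizer, lowercases attribute names but leaves attribute values untouched. This is only a sketch: `html.parser` is not an HTML5 parser, and the `AttrCollector` class and the prefix lookup are hypothetical helpers.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record each start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, dict(attrs)))


collector = AttrCollector()
collector.feed('<p XMLNS:EX="http://example.org/" property="EX:test">Test</p>')
tag, attrs = collector.seen[0]

# The attribute name arrives lowercased, so the declared prefix is "ex".
prefixes = {
    name[len("xmlns:"):]: value
    for name, value in attrs.items()
    if name.startswith("xmlns:")
}

# The prefix used inside the attribute value keeps its original case.
prefix = attrs["property"].split(":", 1)[0]

print(prefixes, prefix, prefix in prefixes, prefix.lower() in prefixes)
```

A processor that compares prefixes case-sensitively finds no mapping for `EX`, while one that folds case resolves it, which would explain differing output.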
Empty xmlns prefix
From Philip Taylor's RDFa in HTML tests:
<p xmlns:="http://example.org/" property=":test">Test</p>
Underscore xmlns prefix
From Philip Taylor's RDFa in HTML tests:
<p xmlns:_="http://example.org/" property="_:test">Test</p>
Colon in xmlns prefix
From Philip Taylor's RDFa in HTML tests:
<p xmlns:ex="http://example.org/1/" xmlns:ex:two="http://example.org/2/" property="ex:two:three:test">Test</p>
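To see why this case is ambiguous, the sketch below resolves the value with two hypothetical strategies: splitting at the first colon, and matching the longest declared prefix. The CURIE grammar defines a prefix as an NCName, which cannot contain a colon, so the first strategy is what the grammar suggests; the second is included only to show how an implementation could diverge. Both function names are assumptions.

```python
# Prefix mappings as they would appear if every xmlns:* attribute in the
# test above were taken at face value.
prefixes = {
    "ex": "http://example.org/1/",
    "ex:two": "http://example.org/2/",
}
value = "ex:two:three:test"


def resolve_first_colon(curie, mappings):
    """Treat everything before the first colon as the prefix."""
    prefix, _, reference = curie.partition(":")
    base = mappings.get(prefix)
    return None if base is None else base + reference


def resolve_longest_prefix(curie, mappings):
    """Prefer the longest declared prefix that the value starts with."""
    for prefix in sorted(mappings, key=len, reverse=True):
        if curie.startswith(prefix + ":"):
            return mappings[prefix] + curie[len(prefix) + 1:]
    return None


first = resolve_first_colon(value, prefixes)
longest = resolve_longest_prefix(value, prefixes)
print(first)
print(longest)
```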
Empty xmlns value
From Philip Taylor's RDFa in HTML tests:
<p xmlns:ex="http://example.org/"><span xmlns:ex="" property="ex:#test">Test</span></p>
In XML Namespaces 1.0, xmlns:* attribute values must not be the empty string. In XML Namespaces 1.1, they may be, in which case they remove the prefix binding.
Shane McCarron says: RDFa is defined in terms of XML Namespaces 1.0. An empty xmlns value is an error, and should therefore be ignored by any conforming processor.
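The Namespaces 1.0 rule can be observed with a namespace-aware XML parser. The sketch below assumes the Expat-based parser behind Python's `xml.etree.ElementTree`, which follows Namespaces 1.0; it shows how an XML toolchain treats the markup, not how an RDFa processor working on an HTML DOM must behave.

```python
import xml.etree.ElementTree as ET

# The test markup from above, fed to an XML parser.
markup = (
    '<p xmlns:ex="http://example.org/">'
    '<span xmlns:ex="" property="ex:#test">Test</span></p>'
)

try:
    ET.fromstring(markup)
    rejected = False
except ET.ParseError as error:
    # Undeclaring a prefix with an empty value is a namespace error in 1.0.
    rejected = True
    print("rejected:", error)
```

An HTML parser has no such rule and simply keeps the attribute, so the decision to ignore the empty value has to be made by the RDFa processor itself.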
Script-based modification of DOM
From Philip Taylor's RDFa in HTML tests:
<!DOCTYPE html>
<title>Test</title>
<p xmlns:t="test1:" id="d">
<span property="t:test">Test</span>
</p>
<script>
document.getElementById('d').setAttributeNS(
'http://www.w3.org/2000/xmlns/', 'xmlns:t', 'test2:');
/* (now the element has two distinct attributes,
each in different namespaces) */
</script>
@lang and @xml:lang Issues
From Philip Taylor's RDFa in HTML tests:
<!DOCTYPE html> <title>Test</title> <p xmlns:ex="http://example.org/" property="ex:test" lang="aa" xml:lang="bb">Test</p>
<!DOCTYPE html> <title>Test</title> <p lang="aa"><span xmlns:ex="http://example.org/" property="ex:test" xml:lang="bb">Test</span></p>
<!DOCTYPE html> <title>Test</title> <p xml:lang="aa"><span xmlns:ex="http://example.org/" property="ex:test" lang="bb">Test</span></p>
<!DOCTYPE html>
<title>Test</title>
<p xmlns:ex="http://example.org/" property="ex:test" lang="aa" xml:lang="bb" id="d">Test</p>
<script>
document.getElementById('d').setAttributeNS(
'http://www.w3.org/XML/1998/namespace', 'xml:lang', 'cc');
</script>
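The last test case leaves three distinct language attributes on one element. A toy model makes that visible by keying attributes on (namespace, local name), following the description in Philip Taylor's question that an HTML parser keeps "xml:lang" as a local name in no namespace. The `set_attribute_ns` helper is a hypothetical simplification of the DOM method, not its full algorithm.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"


def set_attribute_ns(attributes, namespace, qualified_name, value):
    """Toy setAttributeNS: key each attribute by (namespace, local name)."""
    if namespace:
        # With a namespace, the part after the prefix is the local name.
        local_name = qualified_name.split(":", 1)[-1]
    else:
        # Without one, the whole name (colon included) is the local name.
        local_name = qualified_name
    attributes[(namespace, local_name)] = value


attributes = {}
# What the HTML parser produces for lang="aa" xml:lang="bb".
set_attribute_ns(attributes, None, "lang", "aa")
set_attribute_ns(attributes, None, "xml:lang", "bb")
# What the script in the last test case adds.
set_attribute_ns(attributes, XML_NS, "xml:lang", "cc")

for key, value in attributes.items():
    print(key, value)
```

All three entries coexist, so a processor needs an explicit precedence rule to pick one language for the literal.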
Issues Answered but not Reviewed
Bogus reserved words
From Philip Taylor's RDFa in HTML tests:
<link rel="next bogus prev" href="http://example.org/test.css" />
Answer: The expected output is wrong; the following triple should not be generated:
<> <http://www.w3.org/1999/xhtml/vocab#bogus> <http://example.org/test.css> .
There is a conformance requirement to not generate triples for non-reserved words. Test Case #86 in the RDFa Test Harness checks this conformance requirement.
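A minimal sketch of that requirement follows. The `RESERVED` set is an illustrative subset, not the full list from the XHTML+RDFa specification, and `rel_predicates` is a hypothetical helper; case handling is left out because it is a separate open issue on this page.

```python
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"

# Illustrative subset only; the specification defines the full list.
RESERVED = {"next", "prev", "first", "last", "section", "subsection"}


def rel_predicates(rel_value):
    """Return predicate IRIs for the reserved words found in a @rel value."""
    predicates = []
    for token in rel_value.split():
        # Tokens without a colon produce a predicate only if reserved.
        if ":" not in token and token in RESERVED:
            predicates.append(XHTML_VOCAB + token)
    return predicates


predicates = rel_predicates("next bogus prev")
print(predicates)
```

The token `bogus` is dropped, so only the `next` and `prev` triples are generated.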
Julian Reschke comments:
The issue here is that there doesn't seem to be agreement outside the XHTML2 WG that that WG indeed is responsible for maintaining the list of reserved keywords. Pretending that what you decided is good for RDFa in XHTML doesn't necessarily mean it's going to work everywhere else.
Right now there are the following registries and registry documents for link relations in general (unless I missed one):
- HTML4 (<http://www.w3.org/TR/html4/types.html#type-links>)
- HTML5 Working Draft (<http://dev.w3.org/html5/spec/Overview.html#linkTypes>)
- XHTML2 Vocabulary (formal status???) (<http://www.w3.org/1999/xhtml/vocab/>)
- WhatWG Wiki (<http://wiki.whatwg.org/wiki/RelExtensions>)
- Atom Link Relations (IETF Proposed Standard) (<http://greenbytes.de/tech/webdav/rfc4287.html#rfc.section.7.1> and <http://www.iana.org/assignments/link-relations/link-relations.xhtml>)
- Web Linking (IETF Internet Draft) (<http://greenbytes.de/tech/webdav/draft-nottingham-http-link-header-05.html>), proposing a new IANA registry, merging Atom and HTML relation names
Of these, only HTML5, Atom, and Web Linking seem to propose a way how to register relations in the future (note that I strongly disagree with the HTML5 proposal to use a Wiki for that, though).
So, what's the registration procedure for <http://www.w3.org/1999/xhtml/vocab/> -- and who's going to maintain it once the WG ceases to exist?
What we need is a format-neutral registry (I want to be able to express link relations from non *HTML documents as well), and a registry that works in practice.
URLs in @rel
From Philip Taylor's RDFa in HTML tests:
<link rel="next http://example.org/test prev" href="http://example.org/test.css" />
Answer: "http:" is detected as a prefix with no mapping, and thus no triple should be generated for this token in @rel. This conformance requirement is outlined in the section on CURIEs.
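The answer can be sketched as a prefix lookup that returns nothing when the prefix is unmapped. The `resolve_curie` helper, the `ex` mapping, and the extra `ex:related` token are assumptions added to contrast a mapped prefix with the unmapped `http`.

```python
def resolve_curie(token, mappings):
    """Resolve prefix:reference against the mappings, or return None."""
    prefix, colon, reference = token.partition(":")
    if not colon:
        return None  # No colon: not a CURIE (reserved words handled elsewhere).
    base = mappings.get(prefix)
    if base is None:
        return None  # Unmapped prefix such as "http": generate nothing.
    return base + reference


# Hypothetical mapping; note that "http" is deliberately absent.
mappings = {"ex": "http://example.org/"}
tokens = "next http://example.org/test prev ex:related".split()

resolved = {token: resolve_curie(token, mappings) for token in tokens}
for token, iri in resolved.items():
    print(token, "->", iri)
```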
Invalid IRIs generated via CURIEs
From Philip Taylor's RDFa in HTML tests:
<p xmlns:ex="" property="ex:#test">Test</p>
Answer: The previous two examples should produce no triples because the resulting IRI is invalid. The RDFa Syntax document specifies that every resolved CURIE must be a syntactically valid IRI.
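As a rough illustration, the sketch below concatenates the empty mapping with the reference and tests whether the result has a scheme. Checking for a scheme is an assumed, crude proxy for IRI validity, not the full IRI grammar, and `looks_absolute` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def looks_absolute(iri):
    """Crude check: an absolute IRI must start with a scheme."""
    return bool(urlsplit(iri).scheme)


# xmlns:ex="" combined with the reference "#test" from the example above.
empty_mapping_result = "" + "#test"
# The same reference with a non-empty mapping, for comparison.
normal_result = "http://example.org/" + "#test"

print(empty_mapping_result, looks_absolute(empty_mapping_result))
print(normal_result, looks_absolute(normal_result))
```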
RDFa Task Force Discussion Order
The technical issues listed above have some inter-dependencies. Proposed below is the order in which we should discuss each issue. The more serious/fundamental issues are dealt with first, followed by less pressing issues and finally edge cases.
- The target of RDFa processing rules (a universal HTML5 DOM, a parse tree, or a syntax tree)
RDFa TF Proposal/Findings: Use existing HTML specs' signaling mechanism and work on the resulting DOM tree: HTML4+RDFa Syntax Document. For HTML4/5, this means use the html5lib parser to generate a tree-like structure that will be passed to the RDFa parser at the application layer. For XHTML1.1, it means use an XHTML parser to generate a tree-like structure that will be passed to the RDFa parser at the application layer.
- Requirement: What is the RDFa signalling mechanism for XHTML+RDFa and HTML+RDFa, and how does the MIME type affect that mechanism?
RDFa TF Proposal/Findings: Use existing HTML specs' signaling mechanism and work on the resulting DOM tree: HTML4+RDFa Syntax Document. For HTML4/5, this means use the html5lib parser to generate a tree-like structure that will be passed to the RDFa parser at the application layer. For XHTML1.1, it means use an XHTML parser to generate a tree-like structure that will be passed to the RDFa parser at the application layer.
- Requirement: Do we need to cut features from RDFa to support HTML+RDFa?
RDFa TF Proposal/Findings: There is no principled stand against cutting features, but there also does not seem to be a need at this point (except maybe for XMLLiteral, but see next point.)
- Meaning of XMLLiteral in non-XML languages
RDFa TF Proposal/Findings: With the DOM/tree-like approach to parsing (deferring to the language rules for generating a DOM in HTML4/5 or XHTML1.1), this issue may be solved: simply serialize the DOM subtree to XML according to host-language-specific rules (HTML4??). This issue may be further reduced by making plain literal (instead of XMLLiteral) generation the default behavior for elements that contain non-text nodes.
- Processing of xmlns:* in non-XML languages
RDFa TF Proposal/Findings: xmlns:* attributes should be preserved in the tree/DOM that is created in any non-XML family language. Nothing else is required by the parser/DOM model other than to preserve the xmlns:* attributes.
- Case sensitivity for xmlns: attributes and prefixes in attribute values
RDFa TF Proposal/Findings: To ensure compatibility between prefixes and attribute values, all prefixes and attribute values should be specified in lower-case.
- Use of regular CURIEs in @rel
RDFa TF Proposal/Findings: Regular CURIEs can be allowed in @rel based on a proposal to enable URLs in @rel values. The details of this proposal are still being worked out.
- Script-based modification of DOM
RDFa TF Proposal/Findings: Script-based modification of the DOM is perfectly compatible with RDFa. Any mechanism to provide iterative detection of triples and modification of the triple store is allowed as long as the triples generated via the iterative mechanism are exactly the same as if the RDFa Processor were to complete a top-to-bottom run on the modified document.
- @lang and @xml:lang Issues
RDFa TF Proposal/Findings: Processing of @lang and @xml:lang is performed in the same manner as specified in the HTML5 specification.
- xml:lang attribute processing in HTML
RDFa TF Proposal/Findings: Processing of @xml:lang is performed in the same manner as specified in the HTML5 specification.
- Whitespace characters in "whitespace separated" lists
- Empty xmlns prefix
- Underscore xmlns prefix
- Colon in xmlns prefix
- Empty xmlns value
- Mixing @id and @about on the same element
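The XMLLiteral finding in the list above proposes serializing the DOM subtree to XML rather than inspecting the source bytes. A minimal sketch of that idea follows, using `xml.etree.ElementTree` as a stand-in for a DOM and building the subtree through the API, so it never had a byte representation. The `inner_xml` helper is hypothetical, and the host-language-specific serialization rules remain undecided in the finding itself.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Build the subtree through the API, the way a script might.
span = ET.Element("span")
span.text = "E = mc"
sup = ET.SubElement(span, "sup")
sup.text = "2"
sup.tail = " & more"


def inner_xml(element):
    """Serialize the children of an element (not the element itself)."""
    parts = [escape(element.text or "")]
    for child in element:
        # tostring() includes the child's tail text and escapes it.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


literal = inner_xml(span)
print(literal)

# The result is well-formed by construction once wrapped in one element.
ET.fromstring("<wrapper>" + literal + "</wrapper>")
```

Because the literal is produced from the tree, the question of whether the original bytes were well-formed XML does not arise.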

