Rdfa-in-html-requirements
From RDFaWiki
Introduction
There is currently a desire to create a set of unified rules/documents for processing RDFa in HTML and XHTML documents. The list below is a set of design statements and requirements surrounding the creation of these rules and documents. The list is expected to change as more information is discovered and discussion progresses on the set of rules and documents.
Design Statements and Requirements
- The set of solutions must clearly address documents served as application/xhtml+xml and documents served as text/html. ~Sam Ruby
- The set of solutions must clearly address documents served as application/xhtml+xml that consist entirely of HTML, and documents served as text/html that consist entirely of XHTML. ~Sam Ruby
- The set of solutions must not be specific to a certain version of HTML or a certain version of XHTML. The rules for RDFa in HTML should be the same for documents with the HTML5 doctype as they are for the HTML4 doctype. ~Sam Ruby
- The set of solutions should not greatly destabilize XHTML+RDFa that is already deployed. ~RDFa Task Force
- The text of the solutions cannot depend on XHTML/HTML processing rules or standards that do not currently exist. ~Manu Sporny
- The syntaxes for RDFa in application/xhtml+xml and RDFa in text/html should not be considered separately, but should be developed together, with an eye towards maximizing the set of common triples generated. ~Sam Ruby
- In the fullness of time, it would be helpful to have a validator that identifies documents which cause different triples to be produced. If such a tool also identifies other conformance issues with the document, it would be helpful to have an option to turn off the reporting of such issues, since for many documents they would obscure the errors that affect the production of triples. ~Sam Ruby
- Test cases should be produced with the goal of ensuring that parsers looking to produce RDF triples (whether it be from microformats, microdata, or RDFa) respect the MIME type of the document. ~Sam Ruby
- Strong disagreement with using only MIME-type as a signaling mechanism for the RDFa language. ManuSporny 04:04, 28 May 2009 (UTC)
- Likely implication of the strong disagreement: jquery.rdfa.js can never be fully compliant Rubys 04:33, 28 May 2009 (UTC)
- It's not as important to generate the "same" set of triples, as to ensure that people know what to expect when they serve their pages up as HTML compared to XHTML. ~Shelley Powers
- The set of solutions must address, at a high level and with several examples and test cases, the limitations that HTML/DOM implementations impose.
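Several of the requirements above take the MIME type as the starting point for choosing processing rules. The following is a minimal sketch of that selection, assuming only the two media types discussed here matter; the names ParseMode and parse_mode_for are hypothetical. It deliberately looks at nothing but the Content-Type, which is exactly the MIME-type-only signaling Manu Sporny disagrees with, and it ignores the doctype, so the same rules apply to HTML4 and HTML5 documents.

```python
from enum import Enum


class ParseMode(Enum):
    HTML = "html"  # apply HTML parsing rules (e.g. an HTML5 parser)
    XML = "xml"    # apply XML parsing rules (an XML parser)


def parse_mode_for(content_type: str) -> ParseMode:
    """Choose parsing rules from the Content-Type header alone.

    The doctype is deliberately ignored: an XHTML document served as
    text/html gets HTML rules, and an HTML document served as
    application/xhtml+xml gets XML rules.
    """
    # Drop parameters such as "; charset=utf-8"; media types are
    # case-insensitive, so normalise to lower case before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return ParseMode.HTML
    if media_type == "application/xhtml+xml":
        return ParseMode.XML
    # Anything else is outside the scope of these requirements.
    raise ValueError(f"not an HTML or XHTML media type: {media_type!r}")
```

A test suite in the spirit of the MIME-type requirement could serve one document under both media types and compare the triples each mode yields.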
Questions to Ponder
- Should the RDFa processing model assume a DOM without access to the source document? ~Manu Sporny
- Would supporting a subset of RDFa in HTML be better than supporting a different version of RDFa? ~Shelley Powers
- Will we end up severely limiting RDFa because of issues with in-page access through the DOM? ~Shelley Powers
- Do we need a new specification specifically to address limitations of the DOM? ~Shelley Powers
- Do we need something like a best-practices document to warn folks of differences between HTML+RDFa and XHTML+RDFa? ~Shelley Powers
Possible Solutions for XMLLiterals
- We may not want to do any sort of coercion of XMLLiteral content during the RDFa processing stage. ManuSporny 04:04, 28 May 2009 (UTC)
- Perhaps we should suggest that all HTML documents be serialized to XHTML via HTML5 rules before processing. XHTML documents would stay as-is. ManuSporny 04:04, 28 May 2009 (UTC)
- Relax the conformance requirements for XMLLiterals between HTML and XHTML documents. Parsers that do not output exactly the same XMLLiterals (or even skip some XMLLiterals in some instances) could still be considered "conformant". ManuSporny 04:04, 28 May 2009 (UTC)
- The parser MUST ignore this triple altogether. A simple solution, and it means that the HTML graph would be a subset of the XHTML graph. RDF vocabularies are generally defined so that if a graph G is true, then any graph H such that H is a subset of G is also true. ~Toby Inkster
- The parser MUST add the triple to the graph as normal, but MUST NOT set the literal's datatype to XMLLiteral. Parsers could either leave the literal as an untyped literal (that happened to have a lot of angle brackets in it) or perhaps set it to some HTMLLiteral datatype of our own concoction. ~Toby Inkster
- The parser MUST coerce the HTML fragment into a well-formed (but not necessarily valid) XHTML fragment. The HTML5 draft gives us decent algorithms for doing this. ~Toby Inkster
- Defer to http://www.whatwg.org/specs/web-apps/current-work/#coercing-an-html-dom-into-an-infoset (i.e. drop such attributes) ~Henri Sivonen
- Change the HTML5 parsing algorithm so xmlns:foo gets local name "foo" in the XML Namespaces namespace. (That seems very unlikely to happen, because of backward-compatibility issues with existing content.) ~Philip Taylor
- Add some ugly hacks in the serialisation process, e.g. find all attributes named "xmlns:foo" and pretend they were called "foo" in the XML Namespaces namespace while serialising. ~Philip Taylor
- Don't support XMLLiterals. ~Philip Taylor, ~Shelley Powers
- Discourage the use of xmlns:foo attributes, and replace them with @prefix or something. ~Philip Taylor, ManuSporny 04:04, 28 May 2009 (UTC)
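Several of the options above (dropping such attributes, the serialisation hack, replacing them with @prefix) stem from the fact that an HTML parser reports xmlns:foo as an ordinary attribute named "xmlns:foo" rather than as a namespace declaration. A minimal sketch using Python's standard html.parser module shows how a processor might still collect prefix mappings from such attributes. The class name PrefixCollector is hypothetical, and the flat dictionary ignores the element-by-element scoping of prefix mappings that a real RDFa processor would need.

```python
from html.parser import HTMLParser


class PrefixCollector(HTMLParser):
    """Records xmlns:* attributes from every start tag (scoping ignored)."""

    def __init__(self) -> None:
        super().__init__()
        self.mappings: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # html.parser lowercases attribute names but keeps the colon,
            # so xmlns:foaf arrives as a plain attribute "xmlns:foaf".
            if name.startswith("xmlns:") and value is not None:
                self.mappings[name[len("xmlns:"):]] = value


collector = PrefixCollector()
collector.feed(
    '<html xmlns:foaf="http://xmlns.com/foaf/0.1/"><body>'
    '<p XMLNS:DC="http://purl.org/dc/terms/">text</p></body></html>'
)
collector.close()
```

Because html.parser lowercases attribute names, the prefix written as DC is recorded as dc, whereas an XML parser, being case-sensitive, would keep DC. This is one concrete way the same markup can yield different prefix mappings, and hence different triples, under the two MIME types.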
Possible bugs/test cases needed for XMLLiterals in RDFa in XHTML
- If the XMLLiteral contains some &entity; that's defined in the XHTML page's DTD then it will have to be expanded out so that it's correct once it's separated from the DTD. ~Philip Taylor
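A minimal sketch of the expansion step, assuming the processor already has a table mapping entity names from the page's DTD to their replacement text. The function name expand_dtd_entities and the one-entry table are hypothetical (the XHTML 1.0 DTD does declare nbsp as &#160;). The five predefined XML entities and numeric character references remain valid once the literal is separated from the DTD, so they are left alone. Replacement text that itself contains markup or further entity references is inserted verbatim and not rescanned, which a real implementation would have to handle.

```python
import re

# The five entities every XML processor predefines. They stay valid in an
# XMLLiteral detached from the page's DTD, so they are left as-is.
PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}

# Named entity references only; character references such as &#160; start
# with '#' and are therefore not matched (they need no DTD anyway).
ENTITY_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9._-]*);")


def expand_dtd_entities(literal: str, dtd_entities: dict[str, str]) -> str:
    """Replace DTD-defined entity references in an XMLLiteral string."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in PREDEFINED:
            return match.group(0)
        if name in dtd_entities:
            return dtd_entities[name]
        # Undeclared entity: the literal would be ill-formed on its own.
        raise ValueError(f"entity &{name}; is not declared in the DTD table")

    return ENTITY_REF.sub(replace, literal)


# Hypothetical excerpt of a DTD entity table.
XHTML_ENTITIES = {"nbsp": "\u00a0"}
```

For example, expanding "10&nbsp;km &amp; more" with this table yields "10", a no-break space, then "km &amp; more"; the &amp; stays escaped because it is still needed in the standalone literal.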

