Dynamic-content-parsing

From RDFaWiki

Introduction

Dynamic Content Parsing refers to stateless or stateful parsing of a DOM that is updated via JavaScript. Approaches are often implementation-specific and break down into client-side and server-side processing. This page captures the methods that parser writers use to perform stateless or stateful parsing of the DOM, on either the client side or the server side.

Client-side Processing

Brute Force

The Brute Force approach re-parses the entire DOM whenever it changes. While this is resource-intensive, it is a valid approach to the problem.
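As a rough illustration of the brute-force idea, the sketch below re-runs a full extraction after every change. It uses Python's `xml.etree.ElementTree` as a stand-in for a browser DOM, and `extract_triples` is a hypothetical toy extractor that only understands `@about` and `@property` with literal text; it is not the RDFa processing model and not how librdfa works.

```python
import xml.etree.ElementTree as ET


def extract_triples(root, base=""):
    """Toy extractor (assumption): handles only @about and @property."""
    triples = set()

    def walk(el, subject):
        # @about sets a new subject for this element and its descendants.
        subject = el.get("about", subject)
        prop = el.get("property")
        if prop is not None:
            triples.add((subject, prop, (el.text or "").strip()))
        for child in el:
            walk(child, subject)

    walk(root, base)
    return triples


doc = ET.fromstring(
    '<div about="#me"><span property="foaf:name">Alice</span></div>'
)
before = extract_triples(doc)

# Simulate a script adding a node, then re-parse the whole tree again.
added = ET.SubElement(doc, "span", {"property": "foaf:nick"})
added.text = "al"
after = extract_triples(doc)
```

Every call walks the whole tree, so the cost grows with the document size rather than with the size of the change.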

librdfa and Fuzzbot

librdfa can parse a 50 KB document, such as the Digg main page, in 15 milliseconds (±3 ms). Currently, the plan for the Fuzzbot Firefox plugin is to offer the option of re-parsing dynamic content 4 times per second. Issues arise when updating the Firefox UI faster than that, and in fact it is not very useful to update UI menus even at that rate.
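One way to cap re-parsing at a fixed rate is a small throttle that remembers whether the DOM has changed and when the last parse ran. The sketch below is only an assumption about how such a limit could be enforced; `ThrottledReparser`, `notify_change`, and `tick` are hypothetical names and do not come from Fuzzbot.

```python
import time


class ThrottledReparser:
    """Runs `parse` at most `max_per_second` times, and only after a change."""

    def __init__(self, parse, max_per_second=4, clock=time.monotonic):
        self._parse = parse
        self._min_interval = 1.0 / max_per_second
        self._clock = clock  # injectable so the throttle can be tested
        self._last_run = None
        self._dirty = False

    def notify_change(self):
        # Called whenever the DOM is modified; the parse itself is deferred.
        self._dirty = True

    def tick(self):
        """Call periodically. Returns True if a re-parse was performed."""
        if not self._dirty:
            return False
        now = self._clock()
        if self._last_run is not None and now - self._last_run < self._min_interval:
            return False  # too soon; stay dirty and try again on a later tick
        self._parse()
        self._last_run = now
        self._dirty = False
        return True
```

With the default of 4 parses per second, any number of DOM changes within a 250 ms window collapses into a single re-parse.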

DOM subtree

DOM subtree re-parsing keeps a mapping between DOM elements and the triples generated from them, and only re-parses the sections of the DOM that change. This ensures that triples that no longer exist are destroyed and that new triples are created whenever a changed DOM subtree contains RDFa that generates a triple.
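The mapping described above can be sketched as an index from each element to the triples it produced, plus the subject it inherited from its ancestors so that a subtree can be re-parsed in isolation. This is a minimal sketch under the same toy assumptions as a real implementation would not share: only `@about` and `@property` are handled, `xml.etree.ElementTree` stands in for the DOM, and `SubtreeIndex` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


class SubtreeIndex:
    def __init__(self, root, base=""):
        self.by_element = {}  # element -> set of triples it generated
        self.context = {}     # element -> subject inherited from ancestors
        self.children = {}    # element -> children seen at last parse
        self._parse(root, base)

    def _parse(self, el, inherited):
        self.context[el] = inherited
        self.children[el] = list(el)
        subject = el.get("about", inherited)
        prop = el.get("property")
        self.by_element[el] = (
            {(subject, prop, (el.text or "").strip())} if prop is not None else set()
        )
        for child in el:
            self._parse(child, subject)

    def _forget(self, el):
        # Drop stale entries, including elements already removed from the tree.
        for child in self.children.pop(el, []):
            self._forget(child)
        self.by_element.pop(el, None)
        self.context.pop(el, None)

    def reparse(self, changed):
        """Re-parse only `changed` and its descendants."""
        inherited = self.context[changed]
        self._forget(changed)
        self._parse(changed, inherited)

    def triples(self):
        return set().union(*self.by_element.values())
```

In a browser, the `changed` element would typically come from a DOM mutation notification; here the caller is assumed to pass it in explicitly.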

Server-side Processing

(Note: initiated by Michael as his action from the 2008-04-24 telecon; see also ISSUE-114.)

RDFa's processing model is described for a static DOM and does not account for dynamic changes after the document has been loaded (see Sec. 5.5 of the RDFa Syntax document). Web 2.0 applications (more specifically, Ajax-based applications) dynamically add or remove information while interacting with the user. As long as only client-side processing is used, this is not a problem. However, it becomes one when the result of an interaction with a Web resource needs to be piped into another service.

Take for example i r s, a tool that allows you to add semantic links between resources on the Web of Data. Go to the i r s and just click 'ask'. When you have a local RDFa processor, such as the Operator plugin or Fuzzbot, installed, you'll see something like the following:

Example of local RDFa processing with i r s

When you now try to run this through a server-side extractor (e.g. http://www.w3.org/2007/08/pyRdfa/extract?uri=http://143.224.254.32/irs/), you will see no triples (at least not the ones you'd expect).
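The effect can be reproduced in miniature. The sketch below assumes a page whose RDFa is injected by a script, and scans the served markup with Python's `html.parser` without executing any JavaScript, much as a server-side extractor would. The page content and the `PropertyCounter` class are made up for illustration and are not taken from i r s or pyRdfa.

```python
from html.parser import HTMLParser


class PropertyCounter(HTMLParser):
    """Collects @property values found in the markup as served."""

    def __init__(self):
        super().__init__()
        self.properties = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "property":
                self.properties.append(value)


# Hypothetical page: the RDFa only exists after the script has run.
served = """<html><body>
<div id="result"></div>
<script>
document.getElementById("result").innerHTML =
  '<span property="dc:title">Hello</span>';
</script>
</body></html>"""

counter = PropertyCounter()
counter.feed(served)
```

Because script content is treated as plain text rather than markup, `counter.properties` stays empty, even though a browser-side processor would see the `dc:title` property after the script has run.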

Workaround

First, it has to be mentioned that this is a general problem with dynamic content; nothing about it is RDFa-specific. Whenever you do something locally (i.e. in your browser), another (external) service cannot access this content. While there is no generally accepted solution, there are at least two possibilities for a workaround:

  • separate rendering: for the client-only part, the functionality of the Ajax framework is used; for server-side processing, the RDFa content is inserted at generation time. See for example riese, where this is used to deploy statistical data in RDFa. This is a possible but not very scalable solution, as it depends heavily on the number of data items you have.
  • only have a stub in RDFa and provide more detailed data via rdfs:seeAlso (I guess not very standard, but scales better).
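The two workarounds can be contrasted with a small server-side rendering sketch. Both functions are hypothetical helpers, the URIs in the usage lines are placeholders, and the markup assumes that the `rdfs` prefix (and any other prefix used) is declared elsewhere in the page.

```python
from html import escape


def render_full(subject, items):
    """Workaround 1 (sketch): emit every data item as RDFa at generation time."""
    spans = "".join(
        f'<span property="{escape(prop)}">{escape(value)}</span>'
        for prop, value in items
    )
    return f'<div about="{escape(subject)}">{spans}</div>'


def render_stub(subject, details_uri):
    """Workaround 2 (sketch): emit only a stub pointing at the detailed data."""
    return (
        f'<div about="{escape(subject)}">'
        f'<a rel="rdfs:seeAlso" href="{escape(details_uri)}">details</a>'
        f"</div>"
    )


# Placeholder data and URIs, for illustration only.
full = render_full("#ds", [("dc:title", "Dataset"), ("dc:creator", "Alice")])
stub = render_stub("#ds", "http://example.org/data.rdf")
```

The output of `render_full` grows with the number of data items, while the stub stays the same size and leaves it to the consumer to follow the rdfs:seeAlso link.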

Further Reading
