Wiki-based-vocabulary-website

From RDFaWiki

Introduction

The community has learned a significant number of lessons from creating semantic vocabularies over the past two decades. Unfortunately, most of these lessons have not yet been codified, and we are still learning as a community. If RDFa is to become successful, the guided, rapid dissemination of these lessons should be of interest to the RDF/RDFa community. We need a website that helps people create vocabularies, validate those vocabularies against lessons learned, and publish those vocabularies to the web in a decentralized manner. This is not just an RDFa issue; it is an issue for any technology that depends on RDF.

The Problem

There isn't an online tool that can help people create and publish vocabularies, mash-up vocabularies, or correct bad vocabulary design and development. Furthermore, even if such a site existed, it could become a victim of its own success. Popular vocabulary documents may be requested at a rate of thousands of documents per minute, crippling the site owner with bandwidth fees and other operational expenses.

Vocabulary and profile design should not be an academic or niche activity. Given the correct set of tools, it should be possible for most website designers to codify their vocabularies just as they codify source code objects using modern object-oriented programming languages. The problem consists of three distinct issues:

  • There are currently no online tools to help individuals easily create, mash-up and publish RDF(a) vocabularies.
  • There are currently no online tools to correct common vocabulary authoring mistakes.
  • There is currently no automatic online vocabulary mirroring/archival architecture for RDF(a).

The Requirements

  • The solution must have a very low barrier to entry. It should closely emulate the ease of wiki authoring.
  • The solution should be as collaborative as a wiki, preferably with editing roll-back/diff support.
  • The solution must be able to check against common authoring rules/suggestions and report the results back to the vocabulary author.
  • The solution must continue to operate when the primary site fails temporarily or permanently.
  • The solution should be aware of other RDF vocabularies. This will enable the authoring tools to suggest pre-existing vocabulary terms.
  • The solution should be able to publish vocabularies in RDF and RDFa format.
  • The solution should be able to mirror its contents to a number of back-up sites.
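The failover and mirroring requirements above could be met on the client side roughly as follows. This is a minimal sketch in Python, chosen only for illustration. The function names `http_fetch` and `fetch_with_failover`, the document path, and the idea of passing in an ordered list of sites are all assumptions, not part of the proposal. How the list of mirrors is discovered is left open here.

```python
import urllib.request
from typing import Callable, Iterable, List


def http_fetch(url: str, timeout: float = 5.0) -> str:
    """Fetch a document over HTTP(S) and return it as UTF-8 text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_with_failover(
    path: str,
    sites: Iterable[str],
    fetch: Callable[[str], str] = http_fetch,
) -> str:
    """Return the document at `path` from the first site that responds.

    `sites` is an ordered list of base URLs: the primary site first,
    then its mirrors (hypothetical; the proposal does not fix an order).
    """
    errors: List[str] = []
    for site in sites:
        # Join base URL and path with exactly one slash between them.
        url = site.rstrip("/") + "/" + path.lstrip("/")
        try:
            return fetch(url)
        except OSError as exc:
            # urllib's network errors (URLError, timeouts) derive from
            # OSError, so a failed site falls through to the next one.
            errors.append(f"{url}: {exc}")
    raise OSError("all sites failed: " + "; ".join(errors))
```

The `fetch` parameter is injectable so that the failover logic can be exercised without network access, for example with a stub that raises `OSError` for the primary site.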

The Proposed Solutions

  • To provide a low barrier to entry, only a simple OpenID or username/password login is required to edit a page on the vocabulary wiki. Editing a vocabulary is limited to the person who created it and anyone they have authorized to edit it.
  • To track the changes to each vocabulary, a Git-based mechanism could be used, with one file per vocabulary, expressed in JSON format. Mirroring the vocabulary would use the HTTP transport for Git synchronization.
  • I don't know if we can get away with a mechanism that places the entire database in the Git repository (for example, what do we do with usernames and passwords?), but if we can, it will make it easier to mirror the entire website.
  • A 'vocabulary term checker' architecture could be created in a modular way such that each test is its own PHP file. This would allow each lesson learned or authoring rule to be encapsulated in a single file. When a vocabulary term is created or modified, it could be passed to each rule. We could have rules to check individual vocabulary terms as well as rules to check full vocabularies. The arguments to each rule could be the vocabulary term name and the whole vocabulary (for starters).
  • To provide a mirroring mechanism, all sites supporting the mirroring system should provide a 'vocab-mirrors.txt' file. The file could be RDFa. The file would describe other sites that are mirrors of the current site.
  • All of the vocabularies should be kept in a Git repository in order to make the mirroring and history operations easy to implement.
  • The PHP code should serve static XHTML/RDF files, which can be auto-generated whenever a change occurs to a particular vocabulary. It must be assumed that the site will incur a heavy traffic load, so serving static files would help reduce processing load on the website.
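The modular 'vocabulary term checker' described above can be sketched as follows. The proposal suggests one PHP file per rule. This sketch uses Python purely for illustration, with one function per rule standing in for one file per rule. The vocabulary shape (a mapping from term name to description), the `rule` decorator, and the three example rules are hypothetical and were chosen only to show the mechanism. As in the proposal, each rule receives the term name and the whole vocabulary.

```python
from typing import Callable, Dict, List

# Assumed shape for this sketch: term name -> human-readable description.
Vocabulary = Dict[str, str]
# A rule takes (term name, whole vocabulary) and returns a list of
# messages for the author; an empty list means the term passes.
Rule = Callable[[str, Vocabulary], List[str]]

RULES: List[Rule] = []


def rule(func: Rule) -> Rule:
    """Register a rule (stands in for dropping a new rule file in place)."""
    RULES.append(func)
    return func


@rule
def term_has_description(term: str, vocab: Vocabulary) -> List[str]:
    # Hypothetical rule: every term should be documented.
    if not vocab[term].strip():
        return [f"{term}: missing description"]
    return []


@rule
def term_has_no_whitespace(term: str, vocab: Vocabulary) -> List[str]:
    # Hypothetical rule: term names end up in URLs, so avoid whitespace.
    if any(ch.isspace() for ch in term):
        return [f"{term}: term names should not contain whitespace"]
    return []


@rule
def term_is_unique_ignoring_case(term: str, vocab: Vocabulary) -> List[str]:
    # Hypothetical rule that needs the whole vocabulary, not just the term.
    clashes = sorted(t for t in vocab if t != term and t.lower() == term.lower())
    if clashes:
        return [f"{term}: differs only by case from {', '.join(clashes)}"]
    return []


def check_term(term: str, vocab: Vocabulary) -> List[str]:
    """Pass one term to every registered rule and collect the messages."""
    messages: List[str] = []
    for check in RULES:
        messages.extend(check(term, vocab))
    return messages


def check_vocabulary(vocab: Vocabulary) -> Dict[str, List[str]]:
    """Check every term; only terms with at least one message are reported."""
    report: Dict[str, List[str]] = {}
    for term in vocab:
        messages = check_term(term, vocab)
        if messages:
            report[term] = messages
    return report
```

Calling `check_term` when a single term is created or modified, and `check_vocabulary` before regenerating the static files, would match the two granularities the proposal mentions. Adding a new lesson learned then amounts to adding one more registered function.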

State of the Art

The short list below covers the current state of the art. Each entry gives a short description of the website or program, followed by the issues that would need to be addressed to support the proposed solution above.

  • VoCamp - VoCamp is a series of informal events where people can spend some dedicated time creating lightweight vocabularies/ontologies for the Semantic Web.
    • While the VoCamp gatherings are a great way to create vocabularies, they require quite a bit of dedication and tend to be very technical in nature.
    • Issue: Barrier to entry is moderate to high (must attend a VoCamp).
    • Issue: Vocabularies are not tracked on the website.
    • Issue: No mechanism for validating vocabulary terms.
    • Issue: No method of automatically mirroring downstream to vocabulary mirrors.
    • Issue: No mechanism to mirror upstream vocabularies.
    • Issue: No version control for vocabularies.
    • Issue: Not scalable/high-performance.
  • Knoodl - Knoodl facilitates community-oriented development of OWL based ontologies and RDF knowledgebases.
    • Issue: Restrictive Terms of Service - participation in the community can be cut off at the company owner's discretion for any reason.
    • Issue: The system is closed-source and thus may be difficult for the community to develop solutions for vocabulary validation/checking.
    • Issue: The content on the website (the vocabularies) cannot be mirrored per the website TOS.
    • Issue: No method of automatically mirroring downstream to vocabulary mirrors.
    • Issue: No mechanism to mirror upstream vocabularies.
    • Issue: No version control for vocabularies.
  • OpenVocab - OpenVocab is a project created by Ian Davis that enables anyone to participate in the creation of an open and shared RDF vocabulary.
    • Issue: No method of automatically mirroring downstream to vocabulary mirrors.
    • Issue: No mechanism to mirror upstream vocabularies.
    • Issue: No mechanism for validating vocabulary terms.
    • Issue: Single namespace (only terms are editable).
    • Issue: No version control for vocabularies?
    • Issue: Not scalable/high-performance?
  • Neologism - Neologism is a simple web-based RDF Schema vocabulary editor and publishing system.
    • Issue: Barrier to entry is moderate.
    • Issue: No method of automatically mirroring downstream to vocabulary mirrors.
    • Issue: No clear mechanism for validating vocabulary terms?
    • Issue: No clear version control for vocabularies?
    • Issue: Not scalable/high-performance?
  • Vocabify - Instead of starting by defining classes and properties for a vocabulary, Vocabify lets you write some example instance data and it then generates the proper schema.
    • Issue: Barrier to entry is very high (you need to know how to write RDF/XML or Turtle).
    • Issue: Vocabularies are not tracked on the website.
    • Issue: No mechanism to publish generated vocabularies as RDF/XML and RDFa.
    • Issue: No mechanism to mirror upstream vocabularies.
    • Issue: No clear mechanism for validating vocabulary terms?
    • Issue: No version control for vocabularies.
  • Argot Hub - Argot-hub is the home of a collection of vocabularies or word-bundles (an argot), that can be used to describe data. The range of argots is wide, and they can either be used on their own, or with others.
    • Issue: Barrier to entry is moderate to high.
    • Issue: No mechanism to publish generated vocabularies as RDF/XML and RDFa.
    • Issue: No method of automatically mirroring downstream to vocabulary mirrors.
    • Issue: No mechanism to mirror upstream vocabularies.
    • Issue: No mechanism for validating vocabulary terms.
    • Issue: No clear version control for vocabularies.
    • Issue: Not scalable/high-performance.
  • Semantic MediaWiki - Semantic MediaWiki is a free extension of MediaWiki – the wiki-system powering Wikipedia – that helps to search, organise, tag, browse, evaluate, and share the wiki's content.
    • Issue: Barrier to entry is moderate.
    • Issue: No mechanism to publish generated vocabularies as RDF/XML and RDFa.
    • Issue: No method of automatically mirroring downstream to vocabulary mirrors.
    • Issue: No mechanism for validating vocabulary terms.
    • Issue: Not scalable/high-performance.
    • Issue: No mechanism to mirror upstream vocabularies.