The HP report mentions scalability as a deficiency of RDF. The latest I can find on the issue is the report of the 2003 SWAD meeting http://www.w3.org/2001/sw/Europe/reports/dev_workshop_report_4/ which puts the then state of the art at 40M triples, with typical stores supporting around 10M. This leads me to some questions:
1. There must have been substantial improvement in the intervening 2.5 years. Can someone point me to the current state of the art for large triple stores?
2. Are there known indexing techniques that would not require a store that holds all the triples germane to a project?
3. What are the estimates of the number of triples that would be needed to deal with the domains of interest to TDWG, and how should one go about making such estimates? For example, if the current GBIF specimen record service were implemented on a triple store, what would be its size and how does one estimate this? For descriptive data, one might reasonably expect an average of 100 property values per taxon (i.e. the states of 100 characters), so does this mean 180M triples would be an adequate (required?) store for descriptions of 1.8M taxa?
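To make the arithmetic in question 3 concrete, here is a rough back-of-envelope sketch. The function name and its parameters are my own invention for illustration, and the model (one triple per property value, plus optional per-taxon overhead) is an assumption, not something taken from the HP report or from any particular store. The second call assumes every value is also described with standard RDF reification, which adds four triples per statement (rdf:type, rdf:subject, rdf:predicate, rdf:object) on top of the asserted one.

```python
def estimate_triples(n_subjects, avg_values, triples_per_value=1, overhead_per_subject=0):
    """Rough triple count for a store (hypothetical helper, illustration only).

    n_subjects           -- number of described things (e.g. taxa)
    avg_values           -- average property values recorded per subject
    triples_per_value    -- 1 for a plain statement; 5 if each statement is
                            asserted and also reified (1 + 4 reification triples)
    overhead_per_subject -- extra triples per subject (e.g. an rdf:type, a label)
    """
    return n_subjects * (avg_values * triples_per_value + overhead_per_subject)


taxa = 1_800_000
characters = 100

# One triple per character state, no overhead: the 180M figure above.
plain = estimate_triples(taxa, characters)
print(f"plain:   {plain:,}")    # plain:   180,000,000

# Assumed worst case: every statement asserted and fully reified.
reified = estimate_triples(taxa, characters, triples_per_value=5)
print(f"reified: {reified:,}")  # reified: 900,000,000
```

Under these assumptions 180M is a floor rather than a ceiling: any per-taxon metadata or reification only pushes the count up.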
If scalability is rational, then for SDD there is an irony: the advantages cited in the HP paper are a good match to the problems of descriptive data, while at the same time the disadvantages are debilitating. Those in Section 7.2 (OWL Expressivity limitations) are mostly quite important to some of the current things SDD expresses, but the workarounds suggested might not be onerous. The inability to describe continuous properties may be more problematic, but discretizing continuous properties---which is frequently done anyway---might be accepted by descriptive data users if the benefits were high. The inability to compare the size of butterfly wings to the size of raptor wings might bring limitations that would not interfere with 95% of the uses of descriptive data. What's unclear to me at the moment is whether working around the cross-slot constraint on OWL entails a lot of reification in order to talk about collections of properties, which is quite fundamental to descriptions of taxa. If so, does this push an OWL DL ontology into OWL Full or worse, possibly removing the principal benefit---reasoning---from the matter?
Bob
Roger Hyam wrote:
Bob (inspired by Damian) asked me to post this for him as he is away from his email account just now.
http://www.hpl.hp.com/techreports/2005/HPL-2005-189.pdf is touted on some blogs as extremely balanced.
Looks good to me too, though I've not gone through it in detail yet.
Roger