[Tdwg-tag] Scalability
Bob Morris
ram at cs.umb.edu
Wed Apr 12 14:41:58 CEST 2006
The HP report mentions scalability as a deficiency of RDF. The latest word
on the subject that I can find is the report of the 2003
SWAD meeting
http://www.w3.org/2001/sw/Europe/reports/dev_workshop_report_4/, which
puts the then state of the art at 40M triples, with typical stores
supporting around 10M. This leads me to some questions:
1. There must have been substantial improvement in the intervening 2.5
years. Can someone point me to the current state of the art for large
triple stores?
2. Are there known indexing techniques that would not require a single
store holding all the triples germane to a project?
3. What are the estimates of the number of triples that would be needed
to deal with the domains of interest to TDWG, and how should one go about
making such estimates? For example, if the current GBIF specimen record
service were implemented on a triple store, what would be its size, and
how would one estimate this? For descriptive data, one might reasonably
expect an average of 100 property values per taxon (i.e. the states of
100 characters), so does this mean a store of 180M triples would be
adequate (required?) for descriptions of 1.8M taxa?
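The arithmetic behind the 180M figure in question 3 can be sketched as below. It assumes each character state is stored as exactly one triple (taxon, character, state); the optional per-taxon overhead (e.g. a type declaration or a name triple) is a hypothetical parameter for illustration, not something SDD prescribes.

```python
def estimate_description_triples(num_taxa: int, values_per_taxon: int,
                                 overhead_per_taxon: int = 0) -> int:
    """Rough triple count for taxon descriptions.

    Assumes one triple per character state, plus an optional fixed number
    of bookkeeping triples per taxon (a hypothetical overhead).
    """
    return num_taxa * (values_per_taxon + overhead_per_taxon)


# 1.8M taxa x 100 character states each, no overhead
base = estimate_description_triples(1_800_000, 100)
print(f"{base:,}")  # 180,000,000

# Same, with a hypothetical 2 bookkeeping triples per taxon
print(f"{estimate_description_triples(1_800_000, 100, 2):,}")  # 183,600,000

# Relative to the 40M-triple state of the art cited in the 2003 SWAD report
print(base / 40_000_000)  # 4.5
```

Even under this minimal one-triple-per-value assumption, the descriptive data alone would be several times the largest stores reported in 2003.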
If the scalability concern is real, then for SDD there is an irony: the
advantages cited in the HP paper are a good match to the problems of
descriptive data, while at the same time the disadvantages are
debilitating. The limitations in Section 7.2 (OWL Expressivity
limitations) mostly matter a good deal for some of what SDD currently
expresses, but the suggested workarounds might not be onerous. The
inability to describe continuous properties may be more problematic, but
discretizing continuous properties (which is frequently done anyway)
might be acceptable to descriptive data users if the benefits were high.
The inability to compare the size of butterfly wings with that of raptor
wings might impose limitations, but ones that would not interfere with
95% of the uses of descriptive data. What's unclear to me at the moment
is whether working around OWL's lack of cross-slot constraints entails a
lot of reification in order to talk about collections of properties,
which is quite fundamental to descriptions of taxa; and if so, whether
this pushes an OWL DL ontology into OWL Full or worse, possibly removing
the principal benefit (reasoning) from the matter.
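To gauge what heavy reification would mean for store size, here is a rough sketch using the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object), which costs four triples to describe one statement. The ex: names, and the assumption of one annotation triple per statement (e.g. grouping a value into a property collection), are hypothetical; whether SDD workarounds would actually need this is exactly the open question above.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(stmt_id, s, p, o):
    """Standard RDF reification: describe the triple (s, p, o) with the
    resource stmt_id using the rdf:Statement vocabulary (4 triples)."""
    return [
        (stmt_id, RDF + "type", RDF + "Statement"),
        (stmt_id, RDF + "subject", s),
        (stmt_id, RDF + "predicate", p),
        (stmt_id, RDF + "object", o),
    ]


def reified_store_size(num_values, annotations_per_statement=1):
    """Triples needed if every value is asserted (1 triple), reified
    (4 triples), and annotated (hypothetical count per statement)."""
    return num_values * (1 + 4 + annotations_per_statement)


# Hypothetical SDD-like data: one character state for one taxon.
asserted = ("ex:TaxonA", "ex:wingColour", "ex:blue")
meta = reify("ex:stmt1", *asserted)
# One annotation placing the statement in a (hypothetical) description set
meta.append(("ex:stmt1", "ex:memberOf", "ex:descriptionSet1"))
total = 1 + len(meta)  # asserted triple + 4 reification + 1 annotation
print(total)  # 6

# Applied to the 180M-value estimate for 1.8M taxa
print(f"{reified_store_size(180_000_000):,}")  # 1,080,000,000
```

Under these assumptions, reifying every value multiplies store size sixfold, so the scalability and expressivity issues would compound rather than trade off.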
Bob
Roger Hyam wrote:
> Bob (inspired by Damian) asked me to post this for him, as he is away
> from his email account just now.
>
> http://www.hpl.hp.com/techreports/2005/HPL-2005-189.pdf
> is touted on some blogs as extremely balanced.
>
> Looks good to me too, though I haven't gone through it in detail yet.
>
> Roger
>
>
--
Robert A. Morris
Professor of Computer Science
UMASS-Boston
http://www.cs.umb.edu/~ram
phone (+1)617 287 6466