[Tdwg-tag] Triple store debate ...
Gregor Hagedorn
G.Hagedorn at BBA.DE
Fri Mar 31 15:43:44 CEST 2006
Donald writes:
> 2. Please don't judge GBIF's goals on the limited achievements of the
> current data portal (which will be discarded as soon as I can replace it -
> some time later this year). The aim is certainly to provide clear "trust
> networks" (all the way back to original sources) and to allow all data to be
> filtered by such criteria.
I did not mean to belittle anything GBIF has achieved; I probably should have
made that clearer. I am not arguing against GBIF's or your achievements in using
big data providers. I am just trying to express my thought that part (and I
believe a big part) of the future may lie in something less organized and
document-centric, rather than institutional-database-provider-centric.
This has some implications for the RDF debate, in particular if we argue that
big data providers will set up the conversion tools to publish excerpts of their
proprietary data structures in RDF, and that RDF import use cases are relevant
mostly for aggregators and indexing services.
Your point about trust networks is excellent (and I did participate in your
survey asking for suggestions about the data portal...).
> retrieve information. Again I'd really like to receive your comments on how
> e.g. the international pool of SDD data might best be handled.
I have no easy answer to this. Not only is SDD implemented only at beta stage,
but also the Delta/Lucid/DeltaAccess documents that could currently be
expressed in SDD are rarely made available: partly because there seems to be
too little value in making them available (which using them through GBIF could
change in the future), and partly because people have reservations about making
their work available.
> As a side issue, I'm not sure how easy it really would be for us to use an
> RDBMS-based approach to support the integration of all of the disparate and
> relevant information which is (as you say) scattered through so many
> sources. A world in which anyone can annotate any data element would seem
> much more suitable.
Perhaps indeed; I just cannot think it through, it seems to blow my mind. I
feel a major point is what Steven said about the CBD (Concise Bounded
Description) being the real unit of information, not the triples. This rings a
bell for me, but I cannot hear it loud enough yet. I am sorry if this results
in confusing posts from me.
An example: In SDD we think modifiers are very important.
"Flower red"
and
"Flowers almost never red"
could be expressed in RDF as:
a) Species - FlowerColor - red
b) Species - FlowerColor - red
   ReificationOfTheAbove - Modifier - "almost never"
If these arrive as independent, extensible tuples, getting the first line of b)
but not the second is not real information: without the modifier, b) reads
exactly like a). The whole, the "CBD", is the unit of information that I can
criticize, reject, or approve.
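To make the point concrete, here is a minimal Python sketch (not part of SDD or of any RDF library) that writes example b) as plain (subject, predicate, object) tuples. The `ex:` names and the `_:stmt1` node are hypothetical; `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` are the standard RDF reification vocabulary. In RDF a reification describes a statement without asserting it, which is why b) also carries the base triple. A naive per-subject query returns only the FlowerColor triple, indistinguishable from a); the modifier is recovered only by also following the reification.

```python
# Example b) as plain (subject, predicate, object) tuples.
# "ex:" names and "_:stmt1" are hypothetical; "rdf:" terms are the
# standard RDF reification vocabulary. Literals are written in quotes.
graph = [
    ("ex:Species1", "ex:flowerColor", "ex:red"),      # base statement
    ("_:stmt1", "rdf:type", "rdf:Statement"),         # reification node
    ("_:stmt1", "rdf:subject", "ex:Species1"),
    ("_:stmt1", "rdf:predicate", "ex:flowerColor"),
    ("_:stmt1", "rdf:object", "ex:red"),
    ("_:stmt1", "ex:modifier", '"almost never"'),     # the modifier
]


def triples_about(graph, subject):
    """Naive retrieval: only the triples whose subject is `subject`."""
    return [t for t in graph if t[0] == subject]


def reifications_of(graph, triple):
    """Nodes whose rdf:subject/predicate/object match `triple`."""
    s, p, o = triple
    facts = set(graph)
    return {
        node
        for (node, pred, obj) in graph
        if pred == "rdf:subject" and obj == s
        and (node, "rdf:predicate", p) in facts
        and (node, "rdf:object", o) in facts
    }


def modifiers_for(graph, triple):
    """Modifiers attached to any reification of `triple`."""
    nodes = reifications_of(graph, triple)
    return sorted(o for (s, p, o) in graph if s in nodes and p == "ex:modifier")


naive = triples_about(graph, "ex:Species1")
print(naive)                           # looks exactly like a)
print(modifiers_for(graph, naive[0]))  # the qualifying modifier
```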
Similarly, for taxon concepts expressed through character circumscription, the
concept that can be analyzed or criticized is only the totality of all
descriptive statements, nothing less.
Now, in the XML Schema world, this boundary is assumed (though nowhere
guaranteed) to be a document. However, in SDD we ran into exactly the opposite
problem: we did intend our data to extend across servers and documents
(although in a less atomized way than RDF). RDF would be a solution for these
problems. Perhaps we will understand the problems better once we better
understand how to define and refer to CBDs when using RDF? I do not understand
this yet.
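One concrete definition to test this against is the W3C Member Submission on Concise Bounded Description: take every statement whose subject is the starting node, recursively add the statements of any blank-node objects, and for every statement included, add the CBD of each of its reifications. Below is a minimal Python sketch of those rules over plain tuples, assuming blank nodes are written with a `_:` prefix and literals in double quotes (N-Triples style); all `ex:` names are hypothetical. Whether this is the right boundary for an SDD description is exactly the open question; the sketch only shows that, under this definition, the modifier travels with the statement it qualifies while unrelated statements stay out.

```python
from collections import deque

# Hypothetical example graph: "ex:" names are illustrative, blank nodes use
# the "_:" prefix and literals are quoted, as in N-Triples.
graph = {
    ("ex:Species1", "ex:flowerColor", "ex:red"),
    ("ex:Species1", "ex:leaf", "_:leaf1"),
    ("_:leaf1", "ex:shape", "ex:ovate"),
    ("_:stmt1", "rdf:type", "rdf:Statement"),
    ("_:stmt1", "rdf:subject", "ex:Species1"),
    ("_:stmt1", "rdf:predicate", "ex:flowerColor"),
    ("_:stmt1", "rdf:object", "ex:red"),
    ("_:stmt1", "ex:modifier", '"almost never"'),
    ("ex:Species2", "ex:flowerColor", "ex:blue"),  # unrelated; must be excluded
}


def is_blank(node):
    return node.startswith("_:")


def reification_nodes(graph, triple):
    """Nodes that reify `triple` (matched on rdf:subject/predicate/object)."""
    s, p, o = triple
    return {
        n for (n, pred, obj) in graph
        if pred == "rdf:subject" and obj == s
        and (n, "rdf:predicate", p) in graph
        and (n, "rdf:object", o) in graph
    }


def cbd(graph, start):
    """Concise Bounded Description of `start` within `graph`."""
    result = set()
    visited = set()
    pending = deque([start])
    while pending:
        node = pending.popleft()
        if node in visited:
            continue
        visited.add(node)
        for triple in graph:
            if triple[0] != node or triple in result:
                continue
            # Include every statement whose subject is the current node.
            result.add(triple)
            # Descend into blank-node objects.
            if is_blank(triple[2]):
                pending.append(triple[2])
            # Include the CBD of every reification of this statement.
            pending.extend(reification_nodes(graph, triple))
    return result


description = cbd(graph, "ex:Species1")
for triple in sorted(description):
    print(triple)
```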
Donald writes:
> The database you construct to support
> efficient queries need not be the same as the one that I construct, or the
> object model inside someone else's application. The critical issue is how
> easy it is for two parties to exchange the set of objects and properties
> that they wish to share.
If you want interoperability of documents, you have to be able to map imported
data losslessly onto your internal information model. I feel that it will be
very difficult to import data into your (permanent, editable) data store unless
you at least use a very similar basic object ontology and a similar concept of
cardinality constraints. Currently a major problem in consuming DarwinCore flat
structures is that you are left to guess about the relationships between
multiple element instances. Better "boxing" of DwC into object types clearly
overcomes this, but if you have two different boxing models (object ontology,
internal information model), the problem probably appears worse than before.
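A minimal Python sketch of both halves of this argument. The element names are hypothetical, only loosely modeled on DarwinCore (not the actual DwC schema), and the taxon and person values are placeholders. In the flat record, two names and two identifiers can be paired in two equally plausible ways, and the single year cannot be attributed at all; boxing into an `Identification` type removes the guess. Importing that boxed record into a second, hypothetical model that allows only one identification per specimen then loses data, and the best an importer can do is report what it dropped.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import permutations

# Hypothetical flat record: element names loosely modeled on DarwinCore,
# values are placeholders. Repeated elements are just parallel lists.
flat_record = {
    "CatalogNumber": ["HYPO-0001"],
    "ScientificName": ["Aus bus", "Aus cus"],
    "IdentifiedBy": ["Identifier 1", "Identifier 2"],
    "YearIdentified": ["1998"],  # one year for two identifications: which one?
}


def candidate_pairings(record):
    """All one-to-one ways to pair names with identifiers.

    Even assuming each name has exactly one identifier, the flat
    structure gives no basis for choosing between these.
    """
    names = record["ScientificName"]
    people = record["IdentifiedBy"]
    return [list(zip(names, perm)) for perm in permutations(people)]


# "Boxed" model: each identification is an object, so the links are explicit.
@dataclass
class Identification:
    scientific_name: str
    identified_by: str
    year: str | None = None


@dataclass
class Specimen:
    catalog_number: str
    identifications: list[Identification]


# A second, hypothetical model that allows at most one identification.
@dataclass
class SingleIdSpecimen:
    catalog_number: str
    scientific_name: str
    identified_by: str


def import_into_single_id(spec: Specimen) -> tuple[SingleIdSpecimen, list[Identification]]:
    """Map onto the cardinality-1 model; also return identifications that did not fit.

    The year of the kept identification is lost as well: the target has no field for it.
    """
    first, *dropped = spec.identifications
    record = SingleIdSpecimen(spec.catalog_number, first.scientific_name, first.identified_by)
    return record, dropped


specimen = Specimen("HYPO-0001", [
    Identification("Aus bus", "Identifier 1", "1998"),
    Identification("Aus cus", "Identifier 2"),
])
print(len(candidate_pairings(flat_record)))
record, dropped = import_into_single_id(specimen)
print(record)
print(dropped)
```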
Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn at bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19 Tel: +49-30-8304-2220
14195 Berlin, Germany Fax: +49-30-8304-2203