Donald writes:
- Please don't judge GBIF's goals on the limited achievements of the
current data portal (which will be discarded as soon as I can replace it - some time later this year). The aim is certainly to provide clear "trust networks" (all the way back to original sources) and to allow all data to be filtered by such criteria.
I did not mean to belittle anything GBIF has achieved - I probably should have made that clearer. I am not arguing against GBIF's or your achievements in using big data providers; I am just trying to express my thought that part (and I believe a big part) of the future may lie in something less organized, document-centric rather than institutional-database-provider-centric.
This has some implications for the RDF debate - if we argue that big data providers will set up the conversion tools to publish excerpts of their proprietary data structure in RDF, and if we argue that RDF import use cases are relevant mostly for aggregators/indexing services.
Your point about trust-networks is excellent (and I did participate in your survey asking for suggestions about the data portal...)
retrieve information. Again I'd really like to receive your comments on how e.g. the international pool of SDD data might best be handled.
I have no easy answer to this. Not only is SDD implemented only in beta stage, but the Delta/Lucid/DeltaAccess documents which could currently be expressed in SDD are also rarely made available - partly because there seems to be not enough value in making them available (which using them through GBIF could change in the future), partly because people have reservations about making their work available.
As a side issue, I'm not sure how easy it really would be for us to use an RDBMS-based approach to support the integration of all of the disparate and relevant information which is (as you say) scattered through so many sources. A world in which anyone can annotate any data element would seem much more suitable.
Perhaps indeed, I just cannot think it through, it seems to blow my brain. I feel a major point is what Steven said about the CBD being the real unit of information, not the triples. This rings a bell in me, but I cannot hear it loud enough yet. I am sorry if this is causing confusing posts from me.
An example: In SDD we think modifiers are very important.
"Flower red" and "Flowers almost never red"
could in RDF be:
a) Species - FlowerColor - red
b) Species - FlowerColor - red
   ReificationOfTheAbove - Modifier - "almost never"
Getting this as independent, extensible tuples (getting the first, but not the second) is not real information. The whole, the "CBD", is the unit of information which I can criticize, reject, approve.
Similarly, in taxon concepts expressed through character circumscription, the concept that can be analyzed or criticized is only the total of all descriptive statements, nothing less.
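To make the modifier example a bit more concrete, here is a minimal sketch in plain Python (no RDF library). It stores the "Flowers almost never red" case as triples plus a reification node, and then collects a concise bounded description (CBD) as I understand the usual definition: all statements about the start node, followed through blank-node objects and through reifications of the statements already collected. All identifiers (ex:Species1, ex:flowerColor, ex:modifier, _:s1) are made up for illustration and are not SDD or any agreed vocabulary.

```python
# Hypothetical vocabulary; only the rdf:* reification terms are standard RDF.
RDF_TYPE = "rdf:type"
RDF_STATEMENT = "rdf:Statement"
RDF_SUBJECT = "rdf:subject"
RDF_PREDICATE = "rdf:predicate"
RDF_OBJECT = "rdf:object"

graph = {
    # a) the bare statement
    ("ex:Species1", "ex:flowerColor", "red"),
    # b) a reification of a), carrying the modifier
    ("_:s1", RDF_TYPE, RDF_STATEMENT),
    ("_:s1", RDF_SUBJECT, "ex:Species1"),
    ("_:s1", RDF_PREDICATE, "ex:flowerColor"),
    ("_:s1", RDF_OBJECT, "red"),
    ("_:s1", "ex:modifier", "almost never"),
    # an unrelated statement that must stay outside the description
    ("ex:Species2", "ex:flowerColor", "blue"),
}


def is_blank(node):
    """Blank nodes are written with a '_:' prefix in this sketch."""
    return node.startswith("_:")


def reifications_of(graph, triple):
    """Return the nodes that reify the given triple."""
    s, p, o = triple
    return {
        node
        for (node, pred, obj) in graph
        if pred == RDF_SUBJECT and obj == s
        and (node, RDF_PREDICATE, p) in graph
        and (node, RDF_OBJECT, o) in graph
    }


def cbd(graph, start):
    """Collect the concise bounded description of `start`."""
    result, seen, pending = set(), set(), [start]
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.add(node)
        for triple in graph:
            if triple[0] != node:
                continue
            result.add(triple)
            if is_blank(triple[2]):
                pending.append(triple[2])  # follow blank-node objects
            pending.extend(reifications_of(graph, triple))  # follow reifications
    return result


# Fetching only the statements whose subject is the species loses the modifier.
naive = {t for t in graph if t[0] == "ex:Species1"}
description = cbd(graph, "ex:Species1")
```

Here `naive` contains only the "red" statement, while `description` also carries the reification node with "almost never" and still leaves out the other species. This is only meant to show why the bounded whole, and not the single tuple, is what can be approved or rejected.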
Now, in the xml-schema world, this boundary is assumed (though nowhere guaranteed) to be a document. However, in SDD we ran into exactly the opposite problem: we did mean to extend across servers and documents (although in a less atomized way than RDF). RDF would be a solution for these problems - perhaps we will understand the problems better if we better understand how to define and refer to CBDs when using RDF? I do not understand this yet.
Donald writes:
The database you construct to support efficient queries need not be the same as the one that I construct, or the object model inside someone else's application. The critical issue is how easy it is for two parties to exchange the set of objects and properties that they wish to share.
If you want interoperability of documents, you have to be able to map imported data losslessly into your inner information model. I feel that it will be very difficult to import data into your (permanent, editable) data store unless you at least use a very similar basic object ontology and a similar concept of cardinality constraints. Currently a major problem in consuming DarwinCore flat structures is that you are left to guess about relationships between multiple element instances. Better "boxing" in object types of DwC clearly overcomes this, but if you have two different boxing models (object ontology, internal information model) the problem probably appears worse than before.
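A small Python sketch of the guessing problem, under stated assumptions: the element names ("name", "identifiedBy") and the values are placeholders invented for this illustration, not actual DarwinCore terms, and the pairing helper assumes both elements occur equally often in the record.

```python
from dataclasses import dataclass
from itertools import permutations

# A flat record: repeated elements, with no indication of which belongs to which.
flat_record = [
    ("name", "Aus bus"),
    ("name", "Aus cus"),
    ("identifiedBy", "Smith"),
    ("identifiedBy", "Jones"),
]


def candidate_pairings(record, left, right):
    """List every way the `left` values could be paired with the `right` values."""
    lefts = [value for key, value in record if key == left]
    rights = [value for key, value in record if key == right]
    return [list(zip(lefts, perm)) for perm in permutations(rights)]


@dataclass(frozen=True)
class Identification:
    """A hypothetical 'box' that keeps related elements together."""
    name: str
    identified_by: str


# The boxed form states the relationship explicitly, so nothing has to be guessed.
boxed_record = [
    Identification("Aus bus", "Jones"),
    Identification("Aus cus", "Smith"),
]

pairings = candidate_pairings(flat_record, "name", "identifiedBy")
```

With two names and two identifiers, `pairings` already holds two equally plausible readings of the flat record, and the consumer has no basis for choosing between them. The boxed record removes the ambiguity, but only for a consumer whose own boxes are cut the same way.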
Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19           Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203