Hi Hilmar,
No argument from me, just my prejudice against "solution via ontology", and my enthusiasm for "schema-last" - the idea that the schema reveals itself after you've populated the knowledge base. This was never really possible with relational databases, where a table must be defined before it can be populated. But graph databases (especially the "anyone can say anything" semantic web) practically invite a degree of schema-last. Examples include Freebase (schema-last by design), and FOAF, whose specification is so widely ignored and misused (often to good effect) that the de facto spec is the one that can be abstracted from FOAF files in the wild.
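To make "the schema reveals itself" a bit more concrete, here is a minimal sketch of abstracting a de facto schema from instance data. It is only an illustration under stated assumptions: the triples, the ex: prefix, and the infer_schema function are all made up, and real data would be loaded with an RDF library rather than written as Python tuples.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"

def infer_schema(triples):
    """Return {class: Counter(predicate -> number of instances using it)}.

    Hypothetical helper: tallies which predicates actually occur on the
    instances of each class, regardless of what any spec says.
    """
    types = defaultdict(set)   # subject -> classes it is declared to be
    preds = defaultdict(set)   # subject -> predicates it actually uses
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            preds[s].add(p)

    schema = defaultdict(Counter)
    for s, classes in types.items():
        for c in classes:
            # Each predicate is counted once per instance that uses it.
            schema[c].update(preds[s])
    return schema

# Made-up instance data; ex:favouriteTaxon is deliberately not a FOAF term.
triples = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "ex:favouriteTaxon", "ex:Quercus_alba"),
]

schema = infer_schema(triples)
for predicate, count in schema["foaf:Person"].most_common():
    print(predicate, count)
```

Run over a few million triples instead of six, a tally like this is one rough way to see which properties people really attach to a class, including ones no specification anticipated.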
The semantic web is littered with ontologies lacking instance data; my hope is that generating instance data is a significant part of the ontology building process for each of the ontologies proposed by the report. By "generating instance data" I mean not simply marking up a few example records, but generating millions of triples to query over as part of the development cycle. This will indicate both the suitability of the ontology to the use cases, and also its ease of use.
I like the order in which the GBIF report lists its infrastructure recommendations. Persistent URIs (the underpinning of everything); followed by competency questions and use cases (very helpful in the prevention of mental masturbation); followed by OWL ontologies (to facilitate reasoning). Perhaps the only place where we differ is that you're comfortable with "incorporate instance data into the ontology design process" being implicit, while I never tire of seeing that point hammered home.
Regards - Joel.
On Mon, 14 Feb 2011, Hilmar Lapp wrote:
On Feb 14, 2011, at 12:05 PM, joel sachs wrote:
I think the recommendations are heavy on building ontologies, and light on suggesting paths to linked data representations of instance data.
Good observation. I can't speak for all of the authors, but in my experience building Linked Data representations is mostly a technical problem, and thus much easier compared to building soundly engineered, commonly agreed upon ontologies with deep domain knowledge capture. The latter is hard, because it requires overcoming a lot of social challenges.
As for the GBIF report, personally I think linked biodiversity data representations will come at about the same pace whether or not GBIF pushes on that front (though GBIF can help make those representations better by provisioning stable resolvable identifier services, URIs etc). There is a unique opportunity though for "neutral" organizations such as GBIF (or, in fact, TDWG), to significantly accelerate the development of sound ontologies by catalyzing the community engagement, coherence, and discourse that is necessary for them.
-hilmar
=========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : ===========================================================