Hi Guys,
I think I should chip in a bit of perspective/history, as I am at least partly responsible for the mess. Matt hits a few nails on the head as usual.
In the beginning there were a bunch of more or less unrelated XML Schemas. Kings amongst these were DwC in its different flavours and ABCD (in all its complexity and a couple of versions).
Back in 2005 I was given the job of taking TCS (the XML Schema version) through the old standards process. TaxonConcepts and TaxonNames are really joining objects. They have very few literal properties but just point out to lots of other objects like people and specimens. I soon became frustrated with not being able to simply include chunks of other schemas (either ABCD or DwC, but which?) without importing them wholesale and so hard-linking to specific versions, i.e. effectively building a single XML Schema for our little world. And of course once we had one for our world the process would repeat with the rest of the world. I refuse to define fields for someone's contact details again ....
Having got TCS through the old TDWG standards process I was given the job of "TDWG Technical Architect" for two years. By this time I had had my conversion on the road to Damascus and decided that RDF and semantic technologies were really the best tools for this kind of semantic integration. There are use cases that are not supported, though, and are more appropriately handled with XML validated in some way, but there is nothing to stop this XML mapping directly to RDF or even being an actual XML RDF serialization. (We are just talking frame-based modeling at this level, not OWL.)
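To make the "XML mapping directly to RDF" point concrete, here is a minimal sketch of a frame-style mapping: one XML record becomes a subject, and each child element becomes a property of it. The element names, the `id` attribute, the sample record, and the `VOCAB` namespace are all assumptions made up for illustration; they are not taken from TCS or any TDWG schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record and namespace, for illustration only.
XML = """<TaxonName id="urn:example:name:1">
  <nameComplete>Bellis perennis L.</nameComplete>
  <rank>species</rank>
</TaxonName>"""

VOCAB = "http://example.org/voc/TaxonName#"


def xml_to_triples(xml_text):
    """Map one flat XML record to (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    subject = root.get("id")
    # The root element name becomes the type of the subject.
    triples = [(subject, "rdf:type", VOCAB + root.tag)]
    # Each child element becomes a literal-valued property.
    for child in root:
        triples.append((subject, VOCAB + child.tag, (child.text or "").strip()))
    return triples


triples = xml_to_triples(XML)
```

The XML can still be validated against a schema in the usual way; the mapping only requires that each element name corresponds to a term with a URI.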
TDWG had just 'got' XML Schema and had a big investment in it. The DiGIR, BioCASe and TAPIR protocols all relied on it. I could not say "Don't do XML Schema for this stuff" or "Change all your schemas so they map to an RDF or a frame-based model". So I spent two years trying to lead people down a weaving path that involved integrating XML-based 'application schemas' with semantics in RDF. The message was far too complex for much of the audience, and the momentum of what was going on was too great to get all the way to the goal, but we certainly ended up with people talking in terms of modeling in a different way and concepts being identified by URIs. This is still part of that conversation.
A group of us went through a modeling process with Jessie Kennedy to start building an ontology for the whole of TDWG, but this was just like building one big XML Schema and became bogged down - though I don't think Jessie would agree with that, as she felt we could have completed something with a couple more meetings. It was a good modeling exercise but we didn't finish it. If we had finished we would then have had to impose it on the rest of the organization. An illustration of the complexity of managing ontologies is that we were talking of observations being separate from specimens, and specimens being more linked to collections management. I made the recommendation that there should be a separate observations interest group and a specimen/collection curation interest group, but the executive committee didn't support this approach. So we have the same group potentially discussing preservation techniques and GPS locations of organisms because historically observations have been extracted from specimens - unless you are a birder or ecologist (or anyone outside TDWG), in which case it is nuts.
At the same time we were introducing the notion of GUIDs, which is integrally linked with the RDF way of looking at things. We needed RDF-based return types for LSIDs, which were being issued by the nomenclators. I took TCS (XML Schema version) and created two RDF/OWL vocabularies out of it so we at least had the namespaces set up that could be used in the RDF about names and taxa.
http://rs.tdwg.org/ontology/voc/TaxonName
http://rs.tdwg.org/ontology/voc/TaxonConcept
I have been out of the loop on another project for a year, and it is beyond me to develop and understand the issues across the whole of the TDWG space, but I am willing to stand by these, develop them further and try to take them through a ratification process. They are not perfect (certainly not good OWL) but I think they could be useful. Although they are still experimental, they are being used to some extent.
At the same time I put together the TaxonOccurrence vocabulary
http://rs.tdwg.org/ontology/voc/TaxonOccurrence
This is shamefully stolen from the schema version of DwC and written (i.e. cut and pasted) in a day or so. It was meant as a straw man that I hoped would be knocked down by the Observations and Specimens Group. We do need an ontology/vocabulary for specimens and possibly another for occurrence data, but there are people with greater domain knowledge than me who should be doing it. I'd be happy if we took this down.
I believe the way forward is with small, modular ontologies that have as little entailment in them as possible - ideal for importing into larger ontologies without blowing them up. There seems to be a modeling approach which considers upper classes the most important. I believe the class hierarchy is actually only part of the ontology. It is perfectly possible to define functional classes first, standardize them and then import them into other ontologies that assert the class hierarchies. In fact this is the only way you can have shared semantics where we agree on an object but it has a different place in your world to the place it has in my world. The alternative is to get everyone to see the world from a single view of reality and the only way to do that is to pay them lots of money to see it that way!
We need versioning and standardization of constructs at a very fine-grained level - even individual property level. If we don't take this approach, nothing will be standardized till we agree on everything, i.e. never.
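A rough sketch of what I mean, assuming nothing about how it would really be implemented: each term carries its own version and status, a small module standardizes terms without asserting any hierarchy, and two importing ontologies place the same term under different parents. Every URI, term name, version number and status label below is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One independently versioned vocabulary term (class or property)."""
    uri: str
    version: str
    status: str  # hypothetical labels: "draft" or "ratified"


# A small module: terms only, no class hierarchy asserted.
BASE = "http://example.org/voc/name#"  # hypothetical namespace
name_module = {
    "TaxonName": Term(BASE + "TaxonName", "1.0", "ratified"),
    "nameComplete": Term(BASE + "nameComplete", "1.1", "ratified"),
    "authorship": Term(BASE + "authorship", "0.3", "draft"),
}


def ratified(module):
    """Return only the terms that have been standardized so far."""
    return {name: term for name, term in module.items() if term.status == "ratified"}


# Two importing ontologies share the term but assert different parents.
shared_uri = name_module["TaxonName"].uri
my_world = {shared_uri: "http://example.org/mine#InformationObject"}
your_world = {shared_uri: "http://example.org/yours#Label"}
```

Here `authorship` can stay in draft without holding up the ratification of the other two terms, and neither importing ontology has to accept the other's hierarchy in order to agree on what a `TaxonName` is.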
Sorry for the long post and a tendency to over justify!
All the best,
Roger
On 23 Feb 2009, at 22:29, Blum, Stan wrote:
The TDWG 'house' is organic and messy, if not in disarray. I'll explain this state with a descriptive history instead of a logical exposition.
- We changed the way TDWG approves standards (in 2006). ABCD, TCS, and SDD were the last standards approved by the old method (a vote by members). New standards simply have to pass expert and public review (as judged by the executive committee). ABCD, TCS, and SDD aren't consistent because...(blah, blah; life is hard).
- The TDWG ontology or vocabulary is NOT a standard. The terms are supposed to be taken from standards, and perhaps even non-standard schemas that are widely used. The ontology managers are supposed to apply the filter of use; terms in the ontology should be widely used. (We expect that some terms in approved standards will not be widely used.)
- With the ontology/vocabulary comes a shift or at least a broadening towards RDF (away from XML schemas), though there were/are some skeptics.
The existing pages on the TDWG vocabulary are pretty old, and I don't think there has been any kind of review "event" for these. Editing and reviewing these is on our to-do list for this year.
I think IT WOULD BE VERY HELPFUL if we put up a list of REQUIRED READING before getting into this review, and perhaps even the current review of DarwinCore. John Wieczorek and his core collaborators have drawn heavily from the DCMI initiative in shaping the most recent draft of the DarwinCore. I would really appreciate references to any other good statements about RDF and ontologies. These should go on the TAG homepage.
Matt, thanks for pointing out the emperor's clothes!
-Stan
-----Original Message-----
From: mbjones.89@gmail.com [mailto:mbjones.89@gmail.com] On Behalf Of Matt Jones
Sent: Monday, February 23, 2009 12:51 PM
To: Blum, Stan
Cc: Kevin Richards; Hilmar Lapp; Technical Architecture Group mailing list; rogerhyam Hyam; Mark Schildhauer
Subject: Re: [tdwg-tag] Embedding specimen (and other) annotations in NeXML
This thread has prompted me to ask some naive questions about the process under which the vocabularies are formed. Maybe I'm the only one who is confused about the vocabularies, their status, and the process of forming new terms, but it seems maybe I'm not alone. And clarification on some of these points will help me with our direction on the development of the Observation Ontology under the OSR group, which I think will fit right in with Stan's point about fitting some of the concepts into a broader Observation framework.
For me there is a lot of confusion over the TDWG vocabularies, partly because they capture concepts that are present in existing TDWG standards, but are generally incomplete. For example, the TCS standard provides the field 'Specimens/Specimen', which I think is relevant to Hilmar's question. However, the listed TDWG vocabulary for TCS is the TaxonConcept vocabulary (http://rs.tdwg.org/ontology/voc/TaxonConcept), which does not provide a class for the TCS 'Specimen' concept. In addition, the ABCD TDWG standard also seems to have a way for specimens to be represented, but they are generalized as 'Unit's with a 'RecordBasis' of 'PreservedSpecimen'. So there are at least two official TDWG 'standards' for representing Specimen information, in addition to whatever DwC does. It seems to me that the best thing to do would be to finish the LSID vocabularies for TCS and ABCD so that they completely represent the concepts in TCS and ABCD, then get that approved as a valid way to represent these TDWG standards. In the process, one could try to resolve the differences in modeling approaches employed by the different standards, such as mapping the Specimen concept in TCS to its corresponding concept in DwC and ABCD. This would help avoid multiple TDWG standards defining overlapping versions of these concepts, and let people use the vocabularies in place of the XML schema versions of these standards.
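As a rough illustration of the kind of mapping I have in mind, here is a minimal sketch of a crosswalk that resolves the TCS and ABCD representations of a specimen to one shared concept. The shared URI and the function name are hypothetical, and no DwC entry is included because the corresponding DwC term is not spelled out above.

```python
# Hypothetical URI for a shared 'Specimen' concept.
SHARED_SPECIMEN = "http://example.org/voc/Specimen"

# Keys are (standard, element, record basis); record basis only applies to ABCD.
CROSSWALK = {
    ("TCS", "Specimens/Specimen", None): SHARED_SPECIMEN,
    ("ABCD", "Unit", "PreservedSpecimen"): SHARED_SPECIMEN,
}


def shared_concept(standard, element, record_basis=None):
    """Return the shared concept URI for a standard-specific element, or None."""
    return CROSSWALK.get((standard, element, record_basis))
```

An ABCD 'Unit' with some other 'RecordBasis' simply has no entry here, which is the point: the mapping has to be decided case by case rather than assumed from the element name.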
What is the process for approval of the LSID vocabularies? They seem to be bypassing the normal TDWG standards track. Some of the vocabularies have a status of 'Available' (like TaxonConcept, even though it is incomplete), while others are marked as 'Developmental'.
The page on OntologyGovernance (http://wiki.tdwg.org/twiki/bin/view/TAG/TDWGOntologyGovernance) states: "Relationship Between TDWG Standards and the Ontology -- Concepts are standardized by being included in TDWG Standards. Once they have been mentioned in a standard the Ontology Manager has the responsibility of maintaining their URIs and descriptions as per the standard. Concepts must be promoted to the live branch before the standard enters the standards process."
So it seems that the OntologyManager replaces the standards process for the purpose of the vocabularies. Is this correct? And does the OntologyManager make sure that concepts like 'Specimen' that are defined in TCS make it into the corresponding LSID vocabulary before it is classified as 'Available'? And how does the OntologyManager decide which concept and representation for 'Specimen' to use -- the one from TCS or the one from ABCD? Does 'Available' have the same weight as a published TDWG standard, and if so, shouldn't these vocabularies be listed on the Standards page as well? Finally, does the existence of a concept such as 'Specimen' in TCS have any bearing on the development of new standards such as DwC that may want to define the concept differently, or more completely?
Matt
-------------------------------------------------------------
Roger Hyam
Roger@BiodiversityCollectionsIndex.org
http://www.BiodiversityCollectionsIndex.org
-------------------------------------------------------------
Royal Botanic Garden Edinburgh
20A Inverleith Row, Edinburgh, EH3 5LR, UK
Tel: +44 131 552 7171 ext 3015
Fax: +44 131 248 2901
http://www.rbge.org.uk/
-------------------------------------------------------------