[tdwg-tag] Embedding specimen (and other) annotations in NeXML

Roger Hyam rogerhyam at mac.com
Tue Feb 24 11:57:17 CET 2009

Hi Guys,

I think I should chip in a bit of perspective/history, as I am at least
partly responsible for the mess. Matt hits a few nails on the head as

In the beginning there were a bunch of more or less unrelated XML
Schemas. Kings amongst these were DwC in its different flavours and
ABCD (in all its complexity and a couple of versions).

Back in 2005 I was given the job of taking TCS (the XML Schema
version) through the old standards process. TaxonConcepts and
TaxonNames are really joining objects: they have very few literal
properties and mostly just point out to lots of other objects, like
people and specimens. I soon became frustrated with not being able to
simply include chunks of other schemas (either ABCD or DwC, but which?)
without importing them wholesale and so hard-linking to specific
versions, i.e. effectively building a single XML Schema for our little
world. And of course, once we had one for our world, the process would
repeat with the rest of the world. I refuse to define fields for
someone's contact details again....

Having got TCS through the old TDWG standards process, I was given the
job of "TDWG Technical Architect" for two years. By this time I had
had my conversion on the road to Damascus and decided that RDF and
semantic technologies were really the best tools for this kind of
semantic integration. There are use cases that are not supported,
though, and these are more appropriately handled with XML validated in
some way, but there is nothing to stop this XML from mapping directly
to RDF or even being an actual XML serialization of RDF (we are just
talking about frame-based modeling at this level, not OWL).

TDWG had just 'got' XML Schema and had a big investment in it. The
DiGIR, BioCASe and TAPIR protocols all relied on it. I could not say
"Don't do XML Schema for this stuff" or "Change all your schemas so
they map to an RDF or frame-based model". So I spent two years trying
to lead people down a winding path that involved integrating XML-based
'application schemas' with semantics in RDF. The message was far too
complex for much of the audience, and the momentum of what was going
on was too great to get all the way to the goal, but we certainly ended
up with people talking about modeling in a different way and with
concepts being identified by URIs. This is still part of that

A group of us went through a modeling process with Jessie Kennedy to
start building an ontology for the whole of TDWG, but this was just
like building one big XML Schema and became bogged down (though I
don't think Jessie would agree with that, as she felt we could have
completed something with a couple more meetings). It was a good
modeling exercise but we didn't finish it. If we had finished, we would
then have had to impose it on the rest of the organization. An
illustration of the complexity of managing ontologies is that we were
talking of observations being separate from specimens and specimens
being more linked to collections management. I made the
recommendation that there should be a separate observations interest
group and a specimen/collection curation interest group, but the
executive committee didn't support this approach. So we have the same
group potentially discussing preservation techniques and GPS locations
of organisms, because historically observations have been extracted
from specimens. If you are a birder or an ecologist (or anyone outside
TDWG), this is nuts.

At the same time we were introducing the notion of GUIDs, which is
integrally linked with the RDF way of looking at things. We needed
RDF-based return types for the LSIDs that were being issued by the
nomenclators. I took TCS (the XML Schema version) and created two
RDF/OWL vocabularies out of it, so we at least had the namespaces set
up that could be used in the RDF about names and taxa.

I have been out of the loop on another project for a year, and it is
beyond me to develop and understand the issues across the whole of the
TDWG space, but I am willing to stand by these, develop them further
and try to take them through a ratification process. They are not
perfect (certainly not good OWL), but I think they could be useful.
Although they are still experimental, they are being used to some extent.

At the same time I put together the TaxonOccurrence vocabulary.

This is shamefully stolen from the schema version of DwC and written
(née cut and pasted) in a day or so. It was meant as a straw man that I
hoped would be knocked down by the Observations and Specimens Group.
We do need an ontology/vocabulary for specimens and possibly another
for occurrence data, but there are people with greater domain knowledge
than me who should be doing it. I'd be happy if we took this down.

I believe the way forward is with small, modular ontologies that have
as little entailment in them as possible, which makes them ideal for
importing into larger ontologies without blowing them up. There seems
to be a modeling approach which considers upper classes the most
important. I believe the class hierarchy is actually only part of the
ontology. It is perfectly possible to define functional classes first,
standardize them and then import them into other ontologies that assert
the class hierarchies. In fact, this is the only way you can have shared
semantics where we agree on an object but it has a different place in
your world from the place it has in my world. The alternative is to get
everyone to see the world from a single view of reality, and the only
way to do that is to pay them lots of money to see it that way!
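To make the modular approach concrete, here is a minimal Python sketch, not taken from any TDWG vocabulary. Triples are plain (subject, predicate, object) tuples; the example.org namespaces, the Specimen and preservationMethod terms, and the helpers import_module and superclasses are all hypothetical. A small shared module defines a class and a property with no class hierarchy, and two importing ontologies each assert a different superclass for that same class.

```python
# Sketch only: hypothetical example.org URIs, plain tuples instead of an RDF library.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

SPEC = "http://example.org/voc/Specimen#"  # hypothetical shared module namespace

# The shared module: a functional class and one property, but no subClassOf triples.
specimen_module = {
    (SPEC + "Specimen", RDF_TYPE, RDFS_CLASS),
    (SPEC + "preservationMethod", RDF_TYPE, RDF_PROPERTY),
    (SPEC + "preservationMethod", RDFS_DOMAIN, SPEC + "Specimen"),
}


def import_module(module, local_assertions):
    """Hypothetical helper: an ontology = imported module triples + local triples."""
    return set(module) | set(local_assertions)


def superclasses(ontology, cls):
    """Hypothetical helper: direct superclasses asserted for cls in an ontology."""
    return {o for (s, p, o) in ontology if s == cls and p == RDFS_SUBCLASS}


# Two communities import the same module and place Specimen differently.
collections_view = import_module(specimen_module, {
    (SPEC + "Specimen", RDFS_SUBCLASS, "http://example.org/collections#CuratedObject"),
})
observation_view = import_module(specimen_module, {
    (SPEC + "Specimen", RDFS_SUBCLASS, "http://example.org/obs#Evidence"),
})

# Both views contain every module triple, so they agree on what a Specimen is...
shared = collections_view & observation_view
print(specimen_module <= shared)  # True
# ...while each asserts its own place for it in a class hierarchy.
print(superclasses(collections_view, SPEC + "Specimen"))
print(superclasses(observation_view, SPEC + "Specimen"))
```

The design choice being illustrated is that the hierarchy lives in the importing ontology, not in the shared module, so standardizing the module does not force anyone to accept a single upper-class view.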

We need versioning and standardization of constructs at a very
fine-grained level, even at the level of individual properties. If we
don't take this approach, nothing will be standardized until we agree on everything, i.e.

Sorry for the long post and a tendency to over-justify!

All the best,


On 23 Feb 2009, at 22:29, Blum, Stan wrote:

> The TDWG 'house' is organic and messy, if not in disarray. I'll explain
> this state with a descriptive history instead of a logical exposition.
> * We changed the way TDWG approves standards (in 2006). ABCD, TCS, and
> SDD were the last standards approved by the old method (a vote by
> members). New standards simply have to pass expert and public review
> (as judged by the executive committee). ABCD, TCS, and SDD aren't
> consistent because... (blah, blah; life is hard).
> * The TDWG ontology or vocabulary is NOT a standard. The terms are
> supposed to be taken from standards, and perhaps even non-standard
> schemas that are widely used. The ontology managers are supposed to
> apply the filter of use; terms in the ontology should be widely used.
> (We expect that some terms in approved standards will not be widely
> used.)
> * With the ontology/vocabulary comes a shift, or at least a broadening,
> towards RDF (away from XML schemas), though there were/are some
> skeptics.
>
> The existing pages on the TDWG vocabulary are pretty old, and I don't
> think there has been any kind of review "event" for these. Editing and
> reviewing these is on our to-do list for this year.
>
> I think IT WOULD BE VERY HELPFUL if we put up a list of REQUIRED
> before getting into this review, and perhaps even the current review
> of DarwinCore. John Wieczorek and his core collaborators have drawn
> heavily from the DCMI initiative in shaping the most recent draft of
> the DarwinCore. I would really appreciate references to any other good
> statements about RDF and ontologies. These should go on the TAG
> homepage.
>
> Matt, thanks for pointing out the emperor's clothes!
> -Stan
> -----Original Message-----
> From: mbjones.89 at gmail.com [mailto:mbjones.89 at gmail.com] On Behalf Of Matt Jones
> Sent: Monday, February 23, 2009 12:51 PM
> To: Blum, Stan
> Cc: Kevin Richards; Hilmar Lapp; Technical Architecture Group mailing list; rogerhyam Hyam; Mark Schildhauer
> Subject: Re: [tdwg-tag] Embedding specimen (and other) annotations in NeXML
>
> This thread has prompted me to ask some naive questions about the
> process under which the vocabularies are formed.  Maybe I'm the only
> one who is confused about the vocabularies, their status, and the
> process of forming new terms, but it seems maybe I'm not alone.  And
> clarification on some of these points will help me with our direction
> on the development of the Observation Ontology under the OSR group,
> which I think will fit right in with Stan's point about fitting some
> of the concepts into a broader Observation framework.
>
> For me there is a lot of confusion over the TDWG vocabularies, partly
> because they capture concepts that are present in existing TDWG
> standards, but are generally incomplete.  For example, the TCS
> standard provides the field 'Specimens/Specimen', which I think is
> relevant to Hilmar's question.  However, the listed TDWG vocabulary
> for TCS is the TaxonConcept vocabulary
> (http://rs.tdwg.org/ontology/voc/TaxonConcept), which does not provide
> a class for the TCS 'Specimen' concept.  In addition, the ABCD TDWG
> standard also seems to have a way for specimens to be represented,
> but they are generalized as 'Unit's with a 'RecordBasis' of
> 'PreservedSpecimen'. So there are at least two official TDWG
> 'standards' for representing Specimen information, in addition to
> whatever DwC does.  It seems to me that the best thing to do would be
> to finish the LSID vocabularies for TCS and ABCD so that they
> completely represent the concepts in TCS and ABCD, then get that
> approved as a valid way to represent these TDWG standards. In the
> process, one could try to resolve the differences in modeling
> approaches employed by the different standards, such as mapping the
> Specimen concept in TCS to its corresponding concept in DwC and ABCD.
> This would help avoid multiple TDWG standards defining overlapping
> versions of these concepts, and let people use the vocabularies in
> place of the XML schema versions of these standards.
>
> What is the process for approval of the LSID vocabularies?  They seem
> to be bypassing the normal TDWG standards track.  Some of the
> vocabularies have a status of 'Available' (like TaxonConcept, even
> though it is incomplete), while others are marked as 'Developmental'.
> The page on OntologyGovernance
> (http://wiki.tdwg.org/twiki/bin/view/TAG/TDWGOntologyGovernance)
> states:
> "Relationship Between TDWG Standards and the Ontology --
> Concepts are standardized by being included in TDWG Standards. Once
> they have been mentioned in a standard the Ontology Manager has the
> responsibility of maintaining their URIs and descriptions as per the
> standard. Concepts must be promoted to the live branch before the
> standard enters the standards process."
>
> So it seems that the OntologyManager replaces the standards process
> for the purpose of the vocabularies. Is this correct?  And does the
> OntologyManager make sure that concepts like 'Specimen' that are
> defined in TCS make it into the corresponding LSID vocabulary before
> it is classified as 'Available'?  And how does the OntologyManager
> decide which concept and representation for 'Specimen' to use -- the
> one from TCS or the one from ABCD? Does 'Available' have the same
> weight as a published TDWG standard, and if so, shouldn't these
> vocabularies be listed on the Standards page as well?  Finally, does
> the existence of a concept such as 'Specimen' in TCS have any bearing
> on the development of new standards such as DwC that may want to
> define the concept differently, or more completely?
> Matt

Roger Hyam
Roger at BiodiversityCollectionsIndex.org
Royal Botanic Garden Edinburgh
20A Inverleith Row, Edinburgh, EH3 5LR, UK
Tel: +44 131 552 7171 ext 3015
Fax: +44 131 248 2901
