[tdwg-content] Taxon and Name [SEC=UNCLASSIFIED]

Mon Nov 1 18:33:38 CET 2010

Paul, Rich, et al.
I have decided that it was time to face up to trying to understand the 
"right side" of that diagram that Rich made a while back, which I tried 
to put in readable form at:
http://bioimages.vanderbilt.edu/pages/token-explicit.gif
Up to this point I ignored the right side of the diagram because I 
basically am unfamiliar with taxon/name issues.  But I feel that I 
should be!  I read through Rich's various emails in the thread, Paul's 
email (below), and looked at the Darwin Core Taxon and Identification 
class terms and came up with:
http://bioimages.vanderbilt.edu/pages/taxon-diagram1.gif
Please note that this diagram does NOT represent an opinion on my part 
but rather an attempt to summarize what people have said in a graphical 
way. 

In general, I have come to understand the following:
1. There are taxon concepts, which I guess represent a particular 
circumscription of individuals.  The taxon concept is the result of some 
kind of rule that allows one to decide whether particular individuals 
should be included in that taxon or not.  The set of all biological 
individuals that are included is the actual concept (or maybe not?). 
2. There are taxon names, which have been published for the purpose of 
identifying taxa.
3. There are taxon name usages, which are a sort of node that connects a 
name with a concept.  If I'm getting Rich right, this is the resource to 
which dwc:Identifications should be tied.  Rich also suggested that 
taxon name usages might be instances of the dwc:Taxon class.
Although these three types of resources aren't all defined as "classes" 
in Darwin Core, it seems to me that they are classes in the "RDF sense" 
(i.e. that their instances can be typed to them). 

In the diagram, I used triangles to indicate 1:many relationships 
similar to the way Rich did in his original diagram.  In the DwC term 
index, (http://rs.tdwg.org/dwc/terms/index.htm), there seem to be other 
entities represented that are like the "acceptedName" (i.e. 
originalName, parentName) but I've left them off the diagram for 
simplicity and because I don't fully understand them.  Because of the 
dual use of the xxxxxxxxID terms, I should clarify their use in the 
diagram.  The arrows are used in the way you would use an arrow in an 
RDF graph.  The subject is at the tail, the object is at the head and 
the predicate is the term beside the arrow.  Where I put an xxxxxxxID 
term, I'm using it in a way that the Linked Data world would use 
"hasXxxxx".  So the arrow from taxonNameUsage to taxonName asserts the 
statement
[taxonNameUsage] hasTaxonName [taxonName]
which I'm guessing is equivalent to
[taxonNameUsage] scientificNameID [taxonName]
Again, I'm not asserting that the xxxxxxID terms mean what I've put on 
the diagram.  I'm guessing and asking if that's what they mean.
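
As a minimal sketch of how the arrows read (not a statement of what Darwin Core actually specifies), the guessed statement can be written as a subject-predicate-object triple.  The example.org URIs are invented for illustration, and using scientificNameID as a "hasTaxonName"-style predicate is exactly the guess in question, not an established meaning.

```python
# Darwin Core term namespace, as used in the term index cited above.
DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical identifiers, invented purely for illustration.
usage = "http://example.org/usage/1"
name = "http://example.org/name/1"

# A triple is (subject, predicate, object):
# tail of the arrow, label beside the arrow, head of the arrow.
triples = [
    (usage, DWC + "scientificNameID", name),
]

def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]

linked_names = objects(triples, usage, DWC + "scientificNameID")
```

Following the arrow from the usage along scientificNameID leads to the name, which is all the diagram is meant to express.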

I've indicated my guess about some of the key properties of each of the 
four "classes" (including also Identifications) by arrows pointing away 
from the boxes.  In a lot of cases there are both xyz and xyzID terms, 
where xyz would be for a string literal and xyzID would be for a GUID 
(e.g. URI).  But they would behave the same way, so I only showed the 
xyzID version in the diagram. 

Here is a use case for me.  I refer to the Gleason and Cronquist key and 
the Golden Guide to the Trees to identify a tree that I've documented in 
an Occurrence.  identifiedBy would be me.  nameAccordingTo would be a 
reference to the Gleason and Cronquist treatment.  namePublishedIn would 
be the original publication for the species name.  
identificationReferences would be the Golden Guide and the Gleason and 
Cronquist key.  If somebody like Pete had created a URI to represent the 
concept, I could refer to that using taxonConceptID.  If there weren't 
such a URI, I'd just skip making a reference to the concept.  The 
identifier for the taxonName could be something like a TSNID and the 
taxonName instance could have properties like string values for genus, 
species, scientificName, etc.
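
A rough sketch of that use case as a record, assuming a flat property/value layout (which is my simplification, not something Darwin Core prescribes).  The property names are Darwin Core terms; the values are placeholders standing in for real citations and identifiers, and the helper name `asserted` is made up.

```python
# Values are placeholders taken from the use case above,
# not real citations or identifiers.
identification = {
    "identifiedBy": "Steven J. Baskauf",
    "nameAccordingTo": "Gleason and Cronquist treatment",
    "namePublishedIn": "original publication of the species name",
    "identificationReferences": [
        "Golden Guide to the Trees",
        "Gleason and Cronquist key",
    ],
    "taxonConceptID": None,  # no concept URI known, so nothing to assert
}

def asserted(record):
    """Keep only the properties that actually have a value."""
    return {k: v for k, v in record.items() if v is not None}

statements = asserted(identification)
```

Dropping the empty taxonConceptID mirrors the idea of simply skipping the reference to the concept when nobody has minted a URI for it.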

So is this anything close to reality? 
Steve

Paul Murray wrote:
> Speaking of which - (Looking back on what I have written below, it's very disorganised. Just a brain dump, really. )  :
>
> Currently, I have used the TDWG rdf vocabulary as far as I am able to work it out. For instance:
> 	http://biodiversity.org.au/apni.name/33407.rdf  (aka: http://biodiversity.org.au/apni.name/33407 , urn:lsid:biodiversity.org.au:apni.name:33407 )
>
> Of course, not having an rdfs:domain predicate does make things difficult to untangle: when I read the DwC vocabulary in with Protege, I just have a list of predicate names. Luckily, the quick reference guide (http://rs.tdwg.org/dwc/terms/index.htm) does organise the properties into the classes they apply to. The only DwC classes that our data involves at this stage would be Taxon and ResourceRelationship.
>
> ----------------------
>
> As per the TDWG vocabulary, we make a fairly strong distinction between taxonomic and nomenclatural components. A TaxonName is not a TaxonConcept. I'm finding that the Taxon predicates in the DwC vocabulary seem to be a mix of things that variously belong to names and taxa. My impression is that the distinction is there, in fact - it is modelled by a DwC taxon having or not having a nameAccordingTo rather than by an explicit class. If there is no AccordingTo, then we are discussing the "nominal taxon" - what the name means in the absence of any specific information about what it means.
>
> But as we are so careful to distinguish between name and taxon, I think I will take the (safer) position that a Name is not the same thing as its nominal taxon. That is, I will not declare that biodiversity.org.au names are DwC taxa, even though they have properties from DwC. 
>
> (Perhaps our data should generate an id for these nominal taxa - it's easy enough, just use the name objectid as the taxon objectid and "[afd|apni].taxon.nominal" as the LSID namespace. In principle, everyone who uses a name is also asserting that their taxon "is congruent to" the nominal taxon. Every synonym relationship is also an assertion of synonymy to the nominal taxon. But that's an awful lot of unnecessary detail to make explicit - over-engineering things is one of my failings. Forget I said it.)
>
> ----------------------
>
> DwC properties variously use "taxonID" and also "nameUsageID". Now, I believe I understand the distinction: not all usages of a name are of taxonomic interest (my favourite example is a bottle of weedkiller that happens to mention a scientific name). Our databases only contain name usages that are taxa, so the distinction does not arise - a name usage is simply a taxon. 
>
> However, not all of our names are scientific names. We have cultivar names, and we have vernacular names. All usages of these are TDWG TaxonConcepts - they have synonymy relationships and so on. However, the DwC property for declaring that a taxon record has a name seems to be "scientificNameID". This would seem to be inappropriate for taxa that don't have scientific names. I think that the correct way for me to go is to not declare these taxa as DwC taxa at all. That is, the absence of a "nameID" property seems to indicate that DwC is only "interested" in scientific names - scientific taxa if you will.
>
> To continue:
> These properties apply to our taxa (TaxonConcepts) without difficulty:
> scientificNameID
> parentNameUsageID
> nameAccordingToID
>
> These apply to our taxon names:
> acceptedNameUsageID
> originalNameUsageID
> namePublishedInID
> scientificNameAuthorship
>
> One of the wiki pages seemed to indicate that Taxa would have both a nameAccordingToID and also the namePublishedInID (the two being equal indicating that the taxon is the original one), but I think we will continue to not do this on the grounds that it's best to assert things only once to avoid data inconsistencies.
>
> ----------------------
>
> scientificName
> higherClassification
> kingdom | phylum | class | order | family | genus | subgenus | specificEpithet | infraspecificEpithet
>
> The various properties for name parts are ... problematic from the point of view of our data. These properties sort of do double duty: they are places for putting parts of names (i.e., strictly nomenclatural), and they also are places to put taxonomy.
>
> With respect to holding name parts, there seems to be no property in which to put - for instance - a subfamily name. The closest thing is "infraspecificEpithet", which contains the terminal epithet, but obviously that's not right for supergeneric names. TCS and the TDWG vocabulary have "uninomial". It might be nice to have this property, and to declare the other bits as being subproperties.
>
> With respect to taxonomy, if you want to use these for holding taxonomic relationships, then you don't need "order", you need "orderNameUsageID" or "orderTaxonID". 
>
> Of course, what's really going on here is that these fields are simply a denormalisation of the data. Let's face it: in my data, I do indeed have the scientific name string in the taxon record even though *technically* it's duplicating the data. So I think the conclusion is that these properties *on taxon records* are denormalisation, whereas these properties *on name records* are primary data. This is fine for me, but only because I have a separate TaxonName class.
>
> ----------------------
>
> taxonRank | verbatimTaxonRank
>
> Simple enough - "taxonRank" is controlled, "verbatim" is not. It's yet another mapping exercise for me, but them's the breaks. The whole "rank" issue is so fraught that one of our datasets here uses numeric codes. Which is fine, until you fill up all of the slots. What the world really needs is a dotted decimal notation, where negative numbers are allowed. Family, subfamily, and superfamily would be "5", "5.1", "5.-1". If you ever need a sub-superfamily, then it's "5.-1.1" . But maybe that's over-engineering things again.
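
A minimal sketch of how the dotted decimal rank codes suggested above could be ordered from higher to lower rank.  The function names and the fixed padding width of 4 are assumptions made for illustration; nothing like this is part of Darwin Core or of any dataset mentioned here.

```python
def parse_rank(code):
    """Split a dotted rank code such as "5.-1.1" into a tuple of ints."""
    return tuple(int(part) for part in code.split("."))

def rank_key(code, width=4):
    """Pad with zeros so that a bare "5" sorts between "5.-1" and "5.1".

    `width` is an assumed maximum depth; codes deeper than this
    would need a larger value to sort reliably.
    """
    parts = parse_rank(code)
    return parts + (0,) * (width - len(parts))

# subfamily, family, sub-superfamily, superfamily (deliberately shuffled)
codes = ["5.1", "5", "5.-1.1", "5.-1"]
ordered = sorted(codes, key=rank_key)
```

The zero padding is what matters: without it a plain tuple comparison would place "5" ahead of "5.-1", putting family above superfamily.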
>
> In any case, according to the wiki page, the controlled vocabulary seems to be just a list of strings. I would have expected them to be typed named individuals, permitting you to have an abbreviation, and the English and Latin name. A difficulty is that in order to render a botanical name correctly, you need the rank abbreviation string: "Evolvulus alsinoides var. sericeus". At present, there is no DwC property for that.
>
> ----------------------
>
> In summary - shouldn't be too difficult. At least, to get the basics up.

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu