Richard Pyle deepreef at bishopmuseum.org
Sun Oct 31 19:45:21 CET 2010

Hi Paul,

I'll try to address your questions relating to DwC terms in the class
"Taxon".  I'm partly responsible for some of them being there, and even I
was a bit confused about what a couple of them meant.  Fortunately, I had
the opportunity to sit down with Markus in Berlin one evening last week, and
that conversation helped clear up a number of things in my mind (of course,
Markus may well contradict what I'm about to write).

> As per the TDWG vocabulary, we make a fairly strong 
> distinction between taxonomic and nomenclatural components. A 
> TaxonName is not a TaxonConcept. I'm finding that the Taxon 
> predicates in the DwC vocabulary seem to be a mix of things 
> that variously belong to names and taxa. My impression is 
> that the distinction is there, in fact - it is modelled by a 
> DwC taxon having or not having a nameAccordingTo rather than 
> by an explicit class. If there is no AccordingTo, then we are 
> discussing the "nominal taxon" - what the name means in the 
> absence of any specific information about what it means.

I think that's generally a safe assumption; but I think it's a bit more
involved than that.

> But as we are so careful to distinguish between name and 
> taxon, I think I will take the (safer) position that a Name 
> is not the same thing as its nominal taxon. That is, I will 
> not declare that biodiversity.org.au names are DwC taxa, even 
> though they have properties from DwC. 

Treating "name" as a distinct entity (independent of a
particular usage of the name) is problematic, because there are several
different interpretations of what a "name" is. Is it a simple text string?
Or, is it a nomenclatural "object" with properties beyond the text string?
Are all orthographic variants and misspellings different representations of
the same "name" (object perspective); or is each variant a different "name"
(text-string perspective)?  Is a name formatted as "Genus (Subgenus)
species" the same as a name formatted as "Genus species"?  Are authorship and
associated details part of the name?  What about infraspecific prefixes such
as "var." and "subsp."?  This is just a sampling of questions for which you
will find a variety of answers when talking to different people in our
community about what a "name" is.

For this reason, I'm rather unclear on what sorts of identifiers one
might populate dwc:scientificNameID with.  I would have guessed that this is
where you would put an identifier for a Taxon Name Usage (TNU) record that
represents either a Protonym (~=basionym), a New Combination, a Replacement
name (nom. nov.), or the like; but we already have dwc:originalNameUsageID
for that function.  Perhaps dwc:scientificNameID should link to a nominal
concept record?  Or maybe something like an ITIS TSN record?  I've never
been very clear on this.  The example given
(urn:lsid:ipni.org:names:37829-1:1.3) doesn't seem to resolve right now, but
the corresponding IPNI record shows details for what looks to me like a TNU (and
hence, probably best represented via dwc:originalNameUsageID).

> (Perhaps our data should generate an id for these nominal taxa 
> - it's easy enough, just use the name objectid as the taxon 
> objectid and "[afd|apni].taxon.nominal" as the LSID 
> namespace. In principle, everyone who uses a name is also 
> asserting that their taxon "is congruent to" the nominal 
> taxon. Every synonym relationship is also an assertion of 
> synonymy to the nominal taxon. But that's an awful lot of 
> unnecessary detail to make explicit - over-engineering things 
> is one of my failings. Forget I said it.)

The topic of Nominal Concepts is definitely one that needs to be hammered
out at some point -- but I agree, now may not be the best time.

> DwC properties variously use "taxonID" and also 
> "nameUsageID'. 

That's what I used to think too -- but then I realised that the
unqualified "nameUsageID" isn't in the dwc spec (as far as I can tell) --
only the qualified versions (dwc:acceptedNameUsageID, dwc:parentNameUsageID,
and dwc:originalNameUsageID).

Thus, I interpret taxonID to effectively be nameUsageID (and Markus agreed
when we discussed this -- right Markus???).

> However, not all of our names are scientific names. We have 
> cultivar names, and we have vernacular names. All usages of 
> these are TDWG TaxonConcepts - they have synonymy 
> relationships and so on. However, the DwC property for 
> declaring that a taxon record has a name seems to be 
> "scientificNameID". This would seem to be inappropriate for 
> taxa that don't have scientific names. I think that the 
> correct way for me to go is to not declare these taxa as DwC 
> taxa at all. That is, the absence of a "nameID" property 
> seems to indicate that DwC is only "interested" in scientific 
> names - scientific taxa if you will.

I tend to agree.  I think cultivars will fit within the scientificName
framework reasonably well; but not so much for vernaculars.  I think that
they could be represented by a taxonID instance -- but I don't see where you
would put the actual vernacular name-string.

> To continue:
> These properties apply to our taxa (TaxonConcepts) without difficulty:
> scientificNameID
> parentNameUsageID
> nameAccordingToID
> These apply to our taxon names:
> acceptedNameUsageID
> originalNameUsageID
> namePublishedInID
> scientificNameAuthorship
> One of the wiki pages seemed to indicate that Taxa would 
> have both a nameAccordingToID and also the namePublishedInID 
> (the two being equal indicating that the taxon is the original 
> one),  but I think we will continue to not do this on the 
> grounds that it's best to assert things only once to avoid 
> data inconsistencies.

Actually, these are quite different things.  They are identical only if you
are passing the original taxon concept circumscription that was used when
the name was first established under the Code.  In the (vast?) majority of
cases, they will be different; with dwc:nameAccordingToID pointing to the
publication representing the particular taxon concept circumscription, and
namePublishedInID pointing to the publication in which the name was formally
established under the relevant Code.
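
As a rough sketch of that distinction (the records, identifier values and
the helper name are all made up for illustration; only the term names come
from DwC):

```python
# Hypothetical records keyed by DwC term names; every identifier is invented.
records = [
    {
        "taxonID": "usage:1",
        "scientificName": "Aus bus",
        "namePublishedInID": "pub:1900",  # where the name was established under the Code
        "nameAccordingToID": "pub:1900",  # same publication: the original circumscription
    },
    {
        "taxonID": "usage:2",
        "scientificName": "Aus bus",
        "namePublishedInID": "pub:1900",  # unchanged: the name was only established once
        "nameAccordingToID": "pub:1985",  # a later treatment with its own circumscription
    },
]


def is_original_concept(record):
    """True when the usage cites the publication that established the name."""
    according_to = record.get("nameAccordingToID")
    return according_to is not None and according_to == record.get("namePublishedInID")


original_flags = {r["taxonID"]: is_original_concept(r) for r in records}
print(original_flags)
```

Both records carry the same namePublishedInID; only the first one also
represents the original concept.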

> ----------------------
> scientificName
> higherClassification
> kingdom | phylum | class | order | family | genus | subgenus| 
> specificEpithet | infraspecificEpithet
> The various properties for name parts are ... problematic 
> from the point of view of our data. These properties sort of 
> do double duty: they are places for putting parts of names 
> (ie, strictly nomenclatural), and they also are places to put 
> taxonomy.
> With respect to holding name parts, there seems to be no 
> property in which to put - for instance - a subfamily name. 
> The closest thing is "infraspecificEpithet", which contains 
> the terminal epithet, but obviously that's not right for 
> supergeneric names. TCS and the TDWG vocabulary have 
> "uninomial". It might be nice to have this property, and to 
> declare the other bits as being subproperties.

The idea is that for a record representing the subfamily name itself, the
text-string subfamily name goes in dwc:scientificName.  But for names below
the rank of subfamily, the Subfamily name is generally not included among the
parsed classification elements (neither are any other higher infra-rank
names).  The real information, I think, goes in dwc:scientificName.  The
terms dwc:genus, dwc:subgenus, dwc:specificEpithet, and infraspecificEpithet
(as well as scientificNameAuthorship) are there to allow you to provide
pre-parsed name elements of a compound name represented in scientificName.

> With respect to taxonomy, if you want to use these for 
> holding taxonomic relationships, then you don't need "order", 
> you need "orderNameUsageID" or "orderTaxonID". 

No, because presumably there would be a record for the Order name itself
(linked to child names via a series of parentNameUsageID), which would have
its own value of taxonID (="nameUsageID").

> Of course, what's really going on here is that these fields 
> are simply a denormalisation of the data. 

Yes, exactly.  You can represent them in a normalized way using the
available terms, but not all people have the information broken into a
normalised form, so DwC accommodates a denormalised representation as well.
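
To illustrate, here is a minimal sketch that derives the flat columns from
records linked by parentNameUsageID.  The taxonID values, the FLAT_RANKS
tuple and the denormalise() helper are assumptions for the example, not
anything defined by DwC:

```python
# Hypothetical normalised records: one per name usage, linked by parentNameUsageID.
usages = {
    "u1": {"taxonID": "u1", "scientificName": "Solanales",
           "taxonRank": "order", "parentNameUsageID": None},
    "u2": {"taxonID": "u2", "scientificName": "Convolvulaceae",
           "taxonRank": "family", "parentNameUsageID": "u1"},
    "u3": {"taxonID": "u3", "scientificName": "Evolvulus",
           "taxonRank": "genus", "parentNameUsageID": "u2"},
    "u4": {"taxonID": "u4", "scientificName": "Evolvulus alsinoides",
           "taxonRank": "species", "parentNameUsageID": "u3"},
}

# Ranks that have a flat DwC column of the same name.
FLAT_RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "subgenus")


def denormalise(taxon_id, usages):
    """Copy a record and fill its flat rank columns by walking up the parents."""
    record = dict(usages[taxon_id])
    seen = {taxon_id}  # guards against accidental cycles in the parent links
    parent_id = record["parentNameUsageID"]
    while parent_id is not None and parent_id not in seen:
        seen.add(parent_id)
        parent = usages[parent_id]
        if parent["taxonRank"] in FLAT_RANKS:
            record[parent["taxonRank"]] = parent["scientificName"]
        parent_id = parent["parentNameUsageID"]
    return record


flat = denormalise("u4", usages)
print(flat["order"], flat["family"], flat["genus"])
```

The Order has its own record and taxonID (u1 here), so the species record
never needs an "orderNameUsageID" of its own.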

> Let's face it: in 
> my data, I do indeed have the scientific name string in the 
> taxon record even though *technically* it's duplicating the 
> data. So I think the conclusion is that these properties *on 
> taxon records* are denormalisation, whereas these properties 
> *on name records* are primary data. This is fine for me, but 
> only because I have a separate TaxonName class.

I don't think I understand the difference in normalisation between records
representing names, and records representing concepts.  They both seem
equally denormalised to me.  Either the name *is* the object being
described, or the name elements are labels for the element being described,
but in both cases, the same amount of denormalisation seems to be happening.

> ----------------------
> taxonRank | verbatimTaxonRank
> Simple enough - "taxonRank" is controlled, "verbatim" is not. 
> It's yet another mapping exercise for me, but them's the 
> breaks. The whole "rank" issue is so fraught that one of our 
> datasets here uses numeric codes. Which is fine, until you 
> fill up all of the slots. What the world really needs is a 
> dotted decimal notation, where negative numbers are allowed. 
> Family, subfamily, and superfamily would be "5", "5.1", 
> "5.-1". If you ever need a sub-superfamily, then it's 
> "5.-1.1" . But maybe that's over-engineering things again.

I'm not sure I understand the value of a numeric surrogate for rank in DwC
in place of (or in addition to) a controlled vocabulary for taxonRank.
Sure, you can do clever semantic things, but it seems to me that that kind
of cleverness should be embedded within code logic tied to the controlled
vocabulary, not made part of DwC itself.
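
For example, rank comparison can live entirely in code.  This is only a
sketch: the RANK_ORDER list below is an assumed, partial ordering and not an
official DwC vocabulary, and is_higher_rank() is a made-up helper.

```python
# Assumed, partial rank vocabulary listed from highest to lowest rank.
RANK_ORDER = [
    "kingdom", "phylum", "class", "order",
    "superfamily", "family", "subfamily",
    "genus", "subgenus", "species", "subspecies", "variety",
]

# Position in the list stands in for any numeric rank code.
RANK_INDEX = {rank: position for position, rank in enumerate(RANK_ORDER)}


def is_higher_rank(a, b):
    """True when rank a sits above rank b in the assumed ordering."""
    return RANK_INDEX[a] < RANK_INDEX[b]


print(is_higher_rank("superfamily", "family"))
print(is_higher_rank("subfamily", "family"))
```

Adding a new rank means inserting one string into the list; there are no
numeric slots to run out of, and the data itself only carries the
controlled-vocabulary term.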

> In any case. According to the wiki page, the controlled 
> vocabulary seems to be just a list of strings. I would have 
> expected them to be typed named individuals, permitting you 
> to have an abbreviation, and the english and latin name. A 
> difficulty is that in order to render a botanical name 
> correctly, you need the rank abbreviation string: "Evolvulus 
> alsinoides var. sericeus". At present, there is no DWC 
> property for that.

Agreed -- there need to be more robust attributes for the taxonRank
controlled vocabulary.  They probably shouldn't be part of DwC, but we
should have a community-shared representation of what those attributes are
(e.g., standard abbreviations for each rank that can be used for
concatenating a "standard" compound name-string).  Markus and I discussed
that a bit in Berlin.  Once I get my own head around it, I'll try to draft
something for further discussion.
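
In the meantime, here is a rough sketch of the kind of thing I mean.  The
abbreviation table and the compound_name() helper are assumptions for
illustration; no such shared list exists in DwC today.

```python
# Assumed rank abbreviations; a community-agreed list would replace this table.
RANK_ABBREVIATIONS = {
    "subspecies": "subsp.",
    "variety": "var.",
    "form": "f.",
}


def compound_name(record):
    """Concatenate pre-parsed DwC name elements into a compound name-string."""
    parts = [record["genus"], record["specificEpithet"]]
    infra = record.get("infraspecificEpithet")
    if infra:
        abbreviation = RANK_ABBREVIATIONS.get(record.get("taxonRank"))
        if abbreviation:
            parts.append(abbreviation)
        parts.append(infra)
    return " ".join(parts)


name = compound_name({
    "genus": "Evolvulus",
    "specificEpithet": "alsinoides",
    "infraspecificEpithet": "sericeus",
    "taxonRank": "variety",
})
print(name)  # Evolvulus alsinoides var. sericeus
```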


