Hi Paul,
I'll try to address your questions relating to DwC terms in the class "Taxon". I'm partly responsible for some of them being there, and even I was a bit confused about what a couple of them meant. Fortunately, I had the opportunity to sit down with Markus in Berlin one evening last week, and that conversation helped clear up a number of things in my mind (of course, Markus may well contradict what I'm about to write).
As per the TDWG vocabulary, we make a fairly strong distinction between taxonomic and nomenclatural components. A TaxonName is not a TaxonConcept. I'm finding that the Taxon predicates in the DwC vocabulary seem to be a mix of things that variously belong to names and taxa. My impression is that the distinction is there, in fact - it is modelled by a DwC taxon having or not having a nameAccordingTo rather than by an explicit class. If there is no AccordingTo, then we are discussing the "nominal taxon" - what the name means in the absence of any specific information about what it means.
I think that's generally a safe assumption; but I think it's a bit more involved than that.
But as we are so careful to distinguish between name and taxon, I think I will take the (safer) position that a Name is not the same thing as its nominal taxon. That is, I will not declare that biodiversity.org.au names are DwC taxa, even though they have properties from DwC.
Treating "name" as a distinct entity (independent of a particular usage of the name) is problematic, because there are several different interpretations of what a "name" is. Is it a simple text string? Or is it a nomenclatural "object" with properties beyond the text string? Are all orthographic variants and misspellings different representations of the same "name" (object perspective), or is each variant a different "name" (text-string perspective)? Is a name formatted as "Genus (Subgenus) species" the same as a name formatted as "Genus species"? Are authorship and associated details part of the name? What about infraspecific prefixes such as "var." and "subsp."? This is just a sampling of questions for which you will find a variety of answers when talking to different people in our community about what a "name" is.
For this reason, I'm rather unclear on what sorts of identifiers one might populate dwc:scientificNameID with. I would have guessed that this is where you would put an identifier for a Taxon Name Usage (TNU) record that represents either a Protonym (~=basionym), a New Combination, a Replacement name (nom. nov.), or the like; but we already have dwc:originalNameUsageID for that function. Perhaps dwc:scientificNameID should link to a nominal concept record? Or maybe something like an ITIS TSN record? I've never been very clear on this. The example given (urn:lsid:ipni.org:names:37829-1:1.3) doesn't seem to resolve right now, but the corresponding IPNI record shows details for what looks to me like a TNU (and hence, probably best represented via dwc:originalNameUsageID).
(Perhaps our data should generate an id for these nominal taxa - it's easy enough, just use the name objectid as the taxon objectid and "[afd|apni].taxon.nominal" as the LSID namespace. In principle, everyone who uses a name is also asserting that their taxon "is congruent to" the nominal taxon. Every synonym relationship is also an assertion of synonymy to the nominal taxon. But that's an awful lot of unnecessary detail to make explicit - over-engineering things is one of my failings. Forget I said it.)
The topic of Nominal Concepts is definitely one that needs to be hammered out at some point -- but I agree, now may not be the best time.
DwC properties variously use "taxonID" and also "nameUsageID".
That's what I used to think too -- but then I realised that the unqualified "nameUsageID" isn't in the DwC spec (as far as I can tell) -- only the qualified versions (dwc:acceptedNameUsageID, dwc:parentNameUsageID, and dwc:originalNameUsageID).
Thus, I interpret TaxonID to effectively be nameUsageID (and Markus agreed when we discussed this -- right Markus???).
However, not all of our names are scientific names. We have cultivar names, and we have vernacular names. All usages of these are TDWG TaxonConcepts - they have synonymy relationships and so on. However, the DwC property for declaring that a taxon record has a name seems to be "scientificNameID". This would seem to be inappropriate for taxa that don't have scientific names. I think that the correct way for me to go is to not declare these taxa as DwC taxa at all. That is, the absence of a "nameID" property seems to indicate that DwC is only "interested" in scientific names - scientific taxa if you will.
I tend to agree. I think cultivars will fit within the scientificName framework reasonably well; but not so much for vernaculars. I think that they could be represented by a taxonID instance -- but I don't see where you would put the actual vernacular name-string.
To continue: These properties apply to our taxa (TaxonConcepts) without difficulty: scientificNameID, parentNameUsageID, nameAccordingToID.
These apply to our taxon names: acceptedNameUsageID, originalNameUsageID, namePublishedInID, scientificNameAuthorship.
One of the wiki pages seemed to indicate that Taxa would have both a nameAccordingToID and also the namePublishedInID (the two being equal indicating that the taxon is the original one), but I think we will continue to not do this on the grounds that it's best to assert things only once to avoid data inconsistencies.
Actually, these are quite different things. They are identical only if you are passing the original taxon concept circumscription that was used when the name was first established under the Code. In the (vast?) majority of cases, they will be different, with dwc:nameAccordingToID pointing to the publication representing the particular taxon concept circumscription, and dwc:namePublishedInID pointing to the publication in which the name was formally established under the relevant Code.
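To make the difference concrete, here is a minimal sketch in Python. The record layout is just a dict keyed by DwC term names; the identifier values ("pub:1", "tnu:2", and so on), the placeholder name "Aus bus", and the helper function are all made up for illustration and are not part of DwC.

```python
# Hypothetical records: keys are DwC term names, identifier values are invented.
original_usage = {
    "taxonID": "tnu:1",
    "scientificName": "Aus bus",
    "namePublishedInID": "pub:1",   # where the name was established under the Code
    "nameAccordingToID": "pub:1",   # same publication: the original circumscription
}
later_usage = {
    "taxonID": "tnu:2",
    "scientificName": "Aus bus",
    "namePublishedInID": "pub:1",   # unchanged: the name was still established here
    "nameAccordingToID": "pub:2",   # a later treatment with its own circumscription
}

def is_original_circumscription(record):
    """True only when both IDs are present and point to the same publication."""
    published_in = record.get("namePublishedInID")
    according_to = record.get("nameAccordingToID")
    return published_in is not None and published_in == according_to
```

Under these assumptions, only the first record would count as the original concept; the second shares the name's publication but represents a different circumscription.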
scientificName | higherClassification | kingdom | phylum | class | order | family | genus | subgenus | specificEpithet | infraspecificEpithet
The various properties for name parts are ... problematic from the point of view of our data. These properties sort of do double duty: they are places for putting parts of names (i.e., strictly nomenclatural), and they also are places to put taxonomy.
With respect to holding name parts, there seems to be no property in which to put - for instance - a subfamily name. The closest thing is "infraspecificEpithet", which contains the terminal epithet, but obviously that's not right for supergeneric names. TCS and the TDWG vocabulary have "uninomial". It might be nice to have this property, and to declare the other bits as being subproperties.
The idea is that for a record representing the subfamily name itself, the text-string subfamily name goes in dwc:scientificName. But for names below the rank of subfamily, the subfamily name is generally not included among the parsed classification elements (neither are any other higher infra-rank names). The real information, I think, goes in dwc:scientificName. The terms dwc:genus, dwc:subgenus, dwc:specificEpithet, and dwc:infraspecificEpithet (as well as dwc:scientificNameAuthorship) are there to allow you to provide pre-parsed name elements of a compound name represented in scientificName.
With respect to taxonomy, if you want to use these for holding taxonomic relationships, then you don't need "order", you need "orderNameUsageID" or "orderTaxonID".
No, because presumably there would be a record for the Order name itself (linked to child names via a series of parentNameUsageID), which would have its own value of taxonID (="nameUsageID").
Of course, what's really going on here is that these fields are simply a denormalisation of the data.
Yes, exactly. You can represent them in a normalized way using the available terms, but not all people have the information broken into a normalised form, so DwC accommodates a denormalized representation as well.
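As a rough illustration of the two representations, the sketch below derives the flat rank columns from records linked by parentNameUsageID. The records dict, the "t1"-style identifiers, and the denormalise helper are hypothetical; only the DwC term names come from the standard.

```python
# Hypothetical normalised records, keyed by taxonID; all identifiers are invented.
records = {
    "t1": {"taxonID": "t1", "scientificName": "Convolvulaceae",
           "taxonRank": "family", "parentNameUsageID": None},
    "t2": {"taxonID": "t2", "scientificName": "Evolvulus",
           "taxonRank": "genus", "parentNameUsageID": "t1"},
    "t3": {"taxonID": "t3", "scientificName": "Evolvulus alsinoides",
           "taxonRank": "species", "parentNameUsageID": "t2"},
}

# Ranks that have a flat DwC column of the same name.
FLAT_RANKS = {"kingdom", "phylum", "class", "order", "family", "genus", "subgenus"}

def denormalise(taxon_id, records):
    """Copy a record and fill its flat rank columns by walking up the parent links."""
    flat = dict(records[taxon_id])
    seen = set()
    current = records[taxon_id]
    while current is not None and current["taxonID"] not in seen:
        seen.add(current["taxonID"])  # guard against accidental cycles
        if current["taxonRank"] in FLAT_RANKS:
            flat[current["taxonRank"]] = current["scientificName"]
        parent_id = current.get("parentNameUsageID")
        current = records.get(parent_id) if parent_id is not None else None
    return flat
```

Calling denormalise("t3", records) would return the species record with "genus" and "family" filled in from its ancestors. Someone who only holds the flat form can supply those columns directly and skip the linked records.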
Let's face it: in my data, I do indeed have the scientific name string in the taxon record even though *technically* it's duplicating the data. So I think the conclusion is that these properties *on taxon records* are denormalisation, whereas these properties *on name records* are primary data. This is fine for me, but only because I have a separate TaxonName class.
I don't think I understand the difference in normalisation between records representing names, and records representing concepts. They both seem equally denormalised to me. Either the name *is* the object being described, or the name elements are labels for the element being described, but in both cases, the same amount of denormalisation seems to be happening.
taxonRank | verbatimTaxonRank
Simple enough - "taxonRank" is controlled, "verbatim" is not. It's yet another mapping exercise for me, but them's the breaks. The whole "rank" issue is so fraught that one of our datasets here uses numeric codes. Which is fine, until you fill up all of the slots. What the world really needs is a dotted decimal notation, where negative numbers are allowed. Family, subfamily, and superfamily would be "5", "5.1", "5.-1". If you ever need a sub-superfamily, then it's "5.-1.1". But maybe that's over-engineering things again.
I'm not sure I understand the value of a numeric surrogate for rank in DwC in place of (or in addition to) a controlled vocabulary for taxonRank. Sure, you can do clever semantic things, but it seems to me that that kind of information should be embedded within code logic tied to the controlled vocabulary, not be part of DwC itself.
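For what it's worth, here is a rough sketch of how that kind of ordering logic could live in application code, keyed by the controlled vocabulary term rather than stored in DwC. The codes "5", "5.1", "5.-1" and "5.-1.1" are the ones quoted above; the lookup table, the term "subsuperfamily", and the function names are assumptions of mine.

```python
# Hypothetical lookup kept in application code, outside DwC.
# Only the four dotted codes come from the message; the term
# "subsuperfamily" is invented to label the last one.
RANK_CODES = {
    "superfamily": "5.-1",
    "subsuperfamily": "5.-1.1",
    "family": "5",
    "subfamily": "5.1",
}

def rank_sort_key(code, depth=4):
    """Turn a dotted code into a fixed-length tuple, padding with zeros."""
    parts = [int(p) for p in code.split(".")]
    if len(parts) > depth:
        raise ValueError(f"code {code!r} has more than {depth} parts")
    return tuple(parts + [0] * (depth - len(parts)))

def ranks_high_to_low(terms):
    """Sort controlled taxonRank terms from highest rank to lowest."""
    return sorted(terms, key=lambda term: rank_sort_key(RANK_CODES[term]))
```

The zero padding matters: without it, a plain tuple comparison would place "5" ahead of "5.-1", putting family above superfamily.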
In any case, according to the wiki page, the controlled vocabulary seems to be just a list of strings. I would have expected them to be typed named individuals, permitting you to have an abbreviation, and the English and Latin name. A difficulty is that in order to render a botanical name correctly, you need the rank abbreviation string: "Evolvulus alsinoides var. sericeus". At present, there is no DwC property for that.
Agreed -- there need to be more robust attributes for the taxonRank controlled vocabulary. They probably shouldn't be part of DwC, but we should have a community-shared representation of what those attributes are (e.g., standard abbreviations for each rank that can be used for concatenating a "standard" compound name-string). Markus and I discussed that a bit in Berlin. Once I get my own head around it, I'll try to draft something for further discussion.
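Just to illustrate the sort of thing I mean (not a proposal), here is a minimal Python sketch of rank attributes held outside DwC and used to concatenate a compound name-string from pre-parsed elements. The RANK_ABBREVIATIONS table and the function name are hypothetical; no such shared vocabulary exists yet as far as I know.

```python
# Hypothetical rank attributes; these are not an existing DwC or TDWG vocabulary.
RANK_ABBREVIATIONS = {
    "subspecies": "subsp.",
    "variety": "var.",
}

def render_botanical_name(genus, specific_epithet, rank=None, infraspecific_epithet=None):
    """Concatenate pre-parsed name elements into a compound name-string."""
    parts = [genus, specific_epithet]
    if infraspecific_epithet:
        # A botanical infraspecific name needs its rank connector, so fail loudly
        # if the rank is missing or has no known abbreviation.
        if rank not in RANK_ABBREVIATIONS:
            raise ValueError(f"no abbreviation known for rank {rank!r}")
        parts += [RANK_ABBREVIATIONS[rank], infraspecific_epithet]
    return " ".join(parts)
```

With these assumptions, render_botanical_name("Evolvulus", "alsinoides", "variety", "sericeus") gives "Evolvulus alsinoides var. sericeus", which is the string Paul's example needs.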
Aloha, Rich
Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
Associate Zoologist in Ichthyology
Dive Safety Officer
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef@bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html