Hi Steve,
I read through Rich's various emails in the thread, Paul's email (below), and looked at the Darwin Core Taxon and Identification class terms and came up with: http://bioimages.vanderbilt.edu/pages/taxon-diagram1.gif Please note that this diagram does NOT represent an opinion on my part but rather an attempt to summarize what people have said in a graphical way.
That's pretty close. The way we deal with the "taxonName" box, however, is as a subset of taxonNameUsage which we've been calling a "Protonym". Roughly equivalent to a botanical basionym, but more general. Basically, a protonym is "the taxonNameUsage instance that established the name". The word "established" is the one that needs to be carefully defined; for scientific names, it refers to the particular taxonNameUsage instance that established the name in accordance (or attempted accordance) with the relevant Code(s) ("Code(s)" because some names fall under more than one Code -- see Ambiregnal).
Anyway, the point is, I would represent the diagram with the taxonName box represented as a subset of taxonNameUsage, or as a recursive relationship (not sure how best to do that using your symbols).
A more detailed explanation starts on p. 18 (after "Taxa" heading) of this article: http://systbio.org/files/phyloinformatics/1.pdf ...where the term "Assertion" is equivalent to "taxonNameUsage".
- There are taxon concepts, which I guess represent a particular circumscription of individuals. The taxon concept is the result of some kind of rule that allows one to decide whether particular individuals should be included in that taxon or not. The set of all biological individuals that are included is the actual concept (or maybe not?).
I think that's probably about right, but I generally prefer Nico's wording in his reply. Only thing I'm a little uncomfortable with in his #1 (second paragraph) is the notion that a Taxon Concept relies on the existence of a scientific name. I think that a taxon concept exists independently of the name(s) that have been used to label it, to include circumscribed sets of individuals that have not yet been assigned a scientific name. I also think they can exist independently of a publication. I also tend to think of the concept as the "implied" set of organisms (living, dead, yet-to-be-born), which is more or less what I think Nico means, but perhaps worded slightly differently.
I agree that what I think of as "taxonNameUsage" is somewhat close to what Nico defines as "Taxon Concept", but a TNU doesn't always come with an implied concept -- sometimes it's just a raw name-usage without an implied concept (e.g., in a catalog of type specimens at a Museum). However, I would say that *all* taxon concepts are anchored to at least one TNU instance, keeping in mind that the "N" part doesn't have to be a scientific name, and the "U" doesn't have to occur within the scope of a publication.
- There are taxon name usages, which are a sort of node that connects a name with a concept. If I'm getting Rich right, this is the resource to which dwc:Identifications should be tied. Rich also suggested that taxon name usages might be instances of the dwc:Taxon class. Although these three types of resources aren't all defined as "classes" in Darwin Core, it seems to me that they are classes in the "RDF sense" (i.e. that their instances can be typed to them).
As mentioned, I think of taxon concepts as additional contextual information that exists in the form of TNUs. Just like Nomenclatural Acts (which are not concepts). Just like a lot of other information that's not really part of a defined taxon concept (but often constitutes information that defines the boundaries of the concept circumscription).
TNUs are really just a core index of documented usages of names for organisms, where "documented" is usually a publication, but can include other forms of static documentation. But the thing is, basically all nomenclatural acts, and all concept definitions, and a very, very large proportion of information about biodiversity (at least the ones that provide context via scientific names) exist as part of a TNU.
In the diagram, I used triangles to indicate 1:many relationships similar to the way Rich did in his original diagram. In the DwC term index, (http://rs.tdwg.org/dwc/terms/index.htm), there seem to be other entities represented that are like the "acceptedName" (i.e. originalName, parentName) but I've left them off the diagram for simplicity and because I don't fully understand them.
acceptedName[ID] refers to the particular usage instance of a name that the data provider believes "got it right" (i.e., right spelling, right parent, right rank, right set of heterotypic synonyms). It's a way of saying, "When we use the name Aus bus, we mean it in the sense of Jones, 1980." It's the "sec" reference that Nico was referring to.
originalName[ID] is a pointer to the Protonym TNU for the terminal name. For example, if Aus bus was originally described by Linnaeus in 1758, there would be a TNU for the epithet "bus" linked to the reference of Linnaeus 1758, and that particular usage instance for "bus" would be the Protonym. If the data provider wanted to represent the name as "Xus bus (Linnaeus) Smith" (botanical standard for saying the species epithet "bus" established by Linnaeus, first combined with the genus "Xus" by Smith), then the originalName[ID] would refer to the (protonym) usage instance of "Aus bus" by Linnaeus.
parentName[ID] is a link to the parent taxon of the particular usage. For example, if the given record (taxonID) referred to the protonym instance of Aus bus in Linnaeus 1758, then the parentName[ID] would point to another TNU representing the treatment of the genus "Aus" in Linnaeus 1758. If the taxonID record is for the usage of Xus bus (L.) by Smith, then the parentName[ID] would refer to the usage instance of the genus name "Xus" by Smith.
All three are recursive links back to other taxonNameUsage instances.
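To make the recursion concrete, here is a rough Python sketch of the Aus bus / Xus bus example. Everything in it is illustrative: the field names, the "tnu:N" identifiers, and the convention that a Protonym's originalName link points at itself are assumptions for the sketch, not Darwin Core terms or anyone's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TaxonNameUsage:
    """One documented usage of a name (hypothetical field names, not DwC terms)."""
    tnu_id: str
    name: str
    reference: str
    original_name_id: Optional[str] = None  # link to the Protonym TNU
    parent_name_id: Optional[str] = None    # link to the parent taxon's TNU
    accepted_name_id: Optional[str] = None  # link to the usage that "got it right"


def follow(index: Dict[str, TaxonNameUsage], tnu: TaxonNameUsage,
           link: str) -> Optional[TaxonNameUsage]:
    """Follow one of the three recursive links; return None if it is unset."""
    target_id = getattr(tnu, link)
    return index.get(target_id) if target_id is not None else None


def is_protonym(tnu: TaxonNameUsage) -> bool:
    # Assumed convention: a Protonym's originalName link points at itself.
    return tnu.original_name_id == tnu.tnu_id


# Made-up identifiers; links on the genus records are left unset for brevity.
usages = [
    TaxonNameUsage("tnu:1", "Aus", "Linnaeus 1758"),
    TaxonNameUsage("tnu:2", "Aus bus", "Linnaeus 1758",
                   original_name_id="tnu:2", parent_name_id="tnu:1"),
    TaxonNameUsage("tnu:3", "Xus", "Smith"),
    TaxonNameUsage("tnu:4", "Xus bus", "Smith",
                   original_name_id="tnu:2", parent_name_id="tnu:3"),
]
index = {u.tnu_id: u for u in usages}

xus_bus = index["tnu:4"]
protonym = follow(index, xus_bus, "original_name_id")
parent = follow(index, xus_bus, "parent_name_id")
print(protonym.name, "in", protonym.reference)
print(parent.name, "in", parent.reference)
```

The point of keeping all three links as plain TNU identifiers is that no separate "taxonName" table is needed: the name box in the diagram is just the subset of usages for which is_protonym is true.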
I've indicated my guess about some of the key properties of each of the four "classes" (including also Identifications) by arrows pointing away from the boxes. In a lot of cases there are both xyz and xyzID terms, where xyz would be for a string literal and xyzID would be for a GUID (e.g. URI). But they would behave the same way, so I only showed the xyzID version in the diagram.
Most of these seem about right to me, but I haven't examined them in detail yet.
Here is a use case for me. I refer to the Gleason and Cronquist key and the Golden Guide to the Trees to identify a tree that I've documented in an Occurrence. identifiedBy would be me. nameAccordingTo would be a reference to the Gleason and Cronquist treatment. namePublishedIn would be the original publication for the species name.
Agreed.
identificationReferences would be the Golden Guide and the Gleason and Cronquist key.
Not sure -- I guess this presumes that the taxon concept represented in G&C is congruent with the taxon concept represented in the Golden Guide.
If somebody like Pete had created a URI to represent the concept, I could refer to that using taxonConceptID.
Right...and presumably this taxonConceptID would include the fact that the G&C TNU and the Golden Guide TNU represent congruent Concepts.
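A minimal sketch of that use case, again in Python and again with made-up structure: the class names, the field names, and the "urn:example:concept:1" identifier are hypothetical, and the original publication is left unset because the thread does not name it. The only thing it tries to show is that listing two identificationReferences is an unstated claim of congruence until a taxon concept record says so explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class TaxonConcept:
    """Hypothetical concept record listing usages asserted to be congruent."""
    concept_id: str
    congruent_usages: Set[str] = field(default_factory=set)


@dataclass
class Identification:
    identified_by: str
    name_according_to: str
    identification_references: List[str]
    name_published_in: Optional[str] = None   # not named in the thread
    taxon_concept: Optional[TaxonConcept] = None


def references_asserted_congruent(ident: Identification) -> bool:
    """True only if a concept record explicitly covers every reference used."""
    if ident.taxon_concept is None:
        return False
    covered = ident.taxon_concept.congruent_usages
    return all(ref in covered for ref in ident.identification_references)


gc = "Gleason and Cronquist"
golden = "Golden Guide to the Trees"

ident = Identification(
    identified_by="Steve",
    name_according_to=gc,
    identification_references=[golden, gc],
)
print(references_asserted_congruent(ident))   # no concept record attached yet

# Attach a made-up concept identifier that asserts the two usages are congruent.
ident.taxon_concept = TaxonConcept("urn:example:concept:1", {gc, golden})
print(references_asserted_congruent(ident))
```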
So is this anything close to reality?
I would say very close...but I didn't study your email or diagram in detail.
Aloha, Rich