Excerpts from what Richard Pyle wrote and responses:
I'm not saying this ever will happen, or even should happen. But we've developed the data model such that it *could* happen, if it turns out to be a useful mechanism for mapping specimens to published taxon concepts.
As so often is the case, I think the problem here boils down to identifiers and the metadata that we associate with them.
Absolutely!!! This is often not intuitive stuff, so the tricky part is getting people to apply identifiers and cross-link them in an appropriate and consistent way.
As a person who isn't really sure if he believes in RDF (and the co-convener of the RDF Task Group) and as a person who finds this whole conversation more annoying than about anything else he can think of (but who is actually trying to walk this walk with his metadata), I'm saying that we are there. HTTP URIs are not THE way to create identifiers but they are A way to create identifiers that demonstrably works. RDF is not THE way to cross-link identifiers but it is A way to cross-link identifiers. History is full of examples where a way of doing things that wasn't the best won out because it was there when needed. (I'm thinking typewriter keyboard.)
Let's say in the real-life example above, somebody (we can say GNUB) assigns a persistent identifier (perhaps a URI constructed from an opaque UUID) to "Juncus diffusissimus Buckl. sec L. Urbatsch 2009".
We could say with an rdf:type statement that the resource identified by the URI is a TNU. We can give that resource a tc:hasName property linking it to the name which is represented by the string "Juncus diffusissimus Buckl.". (I'm not sure what property we use to say that L. Urbatsch made the assertion).
I'm not sure how you would do it in RDF, but if it's any help the relevant DwC term is taxon:nameAccordingToID.
This is a major part of why I'm interested in this conversation. All of the dwc: "ID" terms are flawed for use in RDF for technical reasons that I've described elsewhere (see http://code.google.com/p/tdwg-rdf/wiki/Beginners4Vocabularies#4.6._The_Darwi... , http://code.google.com/p/tdwg-rdf/wiki/DublinCore#1.3.2.4._dcterms:identifie... and http://code.google.com/p/tdwg-rdf/wiki/DublinCore#2.4._Possible_courses_of_a... if you care about this). I believe that tc:accordingTo would be a fairly exact equivalent to dwc:nameAccordingToID, one that does not have the subproperty problems the DwC ID terms have in RDF. So the question in my mind (and I think a question posed explicitly earlier in this thread) is whether enough of the technical stuff you want to do with TNUs can be done with the TDWG Taxon Concept ontology, which is based on a ratified TDWG standard (TCS).
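To make the statements discussed above concrete, here is a minimal sketch that writes them as plain (subject, predicate, object) tuples in Python. The tc: and rdf: terms are the ones named in this thread; every ex: identifier, including the ex:TaxonNameUsage class, is a hypothetical placeholder and not something GNUB or TDWG has minted.

```python
# Triples as (subject, predicate, object) tuples using prefixed names.
# All "ex:" identifiers are hypothetical placeholders for illustration.
TNU = "ex:tnu/juncus-diffusissimus-sec-urbatsch-2009"
NAME = "ex:name/juncus-diffusissimus-buckl"
REFERENCE = "ex:reference/urbatsch-2009"

triples = [
    (TNU, "rdf:type", "ex:TaxonNameUsage"),  # hypothetical class for "is a TNU"
    (TNU, "tc:hasName", NAME),               # link to the name resource
    (TNU, "tc:nameString", "Juncus diffusissimus Buckl."),
    (TNU, "tc:accordingTo", REFERENCE),      # who made the assertion
]


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(TNU, "tc:accordingTo"))
```

Under these assumptions, tc:accordingTo plays the role that dwc:nameAccordingToID plays in Darwin Core: it points from the usage to the resource describing who used the name.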
It's certainly part of the metadata for the TNU itself:
There is a TNU for diffusissimus within the Reference of Buckl. This is the original description of the epithet, so it is also the Protonym for diffusissimus. There is also a TNU for the genus name Juncus as used within the Reference of Buckl. If, in the same publication, Buckl. also established that genus, then the genus would also be the Protonym TNU. However, the genus Juncus was established by L., so there is another TNU for Juncus that is the protonym (Juncus L.), and the TNU for the usage of Juncus within Buckl. links to the TNU of Juncus L. (the Protonym).
So, that's three TNUs:
1. Juncus L. sec. L. (Protonym for the genus Juncus)
2. Juncus L. sec. Buckl. (links to 1 as ProtonymID)
3. diffusissimus Buckl. sec. Buckl. (Protonym for the species diffusissimus, links to 2 as parent)
There are also two more TNUs linked to the Urbatsch 2009 publication:
4. Juncus L. sec. Urbatsch (links to 1 as ProtonymID)
5. diffusissimus Buckl. sec. Urbatsch (links to 3 as ProtonymID, and to 4 as parent)
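The five TNUs and their links can be sketched as a small data structure. This is only an illustration of the relationships listed above, not GNUB's actual schema: the class and field names are assumptions, and a TNU with no protonym link is treated as its own protonym.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TNU:
    """One taxon name usage; field names are hypothetical."""
    id: int
    name: str
    reference: str
    protonym_id: Optional[int] = None  # None: this usage is itself the protonym
    parent_id: Optional[int] = None


TNUS = {t.id: t for t in [
    TNU(1, "Juncus L.", "L."),
    TNU(2, "Juncus L.", "Buckl.", protonym_id=1),
    TNU(3, "diffusissimus Buckl.", "Buckl.", parent_id=2),
    TNU(4, "Juncus L.", "Urbatsch", protonym_id=1),
    TNU(5, "diffusissimus Buckl.", "Urbatsch", protonym_id=3, parent_id=4),
]}


def protonym_id_of(tnu_id):
    """Resolve a usage to its protonym, falling back to the usage itself."""
    tnu = TNUS[tnu_id]
    return tnu.protonym_id if tnu.protonym_id is not None else tnu.id


def usages_of(protonym_id):
    """List every TNU id that resolves to the given protonym."""
    return sorted(i for i in TNUS if protonym_id_of(i) == protonym_id)


print(usages_of(1))  # usages of the genus name Juncus L.
print(usages_of(3))  # usages of the epithet diffusissimus Buckl.
```

The point of the sketch is that every usage of a name, in whatever reference, can be gathered by following the protonym link.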
Now let's say that L. Urbatsch publishes a paper describing in detail his concept of Juncus diffusissimus Buckl.
Do you mean the 2009 paper of "Juncus diffusissimus Buckl. sec L. Urbatsch 2009"? (In which case we have the TNUs as 4 & 5).... or do you mean a later paper (2010)? In which case we'd need two more TNUs. Is the "2009" thing a specimen determination, or a publication? It doesn't matter -- I just want to make sure I'm following your example correctly.
The 2009 "thing" is something that L. Urbatsch had in his head - the idea of how he thought the name "Juncus diffusissimus Buckl." should be applied to real organisms and the specimens that come from them. We don't know what that idea is but presumably he had one and we could assign a persistent identifier to it and describe it using the string "Juncus diffusissimus Buckl. sec L. Urbatsch 2009". I was thinking that was what you meant by TNU. Whether or not it is better to proliferate a bunch of additional TNU instances, assign them their own identifiers, and relate them to each other is a technical detail that as an end user I'm happy to let you take care of within your GNUB system. End users want a persistent identifier of some sort to link identification instances to. They will hopefully find it through a user-friendly interface that hides the ugly details you are describing here.
[some quotes omitted for brevity]
The point I'm trying to make is that as long as this "thing" that we are variously calling "taxon name usage", "taxon concept", "shallow taxonomic concept", or "deep taxonomic concept" can be assigned an identifier, what really matters is the metadata we associate with it, not really what we call it.
Absolutely! With emphasis on "really matters". It's certainly important to mint persistent identifiers, but it's equally (more?) important to make sure we understand the "thing" the identifier represents. We have a pretty stable & solid definition of a TNU, that has emerged from many NOMINA meetings over a number of years. What I don't think has been done yet is any real analysis of whether the appropriate subset of TNUs accompanied by a robust concept definition *are* the concepts (i.e., the TNU GUID is the concept GUID); or whether there should be a separate GUID minted explicitly for the concept, which then links back to the TNU GUID as its "definition" or "source" (or whatever). I can see it working both ways.
I looked at the TDWG Taxon Concept ontology (see http://code.google.com/p/tdwg-ontology/source/browse/trunk/ontology/voc/Taxo... ) and came up with the following scenario. Let's say that I assign a persistent identifier [URI1] to the TNU we are talking about here, "Juncus diffusissimus Buckl. sec L. Urbatsch 2009" and that I'm right that by TNU we mean the idea that Lowell Urbatsch had in his head about how he meant to apply the name (sorry to drag him into this randomly - I actually don't know him). We can assign a tc:accordingTo property to [URI1] with the object of that property being some kind of resource whose metadata says that Lowell Urbatsch used the name in some sense known to him in 2009. (We could work out the details of how one would write that metadata but there probably is enough stuff in the Dublin Core and FOAF vocabularies to do it. I think later in your email you said GNUB calls it a "Reference".) Now let's say that in 2012 I call him on the phone and say "hey Lowell, what taxonomic treatment were you thinking about in 2009 when you annotated that specimen in the NLU herbarium with barcode LSU00000428?" and he says "Gleason and Cronquist, 1991". I have two choices:
1. add a tc:describedBy property to the metadata for [URI1] with the value urn:isbn:0893273651 (a persistent URI representing Gleason and Cronquist 1991) and "promote" the resource identified by [URI1] from generic TNU to "deeper" taxonomic concept.
2. create a different [URI2] which has a property/value pair of tc:describedBy urn:isbn:0893273651 , then somehow relate the royal URI2 taxonomic concept to the lowly URI1 generic TNU.
If we use owl:sameAs to relate the two instances (i.e. assert [URI2] owl:sameAs [URI1] ) then there is no advantage to option two. All statements made about [URI1] would apply to [URI2] and vice-versa. All we would be accomplishing is doubling the number of triples that we have to keep track of. The question is whether this would result in "bad" or "silly" statements (the essence of Bob Morris' objection to owl:sameAs, I think). If so, then we need some kind of new term to relate royal Taxon Concepts ("deep" taxonomic concepts) to lowly generic TNUs ("shallow" taxonomic concepts). I don't think such a term exists at the moment.
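The effect of owl:sameAs described here can be sketched with a deliberately naive expansion. This is not a real OWL reasoner: it only copies statements across aliases in the subject position and does not follow chains of sameAs links. The ex: identifiers are hypothetical placeholders; [URI1], [URI2] and the ISBN URN come from the scenario above.

```python
def expand_same_as(triples):
    """Copy each statement to every owl:sameAs alias of its subject (naive sketch)."""
    aliases = {}
    for s, p, o in triples:
        if p == "owl:sameAs":
            # sameAs is symmetric, so record the link in both directions
            aliases.setdefault(s, set()).add(o)
            aliases.setdefault(o, set()).add(s)
    expanded = set(triples)
    for s, p, o in triples:
        if p == "owl:sameAs":
            continue
        for alias in aliases.get(s, ()):
            expanded.add((alias, p, o))
    return expanded


stated = [
    ("[URI1]", "tc:hasName", "ex:name/juncus-diffusissimus-buckl"),  # hypothetical
    ("[URI1]", "tc:accordingTo", "ex:reference/urbatsch-2009"),      # hypothetical
    ("[URI2]", "tc:describedBy", "urn:isbn:0893273651"),
    ("[URI2]", "owl:sameAs", "[URI1]"),
]

expanded = expand_same_as(stated)
descriptive = [t for t in expanded if t[1] != "owl:sameAs"]
print(len(descriptive))  # statements to track once sameAs is applied
```

In this sketch the three descriptive statements become six, and [URI1] ends up with the tc:describedBy value that was only stated for [URI2], which is exactly why option two buys nothing if the two URIs are declared the same.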
The advantage of choice 1 is that we have off-the-shelf technology (the TDWG Taxon Concept ontology based on a ratified TDWG standard, TCS). If we go with choice 2 (and don't use owl:sameAs to relate the two URIs), then we doom ourselves to another five+ years of thrashing out some new vocabulary or ontology for relating TNUs to Taxon Concepts. So here is where we would be if we went with choice 1:
1. The rdf:type of the "thing" is tc:Taxon or tc:TaxonConcept . The ontology declares them to be equivalent classes.
2. The minimal requirement for a tc:Taxon is to have a persistent identifier and a name associated with it, either through a tc:nameString literal or better yet a tc:hasName property whose value is a persistent URI that provides more extensive metadata than a string. uBio comes to mind here as a giant source of name strings with assigned persistent identifiers. A tc:Taxon instance having only name information would be a nominal taxon - an undesirable but probably common circumstance.
3. If the tc:Taxon instance has a tc:accordingTo property, it is elevated to TNU status because we can know who used the name and when if we investigate the object of the tc:accordingTo property further. Maybe we can learn more about this kind of tc:Taxon instance later, but in most cases probably not.
4. If the tc:Taxon instance has a tc:describedBy property, then it is elevated to full taxonomic concept status because it would be related to the persistent identifier of a published taxonomic treatment. One could then theoretically discover all kinds of cool relationships to other full-blown concepts using the tools that Nico and Rich are going to develop.
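The three levels in the list above could be told apart mechanically from the properties an instance carries. The following is a rough sketch under that reading; the function name and the level labels are my own, and it assumes that tc:describedBy always outranks tc:accordingTo.

```python
def taxon_level(properties):
    """Classify a tc:Taxon instance by the tc: properties it carries.

    `properties` is a set of prefixed property names, e.g. {"tc:hasName"}.
    The level labels returned here are informal, not ontology terms.
    """
    if "tc:describedBy" in properties:
        return "taxonomic concept"  # tied to a published treatment
    if "tc:accordingTo" in properties:
        return "TNU"                # we know who used the name, and when
    if "tc:hasName" in properties or "tc:nameString" in properties:
        return "nominal taxon"      # name information only
    raise ValueError("a tc:Taxon needs at least a name")


print(taxon_level({"tc:nameString"}))
print(taxon_level({"tc:hasName", "tc:accordingTo"}))
print(taxon_level({"tc:hasName", "tc:accordingTo", "tc:describedBy"}))
```

Adding a tc:describedBy property to an existing instance moves it up a level without changing its identifier, which is the "promotion" described in choice 1.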
One could create a Venn diagram showing the subset relationships of these three levels of tc:Taxon. If it made Rich feel better, he could mint a class URI for the deep taxonomic concepts which could be used in rdf:type statements in addition to the basic rdf:type of tc:Taxon .
I see this as a way to make rapid progress on this front by leveraging work which has already been completed and accepted by TDWG but not really implemented. The TDWG Taxon Concept ontology and TCS may not do everything that people want as far as allowing one to define all of the set relationships required to do the fancy stuff. But I don't see why that can't be added in a TCS 2.0 that is backwardly compatible with TCS 1.2 .
I will defend my repeated references to HTTP URIs and RDF by saying that this email is being posted to the TDWG RDF group list, and also by noting that the ratified standard for GUIDs (see http://bioimages.vanderbilt.edu/pages/guid-applicability-final-2011-01.pdf) says that persistent identifiers must be HTTP proxied (recommendation 2) and that they should resolve to provide RDF/XML (recommendation 10). Having not yet achieved the degree of cynicism attained by Rod Page about people actually following TDWG standards, I still believe that we should try to follow them. Or else just stop wasting time on this...
Steve