Excerpts from what Richard Pyle wrote and responses:
I'm not saying this ever will happen, or even should happen. But we've
developed the data model such that it *could* happen, if it turns out to be
a useful mechanism for mapping specimens to published taxon concepts.
As so often is the case, I think the problem here boils down to
identifiers and the metadata that we associate with them.
Absolutely!!! This is often not intuitive stuff, so the tricky part is
getting people to apply identifiers and cross-link them in an appropriate
and consistent way.
As a person who isn't really sure if he believes in RDF (and the
co-convener of the RDF Task Group) and as a person who finds this whole
conversation more annoying than about anything else he can think of
(but who is actually trying to walk this walk with his metadata), I'm
saying that we are there. HTTP URIs are not THE way to create
identifiers but they are A way to create identifiers that demonstrably
work. RDF is not THE way to cross link identifiers but it is A way
to cross link identifiers. History is full of examples where a way of
doing things that wasn't the best won out because
it was there when needed. (I'm thinking typewriter keyboard.)
Let's say in the real-life example above, somebody (we
can say GNUB) assigns a persistent identifier (perhaps a
URI constructed from an opaque UUID) to "Juncus diffusissimus Buckl. sec
L. Urbatsch 2009".
We could say with an rdf:type statement that the resource
identified by the URI is a TNU. We can give that resource a
tc:hasName property linking it to the name which is
represented by the string "Juncus diffusissimus Buckl.".
(I'm not sure what property we use to say that L. Urbatsch made the
assertion).
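As a rough sketch of the two statements just described, here they are as plain (subject, predicate, object) tuples in Python. The example.org URIs, the ex:TNU class and the use of rdfs:label for the name string are placeholders I made up for illustration; only rdf:type and tc:hasName come from the discussion above, and nothing here is what GNUB actually does.

```python
import uuid

def mint_uri(base="http://example.org/tnu/"):
    """Return a hypothetical persistent URI built from an opaque UUID."""
    return base + str(uuid.uuid4())

# Identifier for "Juncus diffusissimus Buckl. sec L. Urbatsch 2009".
tnu_uri = mint_uri()

# Hypothetical identifier for the name itself (assumption, not a real URI).
name_uri = "http://example.org/name/juncus-diffusissimus-buckl"

# Prefixed names are kept as plain strings; ex:TNU is an invented class.
triples = [
    (tnu_uri, "rdf:type", "ex:TNU"),
    (tnu_uri, "tc:hasName", name_uri),
    (name_uri, "rdfs:label", "Juncus diffusissimus Buckl."),
]

def describe(statements):
    """Render each triple as one readable line of text."""
    return [f"{s} {p} {o}" for s, p, o in statements]

for line in describe(triples):
    print(line)
```

The property that records who made the assertion is deliberately left out, since that is the open question here.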
I'm not sure how you would do it in RDF, but if it's any help the relevant
DwC term is taxon:nameAccordingToID.
This is a major part of why I'm interested in this conversation. All
of the dwc: "ID" terms are flawed for use in RDF for technical reasons
that I've described elsewhere (see
http://code.google.com/p/tdwg-rdf/wiki/Beginners4Vocabularies#4.6._The_Darwin_Core_vocabulary_normative_definition_as_RDF
http://code.google.com/p/tdwg-rdf/wiki/DublinCore#1.3.2.4._dcterms:identifier_(AC)
and
http://code.google.com/p/tdwg-rdf/wiki/DublinCore#2.4._Possible_courses_of_action_where_users_may_be_inclined_to_u
if you care about this). I believe that tc:accordingTo would be a
fairly exact equivalent to dwc:nameAccordingToID which does not have
the subproperty problems the DwC ID terms have in RDF. So the question
in my mind (and I think a question posed explicitly earlier in this
thread) is whether enough of the technical stuff you want to do with
TNUs can be done with the TDWG Taxon Concept ontology which is based on
a ratified TDWG standard (TCS).
It's certainly part of the metadata for the TNU itself:
There is a TNU for diffusissimus within the Reference of Buckl. This is the
original description of the epithet, so it is also the Protonym for
diffusissimus. There is also a TNU for the genus name Juncus as used within
the Reference of Buckl. If, in the same publication, Buckl. also
established that genus, then the genus would also be the Protonym TNU.
However, the genus Juncus was established by L., so there is another TNU for
Juncus that is the protonym (Juncus L.), and the TNU for the usage of Juncus
within Buckl. links to the TNU of Juncus L. (the Protonym).
So, that's three TNUs:
1. Juncus L. sec. L. (Protonym for the genus Juncus)
2. Juncus L. sec. Buckl. (links to 1 as ProtonymID)
3. diffusissimus Buckl. sec. Buckl. (Protonym for the species diffusissimus,
links to 2 as parent)
There are also two more TNUs linked to the Urbatsch 2009 publication:
4. Juncus L. sec. Urbatsch (links to 1 as ProtonymID)
5. diffusissimus Buckl. sec. Urbatsch (links to 3 as ProtonymID, and to 4 as
parent)
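The five TNUs listed above can be written down as a small data structure to make the links explicit. This is only a sketch: the field names (protonym_id, parent_id) and the convention that a protonym has protonym_id set to None are my assumptions for illustration, not the actual GNUB schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TNU:
    tnu_id: int
    name: str                           # name string with authorship
    reference: str                      # the "sec." part
    protonym_id: Optional[int] = None   # None: this usage is itself the protonym
    parent_id: Optional[int] = None     # genus-level TNU for a species epithet

tnus = {
    1: TNU(1, "Juncus L.", "L."),
    2: TNU(2, "Juncus L.", "Buckl.", protonym_id=1),
    3: TNU(3, "diffusissimus Buckl.", "Buckl.", parent_id=2),
    4: TNU(4, "Juncus L.", "Urbatsch", protonym_id=1),
    5: TNU(5, "diffusissimus Buckl.", "Urbatsch", protonym_id=3, parent_id=4),
}

def protonym_of(tnu_id):
    """Follow the protonym link; a protonym resolves to itself."""
    tnu = tnus[tnu_id]
    return tnu if tnu.protonym_id is None else tnus[tnu.protonym_id]

def usages_of(protonym_id):
    """All TNU ids (the protonym included) that resolve to this protonym."""
    return sorted(i for i in tnus if protonym_of(i).tnu_id == protonym_id)

print(usages_of(1))  # usages of Juncus L.
print(usages_of(3))  # usages of diffusissimus Buckl.
```

Grouping usages by protonym is what lets TNUs 3 and 5 be recognized as two usages of the same epithet even though they sit in different publications.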
Now let's say that L. Urbatsch publishes a paper describing in detail his
concept of Juncus diffusissimus Buckl.
Do you mean the 2009 paper of "Juncus diffusissimus Buckl. sec L. Urbatsch
2009"? (In which case we have the TNUs as 4 & 5).... or do you mean a later
paper (2010)? In which case we'd need two more TNUs. Is the "2009" thing a
specimen determination, or a publication? It doesn't matter -- I just want
to make sure I'm following your example correctly.
The 2009 "thing" is something that L. Urbatsch had in his head - the
idea of how he thought the name "Juncus diffusissimus Buckl." should be
applied to real organisms and the specimens that come from them. We
don't know what that idea is but presumably he had one and we could
assign a persistent identifier to it and describe it using the string
"Juncus diffusissimus Buckl. sec L. Urbatsch 2009". I was thinking
that was what you meant by TNU. Whether or not it is better to
proliferate a bunch of additional TNU instances, assign them their own
identifiers, and relate them to each other is a technical detail that
as an end user I'm happy to let you take care of within your GNUB
system. End users want a persistent identifier of some sort to link
identification instances to. They will hopefully find it through a
user-friendly interface that hides the ugly details you are describing
here.
[some quotes omitted for brevity]
The point I'm trying to make is that as long as this "thing"
that we are variously calling "taxon name usage", "taxon concept",
"shallow taxonomic concept", or "deep taxonomic concept"
can be assigned an identifier, what really matters is the metadata
we associate with it, not really what we call it.
Absolutely! With emphasis on "really matters". It's certainly important to
mint persistent identifiers, but it's equally (more?) important to make sure
we understand the "thing" the identifier represents. We have a pretty
stable & solid definition of a TNU, that has emerged from many NOMINA
meetings over a number of years. What I don't think has been done yet, is
any real analysis of whether the appropriate subset of TNUs accompanied by a
robust concept definition *are* the concepts (i.e., the TNU GUID is the
concept GUID); or whether there should be a separate GUID minted explicitly
for the concept, which then links back to the TNU GUID as its "definition"
or "source" (or whatever). I can see it working both ways.
I looked at the TDWG Taxon Concept ontology (see
http://code.google.com/p/tdwg-ontology/source/browse/trunk/ontology/voc/TaxonConcept.owl
) and came up with the following scenario. Let's say that I assign a
persistent identifier [URI1] to the TNU we are talking about here,
"Juncus diffusissimus Buckl. sec L. Urbatsch 2009" and that I'm right
that by TNU we mean the idea that Lowell Urbatsch had in his head about
how he meant to apply the name (sorry to drag him into this randomly - I
actually don't know him).
We can assign a tc:accordingTo property to [URI1] with the object of
that property being some kind of resource whose metadata says that
Lowell Urbatsch used the name in some sense known to him in 2009. (We
could work out the details of how one would write that metadata but
there probably is enough stuff in the Dublin Core and FOAF vocabularies
to do it. I think later in your email you said GNUB calls it a
"Reference".) Now let's say that in 2012 I call him on the phone and
say "hey Lowell, what taxonomic treatment were you thinking about in
2009 when you annotated that specimen in the NLU herbarium with barcode
LSU00000428?" and he says "Gleason and Cronquist, 1991". I have two
choices:
1. add a tc:describedBy property to the metadata for [URI1] with the
value urn:isbn:0893273651 (a persistent URI representing Gleason and
Cronquist 1991) and "promote" the resource identified by [URI1] from
generic TNU to "deeper" taxonomic concept.
2. create a different [URI2] which has a property/value pair of
tc:describedBy urn:isbn:0893273651 , then somehow relate the royal URI2
taxonomic concept to the lowly URI1 generic TNU.
If we use owl:sameAs to relate the two instances (i.e. assert [URI2]
owl:sameAs [URI1] ) then there is no advantage to option two. All
statements made about [URI1] would apply to [URI2] and vice-versa. All
we would be accomplishing is doubling the number of triples that we
have to keep track of. The question is whether this would result in
"bad" or "silly" statements (the essence of Bob Morris' objection to
owl:sameAs, I think). If so, then we need some kind of new term to
relate royal Taxon Concepts ("deep" taxonomic concepts) to lowly
generic TNUs ("shallow" taxonomic concepts). I don't think such a term
exists at the moment.
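To make the owl:sameAs point concrete, here is a toy sketch of what a reasoner would effectively do with option two. It only copies statements in the subject position, which is a simplification of the real OWL semantics, and the example.org URIs are placeholders I invented; urn:isbn:0893273651 is the identifier mentioned above.

```python
def sameas_closure(triples, a, b):
    """Copy every statement whose subject is a onto b, and vice versa.

    Simplified: real owl:sameAs also substitutes in the object position
    and is transitive across chains of sameAs statements.
    """
    merged = set(triples)
    for s, p, o in triples:
        if s == a:
            merged.add((b, p, o))
        elif s == b:
            merged.add((a, p, o))
    return merged

URI1 = "http://example.org/URI1"  # the generic TNU (placeholder URI)
URI2 = "http://example.org/URI2"  # the "deep" concept (placeholder URI)

triples = {
    (URI1, "tc:hasName", "http://example.org/name/juncus-diffusissimus-buckl"),
    (URI1, "tc:accordingTo", "http://example.org/ref/urbatsch-2009"),
    (URI2, "tc:describedBy", "urn:isbn:0893273651"),
}

merged = sameas_closure(triples, URI1, URI2)
print(len(triples), "->", len(merged))
```

After the closure both URIs carry the same three properties, so the two resources are indistinguishable by their metadata and the only effect is twice as many triples.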
The advantage of choice 1 is that we have off-the-shelf technology (the
TDWG Taxon Concept ontology based on a ratified TDWG standard, TCS).
If we go with choice 2 (and don't use owl:sameAs to relate the two
URIs), then we doom ourselves to another five+ years of thrashing out
some new vocabulary or ontology for relating TNUs to Taxon Concepts.
So here is where we would be if we went with choice 1:
1. The rdf:type of the "thing" is tc:Taxon or tc:TaxonConcept . The
ontology declares them to be equivalent classes.
2. The minimal requirement for a tc:Taxon is to have a persistent
identifier and a name associated with it, either through tc:nameString
literal or better yet a tc:hasName property whose value is a persistent
URI that provides more extensive metadata than a string. uBio comes to
mind here as a giant source of name strings with assigned persistent
identifiers. A tc:Taxon instance having only name information would be
a nominal taxon - an undesirable but probably common circumstance.
3. If the tc:Taxon instance has a tc:accordingTo property, it is
elevated to TNU status because we can know who used the name and when
if we investigate the object of the tc:accordingTo property
further. Maybe we can learn more about this kind of tc:Taxon instance
later, but in most cases probably not.
4. If the tc:Taxon instance has a tc:describedBy property, then it is
elevated to full taxonomic concept status because it would be related
to the persistent identifier of a published taxonomic treatment. One
could then theoretically discover all kinds of cool relationships to
other full-blown concepts using the tools that Nico and Rich are going
to develop.
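The four points above amount to a simple rule for deciding what level a tc:Taxon instance has reached based on which properties it carries. A minimal sketch follows; the level labels are my own shorthand, and treating tc:describedBy as sufficient for concept status even without tc:accordingTo is an assumption, not something the ontology states.

```python
def classify(properties):
    """Classify a tc:Taxon instance from the set of property names it has."""
    has_name = "tc:hasName" in properties or "tc:nameString" in properties
    if not has_name:
        # Point 2: a name is the minimal requirement.
        return "not a valid tc:Taxon"
    if "tc:describedBy" in properties:
        # Point 4: linked to a published taxonomic treatment.
        return "taxonomic concept"
    if "tc:accordingTo" in properties:
        # Point 3: we know who used the name and when.
        return "TNU"
    # Name information only.
    return "nominal taxon"

print(classify({"tc:nameString"}))
print(classify({"tc:hasName", "tc:accordingTo"}))
print(classify({"tc:hasName", "tc:accordingTo", "tc:describedBy"}))
```

Under choice 1, adding the tc:describedBy property to [URI1] after the phone call would move it from the second result to the third without minting a new identifier.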
One could create a Venn diagram showing the subset relationships of
these three levels of tc:Taxon. If it made Rich feel better, he could
mint a class URI for the deep taxonomic concepts which could be used in
rdf:type statements in addition to the basic rdf:type of tc:Taxon .
I see this as a way to make rapid progress on this front by leveraging
work which has already been completed and accepted by TDWG but not
really implemented. The TDWG Taxon Concept ontology and TCS may not do
everything that people want as far as allowing one to define all of the
set relationships required to do the fancy stuff. But I don't see why
that can't be added in a TCS 2.0 that is backwardly compatible with TCS
1.2 .
I will defend my repeated references to HTTP URIs and RDF by saying
that this email is being posted to the TDWG RDF group list but also
because the ratified standard for GUIDs (see
http://bioimages.vanderbilt.edu/pages/guid-applicability-final-2011-01.pdf)
says that persistent identifiers must be HTTP proxied (recommendation
2) and that they should resolve to provide RDF/XML (recommendation
10). Having not yet achieved the degree of cynicism attained by Rod
Page about people actually following TDWG standards, I still believe
that we should try to follow them. Or else just stop wasting time on
this...
Steve
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu