Re: [tdwg-tag] [tdwg-rdf: 105] Re: Any TCS users with experiences to report?

28 Nov 2012

      Excerpts from what Richard Pyle wrote and responses:
...
I'm not saying this ever will happen, or even should happen.  But we've
developed the data model such that it *could* happen, if it turns out to be
a useful mechanism for mapping specimens to published taxon concepts.
...
As so often is the case, I think the problem here boils down to
identifiers and the metadata that we associate with them.
Absolutely!!! This is often not intuitive stuff, so the tricky part is
getting people to apply identifiers and cross-link them in an appropriate
and consistent way.
As a person who isn't really sure if he believes in RDF (and the 
co-convener of the RDF Task Group) and as a person who finds this whole 
conversation more annoying than about anything else he can think of (but 
who is actually trying to walk this walk with his metadata), I'm saying 
that we are there.  HTTP URIs are not THE way to create identifiers but 
they are A way to create identifiers that demonstrably work.  RDF is not 
THE way to cross link identifiers but it is A way to cross link 
identifiers.  History is full of examples where a way of doing things 
that wasn't the best won out because it was there when needed.  (I'm 
thinking typewriter keyboard.)
...
...
Let's say in the real-life example above, somebody (we
can say GNUB) assigns a persistent identifier (perhaps a
URI constructed from an opaque UUID) to "Juncus diffusissimus Buckl. sec
L. Urbatsch 2009".
...
We could say with an rdf:type statement that the resource
identified by the URI is a TNU.  We can give that resource a
tc:hasName property linking it to the name which is
represented by the string "Juncus diffusissimus Buckl.".
(I'm not sure what property we use to say that L. Urbatsch made the
assertion).
I'm not sure how you would do it in RDF, but if it's any help the relevant
DwC term is taxon:nameAccordingToID.
This is a major part of why I'm interested in this conversation.  All of 
the dwc: "ID" terms are flawed for use in RDF for technical reasons that 
I've described elsewhere (see 
http://code.google.com/p/tdwg-rdf/wiki/Beginners4Vocabularies#4.6._The_Darwi...  
http://code.google.com/p/tdwg-rdf/wiki/DublinCore#1.3.2.4._dcterms:identifie... 
and 
http://code.google.com/p/tdwg-rdf/wiki/DublinCore#2.4._Possible_courses_of_a... 
if you care about this).  I believe that tc:accordingTo would be a 
fairly exact equivalent to dwc:nameAccordingToID which does not have the 
subproperty problems the DwC ID terms have in RDF.  So the question in 
my mind (and I think a question posed explicitly earlier in this thread) 
is whether enough of the technical stuff you want to do with TNUs can be 
done with the TDWG Taxon Concept ontology which is based on a ratified 
TDWG standard (TCS).
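
To make the statements under discussion concrete, here is a minimal sketch 
of the triples for the TNU, written as plain Python tuples rather than with 
any RDF library.  The example.org URIs are hypothetical placeholders (not 
real GNUB or uBio identifiers), and the tc: terms are written as prefixed 
names exactly as they are used in this thread.

```python
# Hypothetical identifiers for illustration only; none are real GNUB or uBio URIs.
URI1 = "http://example.org/tnu/URI1"
NAME = "http://example.org/name/juncus-diffusissimus-buckl"
REF = "http://example.org/reference/urbatsch-2009"

# Each statement is a (subject, predicate, object) triple.
TRIPLES = [
    (URI1, "rdf:type", "tc:TaxonConcept"),
    (URI1, "tc:hasName", NAME),
    (URI1, "tc:nameString", "Juncus diffusissimus Buckl."),
    (URI1, "tc:accordingTo", REF),
]


def objects(triples, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(TRIPLES, URI1, "tc:accordingTo"))
```

The tc:accordingTo statement here plays the role that dwc:nameAccordingToID 
would play in a Darwin Core record.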
...
It's certainly part of the metadata for the TNU itself:
There is a TNU for diffusissimus within the Reference of Buckl.  This is the
original description of the epithet, so it is also the Protonym for
diffusissimus.  There is also a TNU for the genus name Juncus as used within
the Reference of Buckl.  If, in the same publication, Buckl. also
established that genus, then the genus would also be the Protonym TNU.
However, the genus Juncus was established by L., so there is another TNU for
Juncus that is the protonym (Juncus L.), and the TNU for the usage of Juncus
within Buckl. links to the TNU of Juncus L. (the Protonym).
So, that's three TNUs:
1. Juncus L. sec. L. (Protonym for the genus Juncus)
2. Juncus L. sec. Buckl. (links to 1 as ProtonymID)
3. diffusissimus Buckl. sec. Buckl. (Protonym for the species diffusissimus,
links to 2 as parent)
There are also two more TNUs linked to the Urbatsch 2009 publication:
4. Juncus L. sec. Urbatsch (links to 1 as ProtonymID)
5. diffusissimus Buckl. sec. Urbatsch (links to 3 as ProtonymID, and to 4 as
parent)
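
The five TNUs listed above can be sketched as a small linked data structure.  
This is only an illustration, not the GNUB schema: the class and field names 
are invented here, and treating a missing protonym link as "this TNU is 
itself the protonym" is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TNU:
    tnu_id: int
    name: str
    reference: str
    protonym_id: Optional[int] = None  # assumption: None means "is the protonym"
    parent_id: Optional[int] = None


# The five usages from the example, keyed by the numbers used in the list.
TNUS = {
    1: TNU(1, "Juncus L.", "L."),
    2: TNU(2, "Juncus L.", "Buckl.", protonym_id=1),
    3: TNU(3, "diffusissimus Buckl.", "Buckl.", parent_id=2),
    4: TNU(4, "Juncus L.", "Urbatsch", protonym_id=1),
    5: TNU(5, "diffusissimus Buckl.", "Urbatsch", protonym_id=3, parent_id=4),
}


def protonym_of(tnu_id):
    """Follow the protonym link, or return the TNU itself if it is the protonym."""
    tnu = TNUS[tnu_id]
    return tnu if tnu.protonym_id is None else TNUS[tnu.protonym_id]


def label(tnu_id):
    """Build the 'name sec. reference' string for a TNU."""
    tnu = TNUS[tnu_id]
    return f"{tnu.name} sec. {tnu.reference}"


print(label(5), "->", label(protonym_of(5).tnu_id))
```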
...
Now let's say that L. Urbatsch publishes a paper describing in detail her
concept of Juncus diffusissimus Buckl.
Do you mean the 2009 paper of "Juncus diffusissimus Buckl. sec L. Urbatsch
2009"? (In which case we have the TNUs as 4 & 5).... or do you mean a later
paper (2010)?  In which case we'd need two more TNUs. Is the "2009" thing a
specimen determination, or a publication?  It doesn't matter -- I just want
to make sure I'm following your example correctly.
The 2009 "thing" is something that L. Urbatsch had in his head - the idea 
of how he thought the name "Juncus diffusissimus Buckl." should be 
applied to real organisms and the specimens that come from them.  We 
don't know what that idea is but presumably he had one and we could 
assign a persistent identifier to it and describe it using the string 
"Juncus diffusissimus Buckl. sec L. Urbatsch 2009".   I was thinking 
that was what you meant by TNU.  Whether or not it is better to 
proliferate a bunch of additional TNU instances, assign them their own 
identifiers, and relate them to each other is a technical detail that as 
an end user I'm happy to let you take care of within your GNUB system.  
End users want a persistent identifier of some sort to link 
identification instances to.  They will hopefully find it through a 
user-friendly interface that hides the ugly details you are describing here.

[some quotes omitted for brevity]
...
...
The point I'm trying to make is that as long as this "thing"
that we are variously calling "taxon name usage", "taxon concept",
"shallow taxonomic concept", or "deep taxonomic concept"
can be assigned an identifier, what really matters is the metadata
we associate with it, not really what we call it.
Absolutely!  With emphasis on "really matters".  It's certainly important to
mint persistent identifiers, but it's equally (more?) important to make sure
we understand the "thing" the identifier represents.  We have a pretty
stable & solid definition of a TNU, that has emerged from many NOMINA
meetings over a number of years.  What I don't think has been done yet, is
any real analysis of whether the appropriate subset of TNUs accompanied by a
robust concept definition *are* the concepts (i.e., the TNU GUID is the
concept GUID); or whether there should be a separate GUID minted explicitly
for the concept, which then links back to the TNU GUID as its "definition"
or "source" (or whatever).  I can see it working both ways.
I looked at the TDWG Taxon Concept ontology (see 
http://code.google.com/p/tdwg-ontology/source/browse/trunk/ontology/voc/Taxo... 
) and came up with the following scenario.  Let's say that I assign a 
persistent identifier [URI1] to the TNU we are talking about here, 
"Juncus diffusissimus Buckl. sec L. Urbatsch 2009" and that I'm right 
that by TNU we mean the idea that Lowell Urbatsch had in his head about 
how he meant to apply the name (sorry to drag him into this randomly - I 
actually don't know him). 
We can assign a tc:accordingTo property to [URI1] with the object of 
that property being some kind of resource whose metadata says that 
Lowell Urbatsch used the name in some sense known to him in 2009.  (We 
could work out the details of how one would write that metadata but 
there probably is enough stuff in the Dublin Core and FOAF vocabularies 
to do it.  I think later in your email you said GNUB calls it a 
"Reference".)   Now let's say that in 2012 I call him on the phone and 
say "hey Lowell, what taxonomic treatment were you thinking about in 
2009 when you annotated that specimen in the NLU herbarium with barcode 
LSU00000428?" and he says "Gleason and Cronquist, 1991".  I have two 
choices:
1.  add  a tc:describedBy property to the metadata for [URI1] with the 
value urn:isbn:0893273651 (a persistent URI representing Gleason and 
Cronquist 1991) and "promote" the resource identified by [URI1] from 
generic TNU to "deeper" taxonomic concept.
2. create a different [URI2] which has a property/value pair of 
tc:describedBy urn:isbn:0893273651 , then somehow relate the royal URI2 
taxonomic concept to the lowly URI1 generic TNU. 

If we use owl:sameAs to relate the two  instances (i.e. assert [URI2] 
owl:sameAs [URI1] ) then there is no advantage to option two.  All 
statements made about [URI1] would apply to [URI2] and vice-versa.  All 
we would be accomplishing is doubling the number of triples that we have 
to keep track of.  The question is whether this would result in "bad" or 
"silly" statements (the essence  of Bob Morris' objection to owl:sameAs, 
I think).  If so, then we need some kind of new term to relate royal 
Taxon Concepts ("deep" taxonomic concepts) to lowly  generic TNUs 
("shallow" taxonomic concepts).  I don't think such a term exists at the 
moment.
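
Here is a rough sketch of why owl:sameAs removes any advantage of option 
two.  It is not a real OWL reasoner: it only copies statements across 
directly asserted owl:sameAs pairs in a single pass, and the short strings 
"URI1", "URI2" and "REF" are placeholders standing in for real identifiers.

```python
def same_as_closure(triples):
    """Copy every statement across directly asserted owl:sameAs pairs.

    Single pass only; chains of sameAs statements are not followed.
    """
    pairs = {(s, o) for s, p, o in triples if p == "owl:sameAs"}
    pairs |= {(o, s) for s, o in pairs}  # owl:sameAs is symmetric
    inferred = set(triples)
    for a, b in pairs:
        for s, p, o in triples:
            if p == "owl:sameAs":
                continue
            if s == a:
                inferred.add((b, p, o))
            if o == a:
                inferred.add((s, p, b))
    return inferred


# Placeholder identifiers; the ISBN URN is the one given in the text.
TRIPLES = [
    ("URI1", "tc:accordingTo", "REF"),
    ("URI2", "tc:describedBy", "urn:isbn:0893273651"),
    ("URI2", "owl:sameAs", "URI1"),
]

INFERRED = same_as_closure(TRIPLES)
for triple in sorted(INFERRED):
    print(triple)
```

After the pass, [URI1] carries the tc:describedBy statement and [URI2] 
carries the tc:accordingTo statement, so the two descriptive statements 
have become four.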

The advantage of choice 1 is that we have off-the-shelf technology (the 
TDWG Taxon Concept ontology based on a ratified TDWG standard, TCS).  If 
we go with choice 2 (and don't use owl:sameAs to relate the two URIs), 
then we doom ourselves to another five+ years of thrashing out some new 
vocabulary or ontology for relating TNUs to Taxon Concepts.  So here is 
where we would be if we went with choice 1:

1. The rdf:type of the "thing" is tc:Taxon or tc:TaxonConcept .  The 
ontology declares them to be equivalent classes. 
2. The minimal requirement for a tc:Taxon is to have a persistent 
identifier and a name associated with it, either through tc:nameString 
literal or better yet a tc:hasName property whose value is a persistent 
URI that provides more extensive metadata than a string.  uBio comes to 
mind here as a giant source of name strings with assigned persistent 
identifiers.  A tc:Taxon instance having only name information would be 
a nominal taxon - an undesirable but probably common circumstance.
3. If the tc:Taxon instance has a tc:accordingTo property, it is 
elevated to TNU status because we can know who used the name and when if 
we investigate the object of the tc:accordingTo property further.  
Maybe we can learn more about this kind of tc:Taxon instance later, but 
in most cases probably not.
4. If the tc:Taxon instance has a tc:describedBy property, then it is 
elevated to full taxonomic concept status because it would be related to 
the persistent identifier of a published taxonomic treatment.  One could 
then theoretically discover all kinds of cool relationships to other 
full-blown concepts using the tools that Nico and Rich are going to 
develop. 
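
The four points above amount to a simple classification rule based on which 
properties a tc:Taxon instance carries.  A minimal sketch of that rule 
follows; the function name and the returned labels are made up for 
illustration, and it assumes (as a simplification) that tc:describedBy 
alone is enough for full concept status even without tc:accordingTo.

```python
def concept_level(properties):
    """Classify a tc:Taxon instance by the set of property names it carries."""
    props = set(properties)
    # Point 2: a name, via tc:hasName or tc:nameString, is the minimum.
    if not props & {"tc:hasName", "tc:nameString"}:
        raise ValueError("a tc:Taxon needs at least a name")
    # Point 4: a link to a published treatment gives full concept status.
    if "tc:describedBy" in props:
        return "taxonomic concept"
    # Point 3: knowing who used the name, and when, gives TNU status.
    if "tc:accordingTo" in props:
        return "TNU"
    return "nominal taxon"


print(concept_level({"tc:hasName"}))
print(concept_level({"tc:hasName", "tc:accordingTo"}))
print(concept_level({"tc:hasName", "tc:accordingTo", "tc:describedBy"}))
```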

One could create a Venn diagram showing the subset relationships of 
these three levels of tc:Taxon.  If it made Rich feel better, he could 
mint a class URI for the deep taxonomic concepts which could be used in 
rdf:type statements in addition to the basic rdf:type of tc:Taxon . 

I see this as a way to make rapid progress on this front by leveraging 
work which has already been completed and accepted by TDWG but not 
really implemented.  The TDWG Taxon Concept ontology and TCS may not do 
everything that people want as far as allowing one to define all of the 
set relationships required to do the fancy stuff.  But I don't see why 
that can't be added in a TCS 2.0 that is backward compatible with TCS 
1.2. 

I will defend my repeated references to HTTP URIs and RDF by saying that 
this email is being posted to the TDWG RDF group list but also because 
the ratified standard for GUIDs (see 
http://bioimages.vanderbilt.edu/pages/guid-applicability-final-2011-01.pdf) 
says that persistent identifiers must be HTTP proxied (recommendation 2) 
and that they should resolve to provide RDF/XML (recommendation 10).   
Having not yet achieved the degree of cynicism attained by Rod Page 
about people actually following TDWG standards, I still believe that we 
should try to follow them.  Or else just stop wasting time on this...

Steve

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
