[tdwg-tag] [tdwg-rdf: 105] Re: Any TCS users with experiences to report?

Wed Nov 28 10:11:15 CET 2012

> In that section, Rich notes that
>
> "Eventually, a third party may be able to deduce (perhaps through a suite
> of other, external information) a RelationshipAssertion that maps the TNU
> "[Juncus] diffusissimus Buckl. sec L. Urbatsch 2009" to some other,
> perhaps published and well-defined taxon concept (of the same or different
> name). Also, if there are 100 specimens in the collection that L. Urbatsch
> identified as "Juncus diffusissimus Buckl." in 2009, then anchoring all
> 100 Identification instances to the one TNU, allows all of those specimens
> to inherit the mapping of the one "[Juncus] diffusissimus Buckl. sec L.
> Urbatsch 2009" TNU instance to some other better-defined taxon concept."
> 
> From that post, I understood that a TNU (a.k.a. "assertion" in Pyle 2004
> http://systbio.org/files/phyloinformatics/1.pdf)
> can be as vague as an idea that some determiner had in his/her head about
> how organism/specimen instances should be mapped to a name.

Yes, a TNU can be that simple.  Basically, a TNU exists whenever someone
documents a scientific name.  That doesn't mean that all of these will be
entered into a database.  But the intention was to scope TNUs to be wide
open to any form of documentation (including a specimen label), in case
someone has a need to record the fact that somebody used a particular name
in a particular way (useful, for example, if you want to track so-called
"manuscript names" that only exist on specimen labels).  

The problem is, if you have a broad scope for what a TNU can be, then there
is no guarantee that the TNU is accompanied by a taxon concept definition.
Certainly a specimen label is not (it applies to only one specimen).
Neither is, for example, a published type catalog (which often records
names, without implied concepts).  And then there are things like newspaper
articles, where perhaps there is a concept somewhere in there, but it's too
ambiguous to pin down to any particular concept.  However, some TNUs
(e.g., treatments in revisionary monographs) certainly are accompanied by
well-defined concept definitions.  And those are the ones that we'd like to
see become "anchor-points" to taxon concepts, against which the other
(non-concept-bearing) TNUs can be mapped (as asserted by a third party; or
in some cases by the first party).

> I think from what Rich said there that there is the potential that we as 
> metadata aggregators may at some later point be able to map how 
> that idea in the determiner's head fits in with a more well-defined 
> (e.g. published) taxon description which one may choose to call a 
> taxon concept rather than a TNU.  

That was certainly the intention, yes.  Whether or not that will ever happen
with most (many?) specimens remains to be seen.  Certainly it can happen in
some cases.  For example, in our collection we often have visitors come
through and study our holdings of a particular genus or a particular family.
Often they will make determinations about the specimens they examine.
Sometime later, they'll publish a revision of the group.  If they include
a comprehensive "material examined" section, then there's an explicit
relationship between the specimen and the publication that allows direct
mapping of the specimen to the well-defined concept.  However, if there is
not a comprehensive material examined section, but there is a determination
label by the same person in the jar (so you know the person examined it and
identified it as part of the work that went behind the published revision),
then the collection manager could create a TNU for the determination label,
then map that determination-TNU to the publication's TNU (a
RelationshipAssertion by the third-party collection manager).

I'm not saying this ever will happen, or even should happen.  But we've
developed the data model such that it *could* happen, if it turns out to be
a useful mechanism for mapping specimens to published taxon concepts.
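
To make the mechanism concrete, here is a minimal sketch in Python of how Identifications anchored to one determination-TNU could inherit a third-party mapping.  It is only an illustration, not the GNUB schema: the class and field names, the identifier strings, the specimen numbers, and the "congruent" relationship label are all assumptions invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Identification:
    specimen: str   # hypothetical catalog number
    tnu: str        # ID of the TNU this determination is anchored to

@dataclass(frozen=True)
class RelationshipAssertion:
    from_tnu: str      # the non-concept-bearing TNU (e.g. a determination label)
    to_tnu: str        # the concept-bearing TNU (e.g. a published revision)
    relationship: str  # assumed vocabulary, e.g. "congruent"
    asserted_by: str   # the party making the assertion

DET_TNU = "tnu:det-label"   # placeholder ID for the determination-label TNU
PUB_TNU = "tnu:revision"    # placeholder ID for the publication's TNU

# 100 specimens all anchored to the single determination-TNU.
identifications = [Identification(f"SPEC-{n:03d}", DET_TNU) for n in range(1, 101)]

# One assertion, made once by the collection manager.
assertions = [
    RelationshipAssertion(DET_TNU, PUB_TNU, "congruent", "collection manager"),
]

def mapped_concepts(specimen):
    """Return the concept-bearing TNUs a specimen reaches via its Identifications."""
    anchored = {i.tnu for i in identifications if i.specimen == specimen}
    return [a.to_tnu for a in assertions if a.from_tnu in anchored]
```

The point of the design is that the mapping is recorded once, against the TNU, and every specimen anchored to that TNU picks it up without any per-specimen edits.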

> As so often is the case, I think the problem here boils down to 
> identifiers and the metadata that we associate with them.  

Absolutely!!! This is often not intuitive stuff, so the tricky part is
getting people to apply identifiers and cross-link them in an appropriate
and consistent way.

> Let's say in the real-life example above, somebody (we
> can say GNUB) assigns a persistent identifier (perhaps a
> URI constructed from an opaque UUID) to
> "Juncus diffusissimus Buckl. sec L. Urbatsch 2009".
> We could say with an rdf:type statement that the resource
> identified by the URI is a TNU.  We can give that resource a
> tc:hasName property linking it to the name which is
> represented by the string "Juncus diffusissimus Buckl.".
> (I'm not sure what property we use to say that L. Urbatsch made the
> assertion).

I'm not sure how you would do it in RDF, but if it's any help the relevant
DwC term is taxon:nameAccordingToID.

It's certainly part of the metadata for the TNU itself:

There is a TNU for diffusissimus within the Reference of Buckl.  This is the
original description of the epithet, so it is also the Protonym for
diffusissimus.  There is also a TNU for the genus name Juncus as used within
the Reference of Buckl.  If, in the same publication, Buckl. had also
established that genus, then the genus TNU would also be the Protonym TNU.
However, the genus Juncus was established by L., so there is another TNU for
Juncus that is the protonym (Juncus L.), and the TNU for the usage of Juncus
within Buckl. links to the TNU of Juncus L. (the Protonym).

So, that's three TNUs:
1. Juncus L. sec. L. (Protonym for the genus Juncus)
2. Juncus L. sec. Buckl. (links to 1 as ProtonymID)
3. diffusissimus Buckl. sec. Buckl. (Protonym for the species diffusissimus,
links to 2 as parent)

There are also two more TNUs linked to the Urbatsch 2009 publication:
4. Juncus L. sec. Urbatsch (links to 1 as ProtonymID)
5. diffusissimus Buckl. sec. Urbatsch (links to 3 as ProtonymID, and to 4 as
parent)
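
For what it's worth, the five TNUs above could be sketched as plain records, as below.  This is only an illustration in Python, not the actual GNUB schema: the field names (`protonym_id`, `parent_id`), the integer IDs, and the two helper functions are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TNU:
    tnu_id: int                        # hypothetical local ID; a real system would use a GUID
    name: str                          # the name element as used (genus name or epithet)
    name_author: str                   # author of the name
    sec: str                           # the Reference in which this usage occurs
    protonym_id: Optional[int] = None  # None means this TNU is itself the Protonym
    parent_id: Optional[int] = None    # TNU of the parent name within the same Reference

tnus = {
    1: TNU(1, "Juncus", "L.", "L."),
    2: TNU(2, "Juncus", "L.", "Buckl.", protonym_id=1),
    3: TNU(3, "diffusissimus", "Buckl.", "Buckl.", parent_id=2),
    4: TNU(4, "Juncus", "L.", "Urbatsch", protonym_id=1),
    5: TNU(5, "diffusissimus", "Buckl.", "Urbatsch", protonym_id=3, parent_id=4),
}

def resolve_protonym(tnu_id):
    """Follow the ProtonymID link, or return the TNU itself if it is the Protonym."""
    t = tnus[tnu_id]
    return t if t.protonym_id is None else tnus[t.protonym_id]

def label(tnu_id):
    """Render a TNU in the 'name author sec. reference' form used in the list above."""
    t = tnus[tnu_id]
    return f"{t.name} {t.name_author} sec. {t.sec}"
```

With these records, `resolve_protonym(5)` returns TNU 3, and following `parent_id` from TNU 5 leads to TNU 4, the usage of Juncus within the same Urbatsch Reference.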

> Now let's say that L. Urbatsch publishes a paper describing in detail her 
> concept of Juncus diffusissimus Buckl.  

Do you mean the 2009 paper of "Juncus diffusissimus Buckl. sec L. Urbatsch
2009"? (In which case we have the TNUs as 4 & 5).... or do you mean a later
paper (2010)?  In which case we'd need two more TNUs. Is the "2009" thing a
specimen determination, or a publication?  It doesn't matter -- I just want
to make sure I'm following your example correctly.

> We can now assign the resource identified by the URI a 
> tc:accordingTo property whose value is the DOI of the paper she wrote.  

That would be a property of the TNU record (#4 & #5) -- assuming the paper
she wrote is the 2009 paper.

> If we want, we can replace the previous rdf:type statement 
> with different one stating that the resource is a taxon concept 
> rather than a TNU, or if we believe that all taxon concepts are 
> also TNUs we can leave the rdf:type statement that we had 
> before and just add a second one saying that the resource is 
> also of type taxon concept.  

Yes -- this space has not yet been clearly defined.  We could either say the
well-defined concept TNU *is* the concept (i.e., use the TNU GUID to
represent the concept itself).  Or, we could create another GUID for the
concept, and anchor that concept to the TNU as the "original definition" (or
whatever you want to call it).  I've been mostly focused on using GNUB TNUs
for nomenclatural things, so I haven't thought that far into it yet for
concepts.  The TNUs will certainly play a fundamental role; but it's not
clear to me whether they should be regarded as the concept, or as a property
of the concept.

> The point I'm trying to make is that as long as this "thing" 
> that we are variously calling "taxon name usage", "taxon concept", 
> "shallow taxonomic concept", or "deep taxonomic concept" 
> can be assigned an identifier, what really matters is the metadata 
> we associate with it, not really what we call it.  

Absolutely!  With emphasis on "really matters".  It's certainly important to
mint persistent identifiers, but it's equally (more?) important to make sure
we understand the "thing" the identifier represents.  We have a pretty
stable & solid definition of a TNU, that has emerged from many NOMINA
meetings over a number of years.  What I don't think has been done yet, is
any real analysis of whether the appropriate subset of TNUs accompanied by a
robust concept definition *are* the concepts (i.e., the TNU GUID is the
concept GUID); or whether there should be a separate GUID minted explicitly
for the concept, which then links back to the TNU GUID as its "definition"
or "source" (or whatever).  I can see it working both ways.  
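
The two options can be put side by side in a small sketch.  Again, this is just an illustration in Python under my own assumptions (the `Concept` record, its field names, and the two constructor functions are hypothetical), not a decision about how it should be done.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    concept_id: str      # identifier used to refer to the concept
    definition_tnu: str  # GUID of the TNU that carries the concept definition

def concept_as_tnu(tnu_guid):
    """Option 1: the TNU GUID doubles as the concept GUID."""
    return Concept(concept_id=tnu_guid, definition_tnu=tnu_guid)

def concept_with_own_guid(concept_guid, tnu_guid):
    """Option 2: a separate concept GUID that links back to the defining TNU."""
    return Concept(concept_id=concept_guid, definition_tnu=tnu_guid)
```

Either way the TNU stays in the picture; the only difference is whether a second identifier is minted for the concept.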

> The more metadata that we can connect with it, either through datatype
> properties like name strings or object properties that describe how the
> "thing" is related to other resources, the "deeper" the concept.  On the
> other extreme, we may know nothing more than the name string.  In that
> case we could call it a "nominal concept", but we could still assign it an
> identifier and maybe with luck we could associate more metadata with it
> (make it "deeper") at some point in the future.

Yes!

> I am going to be bold and say that we already have the minimum 
> tools required to get started implementing TNUs/TaxonConcepts: 
> - URI GUIDs (which if one preferred could be UUIDs or  LSIDs -- 
> HTTP proxied to make Linked Data people happy; see the TDWG 
> GUID Applicability Statement standard if you don't know how to do this) 
> to identify the TNU/concepts, 
> - the two terms tc:hasName and tc:accordingTo (from the TDWG Taxon
> Concept ontology) to relate the TNUs/TaxonConcepts to names and sec.
> references, and
> - some sources for name and publication URI GUIDs.  
> There are deficiencies all over the place for that last item, but 
> they can be addressed over time by improving the scope of the 
> relevant databases and the quality of the metadata provided.  
> uBio has URIs for almost every name I've ever looked for.  
> BHL has a growing collection of old literature which has been assigned
> identifiers by Rod Page's BioStor, new literature usually has an assigned,
> dereferenceable proxied DOI, and one can even make valid URIs from
> ISBNs of books (although they aren't resolvable).  I'm not sure how one
> should address the situation where the "sec." reference of a TNU is a
> person and date since there isn't a standard database of people (as far as
> I know).
> But that could be remedied.  Ultimately, one could create the kinds of
> mapping tools that Nico and Rich are talking about which relate different
> taxon concepts/TNUs which have set theory relationships.  Whether that
> would be done with RDF, OWL, or something completely different I don't
> know, but the basic anchoring of persistent identifiers for the
> TNU/concepts to the names and sec. references wouldn't have to wait on
> that.  We could also get hung up about what terms to use to express the
> metadata describing the basic TNU/name/sec. resources, but there is
> nothing that says that metadata can't change or be improved over time.
> It's the identifier that shouldn't change.
> 
> Am I wrong about this???

Not wrong at all, in my mind.  This is the framework we're following right
now as we start to build these things.
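
As a rough sketch of that minimum, the snippet below mints an opaque UUID-based URI for a TNU and attaches the two properties as simple subject/predicate/object tuples.  It is in Python with no RDF library, and everything concrete in it is an assumption for illustration: the `http://example.org/` base, the name and reference URIs, and the function names are placeholders, not real services or an agreed serialization.

```python
import uuid

# Placeholder HTTP proxy base; not a real resolver.
BASE = "http://example.org/tnu/"

def mint_tnu_uri():
    """Mint an opaque, UUID-based URI to serve as the persistent identifier."""
    return BASE + str(uuid.uuid4())

def describe_tnu(tnu_uri, name_uri, reference_uri):
    """Return the minimal metadata for a TNU as (subject, predicate, object) tuples."""
    return [
        (tnu_uri, "tc:hasName", name_uri),
        (tnu_uri, "tc:accordingTo", reference_uri),
    ]

tnu_uri = mint_tnu_uri()
triples = describe_tnu(
    tnu_uri,
    "http://example.org/name/juncus-diffusissimus",  # hypothetical name URI
    "http://example.org/reference/urbatsch-2009",    # hypothetical reference URI
)
```

More tuples can be appended to `triples` later as the metadata improves; `tnu_uri` is the part that should never change.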

One comment on this bit:

"I'm not sure how one should address the situation where the "sec."
reference of a TNU is a person and date since there isn't a standard
database of people (as far as I know)."

In the GNUB Model, Person(s) + Date = Reference.  A Reference is the link to
the "documentation" (i.e., the place where the TNU occurred).  A Reference
can be (and usually is) a publication, but it can also be a notebook,
specimen label, etc., etc.  All References have "Authors" and a date, so a
Reference yields Person(s) + Date.  Think of it like a determination label
for a specimen being a micro-publication; and then everything else is easy
to model.

So, regardless of whether "Urbatsch 2009" refers to a determination label or
a published monograph, it's still the basis for the TNUs 4 & 5 in the
examples above.
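
A small sketch of the Person(s) + Date = Reference idea, in Python.  The `Reference` record, its field names, and the `kind` labels are assumptions made up for illustration rather than the actual GNUB tables.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Reference:
    authors: Tuple[str, ...]  # the Person(s) responsible for the documentation
    year: int                 # the date
    kind: str                 # assumed labels: "publication", "specimen label", "notebook"

    def sec(self):
        """Render the Person(s) + Date form used after 'sec.' in a TNU label."""
        return f"{' & '.join(self.authors)} {self.year}"

det_label = Reference(("L. Urbatsch",), 2009, "specimen label")
monograph = Reference(("L. Urbatsch",), 2009, "publication")
```

Both records render as "L. Urbatsch 2009", which is why the TNUs can be built the same way in either case; only the kind of documentation differs.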

Aloha,
Rich