Re: [tdwg-tag] [tdwg-rdf: 105] Re: Any TCS users with experiences to report?
I read Rich's email as quoted in Nico's reply - I think maybe Rich's post didn't actually go out on the tdwg-tag or RDF group lists. Rich mentions that he is swamped and will reply later. For the moment it may be helpful to cite an earlier email of Rich's which it took me some time to dig out of the tdwg-content email list:
http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001703.html
In that post, Rich was responding to a thread that started when I asked how one would handle a real-life situation (the specimen pictured in http://images.cyberfloralouisiana.com/images/specimensheets/lsu/0/0/4/28/LSU...). The relevant part begins about half way down the page with "In the web example given by Steve, we have... ". In that section, Rich notes that
"Eventually, a third party may be able to deduce (perhaps through a suite of other, external information) a RelationshipAssertion that maps the TNU "[Juncus] diffusissimus Buckl. sec L. Urbatsch 2009" to some other, perhaps published and well-defined taxon concept (of the same or different name). Also, if there are 100 specimens in the collection that L. Urbatsch identified as "Juncus diffusissimus Buckl." in 2009, then anchoring all 100 Identification instances to the one TNU, allows all of those specimens to inherit the mapping of the one "[Juncus] diffusissimus Buckl. sec L. Urbatsch 2009" TNU instance to some other better-defined taxon concept."
From that post, I understood that a TNU (a.k.a. "assertion" in Pyle 2004 http://systbio.org/files/phyloinformatics/1.pdf) can be as vague as an idea that some determiner had in his/her head about how organism/specimen instances should be mapped to a name. I think from what Rich said there that there is the potential that we as metadata aggregators may at some later point be able to map how that idea in the determiner's head fits in with a more well-defined (e.g. published) taxon description which one may choose to call a taxon concept rather than a TNU.
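The anchoring idea in the quoted passage can be sketched with a simple in-memory model. Every Identification points at one TNU, and a single third-party RelationshipAssertion maps that TNU to a better-defined concept. All identifications anchored to the TNU then pick up the mapping without being edited themselves. The identifiers, the relation label `congruent`, and the target concept are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Identification:
    specimen_id: str
    tnu_id: str  # the TNU this determination is anchored to

@dataclass(frozen=True)
class RelationshipAssertion:
    subject_tnu: str
    relation: str        # hypothetical relation label, e.g. "congruent"
    object_concept: str  # a better-defined (e.g. published) taxon concept
    asserted_by: str

# One TNU for "[Juncus] diffusissimus Buckl. sec L. Urbatsch 2009" (hypothetical ID)
URBATSCH_TNU = "tnu:urbatsch-2009-diffusissimus"

# 100 specimens determined by L. Urbatsch in 2009, all anchored to the same TNU
identifications = [Identification(f"LSU-{n:04d}", URBATSCH_TNU) for n in range(1, 101)]

# A single mapping made later by a third party (hypothetical target concept)
assertions = [
    RelationshipAssertion(URBATSCH_TNU, "congruent",
                          "concept:juncus-diffusissimus-sec-published-revision",
                          "third party"),
]

def inherited_mappings(ident: Identification,
                       assertions: List[RelationshipAssertion]) -> List[RelationshipAssertion]:
    """Return every assertion made about the TNU this identification points at."""
    return [a for a in assertions if a.subject_tnu == ident.tnu_id]

mapped = sum(1 for i in identifications if inherited_mappings(i, assertions))
print(mapped)  # 100
```

Only one assertion is stored, yet every one of the 100 identifications resolves to the mapped concept through the shared TNU.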
As so often is the case, I think the problem here boils down to identifiers and the metadata that we associate with them. Let's say in the real-life example above, somebody (we can say GNUB) assigns a persistent identifier (perhaps a URI constructed from an opaque UUID) to "Juncus diffusissimus Buckl. sec L. Urbatsch 2009". We could say with an rdf:type statement that the resource identified by the URI is a TNU. We can give that resource a tc:hasName property linking it to the name which is represented by the string "Juncus diffusissimus Buckl.". (I'm not sure what property we use to say that L. Urbatsch made the assertion.) Now let's say that L. Urbatsch publishes a paper describing in detail her concept of Juncus diffusissimus Buckl. We can now assign the resource identified by the URI a tc:accordingTo property whose value is the DOI of the paper she wrote. If we want, we can replace the previous rdf:type statement with a different one stating that the resource is a taxon concept rather than a TNU, or if we believe that all taxon concepts are also TNUs we can leave the rdf:type statement that we had before and just add a second one saying that the resource is also of type taxon concept.
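The scenario above can be sketched as plain (subject, predicate, object) triples. This is a minimal illustration, not GNUB's implementation. The base URI, the name URI, the DOI and the class `ex:TaxonNameUsage` are placeholders, CURIEs stand in for full URIs, and `tc:TaxonConcept` is assumed to be the class used for the second rdf:type. What it shows is that the URI stays fixed while the metadata attached to it grows.

```python
import uuid

# Hypothetical base URI; in practice the minting service (e.g. GNUB) would choose this.
BASE = "http://example.org/tnu/"

def mint_uri() -> str:
    """Mint a persistent HTTP URI from an opaque UUID."""
    return BASE + str(uuid.uuid4())

tnu = mint_uri()

# Triples are (subject, predicate, object); CURIEs stand in for full URIs.
graph = {
    (tnu, "rdf:type", "ex:TaxonNameUsage"),  # hypothetical TNU class
    (tnu, "tc:hasName", "http://example.org/names/juncus-diffusissimus-buckl"),  # placeholder
}

# Later, Urbatsch publishes a detailed treatment: attach the sec. reference.
graph.add((tnu, "tc:accordingTo", "https://doi.org/10.0000/placeholder"))  # placeholder DOI

# Keep the TNU type and add a second type (the "all concepts are also TNUs" option).
graph.add((tnu, "rdf:type", "tc:TaxonConcept"))

types = sorted(o for s, p, o in graph if s == tnu and p == "rdf:type")
print(types)  # ['ex:TaxonNameUsage', 'tc:TaxonConcept']
```

Choosing the other option (replacing the type) would mean removing the `ex:TaxonNameUsage` triple instead of keeping it; either way the subject URI is untouched.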
The point I'm trying to make is that as long as this "thing" that we are variously calling "taxon name usage", "taxon concept", "shallow taxonomic concept", or "deep taxonomic concept" can be assigned an identifier, what really matters is the metadata we associate with it, not really what we call it. The more metadata that we can connect with it, either through datatype properties like name strings or object properties that describe how the "thing" is related to other resources, the "deeper" the concept. On the other extreme, we may know nothing more than the name string. In that case we could call it a "nominal concept", but we could still assign it an identifier and maybe with luck we could associate more metadata with it (make it "deeper") at some point in the future.
Returning to the original question of the thread (which was about the utility of TCS), TCS tries to deal with this problem using a thing called "signatures" (section 17.2, see http://bioimages.vanderbilt.edu/pages/TCS-Schema-UserGuide-v1.3.pdf) which are a somewhat crude attempt to make identifying strings unique by standardizing their format. However, TCS was written in 2005-2006. Since then, the development of DOIs, the TDWG GUID Applicability Statement standard, and best practices in the Linked Data world have provided well-established and standardized ways to create persistent and dereferenceable identifiers. So there isn't any reason I can see why we can't use them.
I am going to be bold and say that we already have the minimum tools required to get started implementing TNUs/TaxonConcepts:
- URI GUIDs (which if one preferred could be UUIDs or LSIDs -- HTTP proxied to make Linked Data people happy; see the TDWG GUID Applicability Statement standard if you don't know how to do this) to identify the TNU/concepts,
- the two terms tc:hasName and tc:accordingTo (from the TDWG Taxon Concept ontology) to relate the TNUs/TaxonConcepts to names and sec. references, and
- some sources for name and publication URI GUIDs.
There are deficiencies all over the place for that last item, but they can be addressed over time by improving the scope of the relevant databases and the quality of the metadata provided. uBio has URIs for almost every name I've ever looked for. BHL has a growing collection of old literature which has been assigned identifiers by Rod Page's BioStor; new literature usually has an assigned, dereferenceable proxied DOI; and one can even make valid URIs from ISBNs of books (although they aren't resolvable). I'm not sure how one should address the situation where the "sec." reference of a TNU is a person and date since there isn't a standard database of people (as far as I know). But that could be remedied.
Ultimately, one could create the kinds of mapping tools that Nico and Rich are talking about which relate different taxon concepts/TNUs which have set theory relationships. Whether that would be done with RDF, OWL, or something completely different I don't know, but the basic anchoring of persistent identifiers for the TNU/concepts to the names and sec. references wouldn't have to wait on that. We could also get hung up about what terms to use to express the metadata describing the basic TNU/name/sec. resources, but there is nothing that says that metadata can't change or be improved over time. It's the identifier that shouldn't change.
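The identifier plumbing in the list above amounts to a few string transformations. The sketch below assumes an LSID HTTP proxy at `http://lsid.tdwg.org/` and a local UUID namespace at `http://example.org/id/`. Both are illustrative, so substitute whatever resolvers your community actually uses. The `urn:isbn:` form follows RFC 3187 and, as noted, is a valid URI that does not resolve.

```python
import uuid

# Assumed resolver bases; replace with whatever your community actually uses.
UUID_BASE = "http://example.org/id/"   # hypothetical local namespace
LSID_PROXY = "http://lsid.tdwg.org/"   # assumption: an HTTP proxy for LSIDs
DOI_PROXY = "https://doi.org/"

def uri_from_uuid(u: uuid.UUID) -> str:
    """Wrap an opaque UUID in an HTTP URI under a local namespace."""
    return UUID_BASE + str(u)

def proxied_lsid(lsid: str) -> str:
    """Make an LSID dereferenceable over HTTP by prefixing a proxy."""
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError("not an LSID: " + lsid)
    return LSID_PROXY + lsid

def proxied_doi(doi: str) -> str:
    """Resolvable HTTP form of a DOI."""
    return DOI_PROXY + doi

def isbn_urn(isbn: str) -> str:
    """Valid but non-resolvable URI for a book (RFC 3187); hyphens and spaces stripped."""
    return "urn:isbn:" + isbn.replace("-", "").replace(" ", "")

print(isbn_urn("0-306-40615-2"))   # urn:isbn:0306406152
print(proxied_doi("10.1000/182"))  # https://doi.org/10.1000/182
```

Stripping hyphens from the ISBN is a normalization choice made here so that the same book always yields the same URI string; RFC 3187 itself also permits the hyphenated form.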
Am I wrong about this???
Steve
Nico Franz wrote:
Thank you, Rich.
So we seem to agree on something like this:
Rich                      Nico
taxon name usage  <===>  "shallow" taxonomic concept
taxon concept     <===>  "deep" taxonomic concept
Both: labeling is via name sec. author.
Both: authoring concepts/usages vs. identifying to those => slippery issue; ideally requires proper speaker awareness.
Why the latter? - well, because (again) the desirable effect of using concepts - the desirable situation where these would have a justification that goes beyond just really meticulous data management and advances to the level of facilitating better science qua more precise taxonomic semantics - only obtains if a great number of name occurrences in a wide range of shallow-ish sources is linked via identification to a presumably smaller number of occurrences where those names are well defined and where successive definitions of names are semantically linked. So there needs to be an emerging culture of minimizing concept inflation. Otherwise we obtain what we have now (mostly just names) and on top of that add new baggage (lots of really shallow concepts) that nobody can do good semantics with.
Here is where I think we disagree, perhaps just in terms of sales strategy:
You seem to suggest that making an a priori distinction between TNUs and concepts (1) is possible in a good number of cases, (2) is desirable, perhaps in the form of a registry, and (3) is even necessary for building and populating databases, etc.
Here I disagree, for a number of reasons. First off, I do believe that not defining certain things too soon or too narrowly is sometimes actually really good science; on the other hand, doing so can be a show stopper if other people don't share this narrowness and find it limiting. Second, while we can perhaps readily agree that a lengthy monograph published in American Museum Novitates rises to the level of authoring new concepts whereas a label saying "Family Carabidae" on a specimen submitted as part of an insect student collection does not, there are enough in-between cases where only time will tell.
Example: USDA Plants promotes a particular perspective of groundcherry taxonomy, genus-level concept Physalis - http://plants.usda.gov/java/profile?symbol=physa - with some 29 species-level concepts recognized. ASU's herbarium curator Les Landrum is a bit of a groundcherry nerd (I say this with admiration). If you go here: http://swbiodiversity.org/seinet/index.php, then Search Collections => Select All => Next => Scientific Name = Physalis => Search, you get some 3700 pertinent specimen records. If you then switch to the Species List tab, you see 115 concepts listed overall. Switching to the USDA Plants Thesaurus will give you only 46 concepts that these 3700 specimens are mapped to. Using instead the ASU Taxonomic Thesaurus will yield 89 concepts linking variously to those specimens. This is based on Les' classification of groundcherries, which is not further documented in the SEINet environment at this moment.
Now, saying a priori whether Les' list represents a set of TNUs versus concepts would presumably require you to assert that there is nobody who is Les or very much like him that can provide a semantically very accurate mapping of the 89 name usages in the SEINet-ASU Physalis list to the much more thoroughly circumscribed USDA Plants concepts. That could seem like a daring prediction given how little Les might think of the USDA perspective. At the very moment that Les or someone very much like him DOES provide the mapping, what looked like a list of TNUs all of a sudden acquires - via the mapping - a much deeper semantic status where others can readily go from one classification to the next, even though each comes with very different amounts of information in its original appearance. Some people may prefer Les' concepts at least for Arizonan groundcherries, and in either case, the mapping puts both on an even playing field.
So this exemplifies IMO why so far the concept approach has been too abstract, the TCN has been too depauperate on the relationships/mapping side (worrying instead, almost needlessly, about what constitutes a concept per se), and definitions between identifications, name usages, shallow and deep concepts have been too abstract as well. I believe we should focus less discussion on those issues and put more emphasis on building mapping tools that can carry a wide range of input and logically infer additional implied mappings from the initial expert-given set. The actual semantic properties of that input will emerge a posteriori and will be hard to predict in some cases. Some descriptions are lengthy but nobody understands them. Some name lists are profoundly informative if the context of their origin is well known to an expert.
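The "infer additional implied mappings from the initial expert-given set" step can be illustrated with a small closure over expert assertions. This is a deliberately reduced sketch: it handles only congruence (`==`) and proper inclusion (`<`, `>`), leaving out overlap and exclusion from the full RCC-5 set-theory algebra, and the concept labels are hypothetical.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (concept A, relation, concept B)

INVERSE = {"==": "==", "<": ">", ">": "<"}

def infer(assertions: Set[Fact]) -> Set[Fact]:
    """Close a set of facts under a few sound rules.

    "==" congruent, "<" properly included in, ">" properly includes.
    Rules: inverses; congruence is substitutable on either side;
    "<" and ">" are each transitive. "<" followed by ">" implies nothing.
    """
    facts = set(assertions)
    while True:
        new = set()
        for a, r, b in facts:
            new.add((b, INVERSE[r], a))
            for b2, r2, c in facts:
                if b2 != b or a == c:
                    continue
                if r == "==":
                    new.add((a, r2, c))
                elif r2 == "==" or r == r2:
                    new.add((a, r, c))
        if new <= facts:
            return facts
        facts |= new

# Hypothetical expert-given mappings between two classifications
expert = {
    ("Physalis sec. ASU", "==", "Physalis sec. USDA"),
    ("taxon X sec. ASU", "<", "Physalis sec. ASU"),
}
implied = infer(expert) - expert
print(len(implied))  # 4
```

Among the implied facts is `("taxon X sec. ASU", "<", "Physalis sec. USDA")`: nobody asserted it, but it follows from the two expert mappings, which is the kind of payoff a mapping tool would deliver.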
There will be some obvious overreaches in both directions (too many unconnected items, some items that are connected more precisely than their inherent information would seem to justify). I think these overreaches would be tolerable. What's less productive to me is a restrictive set of definitions that provides an early blockage in the way towards an environment where mapping is supposed to occur very frequently. We're not at the registry stage yet; we're more at the "can this work in principle" stage. As I mentioned before, the mappings ARE the concepts under a certain viewpoint. We don't want to pre-determine their fate by separating TNUs from concepts in a great number of cases.
I hope this was not a misrepresentation of your view and also a clarification of my view. In the end, we both advocate some sort of balance for the same concerns, but perhaps disagree only strategically about the moment where/when that balance will materialize - upfront via precise definitions and registration or later on via the presence/lack of actual mappings.
Best,
Nico
On Mon, Nov 26, 2012 at 5:18 PM, Richard Pyle <deepreef@bishopmuseum.org> wrote:
I want to get into this topic in more detail (going back to Steve’s original post), but this week is hell-week for me, so only a quick comment now. I generally agree with everything Nico says, but I think we need to be a little more clear about what we mean by “name sec. author”.
The data model we’ve been building towards (GNUB, which underlies ZooBank) uses as its fundamental unit something we’ve been calling a “Taxon Name Usage Instance” (TNU). The scope of what can be a TNU is intentionally very broad: anything from an original taxon name description, to a mention in a newspaper article, and potentially even a scribbled hand-written label or letter. The only requirement is that it be static, that is, a snapshot in time. I mention this because database records can be represented as TNUs, but only as a static snapshot of the record. If the essence of the database record changes over time (e.g., due to changing taxonomic opinion), then a new TNU is generated for a different snapshot in time.
A very small subset of the universe of TNUs represent Code-governed Nomenclatural Acts (original descriptions of new names and other Code-governed nomenclatural actions). In the case of such TNUs involving the ICZN Code (for example), the TNUs are registered in ZooBank. But the point is, one subset of all TNUs consists of those that involve actions governed by a Code of nomenclature.
The reason I mention this is that, if I read Nico’s email correctly, I think he’s saying that not all TNUs de facto represent taxon concepts. Rather, analogous to the nomenclatural subset of TNUs, there is a subset of TNUs that rise to the level of representing Taxon Concept definitions. In the case of nomenclatural acts, someone must make some sort of declaration (assertion) that a specific TNU constitutes a Code-governed nomenclatural act, along with relevant metadata relating to that assertion and the nature of the Act.
In the case of zoological names, ZooBank is intended to facilitate this role (i.e., when a person registers a TNU in ZooBank, there is an implied assertion that the TNU represents a nomenclatural act under the ICZN Code). What would be nice to have (and what TDWG could play a helpful role in facilitating) is a registry of sorts (analogous to ZooBank) for those TNUs that represent taxon concepts. In other words, a mechanism for people to “register” the subset of all TNUs that represent taxon concepts. Secondarily, there would also be a mechanism to make assertions about how registered taxon concepts map to each other (via some sort of set theory relationship[s]).
In summary, my points are:
1) We should be clear when we say “name sec. author” whether we mean it sensu lato (i.e., all TNUs) or sensu stricto (i.e., only those TNUs that rise to the level of representing taxon concepts).
2) There ought to be a registry (perhaps administered by CoL?) for identifying the subset of TNUs that represent concept definitions, and it should include a mechanism for making set-theory relationship assertions among registered concept-TNUs.
3) The two things mentioned in #2 should be separate; that is, one can assert that a particular TNU represents a taxon concept separately from (potentially multiple) assertions about how that taxon concept relates to other taxon concepts.
More later.
Aloha,
Rich
P.S. By my standards that WAS quick!
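Points 2 and 3 can be sketched as a registry with two independent stores. One store records which TNUs have been registered as concept definitions; the other holds any number of relationship assertions among them. Everything here (class names, relation labels, the check that both ends are registered) is a hypothetical illustration of the proposal, not an existing service.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass(frozen=True)
class RelationshipAssertion:
    source: str       # a registered concept-TNU
    relation: str     # e.g. a set-theory relation such as "includes"
    target: str       # another registered concept-TNU
    asserted_by: str

@dataclass
class ConceptRegistry:
    """Hypothetical registry keeping registration and mapping separate."""
    concept_tnus: Dict[str, str] = field(default_factory=dict)  # TNU id -> registrant
    assertions: List[RelationshipAssertion] = field(default_factory=list)

    def register(self, tnu_id: str, registrant: str) -> None:
        """Assert that an existing TNU represents a taxon concept definition."""
        self.concept_tnus.setdefault(tnu_id, registrant)

    def assert_relationship(self, source: str, relation: str,
                            target: str, asserted_by: str) -> None:
        """Record one party's view of how two registered concepts relate."""
        for tnu_id in (source, target):
            if tnu_id not in self.concept_tnus:
                raise KeyError(f"{tnu_id} is not registered as a concept TNU")
        self.assertions.append(RelationshipAssertion(source, relation, target, asserted_by))

    def relationships_of(self, tnu_id: str) -> List[RelationshipAssertion]:
        return [a for a in self.assertions if tnu_id in (a.source, a.target)]

registry = ConceptRegistry()
registry.register("tnu:A", "author of A")
registry.register("tnu:B", "author of B")
# Two parties may disagree; both assertions are kept side by side.
registry.assert_relationship("tnu:A", "congruent", "tnu:B", "curator 1")
registry.assert_relationship("tnu:A", "includes", "tnu:B", "curator 2")
print(len(registry.relationships_of("tnu:B")))  # 2
```

Because registration never mentions relationships, a concept can be registered once and then accumulate (possibly conflicting) mappings from many parties over time.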
In that section, Rich notes that
"Eventually, a third party may be able to deduce (perhaps through a suite of other, external information) a RelationshipAssertion that maps the TNU "[Juncus] diffusissimus Buckl. sec L. Urbatsch 2009" to some other, perhaps published and well-defined taxon concept (of the same or different name). Also, if there are 100 specimens in the collection that L. Urbatsch identified as "Juncus diffusissimus Buckl." in 2009, then anchoring all 100 Identification instances to the one TNU, allows all of those specimens to inherit the mapping of the one "[Juncus] diffusissimus Buckl. sec L. Urbatsch 2009" TNU instance to some other better-defined taxon concept."
From that post, I understood that a TNU (a.k.a. "assertion" in Pyle 2004 http://systbio.org/files/phyloinformatics/1.pdf) can be as vague as an idea that some determiner had in his/her head about how organism/specimen instances should be mapped to a name.
Yes, a TNU can be that simple. Basically, a TNU exists whenever someone documents a scientific name. That doesn't mean that all of these will be entered into a database. But the intention was to scope TNUs to be wide open to any form of documentation (including a specimen label), in case someone has a need to record the fact that somebody used a particular name in a particular way (useful, for example, if you want to track so-called "manuscript names" that only exist on specimen labels).
The problem is, if you have a broad scope for what a TNU can be, then there is no guarantee that the TNU is accompanied by a taxon concept definition. Certainly a specimen label is not (it applies to only one specimen). Neither is, for example, a published type catalog (which often records names, without implied concepts). And then there are things like newspaper articles, where perhaps there is a concept somewhere in there, but it's too ambiguous to pin down to any particular concept. However, some TNUs (e.g., treatments in revisionary monographs) certainly are accompanied by well-defined concept definitions. And those are the ones that we'd like to see become "anchor-points" to taxon concepts, against which the other (non-concept-bearing) TNUs can be mapped (as asserted by a third party, or in some cases by the first party).
I think from what Rich said there that there is the potential that we as metadata aggregators may at some later point be able to map how that idea in the determiner's head fits in with a more well-defined (e.g. published) taxon description which one may choose to call a taxon concept rather than a TNU.
That was certainly the intention, yes. Whether or not that will ever happen with most (many?) specimens remains to be seen. Certainly it can happen in some cases. For example, in our collection we often have visitors come through and study our holdings of a particular genus or a particular family. Often they will make determinations about the specimens they examine. Some time later, they'll publish a revision of the group. If they include a comprehensive "material examined" section, then there's an explicit relationship between the specimen and the publication that allows direct mapping of the specimen to the well-defined concept. However, if there is not a comprehensive material examined section, but there is a determination label by the same person in the jar (so you know the person examined it and identified it as part of the work that went behind the published revision), then the collection manager could create a TNU for the determination label, then map that determination-TNU to the publication's TNU (a RelationshipAssertion by the third-party collection manager).
I'm not saying this ever will happen, or even should happen. But we've developed the data model such that it *could* happen, if it turns out to be a useful mechanism for mapping specimens to published taxon concepts.
As so often is the case, I think the problem here boils down to identifiers and the metadata that we associate with them.
Absolutely!!! This is often not intuitive stuff, so the tricky part is getting people to apply identifiers and cross-link them in an appropriate and consistent way.
Let's say in the real-life example above, somebody (we can say GNUB) assigns a persistent identifier (perhaps a URI constructed from an opaque UUID) to "Juncus diffusissimus Buckl. sec L. Urbatsch 2009". We could say with an rdf:type statement that the resource identified by the URI is a TNU. We can give that resource a tc:hasName property linking it to the name which is represented by the string "Juncus diffusissimus Buckl.". (I'm not sure what property we use to say that L. Urbatsch made the assertion.)
I'm not sure how you would do it in RDF, but if it's any help the relevant DwC term is taxon:nameAccordingToID.
It's certainly part of the metadata for the TNU itself:
There is a TNU for diffusissimus within the Reference of Buckl. This is the original description of the epithet, so it is also the Protonym for diffusissimus. There is also a TNU for the genus name Juncus as used within the Reference of Buckl. If, in the same publication, Buckl. also established that genus, then the genus would also be the Protonym TNU. However, the genus Juncus was established by L., so there is another TNU for Juncus that is the protonym (Juncus L.), and the TNU for the usage of Juncus within Buckl. links to the TNU of Juncus L. (the Protonym).
So, that's three TNUs:
1. Juncus L. sec. L. (Protonym for the genus Juncus)
2. Juncus L. sec. Buckl. (links to 1 as ProtonymID)
3. diffusissimus Buckl. sec. Buckl. (Protonym for the species diffusissimus, links to 2 as parent)
There are also two more TNUs linked to the Urbatsch 2009 publication:
4. Juncus L. sec. Urbatsch (links to 1 as ProtonymID)
5. diffusissimus Buckl. sec. Urbatsch (links to 3 as ProtonymID, and to 4 as parent)
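The five TNUs above can be modeled with two self-references, one to the protonym and one to the parent usage. This is a reading of the description above, not the actual GNUB schema; the field names `protonym_id` and `parent_id` are hypothetical stand-ins for ProtonymID and the parent link.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class TNU:
    tnu_id: int
    label: str
    protonym_id: Optional[int] = None  # None: this TNU is itself the protonym
    parent_id: Optional[int] = None    # usage of the parent name in the same Reference

TNUS: Dict[int, TNU] = {t.tnu_id: t for t in [
    TNU(1, "Juncus L. sec. L."),
    TNU(2, "Juncus L. sec. Buckl.", protonym_id=1),
    TNU(3, "diffusissimus Buckl. sec. Buckl.", parent_id=2),
    TNU(4, "Juncus L. sec. Urbatsch", protonym_id=1),
    TNU(5, "diffusissimus Buckl. sec. Urbatsch", protonym_id=3, parent_id=4),
]}

def protonym(tnu_id: int) -> TNU:
    """Return the TNU that established the name (no ProtonymID means it is its own protonym)."""
    t = TNUS[tnu_id]
    return t if t.protonym_id is None else TNUS[t.protonym_id]

def name_as_used(tnu_id: int) -> str:
    """Follow parent links to rebuild the combination as used in that TNU's Reference."""
    parts = []
    t: Optional[TNU] = TNUS[tnu_id]
    while t is not None:
        parts.append(t.label.split()[0])  # first word of the label is the name element
        t = TNUS[t.parent_id] if t.parent_id is not None else None
    return " ".join(reversed(parts))

print(name_as_used(5))     # Juncus diffusissimus
print(protonym(5).label)   # diffusissimus Buckl. sec. Buckl.
print(protonym(4).tnu_id)  # 1
```

Storing links rather than name strings is what lets the Urbatsch usages (4 and 5) reach the original descriptions without duplicating them.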
Now let's say that L. Urbatsch publishes a paper describing in detail her concept of Juncus diffusissimus Buckl.
Do you mean the 2009 paper of "Juncus diffusissimus Buckl. sec L. Urbatsch 2009"? (In which case we have the TNUs as 4 & 5).... or do you mean a later paper (2010)? In which case we'd need two more TNUs. Is the "2009" thing a specimen determination, or a publication? It doesn't matter -- I just want to make sure I'm following your example correctly.
We can now assign the resource identified by the URI a tc:accordingTo property whose value is the DOI of the paper she wrote.
That would be a property of the TNU record (#4 & #5) -- assuming the paper she wrote is the 2009 paper.
If we want, we can replace the previous rdf:type statement with a different one stating that the resource is a taxon concept rather than a TNU, or if we believe that all taxon concepts are also TNUs we can leave the rdf:type statement that we had before and just add a second one saying that the resource is also of type taxon concept.
Yes -- this space has not yet been clearly defined. We could either say the well-defined concept TNU *is* the concept (i.e., use the TNU GUID to represent the concept itself). Or, we could create another GUID for the concept, and anchor that concept to the TNU as the "original definition" (or whatever you want to call it). I've been mostly focused on using GNUB TNUs for nomenclatural things, so I haven't thought that far into it yet for concepts. The TNUs will certainly play a fundamental role; but it's not clear to me whether they should be regarded as the concept, or as a property of the concept.
The point I'm trying to make is that as long as this "thing" that we are variously calling "taxon name usage", "taxon concept", "shallow taxonomic concept", or "deep taxonomic concept" can be assigned an identifier, what really matters is the metadata we associate with it, not really what we call it.
Absolutely! With emphasis on "really matters". It's certainly important to mint persistent identifiers, but it's equally (more?) important to make sure we understand the "thing" the identifier represents. We have a pretty stable & solid definition of a TNU that has emerged from many NOMINA meetings over a number of years. What I don't think has been done yet is any real analysis of whether the appropriate subset of TNUs accompanied by a robust concept definition *are* the concepts (i.e., the TNU GUID is the concept GUID), or whether there should be a separate GUID minted explicitly for the concept, which then links back to the TNU GUID as its "definition" or "source" (or whatever). I can see it working both ways.
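The two options differ only in whether a second GUID is minted. A minimal sketch of both using plain triples; `ex:definedBy` is a hypothetical property standing in for "definition" or "source" (or whatever it ends up being called), `tc:TaxonConcept` is assumed as the concept class, and `urn:uuid:` is just one way to mint the extra identifier.

```python
import uuid
from typing import List, Tuple

Triple = Tuple[str, str, str]

def option_a(tnu_uri: str) -> Tuple[str, List[Triple]]:
    """The concept-bearing TNU *is* the concept: reuse the TNU GUID."""
    return tnu_uri, [(tnu_uri, "rdf:type", "tc:TaxonConcept")]

def option_b(tnu_uri: str) -> Tuple[str, List[Triple]]:
    """Mint a separate concept GUID that points back to the TNU as its definition."""
    concept_uri = "urn:uuid:" + str(uuid.uuid4())
    return concept_uri, [
        (concept_uri, "rdf:type", "tc:TaxonConcept"),
        (concept_uri, "ex:definedBy", tnu_uri),  # hypothetical property
    ]

tnu = "urn:uuid:00000000-0000-0000-0000-000000000005"  # placeholder TNU identifier
concept_a, triples_a = option_a(tnu)
concept_b, triples_b = option_b(tnu)
print(concept_a == tnu, concept_b == tnu)  # True False
```

Option A leaves a single identifier to cite; option B adds one link but would, in principle, let several concept records point at the same defining TNU. Which trade-off is preferable is exactly the open question above.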
The more metadata that we can connect with it, either through datatype properties like name strings or object properties that describe how the "thing" is related to other resources, the "deeper" the concept. On the other extreme, we may know nothing more than the name string. In that case we could call it a "nominal concept", but we could still assign it an identifier and maybe with luck we could associate more metadata with it (make it "deeper") at some point in the future.
Yes!
I am going to be bold and say that we already have the minimum tools required to get started implementing TNUs/TaxonConcepts:
- URI GUIDs (which if one preferred could be UUIDs or LSIDs -- HTTP proxied to make Linked Data people happy; see the TDWG GUID Applicability Statement standard if you don't know how to do this) to identify the TNU/concepts,
- the two terms tc:hasName and tc:accordingTo (from the TDWG Taxon Concept ontology) to relate the TNUs/TaxonConcepts to names and sec. references, and
- some sources for name and publication URI GUIDs.
There are deficiencies all over the place for that last item, but they can be addressed over time by improving the scope of the relevant databases and the quality of the metadata provided. uBio has URIs for almost every name I've ever looked for. BHL has a growing collection of old literature which has been assigned identifiers by Rod Page's BioStor; new literature usually has an assigned, dereferenceable proxied DOI; and one can even make valid URIs from ISBNs of books (although they aren't resolvable). I'm not sure how one should address the situation where the "sec." reference of a TNU is a person and date since there isn't a standard database of people (as far as I know). But that could be remedied.
Ultimately, one could create the kinds of mapping tools that Nico and Rich are talking about which relate different taxon concepts/TNUs which have set theory relationships. Whether that would be done with RDF, OWL, or something completely different I don't know, but the basic anchoring of persistent identifiers for the TNU/concepts to the names and sec. references wouldn't have to wait on that. We could also get hung up about what terms to use to express the metadata describing the basic TNU/name/sec. resources, but there is nothing that says that metadata can't change or be improved over time. It's the identifier that shouldn't change.
Am I wrong about this???
Not wrong at all, in my mind. This is the framework we're following right now as we start to build these things.
One comment on this bit:
" I'm not sure how one should address the situation where the "sec." reference of a TNU is a person and date since there isn't a standard database of people (as far as I know)."
In the GNUB Model, Person(s) + Date = Reference. A Reference is the link to the "documentation" (i.e., the place where the TNU occurred). A Reference can be (and usually is) a publication, but it can also be a notebook, specimen label, etc., etc. All References have "Authors" and a date, so a Reference yields Person(s) + Date. Think of it like a determination label for a specimen being a micro-publication; and then everything else is easy to model.
So, regardless of whether "Urbatsch 2009" refers to a determination label or a published monograph, it's still the basis for the TNUs 4 & 5 in the examples above.
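The "Person(s) + Date = Reference" rule can be sketched as a tiny record type. The class and field names are hypothetical; the point is that a determination label and a monograph are the same kind of object and yield the same short citation.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Reference:
    authors: Tuple[str, ...]  # Person(s)
    year: int                 # Date (year precision is enough for a citation)
    kind: str                 # "publication", "determination label", "notebook", ...

    def short_citation(self) -> str:
        """Build an author-year citation from the authors' surnames."""
        surnames = [a.split()[-1] for a in self.authors]
        if len(surnames) == 1:
            who = surnames[0]
        elif len(surnames) == 2:
            who = f"{surnames[0]} & {surnames[1]}"
        else:
            who = f"{surnames[0]} et al."
        return f"{who} {self.year}"

label = Reference(("L. Urbatsch",), 2009, "determination label")
monograph = Reference(("L. Urbatsch",), 2009, "publication")
print(label.short_citation())      # Urbatsch 2009
print(monograph.short_citation())  # Urbatsch 2009
```

Treating the label as a "micro-publication" means TNUs 4 and 5 can point at either Reference without any change to the TNU model itself.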
Aloha, Rich
To try and get my head around the various models of taxon names, usages, and concepts being discussed I've created a graph of my understanding of the data model underlying ZooBank http://iphylo.blogspot.co.uk/2012/11/zoobank-data-model.html . This may or may not reflect the actual situation, I'll leave that to Rich to comment on.
Regards
Rod
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
Oi Vey!
OK, I'm not sure whether it's best to respond to the email list, or comment on the post, so I'll do both.
First: there technically isn't a ZooBank data model. ZooBank isn't a database; it's a service built on the Global Names Usage Bank (GNUB) database. The GNUB database is MUCH broader in scope than ZooBank. ZooBank is only concerned with the specific subset of TNUs that involve nomenclatural acts as governed by the ICZN Code. There are many, many, many more TNUs that are not nomenclatural acts, and/or involve names outside the scope of Zoology.
Second, like many other database projects, we've focused our available time much more on doing, rather than documenting. However, it's becoming increasingly clear that we need much more documentation about the GNUB data model, and I'll try to bump that task up the priority list for the coming weeks. For now, I've uploaded four images to the ZooBank server showing the table relationships:
http://zoobank.org/images/TaxonNameUsageCluster.jpg
http://zoobank.org/images/ReferenceCluster.jpg
http://zoobank.org/images/AgentCluster.jpg
http://zoobank.org/images/CoreTables.jpg
The first of these will be the most helpful to this discussion; but the others are of potential interest. Also, this is the data model as it now stands. Following a very productive meeting last April, there is a draft new data model that is mostly the same, but adds more capability to capture specific details about name usages. But the general data model remains the same.
Now, on to Rod's post:
1) Nice choice of the example species!
2) The graph looks mostly right, but it's hard in some cases to figure out what labels go with what arrows. For example, the protonym in the upper-left corner of the image seems to apply to the vertical solid line connecting the top-left oval (which should be labeled "Belonoperca Fowler & Bean, 1930 sensu Eschmeyer 2004"), and the other "protonym" applies to the recursive link for the protonym itself. Similarly, the parentusageuuid off to the left applies to the vertical arrow from Belonoperca F&B to Serranidae sensu F&B. It took me a while to figure out what was going on with the labeling of arrows.
3) Another issue with the graph is that the dotted lines from the three "sensu" TNUs to the respective original description publications are not links that actually exist in the data model, so it seems inappropriate to represent them in the graph.
4) While it's fair to say that the graph may "look a tad complicated" -- is it any more complicated than it needs to be? After you get rid of the superfluous dashed lines from the "sensu" usages to the original publications, what specific pieces of information represented do you feel are not necessary to reflect taxonomic information? Taxonomy is, after all, a tad complicated in how it has worked over the past 250 years. I think part of the reason why many of the myriad existing databases don't quite fulfill the overall needs is that they aren't quite complicated enough. In other words, too many databases take too many shortcuts in representing information, thereby reducing the overall utility.
I have a major report due on Friday, so I can't respond in more detail now; but I will be glad to address any points of confusion or elaborate more completely on how the GNUB data model is structured (and why it is structured the way it is) next week.
Aloha, Rich
Richard L. Pyle, PhD Database Coordinator for Natural Sciences Associate Zoologist in Ichthyology Dive Safety Officer Department of Natural Sciences, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://hbs.bishopmuseum.org/staff/pylerichard.html
Note: This disclaimer formally apologizes for the disclaimer below, over which I have no control.
From: Roderic Page [mailto:r.page@bio.gla.ac.uk] Sent: Wednesday, November 28, 2012 1:04 AM To: Richard Pyle Cc: 'Steve Baskauf'; tdwg-rdf@googlegroups.com; Tony.Rees@csiro.au; pmurray@anbg.gov.au; Simon.Pigot@csiro.au; J.Kennedy@napier.ac.uk; eotuama@gbif.org; tdwg-tag@lists.tdwg.org; 'David Patterson' Subject: Re: [tdwg-rdf: 105] Re: [tdwg-tag] Any TCS users with experiences to report?
To try and get my head around the various models of taxon names, usages, and concepts being discussed, I've created a graph of my understanding of the data model underlying ZooBank http://iphylo.blogspot.co.uk/2012/11/zoobank-data-model.html . This may or may not reflect the actual situation; I'll leave that to Rich to comment on.
Regards
Rod
--------------------------------------------------------- Roderic Page Professor of Taxonomy Institute of Biodiversity, Animal Health and Comparative Medicine College of Medical, Veterinary and Life Sciences Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 Skype: rdmpage Facebook: http://www.facebook.com/rdmpage Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
I need to unload some taxonomic bugbears even if I might be fundamentally wrong about some things.
1. I hate, and have always hated, the term "taxon concept." It implies that the distinction between a taxon name and a concept is somehow special. Is it inaccurate to think of names as the "words" used in taxonomy and of concepts as the "definitions"? Geographical names have similar issues. Don't many regular words themselves suffer this 'concept' problem? Even the word 'concept' suffers the concept problem. [1] I personally prefer to use the term 'taxon definition' when describing the issue to anyone who doesn't read this list. It keeps their eyes from glazing over. It's easier for me to digest too.
2. I get very anxious when I get involved in discussions on "sensu" this and "name-usage" that which subsequently link to taxon references that don't actually resolve to anything resembling the actual definition of a taxon! It seems madness to have nearly any use of a taxon be a distinct concept because someone might have a different thought in their head about what the 'definition' is.
3. My opinion is that only distinct concepts should be distinctly identified. What distinguishes a distinct taxonomic concept? One that provides evidence of its circumscription[2] or, more succinctly, a "definition".
4. Ideally, the elements used to define a taxon circumscription would be machine-comparable, so you could automagically compare concepts and infer relationships. Most concept-to-concept mappings I've come across, on the other hand, seem to be based on the non-evidence of someone saying A is broader than B and simply recording it that way.
5. What elements can be used to describe the circumscription of a taxon? I've come across:
a. Specimens - I believe this is what Jessie Kennedy and Martin Pullan, et al. preferred in the Prometheus work [3].
b. Types/Original descriptions - I like this way of doing things. In my opinion it provides the most flexibility for using taxonomy to improve discovery and access to species information.
c. Morphological characters - Fundamentally I think these must actually be resolved to (a) or (b) above, don't they? Seems useful though.
d. Geospatial distribution - iNaturalist does a nice job of distinguishing concepts this way, but I don't believe it's actually the definition. What a pain to make machine-comparable, too.
e. Synonymy - This seems to be how some define their taxa, but without a clear distinction between taxonomy and nomenclature I don't see how it can work. Adding a nomenclatural synonym changes nothing regarding the circumscription. So fundamentally this approach has to tie names back to (b) above. At least, if I know that my definition of Gorilla gorilla includes the types for beringei and graueri, and you think they are all distinct, then my Gorilla is broader than yours.
f. DNA - I really don't know anything about that.
6. Lastly, I worry a bit about using the hierarchical ancestry of a taxon as part of its definition. For species I suppose the parent taxon could be considered an attribute of the concept/definition. But for higher taxa it makes no sense at all.
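Points 4 and 5 suggest that circumscriptions anchored in types (5b) become machine-comparable as plain sets. A minimal Python sketch, assuming each circumscription is reduced to a set of type identifiers; the function name, the five relationship labels, and the `type:` identifiers are illustrative assumptions, not part of TCS or any existing tool. It reproduces the Gorilla comparison from 5e.

```python
from typing import AbstractSet


def compare_circumscriptions(a: AbstractSet[str], b: AbstractSet[str]) -> str:
    """Return the set relationship of circumscription `a` relative to `b`.

    Each circumscription is assumed (hypothetically) to be a set of type
    identifiers, so the comparison is plain set algebra.
    """
    if not a or not b:
        raise ValueError("both circumscriptions need at least one element")
    if a == b:
        return "congruent"
    if a > b:  # proper superset: a is broader than b
        return "includes"
    if a < b:  # proper subset: a is narrower than b
        return "is included in"
    if a & b:  # some shared types, but neither contains the other
        return "overlaps"
    return "disjoint"


# Hypothetical identifiers for the name-bearing types of three nominal taxa.
my_gorilla = {"type:gorilla", "type:beringei", "type:graueri"}
your_gorilla = {"type:gorilla"}
print(compare_circumscriptions(my_gorilla, your_gorilla))  # includes
```

This only covers the type-based case; as 5c and 5d note, characters or distributions would need a much richer comparison than set membership.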
Anyway, it's off my chest. I bring it all up for two reasons: I've rarely come across a term used more ambiguously than "taxon concept", and consequently I've been very nervous about assessing different approaches to modeling them.
best from Woods Hole Dave Remsen
[1] http://www.thefreedictionary.com/concept
[2] The word "circumscription" itself has at least four senses. http://www.thefreedictionary.com/circumscription - I prefer circumscription sensu 1.
[3] Raguenaud, C., Pullan, M.R., Watson, M., Kennedy, J., Newman, M., Barclay, P. (2002). Implementation of the Prometheus Taxonomic Model: a comparison of database systems. Taxon, 51(1), 131-142.
On Nov 28, 2012, at 2:28 PM, Richard Pyle wrote:
Oi Vey!
OK, I’m not sure whether it’s best to respond to the email list, or comment on the post, so I’ll do both.
First…. There technically isn’t a “ZooBank data model”. ZooBank isn’t a database; it’s a service built on the Global Names Usage Bank (GNUB) database. The GNUB database is MUCH broader in scope than ZooBank. ZooBank is only concerned with the specific subset of TNUs that involve nomenclatural acts as governed by the ICZN Code. There are many, many, many more TNUs that are not nomenclatural acts, and/or involve names outside the scope of Zoology.
Second, like many other database projects, we’ve focused our available time much more on “doing”, rather than documenting. However, it’s becoming increasingly clear that we need much more documentation about the GNUB data model, and I’ll try to bump that task up the priority list for the coming weeks. For now, I’ve uploaded four images to the ZooBank server showing the table relationships:
http://zoobank.org/images/TaxonNameUsageCluster.jpg
http://zoobank.org/images/ReferenceCluster.jpg
http://zoobank.org/images/AgentCluster.jpg
http://zoobank.org/images/CoreTables.jpg
tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
Excerpts from what Richard Pyle wrote and responses:
I'm not saying this ever will happen, or even should happen. But we've developed the data model such that it *could* happen, if it turns out to be a useful mechanism for mapping specimens to published taxon concepts.
As so often is the case, I think the problem here boils down to identifiers and the metadata that we associate with them.
Absolutely!!! This is often not intuitive stuff, so the tricky part is getting people to apply identifiers and cross-link them in an appropriate and consistent way.
As a person who isn't really sure if he believes in RDF (and the co-convener of the RDF Task Group), and as a person who finds this whole conversation more annoying than just about anything else he can think of (but who is actually trying to walk this walk with his metadata), I'm saying that we are there. HTTP URIs are not THE way to create identifiers, but they are A way to create identifiers that demonstrably work. RDF is not THE way to cross-link identifiers, but it is A way to cross-link identifiers. History is full of examples where a way of doing things that wasn't the best won out because it was there when needed. (I'm thinking typewriter keyboard.)
Let's say in the real-life example above, somebody (we can say GNUB) assigns a persistent identifier (perhaps a URI constructed from an opaque UUID) to "Juncus diffusissimus Buckl. sec L. Urbatsch 2009".
We could say with an rdf:type statement that the resource identified by the URI is a TNU. We can give that resource a tc:hasName property linking it to the name which is represented by the string "Juncus diffusissimus Buckl.". (I'm not sure what property we use to say that L. Urbatsch made the assertion.)
I'm not sure how you would do it in RDF, but if it's any help the relevant DwC term is dwc:nameAccordingToID.
This is a major part of why I'm interested in this conversation. All of the dwc: "ID" terms are flawed for use in RDF for technical reasons that I've described elsewhere (see http://code.google.com/p/tdwg-rdf/wiki/Beginners4Vocabularies#4.6._The_Darwi... , http://code.google.com/p/tdwg-rdf/wiki/DublinCore#1.3.2.4._dcterms:identifie... , and http://code.google.com/p/tdwg-rdf/wiki/DublinCore#2.4._Possible_courses_of_a... if you care about this). I believe that tc:accordingTo would be a fairly exact equivalent of dwc:nameAccordingToID, and it does not have the subproperty problems that the DwC ID terms have in RDF. So the question in my mind (and I think a question posed explicitly earlier in this thread) is whether enough of the technical stuff you want to do with TNUs can be done with the TDWG Taxon Concept ontology, which is based on a ratified TDWG standard (TCS).
It's certainly part of the metadata for the TNU itself:
There is a TNU for diffusissimus within the Reference of Buckl. This is the original description of the epithet, so it is also the Protonym for diffusissimus. There is also a TNU for the genus name Juncus as used within the Reference of Buckl. If, in the same publication, Buckl. also established that genus, then the genus would also be the Protonym TNU. However, the genus Juncus was established by L., so there is another TNU for Juncus that is the protonym (Juncus L.), and the TNU for the usage of Juncus within Buckl. links to the TNU of Juncus L. (the Protonym).
So, that's three TNUs:
1. Juncus L. sec. L. (Protonym for the genus Juncus)
2. Juncus L. sec. Buckl. (links to 1 as ProtonymID)
3. diffusissimus Buckl. sec. Buckl. (Protonym for the species diffusissimus; links to 2 as parent)
There are also two more TNUs linked to the Urbatsch 2009 publication:
4. Juncus L. sec. Urbatsch (links to 1 as ProtonymID)
5. diffusissimus Buckl. sec. Urbatsch (links to 3 as ProtonymID, and to 4 as parent)
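The five usages can be sketched as linked records. This is a minimal Python illustration, not the GNUB schema: the field and function names are hypothetical, and it assumes (as in Rod's graph) that a protonym's ProtonymID points to itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TNU:
    """One taxon name usage: a name as used within one Reference."""
    tnu_id: int
    name: str
    sec: str                          # the Reference the usage appears in
    protonym_id: int                  # assumption: a protonym points to itself
    parent_id: Optional[int] = None   # parent usage (e.g. the genus) in the same Reference


TNUS: Dict[int, TNU] = {t.tnu_id: t for t in (
    TNU(1, "Juncus L.", "L.", protonym_id=1),
    TNU(2, "Juncus L.", "Buckl.", protonym_id=1),
    TNU(3, "diffusissimus Buckl.", "Buckl.", protonym_id=3, parent_id=2),
    TNU(4, "Juncus L.", "Urbatsch", protonym_id=1),
    TNU(5, "diffusissimus Buckl.", "Urbatsch", protonym_id=3, parent_id=4),
)}


def is_protonym(tnu_id: int) -> bool:
    """A usage is the protonym when its ProtonymID is its own id."""
    return TNUS[tnu_id].protonym_id == tnu_id


def parent_chain(tnu_id: int) -> List[int]:
    """Follow parent links upward from a usage, e.g. [5, 4] for TNU 5."""
    chain = [tnu_id]
    while TNUS[chain[-1]].parent_id is not None:
        chain.append(TNUS[chain[-1]].parent_id)
    return chain


def usages_of(protonym_id: int) -> List[int]:
    """Every usage, in any Reference, that traces back to the same protonym."""
    return sorted(t.tnu_id for t in TNUS.values() if t.protonym_id == protonym_id)
```

For example, `usages_of(3)` returns `[3, 5]`: every recorded usage of the epithet diffusissimus that traces back to the Buckl. original description.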
Now let's say that L. Urbatsch publishes a paper describing in detail her concept of Juncus diffusissimus Buckl.
Do you mean the 2009 paper of "Juncus diffusissimus Buckl. sec L. Urbatsch 2009"? (In which case we have the TNUs as 4 & 5).... or do you mean a later paper (2010)? In which case we'd need two more TNUs. Is the "2009" thing a specimen determination, or a publication? It doesn't matter -- I just want to make sure I'm following your example correctly.
The 2009 "thing" is something that L. Urbatsch had in his head - the idea of how he thought the name "Juncus diffusissimus Buckl." should be applied to real organisms and the specimens that come from them. We don't know what that idea is, but presumably he had one and we could assign a persistent identifier to it and describe it using the string "Juncus diffusissimus Buckl. sec L. Urbatsch 2009". I was thinking that was what you meant by TNU. Whether or not it is better to proliferate a bunch of additional TNU instances, assign them their own identifiers, and relate them to each other is a technical detail that as an end user I'm happy to let you take care of within your GNUB system. End users want a persistent identifier of some sort to link identification instances to. They will hopefully find it through a user-friendly interface that hides the ugly details you are describing here.
[some quotes omitted for brevity]
The point I'm trying to make is that as long as this "thing" that we are variously calling "taxon name usage", "taxon concept", "shallow taxonomic concept", or "deep taxonomic concept" can be assigned an identifier, what really matters is the metadata we associate with it, not really what we call it.
Absolutely! With emphasis on "really matters". It's certainly important to mint persistent identifiers, but it's equally (more?) important to make sure we understand the "thing" the identifier represents. We have a pretty stable & solid definition of a TNU that has emerged from many NOMINA meetings over a number of years. What I don't think has been done yet is any real analysis of whether the appropriate subset of TNUs accompanied by a robust concept definition *are* the concepts (i.e., the TNU GUID is the concept GUID), or whether there should be a separate GUID minted explicitly for the concept, which then links back to the TNU GUID as its "definition" or "source" (or whatever). I can see it working both ways.
I looked at the TDWG Taxon Concept ontology (see http://code.google.com/p/tdwg-ontology/source/browse/trunk/ontology/voc/Taxo... ) and came up with the following scenario. Let's say that I assign a persistent identifier [URI1] to the TNU we are talking about here, "Juncus diffusissimus Buckl. sec L. Urbatsch 2009", and that I'm right that by TNU we mean the idea that Lowell Urbatsch had in his head about how he meant to apply the name (sorry to drag him into this randomly; I actually don't know him). We can assign a tc:accordingTo property to [URI1], with the object of that property being some kind of resource whose metadata says that Lowell Urbatsch used the name in some sense known to him in 2009. (We could work out the details of how one would write that metadata, but there probably is enough in the Dublin Core and FOAF vocabularies to do it. I think later in your email you said GNUB calls it a "Reference".) Now let's say that in 2012 I call him on the phone and say "hey Lowell, what taxonomic treatment were you thinking about in 2009 when you annotated that specimen in the NLU herbarium with barcode LSU00000428?" and he says "Gleason and Cronquist, 1991". I have two choices:
1. add a tc:describedBy property to the metadata for [URI1] with the value urn:isbn:0893273651 (a persistent URI representing Gleason and Cronquist 1991) and "promote" the resource identified by [URI1] from generic TNU to "deeper" taxonomic concept.
2. create a different [URI2] which has a property/value pair of tc:describedBy urn:isbn:0893273651, then somehow relate the royal URI2 taxonomic concept to the lowly URI1 generic TNU.
If we use owl:sameAs to relate the two instances (i.e. assert [URI2] owl:sameAs [URI1] ) then there is no advantage to option two. All statements made about [URI1] would apply to [URI2] and vice-versa. All we would be accomplishing is doubling the number of triples that we have to keep track of. The question is whether this would result in "bad" or "silly" statements (the essence of Bob Morris' objection to owl:sameAs, I think). If so, then we need some kind of new term to relate royal Taxon Concepts ("deep" taxonomic concepts) to lowly generic TNUs ("shallow" taxonomic concepts). I don't think such a term exists at the moment.
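To see why owl:sameAs makes choice 2 collapse into choice 1, here is a toy Python sketch of roughly the inference owl:sameAs licenses: statements about one resource are copied to every resource declared the same as it. It is hypothetical and deliberately minimal: triples are plain string tuples, `ex:URI1`/`ex:URI2` stand in for real persistent identifiers, and no RDF library or reasoner is used.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def apply_same_as(triples: Set[Triple]) -> Set[Triple]:
    """Copy statements between resources linked by owl:sameAs until nothing new appears."""
    same = {(s, o) for s, p, o in triples if p == "owl:sameAs"}
    same |= {(o, s) for s, o in same}  # owl:sameAs is symmetric
    result = set(triples)
    while True:
        new = set()
        for a, b in same:
            for s, p, o in result:
                if s == a:
                    new.add((b, p, o))  # b inherits statements made about a
                if o == a:
                    new.add((s, p, b))  # and statements that point at a
        if new <= result:
            return result
        result |= new


uri1, uri2 = "ex:URI1", "ex:URI2"  # hypothetical identifiers
graph = {
    (uri1, "rdf:type", "tc:TaxonConcept"),
    (uri1, "tc:nameString", "Juncus diffusissimus Buckl."),
    (uri2, "tc:describedBy", "urn:isbn:0893273651"),
    (uri2, "owl:sameAs", uri1),
}
inferred = apply_same_as(graph)
# URI1 now carries the describedBy statement, exactly as under choice 1.
print((uri1, "tc:describedBy", "urn:isbn:0893273651") in inferred)  # True
```

The three non-sameAs statements become six, one copy per URI: the doubling of triples described above, with nothing learned that choice 1 would not state directly.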
The advantage of choice 1 is that we have off-the-shelf technology (the TDWG Taxon Concept ontology based on a ratified TDWG standard, TCS). If we go with choice 2 (and don't use owl:sameAs to relate the two URIs), then we doom ourselves to another five+ years of thrashing out some new vocabulary or ontology for relating TNUs to Taxon Concepts. So here is where we would be if we went with choice 1:
1. The rdf:type of the "thing" is tc:Taxon or tc:TaxonConcept. The ontology declares them to be equivalent classes.
2. The minimal requirement for a tc:Taxon is to have a persistent identifier and a name associated with it, either through a tc:nameString literal or, better yet, a tc:hasName property whose value is a persistent URI that provides more extensive metadata than a string. uBio comes to mind here as a giant source of name strings with assigned persistent identifiers. A tc:Taxon instance having only name information would be a nominal taxon: an undesirable but probably common circumstance.
3. If the tc:Taxon instance has a tc:accordingTo property, it is elevated to TNU status, because by investigating the object of the tc:accordingTo property further we can learn who used the name and when. Maybe we can learn more about this kind of tc:Taxon instance later, but in most cases probably not.
4. If the tc:Taxon instance has a tc:describedBy property, then it is elevated to full taxonomic concept status, because it would be related to the persistent identifier of a published taxonomic treatment. One could then theoretically discover all kinds of cool relationships to other full-blown concepts using the tools that Nico and Rich are going to develop.
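The three levels can be expressed as a small decision function. This is a hypothetical Python sketch of the proposal, not part of the TDWG Taxon Concept ontology: it looks only at which tc: properties are present on an instance and assumes the persistent identifier already exists.

```python
from typing import AbstractSet


def taxon_level(properties: AbstractSet[str]) -> str:
    """Classify a tc:Taxon instance by the properties asserted about it.

    Checks run from the richest level downward, so each instance lands in
    exactly one level (the levels nest like the Venn diagram Steve describes).
    """
    if "tc:describedBy" in properties:
        return "taxonomic concept"  # linked to a published treatment
    if "tc:accordingTo" in properties:
        return "taxon name usage"   # we can learn who used the name, and when
    if properties & {"tc:hasName", "tc:nameString"}:
        return "nominal taxon"      # a name and nothing more
    raise ValueError("a tc:Taxon needs at least a name")


print(taxon_level({"tc:hasName", "tc:accordingTo"}))  # taxon name usage
```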
One could create a Venn diagram showing the subset relationships of these three levels of tc:Taxon. If it made Rich feel better, he could mint a class URI for the deep taxonomic concepts which could be used in rdf:type statements in addition to the basic rdf:type of tc:Taxon .
I see this as a way to make rapid progress on this front by leveraging work which has already been completed and accepted by TDWG but not really implemented. The TDWG Taxon Concept ontology and TCS may not do everything that people want as far as allowing one to define all of the set relationships required to do the fancy stuff. But I don't see why that can't be added in a TCS 2.0 that is backwardly compatible with TCS 1.2.
I will defend my repeated references to HTTP URIs and RDF partly by saying that this email is being posted to the TDWG RDF group list, but also because the ratified standard for GUIDs (see http://bioimages.vanderbilt.edu/pages/guid-applicability-final-2011-01.pdf) says that persistent identifiers must be HTTP proxied (recommendation 2) and that they should resolve to provide RDF/XML (recommendation 10). Having not yet achieved the degree of cynicism attained by Rod Page about people actually following TDWG standards, I still believe that we should try to follow them. Or else just stop wasting time on this...
Steve
So the question in my mind (and I think a question posed explicitly earlier in this thread) is whether enough of the technical stuff you want to do with TNUs can be done with the TDWG Taxon Concept ontology which is based on a ratified TDWG standard (TCS).
Yes, I agree that this is the key question right now (speaking as TDWG Convener for the TDWG-TNC Group); unfortunately it's not one I have time to address in detail this week. But I think it's the conversation we ought to be having.
The 2009 "thing" is something that L. Urbatsch had in his head - the idea of how he thought the name "Juncus diffusissimus Buckl." should be applied to real organisms and the specimens that come from them. We don't know what that idea is, but presumably he had one and we could assign a persistent identifier to it and describe it using the string "Juncus diffusissimus Buckl. sec L. Urbatsch 2009".
Ah! In that case, it only becomes a TNU when it is documented (i.e., when it is "used"). The data model is very liberal in terms of what constitutes "documentation" (="Reference", in the GNUB data model), but chemical signals inside a person's brain would be out of scope for "documentation". However, once he's written it down on a determination label, then it becomes a TNU. The trick then becomes how to map what was in his head to what is defined in some way. When people write determination labels on specimens, they usually don't include lots of details about the scope of the taxon concept they had in mind when assigning that name to that taxon. This is why an assertion is needed (as per Nico's emails on this thread): someone (either Urbatsch himself, or the collection manager, or someone else) needs to make an assertion that the TNU recorded on the specimen determination label maps to some other TNU with a well-documented concept definition (e.g., a published revision), through some set-theory relationship.
I was thinking that was what you meant by TNU. Whether or not it is better to proliferate a bunch of additional TNU instances, assign them their own identifiers, and relate them to each other is a technical detail that as an end user I'm happy to let you take care of within your GNUB system. End users want a persistent identifier of some sort to link identification instances to. They will hopefully find it through a user-friendly interface that hides the ugly details you are describing here.
I have to admit that I'm a little squeamish about minting TNUs for determination labels, and I wouldn't encourage it. However, for the collection managers out there, sometimes this is the only way you'll ever translate scientific names on specimen labels into well-defined taxon concepts. The only part that gets fuzzy is the notion of dwc:Identification. The way we model it for our specimen data is that each Identification instance gets its own GUID, and basically says "this person on this date determined this specimen to fall within the implied concept of this TNU". In other words, an Identification bridges a specimen instance to a TNU instance according to some person (with other metadata, like date, etc.). In the ideal world, whenever anyone ever made a specimen determination, they would add a "sensu XXXX" to the name they applied. The unfortunate reality is that the vast vast vast majority of actual specimen data do not meet this ideal. Thus, we're left to guess what concept they had in mind when making the determination. In some cases, we can confidently infer that the person had a particular concept in mind (the example I gave of the expert visiting a collection, making specimen determinations, and later publishing a revision). You can also confidently build these links whenever a publication includes a "Material Examined" section. But in most cases, there really is no other choice than minting a TNU for the determination label, then hoping that TNU will eventually get cross-linked to other TNUs with more well-defined concept definitions.
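The Identification model described here can be sketched as a record that bridges a specimen and a TNU. The Python below is a hypothetical illustration (identifiers, field names, and the "congruent" label are placeholders, not GNUB's actual schema): 100 Identifications anchored to one determination-label TNU all pick up a single later assertion mapping that TNU to a better-defined one, the inheritance Rich described in his 2010 post.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Identification:
    """'This person on this date determined this specimen to fall within
    the implied concept of this TNU.'"""
    identification_id: str  # each Identification gets its own GUID
    specimen_id: str
    tnu_id: str             # the usage written on the determination label
    identified_by: str
    date: str


# Assertions made later by anyone: label TNU -> (relationship, better-defined TNU).
Assertions = Dict[str, Tuple[str, str]]


def mapped_concept(ident: Identification, assertions: Assertions) -> Optional[Tuple[str, str]]:
    """Return the mapping inherited through the Identification's TNU, if any."""
    return assertions.get(ident.tnu_id)


label_tnu = "tnu:urbatsch-2009-label"  # hypothetical GUIDs throughout
identifications = [
    Identification(f"ident:{n}", f"specimen:{n}", label_tnu, "L. Urbatsch", "2009")
    for n in range(100)
]
assertions: Assertions = {label_tnu: ("congruent", "tnu:gleason-cronquist-1991")}

# One assertion about one TNU reaches all 100 specimens.
print(sum(mapped_concept(i, assertions) is not None for i in identifications))  # 100
```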
Let's say that I assign a persistent identifier [URI1] to the TNU we are talking about here, "Juncus diffusissimus Buckl. sec L. Urbatsch 2009", and that I'm right that by TNU we mean the idea that Lowell Urbatsch had in his head about how he meant to apply the name
OK, I'll assume there is a determination label in a jar with that species name on it, and credit to L. Urbatsch, and dated to 2009. This determination label serves as the basis for the TNU.
We can assign a tc:accordingTo property to [URI1], with the object of that property being some kind of resource whose metadata says that Lowell Urbatsch used the name in some sense known to him in 2009.
I'm not sure this is right, because it sounds like you're saying that tc:accordingTo should point to a TNU. However, tc:accordingTo should point to a "Reference" (sensu GNUB data model). As such, there would be a Reference instance in GNUB with a GUID that refers to the determination label. The "author" of that determination label would be Lowell Urbatsch, and the date would be 2009. This serves the same function as a journal article, or book, or any other more familiar form of documentation ("Reference"); but instead of being ink on paper in thousands of copies distributed in libraries around the world, it's ink on paper on a label attached to a specimen. So, tc:accordingTo would point to the GUID for the *Reference* instance, not a TNU instance.
(We could work out the details of how one would write that metadata but there probably is enough stuff in the Dublin Core and FOAF vocabularies to do it. I think later in your email you said GNUB calls it a "Reference".)
Right -- exactly. A Reference is a piece of documentation -- a journal article, a book, a field notebook, a piece of correspondence, a determination label, etc. A Reference may be associated with zero, one, or many TNUs. In the case of determination labels, this is often a case where one reference is associated with only one TNU. But that doesn't mean they're the same thing. A Reference is a Reference, and a TNU is a TNU, and tc:accordingTo should point to a Reference, not a TNU.
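The distinction drawn here (a Reference documents zero, one, or many TNUs, and tc:accordingTo points at the Reference) can be checked mechanically. A minimal Python sketch with assumed field names and hypothetical identifiers; the determination label and the Gleason and Cronquist book are both modeled as Reference instances, and the field notebook is a placeholder with no usages.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Reference:
    ref_id: str
    kind: str    # journal article, book, field notebook, determination label, ...
    author: str
    year: int


@dataclass(frozen=True)
class TNU:
    tnu_id: str
    name: str
    according_to: str  # should be a Reference id, never a TNU id


def bad_according_to(tnus: Iterable[TNU], refs: Dict[str, Reference]) -> List[str]:
    """List TNUs whose accordingTo does not resolve to a Reference."""
    return [t.tnu_id for t in tnus if t.according_to not in refs]


def usages_per_reference(tnus: Iterable[TNU], refs: Dict[str, Reference]) -> Dict[str, int]:
    """Count the TNUs documented by each Reference (zero, one, or many)."""
    counts = Counter(t.according_to for t in tnus)
    return {ref_id: counts[ref_id] for ref_id in refs}


refs = {r.ref_id: r for r in (
    Reference("ref:label-LSU00000428", "determination label", "L. Urbatsch", 2009),
    Reference("ref:gleason-cronquist-1991", "book", "Gleason & Cronquist", 1991),
    Reference("ref:unused-notebook", "field notebook", "unknown", 2012),
)}
tnus = [
    TNU("tnu:a", "Juncus diffusissimus Buckl.", "ref:label-LSU00000428"),
    TNU("tnu:b", "Juncus diffusissimus Buckl.", "ref:gleason-cronquist-1991"),
    TNU("tnu:c", "Juncus diffusissimus Buckl.", "tnu:a"),  # wrong: points at a TNU
]
print(bad_according_to(tnus, refs))  # ['tnu:c']
```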
Now let's say that in 2012 I call him on the phone and say "hey Lowell, what taxonomic treatment were you thinking about in 2009 when you annotated that specimen in the NLU herbarium with barcode LSU00000428?" and he says "Gleason and Cronquist, 1991". I have two choices:
1. add a tc:describedBy property to the metadata for [URI1] with the value urn:isbn:0893273651 (a persistent URI representing Gleason and Cronquist 1991) and "promote" the resource identified by [URI1] from generic TNU to "deeper" taxonomic concept.
2. create a different [URI2] which has a property/value pair of tc:describedBy urn:isbn:0893273651, then somehow relate the royal URI2 taxonomic concept to the lowly URI1 generic TNU.
I need to get my head around tc:describedBy before I can respond in detail; but in GNUB space, there would be a mapping between the TNU for "Juncus diffusissimus Buckl. sec L. Urbatsch 2009" (the determination label) and the TNU for "Juncus diffusissimus Buckl. sec Gleason and Cronquist 1991" (the publication, assuming that G&C 1991 used the same name/combination/spelling). In this case, the mapping would be of type "congruent".
I can't comment on the OWL implications. My role here is to make sure that our "things" are consistent (a Reference being one "thing", a TNU being another "thing", and also how the "Identification" thing and possibly the taxon concept "thing" relate to these other things).
From what you describe later in your email, I guess I'm in favor of option 1.
3. If the tc:Taxon instance has a tc:accordingTo property, it is elevated to TNU status because we can know who used the name and when if we investigate the object of the tc:accordingTo property further.
Again, tc:accordingTo should not refer to a TNU. It should refer to a Reference.
If it made Rich feel better, he could mint a class URI for the deep taxonomic concepts which could be used in rdf:type statements in addition to the basic rdf:type of tc:Taxon .
It actually wouldn't make me happier to do this. My preference is to use the TNU GUID as the proxy for the concept, rather than mint new GUIDs for concepts. This is how we do it on the nomenclatural act side. We don't create a new GUID for the nomenclatural act; the GUID for the TNU carrying the nomenclatural act is a sufficiently unambiguous proxy. (However, the new draft GNUB model does parse out each nomenclatural act into one-to-many nomenclatural "events", but that's a separate conversation.)
I see this as a way to make rapid progress on this front by leveraging work which has already been completed and accepted by TDWG but not really implemented. The TDWG Taxon Concept ontology and TCS may not do everything that people want as far as allowing one to define all of the set relationships required to do the fancy stuff. But I don't see why that can't be added in a TCS 2.0 that is backwardly compatible with TCS 1.2.
I wholeheartedly endorse this approach! In December I'll take a closer look at the Taxon Concept ontology and I'll discuss with Rob Whitton (CC'd -- sorry, Rob!) how we might actually start implementing services that output GNUB content in conformance with tc. Rapid progress is definitely a Good Thing!
Aloha, Rich
participants (4)
- David Remsen
- Richard Pyle
- Roderic Page
- Steve Baskauf