[tdwg-content] Taxonomic name usage files

Richard Pyle deepreef at bishopmuseum.org
Tue Apr 19 01:47:36 CEST 2016

Hi Nico,


I think it would be helpful to define a couple of terms first. I know you already know much of this, but for the benefit of others I think it would help set a baseline for conversation.


Name-String: A literal string of characters representing a Taxon Name. These may include authorships, rank indicators and other qualifiers, and concept qualifiers (e.g., the “sec.” bits.)


Reference: A “reference” is any form of static documentation, at any granularity (usually a publication in the traditional sense, but unpublished documents are also included in the scope of “reference”, as are subsections of traditionally cited publication units, such as a particular section of a single article).  


Taxon Name Usage (TNU):  The Global Names Usage Bank (GNUB) defines this as essentially the treatment of a taxon name within a Reference. The primary components of each TNU are:

1.       A link to the TNU that represents the original establishment of the name (called the “Protonym”, analogous to “basionym” in some ways).  When a TNU is itself a Protonym, then this link is self-referential. This is effectively the “Name” part of the TNU.

2.       A link to the Reference (as defined above).  Again, this usually corresponds to a traditionally-citable publication unit (Article, Book, Chapter, etc.), but may represent an unpublished document (manuscript, correspondence, etc.) or a finer granularity of a more traditionally-cited work (e.g., a few specific pages within an article).

3.       The verbatim spelling of the name as it appears in the Reference (as best as can be represented by the UTF-8 Character set).

4.       An indication of the taxonomic rank at which a name was used (e.g., species vs. subspecies vs. variety, etc.)

5.       An indication of the immediate hierarchical parent of the name, as it was used (always another TNU with a higher taxon rank, within the same Reference)

6.       An indication of the subjective validity of the name (recursive to the same record when the name is used as a valid taxon, or to another TNU within the same Reference representing the senior heterotypic synonym).

There are some other properties of TNUs, but this is the core.
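
To make the six components concrete, here is a minimal Python sketch. It is not GNUB's actual schema: the class names, field names, identifiers, and the "genus"/"species" rank values are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reference:
    """Any static documentation unit, at any granularity."""
    ref_id: str
    citation: str


@dataclass
class TNU:
    """Hypothetical record holding the six core TNU components."""
    tnu_id: str
    protonym_id: str          # 1. TNU that originally established the name
    reference_id: str         # 2. Reference in which the name is treated
    verbatim_name: str        # 3. spelling exactly as it appears there
    rank: str                 # 4. rank at which the name was used
    parent_id: Optional[str]  # 5. parent TNU in the same Reference, if any
    valid_as_id: str          # 6. TNU treated as the valid name

    def is_protonym(self) -> bool:
        # A Protonym links to itself as its own original establishment.
        return self.protonym_id == self.tnu_id

    def is_valid(self) -> bool:
        # A name used as valid points back at its own record.
        return self.valid_as_id == self.tnu_id


# Two illustrative TNUs from the same (hypothetical) Reference "L."
linnaeus = Reference("ref-L", "L.")
aus_l = TNU("tnu-1", "tnu-1", linnaeus.ref_id, "Aus", "genus", None, "tnu-1")
bus_l = TNU("tnu-2", "tnu-2", linnaeus.ref_id, "bus", "species", "tnu-1", "tnu-2")
```

The self-referential links for components 1 and 6 are what let a single record type cover Protonyms, later usages, and synonyms alike.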


Appearance: This term was established by James Ytow for his Nomencurator data model, and represents the actual appearance of a text-string name within a Reference.  Each TNU (the conceptual treatment of a name within a Reference) might involve only a single Appearance (i.e., the name-string appeared only once within the Reference); or it might involve many appearances of the name within a single TNU (e.g., a study on the biology of Aus bus might include dozens of instances of the text string “Aus bus” or its implied surrogates such as “A. bus” or in some cases, in context, even just “bus”).


Usage Citation: These are less common, but fall into your realm: these are the conceptual citations within a Reference to other specific usages in other References. These usually take the form of “Aus bus L. sec. Smith” (or “sensu” instead of “sec.”), but it’s not the literal name-string itself; rather it’s the implied citation linking a TNU in one Reference to a TNU in a different Reference (best represented in TDWG-land via “relationshipAssertion” of TCS).


If we can keep these four very different concepts separate, it may be helpful to address the example you gave.


For simplicity, let’s call your “study” a single reference.  That is, we will pick a snapshot static version of the manuscript or publication.  If you wanted to track multiple versions of a MS, then each version would be a separate Reference.


For further simplicity, let’s stick with two qualified Name-Strings for your example: “Aus bus L. sec. Smith” and “Xus bus (L.) sec. Jones”.  Obviously, they may show up sometimes as “A. bus”, “X. bus (L.)”, etc. – all of which are represented in their respective Appearances; but again, for simplicity, let’s assume that the two name-strings are used consistently in different parts of the Reference.


Hokay… with THAT out of the way…..


Let’s say that within your Reference there are 12 Appearances of the Name-String “Aus bus L. sec. Smith” on pages 6-15, and 8 Appearances of the Name-String “Xus bus (L.) sec. Jones” on pages 16-20 (again, trying to keep it simple).


You indicated in your example that the latter is being used as "my new sense"; in which case you (author of MS) are Jones.  You don’t have to be Jones – you could be Franz and your study compares two separate senses of “bus” by earlier authors “Smith” and “Jones”.  In any case, we know the following exist:



-          “Aus bus L. sec. Smith”

-          “Xus bus (L.) sec. Jones”



-          Reference represented by “L.”

-          Reference represented by “Smith”

-          Reference represented by “Jones”



-          Usage of “Aus” by L.

-          Usage of “bus” by L. (Protonym of “bus”)

-          Usage of “Aus” by Smith

-          Usage of “bus L.” by Smith

-          Usage of “Xus” by Jones

-          Usage of “bus L.” by Jones

[You can work out the other properties of each of these 6 TNUs.]



-          12 instances of Name-String “Aus bus L. sec. Smith” among pp. 6-15 of Jones

-          8 instances of Name-String “Xus bus (L.) sec. Jones” among pp. 16-20 of Jones


In summary, so far we have two Name-Strings, three References, six TNUs, and 20 Appearances.
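
The tally can be checked with a small Python sketch. The data structures and key names are hypothetical; only the names, page ranges, and counts come from the example itself.

```python
# Illustrative tally of the worked example; structure and keys are hypothetical.
name_strings = ["Aus bus L. sec. Smith", "Xus bus (L.) sec. Jones"]
references = ["L.", "Smith", "Jones"]

# (verbatim name, Reference) pairs for the six TNUs in the example.
tnus = [
    ("Aus", "L."), ("bus", "L."),
    ("Aus", "Smith"), ("bus L.", "Smith"),
    ("Xus", "Jones"), ("bus L.", "Jones"),
]

# Appearances: one record per occurrence of a Name-String in the study.
appearances = (
    [{"name_string": name_strings[0], "pages": "6-15"}] * 12
    + [{"name_string": name_strings[1], "pages": "16-20"}] * 8
)

counts = {
    "name_strings": len(name_strings),
    "references": len(references),
    "tnus": len(tnus),
    "appearances": len(appearances),
}
print(counts)
```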


Conspicuously, I have not indicated how many Usage Citations there are in this example.  There’s a very simple reason for this: whereas I have spent shamefully large numbers of hours thinking about the other classes of items, I have yet to get my head around how best to model “Usage Citation”.  It’s not that I don’t know how to do it; it’s that there are many different ways to do it, and I haven’t yet worked out which way makes the most sense.  However, not all is lost, because in all probability they are not needed to solve your issue.


Now (finally) getting back to the answer to your question….  I see two different ways to approach it.  The first is probably the most robust and appropriate, which is to create twenty Rows; one for each Appearance, with all the respective properties of each.  However, I suspect that would be seen by most people (possibly even myself) as overkill.


What it seems like you’re really looking for is two rows:  one representing the TNU of “bus L.” by Smith, and one representing the TNU of “bus L.” by Jones.  Columns would break out all the respective bits of these two TNUs, and you’d add another column to indicate which subsections of the MS use each.
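
A two-row table of that kind might look like the following Python sketch, which writes it out as CSV. The column names are invented for illustration and are not Darwin Core or TCS terms; the "species" rank is assumed from the binomial form of the names.

```python
import csv
import io

# Column names are hypothetical, not taken from Darwin Core or TCS.
columns = ["verbatim_name", "protonym", "sec_reference", "rank",
           "parent", "manuscript_pages"]

# One row per TNU of "bus L."; rank "species" is an assumption.
rows = [
    {"verbatim_name": "bus L.", "protonym": "bus L.", "sec_reference": "Smith",
     "rank": "species", "parent": "Aus", "manuscript_pages": "6-15"},
    {"verbatim_name": "bus L.", "protonym": "bus L.", "sec_reference": "Jones",
     "rank": "species", "parent": "Xus", "manuscript_pages": "16-20"},
]

# Write the table to an in-memory buffer as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
table = buffer.getvalue()
print(table)
```

The last column is the extra one that ties each usage to the part of the manuscript where it applies.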


A compromise would be to iterate all six TNUs, but that might add more stuff without addressing your specific stated desire to distinguish the two usages of “bus”, and how to refer to them.


I’d be happy to discuss further if none of these quite addresses your main question.


DwC/TCS covers most of what you need, but is missing the “Appearance” class and all of its associated properties.  I would refer you to James Ytow’s works on Nomencurator to see how he defines those terms.  I’m not aware of anyone who has (yet) tried to tackle the “Usage Citation” in the real world, but as I said earlier, it’s best to start with the “relationshipAssertion” parts of TCS.


If I’m missing some aspect of what you’re asking about, please let me know.


Apologies to everyone else on the list for this massive missive.






Richard L. Pyle, PhD
Database Coordinator for Natural Sciences | Associate Zoologist in Ichthyology | Dive Safety Officer
Department of Natural Sciences, Bishop Museum, 1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef at bishopmuseum.org





From: tdwg-content [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of Nico Franz
Sent: Monday, April 18, 2016 12:19 PM
To: tdwg-content at lists.tdwg.org; vocab at noreply.github.com
Subject: [tdwg-content] Taxonomic name usage files


Hi TDWG Content group:


   Perhaps someone can answer this question? Suppose I submitted a new biodiversity study to a journal for peer review and ultimately publication. My study mentions taxonomic names, but some names are used in more than one specific sense throughout the manuscript. As part of my study's data body, I want to say things like: at this point or these sections in my manuscript, I am using the name in the sense of authors X. And: later on in the Discussion, I am using the name in "my new sense" (as an example). I want to submit a table with structured metadata on the various usages of names in my manuscript, as part of the supplementary data provided to the Journal. I believe part of what the table would have to reflect, for each usage, is whether this is my usage, or that of someone else that I am ok with (=> define speaker role).


   Is there a best TDWG standard to glean terms and definitions from to draft up that table? I assume it is Darwin Core and/or the TCS, but then has someone actually tried this (= extract the subset of terms needed to identify names, usages, speaker roles) in conjunction with (e.g.) a biodiversity inventory or taxonomic revision to be published? The key purpose here would be to facilitate better name usage data practices, tied to the process of publishing new data via journals. To make data about name usages part of the supplementary data, in a structured and rather explicit format.


Thanks and best, 


