GUIDs for Taxon Names and Taxon Concepts

Richard Pyle deepreef at BISHOPMUSEUM.ORG
Mon Nov 14 09:28:04 CET 2005

Hi Roger,

Sorry, I wasn't clear enough.  I think the simplest way to clarify is to
answer your questions directly:

> So we have a 'static' document and it is chock full of NameStrings. I have
> circled some of them.
> We have Ditrichum cornubicum (a red data book moss). It is mentioned, with
> a picture on the previous page and at the top of this page it is mentioned
> a few more times. Further down this page we have Buxbaumia aphylla which is
> also mentioned twice. There is a picture of it on the next page.
> So how many name usages do we have here?
> So how many name usages do we have here?

One.  A "NameString" is defined as a string of textual characters meant to
represent a name of biological organisms.  That there may be many replicate
copies of the *same* NameString within one documentation instance does not
mean there are many NameStrings (keeping in mind the "or is explicitly
implied" part of the definition).
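
To make the counting rule concrete, here is a minimal sketch in Python. It
assumes that each mention has already been tagged with an identifier for its
documentation instance; the identifier "doc-1" and the function name are
hypothetical, and the sketch deliberately ignores "explicitly implied" names
such as the genus inside a binomial.

```python
from collections import defaultdict


def count_usage_instances(mentions):
    """Count distinct NameStrings per documentation instance.

    `mentions` is an iterable of (documentation_instance_id, name_string)
    pairs. Replicate copies of the same NameString inside one documentation
    instance collapse into a single usage instance.
    """
    distinct = defaultdict(set)
    for doc_id, name_string in mentions:
        distinct[doc_id].add(name_string)
    return {doc_id: len(names) for doc_id, names in distinct.items()}


# Hypothetical mentions on one page: three copies of one name, two of another.
mentions = [
    ("doc-1", "Ditrichum cornubicum"),
    ("doc-1", "Ditrichum cornubicum"),
    ("doc-1", "Ditrichum cornubicum"),
    ("doc-1", "Buxbaumia aphylla"),
    ("doc-1", "Buxbaumia aphylla"),
]
usage_counts = count_usage_instances(mentions)
```

Under these assumptions the five mentions yield two usage instances, one per
distinct NameString, no matter how often each string is repeated.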

> Does each mention of the name on the page count as a usage?

No, because they are replicate copies of the same NameString; not different

> - would seem to be a silly thing to do.

Yes, it would.

> Does mentioning the name on different pages mean different usages?

Not if those different pages are within the same scope of the documentation
instance.  Sometimes, however, what seems like a single documentation
instance may be subdivided into several documentation instances (e.g.,
chapters in a book), so it depends on what the scope of the documentation
instance is.  Please note that this realm of ambiguity exists completely
independently of the problems we're discussing about GUID "units" in
taxonomy, and thus applies equally to all solutions to the question of what
object(s) should receive a taxon GUID.

> - would also be silly but we don't have any way to judge
> (different pages within a journal or combined work for example?)

We judge based on the defined scope of the documentation instance (hence my
parenthetical comment, "assuming sufficient metadata for identifying a
documentation instance").

> How about same page but different context? The picture may be of a
> different moss to the one that they mention in the text.

A picture is not a NameString. Multiple replicates of the same literal or
explicitly implied NameString within the same defined documentation instance
do not constitute multiple NameStrings, and hence do not represent multiple
usage instances.

> If a subspecies is mentioned does that count as a usage of
> the specific name (it has been used)

Yes, of course.  This is true even for autonyms, such as "Pseudanthias
ventralis ventralis", which is a separate NameString from "Pseudanthias
ventralis" (the former consisting of 32 textual characters, including the
spaces, and the latter consisting of 22 textual characters, including the
space).

> and likewise a binomial implies a usage of the genus name.

Yes, that is correct.  Just as a trinomial implies usage of a species name as
well as a genus name.  This is part of what I mean by "explicitly implied".
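
The way a trinomial implies a binomial and a genus name can be sketched as
follows. This is an illustration only: it assumes a NameString is a plain
whitespace-separated sequence of name parts with no authorship, rank markers
(such as "subsp.") or subgenus in parentheses, and the function name is
hypothetical.

```python
def implied_name_strings(name_string):
    """Return the NameString plus every shorter NameString it implies.

    Assumes whitespace-separated parts only (genus, species, subspecies);
    authorship and rank markers are not handled.
    """
    parts = name_string.split()
    # Drop one trailing part at a time: trinomial, binomial, genus.
    return [" ".join(parts[:i]) for i in range(len(parts), 0, -1)]


names = implied_name_strings("Pseudanthias ventralis ventralis")
# The character count of each NameString, spaces included.
lengths = [len(n) for n in names]
```

Here `names` holds the trinomial, the binomial and the genus name, in that
order, and `lengths` holds 32, 22 and 12, matching the character counts
given above for the first two.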

> There are around 1100 species mentioned in this publication.
> They are probably mentioned on average 3 times each (a guess)
> so that is 3300 new name usages.

No, it is 1100 NameString usages that include spaces as part of the
NameString (i.e., species-group names), plus an additional 300 NameString
usages at the genus level (a guess), plus whatever NameStrings at higher

Again, a NameString is identified as a string of textual characters meant to
represent a name of biological organisms.  Multiple copies of the same
NameString meant to represent the same name of the same biological organisms
within a defined documentation instance do not constitute multiple
NameStrings, and hence do not represent multiple NameUsage instances.  The
"meant to represent a name of biological organisms" part of the definition
is necessary to address the fact that homonyms are meant to refer to
different names, even though they share the same NameString.  Thus, the
appearance of two separate homonymous NameStrings intended to represent two
separate names constitutes two separate NameUsage instances (in need of two
separate GUIDs).
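
A minimal sketch of the homonym case, again in Python: a usage instance is
keyed on the documentation instance, the NameString, and the name the author
meant it to represent. The labels "doc-1", "name-A" and "name-B" are
placeholders, and how the intended name would actually be recorded is an
assumption of this sketch, not something defined in this thread.

```python
def usage_instance_keys(mentions):
    """Collapse mentions into distinct NameUsage keys.

    `mentions` is an iterable of
    (documentation_instance_id, name_string, intended_name) triples.
    Identical triples collapse; homonyms (same NameString, different
    intended name) stay separate.
    """
    return set(mentions)


# "Aus" is a placeholder genus; it appears three times in one documentation
# instance, twice meaning one name and once meaning a homonym of it.
mentions = [
    ("doc-1", "Aus", "name-A"),
    ("doc-1", "Aus", "name-A"),
    ("doc-1", "Aus", "name-B"),
]
keys = usage_instance_keys(mentions)
```

Under these assumptions the three mentions give two keys, and each key is
the kind of object that would receive its own GUID.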

> I really can't see how one would apply your definition.

How about now, after I have clarified the definition?

> Perhaps if you restricted it to taxonomic works

That is a question of the scope of documentation instances; not the scope of
NameUsage instances.  I would suggest leaving the definition flexible and

> It certainly isn't clear to me.

Is it any more clear now?  If not, please let me know what parts don't make

> We can easily define what a TaxonConcept is because it implies intent.
> If I want to create an object that I want you to refer to as a definition
> of a taxon then I am creating a TaxonConcept and should issue a GUID to
> make it easy for you to refer to it. If not then I shouldn't bother.
> If I want to use the services of a nomenclator to define the publication
> and typification of the name I am using then I can use a TaxonName GUID
> within my definition - but I don't have to.  I can't see how that can
> be any simpler than that.

Evidently we have different ideas about the definition of the word
"simpler"... ;-)

> Porley & Hodgetts (2005) have no intention whatsoever of 'committing'
> nomenclatural acts or of defining any taxa that people will later refer
> to. They are simply referring to existing concepts.

Really?  Who makes that decision? You? Porley & Hodgetts (2005)? SEEK? GBIF?
There is an almost perfectly smooth transition gradient in published
literature (let alone unpublished documentation sources) between name-only
usages and full-blown highly-defined TaxonConcepts. If you advocate two
"kinds" of GUIDs (one for TaxonNames, and one for TaxonConcepts), then an
arbitrary line needs to be drawn between those that are TaxonName objects,
and those that should be treated as TaxonConcept objects.  This discussion
thread has made clear that this arbitrary line would be drawn in different
places by different people.  Why not start a GUID system from the simplest
and most flexible approach (which also happens to be the most objective in
terms of identifying objects to which GUIDs should be assigned), and
condense as much of the subjectivity as possible into the second-tier
informatics task of establishing links among GUID-represented objects?  That
second-tier task is already rife with subjectivity and ambiguity no matter
how many "kinds" of taxon GUIDs are established.

> Yet by your definition they have created over 6k name usages
> that a diligent publisher might issue GUIDs for.

Well....probably more like 1.5K Name usages.  And who is the "publisher" in
this case?  The publisher of the book?  Is that who we expect will assign
the GUIDs? Or do you mean the publisher more generically -- as a publisher
of GUID assignments (like a data provider)?

> Have I completely misinterpreted your definition?

Well...."completely" might be too strong of a word.  The word "mostly" seems
like a more appropriate adverb in this case.

> If so could you define it a little tighter?

I thought I had....let me know if this message does not clarify the

> If you imply that the author has to have meant to describe something then
> you are just creating the TaxonConcept definition I am working with here.
> How else can you subset all the times names appear in print?

Well....there minimally has to be sufficient evidence that the author
intended a given NameString to represent biological organisms -- so that we
can at least classify the NameString as a "name" (as distinguished from the
large quantities of other non-name words that might appear within a
documentation instance).  But whether or not the described "something" rises
to the level of a TaxonConcept, or is merely intended as a reference to a
pre-existing TaxonConcept (or something else entirely), is an ambiguity that
I do not think should clutter the discussion about GUIDs for taxonomic

> This is all great fun but we do need to nail it down and move on.

I agree 100%.  Unfortunately, the level of disagreement (not just from me)
as expressed in this discussion makes it clear that we have not yet nailed
it down.

Finally, let me know if you want me to clarify what I mean by "explicitly
implied" in my one-sentence definition.  This is where I expected some
argument, because it is the one area where there is ambiguity in my approach
(other than the documentation instance thing -- the ambiguity of which
applies equally to all of the proposed taxon GUID solutions).  I don't
believe that any solution is completely free of ambiguity -- but I believe
that NameString Usage instances are (by far) the least ambiguous of objects,
and can flexibly serve as "handles" to both TaxonName objects (by any
definition) and TaxonConcept objects (in the sense that we all seem to
accept that "Aus bus Smith 1995 SEC Jones 2000" can serve as a "handle" to a
TaxonConcept instance).


Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
  and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at

