Hi Roger,
Sorry, I wasn't clear enough. I think the simplest way to clarify is to answer your questions directly:
So we have a 'static' document and it is chock-full of NameStrings. I have circled some of them.
We have Ditrichum cornubicum (a red data book moss). It is mentioned, with a picture, on the previous page, and at the top of this page it is mentioned a few more times. Further down this page we have Buxbaumia aphylla, which is also mentioned twice. There is a picture of it on the next page.
So how many name usages do we have here?
One. A "NameString" is defined as a string of textual characters meant to represent a name of biological organisms. That there may be many replicate copies of the *same* NameString within one documentation instance does not mean there are many NameStrings (keeping in mind the "or is explicitly implied" part of the definition).
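To make that counting rule concrete, here is a minimal Python sketch. The Mention record, the function name, and the page numbers are illustrative assumptions of mine, not part of any proposed standard, and the implied genus-level NameStrings are left out for simplicity. It shows five printed mentions of two species names collapsing to one usage instance per distinct NameString.

```python
from collections import namedtuple

# Hypothetical record for one printed occurrence of a name (not from any standard).
Mention = namedtuple("Mention", ["doc_instance", "page", "name_string"])


def usage_instances(mentions):
    """Return one (documentation instance, NameString) pair per distinct NameString.

    Replicate copies inside the same documentation instance collapse to one entry.
    """
    return {(m.doc_instance, m.name_string) for m in mentions}


# Page numbers below are invented purely for illustration.
mentions = [
    Mention("Porley & Hodgetts 2005", 10, "Ditrichum cornubicum"),
    Mention("Porley & Hodgetts 2005", 11, "Ditrichum cornubicum"),
    Mention("Porley & Hodgetts 2005", 11, "Ditrichum cornubicum"),
    Mention("Porley & Hodgetts 2005", 11, "Buxbaumia aphylla"),
    Mention("Porley & Hodgetts 2005", 11, "Buxbaumia aphylla"),
]

usages = usage_instances(mentions)
print(len(mentions), "mentions,", len(usages), "usage instances")
```

The page number is deliberately not part of the key: only the scope of the documentation instance and the NameString itself decide whether two mentions are the same usage.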
Does each mention of the name on the page count as a usage?
No, because they are replicate copies of the same NameString; not different NameStrings.
- would seem to be a silly thing to do.
Yes, it would.
Does mentioning the name on different pages mean different usages?
Not if those different pages are within the same scope of the documentation instance. Sometimes, however, what seems like a single documentation instance may be subdivided into several documentation instances (e.g., chapters in a book), so it depends on what the scope of the documentation instance is. Please note that this realm of ambiguity exists completely independently of the problems we're discussing about GUID "units" in taxonomy, and thus applies equally to all solutions to the question of what object(s) should receive a taxon GUID.
- would also be silly but we don't have any way to judge (different pages within a journal or combined work for example?)
We judge based on the defined scope of the documentation instance (hence my parenthetical comment, "assuming sufficient metadata for identifying a documentation instance").
How about same page but different context? The picture may be of a different moss to the one that they mention in the text.
A picture is not a NameString. Multiple replicates of the same literal or explicitly implied NameString within the same defined documentation instance do not constitute multiple NameStrings, and hence do not represent multiple usage instances.
If a subspecies is mentioned does that count as a usage of the specific name (it has been used)
Yes, of course. True even for autonyms, such as "Pseudanthias ventralis ventralis", which is a separate NameString from "Pseudanthias ventralis" (the former consisting of 32 textual characters, including the spaces, and the latter consisting of 22 textual characters, including the space).
and likewise a binomial implies a usage of the genus name.
Yes, that is correct. Just as a trinomial implies usage of a species name as well as a genus name. That is part of "explicitly implied".
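As a rough sketch of the "explicitly implied" expansion, the Python below derives the shorter NameStrings that a trinomial carries with it. The function name is hypothetical, and the naive whitespace split is an assumption: it ignores authorship, subgenus parentheticals, and rank markers such as "subsp.", all of which a real parser would have to handle.

```python
def implied_name_strings(name_string):
    """List the NameStrings implied by a multi-part name, shortest first.

    Assumes a bare name whose parts are separated by single spaces.
    """
    parts = name_string.split()
    return [" ".join(parts[:n]) for n in range(1, len(parts) + 1)]


trinomial = "Pseudanthias ventralis ventralis"
implied = implied_name_strings(trinomial)

# Character counts, spaces included, for each implied NameString.
lengths = {s: len(s) for s in implied}
for s in implied:
    print(lengths[s], s)
```

The autonym and the binomial come out as separate strings of 32 and 22 characters respectively, which is all it takes for them to be distinct NameStrings.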
There are around 1100 species mentioned in this publication. They are probably mentioned on average 3 times each (a guess) so that is 3300 new name usages.
No, it is 1100 NameString usages that include spaces as part of the NameString (i.e., species-group names), plus an additional 300 NameString usages at the genus level (a guess), plus whatever NameStrings at higher ranks.
Again, a NameString is identified as a string of textual characters meant to represent a name of biological organisms. Multiple copies of the same NameString meant to represent the same name of the same biological organisms within a defined documentation instance do not constitute multiple NameStrings, and hence do not represent multiple NameUsage instances. The "meant to represent a name of biological organisms" part of the definition is necessary to address the fact that homonyms are meant to refer to different names, even though they share the same NameString. Thus, the appearance of two separate homonymous NameStrings intended to represent two separate names constitutes two separate NameUsage instances (in need of two separate GUIDs).
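One way to picture the homonym case is to treat the intended name as part of the key that identifies a NameUsage instance. The sketch below is only an illustration under that assumption: the documentation instance "Doc A", the placeholder genus "Aus", and the intended-name labels are all hypothetical, and how the intended name is actually determined is exactly the metadata question left open here.

```python
def name_usage_keys(records):
    """Collapse records to distinct (doc instance, NameString, intended name) keys."""
    return {(doc, name_string, intended) for doc, name_string, intended in records}


# Hypothetical records: the same NameString "Aus" used for two different names.
records = [
    ("Doc A", "Aus", "Aus (an animal genus)"),
    ("Doc A", "Aus", "Aus (an animal genus)"),  # replicate copy, same intended name
    ("Doc A", "Aus", "Aus (a plant genus)"),    # homonym, different intended name
]

keys = name_usage_keys(records)
print(len(keys), "NameUsage instances, each needing its own GUID")
```

The replicate copy disappears while the homonym survives, so three appearances of the same string yield two NameUsage instances.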
I really can't see how one would apply your definition.
How about now, after I have clarified the definition?
Perhaps if you restricted it to taxonomic works
That is a question of the scope of documentation instances; not the scope of NameUsage instances. I would suggest leaving the definition flexible and broad.
It certainly isn't clear to me.
Is it any more clear now? If not, please let me know what parts don't make sense.
We can easily define what a TaxonConcept is because it implies intent. If I want to create an object that I want you to refer to as a definition of a taxon then I am creating a TaxonConcept and should issue a GUID to make it easy for you to refer to it. If not then I shouldn't bother. If I want to use the services of a nomenclator to define the publication and typification of the name I am using then I can use a TaxonName GUID within my definition - but I don't have to. I can't see how that can be any simpler than that.
Evidently we have different ideas about the definition of the word "simpler"... ;-)
Porley & Hodgetts (2005) have no intention whatsoever of 'committing' nomenclatural acts or of defining any taxa that people will later refer to. They are simply referring to existing concepts.
Really? Who makes that decision? You? Porley & Hodgetts (2005)? SEEK? GBIF? There is an almost perfectly smooth transition gradient in published literature (let alone unpublished documentation sources) between name-only usages and full-blown highly-defined TaxonConcepts. If you advocate two "kinds" of GUIDs (one for TaxonNames, and one for TaxonConcepts), then an arbitrary line needs to be drawn between those that are TaxonName objects, and those that should be treated as TaxonConcept objects. This discussion thread has made clear that this arbitrary line would be drawn in different places by different people. Why not start a GUID system from the simplest and most flexible approach (which also happens to be the most objective in terms of identifying objects to which GUIDs should be assigned), and condense as much of the subjectivity as possible into the second-tier informatics task of establishing links among GUID-represented objects? That second-tier task is already rife with subjectivity and ambiguity no matter how many "kinds" of taxon GUIDs are established.
Yet by your definition they have created over 6k name usages that a diligent publisher might issue GUIDs for.
Well....probably more like 1.5K Name usages. And who is the "publisher" in this case? The publisher of the book? Is that who we expect will assign the GUIDs? Or do you mean the publisher more generically -- as a publisher of GUID assignments (like a data provider)?
Have I completely misinterpreted your definition?
Well...."completely" might be too strong of a word. The word "mostly" seems like a more appropriate adverb in this case.
If so could you define it a little tighter?
I thought I had....let me know if this message does not clarify the definition.
If you imply that the author has to have meant to describe something then you are just creating the TaxonConcept definition I am working with here. How else can you subset all the times names appear in print?
Well....there minimally has to be sufficient evidence that the author intended a given NameString to represent biological organisms -- so that we can at least classify the NameString as a "name" (as distinguished from the large quantities of other non-name words that might appear within a documentation instance). But whether or not the described "something" rises to the level of a TaxonConcept, or is merely intended as a reference to a pre-existing TaxonConcept (or something else entirely), is an ambiguity that I do not think should clutter the discussion about GUIDs for taxonomic objects.
This is all great fun but we do need to nail it down and move on.
I agree 100%. Unfortunately, the level of disagreement (not just from me) as expressed in this discussion makes it clear that we have not yet nailed it down.
Finally, let me know if you want me to clarify what I mean by "explicitly implied" in my one-sentence definition. This is where I expected some argument, because it is the one area where there is ambiguity in my approach (other than the documentation instance thing -- the ambiguity of which applies equally to all of the proposed taxon GUID solutions). I don't believe that any solution is completely free of ambiguity -- but I believe that NameString Usage instances are (by far) the least ambiguous of objects, and can flexibly serve as "handles" to both TaxonName objects (by any definition) and TaxonConcept objects (in the sense that we all seem to accept that "Aus bus Smith 1995 SEC Jones 2000" can serve as a "handle" to a TaxonConcept instance).
Aloha, Rich
Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef@bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html