GUIDs for Taxon Names and Taxon Concepts

Roger Hyam roger at TDWG.ORG
Mon Nov 14 13:07:25 CET 2005


Hi Rich,

So you define a NameUsage as:

"Any occurrence of a NameString as it appears or is explicitly implied
within some form of static documentation."

Let us explore this definition! Picking a volume almost at random (I
like the cover), I chose Porley & Hodgetts (2005) /Mosses & Liverworts/.

http://www.tdwg.hyam.net/images/bryo_01.jpg

and picking a page at random - in this case 136

http://www.tdwg.hyam.net/images/bryo_02.jpg

So we have a 'static' document and it is chock full of NameStrings. I
have circled some of them.

We have /Ditrichum cornubicum/ (a red data book moss). It is mentioned,
with a picture, on the previous page, and it is mentioned a few more
times at the top of this page. Further down this page we have /Buxbaumia
aphylla/, which is also mentioned twice. There is a picture of it on the
next page.

So how many name usages do we have here? There seem to be loads.

    * Does each mention of the name on the page count as a usage? That
      would seem to be a silly thing to do.
    * Does mentioning the name on different pages mean different usages?
      That would also be silly, but we don't have any way to judge
      (different pages within a journal or combined work, for example?).
    * How about same page but different context? The picture may be of a
      different moss from the one that they mention in the text.
    * If a subspecies is mentioned, does that count as a usage of the
      specific name (it has been used)? Likewise, does a binomial imply
      a usage of the genus name?
    * There are around 1100 species mentioned in this publication. They
      are probably mentioned on average 3 times each (a guess), so that
      is 3300 new name usages. Plus they are all binomials or
      subspecific names, so double that for the different usages of
      genera etc. So 6,600 name usages in this volume. I wonder how many
      publications like this come out a year globally?
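
To make the counting problem concrete, here is a rough sketch of how the
number of "usages" shifts with the granularity chosen. The mention list
is hypothetical (it is not transcribed from the book), and the helper
name implied_names is an invented illustration, not part of any proposed
standard. The last lines simply redo the back-of-envelope estimate from
the final bullet.

```python
# Hypothetical mentions as (name string, page) pairs; illustrative only.
mentions = [
    ("Ditrichum cornubicum", 135),
    ("Ditrichum cornubicum", 136),
    ("Ditrichum cornubicum", 136),
    ("Buxbaumia aphylla", 136),
    ("Buxbaumia aphylla", 136),
    ("Buxbaumia aphylla", 137),
]

# Three candidate granularities for "one usage".
per_mention = len(mentions)                      # every printed occurrence
per_page = len(set(mentions))                    # one per name per page
per_work = len({name for name, _ in mentions})   # one per name per work


def implied_names(name):
    """Return the name plus each shorter prefix (e.g. the genus of a binomial)."""
    parts = name.split()
    return [" ".join(parts[:i]) for i in range(1, len(parts) + 1)]


# Counting implied genus usages as well, once per work.
per_work_with_implied = len(
    {implied for name, _ in mentions for implied in implied_names(name)}
)

# Back-of-envelope estimate from the last bullet above.
species = 1100
mentions_each = 3                        # a guess, as stated in the text
estimate = species * mentions_each * 2   # doubled for implied usages

print(per_mention, per_page, per_work, per_work_with_implied, estimate)
```

The same six printed mentions give four different totals depending on
which rule is picked, which is the ambiguity the bullets are pointing at.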

I really can't see how one would apply your definition. Perhaps you
could restrict it to taxonomic works, but then you have to define a
taxonomic work, and you still have to say how a name must be stated to
count as a 'usage'. It certainly isn't clear to me.

We can easily define what a TaxonConcept is because it implies *intent*.
If I want to create an object that I want you to refer to as a
definition of a taxon, then I am creating a TaxonConcept and should issue
a GUID to make it easy for you to refer to it. If not, then I shouldn't
bother. If I want to use the services of a nomenclator to define the
publication and typification of the name I am using, then I can use a
TaxonName GUID within my definition - but I don't have to. I can't see
how it can be any simpler than that.
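
As a minimal sketch of the intent rule described above: a GUID is only
issued when the author means to define a taxon, and the link to a
TaxonName GUID is optional. The class layout, the publish helper, the
field names and the use of urn:uuid identifiers are all assumptions made
for illustration; none of them is an agreed TDWG structure.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class TaxonConcept:
    label: str                             # the name as used in the definition
    according_to: str                      # who is defining the taxon
    taxon_name_guid: Optional[str] = None  # optional nomenclator reference
    # Assumption: a random UUID stands in for whatever GUID scheme is chosen.
    guid: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")


def publish(label, according_to, intends_definition, taxon_name_guid=None):
    """Return a TaxonConcept only when the author intends a definition."""
    if not intends_definition:
        return None  # merely referring to a name: no object, no GUID
    return TaxonConcept(label, according_to, taxon_name_guid)


# An author who intends others to cite the definition gets a concept.
concept = publish("Aus bus", "Smith 1995", intends_definition=True)

# A field guide that only refers to existing concepts issues nothing.
mention = publish("Ditrichum cornubicum", "Porley & Hodgetts 2005",
                  intends_definition=False)
```

The point of the sketch is only that the decision hangs on a single
question of intent, with the TaxonName link left as an optional field.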

Porley & Hodgetts (2005) have no intention whatsoever of 'committing'
nomenclatural acts or of defining any taxa that people will later refer
to. They are simply *referring* to existing concepts. Yet by your
definition they have created over 6k name usages that a diligent
publisher might issue GUIDs for.

Have I completely misinterpreted your definition? If so, could you define
it a little tighter? If you imply that the author has to have meant to
describe something, then you are just creating the TaxonConcept
definition I am working with here. How else can you subset all the times
names appear in print?

This is all great fun but we do need to nail it down and move on.

All the best,

Roger



Richard Pyle wrote:
> Hi Roger,
>
>
>> Could you attempt a concise definition of a
>> UsageInstance we can all agree on then :)
>>
>
> Sure: Any occurrence of a NameString as it appears or is explicitly implied
> within some form of static documentation.
>
> "NameString" refers to a string of textual characters meant to represent a
> name of biological organisms.  This can be defined more restrictively to
> "ScientificNameString" (names that conform to one of a designated set of
> nomenclatural Codes), or more broadly to include vernacular NameStrings.
>
> "static documentation" can be defined broadly, to include publications,
> databases, and any other form of documented medium of human communication.
> The "static" part means that it must represent a snapshot in time. In the
> case of dynamic databases, this would require a "date stamp" for each
> UsageInstance -- either for an individual record, or for a snapshot of the
> entire dataset.
>
> The "explicitly implied" part addresses zoological-style nomenclatural
> listings along the lines of what Yde has sent to this list, where a genus is
> listed once as a header, and species epithets are enumerated below,
> explicitly implying a series of binomials, even if they are not actually
> printed on paper as such.
>
> The point is, the definition is highly flexible, yet mostly unambiguous
> (assuming sufficient metadata for identifying a documentation instance).
>
> I think it only makes informatic sense to distinguish two "kinds" of GUID
> for taxonomic objects if the distinction between the objects is unambiguous.
> Given these NameStrings:
>
> 1. Aus Smith 1995
> 2. Xea Jones 2000
> 3. Aus bus Smith 1995
> 4. Xea bus (Smith 1995) Jones 2000
> 5. Xea ba (Smith 1995) Jones 2000
>
> It is ambiguous whether there are three, four, or five distinct NameObjects
> represented (i.e., it is ambiguous whether #s 4 & 5 should get TaxonName
> GUIDs, or TaxonConcept GUIDs via SEC instances).
>
> However, given this list:
>
> 1. "Aus" as it appears in Smith 1995
> 2. "Xea" as it appears in Jones 2000
> 3. "Aus bus" as it appears in Smith 1995
> 4. "Xea bus" as it appears in Jones 2000
> 5. "Xea" as it appears in Pyle 2005
> 6. "Xea ba" as it appears in Pyle 2005
> 7. "Xea" as it appears in ITIS Nov. 11, 2005 snapshot dataset
> 8. "Xea bus" as it appears in ITIS Nov. 11, 2005 snapshot dataset
> 9. "Xea ba" as it appears in ITIS Nov. 11, 2005 snapshot dataset
>
> There is very little ambiguity that each item in this list gets its own
> GUID.  Until a universal definition of a "NameObject" emerges, certain usage
> instances can serve as surrogates for basionyms (1, 2, 3), or botanical new
> combinations (4), or TaxonConcepts (1-6) -- in whatever way that a data
> manager needs or wishes to establish linkages among GUIDs (e.g., linking #s
> 4, 6, 8 & 9 to #3 via "is basionym of"; or linking #s 4, 6, 8 & 9 to #2 via
> "is combined with"; or linking #5 to #6, #2 to #4, and #1 to #3 via
> "contains").
>
> Several people have expressed a desire for "simple" and "flexible", and I
> think this approach maximizes both.
>
> Aloha,
> Rich
>
>
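
For reference, the nine usage instances and the example linkages in the
quoted message can be sketched as a small data structure. This is only
an illustration: the list numbers stand in for real GUIDs, the class and
helper names are invented here, and the direction given to each link
(subject, predicate, object) is one reading of the quoted wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageInstance:
    guid: int          # placeholder: the list number stands in for a GUID
    name_string: str
    source: str        # publication, or dataset plus snapshot date


SNAPSHOT = "ITIS Nov. 11, 2005 snapshot dataset"
rows = [
    (1, "Aus", "Smith 1995"),
    (2, "Xea", "Jones 2000"),
    (3, "Aus bus", "Smith 1995"),
    (4, "Xea bus", "Jones 2000"),
    (5, "Xea", "Pyle 2005"),
    (6, "Xea ba", "Pyle 2005"),
    (7, "Xea", SNAPSHOT),
    (8, "Xea bus", SNAPSHOT),
    (9, "Xea ba", SNAPSHOT),
]
usages = {guid: UsageInstance(guid, name, source) for guid, name, source in rows}

# Links as (subject, predicate, object) triples between placeholder GUIDs.
links = []
for combo in (4, 6, 8, 9):
    links.append((3, "is basionym of", combo))
    links.append((combo, "is combined with", 2))
for parent, child in ((5, 6), (2, 4), (1, 3)):
    links.append((parent, "contains", child))


def objects_of(subject, predicate):
    """Return the GUIDs that `subject` points to via `predicate`."""
    return [o for s, p, o in links if s == subject and p == predicate]


print([usages[g].name_string for g in objects_of(3, "is basionym of")])
```

Note that the same NameString ("Xea", "Xea bus", "Xea ba") appears under
several placeholder GUIDs, one per source, which is what the proposal
implies.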

--

-------------------------------------
 Roger Hyam
 Technical Architect
 Taxonomic Databases Working Group
-------------------------------------
 http://www.tdwg.org
 roger at tdwg.org
 +44 1578 722782
-------------------------------------



