GUIDs for Taxon Names and Taxon Concepts

Roger Hyam roger at TDWG.ORG
Mon Nov 14 17:22:19 CET 2005


Hi Yde,


      TaxonNames

Yes, it is ambiguous whether a nomenclator would wish to issue a
TaxonName GUID for 4 and/or 5. If I were a nomenclator, I would issue a
TaxonName GUID for both. Resolving the GUID of the wrongly spelled one
would return an object that includes the GUID of the correctly spelled
one. This is just what I would do, not necessarily what a nomenclator
would choose to do.
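
A minimal sketch of that idea, assuming a toy in-memory nomenclator. The
class names, the correct_spelling_guid field and the choice of which of
4 and 5 counts as the misspelling are all hypothetical, made up purely
for illustration, and are not part of any TDWG schema:

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TaxonName:
    guid: str
    name_string: str
    # Hypothetical field: GUID of the correctly spelled name, if any.
    correct_spelling_guid: Optional[str] = None


class Nomenclator:
    """Toy in-memory nomenclator; not a real service."""

    def __init__(self) -> None:
        self._names: Dict[str, TaxonName] = {}

    def issue(self, name_string: str,
              correct_spelling_guid: Optional[str] = None) -> TaxonName:
        # Every name string gets its own GUID, misspelled or not.
        record = TaxonName(str(uuid.uuid4()), name_string, correct_spelling_guid)
        self._names[record.guid] = record
        return record

    def resolve(self, guid: str) -> TaxonName:
        # Return the object exactly as issued, pointer included.
        return self._names[guid]

    def resolve_correct(self, guid: str) -> TaxonName:
        # Follow the pointer to the correctly spelled name, if there is one.
        record = self.resolve(guid)
        if record.correct_spelling_guid is not None:
            return self.resolve(record.correct_spelling_guid)
        return record


nomenclator = Nomenclator()
# Assumption for illustration only: 5 is treated as the correct spelling.
xea_ba = nomenclator.issue("Xea ba (Smith 1995) Jones 2000")
xea_bus = nomenclator.issue("Xea bus (Smith 1995) Jones 2000",
                            correct_spelling_guid=xea_ba.guid)
```

Both names stay resolvable, so anyone holding the GUID of the misspelled
one still gets an answer, plus a pointer to the preferred record.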


      TaxonConcepts

TaxonConcepts depend on intention. From just a list of words it is
usually impossible to say whether they represent something that should
have a TaxonConcept GUID or not.

If this list was entitled "/Moths I have caught in my moth trap/" I
would argue strongly that you should not treat them as concepts. They
are an attempt by someone to say which taxa they have found, not an
attempt to re-define taxa. *The authors want to reference existing
taxa*, not give objects an identity. The items on the list may get GUIDs
from some recording scheme system though.

If the exact same list was entitled "/A treatment of Zus from Far Away
Land/" then it seems to me that they are all meant to be concepts
(possibly with bad nomenclature). *The authors want to tag/label these
taxa* so that other people can reference them when they go bird watching
in Far Away Land - or whatever animal a /Zus/ is...
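
To make the role of intention concrete, here is a rough sketch in which
the same list of names yields either plain references or new
TaxonConcepts, depending only on the declared intent of the document.
The Intent enum, the record classes and process_list are hypothetical
names invented for this illustration:

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import List, Union


class Intent(Enum):
    REFERENCE = "reference existing taxa"       # e.g. a moth-trap list
    DEFINE = "define taxa for others to cite"   # e.g. a taxonomic treatment


@dataclass
class TaxonReference:
    # No GUID is minted: the author only points at an existing taxon.
    name_string: str


@dataclass
class TaxonConcept:
    guid: str
    name_string: str
    according_to: str  # the work in which the concept is defined


def process_list(title: str, names: List[str],
                 intent: Intent) -> List[Union[TaxonReference, TaxonConcept]]:
    # The words on the list are identical; only the intent differs.
    if intent is Intent.DEFINE:
        return [TaxonConcept(str(uuid.uuid4()), n, title) for n in names]
    return [TaxonReference(n) for n in names]


names = ["Xea bus", "Xea ba"]
trap_list = process_list("Moths I have caught in my moth trap",
                         names, Intent.REFERENCE)
treatment = process_list("A treatment of Zus from Far Away Land",
                         names, Intent.DEFINE)
```

The intent has to be supplied from outside the list, which is the
point: the list of words alone does not carry it.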


      An Analogy

I have in front of me a book called /Everyman's Dictionary of First
Names/.  Here are some names from the book.

    * Milli
    * Mills
    * Milly - see Milli
    * Milo
    * Milson
    * Milton

The question "Should we issue National Insurance numbers for these
people?" is not a good one. They are not people, they are just names! But
we only know that because I told you where I got them from. The people
who compiled the book probably had a database with IDs in it. They could
try to set up a global system for names with GUIDs. This system would
clearly be completely separate from a global system for identifying
people by number for tax purposes, although the tax system may refer to
the name system, and a credit reference agency might provide a service
for getting an NI number from a person's name plus some other
disambiguating data.

So we can argue about the correct spelling of Milly, and the register
can have its own pointers to the 'correct' spelling, but Milly Smith
still gets a tax bill because when she was born they gave her a number.

(Incidentally, I believe Denmark actually has this system. It has a
national list of acceptable names for children and it has a system of
issuing ID numbers to everyone at birth. The UK just gets confused with
NI numbers and NHS numbers etc.)

Hope this helps,

All the best,

Roger

[BTW I had the name book to hand because the children are choosing a
name for our new cat. Milo is the favourite but we are open to
suggestions. He is a ginger tom with a sister called Motlie].



Yde de Jong wrote:
> Dear Roger,
>
> One puzzling thing for me to be explained in more detail is the following:
>
> Extending the example of Richard:
>
> 1. Aus Smith 1995
> 2. Xea Jones 2000
> 3. Aus bus Smith 1995
> 4. Xea bus (Smith 1995) Jones 2000
> 5. Xea ba (Smith 1995) Jones 2000
> 6. Xea bus (Smith 1995) Jones 2000 as it appears in Pyle 2005
>    = Xea ba (Smith 1995) Jones 2000
>
> I agree with Richard that it is ambiguous whether nos 4 & 5 should get
> TaxonName GUIDs or TaxonConcept GUIDs, but I believe this is a matter of
> definition we can solve.
>
> However, how to discriminate between the TaxonConcept of no. 4 and the
> TaxonConcept of no. 6 which includes subjective synonymy? I assume you
> need a GUID for each documentable usage instance?
>
> Kind regards,
>
> Yde
>
>
> ------------------------------------------------------------------------
>
>
>> Hi Rich,
>>
>> So you define a NameUsage as:
>> "Any occurrence of a NameString as it appears or is explicitly implied
>> within some form of static documentation."
>> Let us explore this definition! Picking a volume almost at random (I
>> like the cover) I chose Porley & Hodgetts (2005) /Mosses & Liverworts/.
>>
>> http://www.tdwg.hyam.net/images/bryo_01.jpg
>>
>> and picking a page at random - in this case 136
>>
>> http://www.tdwg.hyam.net/images/bryo_02.jpg
>>
>> So we have a 'static' document and it is chock full of NameStrings. I
>> have circled some of them.
>>
>> We have /Ditrichum cornubicum/ (a red data book moss). It is
>> mentioned, with a picture, on the previous page, and at the top of
>> this page it is mentioned a few more times. Further down this page we
>> have /Buxbaumia aphylla/, which is also mentioned twice. There is a
>> picture of it on the next page.
>>
>> So how many name usages do we have here? There seem to be loads.
>>
>>     * Does each mention of the name on the page count as a usage? -
>>       would seem to be a silly thing to do.
>>     * Does mentioning the name on different pages mean different
>>       usages? - would also be silly but we don't have any way to judge
>>       (different pages within a journal or combined work for example?)
>>     * How about same page but different context? The picture may be
>>       of a different moss to the one that they mention in the text.
>>     * If a subspecies is mentioned, does that count as a usage of the
>>       specific name (it has been used)? Likewise, does a binomial
>>       imply a usage of the genus name?
>>     * There are around 1100 species mentioned in this publication.
>>       They are probably mentioned on average 3 times each (a guess)
>>       so that is 3300 new name usages. Plus they are all binomials or
>>       subspecific names so double that for the different usages of
>>       genera etc. So 6,600 name usages in this volume. I wonder how
>>       many publications like this come out a year globally?
>>
>> I really can't see how one would apply your definition. Perhaps if
>> you restricted it to taxonomic works, but then you have to define a
>> taxonomic work and you are still left with the question of how a name
>> has to be stated to act as a 'usage'. It certainly isn't clear to me.
>>
>> We can easily define what a TaxonConcept is because it implies
>> *intent*. If I want to create an object that I want you to refer to as
>> a definition of a taxon then I am creating a TaxonConcept and should
>> issue a GUID to make it easy for you to refer to it. If not then I
>> shouldn't bother. If I want to use the services of a nomenclator to
>> define the publication and typification of the name I am using then I
>> can use a TaxonName GUID within my definition - but I don't have to.
>> I can't see how that can be any simpler than that.
>>
>> Porley & Hodgetts (2005) have no intention whatsoever of 'committing'
>> nomenclatural acts or of defining any taxa that people will later
>> refer to. They are simply *referring* to existing concepts. Yet by
>> your definition they have created over 6k name usages that a diligent
>> publisher might issue GUIDs for.
>>
>> Have I completely misinterpreted your definition? If so could you
>> define it a little tighter? If you imply that the author has to have
>> meant to describe something then you are just creating the
>> TaxonConcept definition I am working with here. How else can you
>> subset all the times names appear in print?
>>
>> This is all great fun but we do need to nail it down and move on.
>>
>> All the best,
>> Roger
>> ------------------------------------------------------------------------
>>
>>> Hi Roger,
>>>
>>>> Could you attempt a concise definition of a
>>>> UsageInstance we can all agree on then :)
>>>
>>> Sure: Any occurrence of a NameString as it appears or is explicitly
>>> implied within some form of static documentation.
>>>
>>> "NameString" refers to a string of textual characters meant to
>>> represent a name of biological organisms.  This can be defined more
>>> restrictively as "ScientificNameString" (names that conform to one of
>>> a designated set of nomenclatural Codes), or more broadly to include
>>> vernacular NameStrings.
>>>
>>> "static documentation" can be defined broadly, to include
>>> publications, databases, and any other form of documented medium of
>>> human communication. The "static" part means that it must represent a
>>> snapshot in time. In the case of dynamic databases, this would require
>>> a "date stamp" for each UsageInstance -- either for an individual
>>> record, or for a snapshot of the entire dataset.
>>>
>>> The "explicitly implied" part addresses zoological-style nomenclatural
>>> listings along the lines of what Yde has sent to this list, where a
>>> genus is listed once as a header, and species epithets are enumerated
>>> below, explicitly implying a series of binomials, even if they are not
>>> actually printed on paper as such.
>>>
>>> The point is, the definition is highly flexible, yet mostly unambiguous
>>> (assuming sufficient metadata for identifying a documentation instance).
>>>
>>> I think it only makes informatic sense to distinguish two "kinds" of
>>> GUID for taxonomic objects if the distinction between the objects is
>>> unambiguous. Given these NameStrings:
>>>
>>> 1. Aus Smith 1995
>>> 2. Xea Jones 2000
>>> 3. Aus bus Smith 1995
>>> 4. Xea bus (Smith 1995) Jones 2000
>>> 5. Xea ba (Smith 1995) Jones 2000
>>>
>>> It is ambiguous whether there are three, four, or five distinct
>>> NameObjects represented (i.e., it is ambiguous whether #s 4 & 5 should
>>> get TaxonName GUIDs, or TaxonConcept GUIDs via SEC instances).
>>>
>>> However, given this list:
>>>
>>> 1. "Aus" as it appears in Smith 1995
>>> 2. "Xea" as it appears in Jones 2000
>>> 3. "Aus bus" as it appears in Smith 1995
>>> 4. "Xea bus" as it appears in Jones 2000
>>> 5. "Xea" as it appears in Pyle 2005
>>> 6. "Xea ba" as it appears in Pyle 2005
>>> 7. "Xea" as it appears in ITIS Nov. 11, 2005 snapshot dataset
>>> 8. "Xea bus" as it appears in ITIS Nov. 11, 2005 snapshot dataset
>>> 9. "Xea ba" as it appears in ITIS Nov. 11, 2005 snapshot dataset
>>>
>>> There is very little ambiguity that each item in this list gets its own
>>> GUID.  Until a universal definition of a "NameObject" emerges, certain
>>> usage instances can serve as surrogates for basionyms (1, 2, 3), or
>>> botanical new combinations (4), or TaxonConcepts (1-6) -- in whatever
>>> way that a data manager needs or wishes to establish linkages among
>>> GUIDs (e.g., linking #s 4, 6, 8 & 9 to #3 via "is basionym of"; or
>>> linking #s 4, 6, 8 & 9 to #2 via "is combined with"; or linking #5 to
>>> #6, #2 to #4, and #1 to #3 via "contains").
>>>
>>> Several people have expressed a desire for "simple" and "flexible", and
>>> I think this approach maximizes both.
>>>
>>> Aloha,
>>> Rich
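
For reference, Rich's nine usage instances and the linkages he lists can
be written down as a small graph. This is only a sketch: the
UsageInstance class, the triple layout and the link helper are
hypothetical, and reading "linking #4 to #3 via is basionym of" as
"#3 is basionym of #4" is my interpretation of the direction:

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class UsageInstance:
    guid: str
    name_string: str
    source: str  # the static document or dated snapshot


listing = [
    (1, "Aus", "Smith 1995"),
    (2, "Xea", "Jones 2000"),
    (3, "Aus bus", "Smith 1995"),
    (4, "Xea bus", "Jones 2000"),
    (5, "Xea", "Pyle 2005"),
    (6, "Xea ba", "Pyle 2005"),
    (7, "Xea", "ITIS Nov. 11, 2005 snapshot dataset"),
    (8, "Xea bus", "ITIS Nov. 11, 2005 snapshot dataset"),
    (9, "Xea ba", "ITIS Nov. 11, 2005 snapshot dataset"),
]

# Every item in the list gets its own GUID, keyed here by list number.
usages: Dict[int, UsageInstance] = {
    number: UsageInstance(str(uuid.uuid4()), name, source)
    for number, name, source in listing
}

# Links stored as (subject GUID, predicate, object GUID) triples.
links: List[Tuple[str, str, str]] = []


def link(subject: int, predicate: str, obj: int) -> None:
    links.append((usages[subject].guid, predicate, usages[obj].guid))


for n in (4, 6, 8, 9):
    link(3, "is basionym of", n)      # assumed direction, see above
    link(n, "is combined with", 2)
link(5, "contains", 6)
link(2, "contains", 4)
link(1, "contains", 3)
```

Nine GUIDs and eleven links come out of five name strings, which gives
a feel for how the number of objects grows under this definition.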

--

-------------------------------------
 Roger Hyam
 Technical Architect
 Taxonomic Databases Working Group
-------------------------------------
 http://www.tdwg.org
 roger at tdwg.org
 +44 1578 722782
-------------------------------------




More information about the tdwg-tag mailing list