GUIDs for Taxon Names and Taxon Concepts

Vince Smith vsmith at INHS.UIUC.EDU
Fri Nov 11 09:57:31 CET 2005


Dear Roger,

Many thanks for developing the taxon concept wiki. You might want to
take a look at Nadia Anwa's work at the Univ. of Glasgow. She has
developed a taxon name warehouse that attempts to automate the
mapping of related taxonomic names to build potential taxon concepts
(http://spira.zoology.gla.ac.uk/ - be patient with the server, it is
a little slow). Each search generates a map showing these
relationships. In this scenario GUIDs would be associated with the
component names and the map (or rather the potential taxon concept
derived from the map). Nadia's approach is perhaps less refined than
what some people envisage but is a first step toward automation.
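
To make the idea concrete, here is a rough Python sketch of how such a
map might be built. It is not Nadia's actual method, which I have not
described here; it simply treats names as nodes, treats "related name"
links as edges, and calls each connected group a potential taxon
concept. The function name, the link format and the placeholder names
("Aus bus" etc.) are all assumptions made for illustration.

```python
import uuid
from collections import defaultdict


def build_potential_concepts(names, links):
    """Group linked names into potential concepts (hypothetical helper).

    names: iterable of name strings.
    links: iterable of (name_a, name_b) pairs; both must appear in names.
    Returns (name_guids, concepts): a GUID per name, and one record per
    connected group of names, each with its own GUID.
    """
    parent = {n: n for n in names}

    def find(n):
        # Follow parent pointers to the group representative,
        # shortening the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in links:
        parent[find(a)] = find(b)  # merge the two groups

    groups = defaultdict(set)
    for n in names:
        groups[find(n)].add(n)

    # One GUID per component name, and one per map / potential concept.
    name_guids = {n: str(uuid.uuid4()) for n in names}
    concepts = [
        {"guid": str(uuid.uuid4()), "names": sorted(members)}
        for members in groups.values()
    ]
    return name_guids, concepts


name_guids, concepts = build_potential_concepts(
    ["Aus bus", "Cus bus", "Aus dus"],
    [("Aus bus", "Cus bus")],  # assumed link, e.g. a recombination
)
for concept in concepts:
    print(concept["guid"], concept["names"])
```

With the placeholder data, "Aus bus" and "Cus bus" end up in one
potential concept and "Aus dus" in another; each name and each group
carries its own GUID.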

Regards,

Vince

____________________________________________________

Dr. Vincent S. Smith
Illinois Natural History Survey
607 East Peabody Drive
Champaign, IL  61820-6970
USA
Tel: +1 (217) 265-0831
Fax: +1 (217) 333-4949

E-mail: vsmith at inhs.uiuc.edu
Web: http://darwin.zoology.gla.ac.uk/~vsmith/
iChat Video Conferencing: vsmithuk at mac.com (invitation only)
Skype: vsmithuk; or try SkypeIn (call whichever is nearer you)
SkypeIn London: 020-755-88950; or Chicago: (312)-239-0950
____________________________________________________

On 11 Nov 2005, at 07:26, Roger Hyam wrote:

> Hi Rich,
>
> I think you are wrong in your conclusions that we do not need GUIDs
> for TaxonNames and TaxonObjects but I need to do diagrams to
> illustrate the case so have created a wiki page here:
>
> http://wiki.gbif.org/guidwiki/wikka.php?wakka=SeparateNamesAndConcepts
>
> I also intersperse some comments below.
>
> Richard Pyle wrote:
>> Hi Roger,
>>
>>
>>> Rich - yes. I think you sum up the difference between new
>>> combinations in ICZN and ICBN well. But... just because the ICZN does
>>> not consider the usage of a name in a different genus as a
>>> nomenclatural act does not stop us creating a data object (TaxonName
>>> object) to represent what that name looks like in the new genus -
>>> perhaps with its new ending and possibly different author string
>>> (ICZN Recommendation 51G).
>>>
>> I agree that nothing is stopping us, but I foresee headaches down the
>> road as a result of doing so.  In your next message, you wrote:
>>
>> "1. We have two kinds of GUID (one for TaxonNames and one for
>> TaxonConcepts)."
>>
>> From this I interpret that TaxonName GUIDs are different from
>> TaxonConcept GUIDs -- is that correct?  If so, the problem is that a
>> botanist would assign a new TaxonName GUID to a new combination, and a
>> zoologist would assign a TaxonConcept GUID to the same entity, because
>> "genus combination" is a property of a name in botany, and a property
>> of a usage (~concept) in zoology.
>>
>> Yes, you could certainly force-treat zoological names as though they
>> were botanical names (treating new combinations as "new names"), just
>> as you could easily force-treat botanical names as though they were
>> zoological names (assigning TaxonName GUIDs only to basionyms, and
>> representing new combinations via Usage/TaxonConcept GUIDs).  I just
>> believe that we will come to regret it if we leave the distinction
>> "fuzzy".
>>
>>
> I am not suggesting we leave it fuzzy; I am suggesting we leave it
> up to the nomenclators and that it isn't a GUID issue. I don't
> leave the maintenance of the brakes on my car fuzzy by not fixing
> them myself - I delegate it to a mechanic. Just because we are not
> solving the problem here does not mean that we are not going to get
> the problem solved.
>> My point has always been, and continues to be, that *IF* you have
>> separate "kinds of GUID" for TaxonNames and TaxonConcepts, the line
>> between the two should be unambiguous (and ideally be consistent for
>> both botany and zoology).  After thinking about it some more, in the
>> context of what has been written on this thread, I find myself coming
>> back to that first "IF".  Given that there seems to be a need and a
>> desire to leave the definition of a "name unit" (to which a GUID is
>> assigned) loose and flexible, then perhaps it would be premature to
>> establish a GUID system for Name-Objects at all.  Instead, I think we
>> can both simplify *and* disambiguate taxonomic objects if there is
>> only *ONE* GUID system -- which represents a Name-Usage instance.
>>
> Then don't refer to the TaxonName GUIDs issued by nomenclators when
> you issue your TaxonConcepts. Just ignore the nomenclator bit. If
> you are correct then everyone will follow your lead. A two-GUID
> system will just degrade to a one-GUID system if the names thing is
> wrong.
>> In the case of datasets consisting of only namestrings, with no
>> specific implied concept objects, the names can be interpreted as
>> "NameString SEC Nobody" (=Nominal TaxonConcept in TCS). Nomenclators
>> could use whichever subset of these Name-usage GUIDs they wish to
>> refer to their own version of a name-object. For example, ZooBank can
>> concern itself only with those GUIDs attached to original basionym
>> usage instances, and IPNI can manage both basionym usage instances
>> and new-combination usage instances. ConceptBank could expand the
>> scope of GUIDs to all those usage instances that represent defined
>> concepts. Name indexers could use the broadest set of GUIDs
>> (effectively all name-usage instances).
>>
> I think you are mixing issues here. The name string probably
> represents a TaxonConcept sec the dataset - depending on your
> intent. If you have had people scoring observations to a checklist
> for an area for a number of years then it would be convenient to
> treat the names as concepts so that you can reason about how they
> may be related to other better defined concepts. They are concepts
> sec your list. If they are just words you found in some books then
> they are just words till you intend to do something with them.
>
> A nomenclator should not be publishing data other than
> nomenclatural data (debatable what this is I know but no
> circumscription data anyway).
>
> ConceptBank sounds like any indexer (like GBIF) that crawls the
> people who are defining concepts of taxa and tries to provide
> sensible query expansion on the basis of the GUIDs or however else
> it links the objects together.
>
>> So, in summary, my feeling is that if we are to think of "two kinds
>> of GUID" for taxon objects, then we need an unambiguous (and
>> cross-Code) distinction between the two.  If such a distinction is
>> too cumbersome to draw (as it seems to be, based on the current
>> thread), then we should only go with one kind of GUID -- and by
>> default it should be the one kind that is most flexible (=usage
>> instance).
>>
>>
> See the wiki page for why I think this is an erroneous conclusion.
>
> Here is a definition of the two types of GUID:
>
> 1. A TaxonName GUID resolves to a data object that contains only
>    information about nomenclature. The provider does not intend anyone
>    to be able to identify a specimen to this GUID.
> 2. A TaxonConcept GUID resolves to a data object that contains
>    information about the delimitation/circumscription/relationships of
>    a taxon. The provider intends people to be able to identify
>    specimens to, or otherwise relate data to, this GUID.
>
> I think the notion of the intention of the provider is very important
> as it answers many questions. "Should I attach a GUID to object X?"
> depends on what you want other people to do with it. I don't think
> this is a very cumbersome definition. There is no 'natural'
> distinction between these two things. It is a pragmatic split so we
> produce a series of 'rooted' directed graphs of
>
> All the best,
>
> Roger
>
>
> --
>
> -------------------------------------
>  Roger Hyam
>  Technical Architect
>  Taxonomic Databases Working Group
> -------------------------------------
>  http://www.tdwg.org
>  roger at tdwg.org
>  +44 1578 722782
> -------------------------------------

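The two GUID definitions in the quoted message above could be modelled
roughly as follows. This is only a sketch under assumptions: the class
and field names are invented for illustration and are not taken from
TCS or any TDWG schema, and the names and authorship shown are
placeholders. The optional name_guid field reflects the point that a
concept provider may simply not refer to a nomenclator's GUID.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


def new_guid() -> str:
    """Stand-in for whatever GUID scheme is eventually chosen."""
    return str(uuid.uuid4())


@dataclass(frozen=True)
class TaxonName:
    """Nomenclatural data only; not intended as an identification target."""
    name_string: str
    authorship: str
    code: str  # e.g. "ICZN" or "ICBN"
    guid: str = field(default_factory=new_guid)


@dataclass(frozen=True)
class TaxonConcept:
    """Circumscription data; the provider intends data to be linked here."""
    label: str
    sec: str  # "according to": the source that defines the concept
    name_guid: Optional[str] = None  # may be left unset by the provider
    guid: str = field(default_factory=new_guid)


def can_identify_to(obj) -> bool:
    # Encodes the provider's intent: only concepts accept identifications.
    return isinstance(obj, TaxonConcept)


name = TaxonName("Aus bus", "Author, 1900", "ICZN")  # placeholder data
concept = TaxonConcept("Aus bus", sec="Hypothetical checklist 2005",
                       name_guid=name.guid)

print(can_identify_to(name))     # False
print(can_identify_to(concept))  # True
```

If providers leave name_guid unset everywhere, what remains is in
effect a single-GUID system built on concepts alone.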

More information about the tdwg-tag mailing list