[tdwg-content] most GUIDs/URIs for names/taxon stuff not ready for prime time

Peter DeVries pete.devries at gmail.com
Sun Jan 9 01:53:08 CET 2011

I think people are reading too much into the globalnames RDF experiment.

You need to look at what the GNI is - a store of the names that have been
used and where they are used.

Also, the ITIS numbers Steve mentions will clash with other numbers used by
the other providers. ITIS consists of a relatively small set of names.

See http://gni.globalnames.org/data_sources

* The TaxonConcept list number is a
bit out of date and was mainly to add names I had that were not in the list.

The ITIS record entails much more than a simple string of characters: it has
a classification and more besides.
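
To make the clash concrete, here is a minimal Python sketch. It assumes
nothing about how GNI actually builds its URIs: the "tsn" path segment is
borrowed from Steve's example below, while the "otherdb" provider and its
record are made up purely for illustration.

```python
# Sketch only: these URIs are illustrative and are not claimed to resolve.
BASE = "http://gni.globalnames.org/"


def bare_uri(local_id: int) -> str:
    """Append a provider-local number directly to the base URI."""
    return f"{BASE}{local_id}"


def namespaced_uri(provider: str, local_id: int) -> str:
    """Put a provider segment in front of the provider-local number."""
    return f"{BASE}{provider}/{local_id}"


# The same bare number issued by two providers.
# "otherdb" is a hypothetical provider invented for this example.
records = [("tsn", 180489), ("otherdb", 180489)]

bare = {bare_uri(number) for _, number in records}
namespaced = {namespaced_uri(provider, number) for provider, number in records}

# One URI for two different records versus one URI per record.
print(len(bare), len(namespaced))  # prints: 1 2
```

With a provider segment in the path, the same number from two sources no
longer collides.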

The reason to use UUIDs for some things (maybe not this) is that they allow
you to generate a globally unique identifier, and there is support for them
built into every modern operating system, even Windows. Just look at your

I can make one on my Mac from the command line with *uuidgen*.

There is probably something similar built-in to Windows.
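
For anyone without uuidgen handy, the same thing can be sketched with
Python's standard uuid module. The first call gives a random (version 4)
UUID. The second part shows the name-based (version 5) kind that is derived
from a string, which, as I understand the discussion below, is the general
kind of scheme the name-string URIs are based on. The namespace here is an
assumption invented for the example, not the one GNI uses.

```python
import uuid

# Random (version 4) UUID: different on every call.
random_id = uuid.uuid4()
print(random_id)

# Name-based (version 5) UUID: derived from a namespace plus a string.
# ASSUMPTION: this namespace is made up for illustration only.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def name_uuid(name_string: str) -> uuid.UUID:
    """Return the version 5 UUID for the exact name string given."""
    return uuid.uuid5(NAMESPACE, name_string)


# The name-string variants Tony listed.
variants = [
    "Physeter catodon (Linnaeus, 1758)",
    "Physeter catodon L.",
    "Physeter Catodon Linnaeus 1758",
    "Physeter catodon Linnaeus, 1758",
    "Physeter catodon Linnaeus, 1766",
]

for variant in variants:
    print(name_uuid(variant), variant)
```

Two parties running this with the same namespace and the same string get
the same UUID, but every spelling or formatting variant gets a different one.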

I have no formal role in the GNI; I was just asked to look into whether a
triple/quadstore approach might help solve the name problem.

But I do think that having all the names used in one place is a good idea.


- Pete

On Sat, Jan 8, 2011 at 4:37 PM, Steve Baskauf
<steve.baskauf at vanderbilt.edu> wrote:

> I think that both Pete and Tony bring up good but different points.  Tony
> seems to be making a point about non-reuse of identifiers for taxon names,
> while Pete seems to be making points about the desirability of also having
> identifiers for concepts (which he might or might not intend to be the same
> thing as taxon name usages) and the need for identifiers that will work in
> the Linked Open Data world.
> As far as Tony's point is concerned, I would like to have an answer to the
> very direct question that Tony raised.  Why aren't ITIS TSNs, which are
> well-known and often-used unique identifiers within their system, being used
> as part of a Global Names Index GUID?  Since they are locally unique within
> ITIS, they could easily be made globally unique by concatenating them to
> http://gni.globalnames.org/ .  For example, if TSN=778049, the URI would
> be http://gni.globalnames.org/778049 .  That would be short, simple,
> globally unique, and a form that could easily be used with content
> negotiation and therefore be fine for use in the LOD world.  Databases that
> are already using ITIS TSNs could easily and reliably construct the URI.  I
> suppose one objection to this might be that ITIS TSNs imply TNUs and not
> simply names, but to my knowledge there is only a single name/author
> combination per TSN and so it could be used as part of a name identifier.
>  Am I correct about this?
> I do not believe that the suggestion of autogenerating URIs by creating
> UUIDs from the name strings is a good idea.  In the examples that Tony gave:
> Physeter catodon (Linnaeus, 1758)
> Physeter catodon L.
> Physeter Catodon Linnaeus 1758
> Physeter catodon Linnaeus, 1758
> Physeter catodon Linnaeus, 1766
> I'm assuming that they all (except maybe the last one) represent
> formatting/spelling differences of what is actually the same name.  In an
> earlier email, one of Pete's rationales for why a URI should be used instead
> of an actual name string was so that the various misspellings and
> differently formatted versions of the same name would be represented by a
> single URI.  Yet this system of generating the URI from a UUID through an
> algorithm based on the name string will allow someone who is writing
> software to generate the URI based on a misspelled name without requiring
> the software to check against a list of correct names/identifiers.  How is
> that better than just allowing people to enter misspelled names as string
> literals???  The only thing it accomplishes is replacing something
> intelligible like "Junonia coenia" with something unintelligible like
> 3a70f04d-fd29-5570-ba91-52dae0c3d07f
> I really don't understand this infatuation with UUIDs.  I would have
> thought that after the LSID debacle, this community would have learned
> something about the negative effect on progress of promoting an unnecessarily
> complicated technical solution when one is not required to get the job done.
>  Imagine you are an herbarium curator who is somewhat behind the times
> technologically and a bit confused about GUIDs.   This curator has an Excel
> spreadsheet with TSN IDs in it.  Do we ask him to get one of his undergrad
> helpers to create a formula in an adjacent cell to the TSN that concatenates
> "http://gni.globalnames.org/tsn/" to the TSN to create the GUID that we
> tell him that he should be using?  Or do we ask him to download some special
> software that isn't quite ready to be used that will install a UUID
> generator on his computer and then spend an hour with him on the phone
> walking him through the installation which won't work because he'll have to
> install LUNIX, a Java virtual machine, a MySQL database, a proxy localhost
> http xyz server thingamajig that he has never heard of and doesn't
> understand but which seems so simple to the TDWG tech crew?  You can ask him
> to use
> http://gni.globalnames.org/tsn/778049
> or you can ask him to use
> http://gni.globalnames.org/name_strings/3a70f04d-fd29-5570-ba91-52dae0c3d07f
> Which one do you think is going to confuse him?  Which one can he type?
>  Which one can he cite in a paper and expect people to write down?  I
> complained about this in the draft of the Beginner's Guide to GUIDs which
> relied heavily on UUIDs in its examples.  Let's get real here.  We should be
> designing systems with the users in mind, not the programmers.
> Steve
> Tony.Rees at csiro.au wrote:
>> Dear Pete,
>> I don't think you have really addressed the question I was attempting to
>> ask - so I will try again...
>> Let's take as an example the sperm whale, Physeter macrocephalus, syn.
>> Physeter catodon (or vice versa in some sources e.g. Mammal Species of the
>> World 3rd edition).
>> P. catodon Linnaeus 1758 is given the taxonomic serial number (TSN) of
>> 180489 in ITIS, while P. macrocephalus has the TSN 180488. These usages are
>> then picked up by Cat. of Life, however rather than re-using the ITIS TSNs,
>> they are allocated the LSIDs ??? (for P. catodon) and
>> urn:lsid:catalogueoflife.org:taxon:415df5cc-52c2-102c-b3cd-957176fb88b9:col20101221
>> for P. macrocephalus. (Also my understanding is that these change every year
>> with a new release of CoL). Meanwhile over at uBio P. catodon has the LSID
>> urn:lsid:ubio.org:namebank:105910 while P. macrocephalus has the LSID
>> urn:lsid:ubio.org:namebank:111731 . Of course being both Linnaean taxa,
>> these also have ZooBank LSIDs i.e. P. catodon is urn:lsid:zoobank.org:act:046FA756-3A20-454E-8351-12EDE16574B4
>> while P. macrocephalus is urn:lsid:zoobank.org:act:A2F39087-C7A1-476F-88F6-B7C7B61D86AB
>> . Meanwhile over in AFD we find the LSID urn:lsid:biodiversity.org.au:afd.taxon:587e6872-512b-402e-9c5e-f098c6495275
>> for P. macrocephalus (not sure about catodon); in ION P. catodon has the LSID
>> urn:lsid:organismnames.com:name:553123 while P. macrocephalus has
>> urn:lsid:organismnames.com:name:553124 and so on we go (GenBank IDs,
>> WoRMS IDs, Fauna Europaea IDs, etc. etc.).
>> Now if you look in GNI (which indexes namestrings) you will find the
>> variants as follows:
>> Physeter catodon (Linnaeus, 1758)
>> Physeter catodon L.
>> Physeter Catodon Linnaeus 1758
>> Physeter catodon Linnaeus, 1758
>> Physeter catodon Linnaeus, 1766
>> (and similar for P. macrocephalus)
>> each of which no doubt also has its own unique ID somewhere behind the
>> scenes as well, all presumably awaiting reconciliation.
>> My question is simply how this apparently unregulated minting of LSIDs and
>> other unique identifiers is contributing to a solution rather than becoming
>> a new problem requiring additional resources to reconcile (bearing in mind
>> that we do not even have a reliable list of all named taxa at this time).
>> I am sure there is an answer somewhere, it's just that I cannot see it as
>> yet   :) - maybe someone will enlighten me however...
>> Regards - Tony
>> ________________________________
>> From: Peter DeVries [pete.devries at gmail.com]
>> Sent: Saturday, 8 January 2011 9:00 AM
>> To: Rees, Tony (CMAR, Hobart)
>> Cc: jsachs at csee.umbc.edu; tdwg-content at lists.tdwg.org;
>> pmurray at anbg.gov.au; pleary at mbl.edu; dpatterson at eol.org; dmozzherin;
>> Nathan Wilson
>> Subject: Re: [tdwg-content] most GUIDs/URIs for names/taxon stuff not
>> ready for prime time
>> Hi Tony,
>> That is why I think everyone should get behind the GlobalName Index which
>> is an EoL.org / GBIF.org project. It includes the names from ITIS et al.
>> The work I am doing with Dima is still experimental and in development,
>> but it demonstrates how two independent databases can autogenerate a shared
>> URI.
>> That idea, in itself, is interesting even if you don't like UUID's etc. or
>> the particular way the RDF is implemented now.
>> I find ITIS very valuable, but it has different IDs for the different
>> names for what many would consider the same concept.
>> So if a given species name changes from Aedes triseriatus to Ochlerotatus
>> triseriatus a new ID is generated.
>> This is different than how NCBI does it, but ITIS has more names.
>> Also NCBI does not tell you anything about what is or is not an instance
>> of a given species.
>> Since I think ITIS and NCBI are useful resources I link to them when I find
>> an appropriate ID to match to. You can see this in my RDF.
>> I would encourage ITIS to continue and think about exposing at least some
>> of the data as RDF using CoolURI's.
>> http://www.w3.org/TR/cooluris/
>> http://www.w3.org/Provider/Style/URI (i.e. do the best you can :-)
>> There are LOD compliant URI's for the NCBI ID's via bio2rdf and uniprot.
>> One of the major advantages of the Linked Open Data approach is that there
>> does not have to be one central place for everything.
>> Data sets can be distributed and each group can focus on its core
>> competencies.
>> Even things like species concepts could be distributed, but I think it
>> would be best to first get a common understanding of how they will work.
>> Or at least a couple different "kinds" of standard species concepts.
>> I see several kinds of species-like resources out there now, some are
>> name-based (ITIS), others are more like concepts (NCBI). Some entail
>> a particular classification (NCBI, CoL, etc.). Others coin a species
>> concept to which various classifications are associated (TaxonConcept.org)
>> We are at the start of trying to untangle this mess and a good place to
>> start is one resource that contains all the name uses.
>> Besides, is there anyone else willing to take on the responsibility to
>> collect and curate the 400 name variants that can exist for one species?
>> From this we can begin to connect those names to each other as well
>> as to related data sets like publications and occurrences etc.
>> I think it is good to have a diversity of projects even if there is some
>> overlap. Each group adds some interesting ideas and perspective.
>> Respectfully,
>> - Pete
>> P.S. Another thing we need is a shared set of URI's for attribution so
>> that they can be easily and efficiently incorporated and tracked.
>> e.g.  dataprovidedBy <http://some.shared.org/providers#ITIS>
>> A simple URI rather than a huge glob of text and images for each little
>> thing.
>> Perhaps using the void vocabulary http://vocab.deri.ie/void/guide
>> On Fri, Jan 7, 2011 at 2:11 PM, <Tony.Rees at csiro.au> wrote:
>> Dear all,
>> From where I sit (very much on the sideline of this debate, waiting to
>> see what happens), the main trouble I see is that (1) anyone and his dog can
>> mint yet another unique identifier for the same taxon name, leading to
>> uncontrolled proliferation and never ending ID reconciliation issues, and
>> (2) there are always some names not on any particular external "identifier
>> assigning" list which therefore lack an identifier (however have a
>> scientific name) just when you want one. No problem, just mint your own,
>> however that feeds back into (1) again...
>> Just curious - ITIS TSNs would have to be one of the longest established
>> and promoted systems of "non-name" identifiers for taxon names - have they
>> been successful in anyone's view, or if not, why not...
>> Any comments appreciated.
>> Regards - Tony
>> --
>> ---------------------------------------------------------------
>> Pete DeVries
>> Department of Entomology
>> University of Wisconsin - Madison
>> 445 Russell Laboratories
>> 1630 Linden Drive
>> Madison, WI 53706
>> TaxonConcept Knowledge Base<http://www.taxonconcept.org/> / GeoSpecies
>> Knowledge Base<http://lod.geospecies.org/>
>> About the GeoSpecies Knowledge Base<http://about.geospecies.org/>
>> ------------------------------------------------------------
> --
> Steven J. Baskauf, Ph.D., Senior Lecturer
> Vanderbilt University Dept. of Biological Sciences
> postal mail address:
> VU Station B 351634
> Nashville, TN  37235-1634,  U.S.A.
> delivery address:
> 2125 Stevenson Center
> 1161 21st Ave., S.
> Nashville, TN 37235
> office: 2128 Stevenson Center
> phone: (615) 343-4582,  fax: (615) 343-6707
> http://bioimages.vanderbilt.edu
