Hi Tony,
I see what you are getting at.
I think this has happened for several reasons; below is a partial list:
1) Human nature
2) The way these projects are funded
3) Many of these projects specialize in, or have subsets of, all taxa - neither NCBI nor ITIS has all the North American insects. Not even some common North American mosquitoes. So you need to make your own IDs for these.
4) Data use restrictions, which force people to create and curate their own lists. It is much easier to reconcile between the TaxonConcept.org and EUNIS lists because they are both available as linked data with RDF dump files.
5) Too many different APIs and formats, including download formats. You should be able to go to one of these sites and get an unencumbered list of what names they have for North American mosquitoes etc.
6) For several of these projects it is not clear to me how one would add a species to their list, unlike Wikipedia / Wikispecies. It would seem to me that a simple contribute form page which would allow a species to be added to a queue for possible incorporation is strangely missing on most of these. I would not have needed to do what I have done if I could easily add a name to ITIS, get back an ID that stayed the same if the species was moved to a different genus and sent me an email if a name I submitted changed or was somehow "flagged".
7) The fundamentally flawed co-mingling of the idea of a name being a unique stable species identifier and a phylogenetic hypothesis for that species. *Ochlerotatus triseriatus* = *Aedes triseriatus*, *Felis concolor* = *Puma concolor*. Or your whale example.
8) Too many different standards on how a name should be formatted, and the fact that you need to know what "kind" of thing something is before you can format it correctly. Resistance by some to a solution for how to properly format a name when you don't know what kind of thing it is.
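Points 6 and 7 can be made concrete with a minimal sketch. It assumes a hypothetical `SpeciesConcept` record (not an existing ITIS or TaxonConcept.org API) in which the identifier is minted once and the name is only a mutable attribute:

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SpeciesConcept:
    """Hypothetical record: the ID is minted once; names are plain attributes."""
    accepted_name: str
    concept_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    synonyms: list = field(default_factory=list)

    def rename(self, new_name: str) -> None:
        """Move the species to a new genus/name without touching the ID."""
        if new_name == self.accepted_name:
            return
        # Keep the old name as a synonym so existing references still resolve.
        self.synonyms.append(self.accepted_name)
        self.accepted_name = new_name


concept = SpeciesConcept("Aedes triseriatus")
original_id = concept.concept_id
concept.rename("Ochlerotatus triseriatus")
```

After the rename, `concept.concept_id` is still `original_id`, and the old name survives as a synonym. The phylogenetic hypothesis changed; the identifier did not.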
Respectfully,
- Pete
On Sat, Jan 8, 2011 at 2:52 AM, Tony.Rees@csiro.au wrote:
Dear Pete,
I don't think you have really addressed the question I was attempting to ask - so I will try again...
Let's take as an example the sperm whale, Physeter macrocephalus, syn. Physeter catodon (or vice versa in some sources e.g. Mammal Species of the World 3rd edition).
P. catodon Linnaeus 1758 is given the taxonomic serial number (TSN) of 180489 in ITIS, while P. macrocephalus has the TSN 180488. These usages are then picked up by Cat. of Life, however rather than re-using the ITIS TSNs, they are allocated the LSIDs ??? (for P. catodon) and urn:lsid:catalogueoflife.org:taxon:415df5cc-52c2-102c-b3cd-957176fb88b9:col20101221 for P. macrocephalus. (Also my understanding is that these change every year with a new release of CoL). Meanwhile over at uBio P. catodon has the LSID urn:lsid:ubio.org:namebank:105910 while P. macrocephalus has the LSID urn:lsid:ubio.org:namebank:111731 . Of course being both Linnaean taxa, these also have ZooBank LSIDs i.e. P. catodon is urn:lsid:zoobank.org:act:046FA756-3A20-454E-8351-12EDE16574B4 while P. macrocephalus is urn:lsid:zoobank.org:act:A2F39087-C7A1-476F-88F6-B7C7B61D86AB . Meanwhile over in AFD we find the LSID urn:lsid:biodiversity.org.au:afd.taxon:587e6872-512b-402e-9c5e-f098c6495275 for P. macrocephalus (not sure about catodon); in ION P. catodon has the LSID urn:lsid:organismnames.com:name:553123 while P. macrocephalus has urn:lsid:organismnames.com:name:553124 and so on we go (GenBank IDs, WoRMS IDs, Fauna Europaea IDs, etc. etc.).
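To illustrate the scale of the proliferation, here is a small sketch that splits the P. macrocephalus LSIDs quoted above into their parts, using the general LSID layout `urn:lsid:authority:namespace:object[:revision]`. The helper name `parse_lsid` is hypothetical and the parsing is deliberately naive:

```python
from collections import namedtuple

Lsid = namedtuple("Lsid", "authority namespace object_id revision")


def parse_lsid(text):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.strip().split(":")
    prefix = [p.lower() for p in parts[:2]]
    if len(parts) not in (5, 6) or prefix != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# The LSIDs quoted above for Physeter macrocephalus.
lsids = [
    "urn:lsid:catalogueoflife.org:taxon:415df5cc-52c2-102c-b3cd-957176fb88b9:col20101221",
    "urn:lsid:ubio.org:namebank:111731",
    "urn:lsid:zoobank.org:act:A2F39087-C7A1-476F-88F6-B7C7B61D86AB",
    "urn:lsid:biodiversity.org.au:afd.taxon:587e6872-512b-402e-9c5e-f098c6495275",
    "urn:lsid:organismnames.com:name:553124",
]

# One distinct issuing authority per identifier: nothing here can be
# matched across sources without an external reconciliation table.
authorities = sorted({parse_lsid(s).authority for s in lsids})
```

Five identifiers, five authorities, and no shared object part between any two of them.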
Now if you look in GNI (which indexes namestrings) you will find the variants as follows:
Physeter catodon (Linnaeus, 1758)
Physeter catodon L.
Physeter Catodon Linnaeus 1758
Physeter catodon Linnaeus, 1758
Physeter catodon Linnaeus, 1766
(and similar for P. macrocephalus)
each of which no doubt also has its own unique ID somewhere behind the scenes as well, all presumably awaiting reconciliation.
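As a rough illustration of what that reconciliation involves, the sketch below groups the namestring variants listed above under a canonical "Genus epithet" form. The function `canonical_binomial` is hypothetical and deliberately naive: it assumes a simple binomial and does not handle subgenera, infraspecific ranks, or hybrids, nor is it how GNI itself parses names:

```python
def canonical_binomial(namestring):
    """Naive reduction of a namestring to 'Genus epithet' (binomials only)."""
    tokens = namestring.split()
    if len(tokens) < 2:
        raise ValueError(f"not a binomial: {namestring!r}")
    # Genus is capitalized, epithet is lowercased; authorship is dropped.
    return f"{tokens[0].capitalize()} {tokens[1].lower()}"


variants = [
    "Physeter catodon (Linnaeus, 1758)",
    "Physeter catodon L.",
    "Physeter Catodon Linnaeus 1758",
    "Physeter catodon Linnaeus, 1758",
    "Physeter catodon Linnaeus, 1766",
]

# Group every variant under its canonical form.
groups = {}
for variant in variants:
    groups.setdefault(canonical_binomial(variant), []).append(variant)
```

All five variants collapse to the single key `Physeter catodon`. Note that this says nothing about whether the 1766 citation is an error or a different usage, which is the part that still needs a curator.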
My question is simply how this apparently unregulated minting of LSIDs and other unique identifiers is contributing to a solution rather than becoming a new problem requiring additional resources to reconcile (bearing in mind that we do not even have a reliable list of all named taxa at this time).
I am sure there is an answer somewhere, it's just that I cannot see it as yet :) - maybe someone will enlighten me however...
Regards - Tony
From: Peter DeVries [pete.devries@gmail.com]
Sent: Saturday, 8 January 2011 9:00 AM
To: Rees, Tony (CMAR, Hobart)
Cc: jsachs@csee.umbc.edu; tdwg-content@lists.tdwg.org; pmurray@anbg.gov.au; pleary@mbl.edu; dpatterson@eol.org; dmozzherin; Nathan Wilson
Subject: Re: [tdwg-content] most GUIDs/URIs for names/taxon stuff not ready for prime time
Hi Tony,
That is why I think everyone should get behind the Global Names Index, which is an EoL.org / GBIF.org project. It includes the names from ITIS et al.
The work I am doing with Dima is still experimental and in development, but it demonstrates how two independent databases can autogenerate a shared URI.
That idea, in itself, is interesting even if you don't like UUID's etc. or the particular way the RDF is implemented now.
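One way this can work, sketched here under stated assumptions rather than as a description of the actual implementation, is a name-based (version 5) UUID: if both databases agree on a namespace and hash the same name string, they derive the same identifier without ever talking to each other. The namespace seed `example.org` and the base URI below are placeholders, not the values any real service uses:

```python
import uuid

# Assumption: both parties agree on this namespace in advance.
# "example.org" is a placeholder, not the seed any real service uses.
SHARED_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def name_uri(namestring, base="http://example.org/name/"):
    """Derive a URI deterministically from a name string (hypothetical scheme)."""
    return base + str(uuid.uuid5(SHARED_NAMESPACE, namestring))


# Two independent databases compute the URI locally, with no coordination.
uri_from_db_a = name_uri("Aedes triseriatus")
uri_from_db_b = name_uri("Aedes triseriatus")

# A different name string yields a different URI.
uri_other_name = name_uri("Ochlerotatus triseriatus")
```

The two databases end up with identical URIs for the same string. The scheme is sensitive to exact spelling, so it identifies name strings, not species concepts; linking `Aedes triseriatus` to `Ochlerotatus triseriatus` is a separate step.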
I find ITIS very valuable, but it has different IDs for the different names for what many would consider the same concept.
So if a given species name changes from Aedes triseriatus to Ochlerotatus triseriatus a new ID is generated.
This is different than how NCBI does it, but ITIS has more names.
Also NCBI does not tell you anything about what is or is not an instance of a given species.
Since I think ITIS and NCBI are useful resources, I link to them when I find an appropriate ID to match to. You can see this in my RDF.
I would encourage ITIS to continue and think about exposing at least some of the data as RDF using CoolURI's.
http://www.w3.org/TR/cooluris/
http://www.w3.org/Provider/Style/URI (i.e. do the best you can :-)
There are LOD compliant URI's for the NCBI ID's via bio2rdf and uniprot.
One of the major advantages of the Linked Open Data approach is that there does not have to be one central place for everything.
Data sets can be distributed and each group can focus on its core competencies.
Even things like species concepts could be distributed, but I think it would be best to first get a common understanding of how they will work.
Or at least a couple different "kinds" of standard species concepts.
I see several kinds of species-like resources out there now, some are name-based (ITIS), others are more like concepts (NCBI). Some entail a particular classification (NCBI, CoL, etc.). Others coin a species concept to which various classifications are associated (TaxonConcept.org)
We are at the start of trying to untangle this mess and a good place to start is one resource that contains all the name uses.
Besides, is there anyone else willing to take on the responsibility to collect and curate the 400 name variants that can exist for one species?
From this we can begin to connect those names to each other as well as to related data sets like publications and occurrences etc.
I think it is good to have a diversity of projects even if there is some overlap. Each group adds some interesting ideas and perspective.
Respectfully,
- Pete
P.S. Another thing we need is a shared set of URI's for attribution so that they can be easily and efficiently incorporated and tracked.
e.g. dataprovidedBy http://some.shared.org/providers#ITIS
A simple URI rather than a huge glob of text and images for each little thing.
Perhaps using the void vocabulary http://vocab.deri.ie/void/guide
On Fri, Jan 7, 2011 at 2:11 PM, Tony.Rees@csiro.au wrote:
Dear all,
From where I sit (very much on the sideline of this debate, waiting to see what happens), the main trouble I see is that (1) anyone and his dog can mint yet another unique identifier for the same taxon name, leading to uncontrolled proliferation and never ending ID reconciliation issues, and (2) there are always some names not on any particular external "identifier assigning" list which therefore lack an identifier (however have a scientific name) just when you want one. No problem, just mint your own, however that feeds back into (1) again...
Just curious - ITIS TSNs would have to be one of the longest established and promoted systems of "non-name" identifiers for taxon names - have they been successful in anyone's view, or if not, why not...
Any comments appreciated.
Regards - Tony
--
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base http://www.taxonconcept.org/
GeoSpecies Knowledge Base http://lod.geospecies.org/
About the GeoSpecies Knowledge Base http://about.geospecies.org/