<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7654.12">
<TITLE>Re: [tdwg-content] Producing a global taxon register (was: ITIS TSNID to uBio NamebankIDs mapping)</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<P><FONT SIZE=2>I agree that it is important to have clarity about what the goal<BR>
of a project is:<BR>
<BR>
* a HCAL - a hierarchical catalogue of all life - is a very popular<BR>
type of project; Catalogue of Life, ITIS, NCBI, Wikispecies, etc<BR>
all pursue this.<BR>
<BR>
* a GTR - global taxon register - is something else entirely, at least<BR>
if the term is taken literally. It would be indispensable if the purpose<BR>
"to index all usages of all names in all sources" is to be realized.<BR>
I don't know of any project that pursues this in a systematic way<BR>
(I suppose the French Wikipedia rates a mention, at least making some<BR>
attempt).<BR>
<BR>
and of course there are projects that focus on names, but at the moment<BR>
we still don't have something like a complete nomenclatural index<BR>
(inventorying all nomenclatural acts), and are just moving towards<BR>
lists of currently accepted names (closely connected to the HCAL).<BR>
For information on biodiversity the latter is only marginally relevant,<BR>
and the GNI is much less so.<BR>
<BR>
Names and taxa are quite different things and they are interconnected<BR>
in a complex way.<BR>
<BR>
Paul<BR>
<BR>
-----Original Message-----<BR>
From: tdwg-content-bounces@lists.tdwg.org on behalf of Tony.Rees@csiro.au<BR>
Sent: Sat 4 June 2011 1:04<BR>
To: deepreef@bishopmuseum.org; tdwg-content@lists.tdwg.org<BR>
Subject: [tdwg-content] Producing a global taxon register (was: ITIS TSNID to uBio NamebankIDs mapping)<BR>
<BR>
Hi all (jumping in with some trepidation...)<BR>
<BR>
It's good to hear some ramp-up may be coming of activity in the GNUB space (congratulations, Rich et al.). My main concern, however, is that it does not solve my particular problem - which is, in a nutshell: given "any" cited taxonomic name, what can we tell about it - with regard to its classification, nomenclatural and taxonomic/synonym status, and certain attributes (initially for my use case, simple geologic time - is it extant or not - and simple habitat classification - is it marine or not - though of course infinitely expandable from there).<BR>
<BR>
To me the vision of GNUB is too grand - to index all usages of all names in all sources - and the vision of GNI is too limited - to index the names but not actually record/harmonise/verify/manage (in a structured way) any associated information. I'm after something in between - what I have tentatively previously called HCAL - a hierarchical catalogue of all life (presuming that at least one "management" hierarchy is incorporated) - or maybe just a GTR - global taxon register. Sort of, waiting for the Catalogue of Life and/or ITIS to be complete, for both extant and fossil taxa, and also incorporate selected "taxon attributes" as above. (This is the space into which my IRMNG database is cast as a preliminary/"working for now" solution, but obviously without the significant resourcing / community cooperation required to build and sustain the thing for the long term).<BR>
<BR>
So my question is, how can such a product emerge from ongoing developments in GN* space, or other...<BR>
<BR>
Over to the experts,<BR>
<BR>
Best - Tony<BR>
<BR>
________________________________________<BR>
From: tdwg-content-bounces@lists.tdwg.org [tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle [deepreef@bishopmuseum.org]<BR>
Sent: Saturday, 4 June 2011 8:48 AM<BR>
To: tdwg-content@lists.tdwg.org<BR>
Subject: Re: [tdwg-content] ITIS TSNID to uBio NamebankIDs mapping<BR>
<BR>
Working backwards through this thread...<BR>
<BR>
I hadn't read Dima's post until just now, and I see that at least a couple of his points (i.e., #2, #5, #6) apply to exposing the UUIDs externally. However, I think that a simple protocol (such as replacing spaces with "_", and avoiding characters that look the same but are different -- such as the Cyrillic 'a') could go a long way to mitigating those problems.<BR>
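<BR>
A minimal sketch of what such a "simple protocol" could look like, assuming it consists of Unicode normalization, a check for mixed scripts (the Cyrillic 'a' case), and replacing spaces with "_". The function names are hypothetical and this is not GNI's actual behaviour.<BR>

```python
import unicodedata


def script_of(ch: str) -> str:
    """Rough script label: the first word of the Unicode character name."""
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]


def has_mixed_scripts(name: str) -> bool:
    """True when the letters of a name string come from more than one script."""
    scripts = {script_of(ch) for ch in name if ch.isalpha()}
    return len(scripts) > 1


def to_slug(name: str) -> str:
    """Normalize to NFC, reject look-alike mixtures, and replace spaces with '_'."""
    normalized = unicodedata.normalize("NFC", name.strip())
    if has_mixed_scripts(normalized):
        raise ValueError("name string mixes scripts: " + repr(name))
    return "_".join(normalized.split())


latin = "Betula"
spoofed = "Betul\u0430"  # final letter is CYRILLIC SMALL LETTER A

print(to_slug("Danaus plexippus (Linnaeus 1758)"))
print(has_mixed_scripts(latin), has_mixed_scripts(spoofed))
```

The script check is deliberately crude (it would also reject legitimate mixed-script strings), so it only illustrates the kind of guard the protocol would need.<BR>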
<BR>
On the other hand, it really depends on what the identifier is for. The string "Danaus_plexippus_(Linnaeus_1758)" may be more friendly to our eyes, but "A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523" is definitely more friendly to a computer (Dima's points 1, 3 & 4, among others). My feeling is that the push for GUIDs is more about enabling computer-computer conversations, than it is about enabling human-human or human-computer interactions; and therefore we should not get bogged down in the "ugliness" of the identifiers. In the context of electronic data services, the "ugliness" potential of the "Danaus_plexippus_(Linnaeus_1758)" approach to identifiers is far greater than the ugliness potential of "A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523", when it comes to interlinking electronic biodiversity data. It is nothing for a computer to render relevant metadata of the object identified by "A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523" into "Danaus plexippus (Linnaeus 1758)" on a computer screen or piece of paper for human-eyeball consumption. But there are many pitfalls (some noted by Dima) for a computer to unambiguously resolve "Danaus_plexippus_(Linnaeus_1758)" back to a meaningful data object.<BR>
<BR>
I guess my revised point is: GNI (and uBio/NameBank) are essentially the only taxonomic databases out there where a human-friendly persistent/actionable identifier of the sort being discussed is even plausible as an option. It may not even be wise in this context (as per Dima's points), but it *might* be, depending on the need for a human-friendly identifier.<BR>
<BR>
Maybe the simplest thing to do would be to not regard "<A HREF="http://gni.globalnames.org/name_strings/Danaus_plexippus_(Linnaeus_1758)">http://gni.globalnames.org/name_strings/Danaus_plexippus_(Linnaeus_1758)</A>" as an identifier per se, but rather as a protocol for a web service. In other words, if you append a text string to the root URL "<A HREF="http://gni.globalnames.org/name_strings/">http://gni.globalnames.org/name_strings/</A>", GNI would run that text string against its index and return whatever metadata it finds based on a text-string match. This is not mutually exclusive with an "identifier" in the form of "<A HREF="http://gni.globalnames.org/name_strings/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523">http://gni.globalnames.org/name_strings/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523</A>", that would less ambiguously resolve a known record in GNI. At this point, the line between "identifier" and "service" gets fuzzy, of course. But the analogy is true in ZooBank:<BR>
<BR>
The persistent "Identifier" looks like this:<BR>
A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523<BR>
<BR>
One way that this identifier can be represented as an *actionable* identifier is this:<BR>
urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523<BR>
<BR>
Another "actionable" form of the identifier might be this:<BR>
<A HREF="http://zoobank.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523">http://zoobank.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523</A><BR>
<BR>
or this:<BR>
<A HREF="http://zoobank.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523">http://zoobank.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523</A><BR>
<BR>
or even this(?):<BR>
<A HREF="http://lsid.tdwg.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523">http://lsid.tdwg.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523</A><BR>
<BR>
(all of which work, by the way)<BR>
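<BR>
As a rough illustration, the forms listed above can all be derived mechanically from the bare UUID. The sketch below only concatenates the prefixes shown in this message; the function name and dictionary keys are hypothetical.<BR>

```python
import uuid


def actionable_forms(identifier: str) -> dict:
    """Derive the actionable forms shown above from a bare UUID string."""
    # uuid.UUID validates the input; ZooBank writes the UUID in upper case.
    canonical = str(uuid.UUID(identifier)).upper()
    lsid = "urn:lsid:zoobank.org:act:" + canonical
    return {
        "uuid": canonical,
        "lsid": lsid,
        "zoobank_lsid_url": "http://zoobank.org/" + lsid,
        "zoobank_uuid_url": "http://zoobank.org/" + canonical,
        "tdwg_proxy_url": "http://lsid.tdwg.org/" + lsid,
    }


for label, value in actionable_forms("A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523").items():
    print(label, value)
```

Because every form is a pure function of the UUID, only the UUID itself needs to be stored; the rest can be generated on demand.<BR>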
<BR>
However, the following are examples of what I would think of as *services*:<BR>
<A HREF="http://www.google.com/search?q=Danaus+plexippus+(Linnaeus+1758)">http://www.google.com/search?q=Danaus+plexippus+(Linnaeus+1758)</A><BR>
<A HREF="http://lsid.tdwg.org/summary/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523">http://lsid.tdwg.org/summary/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523</A><BR>
<A HREF="http://darwin.zoology.gla.ac.uk/~rpage/lsid/tester/?q=urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523&submit=Go">http://darwin.zoology.gla.ac.uk/~rpage/lsid/tester/?q=urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523&submit=Go</A><BR>
<BR>
But really, from the perspective of the end-user, does it matter if it's an identifier or a service? Ultimately, they ask the questions, and the answers appear on their computer screens.<BR>
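<BR>
One way to picture the identifier/service split is a single endpoint that inspects the last path segment: if it has the shape of a UUID it is treated as an identifier, otherwise as a search string. This is a hypothetical sketch, not a description of how GNI or ZooBank are implemented; classify and service_url are made-up names.<BR>

```python
import re
from urllib.parse import quote, unquote

GNI_ROOT = "http://gni.globalnames.org/name_strings/"

# Canonical 8-4-4-4-12 hexadecimal layout of a UUID.
UUID_PATTERN = re.compile(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def classify(segment: str) -> tuple:
    """Return ('identifier', uuid) or ('search', text) for a URL path segment."""
    text = unquote(segment)
    if UUID_PATTERN.fullmatch(text):
        return ("identifier", text.lower())
    # Anything else is handed to a text-string match, with '_' read as a space.
    return ("search", text.replace("_", " "))


def service_url(name: str) -> str:
    """Build an escaped service URL so non-ASCII characters survive any resolver."""
    return GNI_ROOT + quote(name.replace(" ", "_"), safe="()")


print(classify("A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"))
print(classify("Danaus_plexippus_(Linnaeus_1758)"))
print(service_url("Danaus plexippus (Linnaeus 1758)"))
```

The identifier branch can return exactly one record, while the search branch may return zero, one, or several matches, which is where the two cases differ in practice.<BR>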
<BR>
Aloha,<BR>
Rich<BR>
<BR>
<BR>
<BR>
<BR>
<BR>
> -----Original Message-----<BR>
> From: tdwg-content-bounces@lists.tdwg.org [<A HREF="mailto:tdwg-content-bounces@lists.tdwg.org">mailto:tdwg-content-bounces@lists.tdwg.org</A>] On Behalf Of Dmitry Mozzherin<BR>
> Sent: Friday, June 03, 2011 4:34 AM<BR>
> To: David Remsen (GBIF)<BR>
> Cc: tdwg-content@lists.tdwg.org; Dmitry Mozzherin; Orrell, Thomas; Alan J<BR>
> Hampson; Nicolson, David; Gerald Guala<BR>
> Subject: Re: [tdwg-content] ITIS TSNID to uBio NamebankIDs mapping<BR>
><BR>
> In my opinion UUIDs have a few advantages over strings --<BR>
><BR>
> 1. It is a UUID, so it will work with UUID tools (current and future ones).<BR>
> 2. It is less ambiguous. For example, what is the difference between Betul&#1072; and<BR>
> Betula to your eyes? (One of them ends in a Cyrillic 'a'.)<BR>
> 3. Database-wise it is faster to search, because it is just a 128-bit number, while<BR>
> a name is at least a 245-byte varchar. In relational databases the size of the keys<BR>
> directly affects search speed.<BR>
> 4. UUID v. 5<BR>
> (<A HREF="http://en.wikipedia.org/wiki/Universally_unique_identifier">http://en.wikipedia.org/wiki/Universally_unique_identifier</A>)<BR>
> allows a UUID to be generated algorithmically without looking it up in a database (no<BR>
> need for a network connection).<BR>
> 5. Links like <A HREF="http://gni.globalnames.org/name_strings/Danaus_plexippus_(Linnaeus_1758)">http://gni.globalnames.org/name_strings/Danaus_plexippus_(Linnaeus_1758)</A> might be ambiguous -- I can think of several ways to represent the name string<BR>
> part in the URL, and they would all resolve to the same thing in GNI.<BR>
> 6. Unescaped Unicode characters in URLs containing literal name strings (people<BR>
> will forget to escape them) will be handled differently depending on the implementation of the URL<BR>
> resolver.<BR>
><BR>
> That said, links like<BR>
> <A HREF="http://gni.globalnames.org/name_strings/Danaus_plexippus_(Linnaeus_1758)">http://gni.globalnames.org/name_strings/Danaus_plexippus_(Linnaeus_1758)</A><BR>
> are definitely attractive, and it is good to have them as another way to access<BR>
> a name!<BR>
> My personal preference would be not to use them as the main identifier, because<BR>
> of reasons 1, 2, 3 and 5.<BR>
><BR>
> Dima<BR>
><BR>
><BR>
><BR>
><BR>
> On Fri, Jun 3, 2011 at 7:59 AM, David Remsen (GBIF) &lt;dremsen@gbif.org&gt;<BR>
> wrote:<BR>
> > Why not use the name as the basis for the resolvable identifier<BR>
> > instead of a UUID? Isn't there a 1:1 cardinality between the name and<BR>
> > the UUID in the GNI? Doesn't that mean that<BR>
> ><BR>
> > <A HREF="http://gni.globalnames.org/name_strings/4ef223c4-0c3e-5e84-ace9-755c34c601ec">http://gni.globalnames.org/name_strings/4ef223c4-0c3e-5e84-ace9-755c34c601ec</A><BR>
> > and<BR>
> ><BR>
> > <A HREF="http://gni.globalnames.org/name_strings/Danaus_plexippus_(Linnaeus_1758)">http://gni.globalnames.org/name_strings/Danaus_plexippus_(Linnaeus_1758)</A><BR>
> ><BR>
> > are equally unique? The latter is certainly more readable. In those<BR>
> > cases where the namestring is a homonym like<BR>
> ><BR>
> > <A HREF="http://gni.globalnames.org/name_strings/Oenanthe">http://gni.globalnames.org/name_strings/Oenanthe</A><BR>
> ><BR>
> > couldn't you just return the addresses of the two globally unique<BR>
> > forms of the name when you resolve it?<BR>
> ><BR>
> > <A HREF="http://gni.globalnames.org/name_strings/Oenanthe_Smith_1899">http://gni.globalnames.org/name_strings/Oenanthe_Smith_1899</A><BR>
> ><BR>
> > <A HREF="http://gni.globalnames.org/name_strings/Oenanthe_Jones_1900">http://gni.globalnames.org/name_strings/Oenanthe_Jones_1900</A><BR>
> ><BR>
> > Wouldn't those be as globally unique and easier to read and adjust to?<BR>
> > Or am I missing something. I always wanted to do that with ubio IDs<BR>
> > after a back and forth with Gregor Hagedorn and wished we hadn't<BR>
> > exposed those integers.<BR>
> ><BR>
> > DR<BR>
> ><BR>
> >> Hi Steve,<BR>
> >><BR>
> >> I don't have time to go through this in detail, and I can't speak for<BR>
> >> the GNI, but I can tell you about how the GNI URIs work, at least for now.<BR>
> >><BR>
> >> A while back Dima Mozzherin and I were looking into how triples etc.<BR>
> >> might be of use to the GNI.<BR>
> >><BR>
> >> We needed a way to generate unique URIs for each name.<BR>
> >><BR>
> >> We wanted to avoid having to keep these in sync and not require<BR>
> >> everyone to look each ID up through some service.<BR>
> >><BR>
> >> Dima came up with the following plan. We use the namestring as seed<BR>
> >> to generate a unique UUID.<BR>
> >><BR>
> >> Basically this is a shared algorithm which the GNI and TaxonConcept<BR>
> >> both use. But it could be used by anyone.<BR>
> >><BR>
> >> You feed the name string to the algorithm and it spits out a UUID. We<BR>
> >> then append that to a URI and web service so it is resolvable.<BR>
> >><BR>
> >> So the name Danaus plexippus (Linnaeus 1758) =><BR>
> >> 4ef223c4-0c3e-5e84-ace9-755c34c601ec<BR>
> >><BR>
> >> So if the GNI and another group have the same namestring they<BR>
> >> have the same UUID.<BR>
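<BR>
A minimal sketch of this kind of name-seeded UUID, using the standard version 5 (SHA-1, name-based) scheme that Dima mentions elsewhere in the thread. The namespace below is an assumption chosen for illustration; the thread does not say which namespace GNI uses, so the output is not claimed to match the UUID quoted above.<BR>

```python
import uuid

# Assumption: a namespace derived from a domain name under the standard DNS
# namespace. The namespace actually used by GNI is not stated in this thread.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "globalnames.org")

GNI_ROOT = "http://gni.globalnames.org/name_strings/"


def name_uuid(name_string: str) -> uuid.UUID:
    """Derive a version 5 UUID from the exact characters of a name string."""
    return uuid.uuid5(NAMESPACE, name_string)


def name_uri(name_string: str) -> str:
    """Append the derived UUID to the root URL so that it can be resolved."""
    return GNI_ROOT + str(name_uuid(name_string))


name = "Danaus plexippus (Linnaeus 1758)"
print(name_uuid(name))
print(name_uri(name))
```

Two parties that agree on the namespace and on the exact name string arrive at the same UUID without any lookup, but any difference in the string (spacing, authorship, a look-alike character) yields an unrelated UUID.<BR>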
> >><BR>
> >> People can then link their data set to the GNI with the following<BR>
> >> URI<BR>
> >><BR>
> >> <A HREF="http://gni.globalnames.org/name_strings/4ef223c4-0c3e-5e84-ace9-755c34c601ec">http://gni.globalnames.org/name_strings/4ef223c4-0c3e-5e84-ace9-755c34c601ec</A><BR>
> >><BR>
> >> RDF<BR>
> >> <A HREF="http://gni.globalnames.org/name_strings/4ef223c4-0c3e-5e84-ace9-755c34c601ec.rdf">http://gni.globalnames.org/name_strings/4ef223c4-0c3e-5e84-ace9-755c34c601ec.rdf</A><BR>
> >><BR>
> >> If you think of your data set as one table and the GNI<BR>
> >> as another, this URI serves as the foreign key that connects them<BR>
> >> together.<BR>
> >><BR>
> >> Some on the list don't like how these look, but there is a tremendous<BR>
> >> advantage in not having to worry about syncing two large data sets<BR>
> >> and determining if a given integer is already in use.<BR>
> >><BR>
> >> Also, Rod Page has written recently about UUIDs.<BR>
> >> <A HREF="http://iphylo.blogspot.com/2011/05/zoobank-on-couchdb-uuids-replication.html">http://iphylo.blogspot.com/2011/05/zoobank-on-couchdb-uuids-replication.html</A><BR>
> >><BR>
> >> There may be a way to do something similar with bit.ly-like<BR>
> >> identifiers that are shorter (mCcSp), but I think the general idea<BR>
> >> is a good one.<BR>
> >><BR>
> >> If you recall from my talk at TDWG, I was able to use these to make<BR>
> >> statements that one namestring was a synonym, etc., of another.<BR>
> >><BR>
> >> The algorithm we use is written in Ruby but it could be ported to many<BR>
> >> different languages since UUIDs are widely supported.<BR>
> >><BR>
> >> Respectfully,<BR>
> >><BR>
> >> - Pete<BR>
> >><BR>
> >><BR>
> >><BR>
> >> On Thu, Jun 2, 2011 at 11:41 PM, Steven J. Baskauf<BR>
> >> &lt;steve.baskauf@vanderbilt.edu&gt; wrote:<BR>
> >><BR>
> >>> My email access has been sporadic since this thread developed, so<BR>
> >>> at this point I'll respond to points made in several of the<BR>
> >>> messages.<BR>
> >>><BR>
> >>> First, I should note that there has been previous discussion on this<BR>
> >>> list on a similar topic from<BR>
> >>> <A HREF="http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002231.html">http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002231.html</A><BR>
> >>> through<BR>
> >>> <A HREF="http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002231.html">http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002231.html</A>.<BR>
> >>> One can review what was said at that time rather quickly by starting<BR>
> >>> on the first linked message and clicking on the "Next Message" link<BR>
> >>> until you get to the end of the range I gave above.<BR>
> >>><BR>
> >>> My reason for the request for information that started this thread<BR>
> >>> was that I wanted to link to a URI that would anchor the name<BR>
> >>> portion of a name/sensu pair (TNU or Taxon Concept a la TCS if you<BR>
> >>> prefer) as in this RDF<BR>
> >>> snippet:<BR>
> >>><BR>
> >>> &lt;tc:nameString&gt;Quercus rubra L.&lt;/tc:nameString&gt;<BR>
> >>> &lt;tc:hasName rdf:resource="<A HREF="http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:448439">http://www.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:448439</A>"/&gt;<BR>
> >>><BR>
> >>><BR>
> >>> At this point in the discussion, I'm not actually talking about<BR>
> >>> creating a link to a taxon concept but rather to a taxon name, so<BR>
> >>> some of the issues Pete raised don't apply here (e.g. what's the<BR>
> >>> "right" name for a concept - the question here is simply what's a<BR>
> >>> stable identifier for the name). In<BR>
> >>> principle, I could probably just provide the name string and be done<BR>
> >>> with it. However, having some degree of faith that Smart, Computer<BR>
> >>> Savvy People might some day be able to use the metadata returned by<BR>
> >>> the URI (or perhaps metadata which they already have in a triple<BR>
> >>> store onsite) to do cool things like knowing that my name is the<BR>
> >>> same as an orthographic variant or that "Quercus rubra L." is<BR>
> >>> basically the same thing as "Quercus rubra", I would like to also<BR>
> >>> provide a functional URI.<BR>
> >>><BR>
> >>> As an end-user who isn't very interested in the technical issues<BR>
> >>> involving names, I don't really care what URI I use. I would prefer<BR>
> >>> for it to be widely recognized and for it to "work" (i.e. be<BR>
> >>> resolvable). In the earlier (January) thread, there was discussion<BR>
> >>> about existing identifiers. There were a number of posts, but in particular<BR>
> >>> <A HREF="http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002258.html">http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002258.html</A><BR>
> >>> and<BR>
> >>> <A HREF="http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002259.html">http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002259.html</A><BR>
> >>> discussed the relative merits of ITIS and uBio ID numbers. My<BR>
> >>> take-home message from this was that uBio represented the largest<BR>
> >>> single set of names with assigned identifiers (see<BR>
> >>> <A HREF="http://gni.globalnames.org/data_sources">http://gni.globalnames.org/data_sources</A> cited in Pete's email) and<BR>
> >>> that uBio metadata provides useful references.<BR>
> >>> Hence my interest in referencing uBio ids as a URI. However, as a<BR>
> >>> practical matter, the organizations that I share images with either<BR>
> >>> want ITIS TSNs (EOL and Morphbank) or just names (Discover Life).<BR>
> >>> Nobody is asking for uBio identifiers or any other identifier.<BR>
> >>><BR>
> >>> I found Kevin's comment at<BR>
> >>> <A HREF="http://lists.tdwg.org/pipermail/tdwg-content/2011-May/002486.html">http://lists.tdwg.org/pipermail/tdwg-content/2011-May/002486.html</A><BR>
> >>> very thought-provoking: "My thoughts are that the most likely way this<BR>
> >>> will be solved is by standard market type pressures - ie the best<BR>
> >>> solution/IDs will be used the most and 'float' to the top." I'm not<BR>
> >>> going to make a judgment about what is the "best" solution or ID.<BR>
> >>> But I would say that in "computer"<BR>
> >>> history, being the "best" doesn't necessarily mean that something<BR>
> >>> will be used. Take for example, the FOAF vocabulary. What the heck<BR>
> >>> is Friend of a Friend? I would venture to say that most of the<BR>
> >>> people using the FOAF vocabulary don't know or care. The FOAF<BR>
> >>> vocabulary was the one that people started to use and once that<BR>
> >>> happened, people didn't switch even if there was something better.<BR>
> >>> I'm not familiar with the history of other stuff like YouTube and<BR>
> >>> Craig's List, but I would guess that they weren't necessarily "the<BR>
> >>> best" systems - they were just the one that the most people started<BR>
> >>> using first and once that happened, people didn't switch. I'm using<BR>
> >>> ITIS IDs because they are easy to get and the people I communicate<BR>
> >>> with want them. Whether they are the "best" or "done correctly"<BR>
> >>> doesn't matter to me as much as the fact that they are widely<BR>
> >>> recognized and stable (and that thus far every name that I've looked<BR>
> >>> for has been in their database).<BR>
> >>><BR>
> >>> I think that one reason why this question has been on my mind is<BR>
> >>> that I've been waiting for GNUB (Global Name Use Bank) to come out.<BR>
> >>> I'm not really up on how it is going to work, but my impression is<BR>
> >>> that it was going to be based on the Global Name Index (GNI) which<BR>
> >>> was mentioned in that earlier January thread. At that point, the<BR>
> >>> GNI names didn't have any identifiers that were exposed to the<BR>
> >>> public as permanent GUIDs. I'm assuming that if GNUB refers to GNI<BR>
> >>> names, they will have some kind of identifiers. So if that happens<BR>
> >>> how is the GUID recommendation 8 going to be followed? As Kevin<BR>
> >>> said in<BR>
> >>> <A HREF="http://lists.tdwg.org/pipermail/tdwg-content/2011-June/002499.html">http://lists.tdwg.org/pipermail/tdwg-content/2011-June/002499.html</A><BR>
> >>> "What I take from recommendation 8 of the GUID applicability guide<BR>
> >>> ... is that if you DON'T already have a record in your own database<BR>
> >>> for a taxon name/concept, then reuse an existing one. " What we<BR>
> >>> have here with GNI is a situation where none of the records have<BR>
> >>> identifiers. In my mind, the "best practice" according to<BR>
> >>> recommendation 8 would be for the GNI to reuse existing identifiers<BR>
> >>> where they exist and NOT make up new ones. This is a bit more<BR>
> >>> complicated because the ITIS identifiers (which are in common use)<BR>
> >>> don't have an http URI version that is resolvable, and while the<BR>
> >>> uBio identifiers have a resolvable http URI, it's in the form of a<BR>
> >>> proxied LSID, which I've already complained is very ugly. So I'd<BR>
> >>> like to hear some ideas about how to have "reused" identifiers in<BR>
> >>> the GNI.<BR>
> >>><BR>
> >>> One thing that comes to my mind would be to have a "domain name"<BR>
> >>> like "<A HREF="http://purl.org/gni/">http://purl.org/gni/</A>" or<BR>
> >>> "<A HREF="http://purl.org/tn/">http://purl.org/tn/</A>" ("tn" for "taxon name")<BR>
> >>> and to follow it with a namespace/id combination similar to what is<BR>
> >>> done with LSIDs. So for example "itis/19408" and "ubio/448439"<BR>
> >>> could be appended, creating <A HREF="http://purl.org/gni/itis/19408">http://purl.org/gni/itis/19408</A> and<BR>
> >>> <A HREF="http://purl.org/gni/ubio/448439">http://purl.org/gni/ubio/448439</A> for "Quercus rubra L." Both URIs<BR>
> >>> could point to the same RDF and that RDF could indicate that the two<BR>
> >>> identifiers are owl:sameAs . I realize from what Bob Morris has<BR>
> >>> cautioned in the past that there are problems with owl:sameAs when<BR>
> >>> the two things aren't actually the same thing (e.g. if the uBio ID<BR>
> >>> refers to a name string only but the ITIS TSN refers to the name<BR>
> >>> plus an "accepted" status and a relationship to parent taxa).<BR>
> >>> However, if there were an understanding that the GNI only refers to<BR>
> >>> name strings, then one could still refer to<BR>
> >>> <A HREF="http://purl.org/gni/itis/19408">http://purl.org/gni/itis/19408</A> as an identifier for the name string<BR>
> >>> of the thing (whatever it is) that is referred to by an ITIS TSN of<BR>
> >>> 19408. I don't think there would be a problem saying that and the<BR>
> >>> ubio ID were "owl:sameAs". Some kind of solution like this would<BR>
> >>> allow people to easily generate a resolvable URI for a name if they<BR>
> >>> were using ITIS TSNs or uBio IDs. If the name that one wanted to<BR>
> >>> use was so obscure that it was one of the 9.5 million names that<BR>
> >>> uBio has that ITIS doesn't have, then that name would only have the<BR>
> >>> ubio version. I have no idea whether this would be a good idea or<BR>
> >>> not, but I was really cringing to think about 19 million newly<BR>
> >>> minted UUIDs appended to<BR>
> >>> "<A HREF="http://gni.globalnames.org/">http://gni.globalnames.org/</A>" and<BR>
> >>> figuring out how to connect those horrid things to the names and<BR>
> >>> ITIS TSNs that I'm already using. I think that I said this before,<BR>
> >>> but using the purl.org domain rather than one like<BR>
> >>> <A HREF="http://gni.globalnames.org/">http://gni.globalnames.org/</A> would in the future allow somebody else<BR>
> >>> to take over management of providing the metadata when the GUIDs are<BR>
> >>> resolved without having to deal with issues of who "owns" the domain<BR>
> >>> name.<BR>
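<BR>
The proposal above can be sketched as follows, assuming the hypothetical root "http://purl.org/gni/" from the message (not a registered service) followed by a namespace/id pair. The function names are made up for illustration, and the statements are plain tuples rather than serialized RDF.<BR>

```python
PURL_ROOT = "http://purl.org/gni/"  # root proposed in the message; hypothetical
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def purl_for(namespace: str, local_id) -> str:
    """Build a URI from a source namespace ('itis', 'ubio') and its local ID."""
    return PURL_ROOT + namespace.lower() + "/" + str(local_id)


def same_as_statements(uris: list) -> list:
    """Pair the first URI with every other one as (subject, predicate, object)."""
    first, rest = uris[0], uris[1:]
    return [(first, OWL_SAME_AS, other) for other in rest]


# "Quercus rubra L." as identified by ITIS and uBio in the message above.
itis_uri = purl_for("itis", 19408)
ubio_uri = purl_for("ubio", 448439)

for statement in same_as_statements([itis_uri, ubio_uri]):
    print(statement)
```

As the message itself cautions, asserting owl:sameAs is only safe if both URIs are understood to identify the name string and nothing more.<BR>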
> >>><BR>
> >>> Steve<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> Kevin Richards wrote:<BR>
> >>><BR>
> >>> Pete,<BR>
> >>><BR>
> >>> I'm not trying to say what you are doing is a waste of time/impossible.<BR>
> >>> I actually think RDF + semantics are a good way forward, but this<BR>
> >>> really implies that we need to rely on the semantics and linkages<BR>
> >>> rather than having a SINGLE ID for a taxon name. (which is what I<BR>
> >>> thought Steve was getting at). Each instance of a taxon name can<BR>
> >>> have its own ID and then all these instances are connected via<BR>
> >>> ontology defined semantic links. This seems more appropriate to me<BR>
> >>> than insisting everyone uses the "Global Taxon Name ID X".<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> In your example of *Aedes triseriatus* and *Ochlerotatus<BR>
> >>> triseriatus* - these are two different names so they need two<BR>
> >>> different IDs, they may be linked by a single taxon concept, but<BR>
> >>> they are separate names. So which of these now 3 IDs do you expect<BR>
> >>> people to use, and according to what source??<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> For example, if we have a name, e.g. the Robin, Erithacus rubecula,<BR>
> >>> mentioned in ITIS (TSN: 559964), also in EOL (www.eol.org/pages/1051567),<BR>
> >>> also in GBIF (<A HREF="http://data.gbif.org/species/21266780">http://data.gbif.org/species/21266780</A>), and also in Avibase (<BR>
> >>> <A HREF="http://avibase.bsc-eoc.org/species.jsp?avibaseid=C809B2B90399A43D">http://avibase.bsc-eoc.org/species.jsp?avibaseid=C809B2B90399A43D</A>),<BR>
> >>> which ID are you hoping people will use? Would you put the ITIS ID in your<BR>
> >>> own dataset as the ID for that name? Unlikely. Or would it be better to<BR>
> >>> link them up with semantic linkages?<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> What I take from recommendation 8 of the GUID applicability guide (as<BR>
> >>> Steve puts it, "stop making up new identifiers when somebody else already has<BR>
> >>> one for the thing you are talking about") is that if you DON'T already have<BR>
> >>> a record in your own database for a taxon name/concept, then reuse an<BR>
> >>> existing one. NOT ditch all your current IDs and adopt someone else's<BR>
> >>> (especially hard considering how difficult it is to work out which of the<BR>
> >>> multitude of name and concept IDs directly relates to your taxon name).<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> I am all for limiting the number of IDs for the "same" thing, but in<BR>
> >>> some cases it is more useful to build linkages than force this tight<BR>
> >>> integration of data and IDs. Especially for taxon names and concepts, where it is<BR>
> >>> complex to define if you are even talking about the "same" thing or not.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> Kevin<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> *From:* Peter DeVries<BR>
> >>> [<A HREF="mailto:pete.devries@gmail.com">mailto:pete.devries@gmail.com</A>]<BR>
> >>><BR>
> >>> *Sent:* Wednesday, 1 June 2011 12:38 p.m.<BR>
> >>> *To:* Kevin Richards<BR>
> >>> *Cc:* Steve Baskauf; tdwg-content@lists.tdwg.org; Gerald Guala;<BR>
> >>> Nicolson,<BR>
> >>> David; Alan J Hampson; Orrell, Thomas<BR>
> >>> *Subject:* Re: [tdwg-content] ITIS TSNID to uBio NamebankIDs mapping<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> Hi Kevin,<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> I forgot to mention some other things that are different about my<BR>
> >>> project.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> You can write a simple SPARQL query to get a list of all the<BR>
> >>> TaxonConcepts that have ITIS IDs, or all those that have both ITIS<BR>
> >>> and NCBI IDs, etc.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> You can do this on any SPARQL endpoint that hosts the data.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> You can download the entire data set and run the queries on your own<BR>
> >>> endpoint.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> You can write a script that runs the query and downloads the ITIS<BR>
> >>> numbers<BR>
> >>> and exports them to CSV etc.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> - Pete<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> On Tue, May 31, 2011 at 5:16 PM, Peter DeVries<BR>
> >>> &lt;pete.devries@gmail.com&gt; wrote:<BR>
> >>><BR>
> >>> Hi Kevin,<BR>
> >>><BR>
> >>> On Tue, May 31, 2011 at 3:27 PM, Kevin Richards<BR>
> >>> &lt;RichardsK@landcareresearch.co.nz&gt; wrote:<BR>
> >>><BR>
> >>> This is exactly why this problem still exists and will be very complex<BR>
> >>> to solve - everyone says "we should have a single ID for a specific taxon<BR>
> >>> name, there seem to be several IDs 'out there' that refer to the same taxon<BR>
> >>> name, so I'm going to create another ID to link them all up" - yet another ID<BR>
> >>> that no one will particularly want to follow - you would have to get everyone<BR>
> >>> to agree that your combination/integration of taxon names is the best one<BR>
> >>> and hope everyone follows it - unlikely in this domain.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> Isn't this kind of what The Plant List and eBird already do?<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> A difference being that they tie these to a specific name and specific<BR>
> >>> classification.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> The Plant List is not really even open, so it is difficult for people to<BR>
> >>> adopt it en masse.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> For instance, if I manage a herbarium, how do I easily reconcile my<BR>
> >>> species<BR>
> >>> list with the entities represented in the Plant List?<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> eBird has millions of records which implies that they have been able to<BR>
> >>> convince the observers in the field to adopt their system. You are<BR>
> >>> correct<BR>
> >>> in that there are probably a lot of taxonomists that don't like their<BR>
> >>> list.<BR>
> >>><BR>
> >>> It differs from many of the other classifications, but remember the<BR>
> >>> system<BR>
> >>> rewards them for not agreeing. Note the difference between the microbial<BR>
> >>> taxonomists and other taxonomists. In the case of the microbial<BR>
> >>> workers, the system rewards them for solving problems not debating<BR>
> >>> alternatives. Also, if a good idea comes out that will make it easier<BR>
> >>> for<BR>
> >>> the microbiologists to solve the problems they are rewarded for solving,<BR>
> >>> they are less likely to care whose idea it is.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> Like the microbiologists, there are lots of biologists that work with<BR>
> >>> species with the goal of addressing some non-taxonomic problem.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> They don't really care if the name is *Aedes triseriatus* or<BR>
> >>> *Ochlerotatus<BR>
> >>> triseriatus*, but they do care that the identifier that they connect<BR>
> >>> their<BR>
> >>> data to is stable.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> In regards to the issue of market forces, I suspect (but have no<BR>
> >>> knowledge<BR>
> >>> of) that there were probably decisions made in devising these lists that<BR>
> >>> have more to do with appeasing certain personalities than creating the best<BR>
> >>> list. With the way this system rewards people it is likely that the<BR>
> >>> "correct" version will float to the top only after that person has<BR>
> >>> passed<BR>
> >>> away. I don't have much faith that the best system will always float to<BR>
> >>> the<BR>
> >>> top. That has a lot to do with the personalities and how the system<BR>
> >>> rewards<BR>
> >>> are set up. Theoretically, it is possible for one strong personality or<BR>
> >>> group<BR>
> >>> to force others to adopt their less than optimal solution - at least<BR>
> >>> this<BR>
> >>> seems to happen in other environments.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> Also, there are all sorts of ways that people can use the publication<BR>
> >>> record to rewrite history. Simply cite the review paper that cites the<BR>
> >>> original paper. Or don't cite it at all.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> I would have used only the ITIS TSN but if the name changes the ID<BR>
> >>> changes.<BR>
> >>> This isn't "wrong", it just does not solve my problem.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> * ITIS also should add the spiders from the World Spider Catalog.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> Another issue that I think has inhibited adoption of a common list is<BR>
> >>> that<BR>
> >>> people can't agree on a particular name or a particular classification.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> Since you can model a species concept as having many names and many<BR>
> >>> classifications why not do so?<BR>
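<BR>
As a rough illustration of the modelling idea above, the following Python sketch ties several names and several classifications to one stable identifier. The class, the function, and the identifier string are hypothetical and are not taken from TaxonConcept.org; the two names are the ones used as an example earlier in this message.<BR>

```python
from dataclasses import dataclass, field


@dataclass
class SpeciesConcept:
    """One stable identifier with any number of names and classifications."""
    concept_id: str  # hypothetical stable identifier
    names: set = field(default_factory=set)
    classifications: list = field(default_factory=list)


def build_name_index(concepts):
    """Map every known name to the stable ID of its concept.

    Simplification: assumes each name belongs to exactly one concept.
    """
    index = {}
    for concept in concepts:
        for name in concept.names:
            index[name] = concept.concept_id
    return index


mosquito = SpeciesConcept(
    concept_id="concept:example-1",  # made-up ID for illustration
    names={"Aedes triseriatus", "Ochlerotatus triseriatus"},
    classifications=[
        ["Culicidae", "Aedes"],         # one placement of the species
        ["Culicidae", "Ochlerotatus"],  # an alternative placement
    ],
)
index = build_name_index([mosquito])
# Either name resolves to the same identifier, so data linked to the
# identifier is unaffected by which name a source prefers.
print(index["Aedes triseriatus"] == index["Ochlerotatus triseriatus"])
```

Data attached to the identifier stays put when the preferred name or classification changes; only the sets attached to the concept are updated.<BR>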
> >>><BR>
> >>><BR>
> >>><BR>
> >>> If this idea had originally been accepted, I would not have needed to create<BR>
> >>> TaxonConcept.org.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> My plan has always been to get something that works to solve some<BR>
> >>> problems<BR>
> >>> and then let some larger group take it over.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> In a sense, I am more like the microbiologists in that I am not being<BR>
> >>> paid<BR>
> >>> to solve this or debate this problem.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> I am doing it because I think something like this is needed, and it is<BR>
> >>> an<BR>
> >>> interesting and personally rewarding puzzle.<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> - Pete<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> My thoughts are that the most likely way this will be solved is by<BR>
> >>> standard<BR>
> >>> market type pressures - i.e. the best solution/IDs will be used the most<BR>
> >>> and<BR>
> >>> "float" to the top. It is easy to say that the global taxon name data<BR>
> >>> is a<BR>
> >>> mess, but if you think about it 30 years ago taxon name data were very<BR>
> >>> disparate, duplicated, unconnected, many with NO IDs at all. So I<BR>
> >>> believe<BR>
> >>> we are making progress and that we will continue to do so albeit at a<BR>
> >>> fairly<BR>
> >>> slow rate.<BR>
> >>><BR>
> >>> Kevin<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> "I agree. This was one of the reasons that I setup TaxonConcept the way<BR>
> >>> I<BR>
> >>> did. It attempts to connect both the LOD entities and the foreign key<BR>
> >>> based<BR>
> >>> entities."<BR>
> >>><BR>
> >>> Please consider the environment before printing this email<BR>
> >>> Warning: This electronic message together with any attachments is<BR>
> >>> confidential. If you receive it in error: (i) you must not read, use,<BR>
> >>> disclose, copy or retain it; (ii) please contact the sender immediately<BR>
> >>> by<BR>
> >>> reply email and then delete the emails.<BR>
> >>> The views expressed in this email may not be those of Landcare Research<BR>
> >>> New Zealand Limited. <A HREF="http://www.landcareresearch.co.nz">http://www.landcareresearch.co.nz</A><BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> --<BR>
> >>><BR>
> >>> ------------------------------------------------------------------------------------<BR>
> >>> Pete DeVries<BR>
> >>> Department of Entomology<BR>
> >>> University of Wisconsin - Madison<BR>
> >>> 445 Russell Laboratories<BR>
> >>> 1630 Linden Drive<BR>
> >>> Madison, WI 53706<BR>
> >>> Email: pdevries@wisc.edu<BR>
> >>> TaxonConcept <<A HREF="http://www.taxonconcept.org/">http://www.taxonconcept.org/</A>> &<BR>
> >>> GeoSpecies<<A HREF="http://about.geospecies.org/">http://about.geospecies.org/</A>> Knowledge<BR>
> >>> Bases<BR>
> >>> A Semantic Web, Linked Open Data <<A HREF="http://linkeddata.org/">http://linkeddata.org/</A>> Project<BR>
> >>><BR>
> >>> --------------------------------------------------------------------------------------<BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>><BR>
> >>> --<BR>
> >>> Steven J. Baskauf, Ph.D., Senior Lecturer<BR>
> >>> Vanderbilt University Dept. of Biological Sciences<BR>
> >>><BR>
> >>> postal mail address:<BR>
> >>> VU Station B 351634<BR>
> >>> Nashville, TN 37235-1634, U.S.A.<BR>
> >>><BR>
> >>> delivery address:<BR>
> >>> 2125 Stevenson Center<BR>
> >>> 1161 21st Ave., S.<BR>
> >>> Nashville, TN 37235<BR>
> >>><BR>
> >>> office: 2128 Stevenson Center<BR>
> >>> phone: (615) 343-4582, fax: (615) 343-6707<BR>
> >>> <A HREF="http://bioimages.vanderbilt.edu">http://bioimages.vanderbilt.edu</A><BR>
> >>><BR>
> >>><BR>
> >><BR>
> >><BR>
> >> _______________________________________________<BR>
> >> tdwg-content mailing list<BR>
> >> tdwg-content@lists.tdwg.org<BR>
> >> <A HREF="http://lists.tdwg.org/mailman/listinfo/tdwg-content">http://lists.tdwg.org/mailman/listinfo/tdwg-content</A><BR>
> >><BR>
> ><BR>
> ><BR>
> ><BR>
> > ----------------------------------------------------------------------------<BR>
> > David Remsen, Senior Programme Officer<BR>
> > Electronic Catalog of Names of Known Organisms<BR>
> > Global Biodiversity Information Facility Secretariat<BR>
> > Universitetsparken 15, DK-2100 Copenhagen, Denmark<BR>
> > Tel: +45-35321472 Fax: +45-35321480<BR>
> > Skype: dremsen<BR>
> > ----------------------------------------------------------------------------<BR>
> ><BR>
> ><BR>
> ><BR>
> ><BR>
<BR>
<BR>
_______________________________________________<BR>
tdwg-content mailing list<BR>
tdwg-content@lists.tdwg.org<BR>
<A HREF="http://lists.tdwg.org/mailman/listinfo/tdwg-content">http://lists.tdwg.org/mailman/listinfo/tdwg-content</A><BR>
<BR>
</FONT>
</P>
</BODY>
</HTML>