[tdwg-content] ITIS TSNID to uBio NamebankIDs mapping

Mon Jun 6 03:26:24 CEST 2011

I just added URLs with name strings as alternative 'ids' in GNI. So
it is possible to have something like

http://gni.globalnames.org/name_strings/Quercus_alba

and even

http://gni.globalnames.org/name_strings/Quercus alba L. 'Elongata'

Also links like

http://gni.globalnames.org/name_strings/10507390

will be converted when accessed by a human via browser to

http://gni.globalnames.org/name_strings/Quercus_alba_L._'Elongata'

There are still some problems; for example, names ending with a period do
not work yet.
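
A minimal sketch of how a client might build such a URL from a name
string. The space-to-underscore substitution is inferred from the
examples above; the helper name `name_string_url` and the choice of
which characters to leave unescaped are assumptions for illustration,
not documented GNI behaviour.

```python
from urllib.parse import quote

GNI_BASE = "http://gni.globalnames.org/name_strings/"


def name_string_url(name: str) -> str:
    """Build a name-string URL (hypothetical helper, not part of GNI)."""
    # Collapse runs of whitespace and join the parts with underscores,
    # mirroring the example URLs above.
    slug = "_".join(name.split())
    # Keep periods, apostrophes and parentheses readable (an assumption);
    # percent-encode anything else that is unsafe in a URL path.
    return GNI_BASE + quote(slug, safe="._'()")


print(name_string_url("Quercus alba L. 'Elongata'"))
```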

Dima

On Sun, Jun 5, 2011 at 2:56 PM, Richard Pyle <deepreef at bishopmuseum.org> wrote:
> Hi Steve,
>
> Excellent post!
>
> I like your list of what we want "GUIDs" (see below) to do, and I think it's
> an excellent starting point for a bar we should all strive for.  I'm
> particularly grateful to learn that the existing ZooBank service fails so
> many of them.  I've forwarded your post to Rob Whitton, who will be working
> on Gen-2 of ZooBank in the coming weeks, and asked him if we can use your 8
> tests as a metric to adhere to.  Watch this space.
>
> Meanwhile...
>
>> "But really, from the perspective of the end-user, does it matter
>> if it's an identifier or a service?  Ultimately, they ask the questions,
>> and the answers appear on their computer screens."
>>
>> I would answer this question by saying "yes, it does matter!" -
>> it is important that a well-designed GUID do more than just throw
>> something up onto a human user's web browser.
>
> I absolutely agree with you, but that's not the distinction I was making in
> my quoted text.  I was only talking about whether we call something an
> "identifier" (not GUID, which has more specific implications), or a
> "service", in the context of human-machine conversations.  I think your
> enumeration of things we want GUIDs to do is a very good framework for
> discussion.  I would only caution that "GUID" means different things to
> different people (some people use it synonymously with UUID, for example),
> and also that GUID does not imply "actionable".  There has been a bit of a
> debate over the importance of embedding "actionability" into identifiers
> inherently (the Tim Berners-Lee perspective), vs thinking about
> "identification" separately from how we perform some action on it.  For
> example, UUIDs and Social Security numbers are extremely useful identifiers,
> even though they are not inherently actionable.  It's amazingly easy to
> perform action on a non-actionable identifier by simply appending it to an
> actionable prefix.  For example, going back to the list of "identifiers":
>
> A. A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
> B. urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
> C. http://zoobank.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
> D. http://zoobank.org/A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
> E. http://lsid.tdwg.org/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
> F. http://www.google.com/search?q=Danaus+plexippus+(Linnaeus+1758)
> G. http://lsid.tdwg.org/summary/urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
> H. http://darwin.zoology.gla.ac.uk/~rpage/lsid/tester/?q=urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523&submit=Go
>
> There are two different ways of looking at this:
>
> 1) There are 8 different identifiers
> 2) There is one identifier (A), and 6 ways to perform action on it (B-E,
> G-H).
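
A small sketch of view (2): one bare identifier, and several actionable
forms made by putting a prefix in front of it. The prefixes are copied
from B, D, E and G in the list above; the dictionary labels and the
function name are made up for illustration.

```python
BARE_ID = "A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"

# Prefixes taken from items B, D, E and G above; the labels are hypothetical.
PREFIXES = {
    "lsid": "urn:lsid:zoobank.org:act:",
    "zoobank_http": "http://zoobank.org/",
    "tdwg_proxy": "http://lsid.tdwg.org/urn:lsid:zoobank.org:act:",
    "tdwg_summary": "http://lsid.tdwg.org/summary/urn:lsid:zoobank.org:act:",
}


def actionable_forms(bare_id: str) -> dict:
    """Return every prefix + identifier combination, keyed by label."""
    return {label: prefix + bare_id for label, prefix in PREFIXES.items()}


for label, form in actionable_forms(BARE_ID).items():
    print(f"{label}: {form}")
```

Whether each resulting string actually resolves depends on the service
behind the prefix; the sketch only shows the string construction.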
>
> If you treat them all as distinct identifiers, then let me add a few more to
> the list:
>
> I. http://zoobank.org/?lsid=urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
> J. http://zoobank.org/?id=urn:lsid:zoobank.org:act:A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
> K. http://zoobank.org/?id=A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
> L. http://zoobank.org/?uuid=A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523
>
> Note that all four of the above, plus B-D in the original list, are
> resolved through zoobank.org.  Why are there so many different ways to
> perform action on the "same" identifier? Because I wanted the ZooBank
> resolution service to be flexible. And, because in my mind, there is only
> one identifier (A); and lots of different ways to retrieve the metadata of
> the object it represents.
>
> Now consider this from the TB-L perspective. Eleven different identifiers
> for the same object (excluding F).  Does that mean we need to generate
> owl:sameAs statements for all pair-wise relationships?  That's a lot of
> owl:sameAs statements! Maybe I'm the bad guy for foolishly allowing so many
> different ways to resolve ZooBank identifiers, and for needlessly fabricating
> so many "different" identifiers for the same thing.  Fair enough.
> But I still think we're a lot better off by disentangling identifiers from
> the services we use to perform action on them.
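
To put a number on "a lot": a quick count of the pair-wise statements
for the eleven forms A-L (excluding F), as a sketch. The letters are
only the labels from the lists above.

```python
from itertools import combinations

# Labels A-L from the lists above, leaving out F (the Google search URL).
labels = [c for c in "ABCDEFGHIJKL" if c != "F"]

# One unordered pair per potential owl:sameAs statement: n * (n - 1) / 2.
pairs = list(combinations(labels, 2))

print(len(labels), "identifiers,", len(pairs), "pair-wise statements")
```

Since owl:sameAs is symmetric and transitive, a consumer that applies
those rules could get by with ten statements linking each form to one
hub (say, A), but that assumes every harvesting application does the
inference.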
>
> One of the arguments on the TB-L side is that a non-actionable identifier by
> itself is useless if you cannot inherently perform action on it.  For
> example, if you were walking through the park and stumbled upon a slip of
> paper with "A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523" written on it, you
> probably wouldn't be able to do much with it.  But in reality, that's not
> what happens.  We never expose identifiers as simple, context-free
> identifiers in their non-resolvable form.  These identifiers are *always*
> exposed in some context.  The problem is that if you treat the "resolution
> metadata" (as I call it -- e.g., "urn:lsid:zoobank.org:act:" or
> "http://zoobank.org/") as *part* of the identifier (as you have to do if you
> make things like "urn:lsid:ubio.org:namebank:11815"), then it becomes
> difficult for an application to distinguish between
> "http://zoobank.org/?id=A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523", and
> "http://zoobank.org/?uuid=A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523"; which, to a
> human, obviously refer to the same thing.  In other words, absent all those
> owl:sameAs statements, an application could break if it harvests content
> from different sources that use different resolution metadata for the "same"
> (sensu Pyle) identifier.
>
> Maybe what we need to think about is a registry of "persistent resolution
> services", which our community relies on.  That way, we can apply the
> owl:sameAs statements to the resolution services, rather than to every
> single individual identifier.
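
A rough sketch of what such a registry could look like from the
application side: a list of known resolution prefixes that get stripped
before identifiers are compared. The registry contents are just the
zoobank.org forms from the lists above, and the names
`RESOLUTION_PREFIXES` and `bare_identifier` are hypothetical; no such
registry exists yet.

```python
import re
from typing import Optional

# Hypothetical registry: every prefix here is treated as "resolution
# metadata" wrapped around the same kind of bare identifier.
RESOLUTION_PREFIXES = [
    "http://zoobank.org/?lsid=urn:lsid:zoobank.org:act:",
    "http://zoobank.org/?id=urn:lsid:zoobank.org:act:",
    "http://zoobank.org/urn:lsid:zoobank.org:act:",
    "http://zoobank.org/?uuid=",
    "http://zoobank.org/?id=",
    "http://zoobank.org/",
    "urn:lsid:zoobank.org:act:",
]

UUID_RE = re.compile(r"[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}")


def bare_identifier(value: str) -> Optional[str]:
    """Strip a registered prefix; return the UUID, or None if none is left."""
    # Try longer prefixes first so "http://zoobank.org/" does not win
    # over the more specific "?id=" / "?uuid=" forms.
    for prefix in sorted(RESOLUTION_PREFIXES, key=len, reverse=True):
        if value.startswith(prefix):
            value = value[len(prefix):]
            break
    return value.upper() if UUID_RE.fullmatch(value) else None


a = bare_identifier("http://zoobank.org/?id=A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523")
b = bare_identifier("http://zoobank.org/?uuid=A9F435E0-8ED7-46DD-BAB4-EA8E5BF41523")
print(a == b)
```

With this approach the equivalence is declared once per resolution
service rather than once per identifier pair.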
>
>> An important question that I think has been underlying much of this
>> discussion is whether GUIDs are actually needed for names.
>
> I think the answer is clearly  "yes". The problem is defining what is meant
> by the word "name".
>
>> If one takes the position
>> that a "name" can never be more than a string without
>> crossing the line into being something more complicated
>> like a TNU or TaxonConcept, then I think one could make
>> the case that the answer to this is "no".
>
> Perhaps, but I don't know of anyone who takes that position.
> GNI/uBio/NameBank exist for a very specific purpose, and in that very narrow
> context, the "name" is equivalent to the UTF-8-encoded string of
> characters.  The architects of these systems would be the first to say that
> this is a very limited context for what a "name" is, and *none* of them
> would assert that a "name" can never be more than this.  Everyone I know
> understands that all other flavors of "name" imply something much, much more
> than the string of text characters.
>
>> There isn't a whole lot that one would want to know about the
>> string that couldn't just be imparted by letting it be a string literal.
>> If one takes this position, then "Quercus alba L." is a different "thing"
>> (i.e. resource) from "Quercus alba" or "Quercus alba Linnaeus".
>> It seems that something like this is the position that Rich and the
>> GNI are taking.  Under this scenario, there is little point in creating
>> URI GUIDs for the name strings.
>
> I only took that position in the *very narrow* context of GNI, which is
> unusual among the millions of taxonomic datasets in treating a "name" as a
> distinct text string.  And I backed off from that position after reading
> Dima's post.
>
>> On the other hand, if one takes the position that a name can be a
>> conceptual entity that has properties which include its name string(s)
>
> ...as, I think, everyone does...
>
>> and parts thereof, then it does make sense to apply GUIDs to that
>> kind of entity.  I am thinking about a tn:TaxonName as defined in the
>> TDWG ontology (see
>> http://code.google.com/p/tdwg-ontology/source/browse/trunk/ontology/voc/TaxonName.rdf),
>> which comes out of the TCS schema (see
>> http://code.google.com/p/darwin-sw/wiki/ClassTaxon for info and links
>> regarding TCS).
>> A tn:TaxonName is "An object that represents a single scientific
>> biological name..." i.e. an "object"
>> NOT defined as a string.
>
> While it's nice to see the explicit representation of a "name" as an object,
> rather than a string; unfortunately that doesn't address the elephant in the
> room; that is, that different people have different notions of what "a
> single scientific biological name" is.  I'm not talking subtly different
> shades of fundamentally the same thing; I'm talking about fundamentally
> different things with different implied sets of properties. This is one of
> the issues I continued to hammer on during the development of TCS, and the
> one that gave me the biggest qualms about TCS 1.0.  My hope was that it
> would be resolved in TCS 2.0. I wanted to reduce both names and concepts to
> the same core entity: usage instances.  That's exactly what we're doing with
> GNUB.
>
>> But if the GNI is only a "dirty bucket" that accumulates every name string
>> that anybody has ever used in history but with little or no metadata, then
>> I can't see that I have any use for a URI pointing to it, at least as
>> something to which I would refer in RDF.
>
> I think it's helpful to see GNI and GNUB as a yin-and-yang sort of thing.
> There *needs* to be a service at the dirty end of the spectrum, because for
> the vast majority of existing biodiversity data (digitized or not), the only
> link we have to a taxon concept is a text-string name. There needs to be a
> service that manages names-as-text strings.  GNUB, at the other end of the
> spectrum, has the rich full-context metadata that I think you are interested
> in, allowing for unambiguous reconciliation of different text strings as
> applied to type specimens, or enumerating all spelling variants of the
> "same" name, etc., etc.  What's missing (but DEFINITELY planned and already
> sketched out), are the services that connect GNUB and GNI together.  As soon
> as we hear definitively from NSF (should be soon now), we'll have the
> resources to start building those services.
>
>> I'm not saying that there isn't a use for the GNI.  I think what I'm
>> saying is that there doesn't seem to be any point in worrying about how
>> to create URIs for the GNI when those URIs don't "do" anything different
>> from what a string literal does.
>> I think this is essentially what Rich was saying: "that text string
>> represents a perfectly suitable unique identifier.  There is no need to
>> generate a surrogate identifier like an integer number or UUID or LSID
>> or whatever".
>
> Yes, I think that's exactly what I was saying.  Dima's post has forced me to
> reconsider this somewhat, but even still, more broadly, I never saw GNI as a
> service in need of "GUIDs" (in the sense that you outlined at the beginning
> of your post). Certainly there is value in having internal data structures
> to perform certain functions, but as far as I can tell, the interface
> between GNI and the outside world should probably be limited to
> human-readable name-strings.
>
>> Although Rich has been very cautionary about maintaining the distinction
>> between ITIS TSNs, which he believes to represent some kind of minimal TNU
>
> I would defer to Dave N.'s post concerning what a TSN is, and represents.
>
>> and uBio IDs which he believes to represent a name string,
>> I haven't been able to find any evidence that it would be "naughty"
>> to assert that either one is a tn:TaxonName.
>
> That's only true to the extent that tn:TaxonName may be too broadly
> (imprecisely) defined (just like dwc:Taxon).
>
> Aloha,
> Rich