[tdwg-content] Why UUIDs alone are not adequate as GUIDs, was Re: ITIS TSNID to uBio NamebankIDs mapping

Wed Jun 8 19:43:17 CEST 2011

I would like to add an observation that comes from the computer science
tradition:

An object should have one responsibility; several responsibilities
should be achieved by combining objects.

In the case of identifiers, a scientific name is a good illustration. It
works both as an identifier and as a tiny classification. For example,
Pinus sylvestris identifies a species and also tells us the genus of
that species. As a result, the identifier changes when the classification
changes, and you cannot name a species until you have settled on a genus
placement. Combining the two responsibilities has, in my opinion,
dramatically decreased the usefulness of this particular identifier.
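
A minimal Python sketch of the point above, not anyone's actual data
model: the Taxon class, its field names and the two lookup tables are
hypothetical. It uses the Norway spruce, which Linnaeus described as
Pinus abies and which is now placed in Picea, to show that a key which
also carries the classification breaks when the genus changes, while an
opaque key does not.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Taxon:
    """Hypothetical record: the id only identifies, the fields classify."""
    taxon_id: str   # opaque identifier, carries no meaning
    genus: str      # classification, free to change
    epithet: str

    @property
    def name(self) -> str:
        return f"{self.genus} {self.epithet}"


spruce = Taxon(str(uuid.uuid4()), "Pinus", "abies")
stable_id = spruce.taxon_id

# Two lookup tables: one keyed by the name, one keyed by the opaque id.
by_name = {spruce.name: spruce}
by_id = {spruce.taxon_id: spruce}

# Reclassification: the species is moved from Pinus to Picea.
spruce.genus = "Picea"

# The name-based key is now stale: the current name finds nothing.
name_lookup_works = spruce.name in by_name

# The opaque key still finds the record, with its updated classification.
id_lookup_name = by_id[stable_id].name
```

After the genus change, name_lookup_works is False because the table
still holds the old key "Pinus abies", while the lookup by stable_id
returns the record under its current name.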

Now imagine that Linnaeus had also added a resolution responsibility
to the identifier. Would a resolution mechanism he built into the
identifier not look rather inappropriate in this day and age?
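
As a rough illustration of keeping resolution out of the identifier,
here is a small Python sketch. It assumes the identifier is a bare UUID
and that resolution is a separate, replaceable step; the two base URLs
and both helper functions are made up for the example and do not refer
to any real service.

```python
import uuid


def to_http_uri(guid: uuid.UUID, resolver_base: str) -> str:
    """Wrap a bare identifier in an actionable URI for one resolver."""
    return f"{resolver_base}{guid}"


def extract_guid(uri: str) -> uuid.UUID:
    """Recover the bare identifier from a wrapped URI (last path segment)."""
    return uuid.UUID(uri.rstrip("/").rsplit("/", 1)[-1])


# The identifier itself: it identifies, and does nothing else.
guid = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

# Hypothetical resolvers; either can be retired without touching the GUID.
old_uri = to_http_uri(guid, "http://resolver.example.org/guid/")
new_uri = to_http_uri(guid, "https://ids.example.net/resolve/")

# Different actionable URIs, same underlying identifier.
same_identifier = extract_guid(old_uri) == extract_guid(new_uri) == guid
```

The two URIs differ as strings, yet both unwrap to the same GUID, so a
change of resolver does not change what is being identified.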

Dima

On Wed, Jun 8, 2011 at 1:56 AM, Bob Morris <morris.bob at gmail.com> wrote:
> Several points in your dialog with Rich Pyle confuse me. I can't tell
> at places where you are relying on TDWG Applicability Statements,
> where on W3 Recommendations, where on IETF RFCs, and where on Cool
> URIs. Your "easier to read" cited opinion pieces of Tim Berners-Lee
> carries a warning that it is obsolete in places. (I find it hard to
> identify those places, but,
> http://www.w3.org/TR/2008/NOTE-cooluris-20081203/ has a status a
> little less personal than the TBL pieces, and I assume that's what you
> are resting on.)  That CoolURI NOTE explicitly disclaims discussion of
> non-http URIs, so it is hard for me to see how it supports arguments
> about http URIs vs non-http URIs, although the last few sections make
> arguments to bolster its position. Also, as written, the document
> seems to have a vision of applicability to static web documents.
> Hence(?) extrapolating to data services seems to require choosing an
> RDF-based model of data services, and by no means is LOD the only
> possible such model, ab-hominem (sic) arguments notwithstanding.
>
> I've eliminated so much of the dialog with Rich that I may be ignoring
> context that will show me wrong below.
>
> 2011/6/7 Steve Baskauf <steve.baskauf at vanderbilt.edu>:
>> Rich,
>>>[Rich Pyle said:]
>>>Here is where I completely disagree.  I've said it before, and I'll keep
>>>saying it:  GUIDs are (should be) intended and necessary for
>>>computer-computer communication; *NOT* for human-computer or human-human
>>>communication.  Their beauty or ugliness should be determined by what's
>>>beautiful or ugly to a computer, not to a human.  A consistent 128 bits is
>>>"beautiful" to a computer, but a UUID is ugly to a human; whereas "Danaus
>>>plexippus (Linnaeus 1758)" is beautiful to a human, but ugly to a computer
>>>(for reasons Dima already outlined).
>
>>>[cyrillic character and other mumbles omitted]
>
>>>Almost by definition, then, a "beautiful" identifier for computer-computer
>>>communication should be "ugly" to a pair of human eyeballs.
>
>>[Steve replied: ]
>>I disagree with you completely here.  If you haven't read the "Cool URIs"
>>piece, you should before we talk about this more.  It is full of examples
>>that are easy to read and type and are intended to be "understood" by both
>>humans and computers.  The piece at http://www.w3.org/Provider/Style/URI is
>>an even easier read.  GUIDs CAN be easy to "read" and type, although they
>>don't have to be.  The degree to which it "matters" whether a GUID is human
>>readable or not depends primarily on the likelihood that humans will see it
>>in print or type it in the URL box of a web browser.  In the examples of
>>GUIDs for names that you provided, I will agree that it's not very likely
>>that humans will be seeing them.  But if the GUID is of a specimen, an
>>image, or a tree (which could easily appear in print or be written down by
>>somebody to look at its web page), I would argue that readability is
>>desirable, e.g. http://bioimages.vanderbilt.edu/uncg/966 .  I realize that
>>everyone does not agree with me on this, particularly the fans of UUIDs.
>>As far as I know, there isn't any rule about what characters should be in
>>an HTTP URI.
>
> The ASCII control characters are forbidden in URIs used in RDF, but I
> guess that has no impact on your arguments.
>
>>[Steve continued:]
>>But there is a general understanding that it is a best practice
>> that an HTTP URI that is intended as an identifier should do content
>> negotiation and produce both HTML for humans and RDF for machines.
>>
>
> This "general understanding" is about particular models of how to
> solve the dual use problem, and it's quite bound to the http protocol
> and web browsers as clients.  Historically, such problems have
> sometimes been solved at the client side also. For example, most
> (all?) modern browsers can do pretty well with the FTP URI and the
> MAILTO URI.
>
> History should not be ignored, especially the history of using
> protocols du-jour.  The convergence of mobile telephony and
> information management and access arrived rather faster than most
> predicted. In a mobile world, http may well prove a junior player for
> data-centric apps, and http servers may not be whence data  is
> fetched. (This is already the case for Android phones. See
> http://developer.android.com/guide/topics/providers/content-providers.html#urisum
> which describes Android's CONTENT URI scheme ).  Similarly, a  number
> of popular P2P network clients implement the MAGNET URI scheme
> http://en.wikipedia.org/wiki/Magnet_URI_scheme. Indeed, a cynical view
> of http://www.w3.org/Mobile/ would hold that W3C's direction is a plan
> to keep the worldwide web relevant.  Will it succeed for data?
> Possibly only with a redefinition of the web. For databases, it's not
> any harder to make android content: protocol servers than to make
> http: protocol servers. See
> http://developer.android.com/guide/topics/providers/content-providers.html
>
>
>
>> [lots of stuff cut out here that will have to wait for another email]
>>
>>> [Rich said:]
>>> Errr..sort of.  I say we identify things using GUIDs, and provide services
>>> that resolve those GUIDs via actionable HTTP URIs (or, if you prefer,
>>> embedding those GUIDs within a resolution metadata "wrapper").  Yes, I know
>>> it's all the rage to collapse the functions of actionability and globally
>>> unique identification into the same text-string URI (what I've been
>>> referring to as the TB-L perspective).  But to be perfectly blunt, I see
>>> this as a mistake that will, in the long run, slow down our progress.
>>
>>[Steve replied:  ]
>> Why does this slow down our progress?  I don't get that at all.  I see your
>> viewpoint as the one impeding progress because non-HTTP GUIDs make it
>> difficult or impossible to describe things in RDF.
>
> Non-http GUIDs at worst make it difficult to play with data providers
> and clients that only understand the http protocol, which is a
> circular argument.  Certainly, for example, non-http GUIDs do not
> interfere with SPARQL queries, or with RDF reasoners or with RDF data
> integration.  In fact, even LOD has no need of http URIs except for
> the convenience of the existing infrastructure.  Any dereferenceable
> URI scheme would work. And so would multiple ones, provided only the
> clients and servers both understood the schemes.
>
> Finally a social issue about your arguments on the importance of the
> ease of transcribing URIs from paper.  Of course, wholly within
> electronic clients for humans, this is irrelevant because the client
> can render the identifier in any form mutually agreeable to the human
> and the software. With no insult intended (well---maybe a friendly
> little poke... :-) ) , the social issue is this:  mainly it's people
> over 30 who find paper publication anything other than a quaint
> annoyance. Others will be bemused if not astonished that some people
> think that paper is  important for prospective publishing in the
> sciences.
>
> In the spirit of ending on agreement: I agree with everything where
> you and Rich agree...oh, wait, that's because I agree with everything
> Rich said. :-)
>
>
> Bob Morris
>
>
>>
>>
>> Richard L. Pyle, PhD
>> Database Coordinator for Natural Sciences
>> Associate Zoologist in Ichthyology
>> Dive Safety Officer
>> Department of Natural Sciences, Bishop Museum
>> 1525 Bernice St., Honolulu, HI 96817
>> Ph: (808)848-4115, Fax: (808)847-8252
>> email: deepreef at bishopmuseum.org
>> http://hbs.bishopmuseum.org/staff/pylerichard.html
>>
>>
>>
>>
>> _______________________________________________
>> tdwg-content mailing list
>> tdwg-content at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>
>>
>>
>> --
>> Steven J. Baskauf, Ph.D., Senior Lecturer
>> Vanderbilt University Dept. of Biological Sciences
>>
>> postal mail address:
>> VU Station B 351634
>> Nashville, TN  37235-1634,  U.S.A.
>>
>> delivery address:
>> 2125 Stevenson Center
>> 1161 21st Ave., S.
>> Nashville, TN 37235
>>
>> office: 2128 Stevenson Center
>> phone: (615) 343-4582,  fax: (615) 343-6707
>> http://bioimages.vanderbilt.edu
>>
>>
>>
>
>
>
> --
> Robert A. Morris
>
> Emeritus Professor  of Computer Science
> UMASS-Boston
> 100 Morrissey Blvd
> Boston, MA 02125-3390
> IT Staff
> Filtered Push Project
> Department of Organismal and Evolutionary Biology
> Harvard University
>
>
> email: morris.bob at gmail.com
> web: http://efg.cs.umb.edu/
> web: http://etaxonomy.org/mw/FilteredPush
> http://www.cs.umb.edu/~ram
> phone (+1) 857 222 7992 (mobile)
>