[tdwg-content] Why UUIDs alone are not adequate as GUIDs, was Re: ITIS TSNID to uBio NamebankIDs mapping

Bob Morris morris.bob at gmail.com
Wed Jun 8 07:56:52 CEST 2011


Several points in your dialog with Rich Pyle confuse me. In places I
can't tell whether you are relying on TDWG Applicability Statements,
W3C Recommendations, IETF RFCs, or Cool URIs. The Tim Berners-Lee
opinion piece you cite as an "even easier read" carries a warning that
it is obsolete in places. (I find it hard to identify those places,
but http://www.w3.org/TR/2008/NOTE-cooluris-20081203/ has a status a
little less personal than the TBL pieces, and I assume that's what you
are resting on.) That Cool URIs NOTE explicitly disclaims discussion
of non-http URIs, so it is hard for me to see how it supports
arguments about http URIs vs. non-http URIs, although its last few
sections do make arguments to bolster its position. Also, as written,
the document seems to envision applicability to static web documents.
Hence(?) extrapolating it to data services seems to require choosing
an RDF-based model of data services, and LOD is by no means the only
possible such model, ab-hominem (sic) arguments notwithstanding.

I've cut so much of the dialog with Rich that I may be ignoring
context that would show me to be wrong below.

2011/6/7 Steve Baskauf <steve.baskauf at vanderbilt.edu>:
> Rich,
>>[Rich Pyle said:]
>>Here is where I completely disagree.  I've said it before, and I'll keep
>>saying it:  GUIDs are (should be) intended and necessary for
>>computer-computer communication; *NOT* for human-computer or human-human
>>communication.  Their beauty or ugliness should be determined by what's
>>beautiful or ugly to a computer, not to a human.  A consistent 128 bits is
>>"beautiful" to a computer, but a UUID is ugly to a human; whereas " Danaus
>>plexippus (Linnaeus 1758)" is beautiful to a human, but ugly to a computer
>>(for reasons Dima already outlined).

>>[cyrillic character and other mumbles omitted]

>>Almost by definition, then, a "beautiful" identifier for computer-computer
>>communication should be "ugly" to a pair of human eyeballs.

>[Steve replied: ]
> I disagree with you completely here.  If you haven't read the "Cool URIs"
> piece, you should before we talk about this more.  It is full of examples
> that are easy to read and type and are intended to be "understood" by both
> humans and computers.  The piece at http://www.w3.org/Provider/Style/URI is
> an even easier read.  GUIDs CAN be easy to "read" and type, although they
> don't have to be.  The degree to which it "matters" whether a GUID is human
> readable or not depends primarily on the likelihood that humans will see it
> in print or type it in the URL box of a web browser.  In the examples of
> GUIDs for names that you provided, I will agree that it's not very likely
> that humans will be seeing them.  But if the GUID is of a specimen, an
> image, or a tree (which could easily appear in print or be written down by
> somebody to look at its web page), I would argue that readability is
> desirable, e.g. http://bioimages.vanderbilt.edu/uncg/966 .  I realize that
> everyone does not agree with me on this, particularly the fans of UUIDs.

> As far as I know, there isn't any rule about what characters should be in an
> HTTP URI.

The ASCII control characters are forbidden in URIs used in RDF, but I
guess that has no impact on your arguments.
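As a rough illustration of how narrow that restriction is, here is a minimal Python sketch. It assumes the RDF 1.0 Concepts definition of an RDF URI reference, which excludes the control characters #x00-#x1F and #x7F-#x9F; the function name is hypothetical. A readable HTTP URI and a UUID in URN form both pass, which is why the rule does not bear on the readability debate.

```python
# Hypothetical helper. The excluded ranges follow the RDF 1.0 Concepts
# definition of an "RDF URI reference": #x00-#x1F and #x7F-#x9F.
def has_forbidden_control_chars(uri: str) -> bool:
    """Return True if uri contains a control character RDF 1.0 forbids."""
    return any(ord(c) <= 0x1F or 0x7F <= ord(c) <= 0x9F for c in uri)


if __name__ == "__main__":
    for candidate in [
        "http://bioimages.vanderbilt.edu/uncg/966",       # readable HTTP URI
        "urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",  # UUID as a URN
        "http://example.org/bad\tid",                     # contains a TAB (#x09)
    ]:
        print(repr(candidate), has_forbidden_control_chars(candidate))
```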

>[Steve continued:]
>But there is a general understanding that it is a best practice
> that an HTTP URI that is intended as an identifier should do content
> negotiation and produce both HTML for humans and RDF for machines.
>

This "general understanding" is about particular models of how to
solve the dual-use problem, and it's quite bound to the http protocol
and to web browsers as clients.  Historically, such problems have
sometimes been solved on the client side instead. For example, most
(all?) modern browsers handle FTP URIs and MAILTO URIs pretty well.
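For concreteness, the server-side content negotiation Steve describes can be sketched as below: one HTTP URI, with the server choosing HTML or RDF from the client's Accept header. This is a simplified, hypothetical illustration; it ignores q-values, treats the first recognised media type as the client's preference, and assumes RDF/XML as the machine-readable format. Real deployments often answer with a 303 redirect to a separate document rather than serving both representations at one URI.

```python
# Hypothetical sketch of server-side content negotiation for one identifier
# URI. Simplification: q-values are ignored, and the first media type in the
# Accept header that we can serve wins.
SUPPORTED = {
    "text/html": "html",           # human-readable page
    "application/rdf+xml": "rdf",  # machine-readable RDF
}
DEFAULT = "text/html"


def negotiate(accept_header: str) -> str:
    """Pick the media type to return for a given Accept header."""
    for part in accept_header.split(","):
        # Drop parameters such as ";q=0.9" and normalise case.
        media_type = part.split(";")[0].strip().lower()
        if media_type in SUPPORTED:
            return media_type
    return DEFAULT  # nothing recognised (e.g. "*/*"): fall back to HTML


if __name__ == "__main__":
    print(negotiate("application/rdf+xml"))              # an RDF client
    print(negotiate("text/html,application/xhtml+xml"))  # a typical browser
    print(negotiate("*/*"))
```

All of this logic lives in the http server, which is the coupling to the http protocol and browser clients discussed above.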

History should not be ignored, especially the history of using
protocols du jour.  The convergence of mobile telephony with
information management and access arrived rather faster than most
predicted. In a mobile world, http may well prove a junior player for
data-centric apps, and http servers may not be where data is fetched
from. (This is already the case for Android phones; see
http://developer.android.com/guide/topics/providers/content-providers.html#urisum ,
which describes Android's CONTENT URI scheme.)  Similarly, a number
of popular P2P network clients implement the MAGNET URI scheme
(http://en.wikipedia.org/wiki/Magnet_URI_scheme). Indeed, a cynical view
of http://www.w3.org/Mobile/ would hold that W3C's direction is a plan
to keep the World Wide Web relevant.  Will it succeed for data?
Possibly only with a redefinition of the web. For databases, it's no
harder to build Android content: protocol servers than http:
protocol servers. See
http://developer.android.com/guide/topics/providers/content-providers.html


> [lots of stuff cut out here that will have to wait for another email]
>
>> [Rich said:]
>> Errr..sort of.  I say we identify things using GUIDs, and provide services
>> that resolve those GUIDs via actionable HTTP URIs (or, if you prefer,
>> embedding those GUIDs within a resolution metadata "wrapper").  Yes, I know
>> it's all the rage to collapse the functions of actionability and globally
>> unique identification into the same text-string URI (what I've been
>> referring to as the TB-L perspective).  But to be perfectly blunt, I see
>> this as a mistake that will, in the long run, slow down our progress.
>
>[Steve replied:  ]
> Why does this slow down our progress?  I don't get that at all.  I see your
> viewpoint as the one impeding progress because non-HTTP GUIDs make it
> difficult or impossible to describe things in RDF.

Non-http GUIDs at worst make it difficult to play with data providers
and clients that only understand the http protocol, which is a
circular argument.  Certainly, for example, non-http GUIDs do not
interfere with SPARQL queries, with RDF reasoners, or with RDF data
integration.  In fact, even LOD has no need of http URIs except for
the convenience of the existing infrastructure.  Any dereferenceable
URI scheme would work, and so would several at once, provided only
that clients and servers both understand the schemes.
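A minimal sketch of the point that several dereferenceable schemes can coexist, provided client and server agree on them: the client keeps a table of handlers keyed by URI scheme. Everything here is hypothetical: the handler names, the resolver base URL http://resolver.example.org/, and the choice to resolve a urn:uuid identifier by wrapping it in an HTTP resolver URL (roughly the GUID-plus-resolution-service arrangement Rich describes). The handlers only report how they would dereference an identifier; they do not touch the network.

```python
import uuid
from typing import Callable, Dict
from urllib.parse import quote, urlsplit

# Hypothetical resolver service for identifiers that are not themselves actionable.
RESOLVER_BASE = "http://resolver.example.org/resolve?id="


def via_http(uri: str) -> str:
    return f"GET {uri}"  # already actionable: fetch it directly


def via_resolver(uri: str) -> str:
    # Wrap a non-actionable GUID (e.g. urn:uuid:...) in a resolver URL.
    return f"GET {RESOLVER_BASE}{quote(uri, safe='')}"


def via_content_provider(uri: str) -> str:
    return f"query Android content provider {urlsplit(uri).netloc}"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "http": via_http,
    "https": via_http,
    "urn": via_resolver,
    "content": via_content_provider,
}


def dereference(uri: str) -> str:
    """Dispatch on the URI scheme; fail loudly for schemes we don't understand."""
    scheme = urlsplit(uri).scheme
    handler = HANDLERS.get(scheme)
    if handler is None:
        raise ValueError(f"no handler registered for scheme {scheme!r}")
    return handler(uri)


if __name__ == "__main__":
    guid = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6").urn  # urn:uuid:... form
    for identifier in [
        "http://bioimages.vanderbilt.edu/uncg/966",
        guid,
        "content://org.example.specimens/specimens/966",
    ]:
        print(dereference(identifier))
```

Adding a scheme is one more entry in the table; the identifiers themselves need not change.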

Finally, a social issue with your arguments about the importance of
being able to transcribe URIs easily from paper.  Of course, wholly
within electronic clients for humans, this is irrelevant, because the
client can render the identifier in any form mutually agreeable to the
human and the software. With no insult intended (well, maybe a
friendly little poke... :-) ), the social issue is this: mainly it's
people over 30 who find paper publication anything other than a quaint
annoyance. Others will be bemused, if not astonished, that some people
think paper is important for prospective publishing in the sciences.

In the spirit of ending on agreement: I agree with everything where
you and Rich agree...oh, wait, that's because I agree with everything
Rich said. :-)


Bob Morris


>
>
> Richard L. Pyle, PhD
> Database Coordinator for Natural Sciences
> Associate Zoologist in Ichthyology
> Dive Safety Officer
> Department of Natural Sciences, Bishop Museum
> 1525 Bernice St., Honolulu, HI 96817
> Ph: (808)848-4115, Fax: (808)847-8252
> email: deepreef at bishopmuseum.org
> http://hbs.bishopmuseum.org/staff/pylerichard.html
>
>
>
>
> _______________________________________________
> tdwg-content mailing list
> tdwg-content at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>
>
>
> --
> Steven J. Baskauf, Ph.D., Senior Lecturer
> Vanderbilt University Dept. of Biological Sciences
>
> postal mail address:
> VU Station B 351634
> Nashville, TN  37235-1634,  U.S.A.
>
> delivery address:
> 2125 Stevenson Center
> 1161 21st Ave., S.
> Nashville, TN 37235
>
> office: 2128 Stevenson Center
> phone: (615) 343-4582,  fax: (615) 343-6707
> http://bioimages.vanderbilt.edu
>



-- 
Robert A. Morris

Emeritus Professor  of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
IT Staff
Filtered Push Project
Department of Organismal and Evolutionary Biology
Harvard University


email: morris.bob at gmail.com
web: http://efg.cs.umb.edu/
web: http://etaxonomy.org/mw/FilteredPush
http://www.cs.umb.edu/~ram
phone (+1) 857 222 7992 (mobile)

