[tdwg-guid] First step in implementing LSIDs

Richard Pyle deepreef at bishopmuseum.org
Wed Jun 13 00:46:18 CEST 2007


At one point last week I had visions of catching up on this whole thread and
commenting on all sorts of things -- but I don't think there is any hope of
that in the foreseeable future, so I'll just jump right in on the current
discussion.

It seems to me that a lot of the recent discussion has been confused by an
unclear distinction between the needs of GUIDs as identifiers, and the
means to resolve those GUIDs to data and metadata.

As someone earlier pointed out, if all we need are identifiers (without
immediate concern for the resolution mechanism), then UUIDs will suffice.
So, let's start there:

0F9D60F1-59C7-495D-A37E-09C23770DD18

Wonderful identifier, with utterly no information about what it represents,
or what it would resolve to.  When I type that text string into a web
browser, I get nothing.  So, we can either always rely on the GUID existing
in some non-GUID-embedded context where the resolution mechanism is
self-evident (maintained by the provider or the consumer of the GUID), or we
can embed resolution "meta-information" within the GUID itself. These
actually aren't fundamentally different, but for the sake of argument, let's
assume that the former is unacceptable for our purposes. I can think of at
least four obvious examples of the latter:

DOI:10.1234/0F9D60F1-59C7-495D-A37E-09C23770DD18

hdl:1234/0F9D60F1-59C7-495D-A37E-09C23770DD18

URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18

http://lsid.landcareresearch.co.nz/guid/0F9D60F1-59C7-495D-A37E-09C23770DD18

(Note that except for the LSID, these are my own creations.)
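To make the "embedded meta-information" idea concrete, here is a minimal Python sketch that wraps one bare UUID in each of the four forms above. The DOI prefix 10.1234 and the Handle prefix 1234 are the same made-up values used in the examples (not registered prefixes), and the function name `wrap_uuid` is hypothetical.

```python
import uuid

def wrap_uuid(core: uuid.UUID) -> dict[str, str]:
    """Embed one bare UUID in each of the four GUID forms above.

    The prefixes are illustrative values only, not registered
    DOI or Handle prefixes.
    """
    luid = str(core).upper()  # str() renders lowercase; match the examples
    return {
        "doi": f"DOI:10.1234/{luid}",
        "handle": f"hdl:1234/{luid}",
        "lsid": f"URN:LSID:landcareresearch.co.nz:Names:{luid}",
        "url": f"http://lsid.landcareresearch.co.nz/guid/{luid}",
    }

core = uuid.UUID("0F9D60F1-59C7-495D-A37E-09C23770DD18")
for scheme, guid in wrap_uuid(core).items():
    print(f"{scheme:7} {guid}")
```

Only the prefix changes from form to form; everything after it is the same opaque UUID.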

The first example (DOI) doesn't actually solve the problem, because merely
prepending "DOI:10.1234/" to my UUID does not, by itself, make it
resolvable.  For example, if I type Rod's earlier example of
"doi:10.1206/0003-0082(2005)485[0001:PNAICA]2.0.CO;2" into my browser, I get
squat. That means that either I (as a human), or the software application
that I write, needs some "insider information" to resolve the GUID. I don't
see how this is any different from options 2 or 3.  Sure, you could
argue that there is only one DOI system, or only one Handle system (in the
same general sense that there is only one DNS system), so the amount of
information required to resolve the GUID is relatively small and universally
known.  But it seems to me this is mostly true of LSIDs as well.  In fact,
if I simply install the LSID Launchpad plug-in for my browser, I *can* type
the LSID in and get metadata back automatically (as long as I chop the
"URN:" part -- which seems like an unnecessary step). The main difference
among the first three that I see is that in this blink-of-an-eye moment in
history, there happen to be a lot of people using DOIs (and Handles?),
whereas LSIDs have not (yet?) caught on as widely.
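The "insider information" amounts to a lookup table from scheme to an HTTP gateway. A minimal sketch, assuming the public DOI proxy (https://doi.org/), the Handle proxy (https://hdl.handle.net/), and the TDWG LSID proxy pattern mentioned later in this thread (http://lsid.tdwg.org/[lsid]); `PROXIES` and `to_http` are hypothetical names, and nothing here touches the network.

```python
from urllib.parse import quote

# Assumed gateway per scheme: this table IS the "insider information"
# a browser or program must already have before a GUID becomes clickable.
PROXIES = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "urn:lsid": "http://lsid.tdwg.org/",  # pattern quoted in this thread; unverified
}

def to_http(guid: str) -> str:
    """Rewrite a DOI, Handle, or LSID string as an HTTP proxy URL."""
    lowered = guid.lower()
    for scheme, base in PROXIES.items():
        if lowered.startswith(scheme + ":"):
            if scheme == "urn:lsid":
                # Assumption: the proxy takes the whole LSID in the path.
                return base + guid
            # Percent-encode characters such as ( ) [ ] : ; but keep "/".
            return base + quote(guid[len(scheme) + 1:], safe="/")
    raise ValueError(f"no known resolver for {guid!r}")

print(to_http("doi:10.1206/0003-0082(2005)485[0001:PNAICA]2.0.CO;2"))
```

The code itself is trivial; the point is the table. A DOI or Handle is no more "self-resolving" than an LSID, since each needs one entry of prior knowledge before a browser can do anything with it.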

So, that leaves the fourth option (HTTP URL). Its obvious advantages are
that resolution is a snap using current browser protocols and technologies
(no plug-ins required), and that development support is much better (given
how widely used HTTP is). Its apparent disadvantages are social: concern
about link-rot and/or uncertainty about the permanency of the owner of the
domain.

Most of this has already been addressed in this thread.  What I haven't seen
addressed (maybe I missed it?) is the decreasing opacity of the GUID as you
go through the sequence above from UUID to URL.  The decreasing opacity, as
far as I understand it, is directly tied to the increasing reliance on
embedded "meta-information" within the GUID itself to allow the GUID to be
self-resolving.

We all know that the value of a GUID is a function of its permanency, and we
all agree that the weakest link in the chain of permanency is the social
contract part.  To me, one of the greatest values of maintaining opacity of
GUIDs is to help facilitate (encourage) the social contract for permanency.
Or, put another way, to decrease the opportunity/temptation for breaking the
social contract for permanency.

Using the examples above, if the domain name "landcareresearch.co.nz" is
ever abandoned or changed or otherwise broken, the URL will die, but the
LSID will continue to function (if I understand the LSID protocol
correctly).  In the case of HTTP, the domain name relies on current DNS
mapping, whereas the LSID does not.

So, my overall point here is that we seem to have two topics that are being
conflated: GUIDs as identifiers per se, and resolution mechanisms for the
GUIDs. The more reliably and simply the GUIDs self-resolve, the less opaque
they become, and the more concern people have (justifiably or not) that
permanency will be jeopardized.

This leads me to the second part of this post, which I've already hinted at
earlier.

Most GUID schemes seem to follow the basic pattern of:

[GloballyUniquePrefixStuff]+[LocallyUniqueIDentifier]

Generally, the "GUPS" part is somehow attached to the issuer, and the "LUID"
part is unique only within the context of the "GUPS". Also, any
self-embedded resolution information is contained within the "GUPS".
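One rough way to see the GUPS/LUID split, and how much resolution information each form carries, is to peel the prefix off each example. This is a minimal sketch under simplifying assumptions: DOI and Handle prefixes end at the first "/", an LSID's GUPS is "URN:LSID:authority:namespace", and a URL's LUID is its last path segment. Real schemes have fuller syntax rules (LSID revisions, for instance, end up in the LUID here). `split_guid` is a hypothetical name.

```python
def split_guid(guid: str) -> tuple[str, str]:
    """Split a GUID into (GUPS, LUID) using simplified per-scheme rules."""
    lowered = guid.lower()
    if lowered.startswith(("doi:", "hdl:")):
        # DOI/Handle: the naming-authority prefix ends at the FIRST "/".
        gups, _, luid = guid.partition("/")
    elif lowered.startswith("urn:lsid:"):
        # URN:LSID:authority:namespace:object[:revision]
        parts = guid.split(":")
        gups, luid = ":".join(parts[:4]), ":".join(parts[4:])
    elif lowered.startswith(("http://", "https://")):
        # URL: assume the identifier is the last path segment.
        gups, _, luid = guid.rpartition("/")
    else:
        # A bare UUID carries no prefix (and no resolution hints) at all.
        gups, luid = "", guid
    return gups, luid

examples = [
    "0F9D60F1-59C7-495D-A37E-09C23770DD18",
    "DOI:10.1234/0F9D60F1-59C7-495D-A37E-09C23770DD18",
    "hdl:1234/0F9D60F1-59C7-495D-A37E-09C23770DD18",
    "URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18",
    "http://lsid.landcareresearch.co.nz/guid/0F9D60F1-59C7-495D-A37E-09C23770DD18",
]
for g in examples:
    gups, luid = split_guid(g)
    print(f"{gups or '(none)':45} | {luid}")
```

Over these examples the LUID is identical every time; only the GUPS, which holds all the resolution information, grows from nothing (bare UUID) to a full host name and path (URL).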

In the example at the beginning of this message, I started with a UUID,
which by itself is globally unique.  I carried this through to the other
examples, such that the "LUID" portion of each example was actually by
itself a GUID.  Obviously, there is no guarantee of that for the "LUID" part
of DOIs, Handles, LSIDs, or URLs...but I keep asking myself why we as a
community (i.e., TDWG) don't come out with some sort of "best practice" (if
not a recommendation, or even an outright standard) that those of us who
have not yet begun to issue GUIDs (but soon will) all make an effort to use
something like UUIDs for the "object identifier" ("LUID") portion of the
GUIDs we issue.

My original thought was to register a Handle prefix to modify my LUID, such
that "987654321" becomes "1234/987654321", which could then become
"URN:LSID:bishopmuseum.org:1234:987654321" (or perhaps
"URN:LSID:bishopmuseum.org:guid:1234/987654321", or even
"URN:LSID:bishopmuseum.org:Names:1234/987654321"), which could then become
"http://guid.bishopmuseum.org/?URN:LSID:bishopmuseum.org:1234:987654321" (or
maybe just "http://guid.bishopmuseum.org/?1234:987654321"). 
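The original layering, written out as a minimal sketch. The Handle prefix 1234, the local ID 987654321, the host guid.bishopmuseum.org and the query-string form are the illustrative values from the paragraph above, not a registered prefix or a live service; the helper names are hypothetical.

```python
def handle_for(prefix: str, local_id: str) -> str:
    # Layer 1: qualify the local ID with a (hypothetical) Handle prefix.
    return f"{prefix}/{local_id}"

def lsid_for(authority: str, handle: str) -> str:
    # Layer 2: the Handle prefix becomes the LSID namespace, and the
    # local ID becomes the LSID object ID.
    prefix, _, local_id = handle.partition("/")
    return f"URN:LSID:{authority}:{prefix}:{local_id}"

def url_for(lsid: str, host: str) -> str:
    # Layer 3: expose the whole LSID through an HTTP query string.
    return f"http://{host}/?{lsid}"

handle = handle_for("1234", "987654321")
lsid = lsid_for("bishopmuseum.org", handle)
url = url_for(lsid, "guid.bishopmuseum.org")
print(handle)
print(lsid)
print(url)
```

Because 987654321 is only unique within the Handle prefix 1234, that prefix has to travel with it through every layer.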

But now I'm thinking maybe it's best to abandon the Handle part, and just
issue UUIDs as my local identifiers, which can then be embedded in as many
other GUID-resolution layers as I wish. This seems to be more or less what
Kevin Richards has done for his LSIDs converted to URLs:

http://lsid.landcareresearch.co.nz/lsid/URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18

In any case, my point is that it seems to make sense to me that we can all
hedge our bets on which resolution mechanism emerges victorious by
starting off with a resolution-less UUID (or some other GUID serving a pure
"identifier" role) at the core of our locally-issued GUIDs, and then
represent those core identifiers through different resolution-level GUIDs
however we see fit.
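A minimal sketch of the "hedge our bets" idea, assuming Python's standard `uuid` module to mint the core identifier. The bishopmuseum.org LSID and URL patterns are hypothetical, and the extraction step assumes the UUID is the last thing in the string (so an LSID with a revision suffix would not match). The point is that two records citing different resolution forms can still be recognized as naming the same object.

```python
import re
import uuid
from typing import Optional

# A canonical 8-4-4-4-12 hex UUID at the very end of a GUID string.
UUID_AT_END = re.compile(
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$",
    re.IGNORECASE,
)

def issue_core() -> uuid.UUID:
    """Mint a random (version 4) UUID as the pure, resolution-free ID."""
    return uuid.uuid4()

def wrappers(core: uuid.UUID) -> list[str]:
    """Hypothetical resolution layers around the same core identifier."""
    s = str(core).upper()
    return [
        f"URN:LSID:bishopmuseum.org:Names:{s}",
        f"http://guid.bishopmuseum.org/{s}",
    ]

def core_of(guid: str) -> Optional[uuid.UUID]:
    """Recover the embedded UUID, whichever wrapper was used."""
    m = UUID_AT_END.search(guid)
    return uuid.UUID(m.group(1)) if m else None

core = issue_core()
forms = wrappers(core)
print(all(core_of(f) == core for f in forms))  # same object either way
print(core_of("http://lsid.landcareresearch.co.nz/lsid/URN:LSID:"
              "landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18"))
```

Comparing `uuid.UUID` objects rather than raw strings sidesteps case differences between how each layer happens to print the identifier.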

I would very much appreciate it if someone could explain to me what I am
missing.

Aloha,
Rich


> -----Original Message-----
> From: tdwg-guid-bounces at lists.tdwg.org 
> [mailto:tdwg-guid-bounces at lists.tdwg.org] On Behalf Of Bob Morris
> Sent: Tuesday, June 12, 2007 11:02 AM
> To: Kevin Richards
> Cc: tdwg-guid at lists.tdwg.org; Gregor Hagedorn
> Subject: Re: [tdwg-guid] First step in implementing LSIDs
> 
> Is the http proxy a GUID?
> 
> What vouches for an assertion that a given proxy actually 
> resolves the associated LSID?
> 
> 
> 
> On 6/12/07, Kevin Richards <RichardsK at landcareresearch.co.nz> wrote:
> >
> >
> > The main use of the LSID http proxy would be when you have an RDF
> > document or a triple store full of data that has BOTH the LSID and the
> > http version (as described on the wiki page
> > http://wiki.tdwg.org/twiki/bin/view/GUID/LsidHttpProxyUsageRecommendation).
> >
> > If ALL you had was the LSID
> > "URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18"
> > and no other data at all, and you want to resolve it using the http
> > proxy, then yes you are a bit stuck (as far as I know).  Unless we set
> > up an LSID http proxy repository that can be queried and returns the
> > http proxy url for an LSID?  Or you could just use the "hard coded"
> > http proxy resolver at http://lsid.tdwg.org/[lsid].
> >
> > Kevin
> >
> > >>> "Gregor Hagedorn" <G.Hagedorn at BBA.DE> 12/06/2007 7:17:35 p.m. >>>
> >
> > > For those wanting another LSID http proxy example, I have changed
> > > our LSID resolver here at Landcare to serve up the proxy compliant RDF.
> > >
> > > Eg
> > >
> > > http://lsid.landcareresearch.co.nz/lsid/URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
> > >
> > > returns the metadata for the lsid
> > >
> > > URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
> > >
> > > It took me about an hour to change the RDF generator and set up a
> > > redirection web directory on our web server (Microsoft IIS).
> >
> > I have an object with
> >
> > URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
> >
> > I think the problem is that, without me telling my software something
> > I have gathered from this email, my software has no means to know
> > about what you describe as easy.
> >
> > Unless it has an LSID resolver, in which case it would not need the
> > http method.
> >
> > This is what the proposal to always use alternating http and LSID
> > guids in any object we communicate about is saying. Whenever you
> > publish
> > URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
> > you also have to provide the http version of it.
> >
> > Gregor
> > ----------------------------------------------------------
> > Gregor Hagedorn (G.Hagedorn at bba.de)
> > Institute for Plant Virology, Microbiology, and Biosafety
> > Federal Research Center for Agriculture and Forestry (BBA)
> > Königin-Luise-Str. 19           Tel: +49-30-8304-2220
> > 14195 Berlin, Germany           Fax: +49-30-8304-2203
> >
> > _______________________________________________
> > tdwg-guid mailing list
> > tdwg-guid at lists.tdwg.org
> > http://lists.tdwg.org/mailman/listinfo/tdwg-guid