[tdwg-guid] First step in implementing LSIDs

Richard Pyle deepreef at bishopmuseum.org
Wed Jun 13 01:28:41 CEST 2007


Thanks, Jerry -- that's pretty much what I figured, and seems to make
perfect sense to me.  In my own case, as I am now in the process of
completely overhauling our internal database structure and table
cross-linking, I have arrived at essentially the same conclusion.  For what
it's worth, I have one sequentially-generated integer series that serves the
purpose of providing primary key values for essentially every table in our
system (only exceptions being a few enumerations with unambiguously finite
value sets).  Like you, I am now thinking of maintaining those integers for
internal (performance) purposes only, and restricting what I expose publicly
to GUIDs (I'm leaning toward following your lead of using UUIDs for this).
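
A minimal sketch of that arrangement, assuming an SQLite table; the table name "record" and the columns "id", "public_guid" and "label" are hypothetical and are not our actual schema. The integer key is used only inside the database, and callers only ever see the UUID.

```python
import sqlite3
import uuid


def make_db():
    """Create an in-memory table with an internal integer key and a public UUID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record ("
        " id INTEGER PRIMARY KEY,"             # internal key, never exposed
        " public_guid TEXT NOT NULL UNIQUE,"   # opaque identifier shown publicly
        " label TEXT NOT NULL)"
    )
    return conn


def add_record(conn, label):
    """Insert a row and return only its public UUID (hypothetical helper)."""
    guid = str(uuid.uuid4()).upper()
    conn.execute(
        "INSERT INTO record (public_guid, label) VALUES (?, ?)", (guid, label)
    )
    return guid


def find_by_guid(conn, guid):
    """Look a row up by its public UUID; the integer key stays internal."""
    row = conn.execute(
        "SELECT label FROM record WHERE public_guid = ?", (guid,)
    ).fetchone()
    return row[0] if row else None
```

Joins between tables would still use the integer "id" for performance; only "public_guid" would ever appear in an LSID or URL.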

Aloha,
Rich

> -----Original Message-----
> From: Jerry Cooper [mailto:cooperj at landcareresearch.co.nz] 
> Sent: Tuesday, June 12, 2007 1:20 PM
> To: Richard Pyle; 'Bob Morris'; Kevin Richards
> Cc: 'Gregor Hagedorn'; tdwg-guid at lists.tdwg.org
> Subject: RE: [tdwg-guid] First step in implementing LSIDs
> 
> Richard,
> 
> In your analysis you had ...
> 
> [GloballyUniquePrefixStuff]+[LocallyUniqueIDentifier]
> 
> It is probably worth saying why our LUID is a semantically 
> opaque GUID. And, yes, we do consciously realise that we are 
> 'hedging our bets' as you put it.
> 
> In our internal databases over the last few years we have 
> gone through various mechanisms for providing primary 
> key/foreign relationships. I recall numerous discussions 
> around the practical utility of counters versus semantically 
> opaque unique keys.   Ultimately we decided that data we want 
> to make publicly available should carry a GUID - even if it 
> isn't used internally as part of a table relationship PK/FK. 
> What you see on the end of our LSIDs are those public GUIDs. 
> We did that prior to TDWG  discussions around the adoption of 
> LSIDs and GUID resolution mechanisms. It allowed us to get on 
> with the job!
> 
> Jerry
> 
> 
> >>> "Richard Pyle" <deepreef at bishopmuseum.org> 13/06/2007 10:46:18 a.m. >>>
> 
> At one point last week I had visions of catching up on this 
> whole thread and commenting on all sorts of things -- but I 
> don't think there is any hope of that in the foreseeable 
> future, so I'll just jump right in on the current discussion.
> 
> It seems to me that a lot of the recent discussion has been 
> confused by an unclear distinction between the needs of 
> GUIDs as identifiers, and the means to resolve those GUIDs to 
> data and metadata.
> 
> As someone earlier pointed out, if all we need are 
> identifiers (without immediate concern for the resolution 
> mechanism), then UUIDs will suffice.
> So, let's start there:
> 
> 0F9D60F1-59C7-495D-A37E-09C23770DD18
> 
> Wonderful identifier, with utterly no information about what 
> it represents, or what it would resolve to.  When I type that 
> text string into a web browser, I get nothing.  So, we can 
> either always rely on the GUID existing in some 
> non-GUID-embedded context where the resolution mechanism is 
> self-evident (maintained by the provider or the consumer of 
> the GUID), or we can embed resolution "meta-information" 
> within the GUID itself. These actually aren't fundamentally 
> different, but for the sake of argument, let's assume that 
> the former is unacceptable for our purposes. I can think of 
> at least four obvious examples of the latter:
> 
> DOI:10.1234/0F9D60F1-59C7-495D-A37E-09C23770DD18
> 
> hdl:1234/0F9D60F1-59C7-495D-A37E-09C23770DD18
> 
> URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
> 
> http://lsid.landcareresearch.co.nz/guid/0F9D60F1-59C7-495D-A37E-09C23770DD18
> 
> (Note that except for the LSID, these are my own creations.)
> 
> The first example (DOI) doesn't actually solve the problem, 
> because merely prepending "DOI:10.1234/" to my UUID 
> does not, by itself, make it resolvable.  For example, if I 
> type Rod's earlier example of 
> "doi:10.1206/0003-0082(2005)485[0001:PNAICA]2.0.CO;2" into my 
> browser, I get squat. That means that either I (as a human), 
> or the software application that I write, needs some "insider 
> information" to resolve the GUID. I don't see how this is any 
> different from option numbers 2 or 3.  Sure, you could argue 
> that there is only one DOI system, or only one Handle system 
> (in the same general sense that there is only one DNS 
> system), so the amount of information required to resolve the 
> GUID is relatively small and universally known.  But it seems 
> to me this is mostly true of LSIDs as well.  In fact, if I 
> simply install the LSID Launchpad plug-in for my browser, I 
> *can* type the LSID in and get metadata back automatically 
> (as long as I chop the "URN:" part -- which seems like an 
> unnecessary step). The main difference among the first three 
> that I see is that in this blink-of-an-eye-moment in history, 
> there happen to be a lot of people using DOIs (and Handles?), 
> whereas LSIDs have not (yet?) caught on as widely.
> 
> So, that leaves the fourth option (HTTP URL): the obvious 
> advantage of which is that resolution is a snap using current 
> browser protocols and technologies (no plug-ins required), 
> and development support is much better (given how widely used 
> HTTP is); and the apparent disadvantages being some social 
> concern of link-rot and/or uncertainty about the permanency 
> of the owner of the domain.
> 
> Most of this has already been addressed in this thread.  What 
> I haven't seen addressed (maybe I missed it?) is the 
> decreasing opacity of the GUID as you go through the sequence 
> above from UUID to URL.  The decreasing opacity, as far as I 
> understand it, is directly tied to the increasing reliance on 
> embedded "meta-information" within the GUID itself to allow 
> the GUID to be self-resolving.
> 
> We all know that the value of a GUID is a function of its 
> permanency, and we all agree that the weakest link in the 
> chain of permanency is the social contract part.  To me, one 
> of the greatest values of maintaining opacity of GUIDs is to 
> help facilitate (encourage) the social contract for permanency.
> Or, put another way, to decrease the opportunity/temptation 
> for breaking the social contract for permanency.
> 
> Using the examples above, if the domain name 
> "landcareresearch.co.nz" is ever abandoned or changed or 
> otherwise broken, the URL will die, but the LSID will 
> continue to function (if I understand the LSID protocol 
> correctly).  In the case of HTTP, the domain name relies on 
> current DNS mapping, whereas the LSID does not.
> 
> So, my overall point here is that we seem to have two topics 
> that are being conflated: GUIDs as identifiers per se, and 
> resolution mechanisms for the GUIDs. The more reliably and 
> simply the GUIDs self-resolve, the less opaque they become, 
> and the more concern people have (justifiably or not) that 
> permanency will be jeopardized.
> 
> This leads me to the second part of this post, which I've 
> already hinted at earlier.
> 
> Most GUID schemes seem to follow the basic pattern of:
> 
> [GloballyUniquePrefixStuff]+[LocallyUniqueIDentifier]
> 
> Generally, the "GUPS" part is somehow attached to the issuer, 
> and the "LUID" part is unique only within the context of the 
> "GUPS". Also, any self-embedded resolution information is 
> contained within the "GUPS".
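
A rough sketch of that split, applied to the example identifiers earlier in this message. The function names and the rule "the LUID is whatever follows the last '/' or ':'" are assumptions made for illustration only; the rule is not part of any DOI, Handle, LSID or URL specification, and it would mis-split, for instance, an LSID that carries a trailing revision field.

```python
import uuid


def split_guid(identifier):
    """Split an identifier into (GUPS, LUID) at the last '/' or ':'.

    Heuristic for the examples in this thread only.
    """
    cut = max(identifier.rfind("/"), identifier.rfind(":"))
    return identifier[: cut + 1], identifier[cut + 1 :]


def luid_is_uuid(identifier):
    """Return True if the LUID part is, by itself, a well-formed UUID."""
    _, luid = split_guid(identifier)
    try:
        uuid.UUID(luid)
    except ValueError:
        return False
    return True
```

With these, luid_is_uuid("hdl:1234/0F9D60F1-59C7-495D-A37E-09C23770DD18") is true, while luid_is_uuid("1234/987654321") is false, because that LUID is unique only within its prefix.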
> 
> In the example at the beginning of this message, I started 
> with a UUID, which by itself is globally unique.  I carried 
> this through to the other examples, such that the "LUID" 
> portion of each example was actually by itself a GUID.  
> Obviously, there is no guarantee of that for the "LUID" part 
> of DOIs, Handles, LSIDs or URLs...but I keep asking myself 
> why we as a community (i.e., TDWG) don't come out with some 
> sort of "best practice" (if not recommendation, or even 
> outright standard) that those of us who have not yet begun to 
> issue GUIDs (but soon will) all make an effort to use 
> something like UUIDs for the "object identifier" ("LUID") 
> portion of the GUIDs we issue.
> 
> My original thought was to register a Handle prefix to modify 
> my LUID, such that "987654321" becomes "1234/987654321", 
> which could then become 
> "URN:LSID:bishopmuseum.org:1234:987654321" (or perhaps 
> "URN:LSID:bishopmuseum.org:guid:1234/987654321", or even 
> "URN:LSID:bishopmuseum.org:Names:1234/987654321"), which 
> could then become 
> "http://guid.bishopmuseum.org/?URN:LSID:bishopmuseum.org:1234:987654321" 
> (or maybe just "http://guid.bishopmuseum.org/?1234:987654321"). 
> 
> But now I'm thinking maybe it's best to abandon the Handle 
> part, and just issue UUIDs as my local identifiers, which can 
> then be embedded in as many other GUID-resolution layers as I 
> wish. This seems to be more or less what Kevin Richards has 
> done for his LSIDs converted to URLs:
> 
> http://lsid.landcareresearch.co.nz/lsid/URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
> 
> In any case, my point is that it seems to make sense to me 
> that we can all hedge our bets on which resolution mechanism 
> emerges victorious by starting off with a resolution-less 
> UUID (or some other GUID serving a pure "identifier" role) at 
> the core of our locally-issued GUIDs, and then representing 
> those core identifiers through different resolution-level 
> GUIDs however we see fit.
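
As a sketch of what "representing a core identifier through different resolution layers" could look like, the function below wraps one UUID as an LSID and as an LSID-over-HTTP proxy URL, following the string forms that appear in this thread. The function name and its parameters are hypothetical, and nothing here contacts a resolver; it only builds strings.

```python
import uuid


def resolution_forms(core, authority, namespace, proxy_base):
    """Wrap one core UUID in the resolution layers discussed in this thread.

    `authority`, `namespace` and `proxy_base` are supplied by the issuer;
    the core UUID is the only part that never changes.
    """
    # Validate the core identifier and normalise it to upper-case hyphenated form.
    core_id = str(uuid.UUID(core)).upper()
    lsid = f"URN:LSID:{authority}:{namespace}:{core_id}"
    return {
        "uuid": core_id,
        "lsid": lsid,
        "http_proxy": proxy_base.rstrip("/") + "/" + lsid,
    }
```

Calling it with authority "landcareresearch.co.nz", namespace "Names" and proxy base "http://lsid.landcareresearch.co.nz/lsid/" reproduces the Landcare LSID and proxy URL quoted above; a different resolution mechanism would just add another entry built around the same core UUID.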
> 
> I would very much appreciate it if someone could explain to 
> me what I am missing.
> 
> Aloha,
> Rich
> 
> 
> > -----Original Message-----
> > From: tdwg-guid-bounces at lists.tdwg.org 
> > [mailto:tdwg-guid-bounces at lists.tdwg.org] On Behalf Of Bob Morris
> > Sent: Tuesday, June 12, 2007 11:02 AM
> > To: Kevin Richards
> > Cc: tdwg-guid at lists.tdwg.org; Gregor Hagedorn
> > Subject: Re: [tdwg-guid] First step in implementing LSIDs
> > 
> > Is the http proxy a GUID?
> > 
> > What vouches for an assertion that a given proxy actually resolves the
> > associated LSID?
> > 
> > 
> > 
> > On 6/12/07, Kevin Richards <RichardsK at landcareresearch.co.nz> wrote:
> > >
> > >
> > > The main use of the LSID http proxy would be when you have an RDF
> > > document or a triple store full of data that has BOTH the LSID and the
> > > http version (as described on the wiki page
> > > http://wiki.tdwg.org/twiki/bin/view/GUID/LsidHttpProxyUsageRecommendation).
> > >
> > > If ALL you had was the LSID
> > > "URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18"
> > > and no other data at all, and you wanted to resolve it using the http
> > > proxy, then yes you are a bit stuck (as far as I know).  Unless we set
> > > up an LSID http proxy repository that can be queried and returns the
> > > http proxy url for an LSID?  Or you could just use the "hard coded"
> > > http proxy resolver at http://lsid.tdwg.org/[lsid].
> > >
> > > Kevin
> > >
> > > >>> "Gregor Hagedorn" <G.Hagedorn at BBA.DE> 12/06/2007 7:17:35 p.m. >>>
> > >
> > > > For those wanting another LSID http proxy example, I have changed
> > > > our LSID resolver here at Landcare to serve up the proxy compliant RDF.
> > > >
> > > > Eg
> > > >
> > > > http://lsid.landcareresearch.co.nz/lsid/URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
> > > >
> > > > returns the metadata for the lsid
> > > >
> > > > URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
> > > >
> > > > It took me about an hour to change the RDF generator and set up a
> > > > redirection web directory on our web server (Microsoft IIS).
> > >
> > > I have an object with
> > >
> > > URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
> > >
> > > I think the problem is that, without me telling my software something
> > > I have gathered from this email, my software has no means to know
> > > about what you describe as easy.
> > >
> > > Unless it has an LSID resolver, in which case it would not need the
> > > http method.
> > >
> > > This is what the proposal to always use alternating http and LSID
> > > guids in any object we communicate about is saying. Whenever you
> > > publish
> > > URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
> > > you also have to provide the http version of it.
> > >
> > > Gregor
> > > ----------------------------------------------------------
> > > Gregor Hagedorn (G.Hagedorn at bba.de)
> > > Institute for Plant Virology, Microbiology, and Biosafety
> > > Federal Research Center for Agriculture and Forestry (BBA)
> > > Königin-Luise-Str. 19           Tel: +49-30-8304-2220
> > > 14195 Berlin, Germany           Fax: +49-30-8304-2203
> > >
> > > _______________________________________________
> > > tdwg-guid mailing list
> > > tdwg-guid at lists.tdwg.org
> > > http://lists.tdwg.org/mailman/listinfo/tdwg-guid
> > > 
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > WARNING: This email and any attachments may be confidential and/or
> > > privileged. They are intended for the addressee only and are not to be
> > > read, used, copied or disseminated by anyone receiving them in error.
> > > If you are not the intended recipient, please notify the sender by
> > > return email and delete this message and any attachments.
> > >
> > > The views expressed in this email are those of the sender and do not
> > > necessarily reflect the official views of Landcare Research.
> > >
> > > Landcare Research
> > > http://www.landcareresearch.co.nz
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++