Thanks, Jerry -- that's pretty much what I figured, and seems to make perfect sense to me. In my own case, as I am now in the process of completely overhauling our internal database structure and table cross-linking, I have arrived at essentially the same conclusion. For what it's worth, I have one sequentially-generated integer series that serves the purpose of providing primary key values for essentially every table in our system (only exceptions being a few enumerations with unambiguously finite value sets). Like you, I am now thinking of maintaining those integers for internal (performance) purposes only, and restricting what I expose publicly to GUIDs (I'm leaning toward following your lead of using UUIDs for this). Aloha, Rich
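A minimal sketch of the arrangement described above, assuming a single SQLite table: a sequential integer serves as the internal primary key, and an opaque UUID is the only value exposed publicly. The table name `taxon_name`, the column names, and the helper functions are hypothetical, chosen for illustration only.

```python
import sqlite3
import uuid

# In-memory database, used here only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE taxon_name (
        id          INTEGER PRIMARY KEY,   -- internal sequential key, used for joins
        public_guid TEXT NOT NULL UNIQUE,  -- opaque UUID, the only value exposed publicly
        name        TEXT NOT NULL
    )
    """
)


def add_name(name: str) -> str:
    """Insert a row and return its public UUID (never the integer key)."""
    guid = str(uuid.uuid4()).upper()
    conn.execute(
        "INSERT INTO taxon_name (public_guid, name) VALUES (?, ?)", (guid, name)
    )
    return guid


def lookup(guid: str):
    """Map a public UUID back to the internal row, or None if unknown."""
    return conn.execute(
        "SELECT id, name FROM taxon_name WHERE public_guid = ?", (guid,)
    ).fetchone()
```

Internal joins would keep using `id`, while anything published outside the database would carry `public_guid`.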
-----Original Message----- From: Jerry Cooper [mailto:cooperj@landcareresearch.co.nz] Sent: Tuesday, June 12, 2007 1:20 PM To: Richard Pyle; 'Bob Morris'; Kevin Richards Cc: 'Gregor Hagedorn'; tdwg-guid@lists.tdwg.org Subject: RE: [tdwg-guid] First step in implementing LSIDs
Richard,
In your analysis you had ...
[GloballyUniquePrefixStuff]+[LocallyUniqueIDentifier]
It is probably worth saying why our LUID is a semantically opaque GUID. And, yes we do consciously realise that we are 'hedging our bets' as you put it.
In our internal databases over the last few years we have gone through various mechanisms for providing primary key/foreign relationships. I recall numerous discussions around the practical utility of counters versus semantically opaque unique keys. Ultimately we decided that data we want to make publicly available should carry a GUID - even if it isn't used internally as part of a table relationship PK/FK. What you see on the end of our LSIDs are those public GUIDs. We did that prior to TDWG discussions around the adoption of LSIDs and GUID resolution mechanisms. It allowed us to get on with the job!
Jerry
"Richard Pyle" <deepreef@bishopmuseum.org> 13/06/2007 10:46:18 a.m.
>
At one point last week I had visions of catching up on this whole thread and commenting on all sorts of things -- but I don't think there is any hope of that in the foreseeable future, so I'll just jump right in on the current discussion.
It seems to me that a lot of the recent discussion has been confused by an unclear distinction between the needs of GUIDs as identifiers, and the means to resolve those GUIDs to data and metadata.
As someone earlier pointed out, if all we need are identifiers (without immediate concern for the resolution mechanism), then UUIDs will suffice. So, let's start there:
0F9D60F1-59C7-495D-A37E-09C23770DD18
Wonderful identifier, with utterly no information about what it represents, or what it would resolve to. When I type that text string into a web browser, I get nothing. So, we can either always rely on the GUID existing in some non-GUID-embedded context where the resolution mechanism is self-evident (maintained by the provider or the consumer of the GUID), or we can embed resolution "meta-information" within the GUID itself. These actually aren't fundamentally different, but for the sake of argument, let's assume that the former is unacceptable for our purposes. I can think of at least four obvious examples of the latter:
DOI:10.1234/0F9D60F1-59C7-495D-A37E-09C23770DD18
hdl:1234/0F9D60F1-59C7-495D-A37E-09C23770DD18
URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
http://lsid.landcareresearch.co.nz/guid/0F9D60F1-59C7-495D-A37E-09C23770DD18
(Note that except for the LSID, these are my own creations.)
The first example (DOI) doesn't actually solve the problem, because merely prepending "DOI:10.1234/" to my UUID does not, by itself, make it resolvable. For example, if I type Rod's earlier example of "doi:10.1206/0003-0082(2005)485[0001:PNAICA]2.0.CO;2" into my browser, I get squat. That means that either I (as a human), or the software application that I write, needs some "insider information" to resolve the GUID. I don't see how this is any different from options 2 or 3. Sure, you could argue that there is only one DOI system, or only one Handle system (in the same general sense that there is only one DNS system), so the amount of information required to resolve the GUID is relatively small and universally known. But it seems to me this is mostly true of LSIDs as well. In fact, if I simply install the LSID Launchpad plug-in for my browser, I *can* type the LSID in and get metadata back automatically (as long as I chop the "URN:" part -- which seems like an unnecessary step). The main difference among the first three that I see is that in this blink-of-an-eye-moment in history, there happen to be a lot of people using DOIs (and Handles?), whereas LSIDs have not (yet?) caught on as widely.
So, that leaves the fourth option (HTTP URL): the obvious advantage of which is that resolution is a snap using current browser protocols and technologies (no plug-ins required), and development support is much better (given how widely used HTTP is); and the apparent disadvantages being some social concern of link-rot and/or uncertainty about the permanency of the owner of the domain.
Most of this has already been addressed in this thread. What I haven't seen addressed (maybe I missed it?) is the decreasing opacity of the GUID as you go through the sequence above from UUID to URL. The decreasing opacity, as far as I understand it, is directly tied to the increasing reliance on embedded "meta-information" within the GUID itself to allow the GUID to be self-resolving.
We all know that the value of a GUID is a function of its permanency, and we all agree that the weakest link in the chain of permanency is the social contract part. To me, one of the greatest values of maintaining opacity of GUIDs is to help facilitate (encourage) the social contract for permanency. Or, put another way, to decrease the opportunity/temptation for breaking the social contract for permanency.
Using the examples above, if the domain name "landcareresearch.co.nz" is ever abandoned or changed or otherwise broken, the URL will die, but the LSID will continue to function (if I understand the LSID protocol correctly). In the case of HTTP, the domain name relies on current DNS mapping, whereas the LSID does not.
So, my overall point here is that we seem to have two topics that are being conflated: GUIDs as identifiers per se, and resolution mechanisms for the GUIDs. The more reliable and simple the GUIDs are in terms of being self-resolving, the less opaque they become, and the more concern people have (justifiably or not) that permanency will be jeopardized.
This leads me to the second part of this post, which I've already hinted at earlier.
Most GUID schemes seem to follow the basic pattern of:
[GloballyUniquePrefixStuff]+[LocallyUniqueIDentifier]
Generally, the "GUPS" part is somehow attached to the issuer, and the "LUID" part is unique only within the context of the "GUPS". Also, any self-embedded resolution information is contained within the "GUPS".
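As a rough illustration of the pattern, the four example forms given earlier in this message can be split into their "GUPS" and "LUID" parts with a simple heuristic: cut after the last "/" or ":", whichever comes later. The function name `split_guid` is hypothetical, and the heuristic is an assumption that happens to fit these examples; it is not a general parser for DOIs, Handles, LSIDs or URLs (an LSID with a trailing revision component, for instance, would defeat it).

```python
from typing import Tuple


def split_guid(guid: str) -> Tuple[str, str]:
    """Split an identifier into (GUPS, LUID) at the last '/' or ':'.

    Heuristic only: assumes the locally unique part is the final
    segment and contains neither '/' nor ':'.
    """
    cut = max(guid.rfind("/"), guid.rfind(":")) + 1
    return guid[:cut], guid[cut:]


examples = [
    "DOI:10.1234/0F9D60F1-59C7-495D-A37E-09C23770DD18",
    "hdl:1234/0F9D60F1-59C7-495D-A37E-09C23770DD18",
    "URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18",
    "http://lsid.landcareresearch.co.nz/guid/0F9D60F1-59C7-495D-A37E-09C23770DD18",
]

for example in examples:
    gups, luid = split_guid(example)
    print(f"{gups!r} + {luid!r}")
```

In each case the "LUID" part comes out as the same bare UUID, and everything that hints at a resolution mechanism stays in the "GUPS" part.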
In the example at the beginning of this message, I started with a UUID, which by itself is globally unique. I carried this through to the other examples, such that the "LUID" portion of each example was actually by itself a GUID. Obviously, there is no guarantee of that for the "LUID" part of DOIs, Handles, LSIDs or URLs...but I keep asking myself why we as a community (i.e., TDWG) don't come out with some sort of "best practice" (if not recommendation, or even outright standard) that those of us who have not yet begun to issue GUIDs (but soon will) all make an effort to use something like UUIDs for the "object identifier" ("LUID") portion of the GUIDs we issue.
My original thought was to register a Handle prefix to modify my LUID, such that "987654321" becomes "1234/987654321", which could then become "URN:LSID:bishopmuseum.org:1234:987654321" (or perhaps "URN:LSID:bishopmuseum.org:guid:1234/987654321", or even "URN:LSID:bishopmuseum.org:Names:1234/987654321"), which could then become "http://guid.bishopmuseum.org/?URN:LSID:bishopmuseum.org:1234:987654321" (or maybe just "http://guid.bishopmuseum.org/?1234:987654321").
But now I'm thinking maybe it's best to abandon the Handle part, and just issue UUIDs as my local identifiers, which can then be embedded in as many other GUID-resolution layers as I wish. This seems to be more or less what Kevin Richards has done for his LSIDs converted to URLs:
http://lsid.landcareresearch.co.nz/lsid/URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
In any case, my point is that it seems to make sense to me that we can all hedge our bets on which resolution mechanism emerges victorious by starting off with a resolution-less UUID (or some other GUID serving a pure "identifier" role) at the core of our locally-issued GUIDs, and then represent those core identifiers through different resolution-level GUIDs however we see fit.
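A minimal sketch of that idea, assuming the same illustrative prefixes used in the examples earlier in this message (the DOI, Handle and HTTP forms are invented, not registered; only the LSID form is a real one). The function name `representations` is hypothetical.

```python
import uuid


def representations(core: uuid.UUID) -> dict:
    """Wrap one core UUID in several resolution-level forms.

    The prefixes are the illustrative ones from this message and are
    assumptions, not registered DOI or Handle prefixes.
    """
    luid = str(core).upper()  # the opaque, resolution-less identifier
    return {
        "uuid": luid,
        "doi": f"DOI:10.1234/{luid}",
        "handle": f"hdl:1234/{luid}",
        "lsid": f"URN:LSID:landcareresearch.co.nz:Names:{luid}",
        "http": f"http://lsid.landcareresearch.co.nz/guid/{luid}",
    }


core = uuid.UUID("0F9D60F1-59C7-495D-A37E-09C23770DD18")
for scheme, value in representations(core).items():
    print(f"{scheme:7} {value}")
```

The core UUID is issued once and never changes; any of the wrapped forms can be added or dropped later without touching it.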
I would very much appreciate it if someone could explain to me what I am missing.
Aloha, Rich
-----Original Message----- From: tdwg-guid-bounces@lists.tdwg.org [mailto:tdwg-guid-bounces@lists.tdwg.org] On Behalf Of Bob Morris Sent: Tuesday, June 12, 2007 11:02 AM To: Kevin Richards Cc: tdwg-guid@lists.tdwg.org; Gregor Hagedorn Subject: Re: [tdwg-guid] First step in implementing LSIDs
Is the http proxy a GUID?
What vouches for an assertion that a given proxy actually resolves the associated LSID?
On 6/12/07, Kevin Richards <RichardsK@landcareresearch.co.nz> wrote:
The main use of the LSID http proxy would be when you have an RDF document or a triple store full of data that has BOTH the LSID and the http version (as described on the wiki page http://wiki.tdwg.org/twiki/bin/view/GUID/LsidHttpProxyUsageRecommendation).
If ALL you had was the LSID
"URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18"
and no other data at all, and you want to resolve it using the http proxy, then yes you are a bit stuck (as far as I know). Unless we set up an LSID http proxy repository that can be queried and returns the http proxy url for an LSID? Or you could just use the "hard coded" http proxy resolver at http://lsid.tdwg.org/[lsid].
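The "hard coded" fallback amounts to prefixing the LSID with a known proxy base. A minimal sketch, assuming the proxy simply takes the LSID as the rest of the URL path, as in the examples in this thread; the function name `proxy_url` and its default argument are hypothetical, and no network request is made.

```python
def proxy_url(lsid: str, proxy_base: str = "http://lsid.tdwg.org/") -> str:
    """Build the http proxy form of an LSID by prefixing a proxy base.

    Assumes the proxy accepts the LSID verbatim as the URL path.
    """
    if not lsid.upper().startswith("URN:LSID:"):
        raise ValueError(f"not an LSID: {lsid!r}")
    return proxy_base.rstrip("/") + "/" + lsid


lsid = "URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18"

# Generic proxy, as suggested above.
print(proxy_url(lsid))

# Issuer's own proxy, matching the Landcare example in this thread.
print(proxy_url(lsid, "http://lsid.landcareresearch.co.nz/lsid/"))
```

This only constructs the URL; whether a given proxy actually resolves the LSID still depends on the consumer knowing, or guessing, a working proxy base.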
Kevin
"Gregor Hagedorn" <G.Hagedorn@BBA.DE> 12/06/2007 7:17:35 p.m. >>>
For those wanting another LSID http proxy example, I have changed our LSID resolver here at Landcare to serve up the proxy compliant RDF.
Eg
http://lsid.landcareresearch.co.nz/lsid/URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
returns the metadata for the lsid
URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
It took me about an hour to change the RDF generator and set up a redirection web directory on our web server (Microsoft IIS).
I have an object with
URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
I think the problem is that, without me telling my software something I have gathered from this email, my software has no means to know about what you describe as easy.
Unless it has an LSID resolver, in which case it would not need the http method.
This is what the proposal to always use alternating http and LSID guids in any object we communicate about is saying. Whenever you publish
URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18 you also have to provide the http version of it.
Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19          Tel: +49-30-8304-2220
14195 Berlin, Germany          Fax: +49-30-8304-2203
_______________________________________________ tdwg-guid mailing list tdwg-guid@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-guid
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++
WARNING: This email and any attachments may be confidential and/or privileged. They are intended for the addressee only and are not to be read, used, copied or disseminated by anyone receiving them in error. If you are not the intended recipient, please notify the sender by return email and delete this message and any attachments.
The views expressed in this email are those of the sender and do not necessarily reflect the official views of Landcare Research.
Landcare Research http://www.landcareresearch.co.nz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++