[tdwg-guid] First step in implementing LSIDs

Markus Döring m.doering at bgbm.org
Thu Jun 14 16:40:31 CEST 2007


Richard,
I came to pretty much the same conclusions, and within the EDIT
project and most new BGBM projects we are using UUIDs too. They are
also great for primary keys if you want to merge data from different
sources: you never have to change any keys anywhere.

UUIDs are URNs too, by the way, and use the namespace "uuid", which is
registered with IANA:
http://www.iana.org/assignments/urn-namespaces

So your example below is actually:
urn:uuid:0F9D60F1-59C7-495D-A37E-09C23770DD18

which isn't resolvable per se, but can easily be resolved by
different resolvers and thus turned into URLs.
So far, UUIDs seem to me the safest bet, one that plays well with
any other GUID technology.
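To make the equivalence concrete: Python's standard `uuid` module (used here purely for illustration; nothing in this thread depends on it) renders a UUID in the RFC 4122 `urn:uuid:` form and parses that form back. It emits the hex digits in lower case; UUID hex is case-insensitive on input, so this is the same identifier as the upper-case example.

```python
import uuid

# The bare UUID used as the running example in this thread.
core = uuid.UUID("0F9D60F1-59C7-495D-A37E-09C23770DD18")

# .urn gives the RFC 4122 URN form (hex digits are emitted in lower case).
print(core.urn)      # urn:uuid:0f9d60f1-59c7-495d-a37e-09c23770dd18

# The URN form parses back to the same value.
print(uuid.UUID(core.urn) == core)   # True

# The version nibble shows this is a random (version 4) UUID.
print(core.version)  # 4
```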
--
Markus



On 13.06.2007, at 01:28, Richard Pyle wrote:

>
> Thanks, Jerry -- that's pretty much what I figured, and it seems to make
> perfect sense to me.  In my own case, as I am now in the process of
> completely overhauling our internal database structure and table
> cross-linking, I have arrived at essentially the same conclusion.  For
> what it's worth, I have one sequentially-generated integer series that
> serves the purpose of providing primary key values for essentially every
> table in our system (the only exceptions being a few enumerations with
> unambiguously finite value sets).  Like you, I am now thinking of
> maintaining those integers for internal (performance) purposes only, and
> restricting what I expose publicly to GUIDs (I'm leaning toward following
> your lead of using UUIDs for this).
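A minimal sketch of the arrangement Rich describes, under stated assumptions: the class and method names are hypothetical, and in-memory dictionaries stand in for database tables. A single counter issues internal integer keys for every record, and each record also gets a random UUID, which is the only identifier that would be exposed publicly.

```python
import itertools
import uuid


class KeyRegistry:
    """Hypothetical sketch: internal integer keys plus public UUIDs.

    Dictionaries stand in for database tables; a real system might store
    the UUID as an extra column alongside the integer primary key.
    """

    def __init__(self) -> None:
        # One sequence shared by every "table", as described above.
        self._sequence = itertools.count(1)
        self._public_by_internal: dict[int, uuid.UUID] = {}
        self._internal_by_public: dict[uuid.UUID, int] = {}

    def new_record(self) -> int:
        """Issue the next internal key and attach a fresh random UUID to it."""
        internal = next(self._sequence)
        public = uuid.uuid4()
        self._public_by_internal[internal] = public
        self._internal_by_public[public] = internal
        return internal

    def public_id(self, internal: int) -> uuid.UUID:
        """The only identifier that leaves the system."""
        return self._public_by_internal[internal]

    def internal_id(self, public: uuid.UUID) -> int:
        """Map an incoming public UUID back to the fast internal key."""
        return self._internal_by_public[public]


registry = KeyRegistry()
first = registry.new_record()
second = registry.new_record()
print(first, second)                                     # 1 2
print(registry.internal_id(registry.public_id(second)))  # 2
```

Keeping the two mappings side by side means joins stay on cheap integers while anything published carries only the UUID.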
>
> Aloha,
> Rich
>
>> -----Original Message-----
>> From: Jerry Cooper [mailto:cooperj at landcareresearch.co.nz]
>> Sent: Tuesday, June 12, 2007 1:20 PM
>> To: Richard Pyle; 'Bob Morris'; Kevin Richards
>> Cc: 'Gregor Hagedorn'; tdwg-guid at lists.tdwg.org
>> Subject: RE: [tdwg-guid] First step in implementing LSIDs
>>
>> Richard,
>>
>> In your analysis you had ...
>>
>> [GloballyUniquePrefixStuff]+[LocallyUniqueIDentifier]
>>
>> It is probably worth saying why our LUID is a semantically
>> opaque GUID. And yes, we do consciously realise that we are
>> 'hedging our bets', as you put it.
>>
>> In our internal databases over the last few years we have
>> gone through various mechanisms for providing primary
>> key/foreign relationships. I recall numerous discussions
>> around the practical utility of counters versus semantically
>> opaque unique keys.   Ultimately we decided that data we want
>> to make publicly available should carry a GUID - even if it
>> isn't used internally as part of a table relationship PK/FK.
>> What you see on the end of our LSIDs are those public GUIDs.
>> We did that prior to TDWG discussions around the adoption of
>> LSIDs and GUID resolution mechanisms. It allowed us to get on
>> with the job!
>>
>> Jerry
>>
>>
>> >>> "Richard Pyle" <deepreef at bishopmuseum.org> 13/06/2007 10:46:18 a.m. >>>
>>
>> At one point last week I had visions of catching up on this
>> whole thread and commenting on all sorts of things -- but I
>> don't think there is any hope of that in the foreseeable
>> future, so I'll just jump right in on the current discussion.
>>
>> It seems to me that a lot of the recent discussion has been
>> confused by an unclear distinction between the needs of
>> GUIDs as identifiers, and the means to resolve those GUIDs to
>> data and metadata.
>>
>> As someone earlier pointed out, if all we need are
>> identifiers (without immediate concern for the resolution
>> mechanism), then UUIDs will suffice.
>> So, let's start there:
>>
>> 0F9D60F1-59C7-495D-A37E-09C23770DD18
>>
>> Wonderful identifier, with utterly no information about what
>> it represents, or what it would resolve to.  When I type that
>> text string into a web browser, I get nothing.  So, we can
>> either always rely on the GUID existing in some
>> non-GUID-embedded context where the resolution mechanism is
>> self-evident (maintained by the provider or the consumer of
>> the GUID), or we can embed resolution "meta-information"
>> within the GUID itself. These actually aren't fundamentally
>> different, but for the sake of argument, let's assume that
>> the former is unacceptable for our purposes. I can think of
>> at least four obvious examples of the latter:
>>
>> DOI:10.1234/0F9D60F1-59C7-495D-A37E-09C23770DD18
>>
>> hdl:1234/0F9D60F1-59C7-495D-A37E-09C23770DD18
>>
>> URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
>>
>> http://lsid.landcareresearch.co.nz/guid/0F9D60F1-59C7-495D-A37E-09C23770DD18
>>
>> (Note that except for the LSID, these are my own creations.)
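The four examples share one structure: the same opaque core wrapped in different resolution prefixes. A small Python sketch of that wrapping follows; the DOI prefix `10.1234`, the Handle prefix `1234`, and the `/guid/` URL path are the made-up values from the examples above, not registered or live identifiers, and `wrap` is a hypothetical helper.

```python
CORE = "0F9D60F1-59C7-495D-A37E-09C23770DD18"


def wrap(core: str) -> dict[str, str]:
    """Embed one opaque core identifier in each of the four example schemes."""
    # 10.1234 and 1234 are placeholder prefixes from the examples, not registered ones.
    return {
        "doi": f"DOI:10.1234/{core}",
        "handle": f"hdl:1234/{core}",
        "lsid": f"URN:LSID:landcareresearch.co.nz:Names:{core}",
        "http": f"http://lsid.landcareresearch.co.nz/guid/{core}",
    }


forms = wrap(CORE)
for scheme, guid in forms.items():
    print(f"{scheme:<7}{guid}")

# Whatever the wrapper, the core can be recovered unchanged from the end.
print(all(guid.endswith(CORE) for guid in forms.values()))   # True
```

Only the prefix carries resolution information; the core is the part that survives unchanged if one wrapper falls out of use.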
>>
>> The first example (DOI) doesn't actually solve the problem,
>> because merely prepending "DOI:10.1234/" to my UUID
>> does not, by itself, make it resolvable.  For example, if I
>> type Rod's earlier example of
>> "doi:10.1206/0003-0082(2005)485[0001:PNAICA]2.0.CO;2" into my
>> browser, I get squat. That means that either I (as a human),
>> or the software application that I write, needs some "insider
>> information" to resolve the GUID. I don't see how this is any
>> different from option numbers 2 or 3.  Sure, you could argue
>> that there is only one DOI system, or only one Handle system
>> (in the same general sense that there is only one DNS
>> system), so the amount of information required to resolve the
>> GUID is relatively small and universally known.  But it seems
>> to me this is mostly true of LSIDs as well.  In fact, if I
>> simply install the LSID Launchpad plug-in for my browser, I
>> *can* type the LSID in and get metadata back automatically
>> (as long as I chop the "URN:" part -- which seems like an
>> unnecessary step). The main difference among the first three
>> that I see is that in this blink-of-an-eye-moment in history,
>> there happen to be a lot of people using DOIs (and Handles?),
>> whereas LSIDs have not (yet?) caught on as widely.
>>
>> So, that leaves the fourth option (HTTP URL), the obvious
>> advantage of which is that resolution is a snap using current
>> browser protocols and technologies (no plug-ins required),
>> and development support is much better (given how widely used
>> HTTP is); the apparent disadvantages are social concerns
>> about link-rot and/or uncertainty about the permanency
>> of the owner of the domain.
>>
>> Most of this has already been addressed in this thread.  What
>> I haven't seen addressed (maybe I missed it?) is the
>> decreasing opacity of the GUID as you go through the sequence
>> above from UUID to URL.  The decreasing opacity, as far as I
>> understand it, is directly tied to the increasing reliance on
>> embedded "meta-information" within the GUID itself to allow
>> the GUID to be self-resolving.
>>
>> We all know that the value of a GUID is a function of its
>> permanency, and we all agree that the weakest link in the
>> chain of permanency is the social contract part.  To me, one
>> of the greatest values of maintaining opacity of GUIDs is to
>> help facilitate (encourage) the social contract for permanency.
>> Or, put another way, to decrease the opportunity/temptation
>> for breaking the social contract for permanency.
>>
>> Using the examples above, if the domain name
>> "landcareresearch.co.nz" is ever abandoned or changed or
>> otherwise broken, the URL will die, but the LSID will
>> continue to function (if I understand the LSID protocol
>> correctly).  In the case of HTTP, the domain name relies on
>> current DNS mapping, whereas the LSID does not.
>>
>> So, my overall point here is that we seem to have two topics
>> that are being conflated: GUIDs as identifiers per se, and
>> resolution mechanisms for the GUIDs. The more reliably and
>> simply the GUIDs resolve themselves, the less opaque they
>> become, and the more concern people have (justifiably or
>> not) that permanency will be jeopardized.
>>
>> This leads me to the second part of this post, which I've
>> already hinted at earlier.
>>
>> Most GUID schemes seem to follow the basic pattern of:
>>
>> [GloballyUniquePrefixStuff]+[LocallyUniqueIDentifier]
>>
>> Generally, the "GUPS" part is somehow attached to the issuer,
>> and the "LUID" part is unique only within the context of the
>> "GUPS". Also, any self-embedded resolution information is
>> contained within the "GUPS".
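Applied to LSIDs, the split can be made mechanically. The sketch below assumes the usual colon-separated layout `urn:lsid:<authority>:<namespace>:<objectID>` with an optional trailing revision, and assumes no component itself contains a colon; `parse_lsid` and `LsidParts` are hypothetical names. It treats authority plus namespace as the "GUPS" and the object ID as the "LUID".

```python
from typing import NamedTuple, Optional


class LsidParts(NamedTuple):
    authority: str           # issuer's domain, part of the "GUPS"
    namespace: str           # issuer-defined namespace, part of the "GUPS"
    object_id: str           # the "LUID", unique only within authority + namespace
    revision: Optional[str]  # optional trailing revision, None if absent

    @property
    def gups(self) -> str:
        """The globally unique prefix that scopes the object ID."""
        return f"urn:lsid:{self.authority}:{self.namespace}"


def parse_lsid(lsid: str) -> LsidParts:
    """Split an LSID on ':' (assumes no component contains a colon)."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LsidParts(parts[2], parts[3], parts[4], revision)


p = parse_lsid("URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18")
print(p.gups)       # urn:lsid:landcareresearch.co.nz:Names
print(p.object_id)  # 0F9D60F1-59C7-495D-A37E-09C23770DD18
```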
>>
>> In the example at the beginning of this message, I started
>> with a UUID, which by itself is globally unique.  I carried
>> this through to the other examples, such that the "LUID"
>> portion of each example was actually by itself a GUID.
>> Obviously, there is no guarantee of that for the "LUID" part
>> of DOIs, Handles, LSIDs or URLs...but I keep asking myself
>> why we as a community (i.e., TDWG) don't come out with some
>> sort of "best practice" (if not recommendation, or even
>> outright standard) that those of us who have not yet begun to
>> issue GUIDs (but soon will) all make an effort to use
>> something like UUIDs for the "object identifier" ("LUID")
>> portion of the GUIDS we issue.
>>
>> My original thought was to register a Handle prefix to modify
>> my LUID, such that "987654321" becomes "1234/987654321",
>> which could then become
>> "URN:LSID:bishopmuseum.org:1234:987654321" (or perhaps
>> "URN:LSID:bishopmuseum.org:guid:1234/987654321", or even
>> "URN:LSID:bishopmuseum.org:Names:1234/987654321"), which
>> could then become
>> "http://guid.bishopmuseum.org/?URN:LSID:bishopmuseum.org:1234:987654321"
>> (or maybe just "http://guid.bishopmuseum.org/?1234:987654321").
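The layering Rich proposes can be written out step by step. In this sketch the Handle prefix `1234`, the LUID `987654321`, and the `guid.bishopmuseum.org` resolver are the illustrative values from the message, not real registrations, and `layers` is a hypothetical helper.

```python
def layers(luid: str, handle_prefix: str, authority: str, resolver: str) -> list[str]:
    """Each entry wraps the same LUID in one more resolution layer."""
    handle = f"{handle_prefix}/{luid}"                      # Handle-style identifier
    lsid = f"URN:LSID:{authority}:{handle_prefix}:{luid}"   # prefix reused as LSID namespace
    return [
        luid,
        handle,
        lsid,
        f"{resolver}?{lsid}",                    # HTTP form carrying the whole LSID
        f"{resolver}?{handle_prefix}:{luid}",    # shorter HTTP form
    ]


for form in layers("987654321", "1234", "bishopmuseum.org", "http://guid.bishopmuseum.org/"):
    print(form)
```

Because the LUID here is a plain integer, it is unique only within the Handle prefix; a UUID in its place would make the innermost layer globally unique on its own.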
>>
>> But now I'm thinking maybe it's best to abandon the Handle
>> part, and just issue UUIDs as my local identifiers, which can
>> then be embedded in as many other GUID-resolution layers as I
>> wish. This seems to be more or less what Kevin Richards has
>> done for his LSIDs converted to URLs:
>>
>> http://lsid.landcareresearch.co.nz/lsid/URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
>>
>> In any case, my point is that it seems to make sense to me
>> that we can all hedge our bets on which resolution mechanism
>> emerges victorious by starting off with a resolution-less
>> UUID (or some other GUID serving a pure "identifier" role) at
>> the core of our locally-issued GUIDs, and then represent
>> those core identifiers through different resolution-level
>> GUIDs however we see fit.
>>
>> I would very-much appreciate it if someone could explain to
>> me what I am missing.
>>
>> Aloha,
>> Rich
>>
>>
>>> -----Original Message-----
>>> From: tdwg-guid-bounces at lists.tdwg.org
>>> [mailto:tdwg-guid-bounces at lists.tdwg.org] On Behalf Of Bob Morris
>>> Sent: Tuesday, June 12, 2007 11:02 AM
>>> To: Kevin Richards
>>> Cc: tdwg-guid at lists.tdwg.org; Gregor Hagedorn
>>> Subject: Re: [tdwg-guid] First step in implementing LSIDs
>>>
>>> Is the http proxy a GUID?
>>>
>>> What vouches for an assertion that a given proxy actually
>>> resolves the associated LSID?
>>>
>>>
>>>
>>> On 6/12/07, Kevin Richards <RichardsK at landcareresearch.co.nz> wrote:
>>>>
>>>>
>>>> The main use of the LSID http proxy would be when you have an RDF
>>>> document or a triple store full of data that has BOTH the LSID and
>>>> the http version (as described on the wiki page
>>>> http://wiki.tdwg.org/twiki/bin/view/GUID/LsidHttpProxyUsageRecommendation).
>>>>
>>>> If ALL you had was the LSID
>>>> "URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18"
>>>> and no other data at all, and you wanted to resolve it using the http
>>>> proxy, then yes, you are a bit stuck (as far as I know). Unless we set
>>>> up an LSID http proxy repository that can be queried and returns the
>>>> http proxy URL for an LSID? Or you could just use the "hard coded"
>>>> http proxy resolver at http://lsid.tdwg.org/[lsid].
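A sketch of the two routes Kevin mentions: turning an LSID into an HTTP proxy URL by prefixing a resolver base. The base URLs are copied from these messages and may no longer be live; `proxy_urls` is a hypothetical helper, and no network request is made.

```python
# Resolver bases quoted in this thread; they may no longer be online.
PROXIES = {
    "landcare": "http://lsid.landcareresearch.co.nz/lsid/",
    "tdwg": "http://lsid.tdwg.org/",
}


def proxy_urls(lsid: str) -> dict[str, str]:
    """Build one HTTP proxy URL per known resolver by appending the LSID."""
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError(f"not an LSID: {lsid!r}")
    return {name: base + lsid for name, base in PROXIES.items()}


for name, url in proxy_urls(
    "URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18"
).items():
    print(name, url)
```

Constructing the URL is the trivial part; as Bob Morris asks above, nothing in the string vouches that a given proxy actually resolves the LSID.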
>>>>
>>>> Kevin
>>>>
>>>> >>> "Gregor Hagedorn" <G.Hagedorn at BBA.DE> 12/06/2007 7:17:35 p.m. >>>
>>>>
>>>>> For those wanting another LSID http proxy example, I have changed
>>>>> our LSID resolver here at Landcare to serve up the proxy-compliant RDF.
>>>>>
>>>>> E.g.
>>>>>
>>>>> http://lsid.landcareresearch.co.nz/lsid/URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
>>>>>
>>>>> returns the metadata for the LSID
>>>>>
>>>>> URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
>>>>>
>>>>> It took me about an hour to change the RDF generator and set up a
>>>>> redirection web directory on our web server (Microsoft IIS).
>>>>
>>>> I have an object with
>>>>
>>>> URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
>>>>
>>>> I think the problem is that, without me telling my software something
>>>> I have gathered from this email, my software has no means to know
>>>> about what you describe as easy.
>>>>
>>>> Unless it has an LSID resolver, in which case it would not need the
>>>> http method.
>>>>
>>>> This is what the proposal to always use alternating http and LSID
>>>> GUIDs in any object we communicate about is saying. Whenever you
>>>> publish
>>>>
>>>> URN:LSID:landcareresearch.co.nz:Names:0F9D60F1-59C7-495D-A37E-09C23770DD18
>>>>
>>>> you also have to provide the http version of it.
>>>>
>>>> Gregor
>>>> ----------------------------------------------------------
>>>> Gregor Hagedorn (G.Hagedorn at bba.de)
>>>> Institute for Plant Virology, Microbiology, and Biosafety
>>>> Federal Research Center for Agriculture and Forestry (BBA)
>>>> Königin-Luise-Str. 19           Tel: +49-30-8304-2220
>>>> 14195 Berlin, Germany           Fax: +49-30-8304-2203
>>>>
>>>> _______________________________________________
>>>> tdwg-guid mailing list
>>>> tdwg-guid at lists.tdwg.org
>>>> http://lists.tdwg.org/mailman/listinfo/tdwg-guid
>>>>
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  WARNING: This email and any attachments may be confidential and/or
>>>> privileged. They are intended for the addressee only and are not to be
>>>> read, used, copied or disseminated by anyone receiving them in error.
>>>> If you are not the intended recipient, please notify the sender by
>>>> return email and delete this message and any attachments.
>>>>
>>>>  The views expressed in this email are those of the sender and do not
>>>> necessarily reflect the official views of Landcare Research.
>>>>
>>>>  Landcare Research
>>>>  http://www.landcareresearch.co.nz
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



