[tdwg-guid] Permanent LSID Proxy

Mon Dec 3 22:34:28 CET 2007

Thank you, Roger ...

This captures the essence of what my own thinking has been: 

> 1) Resolve the LSID using the standard DNS based 
> mechanism - an LSID aware client.
> 2) Resolve the pixied version - a non LSID aware client.
> 3) Look up either or both of the identifiers in the most 
> efficient indexing service or cache available. The LSID 
> is a well designed unique string so it is effectively 
> a really good keyword.
> 
> If I have understood correctly this appears to be how 
> DOIs are quoted in PDFs, i.e., with a proxy that may not 
> live forever http://dx.doi.org/ - unlike the DOI itself ;) 
>
> In fact all that I have written there is GUID technology 
> independent. You would only have to drop point 1 and we 
> could be using UUIDs (I am not proposing this!!!).
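
Roger's three routes read naturally as a fallback chain: try each route in
order and stop at the first one that answers. A minimal Python sketch
follows. The three resolver functions are hypothetical stand-ins that make no
network calls, and the `_lsid._tcp.<authority>` SRV name reflects my
understanding of the DNS-based LSID resolution convention rather than a
tested client.

```python
from typing import Callable, Optional

# A resolver takes an identifier string and returns metadata, or None.
Resolver = Callable[[str], Optional[str]]


def srv_name_for(lsid: str) -> str:
    """DNS SRV name an LSID-aware client would look up (route 1)."""
    # urn:lsid:<authority>:<namespace>:<object>[:<revision>]
    authority = lsid.split(":")[2]
    return f"_lsid._tcp.{authority}"


def resolve(identifier: str, resolvers: list[Resolver]) -> Optional[str]:
    """Try each resolution route in order; return the first non-None answer."""
    for route in resolvers:
        try:
            result = route(identifier)
        except OSError:
            # Network-style failure: fall through to the next route.
            continue
        if result is not None:
            return result
    return None


# Hypothetical stand-ins for the three routes (no real network access).
def via_dns(lsid: str) -> Optional[str]:
    # Pretend the SRV lookup fails.
    raise OSError(f"no SRV record for {srv_name_for(lsid)}")


def via_proxy(lsid: str) -> Optional[str]:
    return None  # pretend the HTTP proxy is unavailable


CACHE = {
    "urn:lsid:zoobank.org:act:20889795-7EC7-42F3-A4C3-D1D97704A609": "cached metadata",
}


def via_index(lsid: str) -> Optional[str]:
    return CACHE.get(lsid)


print(resolve("urn:lsid:zoobank.org:act:20889795-7EC7-42F3-A4C3-D1D97704A609",
              [via_dns, via_proxy, via_index]))  # prints: cached metadata
```

The ordering is the only policy here: an LSID-aware client would put the DNS
route first, while a client that cannot do SRV lookups would simply leave it
out of the list.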

We are in a period of transition within our community -- at the chicken/egg
tipping point.  We want GUIDs to identify things (for all the obvious
reasons), but to put them to use, we need mechanisms to resolve them.  The
PURL advocates put the emphasis on the resolution side (in terms of existing
resolution support), and UUID advocates would put the emphasis on the
identifier side.  DOIs, Handles, and LSIDs sit at various points in between.
There is a perception (probably not altogether illegitimate) that the
farther we go towards the resolution end of the spectrum, the less
confidence we have that the GUID will persist indefinitely (presumably
due to declining opacity). After careful consideration, our community has
made the non-unanimous decision to move in the direction of LSIDs, which
have nowhere near as much existing resolution software support as PURLs, but
offer much more self-contextualization and potential for self-resolution
than UUIDs do. This seems like a reasonable approach to me, which is why I am
among the supporters of that non-unanimous decision.

So...because we are in this transition period, implementations need to cover
the bases: they should allow functionality today, while also offering (we
hope) some degree of longevity.  I think your 3-layer approach to resolving
LSIDs (or 2-layer approach to resolving GUIDs in general) makes a lot of
sense during this period of transition.

Contrary to what some might believe, I am not (at least not at this time) an
advocate of using only UUIDs either.  In my mind, they are too context-free
to be our only mechanism for identifying biodiversity objects (even though I
suspect that 99.999% of all attempts by our community to resolve GUIDs will
carry enough context to direct them to the appropriate resolution
services).  However, I do advocate using them as part of my own
"three-layer" solution to the transition period.

Internally (and perhaps in the future, externally as well), the "identifier"
is a UUID:
20889795-7EC7-42F3-A4C3-D1D97704A609

I will incorporate this identifier into an LSID, which both conforms to the
direction TDWG/GBIF are going in now and opens up resolution door #1 in
Roger's list:
urn:lsid:zoobank.org:act:20889795-7EC7-42F3-A4C3-D1D97704A609
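
Because the LSID simply wraps the UUID with an authority and a namespace,
composing it and taking it apart again is plain string handling. A minimal
sketch, assuming the `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`
layout; `make_lsid` and `parse_lsid` are names I made up for illustration,
not part of any LSID library.

```python
import uuid
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def __str__(self) -> str:
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision:
            parts.append(self.revision)
        return ":".join(parts)


def make_lsid(authority: str, namespace: str, internal_id: str) -> LSID:
    """Wrap an internal UUID as the object part of an LSID."""
    # uuid.UUID() raises ValueError on malformed input; we keep the caller's
    # original spelling (upper-case here) rather than Python's lower-case form.
    uuid.UUID(internal_id)
    return LSID(authority, namespace, internal_id)


def parse_lsid(text: str) -> LSID:
    """Split an LSID string back into its components."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


lsid = make_lsid("zoobank.org", "act", "20889795-7EC7-42F3-A4C3-D1D97704A609")
print(lsid)  # prints: urn:lsid:zoobank.org:act:20889795-7EC7-42F3-A4C3-D1D97704A609
```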

Because we don't yet have widespread native support for resolving LSIDs by
themselves in commonly available software (e.g., IE, Firefox, Adobe Acrobat
Reader), I also plan to expose these LSIDs through an HTTP-proxied
("pixied") "wrapper" in publicly accessible documents (and perhaps also in
links on websites) -- thereby covering Roger's door #2:
http://zoobank.org/urn:lsid:zoobank.org:act:20889795-7EC7-42F3-A4C3-D1D97704A609
-or-
http://lsid.tdwg.org/urn:lsid:zoobank.org:act:20889795-7EC7-42F3-A4C3-D1D97704A609
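
The proxied form is nothing more than the LSID appended to a proxy's base
URL. A minimal sketch, assuming the two proxy bases above; `pixie` is a
hypothetical helper name.

```python
# The two proxy bases mentioned above; either could host the wrapper.
PROXY_BASES = ("http://zoobank.org/", "http://lsid.tdwg.org/")


def pixie(lsid: str, base: str) -> str:
    """Wrap an LSID in an HTTP proxy URL by appending it to the base URL."""
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError(f"not an LSID: {lsid!r}")
    if not base.endswith("/"):
        base += "/"
    return base + lsid


LSID = "urn:lsid:zoobank.org:act:20889795-7EC7-42F3-A4C3-D1D97704A609"
for base in PROXY_BASES:
    print(pixie(LSID, base))
```

Colons are legal in the path of an absolute HTTP URL, so the LSID can be
appended as-is without percent-encoding.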

And, there's still Roger's door #3 (operating at any or all of these layers
of identifier and/or identifier-plus-resolution-syntax).

In effect, I've got both ends of the spectrum covered (UUID at one end,
PURL(ish) HTTP proxy at the other), plus one community-standard (for now)
mid-point (LSID) -- all effectively representing "SAMEAS" identifiers, or
"one" identifier with one or more layers of resolution syntax wrapped around
it.
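
One way to make the "SAMEAS" relationship concrete is a function that strips
away whichever layers of resolution syntax are present and returns the same
core UUID. A minimal sketch, assuming the three forms shown above are the
only ones in play; `core_uuid` is a hypothetical name.

```python
import uuid
from urllib.parse import urlsplit


def core_uuid(identifier: str) -> uuid.UUID:
    """Reduce a bare UUID, an LSID, or a proxied LSID URL to the same UUID."""
    text = identifier.strip()
    if text.lower().startswith(("http://", "https://")):
        # Peel off the HTTP proxy layer: the LSID is the URL path.
        text = urlsplit(text).path.lstrip("/")
    if text.lower().startswith("urn:lsid:"):
        # Object part of urn:lsid:<authority>:<namespace>:<object>[:<revision>]
        text = text.split(":")[4]
    return uuid.UUID(text)  # raises ValueError if what is left is not a UUID
```

Any of the forms above, fed to `core_uuid`, yields the same `uuid.UUID`
value, which could serve as the key for a cache or index (Roger's door #3).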

Nothing I've read in this thread really shoots this approach down (as far as
I understand what I've read) -- and indeed, most of what I read seems to
support it.  I feel like this approach adequately covers the bases during
this transition period.

Now...I do have a statement and a couple of questions related to the HTTP
proxy version, to close out this overly long-winded post.

Statement:
I will build an HTTP proxy that will work with either the UUID or the LSID
(whichever is supplied), and the HTTP proxy will return HTML
formatted/styled in a clean, user-friendly template.  Among the content of
the HTML return will be the metadata (and data, if extant) for the supplied
identifier, arranged in a way that is intuitive for an average human reader
to view on a computer screen.  Also on this returned HTML will be a link to
view the metadata in RDF.
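
A minimal sketch of the proxy's core logic, assuming a hypothetical
in-memory record store in place of the real database and a made-up
`?format=rdf` URL for the RDF view. It accepts either a UUID or an LSID,
normalizes both to the same key, and renders escaped HTML with a link to the
RDF. Wiring it to an actual web server is left out.

```python
import html
import uuid

# Hypothetical in-memory store standing in for the real database,
# keyed on the upper-case UUID string; the metadata is a placeholder.
RECORDS = {
    "20889795-7EC7-42F3-A4C3-D1D97704A609": {"Record": "placeholder metadata"},
}
AUTHORITY, NAMESPACE = "zoobank.org", "act"


def to_key(identifier: str) -> str:
    """Accept either a bare UUID or a full LSID and return the UUID key."""
    if identifier.lower().startswith("urn:lsid:"):
        identifier = identifier.split(":")[4]
    return str(uuid.UUID(identifier)).upper()


def render(identifier: str) -> str:
    """Return a human-readable HTML page for the record, with an RDF link."""
    key = to_key(identifier)
    record = RECORDS.get(key)
    if record is None:
        raise KeyError(f"unknown identifier {identifier!r}")
    lsid = f"urn:lsid:{AUTHORITY}:{NAMESPACE}:{key}"
    rows = "\n".join(
        f"<tr><th>{html.escape(k)}</th><td>{html.escape(v)}</td></tr>"
        for k, v in record.items()
    )
    # Hypothetical URL pattern for the RDF view of the same record.
    rdf_url = f"/{lsid}?format=rdf"
    return (
        f"<html><head><title>{html.escape(lsid)}</title></head><body>\n"
        f"<h1>{html.escape(lsid)}</h1>\n<table>\n{rows}\n</table>\n"
        f'<p><a href="{html.escape(rdf_url)}">View metadata as RDF</a></p>\n'
        "</body></html>"
    )
```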

My rationale for this approach is that the HTTP proxy really exists in order
to fix the current problem of inadequate native support for dealing with
LSIDs among existing popular software (IE, Firefox, Adobe Acrobat, etc.);
and thus its primary function is to convert a UUID or LSID into something
that a pair of human eyes and a human brain will find meaningful.  It seems
to me that RDF is best consumed by a software client, and such a client
would presumably be written to deal with LSIDs natively anyway.  Hence it
would not need to pass through the HTTP proxy in the first place; it would
just use the LSID directly to get the RDF through the proper LSID resolver.

My questions are:

1) Are there many software applications that consume and process RDF, but
are not LSID-aware (e.g., RSS feeds, etc.) -- that might benefit from an
HTTP proxy that returns RDF (instead of HTML)?

2) Is there any way I can get the best of both worlds, by returning both RDF
(for a software client) and HTML (for a human client) in the same return
from my HTTP proxy?

3) Is there any rationale for creating two "flavors" of HTTP proxy: one that
returns HTML (for humans), and one that returns RDF?

4) Am I getting into trouble when I edge towards representing the "pixied"
version as a PURL (i.e., as an "identifier" unto itself) -- rather than just
a resolution syntax for the "true" identifier (which could be either the
LSID or the UUID)?

Sorry for my blatant naïveté.

Aloha,
Rich

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
  and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html