[tdwg-guid] Embedding LSID links within Publications

Sat Dec 1 00:46:19 CET 2007

> Personally I think we need to look way beyond the lifetime of 
> LSIDs and HTTP. One reason digital library types like DOIs 
> and Handles is that they serve as GUIDs independently of any 
> protocol. 

Sort of, but only in the sense that they are no different from UUIDs.  The
difference is that they are "self-resolving", but that aspect of them goes
beyond their independent "GUID"ness. You could say the same thing about
LSIDs (except that LSIDs give the impression of a domain name in the
"authority" part, whereas DOIs don't give this impression).

> They are printed in journal articles as bare 
> identifiers, in the same way as ISSNs or ISBNs. Hence tying 
> the LSID to a HTTP resolver strikes me as the quickest way to 
> build in obsolescence.

I don't see it as "tying" the LSID to an HTTP resolver -- just wrapping a
GUID (LSID) into a resolution mechanism, so that it's clickable (for however
long HTTP protocol exists, and zoobank.org exists, and the resolving
applications exist, and PDF file readers exist, etc.).  When I see a DOI in
a PDF, I can't click on the DOI and get taken directly to the appropriate
online resource -- "10.1029/2005GL024452" just sits there -- every bit as
opaque as a UUID.  It takes "insider knowledge" to know that "10." indicates
a handle (generally) and a DOI (specifically).  No different from knowing
that "urn:lsid:..." indicates an LSID.

To resolve the DOI with a mouse click, you need a proxy server:

http://dx.doi.org/10.1029/2005GL024452

So I'm not sure I get why DOIs are any better than LSIDs (other than that at
this particular moment in history, there is a robust and centralized proxy
service for resolving them, whereas lsid.tdwg.org doesn't quite rise to that
level just yet).
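
To make the "wrapping" point concrete, here is a minimal Python sketch: the
bare identifier is left untouched and a proxy prefix is simply put in front
of it. The dx.doi.org prefix is the one shown above; the LSID prefix
(http://lsid.tdwg.org/ followed by the full LSID) is only my assumption
about how such a proxy would be addressed, and the function names are made
up for illustration.

```python
DOI_PROXY = "http://dx.doi.org/"      # the DOI proxy shown above
LSID_PROXY = "http://lsid.tdwg.org/"  # ASSUMED prefix, for illustration only


def doi_link(doi: str) -> str:
    """Wrap a bare DOI in the DOI proxy so it becomes clickable."""
    return DOI_PROXY + doi


def lsid_link(lsid: str, proxy: str = LSID_PROXY) -> str:
    """Wrap a bare LSID in an HTTP proxy; the LSID itself is unchanged."""
    return proxy + lsid


print(doi_link("10.1029/2005GL024452"))
print(lsid_link("urn:lsid:zoobank.org:act:1A66BAE9-9B37-4C73-A560-BF63D0345F04"))
```

Either way the identifier survives inside the URL; the prefix is the only
part that depends on a particular server staying alive.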

Now, you could argue that LSID proxy servers only work if the LSID
infrastructure works -- but to me that means the LSID is better than DOI,
because it *might* be self-resolving.  You can always revert to using the
LSID string as a plain GUID (no different from a UUID), and instead of
trying to use DNS to resolve the LSID based on the authority namespace, you
just resolve it directly through some centralized index (which is how I
presume DOI resolution works).
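
A rough Python sketch of that fallback, under stated assumptions: the LSID
is split into its colon-separated parts, and the object identifier is looked
up in a central index without touching DNS at all. The index here is just a
dictionary with a made-up example.org entry; a real one would be whatever
service ends up holding the records.

```python
from typing import Dict, NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object' into its parts.

    An optional trailing revision field, if present, is ignored here.
    """
    parts = text.strip().split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    return Lsid(parts[2], parts[3], parts[4])


# HYPOTHETICAL central index: object identifier -> current location.
INDEX: Dict[str, str] = {
    "1A66BAE9-9B37-4C73-A560-BF63D0345F04": "http://example.org/record/1",
}


def resolve_via_index(text: str, index: Dict[str, str] = INDEX) -> Optional[str]:
    """Treat the LSID as a plain GUID and look it up, ignoring the authority."""
    return index.get(parse_lsid(text).object_id.upper())
```

The authority part is parsed but never used for the lookup, which is the
whole point: the string still works as a GUID even if the domain is gone.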

> I suspect that a betting person would put money on the paper 
> version of your article outlasting any digital representation 
> you create...

Yup -- exactly.  And that's why the ICZN is still (legitimately) a bit
reluctant to embrace digital-only nomenclatural acts.  These things need to
be accessible 250 years from now (if history is any model). But this issue
applies to DOIs the same way it applies to LSIDs or UUIDs or whatever.

> One approach is to do something like COinS (ContextObjects in 
> Spans, http://ocoins.info/ ). This embeds citation metadata 
> in HTML pages using OpenURL. However, it doesn't specify the 
> OpenURL resolver. The user supplies this, for examples, there 
> is a Firefox extension that rewrites COinS as clickable links 
> (http://www.openly.com/openurlref/ ). Here's an example of a COinS:
> 
> <span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%
> 3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.issn=1045-4438"></span>
> 
> The OpenURL syntax is horrible, but I hope you get the idea. 

Thanks!  I'm going to need more than 3 hrs of sleep to get my head around
this -- but I'll definitely look into it in more detail after some rest.

> Hence, what if the LSID was embedded like this:
> 
> <span class="lsid" title="urn:lsid:zoobank.org:act:1A66BAE9-9B37-4C73-
> A560-BF63D0345F04"></span>
> 
> This avoids the need to specify a resolver (at the cost of 
> needing a tool to make the links clickable). However, it may 
> help ensure long term survival of the identifier.

As far as I'm concerned, the "identifier" is the UUID, and it will survive
as long as ZooBank survives.  So, as long as it's in there somewhere, then
the future Dave Remsen will let loose his parsing algorithms on this ancient
PDF, and be able to mine the UUID out of there and reconstruct the "new"
link to whatever system ZooBank (and the internet) uses 250 years from now.
The point is, the "identifier" is still in there, even if the resolver
wrapper breaks in the future.
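
As a sketch of what that future mining might look like (in Python, and only
as an illustration, not anyone's actual parsing algorithm): a regular
expression for the standard 8-4-4-4-12 hexadecimal UUID layout pulls the
identifier out of surrounding text, whether or not any resolver wrapper
around it still works. The function name is made up.

```python
import re
from typing import List

# Standard textual UUID layout: 8-4-4-4-12 hexadecimal digits.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)


def mine_uuids(text: str) -> List[str]:
    """Return every UUID found in the text, normalized to upper case."""
    return [match.upper() for match in UUID_RE.findall(text)]


sample = (
    "Chromis xus sp nov "
    "http://zoobank.org/urn:lsid:zoobank.org:act:"
    "1A66BAE9-9B37-4C73-A560-BF63D0345F04"
)
print(mine_uuids(sample))
```

Note that this simple pattern assumes the UUID is not broken across lines;
text extracted from a typeset page may need the line breaks rejoined first.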

So, it would be no problem to mark up the LSID as you suggest, but that
doesn't give me a clickable link.
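
For what it's worth, the kind of tool that markup would need could be
sketched in Python roughly as follows. It finds spans of exactly the form
you show and rewrites each one as an anchor. The proxy address is built from
the authority part of the LSID (http://<authority>/<lsid>), which is purely
an assumption for illustration, as is the function name.

```python
import re
from html import escape

# Matches the exact markup suggested above; other attribute orders are ignored.
SPAN_RE = re.compile(r'<span class="lsid" title="([^"]+)"></span>')


def make_clickable(html: str) -> str:
    """Rewrite <span class="lsid" title="..."></span> as clickable links."""

    def to_anchor(match: "re.Match[str]") -> str:
        lsid = match.group(1)
        parts = lsid.split(":")
        if len(parts) < 5:
            return match.group(0)  # not a well-formed LSID: leave untouched
        # ASSUMED proxy layout: same domain as the LSID authority.
        url = "http://" + parts[2] + "/" + lsid
        return '<a class="lsid" href="{}">{}</a>'.format(
            escape(url, quote=True), escape(lsid)
        )

    return SPAN_RE.sub(to_anchor, html)
```

The LSID stays visible as the link text, so the identifier is still there
to be read even when the href stops working.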

If I add the clickable link via an embedded HTTP proxy URL link, then
today's consumers are a mouse-click away from an online resource.  The risk
is that the HTTP proxy "wrapper" will look quaint to taxonomists 250 years
from now (long after the Chinese global domination obliterated the HTTP
protocol), but the identifier will still be in there for them to use, and
will still be less ambiguous than the text string "Sparus chromis" identifier
-- and that one can still be resolved 250 years after it was first published
(i.e., http://www.biodiversitylibrary.org/page/727191).

So...all things considered, I'm still inclined to go with door number 2
(i.e., maintaining the clickable link).

> I think the focus on PDFs is misplaced. This is a binary 
> format that may well be unintelligible in 10-20 years time. 
> XML makes more sense.  
> If the PDF has clickable URLs that work today, that's fine as 
> a demo, but long term this stuff all needs to move to XML.

I completely agree with you! But keep in mind, this is *intentionally* a
demonstration document we're talking about.  I'm skeptical that it will
still be readable 25 years from now -- let alone 250 years from now (hence
the importance of the paper copy that is simultaneously published).  I have
no illusions about that.  But even still, it *is* a demonstration, and *may*
serve as an exemplar, and as such, I would still like to try to demonstrate
the best possible balance of making information easily accessible to modern
taxonomists (i.e., one click away from a video, for example), while having
at least some nod or wink towards longevity (e.g., my preference for an HTTP
proxy based on the same domain name as the authority part of the LSID).

> Lastly, why not simply do something like this in the text of 
> the paper.
> 
> Chromis xus sp nov
> urn:lsid:zoobank.org:act:1A66BAE9-9B37-4C73-A560-BF63D0345F04
> 
> ---text goes here---
> 
> 
> I suspect you will need to show the identifier for people to 
> grasp what is going on. Yes, ultimately the GUIDs will 
> disappear, but it seems that at this stage you want to show 
> them off. 

That's *exactly* what I was suggesting in my previous email for the
human-readable form -- that the LSID (or UUID) is visible exactly as you've
shown it -- at least for the five new species names.  But I see no harm in
also wrapping the LSID in a hyperlink using the HTTP proxy, so it also works
as a clickable link in the PDF version (invisible syntax on the paper
version).  If PDF rendering software dies, or if the zoobank.org domain
dies, or if HTTP protocol dies, the LSID/UUID is still there, and *MIGHT* be
resolvable through some future electronic indexing system. So what if the
point and click feature doesn't work.  At least it worked now, during a time
when we were trying to demonstrate to the world why we're going through all
this trouble to implement biodiversity informatics standards.

> You might regret UUIDs now ;-)

Naw -- I'm actually looking forward to sticking that big ol' ugly thing in
the paper.  It's my little statement to the world that "GUIDs are for
computers, not humans!!!"

Thanks, as always, for the feedback!

Aloha,
Rich