Re: [tdwg-guid] First step in implementing LSIDs

6 Jun 2007

I think it might be helpful to suggest that if this system is too
specialized and complicated to implement, the vast majority of
collections will not adopt it.

If your goal is to make as much of this data available as possible
to researchers then you need to devise a system that the typical
museum curator can understand and maintain using only
bright, computer-savvy undergraduates.

Respectfully,

Pete

On 6/6/07, Dave Vieglais <vieglais@ku.edu> wrote:
...
Hi Bob,
it's pretty simple - DNS is used to resolve an IP address to which a
client may connect with a service to resolve the GUID.  In the case
of LSIDs the suggested mechanism (and actually the only existing
mechanism) is to use DNS SRV records to provide a level of
indirection that is meant to preserve the discovery of service IP
address independent of the normal issues with A records (although
much of the same functionality can be provided with judicious use of
CNAME and A records).  To state that LSID resolution is independent
of DNS is a bit misleading since the entire basis of LSIDs and their
functional utility beyond what can be provided by HTTP uris comes
down to their current use of DNS SRV records for service discovery.
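
A minimal sketch of the discovery step described above, assuming the usual
LSID layout urn:lsid:authority:namespace:object[:revision] and the
_lsid._tcp.<authority> SRV name. The example LSID and the helper names are
hypothetical, and the DNS lookup itself is left out, since it would need a
resolver library.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2], parts[3], parts[4]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    # The revision is optional; treat a missing or empty one as None.
    revision = (parts[5] or None) if len(parts) == 6 else None
    # Domain names are case-insensitive, so normalise the authority.
    return Lsid(authority.lower(), namespace, object_id, revision)


def srv_query_name(lsid: Lsid) -> str:
    """Name a client would query for SRV records to find the resolver."""
    return f"_lsid._tcp.{lsid.authority}"


if __name__ == "__main__":
    example = parse_lsid("urn:lsid:example.org:specimens:12345")  # hypothetical
    print(srv_query_name(example))
```

The SRV answer would carry the host and port of the resolution service,
which is the level of indirection that plain A records do not give.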
The only negative with LSIDs that I see is the fact that it is a
relatively unknown and so essentially un-implemented protocol.  This
makes interoperability with the vast majority of existing
infrastructure more difficult than it needs to be without offering
any advance in functionality.  The use of LSID proxy services,
essentially turning LSIDs into URLs is an obvious and welcome
solution, but begs the question of what is really gained by the extra
step of using LSID URIs rather than HTTP URIs?
Perhaps the real benefit is simply that they (LSIDs) look different,
which implies that they need to be handled differently than a typical
URL, and so people and services know immediately to ask a resolver to
return bits (metadata or data) identified by the GUID.  The problem
with this of course is that existing services and applications won't
know what to do with them since they are implemented to only
understand http (or perhaps a couple other schemes), and so need to
be re-engineered to handle LSIDs unless the LSIDs are wrapped in HTTP
URLs...  One could also argue that it is the context in which an
identifier appears that really indicates what is an identifier rather
than just a string - so in practice the visual appearance of a GUID
shouldn't matter.
Perhaps an adequate solution is to use LSIDs and provide definitive
guidelines indicating how they can be embedded in URLs so that we do
not lose interoperability with the rest of the world?  This is
probably much like Ricardo's LSID proxy proposal.  Except in my
opinion it should be extended further to be a general GUID resolver
to help resolve whatever form is used for GUIDs - then one could
embed a handle, LSID, HTTP URI, FTP URI, LDAP URI, or even, for the
ancients of the internet, z39.50 URIs in a resolver proxy URL and get
something back.  The problem of course is that the content that comes
back will be different for different protocols - but it would, I
suspect be possible to provide a generic form of metadata for the
different protocols.
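
As a rough illustration of embedding a GUID in a resolver proxy URL, the
sketch below percent-encodes whatever identifier it is given and appends it
to a proxy base. The base URL, the function names, and the list of schemes
(including "hdl" for handles) are assumptions made for the example, not an
agreed TDWG convention.

```python
from urllib.parse import quote, unquote

# Hypothetical proxy location, used only for illustration.
PROXY_BASE = "http://resolver.example.org/resolve/"

# Schemes this sketch recognises; "hdl" for handles is an assumption.
KNOWN_SCHEMES = {"http", "https", "ftp", "ldap", "hdl"}


def wrap_guid(guid: str, proxy_base: str = PROXY_BASE) -> str:
    """Embed any GUID in an HTTP URL by percent-encoding it whole."""
    return proxy_base + quote(guid, safe="")


def unwrap_guid(url: str, proxy_base: str = PROXY_BASE) -> str:
    """Recover the original GUID from a proxy URL built by wrap_guid."""
    if not url.startswith(proxy_base):
        raise ValueError(f"not a proxy URL: {url!r}")
    return unquote(url[len(proxy_base):])


def guid_scheme(guid: str) -> str:
    """Coarse label a general resolver could dispatch on."""
    if guid.lower().startswith("urn:lsid:"):
        return "lsid"
    scheme, sep, _ = guid.partition(":")
    if sep and scheme.lower() in KNOWN_SCHEMES:
        return scheme.lower()
    return "unknown"


if __name__ == "__main__":
    guid = "urn:lsid:example.org:specimens:12345"  # hypothetical LSID
    url = wrap_guid(guid)
    print(url)
    print(guid_scheme(unwrap_guid(url)))
```

Encoding the whole identifier keeps the proxy indifferent to the scheme it
carries; the per-protocol differences are confined to whatever sits behind
the dispatch step.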
It would be pretty simple to add some provenance handling to such a
service so that if a particular web server, ftp server, or even LSID
system were moved, then the resolver service could lookup the new
location information and appropriately service the request.
There should of course be multiple instances of such a resolver
service, and the provenance information should be shared and
replicated between them all.
Dave V.
On Jun 6, 2007, at 15:13, Bob Morris wrote:
...
I'm confused about what arguments in this thread are about the merits
of HTTP (e.g. content negotiation) and what are about the merits of
DNS (e.g. resource and service location). The fact that most humans
usually exploit these together, because most humans use web browsers
for discovering resources, doesn't have much to do with GUIDs. Even
LSID resolution itself is actually independent of anything to do with
DNS, although all current resolvers are based on DNS services.
OK, I confess to not reading all the arguments in detail, but my
impression is that several of the opposite conclusions from the same
facts may be because one set of conclusions is about service discovery
and one is about (meta)data provision. It won't surprise me if ANY
guid scheme is stronger about one of these than the other. This might
be what Donald is arguing.
Bob
On 6/6/07, Dave Vieglais <vieglais@ku.edu> wrote:
...
This discussion has been very interesting reading, and though I agree
with Donald's comments, I find myself coming to a different
conclusion, leaning towards HTTP URIs as a preferable scheme.  The
reasons are simple - HTTP has been around for a long time, it is
widely implemented, and mechanisms for implementing robust services
with that protocol are pretty well sorted out - and really there is
nothing to stop implementation of the same functionality exhibited by
LSIDs using HTTP.  As Rod has pointed out, http is widely used for
referencing entities within a semantic web type of context, and it
seems foolish to ignore the momentum in those technologies as they
provide a great deal of the desired functionality for
interoperability and interchange of our data.  As a result my
preference is towards the use of http, primarily because my intents
are to integrate data from a much broader community.  In the end
though, it doesn't really matter which scheme is adopted by TDWG - we
will build http resolvers regardless, since they will be necessary
for reasons of convenience in order to utilize LSIDs in all but
specific, custom built applications.
However, regardless of the scheme used to implement the GUIDs used by
this community, it is critical that the identifiers are persistent
and useful beyond the lives of whatever services are constructed to
resolve them.  This implies some provenance information may need to
be captured, and I would argue that the use of DNS alone for handling
server changes as utilized by LSIDs may be insufficient.  The only
benefit provided by DNS in this context is that it is acting as a
single source of authority for directing how to locate something (in
this case an IP address).  What I suspect is really required is a
more robust, and richer mechanism for discovering and recording
provenance.  The ideal would be a large, replicated, and distributed
data store with a single service point which provided people and
systems with a one-stop shop for discovering provenance for a GUID.
Then if a particular GUID could not be directly resolved, the global
provenance store could be consulted, with the resulting information
providing a pointer (or perhaps a series of pointers) indicating how
the GUID can now be resolved.
By creating such provenance records and persisting them with as much
care as the data, it seems that a system with stability beyond the
vagaries of the internet could reasonably be constructed.
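
One way to picture the "series of pointers" idea is a table of recorded
moves that a resolver walks until it reaches a location with no further
move. This is only a sketch: the in-memory dict stands in for the
replicated provenance store, and the function name, the hop limit, and the
example locations are all hypothetical.

```python
from typing import Dict, List


def follow_provenance(start: str, moves: Dict[str, str],
                      max_hops: int = 10) -> List[str]:
    """Return the chain of locations from start to the latest known one.

    moves maps an old service location to the location that replaced it.
    """
    chain = [start]
    seen = {start}
    current = start
    while current in moves:
        if len(chain) - 1 >= max_hops:
            raise ValueError("too many hops in provenance chain")
        nxt = moves[current]
        if nxt in seen:
            # Guard against records that point back at an earlier location.
            raise ValueError("provenance records form a loop")
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    return chain


if __name__ == "__main__":
    # Hypothetical record of two successive server moves.
    recorded_moves = {
        "http://old.example.org/lsid": "http://mid.example.org/lsid",
        "http://mid.example.org/lsid": "http://new.example.org/lsid",
    }
    for hop in follow_provenance("http://old.example.org/lsid", recorded_moves):
        print(hop)
```

The last entry of the returned chain is where a request would be sent; the
earlier entries are the provenance trail itself.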
regards,
   Dave V.
On Jun 6, 2007, at 00:46, Donald Hobern wrote:
...
Yesterday was a vacation here in Denmark - otherwise I'd have
responded a little earlier, but I'm glad to see all the comments
from others.  I thoroughly agree with Kevin, Jason, Rich and Anna.
No one here believes that any particular solution is going to be
perfect.  Our biggest need is consensus and the readiness to get
going with a workable solution.
I do recognise the strength of Rod's arguments.  Indeed, if I were
building some system for integrating data using semantic web
technologies, and my only concern was ensuring the efficiency of
synchronous connections now, I am sure I would adopt HTTP URIs for
the purpose.  However I remain convinced (as I've stated before)
that the needs of this community do subtly shift the balance in
another direction.  We are interested in maintaining long-term
connections between our objects and have a perspective which goes
back hundreds of years.  This at least should give us pause over
whether we want our specimens to be referenced using identifiers so
firmly tied to the Internet of today.  More importantly, one of the
key drivers right at the beginning of TDWG's consideration of GUIDs
was that the community had plenty of experience of URL rot and
didn't want to rely on everyone maintaining stable virtual
directories on their web servers to preserve the integrity of
object identifiers.
Both LSIDs and HTTP URIs could be made to work for us.  Both are
totally reliant on good practice on the part of data owners.
Personally I believe our chances of getting the community to
consider, define and apply such practices are enhanced by the
identifier technology being something a little more different and
distinct than just a "special URL".
Thanks,
Donald
------------------------------------------------------------
Donald Hobern (dhobern@gbif.org)
Deputy Director for Informatics
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483   Mobile: +45-28751483   Fax: +45-35321480
------------------------------------------------------------
On Jun 6, 2007, at 12:51 AM, Kevin Richards wrote:
...
I agree with Jason.  It is not the GUID that is the cause of all
the problems here - THERE IS NOTHING WRONG WITH LSIDS - we just
need to move on and start using them in our own context  (or any
other suitable GUID - LSIDs are only the recommended GUID, NOT the
only permissible GUID).
If it all falls to pieces later on we could just do a search and
replace to change all our GUIDs to some other scheme (to quote
Bob, just serious).
I agree, it is the RDF/metadata/ontologies that are the key to
getting things working well.
Kevin
...
>> "Jason Best" <jbest@brit.org> 06/06/07 8:39 AM >>>
Rod,
I've only had a chance to quickly skim the documents you
reference, but it seems to me that the alternatives to LSIDs don't
necessarily make the issues with which we are wrestling go away.
We still need to decide WHAT a URI references - is it the
metadata, the physical object etc? URIs don't explicitly require
persistence, while LSIDs do, so I see that as a positive for
adopting a standard GUID that is explicit in that regard. I think
the TDWG effort to spec an HTTP proxy for LSIDs makes it clear
that the technical hurdles of implementing an LSID resolver (SRV
records, new protocol, client limitations etc) are a bit
cumbersome, but I don't think the underlying concept is fatally
flawed. In reading these discussions, I'm starting to believe/
understand that RDF may hold the key, regardless of the GUID that
is implemented. Now I have to go read up more on RDF to see if my
new-found belief has merit! ;)
Jason
________________________________
From: Roderic Page [mailto:r.page@bio.gla.ac.uk]
Sent: Tuesday, June 05, 2007 2:10 PM
To: Chuck Miller
Cc: Bob Morris; Kevin Richards; tdwg-guid@lists.tdwg.org;
WEITZMAN@si.edu; Jason Best
Subject: Re: [tdwg-guid] First step in implementing LSIDs?
[Scanned]
Maybe it's time to bite the bullet and consider the elephant in
the room -- LSIDs might not be what we want. Markus Döring sent
some nice references to the list in April, which I've repeated
below, there is also http://dx.doi.org/10.1109/MIS.2006.62 .
I think the LSID debate is throwing up issues which have been
addressed elsewhere (e.g., identifiers for physical things versus
digital records), and some would argue have been solved to at
least some people's satisfaction.
LSIDs got us thinking about RDF, which is great. But otherwise I
think they are making things more complicated than they need to
be. I think this community is running a grave risk of committing
to a technology that nobody else takes that seriously (hell, even
the http://lsid.sourceforge.net/ web site is broken).
The references posted by Markus Döring were:
(1) http://www.dfki.uni-kl.de/dfkidok/publications/TM/07/01/tm-07-01.pdf
"Cool URIs for the Semantic Web" by Leo Sauermann DFKI GmbH,
Richard Cyganiak Freie Universität Berlin (D2R author), Max Völkel
FZI Karlsruhe
The authors of this document come from the semantic web community
and discuss what kind of URIs should be used for RDF resources.
(2) http://www.w3.org/2001/tag/doc/URNsAndRegistries-50
This one here is written by the W3C and addresses the questions
"When should URNs or URIs with novel URI schemes be used to name
information resources for the Web?" The answers given are "Rarely
if ever" and "Probably not". Common arguments in favor of such
novel naming schemas are examined, and their properties compared
with those of the existing http: URI scheme.
Regards
Rod
_______________________________________________
tdwg-guid mailing list
tdwg-guid@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-guid
-- 
---------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
------------------------------------------------------------