Central LSID registration authority?

19 Apr 2006

I've added some starting comments on the question of whether or not a
"central LSID registration authority" would be useful to our community:

http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUIDCentralRegistrationAuthority
(text appended below)

This is one of the Infrastructure Working Group topics.

As I have noted, there are topics which could be addressed under this title
but which I have deliberately excluded.  Please address these if you think
they are valuable.

Please provide your input.

Thanks,

Donald

---------------------------------------------------------------
Donald Hobern (dhobern@gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483   Mobile: +45-28751483   Fax: +45-35321480
---------------------------------------------------------------

A central LSID registration authority could take several different forms.
The following is an attempt to outline some options, so that subsequent
discussion can focus more clearly on the benefits of each.

I'm going to start by ignoring the possibility of establishing a central
LSID registration authority as some kind of registrar approving or recording
new LSID authorities within our domain. I'm also going to ignore the
possibility of requiring all LSIDs to be registered within a single
namespace. I cannot see value in either of these "options". If anyone wishes
to propose them, please go ahead.

It is assumed here that the purpose of such an authority would be to provide
a mechanism for data providers to associate LSIDs with their records without
having to establish and maintain an LSID resolver in their own namespace.
This could for example be because the data set in question is likely to be
moved to a different home, or because the data provider cannot get
permission to register an LSID provider with DNS, or because the data
provider's host institution has restrictive rules on firewall access.

Under these circumstances a data provider may still wish to be able to
assign LSIDs to each data record and to have the LSID resolve correctly to
the appropriate data and metadata.

Some options:

1. Central hosting of data/metadata

The simplest option technically (or at least simplest with respect to the
assignment of LSIDs) may be for the central LSID registration authority
actually to host the data sets in question on behalf of the data provider.

2. Central proxy LSID resolver

If the data provider is able to establish an LSID resolver but is unable or
unwilling to register this resolver with DNS, a central LSID registration
authority might establish a proxy LSID resolver, which redirects to the
known location of the actual LSID resolver. In other words, assuming that
the central LSID registration authority has an LSID resolver registered for
clra.org and running on lsid.clra.org, and the data provider has established
its own unregistered LSID resolver on lsid.dp.org, the data provider can
issue LSIDs of the form urn:lsid:clra.org:<DatasetName>:<RecordId>. Requests
for data or metadata will be directed to lsid.clra.org, which resolves
<DatasetName> to be associated with the unregistered LSID resolver on
lsid.dp.org, and directs the request to lsid.dp.org for resolution.

This again should be relatively simple to implement, and could be of use in
some circumstances. The data provider and the central LSID registration
authority clearly need to coordinate the elements included in the LSID
string.
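
The routing step in option 2 can be sketched roughly as follows. This is only
an illustration under stated assumptions: the dataset name "dpbirds", the
registry table and the function names are hypothetical, and the sketch stops at
deciding which resolver should receive the request rather than making any
network call. It also assumes the authority part can be compared
case-insensitively.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str            # the <DatasetName> part
    object_id: str            # the <RecordId> part
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split urn:lsid:<authority>:<namespace>:<object>[:<revision>]."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    # Assumption: the authority is compared case-insensitively.
    return Lsid(parts[2].lower(), parts[3], parts[4], revision)


# Hypothetical registry kept by the central authority:
# dataset name -> host of the data provider's unregistered resolver.
DATASET_RESOLVERS = {
    "dpbirds": "lsid.dp.org",
}


def route(lsid_text: str,
          registry: dict = DATASET_RESOLVERS,
          authority: str = "clra.org") -> str:
    """Return the resolver host that should handle this LSID."""
    lsid = parse_lsid(lsid_text)
    if lsid.authority != authority:
        raise ValueError(f"{lsid.authority} is not served by this proxy")
    try:
        return registry[lsid.namespace]
    except KeyError:
        raise LookupError(f"unknown dataset {lsid.namespace!r}") from None
```

With this table, route("urn:lsid:clra.org:dpbirds:12345") yields lsid.dp.org,
which is where the proxy on lsid.clra.org would then direct the request. The
coordination mentioned above amounts to agreeing on the keys of this table.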

3. Central LSID resolver with non-LSID proxy resolution

Slightly more complex (and potentially with too many uncertainties), a data
provider could register a provider tool using some other protocol (e.g.
TAPIR) and rely on an external proxy LSID resolver to map LSID requests to
the appropriate protocol. For example, assume that a TAPIR provider is set
up on tapir.dp.org to serve data in some version of Darwin Core which
includes a <darwin:lsid> element. The records in the TAPIR provider are all
set up to include values for this element of the form
urn:lsid:clra.org:<DatasetName>:<RecordId>. Requests for data or metadata
will be directed to lsid.clra.org, which resolves <DatasetName> to be
associated with the TAPIR provider on tapir.dp.org, issues an appropriate
TAPIR request for the record with the appropriate value for <darwin:lsid>
and then formats the data or metadata appropriately for an LSID response.

This is again not complex to implement, but requires even more coordination
than with option 2. It could however be applicable in any case in which a
data provider has any mechanism for sharing their data online, regardless of
what additional firewall restrictions are in place.
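
To make the extra coordination in option 3 concrete, here is a rough sketch of
how the central resolver might decide what to do with an incoming LSID. The
backend table, the dataset names and the shape of the returned "plan" are all
hypothetical, and the plan is deliberately protocol-neutral: it does not
attempt to reproduce real TAPIR request syntax, only the idea of looking up a
record by the value of its <darwin:lsid> element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    protocol: str   # "lsid" (option 2) or some other protocol such as "tapir"
    endpoint: str   # host of the data provider's service


# Hypothetical table agreed between the central authority and data providers.
BACKENDS = {
    "dpbirds": Backend("tapir", "tapir.dp.org"),
    "dpplants": Backend("lsid", "lsid.dp.org"),
}


def plan_request(lsid: str, backends: dict = BACKENDS) -> dict:
    """Describe how the central resolver would satisfy a request for `lsid`."""
    parts = lsid.split(":")
    prefix = [p.lower() for p in parts[:3]]
    if len(parts) < 5 or prefix != ["urn", "lsid", "clra.org"]:
        raise ValueError(f"not an LSID issued under clra.org: {lsid!r}")
    backend = backends.get(parts[3])        # parts[3] is <DatasetName>
    if backend is None:
        raise LookupError(f"unknown dataset {parts[3]!r}")
    if backend.protocol == "lsid":
        # Option 2: hand the request to the provider's own LSID resolver.
        return {"action": "forward", "endpoint": backend.endpoint, "lsid": lsid}
    # Option 3: query the provider for the record whose darwin:lsid element
    # equals the requested LSID, then reformat the result as an LSID response.
    return {
        "action": "query",
        "protocol": backend.protocol,
        "endpoint": backend.endpoint,
        "match": {"darwin:lsid": lsid},
    }
```

The point of the sketch is that the central service needs to know, per
dataset, both where the provider lives and which protocol and element to
query, which is more than the simple host mapping needed for option 2.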

This is doubtless not a complete set of options, but should be enough to
begin discussion. So, do you see value in considering any of these options
as part of the infrastructure we plan to adopt? It may of course be that any
and all of these can be used transparently within any LSID-based network,
but we should consider whether there are immediate benefits in offering some
kind of central service (through TDWG, GBIF or others) for these purposes.

Donald Hobern
