I’ve added some starting comments on the question of whether or not a “central LSID registration authority” would be useful to our community:

http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUIDCentralRegistrationAuthority (text appended below)

This is one of the Infrastructure Working Group topics.

As I have noted, there are topics which could be addressed under this title but which I have deliberately excluded. Please address these if you think they are valuable.

Please provide your input.

Thanks,

Donald

---------------------------------------------------------------
Donald Hobern (dhobern@gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
---------------------------------------------------------------

A central LSID registration authority could take several different forms. The following is an attempt to outline some options, so that subsequent discussion can focus more clearly on the benefits of each.

I'm going to start by ignoring the possibility of establishing a central LSID registration authority as some kind of registrar approving or recording new LSID authorities within our domain. I'm also going to ignore the possibility of requiring all LSIDs to be registered within a single namespace. I cannot see value in either of these "options". If anyone wishes to propose them, please go ahead.

It is assumed here that the purpose of such an authority would be to provide a mechanism for data providers to associate LSIDs with their records without having to establish and maintain an LSID resolver in their own namespace. This could be, for example, because the data set in question is likely to be moved to a different home, because the data provider cannot get permission to register an LSID resolver with DNS, or because the data provider's host institution has restrictive rules on firewall access.

Under these circumstances a data provider may still wish to be able to assign LSIDs to each data record and to have the LSID resolve correctly to the appropriate data and metadata.

Some options:

1. Central hosting of data/metadata

The simplest option technically (or at least simplest with respect to the assignment of LSIDs) may be for the central LSID registration authority actually to host the data sets in question on behalf of the data provider.

2. Central proxy LSID resolver

If the data provider is able to establish an LSID resolver but is unable or unwilling to register this resolver with DNS, a central LSID registration authority might establish a proxy LSID resolver, which redirects to the known location of the actual LSID resolver. In other words, assuming that the central LSID registration authority has an LSID resolver registered for clra.org and running on lsid.clra.org, and the data provider has established its own unregistered LSID resolver on lsid.dp.org, the data provider can issue LSIDs of the form urn:lsid:clra.org:<DatasetName>:<RecordId>. Requests for data or metadata will be directed to lsid.clra.org, which looks up <DatasetName>, finds that it is associated with the unregistered LSID resolver on lsid.dp.org, and forwards the request to lsid.dp.org for resolution.

This again should be relatively simple to implement, and could be of use in some circumstances. The data provider and the central LSID registration authority clearly need to coordinate the elements included in the LSID string.
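
To make the routing step in option 2 concrete, here is a minimal sketch in Python of how a proxy resolver on lsid.clra.org might decide where to forward a request. It is only an illustration under stated assumptions: the registry table, the dataset name "ExampleDataset" and the function names are hypothetical, and the actual forwarding of the LSID request is left out.

```python
# Sketch of the routing step in a proxy LSID resolver (option 2).
# The registry contents, dataset name and function names are hypothetical.

CENTRAL_AUTHORITY = "clra.org"

# Dataset name -> host of the data provider's own (unregistered) LSID resolver.
DATASET_RESOLVERS = {
    "ExampleDataset": "lsid.dp.org",
}


def parse_lsid(lsid):
    """Split an LSID into (authority, namespace, object_id, revision)."""
    parts = lsid.split(":")
    well_formed = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
    )
    if not well_formed:
        raise ValueError("not an LSID: %r" % lsid)
    revision = parts[5] if len(parts) == 6 else None
    return parts[2], parts[3], parts[4], revision


def resolver_host_for(lsid, registry=DATASET_RESOLVERS):
    """Return the host that should actually resolve this LSID."""
    authority, dataset_name, _record_id, _revision = parse_lsid(lsid)
    if authority.lower() != CENTRAL_AUTHORITY:
        raise ValueError("LSID was not issued under %s" % CENTRAL_AUTHORITY)
    if dataset_name not in registry:
        raise LookupError("no resolver registered for %r" % dataset_name)
    return registry[dataset_name]
```

The coordination mentioned above shows up here as the registry table: the data provider and the central authority have to agree on the dataset name used as the LSID namespace, and the central authority has to be told whenever the provider's resolver moves.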

3. Central LSID resolver with non-LSID proxy resolution

Slightly more complex (and potentially with too many uncertainties), a data provider could register a provider tool using some other protocol (e.g. TAPIR) and rely on an external proxy LSID resolver to map LSID requests to the appropriate protocol. For example, assume that a TAPIR provider is set up on tapir.dp.org to serve data in some version of Darwin Core which includes a <darwin:lsid> element. The records in the TAPIR provider are all set up to include values for this element of the form urn:lsid:clra.org:<DatasetName>:<RecordId>. Requests for data or metadata will be directed to lsid.clra.org, which looks up <DatasetName>, finds that it is associated with the TAPIR provider on tapir.dp.org, issues a TAPIR request for the record whose <darwin:lsid> value matches the requested LSID, and then formats the returned data or metadata as an LSID response.

This is again not complex to implement, but requires even more coordination than option 2. It could however be applicable wherever a data provider has some mechanism for sharing their data online, regardless of what additional firewall restrictions are in place.
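
As a rough illustration of the mapping in option 3, the following Python sketch shows how the central resolver might turn an incoming LSID into a description of the lookup it needs to make against the provider. It deliberately does not attempt real TAPIR request syntax: the provider table, the dataset name "ExampleDataset", the function name and the keys of the returned dictionary are all hypothetical, and building the actual TAPIR request and the LSID response is left out.

```python
# Sketch of the lookup-planning step for option 3.
# The provider table, dataset name, function name and dictionary keys are
# hypothetical; no real TAPIR request syntax is generated here.

CENTRAL_AUTHORITY = "clra.org"

# Dataset name -> host of the data provider's TAPIR service.
TAPIR_PROVIDERS = {
    "ExampleDataset": "tapir.dp.org",
}


def plan_lookup(lsid, providers=TAPIR_PROVIDERS):
    """Describe the provider query needed to resolve this LSID."""
    parts = lsid.split(":")
    prefix = [p.lower() for p in parts[:3]]
    if len(parts) < 5 or prefix != ["urn", "lsid", CENTRAL_AUTHORITY]:
        raise ValueError("not an LSID issued under %s: %r" % (CENTRAL_AUTHORITY, lsid))
    dataset_name = parts[3]
    if dataset_name not in providers:
        raise LookupError("no provider registered for %r" % dataset_name)
    return {
        "provider_host": providers[dataset_name],
        # The record is found by matching the full LSID string stored
        # in the provider's <darwin:lsid> element.
        "match_element": "darwin:lsid",
        "match_value": lsid,
    }
```

Note that the match is on the complete LSID string, so the data provider must store exactly the LSIDs that the central authority will be asked to resolve. That is the extra coordination this option requires.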

This is doubtless not a complete set of options, but should be enough to begin discussion. So, do you see value in considering any of these options as part of the infrastructure we plan to adopt? It may of course be that any and all of these can be used transparently within any LSID-based network, but we should consider whether there are immediate benefits in offering some kind of central service (through TDWG, GBIF or others) for these purposes.