Re: [tdwg-tag] SourceForge LSID project websites broken - role for TDWG?
Dear All,
GBIF recognises the need for a system of persistent, unique identifiers for biodiversity objects. Its strategic work plan (2007-2011) defines an activity to develop a system of globally unique identifiers and encourage their use throughout biodiversity informatics. Within the IDA (Inventory, Discovery, Access) work area of the 2009-2011 work programme, the stated aims include convening an LSID Task Group to review the status of LSID uptake and devise a strategy for wide deployment of LSIDs or other GUIDs.
The related functions of inventory, discovery and access are being brought together by GBIF through its Global Biodiversity Resources Discovery System (GBRDS), at the heart of which lies an extended UDDI registry linked to a metadata cataloguing system. We have some immediate needs for GUIDs/LSIDs in the implementation of the GBRDS. In addition, the ECAT work area sees a role for GBIF in the global resolution of LSIDs that refer to taxon names, and the DIGIT work area has identified several facets of data mobilisation and use where GUIDs are essential, e.g., as a key element in data publication, and in attribution, citation and tracking of data use. Moreover, several GBIF Participants have expressed a commitment to moving ahead with deployment of LSIDs and are looking to the GBIF Secretariat to provide leadership and essential services. To that end, we have already begun to explore internally a role for GBIF as an LSID hosting/proxy service along the lines being advocated by Donald (Hobern).
Now, because of the perceived urgency and confusion/uncertainty around the future of LSIDs, and more generally, the social/institutional challenges of providing stable and persistent GUIDs, GBIF is fast-forwarding the convening of an LSID Task Group to explore the issues and offer recommendations on the way forward, with particular reference to the GBIF network. A call for participation will be issued later this month and we expect the task group to be convened and operational within about four weeks.
Best regards,
Éamonn
_______________________________________________
Éamonn Ó Tuama, M.Sc., Ph.D. (eotuama@gbif.org)
Senior Programme Officer, Inventory, Discovery, Access (IDA)
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen Ø, DENMARK
Phone: +45 3532 1494; Fax: +45 3532 1480
-----Original Message-----
From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of tdwg-tag-request@lists.tdwg.org
Sent: 07 April 2009 10:39
To: tdwg-tag@lists.tdwg.org
Subject: tdwg-tag Digest, Vol 36, Issue 9
Today's Topics:
1. Re: SourceForge LSID project websites broken - role for TDWG? (Donald.Hobern@csiro.au)
2. Re: SourceForge LSID project websites broken - role for TDWG? (Hilmar Lapp)
3. Re: SourceForge LSID project websites broken - role for TDWG? (Donald.Hobern@csiro.au)
4. Re: SourceForge LSID project websites broken - role for TDWG? (Roger Hyam)
----------------------------------------------------------------------
Message: 1
Date: Tue, 7 Apr 2009 15:55:04 +1000
From: Donald.Hobern@csiro.au
Subject: Re: [tdwg-tag] SourceForge LSID project websites broken - role for TDWG?
To: tdwg-tag@lists.tdwg.org
Message-ID: FF7DEDBD2B38B34F94139214D371B9C4290872EC@exvic-mbx05.nexus.csiro.au
Content-Type: text/plain; charset="us-ascii"
Thanks, Kathi.
I appreciate your comments and understand your concerns. This certainly is a social problem - no technology solution will take it away. A large proportion (though certainly not all) of the issues surrounding LSIDs will arise with any technology which tries to address the problem.
I seem to be in the minority in believing that we can use LSIDs as one part of a strategy to develop a community infrastructure for our data. However, we do need to start from somewhere if we want to do anything about the persistence of our data. We need some foundations before we can properly worry about "intelligent caching and harvesting mechanisms" (which I agree we need).
So - here is my outline for how I think we could move forward from these discussions:
1. An identifier scheme which aims to provide some long term persistence probably needs to embody at least three key facts: who generated/published the data object, what data collection this object belongs to, which data object from the specified data collection this one is. These correspond roughly to the Darwin Core InstitutionCode/CollectionCode/CatalogueNumber triple and to the three main substitutable elements in an LSID. Some systems such as DOI may obscure the whoGeneratedTheData part somewhat. Some systems such as DOI and PURL may not always have an explicit whatCollectionItBelongsTo part, but dealing with collections promises to be an organisational simplification for most purposes.
2. TDWG should recommend the LSID as one suitable model for constructing GUIDs (i.e. "urn:lsid:<whoGeneratedTheDataObject>:<whatCollectionItBelongsTo>:<whichItemInTheCollectionItIs>"). We could propose (or adopt) some other syntax for this, but this gives us a neat enough way to encapsulate what we need to know. The "urn:lsid:" part can be seen as a useful flag that this is indeed to be considered as an identifier.
3. Where feasible, TDWG should recommend that these LSIDs should be associated with a resolver implementing the standard LSID mechanism. Frankly I am a lot less bothered by the resolvability of most identifiers than I am about their consistent use, so I have no problem with the idea of assigning LSIDs to things which do not currently resolve.
4. TDWG requires that a path must exist to retrieve the associated data using an HTTP resolver to proxy the LSID (i.e. http://whoGeneratedTheDataObject.org/<optional_path_elements>/<lsid>) and that our practice is to consider this proxified version to be identical for comparison purposes with the bare LSID. For LSIDs resolvable using the standard LSID mechanism, this path can be http://lsid.tdwg.org/<lsid>. In cases in which the data are only accessible via HTTP, we have broken the LSID specification - although it seems there may be nobody other than us to care about that fact.
5. All references to LSIDs within RDF documents should use the proxified form.
6. TDWG and its partners should establish a PURL-like service which makes it easy to register data sets to be associated with identifiers of this form. In other words, a service should exist (around a domain secured for this purpose into the future) which associates data providers with an appropriate whoGeneratedTheDataObject element and associates their data collections with an appropriate whatCollectionItBelongsTo element and associated URL pattern for retrieving RDF data for the individual data objects. The exact details could vary, but assume that TDWG sets up this service at http://lsid.tdwg.org/ and that CSIRO wishes to register the ANIC data collection and to have individual specimen records associated with LSID-based identifiers. Assume further that ANIC has a script on its servers which can return the RDF data for these specimens, say at http://www.csiro.au/anic/specimens/<catalogueNumber>. The registration process could result in the LSID urn:lsid:tdwg.org:csiro.anic:12345 and the HTTP URI http://lsid.tdwg.org/urn:lsid:tdwg.org:csiro.anic:12345 both being mapped through to http://www.csiro.au/anic/specimens/12345. It would probably be preferable for the LSID in this case to be urn:lsid:csiro.tdwg.org:anic:12345 (which would make relocation of all LSID services for a single data provider easy, but could require large numbers of SRV records to be managed by TDWG). (I would note that it would be easy for the infrastructure to allow the data provider to choose whether the whole LSID or just the final ID element should be passed to the final URL.)
7. TDWG and its partners should use this same infrastructure to handle alternative resolution paths as required in the future - if alternative identifier schemes become the preferred option. This infrastructure could also add significant other functions, including e.g. 1) intelligent caching of data, 2) validation of RDF data, and 3) simultaneous registration of DOIs associated with metadata for each data collection to make it easier for them to be cited by journal articles.
8. Any provider may opt at any time to use alternative HTTP-resolvable identifiers in place of LSIDs (e.g. DOIs, handles, PURLs), but must consider the technological and social implications of keeping these identifiers alive into the future.
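As an illustration of points 2, 4 and 6 above, here is a minimal Python sketch of how a bare LSID and its proxified HTTP form could be treated as the same identifier and mapped to a provider URL. The function names, the REGISTRY table and the "{object_id}" URL pattern are hypothetical; the proxy base and the ANIC example are taken from point 6. The sketch ignores the optional revision element of an LSID and does no case normalisation beyond detecting the prefix.

```python
from typing import NamedTuple

LSID_PREFIX = "urn:lsid:"
PROXY_BASE = "http://lsid.tdwg.org/"  # proxy location assumed in point 6


class Lsid(NamedTuple):
    authority: str   # whoGeneratedTheDataObject
    namespace: str   # whatCollectionItBelongsTo
    object_id: str   # whichItemInTheCollectionItIs


def parse_lsid(text: str) -> Lsid:
    """Parse a bare or proxified LSID into its three elements."""
    start = text.lower().find(LSID_PREFIX)
    if start < 0:
        raise ValueError(f"not an LSID: {text!r}")
    parts = text[start + len(LSID_PREFIX):].split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected authority:namespace:object in {text!r}")
    return Lsid(*parts)


def proxify(lsid: Lsid, base: str = PROXY_BASE) -> str:
    """Build the HTTP form recommended for use in RDF documents (point 5)."""
    return base + LSID_PREFIX + ":".join(lsid)


def same_identifier(a: str, b: str) -> bool:
    """Compare identifiers, treating bare and proxified forms as identical (point 4)."""
    return parse_lsid(a) == parse_lsid(b)


# Hypothetical registry: (authority, namespace) -> provider URL pattern.
REGISTRY = {
    ("tdwg.org", "csiro.anic"): "http://www.csiro.au/anic/specimens/{object_id}",
}


def resolve(text: str) -> str:
    """Map a bare or proxified LSID to the provider's URL for that object."""
    lsid = parse_lsid(text)
    pattern = REGISTRY[(lsid.authority, lsid.namespace)]
    return pattern.format(object_id=lsid.object_id)
```

With the single registration above, resolve("urn:lsid:tdwg.org:csiro.anic:12345") and resolve("http://lsid.tdwg.org/urn:lsid:tdwg.org:csiro.anic:12345") both return http://www.csiro.au/anic/specimens/12345. Keeping the lookup keyed on the first two elements means a provider could later be re-pointed by changing one registry entry.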
As far as I can see, this approach allows us to develop a community-based approach to managing identifiers in a way which builds on LSIDs for those who have already minted them. It would be easy for us to reinvent this as a PURL-based approach in the future. The costs should not be great and it gives us a better chance of avoiding the confusion of random-URLs-pointing-at-random-data-formats being offered as semantically useful GUIDs.
Whatever happens, TDWG needs to finalise an applicability statement for how LSIDs should be used by those providers who have chosen or who will choose to use them for biodiversity data. This does not mandate that everyone MUST use LSIDs.
Does this seem worth pursuing?
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Entomology, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208
Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
-----Original Message-----
Date: Mon, 6 Apr 2009 10:15:00 +0200
From: Schleidt Katharina katharina.schleidt@umweltbundesamt.at
Subject: Re: [tdwg-tag] SourceForge LSID project websites broken - role for TDWG?
To: Roger Hyam rogerhyam@mac.com, Peter DeVries pete.devries@gmail.com
Cc: "tdwg-tag@lists.tdwg.org" tdwg-tag@lists.tdwg.org
Message-ID: 8638F29270898544933A7663226809E5EAAA6060@PCMAIL3.umweltbundesamt.at
Content-Type: text/plain; charset="utf-8"
Hi all,
I admit I'm glad that this topic does seem to be back in discussion. I've been worried about LSIDs from the outset, but did not have the time or resources at the time of decision to do anything about it. Most of this discussion reflects what we've been discussing here in Vienna ever since the topic came up. Here is an excerpt from a recent mail of mine:
- I have never been a proponent of LSIDs. More to the point, I have been against their adoption from the onset. The reasons for this are:
o It's misusing a technical solution as an answer for a social problem. Just because LSIDs entail a list of (quite necessary) requirements such as persistent IDs and dependable availability of online references, the technology can in no way guarantee them; it just nicely covers the problem up
o I do not see the technology being supported. IBM dropped it, and Cambridge Semantics Inc. also seems to have gone other ways
o An example of the lack of dependability of LSID servers seems to me to be the eternal problem with the TDWG LSID Server
o I'm worried that a group such as TDWG, which doesn't have the backup to push through technology development, is going towards requiring all adopters to implement non-mainstream technology in order to maintain compatibility
We've come to the conclusion, as mentioned several times in this thread, that what we really need is the commitment to persistence, and no technology will support us in that. Why waste nonexistent funds sorting out an esoteric technology nobody's supporting; why not just buy a domain, pass a hat and set up a trust fund with 1000€ (or $), and agree to have this domain available over some institution (i.e. university) for the next 100 years. After that, my non-existent great-grandchildren can sort out the rest!
@Matt: http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-03-05.html is online again! And a short absence/down-time will happen in all distributed technologies. If anything, I believe that we should worry more about intelligent caching and harvesting mechanisms!
:)
kathi
------------------------------
Message: 2
Date: Tue, 7 Apr 2009 02:54:15 -0400
From: Hilmar Lapp hlapp@duke.edu
Subject: Re: [tdwg-tag] SourceForge LSID project websites broken - role for TDWG?
To: Donald.Hobern@csiro.au
Cc: tdwg-tag@lists.tdwg.org
Message-ID: 0FEBAE72-265D-4499-ABBA-D2D8D7B2F839@duke.edu
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
On Apr 7, 2009, at 1:55 AM, Donald.Hobern@csiro.au wrote:
> Assume further that ANIC has a script on its servers which can return the RDF data for these specimens, say at http://www.csiro.au/anic/specimens/<catalogueNumber>. The registration process could result in the LSID urn:lsid:tdwg.org:csiro.anic:12345
Wouldn't that say according to your proposed usage guideline that tdwg.org is whoGeneratedTheData and csiro.anic is whatCollectionItBelongsTo, when in reality CSIRO generated the data and ANIC is the collection it belongs to?
I understand why you're suggesting the LSID formatted as you do, and you might say that the name-mangling isn't too drastic. But don't data owners have a strong sense of ownership of their data objects and their collections? And more importantly, don't you think that a usage guideline that contradicts itself (or that is bound to be internally inconsistent) will continue to raise debate and be in the way of broader adoption?
> and the HTTP URI http://lsid.tdwg.org/urn:lsid:tdwg.org:csiro.anic:12345 both being mapped through to http://www.csiro.au/anic/specimens/12345.
Wouldn't http://purl.tdwg.org/CSIRO/ANIC/12345 be shorter, do more justice to the names of whoGeneratedTheData and whatCollectionItBelongsTo, be easier to implement, and have the same possibilities to implement caching etc., in fact using standard software such as mod_proxy for Apache?
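For comparison, a minimal Python sketch of the lookup a PURL-style service of this shape would need: the first two path segments select the provider and collection, and the third is passed through to the provider's URL. The PURL_TARGETS table and the function name are hypothetical, the target pattern reuses the ANIC example from the quoted message, and in practice this mapping would more likely live in proxy or redirect configuration than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical registry: (provider, collection) -> provider URL pattern.
PURL_TARGETS = {
    ("CSIRO", "ANIC"): "http://www.csiro.au/anic/specimens/{item}",
}


def redirect_target(purl: str) -> str:
    """Map http://purl.tdwg.org/<provider>/<collection>/<item> to the provider URL."""
    segments = [s for s in urlsplit(purl).path.split("/") if s]
    if len(segments) != 3:
        raise ValueError(f"expected /provider/collection/item in {purl!r}")
    provider, collection, item = segments
    return PURL_TARGETS[(provider, collection)].format(item=item)
```

Here the provider and collection appear under their own names in the identifier, which is the point being made about whoGeneratedTheData and whatCollectionItBelongsTo.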
Just some thoughts.
-hilmar