Thanks, Hilmar.
I agree that using tdwg.org as the authority for the LSID is less than ideal, hence my recommendation later that we should consider instead using e.g. csiro.tdwg.org (and I don't think it should be tdwg.org; perhaps something more neutral like csiro.bio-id.org). My concern there was the proliferation of SRV records if we support the LSID protocol.
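To illustrate the SRV concern: as I understand the LSID resolution specification, a client locates an authority by querying DNS for an SRV record named _lsid._tcp.<authority>, so every institution-specific authority name would need its own record. A minimal sketch follows; the LSIDs, the "anic" and "birds" namespaces, and the "example-museum" authority are made up for illustration and are not registered identifiers.

```python
def lsid_authority(lsid: str) -> str:
    """Return the authority component of an LSID (urn:lsid:authority:namespace:object[:revision])."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not a well-formed LSID: {lsid!r}")
    return parts[2].lower()


def srv_query_name(lsid: str) -> str:
    """Build the DNS name a resolver would look up to find the authority's service."""
    return f"_lsid._tcp.{lsid_authority(lsid)}"


# Hypothetical LSIDs: one per-institution authority means one SRV record each.
examples = [
    "urn:lsid:csiro.tdwg.org:anic:12345",
    "urn:lsid:example-museum.tdwg.org:birds:678",
]
for lsid in examples:
    print(lsid, "->", srv_query_name(lsid))
```

With a single shared authority such as tdwg.org only one record is needed, which is the trade-off against the ownership question.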
You are also correct that the big issue with this is the question of ownership. Quite frankly, if we had believed in 2006 that institutions would be prepared to cede responsibility for handling their identifiers to a third party, the recommendations from the TDWG workshops would probably have been rather different. Part of the reason for adopting LSIDs was that institutions did not seem to want to use an identifier which might imply that a third party was responsible for the data.
The PURL form would have some benefits and would be a perfectly consistent alternative. I seem to be the only person who wants to avoid an outright capitulation to using HTTP URIs to identify objects in our domain. However, in case anyone cares, here again are my reasons why I prefer HTTP-wrapped non-HTTP identifiers over plain HTTP URIs:
1. The "urn:lsid:" part of the identifier serves as a clear statement of intent which is not present with an HTTP URI. We could mandate that ONLY http://purl.tdwg.org/ URIs count as GUIDs in our domain and that e.g. http://www.csiro.au/ URIs cannot do so, but that seems an arrogant and arbitrary rule. However, if we simply encourage everyone to use PURL URIs from any domain, what separates such a URI from any HTTP URL with no planned persistence? I see this as a shortcut to casual assignment of volatile identifiers based on web application structures and hence to rapid identifier rot.
2. I still feel intense discomfort (pace the W3C) over adopting identifiers prefixed with http:// for objects such as type specimens which have had an important place in the literature for decades and which can expect still to be referenced in 50 years' time. Even though the HTTP protocol feels like the air we breathe right now, it seems certain to be superseded at some point. Do we want to use identifiers which will seem totally "retro" in the future? The usual objection is that HTTP is certain to outlast the LSID protocol. I agree fully, but the urn: prefix is making a statement about naming, not about technology.
If I am alone in these feelings, the suggested PURL route may be simpler, but we should consider what can be done to maximise the robustness of such identifiers in use.
Best wishes,
Donald
Donald Hobern, Director, Atlas of Living Australia
CSIRO Entomology, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352
Mobile: 0437990208
Email: Donald.Hobern@csiro.au
Web: http://www.ala.org.au/
-----Original Message-----
From: Hilmar Lapp [mailto:hlapp@duke.edu]
Sent: Tuesday, 7 April 2009 4:54 PM
To: Hobern, Donald (Entomology, Black Mountain)
Cc: tdwg-tag@lists.tdwg.org
Subject: Re: [tdwg-tag] SourceForge LSID project websites broken - role for TDWG?
On Apr 7, 2009, at 1:55 AM, Donald.Hobern@csiro.au wrote:
> Assume further that ANIC has a script on its servers which can return the RDF data for these specimens, say at http://www.csiro.au/anic/specimens/<catalogueNumber>. The registration process could result in the LSID urn:lsid:tdwg.org:csiro.anic:12345
Wouldn't that say according to your proposed usage guideline that tdwg.org is whoGeneratedTheData and csiro.anic is whatCollectionItBelongsTo, when in reality CSIRO generated the data and ANIC is the collection it belongs to?
I understand why you're suggesting the LSID formatted as you do, and you might say that the name-mangling isn't too drastic. But don't data owners have a strong sense of ownership in their data objects and in their collections? And more importantly, don't you think that a usage guideline that contradicts itself (or that is bound to be internally inconsistent) will continue to raise debate and stand in the way of broader adoption?
> and the HTTP URI http://lsid.tdwg.org/urn:lsid:tdwg.org:csiro.anic:12345 both being mapped through to http://www.csiro.au/anic/specimens/12345.
Wouldn't http://purl.tdwg.org/CSIRO/ANIC/12345 be shorter, do more justice to the names of whoGeneratedTheData and whatCollectionItBelongsTo, be easier to implement, and have the same possibilities to implement caching etc., in fact using standard software such as mod_proxy for Apache?
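To make the comparison concrete, here is a minimal sketch of a resolver that maps all three forms discussed above (the bare LSID, the HTTP-wrapped LSID, and the PURL) to the same ANIC URL. The two lookup tables and the function names are assumptions made up for illustration; a real deployment would more likely use redirect rules in a PURL server or a proxy than a Python dictionary.

```python
from urllib.parse import urlparse

# Hypothetical registry entries: (authority, namespace) for LSIDs and
# (owner, collection) for PURLs, each pointing at the provider's URL template.
LSID_TARGETS = {("tdwg.org", "csiro.anic"): "http://www.csiro.au/anic/specimens/{id}"}
PURL_TARGETS = {("csiro", "anic"): "http://www.csiro.au/anic/specimens/{id}"}


def resolve_lsid(lsid: str) -> str:
    """Map urn:lsid:authority:namespace:object[:revision] to the provider URL."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not a well-formed LSID: {lsid!r}")
    authority, namespace, object_id = parts[2], parts[3], parts[4]
    template = LSID_TARGETS[(authority.lower(), namespace.lower())]
    return template.format(id=object_id)


def resolve(identifier: str) -> str:
    """Resolve a bare LSID, an HTTP-wrapped LSID, or a PURL to its target URL."""
    if identifier.lower().startswith("urn:lsid:"):
        return resolve_lsid(identifier)
    parsed = urlparse(identifier)
    path = parsed.path.lstrip("/")
    if parsed.netloc == "lsid.tdwg.org":
        # The path of the proxy URI is the LSID itself.
        return resolve_lsid(path)
    if parsed.netloc == "purl.tdwg.org":
        owner, collection, object_id = path.split("/")
        return PURL_TARGETS[(owner.lower(), collection.lower())].format(id=object_id)
    raise ValueError(f"unrecognised identifier: {identifier!r}")


for identifier in (
    "urn:lsid:tdwg.org:csiro.anic:12345",
    "http://lsid.tdwg.org/urn:lsid:tdwg.org:csiro.anic:12345",
    "http://purl.tdwg.org/CSIRO/ANIC/12345",
):
    print(identifier, "->", resolve(identifier))
```

The redirect logic is essentially the same in each case; the difference lies in what the identifier itself says about who generated the data and which collection it belongs to.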
Just some thoughts.
-hilmar