Just when everything seemed settled... ;-)
For those wanting to revisit all this, there's also a nice series of presentations at http://www.dcc.ac.uk/events/pi-2005/ .
The ARK identifier is an example where appending symbols to the identifier determines what you get (e.g., '?' for metadata). I guess one could do something similar for PURLs.
Why not have, *ahem* "BioPURLs" (wince), that is, PURLs deployed by the biodiversity community with conventions for returning what we want?
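A minimal sketch of what such a suffix convention might look like, loosely modelled on the ARK '?' behaviour mentioned above. The view names, the example host purl.example.org, and the second suffix '??' are assumptions made for illustration; nothing here is an agreed convention.

```python
# Hypothetical suffix convention for "BioPURLs", loosely modelled on ARK.
# "?" asks for metadata; "??" is an assumed second suffix, included only
# to show that longer suffixes must be checked before shorter ones.
SUFFIX_VIEWS = {"??": "policy", "?": "metadata"}


def split_suffix(identifier):
    """Return (base_identifier, requested_view) for an identifier string."""
    for suffix in ("??", "?"):  # longest suffix first
        if identifier.endswith(suffix):
            return identifier[: -len(suffix)], SUFFIX_VIEWS[suffix]
    # No suffix: the caller wants the object itself.
    return identifier, "object"


base, view = split_suffix("http://purl.example.org/specimen/123?")
```

A resolver following this convention would strip the suffix, look up the base identifier as usual, and choose what to return based on the requested view.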
I wonder whether, if we go down the PURL route, we won't eventually converge on Handles/DOIs, which have administration tools in place. Ultimately, centralisation requires good tools and good support.
In any event, is it possible to separate GUIDs from the whole metadata side of things? And given that every GUID system currently in operation uses (or can use) URLs, can't we postpone deciding on this until we have a few test systems built and a real idea of what's involved? In a sense: wrap everything in URLs (the GUID is either the URL or embedded in the URL), then see what happens.
And yes, I'm sure this pretty much contradicts everything in my earlier posts...
Regards
Rod
On 4 May 2006, at 12:41, Steven Perry wrote:
PURLs are centrally managed indirection-through-redirection over HTTP. Because resolution is only an HTTP call away, PURLs are both easy to understand and very easy to consume. PURLs are also powerful because anything that can be assigned a URL can have a PURL (maybe that makes them too powerful).
There are some advantages to PURLs:
a1.) PURLs are easy to consume
a2.) PURLs require a central resolver which may provide greater reliability than a network with many LSID authorities
a3.) PURLs make it easy to solve the "single resource change in custodianship" problem
And I see some disadvantages to PURLs:
d1.) PURLs require a central resolver which is a single point of failure
d2.) There are no conventions about what to expect when you resolve a PURL
d3.) PURLs may be easy to consume but they're not necessarily easy to produce
d4.) PURLs can't be distinguished from URLs by software
I'll address each with a sentence or two.
a1.) PURLs are easy to consume
Because PURLs rely on simple HTTP GET, they are trivial to resolve. One can use a web browser to manually resolve a PURL or use any of a large number of programs or software libraries for fetching URL contents via HTTP GET. This is the primary advantage of PURLs.
a2.) PURLs require a central resolver which may provide greater reliability than a network with many LSID authorities
If we assume that it's equally likely that a given GUID could be resolved by any of the resolvers on the network, then the reliability of the GUID network reduces to the average resolver reliability. If it turns out that there are 100 LSID resolvers but at any given time 20 are likely to be non-responsive, then it's quite possible that a PURL-based network with a single well-managed resolver (98% uptime) could provide better quality of service than an LSID-based network.
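The arithmetic behind this comparison can be sketched as follows. It assumes, as the paragraph does, that each GUID is equally likely to belong to any resolver, and it models a non-responsive resolver as having zero availability; the function name is illustrative only.

```python
def expected_availability(resolver_uptimes):
    """Chance that a randomly chosen GUID resolves, given per-resolver uptime.

    Assumes every resolver is equally likely to be the one holding the GUID,
    so the network figure is simply the mean of the individual uptimes.
    """
    return sum(resolver_uptimes) / len(resolver_uptimes)


# 100 LSID resolvers, 20 of which are non-responsive at any given time.
lsid_network = [1.0] * 80 + [0.0] * 20
# One well-managed central PURL resolver with 98% uptime.
purl_network = [0.98]

lsid_availability = expected_availability(lsid_network)
purl_availability = expected_availability(purl_network)
```

Under these assumptions the LSID network answers about 80% of requests and the single PURL resolver about 98%. The picture would change if the LSID resolvers failed independently at a low rate rather than 20 being down at once.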
a3.) PURLs make it easy to solve the "single resource change in custodianship" problem
If the ownership or location of a data object changes, its PURL wouldn't change, it would merely redirect you to the new location of the object. This is a potential problem with LSIDs because change in custodianship of an entire authority is easy to deal with, but change in custodianship of a single identified object is difficult to handle.
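A minimal in-memory sketch of this indirection, assuming a hypothetical PurlRegistry class and example.org hosts; it stands in for a central resolver and is not the purl.org implementation.

```python
class PurlRegistry:
    """In-memory stand-in for a central PURL resolver (hypothetical)."""

    def __init__(self):
        self._targets = {}

    def register(self, purl, target_url):
        """Create a new PURL; refuse to overwrite an existing one."""
        if purl in self._targets:
            raise ValueError(f"{purl} is already registered")
        self._targets[purl] = target_url

    def reassign(self, purl, new_target_url):
        """Point an existing PURL at a new location (change of custodian)."""
        if purl not in self._targets:
            raise KeyError(purl)
        self._targets[purl] = new_target_url

    def resolve(self, purl):
        """Return the URL the PURL currently redirects to."""
        return self._targets[purl]


registry = PurlRegistry()
purl = "http://purl.example.org/specimen/42"
registry.register(purl, "http://museum-a.example.org/specimens/42")
before = registry.resolve(purl)

# The specimen record moves to another institution; the PURL stays the same.
registry.reassign(purl, "http://museum-b.example.org/collection/42")
after = registry.resolve(purl)
```

Anyone who stored the PURL keeps a working identifier; only the registry entry changes.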
d1.) PURLs require a central resolver which is a single point of failure
A PURL resolver acts as a centralized registry. While a single PURL resolver may provide better reliability than a distributed network of LSID resolvers, centralization comes at a cost. A central PURL resolver is a single point of failure. To guard against failure, the community must guarantee that the organization hosting the resolver will be funded over time and that it will work to prevent hardware issues, network outages, denial of service attacks, etc. The community may also demand that the organization that hosts the PURL resolver provide technical support.
d2.) There are no conventions about what to expect when you resolve a PURL
Under the OCLC's purl.org resolver, there are no conventions about what you get when you resolve a PURL. A PURL can point to a chunk of RDF describing a particular specimen, a DVD rip of a Bollywood movie, a second PURL that redirects to the first PURL in an endless loop, or a web application that returns no content but sends a signal to your fancy new networked coffee machine telling it to make a double espresso. Some of these examples are silly, but my point is that PURL only provides for the possibility of persistence through indirection. We're not interested solely in indirection. We want to build a set of services on top of whatever GUID system we select. This set of services requires common agreement on what you get when you resolve a GUID. The LSID spec attempts to address this issue by splitting the universe into data and metadata and strongly suggesting the use of RDF for metadata. There is no agreement on what you get when you resolve a PURL, and even if we came to agreement within our community there's no software in place to help us enforce these conventions.
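Because nothing stops one PURL from pointing at another, a client probably has to defend itself against the endless-loop case. A minimal sketch, assuming the redirect table is available as a plain dictionary (a real client would discover each hop through HTTP responses); the names are illustrative.

```python
class RedirectError(Exception):
    """Raised when a redirect chain loops or grows too long."""


def follow_redirects(start, redirects, max_hops=10):
    """Follow redirects from `start` until a non-redirecting URL is reached.

    `redirects` maps each redirecting URL to its target. Raises RedirectError
    on a loop or when more than `max_hops` redirects are needed.
    """
    seen = []
    current = start
    while current in redirects:
        if current in seen:
            raise RedirectError("loop: " + " -> ".join(seen + [current]))
        if len(seen) >= max_hops:
            raise RedirectError(f"gave up after {max_hops} hops")
        seen.append(current)
        current = redirects[current]
    return current


table = {
    "http://purl.example.org/a": "http://purl.example.org/b",
    "http://purl.example.org/b": "http://data.example.org/specimen/42.rdf",
    "http://purl.example.org/x": "http://purl.example.org/y",
    "http://purl.example.org/y": "http://purl.example.org/x",  # endless loop
}
final_url = follow_redirects("http://purl.example.org/a", table)
```

This only protects the client. It says nothing about what the final URL returns, which is the part that needs community agreement.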
d3.) PURLs may be easy to consume but they're not necessarily easy to produce
PURLs are easy to resolve but hard to register. A central PURL resolver has to provide functionality for registering PURLs and assigning/reassigning live URLs to them. It's simple to envision a web-based form for registering PURLs (see http://www.purl.org/maint/choose.html), but I imagine that most of the time new PURLs will be requested by a piece of software that's trying to publish a large number of resources. This means that the PURL resolver should provide a remote service (software interface) for registering a new PURL, in part to facilitate automated registration of a large number of identifiers. Interestingly enough, I don't think the OCLC PURL resolver implementation provides this functionality. I imagine that most people who want to register a large number of PURLs work around the problem by registering what OCLC calls a "partial redirection" (http://purl.oclc.org/docs/inet96.html#partial). I don't consider partial redirects to be GUIDs because they allow the use of a domain as a prefix for a localized URL hierarchy. In order to guarantee that I don't mess up your PURLs, the OCLC PURL resolver requires authentication in order to register a new PURL. Authentication systems aren't easy to implement or support.
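A partial redirection can be pictured as a prefix rule: one registered prefix covers a whole hierarchy, and whatever follows the prefix is appended to the target. The sketch below illustrates that idea only; the function name, paths, and hosts are assumptions, and the actual OCLC behaviour may differ in detail.

```python
def resolve_partial(purl_path, partial_redirects):
    """Resolve a PURL path against registered prefixes.

    The longest matching prefix wins, and the unmatched remainder of the
    path is appended to that prefix's target URL. Returns None if no
    registered prefix matches.
    """
    best = None
    for prefix in partial_redirects:
        if purl_path.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        return None
    return partial_redirects[best] + purl_path[len(best):]


# One registration covers every record below the prefix (hypothetical values).
partials = {"/net/herbarium/": "http://herbarium.example.org/records/"}
target = resolve_partial("/net/herbarium/sheet/0042", partials)
```

This shows why a partial redirect behaves more like a delegated URL hierarchy than a registry of individual identifiers: the resolver never sees, and cannot vouch for, the individual records below the prefix.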
d4.) PURLs can't be distinguished from URLs by software
Most GUID systems come with a set of assumptions about when and how it's appropriate to use a GUID. In addition to distributed resolution we might want to use GUIDs for things like equality testing, versioning, or object composition. Each of these uses raises questions that need to be sorted out. For instance, with equality testing, do we want to be able to have software say that two things are equal if their GUIDs are bitwise identical? If two GUIDs are not bitwise identical, can they refer to the same object? Do we require that different versions of the same object have the same GUID, different GUIDs with a relationship between them asserted in metadata, or the same base GUID with a different version component tacked onto the end? What about different representations (formats) of the same thing (say an XML and an RDF version)? Can they have the same GUID? How does our object equality testing by GUID choice affect our choice of how to do versioning? How do we actually compose a compound object out of simple related objects? All of these questions require careful consideration and are affected by our choice of a GUID system.
I guess what I'm trying to say is that we're not interested in GUIDs for the sake of GUIDs alone, but instead require them for specific uses that extend beyond simple naming and resolution. I hope that we'll examine some of these questions and come to agreement on our conventions for GUID use. Once we have these conventions (either because they're embedded in the GUID scheme we choose or because we've arrived at them during meetings and documented them appropriately), we'll need to write software that operates on these assumptions and enforces these conventions. That software will have to be able to distinguish a GUID from a non-GUID because we can do certain things with GUIDd objects that we can't do with non-GUIDd ones. With PURL this is problematic because a piece of software cannot easily distinguish a PURL from a URL yet they probably ought to be treated differently.
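One conceivable workaround, sketched here purely as an assumption, is a community convention that only URLs on agreed resolver hosts count as GUIDs. The host purl.example.org and both function names are hypothetical. The second function shows bitwise equality and why it is stricter than "refers to the same object".

```python
from urllib.parse import urlsplit

# Hypothetical convention: only URLs on these hosts are treated as GUIDs.
GUID_RESOLVER_HOSTS = {"purl.example.org"}


def is_guid(uri):
    """True if the URI lives on a host the community has designated for GUIDs."""
    host = urlsplit(uri).hostname or ""  # hostname is returned lower-cased
    return host in GUID_RESOLVER_HOSTS


def bitwise_equal(guid_a, guid_b):
    """Strict string equality, with no normalisation of case or escaping."""
    return guid_a == guid_b


guid = "http://purl.example.org/specimen/42"
plain = "http://www.example.com/specimen/42"
same_host_other_case = "http://PURL.example.org/specimen/42"
```

Here `is_guid` accepts both spellings of the resolver host, yet `bitwise_equal` treats them as different identifiers, which is exactly the kind of question a convention would have to settle.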
I'm not a huge fan of LSID. I think a URN-based identification system introduces a barrier to entry for some. I think the SOAP/web services stuff in the LSID spec and the Java toolkit from IBM introduce another barrier. PURL may be easier to use (at least for resolution), but it doesn't go as far as LSID in laying the groundwork for a network of services that can at the very least share data, if not actually help researchers do something interesting with it.
I'm not against inventing something new that's essentially a set of restrictions on top of PURL. Maybe we could get the best of both worlds -- the simplicity of PURL with the conventions of LSID.
-Steve
Döring, Markus wrote:
Hello, please see my comments inline below. I will try to use "PURL" not only in the purl.org sense, but also to mean any simple way of creating stable URLs through centralized URL redirection. Seen that way, I can't see relevant benefits of LSIDs that are not shared by PURLs. Considering the potential problems we might run into with any software framework (not only RDF) that includes resolving, I am strongly in favor of simple URLs.
-- Markus
-----Original Message-----
From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Kevin Richards
Sent: Wednesday, 3 May 2006 13:45
To: tdwg-tag@lists.tdwg.org
Subject: Re: [Tdwg-tag] Why we should not use LSID
Roger
I agree that PURLs are a perfectly good option for our GUID needs, and that they would probably be one of the easier technologies to get "working".
Like you, I really had to think again to work out the benefits of LSIDs over PURLs, especially considering the disadvantage you have mentioned.
Some of the benefits of LSIDs include:
- clearly separate data and metadata services (as you have mentioned)
MD: From what I've understood from the GUID group, almost exclusively metadata is used though. So if we deal with metadata only, then it's not a big practical difference at least.
- separation from domain names - as far as I understand, the PURL still requires domain name resolution of the actual ID URL to obtain the resolution server address - this ties you to a particular URL format
MD: We could easily set up a redirection service http://purl.gbif.net/AUTHORITY/whatever that redirects to wherever you want to keep your resolver. Just the authority URL part needs to be centrally managed.
MD: This leads me to a question about LSIDs which I never understood. LSIDs are bound to domain name resolution, and their guarantee of global uniqueness relies heavily on DNS. So to me a central body keeping track of LSID authorities is required to guarantee lifelong uniqueness of LSID URNs. If "bgbm.org" is later owned by someone else who also wants to set up an LSID authority, how does he know there was one already under that domain? He could be reissuing the same URN (LSID) again. That's exactly what people use as an argument against URLs, but it's also true for LSIDs as far as I understand the technology.
- LSID assigning service can be managed by provider organisation ("ownership" of data and IDs is often high on a data provider's requirements list)
MD: so can PURLs
- LSIDs provide a "standard" technology for resolving and serving up data objects - i.e. every provider will have the LSID authority services running on their server that will serve up data and metadata (+ other services if required) in the same way, for every provider
MD: URLs are even more standard I would think. Take Apache and there you go.
- related to the previous point, a standard mechanism for third party annotations of LSIDs is provided with every LSID server implementation
MD: Annotea (for RDF) uses simple HTTP. As Rod said pingbacks are a way to go as well (over http). And I am sure there are many other standards existing for URLs.
- same URN LSID can be used for resolution of http, ftp, soap and tcp protocols (unsure how PURLs handle this?)
...other cool stuff, I'm sure, that I can't think of right now - too late at night
MD: true. but is that needed?
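The dependence on DNS that Markus raises above is visible in the identifier syntax itself: an LSID has the form urn:lsid:authority:namespace:object[:revision], where the authority is normally a domain name. A minimal parsing sketch follows; the Lsid type, the function name, and the "names" namespace are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str   # normally a DNS name, e.g. "bgbm.org"
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if (
        len(parts) not in (5, 6)
        or parts[0].lower() != "urn"
        or parts[1].lower() != "lsid"
        or not all(parts[2:5])
    ):
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    # The "urn:lsid:" prefix and the authority are compared case-insensitively.
    return Lsid(parts[2].lower(), parts[3], parts[4], revision)


# Two successive owners of the same domain could mint the same string.
first_owner = parse_lsid("urn:lsid:bgbm.org:names:12345")
later_owner = parse_lsid("URN:LSID:BGBM.org:names:12345")
```

Nothing in the string records who operated the authority when the identifier was minted, so the two values above compare equal.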
Probably best to avoid LSIDs for RDF class identifiers etc, but do the semantic web tools you are talking about have no way of recognising different URL resolution types - I'm wondering if you can "plug in" LSID resolution into these tools?
MD: that would surely be good. I have no experience with RDF frameworks, but everywhere I look I see URIs that are in fact URLs.
Kevin
Roger Hyam roger@tdwg.org 05/03/06 10:29 PM >>>
Hi Rod,
From the meeting report - which I am struggling to get back to - these two bullet points sum it up I think
· There are certain things for which LSIDs are not appropriate. It would be legal to use them for RDF resource identifiers for controlled vocabularies and XML Schema locations BUT we would have to extend existing software libraries to do this which is not desirable.
· *Recommendation:* LSIDs are not used for controlled vocabularies, ontologies or XML Schema locations. LSIDs should be used to refer to instances.
Basically it was felt that if we used LSIDs for things like rdfs:Class definitions then any library that went off to fetch the definitions automatically would have to be extended so that it understood LSID resolution. On the other hand it was felt that use of LSIDs for real resources (things we are actually describing like specimens and people) was fine. Once an ontology is loaded then it is all fine though so to an extent this may be a false problem.
We spent a long time talking about what is part of the ontology and what isn't and went round in circles (please let's not do it again). Basically class and property descriptions should be URL-type URIs but instance URIs can be LSIDs. If you want to define the genus /Rhododendron/ as being an OWL DL class retrieved remotely then you should probably give it a URL. If you want to define it as a data item then use an LSID.
I think Gregor's worries (correct me if I am wrong Gregor) are that in SDD (possibly our whole domain) many things could be considered classes and properties. i.e. Things you want your reasoner to use in the reasoning rather than simply reason about. In this case it may be better to have URLs for everything.
There is a niggling doubt (in my mind) that we may come across 'cool' tools and libraries that assume that *all* resource URIs are URLs and that we would not be able to use them or would need to extend them if we use LSIDs. Imagine a semantic web browser where you click on a node and it fetches the associated resource to expand itself.
I do occasionally struggle to see the advantages of LSIDs as GUIDs over just conventions for use of URLs but these may be matters of personal faith. Another bullet point in the report says:
· *Recommendation:* GUIDs Group should issue a document clearly justifying adoption of GUID technology. The advantages need to be clearly explained.
I'll try and get this report out ASAP but it looks very similar to the wiki page here:
http://wiki.tdwg.org/twiki/bin/view/TAG/TagMeeting1ReportDraft
Obviously would be grateful for your thoughts.
Roger
Roderic Page wrote:
Dear Gregor,
For the benefit of those not at TAG 1, can you please explain why "LSIDs are not interoperable with semantic web technologies"?
Regards
Rod
On 2 May 2006, at 16:44, Gregor Hagedorn wrote:
Note that part of my concern about the use of concept when talking about classes/properties/data elements is that I more and more believe we will want to use ontology reasoners for uses other than software design, i.e. as part of what we currently consider data (taxon names, concepts, rank hierarchy, parts of organisms, properties of organisms, etc.). All these are ontological concepts, and efforts such as www.plantontology.org do use OWL to reason on them.
The SDD presentation (the one not held in EDI, attached) contained some examples of how we might want to query our data - in ways that OWL-for-software-design seems not to cover - and which using LSIDs would even prevent.
Please discuss:
http://wiki.tdwg.org/twiki/bin/view/TAG/WhyWeShouldNotUseLSIDs
http://wiki.tdwg.org/twiki/bin/view/TAG/UsePURLsAsGUIDs
Gregor
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19, 14195 Berlin, Germany
Tel: +49-30-8304-2220
Fax: +49-30-8304-2203
---- File information -----------
File: SDD-TAG1.ppt
Date: 23 Apr 2006, 18:10
Size: 1056768 bytes.
Type: Unknown
_______________________________________________
Tdwg-tag mailing list
Tdwg-tag@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org
--
Professor Roderic D. M. Page Editor, Systematic Biology DEEB, IBLS Graham Kerr Building University of Glasgow Glasgow G12 8QP United Kingdom
Phone: +44 141 330 4778 Fax: +44 141 330 2792 email: r.page@bio.gla.ac.uk web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org Search for taxon names: http://darwin.zoology.gla.ac.uk/~rpage/portal/ Find out what we know about a species: http://ispecies.org Rod's rants on phyloinformatics: http://iphylo.blogspot.com
--
Roger Hyam Technical Architect Taxonomic Databases Working Group
http://www.tdwg.org roger@tdwg.org
+44 1578 722782