PURLs are centrally managed indirection-through-redirection over HTTP.
Because resolution is only an HTTP call away, PURLs are both easy to
understand and very easy to consume. PURLs are also powerful because
anything that can be assigned a URL can have a PURL (maybe that makes
them too powerful).
There are some advantages to PURLs:
a1.) PURLs are easy to consume
a2.) PURLs require a central resolver which may provide greater
reliability than a network with many LSID authorities
a3.) PURLs make it easy to solve the "single resource change in
custodianship" problem
And I see some disadvantages to PURLs:
d1.) PURLs require a central resolver which is a single point of
failure
d2.) There are no conventions about what to expect when you resolve a
PURL
d3.) PURLs may be easy to consume but they're not necessarily easy to
produce
d4.) PURLs can't be distinguished from URLs by software
I'll address each with a sentence or two.
a1.) PURLs are easy to consume
Because PURLs rely on simple HTTP GET, they are trivial to resolve. One
can use a web browser to manually resolve a PURL or use any of a large
number of programs or software libraries for fetching URL contents via
HTTP GET. This is the primary advantage of PURLs.
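To show how little is involved, here is a minimal sketch in Python using only the standard library. It starts a toy resolver on a local port and resolves one PURL with a single GET. The redirect table, the example.org target and the names ToyResolver, resolve_once and demo are all made up for illustration; this is not how purl.org is implemented.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical redirect table: PURL path -> current location of the resource.
REDIRECTS = {"/specimen/123": "http://data.example.org/specimens/123.rdf"}


class ToyResolver(BaseHTTPRequestHandler):
    """Answers a GET with a 302 to the registered target, or a 404."""

    def do_GET(self):
        target = REDIRECTS.get(self.path)
        if target is None:
            self.send_error(404, "Unknown PURL")
            return
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def resolve_once(purl):
    """Issue one HTTP GET and return (status, Location header or None)."""
    parts = urlsplit(purl)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=5)
    try:
        conn.request("GET", parts.path or "/")
        response = conn.getresponse()
        response.read()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def demo():
    """Start the toy resolver on a free local port and resolve one PURL."""
    server = HTTPServer(("127.0.0.1", 0), ToyResolver)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return resolve_once(f"http://127.0.0.1:{port}/specimen/123")
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the status code and the Location header of the redirect. A real client would then fetch that location, which is exactly what a browser does without being asked.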
a2.) PURLs require a central resolver which may provide greater
reliability than a network with many LSID authorities
If we assume that it's equally likely that a given GUID could be
resolved by any of the resolvers on the network, then the reliability of
the GUID network reduces to the average resolver reliability. If it
turns out that there are 100 LSID resolvers but at any given time 20 are
likely to be non-responsive (an average availability of 80%), then it's
quite possible that a PURL-based network with a single well-managed
resolver (98% uptime) could provide better quality of service than an
LSID-based network.
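To make the arithmetic concrete, here is a small sketch of that comparison. It assumes, as above, that a request is equally likely to land on any resolver, so the expected availability of the network is just the mean of the per-resolver availabilities. The function name and the figures are illustrative, not measurements.

```python
def expected_availability(resolver_availabilities):
    """Mean availability when each resolver is equally likely to be asked."""
    if not resolver_availabilities:
        raise ValueError("need at least one resolver")
    return sum(resolver_availabilities) / len(resolver_availabilities)


# Illustrative figures only: 100 LSID authorities, 20 of them down at any
# given time, versus one well-managed PURL resolver at 98% uptime.
lsid_network = [1.0] * 80 + [0.0] * 20
purl_network = [0.98]

lsid_score = expected_availability(lsid_network)
purl_score = expected_availability(purl_network)
print(f"LSID network: {lsid_score:.2f}, PURL resolver: {purl_score:.2f}")
```

Under these assumptions the single resolver comes out ahead, but the result depends entirely on the numbers you plug in.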
a3.) PURLs make it easy to solve the "single resource change in
custodianship" problem
If the ownership or location of a data object changes, its PURL wouldn't
change; it would merely redirect you to the new location of the object.
This is a potential problem with LSIDs because a change in custodianship
of an entire authority is easy to deal with, but a change in custodianship
of a single identified object is difficult to handle.
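A rough sketch of the idea, assuming the resolver keeps nothing more than a table from PURL to current location. The class name PurlRegistry, its methods and all example.org addresses are invented for illustration.

```python
class PurlRegistry:
    """Toy central registry: PURL -> current URL of the resource."""

    def __init__(self):
        self._targets = {}

    def register(self, purl, target):
        if purl in self._targets:
            raise ValueError(f"{purl} is already registered")
        self._targets[purl] = target

    def reassign(self, purl, new_target):
        """Point an existing PURL at a new location; the PURL is unchanged."""
        if purl not in self._targets:
            raise KeyError(purl)
        self._targets[purl] = new_target

    def resolve(self, purl):
        return self._targets[purl]


registry = PurlRegistry()
purl = "http://purl.example.org/specimen/123"
registry.register(purl, "http://museum-a.example.org/specimens/123")
before = registry.resolve(purl)

# Custody of this one specimen record moves to another institution.
registry.reassign(purl, "http://museum-b.example.org/records/abc")
after = registry.resolve(purl)
```

Anyone who stored the PURL keeps working after the move; only the one table entry changed.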
d1.) PURLs require a central resolver which is a single point of
failure
A PURL resolver acts as a centralized registry. While a single PURL
resolver may provide better reliability than a distributed network of
LSID resolvers, centralization comes at a cost. A central PURL resolver
is a single point of failure. To guard against failure, the community
must guarantee that the organization hosting the resolver will be funded
over time and that it will work to prevent hardware issues, network
outages, denial of service attacks, etc. The community may also demand
that the organization that hosts the PURL resolver provide technical
support.
d2.) There are no conventions about what to expect when you resolve a
PURL
Under the OCLC's purl.org resolver, there are no conventions about what
you get when you resolve a PURL. A PURL can point to a chunk of RDF
describing a particular specimen, a DVD rip of a Bollywood movie, a
second PURL that redirects to the first PURL in an endless loop, or a
web application that returns no content but sends a signal to your fancy
new networked coffee machine telling it to make a double espresso. Some
of these examples are silly, but my point is that PURL only provides for
the possibility of persistence through indirection. We're not
interested solely in indirection. We want to build a set of services on
top of whatever GUID system we select. This set of services requires
common agreement on what you get when you resolve a GUID. The LSID spec
attempts to address this issue by splitting the universe into data and
metadata and strongly suggesting the use of RDF for metadata. There is
no agreement on what you get when you resolve a PURL, and even if we
came to agreement within our community there's no software in place to
help us enforce these conventions.
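The endless-loop case is at least something a client can defend against mechanically. Here is a minimal sketch, assuming redirects are represented as a plain mapping instead of live HTTP calls; the function name, the hop limit of 10 and the sample table are my own choices.

```python
def follow_redirects(start, redirects, max_hops=10):
    """Follow redirects from `start` and return the final URL.

    `redirects` maps a URL to the URL it redirects to. A URL that is not
    a key in the mapping is treated as the final location.
    """
    seen = set()
    current = start
    for _ in range(max_hops + 1):
        if current in seen:
            raise RuntimeError(f"redirect loop at {current}")
        seen.add(current)
        if current not in redirects:
            return current
        current = redirects[current]
    raise RuntimeError(f"gave up after {max_hops} hops")


# Hypothetical redirect table with one healthy chain and one loop.
table = {
    "http://purl.example.org/a": "http://purl.example.org/b",
    "http://purl.example.org/b": "http://data.example.org/specimen.rdf",
    "http://purl.example.org/x": "http://purl.example.org/y",
    "http://purl.example.org/y": "http://purl.example.org/x",
}
final = follow_redirects("http://purl.example.org/a", table)
```

Note that this only tells you where you ended up. It says nothing about what you will find there, which is the convention problem described above.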
d3.) PURLs may be easy to consume but they're not necessarily easy to
produce
PURLs are easy to resolve but hard to register. A central PURL resolver
has to provide functionality for registering PURLs and
assigning/reassigning live URLs to them. It's simple to envision a
web-based form for registering PURLs (see
http://www.purl.org/maint/choose.html), but I imagine that most of the
time new PURLs will be requested by a piece of software that's trying to
publish a large number of resources. This means that the PURL resolver
should provide a remote service (software interface) for registering a
new PURL, in part to facilitate automated registration of a large number
of identifiers. Interestingly enough, I don't think the OCLC PURL
resolver implementation provides this functionality. I imagine that
most people who want to register a large number of PURLs work around the
problem by registering what OCLC calls a "partial redirection"
(http://purl.oclc.org/docs/inet96.html#partial). I don't consider
partial redirects to be GUIDs because they allow the use of a domain as
a prefix for a localized URL hierarchy. In order to guarantee that I
don't mess up your PURLs, the OCLC PURL resolver requires authentication
in order to register a new PURL. Authentication systems aren't easy to
implement or support.
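Here is a rough sketch of the kind of bulk registration interface I have in mind, with the simplest authentication I can think of (a shared API key per provider). None of this exists in the OCLC resolver as far as I know; the class name, method names, key and example.org addresses are hypothetical.

```python
class AuthenticatedRegistry:
    """Toy resolver back end: only known API keys may register PURLs."""

    def __init__(self, api_keys):
        self._owners_by_key = dict(api_keys)  # API key -> owner name
        self._entries = {}                    # PURL path -> (owner, target)

    def register_many(self, api_key, mappings):
        """Register every (purl, target) pair, or none if any is rejected."""
        owner = self._owners_by_key.get(api_key)
        if owner is None:
            raise PermissionError("unknown API key")
        taken = [purl for purl in mappings if purl in self._entries]
        if taken:
            raise ValueError(f"already registered: {taken}")
        for purl, target in mappings.items():
            self._entries[purl] = (owner, target)
        return len(mappings)

    def resolve(self, purl):
        return self._entries[purl][1]


registry = AuthenticatedRegistry({"secret-key": "herbarium"})

# A publishing tool registers a thousand identifiers in one call.
batch = {
    f"/herbarium/sheet/{n}": f"http://herbarium.example.org/sheets/{n}"
    for n in range(1, 1001)
}
count = registry.register_many("secret-key", batch)
```

Even this toy version has to decide who may register what and what happens on a collision, which is the part that takes real effort to run as a service.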
d4.) PURLs can't be distinguished from URLs by software
Most GUID systems come with a set of assumptions about when and how it's
appropriate to use a GUID. In addition to distributed resolution we
might want to use GUIDs for things like equality testing, versioning, or
object composition. Each of these uses raises questions that need to be
sorted out. For instance, with equality testing, do we want to be able
to have software say that two things are equal if their GUIDs are
bitwise identical? If two GUIDs are not bitwise identical, can they
refer to the same object? Do we require that different versions of the
same object have the same GUID, different GUIDs with a relationship
between them asserted in metadata, or the same base GUID with a
different version component tacked onto the end? What about different
representations (formats) of the same thing (say an XML and an RDF
version)? Can they have the same GUID? How does our choice of object
equality testing by GUID affect our choice of how to do versioning? How
do we actually compose a compound object out of simple related objects?
All of these questions require careful consideration and are affected by
our choice of a GUID system.
I guess what I'm trying to say is that we're not interested in GUIDs for
the sake of GUIDs alone, but instead require them for specific uses that
extend beyond simple naming and resolution. I hope that we'll examine
some of these questions and come to agreement on our conventions for
GUID use. Once we have these conventions (either because they're
embedded in the GUID scheme we choose or because we've arrived at them
during meetings and documented them appropriately), we'll need to write
software that operates on these assumptions and enforces these
conventions. That software will have to be able to distinguish a GUID
from a non-GUID because we can do certain things with objects that have
GUIDs that we can't do with objects that don't. With PURL this is
problematic because a piece of software cannot easily distinguish a PURL
from a URL, yet they probably ought to be treated differently.
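A small sketch of the difference. An LSID announces itself through its urn:lsid: prefix, so a syntactic check is enough. For a PURL the best I can come up with is a lookup against an agreed list of resolver hosts, which is a convention I am inventing here, not something PURL provides. The function names and the sample identifiers are hypothetical.

```python
import re
from urllib.parse import urlsplit

# LSIDs carry their own marker: urn:lsid:authority:namespace:object[:revision]
LSID_PATTERN = re.compile(
    r"^urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(:[^:\s]+)?$", re.IGNORECASE
)

# Hypothetical convention: the community agrees on a list of resolver hosts.
KNOWN_PURL_HOSTS = {"purl.org", "www.purl.org", "purl.oclc.org"}


def is_lsid(identifier):
    """True if the string has the syntactic shape of an LSID."""
    return LSID_PATTERN.match(identifier) is not None


def is_purl(identifier, known_hosts=KNOWN_PURL_HOSTS):
    """Best-effort guess: an http(s) URL whose host is a known resolver."""
    parts = urlsplit(identifier)
    return parts.scheme in ("http", "https") and parts.hostname in known_hosts


samples = [
    "urn:lsid:example.org:specimens:123",
    "http://purl.org/net/example",
    "http://www.example.org/some/page.html",
]
labels = [(s, is_lsid(s), is_purl(s)) for s in samples]
```

The second check breaks as soon as someone runs a resolver on a host that isn't on the list, which is the problem in a nutshell.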
I'm not a huge fan of LSID. I think a URN-based identification system
introduces a barrier to entry for some. I think the SOAP/web services
stuff in the LSID spec and the Java toolkit from IBM introduce another
barrier. PURL may be easier to use (at least for resolution), but it
doesn't go as far as LSID in laying the groundwork for a network of
services that can at the very least share data, if not actually help
researchers do something interesting with it.
I'm not against inventing something new that's essentially a set of
restrictions on top of PURL. Maybe we could get the best of both worlds:
the simplicity of PURL with the conventions of LSID.
-Steve
Döring, Markus wrote:
Hello,
please see my comments inline below. I use "PURL" here not only
in the purl.org sense, but also to mean any simple way of creating
stable URLs through centralized URL redirection. Seen that way, I
can't see any relevant benefits of LSIDs that are not shared by PURLs.
Considering the potential problems we might run into with any
software framework (not only RDF) that includes resolving, I am
strongly in favor of simple URLs.
--
Markus
-----Original Message-----
From: tdwg-tag-bounces@lists.tdwg.org
[mailto:tdwg-tag-bounces@lists.tdwg.org] On behalf of Kevin
Richards
Sent: Wednesday, 3 May 2006 13:45
To: tdwg-tag@lists.tdwg.org
Subject: Re: [Tdwg-tag] Why we should not use LSID
Roger
I agree that PURLs are a perfectly good option for our GUID needs,
and that they would probably be one of the easier technologies to
get "working".
Like you I really had to think again to work out the benefits of
LSIDs over PURLs, especially considering the disadvantage you have
mentioned.
Some of the benefits of LSIDs include:
- clearly separate data and metadata services (as you have mentioned)
MD: From what I've understood from the GUID group, almost only
metadata is used though. So if we deal with metadata only, then it's
not a big practical difference at least.
- separation from domain names - as far as I understand, the PURL
still requires domain name resolution of the actual ID url to obtain
the resolution server address - this ties you to a particular url
format
MD: We could easily set up a redirection service
http://purl.gbif.net/AUTHORITY/whatever that redirects to wherever
you want to keep your resolver. Only the authority part of the URL
needs to be centrally managed.
MD: This leads me to a question about LSIDs which I never
understood. LSIDs are bound to domain name resolution and their
guarantee of global uniqueness is heavily based on DNS. So to me a
central body keeping track of LSID authorities is required to
guarantee lifelong uniqueness of LSID URNs. If "bgbm.org" is later
owned by someone else who also wants to set up an LSID authority, how
does he know there was already one under that domain? He could be
reissuing the same URN (LSID) again. That's exactly what people use as
an argument against URLs, but it's also true for LSIDs as far as I
understand the technology.
- LSID assigning service can be managed by provider organisation
("ownership" of data and IDs is often high on a data provider's
requirements list)
MD: so can PURLs
- LSIDs provide a "standard" technology for resolving and serving up
data objects - ie every provider will have the LSID authority
services running on their server that will serve up data and
metadata (+ other services if required) in the same way, for every
provider
MD: URLs are even more standard I would think. Take Apache and there
you go.
- related to the previous point, a standard mechanism for third
party annotations of LSIDs is provided with every LSID server
implementation
MD: Annotea (for RDF) uses simple HTTP. As Rod said pingbacks are a
way to go as well (over http). And I am sure there are many other
standards existing for URLs.
- same URN LSID can be used for resolution of http, ftp, soap and
tcp protocols (unsure how PURLs handle this?) ...other cool stuff,
I'm sure, that I can't think of right now - too late at night
MD: true. but is that needed?
Probably best to avoid LSIDs for RDF class identifiers etc, but do
the semantic web tools you are talking about have no way of
recognising different url resolution types - I'm wondering if you
can "plug in" lsid resolution into these tools?
MD: that would surely be good. I have no experience with RDF
frameworks, but everywhere I look I see URIs that are in fact URLs.
Kevin
Roger Hyam <roger@tdwg.org> 05/03/06 10:29 PM >>>
Hi Rod,
From the meeting report - which I am struggling to get back to -
these two bullet points sum it up, I think:
· There are certain things for which LSIDs are not appropriate.
It would be legal to use them for RDF resource identifiers for
controlled vocabularies and XML Schema locations BUT we would have
to extend existing software libraries to do this which is not
desirable.
· *Recommendation:* LSIDs are not used for controlled
vocabularies, ontologies or XML Schema locations. LSIDs should be
used to refer to instances.
Basically it was felt that if we used LSIDs for things like
rdfs:Class definitions then any library that went off to fetch the
definitions automatically would have to be extended so that it
understood LSID resolution. On the other hand it was felt that use
of LSIDs for real resources (things we are actually describing like
specimens and people) was fine. Once an ontology is loaded then it
is all fine though so to an extent this may be a false problem.
We spent a long time talking about what is part of the ontology and
what isn't and went round in circles (please let's not do it again).
Basically class and property descriptions should be URL type URIs
but instance URIs can be LSIDs. If you want to define the genus
/Rhododendron/ as being an OWL DL class retrieved remotely then you
should probably give it a URL. If you want to define it as a data
item then use a LSID.
I think Gregor's worries (correct me if I am wrong Gregor) are that
in SDD (possibly our whole domain) many things could be considered
classes and properties. i.e. Things you want your reasoner to use in
the reasoning rather than simply reason about. In this case it may
be better to have URLs for everything.
There is a niggling doubt (in my mind) that we may come across 'cool'
tools and libraries that assume that *all* resource URIs are URLs
and that we would not be able to use them or would need to extend
them if we use LSIDs. Imagine a semantic web browser where you click
on a node and it fetches the associated resource to expand itself.
I do occasionally struggle to see the advantages of LSIDs as GUIDs
over just conventions for use of URLs but these may be matters of
personal faith. Another bullet point in the report says:
· *Recommendation:* GUIDs Group should issue a document clearly
justifying adoption of GUID technology. The advantages need to be
clearly explained.
I'll try and get this report out ASAP but it looks very similar to
the wiki page here:
http://wiki.tdwg.org/twiki/bin/view/TAG/TagMeeting1ReportDraft
Obviously would be grateful for your thoughts.
Roger
Roderic Page wrote:
Dear Gregor,
For the benefit of those not at TAG 1, can you please explain why
"LSIDs are not interoperable with semantic web technologies"?
Regards
Rod
On 2 May 2006, at 16:44, Gregor Hagedorn wrote:
Note that part of my concern about the use of concept when talking
about classes/properties/data elements is that I more and more
believe we will want to use ontology reasoners for uses other than
software design, i.e. as part of what we currently consider data
(taxon names, concepts, rank hierarchy, parts of organisms,
properties of organisms, etc.). All these are ontological concepts,
and efforts such as www.plantontology.org do use OWL to reason on them.
The SDD presentation (the one not held in EDI, attached) contained
some examples how we might want to query our data - in ways that
OWL-for-software-design seems not to cover - and which using LSIDs
would even prevent.
Please discuss:
http://wiki.tdwg.org/twiki/bin/view/TAG/WhyWeShouldNotUseLSIDs
http://wiki.tdwg.org/twiki/bin/view/TAG/UsePURLsAsGUIDs
Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety Federal
Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19 Tel: +49-30-8304-2220
14195 Berlin, Germany Fax: +49-30-8304-2203
---- File information -----------
File: SDD-TAG1.ppt
Date: 23 Apr 2006, 18:10
Size: 1056768 bytes.
Type: Unknown
<SDD-TAG1.ppt>_______________________________________________
Tdwg-tag mailing list
Tdwg-tag@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tag_lists.tdwg.org
--
----------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom
Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page@bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic
Biologists Website: http://systematicbiology.org Search for taxon
names: http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species: http://ispecies.org Rod's
rants
on phyloinformatics: http://iphylo.blogspot.com
--
-------------------------------------
Roger Hyam
Technical Architect
Taxonomic Databases Working Group
-------------------------------------
http://www.tdwg.org
roger@tdwg.org
+44 1578 722782
-------------------------------------