RE: [tdwg-guid] First step in implementing LSIDs
I agree with Jason. It is not the GUID that is the cause of all the problems here - THERE IS NOTHING WRONG WITH LSIDS - we just need to move on and start using them in our own context (or any other suitable GUID - LSIDs are only the recommended GUID, NOT the only permissible GUID).
If it all falls to pieces later on we could just do a search and replace to change all our GUIDs to some other scheme (to quote Bob, just serious).
I agree, it is the RDF/metadata/ontologies that are the key to getting things working well.
Kevin
"Jason Best" jbest@brit.org 06/06/07 8:39 AM >>>
Rod, I've only had a chance to quickly skim the documents you reference, but it seems to me that the alternatives to LSIDs don't necessarily make the issues with which we are wrestling go away. We still need to decide WHAT a URI references - is it the metadata, the physical object etc? URIs don't explicitly require persistence, while LSIDs do, so I see that as a positive for adopting a standard GUID that is explicit in that regard. I think the TDWG effort to spec an HTTP proxy for LSIDs makes it clear that the technical hurdles of implementing an LSID resolver (SRV records, new protocol, client limitations etc) are a bit cumbersome, but I don't think the underlying concept is fatally flawed. In reading these discussions, I'm starting to believe/understand that RDF may hold the key, regardless of the GUID that is implemented. Now I have to go read up more on RDF to see if my new-found belief has merit! ;)
Jason
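[Editorial sketch, not part of the original message.] To make the resolver hurdles Jason lists a little more concrete, here is a minimal Python sketch of the two routes he contrasts: the DNS SRV lookup a native LSID client would need, and the plain HTTP URL a proxy would accept. It assumes the usual LSID layout urn:lsid:authority:namespace:object[:revision] and the _lsid._tcp SRV service label; the LSID itself, the proxy address lsid.example.org, and all function names are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LSID:
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    # The 'urn' and 'lsid' prefixes are matched case-insensitively.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


def srv_query_name(lsid: LSID) -> str:
    """DNS name a native client would query (SRV record) to find the resolver."""
    return f"_lsid._tcp.{lsid.authority}"


def proxy_url(text: str, proxy_base: str = "http://lsid.example.org/") -> str:
    """HTTP URL that hands the LSID to a proxy; proxy_base is a placeholder."""
    parse_lsid(text)  # validate before building the URL
    return proxy_base + text


if __name__ == "__main__":
    example = "urn:lsid:example.org:names:12345"  # hypothetical identifier
    print(srv_query_name(parse_lsid(example)))
    print(proxy_url(example))
```

The proxy route needs nothing beyond an ordinary HTTP client, which is the convenience argument made elsewhere in this thread; the SRV route needs a DNS library and LSID-aware client code.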
________________________________
From: Roderic Page [mailto:r.page@bio.gla.ac.uk] Sent: Tuesday, June 05, 2007 2:10 PM To: Chuck Miller Cc: Bob Morris; Kevin Richards; tdwg-guid@lists.tdwg.org; WEITZMAN@si.edu; Jason Best Subject: Re: [tdwg-guid] First step in implementing LSIDs?[Scanned]
Maybe it's time to bite the bullet and consider the elephant in the room -- LSIDs might not be what we want. Markus Döring sent some nice references to the list in April, which I've repeated below; there is also http://dx.doi.org/10.1109/MIS.2006.62 .
I think the LSID debate is throwing up issues which have been addressed elsewhere (e.g., identifiers for physical things versus digital records), and some would argue have been solved to at least some people's satisfaction.
LSIDs got us thinking about RDF, which is great. But otherwise I think they are making things more complicated than they need to be. I think this community is running a grave risk of committing to a technology that nobody else takes that seriously (hell, even the http://lsid.sourceforge.net/ web site is broken).
The references posted by Markus Döring were:
(1) http://www.dfki.uni-kl.de/dfkidok/publications/TM/07/01/tm-07-01.pdf "Cool URIs for the Semantic Web" by Leo Sauermann DFKI GmbH, Richard Cyganiak Freie Universität Berlin (D2R author), Max Völkel FZI Karlsruhe The authors of this document come from the semantic web community and discuss what kind of URIs should be used for RDF resources.
(2) http://www.w3.org/2001/tag/doc/URNsAndRegistries-50 This one here is written by the W3C and addresses the questions "When should URNs or URIs with novel URI schemes be used to name information resources for the Web?" The answers given are "Rarely if ever" and "Probably not". Common arguments in favor of such novel naming schemas are examined, and their properties compared with those of the existing http: URI scheme.
Regards
Rod
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ WARNING: This email and any attachments may be confidential and/or privileged. They are intended for the addressee only and are not to be read, used, copied or disseminated by anyone receiving them in error. If you are not the intended recipient, please notify the sender by return email and delete this message and any attachments.
The views expressed in this email are those of the sender and do not necessarily reflect the official views of Landcare Research.
Landcare Research http://www.landcareresearch.co.nz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
As a taxonomist/systematist/informaticist, I do not particularly care what route we take, only that it should work as seamlessly as possible to meet the USERS' needs. That is why I described the taxonomists' needs the way I did--very few taxonomists care what, why, or how (count me as one of the few wanting to know what route, why and how to implement) but they certainly want it to work. We need to provide what will seem to them as magic. Taxonomists are not the only users--but I don't think that the seamless issue is very different across the board.
We need to sort this out and provide unambiguous instructions as to how to implement. IF TDWG's role is standards and not implementation (I'm not sure that's wholly true, but if it is) then you computer scientists should come to a consensus as to the best route for implementation. If that doesn't happen (and I'm not sure that it will based on progress so far) then GBIF as the Global Implementing Body needs to weigh in and tell us how to make this work--otherwise GBIF (and TDWG) are going to fail the world (as taxonomists have been failing for years) again in provision of information.
There are some VERY smart computer scientists looking at this. But this is not a research project--it is real life and we need a working solution ASAP so we can build applications using TDWG standards and move forward to meet the needs of the future of our planet!
Regards, Anna
Anna L. Weitzman, PhD Botanical and Biodiversity Informatics Research National Museum of Natural History Smithsonian Institution
office: 202.633.0846 mobile: 202.415.4684 weitzman@si.edu
________________________________
From: tdwg-guid-bounces@lists.tdwg.org on behalf of Kevin Richards Sent: Tue 05-Jun-07 6:51 PM To: r.page@bio.gla.ac.uk; jbest@brit.org; Chuck.Miller@mobot.org Cc: tdwg-guid@lists.tdwg.org; Weitzman, Anna Subject: RE: [tdwg-guid] First step in implementing LSIDs
_______________________________________________ tdwg-guid mailing list tdwg-guid@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-guid
Yesterday was a vacation here in Denmark - otherwise I'd have responded a little earlier, but I'm glad to see all the comments from others. I thoroughly agree with Kevin, Jason, Rich and Anna. No one here believes that any particular solution is going to be perfect. Our biggest need is consensus and the readiness to get going with a workable solution.
I do recognise the strength of Rod's arguments. Indeed, if I were building some system for integrating data using semantic web technologies, and my only concern was ensuring the efficiency of synchronous connections now, I am sure I would adopt HTTP URIs for the purpose. However I remain convinced (as I've stated before) that the needs of this community do subtly shift the balance in another direction. We are interested in maintaining long-term connections between our objects and have a perspective which goes back hundreds of years. This at least should give us pause over whether we want our specimens to be referenced using identifiers so firmly tied to the Internet of today. More importantly, one of the key drivers right at the beginning of TDWG's consideration of GUIDs was that the community had plenty of experience of URL rot and didn't want to rely on everyone maintaining stable virtual directories on their web servers to preserve the integrity of object identifiers.
Both LSIDs and HTTP URIs could be made to work for us. Both are totally reliant on good practice on the part of data owners. Personally I believe our chances of getting the community to consider, define and apply such practices are enhanced by the identifier technology being something a little more different and distinct than just a "special URL".
Thanks,
Donald ------------------------------------------------------------ Donald Hobern (dhobern@gbif.org) Deputy Director for Informatics Global Biodiversity Information Facility Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480 ------------------------------------------------------------
This discussion has been very interesting reading, and though I agree with Donald's comments, I find myself coming to a different conclusion, leaning towards HTTP URIs as a preferable scheme. The reasons are simple - HTTP has been around for a long time, it is widely implemented, and mechanisms for implementing robust services with that protocol are pretty well sorted out - and really there is nothing to stop implementation of the same functionality exhibited by LSIDs using HTTP. As Rod has pointed out, HTTP is widely used for referencing entities within a semantic web type of context, and it seems foolish to ignore the momentum in those technologies, as they provide a great deal of the desired functionality for interoperability and interchange of our data. As a result my preference is towards the use of HTTP, primarily because my intent is to integrate data from a much broader community. In the end though, it doesn't really matter which scheme is adopted by TDWG - we will build HTTP resolvers regardless, since they will be necessary for reasons of convenience in order to utilize LSIDs in all but specific, custom-built applications.
However, regardless of the scheme used to implement the GUIDs used by this community, it is critical that the identifiers are persistent and useful beyond the lives of whatever services are constructed to resolve them. This implies some provenance information may need to be captured, and I would argue that the use of DNS alone for handling server changes, as utilized by LSIDs, may be insufficient. The only benefit provided by DNS in this context is that it acts as a single source of authority for directing how to locate something (in this case an IP address). What I suspect is really required is a more robust and richer mechanism for discovering and recording provenance. The ideal would be a large, replicated, and distributed data store with a single service point which provided people and systems with a one-stop shop for discovering provenance for a GUID. Then if a particular GUID could not be directly resolved, the global provenance store could be consulted, and the resulting information would provide a pointer (or perhaps a series of pointers) indicating how the GUID can now be resolved.
By creating such provenance records and persisting them with as much care as the data, it seems that a system with stability beyond the vagaries of the internet could reasonably be constructed.
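[Editorial sketch, not part of the original message.] The fallback described above can be illustrated with a toy Python model. It is only a sketch under stated assumptions: the "global provenance store" is reduced to an in-memory dictionary, liveness checking is a caller-supplied function, and every name here (ProvenanceStore, resolve, the example URLs) is hypothetical rather than any existing service or API.

```python
from typing import Callable, Dict, List, Optional


class ProvenanceStore:
    """Toy stand-in for the replicated store: GUID -> ordered location history."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}

    def record(self, guid: str, location: str) -> None:
        """Append a location to the GUID's history (oldest first)."""
        self._history.setdefault(guid, []).append(location)

    def history(self, guid: str) -> List[str]:
        """Return a copy of every location ever recorded for the GUID."""
        return list(self._history.get(guid, []))


def resolve(
    guid: str,
    direct: Dict[str, str],
    store: ProvenanceStore,
    is_alive: Callable[[str], bool],
) -> Optional[str]:
    """Try direct resolution first, then fall back to provenance, newest first."""
    location = direct.get(guid)
    if location is not None and is_alive(location):
        return location
    # Direct resolution failed: walk the recorded pointers from newest to oldest.
    for candidate in reversed(store.history(guid)):
        if is_alive(candidate):
            return candidate
    return None


if __name__ == "__main__":
    guid = "urn:lsid:example.org:specimens:42"  # hypothetical identifier
    store = ProvenanceStore()
    store.record(guid, "http://old.example.org/specimens/42")
    store.record(guid, "http://new.example.org/specimens/42")
    stale = {guid: "http://old.example.org/specimens/42"}
    alive = {"http://new.example.org/specimens/42"}
    print(resolve(guid, stale, store, lambda url: url in alive))
```

The point of the design is that the identifier never changes; only the records about where it has lived are appended to, so they can be preserved with the same care as the data.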
regards, Dave V.
On 6 Jun 2007, at 08:21, Dave Vieglais wrote:
However, regardless of the scheme used to implement the GUIDs used by this community, it is critical that the identifiers are persistent and useful beyond the lives of whatever services are constructed to resolve them. This implies some provenance information may need to be captured, and I would argue that the use of DNS alone for handling server changes, as utilized by LSIDs, may be insufficient. The only benefit provided by DNS in this context is that it acts as a single source of authority for directing how to locate something (in this case an IP address). What I suspect is really required is a more robust and richer mechanism for discovering and recording provenance. The ideal would be a large, replicated, and distributed data store with a single service point which provided people and systems with a one-stop shop for discovering provenance for a GUID. Then if a particular GUID could not be directly resolved, the global provenance store could be consulted, and the resulting information would provide a pointer (or perhaps a series of pointers) indicating how the GUID can now be resolved.
By creating such provenance records and persisting them with as much care as the data, it seems that a system with stability beyond the vagaries of the internet could reasonably be constructed.
To quote from http://www.handle.net:
"The Handle System® is a general purpose distributed information system that provides efficient, extensible, and secure HDL identifier and resolution services for use on networks such as the Internet. It includes an open set of protocols, a namespace, and a reference implementation of the protocols. The protocols enable a distributed computer system to store identifiers, known as handles, of arbitrary resources and resolve those handles into the information necessary to locate, access, contact, authenticate, or otherwise make use of the resources. This information can be changed as needed to reflect the current state of the identified resource without changing its identifier, thus allowing the name of the item to persist over changes of location and other related state information."
Sounds like what Dave is describing is pretty much the Handle system ...
Regards
Rod
On Jun 6, 2007, at 00:46, Donald Hobern wrote:
Yesterday was a vacation here in Denmark - otherwise I'd have responded a little earlier, but I'm glad to see all the comments from others. I thoroughly agree with Kevin, Jason, Rich and Anna. No one here believes that any particular solution is going to be perfect. Our biggest need is consensus and the readiness to get going with a workable solution.
I do recognise the strength of Rod's arguments. Indeed, if I were building some system for integrating data using semantic web technologies, and my only concern was ensuring the efficiency of synchronous connections now, I am sure I would adopt HTTP URIs for the purpose. However I remain convinced (as I've stated before) that the needs of this community do subtly shift the balance in another direction. We are interested in maintaining long-term connections between our objects and have a perspective which goes back hundreds of years. This at least should give us pause over whether we want our specimens to be referenced using identifiers so firmly tied to the Internet of today. More importantly, one of the key drivers right at the beginning of TDWG's consideration of GUIDs was that the community had plenty of experience of URL rot and didn't want to rely on everyone maintaining stable virtual directories on their web servers to preserve the integrity of object identifiers.
Both LSIDs and HTTP URIs could be made to work for us. Both are totally reliant on good practice on the part of data owners. Personally I believe our chances of getting the community to consider, define and apply such practices are enhanced by the identifier technology being something a little more different and distinct than just a "special URL".
Thanks,
Donald
Donald Hobern (dhobern@gbif.org) Deputy Director for Informatics Global Biodiversity Information Facility Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
On Jun 6, 2007, at 12:51 AM, Kevin Richards wrote:
I agree with Jason. It is not the GUID that is the cause of all the problems here - THERE IS NOTHING WRONG WITH LSIDS - we just need to move on and start using them in our own context (or any other suitable GUID - LSIDs are only the recommended GUID, NOT the only premissable GUID).
If it all falls to pieces later on we could just do a search and replace to change all our GUIDs to some other scheme (to quote Bob, just serious).
I agree, it is the RDF/metadata/ontologies that are the key to getting things working well.
Kevin
"Jason Best" jbest@brit.org 06/06/07 8:39 AM >>>
Rod, I've only had a chance to quickly skim the documents you reference, but it seems to me that the alternatives to LSIDs don't necessarily make the issues with which we are wrestling go away. We still need to decide WHAT a URI references - is it the metadata, the physical object etc? URIs don't explicitly require persistance, while LSIDs do so I see that as a positive for adopting a standard GUID that is explicit in that regard. I think the TDWG effort to spec an HTTP proxy for LSIDs makes it clear that the technical hurdles of implementing an LSID resolver (SVR records, new protocol, client limitations etc) are a bit cumbersome, but I don't think the underlying concept is fatally flawed. In reading these discussions, I'm starting to believe/ understand that RDF may hold the key, regardless of the GUID that is implemented. Now I have to go read up more on RDF to see if my new-found belief has merit! ;)
Jason
From: Roderic Page [mailto:r.page@bio.gla.ac.uk] Sent: Tuesday, June 05, 2007 2:10 PM To: Chuck Miller Cc: Bob Morris; Kevin Richards; tdwg-guid@lists.tdwg.org; WEITZMAN@si.edu; Jason Best Subject: Re: [tdwg-guid] First step in implementing LSIDs?[Scanned]
Maybe it's time to bite the bullet and consider the elephant in the room -- LSIDs might not be what we want. Markus Döring sent some nice references to the list in April, which I've repeated below, there is also http://dx.doi.org/10.1109/MIS.2006.62 .
I think the LSID debate is throwing up issues which have been addressed elsewhere (e.g., identifiers for physical things versus digital records), and some would argue have been solved to at least some people's satisfaction.
LSIDs got us thinking about RDF, which is great. But otherwise I think they are making things more complicated than they need to be. I think this community is running a grave risk of committing to a technology that nobody else takes that seriously (hell, even the http://lsid.sourceforge.net/ web site is broken).
The references posted by Markus Döring were:
(1) http://www.dfki.uni-kl.de/dfkidok/publications/TM/07/01/tm-07-01.pdf "Cool URIs for the Semantic Web" by Leo Sauermann DFKI GmbH, Richard Cyganiak Freie Universität Berlin (D2R author), Max Völkel FZI Karlsruhe. The authors of this document come from the semantic web community and discuss what kind of URIs should be used for RDF resources.
(2) http://www.w3.org/2001/tag/doc/URNsAndRegistries-50 This one here is written by the W3C and addresses the questions "When should URNs or URIs with novel URI schemes be used to name information resources for the Web?" The answers given are "Rarely if ever" and "Probably not". Common arguments in favor of such novel naming schemas are examined, and their properties compared with those of the existing http: URI scheme.
Regards
Rod
tdwg-guid mailing list tdwg-guid@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-guid
---------------------------------------- Professor Roderic D. M. Page Editor, Systematic Biology DEEB, IBLS Graham Kerr Building University of Glasgow Glasgow G12 8QP United Kingdom
Phone: +44 141 330 4778 Fax: +44 141 330 2792 email: r.page@bio.gla.ac.uk web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html iChat: aim://rodpage1962 reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org Search for taxon names: http://darwin.zoology.gla.ac.uk/~rpage/portal/ Find out what we know about a species: http://ispecies.org Rod's rants on phyloinformatics: http://iphylo.blogspot.com Rod's rants on ants: http://semant.blogspot.com
Hi all,
First I wanted to say that I second what Jason, Kevin, and Rich have said earlier in this thread. The issues we are discussing here are mainly about metadata modeling and they are relatively independent of our choice of identifying scheme. If we comply with the LSID HTTP proxy recommendations we proposed, then LSID and HTTP URLs would be almost equivalent. I say almost equivalent because of the arguments Donald put forward in his last message.
I don't believe that the Handle system would be a suitable identifying scheme for our community, however. That is simply because the Handle System has its own proprietary system for performing content negotiation that is completely incompatible with the standard WWW and Semantic Web applications.
In the Handle system, each handle has a set of values assigned to it. These values may be different representations of a digital resource, such as an HTML page, an RDF metadata record, or even a PDF file, or any other types that a community agrees upon. Each value is accessed by an index and a type. In our case types may be mapped to mime-types, but this mapping is only recognized by clients that are aware of the handle protocol. See section 3.1 of the informational RFC 3651 (http://www.handle.net/rfc/rfc3651.html) for more information about "content negotiation" in the Handle System.
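To make the index-and-type model concrete, here is a minimal Python sketch of a handle's value set. It is only an illustration of the description above, not the Handle System API: the class, the function, the type labels `EXAMPLE_RDF` and `EXAMPLE_PDF`, and the example.org locations are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HandleValue:
    index: int  # position of the value within the handle's value set
    type: str   # community-defined type label (hypothetical labels below)
    data: str   # the value itself, here a location of one representation


def find_value(values: List[HandleValue], wanted_type: str) -> Optional[HandleValue]:
    """Return the lowest-index value of the requested type, or None if absent."""
    matches = [v for v in values if v.type == wanted_type]
    return min(matches, key=lambda v: v.index) if matches else None


# Hypothetical handle carrying three representations of one resource.
example_values = [
    HandleValue(1, "URL", "http://example.org/record/42.html"),
    HandleValue(2, "EXAMPLE_RDF", "http://example.org/record/42.rdf"),
    HandleValue(3, "EXAMPLE_PDF", "http://example.org/record/42.pdf"),
]
```

A client that knows the community's type labels can ask for `EXAMPLE_RDF`, but a generic web client that only speaks in media types such as `application/rdf+xml` has nothing to match against unless someone maintains the mapping for it.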
The trouble is that this proprietary scheme is a brick wall for standard WWW and Semantic Web applications. These applications have no way of getting to the right object representation using standard content negotiation and thus the scheme can't be used to represent digital or real objects effectively. In fact, these applications won't know how to ask for different object representations and will then get whatever is defined as the default representation, which can be different for distinct handles. In some cases it can be a PDF, in other cases it can be an HTML page asking for payment to see the full article, and so on.
For that reason alone the Handle System is not a feasible identifying scheme for our community. Again, only LSID (with the HTTP Proxy recommendations) and HTTP URLs present content negotiation mechanisms that are interoperable with standard WWW and SW applications.
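For readers less familiar with standard content negotiation, the following Python sketch shows roughly what a server or proxy does with an HTTP Accept header. It is deliberately simplified (it ignores partial wildcards such as `text/*` and the specificity rules of the HTTP specification), and the function names and the example paths used with it are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, most preferred first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # the default quality value when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep the order the client sent.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(header: str, available: Dict[str, str]) -> Optional[str]:
    """Pick the representation that best matches the client's preferences."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in available:
            return available[media_type]
        if media_type == "*/*" and available:
            # Fall back to the first representation the server lists.
            return next(iter(available.values()))
    return None
```

With `available = {"text/html": "/record/42.html", "application/rdf+xml": "/record/42.rdf"}`, a Semantic Web client sending `application/rdf+xml, text/html;q=0.5` gets the RDF record, while a browser sending `text/html` gets the HTML page.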
Regards,
Ricardo
Ricardo,
I think your arguments pretty much apply to LSIDs as well. By themselves, they don't play ball with the WWW or the Semantic Web.
For LSIDs we need a proxy that understands SOAP, can talk to the DNS, read WSDL files, and then do an HTTP look-up. You only get LSIDs to play ball by using a proxy that plays ball.
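As a rough sketch of the chain of steps just described, the Python below lists what a resolver or proxy has to do for a given LSID. It performs no network calls. The `_lsid._tcp.` prefix for the DNS SRV query reflects my reading of the LSID resolution specification and should be checked against it; the function names and the example LSID are hypothetical.

```python
from typing import List


def lsid_authority(lsid: str) -> str:
    """Extract the authority (third component) from an LSID."""
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    return parts[2]


def srv_query_name(lsid: str) -> str:
    """Name to look up in DNS (SRV record) to find the authority's server."""
    return f"_lsid._tcp.{lsid_authority(lsid).lower()}"


def resolution_plan(lsid: str) -> List[str]:
    """Describe, in order, the steps a resolver or proxy would carry out."""
    return [
        f"DNS SRV lookup for {srv_query_name(lsid)} to find the authority's host and port",
        "read the WSDL published by that authority to discover its data and metadata services",
        f"call the metadata service (SOAP or HTTP) for {lsid}",
    ]
```

For example, `srv_query_name("urn:lsid:example.org:specimens:42")` gives `_lsid._tcp.example.org`. A plain HTTP URI needs only the last of the three steps, which is the point being argued here.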
In principle we can do the same sort of thing for Handles (there is code for a proxy servlet at http://www.handle.net/proxy_servlet.html).
I'm not necessarily defending Handles, but I think our choice needs to be well-informed. I still don't think the case for LSIDs has really been made (or, at least, some of the arguments advanced in favour of LSIDs apply equally well, if not better, to other technologies).
On this subject, we are not alone in trying to figure this out, see http://bioguid.blogspot.com/2007/06/banff-manifesto.html for some links.
Regards
Rod
On 6 Jun 2007, at 13:21, Ricardo Pereira wrote:
Hi all,
First I wanted to say that I second what Jason, Kevin, and Rich have said earlier in this thread. The issues we are discussing here are mainly about metadata modeling and they are relatively independent of our choice of identifying scheme. If we comply with the LSID HTTP proxy recommendations we proposed, then LSID and HTTP URLs would be almost equivalent. I say almost equivalent because of the arguments Donald put forward in his last message.
I don't believe that the Handle system would be a suitable identifying scheme for our community, however. That is simply because the Handle System has its own proprietary system for performing content negotiation that is completely incompatible with the standard WWW and Semantic Web applications.
In the Handle system, each handle has a set of values assigned to it. These values may be different representations of a digital resource, such as an HTML page, an RDF metadata record, or even a PDF file, or any other types that a community agrees upon. Each value is accessed by an index and a type. In our case types may be mapped to mime-types, but this mapping is only recognized by clients that are aware of the handle protocol. See section 3.1 of the informational RFC 3651 (http://www.handle.net/rfc/rfc3651.html) for more information about "content negotiation" in the Handle System.
The trouble is that this proprietary scheme is a brick wall for standard WWW and Semantic Web applications. These applications have no way of getting to the right object representation using standard content negotiation and thus the scheme can't be used to represent digital or real objects effectively. In fact, these applications won't know how to ask for different object representations and will then get whatever is defined as the default representation, which can be different for distinct handles. In some cases it can be a PDF, in other cases it can be an HTML page asking for payment to see the full article, and so on.
For that reason alone the Handle System is not a feasible identifying scheme for our community. Again, only LSID (with the HTTP Proxy recommendations) and HTTP URLs present content negotiation mechanisms that are interoperable with standard WWW and SW applications.
Regards,
Ricardo
Roderic Page wrote:
Ricardo,
I think your arguments pretty much apply to LSIDs as well. By themselves, they don't play ball with the WWW or the Semantic Web.
For LSIDs we need a proxy that understands SOAP, can talk to the DNS, read WSDL files, and then do an HTTP look-up. You only get LSIDs to play ball by using a proxy that plays ball.
I agree. That's why we are putting forward the LSID HTTP proxy recommendations (http://wiki.tdwg.org/twiki/bin/view/GUID/LsidHttpProxyUsageRecommendation). And there will be at least one LSID proxy (that at http://lsid.tdwg.org/) that will play ball pretty soon. That proxy does all that you said, but it just doesn't perform the content-negotiation bit yet. But I'm currently working on that.
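A minimal Python sketch of the proxy idea follows: the LSID stays the permanent name, and an HTTP URL is derived from it only when something needs to be dereferenced. The URL layout used here (proxy base followed directly by the bare LSID) is an assumption for illustration; the recommendation page linked above is the authority on the actual form. The function names and the example LSID are hypothetical.

```python
PROXY_BASE = "http://lsid.tdwg.org/"  # the proxy mentioned in this thread


def to_proxy_url(lsid: str, proxy_base: str = PROXY_BASE) -> str:
    """Wrap an LSID in an HTTP URL that ordinary web clients can dereference."""
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError(f"not an LSID: {lsid!r}")
    return proxy_base.rstrip("/") + "/" + lsid


def from_proxy_url(url: str, proxy_base: str = PROXY_BASE) -> str:
    """Recover the bare LSID from a proxied URL."""
    prefix = proxy_base.rstrip("/") + "/"
    if not url.startswith(prefix):
        raise ValueError(f"URL does not start with {prefix!r}")
    lsid = url[len(prefix):]
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError(f"no LSID found in {url!r}")
    return lsid
```

Because the LSID can be recovered from the proxied form, switching to a different proxy later is a matter of changing `proxy_base`; the identifier stored against each object does not change.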
In principle we can do the same sort of thing for Handles (there is code for a proxy servlet at http://www.handle.net/proxy_servlet.html).
Only if handle types fully matched the standard WWW content types. They could match if we defined handle types for our own community, but they won't ever match with the types defined by other communities like DOI and others using Handles.
On the other hand, the LSID spec allows us to implement standard content negotiation seamlessly because the semantics of the argument *accepted_formats* in the LSID getMetadata call is appropriate for that purpose.
I'm not necessarily defending Handles, but I think our choice needs to be well-informed. I still don't think the case for LSIDs has really been made (or, at least, some of the arguments advanced in favour of LSIDs apply equally well, if not better, to other technologies).
I agree with you on this. The case for LSIDs wasn't strong enough because the original proposal doesn't integrate well with HTTP. That is exactly why we are putting forward the LSID HTTP proxy proposal. It was the missing point in the LSID case.
In any case, I suppose we will talk more about this in the near future.
Cheers,
Ricardo
This all begs the question, is there anything LSIDs give us that HTTP URIs don't?
If we go to all this trouble to make LSIDs behave as if they were HTTP URIs, doesn't this tell us something...?
Regards
Rod
On 6 Jun 2007, at 14:13, Ricardo Pereira wrote:
Roderic Page wrote:
Ricardo,
I think your arguments pretty much apply to LSIDs as well. By themselves, they don't play ball with the WWW or the Semantic Web.
For LSIDs we need a proxy that understands SOAP, can talk to the DNS, read WSDL files, and then do an HTTP look-up. You only get LSIDs to play ball by using a proxy that plays ball.
I agree. That's why we are putting forward the LSID HTTP proxy recommendations (http://wiki.tdwg.org/twiki/bin/view/GUID/LsidHttpProxyUsageRecommendation). And there will be at least one LSID proxy (that at http://lsid.tdwg.org/) that will play ball pretty soon. That proxy does all that you said, but it just doesn't perform the content-negotiation bit yet. But I'm currently working on that.
In principle we can do the same sort of thing for Handles (there is code for a proxy servlet at http://www.handle.net/proxy_servlet.html).
Only if handle types fully matched the standard WWW content types. They could match if we defined handle types for our own community, but they won't ever match with the types defined by other communities like DOI and others using Handles.
On the other hand, the LSID spec allows us to implement standard content negotiation seamlessly because the semantics of the argument *accepted_formats* in the LSID getMetadata call is appropriate for that purpose.
I'm not necessarily defending Handles, but I think our choice needs to be well-informed. I still don't think the case for LSIDs has really been made (or, at least, some of the arguments advanced in favour of LSIDs apply equally well, if not better, to other technologies).
I agree with you on this. The case for LSIDs wasn't strong enough because the original proposal doesn't integrate well with HTTP. That is exactly why we are putting forward the LSID HTTP proxy proposal. It was the missing point in the LSID case.
In any case, I suppose we will talk more about this in the near future.
Cheers,
Ricardo
Rod,
I know we disagree on this one, and I certainly don't want to force the issue against everyone else's better judgment but I think the critical issue is that we need to get moving with trying something seriously and for real. Switching technology later should not be too painful once we get the basic principles right (and the basic principles are the same quite independent of technology).
Some quick points.
1. In answer to your latest question, the (non-technical, more social) issues I mentioned in my previous message are the key reasons I would give for choosing something other than HTTP URIs. We are dealing with a wider community than just IT professionals and need to make a clear separation between assigning an identifier and putting up a web page.
2. LSIDs occupy a space (in my thinking) somewhere between the open, easy, hard-to-control world of HTTP URIs, and the potentially over-centralised administratively heavy world of Handles and DOIs.
3. If we go with LSIDs and subsequently decide we should just use HTTP URIs, we can do so immediately and easily using a proxy like the one TDWG has set up.
4. If we go with HTTP URIs and subsequently decide we should use something like LSIDs, it is likely to be significantly harder to clean up the mess.
Right now we are in a position where a good number of projects have converged on giving LSIDs a serious try. I honestly believe we should build on this and start learning how to use GUIDs in the real world. We can all debate options forever (and go around in circles: "LSIDs are better than URIs because..." - "Handles are better than LSIDs because..." - "URIs are better than Handles because..."), but we must get down to providing some working solutions.
Thanks as ever,
Donald
------------------------------------------------------------
Donald Hobern (dhobern@gbif.org)
Deputy Director for Informatics
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
------------------------------------------------------------
On Jun 6, 2007, at 3:22 PM, Roderic Page wrote:
This all begs the question, is there anything LSIDs give us that HTTP URIs don't?
If we go to all this trouble to make LSIDs behave as if they were HTTP URIs, doesn't this tell us something...?
Regards
Rod
Dear Donald,
I agree things need to be built, and on that level having LSIDs would be a major improvement on where we are now (i.e., virtually nothing has a GUID, excepting taxonomic names and a fraction of the literature).
The reality is that whatever we do, there will be a mixed environment of LSIDs, Handles, DOIs, HTTP URIs, etc., and rather messy signals from various groups (internal and external) about what approach is best.
I don't want discussion to impede progress on getting things done, so I'll keep schtum ... wish there was a smiley for suppression of intense fidgeting.
Regards
Rod
On 6 Jun 2007, at 14:46, Donald Hobern wrote:
Rod,
I know we disagree on this one, and I certainly don't want to force the issue against everyone else's better judgment but I think the critical issue is that we need to get moving with trying something seriously and for real. Switching technology later should not be too painful once we get the basic principles right (and the basic principles are the same quite independent of technology).
Some quick points.
1. In answer to your latest question, the (non-technical, more social) issues I mentioned in my previous message are the key reasons I would give for choosing something other than HTTP URIs. We are dealing with a wider community than just IT professionals and need to make a clear separation between assigning an identifier and putting up a web page.
2. LSIDs occupy a space (in my thinking) somewhere between the open, easy, hard-to-control world of HTTP URIs, and the potentially over-centralised administratively heavy world of Handles and DOIs.
3. If we go with LSIDs and subsequently decide we should just use HTTP URIs, we can do so immediately and easily using a proxy like the one TDWG has set up.
4. If we go with HTTP URIs and subsequently decide we should use something like LSIDs, it is likely to be significantly harder to clean up the mess.
Right now we are in a position where a good number of projects have converged on giving LSIDs a serious try. I honestly believe we should build on this and start learning how to use GUIDs in the real world. We can all debate options forever (and go around in circles: "LSIDs are better than URIs because..." - "Handles are better than LSIDs because..." - "URIs are better than Handles because..."), but we must get down to providing some working solutions.
Thanks as ever,
Donald
Donald Hobern (dhobern@gbif.org) Deputy Director for Informatics Global Biodiversity Information Facility Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
Donald,
I agree with you that the most important thing right now is to get something working and stop debating. Perfection being the enemy of the good and all that. There are new projects in the works that will be forced to go their own way if TDWG doesn't choose a standard, if they have not already. The worst thing of all would be to wind up with no interoperability because we couldn't stop debating.
If LSID is to continue to be the path forward and it is accepted as a standard by the TDWG acceptance process, then I am back to my original appeal:
We need a clear, specific, unambiguous, usable definition of what an LSID is - that is, data, metadata, data+metadata, etc. Perhaps we could make "The Bratislava Declaration" or something similar: a decision, in writing, that could be referred to anytime the "LSID isn't" debate starts up again.
Chuck
Chuck Miller
VP-IT & CIO
Missouri Botanical Garden
St. Louis, MO, USA
________________________________
From: tdwg-guid-bounces@lists.tdwg.org [mailto:tdwg-guid-bounces@lists.tdwg.org] On Behalf Of Donald Hobern Sent: Wednesday, June 06, 2007 8:47 AM To: Roderic Page Cc: tdwg-guid@lists.tdwg.org Subject: Re: [tdwg-guid] Handle System considered not interoperable with standard WWW and SW applications
Rod,
I know we disagree on this one, and I certainly don't want to force the issue against everyone else's better judgment but I think the critical issue is that we need to get moving with trying something seriously and for real. Switching technology later should not be too painful once we get the basic principles right (and the basic principles are the same quite independent of technology).
Some quick points.
1. In answer to your latest question, the (non-technical, more social) issues I mentioned in my previous message are the key reasons I would give for choosing something other than HTTP URIs. We are dealing with a wider community than just IT professionals and need to make a clear separation between assigning an identifier and putting up a web page.
2. LSIDs occupy a space (in my thinking) somewhere between the open, easy, hard-to-control world of HTTP URIs, and the potentially over-centralised administratively heavy world of Handles and DOIs.
3. If we go with LSIDs and subsequently decide we should just use HTTP URIs, we can do so immediately and easily using a proxy like the one TDWG has set up.
4. If we go with HTTP URIs and subsequently decide we should use something like LSIDs, it is likely to be significantly harder to clean up the mess.
Right now we are in a position where a good number of projects have converged on giving LSIDs a serious try. I honestly believe we should build on this and start learning how to use GUIDs in the real world. We can all debate options forever (and go around in circles: "LSIDs are better than URIs because..." - "Handles are better than LSIDs because..." - "URIs are better than Handles because..."), but we must get down to providing some working solutions.
Thanks as ever,
Donald
------------------------------------------------------------
Donald Hobern (dhobern@gbif.org)
Deputy Director for Informatics
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
------------------------------------------------------------
On Jun 6, 2007, at 3:22 PM, Roderic Page wrote:
This all begs the question, is there anything LSIDs give us that HTTP URIs don't?
If we go to all this trouble to make LSIDs behave as if they were HTTP URIs, isn't this tell us something...?
Regards
Rod
On 6 Jun 2007, at 14:13, Ricardo Pereira wrote:
Roderic Page wrote:
Ricardo,
I think your arguments pretty much apply to LSIDs as well. By themselves, they don't play ball with the WWW or the Semantic Web.
For LSIDs we need a proxy that understands SOAP, can talk to the DNS, read WSDL files, and then do an HTTP look-up. You only get LSIDs to play ball by using a proxy that plays ball.
I agree. That's why we are putting forward the LSID HTTP proxy recommendations (http://wiki.tdwg.org/twiki/bin/view/GUID/LsidHttpProxyUsageRecommendati on). And there will be at least one LSID proxy (that at http://lsid.tdwg.org/) that will play ball pretty soon. That proxy all that you said, just doesn't perform the content-negotiation bit yet. But I'm currently working on that.
In principle we can do the same sort of thing for Handles (there is code for a proxy servlet at http://www.handle.net/proxy_servlet.html).
Only if handle types fully matched the standard WWW content types. They could match if we defined handle types for our own community, but they won't ever match with the types defined by other communities like DOI and others using Handles.
On the other hand, LSID spec allows us to implement standard content negotiation seamlessly because the semantics of the argument *accepted_formats* in the LSID getMetadata call is appropriate for that purpose.
I'm not necessarily defending Handles, but I think our choice needs to be well-informed. I still don't think the case for LSIDs has really been made (or, at least, some of the arguments advanced in favour of LSIDs apply equally well, if not better, to other technologies).
I agree with you on this. The case for LSIDs wasn't strong enough because the original proposal doesn't integrate well with HTTP. That is exactly why we are putting forward the LSID HTTP proxy proposal. It was the missing point in the LSID case.
In any case, I suppose we will talk more about this in the near future.
Cheers,
Ricardo
_______________________________________________
tdwg-guid mailing list
tdwg-guid@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-guid
------------------------------------------------------------------------ ----------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom
Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page@bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
iChat: aim://rodpage1962
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic
Biologists Website: http://systematicbiology.org
Search for taxon names: http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species: http://ispecies.org
Rod's rants on phyloinformatics: http://iphylo.blogspot.com
Rod's rants on ants: http://semant.blogspot.com
Roderic Page wrote:
This all begs the question, is there anything LSIDs give us that HTTP URIs don't?
To me, the key difference between plain HTTP URIs and LSIDs with the proxy proposal is that LSIDs name objects with pure identifiers (the capital N in URN), while HTTP URIs mix names with locations.
If we use the LSID specification with the proxy proposal, the identifier associated permanently with the object is the pure LSID in the form:
urn:lsid:authority.org:namespace:objectId
which is completely independent of transfer protocol and thus may remain associated with the object for hundreds of years. If HTTP or DNS or whatever goes away, our grandchildren can still rebuild the links between ids and objects using whatever technologies are available in year 2207. More importantly, such a hypothetical new solution would likely be elegant because that particular URN was solely designed to name objects.
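As a small illustration that the identifier itself carries no transfer protocol, here is a sketch that splits an LSID of the form shown above into its parts. The class and function names are invented for this example; the optional trailing revision component is part of the LSID syntax but is not discussed in this thread.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split urn:lsid:authority:namespace:objectId[:revision] into parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    # The revision is optional; an empty one is treated as absent.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(authority, namespace, object_id, revision)
```

Nothing in the parsed result says how to fetch the object; that is left entirely to whatever resolution mechanism is in use.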
On the other hand, if we use HTTP URIs and that eventually goes away, we would need to come up with a hack to keep the ids associated with the objects. Also, HTTP URIs were originally designed to locate resources (the last T on HTTP), not to name them. So, in my opinion, using HTTP to name objects is a bit of a hack (i.e. not very elegant). You end up trying to dereference IDs that were not meant to be dereferenced, which only contributes to link rot.
Another point is in relation to link rot. Although the article "Cool URIs Don't Change" provides very useful ideas about how to make HTTP URIs permanent (which I literally use on every link on the TDWG website), they don't completely solve the link rot problem. We still have to deal with reorganizations in our web servers and managing a stack of Apache rewrite rules is no fun.
LSIDs, on the other hand, solve part of the problem, at least the part associated with the path portion of the HTTP URI. LSIDs do, however, present the same persistence problems associated with DNS. To reduce that problem, TDWG offers DNS entries of the form *.lsid.tdwg.org for LSID authorities in our domain.
So, in my opinion, these are strong reasons not to use only HTTP URIs to name our objects.
Cheers,
Ricardo
If we go to all this trouble to make LSIDs behave as if they were HTTP URIs, doesn't this tell us something...?
Regards
Rod
On 6 Jun 2007, at 14:13, Ricardo Pereira wrote:
Roderic Page wrote:
Ricardo,
I think your arguments pretty much apply to LSIDs as well. By themselves, they don't play ball with the WWW or the Semantic Web.
For LSIDs we need a proxy that understands SOAP, can talk to the DNS, read WSDL files, and then do an HTTP look-up. You only get LSIDs to play ball by using a proxy that plays ball.
I agree. That's why we are putting forward the LSID HTTP proxy recommendations (http://wiki.tdwg.org/twiki/bin/view/GUID/LsidHttpProxyUsageRecommendation). And there will be at least one LSID proxy (that at http://lsid.tdwg.org/) that will play ball pretty soon. That proxy does all that you said; it just doesn't perform the content-negotiation bit yet. But I'm currently working on that.
In principle we can do the same sort of thing for Handles (there is code for a proxy servlet at http://www.handle.net/proxy_servlet.html).
Only if handle types fully matched the standard WWW content types. They could match if we defined handle types for our own community, but they won't ever match with the types defined by other communities like DOI and others using Handles.
On the other hand, LSID spec allows us to implement standard content negotiation seamlessly because the semantics of the argument *accepted_formats* in the LSID getMetadata call is appropriate for that purpose.
I'm not necessarily defending Handles, but I think our choice needs to be well-informed. I still don't think the case for LSIDs has really been made (or, at least, some of the arguments advanced in favour of LSIDs apply equally well, if not better, to other technologies).
I agree with you on this. The case for LSIDs wasn't strong enough because the original proposal doesn't integrate well with HTTP. That is exactly why we are putting forward the LSID HTTP proxy proposal. It was the missing point in the LSID case.
In any case, I suppose we will talk more about this in the near future.
Cheers,
Ricardo
Donald, Ricardo,
I fully agree with you and Rod that we need to go forward as fast as possible and don't need yet another discussion. If everyone else is pleased with LSIDs I will keep silent, promised. LSIDs are better than nothing. But if we used URLs we could go a lot faster, because it's so much easier. Especially now, after all the lessons learned.
Still some comments below inline Markus
On 06.06.2007, at 16:00, Ricardo Pereira wrote:
Roderic Page wrote:
This all begs the question, is there anything LSIDs give us that HTTP URIs don't?
To me, the key difference between plain HTTP URIs and LSIDs with the proxy proposal is that LSIDs name objects with pure identifiers (the capital N in URN), while HTTP URIs mix names with locations.
If we use the LSID specification with the proxy proposal, the identifier associated permanently with the object is the pure LSID in the form:
urn:lsid:authority.org:namespace:objectId
which is completely independent of transfer protocol and thus may remain associated with the object for hundreds of years. If HTTP or DNS or whatever goes away, our grandchildren can still rebuild the links between ids and objects using whatever technologies are available in year 2207. More importantly, such a hypothetical new solution would likely be elegant because that particular URN was solely designed to name objects.
How does LSID resolution work without DNS? If the DNS-less LSID is just about persistent global *naming* and not *resolution*, then we can use UUIDs and be happy.
On the other hand, if we use HTTP URIs and that eventually goes away, we would need to come up with a hack to keep the ids associated with the objects. Also, HTTP URIs were originally designed to locate resources (the last T on HTTP), not to name them. So, in my opinion, using HTTP to name objects is a bit of a hack (i.e. not very elegant). You end up trying to dereference IDs that were not meant to be dereferenced, which only contributes to link rot.
Another point is in relation to link rot. Although the article "Cool URIs Don't Change" provides very useful ideas about how to make HTTP URIs permanent (which I literally use on every link on the TDWG website), they don't completely solve the link rot problem. We still have to deal with reorganizations in our web servers and managing a stack of Apache rewrite rules is no fun. LSIDs on the other hand solve part of the problem, at least those associated with path portion of the HTTP URI. LSIDs however present the same persistence problems associated with DNS. To reduce that problem, TDWG offers DNS entries of the form *.lsid.tdwg.org for LSID authorities in our domain.
I can't see any big difference between LSID based redirection and http redirection a la PURL. What makes LSIDs easier to maintain for the final provider?
So, in my opinion, these are strong reasons not to use only HTTP URIs to name our objects.
Cheers,
Ricardo
If we go to all this trouble to make LSIDs behave as if they were HTTP URIs, doesn't this tell us something...?
Regards
Rod
Markus,
We really value your opinions, as well as those of everyone else. We definitely don't want to keep you silent or force you to use a technology that you think is inappropriate. However, there is a clear divide between LSID and HTTP URI identification schemes (maybe even Handles?!?), and we need to get past that issue to progress.
If we don't reach consensus about which single identifying scheme we should all use, I see no other alternative than letting people use the identifying schemes of their choice and let the systems fight it out (come again??!?! what did I say?!?!)
Well, this idea may seem radical at first, but really the implications are not that relevant as long as:
i) we only consider LSIDs using HTTP proxies and a (yet to be specified) identification scheme based on HTTP URIs.
ii) we represent objects in the same way regardless of the identification scheme in use.
In either case (LSID with proxy and HTTP URI), the clients will always come across HTTP URIs when navigating through linked data, and thus will always use HTTP to dereference the URIs and get to the object metadata. The metadata in turn will be expressed in the same way regardless of which id scheme you use (as per rule ii above). Thus, the identifying scheme used to identify an object will be almost completely transparent to clients.
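A minimal sketch of what that transparency could look like on the client side: whichever scheme the identifier uses, the client ends up with an HTTP URI to dereference. The proxy host is the one named in this thread; forming the proxy URL by appending the LSID to it is an assumption made for illustration, and the function name is hypothetical.

```python
# Proxy named in the thread; the "base + LSID" URL layout is an assumption.
PROXY_BASE = "http://lsid.tdwg.org/"


def dereferenceable_uri(identifier: str, proxy_base: str = PROXY_BASE) -> str:
    """Return an HTTP URI a client can fetch for either identifier scheme."""
    lowered = identifier.lower()
    if lowered.startswith("urn:lsid:"):
        # LSIDs are reached through the HTTP proxy.
        return proxy_base + identifier
    if lowered.startswith(("http://", "https://")):
        # HTTP URIs are already dereferenceable as they stand.
        return identifier
    raise ValueError(f"unsupported identifier scheme: {identifier!r}")
```

Handles are deliberately left out here, since this sketch only covers the two schemes named in rule i).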
There will be a few details that will make things a little different in one case or the other (only LSIDs will let you get data for an object, and so on). Those will be the main reasons why one would choose one scheme over the other. In 200 years' time, we will know who was right and who was wrong. Until then, we all go our separate ways while keeping our systems interoperable.
We are now drafting an LSID Applicability Statement in which we specify the rules that must be followed by anyone using LSIDs in our community. That document, however, won't say that anyone in our community must use LSIDs. So if you want to propose an alternative identifying scheme, I suggest you do the same: draft a detailed specification for your new scheme, submit it to the TDWG standards track and put it up for review by the community. Otherwise, if we don't have a detailed spec for HTTP URIs to refer to, we will always get stuck on those what-if questions. And please follow the 2 rules I outlined above so that we keep both systems interoperable.
The same applies to handles. If anyone wants to use handles, he or she must draft a detailed applicability statement defining rules for using handles in our domain and submit it for review. But again, I would advise against handles because they won't be as interoperable with the other two schemes for the reasons I outlined in an earlier message (incompatible content-types).
I hope this helps.
Best regards,
Ricardo
Markus Döring wrote:
Finally I found some time to go through this lively thread. I hope my post is not already outdated by someone else's ;)
Apart from the "what gets a URI" discussion, some people have expressed doubts about LSIDs. As I have had a number of discussions lately with people doubting that LSIDs are good for our purposes, I would really like to question the TDWG decision to go with LSIDs and start yet another comparison with plain http paired with redirection, content negotiation and guidelines for using URLs. I strongly feel that we should avoid new protocol schemes if we do not have *very* good reasons. I will use the term URL for now to refer to any http-based identification scheme, whether it is PURLs, our own system or something else.
The LSID specification already tells us how to deal with persistent identifiers. It is an agreement that we would have to make for URLs. As the "what gets an URI" confusion has shown those guidelines are needed in any case, no matter if we take up LSIDs or not. Even LSIDs can be used with or without versioning and a lot depends on agreements in regard to the RDF behind it. So essentially we will have to come up with our own best practices anyway.
LSID and HTTP both are based on DNS to guarantee global uniqueness and even more important to resolve them. They both derive their persistence from the promise of the service provider that the domain name is kept forever and a server is running. If the domain is lost in 50 years *both* systems are broken.
LSIDs and the semantic web don't play nicely together per se, because the semweb de facto requires plain http. From what I've read, the suggestion is to use an LSID proxy that maps URLs into LSIDs. The problem then is that all RDF statements must use the proxy URL instead of the real LSID (otherwise you, or a reasoner, don't know that the statement about the LSID and the statement about the proxy URL are about the identical resource), so essentially no one is using the LSIDs; they are just kept as an additional "persistent" ID. To overcome this problem and to be able to use both the LSID and the proxy URL, it is suggested to use an owl:sameAs statement within the LSID metadata to link the proxy URL with the LSID, so applications can use this to understand we are talking about the same thing. This gets pretty complex already, and I would be surprised if there are many applications out there that understand this.
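To show what the owl:sameAs link involves, here is a minimal sketch: one function writes the statement as an N-Triples line, and another treats a set of such statements as an equivalence so that an application can map every alias to one representative identifier. The function names and the choice of the lexicographically smallest alias as representative are assumptions for illustration; real reasoners do considerably more.

```python
from typing import Dict, Iterable, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(proxy_url: str, lsid: str) -> str:
    """Format one owl:sameAs statement as an N-Triples line."""
    return f"<{proxy_url}> <{OWL_SAME_AS}> <{lsid}> ."


def canonical_ids(same_as_pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map every identifier to one representative of its sameAs group."""
    parent: Dict[str, str] = {}

    def find(item: str) -> str:
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    for left, right in same_as_pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Keep the lexicographically smaller root so results are stable.
            keep, drop = sorted((root_left, root_right))
            parent[drop] = keep
    return {item: find(item) for item in list(parent)}
```

An application that merges statements after this mapping sees the LSID and its proxy URL as one resource, which is the extra step plain URLs would not need.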
Why not apply the owl:sameAs trick to URLs once we find that http is dead (just in case we can't do a global search-and-replace)? We could stay with simple URLs now, write simple software fast and get into the complex mess at a much later stage when we know we really need to - and not already from the start.
A very often raised requirement for the technology is also that it should last for hundreds of years. I doubt anyone can predict in that time period. But a very good reason to go with http is that there is a *lot* of data bound to them and if the world decides there is something better than http, there will be many tools to migrate your data. I feel much more safe trusting the entire web community than eventually getting out of the LSID trap by ourselves.
Imagine if all the different research communities decided to use their own resource identification scheme: how bad would data integration get? We already have to deal with DOIs, but imagine chemists, geologists, meteorologists and physicists all choosing their own scheme, just as we are about to issue life science identifiers. Non-http URIs put up barriers to adoption by other communities, so I am confident that our LSIDs will be referenced much, much less than URLs. I can already see all those proxy URLs in GenBank and the like, not the LSIDs.
And finally yet another link to some good discussion in the W3C semweb lifescience list: http://lists.w3.org/Archives/Public/public-semweb-lifesci/2005Mar/0004.html
-- Markus
On 06.06.2007, at 09:21, Dave Vieglais wrote:
This discussion has been very interesting reading, and though I agree with Donald's comments, I find myself coming to a different conclusion, leaning towards HTTP URIs as a preferable scheme. The reasons are simple - HTTP has been around for a long time, it is widely implemented, and mechanisms for implementing robust services with that protocol are pretty well sorted out - and really there is nothing to stop implementation of the same functionality exhibited by LSIDs using HTTP. As Rod has pointed out, http is widely used for referencing entities within a semantic web type of context, and it seems foolish to ignore the momentum in those technologies as they provide a great deal of the desired functionality for interoperability and interchange of our data. As a result my preference is towards the use of http, primarily because my intents are to integrate data from a much broader community. In the end though, it doesn't really matter which scheme is adopted by TDWG - we will build http resolvers regardless, since they will be necessary for reasons of convenience in order to utilize LSIDs in all but specific, custom built applications.
However, regardless of the scheme used to implement the GUIDs used by this community, it is critical that the identifiers are persistent and useful beyond the lives of whatever services are constructed to resolve them. This implies some provenance information may need to be captured, and I would argue that the use of DNS alone for handling server changes, as utilized by LSIDs, may be insufficient. The only benefit provided by DNS in this context is that it acts as a single source of authority for directing how to locate something (in this case an IP address). What I suspect is really required is a more robust and richer mechanism for discovering and recording provenance. The ideal would be a large, replicated, and distributed data store with a single service point which provided people and systems with a one-stop shop for discovering provenance for a GUID. Then, if a particular GUID could not be directly resolved, the global provenance store could be consulted, and the resulting information would provide a pointer (or perhaps a series of pointers) indicating how the GUID can now be resolved.
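A minimal sketch of the lookup described above, assuming the provenance store can be modelled as a simple mapping from an identifier to the identifier that superseded it. Both dictionaries and the function name are hypothetical stand-ins; no such service is specified in this thread.

```python
from typing import Dict, Optional


def resolve_with_provenance(
    guid: str,
    live: Dict[str, str],
    provenance: Dict[str, str],
) -> Optional[str]:
    """Resolve a GUID directly, or follow provenance pointers until one resolves.

    live maps identifiers that currently resolve to their location;
    provenance maps an identifier to the identifier that replaced it.
    Returns None when the chain ends or loops without reaching a live entry.
    """
    seen = set()
    current = guid
    while current not in seen:
        if current in live:
            return live[current]
        seen.add(current)
        if current not in provenance:
            return None
        current = provenance[current]  # follow the next pointer in the chain
    return None  # pointers formed a cycle
```

The cycle check matters in practice: a store assembled from many providers could easily contain records that point back at each other.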
By creating such provenance records and persisting them with as much care as the data, it seems that a system with stability beyond the vagaries of the internet could reasonably be constructed.
regards, Dave V.
On Jun 6, 2007, at 00:46, Donald Hobern wrote:
Yesterday was a vacation here in Denmark - otherwise I'd have responded a little earlier, but I'm glad to see all the comments from others. I thoroughly agree with Kevin, Jason, Rich and Anna. No one here believes that any particular solution is going to be perfect. Our biggest need is consensus and the readiness to get going with a workable solution.
I do recognise the strength of Rod's arguments. Indeed, if I were building some system for integrating data using semantic web technologies, and my only concern was ensuring the efficiency of synchronous connections now, I am sure I would adopt HTTP URIs for the purpose. However I remain convinced (as I've stated before) that the needs of this community do subtly shift the balance in another direction. We are interested in maintaining long-term connections between our objects and have a perspective which goes back hundreds of years. This at least should give us pause over whether we want our specimens to be referenced using identifiers so firmly tied to the Internet of today. More importantly, one of the key drivers right at the beginning of TDWG's consideration of GUIDs was that the community had plenty of experience of URL rot and didn't want to rely on everyone maintaining stable virtual directories on their web servers to preserve the integrity of object identifiers.
Both LSIDs and HTTP URIs could be made to work for us. Both are totally reliant on good practice on the part of data owners. Personally I believe our chances of getting the community to consider, define and apply such practices are enhanced by the identifier technology being something a little more different and distinct than just a "special URL".
Thanks,
Donald
Donald Hobern (dhobern@gbif.org) Deputy Director for Informatics Global Biodiversity Information Facility Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
On Jun 6, 2007, at 12:51 AM, Kevin Richards wrote:
I agree with Jason. It is not the GUID that is the cause of all the problems here - THERE IS NOTHING WRONG WITH LSIDS - we just need to move on and start using them in our own context (or any other suitable GUID - LSIDs are only the recommended GUID, NOT the only permissible GUID).
If it all falls to pieces later on we could just do a search and replace to change all our GUIDs to some other scheme (to quote Bob, just serious).
I agree, it is the RDF/metadata/ontologies that are the key to getting things working well.
Kevin
"Jason Best" jbest@brit.org 06/06/07 8:39 AM >>>
Rod, I've only had a chance to quickly skim the documents you reference, but it seems to me that the alternatives to LSIDs don't necessarily make the issues with which we are wrestling go away. We still need to decide WHAT a URI references - is it the metadata, the physical object etc? URIs don't explicitly require persistence, while LSIDs do, so I see that as a positive for adopting a standard GUID that is explicit in that regard. I think the TDWG effort to spec an HTTP proxy for LSIDs makes it clear that the technical hurdles of implementing an LSID resolver (SRV records, new protocol, client limitations etc) are a bit cumbersome, but I don't think the underlying concept is fatally flawed. In reading these discussions, I'm starting to believe/understand that RDF may hold the key, regardless of the GUID that is implemented. Now I have to go read up more on RDF to see if my new-found belief has merit! ;)
Jason
From: Roderic Page [mailto:r.page@bio.gla.ac.uk] Sent: Tuesday, June 05, 2007 2:10 PM To: Chuck Miller Cc: Bob Morris; Kevin Richards; tdwg-guid@lists.tdwg.org; WEITZMAN@si.edu; Jason Best Subject: Re: [tdwg-guid] First step in implementing LSIDs?[Scanned]
Maybe it's time to bite the bullet and consider the elephant in the room -- LSIDs might not be what we want. Markus Döring sent some nice references to the list in April, which I've repeated below, there is also http://dx.doi.org/10.1109/MIS.2006.62 .
I think the LSID debate is throwing up issues which have been addressed elsewhere (e.g., identifiers for physical things versus digital records), and some would argue have been solved to at least some people's satisfaction.
LSIDs got us thinking about RDF, which is great. But otherwise I think they are making things more complicated than they need to be. I think this community is running a grave risk of committing to a technology that nobody else takes that seriously (hell, even the http://lsid.sourceforge.net/ web site is broken).
The references posted by Markus Döring were:
(1) http://www.dfki.uni-kl.de/dfkidok/publications/TM/07/01/tm-07-01.pdf "Cool URIs for the Semantic Web" by Leo Sauermann (DFKI GmbH), Richard Cyganiak (Freie Universität Berlin, D2R author) and Max Völkel (FZI Karlsruhe). The authors of this document come from the semantic web community and discuss what kind of URIs should be used for RDF resources.
(2) http://www.w3.org/2001/tag/doc/URNsAndRegistries-50 This one here is written by the W3C and addresses the questions "When should URNs or URIs with novel URI schemes be used to name information resources for the Web?" The answers given are "Rarely if ever" and "Probably not". Common arguments in favor of such novel naming schemas are examined, and their properties compared with those of the existing http: URI scheme.
Regards
Rod
I fully agree with Markus, Dave and Rod. I think the only thing that http-based permanent URLs are missing is good communication about the *social contract* associated with them. All LSIDs are giving us is good marketing for a social contract; technically they offer nothing better than http (but a lot of headaches). If we had a good *brand name*, it would suffice to use that brand name in URLs. As with LSIDs, it would draw the attention of humans (managers, software developers) to the fact that a different contract is expected. It may also help machines to guess about this contract (and machines can then easily verify it by pulling the RDF metadata).
Assuming "pgid" for "PermanentGlobalID" is a good brand name (I hope for your suggestions for better ones!!!) we could use:
http://pgid.institution.org/some/collection/1202398
- Or -
http://x.institution.org/pgid/something/collection/1202398
(implying a social contract to use either dns or first element of path to aid in recognition of "brand").
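As a rough illustration of how a machine could guess at the contract, here is a sketch that checks whether a URL carries the brand in either of the two places suggested above: the first DNS label or the first path element. The function name is hypothetical and "pgid" is only the placeholder brand from this message; a positive match is a hint, not proof, and would still need to be verified by pulling the RDF metadata.

```python
from urllib.parse import urlsplit


def carries_brand(url: str, brand: str = "pgid") -> bool:
    """True if the brand is the first DNS label or the first path element."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    first_label = parts.hostname.split(".")[0]
    segments = [segment for segment in parts.path.split("/") if segment]
    first_segment = segments[0] if segments else ""
    return brand.lower() in (first_label.lower(), first_segment.lower())
```

Both example URLs above match, while a URL with the brand buried deeper in the path does not.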
If we spend as much effort as on LSIDs in marketing and educating that all of GBIF/TDWG intends to use such a brand-name in a special way (no re-assignment of IDs, keep resolvable as long as possible, provide RDF metadata if technically capable to do so) we would have it.
Gregor
PS Any ideas for build-your-own brand-names for this GUID contract?
OpenID? BioID?
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19 Tel: +49-30-8304-2220
14195 Berlin, Germany Fax: +49-30-8304-2203
Thinking a bit more about this issue, I've decided that perhaps it doesn't matter quite as much as I thought. I've posted my argument on my blog (http://iphylo.blogspot.com/2007/06/rethinking-lsids-versus-http-uri.html), partly because it has links and a picture. The text is pasted below for the sake of archiving it in this list.
Regards
Rod
The TDWG-GUID mailing list for this month has a discussion of whether TDWG should commit to LSIDs as the GUID of choice. Since the first GUID workshop TDWG has pretty much been going down this route, despite a growing chorus of voices (including mine) that LSIDs are not first class citizens of the Web, and don't play well with the Semantic Web.
Leaving aside political considerations (this stuff needs to be implemented as soon as possible, concerns that if TDWG advocates HTTP URIs people will just treat them as URLs and miss the significance of persistence and RDF, worries that biodiversity will be ghettoised if it doesn't conform to what is going on elsewhere), I think there is a way to resolve this that may keep most people happy (or at least, they could live with it). My perspective is driven by trying to separate the needs of primary data providers from those of application developers, and from issues of digital preservation.
I'll try and spell out the argument below, but to cut to the chase, I will argue
1. A GUID system needs to provide a globally unique identifier for an object, and a means of retrieving information about that object.
2. Any of the current technologies we've discussed (LSIDs, DOIs, Handles) do this (to varying degrees), hence any would do as a GUID.
3. Most applications that use these GUIDs will use Semantic Web tools, and hence will use HTTP URIs.
4. These HTTP URIs will be unique to the application, the GUIDs however will be shared
5. No third party application can serve an HTTP URI that doesn't belong to its domain.
6. Digital preservation will rely on widely distributed copies of data, these cannot have the same HTTP URI.
From this I think that both parties to this debate are right, and we will end up using both LSIDs and HTTP URIs, and that's OK. Application developers will use HTTP URIs, but will use clients that can handle the various kinds of GUIDs. Data providers will use the GUID technology that is easiest for them to get up and running (for specimen this is likely to be LSIDs, for literature some providers may use Handles via DSpace, some may use URLs).
Individual objects get GUIDs
If individual objects get GUIDs, then this has implications for HTTP URIs. If the HTTP URI is the GUID, an object can only be served from one place. It may be cached elsewhere, but that cached copy can't have the same HTTP URI. Any database that makes use of the HTTP URI cannot serve that HTTP URI itself; it needs to refer to it in some way. This being the case, whether the GUID is an HTTP URI or not starts to look a lot less important, because there is only one place we can get the original data from -- the original data provider. Any application that builds on this data will need its own identifier if people are going to make use of that application's output.
Connotea as an example
As a concrete example, consider Connotea. This application uses dereferenceable GUIDs such as DOIs and PubMed ids to retrieve publications. DOIs and PubMed ids are not HTTP URIs, and hence aren't first class citizens of the Web. But Connotea serves its own records as HTTP URIs, and URIs with the prefix "rss" return RDF and hence can be used "as is" by Semantic Web tools such as SPARQL.
If we look at some Connotea RDF, we see that it contains the original DOIs and PubMed ids.
This means that if two Connotea users bookmark the same paper, we could deduce that they are the same paper by comparing the embedded GUIDs. In the same way, we could combine RDF from Connotea and another application (such as bioGUID) that has information on the same paper. Why not use the original GUIDs? Well, for starters there are two of them (info:pmid/17079492 and info:doi/10.1073/pnas.0605858103) so which to use? Secondly, they aren't HTTP URIs, and if they were we'd go straight to CrossRef or NCBI, not Connotea. Lastly, we lose the important information that the bookmarks are different -- they were made by two different people (or agents).
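To make the comparison of embedded GUIDs concrete, here is a minimal Python sketch. The record layout and the example.org bookmark URIs are assumptions made up for illustration (this is not Connotea's actual RDF or API); only the info:pmid, info:doi and doi values are taken from this message.

```python
from collections import defaultdict

# Hypothetical bookmark records: each has its own application HTTP URI
# and carries the original GUIDs of the paper it refers to.
bookmarks = [
    {"uri": "http://example.org/app/user/alice/bookmark/1",
     "guids": {"info:pmid/17079492", "info:doi/10.1073/pnas.0605858103"}},
    {"uri": "http://example.org/app/user/bob/bookmark/7",
     "guids": {"info:doi/10.1073/pnas.0605858103"}},
    {"uri": "http://example.org/app/user/carol/bookmark/3",
     "guids": {"info:doi/10.1186/1471-2105-6-48"}},
]


def group_by_shared_guid(records):
    """Map each embedded GUID to the sorted list of record URIs that carry it."""
    index = defaultdict(list)
    for record in records:
        for guid in record["guids"]:
            index[guid].append(record["uri"])
    return {guid: sorted(uris) for guid, uris in index.items()}


same_paper = group_by_shared_guid(bookmarks)
```

The two bookmarks that share a DOI end up in one group, yet each keeps its own HTTP URI, so the information about who made the bookmark is not lost.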
Applications will use HTTP URIs
We want to go to Connotea (and Connotea wants us to go to it) because it gives us additional information, such as the tags added by users. Likewise, bioGUID adds links to sequences referred to in the paper. Web applications that build on GUIDs want to add value, and need to add value partly because the quality of the original data may suck. For example, metadata provided by CrossRef is limited, DiGIR providers manage to mangle even basic things like dates, and in my experience many records provided by DiGIR sources that lack geocoordinates have, in fact, been georeferenced (based on reading papers about those specimens). The metadata associated with Handles is often appallingly bad, and don't get me started on what utter gibberish GenBank has in its specimen voucher fields.
Hence, applications will want to edit much of this data to correct and improve it, and to make that edited version available they will need their own identifiers, i.e. HTTP URIs. This ranges from social bookmarking tools like Connotea, to massive databases like FreeBase.
Digital durability
Digital preservation is also relevant. How do we ensure our digital records are durable? Well, we can't ensure this (see Clay Shirky's talk at LongNow), but one way to make them more durable is massive redundancy -- multiple copies in many places. Indeed, given the limited functionality of the current GBIF portal, I would argue that GBIF's main role at present is to make specimen data more durable. DiGIR providers are not online 24/7, but if their data are in GBIF those data are still available. Of course, GBIF could not use the same GUID as the URI for that data; like Connotea, it would have to store the original GUID in the GBIF copy of the record.
In the same way, the taxonomic literature of ants is unlikely to disappear anytime soon, because a single paper can be in multiple places. For example, Engel et al.'s paper on ants in Cretaceous Amber is available in at least four places:
* BioOne (doi:10.1206/0003-0082(2005)485[0001:PNAICA]2.0.CO;2)
* AMNH DSpace (hdl:2246/5676)
* AntBase (http://antbase.org/ants/publications/20967/20967.pdf)
* Internet Archive (http://www.archive.org/details/ants_20967)
Which of the four HTTP URIs you can click on should be the GUID for this paper? -- none of them.
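A small Python sketch of the redundancy argument, under stated assumptions: the four locators are the ones listed above, while WORK_ID is a made-up placeholder for an identifier of the paper that is independent of any one holder (it is not a real identifier), and the Copy class and reachable function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    holder: str   # who serves this copy
    locator: str  # identifier or URL under which that holder serves it


# Placeholder identifier for the work itself; purely illustrative.
WORK_ID = "urn:example:engel-et-al-ants-in-cretaceous-amber"

COPIES = {
    WORK_ID: [
        Copy("BioOne", "doi:10.1206/0003-0082(2005)485[0001:PNAICA]2.0.CO;2"),
        Copy("AMNH DSpace", "hdl:2246/5676"),
        Copy("AntBase", "http://antbase.org/ants/publications/20967/20967.pdf"),
        Copy("Internet Archive", "http://www.archive.org/details/ants_20967"),
    ]
}


def reachable(work_id, offline_holders):
    """Return the copies of a work whose holders are not currently offline."""
    return [c for c in COPIES[work_id] if c.holder not in offline_holders]


still_available = reachable(WORK_ID, {"BioOne", "AntBase"})
```

Each holder serves the paper under its own locator, and the work stays retrievable as long as any one holder is up.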
LSIDs and the Semantic Web
LSIDs don't play well with the Semantic Web. My feeling is that we should just accept this and move on. I suspect that most users will not interact directly with LSID servers; they will use applications and portals, and these will serve HTTP URIs, which are ideal for Semantic Web applications. Efforts to make LSIDs compliant by inserting owl:sameAs statements and rewriting rdf:resource attributes using a HTTP proxy seem to me to be misguided. If for no other reason, one of the strengths of the LSID protocol (no single point of failure, other than the DNS) is massively compromised: if the HTTP proxy goes down (or if the domain name tdwg.org is sold), links between the LSID metadata records will break.
Having a service such as a HTTP proxy that can resolve LSIDs on the fly and rewrite the metadata to become HTTP-resolvable is fine, but to impose an ugly (and possibly short term) hack on the data providers strikes me as unwise. The only reason for attempting this is if we think the original LSID record will be used directly by Semantic Web applications. I would argue that in reality, such applications may harvest these records, but they will make them available to others as part of a record with a HTTP URI (see Connotea example).
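For readers unfamiliar with the mechanics, here is a rough Python sketch of what resolving "on the fly" through a proxy involves on the client side: check that a string has the LSID shape (urn:lsid:authority:namespace:object, with an optional revision) and prepend a proxy base URL. The proxy address and the example LSID are hypothetical placeholders, and the function names are my own; this is not the TDWG proxy specification.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Hypothetical proxy address, for illustration only.
PROXY_BASE = "http://lsid-proxy.example.org/"


def proxy_url(text, base=PROXY_BASE):
    """Build an HTTP URL that asks a proxy to resolve the given LSID."""
    parse_lsid(text)  # validate first; raises ValueError if malformed
    return base + text


example = proxy_url("urn:lsid:example.org:specimens:12345")
```

Note that every URL built this way depends on the proxy's domain name, which is exactly the single point of failure described above.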
Conclusions
I think my concerns about LSIDs (and I was an early advocate of LSIDs, see doi:10.1186/1471-2105-6-48) stem from trying to marry them to the Semantic Web, which seems the obvious technology for constructing applications to query lots of distributed metadata. But I wonder if the mantra of "dereferenceable identifiers" can sometimes get in the way. ISBNs given to books are not, of themselves, dereferenceable, but serve very well as identifiers of books (same ISBN, same book), and there are tools that can retrieve metadata given an ISBN (e.g., LibraryThing).
In a world of multiple GUIDs for the same thing, and multiple applications wanting to talk about the same thing, I think clearly separating identifiers from HTTP URIs is useful. For an application such as Connotea, a data aggregator such as GBIF, a database like FreeBase, or a repository like the Internet Archive, HTTP URIs are the obvious choice (if I use a Connotea HTTP URI I want Connotea's data on a particular paper). For GUID providers, there may be other issues to consider.
Note that I'm not saying that we can't use HTTP URIs as GUIDs. In some, perhaps many cases they may well be the best option as they are easy to set up. It's just that I accept that not all GUIDs need be HTTP URIs. Given the arguments above, I think the key thing is to have stable identifiers for which we can retrieve associated metadata. Data providers can focus on providing those, application developers can focus on linking them and their associated metadata together, and repackaging the results for consumption by the cloud.
---------------------------------------- Professor Roderic D. M. Page Editor, Systematic Biology DEEB, IBLS Graham Kerr Building University of Glasgow Glasgow G12 8QP United Kingdom
Phone: +44 141 330 4778 Fax: +44 141 330 2792 email: r.page@bio.gla.ac.uk web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html iChat: aim://rodpage1962 reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org Search for taxon names: http://darwin.zoology.gla.ac.uk/~rpage/portal/ Find out what we know about a species: http://ispecies.org Rod's rants on phyloinformatics: http://iphylo.blogspot.com Rod's rants on ants: http://semant.blogspot.com
Thinking a bit more about this issue, I've decided that perhaps it doesn't matter quite as much as I thought. I've posted my argument on my blog (http://iphylo.blogspot.com/2007/06/rethinking-lsids-versus-http-uri.html), partly because it has links and a picture. The text is pasted below for the sake of archiving it in this list.
Regards
Rod
The TDWG-GUID mailing list for this month has a discussion of whether TDWG should commit to LSIDs as the GUID of choice. Since the first GUID workshop TDWG has pretty much been going down this route, despite a growing chorus of voices (including mine) that LSIDs are not first class citizens of the Web, and don't play well with the Semantic Web.
a. To the best of my understanding, nothing in the RDF syntax implies that the URI scheme must be the HTTP URI scheme, common as that is.
b. Some of your arguments here seem to depend on that (e.g. your point 3).
c. My conclusion from a. is that any Semantic Web tool, e.g. a SPARQL processor, should usually be deemed broken if it behaves differently on an HTTP URI than it does on any other URI.
See also http://copia.ogbuji.net/blog/2007-05-26/linked-data-is-overseling-http
Bob
On 6/9/07, Roderic Page r.page@bio.gla.ac.uk wrote:
Thinking a bit more about this issue, I've decided that perhaps it doesn't matter quite as much as I thought. I've posted my argument on my blog (http://iphylo.blogspot.com/2007/06/rethinking-lsids-versus-http-uri.html), partly because it has links and a picture. The text is pasted below for the sake of archiving it in this list.
Regards
Rod
Bob,
SPARQL will handle URIs that aren't HTTP URIs. For example, I can perform SPARQL queries on uBio RDF that has some namespaces and resources identified by LSIDs.
By Semantic web tools I'm thinking more generally, such as data browsers and triple stores, that is, anything that reads RDF and expects to be able to follow any resource in the RDF. If those resources aren't HTTP URIs then these tools won't know how to fetch them.
One case where SPARQL does need HTTP URIs is the "FROM" clause, where it can fetch data from the web as part of the query.
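As a rough Python illustration of the "won't know how to fetch them" point: a generic tool can inspect a URI's scheme and only attempt retrieval for schemes it has a transport for. The function name and the set of supported schemes are assumptions for the sketch, not the behaviour of any particular browser or triple store, and the LSID shown is a made-up example.

```python
from urllib.parse import urlsplit

# Schemes this hypothetical tool can retrieve without extra plumbing.
FETCHABLE_SCHEMES = {"http", "https"}


def can_fetch_directly(uri):
    """True if the URI's scheme is one the tool has a built-in transport for."""
    return urlsplit(uri).scheme.lower() in FETCHABLE_SCHEMES


resources = [
    "http://www.archive.org/details/ants_20967",
    "urn:lsid:example.org:names:12345",   # made-up LSID
    "info:doi/10.1073/pnas.0605858103",
]
fetchable = [uri for uri in resources if can_fetch_directly(uri)]
```

All three strings are perfectly good URIs for naming resources in RDF; only the first can be followed by a tool that speaks nothing but HTTP.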
Rod
On 9 Jun 2007, at 14:54, Bob Morris wrote:
a. To the best of my understanding, nothing in the RDF syntax implies that the URI scheme must be the HTTP URI scheme, common as that is.
b. Some of your arguments here seem to depend on that (e.g. your point 3 ).
c. My conclusion from a. is that any Semantic Web tool, e.g. a SPARQL processor, should usually be deemed broken if it behaves differently on an HTTP URI than it does on any other URI.
See also http://copia.ogbuji.net/blog/2007-05-26/linked-data-is-overseling-http
Bob
I'm confused about which arguments in this thread are about the merits of HTTP (e.g. content negotiation) and which are about the merits of DNS (e.g. resource and service location). The fact that most humans usually exploit these together, because most humans use web browsers for discovering resources, doesn't have much to do with GUIDs. Even LSID resolution itself is actually independent of anything to do with DNS, although all current resolvers are based on DNS services.
OK, I confess to not reading all the arguments in detail, but my impression is that several of the opposite conclusions drawn from the same facts may arise because one set of conclusions is about service discovery and one is about (meta)data provision. It won't surprise me if ANY GUID scheme is stronger about one of these than the other. This might be what Donald is arguing.
Bob
On 6/6/07, Dave Vieglais vieglais@ku.edu wrote:
This discussion has been very interesting reading, and though I agree with Donald's comments, I find myself coming to a different conclusion, leaning towards HTTP URIs as a preferable scheme. The reasons are simple - HTTP has been around for a long time, it is widely implemented, and mechanisms for implementing robust services with that protocol are pretty well sorted out - and really there is nothing to stop implementation of the same functionality exhibited by LSIDs using HTTP. As Rod has pointed out, http is widely used for referencing entities within a semantic web type of context, and it seems foolish to ignore the momentum in those technologies as they provide a great deal of the desired functionality for interoperability and interchange of our data. As a result my preference is towards the use of http, primarily because my intents are to integrate data from a much broader community. In the end though, it doesn't really matter which scheme is adopted by TDWG - we will build http resolvers regardless, since they will be necessary for reasons of convenience in order to utilize LSIDs in all but specific, custom built applications.
However, regardless of the scheme used to implement the GUIDs used by this community, it is critical that the identifiers are persistent and useful beyond the lives of whatever services are constructed to resolve them. This implies some provenance information may need to be captured, and I would argue that the use of DNS alone for handling server changes, as utilized by LSIDs, may be insufficient. The only benefit provided by DNS in this context is that it acts as a single source of authority for directing how to locate something (in this case an IP address). What I suspect is really required is a more robust and richer mechanism for discovering and recording provenance. The ideal would be a large, replicated, and distributed data store with a single service point which provided people and systems with a one-stop shop for discovering provenance for a GUID. Then if a particular GUID could not be directly resolved, the global provenance store could be consulted, and the resulting information would provide a pointer (or perhaps a series of pointers) indicating how the GUID can now be resolved.
By creating such provenance records and persisting them with as much care as the data, it seems that a system with stability beyond the vagaries of the internet could reasonably be constructed.
regards, Dave V.
On Jun 6, 2007, at 00:46, Donald Hobern wrote:
Yesterday was a vacation here in Denmark - otherwise I'd have responded a little earlier, but I'm glad to see all the comments from others. I thoroughly agree with Kevin, Jason, Rich and Anna. No one here believes that any particular solution is going to be perfect. Our biggest need is consensus and the readiness to get going with a workable solution.
I do recognise the strength of Rod's arguments. Indeed, if I were building some system for integrating data using semantic web technologies, and my only concern was ensuring the efficiency of synchronous connections now, I am sure I would adopt HTTP URIs for the purpose. However I remain convinced (as I've stated before) that the needs of this community do subtly shift the balance in another direction. We are interested in maintaining long-term connections between our objects and have a perspective which goes back hundreds of years. This at least should give us pause over whether we want our specimens to be referenced using identifiers so firmly tied to the Internet of today. More importantly, one of the key drivers right at the beginning of TDWG's consideration of GUIDs was that the community had plenty of experience of URL rot and didn't want to rely on everyone maintaining stable virtual directories on their web servers to preserve the integrity of object identifiers.
Both LSIDs and HTTP URIs could be made to work for us. Both are totally reliant on good practice on the part of data owners. Personally I believe our chances of getting the community to consider, define and apply such practices are enhanced by the identifier technology being something a little more different and distinct than just a "special URL".
Thanks,
Donald
Donald Hobern (dhobern@gbif.org) Deputy Director for Informatics Global Biodiversity Information Facility Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
On Jun 6, 2007, at 12:51 AM, Kevin Richards wrote:
I agree with Jason. It is not the GUID that is the cause of all the problems here - THERE IS NOTHING WRONG WITH LSIDS - we just need to move on and start using them in our own context (or any other suitable GUID - LSIDs are only the recommended GUID, NOT the only permissible GUID).
If it all falls to pieces later on we could just do a search and replace to change all our GUIDs to some other scheme (to quote Bob, just serious).
I agree, it is the RDF/metadata/ontologies that are the key to getting things working well.
Kevin
"Jason Best" jbest@brit.org 06/06/07 8:39 AM >>>
Rod, I've only had a chance to quickly skim the documents you reference, but it seems to me that the alternatives to LSIDs don't necessarily make the issues with which we are wrestling go away. We still need to decide WHAT a URI references - is it the metadata, the physical object etc? URIs don't explicitly require persistence, while LSIDs do, so I see that as a positive for adopting a standard GUID that is explicit in that regard. I think the TDWG effort to spec an HTTP proxy for LSIDs makes it clear that the technical hurdles of implementing an LSID resolver (SRV records, new protocol, client limitations etc) are a bit cumbersome, but I don't think the underlying concept is fatally flawed. In reading these discussions, I'm starting to believe/understand that RDF may hold the key, regardless of the GUID that is implemented. Now I have to go read up more on RDF to see if my new-found belief has merit! ;)
Jason
From: Roderic Page [mailto:r.page@bio.gla.ac.uk] Sent: Tuesday, June 05, 2007 2:10 PM To: Chuck Miller Cc: Bob Morris; Kevin Richards; tdwg-guid@lists.tdwg.org; WEITZMAN@si.edu; Jason Best Subject: Re: [tdwg-guid] First step in implementing LSIDs?
Maybe it's time to bite the bullet and consider the elephant in the room -- LSIDs might not be what we want. Markus Döring sent some nice references to the list in April, which I've repeated below, there is also http://dx.doi.org/10.1109/MIS.2006.62 .
I think the LSID debate is throwing up issues which have been addressed elsewhere (e.g., identifiers for physical things versus digital records), and some would argue have been solved to at least some people's satisfaction.
LSIDs got us thinking about RDF, which is great. But otherwise I think they are making things more complicated than they need to be. I think this community is running a grave risk of committing to a technology that nobody else takes that seriously (hell, even the http://lsid.sourceforge.net/ web site is broken).
The references posted by Markus Döring were:
(1) http://www.dfki.uni-kl.de/dfkidok/publications/TM/07/01/tm-07-01.pdf "Cool URIs for the Semantic Web" by Leo Sauermann (DFKI GmbH), Richard Cyganiak (Freie Universität Berlin, D2R author), and Max Völkel (FZI Karlsruhe). The authors of this document come from the semantic web community and discuss what kind of URIs should be used for RDF resources.
(2) http://www.w3.org/2001/tag/doc/URNsAndRegistries-50 This one here is written by the W3C and addresses the questions "When should URNs or URIs with novel URI schemes be used to name information resources for the Web?" The answers given are "Rarely if ever" and "Probably not". Common arguments in favor of such novel naming schemas are examined, and their properties compared with those of the existing http: URI scheme.
Regards
Rod
tdwg-guid mailing list tdwg-guid@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-guid
Hi Bob, it's pretty simple - DNS is used to resolve an IP address at which a client may connect to a service that resolves the GUID. In the case of LSIDs the suggested mechanism (and actually the only existing mechanism) is to use DNS SRV records to provide a level of indirection that is meant to preserve the discovery of the service IP address independent of the normal issues with A records (although much of the same functionality can be provided with judicious use of CNAME and A records). To state that LSID resolution is independent of DNS is a bit misleading, since the entire basis of LSIDs and their functional utility beyond what can be provided by HTTP URIs comes down to their current use of DNS SRV records for service discovery.
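To make the SRV step concrete, here is a minimal Python sketch of how a client might derive the DNS name to query from an LSID. It assumes the usual urn:lsid:authority:namespace:object[:revision] layout and the _lsid._tcp service label; the example.org identifier and the function names are made up for illustration, and no actual DNS lookup is performed.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(lsid: str) -> Lsid:
    """Split an LSID into its parts (assumed layout, see lead-in)."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {lsid!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {lsid!r}")
    # The revision is optional; treat a missing or empty one as None.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(authority, namespace, object_id, revision)


def srv_query_name(lsid: str) -> str:
    """Name a client would look up as a DNS SRV record for this LSID."""
    return f"_lsid._tcp.{parse_lsid(lsid).authority}"


if __name__ == "__main__":
    # Hypothetical identifier, for illustration only.
    print(srv_query_name("urn:lsid:example.org:specimens:12345"))
```

The point of the indirection is that only the authority part feeds the DNS query, so the SRV record can be repointed to a new host and port without changing any issued identifier.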
The only negative with LSIDs that I see is the fact that it is a relatively unknown and so essentially un-implemented protocol. This makes interoperability with the vast majority of existing infrastructure more difficult than it needs to be without offering any advance in functionality. The use of LSID proxy services, essentially turning LSIDs into URLs, is an obvious and welcome solution, but begs the question of what is really gained by the extra step of using LSID URIs rather than HTTP URIs?
Perhaps the real benefit is simply that they (LSIDs) look different, which implies that they need to be handled differently than a typical URL, and so people and services know immediately to ask a resolver to return bits (metadata or data) identified by the GUID. The problem with this of course is that existing services and applications won't know what to do with them since they are implemented to only understand http (or perhaps a couple other schemes), and so need to be re-engineered to handle LSIDs unless the LSIDs are wrapped in HTTP URLs... One could also argue that it is the context in which an identifier appears that really indicates what is an identifier rather than just a string - so in practice the visual appearance of a GUID shouldn't matter.
Perhaps an adequate solution is to use LSIDs and provide definitive guidelines indicating how they can be embedded in URLs so that we do not lose interoperability with the rest of the world? This is probably much like Ricardo's LSID proxy proposal. Except in my opinion it should be extended further to be a general GUID resolver to help resolve whatever form is used for GUIDs - then one could embed a handle, LSID, HTTP URI, FTP URI, LDAP URI, or even, for the ancients of the internet, z39.50 URIs in a resolver proxy URL and get something back. The problem of course is that the content that comes back will be different for different protocols - but it would, I suspect, be possible to provide a generic form of metadata for the different protocols.
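As a rough illustration of embedding a GUID in a resolver proxy URL, the Python sketch below percent-encodes the whole identifier and appends it to a proxy base, then recovers it again. The proxy address resolver.example.org, the path layout, and the function names are assumptions made for the example; they are not taken from Ricardo's proposal or any existing service.

```python
from urllib.parse import quote, unquote

# Hypothetical proxy endpoint, used only for this illustration.
PROXY_BASE = "http://resolver.example.org/resolve/"


def guid_kind(guid: str) -> str:
    """Very rough classification of a GUID by its leading scheme."""
    if guid.lower().startswith("urn:lsid:"):
        return "lsid"
    if ":" not in guid:
        return "unknown"
    return guid.split(":", 1)[0].lower()


def to_proxy_url(guid: str) -> str:
    """Embed any GUID in an HTTP URL by percent-encoding all of it."""
    return PROXY_BASE + quote(guid, safe="")


def from_proxy_url(url: str) -> str:
    """Recover the original GUID from a proxy URL built by to_proxy_url."""
    if not url.startswith(PROXY_BASE):
        raise ValueError(f"not a proxy URL: {url!r}")
    return unquote(url[len(PROXY_BASE):])


if __name__ == "__main__":
    for guid in ("urn:lsid:example.org:specimens:12345",
                 "http://example.org/specimen/12345"):
        url = to_proxy_url(guid)
        print(guid_kind(guid), url, from_proxy_url(url) == guid)
```

Encoding the entire identifier keeps the scheme visible to the proxy, so one endpoint could dispatch on guid_kind and hand each form to a protocol-specific resolver.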
It would be pretty simple to add some provenance handling to such a service so that if a particular web server, FTP server, or even LSID system were moved, then the resolver service could look up the new location information and appropriately service the request.
There should of course be multiple instances of such a resolver service, and the provenance information should be shared and replicated between them all.
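A minimal sketch of the provenance lookup, assuming the shared store can be reduced to a table of "moved from, moved to" records (the dictionary, the example hosts, and the function name are all hypothetical). It follows the chain of moves from a stale location to the current one and refuses to loop forever if the records are inconsistent.

```python
from typing import Dict, List


def follow_moves(start: str, moves: Dict[str, str], max_hops: int = 10) -> List[str]:
    """Return the trail of locations from `start` to the current one.

    `moves` maps an old service location to the location that replaced it.
    """
    trail = [start]
    seen = {start}
    while trail[-1] in moves:
        nxt = moves[trail[-1]]
        # Guard against circular or runaway provenance records.
        if nxt in seen or len(trail) > max_hops:
            raise ValueError(f"provenance loop or too many hops from {start!r}")
        trail.append(nxt)
        seen.add(nxt)
    return trail


if __name__ == "__main__":
    # Hypothetical provenance records, for illustration only.
    records = {
        "http://old.example.org/lsid": "http://data.example.org/lsid",
        "http://data.example.org/lsid": "http://archive.example.net/lsid",
    }
    print(follow_moves("http://old.example.org/lsid", records))
```

Because the records are plain data, replicating them between resolver instances amounts to copying the table, which is the part that would need real design work in practice.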
Dave V.
On Jun 6, 2007, at 15:13, Bob Morris wrote:
I'm confused about which arguments in this thread are about the merits of HTTP (e.g. content negotiation) and which are about the merits of DNS (e.g. resource and service location). The fact that most humans usually exploit these together, because most humans use web browsers for discovering resources, doesn't have much to do with GUIDs. Even LSID resolution itself is actually independent of anything to do with DNS, although all current resolvers are based on DNS services.
OK, I confess to not reading all the arguments in detail, but my impression is that several of the opposite conclusions drawn from the same facts may arise because one set of conclusions is about service discovery and one is about (meta)data provision. It won't surprise me if ANY GUID scheme is stronger about one of these than the other. This might be what Donald is arguing.
Bob
I think it might be helpful to suggest that if this system is too specialized and complicated to implement, the vast majority of collections will not adopt it.
If your goal is to make as much of this data available as possible to researchers, then you need to devise a system that the typical museum curator can understand and maintain using only bright, computer-savvy undergraduates.
Respectfully,
Pete
On 6/6/07, Dave Vieglais vieglais@ku.edu wrote:
Hi Bob, it's pretty simple - DNS is used to resolve an ip address to which a client may connect with a service to resolve the GUID. In the case of LSIDs the suggested mechanism (and actually the only existing mechanism) is to use DNS SRV records to provide a level of indirection that is meant to preserve the discovery of service ip address independent of the normal issues with A records (although much of the same functionality can be provided with judicious use of CNAME and A records). To state that LSID resolution is independent of DNS is a bit misleading since the entire basis of LSIDs and their functional utility beyond what can be provided by HTTP uris comes down to their current use of DNS SRV records for service discovery.
The only negative with LSIDs that I see is the fact that it is a relatively unknown and so essentially un-implemented protocol. This makes interoperability with the vast majority of existing infrastructure more difficult than it needs to be without offering any advance in functionality. The use of LSID proxy services, essentially turning LSIDs into URLs is an obvious and welcome solution, but begs the question of what is really gained by the extra step of using LSID URIs rather than HTTP URIs?
Perhaps the real benefit is simply that they (LSIDs) look different, which implies that they need to be handled differently than a typical URL, and so people and services know immediately to ask a resolver to return bits (metadata or data) identified by the GUID. The problem with this of course is that existing services and applications won't know what to do with them since they are implemented to only understand http (or perhaps a couple other schemes), and so need to be re-engineered to handle LSIDs unless the LSIDs are wrapped in HTTP URLs... One could also argue that it is the context in which an identifier appears that really indicates what is an identifier rather than just a string - so in practice the visual appearance of a GUID shouldn't matter.
Perhaps an adequate solution is to use LSIDs and provide definitive guidelines indicating how they can be embedded in URLs so that we do not loose interoperability with the rest of the world? This is probably much like Ricardo's LSID proxy proposal. Except in my opinion it should be extended further to be a general GUID resolver to help resolve whatever form is used for GUIDs - then one could embed a handle, LSID, HTTP URI, FTP URI, LDAP URI, or even, for the ancients of the internet, z39.50 URIs in a resolver proxy URL and get something back. The problem of course is that the content that comes back will be different for different protocols - but it would, I suspect be possible to provide a generic form of metadata for the different protocols.
It would be pretty simple to add some provenance handling to such a service so that if a particular web server, ftp server, or even LSID system were moved, then the resolver service could lookup the new location information and appropriately service the request.
There should of course be multiple instances of such a resolver service, and the provenance information should be shared and replicated between them all.
Dave V.
On Jun 6, 2007, at 15:13, Bob Morris wrote:
I'm confused about what arguments in this thread are about the merits of HTTP (e.g. content negotiation) and what are about the merits of DNS (e.g. resource and service location). The fact that most humans usually exploit these together is because most humans use web browsers for discovering resources doesn't have much to do with GUIDs. Even LSID resolution itself is actually independent of anything to do with DNS, although all current resolvers are based on DNS services.
OK, I confess to not reading all the arguments in detail, but my impression is that several of the opposite conclusions from the same facts may because one set of conclusions is about service discovery and one is about (meta)data provision. It won't surprise me if ANY guid scheme is stronger about one of these than the other. This might be what Donald is arguing.
Bob
On 6/6/07, Dave Vieglais <vieglais@ku.edu > wrote:
This discussion has been very interesting reading, and though I agree with Donald's comments, I find myself coming to a different conclusion, leaning towards HTTP URIs as a preferable scheme. The reasons are simple - HTTP has been around for a long time, it is widely implemented, and mechanisms for implementing robust services with that protocol are pretty well sorted out - and really there is nothing to stop implementation of the same functionality exhibited by LSIDs using HTTP. As Rod has pointed out, http is widely used for referencing entities within a semantic web type of context, and it seems foolish to ignore the momentum in those technologies as they provide a great deal of the desired functionality for interoperability and interchange of our data. As a result my preference is towards the use of http, primarily because my intents are to integrate data from a much broader community. In the end though, it doesn't really matter which scheme is adopted by TDWG - we will build http resolvers regardless, since they will be necessary for reasons of convenience in order to utilize LSIDs in all but specific, custom built applications.
However, regardless of the scheme used to implement the GUIDs used by this community, it is critical that the identifiers are persistent and useful beyond the lives of whatever services are constructed to resolve them. This implies some provenance information may need to be captured, and I would argue that the use of DNS alone for handling server changes as utilized by LSIDs may be insufficient. The only benefit provided by DNS in this context is that it is acting as a single source of authority for directing how to locate something (in this case an ip address). What I suspect is really required is a more robust, and richer mechanism for discovering and recording provenance. The ideal would be a large, replicated, and distributed data store with a single service point which provided people and systems with a one-stop shop for discovering provenance for a GUID. Then if an particular GUID could not be directly resolved, the global provenance store could be consulted and the resulting information providing a pointer (or perhaps a series of pointers) indicating how the guid can now be resolved.
By creating such provenance records and persisting them with as much care as the data, it seems that a system with stability beyond the vagaries of the internet could reasonably be constructed.
regards, Dave V.
On Jun 6, 2007, at 00:46, Donald Hobern wrote:
Yesterday was a vacation here in Denmark - otherwise I'd have responded a little earlier, but I'm glad to see all the comments from others. I thoroughly agree with Kevin, Jason, Rich and Anna. No one here believes that any particular solution is going to be perfect. Our biggest need is consensus and the readiness to get going with a workable solution.
I do recognise the strength of Rod's arguments. Indeed, if I were building some system for integrating data using semantic web technologies, and my only concern was ensuring the efficiency of synchronous connections now, I am sure I would adopt HTTP URIs for the purpose. However I remain convinced (as I've stated before) that the needs of this community do subtly shift the balance in another direction. We are interested in maintaining long-term connections between our objects and have a perspective which goes back hundreds of years. This at least should give us pause over whether we want our specimens to be referenced using identifiers so firmly tied to the Internet of today. More importantly, one of the key drivers right at the beginning of TDWG's consideration of GUIDs was that the community had plenty of experience of URL rot and didn't want to rely on everyone maintaining stable virtual directories on their web servers to preserve the integrity of object identifiers.
Both LSIDs and HTTP URIs could be made to work for us. Both are totally reliant on good practice on the part of data owners. Personally I believe our chances of getting the community to consider, define and apply such practices are enhanced by the identifier technology being something a little more different and distinct than just a "special URL".
Thanks,
Donald
Donald Hobern (dhobern@gbif.org) Deputy Director for Informatics Global Biodiversity Information Facility Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
On Jun 6, 2007, at 12:51 AM, Kevin Richards wrote:
I agree with Jason. It is not the GUID that is the cause of all the problems here - THERE IS NOTHING WRONG WITH LSIDS - we just need to move on and start using them in our own context (or any other suitable GUID - LSIDs are only the recommended GUID, NOT the only premissable GUID).
If it all falls to pieces later on we could just do a search and replace to change all our GUIDs to some other scheme (to quote Bob, just serious).
I agree, it is the RDF/metadata/ontologies that are the key to getting things working well.
Kevin
>> "Jason Best" jbest@brit.org 06/06/07 8:39 AM >>>
Rod, I've only had a chance to quickly skim the documents you reference, but it seems to me that the alternatives to LSIDs don't necessarily make the issues with which we are wrestling go away. We still need to decide WHAT a URI references - is it the metadata, the physical object etc? URIs don't explicitly require persistance, while LSIDs do so I see that as a positive for adopting a standard GUID that is explicit in that regard. I think the TDWG effort to spec an HTTP proxy for LSIDs makes it clear that the technical hurdles of implementing an LSID resolver (SVR records, new protocol, client limitations etc) are a bit cumbersome, but I don't think the underlying concept is fatally flawed. In reading these discussions, I'm starting to believe/ understand that RDF may hold the key, regardless of the GUID that is implemented. Now I have to go read up more on RDF to see if my new-found belief has merit! ;)
Jason
From: Roderic Page [mailto:r.page@bio.gla.ac.uk] Sent: Tuesday, June 05, 2007 2:10 PM To: Chuck Miller Cc: Bob Morris; Kevin Richards; tdwg-guid@lists.tdwg.org; WEITZMAN@si.edu; Jason Best Subject: Re: [tdwg-guid] First step in implementing LSIDs?
[Scanned]
Maybe it's time to bite the bullet and consider the elephant in the room -- LSIDs might not be what we want. Markus D�ring sent some nice references to the list in April, which I've repeated below, there is also http://dx.doi.org/10.1109/MIS.2006.62 .
I think the LSID debate is throwing up issues which have been addressed elsewhere (e.g., identifiers for physical things versus digital records), and some would argue have been solved to at least some people's satisfaction.
LSIDs got us thinking about RDF, which is great. But otherwise I think they are making things more complicated than they need to be. I think this community is running a grave risk of committing to a technology that nobody else takes that seriously (hell, even the http://lsid.sourceforge.net/ web site is broken).
The references posted by Markus Döring were:
(1) http://www.dfki.uni-kl.de/dfkidok/publications/TM/07/01/tm-07-01.pdf "Cool URIs for the Semantic Web" by Leo Sauermann DFKI GmbH, Richard Cyganiak Freie Universität Berlin (D2R author), Max Völkel FZI Karlsruhe The authors of this document come from the semantic web community and discuss what kind of URIs should be used for RDF resources.
(2) http://www.w3.org/2001/tag/doc/URNsAndRegistries-50 This one here is written by the W3C and addresses the questions "When should URNs or URIs with novel URI schemes be used to name information resources for the Web?" The answers given are "Rarely if ever" and "Probably not". Common arguments in favor of such novel naming schemas are examined, and their properties compared with those of the existing http: URI scheme.
Regards
Rod
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
WARNING: This email and any attachments may be confidential and/or privileged. They are intended for the addressee only and are not to be read, used, copied or disseminated by anyone receiving them in error. If you are not the intended recipient, please notify the sender by return email and delete this message and any attachments.
The views expressed in this email are those of the sender and do not necessarily reflect the official views of Landcare Research.
Landcare Research http://www.landcareresearch.co.nz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
tdwg-guid mailing list tdwg-guid@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-guid
On 6 Jun 2007, at 22:50, Dave Vieglais wrote:
Perhaps an adequate solution is to use LSIDs and provide definitive guidelines indicating how they can be embedded in URLs so that we do not lose interoperability with the rest of the world? This is probably much like Ricardo's LSID proxy proposal. Except in my opinion it should be extended further to be a general GUID resolver to help resolve whatever form is used for GUIDs - then one could embed a handle, LSID, HTTP URI, FTP URI, LDAP URI, or even, for the ancients of the internet, z39.50 URIs in a resolver proxy URL and get something back. The problem of course is that the content that comes back will be different for different protocols - but it would, I suspect, be possible to provide a generic form of metadata for the different protocols.
This is pretty much what http://bioguid.info does with respect to DOIs, Handles, PubMed identifiers, and (some) specimens. I haven't added LSIDs to this, but have code to do so (as part of another project).
bioGUID returns RDF/XML for a GUID, displayed as HTML in a browser using an embedded style sheet ("view source" reveals the RDF). Any links to other GUIDs are rewritten as "bioGUIDs", that is, resolvable by http://bioguid.info. bioGUID supports 303 redirect to play nice with Semantic Web tools (most of which, it has to be said, suck).
Some examples are:
http://bioguid.info/doi:10.1109/mis.2006.62 [DOI]
http://bioguid.info/casent:0008682-d03 [specimen]
http://bioguid.info/pmid:17213318 [PubMed]
http://bioguid.info/genbank:AY324464 [DNA sequence, with link to bioGUID for specimen and publication]
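As a minimal sketch of how the example addresses above decompose, assuming every bioGUID-style URL has the shape http://bioguid.info/<prefix>:<identifier>: the function name split_bioguid_url and the KNOWN_PREFIXES table below are hypothetical, built only from the four examples listed, and are not taken from bioGUID's own code.

```python
from urllib.parse import urlparse

# Labels copied from the four examples in this message; the table is
# illustrative only and says nothing about what bioGUID really supports.
KNOWN_PREFIXES = {
    "doi": "DOI",
    "casent": "specimen",
    "pmid": "PubMed",
    "genbank": "DNA sequence",
}


def split_bioguid_url(url):
    """Return (prefix, identifier, label) for a bioGUID-style URL.

    Assumes the path is '/<prefix>:<identifier>'. Only the first colon
    separates prefix from identifier, so DOIs keep their slashes and dots.
    """
    path = urlparse(url).path.lstrip("/")
    prefix, sep, identifier = path.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"no 'prefix:identifier' found in {url!r}")
    prefix = prefix.lower()
    return prefix, identifier, KNOWN_PREFIXES.get(prefix, "unknown")


if __name__ == "__main__":
    for example in (
        "http://bioguid.info/doi:10.1109/mis.2006.62",
        "http://bioguid.info/casent:0008682-d03",
        "http://bioguid.info/pmid:17213318",
        "http://bioguid.info/genbank:AY324464",
    ):
        print(split_bioguid_url(example))
```

Adding LSID support in this scheme would presumably mean one more prefix in the table plus a handler for it, which is why a prefix-dispatch layout is used here.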
The original idea was to cache the RDF to speed up subsequent calls, and also to discover new links between GUIDs. This part stalled due to my trying to extract whatever I could from GenBank records, such as specimen links and DOIs if there is no PubMed record (see http://bioguid.blogspot.com/2007/04/adding-guids-to-genbank-records.html for details). This led to work on reference parsing and OpenURL, but that's another story (http://ispiders.blogspot.com/2007/06/gimme-that-scientific-paper.html).
Hence, my own response to this thread would be to add LSID support to bioGUID, and continue to play with linking all these GUIDs together.
Regards
Rod
----------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom
Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page@bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
iChat: aim://rodpage1962
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org
Search for taxon names: http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species: http://ispecies.org
Rod's rants on phyloinformatics: http://iphylo.blogspot.com
Rod's rants on ants: http://semant.blogspot.com
And I too agree with this assessment -- the discussions have focused on the metadata model and what is being identified, not on the identifier syntax or technology. What an identifier represents is an orthogonal issue to how the id is formatted and resolved.
At this point I think LSIDs still represent an excellent syntax for us to try for all of the old reasons that came up at the GUID workshops -- they are location independent, they have a well-established resolution protocol that can be extended as technology changes, they use a URI-compliant syntax, and they are free to mint. Because they are specifically identifiers, they don't have all of the difficulties associated with some of the other proposals that conflate identity with location. For example, the proposal to use overloaded http URLs would require us to agree on how to recognize an overloaded http URL -- which seems to boil down to a centralized redirection service, which makes it a non-starter for us. Maybe a centralized resolver would work for TDWG core projects, but we're working with a wide variety of biodiversity and abiotic data, and TDWG would not be the appropriate central group for much of the abiotic data. For example, I would be surprised if the hydrology community and climate change community really wanted to share a central redirection service with the biodiversity community based at TDWG. Something that allows independence but maintains interoperability among those communities would be more palatable.
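For readers new to the syntax being debated: an LSID is a URN of the form urn:lsid:<authority>:<namespace>:<object>[:<revision>]. The sketch below parses that form and shows the two resolution routes raised in this thread, a DNS SRV lookup on the authority versus embedding the LSID in an HTTP proxy URL. The example.org authority, the proxy base URL, and all function names are hypothetical. The _lsid._tcp SRV label is how LSID resolution is commonly described and should be checked against the LSID specification before relying on it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lsid:
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def __str__(self):
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def parse_lsid(text):
    """Parse 'urn:lsid:authority:namespace:object[:revision]' into an Lsid."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"wrong number of fields in {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(field == "" for field in parts[2:5]):
        raise ValueError(f"empty authority, namespace or object in {text!r}")
    # An empty trailing revision field is treated as "no revision".
    revision = (parts[5] or None) if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def srv_name(lsid):
    """DNS name a client would query for SRV records (assumed convention)."""
    return f"_lsid._tcp.{lsid.authority}"


def proxy_url(lsid, proxy_base="http://proxy.example.org/"):
    """Embed the LSID in an HTTP URL; proxy_base is a made-up placeholder."""
    return proxy_base + str(lsid)


if __name__ == "__main__":
    lsid = parse_lsid("urn:lsid:example.org:names:12345")
    print(srv_name(lsid))
    print(proxy_url(lsid))
```

The identifier itself carries no location: only srv_name and proxy_url decide where a request goes, which is the separation of identity from resolution argued for above.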
Matt
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matthew B. Jones
Director of Informatics Research and Development
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara
jones@nceas.ucsb.edu
Ph: 541-888-2581 ext. 287
http://www.nceas.ucsb.edu/ecoinformatics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kevin Richards wrote:
I agree with Jason. It is not the GUID that is the cause of all the problems here - THERE IS NOTHING WRONG WITH LSIDS - we just need to move on and start using them in our own context (or any other suitable GUID - LSIDs are only the recommended GUID, NOT the only permissible GUID).
If it all falls to pieces later on we could just do a search and replace to change all our GUIDs to some other scheme (to quote Bob, just serious).
I agree, it is the RDF/metadata/ontologies that are the key to getting things working well.
Kevin
"Jason Best" jbest@brit.org 06/06/07 8:39 AM >>>
Rod, I've only had a chance to quickly skim the documents you reference, but it seems to me that the alternatives to LSIDs don't necessarily make the issues with which we are wrestling go away. We still need to decide WHAT a URI references - is it the metadata, the physical object, etc.? URIs don't explicitly require persistence, while LSIDs do, so I see that as a positive for adopting a standard GUID that is explicit in that regard. I think the TDWG effort to spec an HTTP proxy for LSIDs makes it clear that the technical hurdles of implementing an LSID resolver (SRV records, new protocol, client limitations, etc.) are a bit cumbersome, but I don't think the underlying concept is fatally flawed. In reading these discussions, I'm starting to believe/understand that RDF may hold the key, regardless of the GUID that is implemented. Now I have to go read up more on RDF to see if my new-found belief has merit! ;)
Jason
From: Roderic Page [mailto:r.page@bio.gla.ac.uk]
Sent: Tuesday, June 05, 2007 2:10 PM
To: Chuck Miller
Cc: Bob Morris; Kevin Richards; tdwg-guid@lists.tdwg.org; WEITZMAN@si.edu; Jason Best
Subject: Re: [tdwg-guid] First step in implementing LSIDs?[Scanned]
Maybe it's time to bite the bullet and consider the elephant in the room -- LSIDs might not be what we want. Markus Döring sent some nice references to the list in April, which I've repeated below, there is also http://dx.doi.org/10.1109/MIS.2006.62 .
I think the LSID debate is throwing up issues which have been addressed elsewhere (e.g., identifiers for physical things versus digital records), and some would argue have been solved to at least some people's satisfaction.
LSIDs got us thinking about RDF, which is great. But otherwise I think they are making things more complicated than they need to be. I think this community is running a grave risk of committing to a technology that nobody else takes that seriously (hell, even the http://lsid.sourceforge.net/ web site is broken).
The references posted by Markus Döring were:
(1) http://www.dfki.uni-kl.de/dfkidok/publications/TM/07/01/tm-07-01.pdf "Cool URIs for the Semantic Web" by Leo Sauermann DFKI GmbH, Richard Cyganiak Freie Universität Berlin (D2R author), Max Völkel FZI Karlsruhe The authors of this document come from the semantic web community and discuss what kind of URIs should be used for RDF resources.
(2) http://www.w3.org/2001/tag/doc/URNsAndRegistries-50 This one here is written by the W3C and addresses the questions "When should URNs or URIs with novel URI schemes be used to name information resources for the Web?" The answers given are "Rarely if ever" and "Probably not". Common arguments in favor of such novel naming schemas are examined, and their properties compared with those of the existing http: URI scheme.
Regards
Rod
participants (12)
- Bob Morris
- Chuck Miller
- Dave Vieglais
- Donald Hobern
- Gregor Hagedorn
- Kevin Richards
- Markus Döring
- Matthew Jones
- Pete DeVries
- Ricardo Pereira
- Roderic Page
- Weitzman, Anna