Re: [tdwg-guid] Globally unique vs. globally locatable
All,
Here's a use case that's been going around in my head for a few years now, and it raises some issues to which satisfactory solutions haven't occurred to me yet.
I believe there is a different emphasis in the use of GUIDs around names/taxon concepts than there is around such things as collections, observations or items of literature. This use case is about the use of resolvable GUIDs in any registration system for new organism names (or nomenclatural novelties in general). There are a number of current name registration models, either extant or proposed, that seem to have a centralised model in mind. I don't like that. What I would like to see is a distributed network of 'name issuing authorities' with some overarching governance structure managed by the code bodies. In this scenario a use case might be this ...
1) Any organisation can set itself up as an issuer of new names.
2) It gets agreement from an authoritative body, designated by the code bodies, that it is a recognised issuer.
3) It hosts a web site/service that allows its community to enter data on new names (plus whatever additional services it desires).
4) It issues GUIDs associated with new names.
5) It provides a web service to resolve those GUIDs to an agreed and unchanging metadata document on the name.
6) It provides a mechanism (push/pull) for anybody, but especially aggregators, to retrieve these metadata.
7) It has a contract with the code body that, should it cease to be an issuer, it transfers responsibility for its service to another issuer (or perhaps an approved global aggregator service).
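To make steps 3 to 7 concrete, here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: the `NameIssuer` class, its method names and the `original_issuer` metadata field are hypothetical and do not describe any existing registration system. It only shows that an opaque GUID can be issued, resolved to an unchanging document, and handed over to a successor without being reissued.

```python
import uuid
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class NameIssuer:
    """Hypothetical 'name issuing authority' (illustration only)."""
    issuer_name: str
    _records: dict = field(default_factory=dict)

    def register_name(self, metadata: dict) -> str:
        # Step 4: issue an opaque GUID. The issuer is recorded in the
        # metadata document, not encoded in the identifier itself.
        guid = str(uuid.uuid4())
        document = dict(metadata, original_issuer=self.issuer_name)
        # Step 5: keep only a read-only view so the document cannot change.
        self._records[guid] = MappingProxyType(document)
        return guid

    def resolve(self, guid: str):
        # Step 5: resolve a GUID to its metadata document.
        return self._records[guid]

    def export_all(self) -> dict:
        # Step 6: bulk "pull" for aggregators (plain copies of each document).
        return {guid: dict(doc) for guid, doc in self._records.items()}

    def transfer_to(self, successor: "NameIssuer") -> None:
        # Step 7: hand every record to a successor; the GUIDs stay the same.
        successor._records.update(self._records)
        self._records = {}
```

With this shape, a name registered by one issuer still resolves to the same document, with the same `original_issuer`, after `transfer_to` has moved it to another issuer.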
So what about LSIDs in this scenario? They appear to satisfy the technical requirements. However, some issues come to mind, and I'm sure those who know more about this than I do have answers. I'd be interested in hearing them.
Some issues are:
a) In the mycological world at least, there appear to be a number of organisations that would like to sign up to this model but can't jump the technical hurdle of providing an LSID resolution service. They can provide metadata and GUIDs, and could provide a 'resolution' service to a global aggregator (in this case Index Fungorum) by email attachments if necessary. The LSID technical hurdle should not stall such a system.
b) The LSID contains a namespace which effectively 'brands' the issuer. This is where I really don't like the fact that the GUID relies on a sub-string which contains a namespace. In the case of new names the original issuer is an important fact, but it should be part of the metadata document, not the GUID.
c) What happens when an issuer 'goes under' and is required to transfer responsibility to another designated authority? LSIDs wrap the resolution mechanism into the GUID. So either the ownership of the namespace gets transferred, or another GUID is issued for essentially the same object. Neither option sounds attractive.
This is yet another reason why we, Landcare Research, have chosen not to rely on LSIDs for name object GUIDs, and so our LSIDs contain a GUID within a GUID.
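As a rough illustration of points b) and c), the sketch below splits an LSID into its parts, following the `urn:lsid:authority:namespace:object[:revision]` layout of the LSID specification, and pulls out a GUID carried in the object part. The example LSID is made up, and reading "a GUID within a GUID" as "the object part is itself a UUID" is my assumption, not a description of the actual Landcare identifiers.

```python
import uuid
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str           # identifies the issuer's resolution authority
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, revision)


def embedded_guid(lsid: Lsid) -> uuid.UUID:
    """Return the issuer-independent GUID, assuming the object part is a UUID."""
    return uuid.UUID(lsid.object_id)


# Hypothetical LSID, invented for this example only.
example = "urn:lsid:example.org:names:3f2504e0-4f89-41d3-9a0c-0305e82c3301"
parsed = parse_lsid(example)
print(parsed.authority)       # the part that 'brands' the issuer
print(embedded_guid(parsed))  # the part that could survive a change of issuer
```

If the authority ever changes hands, records keyed on the embedded GUID rather than on the full LSID string would not need to be rewritten.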
Jerry
Ricardo Pereira ricardo@tdwg.org 14/06/2007 4:52 a.m. >>>
Hi folks,
Here is another issue from that discussion thread that I'm splitting: (simply) globally unique vs. globally locatable. As Chuck said:
1. An identifier that is simply globally unique - that is, the id is never duplicated and always refers to the same thing. So, you can use it as a unique reference in a paper, like an ISBN/ISSN number. But more importantly, it also can be used in data files/serialized XML to enable computers to quickly compare import/export records for merge/update, which is an important function to many, many biodiversity data projects. But, this id does not itself tell you where it can be found. Its location must come from another source.
2. An identifier that is globally locatable via the Internet - that is, the id is never duplicated and always locates the same thing (with a further definition needed of what the thing is). The globally locatable identifier needs to be locatable by a web browser (HTTP) but more importantly also by web services which may want to use a different protocol.
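The difference between the two cases can be sketched in a few lines. In this hypothetical example (the function names, record fields and URL are all invented), a merely unique identifier is enough to merge two record sets, while finding the object requires a separate lookup table supplied from elsewhere.

```python
def merge_by_guid(local: dict, incoming: dict) -> dict:
    """Use case 1: merge/update records keyed on a globally unique id."""
    merged = dict(local)
    for guid, record in incoming.items():
        # Fields from the incoming record override the local ones.
        merged[guid] = {**merged.get(guid, {}), **record}
    return merged


def locate(guid: str, locations: dict):
    """Use case 1 continued: the id says nothing about where the object is,
    so its location must come from another source (here, a lookup table)."""
    return locations.get(guid)


local = {"id-001": {"name": "Aus bus", "year": 1900}}
incoming = {"id-001": {"year": 1901}, "id-002": {"name": "Cus dus"}}
merged = merge_by_guid(local, incoming)

# Hypothetical location table maintained separately from the identifiers.
locations = {"id-001": "http://example.org/records/id-001"}
print(merged["id-001"])
print(locate("id-002", locations))  # no known location for this id
```

A globally locatable identifier (use case 2) would make the `locations` table unnecessary, because the identifier itself would carry enough information to be resolved.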
I would argue that, without loss of generality, any identifying scheme considered by this group (LSID, DOI, Handles, ARK and any HTTP URL based scheme) fulfills both use cases. Since we are interested in sharing data, the 2nd use case is far more important for us than the first. For that reason, I would suggest that any other scheme that provides globally unique but non-locatable identifiers (i.e. that fulfills use case #1 but not #2) would be irrelevant to this group.
Such a scheme would still be important for cases other than sharing data, but that discussion would be outside of the scope of this group.
If you are interested in discussing the best way to make your local identifiers globally unique (which was the issue that started this discussion I suppose), that's the subject of another (very relevant) thread. For now the only thing I'll say about that is that there are guidelines for making local identifiers globally unique in each identifying scheme. In the particular case of LSIDs you may find information about that in the following documents:
* The LSID Specification - http://www.omg.org/cgi-bin/doc?dtc/04-05-01
* LSID Best Practices (Naming conventions) - http://www-128.ibm.com/developerworks/opensource/library/os-lsidbp/
* LSID Namespaces discussion - http://wiki.tdwg.org/twiki/bin/view/GUID/LSIDResolverNamespaces
In any case, rest assured that we will sum up all those guidelines into a section of the Bratislava Declaration ;)
Cheers,
Ricardo
_______________________________________________
tdwg-guid mailing list
tdwg-guid@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-guid
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ WARNING: This email and any attachments may be confidential and/or privileged. They are intended for the addressee only and are not to be read, used, copied or disseminated by anyone receiving them in error. If you are not the intended recipient, please notify the sender by return email and delete this message and any attachments.
The views expressed in this email are those of the sender and do not necessarily reflect the official views of Landcare Research.
Landcare Research http://www.landcareresearch.co.nz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
In spirit, this is approximately the current model of paper publishing under the major codes. The syntax of the GUID (the name) is determined by a name syntax authority (the code commission), which also determines a few rules for how resolvers should work. The resolvers (publishers + libraries) are fully distributed. Furthermore, annotations on the issued work (whether discouraged scribbling on a physical copy, or commentaries in published work indexed in citation indices) are sometimes accessible only with difficulty, since the citation indices may not attempt to find every published citation.
Bob
On 6/13/07, Jerry Cooper cooperj@landcareresearch.co.nz wrote:
participants (2)
- Bob Morris
- Jerry Cooper